Class GaleAndChurch<T>

    • Constructor Summary

      Constructors 
      Constructor Description
      GaleAndChurch()  
    • Method Summary

      Modifier and Type Method Description
      int contractionScore​(T p_sourceTuv1, T p_sourceTuv2, T p_targetTuv)
      Calculate the cost of contracting two source segments to one target segment.
      int deletionScore​(T p_sourceTuv)
      Calculate the cost of deleting a source segment.
      int expansionScore​(T p_sourceTuv, T p_targetTuv1, T p_targetTuv2)
      Calculate the cost of expanding one source segment to two target segments.
      int insertionScore​(T p_targetTuv)
      Calculate the cost of inserting a target segment.
      int match​(int len1, int len2)
      Return -100 * the log probability that a source sentence of length len1 is a translation of a foreign sentence of length len2.
      int meldingScore​(T p_sourceTuv1, T p_sourceTuv2, T p_targetTuv1, T p_targetTuv2)
      Calculate the cost of melding two source segments into two target segments.
      double prob​(int len1, int len2)
      Return the probability that a source sentence of length len1 is a translation of a foreign sentence of length len2.
      void setLocales​(LocaleId p_sourceLocale, LocaleId p_targetLocale)
      Set source and target locales.
      int substitutionScore​(T p_sourceTuv, T p_targetTuv)
      Calculate the cost of substituting a target segment for a source segment.
    • Constructor Detail

      • GaleAndChurch

        public GaleAndChurch()
    • Method Detail

      • setLocales

        public void setLocales​(LocaleId p_sourceLocale,
                               LocaleId p_targetLocale)
        Set source and target locales.
        Specified by:
        setLocales in interface AlignmentScorer<T>
        Parameters:
        p_sourceLocale - Source locale
        p_targetLocale - Target locale
      • substitutionScore

        public int substitutionScore​(T p_sourceTuv,
                                     T p_targetTuv)
        Calculate the cost of substituting a target segment for a source segment.
        Specified by:
        substitutionScore in interface AlignmentScorer<T>
        Parameters:
        p_sourceTuv - Source TUV. Source is in X sequence in the DP map.
        p_targetTuv - Target TUV. Target is in Y sequence in the DP map.
        Returns:
        cost of the substitution
      • deletionScore

        public int deletionScore​(T p_sourceTuv)
        Calculate the cost of deleting a source segment.
        Specified by:
        deletionScore in interface AlignmentScorer<T>
        Parameters:
        p_sourceTuv - Source TUV. Source is in X sequence in the DP map.
        Returns:
        cost of the deletion
      • insertionScore

        public int insertionScore​(T p_targetTuv)
        Calculate the cost of inserting a target segment.
        Specified by:
        insertionScore in interface AlignmentScorer<T>
        Parameters:
        p_targetTuv - Target TUV. Target is in Y sequence in the DP map.
        Returns:
        cost of the insertion
      • contractionScore

        public int contractionScore​(T p_sourceTuv1,
                                    T p_sourceTuv2,
                                    T p_targetTuv)
        Calculate the cost of contracting two source segments to one target segment.
        Specified by:
        contractionScore in interface AlignmentScorer<T>
        Parameters:
        p_sourceTuv1 - Source TUV1. Source is in X sequence in the DP map.
        p_sourceTuv2 - Source TUV2. Source is in X sequence in the DP map.
        p_targetTuv - Target TUV. Target is in Y sequence in the DP map.
        Returns:
        cost of the contraction
      • expansionScore

        public int expansionScore​(T p_sourceTuv,
                                  T p_targetTuv1,
                                  T p_targetTuv2)
        Calculate the cost of expanding one source segment to two target segments.
        Specified by:
        expansionScore in interface AlignmentScorer<T>
        Parameters:
        p_sourceTuv - Source TUV. Source is in X sequence in the DP map.
        p_targetTuv1 - Target TUV1. Target is in Y sequence in the DP map.
        p_targetTuv2 - Target TUV2. Target is in Y sequence in the DP map.
        Returns:
        cost of the expansion
      • meldingScore

        public int meldingScore​(T p_sourceTuv1,
                                T p_sourceTuv2,
                                T p_targetTuv1,
                                T p_targetTuv2)
        Calculate the cost of melding two source segments into two target segments.
        Specified by:
        meldingScore in interface AlignmentScorer<T>
        Parameters:
        p_sourceTuv1 - Source TUV1. Source is in X sequence in the DP map.
        p_sourceTuv2 - Source TUV2. Source is in X sequence in the DP map.
        p_targetTuv1 - Target TUV1. Target is in Y sequence in the DP map.
        p_targetTuv2 - Target TUV2. Target is in Y sequence in the DP map.
        Returns:
        cost of the melding
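
The six scoring methods above correspond to the six moves of a dynamic-programming alignment over the X (source) and Y (target) sequences. The following is a minimal sketch of how a caller might combine them. It assumes a hypothetical `SegmentScorer<T>` interface that only mirrors the signatures listed here; the real `AlignmentScorer<T>` and the aligner that drives it may differ, and `DpAlignmentSketch` and `minimumCost` are illustrative names.

```java
import java.util.List;

/** Hypothetical mirror of the six scoring methods; not the library's own interface. */
interface SegmentScorer<T> {
    int substitutionScore(T source, T target);
    int deletionScore(T source);
    int insertionScore(T target);
    int contractionScore(T source1, T source2, T target);
    int expansionScore(T source, T target1, T target2);
    int meldingScore(T source1, T source2, T target1, T target2);
}

final class DpAlignmentSketch {
    /** Returns the minimum total cost of aligning all of x (source) with all of y (target). */
    static <T> int minimumCost(List<T> x, List<T> y, SegmentScorer<T> s) {
        int n = x.size();
        int m = y.size();
        // cost[i][j] = cheapest alignment of the first i source and first j target segments.
        int[][] cost = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) {
                    continue; // empty against empty costs nothing
                }
                int best = Integer.MAX_VALUE;
                if (i >= 1 && j >= 1) { // 1-1 substitution
                    best = Math.min(best, cost[i - 1][j - 1]
                            + s.substitutionScore(x.get(i - 1), y.get(j - 1)));
                }
                if (i >= 1) { // 1-0 deletion
                    best = Math.min(best, cost[i - 1][j] + s.deletionScore(x.get(i - 1)));
                }
                if (j >= 1) { // 0-1 insertion
                    best = Math.min(best, cost[i][j - 1] + s.insertionScore(y.get(j - 1)));
                }
                if (i >= 2 && j >= 1) { // 2-1 contraction
                    best = Math.min(best, cost[i - 2][j - 1]
                            + s.contractionScore(x.get(i - 2), x.get(i - 1), y.get(j - 1)));
                }
                if (i >= 1 && j >= 2) { // 1-2 expansion
                    best = Math.min(best, cost[i - 1][j - 2]
                            + s.expansionScore(x.get(i - 1), y.get(j - 2), y.get(j - 1)));
                }
                if (i >= 2 && j >= 2) { // 2-2 melding
                    best = Math.min(best, cost[i - 2][j - 2]
                            + s.meldingScore(x.get(i - 2), x.get(i - 1),
                                             y.get(j - 2), y.get(j - 1)));
                }
                cost[i][j] = best;
            }
        }
        return cost[n][m];
    }
}
```

The sketch returns only the total cost. An aligner that needs the actual segment pairing would also record which move produced each cell and trace back from `cost[n][m]`.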
      • match

        public int match​(int len1,
                         int len2)
        Return -100 * the log probability that a source sentence of length len1 is a translation of a foreign sentence of length len2. The probability is based on two parameters: the mean and the variance of the number of foreign characters per source character. Gale and Church hardcoded foreign_chars_per_eng_char as 1, which apparently works well enough for aligning European languages. This implementation takes the coefficient as a parameter so that non-European languages can be aligned as well.
      • prob

        public double prob​(int len1,
                           int len2)
        Return the probability that a source sentence of length len1 is a translation of a foreign sentence of length len2. The probability is based on two parameters: the mean and the variance of the number of foreign characters per source character. Gale and Church hardcoded foreign_chars_per_eng_char as 1, which apparently works well enough for aligning European languages. This implementation takes the coefficient as a parameter so that non-European languages can be aligned as well.
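
The relationship between `prob` and `match` described above can be sketched as follows. This is an illustration of the length-based model as published by Gale and Church, not this class's actual source. The class name `LengthMatchSketch`, the explicit `charsRatio` and `variance` arguments, the variance value 6.8 and the `BIG_DISTANCE` fallback of 2500 are all assumptions made for the example; how `GaleAndChurch<T>` obtains its coefficient is not stated here.

```java
/** Illustrative only: names and constants are assumptions, not GaleAndChurch<T> internals. */
final class LengthMatchSketch {
    /** Assumed fallback cost when the probability underflows to zero. */
    static final int BIG_DISTANCE = 2500;
    /** Assumed variance of target characters per source character. */
    static final double DEFAULT_VARIANCE = 6.8;

    /** Polynomial approximation of the standard normal CDF, valid for z >= 0. */
    static double pnorm(double z) {
        double t = 1.0 / (1.0 + 0.2316419 * z);
        double poly = ((((1.330274429 * t - 1.821255978) * t + 1.781477937) * t
                - 0.356563782) * t + 0.319381530) * t;
        return 1.0 - 0.3989423 * Math.exp(-z * z / 2.0) * poly;
    }

    /** Two-sided probability that lengths len1 and len2 belong to mutual translations. */
    static double prob(int len1, int len2, double charsRatio, double variance) {
        if (len1 == 0 && len2 == 0) {
            return 1.0;
        }
        double mean = (len1 + len2 / charsRatio) / 2.0;
        double z = Math.abs((charsRatio * len1 - len2) / Math.sqrt(variance * mean));
        // Clamp because the approximation can overshoot 1 by a tiny amount at z = 0.
        return Math.min(1.0, 2.0 * (1.0 - pnorm(z)));
    }

    /** Cost form of prob: -100 * log(probability), truncated to an int. */
    static int match(int len1, int len2, double charsRatio, double variance) {
        double p = prob(len1, len2, charsRatio, variance);
        if (p <= 0.0) {
            return BIG_DISTANCE;
        }
        return (int) (-100.0 * Math.log(p));
    }
}
```

With `charsRatio` set to 1, segments of equal length get a cost of 0 and the cost grows as the lengths diverge. Setting `charsRatio` to 2 makes a 10-character source and a 20-character target count as a perfect length match, which is the kind of adjustment the coefficient allows for language pairs whose character counts differ systematically.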