Match Types


When working with TM systems, MT engines, leveraging steps, and other components that try to match an existing translation to a given source, all Okapi resources use the same categories to identify the types of match.

For example, in the XLIFF files generated by Okapi components, the type of match is reported in the okp:matchType attribute of the <alt-trans> element, and the match-quality attribute of the same element gives the match score as a percentage.
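
As an illustration of reading these two attributes, here is a minimal Python sketch that is not part of Okapi itself. The sample document, the origin value and the helper name read_matches are hypothetical, and the namespace URI bound to the okp prefix is an assumption that should be checked against the xmlns:okp declaration in a real Okapi-generated file.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Assumption: URI bound to the "okp" prefix; verify it in your own files.
OKP_NS = "okapi-framework:xliff-extensions"

# Hypothetical sample, written by hand for this sketch.
SAMPLE = """<xliff version="1.2"
       xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:okp="okapi-framework:xliff-extensions">
 <file original="sample.txt" source-language="en" target-language="fr" datatype="plaintext">
  <body>
   <trans-unit id="1">
    <source>Hello world</source>
    <alt-trans match-quality="100" origin="sample-tm" okp:matchType="EXACT">
     <source>Hello world</source>
     <target>Bonjour le monde</target>
    </alt-trans>
    <alt-trans match-quality="85" origin="sample-tm" okp:matchType="FUZZY">
     <source>Hello, world!</source>
     <target>Bonjour, le monde !</target>
    </alt-trans>
   </trans-unit>
  </body>
 </file>
</xliff>"""


def read_matches(xliff_text):
    """Return a list of (match type, match-quality) pairs, one per <alt-trans>."""
    root = ET.fromstring(xliff_text)
    matches = []
    for alt in root.iter(f"{{{XLIFF_NS}}}alt-trans"):
        # Fall back to UNKNOWN when the attribute is absent.
        match_type = alt.get(f"{{{OKP_NS}}}matchType", "UNKNOWN")
        # match-quality is kept as a string; its exact format is tool-specific.
        quality = alt.get("match-quality")
        matches.append((match_type, quality))
    return matches


if __name__ == "__main__":
    for match_type, quality in read_matches(SAMPLE):
        print(match_type, quality)
```

The score is left as text on purpose, since some tools may write it with a trailing percent sign.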

The following table shows the different match types in decreasing order (best type first):

Type of Match                     Description
--------------------------------  -----------
HUMAN_RECOMMENDED                 Improved translation edited by a human.
EXACT_UNIQUE_ID                   Matches EXACT and matches a unique id.
EXACT_PREVIOUS_VERSION            Matches EXACT and comes from the preceding version of the same document (i.e., if v4 is leveraged, this match must come from v3, not v2 or v1).
EXACT_LOCAL_CONTEXT               Matches EXACT and a small number of segments before and/or after.
EXACT_DOCUMENT_CONTEXT            Matches EXACT and comes from the same document, either the existing version or a different version. See also EXACT_PREVIOUS_VERSION.
EXACT_STRUCTURAL                  Matches EXACT and the structural type of the segment (title, paragraph, list element, etc.).
EXACT                             Matches text and codes exactly.
EXACT_TEXT_ONLY_UNIQUE_ID         Matches EXACT_TEXT_ONLY and matches a unique id.
EXACT_TEXT_ONLY_PREVIOUS_VERSION  Matches EXACT_TEXT_ONLY and comes from a previous version of the same document.
EXACT_TEXT_ONLY                   Matches text exactly, but there is a difference in one or more codes.
EXACT_REPAIRED                    Matches text and codes exactly, but only after some automated repair (e.g., number replacement, code repair, capitalization, punctuation).
FUZZY_UNIQUE_ID                   Matches FUZZY and matches a unique id.
FUZZY_PREVIOUS_VERSION            Matches FUZZY and comes from a previous version of the same document.
FUZZY                             Matches text and/or codes partially.
FUZZY_REPAIRED                    Matches text and/or codes partially, and some automated repair (e.g., number replacement, code repair, capitalization, punctuation) was applied to the target.
PHRASE_ASSEMBLED                  Matches assembled from phrases in the TM or other resources (different algorithms could be used).
MT                                Indicates a translation coming from an MT engine.
UNKNOWN                           Unknown match type. Used as the default value only when the match cannot be identified with another type. An UNKNOWN type always sorts below all other matches.
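
The ordering in the table can be sketched as a ranking function. This is an illustrative Python sketch and not Okapi's own implementation: the MatchType enumeration below simply copies the names from the table in the same order, the helper names to_match_type and rank are hypothetical, and breaking ties within one type by the higher score is an assumption made for this example.

```python
from enum import IntEnum

# Names copied from the table, best type first; a lower value ranks higher.
MatchType = IntEnum(
    "MatchType",
    [
        "HUMAN_RECOMMENDED",
        "EXACT_UNIQUE_ID",
        "EXACT_PREVIOUS_VERSION",
        "EXACT_LOCAL_CONTEXT",
        "EXACT_DOCUMENT_CONTEXT",
        "EXACT_STRUCTURAL",
        "EXACT",
        "EXACT_TEXT_ONLY_UNIQUE_ID",
        "EXACT_TEXT_ONLY_PREVIOUS_VERSION",
        "EXACT_TEXT_ONLY",
        "EXACT_REPAIRED",
        "FUZZY_UNIQUE_ID",
        "FUZZY_PREVIOUS_VERSION",
        "FUZZY",
        "FUZZY_REPAIRED",
        "PHRASE_ASSEMBLED",
        "MT",
        "UNKNOWN",
    ],
    start=0,
)


def to_match_type(name):
    """Map a type name to MatchType, using UNKNOWN for unrecognized names."""
    return MatchType.__members__.get(name, MatchType.UNKNOWN)


def rank(matches):
    """Sort (type name, score) pairs: best type first, then higher score first.

    The score tie-break within one type is an assumption of this sketch.
    """
    return sorted(matches, key=lambda m: (to_match_type(m[0]), -m[1]))


if __name__ == "__main__":
    candidates = [("MT", 100), ("FUZZY", 80), ("EXACT", 100), ("FUZZY", 95)]
    for name, score in rank(candidates):
        print(name, score)
```

With this ordering, an MT result scored at 100 still sorts below a FUZZY match scored at 80, because the type is compared before the score.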