CheckMate - Quality Check Configuration
If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=CheckMate
This dialog box allows you to specify what checks are performed.
Note: a text unit in the Okapi tools corresponds to a unit of
extracted text, for example a paragraph in HTML or OpenOffice, a string table in
a Properties file, etc., while a segment is the unit resulting from
a segmentation. A text unit is composed of one or more segments and possible
inter-segment parts. When a text unit has not been segmented, it is seen as
having a single segment. Some documents may already be segmented, like XLIFF.
Others are typically not segmented, like TMX, where each <tu> entry
corresponds to a text unit (and therefore a single segment).
Verifications that are done on the whole content of each text unit:
Warn if an entry does not have a translation -- This verification is always done. It checks whether each entry has a corresponding translation. A warning is generated when there is no entry for the given target language corresponding to the source entry. Empty translations are checked by the option: Warn if a target segment is empty when its source is not empty.
Warn if a target entry has a difference in leading white spaces -- Set this option to flag the text units where the leading white spaces are different between source and target.
Warn if a target entry has a difference in trailing white spaces -- Set this option to flag the text units where the trailing white spaces are different between source and target.
Verifications that are done on each segment of each text unit (an un-segmented text unit being seen as having a single segment):
Warn if a source segment does not have a corresponding target -- This verification is always done. It checks whether all source segments have a corresponding target segment. A warning is generated when a source segment has a segment ID that does not exist in the target text unit.
Warn if there is an extra target segment -- This verification is always done. It checks if all target segments correspond to an existing source segment.
Warn if a target segment is empty when its source is not empty -- Set this option to flag the segments for which the translation is empty (if the corresponding source is not empty).
Warn if a target segment is not empty when its source is empty -- Set this option to flag the segments for which the target is not empty while its source is empty.
Warn if a target segment is the same as its source -- Set this
option to flag the segments where the translation is the same as the source.
This check is done only if the text of the source segment contains at least one
word-character (a character matching the regular expression "[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]",
which is basically any Unicode letter or digit). Note also that the inline codes
are not part of the text of the entry.
For example (where <b>, %s, </b> and %d are inline codes):
"<b>%s</b>: %d" is not checked because it has no character that could be part of a word.
"-------- S= %d" is checked because the character 'S' could be part of a word.
Include the codes in the comparison -- Set this option if the comparison done when verifying if the target is the same as the source should take inline codes into account. If this option is set and the only difference between the source and the translation is an inline code, the segment will not be flagged as having the target the same as the source (because they will have at least one code different).
Note that when a target is found to be the same as its source, the tool
checks the list of patterns that have their expected
target set to "<same>". If the string matches one of those patterns
no warning is generated as the target is expected to be the same.
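The word-character rule and the code comparison option described above can be sketched in Python as follows. This is only an illustration, not CheckMate's implementation: the function names are hypothetical, the text is assumed to have been separated from its inline codes already, and the exemption for patterns set to "<same>" is left out.

```python
import unicodedata

# General categories listed in the regular expression: Ll, Lu, Lt, Lo, Nd
WORD_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Nd"}


def has_word_char(text):
    """True if the text holds at least one letter or decimal digit."""
    return any(unicodedata.category(ch) in WORD_CATEGORIES for ch in text)


def same_as_source(src_text, trg_text, src_codes=(), trg_codes=(),
                   include_codes=False):
    """Return True when the segment would be flagged as target == source.

    src_text and trg_text hold the text WITHOUT inline codes; the codes are
    passed separately (an assumption of this sketch).
    """
    if not has_word_char(src_text):
        return False  # nothing that could be a word: the check is skipped
    if src_text != trg_text:
        return False
    if include_codes and list(src_codes) != list(trg_codes):
        return False  # at least one code differs, so not "the same"
    return True
```

With this sketch, a segment whose text is only ": " is never flagged, while a segment whose text is "S= " is flagged when the target text is identical.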
Warn on doubled words -- Set this option to flag the target
segments where there is a sequential repetition of the same word, for example
"is is" in "this is is an example". The check is not case-sensitive, for example
"This this an example" is flagged.
Exceptions -- Enter the list of words that can be repeated. For
example, in French some sentences may have the expressions "vous vous"
or "nous nous". To allow this, enter "vous;nous": list each
word for which repetition is allowed, separated by semicolons. You must not
leave any space around the semicolons. The exceptions are not case-sensitive.
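As an illustration of the doubled-word check and its exception list, here is a minimal Python sketch. It assumes a word is a run of "\w" characters and that repetitions are separated by white space only; the function name is hypothetical and the actual detection rule used by CheckMate may differ.

```python
import re

# A word, some white space, then the same word again (case-insensitive)
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text, exceptions=""):
    """Return the doubled words found in text, in lowercase.

    exceptions uses the format described above: words separated by
    semicolons, with no space around the semicolons.
    """
    allowed = {w.lower() for w in exceptions.split(";") if w}
    found = []
    for match in DOUBLED.finditer(text):
        word = match.group(1).lower()
        if word not in allowed:
            found.append(word)
    return found
```

For example, find_doubled_words("Vous vous trompez", "vous;nous") reports nothing, while the same call without the exception list reports "vous".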
Warn if a target is longer than the given percentage of the character length of its source -- Set this option to flag any target text that is longer than a given percentage of its source text.
Character length above which a text is considered "long" -- Enter the number of characters above which you consider a text to be a long text (vs. a short one). This allows you to set different percentages for short and longer text.
Percentage for "short" text -- Enter the percentage to use when the text is shorter than or equal to the character length above which a text is considered long.
Percentage for "long" text -- Enter the percentage to use when the text is longer than the character length above which a text is considered long.
The length is based on the number of characters without counting the inline codes. These values must be tuned for each source/target language pair.
Warn if a target is shorter than the following percentage of the character length of its source -- Set this option to flag any target text that is shorter than a given percentage of its source text.
Character length above which a text is considered "long" -- Enter the number of characters above which you consider a text to be a long text (vs. a short one). This allows you to set different percentages for short and longer text.
Percentage for "short" text -- Enter the percentage to use when the text is shorter than or equal to the character length above which a text is considered long.
Percentage for "long" text -- Enter the percentage to use when the text is longer than the character length above which a text is considered long.
The length is based on the number of characters without counting the inline codes. These values must be tuned for each source/target language pair.
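The two length checks can be sketched as below. This is a hedged illustration only: the function and parameter names are hypothetical, the short/long decision is assumed to be made on the source length, and the strings passed in are assumed to be already stripped of inline codes.

```python
def allowed_percentage(src_len, long_above, short_pct, long_pct):
    """Pick the percentage for a source of src_len characters.

    A text is "long" only when it is strictly longer than long_above.
    """
    return long_pct if src_len > long_above else short_pct


def too_long(src, trg, long_above, short_pct, long_pct):
    """True if trg is longer than the allowed percentage of src."""
    pct = allowed_percentage(len(src), long_above, short_pct, long_pct)
    return len(trg) * 100 > len(src) * pct


def too_short(src, trg, long_above, short_pct, long_pct):
    """True if trg is shorter than the allowed percentage of src."""
    pct = allowed_percentage(len(src), long_above, short_pct, long_pct)
    return len(trg) * 100 < len(src) * pct
```

With a threshold of 20 characters and percentages of 200 (short) and 150 (long), a 2-character source allows up to 4 characters in the target, and a 30-character source allows up to 45.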
Warn if there is a code difference between source and target segments -- Set this option to verify that the target content has the same inline codes as the source content. This function compares the content of the codes between the source and target. Both missing codes (codes in the source but not in the target) and extra codes (codes in the target but not in the source) are indicated. A difference only in the order of the codes does not trigger a warning.
Codes allowed to be missing from the target -- List of the codes that are allowed to be missing in the translation. The strings listed here are codes that are in the source segment and not in its translation, and are allowed to be missing. The list applies to all entries of the input documents. The strings are case-sensitive.
Codes allowed to be extra in the target -- List of the codes that are allowed to be extra in the translation. The strings listed here are codes that are in the translation segment but not in its source, and are allowed to be extra. The list applies to all entries of the input documents. The strings are case-sensitive.
For both lists: Use Add to add a new string, Remove to remove the selected string from the list, and Remove All to clear the list.
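The comparison described above (order-insensitive, with lists of codes allowed to be missing or extra) can be sketched with a multiset difference. The function name and the representation of codes as plain strings are assumptions made for this example, not the tool's actual data model.

```python
from collections import Counter


def code_differences(src_codes, trg_codes, allowed_missing=(), allowed_extra=()):
    """Return (missing, extra) lists of inline codes.

    Codes are compared by content, ignoring their order. The allowed
    lists are matched exactly, so they are case-sensitive.
    """
    src = Counter(src_codes)
    trg = Counter(trg_codes)
    # Counter subtraction keeps only the positive counts
    missing = [c for c in (src - trg).elements() if c not in allowed_missing]
    extra = [c for c in (trg - src).elements() if c not in allowed_extra]
    return missing, extra
```

A target that holds the same codes in a different order yields two empty lists, so no warning would be generated for it.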
Verify that the following source patterns are translated as expected -- Set this option to verify that each source pattern defined in the list has its corresponding expected part in the target content.
A pattern can be searched from the source ("Src" indicator): the source pattern is looked for first and, if it is found, the
corresponding pattern is searched for in the target. Otherwise ("Trg"
indicator) the target pattern is looked for first, and the corresponding pattern is then searched for in the
source. This allows, for example, detecting extra patterns in the target.
A target that is expected to be identical to the source is specified with the "<same>" keyword.
Add -- Click this button to add a new pattern to the list.
Edit -- Click this button to edit the pattern currently selected. You can also double-click the pattern in the table.
Remove -- Click this button to remove the pattern currently selected from the table.
Move Up -- Click this button to move the pattern currently selected upward in the table.
Move Down -- Click this button to move the pattern currently selected downward in the table.
Import -- Click this button to import an existing file into the table.
Export -- Click this button to export the patterns in the table to a tab-delimited file.
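The direction of the pattern search ("Src" or "Trg") can be illustrated with the following Python sketch. It rests on several assumptions: patterns are treated as regular expressions, "<same>" is interpreted as "the target must contain the exact text matched in the source", and the function name and messages are hypothetical.

```python
import re

SAME = "<same>"


def check_pattern(source, target, src_pattern, trg_pattern, from_source=True):
    """Return a list of issue messages for one pattern entry."""
    issues = []
    if from_source:
        # "Src": look in the source first, then expect a match in the target
        for m in re.finditer(src_pattern, source):
            expected = re.escape(m.group(0)) if trg_pattern == SAME else trg_pattern
            if not re.search(expected, target):
                issues.append("missing in target: " + m.group(0))
    else:
        # "Trg": look in the target first, then expect a match in the source
        for m in re.finditer(trg_pattern, target):
            if not re.search(src_pattern, source):
                issues.append("extra in target: " + m.group(0))
    return issues
```

In "Trg" mode a percentage such as "10%" that appears only in the target is reported as extra, which is the kind of case a source-first search cannot detect.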
Warn if some possibly corrupted characters are found in the target entry -- Set this option to check for special patterns that often indicate a file with corrupted characters, for example a UTF-8 file opened as ISO-8859-1. This feature does not find all possible cases of corrupted characters, only some of the frequent ones.
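To show what such a pattern can look like, here is a small Python sketch that detects one frequent case: a two-byte UTF-8 sequence read as two ISO-8859-1 characters. The exact patterns used by CheckMate are not documented here, so this heuristic and its names are assumptions; like the option itself, it catches only some cases and can produce false positives.

```python
import re

# Lead byte of a 2-byte UTF-8 sequence (0xC2-0xDF) followed by a
# continuation byte (0x80-0xBF), each misread as one ISO-8859-1 character.
MOJIBAKE = re.compile("[\u00C2-\u00DF][\u0080-\u00BF]")


def looks_corrupted(text):
    """True if text contains a typical UTF-8-read-as-ISO-8859-1 pair."""
    return MOJIBAKE.search(text) is not None


# Simulate the corruption: UTF-8 bytes decoded with the wrong encoding
broken = "café".encode("utf-8").decode("iso-8859-1")
```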
Warn if a character is not included in the following character set encoding -- Set this option to check the characters of the text against a given character set encoding. Enter the name of a valid character set encoding, such as ISO-8859-1. You can also leave this field empty to use only the given list of characters provided in the field below this one.
Allow the characters matching the following regular expression -- Optionally enter a regular expression that matches a list of allowed characters. The characters specified here will be allowed even if they are not part of the character set encoding specified above. Leave this field empty to not use any regular expression.
You can enter: only a character set encoding, or only a regular expression, or both.
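The interaction between the two fields can be sketched as follows. This is a minimal illustration with hypothetical names: a character is accepted when it matches the regular expression or can be encoded in the given character set, and the encoding names are those understood by Python, which may differ from the names CheckMate accepts.

```python
import re


def disallowed_chars(text, charset="", allowed_regex=""):
    """Return the characters of text that neither setting allows."""
    if not charset and not allowed_regex:
        return []  # nothing configured: nothing to check
    allowed = re.compile(allowed_regex) if allowed_regex else None
    bad = []
    for ch in text:
        if allowed and allowed.fullmatch(ch):
            continue  # explicitly allowed by the regular expression
        if charset:
            try:
                ch.encode(charset)
                continue  # representable in the character set
            except UnicodeEncodeError:
                pass
        bad.append(ch)
    return bad
```

For instance, with "iso-8859-1" alone the euro sign is reported, while adding the regular expression "[€]" accepts it.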
Perform the verifications provided by the LanguageTool server -- Set this option to run the verifications provided by a LanguageTool server. To use this option you must have access to LanguageTool Checker run as a server. Most of the time this is simply a local server. You can start the application with Java Web Start, for example by clicking on the Start LanguageTool from the Web button.
Note that using LanguageTool may significantly increase the processing time. In addition, using the auto-translate option (see below) increases the processing time further.
Auto-translate the messages from the LanguageTool checker -- Set this option to have the messages coming from the LanguageTool checker translated into a given language. Most of the time, the error messages of LanguageTool are provided in the same language as the text verified (e.g. verifying a Polish text will give you back error messages in Polish). Use this option to have the messages automatically translated using Google MT and displayed along with the original messages.
From -- Enter the code of the language of the original messages (e.g.
pl for Polish).
Into -- Enter the code of the language into which you want to
translate the messages (e.g. en for English).
Start LanguageTool from the Web -- Click this button to start LanguageTool checker directly from the Web. This command uses the Java Web Start technology to download and execute the latest version of LanguageTool from its Web site.
You will be prompted by a Security Warning dialog asking you to confirm you want to launch the application. Click Run or Yes if you want to continue. Once the application is running, go to the File menu and select the Options command. Select the target language. Make sure the option Run as server on port is set, and that the port specified matches the port you have selected in CheckMate. Minimize the application. Go back to CheckMate: you are now able to use LanguageTool.
Note that entries flagged as non-translatable are never processed, regardless of the choice for the scope. When the scope is not set to all entries, it is determined by the value of the Approved property. The setting of this property is specific to each file format.
Process all entries -- Select this option to process all entries.
Process only approved entries -- Select this option to process only the entries that have the property Approved set to "yes".
Process only non-approved entries -- Select this option to process only the entries that do not have a property Approved, or that have one not set to "yes".
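The three scope options, together with the rule on non-translatable entries, amount to a small filter. The sketch below models an entry as a Python dictionary with hypothetical "translatable" and "approved" keys; the real tool works on text unit properties set by each file format.

```python
def in_scope(entry, scope):
    """Decide whether an entry is processed.

    scope is one of "all", "approved" or "non-approved" (names chosen
    for this sketch).
    """
    if not entry.get("translatable", True):
        return False  # never processed, whatever the scope
    if scope == "all":
        return True
    approved = entry.get("approved") == "yes"
    return approved if scope == "approved" else not approved
```

An entry without any Approved property is therefore processed under "all" and "non-approved", but not under "approved".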
Path of the report file -- Enter the full path of the HTML
report to generate. You can use the variable ${rootDir} in the
path.
Open the report after completion -- Set this option to automatically open the report file after the process is complete.
Import -- Click this button to import an existing configuration file.
Export -- Click this button to export the current configuration to a file.