Okapi Framework - Steps: Batch Translation Step
Overview
If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=Batch_Translation_Step
This step creates a translation memory from the text units extracted from a raw document, using an external tool to provide the translation.
Takes: Raw document
Sends: Raw document
Here is the sequence of the different actions this step performs:
${inputPath} and ${inputURI} variables.
${outputPath} and ${outputURI} variables.
Each input document can be processed using one or more temporary HTML files, which allows tools with size limitations to translate very large documents.
The text units extracted from the input document can be segmented using SRX rules if needed.
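The round trip through the temporary HTML file can be pictured with a small Python sketch. It is not Okapi's implementation: it assumes each text unit (or segment) becomes one `<p>` element and that the external tool preserves the number and order of paragraphs, so translations are paired with their sources by position. `build_temp_html`, `ParagraphCollector` and `align` are hypothetical names.

```python
import html
from html.parser import HTMLParser


def build_temp_html(units):
    """Write one <p> per text unit (assumed layout of the temporary file)."""
    body = "\n".join(f"<p>{html.escape(u)}</p>" for u in units)
    return f"<html><body>\n{body}\n</body></html>\n"


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in the data
        self.paragraphs = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            self.paragraphs.append("".join(self._current))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def align(units, translated_html):
    """Pair each source unit with the translated paragraph at the same position."""
    parser = ParagraphCollector()
    parser.feed(translated_html)
    parser.close()
    if len(parser.paragraphs) != len(units):
        raise ValueError("the number of paragraphs changed during translation")
    return list(zip(units, parser.paragraphs))
```

If the tool merges or splits paragraphs, positional pairing no longer works, which is why the sketch raises an error instead of guessing.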
Command line -- Enter the command line to use. The command line
must take the temporary HTML document named ${inputPath} and generate
an output document in the same format named ${outputPath}. You can use
the following variables in the command line:
| Variable | Description | Example |
|---|---|---|
| ${inputPath} | The full path of the input document. | |
| ${inputURI} | The URI of the input document. | |
| ${outputPath} | The full path of the output document. | |
| ${srcLangName} | English name of the language part of the source locale identifier. | For "de-ch" this returns "German" |
| ${trgLangName} | English name of the language part of the target locale identifier. | For "ja-jp" this returns "Japanese" |
| ${srcLang} | Code of the language part of the source locale identifier. | For "de-ch" this returns "de" |
| ${trgLang} | Code of the language part of the target locale identifier. | For "ja-jp" this returns "ja" |
| ${rootDir} | The root directory for this project/batch. | In Rainbow: the parameters folder. |
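A minimal sketch of how the variables above might be expanded before the external tool is launched. The helper names and the small `LANG_NAMES` table are hypothetical (a real implementation would use a complete locale database), and `${outputURI}`, mentioned earlier, is included even though the table does not list it.

```python
import re
import shlex
from pathlib import Path

# Hypothetical lookup table: only a few languages, for illustration.
LANG_NAMES = {"de": "German", "en": "English", "es": "Spanish", "fr": "French", "ja": "Japanese"}


def language_part(locale):
    """Return the language code of a locale identifier, e.g. "de-ch" -> "de"."""
    return locale.split("-")[0].lower()


def build_variables(input_path, output_path, src_locale, trg_locale, root_dir):
    src, trg = language_part(src_locale), language_part(trg_locale)
    return {
        "inputPath": input_path,
        "inputURI": Path(input_path).absolute().as_uri(),
        "outputPath": output_path,
        "outputURI": Path(output_path).absolute().as_uri(),
        "srcLang": src,
        "trgLang": trg,
        "srcLangName": LANG_NAMES.get(src, src),
        "trgLangName": LANG_NAMES.get(trg, trg),
        "rootDir": root_dir,
    }


def expand_command(template, variables):
    """Split the template into arguments, then replace ${name} inside each one."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown variable: ${{{name}}}")
        return variables[name]

    return [re.sub(r"\$\{(\w+)\}", substitute, arg) for arg in shlex.split(template)]
```

The resulting list can be passed to `subprocess.run(argv, check=True)` without going through a shell. Splitting the template before substituting means a path that contains spaces stays a single argument.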
Example 1: The following command line uses the open-source Apertium program under Linux to translate the temporary HTML document.
apertium -f html ${srcLang}-${trgLang} ${inputPath} ${outputPath}
Example 2: The following command line uses the commercial ProMT application under Windows to translate the temporary HTML document.
"C:\Program Files\PRMT9\FILETRANS\FileTranslator.exe" ${inputPath} /as /ac /d:${srcLangName}-${trgLangName} /o:${outputPath}
Block size -- Enter the maximum number of text units that should be passed at the same time to the external tool. This allows you to process a very large input document even with external tools that can only process small documents.
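The Block size option can be illustrated as follows, assuming a hypothetical `translate_block` callback that stands for writing one temporary HTML file, running the command line, and reading the result back.

```python
def split_into_blocks(units, block_size):
    """Cut the list of text units into consecutive blocks of at most block_size."""
    if block_size < 1:
        raise ValueError("block size must be at least 1")
    return [units[i:i + block_size] for i in range(0, len(units), block_size)]


def translate_in_blocks(units, block_size, translate_block):
    """Call the external tool once per block and concatenate the results in order.

    translate_block is a placeholder (hypothetical) for one round trip
    through a temporary HTML file and the external tool.
    """
    translations = []
    for block in split_into_blocks(units, block_size):
        result = translate_block(block)
        if len(result) != len(block):
            raise ValueError("the tool returned a different number of units")
        translations.extend(result)
    return translations
```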
Origin identifier -- Enter an optional string that identifies
the translation. The given string is output as a property named Origin
of the translated entry. For example, in a TMX output it is generated as
<prop type="Txt::Origin">myText</prop>, where myText is the given
string.
Mark the generated translation as machine translation results --
Set this option to mark the TM entries generated as the result of machine
translation. For example, when this option is set, the creationId
attribute of the target in each generated entry is set to "MT!".
Segment the text units, using the following SRX rules -- Set this option to segment the extracted text units before sending them to the temporary HTML document. If this option is set, each paragraph of the HTML document is a sentence; if it is not set, each paragraph of the HTML document is an unsegmented paragraph. Note that only entries processed by the external tool are placed in the TMX output. Entries that already exist in the TM being populated or in the existing TM are not copied into the TMX output.
Enter the full path of the SRX file containing the rules to use
to segment the text units. You can use the
variable ${rootDir} in the path.
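SRX rules are ordered break and no-break rules, each made of a regular expression for the text before a candidate break and one for the text after it; at each position the first matching rule decides. The sketch below applies that principle to a hypothetical, hard-coded rule list instead of reading an SRX file, and it strips the whitespace between sentences, which a real SRX engine does not necessarily do.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One SRX-style rule: a candidate break lies between `before` and `after`."""
    is_break: bool
    before: str  # must match the text that ends at the candidate position
    after: str   # must match the text that starts at the candidate position


# Hypothetical rules: the no-break exception is listed first because,
# as in SRX, the first rule that matches a position decides.
RULES = [
    Rule(False, r"\b(?:Mr|Dr|e\.g)\.", r"\s"),
    Rule(True, r"[.?!]+", r"\s+"),
]


def segment(text, rules):
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            if re.search(rf"(?:{rule.before})\Z", text[:pos]) and re.match(rule.after, text[pos:]):
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins, whether it breaks or not
    segments.append(text[start:])
    # Simplification: surrounding whitespace is dropped from each segment.
    return [s.strip() for s in segments if s.strip()]
```

The scan re-tests the whole prefix at every position, which is quadratic; that is acceptable for a sketch but not for very long paragraphs.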
Import into the following Pensieve TM -- Set this option to import the translated entries into a given Pensieve TM. The entries added to the TM are indexed at the end of each input document, so while a given document is being processed, other steps down the pipeline can access only the entries generated from the previous documents.
Enter the directory of the Pensieve TM into which the entries are imported. If the TM
does not exist, it is created. If the TM already exists, the entries are
added to it. You can use the variable ${rootDir} in the path.
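The indexing behavior described above means that a document cannot see its own entries in the TM. A hypothetical in-memory stand-in (not the Pensieve API) makes the rule explicit; it keeps one translation per source text, which is an assumption about how duplicates are handled.

```python
class DeferredIndexTM:
    """Hypothetical TM whose new entries become searchable only after
    the document that produced them has been fully processed."""

    def __init__(self):
        self._index = {}    # entries visible to lookups
        self._pending = {}  # entries added while the current document is processed

    def add(self, source, target):
        self._pending[source] = target

    def lookup(self, source):
        return self._index.get(source)

    def end_document(self):
        # "Indexing" step: pending entries become visible to later documents.
        self._index.update(self._pending)
        self._pending.clear()
```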
Create the following TMX document -- Set this option to create a TMX document with the translated entries. A single TMX file is created for all input documents. The file is not generated until the end of the last document (and therefore cannot be used by other steps down the pipeline).
Enter the full path of the TMX document to generate. If the file already
exists, it is overwritten. You can use the
variable ${rootDir} in the path.
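Deferring the TMX output until the last document could look like the following sketch. `DeferredTmxWriter` is hypothetical, and the header values (tool name, `segtype`, `o-tmf`, `datatype`) are placeholders: TMX 1.4 requires these header attributes, but the values the step actually writes are not documented here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class DeferredTmxWriter:
    """Collect entries from every document; write one TMX file at the very end."""

    def __init__(self, path, src_lang, trg_lang):
        self.path = Path(path)
        self.src_lang = src_lang
        self.trg_lang = trg_lang
        self._entries = []

    def add(self, source, target):
        self._entries.append((source, target))

    def end_batch(self):
        tmx = ET.Element("tmx", {"version": "1.4"})
        ET.SubElement(tmx, "header", {
            "creationtool": "batch-translation-sketch",  # hypothetical placeholder
            "creationtoolversion": "0.1",
            "segtype": "paragraph",  # would be "sentence" if segmentation is on
            "o-tmf": "unknown",
            "adminlang": "en",
            "srclang": self.src_lang,
            "datatype": "plaintext",
        })
        body = ET.SubElement(tmx, "body")
        for source, target in self._entries:
            tu = ET.SubElement(body, "tu")
            for lang, text in ((self.src_lang, source), (self.trg_lang, target)):
                tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
                ET.SubElement(tuv, "seg").text = text
        # Opening the file for writing replaces any existing file.
        ET.ElementTree(tmx).write(str(self.path), encoding="utf-8", xml_declaration=True)
```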
Check for existing entries in an existing TM -- Set this option to look up in an existing Pensieve TM each entry that may be sent for translation. This allows you to send only the entries for which you do not already have a translation. Existing entries are not re-processed and are not placed in the optional TMX output.
Directory of the existing TM -- Enter the directory of the
Pensieve TM in which to look up existing entries. This option is enabled only if the
option Check for existing entries in an existing TM is set. You can
use the variable ${rootDir} in the path.
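The existing-TM check amounts to partitioning the text units before anything is sent to the external tool. In this sketch a plain dictionary stands in for the existing Pensieve TM, and an exact match on the source text is assumed to count as an existing entry; `select_for_translation` is a hypothetical name.

```python
def select_for_translation(units, existing_tm):
    """Return (units to send to the external tool, translations already known)."""
    to_send = []
    known = {}
    for unit in units:
        if unit in existing_tm:
            known[unit] = existing_tm[unit]  # reused, not re-processed
        else:
            to_send.append(unit)
    return to_send, known
```

Only the units in `to_send` go through the external tool, so only their translations can end up in the TMX output.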