How to Machine-Translate a TMX File
Imagine that you have a TMX file of segments to be translated, and you need to fill it with machine-translation entries so you can use the file as a fall-back TM in a tool where you do not have access to machine translation.
There are several ways to do this with the Okapi tools:
Using the Leveraging Step
If you want to use a machine translation system for which you have a connector, you can easily create a simple pipeline that uses the Leveraging Step.
1. Start Rainbow.
2. Drop your TMX document in the Input List 1 tab.
3. In the Languages and Encoding tab: select the proper languages and encoding. For a TMX document, only the target (output) encoding will be used as the input encoding is detected automatically.
4. In the Other Settings tab: if needed, change the name or location for the output file. We will keep the default, which is the same name as the input file with an extra .out prepended to the .tmx extension.
5. Select Utilities > Edit / Execute Pipeline. This opens the Edit / Execute Pipeline dialog box where you create the new pipeline.
We need three steps:
- Raw Document to Filter Events Step to extract the translatable text from the TMX input.
- Leveraging Step to perform the machine translation.
- Filter Events to Raw Document Step to re-write the document back into its original TMX format.
6. Use the Add Step button to add those three steps in that order.
The first and last steps have no parameters as they take their information from Rainbow's main tabs.
7. Select the Leveraging Step to set up your machine translation option. First, make sure the option Leverage the text units with existing translations is set. Those "existing translations" come from the connector you select. In this example we want to use a machine translation system, but you could also use translation memories. In our case, an MT system accessible to everyone is Google Translate: select Google Translate Services. For more information on other systems, see the "Connectors" page.
8. Make sure the option Leverage only if the match is equal or above this score has its value set to 95 or lower. Translation proposals coming from the Google MT Connector have a score of 95. If you set a higher value, no translation will be retained.
9. Make sure the option Fill the target with the leveraged translation is set. This tells the tool to copy the translation coming from the connector into the target.
Note that if there is already a target entry (empty or with text), the machine translation is copied over the existing one. The original target content is not overwritten by the machine translation in the following cases:
- If the text unit is marked as non-translatable.
- If the target has an approved property set to "yes".
Neither of those conditions is likely to exist in text units coming directly from a TMX file.
Notice that you could generate a TMX document with the translation directly from this step, instead of re-writing our original TMX. But in this case we want to translate the original TMX file, keeping all its attributes, comments, etc. So the best way to do this is to re-write the original file with the modified text units.
10. At this point you are ready to process the input file. Click Execute to run the pipeline.
Depending on the number of files you process and their size it may take some time. Note also that the translation is fetched from the Internet so that may slow down the process a bit too.
When it is done you should have an output TMX document in the same directory as the input one, and that file should have the machine translation for each source entry.
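To make the leveraging logic concrete, here is a minimal Python sketch of what the pipeline does to each translation unit. It is not Okapi code: `fake_mt` is a hypothetical stand-in for a connector query, the function and parameter names are invented for illustration, and the sketch assumes a TMX file that marks languages with `xml:lang` and keeps plain text in each `seg`. The non-translatable and approved exceptions are left out because, as noted above, they are unlikely to occur in text units coming from a TMX file.

```python
import xml.etree.ElementTree as ET

# ElementTree stores the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MT_SCORE = 95  # score the text above attributes to Google MT proposals


def fake_mt(text, trg_lang):
    """Hypothetical stand-in for a connector query: returns (translation, score)."""
    return f"[{trg_lang}] {text}", MT_SCORE


def leverage_tmx(tmx_text, src_lang, trg_lang, threshold=95, query=fake_mt):
    """Fill the target of every <tu> whose proposal scores at or above threshold."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        tuvs = {tuv.get(XML_LANG): tuv for tuv in tu.findall("tuv")}
        src = tuvs.get(src_lang)
        if src is None or src.find("seg") is None:
            continue  # no source text for this language pair
        source_text = "".join(src.find("seg").itertext())
        translation, score = query(source_text, trg_lang)
        if score < threshold:
            continue  # proposal rejected: the target is left untouched
        trg = tuvs.get(trg_lang)
        if trg is None:
            trg = ET.SubElement(tu, "tuv", {XML_LANG: trg_lang})
        seg = trg.find("seg")
        if seg is None:
            seg = ET.SubElement(trg, "seg")
        for child in list(seg):
            seg.remove(child)  # drop inline markup of the old target
        seg.text = translation  # copy the proposal over any existing target
    return ET.tostring(root, encoding="unicode")
```

Calling `leverage_tmx(tmx_text, "en", "fr", threshold=96)` with this stand-in leaves every target unchanged, which mirrors the effect of setting the score option above 95 in step 8.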
Using the Batch Translation Step
In some cases you may have an MT system for which there is no connector in Okapi. You can still use it, as long as a few requirements are fulfilled:
- the MT system must be able to translate HTML files
- the MT system must have a command-line mode
For example, a system that fulfills those requirements is ProMT. It can translate HTML documents and can be run from the command line. Note that some versions of ProMT are capable of taking the TMX file directly as input, but for the purpose of this example we assume you cannot do that.
1. Start Rainbow.
2. Drop your TMX document in the Input List 1 tab.
3. In the Languages and Encoding tab: select the proper languages and encoding. For a TMX document, only the target (output) encoding will be used as the input encoding is detected automatically.
4. Select Utilities > Batch Translation. This is a pre-defined pipeline, with a single step: the Batch Translation Step.
5. In the Command line field enter the DOS command that calls ProMT to translate an HTML document. For the input file use the variable ${inputPath}, for the output use the variable ${outputPath}. You also need to specify the language pair with the /d parameter. You can use the two variables ${srcLangName} and ${trgLangName} for this.
"C:\Program Files\PRMT9\FILETRANS\FileTranslator.exe" ${inputPath} /as /ac /d:${srcLangName}-${trgLangName} /o:${outputPath}
6. Make sure the option Create the following TMX document is set and enter the full path of the TMX document to create.
7. At this point you are ready to execute the process: click Execute.
This will take the input TMX, convert chunks of its content into a temporary HTML file, run the command line on that HTML file, get the translation back from the translated HTML, and place it into the TMX output.
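The variable substitution in the command line can be illustrated with a short Python sketch. This is not how the Batch Translation Step is implemented; it only shows the kind of expansion the step performs before running the command. The function name, the temporary file paths, and the language names "English" and "French" are assumptions made for the example.

```python
from string import Template

# The command line from step 5, using the same ${...} placeholders.
COMMAND = Template(
    r'"C:\Program Files\PRMT9\FILETRANS\FileTranslator.exe" '
    r"${inputPath} /as /ac /d:${srcLangName}-${trgLangName} /o:${outputPath}"
)


def expand_command(input_path, output_path, src_lang_name, trg_lang_name):
    """Return the command line with every placeholder replaced by its value."""
    return COMMAND.substitute(
        inputPath=input_path,
        outputPath=output_path,
        srcLangName=src_lang_name,
        trgLangName=trg_lang_name,
    )


if __name__ == "__main__":
    # Hypothetical temporary HTML files standing in for the ones the step creates.
    print(expand_command(r"C:\tmp\batch.html", r"C:\tmp\batch.out.html",
                         "English", "French"))
```

If the temporary paths could contain spaces, the placeholders for the input and output files would likely need to be wrapped in double quotes in the command line.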
Note that because the Batch Translation Step is a step, you can also use it in your own pipelines, along with other steps, to perform a set of customized tasks that correspond to your specific needs. See "How to Create a Pipeline in Rainbow" for more details.