Okapi Framework - Utilities

Translation Package Creation

If you are using an Okapi tool from a release later than M9, use the online help on the wiki instead:
http://okapiframework.org/wiki/index.php?title=Rainbow

This utility allows you to create a translation package for a given set of input documents.

Input Parameters

Configuration

Package Format Tab

Type of package to create -- Select the type of package you want to create. Note that not every type of package can be used with every type of input document. For example, the Original + RTF Layer type cannot be applied to original files that are in a binary format.

Several types of package are available. Each one is described under Processing Details below.

Package Location Tab

Root of the output directory -- Enter the root directory in which to generate the package. You can use the variable ${ProjDir} to specify the directory of your project.

Package name -- Enter the name of the package to generate.

The combination of both entries makes up the full path of the output directory.
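One way to picture how the two entries combine: expand ${ProjDir} in the root, then append the package name. The helper name `resolve_output_dir`, the literal string substitution, and the use of POSIX-style paths are assumptions made for this sketch; Rainbow's own variable handling may differ.

```python
from pathlib import PurePosixPath


def resolve_output_dir(root: str, package_name: str, proj_dir: str) -> str:
    """Hypothetical helper: expand ${ProjDir} in the root, then append the package name."""
    # Assumption: ${ProjDir} is a literal placeholder replaced by the project directory.
    expanded_root = root.replace("${ProjDir}", proj_dir)
    # The full output directory is the expanded root followed by the package name.
    return str(PurePosixPath(expanded_root) / package_name)


print(resolve_output_dir("${ProjDir}/packages", "pack1", "/home/user/myProject"))
# prints /home/user/myProject/packages/pack1
```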

Options Tab

Compress the package into a ZIP file -- Select this option to create a ZIP file that includes the whole package.
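A sketch of what the ZIP option amounts to, using Python's standard `zipfile` module: every file of the package directory is stored with a path relative to the package root. The function `zip_package` is hypothetical, and the ZIP file is assumed to be written outside the package directory so that it does not try to include itself.

```python
import os
import zipfile
from typing import List


def zip_package(package_dir: str, zip_path: str) -> List[str]:
    """Hypothetical helper: store every file under package_dir in zip_path."""
    # Assumption: zip_path lies outside package_dir, so the archive never contains itself.
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(package_dir):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Paths are relative to the package root, with forward slashes.
                arcname = os.path.relpath(full_path, package_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
                names.append(arcname)
    return sorted(names)
```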

Segmentation

Pre-segment the extracted text with the following SRX rules -- Set this option to segment the content of each extracted text unit.
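SRX describes segmentation with an ordered list of break and no-break rules, each made of a "before break" and an "after break" regular expression; at a given position, the first rule whose two expressions match decides whether to break. The sketch below imitates that principle in a much simplified form (Python regular expressions, no language maps, a single rule list). The `Rule` and `segment` names are hypothetical, and this is not Okapi's segmenter.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    is_break: bool  # True for a break rule, False for an exception (no-break) rule
    before: str     # regex that must match the text ending at the candidate position
    after: str      # regex that must match the text starting at the candidate position


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Simplified SRX-style segmentation: at each position, the first matching rule decides."""
    breaks = []
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search("(?:" + rule.before + r")\Z", text[:pos]) is not None
            after_ok = re.match(rule.after, text[pos:]) is not None
            if before_ok and after_ok:
                if rule.is_break:
                    breaks.append(pos)
                break  # later rules are ignored once one rule has matched
    # Cut the text at every break position.
    segments, start = [], 0
    for pos in breaks:
        segments.append(text[start:pos])
        start = pos
    segments.append(text[start:])
    return segments


rules = [
    Rule(False, r"\bDr\.\s", r"[A-Z]"),  # exception: no break after "Dr. "
    Rule(True, r"[.?!]\s", r"[A-Z]"),    # break after end punctuation and a space
]
print(segment("Call Dr. Smith today. He is in.", rules))
# prints ['Call Dr. Smith today. ', 'He is in.']
```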

SRX file for the source -- Enter the path or URL of the SRX document from which to take the segmentation rules for the source language. You can use the variable ${ProjDir} to specify the directory of your project.

SRX file for the target -- Enter the path or URL of the SRX document from which to take the segmentation rules for the target language. Note that when the same document is used for both languages (the most common case), it is read only once, saving processing time. You can use the variable ${ProjDir} to specify the directory of your project.
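The "read only once" behavior can be sketched as follows. `read_srx` stands in for whatever actually fetches and parses an SRX document and is a hypothetical parameter; paths are compared here as plain strings, whereas a real implementation might normalize them first.

```python
from typing import Callable, Tuple


def load_srx_pair(source_srx: str, target_srx: str,
                  read_srx: Callable[[str], str]) -> Tuple[str, str]:
    """Return (source_rules, target_rules), reading a shared SRX document only once.

    read_srx is a hypothetical callable that fetches and parses one SRX path or URL.
    """
    source_rules = read_srx(source_srx)
    if target_srx == source_srx:
        # Same document for both languages (the common case): reuse the parsed result.
        target_rules = source_rules
    else:
        target_rules = read_srx(target_srx)
    return source_rules, target_rules
```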

Pre-Translation Tab

Pre-translate the extracted text using the following TM -- Set this option to use a translation resource to leverage matches into the prepared document (if the selected package type supports it). Select the translation resource to use.

Settings -- Click this button to enter the settings for the selected translation resource. Not all translation resource connectors require settings. The current settings are displayed under the translation resource selection.

Be aware that some translation resources can be slow to access (especially those accessed through the Internet), and using them may make the process take a very long time to complete.

Penalize matches with a FileName attribute different from the document being processed -- Set this option to filter the TM matches based on the value of the FileName attribute. When this option is set, only matches whose source text and FileName value both match are leveraged. When this option is not set, leveraging is not filtered by the FileName attribute.

Penalize matches with a GroupName attribute different from the group being processed -- Set this option to filter the TM matches based on the value of the GroupName attribute. When this option is set, only matches whose source text and GroupName value both match are leveraged. When this option is not set, leveraging is not filtered by the GroupName attribute.

Leverage only matches greater or equal to -- Set the threshold below which matches are not leveraged. The value must be between 0 (leverage anything) and 100 (leverage only exact matches). Note that if there are several matches (regardless of their scores), only the first one is written to the TMX output and, if the selected package allows it, to the translation file.
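A sketch of the threshold rule, under two assumptions not stated above: matches arrive in the order the translation resource returns them, and the threshold is applied before the first match is picked. The `Match` type and the `select_leveraged` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    score: int   # match score from 0 to 100
    target: str  # proposed translation


def select_leveraged(matches: List[Match], threshold: int) -> Optional[Match]:
    """Return the match to leverage, or None if no match reaches the threshold."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    # Matches under the threshold are never leveraged.
    eligible = [m for m in matches if m.score >= threshold]
    # Only the first remaining match is used, whatever the scores of the others.
    return eligible[0] if eligible else None


matches = [Match(99, "Texte du segment."), Match(100, "Texte de segment.")]
print(select_leveraged(matches, 95).target)   # prints Texte du segment.
print(select_leveraged(matches, 100).target)  # prints Texte de segment.
```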

About SimpleTM

SimpleTM is a provisional, simple TM engine that can find only exact matches and can use attributes. It also allows multiple translations of the same source text. When the attributes are not used, you may get several exact matches; if at least one of them has a different translation, the segment is matched at only 99%. Segments that have exactly the same text but inline codes with different content or in a different order are also matched at 99%. Optionally, an additional 1% penalty can be deducted from the score if the inline codes of the source and/or of the target have different content or are in a different order from those of the queried source text. The only match scores SimpleTM can generate are 100%, 99%, 98%, and 0%.
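One possible reading of these rules, written as code: the score starts at 100 for an exact text match, loses one point when the exact matches disagree on the translation, and loses one more point when no exact match has the same inline codes as the query. The function name and the tuple layout are hypothetical, the optional extra penalty is not modeled separately, and this is not SimpleTM's source code; it only shows how the four possible scores can arise.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (source text, inline codes in order, translation)
Candidate = Tuple[str, Tuple[str, ...], str]


def simpletm_score(text: str, codes: Tuple[str, ...], candidates: List[Candidate]) -> int:
    """Illustrative SimpleTM-like scoring; returns 100, 99, 98, or 0."""
    exact = [c for c in candidates if c[0] == text]
    if not exact:
        return 0  # no fuzzy matching: anything but an exact text match scores 0
    score = 100
    # Several exact matches with different translations: one point off.
    if len({translation for _, _, translation in exact}) > 1:
        score -= 1
    # Same text, but no candidate has the same inline codes in the same order: one point off.
    if all(c_codes != codes for _, c_codes, _ in exact):
        score -= 1
    return score
```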

Using the attributes with a SimpleTM memory filters out any candidate that does not have matching values for the given attributes:

For example, given a segment to leverage with the following properties:
Text = "Segment text."
FileName = myFile.ext
GroupName = myGroup

Candidates in the TM (all with Text = "Segment text.") and the result for each combination of attribute options:

FileName / GroupName of candidate | FileName Off, GroupName Off | FileName On, GroupName Off | FileName Off, GroupName On | FileName On, GroupName On
myFile.ext / myGroup              | Match 100%                  | Match 100%                 | Match 100%                 | Match 100%
someFile.ext / myGroup            | Match 100%                  | No Match                   | Match 100%                 | No Match
myFile.ext / someGroup            | Match 100%                  | Match 100%                 | No Match                   | No Match
someFile.ext / someGroup          | Match 100%                  | No Match                   | No Match                   | No Match
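The filtering shown in the table can be expressed as a simple predicate: for each attribute whose option is on, a candidate is kept only if its value equals the value of the segment being leveraged. The sketch assumes text matching has already been done, and the function name and dictionary representation are hypothetical.

```python
from typing import Dict, List


def filter_by_attributes(query: Dict[str, str], candidates: List[Dict[str, str]],
                         use_file_name: bool, use_group_name: bool) -> List[Dict[str, str]]:
    """Keep the candidates whose enabled attributes have the same values as the query."""
    checked = [name for name, on in (("FileName", use_file_name),
                                     ("GroupName", use_group_name)) if on]
    return [c for c in candidates if all(c.get(n) == query.get(n) for n in checked)]


query = {"FileName": "myFile.ext", "GroupName": "myGroup"}
candidates = [
    {"FileName": "myFile.ext", "GroupName": "myGroup"},
    {"FileName": "someFile.ext", "GroupName": "myGroup"},
    {"FileName": "myFile.ext", "GroupName": "someGroup"},
    {"FileName": "someFile.ext", "GroupName": "someGroup"},
]
# FileName On, GroupName Off: the first and third candidates are kept, as in the table.
print(len(filter_by_attributes(query, candidates, True, False)))  # prints 2
```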

In the future, the SimpleTM engine will be replaced by a more powerful TM engine: the Pensieve TM.

About the Pensieve TM

The Pensieve TM engine is Okapi's own TM engine. It is still under development, but can be used already for production work.

Processing Details

The output of this utility varies depending on which type of package is selected.

Generic XLIFF Package

This package is an output where all translatable documents are extracted into XLIFF. You can translate this package with any XLIFF editor. Open-source tools that support XLIFF include, among others, OmegaT, Virtaal, Qt Linguist, and Lokalize.

All the files to be translated are in the work sub-directory.

The original sub-directory contains all the original documents and the parameters with which they were extracted. Keep this sub-directory: it is used to merge the extracted files back into their original formats.

At the root of the package you will find:

To merge back the translation of this package, use the Translation Package Post-Processing utility.

OmegaT Package

This package is an output where all translatable documents are extracted into XLIFF-compatible files and an OmegaT project is created, with all its files and directory structure in place. You can translate this package with OmegaT.

Note that the XLIFF documents generated in this package are intended to be used only with OmegaT.

All the files to be translated are in the source sub-directory.

The original sub-directory contains all the original documents and the parameters with which they were extracted. Keep this sub-directory: it is used to merge the extracted files back into their original formats.

At the root of the package you will find:

The tm sub-directory contains the unapproved.tmx, alternate.tmx and leverage.tmx files if they were generated.

The omegat sub-directory contains the project_save.tmx file if it was generated. This TMX document contains all entries set as approved, as well as all 100% matches from the leverage step, if one was done.
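The selection rule for project_save.tmx can be sketched as a filter over the entries. The `Entry` fields and the `project_save_entries` function are assumptions chosen for illustration, not the structure Rainbow uses internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    source: str
    target: str
    approved: bool = False   # translation marked as approved in the input document
    leverage_score: int = 0  # score from pre-translation; 0 if no leverage was done


def project_save_entries(entries: List[Entry]) -> List[Entry]:
    # Approved entries, plus the exact (100%) matches found during leverage.
    return [e for e in entries if e.approved or e.leverage_score == 100]
```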

To merge back the translation of this package:

Original + RTF Layer Package

This package is an output where all the translatable documents are converted into RTF files with Trados-compatible styles. You can translate this package with Trados Translator's Workbench or any compatible tool.

All the files to be translated are in the work sub-directory.

The original sub-directory contains all the original documents and the parameters with which they were extracted. Keep this sub-directory: it is used to merge the extracted files back into their original formats.

At the root of the package you will find:

To merge back the translation of this package, use the Translation Package Post-Processing utility.

TMX Output

In all packages, one or more TMX documents may be generated. They correspond to existing translations obtained from the input documents themselves (some input formats, such as PO or XLIFF, may already have been pre-translated).

The approved.tmx document contains all the translations that were found in the original document and had an approved target property set to "yes".

The unapproved.tmx document contains all the translations that were found in the original document and had either no approved target property or an approved target property set to anything other than "yes".

The alternate.tmx document contains all the alternative translations that were found along with the text units in the original document, for example the translations found in the <alt-trans> elements of XLIFF files.

The leverage.tmx document contains the matches that were found if the pre-translation option was set.

Note that a TMX document that would contain no entries is not generated. Note also that the composition of each TMX document may differ depending on the package you have selected. See the details in each package description above.
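The rules above can be summarized as a routing step followed by a writer that skips empty documents. This is a sketch under assumptions: the `Unit` fields, the string "yes" as the approved value, and the placeholder TMX 1.4 header values are chosen for illustration, the per-package differences are not modeled, and the output is not meant to reproduce Rainbow's actual files.

```python
import os
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


@dataclass
class Unit:
    source: str
    target: Optional[str] = None    # existing translation found in the input document
    approved: Optional[str] = None  # value of the approved property, if present
    alternates: List[str] = field(default_factory=list)  # e.g. XLIFF <alt-trans> targets
    leveraged: Optional[str] = None  # translation obtained by pre-translation


def route(units: List[Unit]) -> Dict[str, List[Tuple[str, str]]]:
    """Sort (source, target) pairs into the four TMX outputs."""
    out: Dict[str, List[Tuple[str, str]]] = {
        "approved.tmx": [], "unapproved.tmx": [], "alternate.tmx": [], "leverage.tmx": []}
    for u in units:
        if u.target is not None:
            name = "approved.tmx" if u.approved == "yes" else "unapproved.tmx"
            out[name].append((u.source, u.target))
        out["alternate.tmx"].extend((u.source, alt) for alt in u.alternates)
        if u.leveraged is not None:
            out["leverage.tmx"].append((u.source, u.leveraged))
    return out


def write_tmx(path: str, pairs: List[Tuple[str, str]], src: str, trg: str) -> None:
    """Write a minimal TMX 1.4 document (header values are placeholders)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1", "segtype": "sentence",
        "o-tmf": "none", "adminlang": "en", "srclang": src, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, source), (trg, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    ET.ElementTree(tmx).write(path, encoding="utf-8", xml_declaration=True)


def write_tmx_outputs(units: List[Unit], out_dir: str, src: str, trg: str) -> List[str]:
    """Write each non-empty TMX output and return the names of the files written."""
    written = []
    for name, pairs in route(units).items():
        if not pairs:
            continue  # a TMX output without entries is not generated
        write_tmx(os.path.join(out_dir, name), pairs, src, trg)
        written.append(name)
    return written
```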