Rainbow - ID-Based Alignment

From Okapi Framework

Overview

This utility allows you to align the source and target content of documents based on the identifiers of their text units. In addition, each text unit can be segmented; the utility then provides a basic default alignment of the segments, along with an Alignment Verification editor to fix any issues within those segments.

Note: This utility is designed to be used only with document formats that provide identifiers (e.g. Properties files). Text units without an identifier are not aligned (but they can be output to a separate TMX file).
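A minimal sketch of identifier-based matching, assuming each document has already been extracted into a list of (identifier, text) pairs, such as a Properties file would yield. The function `align_by_id` and this data layout are hypothetical illustrations, not the Okapi API. Units with no identifier, or with no counterpart in the target, end up in the second list, which is the kind of content that could go to the separate TMX file of entries not found.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of an extracted text unit: (identifier or None, text).
Unit = Tuple[Optional[str], str]


def align_by_id(source_units: List[Unit], target_units: List[Unit]):
    """Pair source and target text units that share the same identifier.

    Returns (aligned, not_aligned):
      aligned     -- list of (identifier, source_text, target_text), in source order
      not_aligned -- source units with no identifier or no matching target unit
    """
    # Index target units by identifier; units without an identifier cannot be matched.
    target_by_id: Dict[str, str] = {uid: text for uid, text in target_units if uid}
    aligned = []
    not_aligned = []
    for uid, text in source_units:
        if uid and uid in target_by_id:
            aligned.append((uid, text, target_by_id[uid]))
        else:
            not_aligned.append((uid, text))
    return aligned, not_aligned


# Usage with two small Properties-like inputs (target order differs from source order).
src = [("greeting", "Hello"), ("farewell", "Goodbye"), (None, "No ID here")]
trg = [("farewell", "Au revoir"), ("greeting", "Bonjour")]
aligned, not_aligned = align_by_id(src, trg)
print(aligned)
print(not_aligned)
```

Because matching is done through a dictionary lookup, the order of entries in the target document does not matter.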

The results of the alignment are written to a single TMX document, even when the input consists of several documents. To record which document an entry came from, use an attribute with the special variable ${filename}, for example: origin=${filename}.
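The exact markup Rainbow writes for attributes is not described on this page. This sketch assumes each attribute becomes a TMX 1.4 `<prop>` element inside its `<tu>`, and shows entries from several input documents merged into one TMX document with an `origin` property. The function `build_tmx` and the header values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tmx(entries, src_lang: str, trg_lang: str) -> str:
    """Build one TMX document from (filename, source_text, target_text) entries,
    which may come from several input documents."""
    tmx = ET.Element("tmx", version="1.4")
    # Required TMX 1.4 header attributes; the values here are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-sketch", "creationtoolversion": "0.1",
        "segtype": "paragraph", "o-tmf": "unknown", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for filename, src_text, trg_text in entries:
        tu = ET.SubElement(body, "tu")
        # Assumed rendering of the attribute origin=${filename}; <prop> must precede <tuv>.
        prop = ET.SubElement(tu, "prop", type="origin")
        prop.text = filename
        for lang, text in ((src_lang, src_text), (trg_lang, trg_text)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


xml_text = build_tmx(
    [("ui.properties", "Hello", "Bonjour"), ("menu.properties", "Open", "Ouvrir")],
    "en", "fr",
)
print(xml_text)
```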

Caller Parameters

  • The list of the source documents to align (Input list 1).
  • The list of the corresponding target documents (Input list 2). If the document is multilingual, the target entry corresponding to the target language you specify is used.
  • The source language.
  • The target language.
  • The default source encoding.
  • The default target encoding.

Parameters

Options Tab

Segmentation

Segment the extracted text using the following SRX rules — Set this option to segment the content of each extracted text unit. If this option is set, the alignment is done first at the level of the text units (using their identifiers), and then, within each text unit, at the level of the segments. If this option is not set, the alignment is done at the level of the text units only.
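The page does not specify how the basic default segment alignment works. The sketch below assumes the simplest approach: once two text units are paired by identifier, their segments are paired one-to-one by position when both sides have the same number of segments, and the unit is otherwise kept whole and flagged for review (the situation the Alignment Verification editor is for). The regular-expression segmenter is a hypothetical stand-in for real SRX rules.

```python
import re
from typing import List, Tuple

# Naive stand-in for SRX rules: break after '.', '!' or '?' followed by whitespace.
_BREAK = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> List[str]:
    """Split a text unit into segments using the naive rule above."""
    return [s for s in _BREAK.split(text.strip()) if s]


def align_segments(src_text: str, trg_text: str) -> Tuple[List[Tuple[str, str]], bool]:
    """Return (pairs, needs_review) for one pair of already-aligned text units.

    Same segment count on both sides: pair segments 1:1 by position.
    Different counts: keep the whole text unit as one pair and flag it for review.
    """
    src_segs = segment(src_text)
    trg_segs = segment(trg_text)
    if len(src_segs) == len(trg_segs):
        return list(zip(src_segs, trg_segs)), False
    return [(src_text.strip(), trg_text.strip())], True


print(align_segments("Open the file. Save it.", "Ouvrez le fichier. Enregistrez-le."))
print(align_segments("One. Two.", "Un et deux."))
```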

SRX file for the source — Enter the path or URL of the SRX document from which to read the segmentation rules for the source language. You can use the variable ${ProjDir} to specify the directory of the project.

SRX file for the target — Enter the path or URL of the SRX document from which to read the segmentation rules for the target language. Note that when the same document is used for both languages (the most common case), it is read only once, saving processing time. You can use the variable ${ProjDir} to specify the directory of the project.
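A small sketch of the two behaviors described for the SRX file options: expanding the ${ProjDir} variable, and reading a given SRX document only once even when it serves both languages. The class `SrxLoader` and its cache-by-resolved-path design are hypothetical; the reader function is injectable so the caching can be observed without touching the file system, and the sketch returns the raw SRX text rather than parsing it.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def read_file(path: str) -> str:
    """Default reader: load the SRX document from the local file system."""
    return Path(path).read_text(encoding="utf-8")


class SrxLoader:
    """Resolves ${ProjDir} and reads each distinct SRX document only once."""

    def __init__(self, proj_dir: str, reader: Callable[[str], str] = read_file):
        self.proj_dir = proj_dir
        self.reader = reader
        self._cache: Dict[str, str] = {}

    def resolve(self, path: str) -> str:
        # Replace the project-directory variable with the actual directory.
        return path.replace("${ProjDir}", self.proj_dir)

    def load(self, path: str) -> str:
        resolved = self.resolve(path)
        if resolved not in self._cache:  # first request for this document: read it
            self._cache[resolved] = self.reader(resolved)
        return self._cache[resolved]

    def load_pair(self, src_path: str, trg_path: str) -> Tuple[str, str]:
        """Load the source and target rules; a shared document is read once."""
        return self.load(src_path), self.load(trg_path)
```

With the same path entered for both languages, `load_pair` calls the reader a single time and returns the same content for both sides.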

Verification and Correction

Verify in-line codes for text units with single segment — Set this option to check the in-line codes of text units that have only a single segment. If this option is set, the utility checks for possible issues and may prompt the user to validate the alignment. If this option is not set, the utility assumes that aligned text units with a single segment do not need manual validation.
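The checks the utility actually performs are not detailed here. As an illustration only, this sketch assumes in-line codes appear as XML-like tags or `{n}` placeholders and treats a single-segment unit as suspicious when the source and target do not contain the same codes. Codes are compared as multisets, because a translation may legitimately reorder them. The names `inline_codes` and `codes_match` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical in-line code syntax for this sketch: tags like <b>, </b> and {0}-style placeholders.
_CODE = re.compile(r"</?[A-Za-z][^>]*>|\{\d+\}")


def inline_codes(text: str) -> Counter:
    """Count each in-line code occurring in the text."""
    return Counter(_CODE.findall(text))


def codes_match(src_text: str, trg_text: str) -> bool:
    """True if both texts contain the same in-line codes, in any order.

    A False result is the kind of issue that could trigger a prompt to the user.
    """
    return inline_codes(src_text) == inline_codes(trg_text)


print(codes_match("Click <b>Save</b> to store {0}.", "Cliquez sur <b>Enregistrer</b> pour stocker {0}."))
print(codes_match("Click <b>Save</b>.", "Cliquez sur Enregistrer."))
```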

Use auto-correction automatically — Set this option to automatically try to correct text units that have a different number of source and target segments. The user is always prompted to confirm the modifications made by auto-correction.

API key for Google MT — Enter the API key to use for any call to the Google Translate service (e.g. when calling Get Machine Translation of Source). Leave this field empty if you do not want to use the MT service.
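The page does not say which Google endpoint Rainbow calls. This sketch assumes the Google Cloud Translation Basic (v2) REST endpoint and only builds the request URL, so it runs offline. It mirrors the option's rule: an empty key means the MT service is not used. The function name `build_mt_request` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: Google Cloud Translation Basic (v2) REST API.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_mt_request(api_key: str, text: str, src_lang: str, trg_lang: str) -> Optional[str]:
    """Return the request URL, or None when no API key is set (MT disabled)."""
    if not api_key.strip():
        return None
    params = {
        "key": api_key,
        "q": text,
        "source": src_lang,
        "target": trg_lang,
        "format": "text",  # plain text rather than HTML
    }
    return ENDPOINT + "?" + urlencode(params)


print(build_mt_request("", "Hello world", "en", "fr"))
print(build_mt_request("MY-KEY", "Hello world", "en", "fr"))
```

Sending the request (for example with `urllib.request.urlopen`) and reading the translation from the JSON response are left out of the sketch.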

Output Tab

TMX Output

Create a TMX document with the aligned entries — Set this option to create a TMX document from the results of the alignment.

Enter the full path of the TMX document in which to store the results of the alignment. If the document already exists, it will be overwritten. You can use the variable ${ProjDir} to specify the directory of the project.

Create a TMX document with the source entries not found — Set this option to create a TMX document with the source entries not found or skipped during the alignment.

Enter the full path of the TMX document in which to store the entries not found or skipped. If the document already exists, it will be overwritten. You can use the variable ${ProjDir} to specify the directory of the project.

Generate Trados workarounds — Set this option to generate Trados-specific output in the TMX document. This non-standard notation is needed to prevent losing some characters, such as the backslash or the curly braces, when importing the TMX document into some versions of Trados Workbench. TMX documents generated with this option are not standard TMX files and will likely not import properly into tools that support TMX correctly.

Exclude segments where the source text matches this regular expression — Set this option to exclude some entries from the TMX output. Enter the regular expression pattern to use. Any entry where the source text matches the given expression will be excluded from the output.
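A minimal sketch of the exclusion filter. It assumes that "matches" means the pattern must match the whole source text (as Java's `Matcher.matches()` does); that interpretation is not confirmed by this page, and `re.search` would be the choice if a partial match should be enough. The function `filter_entries` is hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple


def filter_entries(entries: Iterable[Tuple[str, str]], pattern: Optional[str]) -> List[Tuple[str, str]]:
    """Drop (source, target) entries whose whole source text matches the pattern.

    An empty or missing pattern keeps every entry.
    """
    if not pattern:
        return list(entries)
    regex = re.compile(pattern)
    return [(src, trg) for src, trg in entries if not regex.fullmatch(src)]


# Example: exclude entries whose source is made only of digits, spaces and punctuation.
entries = [("Version 2.0", "Version 2.0"), ("42", "42"), ("Cancel", "Annuler")]
print(filter_entries(entries, r"[\d\s.,]+"))
```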

SimpleTM Output

Create a SimpleTM database with the following path — Set this option to generate a SimpleTM database from the results of the alignment.

Enter the full path of the SimpleTM database in which to store the results of the alignment. If the database already exists, it will be overwritten. You can use the variable ${ProjDir} to specify the directory of the project.

Attributes

Use the following attributes — Set this option to include attributes along with each aligned entry saved in the output files.

Enter the attributes as name=value pairs, one attribute per line. Two special variables can be used in attribute values:

  • ${filename} is the name of the document being aligned.
  • ${resname} is the identifier for the segment.
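The attribute list above can be illustrated with a short sketch: parse one name=value pair per line, then substitute ${filename} and ${resname} for each aligned entry. The function names, the trimming of spaces around `=`, and the error on a line without `=` are assumptions made for this illustration, not documented Rainbow behavior.

```python
from typing import Dict, List, Tuple


def parse_attributes(spec: str) -> List[Tuple[str, str]]:
    """Parse one 'name=value' pair per line; blank lines are ignored."""
    pairs = []
    for line in spec.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in attribute line: {line!r}")
        pairs.append((name.strip(), value.strip()))
    return pairs


def expand_attributes(pairs: List[Tuple[str, str]], filename: str, resname: str) -> Dict[str, str]:
    """Substitute the special variables in each attribute value for one entry."""
    return {
        name: value.replace("${filename}", filename).replace("${resname}", resname)
        for name, value in pairs
    }


spec = "origin=${filename}\nkey = ${resname}\n\nproject=Demo"
print(expand_attributes(parse_attributes(spec), "ui.properties", "greeting"))
```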