Okapi Framework - Steps

Sentence Alignment Step

- Overview
- Parameters

If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=Sentence_Alignment_Step

Overview

This step aligns the sentences of each text unit from two documents.

Takes: Filter events
Sends: Filter events

Currently, the events sent by this step are the same as the events it receives; the generated aligned text units are not sent. Sending the aligned text units is planned for a future version.

Text units from the source and target documents must be perfectly synchronized (aligned). For example, if the source document has more text units than the target document, an error is generated. The step segments both source and target text units as they are processed, using an internal SRX file. In future versions the segmentation rules will be configurable.
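
The synchronization requirement above can be illustrated with a minimal sketch. It is not the step's actual code: the function name pair_text_units and the use of plain strings for text units are assumptions made for this example only.

```python
from itertools import zip_longest

# Sentinel marking the side that ran out of text units first.
_MISSING = object()


def pair_text_units(source_units, target_units):
    """Pair source and target text units one-to-one.

    Hypothetical helper: raises ValueError as soon as one document
    has a text unit with no counterpart in the other document.
    """
    pairs = []
    zipped = zip_longest(source_units, target_units, fillvalue=_MISSING)
    for index, (src, trg) in enumerate(zipped):
        if src is _MISSING or trg is _MISSING:
            raise ValueError(
                f"Text units are not synchronized at position {index}"
            )
        pairs.append((src, trg))
    return pairs
```

With equal counts, the function returns the list of pairs in document order; with unequal counts, it fails at the first unmatched position.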

The aligner algorithm takes the sentences in the source and target text units and finds the best possible alignment based on the character lengths of the sentences. Internal parameters take into account that some languages translate into fewer (or more) characters. Possible match types produced are: 1-1, 2-1, 1-2, 0-1, 1-0, 2-0, 2-3, 3-2, etc.
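
To make the idea of length-based alignment concrete, here is a simplified sketch using dynamic programming. It is an illustration only, not the step's implementation: the function name align_by_length, the ratio parameter, the penalty values, the absolute-difference cost, and the reduced set of match types (1-1, 1-0, 0-1, 2-1, 1-2) are all assumptions chosen for this example.

```python
import math

# (source sentences, target sentences, fixed penalty) per match type.
# The penalties are illustrative values that make 1-1 the preferred match.
MATCH_TYPES = [(1, 1, 0.0), (1, 0, 10.0), (0, 1, 10.0), (2, 1, 3.0), (1, 2, 3.0)]


def align_by_length(source, target, ratio=1.0):
    """Align two sentence lists using character lengths only.

    ratio is the assumed number of target characters per source character.
    Returns a list of (source_sentences, target_sentences) groups.
    """
    n, m = len(source), len(target)
    src_len = [len(s) for s in source]
    trg_len = [len(t) for t in target]

    # best[i][j]: lowest cost of aligning the first i source sentences
    # with the first j target sentences.
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == math.inf:
                continue
            for a, b, penalty in MATCH_TYPES:
                if i + a > n or j + b > m:
                    continue
                expected = ratio * sum(src_len[i:i + a])
                actual = sum(trg_len[j:j + b])
                cost = best[i][j] + abs(expected - actual) + penalty
                if cost < best[i + a][j + b]:
                    best[i + a][j + b] = cost
                    back[i + a][j + b] = (a, b)

    # Walk back from the end to recover the chosen match types.
    groups = []
    i, j = n, m
    while i > 0 or j > 0:
        a, b = back[i][j]
        groups.append((source[i - a:i], target[j - b:j]))
        i, j = i - a, j - b
    groups.reverse()
    return groups
```

A ratio above 1.0 models a target language that tends to need more characters than the source; a ratio below 1.0 models the opposite case. A production aligner would typically replace the absolute-difference cost with a statistical model of length differences.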

Entries set as non-translatable are not processed.

Parameters

Generate the following TMX document -- Set this option to generate a TMX file with the aligned entries.

Enter the full path of the TMX document to generate. If the file already exists it will be overwritten.
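
For readers unfamiliar with the output format, the following sketch writes aligned sentence pairs to a minimal TMX 1.4 file. It does not reproduce the step's own writer: the function name write_tmx and the header values (creationtool, o-tmf, datatype and so on) are placeholders assumed for this example.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute used on <tuv> elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def write_tmx(path, pairs, src_lang, trg_lang):
    """Write (source, target) sentence pairs to a TMX file at path.

    Hypothetical helper: an existing file at path is overwritten.
    """
    root = ET.Element("tmx", {"version": "1.4"})
    # Header attribute values below are placeholders for illustration.
    ET.SubElement(root, "header", {
        "creationtool": "alignment-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for src_text, trg_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src_text), (trg_lang, trg_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

Each aligned pair becomes one tu element holding a source tuv and a target tuv, which is the structure translation memory tools expect when importing the file.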

Segment the source content -- Set this option to segment the source content before trying to align it. If this option is not set the content is expected to be already segmented. If this option is set and the content is already segmented, the existing segmentation will be reset to the new one.

Use custom source segmentation rules -- Set this option to use a specified SRX file for segmenting the source. If this option is not set, and segmentation is required, the default rules are used.

Enter the full path of the SRX file to use for segmenting the source.

Segment the target content -- Set this option to segment the target content before trying to align it. If this option is not set the content is expected to be already segmented. If this option is set and the content is already segmented, the existing segmentation will be reset to the new one.

Use custom target segmentation rules -- Set this option to use a specified SRX file for segmenting the target. If this option is not set, and segmentation is required, the default rules are used.

Enter the full path of the SRX file to use for segmenting the target.