Okapi Framework - Steps

Segmentation Step

- Overview
- Parameters

If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=Segmentation_Step

Overview

This step segments the content of extracted text units.

Takes: Filter events
Sends: Filter events

The separation between text units is based on the structure of the original file format. For example, the content of two <p> elements in HTML gives you two text units. This step lets you break the content of each text unit into smaller parts, usually corresponding to sentences.

The segmentation is done using segmentation rules defined in an SRX document. This step supports SRX 2.0. SRX (Segmentation Rules eXchange) is a standard format for describing how to break text. It uses regular expressions to specify the patterns that occur before and after a break or a non-break position.
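The rule mechanism described above can be illustrated with a minimal Python sketch. It is not the Okapi implementation and does not read SRX files: the Rule type, the two sample rules, and the segment function are hypothetical names introduced only for illustration. The sketch assumes the usual SRX behavior, where rules are tried in order at each position and the first rule whose before and after patterns both match decides whether that position is a break.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True for a break rule, False for a non-break (exception) rule
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


# Hypothetical sample rules, not taken from any shipped SRX document.
RULES = [
    Rule(False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # no break after common abbreviations
    Rule(True, r"[.?!]+", r"\s+"),                 # break after sentence-ending punctuation
]


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text at every position where the first matching rule is a break rule."""
    pieces, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search("(?:" + rule.before + r")\Z", text[:pos])
            after_ok = re.match(rule.after, text[pos:])
            if before_ok and after_ok:
                if rule.is_break:
                    pieces.append(text[start:pos])
                    start = pos
                break  # the first matching rule wins, break or not
    pieces.append(text[start:])
    return pieces


print(segment("Dr. Smith arrived. He was late! Why?", RULES))
# ['Dr. Smith arrived.', ' He was late!', ' Why?']
```

Listing the non-break rule first is what keeps "Dr." from ending a segment. The whitespace matched by the after pattern stays at the start of the following segment, because the break falls between the two patterns.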

Text units flagged as non-translatable are not segmented.

Text units that are already segmented are not re-segmented.
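The two conditions above amount to a simple guard applied before any rule is evaluated. The following Python sketch shows that guard under stated assumptions: TextUnit is a hypothetical stand-in for the framework's text unit, and a non-empty segments list is assumed to be how an existing segmentation is recognized.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextUnit:
    """Hypothetical stand-in for a text unit; not the Okapi class."""
    text: str
    translatable: bool = True
    segments: List[str] = field(default_factory=list)


def needs_segmentation(tu: TextUnit) -> bool:
    # Skip non-translatable units and units that already carry segments.
    return tu.translatable and not tu.segments


units = [
    TextUnit("First sentence. Second sentence."),
    TextUnit("id_12345", translatable=False),
    TextUnit("Done. Already.", segments=["Done.", " Already."]),
]
print([needs_segmentation(tu) for tu in units])
# [True, False, False]
```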

Parameters

Source

Segment the source text using the following SRX rules -- Set this option to segment the source text of the text units.

Enter the full path of the SRX document to use for segmenting the source text. You can use the variable ${rootDir} in the path.
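As an illustration of how a path containing ${rootDir} might be resolved, here is a minimal Python sketch. It assumes the variable stands for a root directory supplied by the calling tool; the function name resolve_srx_path and the sample paths are hypothetical, and the way Okapi itself resolves the variable is not described here.

```python
from string import Template


def resolve_srx_path(raw_path: str, root_dir: str) -> str:
    """Replace ${rootDir} in raw_path with root_dir."""
    # safe_substitute leaves any other ${...} placeholder untouched
    # instead of raising an error.
    return Template(raw_path).safe_substitute(rootDir=root_dir)


print(resolve_srx_path("${rootDir}/rules/default.srx", "/home/user/project"))
# /home/user/project/rules/default.srx
```

A path written this way stays valid when the project is moved to another directory or machine, since only the root directory changes.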

Edit -- Click this button to open the SRX document in Ratel, the SRX editor of the Okapi framework. Note that when you exit the editor, the file being edited is set as the file to use.

Target

Segment existing target text using the following SRX rules -- Set this option to segment the target text of the text unit, if text exists for the target locale being processed.

Enter the full path of the SRX document to use for segmenting the target text. This can be the same document as for the rules for the source. You can use the variable ${rootDir} in the path.

Edit -- Click this button to open the SRX document in Ratel, the SRX editor of the Okapi framework. Note that when you exit the editor, the file being edited is set as the file to use.

Options

Copy source into target if no target exists -- Set this option to copy the source content into the target if no target is already available.

Verify that a target segment matches each source segment when a target content exists -- Set this option to verify that, if a target is available, every source segment has a corresponding target segment in the target content. This also verifies that both source and target have the same number of segments. Note that this verification does not ensure that the content of a target segment is the translation of its corresponding source text. It only matches their segment IDs.
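The verification described above can be sketched in Python as two checks: a comparison of segment counts and a lookup of each source segment ID among the target segment IDs. The Segment type and the verify_segments function are hypothetical names used only for illustration; this is not the Okapi implementation.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    seg_id: str
    text: str


def verify_segments(source: List[Segment], target: List[Segment]) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if len(source) != len(target):
        problems.append(
            f"segment count differs: {len(source)} source vs {len(target)} target"
        )
    target_ids = {seg.seg_id for seg in target}
    for seg in source:
        # Only IDs are compared; the text of the segments is never examined.
        if seg.seg_id not in target_ids:
            problems.append(f"no target segment for source segment id {seg.seg_id!r}")
    return problems


src = [Segment("0", "Hello."), Segment("1", " How are you?")]
trg = [Segment("0", "Bonjour. Comment allez-vous ?")]
for problem in verify_segments(src, trg):
    print(problem)
# segment count differs: 2 source vs 1 target
# no target segment for source segment id '1'
```

Because only IDs and counts are compared, a target whose segments are present but mistranslated, or paired with the wrong source text, passes this check.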