Okapi Framework - Steps
Segmentation Step

Overview

If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=Segmentation_Step
This step segments the content of extracted text units.
Takes: Filter events
Sends: Filter events
The separation between text units is based on the structure of the original
file format. For example, the content of two <p> elements in HTML
gives you two text units. This step allows you to break down the content of the
text units into smaller parts, usually corresponding to sentences.
The segmentation is done using segmentation rules defined in an SRX document. This step supports SRX 2.0. The SRX (Segmentation Rules eXchange) format is a standard way of describing rules on how to break text. It uses regular expressions to specify patterns before and after a break or a non-break position.
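The before/after patterns described above can be illustrated with a minimal Python sketch. It is not Okapi's implementation: the two sample rules are hypothetical, the function name `segment` is made up for this example, and the sketch assumes that the first rule matching at a position decides whether that position is a break.

```python
import re

# Hypothetical sample rules, not taken from any real SRX document.
# Each rule is (is_break, before_pattern, after_pattern).
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr)\.", r"\s"),  # no break after these abbreviations
    (True, r"[.?!]", r"\s"),               # break after sentence-ending punctuation
]


def segment(text, rules=RULES):
    """Split text at positions where a break rule matches before any no-break rule."""
    compiled = [
        # Anchor the "before" pattern so it must end exactly at the candidate position.
        (is_break, re.compile(r"(?:%s)\Z" % before), re.compile(after))
        for is_break, before, after in rules
    ]
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides this position
    segments.append(text[start:])
    return segments


print(segment("Dr. Smith arrived. He sat down."))
```

In this sketch the whitespace between sentences stays at the start of the following segment; no trimming is attempted.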
Text units flagged as non-translatable are not segmented.
Text units that are already segmented are not re-segmented.
Segment the source text using the following SRX rules -- Set this option to segment the source text of the text units.
Enter the full path of the SRX document to use for segmenting the source
text. You can use the variable ${rootDir} in the path.
Edit -- Click this button to open the SRX document in Ratel, the SRX editor of the Okapi framework. Note that when you exit the editor, the file being edited is set as the file to use.
Segment existing target text using the following SRX rules -- Set this option to segment the target text of the text unit, if text exists for the target locale being processed.
Enter the full path of the SRX document to use for segmenting the target
text. This can be the same document as for the rules for the source. You can use
the variable ${rootDir} in the path.
Edit -- Click this button to open the SRX document in Ratel, the SRX editor of the Okapi framework. Note that when you exit the editor, the file being edited is set as the file to use.
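As an illustration of the ${rootDir} variable allowed in both SRX paths, here is a small Python sketch of the substitution. It assumes the variable is simply replaced by a directory path before the file is opened; the function name `resolve_path` and the sample paths are hypothetical, and how Okapi itself determines the root directory is not shown here.

```python
from string import Template


def resolve_path(path, root_dir):
    """Replace ${rootDir} in path with root_dir; other text is left untouched."""
    return Template(path).safe_substitute(rootDir=root_dir)


# Hypothetical project layout, for illustration only.
print(resolve_path("${rootDir}/rules/custom.srx", "/projects/demo"))
```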
Copy source into target if no target exists -- Set this option to copy the source content into the target if no target is already available.
Verify that a target segment matches each source segment when a target content exists -- Set this option to verify that, if there is a target available, all the source segments have a corresponding target segment in the target content. This also verifies that both source and target have the same number of segments. Note that this verification does not ensure that the content of a target segment is the translation of its corresponding source text. It only matches their segment IDs.
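The kind of check described above can be sketched in Python as follows. This is an illustration under assumptions, not the step's actual code: segments are reduced to lists of ID strings, and the function name `verify_segments` and the message texts are hypothetical.

```python
def verify_segments(source_ids, target_ids):
    """Return a list of problem descriptions; an empty list means the check passed."""
    problems = []
    # Both sides must have the same number of segments.
    if len(source_ids) != len(target_ids):
        problems.append(
            "segment count differs: %d source, %d target"
            % (len(source_ids), len(target_ids))
        )
    # Every source segment ID must also exist on the target side.
    target_set = set(target_ids)
    for seg_id in source_ids:
        if seg_id not in target_set:
            problems.append("no target segment with ID %r" % seg_id)
    return problems


print(verify_segments(["0", "1", "2"], ["0", "1"]))
```

As in the description, only the IDs and counts are compared; nothing in the sketch looks at whether the target text is a translation of the source.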