Id-Based Aligner Step

From Okapi Framework
Revision as of 19:20, 4 June 2016 by Ysavourel (talk | contribs) (1 revision imported)

Overview

This step aligns the text units of two input files based on matching ids. The id of each text unit is taken from its name (TextUnit.getName()). Any filter that produces unique names (i.e., ids) for its text units works with this aligner, for example the Properties Filter.

Takes: filter events. Sends: filter events.

If the option Generate a TMX file is set, the step returns the events unchanged. If the option is not set, each text unit in the returned events is a new, aligned bilingual text unit. Text units that are in the source but not in the target generate a warning. Text units that are in the target but not in the source are ignored.
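The matching rule described above can be illustrated with a minimal Java sketch. It does not use the Okapi API: the class IdAlignSketch, its records, and the align method are hypothetical names chosen for illustration, and each input file is reduced to a map from text unit name to extracted text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the step's matching logic; not Okapi code.
final class IdAlignSketch {

    /** One aligned pair: the shared id, the source text, and the target text. */
    record AlignedUnit(String id, String source, String target) {}

    /** Outcome of one pass: the aligned pairs plus one warning per unmatched source id. */
    record Result(List<AlignedUnit> aligned, List<String> warnings) {}

    /**
     * Aligns two id-to-text maps. The first map plays the role of the
     * 'source' file, the second the role of the 'target' file.
     */
    static Result align(Map<String, String> sourceUnits, Map<String, String> targetUnits) {
        List<AlignedUnit> aligned = new ArrayList<>();
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> entry : sourceUnits.entrySet()) {
            String id = entry.getKey();
            if (targetUnits.containsKey(id)) {
                aligned.add(new AlignedUnit(id, entry.getValue(), targetUnits.get(id)));
            } else {
                // In the source but not in the target: report it.
                warnings.add("No target found for id '" + id + "'");
            }
        }
        // Ids that exist only in targetUnits are never visited, so they are ignored.
        return new Result(aligned, warnings);
    }
}
```

Calling IdAlignSketch.align with a source map containing "greeting" and "farewell" and a target map containing "greeting" and "extra" yields one aligned pair for "greeting", one warning for "farewell", and nothing at all for "extra".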

The process expects both inputs to be non-segmented.

The 'source' file is the first input file and provides the source content. The 'target' file is the second input file and provides the target content. If the 'target' file is in a monolingual format (like a Java properties file), the source extracted from that file is used as target content. If the 'target' file is a multilingual file (like an XLIFF document), the target extracted from that file is used as target content.

If the 'target' file is multilingual, matching ids are not sufficient: the step also checks that the two text units with the same name have the same source text. If they do not, no alignment is made.
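The following Java sketch shows one way to picture how the target content is chosen for a pair of text units, depending on whether the 'target' file is monolingual or multilingual. It is not Okapi code: SourceCheckSketch, Unit, and targetFor are hypothetical names, and the source texts are compared with a plain string equality check, which is an assumption (the actual step may compare content differently, for example with respect to inline codes).

```java
import java.util.Optional;

// Hypothetical illustration of the target-selection rules; not Okapi code.
final class SourceCheckSketch {

    /** A simplified text unit. The target is null for units from a monolingual file. */
    record Unit(String id, String source, String target) {}

    /**
     * Returns the content to use as target for srcUnit, or empty if no
     * alignment should be made.
     */
    static Optional<String> targetFor(Unit srcUnit, Unit trgUnit, boolean targetIsMultilingual) {
        if (!srcUnit.id().equals(trgUnit.id())) {
            return Optional.empty(); // different ids: never aligned
        }
        if (!targetIsMultilingual) {
            // Monolingual 'target' file: its extracted source is the translation.
            return Optional.of(trgUnit.source());
        }
        // Multilingual 'target' file: the source texts must match as well.
        if (!srcUnit.source().equals(trgUnit.source())) {
            return Optional.empty();
        }
        return Optional.ofNullable(trgUnit.target());
    }
}
```

With a monolingual 'target' file, a unit whose extracted text is "Bonjour" contributes "Bonjour" as the target. With a multilingual 'target' file, the same id only aligns when both units carry the same source text.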

Parameters

Generate a TMX file — Set this option to produce a TMX file. When this option is set, the events returned by the step are unchanged. When this option is not set, each text unit in the returned events is a new text unit with the aligned source and target content. Note: for target <tuv> data to be generated in the TMX, the "Copy to/over the target" option must also be checked.

TMX output path — Output path of the TMX file.

Fall back to source text — If no target text is available, the source text is used instead.

Copy to/over the target — Set this option to copy the aligned content into the target of the text unit. Any existing target is lost, and the new target is not segmented. If the entry from the 'target' file is set to approved, that property is passed along too.

Create an alternate translation annotation — Set this option to attach an alternate translation annotation to the processed entry.

Suppress TUs with no target — Set this option to prevent the step from passing on any text units that lack a target.

Limitations

  • Assumes that each text unit has a unique name value. Make sure the filter being used is one that produces unique names (TextUnit.getName()) for all text units in the documents.
  • This step aligns the text units, not the possible segments inside the text units.
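Because the step relies on unique names, it can be useful to check a list of text unit names before running the alignment. The Java sketch below is one possible check, assuming the names have already been collected into a list (for example from TextUnit.getName()); UniqueNameCheck and duplicateNames are hypothetical names and not part of Okapi.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check for the unique-name assumption; not Okapi code.
final class UniqueNameCheck {

    /**
     * Returns the names that occur more than once, in the order in which
     * their first repetition is found. An empty result means all names are unique.
     */
    static Set<String> duplicateNames(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : names) {
            // add() returns false when the name was already present.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

If duplicateNames returns a non-empty set for either input file, the alignment for those names cannot be trusted.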