Post-segmentation Inline Codes Removal Step

From Okapi Framework
Revision as of 05:37, 14 November 2025 by Dkonovalyenko (talk | contribs) (→‎Parameters)

Overview

This step attempts to simplify (trim and merge) as many inline codes as possible by looking at each linguistically distinct segment in a TextUnit.

The step must be run after segmentation. It joins adjacent inline codes inside each segment and, optionally, moves leading and trailing codes out of the segment into an inter-segment TextPart. The original (un-merged) codes are saved as okp:merged attributes in the generated XLIFF file. Trimmed codes are simply written outside the "mrk" elements.
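The two operations can be illustrated with a small, self-contained sketch. This is not Okapi's implementation or API: the `Code` class and the `merge_adjacent`, `trim_codes`, and `simplify` functions are hypothetical names, and a segment is modeled simply as a list of strings and codes. The `merged` field only loosely mirrors the idea of keeping the original codes for later reference.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Code:
    """Hypothetical stand-in for an inline code such as <b> or </i>."""
    data: str
    merged: Tuple[str, ...] = ()  # original codes, kept when codes are merged


Item = Union[str, Code]


def merge_adjacent(segment: List[Item]) -> List[Item]:
    """Join each run of adjacent codes into one code, remembering the originals."""
    out: List[Item] = []
    for item in segment:
        if isinstance(item, Code) and out and isinstance(out[-1], Code):
            prev = out.pop()
            originals = (prev.merged or (prev.data,)) + (item.merged or (item.data,))
            out.append(Code(prev.data + item.data, originals))
        else:
            out.append(item)
    return out


def trim_codes(segment: List[Item]) -> Tuple[List[Item], List[Item], List[Item]]:
    """Split a segment into (leading codes, remaining body, trailing codes)."""
    start = 0
    while start < len(segment) and isinstance(segment[start], Code):
        start += 1
    end = len(segment)
    while end > start and isinstance(segment[end - 1], Code):
        end -= 1
    return segment[:start], segment[start:end], segment[end:]


def simplify(segment: List[Item], merge: bool = True, trim: bool = True):
    """Apply the two optional operations; flags loosely mirror the step's options."""
    if merge:
        segment = merge_adjacent(segment)
    if trim:
        return trim_codes(segment)
    return [], segment, []


segment = [Code("<b>"), Code("<i>"), "Hello ", Code("<br/>"), "world",
           Code("</i>"), Code("</b>")]
leading, body, trailing = simplify(segment)
print([c.data for c in leading])   # ['<b><i>']
print(leading[0].merged)           # ('<b>', '<i>')
print([c.data for c in trailing])  # ['</i></b>']
```

In this sketch the body keeps only the text and the interior `<br/>` code, while the merged leading and trailing codes would end up outside the segment.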

Takes: Filter Events. Sends: Filter Events.

Parameters

Remove leading and trailing codes — Set this option to remove leading and trailing inline codes from the text units and place them outside the segment.

Merge codes — Set this option to merge adjacent inline codes in the text units.

Limitations

Currently, bilingual formats such as XLIFF, TMX, and TTX do not have their codes simplified, because the codes may differ between source and target. Codes must align by ID across source and target.
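To see why simplifying only one side of a bilingual entry is a problem, consider the toy check below. It is an illustrative sketch, not Okapi code: `code_ids_align` is a hypothetical helper, codes are reduced to bare integer IDs, and the assumption that a merged code keeps the ID of its first original code is made up for the example.

```python
from typing import List


def code_ids_align(source_ids: List[int], target_ids: List[int]) -> bool:
    """True when source and target carry the same code IDs (order may differ)."""
    return sorted(source_ids) == sorted(target_ids)


source_ids = [1, 2, 3]
target_ids = [3, 1, 2]  # same codes, reordered by the translation
print(code_ids_align(source_ids, target_ids))  # True

# Assumption for illustration: merging source codes 1 and 2 leaves one code with ID 1.
simplified_source_ids = [1, 3]
print(code_ids_align(simplified_source_ids, target_ids))  # False
```

Once the source codes are merged, the target still refers to an ID that no longer exists in the source, so the two sides no longer align.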