Okapi Framework - Filters

XLIFF Filter

- Overview
- Processing Details
- Limitations
- Parameters

If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=XLIFF_Filter

Overview

The XLIFF Filter is an Okapi component that implements the IFilter interface for XLIFF (XML Localisation Interchange File Format) documents. The filter is implemented in the class net.sf.okapi.filters.xliff.XLIFFFilter of the Okapi library.

XLIFF is an OASIS Standard that defines a file format for transporting translatable text and localization-related information across a chain of translation and localization tools. The XLIFF specification is available at http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html.

Processing Details

Input Encoding

The filter decides which encoding to use for the input document using the following logic:

Output Encoding

If the output encoding is UTF-8:

Line-Breaks

The output uses the same type of line-breaks as the original input.
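The following is a minimal Java sketch of one way to honor this rule. It is not Okapi's code: the class LineBreakDetector and its methods are hypothetical, and the sketch assumes that the line-break type of a document is the type of the first line-break found in it.

```java
// Hypothetical helper, not part of the Okapi API: detects the line-break
// type of an input document so the same type can be used for the output.
final class LineBreakDetector {
    private LineBreakDetector() {}

    /** Returns "\r\n", "\n" or "\r" based on the first line-break found; defaults to "\n". */
    static String detect(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // CR followed by LF is a Windows-style break; a lone CR is old Mac-style.
                return (i + 1 < text.length() && text.charAt(i + 1) == '\n') ? "\r\n" : "\r";
            }
            if (c == '\n') {
                return "\n";
            }
        }
        return "\n";
    }

    /** Rewrites every line-break in text (CRLF, LF or CR) to the given type. */
    static String applyLineBreak(String text, String lineBreak) {
        return text.replace("\r\n", "\n").replace('\r', '\n').replace("\n", lineBreak);
    }

    public static void main(String[] args) {
        String input = "<xliff>\r\n<file/>\r\n</xliff>";
        String lb = detect(input);
        System.out.println(lb.equals("\r\n") ? "CRLF" : lb.equals("\n") ? "LF" : "CR");
    }
}
```

Normalizing to "\n" first and then expanding keeps the conversion to a single pass per break type, whatever mix of breaks the text contains.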

White Spaces

If a <trans-unit> element has an xml:space="preserve" attribute, the white spaces in the content of its source and target are left as-is. If the xml:space attribute is not present, or has a value other than "preserve", the content of the source and target is unwrapped.
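As a rough illustration, the sketch below applies this rule to the text content of a <source> or <target> element. The class WhiteSpaceRule is hypothetical, and the sketch assumes that "unwrapping" means collapsing each run of XML white space (space, tab, CR, LF) into a single space; the filter's actual unwrapping may differ, for example in how it treats leading and trailing white space.

```java
// Hypothetical sketch, not the Okapi implementation: applies the xml:space
// rule to the text content of <source> or <target>.
final class WhiteSpaceRule {
    private WhiteSpaceRule() {}

    /**
     * @param content  the text content of a <source> or <target> element
     * @param xmlSpace the xml:space value of the <trans-unit>, or null if absent
     */
    static String apply(String content, String xmlSpace) {
        if ("preserve".equals(xmlSpace)) {
            return content; // left as-is
        }
        // Assumption: "unwrap" = collapse each run of XML white space to one space.
        return content.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        String text = "Hello\n    world";
        System.out.println("[" + apply(text, "preserve").replace("\n", "\\n") + "]");
        System.out.println("[" + apply(text, null) + "]");
        System.out.println("[" + apply(text, "default") + "]");
    }
}
```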

Mapping

The entries of the document are mapped as follows:

XLIFF document item -> Resource

- The approved attribute of <trans-unit> -> The approved property of the target in the text unit.
- The <note> elements -> The note property of the source if the annotates attribute is "source"; the note property of the target if the annotates attribute is "target"; the note property of the text unit in all other cases.
- The <alt-trans> elements whose alttranstype attribute is set to "proposal" -> The AltTranslationsAnnotation annotation.
- The <source> element -> The source text of the text unit.
- The <target> element -> The target text of the text unit.
- The resname attribute (the id attribute is used instead when resname is absent and the corresponding option is set) -> The name of the text unit.
- The restype attribute -> The type of the text unit.
- The coord attribute -> The coordinates property of the text unit.
- The target-language attribute -> The targetLanguage property of the sub-document for the given <file>.
- The <seg-source> element -> The segmentation of the text unit.
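To make a few rows of this mapping concrete, here is a hedged sketch that uses the JDK DOM API (javax.xml.parsers, org.w3c.dom) rather than the Okapi library. TransUnitMapper and SimpleTextUnit are hypothetical stand-ins, not Okapi classes. The sketch covers resname/id, restype, approved, <source>, <target> and the annotates routing of <note>; it reads only direct children of <trans-unit> so that elements inside <alt-trans> are ignored, and it assumes the XLIFF 1.2 value approved="yes".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch built on the JDK DOM API; NOT the Okapi XLIFFFilter code.
final class TransUnitMapper {
    /** Simplified, hypothetical stand-in for a text unit. */
    static final class SimpleTextUnit {
        String name;      // from resname (or id, see useIdAsName)
        String type;      // from restype
        String source;    // text of <source>
        String target;    // text of <target>
        boolean approved; // from approved="yes"; Okapi stores this on the target
        final List<String> sourceNotes = new ArrayList<>(); // annotates="source"
        final List<String> targetNotes = new ArrayList<>(); // annotates="target"
        final List<String> unitNotes = new ArrayList<>();   // all other notes
    }

    static Element parseTransUnit(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("trans-unit").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XLIFF snippet", e);
        }
    }

    /** Direct children only, so <source>/<note> inside <alt-trans> are skipped. */
    private static List<Element> children(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    private static String firstText(Element parent, String tag) {
        List<Element> found = children(parent, tag);
        return found.isEmpty() ? null : found.get(0).getTextContent();
    }

    static SimpleTextUnit map(Element tu, boolean useIdAsName) {
        SimpleTextUnit unit = new SimpleTextUnit();
        if (tu.hasAttribute("resname")) {
            unit.name = tu.getAttribute("resname");
        } else if (useIdAsName && tu.hasAttribute("id")) {
            unit.name = tu.getAttribute("id"); // fall-back option
        }
        if (tu.hasAttribute("restype")) {
            unit.type = tu.getAttribute("restype");
        }
        unit.approved = "yes".equals(tu.getAttribute("approved"));
        unit.source = firstText(tu, "source");
        unit.target = firstText(tu, "target");
        for (Element note : children(tu, "note")) {
            switch (note.getAttribute("annotates")) { // "" when absent
                case "source": unit.sourceNotes.add(note.getTextContent()); break;
                case "target": unit.targetNotes.add(note.getTextContent()); break;
                default: unit.unitNotes.add(note.getTextContent()); break;
            }
        }
        return unit;
    }

    public static void main(String[] args) {
        String xml = "<trans-unit id='t1' restype='button' approved='yes'>"
            + "<source>OK</source><target>Valider</target>"
            + "<note annotates='source'>Button label</note></trans-unit>";
        SimpleTextUnit u = map(parseTransUnit(xml), true);
        System.out.println(u.name + " / " + u.type + " / approved=" + u.approved
            + " / " + u.source + " -> " + u.target + " / source notes: " + u.sourceNotes);
    }
}
```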

Limitations

The content of the <sub> element is currently not extracted as text. Any element found inside a <bpt>, <ept>, <ph>, or <it> element (including <sub>) is included in the code of the parent inline element. A warning is generated when a <sub> element is detected. Such elements are rarely, if ever, used.
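A hypothetical sketch of the detection step, again using the JDK DOM API and not the filter's actual code: it finds each <sub> element and reports the inline code (<bpt>, <ept>, <ph> or <it>) that contains it, together with the id of the enclosing <trans-unit>. Returning the warnings as strings instead of logging them is a choice made here only to keep the example testable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Okapi code: reports <sub> elements whose content
// stays inside the code of the parent inline element.
final class SubElementCheck {
    private static final Set<String> INLINE_CODES = Set.of("bpt", "ept", "ph", "it");

    static List<String> findSubWarnings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XLIFF snippet", e);
        }
        List<String> warnings = new ArrayList<>();
        NodeList subs = doc.getElementsByTagName("sub");
        for (int i = 0; i < subs.getLength(); i++) {
            String code = null;
            String unitId = "?";
            // Walk up the ancestors: nearest inline code, then the enclosing <trans-unit>.
            for (Node p = subs.item(i).getParentNode(); p instanceof Element; p = p.getParentNode()) {
                String tag = ((Element) p).getTagName();
                if (code == null && INLINE_CODES.contains(tag)) {
                    code = tag;
                }
                if (tag.equals("trans-unit")) {
                    unitId = ((Element) p).getAttribute("id");
                    break;
                }
            }
            if (code != null) {
                warnings.add("trans-unit '" + unitId + "': <sub> inside <" + code
                    + "> is kept as part of the code, not extracted as text");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        String xml = "<trans-unit id='u1'><source>Click <ph id='1'>&lt;a title='"
            + "<sub>Help</sub>'&gt;</ph> here</source></trans-unit>";
        findSubWarnings(xml).forEach(w -> System.out.println("WARNING: " + w));
    }
}
```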

Parameters

The filter offers the following options:

Use the trans-unit id attribute for the text unit name if there is no resname -- Select this option to use the value of the id attribute of the <trans-unit> element as a fall-back value when resname is not present. This may be useful for XLIFF documents that use resname-like values for id and do not provide resname.

Ignore the segmentation information in the input -- Set this option to ignore any segmentation information contained in the input XLIFF document. When this option is set, all segmented content is merged back into unsegmented content on extraction. Note that any <alt-trans> data attached to a given segment is also lost.

Escape the greater-than characters -- Set this option to have all greater-than characters ('>') escaped as "&gt;" in the output.
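The sketch below shows the effect of this option on element content; XmlEscaper is a hypothetical helper, not an Okapi class. In XML character data, '&' and '<' must always be escaped, while '>' only has to be escaped when it would complete the sequence "]]>", so the sketch escapes it in that case even when the option is off.

```java
// Hypothetical helper, not an Okapi class: escapes text for XML element
// content, optionally escaping every '>' as "&gt;".
final class XmlEscaper {
    private XmlEscaper() {}

    static String escapeText(String text, boolean escapeGreaterThan) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break; // always required
                case '<': sb.append("&lt;"); break;  // always required
                case '>': {
                    // Required only after "]]" (end of a CDATA marker); optional elsewhere.
                    boolean closesCdataMarker = i >= 2
                        && text.charAt(i - 1) == ']' && text.charAt(i - 2) == ']';
                    sb.append((escapeGreaterThan || closesCdataMarker) ? "&gt;" : ">");
                    break;
                }
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "if (a > b && c < d)";
        System.out.println(escapeText(text, false));
        System.out.println(escapeText(text, true));
    }
}
```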

Add the target-language attribute if not present -- Set this option to add the target-language attribute to the <file> element if it is not present.
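A minimal DOM sketch of this option, assuming the target language code is supplied by the caller (for example, from the target locale selected when running the filter). TargetLanguageAdder is hypothetical and not part of Okapi; existing target-language values are left untouched.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Okapi code: adds target-language to every <file>
// element that lacks it.
final class TargetLanguageAdder {
    private TargetLanguageAdder() {}

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XLIFF document", e);
        }
    }

    /** Returns the number of <file> elements that received the attribute. */
    static int addTargetLanguageIfMissing(Document doc, String targetLanguage) {
        NodeList files = doc.getElementsByTagName("file");
        int added = 0;
        for (int i = 0; i < files.getLength(); i++) {
            Element file = (Element) files.item(i);
            if (!file.hasAttribute("target-language")) {
                file.setAttribute("target-language", targetLanguage);
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Document doc = parse("<xliff version='1.2'>"
            + "<file original='a.txt' source-language='en'/>"
            + "<file original='b.txt' source-language='en' target-language='de'/></xliff>");
        int added = addTargetLanguageIfMissing(doc, "fr");
        NodeList files = doc.getElementsByTagName("file");
        for (int i = 0; i < files.getLength(); i++) {
            Element f = (Element) files.item(i);
            System.out.println(f.getAttribute("original") + ": " + f.getAttribute("target-language"));
        }
        System.out.println("added: " + added);
    }
}
```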

Type of output segmentation -- Select the type of segmentation representation to use for the output. The following choices are available: