Okapi Framework - Filters
XLIFF Filter
If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=XLIFF_Filter
The XLIFF Filter is an Okapi component that implements the IFilter interface for
XLIFF (XML Localisation Interchange File Format) documents. The filter is implemented in the class
net.sf.okapi.filters.xliff.XLIFFFilter of the Okapi library.
XLIFF is an OASIS Standard that defines a file format for transporting translatable text and localization-related information across a chain of translation and localization tools. The XLIFF specification is available at http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html.
The filter decides which encoding to use for the input document using the following logic:
If the output encoding is UTF-8:
The output uses the same type of line-break as the original input.
If a <trans-unit> element has an xml:space="preserve"
attribute, the white spaces inside the content of its source and target are left
as is. If the xml:space attribute is not present, or has a value different
from "preserve", the content of the source and target is unwrapped.
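The white space rule above can be sketched as follows. This is only an illustration, not the filter's own code: the class WhitespaceRule and the method normalize are hypothetical names, and "unwrapping" is assumed here to mean collapsing each run of white space into a single space and trimming both ends.

```java
// Hypothetical sketch of the xml:space rule; not part of the Okapi library.
class WhitespaceRule {

    // Returns the content unchanged when xml:space is "preserve".
    // For any other value, or when the attribute is absent (null),
    // returns an unwrapped copy.
    // Assumption: unwrapping = collapse white space runs to one space, then trim.
    static String normalize(String content, String xmlSpace) {
        if ("preserve".equals(xmlSpace)) {
            return content;
        }
        return content.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String text = "  Hello\n   world  ";
        System.out.println("[" + normalize(text, "preserve") + "]");
        System.out.println("[" + normalize(text, null) + "]");
        System.out.println("[" + normalize(text, "default") + "]");
    }
}
```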
The entries of the document are mapped as follows:
| XLIFF Document | Resource |
|---|---|
| The approved attribute in <trans-unit>. | The approved property of the target in the text unit. |
| The <note> elements. | The note property of the source if the annotates attribute is "source". The note property of the target if the annotates attribute is "target". The note property of the text unit in all other cases. |
| The <alt-trans> element that has its alttranstype attribute set to proposal. | The AltTranslationsAnnotation annotation. |
| The <source> element. | The source text of the text unit. |
| The <target> element. | The target text of the text unit. |
| The resname attribute (may also be id if the option is set). | The name of the text unit. |
| The restype attribute. | The type of the text unit. |
| The coord attribute. | The coordinates property of the text unit. |
| The target-language attribute. | The targetLanguage property of the sub-document for the given <file>. |
| The <seg-source> element. | Segmentation of the text unit. |
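The mapping of <note> elements amounts to a three-way decision on the annotates attribute. The sketch below illustrates that decision only; the class NoteMapping and the enum Holder are hypothetical names introduced for this example and are not Okapi API.

```java
// Hypothetical sketch of how a <note> is routed; not part of the Okapi library.
class NoteMapping {

    // Where the note property ends up.
    enum Holder { SOURCE, TARGET, TEXT_UNIT }

    // Chooses the holder from the value of the annotates attribute.
    // A null value stands for an absent attribute.
    static Holder holderFor(String annotates) {
        if ("source".equals(annotates)) {
            return Holder.SOURCE;
        }
        if ("target".equals(annotates)) {
            return Holder.TARGET;
        }
        // Absent, "general", or any other value: attach to the text unit.
        return Holder.TEXT_UNIT;
    }

    public static void main(String[] args) {
        System.out.println(holderFor("source"));
        System.out.println(holderFor("target"));
        System.out.println(holderFor("general"));
        System.out.println(holderFor(null));
    }
}
```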
The content of the <sub> element is currently not supported as
text. Any element found inside a <bpt>, <ept>,
<ph>, and <it> (including <sub>) is included in
the code of the parent inline element. A warning is generated when a <sub>
element is detected. Such elements are rarely (if ever) used.
The filter offers the following options:
Use the trans-unit id attribute for the text unit name if there is no
resname -- Select this option to use the value of the id
attribute of the <trans-unit> element as a fall-back value if
resname is not present. This may be useful for XLIFF documents that use
resname-like values for id and do not bother providing
resname.
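The fall-back described for this option can be sketched as below. The class TextUnitName and its method resolve are hypothetical names, and the sketch assumes that only a missing attribute (represented as null) counts as "not present"; how the filter treats an empty resname is not stated here.

```java
// Hypothetical sketch of the resname/id fall-back; not part of the Okapi library.
class TextUnitName {

    // Returns resname when it is present.
    // Otherwise returns id if the fall-back option is set, or null if it is not.
    static String resolve(String resname, String id, boolean useIdAsFallback) {
        if (resname != null) {
            return resname;
        }
        return useIdAsFallback ? id : null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("btnOK", "1", true));
        System.out.println(resolve(null, "dialog.ok", true));
        System.out.println(resolve(null, "dialog.ok", false));
    }
}
```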
Ignore the segmentation information in the input -- Set this
option to ignore any segmentation information contained in the input XLIFF. When
this option is set, all segmented content is merged into un-segmented
content when extracted. Note that any <alt-trans> data attached to
a given segment is also lost.
Escape the greater-than characters -- Set this option to have
all greater-than characters ('>') escaped as "&gt;" in
the output.
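A minimal sketch of what this option changes in the output text follows. The class GtEscaper is a hypothetical name, and the sketch assumes that '&' and '<' are always escaped (as XML requires) while '>' is escaped only when the option is set.

```java
// Hypothetical sketch of optional greater-than escaping; not part of the Okapi library.
class GtEscaper {

    // Escapes '&' and '<' unconditionally, and '>' only when escapeGt is true.
    static String escape(String text, boolean escapeGt) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    if (escapeGt) {
                        sb.append("&gt;");
                    } else {
                        sb.append(c);
                    }
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a > b & c < d", true));
        System.out.println(escape("a > b & c < d", false));
    }
}
```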
Add the target-language attribute if not present -- Set this
option to add the target-language attribute in <file>
if it is not present.
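The effect of this option can be illustrated with the standard Java DOM API. This is a sketch under stated assumptions, not the filter's implementation: the class TargetLanguageOption and its methods are hypothetical, and only the first <file> element of the document is handled.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch of adding target-language when missing; not part of the Okapi library.
class TargetLanguageOption {

    // Parses an XML string into a DOM document.
    // Checked exceptions are wrapped so callers do not need a throws clause.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Adds target-language to the first <file> element only if it is absent.
    // An existing value is never overwritten. Returns the resulting value.
    static String ensureTargetLanguage(Document doc, String targetLanguage) {
        Element file = (Element) doc.getElementsByTagName("file").item(0);
        if (!file.hasAttribute("target-language")) {
            file.setAttribute("target-language", targetLanguage);
        }
        return file.getAttribute("target-language");
    }

    public static void main(String[] args) {
        String xml = "<xliff version=\"1.2\">"
            + "<file original=\"a.txt\" source-language=\"en\" datatype=\"plaintext\"><body/></file>"
            + "</xliff>";
        Document doc = parse(xml);
        System.out.println(ensureTargetLanguage(doc, "fr"));
    }
}
```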
Type of output segmentation -- Select the type of segmentation representation to use for the output. The following choices are available:
<seg-source> element only if the original text unit was already represented like this in the input file.
<seg-source> element, even if the original text unit was not segmented, and even if the whole content of the text unit is made of a single segment.
<seg-source> elements are removed.
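The three choices can be read as a small decision on whether a <seg-source> element is written for a text unit. In the sketch below, the class SegmentationOutput and the mode names AS_INPUT, ALWAYS and NEVER are hypothetical labels chosen for this example; they are not the option labels used by the filter.

```java
// Hypothetical sketch of the output segmentation choice; not part of the Okapi library.
class SegmentationOutput {

    // Illustrative names for the three choices described above.
    enum Mode { AS_INPUT, ALWAYS, NEVER }

    // Decides whether a <seg-source> element is written for a text unit.
    static boolean writeSegSource(Mode mode, boolean segmentedInInput) {
        switch (mode) {
            case ALWAYS:
                // Written even for a text unit made of a single segment.
                return true;
            case NEVER:
                // Any <seg-source> element is removed.
                return false;
            default:
                // AS_INPUT: written only if the input already had one.
                return segmentedInInput;
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": segmented input=" + writeSegSource(mode, true)
                + ", unsegmented input=" + writeSegSource(mode, false));
        }
    }
}
```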