ITS Components
Latest revision as of 08:30, 5 June 2016
Overview
This page shows which Okapi components implement ITS 1.0 and 2.0, and to what degree.
The specification for ITS 1.0 is here: http://www.w3.org/TR/its/
The specification for ITS 2.0 is here: http://www.w3.org/TR/its20/
XML and HTML5
Okapi offers extensive ITS support in the XML Filter and the HTML5-ITS Filter, for both global and local markup.
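To make the difference between global and local markup concrete, here is a minimal Python sketch that resolves the Translate data category for a small XML document. It is an illustration only, not the Okapi ITS engine: the sample document, the `translate_flags` function and the restriction to selectors of the form `//name` are assumptions made for this example. It applies the ITS 2.0 precedence order: local markup first, then global rules, then the inherited value, then the default (translatable).

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: one global rule plus two local attributes.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <head>
    <its:rules version="2.0">
      <its:translateRule selector="//code" translate="no"/>
    </its:rules>
  </head>
  <body>
    <p>Press <code>Ctrl+S</code> to save.</p>
    <p its:translate="no">ACME-4711</p>
    <p>Type <code its:translate="yes">your name</code> here.</p>
  </body>
</doc>"""


def translate_flags(root):
    """Return {element: bool} with the resolved Translate value per element."""
    # Step 1: collect the elements selected by global rules.
    # Assumption: only selectors of the form //name are handled here.
    # Later rules overwrite earlier ones for the same element.
    global_flags = {}
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        name = rule.get("selector").lstrip("/")
        for elem in root.iter(name):
            global_flags[elem] = rule.get("translate") == "yes"

    # Step 2: walk the tree. Local markup beats global rules, which beat
    # the value inherited from the parent (default: translatable).
    resolved = {}

    def walk(elem, inherited):
        local = elem.get(f"{{{ITS_NS}}}translate")
        if local is not None:
            value = local == "yes"
        elif elem in global_flags:
            value = global_flags[elem]
        else:
            value = inherited
        resolved[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, True)
    return resolved


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    flags = translate_flags(root)
    for code in root.iter("code"):
        print(code.text, flags[code])
```

In this sample the first `code` element is not translatable because of the global rule, while the second one is translatable because its local `its:translate="yes"` overrides that rule.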
Legend:
- ITS Engine - A Yes indicates that the data category is processed and the resulting information is available using the ITraversal interface. It also indicates that the implementation passes the ITS Test Suite for that data category.
- Read - The markup existing in the input document is interpreted and represented in the extracted Okapi resources.
- Modify - If the Okapi representation of that data category is modified, it is modified in the output document as well.
- Remove - If the Okapi representation of that data category is removed, it is removed in the output document as well.
- Add - If an Okapi representation of that data category is added, the corresponding markup is also added in the output document.
- Global for structural - Denotes the capabilities for the data category when defined in a global rule and when related to an element that is not "within text".
- Global for inline - Denotes the capabilities for the data category when defined in a global rule and when related to an element that is declared as "within text".
- Local on structural - Denotes the capabilities for the data category when defined locally on an element that is not "within text".
- Local on inline - Denotes the capabilities for the data category when defined locally on an element that is declared as "within text".
- TBD - Means "To Be Decided".
- TBI - Means "To Be Improved".
- N/A - Means "Not Applicable".
Data Category | Scope | ITS Engine | XML Filter | HTML5-ITS Filter | Okapi Representation | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Read | Modify | Remove | Add | Read | Modify | Remove | Add | ||||
Translate | Global for structural | Yes | Yes | N/A | N/A | N/A | Yes | N/A | N/A | N/A | Not to translate: not extracted |
Global for inline | Yes | Yes | N/A | N/A | N/A | Yes | N/A | N/A | N/A | Not to translate: inline code | |
Local on structural | Yes | Yes | No | No | TBD | Yes | No | No | TBD | Not to translate: not extracted | |
Local on inline | Yes | Yes | No | No | TBD | Yes | No | No | TBD | Not to translate: inline code | |
Localization Note | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | Note property on TextUnit |
Global for inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | LOCNOTE annotation on <mrk> | |
Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | Note property on TextUnit | |
Local on inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | LOCNOTE annotation on <mrk> | |
Terminology | Global for structural | Yes | Yes, TBI | TBD | TBD | TBD | Yes, TBI | TBD | TBD | TBD | TermAnnotation on source TextContainer |
Global for inline | Yes | Yes, TBI | TBD | TBD | TBD | Yes, TBI | TBD | TBD | TBD | TermAnnotation on source TextContainer | |
Local on structural | Yes | Yes, TBI | TBD | TBD | TBD | Yes, TBI | TBD | TBD | TBD | TermAnnotation on source TextContainer | |
Local on inline | Yes | Yes, TBI | TBD | TBD | TBD | Yes, TBI | TBD | TBD | TBD | TermAnnotation on source TextContainer | |
Directionality | Global for structural | Yes | Not supported | ||||||||
Global for inline | Yes | Not supported | |||||||||
Local on structural | Yes | Not supported | |||||||||
Local on inline | Yes | Not supported | |||||||||
Language Information | Global for structural | Yes | Not supported | ||||||||
Global for inline | Yes | Not supported | |||||||||
Local on structural | Yes | Not supported | |||||||||
Local on inline | Yes | Not supported | |||||||||
Element Within Text | Global for structural | Yes | Yes, partially | TBD | TBD | TBD | Yes, partially | TBD | TBD | TBD | TextUnit or inline code |
Global for inline | Yes | Yes, partially | TBD | TBD | TBD | Yes, partially | TBD | TBD | TBD | TextUnit or inline code | |
Local on structural | Yes | Yes, partially | TBD | TBD | TBD | Yes, partially | TBD | TBD | TBD | TextUnit or inline code | |
Local on inline | Yes | Yes, partially | TBD | TBD | TBD | Yes, partially | TBD | TBD | TBD | TextUnit or inline code | |
Domain | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | DOMAIN annotation on TextUnit |
Global for inline | Yes | Not supported | |||||||||
Text Analysis | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | TA annotation on source TextContainer |
Global for inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | TA annotation on Code | |
Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | TA annotation on source TextContainer | |
Local on inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | Yes | TA annotation on Code | |
Locale Filter | Global for structural | Yes | Yes | N/A | N/A | N/A | Yes | N/A | N/A | N/A | Not to localize: not extracted |
Global for inline | Yes | Yes | N/A | N/A | N/A | Yes | N/A | N/A | N/A | Not to localize: inline code | |
Local on structural | Yes | Yes | No | No | No | Yes | No | No | No | Not to localize: not extracted | |
Local on inline | Yes | Yes | No | No | No | Yes | No | No | No | Not to localize: inline code | |
Provenance | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | PROV annotation on TextUnit |
Global for inline | Yes | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | |
Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | PROV annotation on TextUnit | |
Local on inline | Yes | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Yes TBI | PROV annotation on Code | |
External Resource | Global for structural | Yes | Yes | No | No | No | Yes | No | No | No | EXTERNALRES annotation on TextUnit |
Global for inline | Yes | Yes | No | No | No | Yes | No | No | No | EXTERNALRES annotation on Code | |
Target Pointer | Global for structural | Yes | TBD | No | No | No | TBD | No | No | No | Target content of the TextUnit |
Global for inline | Yes | TBD | No | No | No | TBD | No | No | No | Target content of the Code | |
Id Value | Global for structural | Yes | Yes | N/A | N/A | N/A | Yes | N/A | N/A | N/A | TextUnit name |
Global for inline | Yes | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | |
Local on structural | Yes | Yes | N/A | N/A | N/A | Yes | N/A | N/A | N/A | TextUnit name | |
Local on inline | Yes | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | |
Preserve Space | Global for structural | Yes | Yes | No | No | No | Yes | N/A | N/A | N/A | Preserve Space on TextUnit |
Global for inline | Yes | TBD | No | No | No | TBD | N/A | N/A | N/A | PRESERVEWS annotation on Code | |
Local on structural | Yes | Yes | No | No | No | No | N/A | N/A | N/A | Preserve Space on TextUnit | |
Local on inline | Yes | TBD | No | No | No | TBD | N/A | N/A | N/A | PRESERVEWS annotation on Code | |
Localization Quality Issue | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | LQI annotation on TextUnit |
Global for inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | LQI annotation on source/target TextContainer or inline code | |
Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | LQI annotation on TextUnit | |
Local on inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | Yes TBI | LQI annotation on source/target TextContainer or inline code | |
Localization Quality Rating | Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | LQR annotation on TextUnit |
Local on inline | Yes | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | |
MT Confidence | Global for structural | Yes | Yes, TBI | TBD | TBD | TBD | Yes, TBI | TBD | TBD | TBD | MTCONFIDENCE annotation on source TextContainer |
Global for inline | Yes | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | |
Local on structural | Yes | Yes, TBI | TBD | TBD | TBD | Yes, TBI | TBD | TBD | TBD | MTCONFIDENCE annotation on source TextContainer | |
Local on inline | Yes | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD | |
Allowed Characters | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | ALLOWEDCHARS annotation on TextUnit |
Global for inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | ALLOWEDCHARS annotation on Code | |
Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | ALLOWEDCHARS annotation on TextUnit | |
Local on inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | Yes | ALLOWEDCHARS annotation on Code | |
Storage Size | Global for structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | STORAGESIZE annotation on TextUnit |
Global for inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | STORAGESIZE annotation on Code | |
Local on structural | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | TBD | STORAGESIZE annotation on TextUnit | |
Local on inline | Yes | Yes | TBD | TBD | TBD | Yes | TBD | TBD | Yes | STORAGESIZE annotation on Code |
You can find more information about ITS on this page.
XLIFF 1.2
Okapi provides two main components for XLIFF 1.2:
- the XLIFF Filter, which reads an existing XLIFF 1.2 document, extracts its content, and writes the modified document back.
- the XLIFF Writer, which provides a way to create an XLIFF document from the API.
Both components have extensive ITS support.
- The http://www.w3.org/ns/its-xliff/ namespace (using the prefix itsxlf in this page) is used for the ITS information that cannot be directly mapped to XLIFF and is not represented by a native ITS construct.
- The recommended mapping between ITS 2.0 and XLIFF 1.2 is defined at http://www.w3.org/International/its/wiki/XLIFF_1.2_Mapping.
Notes:
- Not all ITS data categories can be used in all XLIFF elements. ITS markup that is not at the locations defined in the table is not processed.
- For the XLIFF Filter, the Modify, Add and Remove actions apply only to annotations in the target container.
- The Modify, Add and Remove actions listed here are for the XLIFF Filter only. That is: to perform the same action on the original document, the filter used to create the XLIFF document must also support those actions.
ITS is implemented as follows:
Data Category | XLIFF 1.2 Markup | XLIFF 1.2 Filter | Okapi Representation | XLIFF 1.2 Writer | ||||
---|---|---|---|---|---|---|---|---|
Read | Rewrite | Modify | Remove | Add | ||||
Translate | translate in <trans-unit> | Yes | Yes | No | No | No | ITextUnit.[is/setIs]Translatable() | Yes |
mtype='protected' in <mrk> or inline code | Yes | Yes | No | No | Yes | Inline code or TRANSLATE annotation on Code | Yes | |
Localization Note | <note> element in the text unit | Yes | Yes | TBD | TBD | TBD | NOTE property on TextUnit | Yes |
comment='TEXT' and itsxlf:locNoteType='alert|description' in <mrk> | Yes | Yes | Yes | Yes | Yes | LOCNOTE annotation on Code | Yes | |
Terminology | mtype='term' and itsxlf:termInfo, itsxlf:termInfoRef and itsxlf:termConfidence in <mrk> | Yes TBI | Yes TBI | TBD | TBD | TBD | TERM annotation on Code | Yes |
Directionality | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Language Information | xml:lang in <mrk> | Yes | Yes | Yes | Yes | Yes | LANG annotation on Code | Yes |
Element Within Text | Inline codes | Yes | Yes | Yes | Yes | Yes | Inline codes | Yes |
Domain | itsxlf:domains in <trans-unit> | Yes | Yes | TBD | TBD | Yes | DOMAIN annotation on TextUnit | Yes |
Text Analysis | ITS attributes in <mrk> | Yes TBI | Yes TBI | TBD | TBD | Yes | TA annotation on Code | Yes |
Locale Filter | ITS attributes in <trans-unit> (and possibly translate="no") | Yes | Yes | No | No | No | LOCFILTER annotation on TextUnit | Yes |
ITS attributes in <mrk> (and possibly mtype="protected") | Yes | Yes | TBD | TBD | Yes | LOCFILTER annotation on Code | Yes | |
Provenance | ITS attributes in <file>, <group>, <trans-unit>, <source>, <target> | Yes | Yes | Yes | Yes | Yes | PROV annotation (ITSProvenanceAnnotations) on StartSubDocument, StartGroup, TextUnit or TextContainer | Yes |
ITS attributes in <mrk> | Yes | Yes | TBD | TBD | Yes | PROV annotation (ITSProvenanceAnnotations) on Code | Yes | |
External Resource | itsxlf:externalResourceRef in <trans-unit> | Yes | Yes | TBD | TBD | TBD | EXTERNALRES annotation on TextUnit | Yes |
itsxlf:externalResourceRef in inline code | Yes | Yes | TBD | TBD | Yes | EXTERNALRES annotation on Code | Yes | |
Id Value | resname in <trans-unit> | Yes | Yes | No | No | No | ITextUnit.[get/set]Name() | Yes |
Preserve Space | xml:space in <trans-unit> | Yes | Yes | No | No | No | ITextUnit.[preserve/setPreserve]Whitespaces() | Yes |
xml:space in <mrk> | Yes | Yes | TBD | TBD | Yes | PRESERVEWS annotation on Code | Yes | |
Localization Quality Issue | ITS attributes in <source> or <target> | Yes | Yes | Yes | Yes | Yes | LQI annotation (ITSLQIAnnotations) on TextContainer | Yes |
ITS attributes in <mrk> | Yes | Yes | TBD | TBD | Yes | LQI annotation (ITSLQIAnnotations) on Code | Yes | |
Localization Quality Rating | ITS attribute in <target> | Yes | Yes | TBD | TBD | TBD | LQR annotation on TextContainer | Yes |
ITS attribute in <mrk mtype="seg"> | Yes | Yes | TBD | TBD | TBD | LQR annotation on Segment | Yes | |
ITS attribute in <mrk> | Yes | Yes | TBD | TBD | Yes | LQR annotation on Code | Yes | |
MT Confidence | ITS attribute in <target> | Yes | Yes | TBD | TBD | TBD | MTCONFIDENCE annotation on TextContainer | Yes TBI |
ITS attribute in <mrk mtype="seg"> | Yes | Yes | TBD | TBD | TBD | MTCONFIDENCE annotation on Segment | Yes TBI | |
Allowed Characters | ITS attribute in <source> or <target> | Yes | Yes | TBD | TBD | TBD | ALLOWEDCHARS annotation on TextContainer | Yes |
ITS attribute in <mrk> | Yes | Yes | Yes | Yes | Yes | ALLOWEDCHARS annotation on Code | Yes | |
Storage Size | ITS attributes in <source> or <target> | Yes | Yes | TBD | TBD | TBD | STORAGESIZE annotation on TextContainer | Yes |
ITS attributes in <mrk> | Yes | Yes | Yes | Yes | Yes | STORAGESIZE annotation on Code | Yes |
You can find more information on the XLIFF 1.2 Filter on this page.
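As an illustration of a few rows of the mapping table, the following Python sketch reads ITS-related markup from a small XLIFF 1.2 document using only the standard library. It does not use or reproduce the Okapi XLIFF Filter: the sample document, the `read_units` function and the dictionary keys are hypothetical, and treating `itsxlf:domains` as a comma-separated list is an assumption based on the recommended mapping.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:1.2"
ITSXLF = "http://www.w3.org/ns/its-xliff/"
XML = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample covering Translate, Id Value, Preserve Space and Domain.
SAMPLE = """<xliff version="1.2"
    xmlns="urn:oasis:names:tc:xliff:document:1.2"
    xmlns:itsxlf="http://www.w3.org/ns/its-xliff/">
  <file original="demo.xml" source-language="en" datatype="xml">
    <body>
      <trans-unit id="1" resname="title" xml:space="preserve"
                  itsxlf:domains="automotive, engine">
        <source>Check the <mrk mtype="protected">ECU-7</mrk> unit.</source>
      </trans-unit>
      <trans-unit id="2" translate="no">
        <source>ACME-4711</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return one dict of ITS-related values per <trans-unit>."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iter(f"{{{XLF}}}trans-unit"):
        # Domain: itsxlf:domains (assumed to be a comma-separated list).
        domains_attr = tu.get(f"{{{ITSXLF}}}domains", "")
        domains = [d.strip() for d in domains_attr.split(",") if d.strip()]
        # Translate on inline content: <mrk mtype="protected">.
        protected = [
            mrk.text
            for mrk in tu.iter(f"{{{XLF}}}mrk")
            if mrk.get("mtype") == "protected"
        ]
        units.append({
            "id": tu.get("id"),
            # Translate: the translate attribute defaults to "yes".
            "translatable": tu.get("translate", "yes") == "yes",
            # Id Value: resname.
            "name": tu.get("resname"),
            # Preserve Space: xml:space.
            "preserve_space": tu.get(f"{{{XML}}}space") == "preserve",
            "domains": domains,
            "protected": protected,
        })
    return units


if __name__ == "__main__":
    for unit in read_units(SAMPLE):
        print(unit)
```

The sketch only reads attributes on `<trans-unit>` and `<mrk>`; the other locations listed in the table (`<file>`, `<group>`, `<source>`, `<target>`) would need the same kind of lookup.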
OpenOffice Filter
Okapi provides support for several data categories in the OpenOffice Filter.
The ODFFilter class supports local markup for the Translate, Localization Note, Terminology and Locale Filter data categories.
Enrycher Step
Support for the Enrycher Web service is implemented in the Enrycher Step.
This step allows you to mark up the source content of text units with Text Analysis annotations (TA annotation on inline codes).
LanguageTool Step
The LanguageTool library is used by the LanguageTool Step to annotate extracted content with Localization Quality Issue items.
The step can be used separately or from within CheckMate, an application dedicated to quality verification.
Microsoft Batch Translation Step
The Microsoft Batch Translation Step can use the Domain data category to select which Microsoft Translator Hub engine to use.
Quality Check Step
The Quality Check Step implements support for the Allowed Characters, Storage Size and Localization Quality Issue data categories.
The step can be used separately or from within CheckMate, an application dedicated to quality verification.
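The kind of verification that the Allowed Characters and Storage Size data categories call for can be sketched in a few lines of Python. This is not the Quality Check Step's implementation: the function names are hypothetical, the allowed-characters value is assumed to be a pattern that Python's `re` module accepts (for example a simple character class), and the line-break handling defined by ITS for Storage Size is left out.

```python
import re


def check_allowed_characters(text, allowed_pattern):
    """Return the characters of text that allowed_pattern does not match.

    allowed_pattern is an ITS allowedCharacters value such as "[a-zA-Z0-9.]".
    An empty list means every character is allowed.
    """
    char_re = re.compile(allowed_pattern)
    return [ch for ch in text if char_re.fullmatch(ch) is None]


def check_storage_size(text, max_bytes, encoding="utf-8"):
    """Return (fits, size), where size is the encoded length in bytes.

    UTF-8 is used as the default encoding, as in ITS 2.0.
    """
    size = len(text.encode(encoding))
    return size <= max_bytes, size


if __name__ == "__main__":
    print(check_allowed_characters("File_01.txt", "[a-zA-Z0-9.]"))
    print(check_storage_size("Größe", 6))
```

Storage Size counts bytes rather than characters, which is why "Größe" (five characters) takes seven bytes in UTF-8 and fails a six-byte limit.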
Terminology Extraction Step
The Text Analysis and the Terminology data categories can be utilized by the Term Extraction Step to extract term candidates.