Okapi Framework - Filters

OpenOffice / ODF Filter (BETA)

- Overview
- Processing Details
- Parameters
- Known Issues

If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=OpenOffice_Filter

Overview

The OpenOffice Filter is an Okapi component that implements the IFilter interface for OpenOffice.org documents: ODT (text), ODS (spreadsheet), ODP (slides), and ODG (graphics). The filter is implemented in the class net.sf.okapi.filters.openoffice.OpenOfficeFilter of the Okapi library.

This first filter is actually a wrapper that internally calls a second filter, the ODF Filter, which is an Okapi component that implements the IFilter interface for raw OpenDocument XML files. The ODF Filter is implemented in the class net.sf.okapi.filters.openoffice.ODFFilter of the Okapi library.

Having access to both filters lets you process complete OpenOffice.org documents or, if needed, raw ODF files directly.

Processing Details

Encodings

The input encoding is automatically detected.

Any user-specified encoding is ignored by these filters. They always use UTF-8.

Line-Breaks

The type of line-breaks of the output is always set to a simple linefeed (LF).

Sub-Documents

An OpenOffice document is a ZIP file with several documents inside. The main one (content.xml) contains the body of the data, but other files may also contain translatable text: meta.xml and styles.xml.

All the different embedded files are treated as sub-documents by the filter. For example, when a single ODT file is extracted to a single XLIFF document, that document is made up of three XLIFF <file> elements: one for content.xml, one for styles.xml, and one for meta.xml. Note that very often only content.xml will have extracted text.
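
To make the package layout concrete, here is a minimal sketch that lists the entries of an ODF package that may carry translatable text. It uses only the standard java.util.zip classes and is not how the filter itself is implemented. The class name OdfSubDocuments, its methods, and the in-memory sample package are hypothetical and exist only for illustration. The style part is looked up under the name styles.xml, which is the entry name used in ODF packages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only; not part of the Okapi library.
final class OdfSubDocuments {

    // Entries that may contain translatable text.
    private static final Set<String> CANDIDATES =
            Set.of("content.xml", "styles.xml", "meta.xml");

    // Returns the candidate entries found in the package, in package order.
    static List<String> list(InputStream packageStream) {
        List<String> found = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (CANDIDATES.contains(entry.getName())) {
                    found.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    // Builds a tiny in-memory ZIP so the sketch runs without a real file.
    // The entry contents are placeholders; this is not a valid ODT document.
    static byte[] samplePackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"mimetype", "content.xml", "styles.xml", "meta.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Prints [content.xml, styles.xml, meta.xml]
        System.out.println(list(new ByteArrayInputStream(samplePackage())));
    }
}
```

With a real document, the same method could be called with a FileInputStream opened on the ODT, ODS, ODP, or ODG file.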

Parameters

Options Tab

Extract notes -- Set this option to extract the content of <office:annotation> elements (notes) as translatable text. If this option is not set, notes are not extracted.

Extract references -- Set this option to extract the content of <text:bookmark-ref> elements. The content of these elements is only a copy of the content of the referent. It is updated automatically within OpenOffice, so any translation of this content will be overwritten as soon as the document is updated. However, in some cases it may be useful to have the referenced text as part of the segment where it is inserted.
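
To show which parts of the markup the two options target, the following sketch pulls the text of <office:annotation> and <text:bookmark-ref> elements out of a small ODF fragment using the standard Java DOM parser. It is an illustration under stated assumptions, not the filter's own extraction logic: the class name OdfOptionTargets, the method textOf, and the sample fragment are hypothetical, and the sample annotation is simplified (real annotations usually also carry author and date elements).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only; not part of the Okapi library.
final class OdfOptionTargets {

    // Namespace URIs defined by the OpenDocument format.
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Simplified sample paragraph with one bookmark reference and one note.
    static final String SAMPLE =
            "<text:p xmlns:text=\"" + TEXT_NS + "\" xmlns:office=\"" + OFFICE_NS + "\">"
            + "See <text:bookmark-ref text:reference-format=\"text\" text:ref-name=\"intro\">"
            + "Introduction</text:bookmark-ref>"
            + "<office:annotation><text:p>Check this term</text:p></office:annotation>"
            + " for details.</text:p>";

    // Returns the text content of every element with the given namespace and local name.
    static List<String> textOf(String xml, String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(namespace, localName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Text targeted by "Extract notes"
        System.out.println(textOf(SAMPLE, OFFICE_NS, "annotation"));
        // Text targeted by "Extract references"
        System.out.println(textOf(SAMPLE, TEXT_NS, "bookmark-ref"));
    }
}
```

In the sample, the bookmark reference text ("Introduction") sits in the middle of the sentence, which is the case where having the referenced text inside the segment can help a translator.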

Known Issues

This filter has several known issues:

Please report any other issues to the Issues List of the project.