Okapi Framework - Filters
OpenOffice / ODF Filter (BETA)
If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=OpenOffice_Filter
The OpenOffice Filter is an Okapi component that implements the IFilter interface for
OpenOffice.org documents: ODT (text), ODS (spreadsheet), ODP (slides), and ODG
(graphics). The filter is implemented in the class
net.sf.okapi.filters.openoffice.OpenOfficeFilter of the Okapi library.
This first filter is actually a wrapper that internally calls a second
filter, the ODF Filter, an Okapi component that implements the IFilter
interface for raw OpenDocument XML files. That filter is implemented in the class
net.sf.okapi.filters.openoffice.ODFFilter of the Okapi library.
Having access to both filters lets you process OpenOffice.org documents or, if needed, raw ODF documents directly.
The input encoding is automatically detected.
Any user-specified encoding is ignored by these filters. They always use UTF-8.
Line breaks in the output are always set to a simple linefeed (LF).
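The output conventions described above (UTF-8 encoding, LF line breaks) can be illustrated with a small sketch. This is not code from the Okapi library. The class and method names are hypothetical and only show what "always UTF-8, always LF" means for a piece of text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Okapi: shows the UTF-8 + LF convention.
class OutputConventions {

    /** Converts CRLF and lone CR line breaks to a simple linefeed (LF). */
    static String toLf(String text) {
        // Replace CRLF first so that it does not turn into two linefeeds.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Normalizes line breaks to LF, then encodes the text as UTF-8 bytes. */
    static byte[] encode(String text) {
        return toLf(text).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("First line\r\nSecond line\r");
        System.out.println(bytes.length + " bytes written as UTF-8 with LF line breaks");
    }
}
```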
An OpenOffice document is a ZIP file with several documents inside. The main
one (content.xml) contains the body of the data, but other files
may also contain translatable text: meta.xml and styles.xml.
All the embedded files are treated as sub-documents by the filter.
This means that, for example, when represented in XLIFF, a single ODT extracted
to a single XLIFF document is made up of three XLIFF <file> elements:
one for content.xml, one for styles.xml, and one for
meta.xml. Note that very often, only content.xml will
have extracted text.
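To see the sub-documents for yourself, you can open the package with the standard java.util.zip classes. The following is a minimal sketch, not the way the filter itself is implemented. The class name OdfSubDocuments is hypothetical, and the entry names assume the standard ODF package layout, where the style entry is named styles.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Okapi: lists the package entries that
// may carry translatable text.
class OdfSubDocuments {

    // Assumed entry names, following the standard ODF package layout.
    private static final Set<String> CANDIDATES =
            new HashSet<>(Arrays.asList("content.xml", "styles.xml", "meta.xml"));

    /** Returns the candidate sub-documents found, in package order. */
    static List<String> listSubDocuments(InputStream odfPackage) throws IOException {
        List<String> found = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odfPackage)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (CANDIDATES.contains(entry.getName())) {
                    found.add(entry.getName());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OdfSubDocuments.java myfile.odt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listSubDocuments(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Other entries in the package (settings, manifest, embedded pictures, and so on) are skipped by this sketch.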
Extract notes -- Set this option to extract the content of
<office:annotation> elements (notes) as translatable text. If this option
is not set, notes are not extracted.
Extract references -- Set this option to extract the content of
<text:bookmark-ref> elements. The content of these elements is only
a copy of the content of the referent. It is updated automatically within
OpenOffice, so any translation of this content will be
overwritten as soon as the document is updated. However, in some cases it may be
useful to have the referenced text as part of the segment where it is
inserted.
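To make clear which parts of content.xml the two options target, here is a minimal sketch that pulls the text of <office:annotation> and <text:bookmark-ref> elements out of a small sample with the standard Java DOM parser. It is an illustration only, not the filter's implementation. The class name, the method name, and the sample XML are hypothetical. The namespace URIs are the ones defined by the OpenDocument format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Okapi.
class OdfOptionTargets {

    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Simplified sample: one paragraph with a bookmark reference and a note.
    // Real annotations usually also contain author and date elements, which
    // this naive text extraction would include.
    static final String SAMPLE =
            "<office:document-content xmlns:office='" + OFFICE_NS
            + "' xmlns:text='" + TEXT_NS + "'>"
            + "<office:body><office:text><text:p>See "
            + "<text:bookmark-ref text:ref-name='intro'>Introduction</text:bookmark-ref>."
            + "<office:annotation><text:p>Check this term</text:p></office:annotation>"
            + "</text:p></office:text></office:body></office:document-content>";

    /** Returns the trimmed text content of every element with the given name. */
    static List<String> textOf(String xml, String namespace, String localName)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match elements by namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagNameNS(namespace, localName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Notes: " + textOf(SAMPLE, OFFICE_NS, "annotation"));
        System.out.println("References: " + textOf(SAMPLE, TEXT_NS, "bookmark-ref"));
    }
}
```

In this sample, "Check this term" is what the Extract notes option would expose for translation, and "Introduction" is the copied text that the Extract references option would expose.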
This filter has several known issues:
Please report any other issues to the Issues List of the project.