Okapi Framework - Filters

Overview

If you are using an Okapi tool released after M9, use the online wiki help instead:
http://okapiframework.org/wiki/index.php?title=Filters

Several filters are available with the Okapi Framework.

The following table lists all the pre-defined configurations for each of the filters distributed with Okapi:

Filter                      Configurable  Usability
    Pre-defined configuration: description

DTD Filter                  Yes           production
    DTD: Default DTD with text, e.g. XUL DTDs.

HTML Filter                 Yes           production
    HTML: Default HTML, XHTML and other HTML-like files (allows lax syntax).
    HTML (Well-formed): XHTML files and HTML with strict syntax. Use this configuration to get more structural information.

IDML Filter                 No            alpha
    IDML: Adobe InDesign XML files. Early alpha version.

JSON Filter                 Yes           production
    JSON: Default JSON files.

OpenOffice.org Filter       Yes           beta, but usable in production
    OpenOffice.org Documents: OpenOffice.org file formats: ODT, ODS, ODP, ODG, OTT, OTS, OTP, OTG.

OpenXML Filter              Yes           production
    Microsoft Office Documents: Microsoft Office documents: DOCX, XLSX, PPTX files.

Pensieve TM Filter          No            production
    Pensieve TM: For Pensieve translation memories.

PHP Content Filter          Yes           beta, usable with care
    PHP Content Default: For default PHP content files.

PlainText Filter            Yes           production
    Plain Text: Default plain text files.
    Plain Text (Trim Trail): Text files; trailing spaces and tabs removed from extracted lines.
    Plain Text (Trim All): Text files; leading and trailing spaces and tabs removed from extracted lines.
    Plain Text (Paragraphs): Text files extracted by paragraphs (separated by one or more empty lines).
    Spliced Lines (Backslash): Spliced lines filter with the backslash character (\) used as the splicer.
    Spliced Lines (Underscore): Spliced lines filter with the underscore character (_) used as the splicer.
    Spliced Lines (Custom): Spliced lines filter with a user-defined splicer.
    Plain Text (Regex, Line=Paragraph): Plain text files using regex-based line-break search. Extracts by lines.
    Plain Text (Regex, Block=Paragraph): Plain text files using regex-based line-break search. Extracts by paragraphs.

PO Filter                   Yes           production
    PO (Standard): Classic bilingual PO files.
    PO (Monolingual): Monolingual PO files (msgid is a real ID, not the source text).

Properties Filter           Yes           production
    Java Properties: Default Java properties files.
    Java Properties (Output not escaped): Java properties files with an output where extended characters are not escaped to \uHHHH.
    Skype Language Files: Skype language properties files (including support for HTML codes).

Regex Filter                Yes           production
    Regex Default: For basic text files.
    SRT Sub-Titles: For SRT (SubRip Text) subtitle files.
    Text (Line=Paragraph): For plain text files where each line is a paragraph.
    Text (Block=Paragraph): For plain text files where paragraphs are delimited by empty lines.

Ruby on Rails YAML Filter   Yes           beta, but usable in production
    Ruby on Rails YAML: For YAML files used with Ruby on Rails.

Table Filter                Yes           production
    Table Files: Table-like files such as tab-delimited, CSV, fixed-width columns, etc.
    Table (Comma-Separated Values): For comma-separated values, optional header with field names.
    Table (Fixed-Width Columns): Fixed-width column tables padded with whitespace.
    Table (Tab-Separated Values): Columns separated by one or more tabs.

TMX Filter                  Yes           production
    TMX: For Translation Memory eXchange (TMX) documents.

TTX Filter                  Yes           beta, but usable in production
    TTX: Trados TTX documents.
    TTX (without forced Tuv in output): For Trados TTX documents without forcing Tuv in output.

TS Filter                   Yes           production
    TS: For Qt TS files.

Trados-Tagged RTF Filter    No            beta, but usable in production
    Trados-Tagged RTF: For Trados-tagged RTF files. Read-only (no output supported).

Vignette Filter             Yes           production
    Vignette Export/Import Content Default: For Vignette files created by or made for the Export/Import Content function.

XLIFF Filter                Yes           production
    XLIFF: For XML Localisation Interchange File Format (XLIFF) documents.

XML Filter                  Yes           beta, but usable in production
    Generic XML: Default XML support.
    RESX: For Microsoft RESX documents (without binary data).
    Mozilla RDF: For Mozilla RDF documents.
    Java Properties XML: For Java properties files in XML format.
    Android Strings: For Android strings XML documents.
    WiX Localization: For WiX (Windows Installer XML) localization files.

XML Stream Filter           Yes           beta, but usable in production
    XML Stream: Generic XML (handles large XML documents).
    DITA: DITA documents.
    Java Properties XML + HTML: Java Properties XML with embedded HTML.

See the online Developer's Guide for more information on how to use the filters in your own scripts and applications, and how to develop new filters.