RTF Conversion Step

From Okapi Framework


This step removes the RTF layer of the input files.

Takes: Raw document. Sends: Raw document.

Only the visible text of the RTF is output. Hidden or deleted text, images, and other special content are discarded. Any formatting other than spaces, line-breaks and tabs is also discarded.

You can use this step to post-process a Trados-Tagged RTF file: only the translated sections are output.

Note that you can also process a Trados-Tagged RTF file using the Trados-Tagged RTF Filter, which gives you access to both the source and the target text along with their inline codes.

A warning is issued for each line where one or more characters cannot be represented in the selected output encoding.
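The per-line check described above can be pictured with a small Python sketch. This is only an illustration, not the step's actual implementation: the function name `lines_with_unencodable_chars` and the warning wording are assumptions made for the example.

```python
def lines_with_unencodable_chars(text, encoding):
    """Return (line_number, bad_chars) for each line that cannot be fully encoded."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        bad = []
        for ch in line:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                # This character has no representation in the target encoding.
                bad.append(ch)
        if bad:
            # One entry per line, however many characters failed on it.
            problems.append((number, "".join(bad)))
    return problems


sample = "plain ASCII\ncafé and €\n日本語"
for number, bad in lines_with_unencodable_chars(sample, "iso-8859-1"):
    print(f"Warning: line {number}: cannot represent {bad!r}")
```

With ISO-8859-1 as the output encoding, the second line is reported because of the euro sign (the "é" is representable), and the third line is reported for all three of its characters.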


Use Byte-Order-Mark for UTF-8 output — Set this option to add a Byte-Order-Mark (BOM) at the beginning of the file if the output encoding is UTF-8. For more information on the BOM see http://www.unicode.org/faq/utf_bom.html.
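As an illustration of what this option changes in the output bytes, here is a minimal Python sketch. The function `encode_output` and its `use_bom` parameter are hypothetical names for this example and do not correspond to the step's own code.

```python
import codecs


def encode_output(text, encoding="utf-8", use_bom=False):
    """Encode text, optionally prefixing the UTF-8 BOM (bytes EF BB BF)."""
    data = text.encode(encoding)
    # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8" to "utf-8".
    if use_bom and codecs.lookup(encoding).name == "utf-8":
        data = codecs.BOM_UTF8 + data
    return data


print(encode_output("hi", "utf-8", use_bom=True))       # BOM followed by the text
print(encode_output("hi", "iso-8859-1", use_bom=True))  # not UTF-8, so no BOM
```

In this sketch the option has no effect when the output encoding is not UTF-8, which mirrors the description above.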

Try to update the encoding declarations — Set this option to automatically update the encoding declaration in the output file. This option works for XML and HTML files.

In XML files, the encoding="..." attribute of the XML declaration is updated (or added if it is not present yet). If the file has no XML declaration, nothing is updated.
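The XML behavior can be approximated with regular expressions. The following Python sketch is an assumption about how such pattern matching might look, not the step's real code; the names `XML_DECL`, `ENCODING_ATTR`, `VERSION_ATTR` and `update_xml_encoding` are invented for the example.

```python
import re

# Hypothetical patterns for this sketch only.
XML_DECL = re.compile(r"<\?xml\s[^>]*?\?>")
ENCODING_ATTR = re.compile(r"""encoding\s*=\s*(["'])[^"']*\1""")
VERSION_ATTR = re.compile(r"""version\s*=\s*(["'])[^"']*\1""")


def update_xml_encoding(text, encoding):
    """Update or add encoding="..." in the first XML declaration found."""
    match = XML_DECL.search(text)
    if match is None:
        return text  # no XML declaration: nothing is updated
    decl = match.group(0)
    new_attr = f'encoding="{encoding}"'
    if ENCODING_ATTR.search(decl):
        # Replace the existing attribute value.
        new_decl = ENCODING_ATTR.sub(lambda m: new_attr, decl, count=1)
    else:
        # Add the attribute right after version="...", where XML expects it.
        version = VERSION_ATTR.search(decl)
        if version is None:
            return text
        new_decl = decl[:version.end()] + " " + new_attr + decl[version.end():]
    return text[:match.start()] + new_decl + text[match.end():]


print(update_xml_encoding('<?xml version="1.0"?><doc/>', "utf-16"))
```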

In HTML files, the step looks for the pattern content=... charset=.... When found, the value of charset is set to the output encoding. If the pattern is not found, nothing is updated.
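A rough Python equivalent of this lookup is sketched below. The exact pattern the step uses is not documented here, so the regular expression `CHARSET` and the function `update_html_charset` are assumptions chosen to match the content=... charset=... description.

```python
import re

# Assumed pattern: a content attribute whose value contains charset=<name>.
CHARSET = re.compile(
    r"""(content\s*=\s*["']?[^"'>]*?charset\s*=\s*)([\w.:-]+)""",
    re.IGNORECASE,
)


def update_html_charset(text, encoding):
    """Set the first charset value found to the output encoding."""
    # If the pattern is not found, sub() returns the text unchanged.
    return CHARSET.sub(lambda m: m.group(1) + encoding, text, count=1)


html = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(update_html_charset(html, "utf-8"))
```

Because this is plain pattern matching, a declaration of the form `<meta charset="...">` with no content attribute is left untouched in this sketch, and a matching declaration inside an HTML comment would be rewritten like any other.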

Type of line-break to use — Select the type of line-break to use in the output files. The choices are:

  • DOS/Windows Carriage Return + Line-Feed, \r\n, 0x0D+0x0A
  • Unix/Linux Line-Feed, \n, 0x0A
  • Macintosh Carriage-Return, \r, 0x0D
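The effect of the three choices can be shown with a short Python sketch. The keys `"dos"`, `"unix"` and `"mac"` and the function `normalize_line_breaks` are hypothetical labels for this example, not the option values used by the step.

```python
import re

# Hypothetical keys mapping to the three line-break types listed above.
LINE_BREAKS = {"dos": "\r\n", "unix": "\n", "mac": "\r"}


def normalize_line_breaks(text, style):
    """Rewrite every line-break in text to the selected type."""
    target = LINE_BREAKS[style]
    # \r\n is tried first so a DOS line-break is not counted as two breaks.
    return re.sub(r"\r\n|\r|\n", lambda m: target, text)


mixed = "a\r\nb\nc\rd"
print(repr(normalize_line_breaks(mixed, "unix")))
```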


  • Characters encoded as SYMBOL fields are not currently output. Examples are drawing symbols inserted from Word's menu that use fonts such as Dingbats or Wingdings.
  • The automatic update of the encoding declarations for XML and HTML relies on pattern matching rather than on parsing the files, so it is not perfectly accurate. For example, a commented-out declaration could be updated.