BOM Conversion Step

From Okapi Framework

Overview

This step adds or removes the Byte-Order-Mark (BOM) to or from UTF-8 and UTF-16 input files.

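The Byte-Order-Mark is a special Unicode character (U+FEFF) placed at the start of a file. In UTF-16, the order of its two bytes indicates whether the file is little-endian (FF FE) or big-endian (FE FF). In UTF-8, it is encoded as the three bytes EF BB BF and acts only as an encoding signature. For more information on the BOM, see http://www.unicode.org/faq/utf_bom.html.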
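
As a rough illustration of how a BOM can be recognized, the following Python sketch compares the first bytes of a file with the standard signatures (EF BB BF for UTF-8, FF FE for UTF-16 little-endian, FE FF for UTF-16 big-endian). The function name detect_bom is hypothetical, and this is not the step's actual implementation.

```python
import codecs

# BOM byte signatures, taken from Python's standard codecs module.
BOM_SIGNATURES = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, name in BOM_SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

For example, detect_bom(b"\xff\xfea\x00") returns "UTF-16LE", while detect_bom(b"abc") returns None. Note that this sketch does not look for UTF-32 signatures.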

Takes: Raw document. Sends: Raw document.

Parameters

Options Tab

Actions on the Byte-Order-Mark

Remove the Byte-Order-Mark if it is present — Select this option to remove the BOM from the input files if one is detected. By default, only the BOM of UTF-8 files is removed.

Remove also UTF-16 BOMs — Select this option to also remove the BOM from UTF-16 files if one is detected. This is not recommended: UTF-16 files must have a BOM so that their byte order can be determined.
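
The two removal options can be sketched as follows in Python. This is only an illustration of the behavior described above, not the step's code: the function remove_bom and its also_utf16 parameter are hypothetical names standing in for the two options.

```python
import codecs


def remove_bom(data: bytes, also_utf16: bool = False) -> bytes:
    """Strip a leading BOM from raw file content.

    A UTF-8 BOM is always stripped. UTF-16 BOMs are stripped only
    when also_utf16 is True, mirroring the second option.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    if also_utf16:
        for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
            if data.startswith(bom):
                return data[len(bom):]
    # No BOM found, or a UTF-16 BOM that must be kept.
    return data
```

With the default setting, a UTF-16 file passes through unchanged; only when also_utf16 is set does the function drop its first two bytes.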

Add the Byte-Order-Mark if it is not already present — Select this option to add a BOM to the input files if one is not detected. Note that the input files must already be in UTF-8 or UTF-16. When using this option, you must also specify the encoding of each file, so the step can add the proper type of BOM.
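
The reason the encoding must be specified is that the bytes to prepend depend on it. The Python sketch below illustrates this under stated assumptions: add_bom and the BOM_FOR_ENCODING table are hypothetical names, and rejecting a plain "UTF-16" label (with no byte order) is a choice made for this example, not documented behavior of the step.

```python
import codecs

# BOM bytes to prepend for each supported encoding (normalized names).
BOM_FOR_ENCODING = {
    "utf8": codecs.BOM_UTF8,
    "utf16le": codecs.BOM_UTF16_LE,
    "utf16be": codecs.BOM_UTF16_BE,
}


def add_bom(data: bytes, encoding: str) -> bytes:
    """Prepend the BOM matching the declared encoding, unless already present."""
    # Normalize labels such as "UTF-16LE", "utf_16_le" or "UTF-8".
    key = encoding.lower().replace("-", "").replace("_", "")
    bom = BOM_FOR_ENCODING.get(key)
    if bom is None:
        raise ValueError(f"Cannot choose a BOM for encoding: {encoding!r}")
    if data.startswith(bom):
        return data  # BOM already present, nothing to do
    return bom + data
```

For instance, add_bom(b"a\x00", "UTF-16LE") returns b"\xff\xfea\x00", and calling it again on that result leaves the content unchanged.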

Limitations

Only UTF-8 and UTF-16 files are currently supported, not UTF-32.