Okapi Framework - Steps

Full-Width Conversion Step

- Overview
- Parameters

If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=Full-Width_Conversion_Step

Overview

This step converts characters from or to full-width form in extracted text units.

Takes: Filter events
Sends: Filter events

For historical reasons, some Asian character sets have two display forms for some characters: half-width and full-width. This step converts text from one form to the other. The conversion is applied to the text of each text unit for the specified target locale. If a text unit has no text for that target, the source text is first copied to the target and then converted.
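The copy-then-convert behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: `SimpleTextUnit` and `TargetConversion` are hypothetical classes, not the Okapi `ITextUnit` API, and treating an empty target the same as a missing one is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical, simplified text unit: one source string plus target strings keyed by locale.
// This is NOT the Okapi API; it only models the behavior described in the overview.
class SimpleTextUnit {
    final String source;
    final Map<String, String> targets = new HashMap<>();

    SimpleTextUnit(String source) {
        this.source = source;
    }
}

class TargetConversion {
    // Converts the target text for the given locale and stores the result.
    // If the unit has no target text for that locale (missing or empty, an assumption),
    // the source text is copied first, so the source itself is never modified.
    static void convertTarget(SimpleTextUnit tu, String targetLocale, UnaryOperator<String> converter) {
        String text = tu.targets.get(targetLocale);
        if (text == null || text.isEmpty()) {
            text = tu.source; // start from a copy of the source
        }
        tu.targets.put(targetLocale, converter.apply(text));
    }
}
```

Any of the conversions sketched for the options below can be passed as the `converter` argument.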

Parameters

Convert full-width characters to half-width or ASCII equivalents -- Select this option to convert all full-width characters to their half-width or ASCII equivalents. For example, the character 'Ｑ' (U+FF31) is converted to 'Q' (U+0051) and the character 'サ' (U+30B5) is converted to 'ｻ' (U+FF7B).
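A minimal sketch of this direction, assuming only a subset of the real mapping: full-width ASCII variants (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from printable ASCII, so they can be converted arithmetically, while katakana needs a lookup table. The three-entry `KATAKANA` table and the mapping of the ideographic space U+3000 to an ASCII space are illustrative assumptions, not Okapi's actual tables. Note that Java's `Normalizer` with NFKC is not a substitute here: it maps full-width ASCII to ASCII but also maps half-width katakana back to full-width.

```java
import java.util.Map;

class ToHalfWidth {
    // Offset between the full-width ASCII variants (U+FF01..U+FF5E)
    // and printable ASCII (U+0021..U+007E).
    private static final int FULLWIDTH_OFFSET = 0xFEE0;

    // Hypothetical, deliberately incomplete katakana table. A voiced letter such as
    // U+30AC becomes two half-width characters: the base letter plus U+FF9E.
    private static final Map<Character, String> KATAKANA = Map.of(
        '\u30B5', "\uFF7B",      // SA
        '\u30AB', "\uFF76",      // KA
        '\u30AC', "\uFF76\uFF9E" // GA = KA + half-width voiced sound mark
    );

    static String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                out.append((char) (c - FULLWIDTH_OFFSET)); // e.g. U+FF31 -> U+0051
            } else if (c == '\u3000') {
                out.append(' '); // ideographic space -> ASCII space (assumption)
            } else if (KATAKANA.containsKey(c)) {
                out.append(KATAKANA.get(c));
            } else {
                out.append(c); // everything else is left unchanged
            }
        }
        return out.toString();
    }
}
```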

With this option, some additional characters that are not full-width forms can also be converted:

Include Squared Latin Abbreviations of the CJK Compatibility block -- Set this option to also convert the Squared Latin Abbreviations of the CJK Compatibility block into sequences of non-CJK characters. For example, '㏀' (U+33C0) is converted to "kΩ" (U+006B, U+03A9).
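One way to approximate this option is to apply Unicode NFKC normalization (`java.text.Normalizer`) to each squared abbreviation only, which yields "kΩ" for U+33C0 as in the example above. This is a sketch, not necessarily how Okapi implements it: it assumes the abbreviations occupy U+3371 to U+337A and U+3380 to U+33DF, and NFKC decomposes recursively, so a character such as U+33A1 becomes "m2" rather than "m²", which may differ from Okapi's own output.

```java
import java.text.Normalizer;

class SquaredLatin {
    // Assumed Squared Latin Abbreviations sub-ranges of the CJK Compatibility block.
    static boolean isSquaredLatin(char c) {
        return (c >= '\u3371' && c <= '\u337A') || (c >= '\u3380' && c <= '\u33DF');
    }

    // Replaces each squared abbreviation with its NFKC compatibility form,
    // e.g. U+33C0 -> U+006B U+03A9. Other characters (including the squared
    // katakana words of the same block) are left unchanged.
    static String expand(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isSquaredLatin(c)) {
                out.append(Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```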

Include special characters of the Letter-Like Symbols block -- Set this option to also convert several characters of the Letter-Like Symbols block into character sequences, as shown in the following table:

Letter-Like Symbol    Character sequence
U+2100                a/c
U+2101                a/s
U+2105                c/o
U+2103                °C
U+2109                °F
U+2116                No
U+212A                K
U+212B                Å
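Because this option is a fixed list, it can be sketched as a direct lookup built from the table above. The class name `LetterLike` is hypothetical, and the Å on the right-hand side is assumed to be U+00C5.

```java
import java.util.Map;

class LetterLike {
    // Mapping copied from the table; the Å target is assumed to be U+00C5.
    private static final Map<Character, String> TABLE = Map.of(
        '\u2100', "a/c",
        '\u2101', "a/s",
        '\u2105', "c/o",
        '\u2103', "\u00B0C",
        '\u2109', "\u00B0F",
        '\u2116', "No",
        '\u212A', "K",
        '\u212B', "\u00C5"
    );

    // Replaces each listed symbol with its sequence; other characters pass through.
    static String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```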

Convert half-width and ASCII characters to full-width equivalents -- Select this option to convert all half-width and ASCII characters to their full-width equivalents. For example, the character 'Q' (U+0051) is converted to 'Ｑ' (U+FF31) and the character 'ｻ' (U+FF7B) is converted to 'サ' (U+30B5).

Convert only the ASCII characters -- Set this option to convert only the ASCII characters to full-width. When this option is set, other half-width characters, such as half-width katakana, are left unchanged.
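The half-width to full-width direction, including the "Convert only the ASCII characters" option, can be sketched as follows. This is a sketch under stated assumptions, not Okapi's implementation. Printable ASCII is shifted by the same 0xFEE0 offset used for the reverse direction. Half-width katakana and punctuation (assumed range U+FF61 to U+FF9F) go through NFKC, which maps them to their full-width forms. Runs are normalized as a whole so that a letter followed by a voiced sound mark (U+FF76 U+FF9E) composes into one character (U+30AC). Mapping the ASCII space to U+3000 is an assumption, and the `asciiOnly` flag is a hypothetical stand-in for the option.

```java
import java.text.Normalizer;

class ToFullWidth {
    private static final int FULLWIDTH_OFFSET = 0xFEE0;

    // Assumed range of half-width katakana and half-width CJK punctuation.
    static boolean isHalfWidthKatakana(char c) {
        return c >= '\uFF61' && c <= '\uFF9F';
    }

    // asciiOnly models the "Convert only the ASCII characters" option.
    static String convert(String text, boolean asciiOnly) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= '\u0021' && c <= '\u007E') {
                out.append((char) (c + FULLWIDTH_OFFSET)); // e.g. U+0051 -> U+FF31
                i++;
            } else if (c == ' ') {
                out.append('\u3000'); // ASCII space -> ideographic space (assumption)
                i++;
            } else if (!asciiOnly && isHalfWidthKatakana(c)) {
                // Normalize the whole run so base letters and voiced marks compose.
                int start = i;
                while (i < text.length() && isHalfWidthKatakana(text.charAt(i))) {
                    i++;
                }
                out.append(Normalizer.normalize(text.substring(start, i), Normalizer.Form.NFKC));
            } else {
                out.append(c); // left unchanged, including katakana when asciiOnly is set
                i++;
            }
        }
        return out.toString();
    }
}
```

With `asciiOnly` set to true, a string mixing ASCII and half-width katakana has only its ASCII part widened.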