This step is intended to simplify the addition or removal of inter-segment whitespace when translating to or from Chinese or Japanese, whose scripts do not typically use it. The step performs one of two tasks, depending on the source and target locales:

  • When translating from a space-delimited language to a non-space-delimited language, whitespace following segment-ending punctuation will be removed.
  • When translating from a non-space-delimited language to a space-delimited language, whitespace will be added following segment-ending punctuation.

This step will perform no action when translating from one space-delimited language to another space-delimited language (for example, from English to French), or when translating between Chinese and Japanese.
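
The decision described above can be sketched as a small function. This is only an illustration, not the step's actual implementation: the names `NON_SPACE_DELIMITED`, `Action`, `primary_language`, and `choose_action` are hypothetical, and the sketch assumes that Chinese (`zh`) and Japanese (`ja`) are the only locales treated as non-space-delimited.

```python
from enum import Enum

# Assumption: only Chinese and Japanese count as non-space-delimited,
# matching the languages named in the description above.
NON_SPACE_DELIMITED = {"zh", "ja"}


class Action(Enum):
    REMOVE = "remove"  # strip whitespace after segment-ending punctuation
    ADD = "add"        # add whitespace after segment-ending punctuation
    NONE = "none"      # leave the segments untouched


def primary_language(locale: str) -> str:
    """Return the lowercase language subtag of a locale such as 'zh-CN' or 'ja_JP'."""
    return locale.replace("_", "-").split("-")[0].lower()


def choose_action(source_locale: str, target_locale: str) -> Action:
    """Pick the whitespace task for a source/target locale pair (hypothetical helper)."""
    source_spaced = primary_language(source_locale) not in NON_SPACE_DELIMITED
    target_spaced = primary_language(target_locale) not in NON_SPACE_DELIMITED
    if source_spaced and not target_spaced:
        return Action.REMOVE
    if not source_spaced and target_spaced:
        return Action.ADD
    # Both space-delimited (e.g. English to French), or both
    # non-space-delimited (e.g. Chinese to Japanese): nothing to do.
    return Action.NONE
```

Under these assumptions, `choose_action("en-US", "ja-JP")` selects removal, `choose_action("zh-CN", "fr")` selects addition, and both `choose_action("en", "fr")` and `choose_action("zh", "ja")` select no action.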

The step can be configured to apply its space adjustment to each of the following classes of punctuation:

  • Full Stop - Converts Ideographic Full Stop (U+3002) and Full-width Full Stop (U+FF0E) to/from a period.
  • Comma - Converts Ideographic Comma (U+3001) and Full-width Comma (U+FF0C) to/from a comma.
  • Exclamation Point - Converts Full-width Exclamation Mark (U+FF01) to/from an exclamation point.
  • Question Mark - Converts Full-width Question Mark (U+FF1F) to/from a question mark.
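
As a rough illustration of how the configurable punctuation classes might drive the space adjustment, the sketch below maps each class to the code points listed above plus its ASCII counterpart. The names `PUNCTUATION_CLASSES`, `enabled_marks`, `remove_trailing_space`, and `add_trailing_space` are hypothetical. The sketch also assumes that the whitespace sits at the end of the segment text; a real pipeline may hold it between segments instead.

```python
# Code points taken from the list above; pairing each class with its
# ASCII counterpart is an assumption made for this sketch.
PUNCTUATION_CLASSES = {
    "full_stop": {"\u3002", "\uFF0E", "."},
    "comma": {"\u3001", "\uFF0C", ","},
    "exclamation_point": {"\uFF01", "!"},
    "question_mark": {"\uFF1F", "?"},
}


def enabled_marks(classes):
    """Collect the punctuation characters for the enabled class names."""
    return set().union(*(PUNCTUATION_CLASSES[name] for name in classes))


def remove_trailing_space(segment: str, classes) -> str:
    """Drop whitespace that follows segment-ending punctuation of an enabled class."""
    stripped = segment.rstrip()
    if stripped and stripped[-1] in enabled_marks(classes):
        return stripped
    return segment  # ends with something else: leave unchanged


def add_trailing_space(segment: str, classes) -> str:
    """Append one space when the segment ends with punctuation of an enabled class."""
    if segment and segment[-1] in enabled_marks(classes):
        return segment + " "
    return segment  # no enabled punctuation at the end: leave unchanged
```

With only `full_stop` enabled, a target segment ending in U+3002 followed by a space loses that space, while a segment ending in an exclamation mark is left as it is because its class is not enabled.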


This process is not foolproof: it assumes that each source segment contains a single sentence and that the segment has been translated as a single sentence in the target language.