JSON Filter

Overview

The JSON Filter is an Okapi component that implements the IFilter interface for JSON (JavaScript Object Notation).

The implementation is based on the JSON specifications: http://www.json.org/

The following is an example of a very simple JSON file. The translatable text consists of the string values "File", "New", "Open", and "Close":

{"menu": {
   "value": "File",
   "popup": {
      "menuitem": [
         {"value": "New"},
         {"value": "Open"},
         {"value": "Close"}
      ]
   }
}}

Processing Details

Input Encoding

JSON files are normally in one of the Unicode encodings, but the filter supports any encoding. It decides which encoding to use for the input file using the following logic:

If the file has a Unicode Byte-Order-Mark:
- Then, the corresponding encoding (e.g. UTF-8, UTF-16, etc.) is used.
Else, if a header entry with a charset declaration exists in the first 1000 characters of the file:
- If the value of the charset is "charset" (case insensitive):
  - Then the file is likely to be a template with no encoding declared, so the current encoding (auto-detected or default) is used.
  - Else, the declared encoding is used. Note that if the encoding has been detected from a Byte-Order-Mark and the encoding declared in the header entry does not match, a warning is generated and the encoding of the Byte-Order-Mark is used.
Otherwise, the input encoding used is the default encoding that was specified when setting the filter options.
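
The decision order above can be sketched as follows. This is a minimal illustration in Python, not the filter's actual code: the function name, the `declared` parameter (standing in for whatever charset declaration was found near the start of the file), and the list of Byte-Order-Marks checked are assumptions made for the example.

```python
import codecs
from typing import Optional

# BOM signatures, longest first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def choose_input_encoding(data: bytes, default_encoding: str,
                          declared: Optional[str] = None) -> str:
    """Pick the input encoding in the order described above.

    `declared` is a hypothetical stand-in for a charset declaration found
    in the first part of the file; how it is located is not shown here.
    """
    # 1. A Byte-Order-Mark wins, even over a mismatching declaration.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. A declaration is used unless it is the template placeholder "charset".
    if declared is not None and declared.lower() != "charset":
        return declared
    # 3. Otherwise fall back to the default set in the filter options.
    return default_encoding
```

For instance, `choose_input_encoding(b'{"a": 1}', "windows-1252")` falls through to the default, because the data has no Byte-Order-Mark and no declaration was passed.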

Output Encoding

If the output encoding is UTF-8:

If the input encoding was also UTF-8, a Byte-Order-Mark is used for the output document only if one was detected in the input document.
If the input encoding was not UTF-8, no Byte-Order-Mark is used in the output document.
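
As a small sketch of the rule above for UTF-8 output (the function names are hypothetical and only illustrate the condition, they are not part of Okapi):

```python
import codecs


def is_utf8(encoding_name: str) -> bool:
    """True if the name is an alias of UTF-8 (e.g. "utf8", "UTF-8")."""
    return codecs.lookup(encoding_name).name == "utf-8"


def utf8_output_uses_bom(input_encoding: str, input_had_bom: bool) -> bool:
    """For UTF-8 output: write a BOM only if the input was UTF-8 with a BOM."""
    return is_utf8(input_encoding) and input_had_bom
```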

Line-Breaks

The output uses the same type of line-break as the original input.

Comments

Although they are not technically legal in JSON, the following comment types are supported:

// comment
# comment
/* comment */
<!-- comment -->

Parameters

Options Tab

Stand-alone strings

Extract strings without associated key — Set this option to extract strings that are not directly associated with a key.

Strings with keys

Extract all key/strings pairs — Set this option to extract all strings that have a key associated. If a regular expression for exceptions is defined, the strings that have a key matching the expression are not extracted.

Do not extract key/string pairs — Set this option to not extract any string that has an associated key. If a regular expression for exceptions is defined, the strings that have a key matching the expression are extracted.

Excepted when the key matches the following regular expression — Enter a regular expression that corresponds to the keys that should behave inversely to the default behavior you have selected for the key/string pairs. For example, you could exclude a single key/value pair by entering its key. In combination with Use the full key path, you can exclude all nested elements of a JSON structure with an expression such as ^.*?/excludedStructure/.*

Use the key as the resname — Set this option to use the value of the key as the value of the name of the extracted item (resname in XLIFF).

Use the full key path — Set this option to use the full key path in the resname. For example: /menu/value/popup/menuitem/value. The Use the key as the resname option must be set for this option to take effect. If enabled, the exception regular expression applies to the full path.

Include leading "/" on key path — Set this option to have a leading character '/' in the full key path.
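
The interaction between the key/string options, the exception expression, and the key path options can be sketched as below. This is an illustrative Python model, not the filter's implementation. The function names are hypothetical, and two behaviors are assumptions: array levels add no segment to the path, and the expression must match the whole key or path.

```python
import re
from typing import Any, Iterator, List, Optional, Tuple


def key_paths(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (key path, string) for every string that is the value of a key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = path + (key,)
            if isinstance(value, str):
                yield child, value
            else:
                yield from key_paths(value, child)
    elif isinstance(node, list):
        # Assumption: arrays do not contribute a path segment.
        for item in node:
            yield from key_paths(item, path)


def select(doc: Any, extract_all: bool = True, exceptions: Optional[str] = None,
           use_full_path: bool = False, leading_slash: bool = True) -> List[Tuple[str, str]]:
    """Return (name, text) pairs for the strings that would be extracted."""
    pattern = re.compile(exceptions) if exceptions else None
    result = []
    for path, text in key_paths(doc):
        full = ("/" if leading_slash else "") + "/".join(path)
        name = full if use_full_path else path[-1]
        # Assumption: the expression has to match the whole name.
        is_exception = bool(pattern and pattern.fullmatch(name))
        # The exception inverts whichever default was chosen.
        if extract_all != is_exception:
            result.append((name, text))
    return result


menu = {"menu": {"value": "File",
                 "popup": {"menuitem": [{"value": "New"},
                                        {"value": "Open"},
                                        {"value": "Close"}]}}}
print(select(menu, exceptions=r"^.*?/popup/.*", use_full_path=True))
```

With the exception expression applied to the full path, only the top-level "File" string remains in this model. Switching `extract_all` to `False` inverts the selection and keeps the three strings under `popup` instead.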

Regex matching keys that are notes, values of which to appear as <note> in XLIFF — Specify a regular expression. The values of the matching keys will be transferred to <note> elements in XLIFF.

Regex matching keys who's values are added as TextUnit Metadata — Specify a regular expression. The values of the matching keys will be written out as <context-group> elements in XLIFF.

New Extraction Rules >= version M39

If specified, these rules override the corresponding rules above.

Regex matching keys who's values are extracted (overrides extraction exceptions)

Regex matching keys that are notes, values of which to appear as <note> in XLIFF

Regex matching keys which are ID's (resname in XLIFF), overrides "use key as resname"

Hint: If you have the following JSON, in which the actual key is stored as the value of a neighboring key/value pair:

[
  {
    "key": "datePicker_marchMonth",
    "text": "March"
  },
  {
    "key": "datePicker_aprilMonth",
    "text": "April"
  }
]

and you simply define the regex "key" in this configuration option, the following XLIFF is extracted:

<trans-unit id="tu1" resname="datePicker_marchMonth" xml:space="preserve">
  <source xml:lang="en-US">March</source>
  <target xml:lang="de-DE"></target>
</trans-unit>
<trans-unit id="tu2" resname="datePicker_aprilMonth" xml:space="preserve">
  <source xml:lang="en-US">April</source>
  <target xml:lang="de-DE"></target>
</trans-unit>
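
The hint above can be modeled roughly as follows. This Python sketch only illustrates the idea of taking the resname from a neighboring key. The function name is hypothetical, and it assumes the ID applies to every other string in the same object and that the expression must match the whole key.

```python
import re
from typing import Dict, List, Optional


def units_with_ids(items: List[Dict[str, str]], id_rule: str) -> List[Dict[str, Optional[str]]]:
    """Pair each extracted string with the ID found in the same JSON object."""
    id_pattern = re.compile(id_rule)
    units = []
    for obj in items:
        # First pass: find the value of the key that matches the ID rule.
        resname = None
        for key, value in obj.items():
            if id_pattern.fullmatch(key):
                resname = value
        # Second pass: every other string becomes a unit carrying that ID.
        for key, value in obj.items():
            if isinstance(value, str) and not id_pattern.fullmatch(key):
                units.append({"resname": resname, "source": value})
    return units


data = [
    {"key": "datePicker_marchMonth", "text": "March"},
    {"key": "datePicker_aprilMonth", "text": "April"},
]
for unit in units_with_ids(data, "key"):
    print(unit["resname"], "->", unit["source"])
```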

Regex matching keys who's values are added as TextUnit Metadata

Regex matching keys that are numbers, values of which will be extracted as maxwidth property in XLIFF

If specified, the extracted value is used as the maxwidth of all other elements of the array on that level.
Only one array element matching the regex is allowed on each hierarchy level.
With nested arrays, different maxwidth values can be defined for each parent and child level, and for different sibling levels.
The definitions can all use the same key even when their values differ.
If no key matches the regex on a sublevel but a key on a higher level does, the higher-level definition determines the maximum length for the deepest hierarchy level only (not for the levels in between), including its siblings that have no matching key.
If a key matches on a higher level and keys also match on all lower levels, then values are extracted for all elements of the corresponding levels. However, on the higher levels the matching key must be defined after the last child element.

The size unit property to use when maxwidth properties are extracted

The string entered here is used as the value of the size-unit attribute of the trans-unit in XLIFF when a length restriction applies.

Example FPRM Settings:

Regex rules apply to key names.

extraction rules (use instead of rule exceptions): extractionRules=/widgets/body.*

note rules (add values to TextUnits as notes): noteRules=/widgets/name.*

id rules (overrides useKeyAsName): idRules=/widgets/id.*

generic metadata (matched key:values are added as metadata to TextUnit): genericMetaRules=/widgets/image.*
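
To see which rule a given key would fall under with the example settings above, the following Python sketch may help. It is not how the filter evaluates its rules: it assumes the expressions are matched against the full key path with a leading "/" and must match the whole path, and the `classify` helper is hypothetical.

```python
import re
from typing import Optional

# The four example rules from the settings above.
RULES = {
    "extractionRules": r"/widgets/body.*",
    "noteRules": r"/widgets/name.*",
    "idRules": r"/widgets/id.*",
    "genericMetaRules": r"/widgets/image.*",
}


def classify(key_path: str) -> Optional[str]:
    """Return the name of the first rule whose expression matches the path."""
    for rule_name, pattern in RULES.items():
        if re.fullmatch(pattern, key_path):
            return rule_name
    return None


for path in ["/widgets/body/text", "/widgets/name", "/widgets/id", "/widgets/image/src"]:
    print(path, "->", classify(path))
```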

Content Processing Tab

Process text content with this sub-filter — Specify an Okapi filter ID (e.g. okf_html) to process the content of all translatable text with that filter. Leave this field blank for default behavior.

Find inline codes by patterns defined below — Set this option to use the specified regular expressions on the text of the extracted items. Any match will be converted to an inline code.

Note: This option cannot be used together with the sub-filtering option.

By default the expression is:

((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])
|((\\r\\n)|\\a|\\b|\\f|\\n|\\r|\\t|\\v)
|(\{\d.*?\}))
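
To get a feel for what the default expression matches, it can be tried outside the filter. The sketch below uses Python's re module rather than the Java regular expression engine Okapi runs on, so it is only an approximation. The `mark_inline_codes` helper is hypothetical and wraps each match in <> brackets.

```python
import re

# The default expression from this page, joined into a single pattern.
DEFAULT_CODE_FINDER = re.compile(
    r"((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])"
    r"|((\\r\\n)|\\a|\\b|\\f|\\n|\\r|\\t|\\v)"
    r"|(\{\d.*?\}))"
)


def mark_inline_codes(text: str) -> str:
    """Wrap every match of the default expression in <> brackets."""
    return DEFAULT_CODE_FINDER.sub(lambda m: "<" + m.group(0) + ">", text)


# A raw string, so \n here is a literal backslash followed by "n".
sample = r"Copied %d of %1$s files to {0}\n"
print(mark_inline_codes(sample))
```

The three alternatives cover printf-style placeholders such as %d and %1$s, escaped control sequences written out literally in the text such as \n, and numbered placeholders such as {0}.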

Add — Click this button to add a new rule.

Remove — Click this button to remove the current rule.

Move Up — Click this button to move the current rule upward.

Move down — Click this button to move the current rule downward.

[Top-right text box] — Enter the regular expression for the current rule. Use the Modify button to enter the edit mode. The expression must be a valid regular expression. The syntax and the effect of the rule are checked automatically: the expression is tested against the test data in the text box below, and the result is shown in the bottom-right text box.

Modify — Click this button to edit the expression of the current rule. This button is labeled Accept when you are in edit mode.

Accept — Click this button to save any changes you have made to the expression and leave the edit mode. This button is labeled Modify when you are not in edit mode.

Discard — Click this button to leave the edit mode and revert the current rule to the expression it had before you started the edit mode.

Patterns — Click this button to display some help on regular expression patterns.

Test using all rules — Set this option to test all the rules at the same time. The syntax of the current rule is checked automatically, and its effect on the sample text is shown in the bottom-right result box, where the parts of the text that match the expressions are displayed in <> brackets. If this option is set, the test takes all rules of the set into account; otherwise, only the current rule is tested.

[Middle-right text box] — Optional test data to test the regular expression for the current rule or all rules depending on the Test using all rules option.

[Bottom-right text box] — Shows the result of the regular expression applied to the test data.

Limitations

Comments within a JSON string are parsed as part of the string content, not as comments. A configured sub-filter will then process these as true comments (they will become part of the skeleton, or whatever the sub-filter is configured to do with them).
