Properties Filter

From Okapi Framework
Revision as of 19:12, 4 June 2016 by Ysavourel (talk | contribs) (1 revision imported)

Overview

The Properties Filter is an Okapi component that implements the IFilter interface for properties files. The filter is implemented in the class net.sf.okapi.filters.properties.PropertiesFilter of the Okapi library.

The implementation is based on the specification found in the Java java.util.Properties class documentation. A few additional features are also supported for compatibility with other types of properties files.

The following is an example of a very simple properties file. The translatable text is the value of each entry (OK and Invalid input file):

# Example of Java properties

labelOK= OK
msgBadFile: Invalid input file
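
As a rough illustration of the underlying format, the example above can be read with the standard java.util.Properties class from the JDK. This is not the Okapi filter itself, only a minimal sketch of how the specification splits each entry into a key and a value; the class name SimplePropertiesExample is made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SimplePropertiesExample {

    // Parses properties text with the standard JDK parser.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# Example of Java properties",
            "",
            "labelOK= OK",
            "msgBadFile: Invalid input file");
        Properties props = parse(text);
        // Both '=' and ':' separate the key from the value, and the
        // whitespace right after the separator is not part of the value.
        System.out.println(props.getProperty("labelOK"));    // prints: OK
        System.out.println(props.getProperty("msgBadFile")); // prints: Invalid input file
    }
}
```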

Note that Java properties can also be represented in XML. To process such documents, use the XML Filter or the XML Stream Filter.

Processing Details

Input Encoding

The filter decides which encoding to use for the input file using the following logic:

  • If the file has a Unicode Byte-Order-Mark:
    • Then, the corresponding encoding (e.g. UTF-8, UTF-16, etc.) is used.
  • Otherwise, the input encoding used is the default encoding that was specified when setting the filter options.
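
The decision above could be sketched as follows. This is only an illustration under stated assumptions, not the filter's actual code: the class and method names are hypothetical, and only the UTF-8 and UTF-16 Byte-Order-Marks are checked (UTF-32 is ignored for brevity).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class InputEncodingSketch {

    // Returns the encoding indicated by a leading Byte-Order-Mark,
    // or the given default encoding when no mark is found.
    static Charset chooseEncoding(byte[] head, Charset defaultEncoding) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return defaultEncoding;
    }

    // True if the data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x61};
        byte[] withoutBom = {0x61, 0x3D, 0x62};
        System.out.println(chooseEncoding(withBom, StandardCharsets.ISO_8859_1));    // prints: UTF-8
        System.out.println(chooseEncoding(withoutBom, StandardCharsets.ISO_8859_1)); // prints: ISO-8859-1
    }
}
```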

Output Encoding

If the output encoding is UTF-8:

  • If the input encoding was also UTF-8, a Byte-Order-Mark is used for the output document only if one was detected in the input document.
  • If the input encoding was not UTF-8, no Byte-Order-Mark is used in the output document.
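
The two rules above reduce to a single condition, shown here as a small sketch. The class and method names are hypothetical, and the method only covers the case described on this page, where the output encoding is UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class OutputBomSketch {

    // Applies only when the output encoding is UTF-8: a Byte-Order-Mark
    // is written only if the input was UTF-8 and had one itself.
    static boolean utf8OutputNeedsBom(Charset inputEncoding, boolean inputHadBom) {
        return StandardCharsets.UTF_8.equals(inputEncoding) && inputHadBom;
    }

    public static void main(String[] args) {
        System.out.println(utf8OutputNeedsBom(StandardCharsets.UTF_8, true));     // prints: true
        System.out.println(utf8OutputNeedsBom(StandardCharsets.UTF_8, false));    // prints: false
        System.out.println(utf8OutputNeedsBom(StandardCharsets.UTF_16LE, true));  // prints: false
    }
}
```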

Line-Breaks

The output uses the same type of line-break as the original input.
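
One way to picture this behavior is to detect the first line-break of the input and reuse it when writing. The sketch below is an assumption about how such detection might work, not the filter's implementation; the names are hypothetical, and falling back to "\n" when the input has no line-break is a choice made for this example.

```java
final class LineBreakSketch {

    // Returns the first line-break found in the text: "\r\n", "\r" or "\n".
    // Falls back to "\n" when the text has no line-break (an assumption).
    static String detect(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') return "\n";
            if (c == '\r') {
                boolean crlf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return crlf ? "\r\n" : "\r";
            }
        }
        return "\n";
    }

    // Rewrites output that uses "\n" internally so that it uses the
    // same type of line-break as the original input.
    static String withOriginalLineBreaks(String originalInput, String outputWithLf) {
        return outputWithLf.replace("\n", detect(originalInput));
    }

    public static void main(String[] args) {
        String input = "labelOK= OK\r\nmsgBadFile: Invalid input file\r\n";
        String output = withOriginalLineBreaks(input, "labelOK= D'accord\n");
        System.out.println(output.endsWith("\r\n")); // prints: true
    }
}
```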

Mapping

Each entry of the properties file is mapped to a text unit resource as follows:

Properties                     Okapi Resources
The key of the entry           The name of the text unit
The text of the entry          The source content of the text unit
Comments (before the entry)    The note property of the text unit (if the option is set)
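
The mapping can be illustrated with a deliberately simplified sketch. The Unit class below is a hypothetical stand-in, not the Okapi text unit class, and the parser ignores continuation lines, escapes and whitespace-only separators. It also assumes that the comment marker is stripped, that grouped comment lines are joined with a line-break, and that a blank line between a comment and an entry does not detach the comment; none of these details is stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class MappingSketch {

    // Hypothetical stand-in for a text unit (not the Okapi class).
    static final class Unit {
        final String name;   // the key of the entry
        final String source; // the text of the entry
        final String note;   // the grouped comments, or null
        Unit(String name, String source, String note) {
            this.name = name;
            this.source = source;
            this.note = note;
        }
    }

    static List<Unit> map(String text, boolean commentsToNotes) {
        List<Unit> units = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // comments seen since the last entry
        for (String raw : text.split("\r\n|\r|\n")) {
            String line = raw.replaceFirst("^\\s+", "");
            if (line.isEmpty()) continue;
            char first = line.charAt(0);
            if (first == '#' || first == '!') {
                pending.add(line.substring(1).trim());
                continue;
            }
            int sep = indexOfSeparator(line);
            String key = (sep < 0) ? line.trim() : line.substring(0, sep).trim();
            String value = (sep < 0) ? "" : line.substring(sep + 1).replaceFirst("^\\s+", "");
            String note = (commentsToNotes && !pending.isEmpty())
                ? String.join("\n", pending) : null;
            units.add(new Unit(key, value, note));
            pending.clear();
        }
        return units;
    }

    // Index of the first '=' or ':' in the line, or -1 if there is none.
    private static int indexOfSeparator(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '=' || c == ':') return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "# Example of Java properties\n\nlabelOK= OK\nmsgBadFile: Invalid input file";
        for (Unit u : map(text, true)) {
            System.out.println(u.name + " -> " + u.source + " (note: " + u.note + ")");
        }
    }
}
```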

Parameters

Options Tab

Localization directives

Localization directives are special comments you can use to override the default behavior of the filter regarding the parts to extract. The syntax and behavior of the directives are the same across all Okapi filters. Note that the directives override key conditions.

Use localization directives when they are present — Set this option to enable the filter to recognize localization directives. If this option is not set, any localization directive in the input file will be ignored.

Extract items outside the scope of localization directives — Set this option to extract any translatable item that is not within the scope of a localization directive. Choosing whether or not to extract outside the directives lets you mark up whichever is the smaller part of the source document. This option is enabled only when the Use localization directives when they are present option is set.

Key filtering

Use the following key condition: — Set this option to extract items based on their keys. You specify a regular expression pattern; if the key matches the pattern, the item is extracted or not, depending on the action you select. Note that directives take precedence over the key condition.

Extract only the items with a key matching the given expression — Select this option to extract only the items with keys that match the specified pattern.

Do not extract the items with a key matching the given expression — Select this option to not extract the items with keys that match the specified pattern.

Enter the pattern to test against the key. The pattern must be a valid regular expression. For example, with the following settings:

  • Use the following key condition = set
  • Extract only the items with a key matching the given expression = set
  • Pattern = .*text.*

Only the text of the entries text.err1 and menu_text_file is extracted, because key1 does not match the pattern:

key1 = Text for key1
text.err1 = Text for text.err1
menu_text_file = Text for menu_text_file
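
The key condition in this example can be reproduced with java.util.regex. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that the pattern must match the whole key (Matcher.matches()), which the page does not specify. For a pattern such as .*text.* the outcome is the same either way.

```java
import java.util.regex.Pattern;

final class KeyFilterSketch {

    // extractMatching = true  corresponds to "Extract only the items with a key matching..."
    // extractMatching = false corresponds to "Do not extract the items with a key matching..."
    static boolean isExtracted(String key, Pattern condition, boolean extractMatching) {
        boolean matches = condition.matcher(key).matches(); // whole-key match (assumption)
        return matches == extractMatching;
    }

    public static void main(String[] args) {
        Pattern condition = Pattern.compile(".*text.*");
        for (String key : new String[] {"key1", "text.err1", "menu_text_file"}) {
            System.out.println(key + " extracted: " + isExtracted(key, condition, true));
        }
        // prints:
        // key1 extracted: false
        // text.err1 extracted: true
        // menu_text_file extracted: true
    }
}
```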

Configuration identifier of the sub-filter to use on the content — Enter the filter configuration identifier of the sub-filter to use on the extracted content. For example: okf_html. Leave the entry empty if no sub-filter is to be used. As a sub-filter, you can use the HTML Filter, the XML Stream Filter or other filters derived from the AbstractMarkupFilter class.

Recognize additional comment markers — Set this option to take into account other comment styles in addition to the strict Java comments (single lines starting with '#' or '!'). When this option is set, the filter also recognizes single-line comments starting with ';', as well as single lines where "//" is the first non-whitespace sequence. Note that a // after a = is considered part of the value of the entry, not a comment.
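
The rule can be sketched as a small line classifier. This is an illustration rather than the filter's code: the names are hypothetical, and it assumes that ';' is treated like '#' and '!', that is, as the first non-whitespace character of the line.

```java
final class CommentMarkerSketch {

    // Returns true if the line is a comment. When extraMarkers is set,
    // ';' and "//" are accepted in addition to '#' and '!'.
    static boolean isComment(String line, boolean extraMarkers) {
        String s = line.replaceFirst("^\\s+", "");
        if (s.isEmpty()) return false;
        char c = s.charAt(0);
        if (c == '#' || c == '!') return true;
        if (!extraMarkers) return false;
        return c == ';' || s.startsWith("//");
    }

    public static void main(String[] args) {
        System.out.println(isComment("; legacy comment", true));     // prints: true
        System.out.println(isComment("   // indented comment", true)); // prints: true
        // The "//" comes after the separator, so it belongs to the value.
        System.out.println(isComment("url = http://example.com", true)); // prints: false
    }
}
```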

Extract comments to note properties — Set this option to include the comments before each entry as a note property on the text unit of the corresponding entry. All comment lines are grouped into a single note.

Convert \n and \t to line-break and tab — Set this option to convert the escaped codes \n and \t to true line-breaks and tabs. All the other escaped characters remain escaped.
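
A minimal sketch of such a conversion is shown below. The names are hypothetical and this is not the filter's implementation; it only illustrates that \n and \t are converted while every other escape sequence, including an escaped backslash, is copied unchanged.

```java
final class EscapeConversionSketch {

    // Converts the two-character sequences backslash-n and backslash-t
    // into a real line-break and a real tab. Other escapes stay as they are.
    static String convert(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(i + 1);
                if (next == 'n') {
                    out.append('\n');
                } else if (next == 't') {
                    out.append('\t');
                } else {
                    // Copy both characters so that an escaped backslash
                    // followed by 'n' is not misread as a line-break.
                    out.append(c).append(next);
                }
                i++; // skip the character that followed the backslash
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below holds: Line 1\nLine 2 (backslash + n, not a line-break).
        String converted = convert("Line 1\\nLine 2");
        System.out.println(converted.equals("Line 1\nLine 2")); // prints: true
    }
}
```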

Inline Codes Tab

Has inline codes as defined below: — Set this option to use the specified regular expressions on the text of the extracted items. Any match will be converted to an inline code. By default the expression is:

((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])|((\\r\\n)|\\a|\\b|\\f|\\n|\\r|\\t|\\v)|(\{\d.*?\}))
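
To see what the default expression matches, it can be tried with java.util.regex. The sketch below writes the expression above as a single Java string (each backslash doubled for the string literal) and wraps every match in <> brackets. The class name and the sample text are made up for this illustration.

```java
import java.util.regex.Pattern;

final class InlineCodeSketch {

    // The default expression from this page, written as one Java string.
    static final Pattern DEFAULT_CODES = Pattern.compile(
        "((%(([-0+#]?)[-0+#]?)((\\d\\$)?)(([\\d\\*]*)(\\.[\\d\\*]*)?)[dioxXucsfeEgGpn])"
        + "|((\\\\r\\\\n)|\\\\a|\\\\b|\\\\f|\\\\n|\\\\r|\\\\t|\\\\v)"
        + "|(\\{\\d.*?\\}))");

    // Wraps every match of the expression in <> brackets.
    static String markCodes(String text) {
        return DEFAULT_CODES.matcher(text).replaceAll("<$0>");
    }

    public static void main(String[] args) {
        // The sample ends with a literal backslash followed by 'n'.
        System.out.println(markCodes("Cannot open %s: error {0} at line %d\\n"));
        // prints: Cannot open <%s>: error <{0}> at line <%d><\n>
    }
}
```

The expression covers three families of codes: printf-style placeholders such as %s or %1$d, escaped control sequences such as \n written literally in the text, and numbered placeholders such as {0}.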

Add — Click this button to add a new rule.

Remove — Click this button to remove the current rule.

Move Up — Click this button to move the current rule upward.

Move down — Click this button to move the current rule downward.

[Top-right text box] — Enter the regular expression for the current rule. Use the Modify button to enter the edit mode. The expression must be a valid regular expression. The syntax (and the effect of the rule) is checked automatically: the expression is tested against the test data in the text box below, and the result is shown in the bottom-right text box.

Modify — Click this button to edit the expression of the current rule. This button is labeled Accept when you are in edit mode.

Accept — Click this button to save any changes you have made to the expression and leave the edit mode. This button is labeled Modify when you are not in edit mode.

Discard — Click this button to leave the edit mode and revert the current rule to the expression it had before you started the edit mode.

Patterns — Click this button to display some help on regular expression patterns.

Test using all rules — Set this option to test all the rules at the same time. The syntax of the current rule is checked automatically, and its effect is shown on the sample text. The results of the test are displayed in the bottom-right result box, where the parts of the text that match the expressions are displayed in <> brackets. If the Test using all rules option is set, the test takes all the rules of the set into account; if it is not set, only the current rule is tested.

[Middle-right text box] — Optional test data to test the regular expression for the current rule or all rules depending on the Test using all rules option.

[Bottom-right text box] — Shows the result of the regular expression applied to the test data.

Output Tab

Escape all extended characters — Set this option to convert all characters above U+007F into Unicode escape sequences (\uHHHH). When this option is not set, only the characters not supported by the output encoding are escaped.
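
The two behaviors can be sketched as follows. This is an illustration with hypothetical names, not the filter's code. It assumes that a character outside the Basic Multilingual Plane is escaped as its two UTF-16 surrogates, as Java properties files normally do, and that escapes use uppercase hexadecimal digits.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class OutputEscapeSketch {

    // Escapes characters as backslash-u sequences of four hex digits.
    // escapeAllExtended = true:  every character above U+007F is escaped.
    // escapeAllExtended = false: only characters that the output
    //                            encoding cannot represent are escaped.
    static String escape(String text, Charset outputEncoding, boolean escapeAllExtended) {
        CharsetEncoder encoder = outputEncoding.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String unit = text.substring(i, i + len);
            boolean mustEscape = escapeAllExtended ? cp > 0x7F : !encoder.canEncode(unit);
            if (mustEscape) {
                for (char c : unit.toCharArray()) {
                    out.append(String.format("\\u%04X", (int) c));
                }
            } else {
                out.append(unit);
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u20ac"; // e-acute and the euro sign
        System.out.println(escape(text, StandardCharsets.UTF_8, true));
        // ISO-8859-1 can encode the e-acute but not the euro sign.
        System.out.println(escape(text, StandardCharsets.ISO_8859_1, false));
    }
}
```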

Limitations

None known.