Markdown Filter


Overview

The Markdown Filter is an Okapi component for extracting translatable text from Markdown files. See https://en.wikipedia.org/wiki/Markdown for more information about the format. Markdown is a family of formats, and not all of them are mutually compatible. This filter is designed to work with Markdown based on the CommonMark specification, with additional features to support GitHub-flavored Markdown.

Processing Details

Input Encoding

The filter decides which encoding to use for the input file using the following logic:

If the file has a Unicode Byte-Order-Mark (BOM), the corresponding encoding (e.g. UTF-8, UTF-16) is used.
Otherwise, the filter uses the default encoding that was specified when setting the filter options.
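
The decision above can be sketched in a few lines of Python. This is only an illustration of the logic, not the filter's actual code (Okapi is written in Java); the function name choose_encoding and the exact set of BOMs checked are assumptions made for the example.

```python
import codecs

# BOMs are ordered so that the 4-byte UTF-32 marks are tested before the
# 2-byte UTF-16 marks they begin with (the UTF-32 LE BOM starts with the
# UTF-16 LE BOM).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def choose_encoding(data: bytes, default_encoding: str) -> str:
    """Return the encoding implied by a leading BOM, else the default.

    Hypothetical helper: `default_encoding` stands in for the default
    encoding given in the filter options.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default_encoding
```

For instance, a file that starts with the bytes EF BB BF would be read as UTF-8 regardless of the default, while a file without a BOM falls back to whatever default was passed in.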

HTML Elements

HTML inline elements (individual tags) and HTML blocks (chunks of text enclosed between a block-forming start tag and its corresponding end tag) are processed by the HTML filter. The HTML filter configuration to use can be customized separately with the htmlSubfilter parameter.

Inline Codes

The Inline Code Finder is supported by this filter.

The code finder applies only to the translatable text in the Markdown portion of the document. It does not apply to HTML inline tags or HTML blocks. To cover those, enable and specify the inline code pattern in a separate HTML filter configuration, name the configuration file okf_html@arbitrary-name.fprm, and specify that name for the htmlSubfilter parameter.

Note: support for the Inline Code Finder was temporarily unavailable in some snapshot builds of version 0.36, but it has since been restored.
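
As a rough illustration of what a code finder does, the Python sketch below splits a run of translatable text into plain text and inline codes using a list of regular expression rules. It is a simplified model, not Okapi's implementation: the function name mark_inline_codes and the sample rules are hypothetical, and real rules are Java regular expressions, whose syntax differs from Python's in places.

```python
import re
from typing import List, Tuple


def mark_inline_codes(text: str, rules: List[str]) -> List[Tuple[str, str]]:
    """Split `text` into ("text", ...) and ("code", ...) runs.

    Every non-empty match of any rule becomes a "code" run; everything
    between matches stays a "text" run. Rules are combined with "|", so
    this sketch does not support backreferences inside a rule.
    """
    combined = re.compile("|".join(f"(?:{rule})" for rule in rules))
    runs: List[Tuple[str, str]] = []
    pos = 0
    for match in combined.finditer(text):
        if match.end() == match.start():
            continue  # skip empty matches, they carry no code
        if match.start() > pos:
            runs.append(("text", text[pos:match.start()]))
        runs.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(text):
        runs.append(("text", text[pos:]))
    return runs


# Hypothetical rules: double-brace placeholders and printf-style codes.
sample_rules = [r"\{\{[^}]*\}\}", r"%[sd]"]
runs = mark_inline_codes("Hello {{name}}, you have %d messages.", sample_rules)
```

With these sample rules, {{name}} and %d would be protected as inline codes and the remaining text would stay translatable.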

Parameters

Translate Hyperlink URLs (translateUrls)
By default, URLs in link and image statements are not exposed for translation. If this option is enabled, they will be extracted. Note: URLs are currently extracted inline in their containing segment, rather than as a subflow. Default: false
REGEX Pattern for Translatable URLs (urlToTranslatePattern)
When translateUrls is true, only the URLs that match this regular expression are extracted. Default: .+ (all URLs)
Translate Code Blocks (translateCodeBlocks)
This option controls whether the contents of fenced code blocks are exposed for translation. Default: true
Translate YAML Metadata Header (translateHeaderMetadata)
Some Markdown formats support a YAML Metadata Header that contains key/value data. By default, this header is not exposed for translation. When the "Translate YAML Metadata Header" option is enabled, the header is parsed and the metadata values are exposed for translation. Default: false
Translate Image Alt Text (translateImageAltText)
The alt text of an image, written either in the form ![alt text](https://foo.com/images/bar.jpg) or as the alt attribute of an img tag <img src="https://foo.com/images/bar.jpg" alt="alt text">, is extracted if this parameter is true. Default: true
HTML Subfilter Configuration ID (htmlSubfilter)
The custom configuration ID of the HTML filter that will be called to process HTML contents within Markdown documents. The configuration file must be saved in a known location with .fprm suffix. Specify nothing to use the default HTML filter configuration tailored for the Markdown filter. Default: (empty)
Enter non translatable block quotes (nonTranslateBlocks)
This option excludes some block quotes from translation. A block quote that starts with one of the comma-separated strings given here is not extracted. Default: (empty - the contents of all block quotes are extracted)
Use Code Finder (useCodeFinder)
Determines whether to use the Inline Code Finder or not. Default: false
Number of Code Finder Rules (codeFinderRules.count)
The number of rules, i.e. regular expression patterns. Default: 1
Code Finder Rule N (codeFinderRules.ruleN)
Nth matching pattern for codes where N=0,1,2...
Sample Text (codeFinderRules.sample)
Sample text for testing the rules in the UI.
Use All Rules (codeFinderRules.useAllRulesWhenTesting)
Determines whether to apply all rules when testing in the UI.
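
To make the interaction of translateUrls, urlToTranslatePattern and nonTranslateBlocks more concrete, here is a minimal Python sketch of the selection logic as described above. It is not the filter's code. The function names are hypothetical, and several details are assumptions: that the pattern must match the whole URL, that whitespace around the comma-separated strings is ignored, and that the block quote text is compared after its leading > marker has been removed.

```python
import re


def url_is_extracted(url: str,
                     translate_urls: bool = False,
                     url_to_translate_pattern: str = ".+") -> bool:
    """Model of translateUrls plus urlToTranslatePattern.

    Assumption: the pattern has to match the entire URL.
    """
    if not translate_urls:
        return False
    return re.fullmatch(url_to_translate_pattern, url) is not None


def block_quote_is_extracted(quote_text: str,
                             non_translate_blocks: str = "") -> bool:
    """Model of nonTranslateBlocks.

    `quote_text` is the block quote content without its ">" marker.
    Assumption: entries are trimmed and empty entries are ignored.
    """
    prefixes = [p.strip() for p in non_translate_blocks.split(",") if p.strip()]
    stripped = quote_text.lstrip()
    return not any(stripped.startswith(prefix) for prefix in prefixes)
```

Under these assumptions, url_is_extracted("https://example.com/en/doc", True, r".*/en/.*") would select the URL, while a block quote beginning with a listed string such as DNT would be left out of extraction.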


Limitations

Subflows are Not Supported

When there is a subflow of text in the middle of the main text, the subflow is intermixed with the main flow of text instead of being extracted separately. For example, for this run of Markdown text:

Please click ![The Information desk logo](images/circled-i.jpg) for help.


The extracted text in the XLIFF file will look like this:

Please click <x id="1"/>The Information desk logo<x id="2"/> for help.
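
The effect can be mimicked with a small Python sketch that keeps an image's alt text inline and replaces the surrounding Markdown syntax with numbered placeholder codes. This only models the shape of the output for simple inline images; the regular expression and the function name to_inline_placeholders are assumptions for illustration and do not reflect how the filter parses Markdown.

```python
import re

# Matches a simple inline image: ![alt text](target). Nested brackets and
# link titles are deliberately not handled in this sketch.
IMAGE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")


def to_inline_placeholders(markdown_line: str) -> str:
    """Keep the alt text in the main flow, wrapped in <x/> placeholders."""
    counter = 0

    def replace(match: "re.Match[str]") -> str:
        nonlocal counter
        opening, closing = counter + 1, counter + 2
        counter += 2
        return f'<x id="{opening}"/>{match.group(1)}<x id="{closing}"/>'

    return IMAGE.sub(replace, markdown_line)


line = "Please click ![The Information desk logo](images/circled-i.jpg) for help."
extracted = to_inline_placeholders(line)
```

Because the alt text stays inside the sentence, a translator sees it as part of the main segment rather than as a separate unit, which is the limitation this section describes.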