Markdown Filter

From Okapi Framework

Overview

The Markdown Filter is an Okapi component for extracting translatable text from Markdown files. See https://en.wikipedia.org/wiki/Markdown for more information about the format. Markdown is a family of formats, not all of which are mutually compatible. This filter is designed to work with Markdown based on the CommonMark specification, with additional features to support GitHub Flavored Markdown.

Processing Details

Input Encoding

The filter decides which encoding to use for the input file using the following logic:

If the file has a Unicode Byte-Order-Mark (BOM), the corresponding encoding (e.g. UTF-8, UTF-16) is used.
Otherwise, the input encoding is the default encoding that was specified when setting the filter options.
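
The decision above can be sketched in a few lines of Python. This is only an illustration of the BOM-then-default logic, not the filter's actual implementation; the function name choose_encoding and the set of BOMs checked are assumptions made for the example.

```python
import codecs

# BOM signatures, checked in this order on purpose: the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM, so UTF-32 must
# be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def choose_encoding(data: bytes, default_encoding: str) -> str:
    """Return the encoding indicated by a BOM, or the configured default."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM found: fall back to the default set in the filter options.
    return default_encoding
```

For instance, a file that starts with the bytes EF BB BF would be read as UTF-8 regardless of the default, while a file without any BOM would be read with the default encoding.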

HTML Elements

Inline HTML elements (that is, the tags) and HTML blocks (chunks of text enclosed between a block-forming start tag and its corresponding end tag) are processed by the HTML filter. The HTML filter configuration to use can be customized separately.

Inline Codes

The Inline Code Finder is supported by this filter.

The Inline Code Finder applies only to the translatable text in the Markdown portions of the document. It does not apply to HTML inline tags or HTML blocks. To use inline code patterns there, enable and specify the patterns in a separate HTML filter configuration, save that configuration as okf_html@arbitrary-name.fprm, and specify its name for the htmlSubfilter parameter.

Note: support for the Inline Code Finder was temporarily unavailable in some snapshot builds of version 0.36, but it has since been restored.
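
To illustrate what code finder rules do, here is a rough Python sketch that marks regex matches in a run of text as inline codes. It is not Okapi code: the function name find_inline_codes, the two sample rules, and the choice to combine all rules as alternatives of one pattern are assumptions for this example. Okapi itself evaluates rules with Java regular expressions, whose syntax differs from Python's in some details.

```python
import re


def find_inline_codes(text: str, rules: list[str]) -> list[tuple[str, str]]:
    """Split text into ("text", ...) and ("code", ...) pieces."""
    # Assumption: all rules are combined as alternatives of one pattern.
    pattern = re.compile("|".join(f"(?:{rule})" for rule in rules))
    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() == match.end():
            continue  # ignore empty matches
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces


# Hypothetical rules: {{placeholders}} and printf-style %s / %d.
sample_rules = [r"\{\{[^}]*\}\}", r"%[sd]"]
pieces = find_inline_codes("Hello {{user}}, you have %d new messages.", sample_rules)
```

With these sample rules, {{user}} and %d end up as code pieces and the surrounding words remain translatable text.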

Parameters

Translate Hyperlink URLs (translateUrls)
By default, URLs in link and image statements are not exposed for translation. If this option is enabled, they will be extracted as a subflow. Default: false
REGEX Pattern for Translatable URLs (urlToTranslatePattern)
When translateUrls=true, only the URLs that match this REGEX will be extracted. Default: .+ (all URLs)
Translate Fenced Code Blocks (translateCodeBlocks)
This option controls whether the contents of fenced code blocks are exposed for translation. Default: true
Translate Indented Code Blocks (translateIndentedCodeBlocks)
This option controls whether the contents of indented code blocks are exposed for translation. Default: true
Translate Inline Code Blocks (translateInlineCodeBlocks)
This option controls whether the contents of inline code blocks (i.e., text delimited by single backticks) are exposed for translation. Default: true
Translate YAML Metadata Header (translateHeaderMetadata)
Some Markdown formats support a YAML Metadata Header that contains key/value data. By default, this header is not exposed for translation. When the "Translate YAML Metadata Header" option is enabled, the header will be parsed and the metadata values will be exposed for translation. Default: false
Translate Image Alt Text (translateImageAltText)
The alt text for a graphic image in the form of ![alt text](https://foo.com/images/bar.jpg) or as the alt attribute of an img tag <img src="https://foo.com/images/bar.jpg" alt="alt text"> will be extracted if this parameter is true. Default: true.
Generate anchors based on header text. (generateHeaderAnchors)
Some markdown parsers support explicit named anchors in header markup, using the syntax {#my-anchor}. When set, this option will automatically generate anchors for headings in the source document, for the purpose of providing a stable anchor for hyperlinks that reference a (translatable) header value. Default: false.
Parses out certain MDX expressions using regex. (parseMdx) [Experimental]
When set, parses out multi-line export blocks as skeleton. Default: false.
Enter a String of characters that will be escaped as HTML entities. (htmlEntitesToEscape)
When set, encodes specific characters as HTML entities on export. Default: (none)
Support backslash escaping of punctuation (unescapeBackslashCharacters)
When set, parses backslash-escaped punctuation in source documents. Default: false.
Enter a String of punctuation characters that will be escaped when the option above is enabled. (charactersToEscape)
When unescapeBackslashCharacters is enabled, characters listed in this option will be backslash-escaped on export. Default: *_`{}[]<>()#+\-.!|
HTML subfilter configuration ID (htmlSubfilter)
The custom configuration ID of the HTML filter that will be called to process HTML contents within Markdown documents. The configuration file must be saved in a known location with .fprm suffix. Specify nothing to use the default HTML filter configuration tailored for the Markdown filter. Default: (empty)
YAML subfilter configuration ID (yamlSubfilter)
The custom configuration ID of the YAML filter that will be called to process any YAML metadata header detected in the document. This allows for customization of the metadata fields extracted for translation. Default: (empty)
Enter non translatable block quotes (nonTranslateBlocks)
This option excludes certain block quotes from translation. A block quote that starts with one of the comma-separated strings listed here will not be extracted. Default: (empty - contents in all block quotes will be extracted)
Use Code Finder (useCodeFinder)
Determines whether to use the Inline Code Finder or not. Default: false
Number of Code Finder Rules (codeFinderRules.count)
The number of rules, i.e. regular expression patterns. Default: 1
Code Finder Rule N (codeFinderRules.ruleN)
The Nth matching pattern for inline codes, where N = 0, 1, 2, ...
Sample Text (codeFinderRules.sample)
Sample text for testing the rules in the UI.
Use All Rules (codeFinderRules.useAllRulesWhenTesting)
Determines whether to apply all rules when testing in the UI.
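
As a rough illustration of how several of these parameters might be combined into a custom .fprm configuration, the Python sketch below renders a dictionary of settings as text. The layout (a #v1 first line, a .b suffix for booleans, a .i suffix for integers) is an assumption about the parameter file format and should be checked against a configuration saved by the filter's own editor. The function name render_fprm, the example.com URL pattern, and the sample code finder rule are hypothetical, and the sketch does not escape special characters in values.

```python
def render_fprm(params: dict) -> str:
    """Render parameters as '#v1' key=value text (assumed layout)."""
    lines = ["#v1"]
    for key, value in params.items():
        # bool is tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            lines.append(f"{key}.b={'true' if value else 'false'}")
        elif isinstance(value, int):
            lines.append(f"{key}.i={value}")
        else:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


example = render_fprm({
    "translateUrls": True,
    "urlToTranslatePattern": r"https://example\.com/.+",  # hypothetical pattern
    "translateCodeBlocks": False,
    "useCodeFinder": True,
    "codeFinderRules.count": 1,
    "codeFinderRules.rule0": r"\{\{[^}]*\}\}",  # hypothetical rule
})
print(example)
```

Parameters that are left out of such a file would presumably keep the defaults listed above.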

Notes

Translation of URLs as Subflows

When there is a subflow of text in the middle of the main text, the subflow will be extracted before the segment that contains it. For example, for this run of Markdown text:

Please click ![The Information desk logo](images/circled-i.jpg) for help.


The extracted text in the XLIFF file will look like this:

<trans-unit id="tu2" restype="x-img-link" xml:space="preserve">
<source xml:lang="en">images/circled-i.jpg</source>
</trans-unit>
<trans-unit id="tu1" xml:space="preserve">
<source xml:lang="en">Please click <bpt id="1">![</bpt>The Information desk logo<ept id="1">]([#$tu2])</ept> for help.</source>
</trans-unit>
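
The [#$tu2] marker in tu1 refers back to the subflow unit tu2. As a small sketch of how a downstream script might put the two pieces back together, the Python below substitutes each marker with the text of the unit it names. The marker syntax is taken from the example above; the function name resolve_subflows and the plain dictionary of unit texts are assumptions for illustration, and the inline code tags (bpt/ept) are left out for brevity.

```python
import re

# Marker format as seen in the example above: [#$tu2] points at unit "tu2".
SUBFLOW_REF = re.compile(r"\[#\$([^\]]+)\]")


def resolve_subflows(text: str, units: dict) -> str:
    """Replace each [#$id] marker with the text of the referenced unit."""
    def lookup(match):
        unit_id = match.group(1)
        # Unknown references are left untouched rather than guessed at.
        return units.get(unit_id, match.group(0))
    return SUBFLOW_REF.sub(lookup, text)


units = {"tu2": "images/circled-i.jpg"}
main_text = "Please click ![The Information desk logo]([#$tu2]) for help."
resolved = resolve_subflows(main_text, units)
```

Applied to the example, this yields the original Markdown line with the image path restored in place of the marker.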