DTD Filter

From Okapi Framework

This filter allows you to process DTD (Document Type Definition) documents.

The filter is intended for XML DTDs that contain translatable text in entity declarations, as shown below. Examples of such DTDs are the Mozilla DTD files that accompany the XUL documents used to build user-interface components.

Example of a DTD with text entity declarations. The translatable text is the quoted value of each entity:

<!ENTITY findWindow.title "Find Files">
<!ENTITY fileMenu.label "File">
<!ENTITY editMenu.label "Edit">
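To make the extraction concrete, here is a minimal Python sketch that pulls name/value pairs out of simple entity declarations like the ones above. It is not the filter's actual implementation: the pattern ENTITY_DECL and the function extract_entities are hypothetical names, and the pattern deliberately ignores parameter entities, external identifiers, and comments.

```python
import re

# Simplified pattern for <!ENTITY name "value"> or <!ENTITY name 'value'>.
# Assumption: parameter entities (%), external IDs, and comments are ignored.
ENTITY_DECL = re.compile(
    r"""<!ENTITY\s+([^\s%"']+)\s+(?:"([^"]*)"|'([^']*)')\s*>"""
)

def extract_entities(dtd_text):
    """Return (name, value) pairs for simple text entity declarations."""
    pairs = []
    for m in ENTITY_DECL.finditer(dtd_text):
        # Group 2 holds a double-quoted value, group 3 a single-quoted one.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1), value))
    return pairs

sample = """<!ENTITY findWindow.title "Find Files">
<!ENTITY fileMenu.label "File">
<!ENTITY editMenu.label "Edit">"""

for name, value in extract_entities(sample):
    print(name, "=>", value)
```

In this sketch the entity name plays the role of an identifier for the entry and the quoted value is the text a translator would see.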

Processing Details

Input Encoding

The filter decides which encoding to use for the input file using the following logic:

  • If the file has a Unicode Byte-Order-Mark:
    • Then, the corresponding encoding (e.g. UTF-8, UTF-16, etc.) is used.
  • Otherwise, the input encoding used is the default encoding that was specified when setting the filter options.
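The decision above can be sketched in Python as follows. This is only an illustration of the described logic, not the filter's code: the function name choose_input_encoding and the exact set of Byte-Order-Marks checked are assumptions.

```python
import codecs

# Longer marks are listed first so that the UTF-32 LE mark is not mistaken
# for the UTF-16 LE mark, which is a prefix of it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

def choose_input_encoding(raw, default_encoding):
    """Return (encoding, has_bom) for the raw bytes of an input file."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            # A Byte-Order-Mark is present: use the corresponding encoding.
            return name, True
    # No Byte-Order-Mark: fall back to the default set in the filter options.
    return default_encoding, False

data = codecs.BOM_UTF8 + b'<!ENTITY fileMenu.label "File">'
print(choose_input_encoding(data, "windows-1252"))
```

Returning the has_bom flag alongside the encoding is a design choice of this sketch; it keeps the information needed later when deciding whether to write a Byte-Order-Mark to the output.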

Output Encoding

If the output encoding is UTF-8:

  • If the input encoding was also UTF-8, a Byte-Order-Mark is used for the output document only if one was detected in the input document.
  • If the input encoding was not UTF-8, no Byte-Order-Mark is used in the output document.
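As a rough sketch, the two rules above reduce to a single boolean test. The function name output_needs_bom is hypothetical, and because the text only describes the UTF-8 output case, the sketch returns None for any other output encoding rather than guessing.

```python
import codecs

def _is_utf8(encoding_name):
    # codecs.lookup normalizes aliases such as "UTF8" or "utf_8" to "utf-8".
    return codecs.lookup(encoding_name).name == "utf-8"

def output_needs_bom(input_encoding, input_had_bom, output_encoding):
    """Decide whether a UTF-8 output document gets a Byte-Order-Mark."""
    if not _is_utf8(output_encoding):
        # Assumption: behavior for other output encodings is not described.
        return None
    # A mark is written only when the input was UTF-8 and had one itself.
    return _is_utf8(input_encoding) and input_had_bom

print(output_needs_bom("UTF-8", True, "UTF-8"))
print(output_needs_bom("windows-1252", False, "UTF-8"))
```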


The output uses the same type of line-break as the original input.
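One way to picture this behavior is the Python sketch below. How the filter actually detects the line-break type is not stated here, so the detection order (CR+LF, then CR, then LF) and the function names detect_line_break and apply_line_break are assumptions made for illustration.

```python
def detect_line_break(text):
    """Guess the line-break type used by the input text."""
    if "\r\n" in text:
        return "\r\n"  # Windows style
    if "\r" in text:
        return "\r"    # old Mac style
    return "\n"        # Unix style, also the fallback for single-line input

def apply_line_break(text, line_break):
    """Rewrite every line-break in text to the given type."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", line_break)

original = '<!ENTITY a "A">\r\n<!ENTITY b "B">\r\n'
translated = '<!ENTITY a "X">\n<!ENTITY b "Y">\n'
print(repr(apply_line_break(translated, detect_line_break(original))))
```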


At this time, this filter does not have an editor to create or modify its configuration file. You need to use a text editor to edit custom configurations.

You can define a set of regular expressions to capture spans of extracted text that should be treated as inline codes. For example, some strings may contain variables that must be protected from modification and treated as codes. Use the useCodeFinder and codeFinderRules options for this.

useCodeFinder: true
codeFinderRules: "#v1\ncount.i=1\nrule0=\\bVAR\\d\\b"

The options above treat the text "VAR1" as an inline code in the following DTD entity declaration:

<!ENTITY dialog.fileCount "Number of files = VAR1">

Note that the regular expression itself is "\bVAR\d\b"; in the YAML notation each backslash must be escaped, which is why it is written as "\\b" and "\\d" in the rule.


Limitations

None known.