IDML Filter



This filter allows you to process IDML documents. IDML (InDesign Markup Language) is an XML-based format, introduced in Adobe InDesign CS4, for representing InDesign content. IDML is used in several InDesign and InCopy file types. The specification can be found on the Adobe Web site.

Processing Details

When processing an IDML document, the filter looks at all the spreads in the document and, for each of them, gathers the list of the stories used in <TextFrame> and <TextPath> elements. The text is extracted spread by spread and, within each spread, story by story in the order the stories appear in the spread.

Stories embedded inside other stories and not declared at a spread level are extracted in a special group.
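
The extraction order described above can be sketched as follows. This is a minimal illustration, not the filter's actual implementation. It assumes that each spread is an XML document in which <TextFrame> and <TextPath> elements name their story in a ParentStory attribute; the function name stories_in_spread and the sample spread are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def stories_in_spread(spread_xml: str) -> List[str]:
    """Return the story IDs referenced by a spread, in document order, without duplicates."""
    root = ET.fromstring(spread_xml)
    seen = set()
    ordered = []
    # iter() walks the tree in document order, so stories keep the order
    # in which their frames appear in the spread.
    for element in root.iter():
        if element.tag not in ("TextFrame", "TextPath"):
            continue
        story_id = element.get("ParentStory")  # assumed attribute name
        if story_id and story_id not in seen:
            seen.add(story_id)
            ordered.append(story_id)
    return ordered


# Hypothetical, heavily simplified spread: two frames share story u10.
SAMPLE_SPREAD = """
<Spread Self="s1">
  <TextFrame Self="f1" ParentStory="u10"/>
  <Rectangle Self="r1">
    <TextPath Self="p1" ParentStory="u20"/>
  </Rectangle>
  <TextFrame Self="f2" ParentStory="u10"/>
</Spread>
"""

print(stories_in_spread(SAMPLE_SPREAD))
```

Several frames can point to the same story, so the sketch lists each story only once, at the position of its first frame.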


Parameters

Maximum attribute size — Set the size in MB for the attribute buffer. The default is 4 MB (4 * 1024 * 1024 bytes).

Untag XML Structures — Set this option to skip embedded XML structural information when extracting translatable content.

Extract notes — Set this option to extract the content of notes (<Note> elements).

Extract master spreads — Set this option to extract the content of the master spreads if they exist. If this option is not set, only the normal spreads are extracted.

Extract hidden layers — Set this option to also extract the content of hidden layers.

Extract hidden pasteboard items — (default is false)

Skip discretionary hyphens — (default is false)

Extract breaks inline — (default is false)

Extract hyperlink text sources inline — (default is false). When set to true, the hyperlink text sources are extracted inline; otherwise, they are represented as referencing groups of textual units.

Extract custom text variables — (default is false)

Extract index topics — (default is false)

Extract external hyperlinks — (default is false). When it is set to true, the external hyperlinks are extracted for translation.

Ignore character kerning — (default is false)

Ignore character tracking — (default is false)

Ignore character leading — (default is false)

Ignore character baseline shift — (default is false)

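Special character pattern — A pattern of special characters separated by "|". The default covers a set of typographic space characters along with characters such as the zero-width space (U+200B), the zero-width non-joiner (U+200C), the soft hyphen (U+00AD), and the non-breaking hyphen (U+2011). Matched content is treated as an inline code.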
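
The effect of a special character pattern can be illustrated with a small sketch. It is not the filter's code: the pattern below is an illustrative subset (zero-width space, soft hyphen, non-breaking hyphen) rather than the actual default, and the function name split_inline_codes and the ("text" / "code") segment representation are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Illustrative subset only; this is NOT the filter's actual default pattern.
SPECIAL_CHARS = re.compile("\u200b|\u00ad|\u2011")


def split_inline_codes(text: str, pattern=SPECIAL_CHARS) -> List[Tuple[str, str]]:
    """Split text into ("text", ...) and ("code", ...) segments."""
    segments = []
    position = 0
    for match in pattern.finditer(text):
        # Keep any plain text that precedes the matched character.
        if match.start() > position:
            segments.append(("text", text[position:match.start()]))
        # The matched character itself becomes an inline code.
        segments.append(("code", match.group()))
        position = match.end()
    # Keep any plain text after the last match.
    if position < len(text):
        segments.append(("text", text[position:]))
    return segments


# A word containing a soft hyphen (U+00AD) yields text, code, text.
print(split_inline_codes("co\u00adoperate"))
```

Representing such characters as codes keeps them out of the translatable text while preserving their position in the content.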

Deprecated Parameters

Prior to release M34, the filter supported several additional parameters. Their behavior has been subsumed by the more intelligent content processing that the filter performs in M34 and later.

Simplify inline codes when possible — Set this option to reduce the number of inline codes by re-grouping adjacent codes when it is possible.
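
The general idea of regrouping adjacent codes can be sketched as below. The source does not describe the rules the filter actually applied, so this only shows the simplest case: consecutive codes with no text between them are concatenated into one. The segment representation and the name merge_adjacent_codes are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (kind, content), where kind is "text" or "code"


def merge_adjacent_codes(segments: List[Segment]) -> List[Segment]:
    """Concatenate runs of consecutive code segments into single codes."""
    merged: List[Segment] = []
    for kind, content in segments:
        if kind == "code" and merged and merged[-1][0] == "code":
            # The previous segment is also a code: extend it.
            merged[-1] = ("code", merged[-1][1] + content)
        else:
            merged.append((kind, content))
    return merged


example = [
    ("code", "<a>"), ("code", "<b>"),
    ("text", "Hi"),
    ("code", "</b>"), ("code", "</a>"),
]
print(merge_adjacent_codes(example))
```

In this example four codes are reduced to two, one on each side of the text.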

Create new text units on hard returns — Set this option to create separate text units when a hard return element (<Br/>) is found.
IMPORTANT: This option is not complete yet. Setting it may create extracted documents that you will not be able to merge back. Always test the merge before using this option in production.

Maximum spread size — Set the maximum size for the spread files (in KBytes). Any spread file above the given value will either generate an error or be skipped from extraction, depending on the specified option. This allows you to skip over large spread files that may contain only graphics and require too much memory to be opened. Note that the skipped files are not checked for translatable text.

Generate an error when a spread is larger than the specified value — Set this option to generate an error if a spread size is above the specified Maximum spread size. If this option is not set, the spread is skipped with a warning message.
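
The interaction between the two spread-size options can be sketched as follows. This is an illustration under stated assumptions, not the filter's implementation: it assumes the IDML package is a ZIP archive with spread files under a Spreads/ folder, that one KByte is 1024 bytes, and that size means uncompressed size. The function name select_spreads is hypothetical.

```python
import io
import warnings
import zipfile
from typing import List


def select_spreads(package: zipfile.ZipFile, max_kbytes: int,
                   error_on_oversize: bool) -> List[str]:
    """Return the spread entries small enough to be processed."""
    selected = []
    for info in package.infolist():
        # Assumed package layout: spread files live under "Spreads/".
        if info.is_dir() or not info.filename.startswith("Spreads/"):
            continue
        if info.file_size > max_kbytes * 1024:  # assumes 1 KByte = 1024 bytes
            message = f"{info.filename} exceeds {max_kbytes} KB"
            if error_on_oversize:
                raise ValueError(message)
            # Option not set: skip the spread with a warning.
            warnings.warn(message)
            continue
        selected.append(info.filename)
    return selected


# Build a tiny in-memory package with one small and one large spread.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Spreads/Spread_a.xml", "x" * 100)
    archive.writestr("Spreads/Spread_b.xml", "x" * 5000)
    archive.writestr("Stories/Story_c.xml", "x" * 10)
```

With a limit of 2 KBytes, calling select_spreads on this package with error_on_oversize set to False keeps only Spreads/Spread_a.xml and warns about the other spread; with True it raises an error instead.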