Okapi Framework Changes Log - Jan-30-2021
Note: this document is common to both the okapi-lib distribution
and the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from 1.40.0 to 1.41.0
- Rainbow:
- Update merger to use TextUnitMerger
- Core:
- Issue #996:
Reference identifier values escaped.
- Filters:
- IDML Filter:
- Issue #1016:
Hidden pasteboard items extraction capability provided.
- MIF Filter:
- Issue #965:
Font tag names escaped to allow reserved regular expression characters.
- Issue #987:
Handling of the hexadecimal representations of tab, hard return, no-hyphen, and soft hyphen
improved. Hard return symbols are correctly transformed from newlines and written
back to strings in paragraph lines.
- Issue #990:
Hard returns can demarcate trans-units in Marker, PgfNumFormat,
XRefFormat > XRefDef and TextLine statements.
- Issue #991:
Consecutive Strings in XRefs merged with hard returns between them.
- OpenXML Filter:
- Issue #958:
Font mapping for PPTX documents introduced.
- Issue #977:
The segmentation quality restored for some PPTX documents.
- Issue #981:
PPTX style definitions minified as much as possible.
- Issue #974:
Max attribute size conditional parameter provided.
- Issue #848:
The inserted and deleted table row revisions accepted.
- Issue #999:
Initial support for the presentation styles hierarchy provided.
- Issue #1009:
Support for styles hierarchy in presentation table cells provided.
- Markdown Filter:
- Issue #890:
Codes representing inline markup for emphasis, strong emphasis, and links are now
assigned the ctypes italic, bold, and link, respectively. Additionally, link references
are now exposed as paired, rather than singleton, tags.
- YAML Filter:
- Issue #967:
Comments after plain scalars preserved.
- XLIFF Filter:
- Issue #966:
New option subAsTu allows sub elements to be extracted as separate text units.
- Issue #688:
CDATA content can be subfiltered.
- Issue #998:
Subfiltered sources can be used for overriding empty targets.
- Issue #1002:
Subfiltered sources and targets merged correctly.
- XLIFF2 Filter:
- PO Filter:
- Increased the readaheadlimit buffer size to prevent a crash with some files. Removed deprecated code for notes.
- Libraries:
- lib-core
- Issue #1003:
XLIFFContextGroup now adds the "x-" prefix automatically for non-standard XLIFF 1.2 context types.
- lib-merge
- Complete refactor of TextUnitMerger to better support the XLIFF2 filter and resolve several outstanding bugs.
TextUnitMerger can now be accessed from OriginalDocumentXliffMergerStep, and its Parameters can be set directly.
- Steps:
- Format Conversion Step
- Updated the step to also output XLIFF.
- Simple Word Count Step
- Fixed the count of the ending part of the segments.
- Rainbow Translation Kit Creation Step
- Merge now uses TextUnitMerger so that merge results are consistent across Okapi.
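The XLIFF 1.2 rule behind issue #1003 (user-defined context-type values must start with "x-") can be sketched as follows. This is a minimal illustration, not the XLIFFContextGroup code: the class ContextTypes and its normalize method are hypothetical, and the set of standard values is taken from the XLIFF 1.2 specification.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: applies the XLIFF 1.2 "x-" rule to context-type values. */
final class ContextTypes {
    // context-type values defined by the XLIFF 1.2 specification.
    private static final Set<String> STANDARD = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "database", "element", "elementtitle", "linenumber", "numparams",
            "paramnotes", "record", "recordtitle", "sourcefile")));

    private ContextTypes() {
    }

    /** Returns the type unchanged if it is standard or already prefixed, else "x-" + type. */
    static String normalize(String type) {
        if (STANDARD.contains(type) || type.startsWith("x-")) {
            return type;
        }
        return "x-" + type;
    }
}
```

A value that already carries the prefix is returned unchanged, so the helper can be applied more than once without producing "x-x-".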
Changes from 1.39.0 to 1.40.0
- Connectors:
- Google MT:
- Updated language code support: Serbian Latin is now mapped to Latin MT, and Serbian Cyrillic is mapped to Cyrillic MT.
zh-Hant is mapped to zh-TW and zh-Hans to zh-CN, and 3-letter codes are handled properly.
- Filters:
- HTML Filter:
- Issue #946:
Base element recognition fixed.
- IDML Filter:
- Issue #923:
Translator’s experience with paired tags handling improved.
- Issue #926:
Font mapping support added.
- Issue #935:
Empty path geometry properties backed by default values.
- Issue #941:
The encoding fixed to UTF-8.
- MIF Filter:
- Issue #938:
The extraction support for cross-reference formats added.
- Issue #940:
The extraction of sequentially referenced paragraph number formats in the paragraph
styles catalog fixed.
- Issue #942:
The extraction support for text line statements added.
- Issue #943:
The extraction of nested text frames fixed.
- Issue #945:
The merge of numbered paragraph formats in empty paragraphs fixed.
- OpenXML Filter:
- Issue #930:
A crash on DOCX document merge with an RTL target language fixed.
- Issue #937:
Font mapping support for DOCX documents added.
- Issue #948:
The direct formatting additionally minified in DOCX and PPTX documents.
- Issue #952:
The styles optimisation is performed for hyperlinks, smartTags and sdts in a sequence
of runs of one paragraph.
- Issue #955:
Paragraph properties are considered when the styles optimisation takes place.
- Issue #956:
Failing extraction and merging with accepting revisions fixed.
- Properties Filter:
- Issue #961:
Add a new option, "Use Java Properties escaping conventions". When set, the filter
will escape the characters :, =, !, #, and \
with a leading backslash (\) on output, and interpret leading backslashes
as escapes for these characters on input. This option is disabled (legacy behavior)
by default.
- Libraries:
- lib-merge
- Add option in merger (preserveWhiteSpaceByDefault) to turn off white space normalization (force preserve whitespace globally).
Default is false, but most users should consider setting it to true to preserve all whitespace.
- Verification
- Fixed issue #920: Codes with type set to empty string are now stripped out as noise codes and are detected when missing.
- GTT
- Removed. As of December 4, 2019, Google Translator Toolkit (GTT) was shut down.
- Steps:
- Leveraging Step
- Fixed QueryManager used by Leveraging Step: Now MT Connector does not overwrite translations if they are CM or 100% matches
- Batch TM Leveraging Step
- Fixed the step so MT Connector does not overwrite translations if they are CM or 100% matches
- GTT Batch Translation Step
- Removed. As of December 4, 2019, Google Translator Toolkit (GTT) was shut down.
- Rainbow Translation Kit Creation Step
- Fixed issue #932: Hook for OmegaT translation kit is now working again.
- Fixed issue #944: Create Rainbow TransKit for XLIFF2 results in tab-delimited files
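The Java Properties escaping convention added to the Properties Filter in issue #961 can be sketched as follows. This is a hypothetical helper (JavaPropsEscaping is not an Okapi class) that covers only the five characters listed in that entry; other Java escapes such as \uXXXX or \n are deliberately left out.

```java
/** Hypothetical helper: backslash-escapes the characters : = ! # and \ (and reverses it). */
final class JavaPropsEscaping {
    // The characters listed for the "Use Java Properties escaping conventions" option.
    private static final String SPECIAL = ":=!#\\";

    private JavaPropsEscaping() {
    }

    /** Output direction: prefix each special character with a backslash. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Input direction: a backslash before a special character is dropped; others are kept. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length() && SPECIAL.indexOf(text.charAt(i + 1)) >= 0) {
                i++; // skip the backslash and keep the escaped character
                out.append(text.charAt(i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

java.util.Properties.store writes #, !, = and : with a preceding backslash, and Properties.load drops a backslash that precedes such characters, so text escaped this way also loads correctly through the JDK's own reader.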
Changes from M38 to 1.39.0
- Filters:
- HTML Filter:
- Fixed a bug that could cause some conditional rule matches to fail to apply.
- IDML Filter:
- Issue #897: Add an option to expose line breaks as inline codes rather than separate text units.
- Markdown Filter:
- Issue #886, #888: Expose formatting tags as paired codes, rather than standalone
- MIF Filter:
- Issue #901: Remove support for FrameMaker 7 and older
- Issue #893: Fix a bug that caused some hidden pages to be exposed for translation even when the filter was configured not to extract hidden pages.
- Issue #895: Support multiple text frames per page
- Issue #902, #909: Correctly extract content from anchored frames, including nested frames
- Issue #896, #904: Improve extraction of paragraph number formatting declarations. These will now be extracted as separate text units that correspond to the formatting template.
- Issue #896: Streamline inline codes in some cases.
- OpenXML Filter:
- Issue #847: Fix a bug when removing revision markers related to paragraph deletion
- Issue #878: Fix a bug where the filter would incorrectly strip the first defined style in documents that did not have default styles defined.
- Issue #882: Fix a crash
- Issue #851, #888: Improve handling of fonts
- Issue #884, #887: Streamline tag extraction, particularly in text units containing only a single formatting run.
- Issue #899: Improve handling of bidirectional text
- Regex Filter:
- Add new regular expression based rules (metaRules) that create named metadata entries from notes
- Versified Text Filter:
- This filter has been removed.
- XLIFF (1.2) Filter:
- Issue #855: Support the context-group element on import and export. When writing XLIFF, encode non-standard metadata as context-group/context, rather than using the proprietary okp:meta scheme.
- Issue #875: Improve handling of XLIFF documents containing WorldServer-specific (iws:) metadata.
- XMLStream Filter:
- Fixed a bug that could cause some conditional rule matches to fail to apply.
- Connectors:
- Google MT:
- Added a new option failuresBeforeAbort, which sets the number of failed queries (after retries, if any are allowed) after which the connector throws an exception.
- Google AutoML Translation:
- The connector now uses the production endpoints (https://automl.googleapis.com/v1 instead of https://automl.googleapis.com/v1beta1).
- Libraries:
- Verification:
- Added a new option useGenericCodes to the Checker parameters to use the generic inline codes representation in the display text provided with the issues. This also affects the start/end positions of the highlighted text (if there is any).
- Fixed issue #910: Pointing to the wrong code when one of two codes with the same original data is missing.
- Steps:
- XLIFF Splitter:
- Add an option to restore file names based on the value of file/@original
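One plausible reading of the Google MT failuresBeforeAbort option is a cumulative failure budget, sketched below. The class FailureBudget is hypothetical and is not the connector's code; counting total (rather than consecutive) failures and the choice of exception type are assumptions.

```java
/**
 * Hypothetical failure budget illustrating a "failuresBeforeAbort"-style limit.
 * Assumption: failures are counted cumulatively, and only queries that still
 * fail after all allowed retries are reported here.
 */
final class FailureBudget {
    private final int failuresBeforeAbort;
    private int failures;

    FailureBudget(int failuresBeforeAbort) {
        if (failuresBeforeAbort < 1) {
            throw new IllegalArgumentException("failuresBeforeAbort must be >= 1");
        }
        this.failuresBeforeAbort = failuresBeforeAbort;
    }

    /** Records one failed query; throws once the configured number of failures is reached. */
    void recordFailure(String reason) {
        failures++;
        if (failures >= failuresBeforeAbort) {
            throw new IllegalStateException(
                    "Aborting after " + failures + " failed queries; last failure: " + reason);
        }
    }

    int failures() {
        return failures;
    }
}
```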
Changes from M37 to M38
- Global:
- Fixed issue #829: Added the ability to store the filterId in StartDocument, StartSubDocument and StartSubfilter, and updated all filters to do that.
- JDK-11 support: Okapi now builds and all unit tests and integration tests pass with JDK 11. This is checked during the CI run.
If you want to try it, add the -Djdk.source.version=11 -Djdk.target.version=11 switches to all Maven invocations, or set them in MAVEN_OPTS.
- Updated third party dependencies to the latest versions (except for Lucene)
- Core:
- Fixed issue #813: Deprecated LocaleId.equals(String) and LocaleId.compareTo(String), and added LocaleId.equalToString(String) for convenience.
- Fixed issue #660: Some valid UTF-8 sequences were detected as EBCDIC (in BOMNewlineEncodingDetector).
- Deprecated global singletons declared by Event (END_BATCH_EVENT, NOOP_EVENT, START_BATCH_EVENT, START_BATCH_ITEM_EVENT).
- Connectors:
- Microsoft Translator:
- The Microsoft Translator Connector has been upgraded to support the v3 API. Support for v2 has been discontinued.
- Steps:
- Scoping Report:
- Added a setter for an IReportGenerator so that one can use a custom report generator.
- Filters:
- JSON Filter:
- Rename noteProductionKeys to noteRules. Note rules are now a single regular expression that defines all notes to be extracted.
- Add new regular expression based rules extractionRules, idRules and genericMetaRules. See JSON Filter Documentation.
- XLIFF Filter:
- Fixed issue #807: There is now an option (useSegsForSdlProps) to use the segment-level SDL-specific properties to rewrite the SDL attributes. If this option is not set, the behavior is the same as before, using the SDL-specific properties and the STATE property set in the target container.
- The SDL property text-match is now extracted as a read-only property for the segments.
- Fixed issue #844: Attributes with the same name from different namespaces get mixed.
- Updated XLIFF filter to leave segments that have the multiple_exact="has_multiple_exact" attribute unblocked when blocking segments with a tm_score="100%" (see issue #844).
- XLIFF 2 Filter:
- Add support for inline codes, notes, groups, metadata and extended attributes and namespaces
- Remove "Experimental" status
- ICML Filter:
- Fixed issue #846: Forced line-breaks are now extracted as placeholders.
- Markdown Filter:
- Fixed issue #854: the "Translate Code Blocks" option has been split in two to allow separate control over extraction of fenced and inline code blocks. The existing option (translateCodeBlocks) has been renamed to "Translate Fenced Code Blocks". Translation of inline code blocks is now controlled by the new "Translate Inline Code Blocks" option (translateInlineCodeBlocks).
- OpenXML Filter:
- Fixed issue #858: crash opening documents that were saved using strict mode.
- Fixed issue #850: parse XLSX documents with no shared strings table as valid documents with no translatable strings, rather than rejecting them.
- Fixed issue #843: improved handling of revision metadata related to content that was moved during editing.
- Fixed issue #835: reorder PPTX notes and comments during extraction so they appear near the slide content to which they refer.
- Fixed issue #834: improved handling of structural document tags.
- Fixed issue #830: improved handling of complex field codes spread across multiple paragraphs.
- Fixed issue #825: include the sheet and cell name in the trans-unit metadata we extract. (This data is exposed as resname in resulting XLIFFs.)
- Fixed issue #823: strip the a:spc attribute when aggressively cleaning tags.
- Fixed issue #803: streamline the way PPTX styles are exposed as tags.
- Fixed issue #864: XLSX merge fails when text followed by empty run with run properties.
- Libraries:
- Verification:
- Inline-Code Checker: Added support for the deleteable inline codes.
- Inline-Code Checker: Added a parameters option (strictCodeOrder) to verify the order of the inline codes. If this option is set, the order of the codes between source and target is verified using code types, IDs and code data. Deleteable codes are ignored. If this option is not set, the checker behaves as before, verifying open/close tags. Note that both verifications are done only if no missing or extra tags have been detected.
- Fixed issue #793: Patterns Checker: You can now use $N in the regular expression in addition to the <same> special symbol. Each $N occurrence is replaced at run-time by the text matching the given group N. (You can still use $ for the end-of-string match.)
- Fixed issue #799: Patterns Checker: There is a new option (Single Pattern) for the pattern rule. If set, we search for that pattern (in the source or target depending on the "search source" option) and trigger a warning if the pattern is found. If not set, we search for the pattern (in the source or target depending on the "search source" option) and trigger an issue if it is not found in the opposite text.
- Patterns Checker: There is a new parameters option showOnlyPatternDescription. If set, only the description defined in the pattern rule is shown in the issue message, and you can use the special code @@ in the description; it is replaced by the pattern match found. If not set, you get the old behavior: the message is "The source/target part ‘<match>’ has no correspondence in the target/source" (where <match> is the text matching the source pattern).
- General Checker: The issues for white-space now have a segment ID: it is set to the first or the last segment, depending on whether the error is found at the front or at the end of the whole text container.
- General Checker: Now the white-space verification is not done if the target is empty.
- General Checker: There is now a parameters option targetSameAsSourceWithNumbers to control whether numbers are included in the segments that are checked. By default they are included (backward compatible).
- Added a new SkipCheckAnnotation to use on the source segments when one wants to skip checks on those segments.
The steps with segment-level checks have also been updated to take the annotation into account.
- Rainbow:
- Can now run from command-line without a display
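The $N substitution described for the Patterns Checker (issue #793) can be sketched with java.util.regex. This is an illustration under assumptions, not Okapi's checker: the class GroupBackReference is hypothetical, the captured text is assumed to be matched literally in the target, and only a $ followed by digits is treated as a group reference, so a bare $ still anchors the end of the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating "$N" group references in a target pattern. */
final class GroupBackReference {
    // "$" followed by digits; a bare "$" (end of input) is left untouched.
    private static final Pattern GROUP_REF = Pattern.compile("\\$(\\d+)");

    private GroupBackReference() {
    }

    /** Replaces each $N in targetRegex with the quoted text of group N from sourceMatch. */
    static String resolve(String targetRegex, Matcher sourceMatch) {
        Matcher ref = GROUP_REF.matcher(targetRegex);
        StringBuffer out = new StringBuffer();
        while (ref.find()) {
            String captured = sourceMatch.group(Integer.parseInt(ref.group(1)));
            String literal = Pattern.quote(captured == null ? "" : captured);
            // quoteReplacement stops appendReplacement from treating "\" or "$" specially.
            ref.appendReplacement(out, Matcher.quoteReplacement(literal));
        }
        ref.appendTail(out);
        return out.toString();
    }

    /** True if the source pattern is absent, or its counterpart is found in the target. */
    static boolean hasCorrespondence(String source, String sourceRegex,
                                     String target, String targetRegex) {
        Matcher sourceMatch = Pattern.compile(sourceRegex).matcher(source);
        if (!sourceMatch.find()) {
            return true; // nothing to verify
        }
        return Pattern.compile(resolve(targetRegex, sourceMatch)).matcher(target).find();
    }
}
```

Quoting the captured text means characters such as "." in a version number are matched literally in the target instead of acting as regex wildcards.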
Changes from M36 to M37
- Core:
- Deprecated net.sf.okapi.common.Base64; you should use java.util.Base64.
- Fixed issue #739: Code constructors are inconsistent
- Added IFilter.stream(), for convenience. This is mostly “syntactic sugar”, but allows one to write more “modern” code.
- Steps:
- Microsoft Batch Translations:
- Fixed the case when a batch of events ended with skipped segments.
- Implemented an option to add a prefix to the translation candidate text when copying it in the target.
The option is not set by default. The text of the prefix can be specified.
- External Command:
- Added support for the variables ${srcBCP47} and ${trgBCP47}.
- Filters:
- HTML Filter (okf_html):
- Updated the configuration files to make the filter aware of HTML 5 tags; also reviewed the existing tags and attributes.
Potentially disruptive changes:
- The value attribute of the option tag is not extracted anymore.
This was a bug. The value is not localizable, as it is supposed to be used to programmatically
determine what the selection was (server or client side). Translating it can break functionality.
See the HTML spec.
- The dd tags are now handled the same way as the li tags, meaning that empty dd tags get extracted.
- IDML Filter:
- Added a set of filter options to allow the filter to ignore kerning, tracking, and baseline shift properties within a configurable threshold, in order to improve segment quality by reducing tag noise. Issue #785.
- JSON Filter:
- Addresses the enhancement request in issue #751. JSON Filter now produces the <note> elements (and the enclosing <notes> elements if the XLIFF2 writer is used) in XLIFF from the key-value pairs where the key is listed in the new configuration item noteProductionKeys, which is a comma-separated list of keys.
Also added a new configuration item includeIts to the XLIFF2 Writer. (The XLIFF 1.2 Writer already had the same option.) This appears as "Includes ITS markup when available." in Options for XLIFF2 Writer on Rainbow.
- Markdown Filter:
- Addresses issue #741: Replaced use of the FlexMark Front Matter parser for translating Metadata Headers
with a configurable YAML subfilter.
This improves the translatability of embedded YAML
(such as nested keys, keys with spaces, embedded Markdown or HTML inside key-value pairs), as well as
allowing for exclusion of specific keys.
- Addresses issue #737: Blank lines in the front YAML block are removed when merged
- OpenXML Filter:
- Fixed issue #795, a crash that could occur when extracting external hyperlinks.
- Fixed issue #794, a crash that could occur when extracting PPTX documents produced by LibreOffice.
- Fixed issue #790, an improvement to the way shading style properties are exposed in code data.
- Implemented Issue #780: a subfilter option that is applied to the contents of unstyled XLSX cell text.
- Fixed issue #743, a file descriptor leak when checking for encrypted DOCX files.
- Fixed issue #736, excluding hidden Powerpoint slides from translation by default, to be consistent with handling of other types of hidden text. Note: this change may cause problems merging kits produced with earlier versions of Okapi.
- Abstract Markup Filter:
- Partially addresses issue #749: AbstractMarkupFilter no longer populates
Code.outerData for runs of text that are both EXCLUDED and INLINE. Impacts all filters that use AbstractMarkupFilter (HTML, XmlStream etc.). This change manifests in the XLIFF output, for example:
Original XML Format:
<ph conref="2" translate="no"><?xm-replace_text Phrase?></ph>
XLIFF Content Before:
<?xm-replace_text Phrase?>
XLIFF Content After Change:
<ph conref="2" translate="no"><?xm-replace_text Phrase?></ph>
- Libraries:
- Translation:
- Changed the base implementation of batchQueryText() to use batchQuery() instead of query().
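For code moving off the deprecated net.sf.okapi.common.Base64, the JDK replacement is java.util.Base64. The sketch below shows only the standard JDK calls for a UTF-8 string round trip; the wrapper class Base64Migration is hypothetical, and it does not claim a one-to-one mapping to the methods of the deprecated class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical wrapper showing java.util.Base64 calls for UTF-8 text. */
final class Base64Migration {
    private Base64Migration() {
    }

    /** Text -> UTF-8 bytes -> Base64 (standard alphabet, with padding). */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Base64 -> bytes -> UTF-8 text; throws IllegalArgumentException on invalid input. */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

Base64.getUrlEncoder() and Base64.getMimeEncoder() cover the URL-safe and line-wrapped variants if those are needed instead of the basic alphabet.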
Changes from M35 to M36
- General:
- Okapi targets Java 8; Java 7 is not supported anymore.
We build using Java 8 and we do not test on Java 7. We started using Java 8 APIs and there is no intention to backport anything to Java 7.
Publicly available security fixes and upgrades for Java 7 ceased as of April 2015.
- Starting with this version (M36) we will publish the release version of Okapi to Maven Central.
- Changed the SWT library to the official version in Maven Central (3.106.3).
- Core
- All classes that implemented hasNext() and next() are now declared to implement Iterator<Event>.
- Deprecating FilterIterable. Now that we are on JDK 8, we intend to add real stream support, and this hack will be removed.
- Added a stream method to IFilter. See okapi/examples/java/example07 for usage.
- Filters
- IDML Filter:
- Fixed issue #627, which prevented some "track changes" additions
from being extracted for translation.
- Markdown Filter:
- Replaced use of the inline code finder to handle embedded HTML with an
HTML subfilter. This improves the translatability of embedded HTML
(for example, translatable attributes, as in issue #651), as well as
allowing for exclusion of MathML content (issue #645).
- Issue #684: correctly handle nested markup.
- Fixed issue #685 and #694, which caused duplicate HTML tags within the table element in merged documents.
- Fixed issue #686 (partially) and issue #728. Quoted paragraphs without HTML tags, quoted lists, and quoted tables are handled properly. Quoted paragraphs with HTML elements are still not handled properly.
- Partially fixed issue #687. Blank (empty) lines are retained in most cases. The filter now uses version 0.32.20 of flexmark-java.
- Implemented issue #692. The user can specify a custom HTML configuration id to be used by the HTML subfilter to process HTML sections within Markdown documents.
- Fixed issue #701, which caused the inline markup character such as "*" of "*emphasized part of text*" at the beginning of the line to be separated from the translation unit.
- Fixed issue #708, which caused the ATX heading that immediately follows a list item to be prepended with extra spaces.
- Fixed issue #711, where the following were not extracted: a link reference used in the absence of anchor text (in which case it works as the anchor text), the anchor text, the image reference's alt text, and the title text in the reference definition.
- Issue #713: The inline code finder was disabled in SNAPSHOT versions made after January 21, 2018, on the assumption that it would conflict with the HTML subfilter. After careful analysis and experimentation, it was determined that the assumption was wrong, and the inline code finder has been reinstated.
- Fixed issue #714, where the extracted text from the anchor text, which can have inline markup, contained the markup literally (e.g. 'the *important* page') instead of having it replaced by placeholders ('the <g id="1"/>important<g id="2"/> page').
- Fixed issue #715, where neighboring markups were breaking up a run of text into two trans-units. Example:
Here is **strongly** *emphasized* text.
- Fixed issue #716, where a run of text that includes HTML inline tags such as <b> was broken up to small trans-units at each tag.
- Implemented a new feature mentioned in issue #720. By specifying the new configuration parameter urlToTranslatePattern with a regular expression, only URLs matching the pattern will be extracted.
- Fixed issue #725 where newline characters were lost in the YAML metadata (front matter).
- A new feature to prevent blocks of text matching a specified pattern from extraction (thus translation) has been added. See issue #726.
- Fixed issue #727 where a task list item of the form "- [ ] Task to be completed" was losing the space between the square brackets when merged. (Note: the Markdown filter does not formally support task lists. The code that handles link reference nodes is handling the task lists by coincidence.)
- OpenXML Filter:
- Fixed issue #679: Fixed a case where the filter didn't properly
escape the value of certain types of content (eg, watermarks),
leading to corrupt target documents.
- Fixed issue #703: when using extended code attributes, the filter would sometimes
incorrectly indicate that italic or bold formatting was present.
- Fixed issue #734: Multi-line formulas could be truncated when processing XLSX files.
- Table Filter:
- Allowed the FilterConfigurationMapper to be used for the sub-filter mapping.
- XLIFF 1.2 and 2.0 Filter:
- Added a new filter, "XLIFF 1.2 and 2.0 Filter" (okf_autoxliff), which
will automatically detect XLIFF version and then delegate operations to the
XLIFF or XLIFF-2 filter as appropriate.
- XLIFF Filter:
- Issue #662: Added support for the inline code finder when extracting
XLIFF content.
- Added the okf_xliff-iws configuration with enhanced support
for the "IWSXLIFF" produced by WorldServer in some cases. The filter
reads and writes translation status values and exposes IWS-specific
segment metadata as Property objects on the translation unit.
- Now if the filter finds a note referencing the target and there is no target element, one is created to preserve the note.
- XLIFF-2 Filter:
- Fixed issue #697: Fixed crash when parsing XLIFF-2 files with <group> elements.
- XML Filter:
- Added support for comment nodes of pointers (e.g. locNotePointer)
- TEX Filter:
- Added the initial Beta version of a filter for TEX files.
- Multi-Parsers Filter:
- Added the initial Beta version of a filter for two-level complex formats
(e.g. CSV with some columns in Markdown, some in HTML, some in plain text).
- Steps:
- Rainbow Translation Kit Creation Step
- Fixed issue #732 where the input file could not be the output of a previous XSLT Transform step.
- Fixed issue #733 where the SendOutput option of Rainbow Kit Extraction Step did not work for XLIFF Packages.
- Connectors:
- DeepL:
- The connector DeepLv1Connector for the production API has been implemented.
- The connector DeepLConnector has also been implemented, but it is for a deprecated API (that still works at this time, but may be discontinued at any time).
- Microsoft Translator:
- Fixed the issue where segments with only whitespace were causing an error when passed to the connector.
- KantanMT:
- Deprecated the old connector for the v1 of the API.
- Implemented a new connector for the v2.1 of the API.
The connector includes extra methods to list, query, start and stop engines.
- Libraries:
- Segmentation:
- The Okapi recommended segmentation rules file (okapi_default_icu4j.srx) is now embedded in the release .jar.
This means it can be accessed as a resource stream (SRXDocument.class.getResourceAsStream("okapi_default_icu4j.srx")).
That can be used either directly (srxDoc.loadRules(...the stream...)) or for setSourceSrxStream(...) / setTargetSrxStream(...) in the SegmentationStep Parameters.
It means applications using Okapi from Maven don't have to download and provide their own copy of the recommended .srx file.
- Rainbow:
- Added file extension mapping for .tsv to okf_table_tsv. This resolves issue #683.
- Added file extension mappings for .csv to okf_table_csv, and .markdown to okf_markdown.
- Tikal:
- Removed the -x2 and the -m2 options. Extraction using the JSON skeleton is no longer supported.
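Since filters are now declared as Iterator<Event>, any of them can be turned into a java.util.stream.Stream with standard JDK calls. The generic adapter below is a sketch, not necessarily how IFilter.stream() is implemented; the class IteratorStreams is hypothetical, and strings stand in for Event objects so it runs without Okapi on the classpath.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical adapter from any Iterator to a sequential Stream. */
final class IteratorStreams {
    private IteratorStreams() {
    }

    /** Wraps the iterator lazily; elements are pulled only as the stream is consumed. */
    static <T> Stream<T> stream(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
    }
}
```

For example, counting the "TEXT_UNIT" entries in a list of event labels works with IteratorStreams.stream(labels.iterator()).filter("TEXT_UNIT"::equals).count(). The stream is sequential (the second argument is false) because a filter produces its events in document order.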
Changes from M34 to M35
- Filters:
- Markdown Filter:
- Fixed issue #610, adding two new options for processing. The
"Translate Code Blocks" option (translateCodeBlocks),
enabled by default, controls whether or not fenced code blocks
should be extracted. The "Translate YAML Metadata Header" (translateHeaderMetadata),
disabled by default, controls whether values from YAML-style "front
matter" headers should be extracted.
- Fixed issue #651, adding a new option for processing. The
"Translate Image Alt Text" option (translateImageAltText),
enabled by default, controls whether or not alt text from image
references is extracted by default.
- Improved default inline code finder regex used to process embedded
HTML and XML tags.
- XLIFF Filter:
- Fixed the issue of the priority value being output in the
annotates attribute for the <note> element.
- Now, in SDLXLIFF files, the SDL-specific properties are attached to their respective segment (in read-only mode). The overall
target TextContainer still holds the properties for the last segment, and if they are updated, the value in each segment is updated.
This behavior is still not satisfactory, but at least allows readers to get the correct properties for each segment.
- OpenXML Filter:
- Added two new Excel options, "Translate Diagram Data" and
"Translate Drawings". Both are disabled by default. "Translate
Diagram Data" will extract text from SmartArt and other embedded
diagrams, while "Translate Drawings" will extract text from embedded
drawings such as text fields.
- Added the "Use Included Slide Numbers Only" PowerPoint option.
When checked, the configuration can specify specific slide numbers
for extraction. Other slides will be ignored by the filter.
- Fixed a bug in which ctype values for formatting codes
extracted from PPTX documents showed up as x-empty rather
than as useful style information.
- Fixed a bug in which ctype values for formatting codes
extracted from XLSX documents showed up as x-empty rather
than as useful style information.
- Fixed a bug when PPTX documents relate to external resources, e.g.
hyperlinks that refer to videos.
- YAML Filter:
- Fix issue #643: Multi-line scalar string values will now be extracted
as a single text unit, rather than one text unit per line.
Warning: YAML files extracted with previous versions of Okapi may
no longer merge successfully due to this change.
- Synchronize access to the SnakeYAML parser to ensure thread-safety. This resolves issue #658.
- IDML Filter:
- Fixed issue #636, adding a new option "Skip discretionary hyphens" which is disabled
by default.
- Archive Filter:
- Fixed issue #641, preventing some errors from being reported incorrectly.
- SDL Trados Package Filter:
- There is a new filter for SDL packages: .sdlppx and .sdlrpx files.
- Steps:
- Translation Comparison:
- Improved the HTML report output (added link to go directly to the
summary, right-aligned the values in the summary table).
- Quality Check:
- The black list file can now take an optional third column with a comment.
- The black list file can now take an optional fourth column indicating the severity (0, 1 or 2).
- Microsoft Batch Submission Step:
- This step has been removed due to Microsoft disabling the
underlying API. See additional notes under "Microsoft
Translation Hub" in the Connectors section, below.
- Connectors:
- Google MT:
- Issue #635: Added the "Use Phrase-Based MT" option. This option
will force Google to use its phrase-based MT system, rather than
neural MT, which it normally uses preferentially.
- MyMemory:
- Updated the API URL to use https instead of http.
- ModernMT:
- Fixed issue #650 where the connector was cast incorrectly to ITMQuery.
- Fixed the issue of the optional context parameter not getting set.
- Fixed issue #655 where the language parameters were not set.
- Microsoft Translation Hub:
- Removed AddTranslation and AddTranslationList
methods. These methods were implemented using the
AddTranslation and AddTranslationArray calls to
the Microsoft Translator API; those API calls have been deprecated
and will cease to function on January 31, 2018. Please see the announcement from
Microsoft for more detail.
- Rainbow:
- Fixed issue #7 for Longhorn: BCONF files can now support pipeline files with many more multi-byte characters (e.g. Chinese).
- Pensieve TM:
- Fixed the filter writer so non-segmented entries with an empty target do not get imported.
(They caused a null pointer exception later if the source was retrieved.)
- Tikal:
- Removed the -a option to add translations to a
connector, as this was only supported for Microsoft.
- General:
- Fixed a bug that caused the JSONEncoder to not correctly
JSON-escape nested content processed by a subfilter.
- Updated the XLIFF-2 library to 1.1.7.
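The extended Quality Check black list format can be sketched as a line parser. The column layout is an assumption (tab-separated, with the term and a suggested replacement in the first two columns); only the optional third comment column and the optional fourth severity column (0, 1 or 2) come from the Quality Check entries in this section. BlackListEntry is a hypothetical class, not Okapi's parser.

```java
/**
 * Hypothetical parser for one line of a Quality Check black list file.
 * Assumed layout: term TAB suggestion [TAB comment [TAB severity]].
 */
final class BlackListEntry {
    final String term;
    final String suggestion;
    final String comment;   // null when the optional third column is absent or empty
    final Integer severity; // null when the optional fourth column is absent or empty

    private BlackListEntry(String term, String suggestion, String comment, Integer severity) {
        this.term = term;
        this.suggestion = suggestion;
        this.comment = comment;
        this.severity = severity;
    }

    static BlackListEntry parse(String line) {
        // A negative limit keeps trailing empty columns instead of discarding them.
        String[] cols = line.split("\t", -1);
        if (cols.length < 2) {
            throw new IllegalArgumentException("Expected at least two columns: " + line);
        }
        String comment = (cols.length > 2 && !cols[2].isEmpty()) ? cols[2] : null;
        Integer severity = null;
        if (cols.length > 3 && !cols[3].trim().isEmpty()) {
            int value = Integer.parseInt(cols[3].trim());
            if (value < 0 || value > 2) {
                throw new IllegalArgumentException("Severity must be 0, 1 or 2: " + line);
            }
            severity = value;
        }
        return new BlackListEntry(cols[0], cols[1], comment, severity);
    }
}
```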
Changes from M33 to M34
- Filters:
- IDML Filter:
- Complete rewrite of the filter, fixing many problems and changing
several behaviors. Warning: files extracted with previous
versions of Okapi will no longer merge successfully due to changes
in text units and skeleton processing. Resolves issues #544,
#497, #302, #378, #293, #259, #245, #471, and #317. Additionally
deprecates the filter configuration options "Maximum Spread Size",
"Generate an error when a spread is larger than the specified
value", and "Create new text units on hard returns".
- ITS Filter:
- The separator for the annotatorsRef values is now '\s' (white-space) rather than ' ' (ASCII space). Resolves
issue #612.
- JSON Filter:
- Fixed issue #410: add an option to toggle whether or not '/' is
escaped in output.
- OpenXML Filter:
- Fixed issue #503: chart content will now be exposed for
translation in Excel and Powerpoint files.
- Fixed issue #614: fixed a case where text with different styles
could be incorrectly merged into a single run.
- XLIFF Filter:
- Fixed issue #110: CDATA sections that appear in markup will be
preserved.
- Steps:
- Terminology Leveraging:
- Added a step to annotate the source and target content with
terminology information leveraged from a term base.
- XML Validation:
- Made the use of the DTDs found in the input documents optional
(enabled by default).
- Connectors:
- ModernMT API Connector:
- Google MT:
- Improved retry behavior, particularly when exceeding the API rate
limits.
- Tikal:
- You can now use the -mmt option to use the ModernMT API Connector.
- Bugs in the usage of the -lingo24 options have been fixed.
- Libraries:
- Core:
- Fixed how CDATA is preserved when in the skeleton. Resolves issue
#624.
- Quality Check:
- Added an option to show only relative paths in Quality Check
report. Resolves issue #616.
- Improved how some issues with localizable items were reported.
- General:
- The XLIFF 2 library has been upgraded to version 1.1.6.
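To illustrate the ITS annotatorsRef change above, here is a minimal Java sketch (not Okapi's code) that splits a value into data-category|IRI pairs on any run of white space, so tabs and line breaks separate pairs just as a plain space does. The AnnotatorsRefParser class and its parse() method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: parse an ITS 2.0 annotatorsRef value into a map of
 * data-category identifier -> annotator IRI. Pairs are separated by any
 * white space (\s+), not only by a single ASCII space.
 */
class AnnotatorsRefParser {

    static Map<String, String> parse(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        // Tabs, newlines, and repeated spaces all act as separators.
        for (String pair : trimmed.split("\\s+")) {
            int bar = pair.indexOf('|');
            if (bar <= 0 || bar == pair.length() - 1) {
                continue; // skip malformed entries in this sketch
            }
            result.put(pair.substring(0, bar), pair.substring(bar + 1));
        }
        return result;
    }
}
```

With the old single-space rule, a value whose pairs were separated by a tab or a line break would have been read as one malformed pair.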
Changes from M32 to M33
- CheckMate:
- Updated LanguageTool integration to support the new JSON API (Issue
#582).
Note: As a result of this change, CheckMate will no longer work
with LanguageTool version 3.3 or earlier. For more information, see
the LanguageTool
API Migration page.
- Tikal:
- You can now process several files at once with the -xm and -lm
commands. Note that in that case you cannot use the -to and -from
options. This resolves issue #598.
- Filters:
- Markdown Filter:
- There is a new filter for Markdown (.md) files.
- XLIFF Filter:
- Fixed missing prefix on output for some prefixed XLIFF inline
elements.
- Improved support for the equiv-text attribute.
- Issue #466: Added a new skipNoMrkSegSource option that will cause
trans-units containing seg-source but no internal mrk data to be
treated as skeleton. This behavior (which is consistent with SDL
Studio) is enabled in the okf_xliff-sdl configuration, but disabled
by default in okf_xliff.
- Issue #551: XLIFF files containing entities that are invalid in
XML 1.0 will no longer break the filter. (These entities, and their
corresponding characters, will be stripped before parsing.)
- Issue #602: the maxwidth, maxheight, and size-unit attributes are
now parsed and exposed as resource properties. Updates to these
properties will be reflected in the merged XLIFF file.
- XML Filter:
- The handling of files with UTF-16LE and UTF-16BE declarations has
been improved. Output to these encodings is treated as output to
UTF-16 with a BOM.
- Added the inlineCdata option, which will cause CDATA markup to
appear as inline codes rather than being stripped.
- OpenXML Filter:
- Support for .dotx, .dotm, .ppsx, .ppsm, .potx, .potm, .xltx, and
.xltm files has been added.
- Fix a bug where certain XLSX files would cause an infinite loop in
parsing.
- Fix a bug where certain XLSX files would fail to extract.
- Fix a bug where paragraph spacing properties were incorrectly
stripped when using "aggressive" cleanup mode.
- PO Filter:
- Issue #584: Add "Include msgctxt in note" option to
include context data in trans-unit notes.
- JSON Filter:
- A new option to not have the leading slash in the full key path
has been added. This resolves issue #603.
- lib-verification:
- Updated the LanguageTool integration to support the new JSON API, as
described above.
- Updated the BlacklistChecker to do case-sensitive validation if the
blacklist terms are identical except for the case.
- Connectors:
- SimpleTM:
- Fixed the setPenalizeSourceWithDifferentCodes() method so it sets
the value correctly.
- Google MT:
- Update the connector to support NMT models, when available.
- General:
- The code is now under Apache License version 2.0.
- The XLIFFWriter class now supports serializing the maxwidth,
maxheight, and size-unit attributes on <group> and <trans-unit>
elements, by attaching the corresponding
net.sf.okapi.common.resource.Property key to the appropriate
resource.
- The displayText field of Code objects is now stored in a field on
the object, rather than in an annotation.
- The equals and compareTo methods on the TextFragment class will no
longer indicate equality when compared to non-TextFragment
instances.
- Improved the error message in GenericSkeletonWriter. This resolves
issue #593.
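Issue #551 above can be pictured with the following Java sketch. It is an assumption about the general technique, not the XLIFF Filter's implementation: numeric character references whose code points fall outside the XML 1.0 Char production are removed before parsing, and valid references are left untouched. It only handles references; raw control characters would need a similar pass. The Xml10CharRefStripper class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: drop numeric character references (&#1; or &#x1F;)
 * that XML 1.0 does not allow, so the text can be handed to a parser.
 */
class Xml10CharRefStripper {

    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String strip(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // too large to parse, so certainly invalid
            }
            // Keep valid references as they are; replace invalid ones with nothing.
            String replacement = isXml10Char(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```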
Changes from M31 to M32
- Tikal:
- The -nocopy option is now respected when extracting using the -seg
option. This resolves issue #571.
- Filters:
- OpenXML Filter:
- Support for Microsoft Visio 2013+ (.vsdx/.vsdm) files has been
added.
- Added an option to ignore placeholder text in PowerPoint master
and layout slides.
- Fixed Issue #576: XLSX files missing certain optional style data
could crash the filter.
- ITS/XML Filter:
- Improved support for the ITS Target Pointer data category.
Caveats: target element/attribute must exist (empty), and must be
after the source; inline codes are not supported yet. Addresses
issue #574.
- HTML Filter:
- Updated the pre-defined filter configuration to allow title and
dir on all elements as per the latest HTML specification.
- XLIFF Filter:
- Existing ctype values for most inline codes are now parsed and
stored as the type field on Code objects.
- Connectors:
- Microsoft Translation Hub:
- IMPORTANT: authentication changes. The connector has been
updated to support Azure-based Microsoft Translator subscriptions,
and the old (DataMarket) method of authentication has been removed.
The clientId and clientSecret parameters have
been removed and replaced with a single parameter called azureKey.
If you have an existing Microsoft Translator subscription, you must
migrate it to Azure by April 30, 2017. For information on how to do
this, see this
Microsoft support article.
- General:
- IMPORTANT: The LocaleId class now uses ICU's ULocale. A few
methods have been deprecated and others may behave slightly
differently, but this gives us good BCP-47 support going forward.
- Updated Windows EXEs with version 3.9 of Launch4j to solve the issue
of not being able to start with Java 8 with updates >= u100. This
resolves issue #575.
- Added a Maven profile named reports. It produces various reports
that might come in handy (clirr, findbugs, jdepend-maven-plugin,
versions). To run it:
mvn install -P reports
- Improved the build to create and sign a DMG file for the Mac
distribution.
- Added a new class, net.sf.okapi.common.filters.FilterIterable,
implementing Iterable<Event>. You can now do:
for (Event event : new FilterIterable(filter)) {
    // ... process event ...
}
On Java 8 you can also pass a lambda or method reference to forEach():
new FilterIterable(filter).forEach(event -> {
    // ... process event ...
});
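For readers curious how an Iterable wrapper like FilterIterable can work, here is a minimal sketch. It assumes only that the wrapped object exposes hasNext() and next() methods; the EventSource, SourceIterable, and ListSource types are hypothetical stand-ins rather than Okapi classes, and plain Strings replace Event objects to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a filter: anything with hasNext()/next(). */
interface EventSource<E> {
    boolean hasNext();
    E next();
}

/** Adapter that lets a one-pass source be used in a for-each loop or forEach(). */
class SourceIterable<E> implements Iterable<E> {
    private final EventSource<E> source;

    SourceIterable(EventSource<E> source) {
        this.source = source;
    }

    @Override
    public Iterator<E> iterator() {
        // Every iterator delegates to the same underlying source.
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public E next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                return source.next();
            }
        };
    }
}

/** Tiny list-backed source used only for demonstration. */
class ListSource<E> implements EventSource<E> {
    private final List<E> items;
    private int pos = 0;

    ListSource(List<E> items) {
        this.items = new ArrayList<>(items);
    }

    public boolean hasNext() { return pos < items.size(); }

    public E next() { return items.get(pos++); }
}
```

Because the adapter delegates directly to the source, a second iteration yields nothing once the source is exhausted, which matches the one-pass nature of a filter.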
Changes from M30 to M31
- Filters:
- General:
- Fixed issue #519: When using the subfilter feature in several
filters (including JSON, YAML, ITS, and XML Stream), it is no longer
possible to produce multiple text units with the same ID.
- XLIFF Filter:
- Fixed the case of memoQ XLIFF files using <source> and <target>
elements in their own extensions. This resolves issue #547.
- JSON Filter:
- The "Use Full Key Path" option will no longer include the prefix
"null/" in all generated resource names. Instead, key paths will
begin with "/", as in "/foo/bar".
- Properties Filter:
- Added an option to use the key value as the text-unit ID value.
This addresses issue #520.
- OpenXML Filter:
- Fixed a case in which some hidden spreadsheet cells could still be
exposed for translation.
- Fixed cases where revision metadata unrelated to translatable text
could prevent filtering if the "Automatically Accept Revisions"
option was not checked.
- If the "Treat Tab as Character" option is enabled, the '\t'
character will now be replaced with a <tab> element on output.
- Added the tsComplexFieldDefinitionsToExtract parameter,
which allows users to specify which field codes should be
translated. By default, only HYPERLINK codes are translated,
matching the existing behavior.
- XML Filter:
- In the RESX pre-defined configuration: fixed the code-finder
expression for mustache codes, and added support for basic HTML.
This resolves issue #559.
- PDF Filter:
- New PDF filter. Extraction only.
- YAML Filter:
- Fixed issue #555: the YAML filter skipped a single quote at the
beginning of a string.
- Fixed issue #556: the YAML filter now escapes sequences before
passing them to subfilters.
- CheckMate:
- Sessions with very large configurations can now be saved. This fix
produces session files (.qcs) that are not backward compatible with
previous versions of CheckMate if the configuration is larger than
about 21K. Files under that size remain backward compatible. This
resolves issue #548.
- Rainbow:
- Fixed issue #559: Batch configurations will now map .mqxliff
files to the XLIFF Filter by default.
- Changed the default encodings of the UI to UTF-8 when running on a
Macintosh or on Unix/Linux.
- Connectors:
- Microsoft Translation Hub:
- The connector will now correctly handle batch queries when
provided with content in excess of 10 segments or 10,000 characters.
Callers should no longer need to do additional batching of their
own.
- The query(String) method will now send content for
translation as a raw string, without doing additional tag
processing. This brings the implementation in line with the javadoc,
and is intended to be used in cases when a caller wishes to send raw
HTML to Translation Hub. The query(TextFragment) method
will continue to replace inline codes with dummy HTML tags before
calling the API. A new batchQueryText(List<String>)
method has been added to perform batch queries of text without
additional processing, to complement the existing batchQuery(List<TextFragment>)
method.
- Improve error handling, including handling of invalid segments,
latency spikes, and the X-MS-Trans-Info diagnostic header.
- Content using the zh-hk locale will now be sent as
Traditional Chinese rather than Simplified Chinese.
- Ensured the language codes starting with sr-Cyrl (Serbian in
Cyrillic) are mapped properly to the Microsoft internal code.
- Ensured the language code in is mapped to id, as expected by the
Microsoft internal code.
- OpenTran:
- Removed connector for the (defunct) OpenTran service.
- General:
- Added the ability to add ICU4J segmentation rules via the SRX
configuration option "icu4jBreakRules". Added the
okapi_default_icu4j.srx file with enhanced rules.
- Now using the Okapi XLIFF 2 library version 1.1.4.
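The Microsoft Translation Hub batching change above implies splitting a request into chunks of at most 10 segments and 10,000 characters. A greedy Java sketch of that splitting follows; it illustrates those two limits only and is not the connector's code. The Batcher class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: greedily group segments so that each batch has at
 * most maxSegments entries and at most maxChars characters in total.
 * A single segment longer than maxChars is put in a batch of its own.
 */
class Batcher {

    static List<List<String>> batch(List<String> segments,
                                    int maxSegments, int maxChars) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentChars = 0;
        for (String s : segments) {
            // Start a new batch if adding s would break either limit.
            boolean full = current.size() == maxSegments
                    || (!current.isEmpty() && currentChars + s.length() > maxChars);
            if (full) {
                batches.add(current);
                current = new ArrayList<>();
                currentChars = 0;
            }
            current.add(s);
            currentChars += s.length();
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For example, 25 short segments with limits of 10 segments and 10,000 characters produce batches of 10, 10, and 5.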
Changes from M29 to M30
- Filters:
- HTMLEncoder:
- Quote Mode Options Added: Added the ability to configure quote
escaping rules to filter configs: UNESCAPED, ALL,
NUMERIC_SINGLE_QUOTES, DOUBLE_QUOTES_ONLY.
- MIF Filter:
- Support FrameMaker 2015 files.
- TS Filter:
- The text units are now extracted with the flag to preserve
whitespace set.
- XLIFF Filter:
- Fixed issue #521: <phase> elements will no longer be reordered when
processing a file with the filter.
- Fixed issue #207: Added the "Preserve whitespace on 'default'"
option, which forces the filter to preserve whitespace when the
xml:space="default" attribute is present, as if the "preserve"
value were present instead.
- Fixed issue #539: Implemented support for translate in <group> and
<bin-unit>.
- XLIFF-2 Filter:
- Added an initial and experimental implementation of a filter for
XLIFF v2.0 files.
- HTML Filter:
- When merging in a right-to-left target language, the dir="rtl"
attribute will be added to the <html> element.
- HTML5 Filter:
- When merging in a right-to-left target language, the dir="rtl"
attribute will be added to the <html> element.
- ITS Filter:
- Fixed an issue where inline empty elements written with two tags,
as in "<span></span>Text", were output incorrectly. (See the
contribution from the FREME project.)
- Improved the pre-defined rules for the Android Strings filter
configuration.
- OpenXML Filter:
- Support for .dotx, .dotm, .ppsx, .ppsm, .potx, .potm, .xltx, and
.xltm files has been added.
- Fix issue #297: richly styled text in Excel spreadsheet cells will
now produce text units containing inline codes that represent
formatting.
- Improve handling of Smart Tags.
- The filter will no longer expose redundant copies of text stored
in "Alternate Content" blocks for translation. The filter will
expose the primary version of the text for translation and strip the
fallback content from the target document. Fallback content can be
regenerated, if necessary, by opening the document in Office.
- Fix issue #524: a bug that caused duplicate TextUnit IDs to be
generated in some cases involving nested content.
- Fix a bug that hid hyperlink URLs from translation.
- Fix issue #526: Word documents containing charts that contained
entities could be corrupted on merge.
- Exposed several new options in the Rainbow UI, including the
ability to treat tabs and line breaks as characters, the handling of
soft hyphens, and whether to automatically accept document
revisions.
- Fix issue #532: Worksheet names in Excel documents can now be
exposed for translation using the new "Translate Sheet Names"
option.
- Fix issue #533: Add an option to expose hyperlinks stored in .rels
files for translation.
- OpenOffice Filter:
- Formula results are no longer extracted for translation in ODS
files.
- Automatically-generated numbers are no longer extracted for
translation.
- Document metadata is now only extracted if the extractMetadata
configuration parameter is enabled. This parameter is enabled by
default.
- Subfilters:
- Fix issue #530 - JSON content could be corrupted when processed
with a subfilter.
- Fixed issues related to subfilters that call additional
subfilters.
- Steps:
- Id-Based Aligner Step:
- Add an option to allow alignment based on TextUnit ID, rather than
resource name. This is useful for aligning formats where no resource
name exists, but the TextUnit ordering is known to be stable between
files.
- Localizables Checker Step:
- Checks dates, times and numbers and flags text units where the
target instance is either missing or not localized properly.
- Language Tool Step:
- Enhanced to provide morphologically valid bilingual term and
blacklist term checking if LanguageTool supports a stemmer for the
locale. Otherwise, full-word comparisons are used.
- General:
- Fix Issue #523: XLIFFWriter will now preserve properties set on
empty targets.
- Changed XLIFF 2 library to version 1.1.1.
- Upgraded ICU4J to version 57.1.
- Split the Quality Check step into independent steps: character,
general, length, patterns, inline codes, etc.
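The four HTMLEncoder quote modes listed above are named but not defined in this log. The following Java sketch shows one plausible reading of them; the entities each mode emits (&quot;, &apos;, &#39;) are assumptions, and QuoteMode and QuoteEscaper are hypothetical names, not Okapi's API.

```java
/** Hypothetical enum mirroring the mode names listed in the changelog. */
enum QuoteMode { UNESCAPED, ALL, NUMERIC_SINGLE_QUOTES, DOUBLE_QUOTES_ONLY }

/**
 * Hypothetical sketch of configurable quote escaping. Assumed mapping:
 *   UNESCAPED             -> quotes are left as-is
 *   ALL                   -> " becomes &quot; and ' becomes &apos;
 *   NUMERIC_SINGLE_QUOTES -> " becomes &quot; and ' becomes &#39;
 *   DOUBLE_QUOTES_ONLY    -> " becomes &quot;, ' is left as-is
 */
class QuoteEscaper {

    static String escapeQuotes(String text, QuoteMode mode) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' && mode != QuoteMode.UNESCAPED) {
                sb.append("&quot;");
            } else if (c == '\'' && mode == QuoteMode.ALL) {
                sb.append("&apos;");
            } else if (c == '\'' && mode == QuoteMode.NUMERIC_SINGLE_QUOTES) {
                sb.append("&#39;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A numeric reference for the single quote is the usual choice when the output may be read by HTML 4 consumers, which do not define &apos;.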
Changes from M28 to M29
- Tikal:
- The scoping report option now outputs character counts in addition
to word counts by default.
- Filters:
- IDML Filter:
- Fixed a concurrency issue that could cause crashes when multiple
instances of the filter were used simultaneously.
- OpenXML Filter:
- The way formatting information is converted to codes has
changed. The filter will now attempt to streamline code generation
by considering whether the formatting applied to a text run can be
considered a "nested" format within the existing formatting. For
example, a bold, italic run would be considered "nested" within a
bold run. This allows for a more natural code mapping that should
be more intuitive for translators, and is also more closely
aligned with other tools.
- Style inheritance is now considered when calculating the
formatting in effect for a run of text.
- Right-to-left (RTL) support has been added for paragraphs, table
content in DOCX files and some DrawingML constructs.
- Fixed issue #486. Simple and complex fields are now represented
as a single code for the entire field.
- Fixed issue #487. Runs that differ only in script specified for
non-overlapping codepoint ranges can now be merged. This reduces
the number of inline codes produced in some cases.
- Fixed issue #502. Cells that are in rows and columns that are
hidden will no longer be exposed for translation by default. This
brings the behavior of the Excel filter into alignment with the
behavior of the other OpenXML filters. A new option, "Translate
Hidden Rows and Columns", has been added to the configuration for
the Excel portion of the OpenXML filter.
- The "Clean Tags Aggressively" option will now strip
<w:bCs> and <w:szCs> tags from Word documents.
- Fixed a crash that could occur when parsing files with enormous
attribute values.
- The non-breaking hyphen is now converted to a character, rather
than treated as a tag.
- ITS Filter:
- Added type for text units coming from attributes (value:
x-<attribute-name>).
- Table Filter:
- Fixed issue #511: now empty targets with delimiters are merged
properly.
- TXML Filter:
- Fixed issue #501, where commented-out segment elements were
deleted from the output file.
- XLIFF Filter:
- Fixed issue #500, where alt-trans proposals with a match-quality
score in decimal form ("100.00") were treated as having a score of 0.
- Added support to change sdlxliff original attribute values based
on okf_xliff-sdl filter configuration. conf and locked attributes
are also supported.
- Libraries:
- XLIFFWriter:
- Added support for state-qualifier output in the main <target>.
- Pensieve:
- IMPORTANT: Code.codesToString() changes. The Pensieve TM format has
changed and is not backward compatible. You will need to export
your TMs and re-import them with M29.
- Steps:
- Added character count steps:
- The Character Count step calculates character counts per the
GMX-V 2.0 standard and stores them in a Metrics annotation (like
the Word Count step). There are also steps for counting all GMX
non-translatable categories (ProtectedCharacterCount, etc.) and
Okapi categories (Concordance, FuzzyMatch, MT, etc.).
- GMX "-Only" word count steps:
- The AlphanumericOnly, NumericOnly, and MeasurementOnly word
count steps now follow the GMX standard in that they only give
non-zero counts for TUs that consist solely of tokens of the
relevant type. (Previously they merely counted relevant tokens.)
- Translation Comparison step:
- Added an option to use the target of the alt-trans element for a
given origin value when processing an XLIFF file as the second file.
This allows comparing an MT candidate placed as an alt-trans entry
with the actual translation in the main target element.
- Scoping Report step:
- The Scoping Report step now can report character counts when the
relevant annotations are present. Use both the Word Count and
Character Count steps to get full detail. The default template has
been updated to include character counts for the included
categories.
- Post-segmentation Inline Codes Removal Step:
- Added step that attempts to simplify (trim and merge) as many
inline codes as possible by looking at each linguistically
distinct segment in a TextUnit.
- Connectors:
- KantanMT Support:
- Added a new connector to support KantanMT.
- Microsoft Translation Hub:
- Fixed an issue when working with trained engines with certain
target languages.
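Issue #500 above came from reading a decimal match-quality score as if it were an integer. A lenient parser, sketched here in Java as an assumption rather than the filter's actual code, accepts "100", "100.00", or "85%" and returns -1 for values that are not numeric (the attribute is not restricted to integers in XLIFF 1.2). The MatchQuality class is hypothetical.

```java
/**
 * Hypothetical sketch: lenient parsing of an XLIFF 1.2 match-quality value.
 * Accepts integers, decimals, and a trailing percent sign; returns -1 when
 * the value cannot be read as a number.
 */
class MatchQuality {

    static int parseScore(String raw) {
        if (raw == null) {
            return -1;
        }
        String s = raw.trim();
        if (s.endsWith("%")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        try {
            // Parse as a decimal so "100.00" is not rejected as a non-integer.
            double d = Double.parseDouble(s);
            return (int) Math.round(d);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```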
Changes from M27 to M28
- Tikal:
- Fixed issue #444: Now the -imp command can use the -approved
option to import only approved entries.
- Rainbow:
- Fixed issue #464: Exported batch configurations should now
include default filter mappings for all known extensions.
- Files with the .sdlxliff extension will now use okf_xliff-sdl
filter configuration by default.
- Fixed help page for command-line -?
- Library:
- Fixed issue #442 where the blacklist for the term checker was
not working for Japanese.
- Fixed issue #439: Quality issues based on patterns containing
newline characters did not display correctly in CheckMate.
- Fixed an issue in the verification library where the check for
leading and trailing whitespace did not take empty strings into
account, causing an index out of range error.
- Fixed issue #456: SRX issue with empty "beforebreak" in rule.
- Changed SWT libraries to version 4.4
- Filters:
- OpenXML Filter:
- Fixed issue #165: Strings in Excel files are now extracted in
a more logical order (sheet-by-sheet, one row at a time, ordered
by columns). Additionally, strings that appear multiple times
are now exposed for translation once for each occurrence, and
may be translated independently. Warning: these changes
may create problems merging translated content from Excel files
that were processed with previous versions of Okapi.
- Fixed issue #338: the "Exclude color" configuration option is
now correctly applied to Excel files. This option works for cell
background colors only. The colors available in the Rainbow
filter configuration UI are aligned against the Excel 2011
"standard colors"; additional RGB values may be excluded by
editing your filter configuration file directly.
- Fixed issue #390: the "Exclude column" configuration option is
now correctly applied to Excel files.
- Fixed issue #440: markers for spelling and grammar errors are
now stripped when exposing text for translation.
- Fixed issue #441: added new options to expose line breaks and
tabs for translation as literal characters.
- Fixed issue #443: added the "Exclude Graphical Metadata"
option to prevent metadata associated with graphics and
textboxes from being exposed for translation.
- Fixed issue #447: the "Translate Document Properties" option
will now work for PowerPoint files.
- Fixed issue #448: the "Translate Comments" option will now
work for PowerPoint files.
- Fixed issue #449: the "Translate Slide Masters" option will
now also expose master layout content that is used by slides in
the document.
- Fixed issue #451: the "Translate Document Properties" option
will now work for Excel files.
- Fixed issue #452: Word files containing nested graphicData
sections are no longer corrupted during processing.
- Fixed issue #453: Word files that do not contain a word/styles.xml
part can now be processed.
- Fixed issue #454: entities occurring in alternateContent
sections of Word documents are now handled correctly.
- Fixed issue #457: empty lines in Word files are sometimes
stripped.
- Fixed issue #458: target text is lost in complicated run
structures
- Fixed issue #467: tabs in Word files are sometimes stripped.
- Fixed issue #473: deletion change tracking can cause target
corruption.
- Fixed issue #474: files containing formulas could become
corrupted.
- Fixed issue #476: insertion change tracking can cause target
corruption.
- Fixed issue #482: <w:lang> tags are now
stripped during extraction.
- Fixed issue #484: Added a "Clean Tags Aggressively" option to
the filter. When this option is enabled, the filter will strip
certain types of formatting markup (whitespace and vertical
alignment adjustment) that is spuriously inserted when
converting other formats (such as PDF) to OpenXML. This produces
smoother segmentation in some cases.
- Fixed issue #485: Strip machine-generated "_GoBack" bookmarks
from Word files.
- ICML Filter:
- Added the ICML Filter for WCML files.
- HTML Filter:
- Added support for ASPX comments and fixed extra output for
tag-like attribute values.
- PO Filter:
- A Plural-Forms: header that declares nplurals=1
is now handled correctly.
- Table Filter:
- Blank lines inside qualified CSV cells are now preserved.
- CSV text qualifiers can now be optionally added on output when
required to maintain well-formedness.
- Step:
- Rainbow Translation Kit Creation:
- Improved support for XLIFF 2 packages
- Rainbow Translation Kit Merging:
- Improved support for XLIFF 2 packages
- Text Rewriting step:
- Fixed the case where the target had only inline codes while the
source had both text and inline codes. Now the base text is taken
from the source.
- Now expansion is done before the last inline code.
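The Table Filter change that adds CSV text qualifiers only when required follows the usual RFC 4180 convention. The Java sketch below is a hypothetical helper, not the filter's implementation: it wraps a field in the qualifier when the field contains the delimiter, the qualifier, or a line break (so blank lines inside a cell survive a round trip), and doubles any embedded qualifier.

```java
/**
 * Hypothetical sketch of RFC 4180-style output quoting: qualify a field
 * only when it needs it, doubling any qualifier characters inside it.
 */
class CsvQualifier {

    static String qualifyIfNeeded(String field, char delimiter, char qualifier) {
        boolean needs = field.indexOf(delimiter) >= 0
                || field.indexOf(qualifier) >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needs) {
            return field; // plain fields are written unchanged
        }
        String q = String.valueOf(qualifier);
        return q + field.replace(q, q + q) + q;
    }
}
```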
Changes from M26 to M27
- General:
- Fixed resource and memory leaks.
- Filters:
- XLIFF Filter:
- Added a warning when an inline code (other than mrk) has no id
attribute.
- Fixed the location of <phase-group> when re-writing.
- Fixed case where XML declaration was not followed by a
line-break on output.
- Added a fallback check for TMX values for the <it> pos attribute
(an error/warning is still generated, as using TMX values in XLIFF
is not valid).
- Added better support for SDLXLIFF
- Optional parameters for writing out tool element in xliff
header
- Fixed issue #430 where the ITS namespace declaration and
version was not added when needed.
- HTML Filter:
- Added the placeholder attribute to the list of translatable
attributes in the default HTML configuration (for HTML5).
- Fix lower casing of start tags during pre-processing cleanup
- Upgrade to Jericho 3.4-dev
- Steps:
- Rainbow Translation Kit Creation Step:
- Updated XLIFF2 library to 1.0 release.
- Implemented v2 support for the Transifex packages.
- Rainbow Translation Kit Post-Processing Step:
- Implemented v2 support for the Transifex packages.
- Connectors:
- MyMemory Connector:
- Fixed the issue of the return match value being sometimes a
Double and sometimes a Long.
- Made connectors more error-tolerant: processing continues if there
is an exception on a single text unit.
- Library:
- XLIFF Writer:
- Added support to output the coord attribute (COORDINATES property
on the text container).
- Transifex library:
- Fixed issue #427 where the API v2 was not supported.
- Segmentation library:
- Fixed issue #426 where the part of the text matched by the
previous rule was not scanned for match in the next rule.
- Fixed issue #489: Added the okp:treatIsolatedCodesAsWhitespace
option to allow the segmenter to treat each isolated code as a
single whitespace character when applying segmentation rules.
- Verification library:
- Fixed issue #418: the description of the rule is now displayed
for target-driven error.
- Improved reading of LQI entries: the ITS type is preserved
when reading the okp:lqiType value.
- Fixed issue #442: Allow flagging blacklist terms in
substrings.
- Fixed issue #400: Allow flagging blacklist terms in source.
- Parameters editor for Verification library:
- Fixed issue #417: The description of each pattern is now
preserved when re-ordering the patterns.
- Fixed issue #442: Add option to allow flagging blacklist terms
in substrings.
- Fixed issue #400: Add option to allow flagging blacklist terms
in source.
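The okp:treatIsolatedCodesAsWhitespace option can be pictured with the Java sketch below. It is an illustration only: the placeholder character and the single break rule are hypothetical, and real SRX processing involves many more rules. Each isolated-code placeholder is viewed as one space before the rule is matched, and because one character replaces one character, break offsets found on the view map directly back onto the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: apply a break rule to a view of the text in which
 * each isolated-code placeholder is treated as a single space.
 */
class IsolatedCodeSegmenter {

    static final char ISOLATED_CODE = '\uE103'; // hypothetical placeholder

    // Simplified rule: break after a period that is followed by white space.
    private static final Pattern BREAK_RULE = Pattern.compile("\\.(?=\\s)");

    static List<Integer> breakOffsets(String codedText, boolean codesAsWhitespace) {
        String view = codesAsWhitespace
                ? codedText.replace(ISOLATED_CODE, ' ')
                : codedText;
        List<Integer> offsets = new ArrayList<>();
        Matcher m = BREAK_RULE.matcher(view);
        while (m.find()) {
            offsets.add(m.end()); // offset just after the period
        }
        return offsets;
    }
}
```

For "First." followed by an isolated code and "Second.", no break is found unless the option is on, in which case the text breaks right after "First.".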
Changes from M25 to M26
- Rainbow:
- Added the .mqxliff extension to the list of extensions associated
with the XLIFF Filter.
- Tikal:
- Fixed an issue preventing custom filter configurations from working
as sub-filters.
- Added the ability to output scoping reports.
- Filters:
- Fixed issue #409: Inconsistent handling of <bx pos="begin"/> in
extraction to Moses inline format.
- XLIFF Filter:
- Added support for <sub> elements (plain text, with nested codes,
or with nested codes with nested sub-flows).
- XMLStream Filter:
- Added the .ditamap extension to the list of extensions for the
DITA pre-defined configuration.
- Steps:
- LanguageTool Step:
- Resolved issue #416 (added suggestion to annotation).
- Filters Plugin for OmegaT:
- Libraries:
- Fixed the issue where sub-filter start and end events were not
handled properly for outputting RTF layered files.
- Verification library:
- Improved the mapping of LanguageTool's ITS type to the issue
annotation.
- Updated the XLIFF2 library to use 0.22-snapshot.
- Discontinued the MacOS 32-bit distribution (no Java 7 support)
- TMXWriter: fixed a bug where some property entries were written
before the <seg> element.
- Improved stream-only pipeline capabilities.
Changes from M24 to M25
- Tikal:
- Fixed the default parameter for the default TM resource. Now you
can just run tikal -q "text to search".
- Added -x2 and -m2 options for extraction/merge with the new
skeleton file.
- Changed -x and -m to extract/merge with the new skeleton file.
- Added -x1 and -m1 options for extraction/merge with the original
document (similar to previous versions; for a fully
backward-compatible merge you must use M24).
- Updated Tikal merge function with original file to use the new
common text-unit merger.
- Added options for JAR version switch
- Filters Plugin for OmegaT:
- Added basic support for XLIFF 2 documents (under construction).
- Target text is now passed as a translation only if it differs
from the source.
- Added support for alternate translations (e.g. from XLIFF 1.2
documents)
- Steps:
- Added the TTXSplitter Step. It splits a given TTX document into
several documents with the same source word count.
- Added the TTXJoiner Step. It joins back TTX documents created with
the TTXSplitter Step.
- Consolidated merge steps into: SkeletonXliffMergerStep,
LegacyXliffMergerStep, OriginalDocumentXliffMergerStep and
CombinedXliffMergerStep classes.
- Rainbow Translation Kit Creation Step:
- Reinstated output for XLIFF v2 using the latest library.
- Added support for extraction using the new skeleton file.
- Rainbow Translation Kit Merging Step:
- Added basic support for merging XLIFF v2 packages.
- Added support for merging using the new skeleton file.
- Search and Replace Step:
- Fixed issue #392 where the reading of the replacement table
was trimming all lines. Searching for a space and replacing it
with nothing now works.
- Filters:
- Changed the reading of <alt-trans> elements to allow entries with
an empty <target> (e.g. some XTM XLIFF files have <alt-trans>
elements with empty targets).
- Added the option "Allow modification of existing
<alt-trans> elements"
- YAML Filter:
- OpenXML Filter:
- Fixed issue #402 (Cannot stop the filter before the document
is done)
- Fixed issue #350 (merge problem when docx has a
OpenXml.Drawing object)
- PO Filter:
- Fixed an issue with #, (fuzzy flag) in front of #~ (obsolete)
entries.
- JSON Filter:
- Refactored the filter.
- Fixed issue #359 (Need to improve extraction selection)
- Fixed issue #373 (Encoder and xml:space='preserve')
- Fixed issue #397 (Filter not extracting all strings as
expected)
- Connectors:
- Translate Toolkit TM Connector:
- Updated the parameters API to use set/getUrl() instead of
set/getHost() and set/getPort().
- Updated the default host and port (now obsolete) to localhost and
8080 to allow local setups to continue to work.
- Updated the default URL to
https://amagama-live.translatehouse.org/api/v1/ (the previous URL
is obsolete).
- Libraries:
- Major refactoring of the serialization.
- Major refactoring of the RawDocument object
- Updated SWT libraries to 4.3
- Added lib-tkit library for extraction/merge with skeleton in
JSON.
- Added sort capability to the Filter Configuration common edit
dialog.
Changes from M23 to M24
- Tikal:
- Changed the default resource for the -q command from OpenTran to
Translate Toolkit.
- Rainbow:
- Made usability improvements to the Testing Console for rapid
iteration when creating custom filter configurations.
- Steps:
- Added the Copy Or Move Step: Copies or moves files to a specified
location with the option to overwrite or back up existing files, or
to skip copying files if there is an existing file.
- Rainbow Translation Kit Creation Step:
- Removed the experimental output to XLIFF 2.0 (too outdated to be
useful). See the Okapi XLIFF Toolkit project for more up-to-date
support for XLIFF 2.0.
- Added option to specify the post-processing hook for OmegaT
tkits.
- Format Conversion Step:
- Added the Word Table output format.
- Filters:
- IDML Filter:
- Changed the default spread size threshold to 2000 Kb and
updated the warning/error to show the spread size.
- XML Filter:
- Android Strings pre-defined settings: Exposed content of
<item> elements for translation (used in <plurals>,
<string-array> elements).
- HTML Filter:
- Added option to treat CDATA as an inline element.
- Content of excluded inline elements is exposed for, e.g.,
inclusion in XLIFF equiv-text attributes.
- Fixed issue #336: The filter will no longer produce
translatable segments consisting only of tags.
- XML Stream Filter:
- Fixed issue #336: The filter will no longer produce
translatable segments consisting only of tags.
- OpenXML Filter:
- Fixed issue #351: Improve filter performance.
- JSON Filter:
- Fixed issue #377: Support for subfiltering in JSON.
- Fixed issue #373: JSONFilter should use the JSONEncoder.
- Note: Changes in escaping/unescaping behavior in this filter
break compatibility with files extracted by previous versions.
- Filters Plugin for OmegaT:
- Added capability to specify a custom filter parameters file for
each Okapi filter in the plugin. This closes issue #376.
- Connectors:
- Added the Bilingual File Connector: Directly query a bilingual file
format such as TMX, PO, etc., without importing to a TM first.
- Library:
- Important: Changed the minimum requirement from Java 1.6 to
Java 1.7.
- Fixed the ITS content writer to output locQualityIssueProfileRef
instead of locQualityIssueProfile.
- Improved the report output of the quality checker.
- Updated and cleaned up the build files.
- Added the HUMAN_RECOMMENDED type to the MatchType list.
- Modified the base implementation of IParameters. This may cause
compilation errors in your code if you access some variables
directly; use the corresponding getter and setter methods instead.
- Added support for tuv-level attributes that were missing in
TMXWriter.
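The issue #336 fix in the HTML and XML Stream filters stops segments made only of tags from being exposed for translation. As a rough illustration, the rule can be sketched in Java as "the segment contains at least one tag and nothing but whitespace otherwise". This is a minimal sketch on raw markup strings; the class TagOnlySegmentCheck and its regular expression are hypothetical, and the actual filters work on Okapi's internal inline-code representation.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Okapi code) illustrating the issue #336 rule:
 * a segment consisting only of markup tags is not translatable.
 */
final class TagOnlySegmentCheck {
    // Matches one markup tag such as <b>, </b> or <br/>.
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");

    private TagOnlySegmentCheck() {}

    /** True when the segment holds at least one tag and only whitespace besides. */
    static boolean isTagsOnly(String segment) {
        if (!TAG.matcher(segment).find()) {
            return false; // no tags at all: plain or empty text is not "tags only"
        }
        String remainder = TAG.matcher(segment).replaceAll("");
        return remainder.trim().isEmpty();
    }
}
```

For example, "<b></b>" and " <br/> " are reported as tag-only, while "<b>Hello</b>" is not.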
Changes from M22 to M23
- Rainbow:
- Added the Inconsistency Check Step to the pre-defined Quality
Check pipeline.
- CheckMate:
- Fixed issue #358: The Check Document button now works in all
cases.
- Filters Plugin for OmegaT:
- Added .mxliff as one of the default extensions for XLIFF.
- Fixed issue #364: .sdlxliff files with a UTF-8 BOM now open
correctly.
- Steps:
- Added the Inconsistency Check Step: flags entries that have the
same source but different targets, or the same target but
different sources.
- Rainbow Translation Kit Creation Step:
- Added a libVersion attribute to the manifest indicating the
version of the library used to create the manifest.
- Added an option to use encapsulation notation
(<bpt>/<ept>/<ph>/<it>) for inline codes in OmegaT tkits.
- LanguageTool Step:
- Updated the library to version 2.2.
- Encoding Conversion Step:
- Fixed issue #318: ASCII characters in NCR form are now
un-escaped except for ", ', &, < and >.
- Search and Replace Step:
- Fixed issue #183: Added a simple log of the replacements.
- Fixed issue #362: Step for Terminology fixes on translation
candidates.
- Quality Check Step:
- Resolved issue #357: Added a function to detect blacklisted
terms.
- Improved ITS LQI support.
- Filters:
- IDML Filter:
- Implemented issue #356: By default, spreads above the size
threshold now cause an error. A new option allows skipping them
without an error.
- XML Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #361: MIME type can be different in sub-classes of
XMLFilter.
- HTML5-ITS Filter:
- Continued implementation of ITS 2.0.
- JSON Filter:
- Resolved issue #360: The use of the key for the resname value
is now optional.
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #364: The Woodstox XML parser is now always used.
- <alt-trans> elements with an empty target are now skipped.
- Added support for the <tool> and <phase> elements as well as
the state-qualifier attribute.
- OpenXML Filter:
- Fixed issue #291: Sub-documents are now processed in correct
order.
- Fixed issue #319: The 'squishable' tests have been changed.
- Simplification Filter:
- Fixed issue #355: The parameters of the sub-filter are now read
properly when the primary filter uses a sub-filter.
- XINI Filter:
- Fixed a case where placeholders were being renumbered
incorrectly when reading a XINI file.
- Library:
- Verification Library:
- Fixed an issue with non-initialized start/end variables when
checking patterns from the target.
- Added support for sub-documents in the Quality Checker library.
- Continued implementation of ITS 2.0 in XLIFFWriter,
XLIFFContent, etc.
- Fixed issue #352: XMLWriter now throws OkapiIOException if an
error occurs.
- Updated XLIFF Writer to match ITS/XLIFF official mapping (http://www.w3.org/International/its/wiki/XLIFF_1.2_Mapping).
- Added the experimental lib-concurrent package to improve
multi-threaded pipelines. See the ThreadedWorkQueue Step page for
details.
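The Encoding Conversion Step fix for issue #318 un-escapes ASCII characters written as numeric character references (NCRs), except for the five markup-sensitive characters ", ', &, < and >. A minimal Java sketch of that rule follows, assuming decimal (&#65;) and hexadecimal (&#x41;) references in a plain string; the class AsciiNcrUnescaper is hypothetical and is not the step's own code.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the issue #318 rule; not Okapi code. */
final class AsciiNcrUnescaper {
    // Decimal (&#65;) or hexadecimal (&#x41;) numeric character references.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));");
    // Characters left escaped because they are significant in markup.
    private static final Set<Integer> KEEP_ESCAPED =
            Set.of((int) '"', (int) '\'', (int) '&', (int) '<', (int) '>');

    private AsciiNcrUnescaper() {}

    static String unescapeAscii(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // value too large to parse: keep the reference as-is
            }
            // Only ASCII (below 0x80) outside the protected set is un-escaped.
            String replacement = (cp >= 0 && cp < 0x80 && !KEEP_ESCAPED.contains(cp))
                    ? String.valueOf((char) cp)
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, "&#65;&#x42;" becomes "AB", while "&#38;" and non-ASCII references such as "&#233;" are left untouched.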
Changes from M21 to M22
- Tikal:
- Made it possible to run tikal.sh from another directory on Mac
OS X.
- Updated the way the application root folder is computed to
allow calls from a network share.
- Rainbow:
- Fixed the -log option so that it can be placed anywhere on the
command line.
- Filters:
- Table Filter:
- Fixed issue #300 (enhancement): Added a new Table Filter
for 2-column (source + target), tab separated files.
- OpenXML Filter:
- Fixed issue #166: Text from mc:Fallback and mc:Choice
Requires="wps", WordArt, TextArt, and Watermarks is now handled
properly.
- Fixed issue #169: Segmentation around inline codes seems
to work properly.
- Fixed issue #286: PPTX smart-tags are now imported.
- Fixed issue #323: Files are not corrupted anymore when
using text areas.
- Fixed issue #324: Nested <w:p> elements now merge properly.
- Fixed issue #325: The slides of PPTX documents are now
extracted in order.
- Fixed issue #329: Text from PPTX diagrams is now extracted.
- Fixed issue #351: Creating XLIFF now works for documents with
SmartArt graphics.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #332: When using the global_cdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Fixed issue #339: The filter now groups the tags back properly
when merging.
- Added handling of variable placeholders for the
pre-defined settings for RESX files.
- ITS Filters (XML Filter and HTML5+ITS Filter):
- TMX Filter:
- Fixed the issue where <it> codes were mapped to placeholders
rather than to opening/closing internal codes.
- XLIFF Filter:
- Continued implementation of ITS 2.0: Improved support for
LQI, added support for Provenance.
- Simplification Filter:
- General:
- Filters that update language properties (like xml:lang)
during merging will now be region-insensitive when doing so.
- Steps:
- Term Extraction Step:
- Added support for Text Analysis annotations.
- Made each of the three extraction methods an option, and
attached the relevant options to the statistical method.
- Full-Width Conversion Step:
- Added log message if at least one character was modified
(per input file). This resolves issue #327.
- Enrycher Step:
- Improved handling of nested annotations.
- Batch TM Leveraging Step:
- Fixed issue #331: Entries with no text are no longer sent
for translation.
- Format Conversion Step:
- Fixed the issue where the "Output generic inline codes" was
not recognized for the Tab-delimited table output.
- MS Batch Translation Step:
- MT candidates with a very low score (e.g., resulting from an
error) are no longer output in the TMX.
- Space Checker Step:
- Improved reporting of errors and changes.
- Fixed issue #346: Iterating through text fragments ran out
of bounds. Indexing error was fixed.
- Fixed issue #348: Inline code index markers were broken as a
result of spacing changes. The index marker error was fixed.
- Translation Comparison Step:
- Consolidated Paragraph Alignment and Sentence Alignment steps
- Connectors:
- Microsoft MT Connector:
- Improved error handling (e.g. problem with inline codes in
result).
- Filters Plugin for OmegaT:
- Added *.xliff and *.sdlxliff as default extensions.
- Changed default for isFileSupported() to return true (this
allows user-defined extensions).
- Libraries:
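The General change in this release makes filters that update language properties (like xml:lang) during merging region-insensitive. One possible reading, assumed here, is that a document value such as "en-GB" is still recognized as the source language "en-US" because only the primary language subtag is compared. The Java sketch below shows that reading; the class RegionInsensitiveLang and its methods are hypothetical, and because it ignores everything after the first subtag it would also treat script variants such as zh-Hans and zh-Hant as the same language.

```java
import java.util.Locale;

/** Hypothetical sketch of region-insensitive language matching; not Okapi code. */
final class RegionInsensitiveLang {
    private RegionInsensitiveLang() {}

    /** Primary language subtag, lower-cased: "en" for "en-US" or "EN_gb". */
    static String primaryLanguage(String tag) {
        String normalized = tag.trim().replace('_', '-');
        int dash = normalized.indexOf('-');
        String primary = (dash < 0) ? normalized : normalized.substring(0, dash);
        return primary.toLowerCase(Locale.ROOT);
    }

    /** True when both tags share the same primary language, whatever the region. */
    static boolean sameLanguage(String a, String b) {
        return primaryLanguage(a).equals(primaryLanguage(b));
    }

    /**
     * Value to write back for a language property during merge (assumed behavior):
     * the target tag if the document value matches the source language ignoring
     * region, otherwise the document value unchanged.
     */
    static String updateLanguageValue(String docValue, String sourceLang, String targetLang) {
        return sameLanguage(docValue, sourceLang) ? targetLang : docValue;
    }
}
```

Under this assumption, merging an en-US to fr-FR project would rewrite xml:lang="en-GB" to "fr-FR" but leave xml:lang="de" alone.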
Changes from M20 to M21
- All applications:
- Applications now launch correctly on Mac OS X when they are
located in a path containing a space.
- Rainbow:
- Added the -log option to specify the result log file. By default
the log file is {user.home}\rainbowBatchLog.txt.
- Tikal:
- Added the -safe option to prompt the user before overwriting a
directory when extracting.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Added support for the okp:engine attribute in <alt-trans>.
- Wiki Filter:
- Fixed issue #315: WikiFilter didn't work with
preserve_whitespace: true.
- Regex Filter:
- Improved the macStrings default settings to include slash-star
(/* */) comments with the next extracted string.
- IDML Filter:
- Fixed issue #316: Added default to not extract hidden
layers and added the option "Extract hidden layers".
- Enabled the option "Create new paragraphs on hard
returns".
Important: This option is still BETA and may prevent you from
merging back the extracted file. Make sure to test the
round-trip before using this option for real projects.
- TMX Filter:
- Fixed a bug where attribute values on <tuv> elements
were being written back to the skeleton without proper
escaping.
- Improved filter performance.
- Properties Filter:
- Fixed issue #313 where the extended characters were not
escaped when using the sub-filter.
- TS Filter:
- Changed the instantiation of the XML parser to use
Woodstox.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #303: When using the global_pcdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Steps:
- Microsoft Batch Translation Step:
- Added support for the ${domain} variable for the category.
- Added support for ${rootDir} and other variables for the path
of the Engine mapping file.
- Quality Check Step:
- Added an option to choose whether to save the session. This
option is not accessible when editing the parameters from
CheckMate, only when editing a step's parameters.
- Rainbow Translation Kit Creation Step:
- Added an option to XLIFF outputs to write ctype and equiv-text
attributes for inline codes.
- Added output of ctype and equiv-text attributes for inline
codes to the OmegaT output.
- Added the option to merge a new OmegaT translation kit
with an existing one, rather than overwriting it.
- Enrycher Step:
- Added support for segmented text units and implemented
handling of inline codes.
- Added parameter for number of events to process on each
call to the service.
- Translation Comparison Step:
- Added an option to log the average scores per document in a
tab-delimited file.
- Added the output of a new tab-delimited file with all
scores, along with the HTML report.
- Extended the distribution table to use 11 brackets instead
of 3, and to include both scores.
- Segmentation and Desegmentation Steps:
- Added the option to renumber code IDs after segmentation
so that they are 1-indexed as much as possible. A
corresponding option on the desegmentation step reverses the
process. This option will not work correctly with formats
that use non-consecutive or non-numeric code IDs, such as
XLIFF.
- Connectors:
- Microsoft Translator Connector:
- Added information about the engine in the query results.
- Libraries:
- Continued implementation of ITS 2.0 in XLIFF Writer.
- Changed the option settings for the XLIFFWriter class to use an
object rather than multiple setters.
- Filters Plugin for OmegaT:
- Fixed issue #322: Updated the TS filter to use the Woodstox
parser, and added the dependencies.
- Added the XLIFF Filter to the plug-in.
- Added basic support for some ITS data categories in the
Comments pane (Text Analysis, Terminology).
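The Segmentation and Desegmentation option added in this release renumbers code IDs after segmentation so that they are 1-indexed as much as possible, and reverses the process afterwards. One simple way to get that behavior, assumed here purely for illustration, is to shift every integer ID in a segment so the smallest becomes 1 and to store the offset for the reverse step. The class CodeIdRenumbering and its methods are hypothetical, and the actual steps may use a different scheme. The sketch also shows why "as much as possible" applies: gaps between IDs are preserved, and non-numeric IDs cannot be shifted at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical offset-based renumbering of integer code IDs; not Okapi code. */
final class CodeIdRenumbering {
    private CodeIdRenumbering() {}

    /** Offset that makes the smallest ID in the segment 1 (0 if there are no codes). */
    static int offsetFor(List<Integer> ids) {
        return ids.isEmpty() ? 0 : Collections.min(ids) - 1;
    }

    /** Segmentation side: shift every ID down by the offset. */
    static List<Integer> renumber(List<Integer> ids, int offset) {
        List<Integer> out = new ArrayList<>(ids.size());
        for (int id : ids) {
            out.add(id - offset);
        }
        return out;
    }

    /** Desegmentation side: shift every ID back up by the stored offset. */
    static List<Integer> restore(List<Integer> ids, int offset) {
        return renumber(ids, -offset);
    }
}
```

For a segment with code IDs 4, 5 and 7, the offset is 3 and the renumbered IDs are 1, 2 and 4; restoring with the same offset returns 4, 5 and 7.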
Changes from M19 to M20
- Rainbow:
- Improved the logging output and UI responsiveness during
lengthy processes.
- Updated the user preferences dialog to allow the selection of
the log levels as defined by SLF4J (Normal, Debug, Trace).
- Rainbow's input root directory now supports expansion of
system environment variables.
- Tikal:
- Fixed -lfc command output.
- Added the -continue option to specify that batch operations
should continue processing even if one or more files in the
batch fail to process.
- Summary information will be included at the end of batch
commands.
- Timing information is included for each file processed, and
total elapsed time is included in the batch summary.
- Added the -pd option to specify a directory to search for
custom filter configurations.
- Fixed a crash when merging (-m) a file with no extension.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- Continued the implementation of ITS 2.0
- Fixed issue where the HTML-type special characters were
not escaped when converted to inline codes by the code
finder.
- Fixed issue #311 where preserve space property was not
applied to attributes.
- TXML Filter:
- Fixed issue #266: Translations in the <revisions> elements are
now ignored.
- XLIFF Filter:
- Improved support for <mrk> elements.
- Added support for several ITS features.
- Connectors:
- Microsoft Translator Connector:
- Fixed the internal conversion of the language code.
- Libraries:
- Fixed issue #282 for the Abstract Markup Filter.
- Fixed issues in reporting libraries.
- Added support for including ITS annotations in the XLIFF 1.2
writer.
- Steps:
- Rainbow Translation Kit Creation:
- Added the "Include post-processing hook" option for OmegaT
packages. This allows OmegaT to merge back the documents
automatically.
- Remove Target:
- Fixed issue #270 where the step could not be run without
some of the optional parameters set.
- Microsoft Batch Translation:
- Added an option to send the generated TMX document as a
raw document for the input of the next step.
- Added an option to point to a .properties file containing a
mapping of keys to categories for more convenient lookup.
See the wiki.
- Quality Check Step:
- Fixed issue #304 where the default check on parentheses
didn't include full-width characters.
- Inline Code Removal Step:
- Added option to replace line break related codes with
spaces. By default codes are simply removed.
- Added the Space Check Step. It automatically fixes the spacing
around inline codes in the target based on the source.
- Added the Cleanup Step. It normalizes quotation marks and
punctuation, removes suspect entries, etc. This can be used, for
example, when preparing an aligned document for MT training.
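Issue #304 extended the Quality Check Step's default parentheses check to include full-width characters. As an illustration only, a simplified source/target comparison that counts U+FF08 and U+FF09 as parentheses can be sketched in Java. The class ParenthesisCheck is hypothetical; the step's actual check may be implemented differently (for example, as configurable patterns), and this sketch only compares net counts rather than verifying nesting order.

```java
/** Hypothetical parenthesis comparison including full-width forms; not Okapi code. */
final class ParenthesisCheck {
    private ParenthesisCheck() {}

    /** Opening minus closing parentheses, counting full-width forms like ASCII ones. */
    static int balance(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(' || c == '\uFF08') {        // ASCII or FULLWIDTH LEFT PARENTHESIS
                depth++;
            } else if (c == ')' || c == '\uFF09') { // ASCII or FULLWIDTH RIGHT PARENTHESIS
                depth--;
            }
        }
        return depth;
    }

    /** Flags a target whose parenthesis balance differs from the source's. */
    static boolean hasMismatch(String source, String target) {
        return balance(source) != balance(target);
    }
}
```

With full-width forms counted, a Japanese or Chinese target that uses full-width parentheses for an ASCII pair in the source is no longer reported as a mismatch, while a target that drops a closing parenthesis still is.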