Okapi Framework Changes Log - Aug-20-2017
Note: this document is common to both the okapi-lib distribution
and the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from M33 to M34
- Filters:
- IDML Filter:
- Complete rewrite of the filter, fixing many problems and changing
several behaviors. Warning: files extracted with previous
versions of Okapi will no longer merge successfully due to changes
in text units and skeleton processing. Resolves issues #544,
#497, #302, #378, #293, #259, #245, #471, and #317. Additionally
deprecates the filter configuration options "Maximum Spread Size",
"Generate an error when a spread is larger than the specified
value", and "Create new text units on hard returns".
- ITS Filter:
- The separator for the annotatorsRef values is now '\s'
(white-space) rather than ' ' (ASCII space). Resolves issue #612.
- JSON Filter:
- Fixed issue #410: add an option to toggle whether or not '/' is
escaped in output.
- OpenXML Filter:
- Fixed issue #503: chart content will now be exposed for
translation in Excel and Powerpoint files.
- Fixed issue #614: fixed a case where text with different styles
could be incorrectly merged into a single run.
- XLIFF Filter:
- Fixed issue #110: CDATA sections that appear in markup will be
preserved.
- Steps:
- Terminology Leveraging:
- Added a step to annotate the source and target content with
terminology information leveraged from a term base.
- XML Validation:
- Using the DTDs found in the input documents is now an option
(enabled by default).
- Connectors:
- ModernMT API Connector:
- Google MT:
- Improve retry behavior, particularly when exceeding the API rate
limits.
- Tikal:
- You can now use the -mmt option to use the ModernMT API Connector.
- Bugs in the usage of the -lingo24 options have been fixed.
- Libraries:
- Core:
- Fixed how CDATA is preserved when in the skeleton. Resolves issue
#624.
- Quality Check:
- Added an option to show only relative paths in the Quality Check
report. Resolves issue #616.
- Improved how some issues with localizable items were reported.
- General:
- The XLIFF 2 library has been upgraded to version 1.1.6.
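The annotatorsRef separator change above can be illustrated with a small parser. This is a minimal sketch, not Okapi's ITS code: it assumes the ITS 2.0 value syntax (a white-space separated list of dataCategory|IRI references), and the class name AnnotatorsRefParser is hypothetical. Splitting on \s+ rather than on a single ASCII space is what lets tabs and line breaks act as separators.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Okapi API.
class AnnotatorsRefParser {

    // Parses an ITS 2.0 annotatorsRef value into dataCategory -> IRI pairs.
    // Any run of white-space (space, tab, CR, LF) separates two references.
    static Map<String, String> parse(String value) {
        Map<String, String> refs = new LinkedHashMap<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return refs;
        }
        for (String token : trimmed.split("\\s+")) {
            int bar = token.indexOf('|');
            if (bar <= 0) {
                continue; // this sketch silently skips malformed references
            }
            refs.put(token.substring(0, bar), token.substring(bar + 1));
        }
        return refs;
    }

    public static void main(String[] args) {
        String value = "mt-confidence|http://example.com/mt\n\tterminology|http://example.com/term";
        System.out.println(parse(value));
        // prints {mt-confidence=http://example.com/mt, terminology=http://example.com/term}
    }
}
```

With a split on ' ' only, the newline and tab in the sample value would stay glued to the first IRI, which is the kind of problem the separator change addresses.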
Changes from M32 to M33
- CheckMate:
- Updated LanguageTool integration to support the new JSON API (Issue
#582).
Note: As a result of this change, CheckMate will no longer work
with LanguageTool version 3.3 or earlier. For more information, see
the LanguageTool API Migration page.
- Tikal:
- You can now process several files at once with the -xm and -lm
commands. Note that in that case you cannot use the -to and -from
options. This resolves issue #598.
- Filters:
- Markdown Filter:
- There is a new filter for Markdown (.md) files.
- XLIFF Filter:
- Fixed missing prefix on output for some prefixed XLIFF inline
elements.
- Improved support for the equiv-text attribute.
- Issue #466: Add a new skipNoMrkSegSource option that will cause
trans-units containing seg-source but no internal mrk data to be
treated as skeleton. This behavior (which is consistent with SDL
Studio) is enabled in the okf_xliff-sdl configuration, but disabled
by default in okf_xliff.
- Issue #551: XLIFF files containing entities that are invalid in
XML 1.0 will no longer break the filter. (These entities, and their
corresponding characters, will be stripped before parsing.)
- Issue #602: the maxwidth, maxheight, and size-unit attributes are
now parsed and exposed as resource properties. Updates to these
properties will be reflected in the merged XLIFF file.
- XML Filter:
- The handling of files with UTF-16LE and UTF-16BE declarations has
been improved. Output to these encodings is treated as an output to
UTF-16 with a BOM.
- Added the inlineCdata option, which will cause CDATA markup to
appear as inline codes rather than being stripped.
- OpenXML Filter:
- Support for .dotx, .dotm, .ppsx, .ppsm, .potx, .potm, .xltx, and
.xltm files has been added.
- Fix a bug where certain XLSX files would cause an infinite loop in
parsing.
- Fix a bug where certain XLSX files would fail to extract.
- Fix a bug where paragraph spacing properties were incorrectly
stripped when using "aggressive" cleanup mode.
- PO Filter:
- Issue #584: Add "Include msgctxt in note" option to
include context data in trans-unit notes.
- JSON Filter:
- A new option to not have the leading slash in the full key path
has been added. This resolves issue #603.
- lib-verification:
- Updated the LanguageTool integration to support the new JSON API, as
described above.
- Updated the BlacklistChecker to do case-sensitive validation if the
blacklist terms are identical except for the case.
- Connectors:
- SimpleTM:
- Fixed the setPenalizeSourceWithDifferentCodes() method so it sets
the value correctly.
- Google MT:
- Update the connector to support NMT models, when available.
- General:
- The code is now under Apache License version 2.0.
- The XLIFFWriter class now supports serializing the maxwidth,
maxheight, and size-unit attributes on <group> and <trans-unit>
elements, by attaching the corresponding
net.sf.okapi.common.resource.Property key to the appropriate
resource.
- The displayText field of Code objects is now stored in a field on
the object, rather than in an annotation.
- The equals and compareTo methods on the TextFragment class will no
longer indicate equality when compared to non-TextFragment instances.
- Improved error message in GenericSkeletonWriter. This resolves
issue #593.
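Issue #551 above describes stripping entities that are invalid in XML 1.0 before parsing. The sketch below shows one way to do that for numeric character references, testing each referenced code point against the XML 1.0 Char production. It is an illustration under that assumption, not the XLIFF Filter's actual code; InvalidCharRefStripper is a hypothetical name, and the sketch does not handle raw (unescaped) invalid characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not the XLIFF Filter's implementation.
class InvalidCharRefStripper {
    // Decimal (&#65;) and hexadecimal (&#x41;) character references.
    private static final Pattern CHAR_REF = Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    // The XML 1.0 "Char" production.
    static boolean isXml10Char(long cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Removes references to code points not allowed in XML 1.0; keeps the others unchanged.
    static String strip(String text) {
        Matcher m = CHAR_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            long cp;
            try {
                cp = (m.group(1) != null)
                        ? Long.parseLong(m.group(1), 16)
                        : Long.parseLong(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // too many digits to be a real code point
            }
            String replacement = isXml10Char(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<source>A&#x1;B&#65;C&#0;</source>"));
        // prints <source>AB&#65;C</source>
    }
}
```

Doing this as a text pass before handing the document to the XML parser matters because a conforming parser reports a fatal error on such references rather than skipping them.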
Changes from M31 to M32
- Tikal:
- The -nocopy option is now respected when extracting using the -seg
option. This resolves issue #571.
- Filters:
- OpenXML Filter:
- Support for Microsoft Visio 2013+ (.vsdx/.vsdm) files has been
added.
- Added an option to ignore placeholder text in PowerPoint master
and layout slides.
- Fixed Issue #576: XLSX files missing certain optional style data
could crash the filter.
- ITS/XML Filter:
- Improved support for the ITS Target Pointer data category.
Caveats: target element/attribute must exist (empty), and must be
after the source; inline codes are not supported yet. Addresses
issue #574.
- HTML Filter:
- Updated the pre-defined filter configuration to allow title and dir
on all elements, as per the latest HTML specification.
- XLIFF Filter:
- Existing ctype values for most inline codes are now parsed and
stored as the type field on Code objects.
- Connectors:
- Microsoft Translation Hub:
- IMPORTANT: authentication changes. The connector has been
updated to support Azure-based Microsoft Translator subscriptions,
and the old (DataMarket) method of authentication has been removed.
The clientId and clientSecret parameters have
been removed and replaced with a single parameter called azureKey.
If you have an existing Microsoft Translator subscription, you must
migrate it to Azure by April 30, 2017. For information on how to do
this, see this Microsoft support article.
- General:
- IMPORTANT: The LocaleId class now uses ICU's ULocale. A few methods
have been deprecated and others may have slightly different
behavior, but this gives us good BCP-47 support going forward.
- Updated Windows EXEs with version 3.9 of Launch4j to solve the issue
of not being able to start with Java 8 with updates >= u100. This
resolves issue #575.
- Added a Maven profile named reports. It produces various reports
that might come in handy (clirr, findbugs, jdepend-maven-plugin,
versions). To run it:
mvn install -P reports
- Improved the build to include the creation of a DMG file and the
signing of the Mac distribution.
- Added a new class, net.sf.okapi.common.filters.FilterIterable,
implementing Iterable<Event>. You can now do:
for (Event event : new FilterIterable(filter)) {
    // process event
}
On Java 8 you can also pass a lambda or method reference:
new FilterIterable(filter).forEach(event -> {
    // process event
});
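The FilterIterable entry above relies on a common Java pattern: wrapping a source that hands out items through hasNext()/next() so that it can be used in a for-each loop or with forEach(). The sketch below shows that pattern in isolation. EventSource, SourceIterable, and ListSource are hypothetical names used for illustration, not Okapi classes, and plain Strings stand in for Okapi Event objects.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a filter: anything that hands out items via hasNext()/next().
interface EventSource<E> {
    boolean hasNext();
    E next();
}

// Adapts an EventSource to Iterable so it works with for-each and forEach().
// The source is consumed as it is read, so this Iterable is single-pass.
class SourceIterable<E> implements Iterable<E> {
    private final EventSource<E> source;

    SourceIterable(EventSource<E> source) {
        this.source = source;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public E next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                return source.next();
            }
        };
    }
}

// Simple in-memory source used only for this demonstration.
class ListSource<E> implements EventSource<E> {
    private final Iterator<E> items;

    ListSource(List<E> items) {
        this.items = items.iterator();
    }

    @Override
    public boolean hasNext() {
        return items.hasNext();
    }

    @Override
    public E next() {
        return items.next();
    }
}

class IterableDemo {
    public static void main(String[] args) {
        EventSource<String> source = new ListSource<>(List.of("START_DOCUMENT", "TEXT_UNIT", "END_DOCUMENT"));
        for (String event : new SourceIterable<>(source)) {
            System.out.println(event);
        }
    }
}
```

Because the wrapper only delegates, calling iterator() a second time continues from wherever the underlying source stopped; that single-pass behavior is a consequence of this sketch's design, not a documented property of FilterIterable.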
Changes from M30 to M31
- Filters:
- General:
- Fixed issue #519: When using the subfilter feature in several
filters (including JSON, YAML, ITS, and XML Stream), it is no longer
possible to produce multiple text units with the same ID.
- XLIFF Filter:
- Fixed the case of memoQ XLIFF files using <source> and <target>
elements in their own extensions. This resolves issue #547.
- JSON Filter:
- The "Use Full Key Path" option will no longer include the prefix
"null/" in all generated resource names. Instead, key paths will
begin with "/", as in "/foo/bar".
- Properties Filter:
- Added an option to use the key value as the text-unit ID value.
This addresses issue #520.
- OpenXML Filter:
- Fixed a case in which some hidden spreadsheet cells could still be
exposed for translation.
- Fixed cases where revision metadata unrelated to translatable text
could prevent filtering if the "Automatically Accept Revisions"
option was not checked.
- If the "Treat Tab as Character" option is enabled, the '\t'
character will now be replaced with a <tab> element on output.
- Added the tsComplexFieldDefinitionsToExtract parameter,
which allows users to specify which field codes should be
translated. By default, only HYPERLINK codes are translated,
matching the existing behavior.
- XML Filter:
- In the RESX pre-defined configuration: fixed the code-finder
expression for mustache codes, and added support for basic HTML.
This resolves issue #559.
- PDF Filter:
- New PDF filter. Extraction only.
- YAML Filter:
- Fixed issue #555: the YAML filter skipped a single quote at the
beginning of a string.
- Fixed issue #556: YAML filter should escape sequences before
passing to subfilters.
- CheckMate:
- Sessions with very large configurations can now be saved. This fix
produces session files (.qcs) that are not backward compatible with
previous versions of CheckMate if the size of the configuration is
larger than about 21K. The fix produces backward compatible files
under that size. This resolves issue #548.
- Rainbow:
- Fixed issue #559: Batch configurations will now map .mqxliff
files to the XLIFF Filter by default.
- Changed the default encodings of the UI to UTF-8 when running on a
Macintosh or on Unix/Linux.
- Connectors:
- Microsoft Translation Hub:
- The connector will now correctly handle batch queries when
provided with content in excess of 10 segments or 10,000 characters.
Callers should no longer need to do additional batching of their
own.
- The query(String) method will now send content for
translation as a raw string, without doing additional tag
processing. This brings the implementation in line with the javadoc,
and is intended to be used in cases when a caller wishes to send raw
HTML to Translation Hub. The query(TextFragment) method
will continue to replace inline codes with dummy HTML tags before
calling the API. A new batchQueryText(List<String>)
method has been added to perform batch queries of text without
additional processing, to complement the existing batchQuery(List<TextFragment>)
method.
- Improve error handling, including handling of invalid segments,
latency spikes, and the X-MS-Trans-Info diagnostic header.
- Content using the zh-hk locale will now be sent as
Traditional Chinese rather than Simplified Chinese.
- Ensured the language codes starting with sr-Cyrl (Serbian in
Cyrillic) are mapped properly to the Microsoft internal code.
- Ensured the language code in (the legacy code for Indonesian) is
mapped to id, as expected by the Microsoft internal code.
- OpenTran:
- Removed connector for the (defunct) OpenTran service.
- General:
- Added the ability to add ICU4J segmentation rules via the SRX
configuration option "icu4jBreakRules". Added an
okapi_default_icu4j.srx file with enhanced rules.
- Now using the Okapi XLIFF 2 library version 1.1.4.
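The Translation Hub entry above mentions batches limited to 10 segments or 10,000 characters. A minimal sketch of that kind of greedy splitting follows. The two limits come from the entry; the class name, measuring length with String.length() (UTF-16 code units), and the rule that an oversized single segment gets a batch of its own are assumptions of this sketch, not the connector's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the connector's implementation.
class BatchSplitter {
    static final int MAX_SEGMENTS = 10;
    static final int MAX_CHARS = 10_000;

    // Greedily packs segments, in order, into batches that respect both limits.
    static List<List<String>> split(List<String> segments) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int chars = 0;
        for (String segment : segments) {
            boolean tooMany = current.size() == MAX_SEGMENTS;
            boolean tooLong = !current.isEmpty() && chars + segment.length() > MAX_CHARS;
            if (tooMany || tooLong) {
                batches.add(current);
                current = new ArrayList<>();
                chars = 0;
            }
            // A single segment longer than MAX_CHARS still ends up alone in its own batch.
            current.add(segment);
            chars += segment.length();
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> segments = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            segments.add("Segment " + i);
        }
        for (List<String> batch : split(segments)) {
            System.out.println(batch.size()); // prints 10, 10, 5 on separate lines
        }
    }
}
```

Keeping segment order intact means the translated batches can simply be concatenated to line up with the input list.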
Changes from M29 to M30
- Filters:
- HTMLEncoder:
- Quote Mode Options Added: Added the ability to configure quote
escaping rules to filter configs: UNESCAPED, ALL,
NUMERIC_SINGLE_QUOTES, DOUBLE_QUOTES_ONLY.
- MIF Filter:
- Support FrameMaker 2015 files.
- TS Filter:
- The text units are now extracted with the flag to preserve
whitespace set.
- XLIFF Filter:
- Fixed issue #521: <phase> elements will no longer be reordered when
processing a file with the filter.
- Fixed issue #207: Added the "Preserve whitespace on 'default'"
option, which forces the filter to preserve whitespace when the
xml:space="default" attribute is present, as if the "preserve" value
was present instead.
- Fixed issue #539: Implemented support for the translate attribute
in <group> and <bin-unit>.
- XLIFF-2 Filter:
- Added an initial and experimental implementation of a filter for
XLIFF v2.0 files.
- HTML Filter:
- When merging in a right-to-left target language, the dir="rtl"
attribute will be added to the <html> element.
- HTML5 Filter:
- When merging in a right-to-left target language, the dir="rtl"
attribute will be added to the <html> element.
- ITS Filter:
- Fixed an issue where empty inline elements written with two tags,
such as <span></span>Text, were output incorrectly. (See the
contribution from the FREME project.)
- Improved the pre-defined rules for the Android Strings filter
configuration.
- OpenXML Filter:
- Support for .dotx, .dotm, .ppsx, .ppsm, .potx, .potm, and .xltm
files has been added.
- Fix issue #297: richly styled text in Excel spreadsheet cells will
now produce text units containing inline codes that represent
formatting.
- Improve handling of Smart Tags.
- The filter will no longer expose redundant copies of text stored
in "Alternate Content" blocks for translation. The filter will
expose the primary version of the text for translation and strip the
fallback content from the target document. Fallback content can be
regenerated, if necessary, by opening the document in Office.
- Fix issue #524: a bug that caused duplicate TextUnit IDs to be
generated in some cases involving nested content.
- Fix a bug that hid hyperlink URLs from translation.
- Fix issue #526: Word documents containing charts that contained
entities could be corrupted on merge.
- Exposed several new options in the Rainbow UI, including the
ability to treat tabs and line breaks as characters, the handling of
soft hyphens, and whether to automatically accept document
revisions.
- Fix issue #532: Worksheet names in Excel documents can now be
exposed for translation using the new "Translate Sheet Names"
option.
- Fix issue #533: Add an option to expose hyperlinks stored in .rels
files for translation.
- OpenOffice Filter:
- Formula results are no longer extracted for translation in ODS
files.
- Automatically-generated numbers are no longer extracted for
translation.
- Document metadata is now only extracted if the extractMetadata
configuration parameter is enabled. This parameter is enabled by
default.
- Subfilters:
- Fix issue #530 - JSON content could be corrupted when processed
with a subfilter.
- Fixed issues related to subfilters that call additional
subfilters.
- Steps:
- Id-Based Aligner Step:
- Add an option to allow alignment based on TextUnit ID, rather than
resource name. This is useful for aligning formats where no resource
name exists, but the TextUnit ordering is known to be stable between
files.
- Localizables Checker Step:
- Checks dates, times and numbers and flags text units where the
target instance is either missing or not localized properly.
- Language Tool Step:
- Enhanced to provide morphologically valid bilingual term and
blacklist term checking if LanguageTool supports a stemmer for the
locale. Otherwise, full-word comparisons are used.
- General:
- Fix Issue #523: XLIFFWriter will now preserve properties set on
empty targets.
- Changed XLIFF 2 library to version 1.1.1.
- Upgrade ICU4J to version 57.1.
- Split Quality Check step into independent steps: character, general,
length, patterns, inline codes etc.
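The HTMLEncoder quote modes listed above can be pictured with the sketch below. The enum values mirror the names in the entry, but the exact escaping each mode performs here is a reading of those names (for example, that NUMERIC_SINGLE_QUOTES writes &#39; because &apos; is not defined in HTML 4), not documented Okapi behavior. QuoteMode and QuoteEscaper are hypothetical names.

```java
// Hypothetical mirror of the quote-mode names; semantics are assumed from the names.
enum QuoteMode { UNESCAPED, ALL, NUMERIC_SINGLE_QUOTES, DOUBLE_QUOTES_ONLY }

class QuoteEscaper {
    static String escapeQuotes(String text, QuoteMode mode) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' && mode != QuoteMode.UNESCAPED) {
                sb.append("&quot;");   // every mode except UNESCAPED escapes double quotes
            } else if (c == '\'' && mode == QuoteMode.ALL) {
                sb.append("&apos;");   // named entity for the single quote
            } else if (c == '\'' && mode == QuoteMode.NUMERIC_SINGLE_QUOTES) {
                sb.append("&#39;");    // numeric reference for the single quote
            } else {
                sb.append(c);          // everything else passes through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (QuoteMode mode : QuoteMode.values()) {
            System.out.println(mode + ": " + escapeQuotes("say \"it's\"", mode));
        }
    }
}
```

In this reading, DOUBLE_QUOTES_ONLY is enough for attribute values delimited by double quotes, while the single-quote modes matter when output may end up inside single-quoted attributes.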
Changes from M28 to M29
- Tikal:
- The scoping report option now outputs character counts in addition
to word counts by default.
- Filters:
- IDML Filter:
- Fixed a concurrency issue that could cause crashes when multiple
instances of the filter were used simultaneously.
- OpenXML Filter:
- The way formatting information is converted to codes has
changed. The filter will now attempt to streamline code generation
by considering whether the formatting applied to a text run can be
considered a "nested" format within the existing formatting. For
example, a bold, italic run would be considered "nested" within a
bold run. This allows for a more natural code mapping that should
be more intuitive for translators, and is also more closely
aligned with other tools.
- Style inheritance is now considered when calculating the
formatting in effect for a run of text.
- Right-to-left (RTL) support has been added for paragraphs, table
content in DOCX files and some DrawingML constructs.
- Fixed issue #486. Simple and complex fields are now represented
as a single code for the entire field.
- Fixed issue #487. Runs that differ only in script specified for
non-overlapping codepoint ranges can now be merged. This reduces
the number of inline codes produced in some cases.
- Fixed issue #502. Cells that are in rows and columns that are
hidden will no longer be exposed for translation by default. This
brings the behavior of the Excel filter into alignment with the
behavior of the other OpenXML filters. A new option, "Translate
Hidden Rows and Columns", has been added to the configuration for
the Excel portion of the OpenXML filter.
- The "Clean Tags Aggressively" option will now strip
<w:bCs> and <w:szCs> tags from Word documents.
- Fixed a crash that could occur when parsing files with enormous
attribute values.
- The non-breaking hyphen is now converted to a character, rather
than treated as a tag.
- ITS Filter:
- Added type for text units coming from attributes (value:
x-<attribute-name>).
- Table Filter:
- Fixed issue #511: now empty targets with delimiters are merged
properly.
- TXML Filter:
- Fixed issue #501, where segment elements commented out were
deleted from the output file.
- XLIFF Filter:
- Fixed issue #500, where alt-trans proposals with a match-quality
score in decimal form ("100.00") were treated as having a score of 0.
- Added support to change sdlxliff original attribute values based
on okf_xliff-sdl filter configuration. conf and locked attributes
are also supported.
- Libraries:
- XLIFFWriter:
- Added support for state-qualifier output in the main <target>.
- Pensieve:
- IMPORTANT: Code.codesToString() changes. The Pensieve TM format has
changed and is not backward compatible. You will need to export your
TMs and re-import them with M29.
- Steps:
- Added character count steps:
- The Character Count step calculates character counts per the
GMX-V 2.0 standard and stores them in a Metrics annotation (like
the Word Count step). There are also steps for counting all GMX
non-translatable categories (ProtectedCharacterCount, etc.) and
Okapi categories (Concordance, FuzzyMatch, MT, etc.).
- GMX "-Only" word count steps:
- The AlphanumericOnly, NumericOnly, and MeasurementOnly word
count steps now follow the GMX standard in that they only give
non-zero counts for TUs that consist solely of tokens of the
relevant type. (Previously they merely counted relevant tokens.)
- Translation Comparison step:
- Added an option to use the target of the alt-trans element for a
given origin value when processing an XLIFF file as the second file.
This allows comparing an MT candidate placed as an alt-trans entry
with the actual translation in the main target element.
- Scoping Report step:
- The Scoping Report step now can report character counts when the
relevant annotations are present. Use both the Word Count and
Character Count steps to get full detail. The default template has
been updated to include character counts for the included
categories.
- Post-segmentation Inline Codes Removal Step:
- Added step that attempts to simplify (trim and merge) as many
inline codes as possible by looking at each linguistically
distinct segment in a TextUnit.
- Connectors:
- KantanMT Support:
- Added a new connector to support KantanMT.
- Microsoft Translation Hub:
- Fixed an issue when working with trained engines with certain
target languages.
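The GMX "-Only" change above is easiest to see side by side. In the sketch below, tokens are split on white-space and a numeric token is a run of digits with optional ',' or '.' groups; both are simplifications assumed for illustration, not the GMX-V tokenization rules, and NumericOnlyCount is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not Okapi's word-count step.
class NumericOnlyCount {
    // Simplified numeric token: digits with optional '.' or ',' groups, e.g. 42, 1,000, 2.5
    private static final Pattern NUMERIC = Pattern.compile("\\d+(?:[.,]\\d+)*");

    private static String[] tokens(String text) {
        String t = text.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Previous behavior: count numeric tokens wherever they appear.
    static int countNumericTokens(String text) {
        int n = 0;
        for (String tok : tokens(text)) {
            if (NUMERIC.matcher(tok).matches()) {
                n++;
            }
        }
        return n;
    }

    // GMX "-Only" behavior: non-zero only if every token in the text unit is numeric.
    static int numericOnly(String text) {
        String[] toks = tokens(text);
        for (String tok : toks) {
            if (!NUMERIC.matcher(tok).matches()) {
                return 0;
            }
        }
        return toks.length;
    }

    public static void main(String[] args) {
        String mixed = "Chapter 3 of 12";
        System.out.println(countNumericTokens(mixed)); // 2
        System.out.println(numericOnly(mixed));        // 0
        System.out.println(numericOnly("1,000 2.5"));  // 2
    }
}
```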
Changes from M27 to M28
- Tikal:
- Fixed issue #444: Now the -imp command can use the -approved
option to import only approved entries.
- Rainbow:
- Fixed issue #464: Exported batch configurations should now
include default filter mappings for all known extensions.
- Files with the .sdlxliff extension will now use okf_xliff-sdl
filter configuration by default.
- Fixed help page for command-line -?
- Library:
- Fixed issue #442 where the blacklist for the term checker was
not working for Japanese.
- Fixed issue #439: Quality issues based on patterns containing
newline characters did not display correctly in CheckMate.
- Fixed issue in the verification library where the check for
leading and trailing whitespace did not take into account empty
string, causing an index out of range error.
- Fixed issue #456: SRX issue with empty "beforebreak" in rule.
- Changed SWT libraries to version 4.4
- Filters:
- OpenXML Filter:
- Fixed issue #165: Strings in Excel files are now extracted in a more
logical order (sheet-by-sheet, one row at a time, ordered by
columns). Additionally, strings that appear multiple times are now
exposed for translation once for each occurrence, and may be
translated independently. Warning: these changes may create problems
merging translated content from Excel files that were processed with
previous versions of Okapi.
- Fixed issue #338: the "Exclude color" configuration option is
now correctly applied to Excel files. This option works for cell
background colors only. The colors available in the Rainbow
filter configuration UI are aligned against the Excel 2011
"standard colors"; additional RGB values may be excluded by
editing your filter configuration file directly.
- Fixed issue #390: the "Exclude column" configuration option is
now correctly applied to Excel files.
- Fixed issue #440: markers for spelling and grammar errors are
now stripped when exposing text for translation.
- Fixed issue #441: added new options to expose line breaks and
tabs for translation as literal characters.
- Fixed issue #443: added the "Exclude Graphical Metadata" option to
prevent metadata associated with graphics and textboxes from being
exposed for translation.
- Fixed issue #447: the "Translate Document Properties" option
will now work for PowerPoint files.
- Fixed issue #448: the "Translate Comments" option will now
work for PowerPoint files.
- Fixed issue #449: the "Translate Slide Masters" option will
now also expose master layout content that is used by slides in
the document.
- Fixed issue #451: the "Translate Document Properties" option
will now work for Excel files.
- Fixed issue #452: Word files containing nested graphicData
sections are no longer corrupted during processing.
- Fixed issue #453: Word files that do not contain a word/styles.xml
part can now be processed.
- Fixed issue #454: entities occurring in alternateContent
sections of Word documents are now handled correctly.
- Fixed issue #457: empty lines in Word files are sometimes
stripped.
- Fixed issue #458: target text is lost in complicated run
structures
- Fixed issue #467: tabs in Word files are sometimes stripped.
- Fixed issue #473: deletion change tracking can cause target
corruption.
- Fixed issue #474: files containing formulas could become
corrupted.
- Fixed issue #476: insertion change tracking can cause target
corruption.
- Fixed issue #482: <w:lang> tags are now stripped during extraction.
- Fixed issue #484: Added a "Clean Tags Aggressively" option to
the filter. When this option is enabled, the filter will strip
certain types of formatting markup (whitespace and vertical
alignment adjustment) that is spuriously inserted when
converting other formats (such as PDF) to OpenXML. This produces
smoother segmentation in some cases.
- Fixed issue #485: Strip machine-generated "_GoBack" bookmarks
from Word files.
- ICML Filter:
- Added the ICML Filter for WCML files.
- HTML Filter:
- Added support for ASPX comments and fixed extra output for tag-like
attribute values.
- PO Filter:
- A Plural-Forms: header that declares nplurals=1 is now handled
correctly.
- Table Filter:
- Blank lines inside qualified CSV cells are now preserved.
- CSV text qualifiers can now be optionally added on output when
required to maintain well-formedness.
- Steps:
- Rainbow Translation Kit Creation:
- Improved support for XLIFF 2 packages
- Rainbow Translation Kit Merging:
- Improved support for XLIFF 2 packages
- Text Rewriting step:
- Fixed the case where the target had only inline codes and the
source had both text and inline codes. Now the base text is taken
from the source.
- Now expansion is done before the last inline code.
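The Table Filter entries above concern CSV text qualifiers. The sketch below shows the usual rule for adding a qualifier only when a cell needs one (it contains the delimiter, the qualifier, or a line break), with embedded qualifiers doubled as in RFC 4180. Whether Okapi applies exactly this rule is an assumption, and CsvQualifier is a hypothetical name.

```java
// Hypothetical helper illustrating RFC 4180-style qualification.
class CsvQualifier {
    static String qualifyIfNeeded(String cell, char delimiter, char qualifier) {
        boolean needsQualifier = cell.indexOf(delimiter) >= 0
                || cell.indexOf(qualifier) >= 0
                || cell.indexOf('\n') >= 0
                || cell.indexOf('\r') >= 0;
        if (!needsQualifier) {
            return cell;
        }
        String q = String.valueOf(qualifier);
        // Double any embedded qualifier, then wrap the whole cell in qualifiers.
        return q + cell.replace(q, q + q) + q;
    }

    public static void main(String[] args) {
        System.out.println(qualifyIfNeeded("plain text", ',', '"'));
        System.out.println(qualifyIfNeeded("Paris, France", ',', '"'));
        System.out.println(qualifyIfNeeded("He said \"hi\"", ',', '"'));
        System.out.println(qualifyIfNeeded("line 1\n\nline 3", ',', '"'));
    }
}
```

The last call shows why blank lines inside a cell require a qualifier: without one, a CSV reader would treat the embedded line breaks as record boundaries.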
Changes from M26 to M27
- General:
- Fixed resource and memory leaks.
- Filters:
- XLIFF Filter:
- Added a warning when an inline code (other than mrk) has no id
attribute.
- Fixed the location of <phase-group> when re-writing.
- Fixed case where XML declaration was not followed by a
line-break on output.
- Added a fallback check for TMX values for the <it> pos attribute
(an error/warning is still generated, as using TMX values in XLIFF is
not valid).
- Added better support for SDLXLIFF.
- Added optional parameters for writing out the tool element in the
XLIFF header.
- Fixed issue #430 where the ITS namespace declaration and
version was not added when needed.
- HTML Filter:
- Added the placeholder attribute to the list of translatable
attributes in the default HTML configuration (for HTML5).
- Fixed lowercasing of start tags during pre-processing cleanup.
- Upgrade to Jericho 3.4-dev
- Steps:
- Rainbow Translation Kit Creation Step:
- Updated XLIFF2 library to 1.0 release.
- Implemented v2 support for the Transifex packages.
- Rainbow Translation Kit Post-Processing Step:
- Implemented v2 support for the Transifex packages.
- Connectors:
- MyMemory Connector:
- Fixed the issue of the return match value being sometimes a Double
and sometimes a Long.
- Made connectors more error-tolerant: processing now continues if
there is an exception on a single text unit.
- Library:
- XLIFF Writer:
- Added support to output the coord attribute (COORDINATES property
on the text container).
- Transifex library:
- Fixed issue #427 where the API v2 was not supported.
- Segmentation library:
- Fixed issue #426 where the part of the text matched by the
previous rule was not scanned for match in the next rule.
- Fixed issue #489: Added the okp:treatIsolatedCodesAsWhitespace
option to allow the segmenter to treat each isolated code as a
single whitespace character when applying segmentation rules.
- Verification library:
- Fixed issue #418: the description of the rule is now displayed
for target-driven error.
- Improved reading of LQI entries: the ITS type is preserved
when reading the okp:lqiType value.
- Fixed issue #442: Allow flagging blacklist terms in
substrings.
- Fixed issue #400: Allow flagging blacklist terms in source.
- Parameters editor for Verification library:
- Fixed issue #417: The description of each pattern is now
preserved when re-ordering the patterns.
- Fixed issue #442: Add option to allow flagging blacklist terms
in substrings.
- Fixed issue #400: Add option to allow flagging blacklist terms
in source.
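The okp:treatIsolatedCodesAsWhitespace entry above can be illustrated with a toy sentence-break rule. In this sketch an isolated inline code is represented by a single placeholder character (U+E000, chosen arbitrarily; Okapi's own coded-text markers are not shown), and the break rule is a plain regular expression rather than SRX. The class name IsolatedCodeSegmenter is hypothetical. Because each code is replaced by exactly one space, break offsets found in the modified string are also valid offsets in the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the Okapi segmentation library.
class IsolatedCodeSegmenter {
    // Placeholder standing for one isolated inline code (assumption for this sketch).
    static final char ISOLATED_CODE = '\uE000';

    // Toy rule: break after '.', '!' or '?' when white-space follows.
    private static final Pattern BREAK = Pattern.compile("[.!?](?=\\s)");

    // Returns the offsets just after each break point.
    static List<Integer> breakPositions(String text, boolean codesAsWhitespace) {
        // One-for-one substitution keeps offsets aligned with the original text.
        String view = codesAsWhitespace ? text.replace(ISOLATED_CODE, ' ') : text;
        List<Integer> positions = new ArrayList<>();
        Matcher m = BREAK.matcher(view);
        while (m.find()) {
            positions.add(m.end());
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "First sentence." + ISOLATED_CODE + "Second sentence.";
        System.out.println(breakPositions(text, false)); // []
        System.out.println(breakPositions(text, true));  // [15]
    }
}
```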
Changes from M25 to M26
- Rainbow:
- Added the .mqxliff extension to the list of extensions associated
with the XLIFF Filter.
- Tikal:
- Fixed an issue preventing custom filter configurations from working
as sub-filters.
- Added the ability to output scoping reports.
- Filters:
- Fixed issue #409: Inconsistent handling of <bx pos="begin"/> in
extraction to Moses inline format.
- XLIFF Filter:
- Added support for <sub> elements (plain text, with nested codes, or
with nested codes with nested sub-flows).
- XMLStream Filter:
- Added the .ditamap extension to the list of extensions for the
DITA pre-defined configuration.
- Steps:
- LanguageTool Step:
- Resolved issue #416 (added suggestion to annotation).
- Filters Plugin for OmegaT:
- Libraries:
- Fixed the issue where sub-filter start and end events were not
handled properly for outputting RTF layered files.
- Verification library:
- Improved mapping of LanguageTool's ITS type to issue annotation.
- Updated the XLIFF2 library to use 0.22-snapshot.
- Discontinued the MacOS 32-bit distribution (no Java 7 support).
- TMXWriter: fixed bug where some property entries were written
before the <seg> element.
- Improved stream-only pipeline capabilities.
Changes from M24 to M25
- Tikal:
- Fixed default parameter for the default TM resource. Now you can
just run tikal -q "text to search".
- Added -x2 and -m2 options for extraction/merge with new skeleton
file.
- Changed -x and -m to extract/merge with the new skeleton file.
- Added -x1 and -m1 options for extraction/merge with original
document (similar to previous versions; for a fully
backward-compatible merge you must use M24).
- Updated Tikal merge function with original file to use the new
common text-unit merger.
- Added options for JAR version switch.
- Filters Plugin for OmegaT:
- Added basic support for XLIFF 2 documents (under construction).
- The target text is now passed as a translation only if it is
different from the source.
- Added support for alternate translations (e.g. from XLIFF 1.2
documents).
- Steps:
- Added the TTXSplitter Step. It splits a given TTX document into
several documents with the same source word count.
- Added the TTXJoiner Step. It joins back TTX documents created with
the TTXSplitter Step.
- Consolidated merge steps into: SkeletonXliffMergerStep,
LegacyXliffMergerStep, OriginalDocumentXliffMergerStep and
CombinedXliffMergerStep classes.
- Rainbow Translation Kit Creation Step:
- Reinstated output for XLIFF v2 using the latest library.
- Added support for extraction using the new skeleton file.
- Rainbow Translation Kit Merging Step:
- Added basic support for merging XLIFF v2 packages.
- Added support for merging using the new skeleton file.
- Search and Replace Step:
- Fixed issue #392 where the reading of the replacement table
was trimming all lines. Now replacing or searching for space and
replacing by nothing works.
- Filters:
- Changed the reading of <alt-trans> elements to allow entries with
empty <target> (e.g. some XTM's XLIFF have <alt-trans> with empty
targets).
- Added the option "Allow modification of existing <alt-trans>
elements".
- YAML Filter:
- OpenXML Filter:
- Fixed issue #402 (Cannot stop the filter before the document
is done)
- Fixed issue #350 (merge problem when docx has a
OpenXml.Drawing object)
- PO Filter:
- Fixed issue with #, (fuzzy flag) in front of #~ (obsolete) entries.
- JSON Filter:
- Refactored the filter.
- Fixed issue #359 (Need to improve extraction selection)
- Fixed issue #373 (Encoder and xml:space='preserve')
- Fixed issue #397 (Filter not extracting all strings as
expected)
- Connectors:
- Translate Toolkit TM Connector:
- Updated the parameters API to use set/getUrl() instead of
set/getHost() and set/getPort().
- Updated the default host and port (now obsolete) to localhost and
8080 to allow local setups to continue to work.
- Updated the default URL to
https://amagama-live.translatehouse.org/api/v1/ (the previous URL is
obsolete).
- Libraries:
- Major refactoring of the serialization.
- Major refactoring of the RawDocument object
- Updated SWT libraries to 4.3
- Added lib-tkit library for extraction/merge with skeleton in
JSON.
- Added sort capability to the Filter Configuration common edit
dialog.
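Issue #392 above is a reminder that trimming input lines can destroy meaningful white-space. The sketch below assumes, for illustration only, a replacement table with one tab-separated search/replace pair per line; the actual file format used by the Search and Replace step may differ, and ReplacementTable is a hypothetical name.

```java
// Hypothetical parser; the tab-separated format is an assumption for this sketch.
class ReplacementTable {
    // Splits a line into {search, replace} without trimming either part.
    static String[] parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" }; // no replacement column: replace with nothing
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String line = " \t"; // search for a single space, replace it with nothing
        String[] pair = parseLine(line);
        System.out.println("[" + pair[0] + "] -> [" + pair[1] + "]"); // [ ] -> []
        // Trimming first (the old behavior) loses the space entirely.
        String[] trimmed = parseLine(line.trim());
        System.out.println("[" + trimmed[0] + "] -> [" + trimmed[1] + "]"); // [] -> []
    }
}
```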
Changes from M23 to M24
- Tikal:
- Changed default resource for -q command from OpenTran to Translate
Toolkit.
- Rainbow:
- Made usability improvements to the Testing Console for rapid
iteration when creating custom filter configurations.
- Steps:
- Added the Copy
Or Move Step: Copies or moves files to a specified location
with the option to overwrite or backup existing files or skip
copying files if there is an existing file.
- Rainbow
Translation Kit Creation Step:
- Removed the experimental output to XLIFF 2.0 (too outdated to be
useful). See the Okapi XLIFF Toolkit project for more up-to-date
support for XLIFF 2.0.
- Added option to specify the post-processing hook for OmegaT
tkits.
- Format Conversion Step:
- Added the Word Table output format.
- Filters:
- IDML Filter:
- Changed the default spread size threshold to 2000 KB and updated the
warning/error to show the spread size.
- XML Filter:
- Android Strings pre-defined settings: Exposed the content of <item>
elements (used in <plurals> and <string-array> elements) for
translation.
- HTML Filter:
- Added option to treat CDATA as an inline element.
- The content of excluded inline elements is now exposed, e.g., for
inclusion in XLIFF equiv-text attributes.
- Fixed issue #336: The filter will no longer produce
translatable segments consisting only of tags.
- XML Stream Filter:
- Fixed issue #336: The filter will no longer produce
translatable segments consisting only of tags.
- OpenXML Filter:
- Fixed issue #351: Improve filter performance.
- JSON Filter:
- Fixed issue #377: Support for subfiltering in JSON.
- Fixed issue #373: JSONFilter should use the JSONEncoder.
- Note: Changes in escaping/unescaping behavior in this filter
break compatibility with files extracted by previous versions.
- Filters Plugin for OmegaT:
- Added capability to specify a custom filter parameters file for
each Okapi filter in the plugin. This closes issue #376.
- Connectors:
- Added the Bilingual File Connector: directly queries a bilingual
file (TMX, PO, etc.) without importing it into a TM first.
- Library:
- Important: Changed the minimum requirement from Java 1.6 to
Java 1.7.
- Fixed the ITS content writer to output locQualityIssueProfileRef
rather than locQualityIssueProfile.
- Improved report output of quality checker.
- Updated and cleaned up the build files.
- Added the HUMAN_RECOMMENDED type to the MatchType list.
- Modified the base implementation of IParameters. This may result in
compilation errors if your code directly accesses some variables: you
should now use the corresponding getter and setter methods.
- Added support for tuv-level attributes that were missing in
TMXWriter.
Changes from M22 to M23
- Rainbow:
- Added the Inconsistency Check Step to the pre-defined Quality
Check pipeline.
- CheckMate:
- Fixed issue #358: The Check Document button now works in all
cases.
- Filters Plugin for OmegaT:
- Added .mxliff as one of the default extensions for XLIFF.
- Fixed issue #364: .sdlxliff files with a UTF-8 BOM now open.
- Steps:
- Added the Inconsistency Check Step: a way to flag entries that have
the same source but different targets, or the same target but
different sources.
- Rainbow Translation Kit Creation Step:
- Added a libVersion attribute to the manifest indicating the version
of the library used to create the manifest.
- Added an option to use encapsulation notation
(<bpt>/<ept>/<ph>/<it>) for inline codes in OmegaT tkits.
- LanguageTool Step:
- Updated the library to version 2.2.
- Encoding Conversion Step:
- Fixed issue #318: ASCII characters in NCR form are now
un-escaped except for ", ', &, < and >.
- Search and Replace Step:
- Fixed issue #183: Added simple log of the replacements.
- Fixed issue #362: Step for Terminology fixes on translation
candidates.
- Quality Check Step:
- Resolved issue #357: Added function to detect blacklisted
terms.
- Improved ITS LQI support.
- Filters:
- IDML Filter:
- Implemented issue #356: By default, spreads above the threshold
cause an error. An option allows skipping them without an error.
- XML Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #361: MIME type can be different in sub-classes of
XMLFilter.
- HTML5-ITS Filter:
- Continued implementation of ITS 2.0.
- JSON Filter:
- Resolved issue #360: The use of the key for the resname value
is now optional.
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #364: Woodstox XML parser is now always used.
- <alt-trans> elements with an empty target are now skipped.
- Added support for the <tool> and <phase> elements as well as the
state-qualifier attribute.
- OpenXML Filter:
- Fixed issue #291: Sub-documents are now processed in correct
order.
- Fixed issue #319: 'squishable' tests have been changed.
- Simplification Filter:
- Fixed issue #355: parameters of sub-filter are properly read
in the cases where the primary filter uses a sub-filter.
- XINI Filter:
- Fixed a case where placeholders were being renumbered
incorrectly when reading a XINI file.
- Library:
- Verification Library:
- Fixed an issue with non-initialized start/end variables when
checking patterns from the target.
- Added support for sub-documents in the Quality Checker library.
- Continued implementation of ITS 2.0 in XLIFFWriter,
XLIFFContent, etc.
- Fixed issue #352: XMLWriter now throws OkapiIOException if an
error occurs.
- Updated the XLIFF Writer to match the official ITS/XLIFF mapping
(http://www.w3.org/International/its/wiki/XLIFF_1.2_Mapping).
- Added the experimental lib-concurrent package to improve
multi-threaded pipelines. See the ThreadedWorkQueue Step page for
details.
Changes from M21 to M22
- Tikal:
- Made it possible to run tikal.sh from another directory on Mac
OS X.
- Updated the way the application root folder is computed to allow
calls from a network share.
- Rainbow:
- Fixed the -log option to allow it anywhere on the command line.
- Filters:
- Table Filter:
- Fixed issue #300 (enhancement): Added a new Table Filter
for 2-column (source + target), tab separated files.
- OpenXML Filter:
- Fixed issue #166: Text from mc:Fallback and mc:Choice
Requires="wps", WordArt, TextArt, and Watermarks is now handled
properly.
- Fixed issue #169: Segmentation around inline codes seems
to work properly.
- Fixed issue #286: PPTX smart-tags are now imported.
- Fixed issue #323: Files are no longer corrupted when using text
areas.
- Fixed issue #324: Nested <w:p> elements now merge properly.
- Fixed issue #325: The slides of PPTX documents are now
extracted in order.
- Fixed issue #329: Text from PPTX diagrams is now extracted.
- Fixed issue #351: XLIFF creation now works on documents with
SmartArt graphics.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #332: When using the global_cdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Fixed issue #339: The filter was not regrouping the tags properly
when merging.
- Added handling of variable placeholders for the
pre-defined settings for RESX files.
- ITS Filters (XML Filter and HTML5+ITS Filter):
- TMX Filter:
- Fixed the issue where <it> codes were mapped to placeholders rather
than to opening/closing internal codes.
- XLIFF Filter:
- Continued implementation of ITS 2.0: Improved support for
LQI, added support for Provenance.
- Simplification Filter:
- General:
- Filters that update language properties (like xml:lang)
during merging will now be region-insensitive when doing so.
- Steps:
- Term Extraction Step:
- Added support for Text Analysis annotations.
- Made each of the three extraction methods an option, and attached
the relevant options to the statistical method.
- Full-Width Conversion Step:
- Added log message if at least one character was modified
(per input file). This resolves issue #327.
- Enrycher Step:
- Improved handling of nested annotations.
- Batch TM Leveraging Step:
- Fixed issue #331: Entries with no text are no longer sent for
translation.
- Format Conversion Step:
- Fixed the issue where the "Output generic inline codes" option was
not recognized for the Tab-delimited table output.
- MS Batch Translation Step:
- MT candidates with a very low score (e.g., resulting from an error)
are not output in the TMX.
- Space Checker Step:
- Improved reporting of errors and changes.
- Fixed issue #346: Iterating through text fragments ran out
of bounds. Indexing error was fixed.
- Fixed issue #348: inline code index marker broken as a
result of spacing changes. Index marker error was fixed.
- Translation Comparison Step:
- Consolidated Paragraph Alignment and Sentence Alignment steps
- Connectors:
- Microsoft MT Connector:
- Improved error handling (e.g. problem with inline codes in
result).
- Filters Plugin for OmegaT:
- Added *.xliff and *.sdlxliff as default extensions.
- Changed default for isFileSupported() to return true (this
allows user-defined extensions).
- Libraries:
Changes from M20 to M21
- All applications:
- Applications now launch correctly on Mac OS X when they are
located in a path containing a space.
- Rainbow:
- Added the -log option to specify the result log file. By default
the log file is {user.home}\rainbowBatchLog.txt.
- Tikal:
- Added the -safe option to prompt the user before overwriting a
directory when extracting.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Added support for the okp:engine attribute in <alt-trans>.
- Wiki Filter:
- Fixed issue #315: WikiFilter didn't work with
preserve_whitespace: true.
- Regex Filter:
- Improved the macStrings default settings to include slash-star
comments with the next extracted string.
- IDML Filter:
- Fixed issue #316: Added default to not extract hidden
layers and added the option "Extract hidden layers".
- Enabled the option "Create new paragraphs on hard returns".
Important: This option is still BETA and may prevent you from merging
back the extracted file. Make sure to test the round-trip before using
this option for real projects.
- TMX Filter:
- Fixed a bug where attribute values on <tuv> elements
were being written back to the skeleton without proper
escaping.
- Improved filter performance.
- Properties Filter:
- Fixed issue #313 where the extended characters were not
escaped when using the sub-filter.
- TS Filter:
- Changed the instantiation of the XML parser to use
Woodstox.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #303: When using the global_pcdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Steps:
- Microsoft Batch Translation Step:
- Added support for the ${domain} variable for the category.
- Added support for ${rootDir} and other variables for the path of
the Engine mapping file.
- Quality Check Step:
- Added an option to save the session or not. This option is not
accessible when editing the parameters from CheckMate, only when
editing a step's parameters.
- Rainbow Translation Kit Creation Step:
- Added an option to output ctype and equiv-text attributes in inline
codes for XLIFF outputs.
- Added the output of ctype and equiv-text attributes in inline codes
to the OmegaT output.
- Added the option to merge a new OmegaT translation kit
with an existing one, rather than overwriting it.
- Enrycher Step:
- Added support for segmented text units and implemented
handling of inline codes.
- Added parameter for number of events to process on each
call to the service.
- Translation Comparison Step:
- Added an option to log the average scores per document in a
tab-delimited file.
- Added the output of a new tab-delimited file with all
scores, along with the HTML report.
- Extended the repartition table to use 11 brackets instead
of 3, and include the two scores.
- Segmentation and Desegmentation Steps:
- Added the option to renumber code IDs after segmentation
so that they are 1-indexed as much as possible. A
corresponding option on the desegmentation step reverses the
process. This option will not work correctly with formats
that use non-consecutive or non-numeric code IDs, such as
XLIFF.
- Connectors:
- Microsoft Translator Connector:
- Added information about the engine in the query results.
- Libraries:
- Continued implementation of ITS 2.0 in XLIFF Writer.
- Changed options settings for the XLIFFWriter class to use an
object rather than multiple setters.
- Filters Plugin for OmegaT:
- Fixed issue #322: Updated the TS filter to use the Woodstox
parser, and added the dependencies.
- Added the XLIFF Filter to the plug-in.
- Added basic support for some ITS data categories in the
Comments pane (Text Analysis, Terminology).
Changes from M19 to M20
- Rainbow:
- Improved the logging output and UI responsiveness during
lengthy processes.
- Updated the user's preference dialog to allow the selection of
the log levels as defined by SLF4J (Normal, Debug, Trace)
- Rainbow's input root directory now supports expansion of
system environment variables.
- Tikal:
- Fixed -lfc command output.
- Use the -continue option to specify that batch operations
should continue processing even if one or more files in the
batch fail to process.
- Summary information will be included at the end of batch
commands.
- Timing information is included for each file processed, and
total elapsed time is included in the batch summary.
- Added the -pd option to specify a directory to search for
custom filter configurations.
- Fixed a crash when merging (-m) a file with no extension.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- Continued the implementation of ITS 2.0
- Fixed issue where the HTML-type special characters were
not escaped when converted to inline codes by the code
finder.
- Fixed issue #311 where preserve space property was not
applied to attributes.
- TXML Filter:
- Fixed issue #266: Translations in the <revisions> elements are now
ignored.
- XLIFF Filter:
- Improved support for <mrk> elements.
- Added support for several ITS features.
- Connectors:
- Microsoft Translator Connector:
- Fixed the internal conversion of the language code.
- Libraries:
- Fixed issue #282 for the Abstract Markup Filter.
- Fixed issues in reporting libraries.
- Added support for including ITS annotations in the XLIFF 1.2
writer.
- Steps:
- Rainbow Translation Kit Creation:
- Added the "Include post-processing hook" option for OmegaT
packages. This allows OmegaT to merge back the documents
automatically.
- Remove Target:
- Fixed issue #270 where the step could not be run without
some of the optional parameters set.
- Microsoft Batch Translation:
- Added an option to send the generated TMX document as a
raw document for the input of the next step.
- Added an option to point to a .properties file containing a
mapping of keys to categories for more convenient lookup.
See the wiki.
- Quality Check Step:
- Fixed issue #304 where the default check on parentheses
didn't include full-width characters.
- Inline Code Removal Step:
- Added option to replace line break related codes with
spaces. By default codes are simply removed.
- Added the Space Check Step. It automatically fixes spaces around
inline codes in the target based on the source.
- Added the Cleanup Step. It can normalize quotation marks and
punctuation, remove suspect entries, etc. This can be used, for
example, when preparing an aligned document for MT training.