Okapi Framework Changes Log - Aug-31-2015
Note: this document is common to both the okapi-lib distribution
and the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from M27 to M28
- Tikal:
- Fixed issue #444: Now the -imp command can use the -approved
option to import only approved entries.
- Rainbow:
- Fixed issue #464: Exported batch configurations should now include
default filter mappings for all known extensions.
- Files with the .sdlxliff extension will now use okf_xliff-sdl
filter configuration by default.
- Fixed help page for command-line -?
- Library:
- Fixed issue #442 where the blacklist for the term checker was not
working for Japanese.
- Fixed issue #439: Quality issues based on patterns containing
newline characters did not display correctly in CheckMate.
- Fixed issue in the verification library where the check for
leading and trailing whitespace did not take into account empty
string, causing an index out of range error.
- Fixed issue #456: SRX issue with empty "beforebreak" in rule.
- Changed SWT libraries to version 4.4
- Filters:
- OpenXML Filter:
- Fixed issue #165: Strings in Excel files are now extracted in a
more logical order (sheet-by-sheet, one row at a time, ordered by
columns). Additionally, strings that appear multiple times are
now exposed for translation once for each occurrence, and may be
translated independently. Warning: these changes may
create problems merging translated content from Excel files that
were processed with previous versions of Okapi.
- Fixed issue #338: the "Exclude color" configuration option is
now correctly applied to Excel files. This option works for
cell background colors only. The colors available in the Rainbow
filter configuration UI are aligned against the Excel 2011
"standard colors"; additional RGB values may be excluded by
editing your filter configuration file directly.
- Fixed issue #390: the "Exclude column" configuration option is
now correctly applied to Excel files.
- Fixed issue #440: markers for spelling and grammar errors are
now stripped when exposing text for translation.
- Fixed issue #441: added new options to expose line breaks and
tabs for translation as literal characters.
- Fixed issue #443: added the "Exclude Graphical Metadata" option
to prevent metadata associated with graphics and textboxes from
being exposed for translation.
- Fixed issue #447: the "Translate Document Properties" option
will now work for PowerPoint files.
- Fixed issue #448: the "Translate Comments" option will now work
for PowerPoint files.
- Fixed issue #449: the "Translate Slide Masters" option will now
also expose master layout content that is used by slides in the
document.
- Fixed issue #451: the "Translate Document Properties" option
will now work for Excel files.
- Fixed issue #452: Word files containing nested graphicData
sections are no longer corrupted during processing.
- Fixed issue #453: Word files that do not contain a word/styles.xml
part can now be processed.
- Fixed issue #454: entities occurring in alternateContent
sections of Word documents are now handled correctly.
- Fixed issue #457: empty lines in Word files were sometimes
stripped.
- Fixed issue #458: target text was lost in complicated run
structures.
- Fixed issue #467: tabs in Word files were sometimes stripped.
- Fixed issue #473: deletion change tracking could cause target
corruption.
- Fixed issue #474: files containing formulas could become
corrupted.
- Fixed issue #476: insertion change tracking could cause target
corruption.
- Fixed issue #482: <w:lang> tags are now stripped
during extraction.
- Fixed issue #484: Added a "Clean Tags Aggressively" option to
the filter. When this option is enabled, the filter will strip
certain types of formatting markup (whitespace and vertical
alignment adjustment) that is spuriously inserted when converting
other formats (such as PDF) to OpenXML. This produces smoother
segmentation in some cases.
- Fixed issue #485: Strip machine-generated "_GoBack" bookmarks
from Word files.
- ICML Filter:
- Added the ICML Filter for WCML files.
- HTML Filter:
- Added support for ASPX comments and fixed the extra output
produced for tag-like attribute values.
- PO Filter:
- A Plural-Forms: header that declares nplurals=1 is now handled
correctly.
- Table Filter:
- Blank lines inside qualified CSV cells are now preserved.
- CSV text qualifiers can now be optionally added on output when
required to maintain well-formedness.
- Step:
- Rainbow Translation Kit Creation:
- Improved support for XLIFF 2 packages
- Rainbow Translation Kit Merging:
- Improved support for XLIFF 2 packages
- Text Rewriting step:
- Fixed the case where the target had only inline codes and the
source had both text and inline codes. Now the base text is taken
from the source.
- Now expansion is done before the last inline code.
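As a rough illustration of the Table Filter change listed above (text
qualifiers added on output only when required for well-formedness),
the following Python sketch qualifies a cell only when it contains the
delimiter, the qualifier, or a line break. The function names are
hypothetical and this is not the filter's actual implementation.

```python
def qualify_cell(cell, delimiter=",", qualifier='"'):
    """Wrap a cell in qualifiers only when leaving it bare would break the row."""
    needs_qualifier = (
        delimiter in cell or qualifier in cell or "\n" in cell or "\r" in cell
    )
    if not needs_qualifier:
        return cell
    # Double any embedded qualifier, the usual CSV escaping convention.
    escaped = cell.replace(qualifier, qualifier * 2)
    return qualifier + escaped + qualifier


def write_row(cells, delimiter=",", qualifier='"'):
    """Join cells into one CSV row, qualifying only the cells that need it."""
    return delimiter.join(qualify_cell(c, delimiter, qualifier) for c in cells)
```

Because the qualified cell is written back unchanged apart from the
doubled qualifiers, blank lines inside it survive the round trip.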
Changes from M26 to M27
- General
- Fix resource and memory leaks
- Filters:
- XLIFF Filter:
- Added warning when inline code (other than mrk) has no id
attribute.
- Fixed location of <phase-group> when re-writing.
- Fixed case where XML declaration was not followed by a
line-break on output.
- Added fallback check for TMX values for the <it> pos attribute
(error/warning still generated as using TMX values in XLIFF is not
valid).
- Added better support for SDLXLIFF
- Added optional parameters for writing out the tool element in the
XLIFF header.
- Fixed issue #430 where the ITS namespace declaration and version
was not added when needed.
- HTML Filter:
- Added the placeholder attribute to the list of translatable
attributes in the default HTML configuration (for HTML5).
- Fix lower casing of start tags during pre-processing cleanup
- Upgrade to Jericho 3.4-dev
- Steps:
- Rainbow Translation Kit Creation Step:
- Updated XLIFF2 library to 1.0 release.
- Implemented v2 support for the Transifex packages.
- Rainbow Translation Kit Post-Processing Step:
- Implemented v2 support for the Transifex packages.
- Connectors:
- MyMemory Connector:
- Fixed the issue of the return match value being sometimes a
Double and sometimes a Long.
- Made connectors more error tolerant: processing now continues if
there is an exception on a single text unit.
- Library:
- XLIFF Writer:
- Added support to output the coord attribute (COORDINATES property
on the text container).
- Transifex library:
- Fixed issue #427 where the API v2 was not supported.
- Segmentation library:
- Fixed issue #426 where the part of the text matched by the
previous rule was not scanned for match in the next rule.
- Fixed issue #489: Added the okp:treatIsolatedCodesAsWhitespace
option to allow the segmenter to treat each isolated code as a
single whitespace character when applying segmentation rules.
- Verification library:
- Fixed issue #418: the description of the rule is now displayed
for target-driven error.
- Improved reading of LQI entries: the ITS type is preserved when
reading the okp:lqiType value.
- Fixed issue #442: Allow flagging blacklist terms in substrings.
- Fixed issue #400: Allow flagging blacklist terms in source.
- Parameters editor for Verification library:
- Fixed issue #417: The description of each pattern is now
preserved when re-ordering the patterns.
- Fixed issue #442: Add option to allow flagging blacklist terms
in substrings.
- Fixed issue #400: Add option to allow flagging blacklist terms
in source.
Changes from M25 to M26
- Rainbow:
- Added the .mqxliff extension to the list of extensions associated
with the XLIFF Filter.
- Tikal:
- Fixed issue preventing custom filter configurations from working
as sub-filters.
- Added the ability to output scoping reports.
- Filters:
- Fixed issue #409: Inconsistent handling of <bx pos="begin"/> in
extraction to Moses inline format.
- XLIFF Filter:
- Added support for <sub> elements (plain text, with nested codes,
or with nested codes with nested sub-flows).
- XMLStream Filter:
- Added the .ditamap extension to the list of extensions for the
DITA pre-defined configuration.
- Steps:
- LanguageTool Step:
- Resolved issue #416 (added suggestion to annotation).
- Filters Plugin for OmegaT:
- Libraries:
- Fixed the issue where sub-filter start and end events were not
handled properly for outputting RTF layered files.
- Verification library:
- Improved mapping of LanguageTool ITS type to issue annotation.
- Updated the XLIFF2 library to use 0.22-snapshot.
- Discontinued the MacOS 32-bit distribution (no Java 7 support)
- TMXWriter: fixed bug where some property entries were written
before the <seg> element.
- Improved stream-only pipeline capabilities.
Changes from M24 to M25
- Tikal:
- Fixed default parameter for the default TM resource. Now you can
just run tikal -q "text to search".
- Added -x2 and -m2 options for extraction/merge with new skeleton
file.
- Changed -x and -m to extract/merge with the new skeleton file.
- Added -x1 and -m1 options for extraction/merge with original
document (similar to previous versions). For fully
backward-compatible merge you must use M24.
- Updated Tikal merge function with original file to use the new
common text-unit merger.
- Added options for JAR version switch
- Filters Plugin for OmegaT:
- Added basic support for XLIFF 2 documents (under construction).
- Target text is now passed as a translation only if it is different
from the source.
- Added support for alternate translations (e.g. from XLIFF 1.2
documents)
- Steps:
- Added the TTXSplitter Step. It splits a given TTX document into
several documents with the same source word count.
- Added the TTXJoiner Step. It joins back TTX documents created
with the TTXSplitter Step.
- Consolidated merge steps into: SkeletonXliffMergerStep,
LegacyXliffMergerStep, OriginalDocumentXliffMergerStep and
CombinedXliffMergerStep classes.
- Rainbow Translation Kit Creation Step:
- Reinstated output for XLIFF v2 using the latest library.
- Added support for extraction using the new skeleton file.
- Rainbow Translation Kit Merging Step:
- Added basic support for merging XLIFF v2 packages.
- Added support for merging using the new skeleton file.
- Search and Replace Step:
- Fixed issue #392 where the reading of the replacement table was
trimming all lines. Now replacing or searching for space and
replacing by nothing works.
- Filters:
- Changed the reading of <alt-trans> elements to allow entries with
empty <target> (e.g. some XTM XLIFF files have <alt-trans> with
empty targets).
- Added the option "Allow modification of existing <alt-trans>
elements"
- YAML Filter:
- OpenXML Filter:
- Fixed issue #402 (Cannot stop the filter before the document is
done)
- Fixed issue #350 (merge problem when docx has an OpenXml.Drawing
object)
- PO Filter:
- Fixed issue with #, (fuzzy flag) in front of #~ (obsolete)
entries.
- JSON Filter:
- Refactored the filter.
- Fixed issue #359 (Need to improve extraction selection)
- Fixed issue #373 (Encoder and xml:space='preserve')
- Fixed issue #397 (Filter not extracting all strings as expected)
- Connectors:
- Translate Toolkit TM Connector:
- Updated the parameters API to use set/getUrl() instead of
set/getHost() and set/getPort().
- Updated the default host and port (now obsolete) to localhost and
8080 to allow local setups to continue to work.
- Updated the default URL to
https://amagama-live.translatehouse.org/api/v1/ (the previous URL is
obsolete)
- Libraries:
- Major refactoring of the serialization.
- Major refactoring of the RawDocument object
- Updated SWT libraries to 4.3
- Added lib-tkit library for extraction/merge with skeleton in JSON.
- Added sort capability to the Filter Configuration common edit
dialog.
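The idea behind the TTXSplitter Step listed above (splitting a
document into several parts with the same source word count) can be
sketched in Python as a simple greedy pass. This is only an
illustration: the function name is hypothetical, words are counted by
whitespace splitting (an assumption, not the library's word counter),
and the actual step operates on TTX documents rather than plain
strings.

```python
def split_by_word_count(segments, parts):
    """Split source segments into `parts` consecutive chunks whose word
    counts are as close to equal as a simple greedy pass allows."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    counts = [len(s.split()) for s in segments]  # naive whitespace word count
    total = sum(counts)
    chunks = [[] for _ in range(parts)]
    running = 0
    index = 0
    for seg, n in zip(segments, counts):
        # Move to the next chunk once the running total reaches this chunk's share.
        while index < parts - 1 and running >= total * (index + 1) / parts:
            index += 1
        chunks[index].append(seg)
        running += n
    return chunks
```

Joining the parts back, as the TTXJoiner Step does for TTX files, is
then just a matter of concatenating the chunks in order.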
Changes from M23 to M24
- Tikal:
- Changed default resource for -q command from OpenTran to
Translate Toolkit
- Rainbow:
- Made usability improvements to the Testing Console for rapid
iteration when creating custom filter configurations.
- Steps:
- Added the Copy Or Move Step: Copies or moves files to a specified
location with the option to overwrite or back up existing files or
skip copying files if there is an existing file.
- Rainbow Translation Kit Creation Step:
- Removed the experimental output to XLIFF 2.0 (too outdated to be
useful). See the Okapi XLIFF Toolkit project for more up-to-date
support for XLIFF 2.0.
- Added option to specify the post-processing hook for OmegaT
tkits.
- Format Conversion Step:
- Added the Word Table output format.
- Filters:
- IDML Filter:
- Changed the default spread size threshold to 2000 Kb and updated
the warning/error to show the spread size.
- XML Filter:
- Android Strings pre-defined settings: Exposed content of
<item> elements for translation (used in <plurals>,
<string-array> elements).
- HTML Filter:
- Added option to treat CDATA as an inline element.
- Content of excluded inline elements is exposed for, e.g.,
inclusion in XLIFF equiv-text attributes.
- Fixed issue #336: The filter will no longer produce translatable
segments consisting only of tags.
- XML Stream Filter:
- Fixed issue #336: The filter will no longer produce translatable
segments consisting only of tags.
- OpenXML Filter:
- Fixed issue #351: Improve filter performance.
- JSON Filter:
- Fixed issue #377: Support for subfiltering in JSON.
- Fixed issue #373: JSONFilter should use the JSONEncoder.
- Note: Changes in escaping/unescaping behavior in this filter
break compatibility with files extracted by previous versions.
- Filters Plugin for OmegaT:
- Added capability to specify a custom filter parameters file for
each Okapi filter in the plugin. This closes issue #376.
- Connectors:
- Added the Bilingual File Connector: Directly query a bilingual
file format such as TMX, PO, etc., without importing to a TM first.
- Library:
- Important: Changed minimum requirement
from Java 1.6 to Java 1.7.
- Fixed ITS content writer to output locQualityIssueProfileRef and
not locQualityIssueProfile.
- Improved report output of quality checker.
- Updated and cleaned up the build files.
- Added HUMAN_RECOMMENDED type to the MatchType list.
- Modified the base implementation for IParameters. This may result
in compilation errors in your code if you access some variables
directly: you should now use the corresponding getter and setter
methods.
- Added support for tuv-level attributes that were missing in
TMXWriter.
Changes from M22 to M23
- Rainbow:
- Added the Inconsistency Check Step to the pre-defined Quality
Check pipeline.
- CheckMate:
- Fixed issue #358: The Check Document button now works in all
cases.
- Filters Plugin for OmegaT:
- Added .mxliff as one of the default extensions for XLIFF.
- Fixed issue #364: .sdlxliff files with UTF-8 BOM can now be opened.
- Steps:
- Added the Inconsistency Check Step: a way to flag entries with the
same source that have different targets, or entries with the same
target that have different sources.
- Rainbow Translation Kit Creation Step:
- Added a libVersion attribute in the manifest indicating the
version of the library used to create the manifest.
- Added option to use encapsulation notation
(<bpt>/<ept>/<ph>/<it>) for inline codes in OmegaT tkits.
- LanguageTool Step:
- Updated the library to version 2.2.
- Encoding Conversion Step:
- Fixed issue #318: ASCII characters in NCR form are now
un-escaped except for ", ', &, < and >.
- Search and Replace Step:
- Fixed issue #183: Added simple log of the replacements.
- Fixed issue #362: Step for Terminology fixes on translation
candidates.
- Quality Check Step:
- Resolved issue #357: Added function to detect blacklisted terms.
- Improved ITS LQI support.
- Filters:
- IDML Filter:
- Implemented issue #356: By default, spreads above the threshold
cause an error. An option allows skipping them without error.
- XML Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #361: MIME type can be different in sub-classes of
XMLFilter.
- HTML5-ITS Filter:
- Continued implementation of ITS 2.0.
- JSON Filter:
- Resolved issue #360: The use of the key for the resname value is
now optional.
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #364: Woodstox XML parser is now always used.
- Alt-trans with empty target are now skipped.
- Added support for the <tool> and <phase> elements as well as the
state-qualifier attribute.
- OpenXML Filter:
- Fixed issue #291: Sub-documents are now processed in correct
order.
- Fixed issue #319: 'squishable' tests have been changed.
- Simplification Filter:
- Fixed issue #355: parameters of sub-filter are properly read in
the cases where the primary filter uses a sub-filter.
- XINI Filter:
- Fixed a case where placeholders were being renumbered
incorrectly when reading a XINI file.
- Library:
- Verification Library:
- Fixed issue with non-initialized start/end variable when
checking patterns from the target.
- Added support for sub-document in Quality Checker library.
- Continued implementation of ITS 2.0 in XLIFFWriter, XLIFFContent,
etc.
- Fixed issue #352: XMLWriter now throws an OkapiIOException if an
error occurs.
- Updated XLIFF Writer to match ITS/XLIFF official mapping (http://www.w3.org/International/its/wiki/XLIFF_1.2_Mapping).
- Added the experimental lib-concurrent package to improve
multi-threaded pipelines. See ThrededWorkQueue
Step page for details.
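The Encoding Conversion Step fix for issue #318 listed above (ASCII
characters in NCR form are un-escaped except for ", ', &, < and >) can
be illustrated with a short Python sketch. The names are hypothetical,
and two choices are assumptions rather than documented behavior: only
printable ASCII (U+0020 to U+007E) is un-escaped, and non-ASCII
references are simply left as they are.

```python
import re

# Characters that stay escaped so the surrounding markup remains well-formed.
PROTECTED = {'"', "'", "&", "<", ">"}

# Matches decimal (&#65;) and hexadecimal (&#x41;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def unescape_ascii_ncr(text):
    """Replace NCRs for printable ASCII characters with the characters themselves."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not 0x20 <= code <= 0x7E:
            return match.group(0)  # outside printable ASCII: left untouched here
        char = chr(code)
        return match.group(0) if char in PROTECTED else char

    return NCR.sub(replace, text)
```

For example, `&#65;` becomes `A`, while `&#38;` stays escaped because
an unescaped `&` could change how the output is parsed.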
Changes from M21 to M22
- Tikal:
- Made it possible to run tikal.sh from another directory on Mac
OS X.
- Updated the way the application root folder was computed to
allow call from Network share.
- Rainbow:
- Fixed -log option to allow it anywhere in the command-line.
- Filters:
- Table Filter:
- Fixed issue #300 (enhancement): Added a new Table Filter for
2-column (source + target), tab separated files.
- OpenXML Filter:
- Fixed issue #166: Text from mc:Fallback and mc:Choice
Requires="wps", WordArt, TextArt, and Watermarks is now handled
properly.
- Fixed issue #169: Segmentation around inline codes seems to
work properly.
- Fixed issue #286: PPTX smart-tags are now imported.
- Fixed issue #323: Files are not corrupted anymore when using
text areas.
- Fixed issue #324: Nested <w:p> merge properly now.
- Fixed issue #325: The slides of PPTX documents are now
extracted in order.
- Fixed issue #329: Text from PPTX diagrams is now extracted.
- Fixed issue #351: Creation of XLIFF now works on documents with
SmartArt graphics.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #332: When using the global_cdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Fixed issue #339: The filter was not grouping the tags back
properly when merging back.
- Added handling of variable placeholders for the pre-defined
settings for RESX files.
- ITS Filters (XML Filter and HTML5+ITS Filter):
- TMX Filter:
- Fixed the issue where <it> codes were mapped to placeholders
rather than opening/closing internal codes.
- XLIFF Filter:
- Continued implementation of ITS 2.0: Improved support for
LQI, added support for Provenance.
- Simplification Filter:
- General:
- Filters that update language properties (like xml:lang)
during merging will now be region-insensitive when doing so.
- Steps:
- Term Extraction Step:
- Added support for Text Analysis annotations.
- Made the three extraction methods options, and attached the
relevant options to the statistical method.
- Full-Width Conversion Step:
- Added log message if at least one character was modified
(per input file). This resolves issue #327.
- Enrycher Step:
- Improved handling of nested annotations.
- Batch TM Leveraging Step:
- Fixed issue #331: Entries with no text are now not sent for
translation.
- Format Conversion Step:
- Fixed the issue where the "Output generic inline codes" was
not recognized for the Tab-delimited table output.
- MS Batch Translation Step:
- MT candidates with a very low score (e.g. from an error) are not
output in the TMX.
- Space Checker Step:
- Improved reporting of errors and changes.
- Fixed issue #346: Iterating through text fragments ran out of
bounds. Indexing error was fixed.
- Fixed issue #348: inline code index marker broken as a result
of spacing changes. Index marker error was fixed.
- Translation Comparison Step:
- Consolidated Paragraph Alignment and Sentence Alignment steps
- Connectors:
- Microsoft MT Connector:
- Improved error handling (e.g. problem with inline codes in
result).
- Filters Plugin for OmegaT:
- Added *.xliff and *.sdlxliff as default extensions.
- Changed default for isFileSupported() to return true (this
allows user-defined extensions).
- Libraries:
Changes from M20 to M21
- All applications:
- Applications now launch correctly on Mac OS X when they are
located in a path containing a space.
- Rainbow:
- Added the -log option to specify result log file. By default the
log file is {user.home}\rainbowBatchLog.txt
- Tikal:
- Added the -safe option to prompt user when overwriting a directory
when extracting.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Added support for okp:engine attribute in <alt-trans>.
- Wiki Filter:
- Fixed issue #315: WikiFilter didn't work with
preserve_whitespace: true.
- Regex Filter:
- Improved the macStrings default settings to include
slash+star comments with next extracted string.
- IDML Filter:
- Fixed issue #316: Added default to not extract hidden layers
and added the option "Extract hidden layers".
- Enabled the option "Create new paragraphs on hard returns".
Important: This option is still BETA and may prevent you from
merging back the extracted file. Make sure to test the
round-trip before using this option for real projects.
- TMX Filter:
- Fixed a bug where attribute values on <tuv> elements
were being written back to the skeleton without proper
escaping.
- Improved filter performance.
- Properties Filter:
- Fixed issue #313 where the extended characters were not
escaped when using the sub-filter.
- TS Filter:
- Changed the instantiation of the XML parser to use Woodstox.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #303: When using the global_pcdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Steps:
- Microsoft Batch Translation Step:
- Added support for the ${domain} variable for the category.
- Added support for ${rootDir} and other variables for the path of
the Engine mapping file.
- Quality Check Step:
- Added option to choose whether to save the session. This option
is not accessible when editing the parameters from CheckMate, but
only when editing a step's parameters.
- Rainbow Translation Kit Creation Step:
- Added to XLIFF outputs the option of outputting ctype and
equiv-text attributes in inline codes.
- Added to OmegaT output the output of ctype and equiv-text
attributes in inline codes.
- Added the option to merge a new OmegaT translation kit with
an existing one, rather than overwriting it.
- Enrycher Step:
- Added support for segmented text units and implemented
handling of inline codes.
- Added parameter for number of events to process on each call
to the service.
- Translation Comparison Step:
- Added option to log the average scores per document in a
tab-delimited file.
- Added the output of a new tab-delimited file with all
scores, along with the HTML report.
- Extended the repartition table to use 11 brackets instead of
3, and include the two scores.
- Segmentation and Desegmentation Steps:
- Added the option to renumber code IDs after segmentation so
that they are 1-indexed as much as possible. A corresponding
option on the desegmentation step reverses the process. This
option will not work correctly with formats that use
non-consecutive or non-numeric code IDs, such as XLIFF.
- Connectors:
- Microsoft Translator Connector:
- Added information about the engine in the query results.
- Libraries:
- Continued implementation of ITS 2.0 in XLIFF Writer.
- Changed options settings for the XLIFFWriter class to use an
object rather than multiple setters.
- Filters Plugin for OmegaT:
- Fixed issue #322: Updated the TS filter to use the Woodstox
parser, and added the dependencies.
- Added the XLIFF Filter to the plug-in.
- Added basic support for some ITS data categories in the Comments
pane (Text Analysis, Terminology).
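The code ID renumbering option of the Segmentation and Desegmentation
Steps listed above can be pictured with the following Python sketch.
It is a hypothetical illustration only: it keeps an explicit mapping
so that the original IDs can be restored, which is an assumption and
not necessarily how the steps work (the limitation noted above for
non-consecutive or non-numeric IDs suggests the real option behaves
differently).

```python
def renumber_code_ids(code_ids):
    """Map the code IDs of one segment to 1, 2, 3, ... in order of first
    appearance. Returns the renumbered IDs and the mapping used."""
    mapping = {}
    renumbered = []
    for original in code_ids:
        if original not in mapping:
            mapping[original] = len(mapping) + 1
        renumbered.append(mapping[original])
    return renumbered, mapping


def restore_code_ids(renumbered, mapping):
    """Reverse renumber_code_ids using the mapping it returned."""
    reverse = {new: old for old, new in mapping.items()}
    return [reverse[new] for new in renumbered]
```

An opening and closing code that share an ID keep sharing one after
renumbering, since the mapping is applied per ID and not per
occurrence.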
Changes from M19 to M20
- Rainbow:
- Improved the logging output and UI responsiveness during lengthy
processes.
- Updated the user's preference dialog to allow the selection of
the log levels as defined by SLF4J (Normal, Debug, Trace)
- Rainbow's input root directory now supports expansion of system
environment variables.
- Tikal:
- Fixed -lfc command output.
- Use the -continue option to specify that batch operations should
continue processing even if one or more files in the batch fail to
process.
- Summary information will be included at the end of batch
commands.
- Timing information is included for each file processed, and
total elapsed time is included in the batch summary.
- Added the -pd option to specify a directory to search for custom
filter configurations.
- Fixed a crash when merging (-m) a file with no extension.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- Continued the implementation of ITS 2.0
- Fixed issue where the HTML-type special characters were not
escaped when converted to inline codes by the code finder.
- Fixed issue #311 where preserve space property was not
applied to attributes.
- TXML Filter:
- Fixed issue #266: Translations in the <revisions> elements are
now ignored.
- XLIFF Filter:
- Improved support for <mrk> elements.
- Added support for several ITS features.
- Connectors:
- Microsoft Translator Connector:
- Fixed the internal conversion of the language code.
- Libraries:
- Fixed issue #282 for the Abstract Markup Filter.
- Fixed issues in reporting libraries.
- Added support for including ITS annotations in the XLIFF 1.2
writer.
- Steps:
- Rainbow Translation Kit Creation:
- Added the "Include post-processing hook" option for OmegaT
packages. This allows OmegaT to merge back the documents
automatically.
- Remove Target:
- Fixed issue #270 where the step could not be run without
some of the optional parameters set.
- Microsoft Batch Translation:
- Added an option to send the generated TMX document as a raw
document for the input of the next step.
- Added an option to point to a .properties file containing a
mapping of keys to categories for more convenient lookup. See
wiki.
- Quality Check Step:
- Fixed issue #304 where the default check on parentheses
didn't include full-width characters.
- Inline Code Removal Step:
- Added option to replace line break related codes with
spaces. By default codes are simply removed.
- Added the Space Check Step. It automatically fixes spaces around
inline codes in the target based on the source.
- Added the Cleanup Step. It normalizes quotation marks and
punctuation, removes suspect entries, etc. This can be used for
example when preparing an aligned document for MT training.
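The Tikal batch behavior listed above for this release (the -continue
option, per-file timing, and a summary at the end of the batch) can be
pictured with this Python sketch. It is not Tikal's code: the function
and field names are hypothetical, and `process` stands for whatever
operation is applied to each file.

```python
import time


def run_batch(files, process, continue_on_error=False):
    """Process each file, timing it; optionally keep going after failures."""
    results = []
    batch_start = time.perf_counter()
    for path in files:
        start = time.perf_counter()
        try:
            process(path)
            error = None
        except Exception as exc:  # record the failure instead of aborting
            error = str(exc)
        elapsed = time.perf_counter() - start
        results.append({"file": path, "seconds": elapsed, "error": error})
        if error is not None and not continue_on_error:
            break  # default behavior in this sketch: stop at the first failure
    return {
        "processed": len(results),
        "failed": sum(1 for r in results if r["error"] is not None),
        "total_seconds": time.perf_counter() - batch_start,
        "results": results,
    }
```

With continue_on_error=True the loop visits every file and the summary
reports how many failed; without it, processing stops at the first
failure.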