Okapi Framework Changes Log - Sep-28-2014
Note: this document is common to both the okapi-lib distribution
and the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from M25 to M26
- Rainbow:
- Added the .mqxliff extension to the list of extensions associated
with the XLIFF Filter.
- Tikal:
- Fixed issue preventing custom filter configurations from working as
sub-filters.
- Added the ability to output scoping reports.
- Filters:
- Fixed issue #409: Inconsistent handling of <bx pos="begin"/> in
extraction to Moses inline format.
- XLIFF Filter:
- Added support for <sub> elements (plain text, with nested codes,
or with nested codes with nested sub-flows).
- XMLStream Filter:
- Added the .ditamap extension to the list of extensions for the
DITA pre-defined configuration.
- Steps:
- LanguageTool Step:
- Resolved issue #416 (added suggestion to annotation).
- Filters Plugin for OmegaT:
- Libraries:
- Fixed the issue where sub-filter start and end events were not
handled properly for outputting RTF layered files.
- Verification library:
- Improved mapping of LanguageTool ITS types to issue annotations.
- Updated the XLIFF2 library to use 0.22-snapshot.
- Discontinued the MacOS 32-bit distribution (no Java 7 support)
- TMXWriter: fixed bug where some property entries were written before
the <seg> element.
- Improved stream-only pipeline capabilities.
Changes from M24 to M25
- Tikal:
- Fixed default parameter for the default TM resource. Now you can
just run tikal -q "text to search".
- Added -x2 and -m2 options for extraction/merge with the new
skeleton file.
- Changed -x and -m to extract/merge with the new skeleton file.
- Added -x1 and -m1 options for extraction/merge with the original
document (similar to previous versions; for a fully
backward-compatible merge you must use M24).
- Updated Tikal merge function with original file to use the new
common text-unit merger.
- Added options for JAR version switch
- Filters Plugin for OmegaT:
- Added basic support for XLIFF 2 documents (under construction).
- The target text is now passed as a translation only if it differs
from the source.
- Added support for alternate translations (e.g. from XLIFF 1.2
documents)
- Steps:
- Added the TTXSplitter Step. It splits a given TTX document into
several documents with the same source word count.
- Added the TTXJoiner Step. It joins back TTX documents created with
the TTXSplitter Step.
- Consolidated merge steps into: SkeletonXliffMergerStep,
LegacyXliffMergerStep, OriginalDocumentXliffMergerStep and
CombinedXliffMergerStep classes.
- Rainbow Translation Kit Creation Step:
- Reinstated output for XLIFF v2 using the latest library.
- Added support for extraction using the new skeleton file.
- Rainbow Translation Kit Merging Step:
- Added basic support for merging XLIFF v2 packages.
- Added support for merging using the new skeleton file.
- Search and Replace Step:
- Fixed issue #392 where the reading of the replacement table was
trimming all lines. Searching for a space, and replacing with
nothing, now work.
- Filters:
- Changed the reading of <alt-trans> elements to allow entries with
an empty <target> (e.g. some XTM XLIFF files have <alt-trans>
elements with empty targets).
- Added the option "Allow modification of existing <alt-trans>
elements".
- YAML Filter:
- OpenXML Filter:
- Fixed issue #402 (Cannot stop the filter before the document is
done)
- Fixed issue #350 (merge problem when docx has an OpenXml.Drawing
object)
- PO Filter:
- Fixed issue with #, (fuzzy flag) in front of #~ (obsolete) entries.
- JSON Filter:
- Refactored the filter.
- Fixed issue #359 (Need to improve extraction selection)
- Fixed issue #373 (Encoder and xml:space='preserve')
- Fixed issue #397 (Filter not extracting all strings as expected)
- Connectors:
- Translate Toolkit TM Connector:
- Updated the parameters API to use set/getUrl() instead of
set/getHost() and set/getPort().
- Updated the default host and port (now obsolete) to localhost and
8080 to allow local setups to continue to work.
- Updated the default URL to
https://amagama-live.translatehouse.org/api/v1/ (the previous URL is
obsolete).
- Libraries:
- Major refactoring of the serialization.
- Major refactoring of the RawDocument object
- Updated SWT libraries to 4.3
- Added lib-tkit library for extraction/merge with skeleton in JSON.
- Added sort capability to the Filter Configuration common edit
dialog.
Changes from M23 to M24
- Tikal:
- Changed default resource for -q command from OpenTran to
Translate Toolkit.
- Rainbow:
- Made usability improvements to the Testing Console for rapid
iteration when creating custom filter configurations.
- Steps:
- Added the Copy Or Move Step: Copies or moves files to a specified
location with the option to overwrite or back up existing files, or
to skip copying if a file already exists.
- Rainbow Translation Kit Creation Step:
- Removed the experimental output to XLIFF 2.0 (too outdated to be
useful). See the Okapi XLIFF Toolkit project for more up-to-date
support for XLIFF 2.0.
- Added option to specify the post-processing hook for OmegaT tkits.
- Format Conversion Step:
- Added the Word Table output format.
- Filters:
- IDML Filter:
- Changed the default spread size threshold to 2000 Kb and updated
the warning/error to show the spread size.
- XML Filter:
- Android Strings pre-defined settings: Exposed content of
<item> elements for translation (used in <plurals>,
<string-array> elements).
- HTML Filter:
- Added option to treat CDATA as an inline element.
- Content of excluded inline elements is exposed for, e.g.,
inclusion in XLIFF equiv-text attributes.
- Fixed issue #336: The filter will no longer produce translatable
segments consisting only of tags.
- XML Stream Filter:
- Fixed issue #336: The filter will no longer produce translatable
segments consisting only of tags.
- OpenXML Filter:
- Fixed issue #351: Improve filter performance.
- JSON Filter:
- Fixed issue #377: Support for subfiltering in JSON.
- Fixed issue #373: JSONFilter should use the JSONEncoder.
- Note: Changes in escaping/unescaping behavior in this filter break
compatibility with files extracted by previous versions.
- Filters Plugin for OmegaT:
- Added capability to specify a custom filter parameters file for each
Okapi filter in the plugin. This closes issue #376.
- Connectors:
- Added the Bilingual File Connector: Directly query a bilingual file
format such as TMX, PO, etc., without importing to a TM first.
- Library:
- Important: Changed minimum requirement from Java 1.6 to Java 1.7.
- Fixed ITS content writer to output locQualityIssueProfileRef and not
locQualityIssueProfile.
- Improved report output of quality checker.
- Updated and cleaned up the build files.
- Added HUMAN_RECOMMENDED type to the MatchType list.
- Modified the base implementation for IParameters. This may result
in compilation errors in your code if you directly access some
variables: you should now use the corresponding getter and setter
methods.
- Added support for tuv-level attributes that were missing in
TMXWriter.
Changes from M22 to M23
- Rainbow:
- Added the Inconsistency Check Step to the pre-defined Quality Check
pipeline.
- CheckMate:
- Fixed issue #358: The Check Document button now works in all cases.
- Filters Plugin for OmegaT:
- Added .mxliff as one of the default extensions for XLIFF.
- Fixed issue #364: .sdlxliff files with UTF-8 BOM open now.
- Steps:
- Added the Inconsistency Check Step: a way to flag entries with the
same source that have different targets, or entries with the same
target that have different sources.
- Rainbow Translation Kit Creation Step:
- Added a libVersion attribute in the manifest indicating the version
of the library used to create the manifest.
- Added an option to use encapsulation notation
(<bpt>/<ept>/<ph>/<it>) for inline codes in OmegaT tkits.
- LanguageTool Step:
- Updated the library to version 2.2.
- Encoding Conversion Step:
- Fixed issue #318: ASCII characters in NCR form are now un-escaped
except for ", ', &, < and >.
- Search and Replace Step:
- Fixed issue #183: Added simple log of the replacements.
- Fixed issue #362: Step for Terminology fixes on translation
candidates.
- Quality Check Step:
- Resolved issue #357: Added function to detect blacklisted terms.
- Improved ITS LQI support.
- Filters:
- IDML Filter:
- Implemented issue #356: By default, spreads above the threshold
cause an error. An option allows skipping them without an error.
- XML Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #361: MIME type can be different in sub-classes of
XMLFilter.
- HTML5-ITS Filter:
- Continued implementation of ITS 2.0.
- JSON Filter:
- Resolved issue #360: The use of the key for the resname value is
now optional.
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Fixed issue #364: Woodstox XML parser is now always used.
- Alt-trans with empty target are now skipped.
- Added support for the <tool> and <phase> elements as well as the
state-qualifier attribute.
- OpenXML Filter:
- Fixed issue #291: Sub-documents are now processed in correct
order.
- Fixed issue #319: 'squishable' tests have been changed.
- Simplification Filter:
- Fixed issue #355: parameters of sub-filter are properly read in
the cases where the primary filter uses a sub-filter.
- XINI Filter:
- Fixed a case where placeholders were being renumbered incorrectly
when reading a XINI file.
- Library:
- Verification Library:
- Fixed issue with non-initialized start/end variable when checking
patterns from the target.
- Added support for sub-document in Quality Checker library.
- Continued implementation of ITS 2.0 in XLIFFWriter, XLIFFContent,
etc.
- Fixed issue #352: XMLWriter now throws OkapiIOException if an error
occurs.
- Updated XLIFF Writer to match ITS/XLIFF official mapping (http://www.w3.org/International/its/wiki/XLIFF_1.2_Mapping).
- Added the experimental lib-concurrent package to improve
multi-threaded pipelines. See the ThreadedWorkQueue Step page for
details.
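The Encoding Conversion Step fix for issue #318 above (ASCII characters in NCR form are un-escaped, except for ", ', &, < and >) can be illustrated with a small standalone Java sketch. This is not the Okapi implementation: the class name NcrUnescaper is hypothetical, and the sketch assumes that "ASCII" means code points U+0000 to U+007F and that both decimal (&#65;) and hexadecimal (&#x41;) references are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of un-escaping ASCII NCRs; not Okapi code.
public class NcrUnescaper {

    // Matches hexadecimal (&#x41;) and decimal (&#65;) numeric character references.
    private static final Pattern NCR =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Characters that must stay escaped because they are significant in markup.
    private static final String KEEP_ESCAPED = "\"'&<>";

    public static String unescape(String text) {
        Matcher m = NCR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // value too large: leave the reference untouched
            }
            String replacement;
            if (cp >= 0 && cp < 0x80 && KEEP_ESCAPED.indexOf(cp) < 0) {
                replacement = String.valueOf((char) cp); // ASCII: un-escape
            } else {
                replacement = m.group(); // non-ASCII or reserved: keep the NCR
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: ABC &#38; &#233; &#60;
        System.out.println(unescape("&#65;&#x42;C &#38; &#233; &#60;"));
    }
}
```

Keeping the five markup-significant characters escaped means the output remains well-formed XML even after un-escaping.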
Changes from M21 to M22
- Tikal:
- Made it possible to run tikal.sh from another directory on Mac OS X.
- Updated the way the application root folder is computed to allow
calls from a network share.
- Rainbow:
- Fixed -log option to allow it anywhere in the command-line.
- Filters:
- Table Filter:
- Fixed issue #300 (enhancement): Added a new Table Filter for
2-column (source + target), tab separated files.
- OpenXML Filter:
- Fixed issue #166: Text from mc:Fallback and mc:Choice
Requires="wps", WordArt, TextArt, and Watermarks is handled
properly now.
- Fixed issue #169: Segmentation around inline codes seems to
work properly.
- Fixed issue #286: PPTX smart-tags are now imported.
- Fixed issue #323: Files are not corrupted anymore when using
text areas.
- Fixed issue #324: Nested <w:p> merge properly now.
- Fixed issue #325: The slides of PPTX documents are now
extracted in order.
- Fixed issue #329: Text from PPTX diagrams is now extracted.
- Fixed issue #351: Creation of XLIFF now works on documents with
SmartArt graphics.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #332: When using the global_cdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Fixed issue #339: The filter was not grouping the tags back
properly when merging back.
- Added handling of variable placeholders for the pre-defined
settings for RESX files.
- ITS Filters (XML Filter and HTML5+ITS Filter):
- TMX Filter:
- Fixed the issue where <it> codes were mapped to placeholders
rather than opening/closing internal codes.
- XLIFF Filter:
- Continued implementation of ITS 2.0: Improved support for LQI,
added support for Provenance.
- Simplification Filter:
- General:
- Filters that update language properties (like xml:lang) during
merging will now be region-insensitive when doing so.
- Steps:
- Term Extraction Step:
- Added support for Text Analysis annotations.
- Made the three extraction methods into options, and attached the
relevant options to the statistical method.
- Full-Width Conversion Step:
- Added log message if at least one character was modified (per
input file). This resolves issue #327.
- Enrycher Step:
- Improved handling of nested annotations.
- Batch TM Leveraging Step:
- Fixed issue #331: Entries with no text are no longer sent for
translation.
- Format Conversion Step:
- Fixed the issue where the "Output generic inline codes" was not
recognized for the Tab-delimited table output.
- MS Batch Translation Step:
- MT candidates with a very low score (e.g. resulting from an error)
are no longer output in the TMX.
- Space Checker Step:
- Improved reporting of errors and changes.
- Fixed issue #346: Iterating through text fragments ran out of
bounds. Indexing error was fixed.
- Fixed issue #348: inline code index marker broken as a result of
spacing changes. Index marker error was fixed.
- Translation Comparison Step:
- Consolidated Paragraph Alignment and Sentence Alignment steps
- Connectors:
- Microsoft MT Connector:
- Improved error handling (e.g. problem with inline codes in
result).
- Filters Plugin for OmegaT:
- Added *.xliff and *.sdlxliff as default extensions.
- Changed default for isFileSupported() to return true (this allows
user-defined extensions).
- Libraries:
Changes from M20 to M21
- All applications:
- Applications now launch correctly on Mac OS X when they are
located in a path containing a space.
- Rainbow:
- Added the -log option to specify the result log file. By default
the log file is {user.home}\rainbowBatchLog.txt
- Tikal:
- Added the -safe option to prompt the user before overwriting a
directory when extracting.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Added support for the okp:engine attribute in <alt-trans>.
- Wiki Filter:
- Fixed issue #315: WikiFilter didn't work with
preserve_whitespace: true.
- Regex Filter:
- Improved the macStrings default settings to include slash+star
comments with next extracted string.
- IDML Filter:
- Fixed issue #316: Added default to not extract hidden layers
and added the option "Extract hidden layers".
- Enabled the option "Create new paragraphs on hard returns".
Important: This option is still BETA and may prevent you from
merging back the extracted file. Make sure to test the
round-trip before using this option for real projects.
- TMX Filter:
- Fixed a bug where attribute values on <tuv> elements
were being written back to the skeleton without proper escaping.
- Improved filter performance.
- Properties Filter:
- Fixed issue #313 where the extended characters were not
escaped when using the sub-filter.
- TS Filter:
- Changed the instantiation of the XML parser to use Woodstox.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #303: When using the global_pcdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Steps:
- Microsoft Batch Translation Step:
- Added support for the ${domain} variable for the category.
- Added support for ${rootDir} and other variables for the path of
the Engine mapping file.
- Quality Check Step:
- Added an option to save the session or not. This option is not
accessible when editing the parameters from CheckMate, only
when editing a step's parameters.
- Rainbow Translation Kit Creation Step:
- Added to XLIFF outputs the option of outputting ctype and
equiv-text attributes in inline codes.
- Added the output of ctype and equiv-text attributes in inline codes
to the OmegaT output.
- Added the option to merge a new OmegaT translation kit with an
existing one, rather than overwriting it.
- Enrycher Step:
- Added support for segmented text units and implemented
handling of inline codes.
- Added parameter for number of events to process on each call
to the service.
- Translation Comparison Step:
- Added option to log the average scores per document in a
tab-delimited file.
- Added the output of a new tab-delimited file with all scores,
along with the HTML report.
- Extended the repartition table to use 11 brackets instead of
3, and include the two scores.
- Segmentation and Desegmentation Steps:
- Added the option to renumber code IDs after segmentation so
that they are 1-indexed as much as possible. A corresponding
option on the desegmentation step reverses the process. This
option will not work correctly with formats that use
non-consecutive or non-numeric code IDs, such as XLIFF.
- Connectors:
- Microsoft Translator Connector:
- Added information about the engine in the query results.
- Libraries:
- Continued implementation of ITS 2.0 in XLIFF Writer.
- Changed options settings for the XLIFFWriter class to use an
object rather than multiple setters.
- Filters Plugin for OmegaT:
- Fixed issue #322: Updated the TS filter to use the Woodstox
parser, and added the dependencies.
- Added the XLIFF Filter to the plug-in.
- Added basic support for some ITS data categories in the Comments
pane (Text Analysis, Terminology).
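The code ID renumbering option of the Segmentation and Desegmentation Steps (M20 to M21 above) amounts to a reversible mapping. The following Java sketch is a hypothetical illustration, not Okapi's code: the class and method names are invented, and it assumes integer code IDs, that paired opening/closing codes share an ID, and that new IDs are assigned in order of first appearance within a segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of 1-based code ID renumbering; not Okapi code.
public class CodeIdRenumbering {

    // Maps each original code ID to a 1-based ID, in order of first appearance.
    // Paired opening/closing codes share an ID, so they receive the same new ID.
    public static Map<Integer, Integer> buildMapping(List<Integer> idsInSegment) {
        Map<Integer, Integer> mapping = new LinkedHashMap<Integer, Integer>();
        for (Integer id : idsInSegment) {
            if (!mapping.containsKey(id)) {
                mapping.put(id, mapping.size() + 1);
            }
        }
        return mapping;
    }

    // Rewrites a sequence of code IDs using the given mapping.
    public static List<Integer> apply(List<Integer> ids, Map<Integer, Integer> mapping) {
        List<Integer> result = new ArrayList<Integer>();
        for (Integer id : ids) {
            result.add(mapping.get(id));
        }
        return result;
    }

    // Inverts the mapping so that a desegmentation pass can restore the original IDs.
    public static Map<Integer, Integer> invert(Map<Integer, Integer> mapping) {
        Map<Integer, Integer> inverse = new LinkedHashMap<Integer, Integer>();
        for (Map.Entry<Integer, Integer> e : mapping.entrySet()) {
            inverse.put(e.getValue(), e.getKey());
        }
        return inverse;
    }

    public static void main(String[] args) {
        // Code IDs as they might appear in the second segment of a paragraph.
        List<Integer> original = Arrays.asList(4, 5, 5, 4, 6);
        Map<Integer, Integer> mapping = buildMapping(original);
        List<Integer> renumbered = apply(original, mapping);
        List<Integer> restored = apply(renumbered, invert(mapping));
        System.out.println(renumbered); // prints [1, 2, 2, 1, 3]
        System.out.println(restored);   // prints [4, 5, 5, 4, 6]
    }
}
```

Because the sketch keys the mapping on integers, it only covers numeric IDs, which is in line with the note above that the option does not work correctly for formats with non-numeric code IDs such as XLIFF.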
Changes from M19 to M20
- Rainbow:
- Improved the logging output and UI responsiveness during lengthy
processes.
- Updated the user's preference dialog to allow the selection of the
log levels as defined by SLF4J (Normal, Debug, Trace)
- Rainbow's input root directory now supports expansion of system
environment variables.
- Tikal:
- Fixed -lfc command output.
- Use the -continue option to specify that batch operations should
continue processing even if one or more files in the batch fail to
process.
- Summary information will be included at the end of batch commands.
- Timing information is included for each file processed, and total
elapsed time is included in the batch summary.
- Added the -pd option to specify a directory to search for custom
filter configurations.
- Fixed a crash when merging (-m) a file with no extension.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- Continued the implementation of ITS 2.0
- Fixed issue where the HTML-type special characters were not
escaped when converted to inline codes by the code finder.
- Fixed issue #311 where preserve space property was not applied
to attributes.
- TXML Filter:
- Fixed issue #266: Translations in the <revisions> elements are
now ignored.
- XLIFF Filter:
- Improved support for <mrk> elements.
- Added support for several ITS features.
- Connectors:
- Microsoft Translator Connector:
- Fixed the internal conversion of the language code.
- Libraries:
- Fixed issue #282 for the Abstract Markup Filter.
- Fixed issues in reporting libraries.
- Added support for including ITS annotations in the XLIFF 1.2
writer.
- Steps:
- Rainbow Translation Kit Creation:
- Added the "Include post-processing hook" option for OmegaT
packages. This allows OmegaT to merge back the documents
automatically.
- Remove Target:
- Fixed issue #270 where the step could not be run without some
of the optional parameters set.
- Microsoft Batch Translation:
- Added an option to send the generated TMX document as a raw
document for the input of the next step.
- Added an option to point to a .properties file containing a
mapping of keys to categories for more convenient lookup. See the
wiki.
- Quality Check Step:
- Fixed issue #304 where the default check on parentheses didn't
include full-width characters.
- Inline Code Removal Step:
- Added option to replace line break related codes with spaces.
By default codes are simply removed.
- Added the Space Check Step. It automatically fixes spaces around
inline codes in the target based on the source.
- Added the Cleanup Step. It normalizes quotation marks and
punctuation, removes suspect entries, etc. This can be used, for
example, when preparing an aligned document for MT training.