Okapi Framework Changes Log - @date@
Note: this document is common to both the okapi-lib distribution and
the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from M14 to M15
- Tikal:
- Added the option -gg for the Google MT v2 connector. This fixes issue #189.
- Mapped the -google option to -gg and removed
references to the Google MT v1 connector, which is no longer available.
- Rainbow:
- Updated the ID-Based Alignment utility to use the GT v2 API for the MT
services instead of v1.
- Moved old utilities to the bottom of the Utilities menu.
- Filters Plugin for OmegaT:
- Existing segmentation is auto-detected.
- Fixed code re-construction issues for some cases.
- Improved backward compatibility, and implemented support for
alternate translation of repeated source segments for OmegaT 2.5.
- Now if output encoding is not defined for a TTX file, UTF-16 is used
(instead of UTF-8).
- Steps:
- Text Modification Step:
- Added the option for text expansion.
- Added the option to replace ASCII letters with Cyrillic, Arabic
and Chinese characters.
- Rainbow Kit Creation Step:
- Fixed incorrectly mapped inline code that resulted in invalid merged
files.
- The Translation Table format is now available as a package type (allows
extraction and merging of tab-delimited files).
- Format Conversion Step:
- For the parallel corpus output, the source entries are also now always
output even when there is no corresponding target (in that case the target
entry is output as an empty line).
- Id-Based Aligner Step:
- Implemented support for bilingual files as the reference file.
- Microsoft Batch Translation Step:
- Fixed issue where leveraged text was not copied into the target in some
cases when it should have been.
- XML validation step:
- Fixed the schema/DTD selection error.
- Term Extraction Step:
- The TermsAnnotation annotations are taken into account and are
extracted.
- Connectors:
- Google MT v2 Connector:
- Fixed error when list of segments to translate is empty or made of
segments with only codes.
- Removed query length limit for single query mode (to let Google Translate
deal with it, in case it changes).
- Filters:
- IDML Filter:
- Fixed issue of not merging last cell of a table that is the last element
of a story.
- Fixed TextPath-attached stories not being extracted. This fixes issue
#194.
- HTML Filter:
- Files with no encoding declaration detected within the first 1 KB have
an encoding declaration added on output (only if the file has a <head>
element).
- Added support for HTML5 charset declaration.
- XLIFF Filter:
- Fixed issue where the datatype attribute for <file> was not read properly.
- RTF Filter:
- Improved warning messages for decoding errors (e.g. when a font name is
corrupted). Dangerous cases now have an extra message warning about the loss
of the character.
- TTX Filter:
- Replaced the "include un-segmented part" option with a new segmentMode
option. Mode 0: auto-detect (if a segment is detected, mode 1 is used; if no
segment is detected, mode 2 is used). Mode 1: only existing segments are
extracted. Mode 2: all text is extracted, segmented or not.
- Added support for Catalyst-generated TTX (that have neither <Raw> nor
<ut> elements).
- Properties Filter:
- Added inline code support for HTML tags in default
- PO Filter:
- The translator's comments are now extracted into the common transNote
read-only property.
- TS Filter:
- Implemented the support for translator's note and simple notes.
- Regex Filter:
- Added a pre-defined configuration for Apple iOS .strings files.
- OpenXML Filter:
- Fixed the issue where a ZIP stream was not closed properly.
- Fixed issue #143 (Excel parameters not working)
- Fixed issue #144 (Color not saved for Excel options)
- XML Filter:
- Fixed issue of missing start tag in case of non-translatable inline with
no translatable text before.
- Changed unwrapping default: ends are not trimmed anymore (was causing
loss of spaces in fragment extraction).
- Fixed issue where translation context was not reset after an embedded
structural translatable element was extracted (e.g. para inside para case)
- TermsAnnotation annotations are now generated for entries
matching the ITS Terminology datacategory.
- Libraries:
- Added support for multiple monitors when centering a dialog.
- Fixed case where tuid was written twice with TMXWriter.
- Fixed the link of the "Patterns" button in Inline Code panel.
- Continued to improve the XLIFF 2.0 experimental library.
- Fixed the case of batch configuration with no plugins.
- Removed Google MT v1 connector from the list of default connectors.
- Fixed icon and cancel command in Filter Configuration common dialog.
- Improved cloning of skeleton parts.
- The XLIFFWriter now outputs the common transNote property as a
<note from="translator">. This fixes issue #195.
- Added changeFontSize() to InputDialog UI.
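The three segmentMode values introduced for the TTX Filter in this release amount to a small decision rule. The following Python sketch only illustrates that rule as worded above; it is not Okapi code, and the function and constant names are hypothetical.

```python
# Hypothetical constants mirroring the three segmentMode values described above.
MODE_AUTO = 0      # auto-detect
MODE_EXISTING = 1  # only existing segments are extracted
MODE_ALL = 2       # all text is extracted, segmented or not


def effective_segment_mode(segment_mode: int, has_segment: bool) -> int:
    """Return the mode that would actually be applied.

    has_segment is an assumed input: whether at least one existing
    segment was detected in the TTX document.
    """
    if segment_mode not in (MODE_AUTO, MODE_EXISTING, MODE_ALL):
        raise ValueError(f"unknown segmentMode: {segment_mode}")
    if segment_mode == MODE_AUTO:
        # Auto-detect falls back to one of the two explicit modes.
        return MODE_EXISTING if has_segment else MODE_ALL
    return segment_mode
```

With this reading, mode 0 never reaches the extraction logic as such: it is always resolved to mode 1 or mode 2 first.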
Changes from M13 to M14
- Rainbow:
- Fixed input root setting when running from the command-line.
- Fixed missing command-line IDs for Translation Kit creation and
post-processing pre-defined pipelines.
- Tikal:
- Added the -bpt option in the Leverage Files from Moses command.
- Added the -to option in the Leverage Files from Moses command.
- Made the key parameter of the MyMemory connector optional (kept it for
backward compatibility)
- Updated bash script (tikal.sh) to allow more than 9 arguments.
- Added the -od option in the Extract Files and Merge Files commands.
- Added the -sd option in the Merge Files command.
- Filters Plugin for OmegaT:
- Added support for TXML files.
- Updated filter names to indicate they are from the Okapi plugin (as we
may have duplicated filters with the default OmegaT filters)
- CheckMate:
- Added support for TXML files.
- Steps:
- Id-Based Copy Step:
- Added the option to mark the leveraged text units as non-translatable.
- Added the option to mark the leveraged translation as approved.
- Rainbow Translation Kit Creation Step:
- Added the support for Versified Text + RTF packages.
- The XINI package output now supports pre-translated items.
- Changed the default output directory from ${rootDir} to ${inputRootDir}.
- Continued to improve the experimental XLIFF 2.0 package.
- Updated OmegaT package so the <dictionnary_dir> element
required for OmegaT 2.5 is generated.
- Fixed handling of input file without filter for XLIFF+RTF and
Versified+RTF outputs.
- Fixed the problem with the destination folder when merging several
manifests one after the other.
- XML Validation Step:
- Fixed case of relative path for included DTDs.
- Translation Comparison Step:
- Added the option to use generic representation for inline codes.
- Improved the error handling.
- Segmentation Step:
- Added support for ${inputRootDir}.
- Leveraging Step:
- Added support for ${inputRootDir} for the TMX output.
- Microsoft Batch Translation Step:
- Added support for ${inputRootDir} for the TMX output.
- Moses Leveraging Step:
- Added the option to not use <g> notation in new <alt-trans> elements in
output.
- Filters:
- Added the TXML Filter. It allows you to process Wordfast Pro native
translation files. This filter is early BETA for now.
- HTML Filter:
- Fixed the un-escaping issue for '&' in non-extractable attribute values.
- Added the escapeCharacters option to allow output using
character entity references.
- Versified Text Filter:
- Added support for bilingual variant of the format.
- XLIFF Filter:
- Added the option to not use <g> notation in new <alt-trans> elements in
output.
- Modified the generation of group IDs to take into account the ones
existing in the input file. This fixes issue #185.
- Added support for cases where a <seg-source> element has no
<mrk> element: it's treated as a single-segment entry.
- RTF Filter:
- Updated filter to strip out null characters present in some varieties of
RTF files.
- TMX Filter:
- Fixed the problem of some attribute values not being escaped properly in
some cases.
- XML Filter:
- Added the option extractIfOnlyCodes to allow entries with
only codes or whitespace but no text to not be extracted. For backward
compatibility this option is set to true by default.
- IDML Filter:
- Added support for stories nested inside other stories.
- Fixed issue with empty <Content> element, and tested with more complex
CS5 files.
- RainbowTKit Filter:
- Improved error handling in Rainbow translation kit post-processing.
- Table Filter:
- Added support for adding qualifiers when needed for the CSV output. This
fixes issue #187.
- TTX Filter:
- Added the option of including or not unsegmented text and added the
pre-defined configuration preSegmented.
- Changed the parsing so the IDs are unique within each text unit rather
than reset for each segment.
- Properties Filter:
- Added support for sub-filters.
- Added the pre-defined configuration okf_properties-html-subfilter.
- Connectors:
- MyMemory TM Connector:
- Changed the code to use the REST interface rather than the SOAP
interface.
- Microsoft Translator Connector:
- Changed the queryList() method to batchQuery() so it can be used through
the IQuery interface.
- The original score is now set in score, while the re-calculated score is
the combined score.
- Google MT v2 Connector:
- Libraries:
- Continued to improve the XLIFF 2.0 experimental library.
- Updated IQuery to include batchQuery() and batchLeverage().
- Updated QueryResult to include the new combinedScore and quality fields.
- Changed various classes to use the combined score rather than the score
for their filtering.
- Made various fixes and improvements to the core libraries.
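For the Table Filter change in this release (qualifiers added when needed in the CSV output, issue #187), the general idea can be illustrated with Python's standard csv module. This is only an analogy under stated assumptions: a comma delimiter and a double-quote qualifier are assumed, the helper name is hypothetical, and nothing here reproduces the Okapi implementation.

```python
import csv
import io


def to_csv_line(fields):
    """Join fields into one CSV line, quoting only the fields that need it.

    QUOTE_MINIMAL adds the qualifier (") only when a field contains the
    delimiter, the qualifier itself, or a line break; embedded qualifiers
    are doubled.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    # Drop the single line terminator appended by the writer.
    return buf.getvalue()[:-1]


line = to_csv_line(["a", "b,c", 'say "hi"', "plain text"])
print(line)
```

Fields such as "a" and "plain text" are written unchanged, while the field containing a comma and the field containing a double quote are wrapped in qualifiers.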
Changes from M12 to M13
- Rainbow:
- Added the user preference "Use the last session's locales and encodings
as defaults".
- CheckMate:
- Fixed the null pointer issue when a target pattern different from the
source match was not found.
- Added check of the opening/closing sequence for inline codes. This check
is done only if no other error is found in the segment.
- Added the option "Try to guess opening/closing types for placeholder
codes" for the codes.
- Tikal:
- Added the -a command to add translations to a resource.
- Fixed usage screen for minor details.
- Longhorn:
- Longhorn is now part of the regular releases. Longhorn is a server
application that allows you to execute batch configurations remotely on any
set of input files. Batch configurations, which include pre-defined pipelines
and filter configurations, can be exported from Rainbow. For more
information see
http://www.opentag.com/okapi/wiki/index.php?title=Longhorn.
- Steps:
- Rainbow Translation Kit Creation Step:
- Changed the XLIFF and XLIFF + RTF output behavior so single and double
quotes are not escaped in element content.
- Changed the XLIFF + RTF output to use short notation for empty inline
codes.
IMPORTANT: This updated notation may have an impact on TM matches
using the previous notation. Some TM tools will have coded the two parts
<x id="1"> and </x> as two separate inline
codes (because they are styled as two spans of tw4winInternal) and will see
the new notation <x id="1"/> as a single code. For a
Trados 2007 TM, for example, you may want to update your TM by replacing the
two adjacent codes with a single one. Export the TM to TMX 1.4b, replace
<bpt\s.*?><x id="(\d\d?)"></bpt><ept\s.*?></x></ept>
by <ph><x id="\1"/></ph> and create a new TM based
on the modified notation. That new TM should yield segments that will match
with the new notation.
- Added the option "Create a ZIP file for the package"
- Continued experimental implementation of XLIFF 2.0.
- Rainbow Translation Kit Merging Step:
- Added the option to generate raw document events or filter events.
- Added support for post-processing directly .rkp zipped files.
- Extraction Verification Step:
- Improved support for bilingual file formats.
- Search and Replace Step:
- Fixed the UI so the list of expressions is not too tall and does not push
the buttons off the screen.
- Added the Microsoft Batch Translation Step: it allows you to batch-translate
resources using the Microsoft Translator TM/MT engine (the Collaborative
Translations Framework).
- Added the Microsoft Batch Submission Step: it allows you to
submit human or post-edited translations to the Microsoft Collaborative Translations Framework.
Those translations can be retrieved later.
- Filters:
- XLIFF Filter:
- Added internal flag to output un-escaped single and double quotes.
- Fixed filter-writer to take into account un-segmented entries with
existing translation in the layered output (e.g. RTF).
- Fixed bug of un-balanced opening and closing codes in segmented entries
when codes are nested over parts.
- Updated how outer data of empty inline codes are stored (e.g. now uses
<x id='1'/> instead of <x id='1'></x>)
- PHP Content Filter:
- Fixed the case of multiple placeholder codes with the same ID. This
fixes issue #179.
- IDML Filter:
- Implemented extraction of master spreads (as an option set by default).
- Fixed bug in the simplification of trailing inline codes after a
single character.
- Connectors:
- Microsoft Translator Connector:
- Changed the name from Microsoft MT Connector (since it handles more than
MT matches)
- Implemented ITMQuery and multiple results.
- Libraries:
- Added support for quote mode in XMLEncoder.
- Upgraded ICU4J library from 4.6 to 4.8.
- Added PIPELINE_PARAMETERS event.
- Upgraded the SWT library from 3.6.1 to 3.7.
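The TMX update suggested in this release for the Rainbow Translation Kit Creation Step (replacing the two-part <x> codes with the short notation) can be tried out on a small string before touching a real export. The sketch below uses the search and replace expressions exactly as printed in the note; the sample segment is made up for illustration. In an actual TMX file the inline content may be escaped (for example with &lt; and &gt;), in which case the pattern would need to be adjusted accordingly.

```python
import re

# Search and replace expressions as printed in the note above.
PATTERN = re.compile(r'<bpt\s.*?><x id="(\d\d?)"></bpt><ept\s.*?></x></ept>')
REPLACEMENT = r'<ph><x id="\1"/></ph>'


def collapse_empty_codes(tmx_text: str) -> str:
    """Replace each adjacent <bpt>/<ept> pair wrapping an empty <x>
    code with a single <ph> holding the short notation."""
    return PATTERN.sub(REPLACEMENT, tmx_text)


# Made-up sample segment (not taken from a real TM export).
sample = 'Press <bpt i="1"><x id="1"></bpt><ept i="1"></x></ept> now'
print(collapse_empty_codes(sample))
```

Running the replacement on a copy of the exported TMX and checking a few segments by hand is advisable before creating the new TM.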
Changes from M11 to M12
- Rainbow:
- Fixed YAML escaping/un-escaping for quotes in the Edit Code Finder Rules
tool.
- Added the Create New Document command.
- Tikal:
- Updated the
-tt option to make the server parameters
optional and default to the Amagama server.
- Filters:
- PO Filter:
- Implemented case of no quote on msgid/msgstr/etc. line.
- Improved error reporting with line numbers.
- Added msgctxt in the generated resname values. This allows two entries
with the same source and same domain but with a different context to have
distinct resname auto-generated values.
- MIF Filter:
- Implemented extraction for index markers (default=yes) and hyperlinks
(default=no).
- When needed, leading codes (font, etc.) are now extracted as inline code
rather than kept in the skeleton.
- Regex Filter:
- Added support for escaped character notation that uses doubled
characters (e.g. "").
- Made the back-slash escape notation optional.
- XML Filter:
- Fixed bug where standalone="yes" was re-written standalone="true" in the
XML declaration. This fixes issue #173.
- IDML Filter:
- Improved support for special characters (hair spaces, forced
line-breaks, etc.)
- Steps:
- Added the Remove Target Step, to remove target entries from text units.
- Added the Inline Codes Simplifier Step, to join adjacent inline codes in
text units.
- Added the GTT Batch Translation Step, to create TM files using Google
Translator Toolkit.
- Added the Repetition Analysis Step, to detect text repetitions for word
count report.
- Added the Extraction Verification Step, to check if a document can be
extracted/merged/re-extracted and generate the same events (this helps
verify that extracted text can be safely merged back).
- Rainbow Translation Kit Creation Step:
- Added option for creating XLIFF 2.0 packages. This is for experimental
tests only. XLIFF 2.0 is not defined yet.
- Added the option of setting the sentence_seg flag in the project for
OmegaT packages.
- Fixed synchronization error with empty source on merging PO and
Transifex packages.
- Implemented support for input files without filter configuration: they
are copied into the original folder if the selected kit has one.
- Rainbow Translation Kit Merging Step:
- Fixed merging for segmented entries.
- Added option to preserve or not segmentation for the next step (false to
save time if there is no step afterward)
- Leveraging Step:
- Added the option to add a prefix to the leveraged translation, when a
given threshold is equal or below the score of the leveraged match, and
depending on the target content.
- Added option to copy the source when the source content has no text (but
may have white-spaces and/or codes).
- Search and Replace Step:
- Added the option "replace all instances of the pattern" to replace all
or only the first match in each item searched.
- Quality Check Step:
- Changed the inline code verification so only selected types of codes are
ignored rather than all the ones without native data. Now, by default, only
mrk and x-df-s types are ignored.
- Segmentation Step:
- Prevented the options from being applied when no
segmentation is done.
- Line-Break Conversion Step:
- Updated output so the output can be the same file as the input.
- Byte-Order-Mark Conversion Step:
- Updated output so the output can be the same file as the input.
- RTF Conversion Step:
- Updated output so the output can be the same file as the input.
- XML Character Fixing Step:
- Updated output so the output can be the same file as the input.
- Encoding Conversion Step:
- Updated output so the output can be the same file as the input.
- Term Extraction Step:
- Changed code to include numbers as part of the possible terms.
- Connectors:
- GlobalSight Connector:
- Changed the conversion of the score to handle the floats sent by the
latest API, instead of integers.
- Translate Toolkit TM Connector:
- Updated the default to use the Amagama server.
- Added support for letter-coded inline codes.
- Removed Cross-Language Gateway MT services connector from default
distribution.
- Microsoft MT Connector:
- Implemented support for inline codes.
- Libraries:
- Fixed corrupted DefaultFilters.properties so XINI filter and MIF Filter
UI are accessible again.
- Added support for encoded storage of strings (e.g. passwords) in several
types of parameters files.
- Added output helper methods to RawDocument.
- The Lucene library has been updated to 3.1.0. You may need to re-index
your Pensieve TMs. You can do this by exporting the TM to TMX, then
re-importing it back.
- Updated TextUnit to use ITextUnit and a new implementation. This is a
major low-level refactoring that comes with some behavior changes and
additions:
- in createTarget() IResource.COPY_CONTENT should be replaced by
IResource.COPY_SEGMENTED_CONTENT to have the same behavior.
- the new getTargetSegments() does create a target if it does not exist
(with a COPY_SEGMENTATION option).
- Moved IQuery and QueryResult from the okapi-lib-translation project to
the core (common.query package).
- Implemented plugin support for connectors.
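The Regex Filter entry in this release mentions an escape notation that uses doubled characters (e.g. ""). As a rough illustration of what reading such a notation involves, here is a small Python sketch that parses a quoted string in which a doubled quote stands for one literal quote. The function name and the sample line are hypothetical, and this does not claim to reflect how the filter itself is implemented.

```python
def read_quoted(text: str, start: int, quote: str = '"'):
    """Parse a quoted string beginning at text[start].

    A doubled quote inside the string is read as one literal quote.
    Returns (value, index just after the closing quote).
    """
    if text[start] != quote:
        raise ValueError("text[start] must be the opening quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == quote:
            if i + 1 < len(text) and text[i + 1] == quote:
                # Doubled quote: keep one and skip both characters.
                out.append(quote)
                i += 2
                continue
            # Single quote: end of the string.
            return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted string")


line = 'key = "say ""hi"" now";'
value, end = read_quoted(line, line.index('"'))
print(value)
```

The point of the sketch is that the closing quote can only be recognized by looking one character ahead, which is what distinguishes this notation from back-slash escaping.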
Changes from M10 to M11
- Filters Plugin for OmegaT:
- Added the Transifex filter. Transifex projects can be edited directly
from OmegaT now (write-rights needed to save the translations)
- Rainbow:
- Added the "Export Batch Configuration" command to create single-file
configuration that can be transported and re-installed elsewhere.
- Added the "Install Batch Configuration" command to create a set of local
files from a batch configuration.
- Added the "Translation Kit Creation" pre-defined pipeline, to create
translation kits (BETA). When done, this will replace the "Translation
Package Creation" utility.
- Added the "Translation Kit Post-Processing" pre-defined pipeline, to
merge back extracted files (BETA). When done, this will replace the
"Translation Package Post-Processing" utility.
- Changed the "Translation Package Creation" utility to not output
non-translatable entries in OmegaT package.
- Fixed which page was called for the command-line help.
- Tikal:
- Fixed the -lfc command so it includes custom configurations as well as
the default ones.
- Added the -noalttrans option to the -x command.
- Fixed problem with application's path containing spaces. This fixes
issue #162.
- Integrated Transifex, MIF and Archive filters.
- CheckMate:
- Added the check for absolute maximum length.
- Added option to automatically recheck the documents when they change.
- Steps:
- Added the "Rainbow Translation Kit Creation" step, to create translation
packages in various formats from a pipeline.
- Added the "Rainbow Translation Kit Merging" step, to merge back
extracted files.
- XML Characters Fixing Step:
- Implemented support for decimal and hexadecimal NCRs in addition to raw
characters.
- Added the "Id-Based Copy" step, to copy the source text from one file
into the target of another for the entries with the same id.
- Scoping Report Step:
- Many changes have been implemented: better support for GMX and Okapi
categories, option for custom templates, etc.
- RTF Conversion:
- Added warning when a character cannot be encoded in the output
encoding.
- Added option to update the encoding declaration in XML/HTML files when
possible.
- Filters:
- Added the MIF Filter (Beta)
- Added the Rainbow Translation Kit Filter, to process translation
packages.
- Added the Transifex Filter, to process remote Transifex projects.
- Added the Archive Filter, to process any files inside a zip or jar file.
- HTML Filter:
- Fixed the case where non-quoted one-word translatable attributes could
be merged back as non-quoted multi-words. This resolves issue #126.
- TS Filter:
- Added resname for group (<context> elements)
- Changed to use the Woodstox StAX parser instead of the default VM parser.
- TTX Filter:
- Improved handling of unsegmented content to mimic TagEditor's behavior
more closely: leading whitespace characters are now excluded from the entries.
This addresses issue #164.
- Changed to use the Woodstox StAX parser instead of the default VM parser.
- PO Filter:
- Added mapping for msgctxt to the context property.
- Added the option to protect approved entries (i.e. not empty and not
fuzzy).
- TMX Filter:
- Changed to use the Woodstox StAX parser instead of the default VM parser.
- XLIFF Filter:
- Fixed issue with out-of-segment inline codes collapsing in previous
empty segmented target entry.
- Fixed the bug where the approved property was not writeable. Now you can
add, delete or modify it.
- Changed to use the Woodstox StAX parser instead of the default VM parser.
- OpenXML Filter:
- Fixed the problem of < and/or > in text boxes causing merging error.
This resolves issue #142.
- Table Filter:
- Fixed the issue with incorrectly setting the inline code finder rules.
- Plain Text Filter:
- Fixed the issue with incorrectly setting the inline code finder rules.
- Connectors:
- Libraries:
- Fixed the short search cases in NGramTokenizer/Analyzer. This fixes
issue #159.
- Added Transifex client library.
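The XML Characters Fixing Step entry in this release adds handling of decimal and hexadecimal NCRs in addition to raw characters. A minimal Python sketch of that kind of check follows. The validity test uses the Char production of XML 1.0; the marker format and all names are assumptions made for this example, not the output format of the actual step.

```python
import re

# Decimal (&#31;) and hexadecimal (&#x1F;) numeric character references.
NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_valid_xml_char(cp: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def fix_invalid_chars(text: str, marker: str = "_#x{:04X};") -> str:
    """Replace invalid raw characters and invalid NCRs with a marker.

    The marker format is hypothetical and chosen only for illustration.
    """
    def fix_ncr(match):
        cp = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        return match.group(0) if is_valid_xml_char(cp) else marker.format(cp)

    text = NCR.sub(fix_ncr, text)
    return "".join(
        ch if is_valid_xml_char(ord(ch)) else marker.format(ord(ch))
        for ch in text
    )


print(fix_invalid_chars("a\x0bb&#x1F;c&#31;d&#65;"))
```

Valid references such as &#65; are left untouched, so the sketch only changes content that an XML parser would reject.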
Changes from M9 to M10
- Rainbow:
- Changed the help system to use the wiki (the plan is to have a snapshot
of the wiki also available as local help).
- Added the "Tools" > "Plugins Manager" command.
- Added the "Plugins Location" option in the "User Preferences" dialog
box.
- Fixed locale variables (${TrgLoc}, etc.) to get a consistent casing
regardless of the casing of the value in the Languages and Encodings tab.
- Fixed corrupted path when dropping file on output using ${TrgLoc}.
- Added the Code Finder Editor (in Tools menu) to edit code finder rules
for filters using them but not having any UI to define them.
- The root of a list is now automatically adjusted to the longest root
possible when a document above the current root is added; the documents
already listed have their relative path adjusted as well.
- Create Translation package:
- Refactored heavily the leveraging mechanism and the output (e.g. added <alt-trans> output in
Generic XLIFF)
- Re-organized the Utilities menu with sub-menus.
- Added the "XML Analysis", the "XML Characters Fixing", and the "XML
Validation" pre-defined pipelines.
- Changed command-line processing to handle pre-defined pipelines in
addition to utilities (and pipeline).
- CheckMate:
- Added option in the Term tab to match strings only when enclosed in
codes.
- Added the "Reset to Defaults" command in the Configuration dialog.
- Added the "Accept all next documents with their defaults" option when
adding documents to the session.
- Implemented tab-delimited format for the report, and added choice
between HTML or tab-delimited in Configuration dialog.
- Added support for term list in CSV format.
- Added warning when the target from RTF contains a hidden part.
- OmegaT plugin:
- A new component, the Okapi Filters for OmegaT plugin, is now part of the
distributions. It allows you to use some of the Okapi filters directly in
OmegaT. Currently the filters for the following formats are included: JSON,
TTX, Qt-TS and IDML.
- Tikal:
- Added the -xm command to extract files to Moses InlineText format.
- Added the -lm command to leverage files from their Moses InlineText
corresponding files.
- Updated handling of default for source and target languages (allows
autodetection before using system defaults).
- Added the -s command to segment files.
- Steps:
- Added the RTF Conversion Step, and replaced the "RTF
Conversion" utility by a pre-defined pipeline.
- Added the BOM Conversion Step, and replaced the "Byte-Order-Mark Conversion" utility by a
pre-defined pipeline.
- Added the Encoding Conversion Step, and replaced the "Encoding Conversion" utility by a
pre-defined pipeline.
- Added the Create Target Step (allows you to create
target from the source).
- Added the Scoping Report Step (generates word-count
reports)
- Added the XML Characters Fixing Step (replaces invalid
XML characters by markers)
- Added the XLIFF Joiner Step (allows you to re-join
XLIFF documents created by the XLIFF Splitter step).
- Added the Moses InlineText Extraction Step (to extract
entries to Moses text files)
- Added the Moses InlineText Leveraging Step (to
leverage translation from a Moses text file)
- Added the XML Analysis Step (to generate the list of
elements in a set of XML documents and guess their localization-related
properties). This also closes issue #153.
- Leveraging Step:
- Added the threshold option for the match to copy into
the target.
- Segmentation Step:
- Added option to overwrite type of output segmentation
in files such as XLIFF.
- Added option "Overwrite existing segmentation".
- Sentence Alignment Step:
- Improved handling of whitespaces.
- Updated default SRX rules for the step.
- Search and Replace Step:
- Added option to replace on source and/or target when using filter
events.
- Added support for \uHHHH notation in all modes.
- IMPORTANT: The behavior of this step has changed when no
target is in the text unit (in filter event mode):
- Before M10: A copy of the source was automatically copied as the target
and the search and replace was performed on that text.
- Starting with M10: text units with an empty target are simply not processed.
You must now have a Create Target step before this step to copy the source
into the target.
- XLIFF Splitter Step:
- Improved support for large documents. This fixes issues #146 and #147.
- Translation resources:
- Microsoft MT Connector:
- Updated the connector to use the v2 HTTP API instead of the v1 SOAP one
(which is no longer accessible)
- Filters:
- Added the Moses Text Filter for processing Moses MT system data files.
- XML Stream Filter:
- Fixed issue #145 about PI being moved.
- Fixed issue #150 about inline codes being incorrectly escaped.
- Changed default so apostrophes are not escaped in output.
- Implemented default extraction for CDATA sections when no sub-filter is
defined.
- Fixed null pointer issue with empty CDATA sections.
- Fixed issue of conditions not being applied on CDATA sections.
- HTML Filter:
- Fixed escaping of inline codes detected using the codeFinder option.
- Fixed issue #90 (CDATA section not extracted)
- IDML Filter:
- Refactored the filter completely. The filter is still beta, but has
been improved significantly.
- TS Filter:
- Fixed issue with null character output on string with inline codes when
using TS encoder.
- Finished implementation of the <byte> element as inline code.
- Table Filter:
- Improved handling of qualifiers.
- Added pre-defined filter configuration for Haiku catkeys file format.
- TTX Filter:
- Improved mapping of leveraged entries with a score to Okapi annotations.
- Fixed filter-writer for pre-segmented RTF output.
- Fixed handling of split opening tags (TTX tags with Type="start" and
leftEdge="split").
- Improved handling of isolated </df> tags in un-segmented content.
- Changed extraction to extract non-segmented parts of text entries (this
closes issues #151 and #157).
- Trados-Tagged RTF Filter:
- Improved parsing for fldinst and xmlopen fields: their content is not
included in the extracted text.
- Added warning when part of the target segment is hidden.
- XLIFF Filter:
- Added the option to override the original target language.
- Improved mapping of alt-trans attributes to Okapi annotations.
- Implemented options to add possible new <alt-trans> elements in output
files (in addition to the one in the original document), and to
include or not extension information in the new <alt-trans>.
- Added the output option: "Segment only if the entry is segmented and
regardless of how the input was".
- Libraries:
- Completely refactored the IQuery, QueryResult, QueryManager and related
classes.
- Removed the ScoreAnnotation and ScoreInfo classes that were deprecated
in M7. Use AltTranslationsAnnotation instead.
- Added capability to write extra data in header and phase-name in
trans-unit for XLIFFWriter class.
- Upgraded the SWT libraries to 3.6.1.
- Fixed the handling of literal and escaped carriage-return characters in
most XML-based filters and encoders, so \r is not stripped out or converted
to \n.
- Added setBoolean, setString and setInteger by name in IParameters.
- Upgraded ICU4J library from 4.0.1 to 4.6.
- Fixed issue with not re-balancing codes after insert/append in
TextFragment (this solves several bx/ex-related issues).
- Changed transferCodes() method used in merging in Rainbow, Tikal and
xliffkit, to fix merging issue with <g>-type XLIFF codes.
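The Rainbow entry in this release about the root of a list being adjusted to the longest root possible can be pictured with a few lines of Python. The sketch below assumes POSIX-style absolute paths and uses made-up file names; the function name is hypothetical, and the code only shows the idea of recomputing a common root and the relative paths, not Rainbow's own logic.

```python
import posixpath


def adjust_root(paths):
    """Return (root, relative_paths) for a list of absolute document paths.

    The root is the longest directory shared by all documents, and every
    document path is re-expressed relative to it.
    """
    root = posixpath.commonpath([posixpath.dirname(p) for p in paths])
    return root, [posixpath.relpath(p, root) for p in paths]


# Made-up input list: the last document sits above the previous root.
docs = [
    "/proj/docs/en/a.html",
    "/proj/docs/en/sub/b.html",
    "/proj/c.xml",
]
root, relative = adjust_root(docs)
print(root)
print(relative)
```

Before the third document is added, the root of the first two would be /proj/docs/en; adding a document above it moves the root up and lengthens the relative paths of the documents already listed.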
Changes from M8 to M9
- Rainbow:
- Translation package Post-Processing utility:
- Fixed the bug where pre-translated XLIFF entries with translate='no'
could not be merged back properly, for example for PO files.
- Added the user option "Always show the Log when starting a process".
- Tikal:
- Fixed the bug in the Merge command where pre-translated XLIFF entries
with translate='no' could not be merged back properly, for example for PO
files.
- Switched help to use the wiki.
- Ratel:
- Windows position and size are now saved for the next session.
- CheckMate:
- Added capability to save and load configurations outside the session.
- Improved pattern checks defaults and processing.
- Added support for short vs. long text in text length verification (new
Length tab)
- Added experimental support for terminology verification.
- Added support for exceptions in verification of double-words.
- Added some limited support for string-based term verification.
- Translation resources:
- Added batchQuery method to the IQuery interface.
- Added leverage method to the IQuery interface.
- Open-Tran connector:
- Changed implementation to use the REST API instead of the XML-RPC.
- Improved support for queries with inline codes.
- SimpleTM connector:
- IMPORTANT: Changed the H2 database dependency from version
1.1.103 (.data.db files) to 1.2.135 (.h2.db files). This breaks backward
compatibility: the new SimpleTM connector cannot open the old .data.db
files. To convert an older TM: use an M8 or prior version of Rainbow to run
the SimpleTM to TMX step to export your database to TMX. Then use
this version of Rainbow to run the Generate SimpleTM step to convert
your TMX document into a new .h2.db file.
- Steps:
- Added the Resource Simplifier Step. It modifies normal
resources of filter events into simpler resources for some third-party tools.
- Added the XLIFF Splitter Step. It splits several
<file> elements inside an XLIFF document into separate documents.
- Added the Id-Based Aligner Step. It aligns text units from two input
files, based on their unique IDs (resname).
- Added the XML Validation Step. It performs XML well-formedness
verification and, optionally, DTD or schema validation.
- Sentence Aligner Step:
- Updated so entries with empty text are skipped and don't cause an error.
- Diff Leverage Step:
- Added support for 3 input files: new source, old source, old
translation. The second and third files must have the same text units (same
number and same order).
- Filters:
- Modified several filters to generate unique extraction ids in
non-text-unit events.
- Vignette Filter:
- Added support for monolingual documents.
- XML Filter:
- Fixed the bug where text extracted from attribute values was not
processed for the codeFinder option.
- Libraries:
- Implemented the Appendable and CharSequence interfaces for TextFragment.
- IMPORTANT: Changed TextFragment.toString() to return the coded text
instead of the original content of the fragment. The previous behavior of
toString() is now accessible using text().
- The net.sf.okapi.lib.extra.pipelinebuilder package has been added. It
allows you to easily script and run pipelines, for example using Jython.
Changes from M7 to M8
- Installation:
- Added a distribution for the Windows 64-bit platform.
- Rainbow:
- Fixed the bug where the initial character of input file was truncated if
root had a final slash or backslash.
- Replaced the Line-Break conversion utility by the "Line-Break
conversion" pre-defined pipeline.
- Added the "Run Quality Check Session" command to the Tools menu.
- Fixed issue #139 where a target SRX was required for segmentation in
"Translation Package Creation".
- CheckMate:
- Added CheckMate: a standalone application to run the quality checker.
- Translation resources:
- Added a first simple connector implementation for the TDA Search
services.
- Steps:
- Added the Term Extraction step.
- Added the Quality Check step, including support for the
Language Tool Checker.
- Added Line-Break Conversion step.
- Added the Image Modification step.
- Full-Width Conversion step:
- Added the option to convert the Squared Latin Abbreviations part of the
CJK Compatibility block to non-CJK.
- Added the option to convert some of the Letter-Like Symbols block to
simple character sequences.
- Format Conversion step:
- Added the output to Parallel Corpus Files (for example to use as input
for training MT systems)
- Added the option "Output only approved entries".
- Search and Replace step:
- Added support for \n, \r, \t, and \N in the replacement field when in
regex mode. This resolves issue #123.
- Filters:
- XML Filter:
- Added support for unique ID in pre-defined configuration for RESX files.
- Added the omitXMLDeclaration option to the parameters file.
- XMLStream Filter:
- Added new filter for streamed XML, e.g. to handle large documents.
- TTX Filter:
- Replaced ScoreInfo annotation by AltTranslation annotation.
- Added the option of escaping the character "greater-than" in output.
- Improved the support for overlapping TTX <df> tags.
- Trados RTF:
- Improved the RTF filter.
- Integrated it as Trados RTF filter (Reading mode only, and inline codes
only when represented with Trados styles). This filter
cannot be used for normal extract/merge operations, but is useable for any
function that requires only extraction.
- Table Filter:
- Fixed issue #138 where tab was not useable as separator in "csv" mode.
- Fixed issue #136 where a defined Record ID was not set properly.
- Fixed issue #137 where the Source column of Source was incorrectly set
- Libraries:
- Added getDefaultConfigurationFromExtension() to the filter
configuration mapper.
Changes from M6 to M7
- Rainbow:
- Fixed issue where output example was not updated when the top input
file was removed.
- Fixed issue where pipeline file was not written as UTF-8.
- Translation package Creation:
- Fixed issue #132 where we generated segment <mrk> in XLIFF if the text
was pre-translated but not segmented.
- ID-Based Alignment:
- Implemented request #134: A TMX output can now be created for
un-aligned entries.
- Libraries:
- Changed the SVN structure to allow checking-out and building the
libraries separately from the UI and apps. To get the base libraries
only:
http://okapi.googlecode.com/svn/trunk/okapi. To get everything:
http://okapi.googlecode.com/svn/trunk.
- Changed the TextContainer class and refactored all dependencies. This
modification is a major code change.
- Added the setRootDirectory() method to the IQuery interface.
- Updated QueryManager to handle empty inline codes and inline codes
with references when leveraging fuzzy matches
- Added spin-like input part to the generic editor.
- Fixed bug where platform type for "cocoa" was not handled, and
therefore Mac not detected in some occurrences.
- Added support for ftnsep, ftnsepc, aftnsep and aftnsepc control
words in the RTF parser, so any defined paragraph or character is
skipped.
- Added the following generic UI parts to the generic editor:
SpinInputPart and SeparatorPart.
- The ScoresAnnotation class has been deprecated, use the new
AltTranslationsAnnotation instead.
- Fixed help location issue for SRX editor (Ratel).
- Translation resources:
- Updated all the connectors for the IQuery change and implemented
${rootDir} for all the connectors using local files: SimpleTM and Pensieve.
- Apertium MT:
- Cross-Language Gateway MT services:
- Filters:
- OpenXML filter:
- Fixed an issue with open/closing group in some conditions.
- Fixed an issue with a case of text box resulting in hanging.
- XML filter:
- Added pre-defined configuration for WiX (Windows Installer XML)
Localization files.
- Improved handling of empty elements.
- XLIFF filter:
- Improved the reading of pre-segmented content, so the segment Ids are
now preserved instead of re-generated.
- Fixed parent-id for StartSubDocument event/resource.
- Implemented read-only property for build-num in <file> and extradata in
<trans-unit>.
- Improved support for segmentation choices in output. Now the filter can
remove, add or keep the segmentation for each trans-unit.
- Vignette filter:
- Fixed issue of 64K limit of blocks (due to Java DataOutputStream
writeUTF() limitation): added multi-chunks write/read function.
- Ruby on Rails YAML filter:
- Added support for Ruby on Rails YAML filter. It offers partial support of
YAML files.
- Versified Text Filter:
- Added support for filter on versified text documents.
- HTML filter:
- Fixed default configurations to extract ALT attribute of AREA elements.
- TMX filter:
- Fixed the bug where the option "escape greater-than characters" was not
working.
- Steps:
- Implemented ${rootDir} for the following steps: Format Conversion,
Generate SimpleTM, Segmentation, TM Import, Leveraging, Batch Translation.
- Segmentation step:
- Made copy of source into empty target an option.
- Added the option of verifying source and target segments match after
segmentation.
- Added the "Diff Leverage" step.
- Added the "External Command" step.
- Sentence Alignment step:
- Added support to use a single bilingual input file.
- Format Conversion step:
- Added the option to generate output files with
automated extension.
- Text Modification step:
- Implemented Request #100: An option to modify or not entries without text.
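Note on the Vignette filter fix listed above: DataOutputStream.writeUTF()
rejects any string whose modified UTF-8 encoding exceeds 65535 bytes. The
following is only a minimal sketch of one way a multi-chunk write/read can
work around that limit; it is not the code used in Okapi. The class name,
the method names and the chunk size are all assumptions made for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ChunkedUtf {
    // Hypothetical chunk size, in chars. A char takes at most 3 bytes in
    // modified UTF-8, so 20000 chars always fit in writeUTF()'s 65535 bytes.
    static final int MAX_CHARS = 20000;

    // Writes the number of chunks, then each chunk with writeUTF().
    static void writeLongUTF(DataOutputStream out, String text) {
        try {
            int chunks = (text.length() + MAX_CHARS - 1) / MAX_CHARS;
            out.writeInt(chunks);
            for (int i = 0; i < chunks; i++) {
                int start = i * MAX_CHARS;
                int end = Math.min(start + MAX_CHARS, text.length());
                out.writeUTF(text.substring(start, end));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the chunk count, then concatenates the chunks back together.
    static String readLongUTF(DataInputStream in) {
        try {
            int chunks = in.readInt();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < chunks; i++) {
                sb.append(in.readUTF());
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to an in-memory buffer and reads it back.
    static String roundTrip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeLongUTF(new DataOutputStream(buffer), text);
        byte[] bytes = buffer.toByteArray();
        return readLongUTF(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) {
        // 100000 two-byte characters: too large for a single writeUTF() call.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append('\u00e9');
        }
        String big = sb.toString();
        System.out.println(roundTrip(big).equals(big));
    }
}
```

Splitting on char boundaries is safe here because modified UTF-8 encodes
each UTF-16 code unit independently, so a surrogate pair cut across two
chunks is still restored when the chunks are concatenated.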
Changes from M5 (0.5.1) to M6
- Installation:
- Updated the Macintosh distributions with application bundles for Rainbow
and Ratel.
- Changed the Macintosh distributions to GunZIP files to preserve executable
flag of the shell scripts.
- Rainbow:
- Translation Package Creation:
- Fixed the issue where pre-segmented RTF
output was losing referents in target.
- Fixed the deletion of the empty TMX files when the package is zipped.
- Added English-India in the locales list.
- Fixed bug where steps using 3 input lists for more than 3 input files
were getting null values instead of raw documents.
- Added support for plugins for steps, filters and parameters editors.
Just drop the JAR in the dropins folder.
- Updated the way the utilities menu is stored.
- Replaced the "URI Conversion" utility by a pre-defined pipeline using the
"URI Conversion" step.
- Tikal:
- Added support for plugins for filters and parameters editors. Just drop
the JAR in the dropins folder.
- Steps:
- Format Conversion step:
- Fixed the issue where monolingual segmented input was not output
properly in tab-delimited format.
- Added the "Desegmentation" step.
- Added the "URI Conversion" step.
- Added Import/Export functions to the dialog box of the "Search and
Replace" step
- Libraries:
- Changed QueryManager:
- Allows code changes in target for the non-segmented queries.
- Prevents exact matches from having the target codes "adjusted" from the
source.
- Added setReferentCopies() to GenericSkeletonWriter to allow correct
output for writers referring more than once to the referents (e.g. when
creating pre-segmented RTF with source and target).
- Moved lib-plugins to common.
- Translation resources:
- Added in SimpleTM an option for code content and order difference
between query and source text
- Filters:
- HTML filter:
- Added support for inline codes using regular expressions.
- Table filter:
- Fixed issue #124 where part of the copy of the file configuration was
dropped for TSV files when creating package for XLIFF.
- TTX Filter:
- Fixed issue #130 where empty TargetLanguage attributes were not updated
with the target language code.
- XML filter:
- Improved the pre-defined configuration for Android resources files.
- Fixed issue #128: help example for codeFinder: count=1 is now count.i=1.
Changes from M5 (0.5) to M5 (0.5.1)
- Rainbow:
- Translation Package Creation:
- Fixed the bug where the encoder manager for RTF output was not properly
set and caused some formats like HTML, TMX, etc. to have un-escaped
characters.
- Changed the RTF writer to allow other skeleton writers than
GenericSkeletonWriter.
- Replaced the Search and Replace utility by the "Search and Replace with
Filter" and the "Search and Replace without Filter" pre-defined pipelines.
- Replaced the Text Rewriting utility by the "Text Rewriting" pre-defined
pipeline.
- Tikal:
- Fixed the issue of not having the HTML filter mapped when using the
Vignette filter.
- Added support for accessing Microsoft MT engine (-ms option).
- Translation resources:
- Added a connector for Microsoft MT Web services (http://api.microsofttranslator.com/V1/SOAP.svc).
A Microsoft Bing AppID is needed to use it. You can obtain one at
http://www.bing.com/developers/appids.aspx.
- Google MT: Made it consistent with the other connectors: when the result
is the same as the target, the result is now returned.
- SimpleTM: Made the feature "penalize exact matches when target has
different codes than the query" an option (default is true, backward
compatible).
- Libraries:
- Fixed issue with GenericSkeletonWriter and in-line codes in segmented
text unit that were outside any segment.
- Fixed issue with GenericFilterWriter output stream not nullified in
close() (causing for example no output using FilterEventsToRawDocument).
- Steps / Pipeline:
- Added MULTI_EVENT (new resource and Event) handling to pipeline.
- Changed step handlers to return Event by default.
- Fixed the parameters setting bug that prevented saving the parameters of
a pre-defined pipeline from one session to the next.
- Leveraging step:
- Fixed the bug that prevented entering a TMX path.
- Made adding an MT! prefix to the TMX entries an option.
- Added an option to enable/disable the step.
- Search and Replace step: Improved the behavior of the dialog box for
add/edit item.
- Format Conversion step: Fixed bug where the tab-delimited output was
not closed properly for "one output per input" use case.
- Added Text Modification step.
- Filters:
- PHP Content filter: Added UI for the localization directives options
(default behaviour is the same).
- OpenXML filter: Changed the parameters editor to use GridLayout instead
of BorderLayout.
- TMX filter: Fixed losing original line-breaks between <tu> when
re-writing.
- Vignette filter: Fixed bug of un-escaped and non-CDATA RTF output.
- Properties filter: Added the option "Convert \n and \t to line-break and
tab".
- Table filter:
- Fixed issue #119 where csv action "Exclude leading/trailing..." was not
updated properly in the parameters editor
- Fixed issue #118 where some csv cases were not extracted properly
- Installation:
- Updated licence information for third-party packages.
- Removed all the dependencies to swing2swt.
Changes from M4 to M5
- Libraries:
- Changed minimum requirement to Java 1.6 instead of Java 1.5.
- Removed distribution for Mac Carbon, added distribution for Mac
Cocoa-64-bit.
- Updated to Lucene 3.0.0
- Refactored Pensieve TM engine, added new API.
- Rainbow:
- Added the duration of the process in the log.
- Updated the UI of the Pipeline Edit / Execute facility to make the
panels of each step accessible without clicking.
- Replaced the utility "Generate SimpleTM Database" by the pre-defined
pipeline "Import Into Pensieve TM" (the previous utility's functionality is
still available using a custom pipeline).
- Replaced the utility "Export SimpleTM Database" by the pre-defined
pipeline "Convert File Format" (the previous utility's functionality is
still available using a custom pipeline).
- Fixed issue with Text Rewriting and empty <target> for XLIFF input.
- Replaced the utility "Translation Comparison" by the pre-defined
pipeline "Translation Comparison".
- Added the pre-defined pipeline "Create Translations in Batch Mode"
- Replaced the utility "XSL Transformation" by the pre-defined pipeline
"XSL Transformation".
- Replaced the utility "Used Characters Listing" by the pre-defined
pipeline "Used Characters Listing".
- Ratel:
- Fixed selection bug in UI.
- Updated the default segmentation rules.
- Steps:
- Added Batch Translation step (tested with ProMT and Apertium).
- Added Codes Removal step
- Added Leveraging step
- Completed initial Tokenization and Word-Count steps.
- Added the Sentence Alignment step.
- Translation resources:
- Fixed issue with score > 100 in Pensieve TM.
- Added NCR support for Apertium connector.
- Filters:
- In the Properties Filter: Added pre-defined configuration for Skype's
.lang format.
- In the RTF parser:
- Fixed the issue with \'HHc being read as \'HH\'HH in some cases.
- Added support for additional DBCS encodings.
- Added TTX Filter for Trados TagEditor documents (Beta).
- In the HTML Filter: Added pre-defined configuration for well-formed files, providing groups
and extra meta-data.
- In the XML Filter: Changed the ITS extension idPointer to idValue and
modified its behavior to allow ID values to be generated from the
expression, not just from the content pointed to by the expression. The
values are backward compatible, but any reference to idPointer in existing
parameters files must be renamed to idValue.
- Added the Vignette Filter for Vignette export XML documents (Alpha)
- Added the Pensieve Filter for reading and writing Pensieve translation
memories.
Changes from M3 to M4
- Filters:
- XLIFF filter: Added property for target-language and option to add it.
Changed some of the language selection behaviors and set fall-back to ID
option to false.
- Fixed several bugs in the OpenXML filter (MS Office 2007 documents)
- The JSON Filter has been added, to support for example AJAX or Palm
WebOS applications.
- The PHP Content Filter has been added, to support PHP include files.
- Added default DITA configuration to the XML Filter.
- Fixed several issues with the TS, Table, TMX, and XLIFF filters.
- Added whiteSpaces ITS extension support in the XML Filter.
- Library, Translation resources:
- All the TM and MT connectors have been moved to the package
net.sf.okapi.connectors.
- Modified the OpenTran connector to use the REST interface instead of
XML-RPC.
- Added the connector to the MyMemory server (http://mymemory.translated.net)
- Improved Google MT connector.
- Improved GlobalSight TM connector for inline codes, and adjusted it for
GS version 7.1.6.
- Added Pensieve TM engine and its connector.
- Added the connector for the open-source Apertium MT system web
service (http://wiki.apertium.org/wiki/Main_Page)
- Changed language identification from String to LocaleId objects
across the whole framework.
- Steps and Rainbow utilities:
- Added the SimpleTM2TMX step.
- Added Import and Export utilities for SimpleTM files.
- Continued improving the Tokenization and WordCount steps.
- Implemented an option to select the XSLT processor to use with the
XSL Transformation utility.
- Updated the Translation Package Creation utility to
select from several resources for the pre-translation options, and to
allow specifying a threshold instead of "exact match only".
- Updated the Text Rewriting utility to select from
several resources for the translation options.
- Added the FormatConversion step.
- Improved inline compatibility in projects generated for OmegaT.
- Tikal:
- Added support for accessing the MyMemory repository (-mm option).
- Corrected display of extended characters on the console for some
languages/platforms.
- Added threshold and max-hits options for TM query command (-opt option).
- Added a command to create PO files from any input (-2po command).
- Added a command to create TMX files from any input (-2tmx command).
- Added a command to create Table files from any input (-2tbl command).
- Added capability to query a Pensieve TM (-pen option).
- Added support for accessing GlobalSight TM servers (-gs option).
- Added support for accessing Apertium MT servers (-apertium option).
- Added segmentation and leveraging options for the extraction command.
- Added a command to import any file into a Pensieve TM (-imp command).
- Added a command to export a Pensieve TM to a TMX file (-exp command).
Changes from M2 to M3
- The build system has been completely redone and now uses Maven as its
main builder. This has resulted in several changes in the structure of the
Okapi classes, and in the way the files are distributed.
- Filters:
- Added the TS Filter (beta) for Qt translation files.
- Fixed handling of fuzzy flag for plural entries in the PO filter.
- Fixed handling of approved, state and coord properties in the XLIFF
Filter.
- Improved XML Filter:
- Improved rewriting of document type subset declaration.
- Added support for protecting custom entity references.
- Added support for ID defined using xml:id or the idPointer ITS extension
feature.
- Properties Filter:
- Changed the default configuration to always escape output.
- Added pre-defined configuration for non-escaped output.
- Fixed various issues in the OpenXML Filter (docx, pptx, etc.), and
PO Filter.
- Libraries:
- The Google MT connector has been enhanced to have the inline codes taken
into account, not simply pushed to the end of the text.
- Fixed one error in default segmentation rules.
- Added a connector component for the Translate Toolkit TM server.
- Added steps such as Word-count and Tokenizer.
- The command-line tool Tikal has been added.
- Rainbow (okapi-apps distribution only):
- Improved handling of un-approved translations in TMX generated
during a translation package creation.
- Added option to choose to merge only approved translations in
translation package post-processing.
Changes from M1 to M2
- Filters:
- The DTD Filter has been added.
- The PlainText Filter has been added.
- The Table Filter has been added.
- Several pre-defined filter configurations have been added or updated:
Mozilla-RDF, XML Android Strings, XML Java properties, RESX, Monolingual PO,
SRT (Sub-titles), plain-text lines, plain-text paragraphs, CSV, etc.
- The OpenXML Filter (DOCX, PPTX, XLSX files) has been improved and now
provides much inline code simplification.
- The definition of the parameters for the RegEx Filter has been modified
to allow the support of target text, ID, etc. This new format is not
compatible with the one of M1.
- Other filters (HTML, Properties, XLIFF, TMX, PO, and OpenDocument
filters) have been improved.
- Libraries:
- A new TM connector to query remote GlobalSight TM servers has been
added. (See the Java Example05 of the okapi-lib distribution for an
illustration on how to use this component).
- A connector to query the remote OpenTran server has been added. (See the
Java Example05 of the okapi-lib distribution for an illustration on how to
use this component).
- New RawDocument object model.
- The events mechanism has been augmented to work with batch items in the
pipeline.
- The encoding detection and handling of BOM has been modified in most
filters and utilities.
- The pipeline mechanism has been extensively re-written.
- Many steps for the pipeline have been created; they are experimental for
now.
- Rainbow:
- The selection of the filter settings is now done using the new filter
configuration mapping system integrated in the library.
- An experimental interface for creating and executing pipelines has been
added (see Utilities > Edit / Execute Pipeline)
- The creation of OmegaT, XLIFF and RTF translation packages has been modified to handle
pre-segmentation and pre-leveraging.
- Uses the latest libraries.
- Ratel:
- Better preservation of comments in SRX files; and capability to add
comments from within Ratel.
- Uses the latest libraries.