Okapi Framework Changes Log - @date@
Note: this document is common to both the okapi-lib distribution
and the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from M21 to M22
- Tikal:
- Made it possible to run tikal.sh from another directory on Mac OS
X.
- Updated the way the application root folder is computed to allow
calls from a network share.
- Rainbow:
- Fixed the -log option to allow it anywhere in the
command-line.
- Filters:
- Table Filter:
- Fixed issue #300 (enhancement): Added a new Table Filter for
2-column (source + target), tab separated files.
- OpenXML Filter:
- Fixed issue #166: Text from mc:Fallback and mc:Choice
Requires="wps", WordArt, TextArt, and Watermarks is handled
properly now.
- Fixed issue #169: Segmentation around inline codes seems to
work properly.
- Fixed issue #286: PPTX smart-tags are now imported.
- Fixed issue #323: Files are not corrupted anymore when using
text areas.
- Fixed issue #324: Nested <w:p> merge properly now.
- Fixed issue #325: The slides of PPTX documents are now
extracted in order.
- Fixed issue #329: Text from PPTX diagrams is now extracted.
- Fixed issue #351: Creation of XLIFF now works on documents with
SmartArt graphics.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #332: When using the global_cdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Fixed issue #339: The filter was not grouping the tags back
properly when merging back.
- Added handling of variable placeholders for the pre-defined
settings for RESX files.
- ITS Filters (XML Filter and HTML5+ITS Filter):
- TMX Filter:
- Fixed the issue where <it> codes were mapped to
placeholders rather than opening/closing internal codes.
- XLIFF Filter:
- Continued implementation of ITS 2.0: Improved support for LQI,
added support for Provenance.
- Simplification Filter:
- General:
- Filters that update language properties (like xml:lang) during
merging will now be region-insensitive when doing so.
- Steps:
- Term Extraction Step:
- Added support for Text Analysis annotations.
- Made the three extraction methods options, and attached the
relevant options to the statistical method.
- Full-Width Conversion Step:
- Added log message if at least one character was modified (per
input file). This resolves issue #327.
- Enrycher Step:
- Improved handling of nested annotations.
- Batch TM Leveraging Step:
- Fixed issue #331: Entries with no text are no longer sent for
translation.
- Format Conversion Step:
- Fixed the issue where the "Output generic inline codes" was not
recognized for the Tab-delimited table output.
- MS Batch Translation Step:
- MT candidates with a very low score (e.g. from an error) are not
output in the TMX.
- Space Checker Step:
- Improved reporting of errors and changes.
- Fixed issue #346: Iterating through text fragments ran out of
bounds. Indexing error was fixed.
- Fixed issue #348: inline code index marker broken as a result of
spacing changes. Index marker error was fixed.
- Translation Comparison Step:
- Consolidated Paragraph Alignment and Sentence Alignment steps
- Connectors:
- Microsoft MT Connector:
- Improved error handling (e.g. problem with inline codes in
result).
- Filters Plugin for OmegaT:
- Added *.xliff and *.sdlxliff as default extensions.
- Changed default for isFileSupported() to return true (this allows
user-defined extensions).
- Libraries:
Changes from M20 to M21
- All applications:
- Applications now launch correctly on Mac OS X when they are
located in a path containing a space.
- Rainbow:
- Added the -log option to specify the result log file. By
default the log file is {user.home}\rainbowBatchLog.txt
- Tikal:
- Added the -safe option to prompt the user when
overwriting a directory when extracting.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- XLIFF Filter:
- Continued implementation of ITS 2.0.
- Added support for the okp:engine attribute in <alt-trans>.
- Wiki Filter:
- Fixed issue #315: WikiFilter didn't work with
preserve_whitespace: true.
- Regex Filter:
- Improved the macStrings default settings to include slash+star
comments with next extracted string.
- IDML Filter:
- Fixed issue #316: Added default to not extract hidden layers
and added the option "Extract hidden layers".
- Enabled the option "Create new paragraphs on hard returns".
Important: This option is still BETA and may prevent you from
merging back the extracted file. Make sure to test the
round-trip before using this option for real projects.
- TMX Filter:
- Fixed a bug where attribute values on <tuv> elements
were being written back to the skeleton without proper escaping.
- Improved filter performance.
- Properties Filter:
- Fixed issue #313 where the extended characters were not
escaped when using the sub-filter.
- TS Filter:
- Changed the instantiation of the XML parser to use Woodstox.
- XML Stream Filter (Abstract Markup Filter):
- Fixed issue #303: When using the global_pcdata_subfilter
option, the filter will no longer generate extra segments
consisting only of placeholders.
- Steps:
- Microsoft Batch Translation Step:
- Added support for the ${domain} variable for the category.
- Added support for ${rootDir} and other variables
for the path of the Engine mapping file.
- Quality Check Step:
- Added an option to save the session or not. This option is not
accessible when editing the parameters from CheckMate, but only
when editing a step's parameters.
- Rainbow Translation Kit Creation Step:
- Added to XLIFF outputs the option of outputting ctype
and equiv-text attributes in inline codes.
- Added to OmegaT output the output of ctype and
equiv-text attributes in inline codes.
- Added the option to merge a new OmegaT translation kit with an
existing one, rather than overwriting it.
- Enrycher Step:
- Added support for segmented text units and implemented
handling of inline codes.
- Added parameter for number of events to process on each call
to the service.
- Translation Comparison Step:
- Added option to log the average scores per document in a
tab-delimited file.
- Added the output of a new tab-delimited file with all scores,
along with the HTML report.
- Extended the repartition table to use 11 brackets instead of
3, and include the two scores.
- Segmentation and Desegmentation Steps:
- Added the option to renumber code IDs after segmentation so
that they are 1-indexed as much as possible. A corresponding
option on the desegmentation step reverses the process. This
option will not work correctly with formats that use
non-consecutive or non-numeric code IDs, such as XLIFF.
- Connectors:
- Microsoft Translator Connector:
- Added information about the engine in the query results.
- Libraries:
- Continued implementation of ITS 2.0 in XLIFF Writer.
- Changed options settings for the XLIFFWriter class to use an
object rather than multiple setters.
- Filters Plugin for OmegaT:
- Fixed issue #322: Updated the TS filter to use the Woodstox
parser, and added the dependencies.
- Added the XLIFF Filter to the plug-in.
- Added basic support for some ITS data categories in the Comments
pane (Text Analysis, Terminology).
Changes from M19 to M20
- Rainbow:
- Improved the logging output and UI responsiveness during lengthy
processes.
- Updated the user's preference dialog to allow the selection of the
log levels as defined by SLF4J (Normal, Debug, Trace)
- Rainbow's input root directory now supports expansion of system
environment variables.
- Tikal:
- Fixed -lfc command output.
- Use the -continue option to specify that batch operations should
continue processing even if one or more files in the batch fail to
process.
- Summary information will be included at the end of batch commands.
- Timing information is included for each file processed, and total
elapsed time is included in the batch summary.
- Added the -pd option to specify a directory to search for custom
filter configurations.
- Fixed a crash when merging (-m) a file with no extension.
- Filters:
- ITS Filters (XML Filter and HTML5+ITS Filter):
- Continued the implementation of ITS 2.0
- Fixed issue where the HTML-type special characters were not
escaped when converted to inline codes by the code finder.
- Fixed issue #311 where preserve space property was not applied
to attributes.
- TXML Filter:
- Fixed issue #266: Translations in the <revisions>
elements are now ignored.
- XLIFF Filter:
- Improved support for <mrk> elements.
- Added support for several ITS features.
- Connectors:
- Microsoft Translator Connector:
- Fixed the internal conversion of the language code.
- Libraries:
- Fixed issue #282 for the Abstract Markup Filter.
- Fixed issues in reporting libraries.
- Added support for including ITS annotations in the XLIFF 1.2
writer.
- Steps:
- Rainbow Translation Kit Creation:
- Added the "Include post-processing hook" option for OmegaT
packages. This allows OmegaT to merge back the documents
automatically.
- Remove Target:
- Fixed issue #270 where the step could not be run without some
of the optional parameters set.
- Microsoft Batch Translation:
- Added an option to send the generated TMX document as a raw
document for the input of the next step.
- Added an option to point to a .properties file containing a
mapping of keys to categories for more convenient lookup. See
wiki.
- Quality Check Step:
- Fixed issue #304 where the default check on parentheses didn't
include full-width characters.
- Inline Code Removal Step:
- Added option to replace line break related codes with spaces.
By default codes are simply removed.
- Added the Space Check Step. It automatically fixes spaces
around inline codes of the target based on the source.
- Added the Cleanup Step. It normalizes quotation marks and
punctuation, removes suspect entries, etc. This can be used for
example when preparing an aligned document for MT training.
Changes from M18 to M19
- Rainbow:
- Changed the logging system.
- Updated the Translation Comparison pre-defined pipeline, to
include a word-count step before the comparison.
- Tikal:
- Changed the logging system.
- Fixed issue #278: Added default mapping for the RTF Filter.
- Changed loading of the filters so all are taken into account on
initialization.
- Added -logger flag, to output all messages to the current logger
instead of the console. Might be pretty noisy, but you have full
control by tweaking the logger configuration.
- Changed the existing -trace flag to also output all debug messages
from other components when in console mode (no effect on logger).
- CheckMate:
- Changed the logging system.
- Added several features (See Library / Checker below for details).
- Filters Plugin for OmegaT:
- Changed the logging system.
- Steps:
- Added the Segments to Text Units Converter Step.
- Added the Enrycher Step (Alpha).
- Id-Based Aligner:
- Fixed issue where the multilingual files were detected on the
source input rather than the target.
- Added fuzzy scores to the reports. Added statistics to the
HTML report. Added word-count analysis if there is a
SimpleWordCountStep before the TranslationComparisonStep. Added
SimpleWordCountStep to the predefined translation comparison
pipeline.
- Rainbow Translation Kit merging Step:
- Fixed issue #281: pre-segmented source is now re-segmented
with original IDs if needed. This avoids having different segment
IDs between source and target, which caused an incorrect source
segment ID in the final output for segmented documents
like XLIFF.
- Simple Word Count Step:
- Removed the access to the un-supported option of counting
target content.
- Translation Comparison Step:
- Added output based on source word-count, if one is available.
- MS Batch Translation Step:
- Fixed the issue of not passing the category to the connector.
- Filters:
- MIF Filter:
- Restored the escaping of the original data of inline codes
(Issue #285)
- Reworked the parsing of the lone Strings content for escaped
characters. Now all control characters are mapped or made inline
codes. Then if the MIF version is less than 8, the decoding
using the character set is used, otherwise, the character is
enclosed in an inline code as we can't know what it is.
- Added the HTML5-ITS Filter.
- Added the Doxygen Filter.
- Added the Wiki Filter.
- HTML Filter:
- Fixed issue #276: Code finder is now applied also to extracted
text from attributes.
- XML Stream Filter:
- Fixed issue #290: Incorrect output when configuring empty
element with [INLINE, EXCLUDE]
- XML Filter:
- Refactored the filter to share code with the HTML5+ITS
Filter.
- Implemented numerous changes for the ITS 2.0 support.
- XLIFF Filter:
- Added read support for the its:allowedCharacters
attribute.
- Libraries:
- Changed the logging system to use SLF4J everywhere.
- ITS Engine/Filter:
- Checker:
- Added check for the ITS Allowed Characters data category.
- Removed the Threaded Pipeline package.
Changes from M17 to M18
- Rainbow:
- Changed the batch mode so the log and error output go to the
rainbowBatchLog.txt file in the user home directory.
- Added the -rd option (root directory) to the
command-line mode.
- Tikal:
- Resolved issue #252: Added -rd option to set the
root directory (Patch from Mihai)
- Fixed issue #254: Added support to change the display encoding
using the .Tikal configuration file and the displayEncoding
setting.
- Fixed issue #213: Added the -noalttrans option for
the "Leverage Files From Moses" command.
- CheckMate:
- Added several features (See Library / Checker below for details).
- Longhorn:
- Made API available to execute pipelines with multiple target
languages.
- Filters:
- RainbowKit Filter:
- Added pre-defined configuration "noprompt" for processing
without prompting.
- XML Filter:
- Added implementation for the ITS 2.0 Domain data category.
- Added implementation for the ITS 2.0 External Resources
Reference data category.
- Added implementation for the ITS 2.0 Locale Filter data
category.
- Added implementation for the ITS 2.0 Preserve Space data
category.
- Added implementation for the ITS 2.0 Storage Size data
category.
- XLIFF Filter:
- Added support for the maxbytes and ITS Storage
Size data category.
- MIF Filter:
- Fixed: For MIF 10 files, the wrong encoding was used to read
the files.
- TMX Filter:
- Added option to set TextUnit Segmentation flag based on the
filter configuration.
Important: This introduces a change in behavior. Now, by
default, TMX entries that are set to segType="segment" or not
set at all will be marked as segment in the extracted
resource. This may affect some output, for example XLIFF,
where such entries will be noted as segmented as well.
- Steps:
- Batch Translation Step:
- Fixed issue #260 where an error occurred at the end of the
batch process when the option 'send TMX to the next step' was
not set.
- Translation Comparison Step:
- Added support for ${rootDir}, ${inputRootDir}
and locale variables.
- Format Conversion Step:
- Added the UI for the option "Do not output entries without
text".
- Segmentation and Desegmentation Step:
- Now affects all target locales that you are processing.
- Connectors:
- Microsoft Translator Connector:
- Updated code and interface to use Azure Marketplace Access
Tokens instead of AppIDs.
- Libraries:
- Upgraded the SWT library from 3.7 to 4.2.
- XLIFFWriter:
- Added support for its:domain and its:externalResourcesRef
properties
- Checker:
- Added XLIFF validation against the XLIFF schema
(xliff-core-1.2-transitional.xsd) (feature, issue #263)
- Added the ability to generate an XML report (feature, issue
#264)
- Fixed issue #257: different leading/trailing spaces does not
detect inconsistent non-breaking spaces.
- Added support for validating the ITS Storage Size property for
text unit.
- Removed extra files leftover from xliff-lib and olifant migration.
- Added the TARGET_LOCALES step parameter and deprecated the
TARGET_LOCALE one.
- ITSEngine class for ITS 2.0:
- Added support for the Target Pointer data category.
- Added support for the Domain data category.
- Added support for the External Resources Reference data
category.
- Added support for the Locale Filter data category.
- Added support for the Preserve Space data category.
- Added support for the Localization Quality Issue data
category.
- Added support for the Storage Size data category.
- Added options to pre- and post-process text that is being
letter-coded to prevent interference with letter-decoding.
- Trados Utilities Plugin:
Changes from M16 to M17
- Rainbow:
- Resolved issue #236: Added custom parameters folder default in
User Preferences.
- Longhorn:
- Fixed concurrency issues.
- Fixed problem of reading existing projects.
- Tikal:
- Fixed issue #249 where the HTML filter was not always loaded for
sub-filter cases (Patch from Mihai)
- Filters:
- XLIFF Filter:
- Fixed issue #218 where file name was missing from XML parsing
error (Patch from Mihai).
- Fixed issue #219 where empty <target> elements were not created
instead of being created empty (Patch from Mihai)
- Fixed the reading of the coord target property.
- Plain Text Filter:
- Fixed the link for the help button.
- Fixed issue #248: Code finder not invoked.
- Table Filter:
- Fixed the link for the help button.
- Fixed bug when resetting the reader that caused "Marker
invalid" error in some cases.
- JSON Filter:
- Fixed issue #214 where the rewriting created duplicated
characters in some cases.
- VersifiedText Filter:
- Added support for Trados-type segment markers.
- XML Filter:
- Fixed issue #240 where code finder matches were not escaped
properly in some cases.
- Implemented local withinText in the ITS engine for ITS 2.0
- Implemented initial support for XPath variables for ITS 2.0
- Implemented idValue data category for ITS 2.0
- Removed the Drupal filter. This filter will be addressed in a
different way later.
- Steps:
- Batch Translation Step:
- Added support to allow TMX output to be also the input file.
- Added option to send the generated TMX as the only input for
the next step.
- Added support of the
${inputRootDir} variable
for all paths and the command line.
- Fixed issue #230 where the length of the segment was limited
to 65000 characters.
- Rainbow Translation Kit Creation Step:
- Added the option to send the prepared files to the next step
(instead of the filter events).
- Implemented capability to add support material to the package.
- Fixed issue with manifest file when zipping the package.
- Fixed concurrency problems.
- Rainbow Translation Kit Merging Step:
- Added auto-stripping of white spaces before initial text when
converting XLIFF+RTF to XLIFF.
- Libraries:
- Verification library:
- Added motherTongue parameter to call for bilingual mode, for
LanguageTool (Issue #229).
- Added option "Verify for same language family" (Issue #220,
Patch from Mihai).
- Moved the XLIFF 2.0 library (okapi-lib-xliff package) to a
separate project (http://code.google.com/p/okapi-xliff-toolkit/).
- Moved the Olifant and TMdb modules to a separate project (http://code.google.com/p/okapi-olifant/)
- Dependent modules are now using the maven2 repository to get the
jar.
- Fixed missing language-only values for LCID mapping.
- Added support for wildcard characters in DefaultFilenameFilter.
Old constructor with extension only should be backward compatible.
- Added getWikiPage() method to the AbstractParametersEditor.
- Added batch input count support for the Pipeline Parameters event.
- Fixed issue #233 (Patch from Shane Perry): Changed default
placeHolderMode to true in XLIFFWriter. This affects the direct
users of the class; indirect users (like kit creations) have this
setting defined by the parameters. Tikal has also been updated to
force its own mode rather than rely on the default.
- Fixed issue #237: Removed the old and now defunct special case for
TTX in the InputdocumentPanel dialog when guessing the file format.
The TTX format should now be guessed properly.
- Fixed issue #243: ID duplication in sub-filter cases.
Changes from M15 to M16
- Rainbow:
- Rainbow now returns a non-zero error code if the command-line ends
with an error (Issue #201)
- Tikal:
- Fixed issue #215 by always adding the HTML filter to the mapping
because now several filters use it as sub-filter.
- Ratel:
- Added an option to allow backward compatibility with Java regular
expressions.
- Improved the UI to separate standard options from extensions.
- Changed the UI to start with an empty document with a single
default language map.
- Filters:
- TMX Filter:
- Added an option to stop processing when encountering invalid
<tu> elements. By default invalid <tu> elements
are excluded.
- PHP Content Filter:
- Fixed the issue #204: duplicated code when a string is an
inline code with concatenation and linebreak.
- Properties Filter:
- Fixed the default rule for Java inline codes so escaped is
supported.
- Fixed the issue where custom filter configurations for
sub-filters were not loaded properly.
- Table Filter:
- Added handling of 2 escaping modes for qualifiers in CSV
files:
- by duplication of the qualifier (e.g. "translate=""yes""")
- by backslash escaping (e.g. "translate=\"yes\"")
- Added a new parameter to CSV filter: int escapingMode
int ESCAPING_MODE_DUPLICATION = 1 (default)
int ESCAPING_MODE_BACKSLASH = 2
- Added the "CSV escaping mode" group to the parameters editor.
- Fixed a bug with disabling CSV-related elements in the
parameters editor.
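
The two CSV qualifier escaping modes described for the Table Filter above can be illustrated with a small sketch. This is not the filter's actual code: the class and the qualify() helper are hypothetical, and only the two mode constants and their values come from the entry above. It assumes the qualifier is a double quote and does not handle backslashes already present in the field.

```java
class CsvQualifierEscaping {
    // Values as listed in the changelog entry for the escapingMode parameter.
    static final int ESCAPING_MODE_DUPLICATION = 1; // default
    static final int ESCAPING_MODE_BACKSLASH = 2;

    // Hypothetical helper (not an Okapi API): wraps a field in double-quote
    // qualifiers and escapes any qualifier found inside the field.
    static String qualify(String field, int escapingMode) {
        String escaped;
        if (escapingMode == ESCAPING_MODE_BACKSLASH) {
            // Backslash escaping: " becomes \"
            escaped = field.replace("\"", "\\\"");
        } else {
            // Duplication: " becomes ""
            escaped = field.replace("\"", "\"\"");
        }
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String field = "translate=\"yes\"";
        System.out.println(qualify(field, ESCAPING_MODE_DUPLICATION));
        System.out.println(qualify(field, ESCAPING_MODE_BACKSLASH));
    }
}
```

Run on the field translate="yes", the two calls produce the two forms shown as examples in the entry above.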
- TS Filter:
- Fixed the default rule for Java inline codes so escaped is
supported.
- PO Filter:
- Fixed the default rule for Java inline codes so escaped is
supported.
- IDML Filter:
- Added the option to skip large spread files.
- HTML Filter:
- Improved insertion of encoding declaration when missing
(support for XHTML and custom XML)
- Improved support for translate='yes\no'
- XLIFF Filter:
- Fixed segment id for case where first segment id is "0" and it
is preceded by a non-segment part.
- XINI Filter:
- Fixed parameters editing error.
- Added the Drupal filter. This filter allows you to process selected
node content from a Drupal host. The filter is Alpha for now.
- Steps:
- Rainbow Translation Kit Creation Step:
- Fixed the issue of the missing tab before existing target in
the Translation Table package format.
- Rainbow Translation Kit Merging Step:
- Added option to output the merged files in a specified
directory (vs. the one defined in the manifest).
- Segmentation Step:
- Added 3 options for treating already segmented text.
- Leveraging Step:
- Added the option to not query if the entry already has a
candidate equal to or above a given score.
- XSL Transformation Step:
- Added the option to specify a custom XPath class in addition
to the custom XSL Transformer class.
- Fixed the step so a temporary output is used when the input
and output have the same path.
- Library:
- Fixed duplicated ID values in TextFragment.insert() when the
inserted fragment had overlaps.
- Segmenter now uses ICU to implement SRX.
Important: This may affect segmentation with your existing SRX
files. For example \w now also matches accented letters (the
normal Java \w does not). See SRX and Java for details. Use the
new "use Java regular expressions" option to force backward
compatibility.
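
The \w difference mentioned in the M16 Segmenter note can be seen with the JDK alone. The sketch below is only an approximation: it uses the standard Pattern.UNICODE_CHARACTER_CLASS flag as a stand-in for Unicode-aware matching, and does not call ICU or the Okapi segmenter. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Default java.util.regex behavior: \w is limited to [a-zA-Z_0-9].
    static boolean matchesDefaultWord(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // Unicode-aware \w, used here only as a stand-in for ICU-style matching.
    static boolean matchesUnicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" with a precomposed e-acute
        System.out.println(matchesDefaultWord(word)); // false
        System.out.println(matchesUnicodeWord(word)); // true
    }
}
```

An SRX rule that relied on \w stopping at an accented letter would therefore behave differently under Unicode-aware matching, which is the kind of change the note warns about.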
Changes from M14 to M15
- Tikal:
- Added the option -gg for the Google MT v2 connector.
This fixes issue #189.
- Mapped the -google option to -gg and
removed references to the Google MT v1 connector which is not
available anymore.
- Rainbow:
- Updated the ID-Based Alignment utility to use the GT v2 API for the
MT services instead of v1.
- Moved old utilities to the bottom of the Utilities menu.
- Filters Plugin for OmegaT:
- Existing segmentation is auto-detected.
- Fixed code re-construction issues for some cases.
- Improved backward compatibility, and implemented support for
alternate translation of repeated source segments for OmegaT 2.5.
- Now if output encoding is not defined for a TTX file, UTF-16 is
used (instead of UTF-8).
- Steps:
- Text Modification Step:
- Added the option for text expansion.
- Added the option to replace ASCII letters with Cyrillic,
Arabic and Chinese characters.
- Rainbow Kit Creation Step:
- Fixed incorrectly mapped inline code that resulted in invalid
merged files.
- The Translation Table format is now available as a package
type (allows extraction and merge of tab-delimited files).
- Format Conversion Step:
- For the parallel corpus output, the source entries are also
now always output even when there is no corresponding target (in
that case the target entry is output as an empty line).
- Id-Based Aligner Step:
- Implemented support for bilingual files as the reference file.
- Microsoft Batch Translation Step:
- Fixed issue where leveraged text was not copied into the
target in some cases when it should.
- XML validation step:
- Fixed the schema/DTD selection error.
- Term Extraction Step:
- The TermsAnnotation annotations are taken into account
and are extracted.
- Connectors:
- Google MT v2 Connector:
- Fixed error when list of segments to translate is empty or
made of segments with only codes.
- Removed query length limit for single query mode (to let Google
Translate deal with it in case it changes).
- Filters:
- IDML Filter:
- Fixed issue of not merging last cell of a table that is the
last element of a story.
- Fixed TextPath-attached stories not being extracted. This
fixes issue #194.
- HTML Filter:
- Files with no encoding declaration detected within the first 1
KB have an encoding declaration added on output (only if the
file has a <head> element).
- Added support for HTML5 charset declaration.
- XLIFF Filter:
- Fixed issue where the datatype attribute for <file>
was not read properly.
- RTF Filter:
- Improved warning messages for decoding errors (e.g. when a font
name is corrupted). Dangerous cases now have an extra message
warning about the loss of the character.
- TTX Filter:
- Replaced the "include un-segmented part" option with a new
segmentMode option. Mode 0: auto-detect (if a segment is
detected, mode 1 is used; if no segment is detected, mode 2 is
used). Mode 1: only existing segments are extracted. Mode 2: all
text is extracted, segmented or not.
- Added support for Catalyst-generated TTX (that have no
<Raw> nor <ut> elements)
- Properties Filter:
- Added inline code support for HTML tags in default
- PO Filter:
- The translator's comments are now extracted into the common transNote
read-only property.
- TS Filter:
- Implemented the support for translator's note and simple
notes.
- Regex Filter:
- Added a pre-defined configuration for Apple iOS .strings
files.
- OpenXML Filter:
- Fixed the issue where a ZIP stream was not closed properly.
- Fixed issue #143 (Excel parameters not working)
- Fixed issue #144 (Color not saved for Excel options)
- XML Filter:
- Fixed issue of missing start tag in case of non-translatable
inline with no translatable text before.
- Changed unwrapping default: ends are not trimmed anymore (was
causing loss of spaces in fragment extraction).
- Fixed issue where translation context was not reset after an
embedded structural translatable element was extracted (e.g.
para inside para case)
- TermsAnnotation annotations are now generated for
entries matching the ITS Terminology data category.
- Libraries:
- Added support for multiple monitors when centering a dialog.
- Fixed case where tuid was written twice with TMXWriter.
- Fixed the link of the "Patterns" button in Inline Code panel.
- Continued to improve the XLIFF 2.0 experimental library.
- Fixed the case of batch configuration with no plugins.
- Removed Google MT v1 connector from the list of default
connectors.
- Fixed icon and cancel command in Filter Configuration common
dialog.
- Improved cloning of skeleton parts.
- The XLIFFWriter now outputs the common transNote property as a
<note from="translator">. This fixes issue #195.
- Added changeFontSize() to InputDialog UI.
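
The segmentMode rule described for the TTX Filter in the M15 notes above amounts to a small decision. The sketch below restates it under stated assumptions: the constant names, the class, and the resolveMode() helper are hypothetical, and only the numeric modes 0, 1 and 2 and their meanings come from the entry.

```java
class TtxSegmentModeSketch {
    // Numeric values from the changelog; the constant names are invented here.
    static final int MODE_AUTO = 0;              // auto-detect
    static final int MODE_EXISTING_SEGMENTS = 1; // only existing segments are extracted
    static final int MODE_ALL_TEXT = 2;          // all text is extracted, segmented or not

    // Returns the mode that effectively applies to a document.
    // segmentDetected says whether at least one segment was found in it.
    static int resolveMode(int segmentMode, boolean segmentDetected) {
        if (segmentMode != MODE_AUTO) {
            return segmentMode; // explicit modes are used as given
        }
        return segmentDetected ? MODE_EXISTING_SEGMENTS : MODE_ALL_TEXT;
    }

    public static void main(String[] args) {
        System.out.println(resolveMode(MODE_AUTO, true));  // 1
        System.out.println(resolveMode(MODE_AUTO, false)); // 2
    }
}
```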
Changes from M13 to M14
- Rainbow:
- Fixed input root setting when running from the command-line.
- Fixed missing command-line IDs for Translation Kit creation and
post-processing pre-defined pipelines.
- Tikal:
- Added the -bpt option in the Leverage Files from
Moses command.
- Added the -to option in the Leverage Files from
Moses command.
- Made the key parameter of the MyMemory connector optional (kept it
for backward compatibility)
- Updated bash script (tikal.sh) to allow more than 9 arguments.
- Added the -od option in the Extract Files and Merge
Files commands.
- Added the -sd option in the Merge Files commands.
- Filters Plugin for OmegaT:
- Added support for TXML files.
- Updated filter names to indicate they are from the Okapi plugin
(as we may have duplicated filters with the default OmegaT filters)
- CheckMate:
- Added support for TXML files.
- Steps:
- Id-Based Copy Step:
- Added the options to mark the leveraged text units as
non-translatable.
- Added the option to mark the leveraged translation as
approved.
- Rainbow Translation Kit Creation Step:
- Added the support for Versified Text + RTF packages.
- The XINI package output now supports pre-translated items.
- Changed the default output directory from ${rootDir}
to ${inputRootDir}.
- Continued to improve the experimental XLIFF 2.0 package.
- Updated OmegaT package so the <dictionnary_dir>
element required for OmegaT 2.5 is generated.
- Fixed handling of input file without filter for XLIFF+RTF and
Versified+RTF outputs.
- Fixed the problem with the destination folder when merging
several manifests one after the other.
- XML Validation Step:
- Fixed case of relative path for included DTDs.
- Translation Comparison Step:
- Added the option to use generic representation for inline
codes.
- Improved the error handling.
- Segmentation Step:
- Added support for ${inputRootDir}.
- Leveraging Step:
- Added support for ${inputRootDir} for the TMX
output.
- Microsoft Batch Translation Step:
- Added support for ${inputRootDir} for the TMX
output.
- Moses Leveraging Step:
- Added the option to not use <g> notation
in new <alt-trans> elements in output.
- Filters:
- Added the TXML Filter. It allows you to process Wordfast Pro
native translation files. This filter is early BETA for now.
- HTML Filter:
- Fixed un-escaping for '&' issue with non-extractable
attribute values.
- Added the escapeCharacters option to allow
output using character entity references.
- Versified Text Filter:
- Added support for bilingual variant of the format.
- XLIFF Filter:
- Added the option to not use <g> notation
in new <alt-trans> elements in output.
- Modified the generation of group IDs to take into account the
ones existing in the input file. This fixes issue #185.
- Added support for cases where a <seg-source>
element has no <mrk> element: it's treated
as a single-segment entry.
- RTF Filter:
- Updated filter to strip out null characters present in some
varieties of RTF files.
- TMX Filter:
- Fixed the problem of some attribute values not being escaped
properly in some cases.
- XML Filter:
- Added the option extractIfOnlyCodes to allow
entries with only codes or whitespace but no text to not be
extracted. For backward compatibility this option is set to
true.
- IDML Filter:
- Added support for stories nested inside other stories.
- Fixed issue with empty <Content> element, and tested
with more complex CS5 files.
- RainbowTKit Filter:
- Improved error handling in Rainbow translation kit
post-processing.
- Table Filter:
- Added support for adding qualifiers when needed for the CSV
output. This fixes issue #187.
- TTX Filter:
- Added the option of including or not unsegmented text and
added the pre-defined configuration preSegmented.
- Changed the parsing so the IDs are unique within each text
unit rather than reset for each segment
- Properties Filter:
- Added support for sub-filters.
- Added the pre-defined configuration okf_properties-html-subfilter
- Connectors:
- MyMemory TM Connector:
- Changed the code to use the REST interface rather than the
SOAP interface.
- Microsoft Translator Connector:
- Changed the queryList() method to batchQuery() so it can be
used through the IQuery interface.
- The original score is now set in score, while the
re-calculated score is the combined score.
- Google MT v2 Connector:
- Libraries:
- Continued to improve the XLIFF 2.0 experimental library.
- Updated IQuery to include batchQuery() and batchLeverage().
- Updated QueryResult to include the new combinedScore and quality
fields.
- Changed various classes to use the combined score rather than the
score for their filtering.
- Made various fixes and improvements to the core libraries.
Changes from M12 to M13
- Rainbow:
- Added the user preference "Use the last session's locales and
encodings as defaults".
- CheckMate:
- Fixed the null pointer issue that occurred when a target pattern
different from the source match was not found.
- Added check of the opening/closing sequence for inline codes. This
check is done only if no other error is found in the segment.
- Added the option "Try to guess opening/closing types for
placeholder codes" for the codes.
- Tikal:
- Added the -a command to add translations to a resource.
- Fixed usage screen for minor details.
- Longhorn:
- Longhorn is now part of the regular releases. Longhorn is a server
application that allows you to execute batch configurations remotely
on any set of input files. Batch configurations, which include
pre-defined pipelines and filter configurations, can be exported
from Rainbow. For more information see
http://www.opentag.com/okapi/wiki/index.php?title=Longhorn.
- Steps:
- Rainbow Translation Kit Creation Step:
- Changed the XLIFF and XLIFF + RTF output behavior so single
and double quotes are not escaped in element content.
- Changed the XLIFF + RTF output to use short notation for empty
inline codes.
IMPORTANT: This updated notation may have an impact
on TM matches using the previous notation. Some TM tools will
have coded the two parts <x id="1"> and </x>
as two separate inline codes (because they are styled as two
spans of tw4winInternal) and will see the new notation
<x id="1"/> as a single code. For a Trados-2007 TM,
for example, you may want to update your TM by replacing the two
adjacent codes with a single one. Export the TM to TMX 1.4b,
replace
<bpt\s.*?><x id="(\d\d?)"></bpt><ept\s.*?></x></ept>
by
<ph><x id="\1"/></ph>
and create a new TM based on the modified notation. That new TM
should yield segments that will match with the new notation.
- Added the option "Create a ZIP file for the package".
- Continued experimental implementation of XLIFF 2.0.
- Rainbow Translation Kit Merging Step:
- Added the option to generate raw document events or filter
events.
- Added support for post-processing directly .rkp zipped files.
- Extraction Verification Step:
- Improved support for bilingual file formats.
- Search and Replace Step:
- Fixed the UI so the list of expressions is not too tall and does
not push the buttons off the screen.
- Added the Microsoft Batch Translation Step: it allows you to
batch-translate resources using the Microsoft Translator TM/MT engine
(the Collaborative Translations Framework).
- Added the Microsoft Batch Submission Step: it allows you to submit
human or post-edited translations to the Microsoft Collaborative
Translations Framework. Those translations can be retrieved later.
- Filters:
- XLIFF Filter:
- Added internal flag to output un-escaped single and double
quotes.
- Fixed filter-writer to take into account un-segmented entries
with existing translation in the layered output (e.g. RTF).
- Fixed the bug of unbalanced opening and closing codes in
segmented entries when codes are nested over parts.
- Updated how outer data of empty inline codes are stored (e.g.
now uses <x id='1'/> instead of <x id='1'></x>).
- PHP Content Filter:
- Fixed the case of multiple placeholder codes with the same ID.
This fixes issue #179.
- IDML Filter:
- Implemented extraction of master spreads (as an option set by
default).
- Fixed a bug in the simplification of trailing inline codes
after a single character.
- Connectors:
- Microsoft Translator Connector:
- Changed the name from Microsoft MT Connector (since it handles
more than MT matches).
- Implemented ITMQuery and multiple results.
- Libraries:
- Added support for quote mode in XMLEncoder.
- Upgraded ICU4J library from 4.6 to 4.8.
- Added PIPELINE_PARAMETERS event.
- Upgraded the SWT library from 3.6.1 to 3.7.
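The TMX clean-up described in the IMPORTANT note for the Rainbow
Translation Kit Creation Step above can be sketched as follows. This is
a minimal illustration, not an Okapi tool: the pattern and replacement
are copied from the note with the line breaks removed, while the function
name and the sample segment are hypothetical. In a real TMX export the
inline code content is likely XML-escaped (for example &lt;x id="1"&gt;),
so the pattern may need to be adjusted to your file.

```python
import re

# Pattern and replacement taken from the note above (line breaks removed).
PATTERN = re.compile(r'<bpt\s.*?><x id="(\d\d?)"></bpt><ept\s.*?></x></ept>')
REPLACEMENT = r'<ph><x id="\1"/></ph>'


def collapse_empty_codes(tmx_text: str) -> str:
    """Replace each adjacent bpt/ept pair wrapping an empty <x> code
    with a single <ph> placeholder holding the short notation."""
    return PATTERN.sub(REPLACEMENT, tmx_text)


# Hypothetical segment content, written in the same form as the note.
sample = 'Click <bpt i="1" type="x"><x id="1"></bpt><ept i="1"></x></ept> here.'
print(collapse_empty_codes(sample))
```

Run the replacement on a copy of the exported TMX and inspect a few
entries before building the new TM from it.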
Changes from M11 to M12
- Rainbow:
- Fixed YAML escaping/un-escaping for quotes in the Edit Code Finder
Rules tool.
- Added the Create New Document command.
- Tikal:
- Updated the -tt option to make the server parameters
optional and default to the Amagama server.
- Filters:
- PO Filter:
- Implemented case of no quote on msgid/msgstr/etc. line.
- Improved error reporting with line numbers.
- Added msgctxt in the generated resname values. This allows two
entries with the same source and same domain but with a
different context to have distinct resname auto-generated
values.
- MIF Filter:
- Implemented extraction for index markers (default=yes) and
hyperlinks (default=no).
- When needed, leading codes (font, etc.) are now extracted as
inline code rather than kept in the skeleton.
- Regex Filter:
- Added support for escaped character notation that uses doubled
characters (e.g. "").
- Made the back-slash escape notation optional.
- XML Filter:
- Fixed bug where standalone="yes" was re-written
standalone="true" in the XML declaration. This fixes issue #173.
- IDML Filter:
- Improved support for special characters (hair spaces, forced
line-breaks, etc.)
- Steps:
- Added the Remove Target Step, to remove target entries from text
units.
- Added the Inline Codes Simplifier Step, to join adjacent inline
codes in text units.
- Added the GTT Batch Translation Step, to create TM files using
Google Translator Toolkit.
- Added the Repetition Analysis Step, to detect text repetitions for
word count report.
- Added the Extraction Verification Step, to check if a document can
be extracted/merged/re-extracted and generate the same events (this
helps verify that extracted text can be safely merged back).
- Rainbow Translation Kit Creation Step:
- Added option for creating XLIFF 2.0 packages. This is for
experimental tests only. XLIFF 2.0 is not defined yet.
- Added the option of setting the sentence_seg flag in the
project for OmegaT packages.
- Fixed synchronization error with empty source on merging PO
and Transifex packages.
- Implemented support for input files without filter
configuration: they are copied into the original folder if the
selected kit has one.
- Rainbow Translation Kit Merging Step:
- Fixed merging for segmented entries.
- Added option to preserve or discard segmentation for the next
step (false to save time if there is no step afterward).
- Leveraging Step:
- Added the option to add a prefix to the leveraged translation,
when a given threshold is equal to or below the score of the
leveraged match, and depending on the target content.
- Added option to copy the source when the source content has no
text (but may have white-spaces and/or codes).
- Search and Replace Step:
- Added the option "replace all instances of the pattern" to
replace all or only the first match in each item searched.
- Quality Check Step:
- Changed the inline code verification so only selected types of
codes are ignored rather than all the ones without native data.
Now, by default, only mrk and x-df-s types are ignored.
- Segmentation Step:
- Prevented the options from being applied when no segmentation
is done.
- Line-Break Conversion Step:
- Updated output so the output can be the same file as the
input.
- Byte-Order-Mark Conversion Step:
- Updated output so the output can be the same file as the
input.
- RTF Conversion Step:
- Updated output so the output can be the same file as the
input.
- XML Character Fixing Step:
- Updated output so the output can be the same file as the
input.
- Encoding Conversion Step:
- Updated output so the output can be the same file as the
input.
- Term Extraction Step:
- Changed code to include numbers as part of the possible terms.
- Connectors:
- GlobalSight Connector:
- Changed the conversion of the score to handle the floats sent
by the latest API, instead of integers.
- Translate Toolkit TM Connector:
- Updated the default to use the Amagama server.
- Added support for letter-coded inline codes.
- Removed Cross-Language Gateway MT services connector from default
distribution.
- Microsoft MT Connector:
- Implemented support for inline codes.
- Libraries:
- Fixed corrupted DefaultFilters.properties so XINI filter and MIF
Filter UI are accessible again.
- Added support for encoded storage of strings (e.g. passwords) in
several types of parameters files.
- Added output helper methods to RawDocument.
- The Lucene library has been updated to 3.1.0. You may need to
re-index your Pensieve TMs. You can do this by exporting the TM to
TMX, then re-importing it back.
- Updated TextUnit to use ITextUnit and a new implementation. This
is a major low-level refactoring that comes with some behavior
changes and additions:
- in createTarget() IResource.COPY_CONTENT should be replaced by
IResource.COPY_SEGMENTED_CONTENT to have the same behavior.
- the new getTargetSegments() does create a target if it does
not exist (with a COPY_SEGMENTATION option).
- Moved IQuery and QueryResult from the okapi-lib-translation
project to the core (common.query package).
- Implemented plugin support for connectors.
Changes from M10 to M11
- Filters Plugin for OmegaT:
- Added the Transifex filter. Transifex projects can be edited
directly from OmegaT now (write-rights needed to save the
translations)
- Rainbow:
- Added the "Export Batch Configuration" command to create
single-file configuration that can be transported and re-installed
elsewhere.
- Added the "Install Batch Configuration" command to create a set of
local files from a batch configuration.
- Added the "Translation Kit Creation" pre-defined pipeline, to
create a translation kit (BETA). When done, this will replace the
"Translation Package Creation" utility.
- Added the "Translation Kit Post-Processing" pre-defined pipeline,
to merge back extracted files (BETA). When done, this will replace
the "Translation Package Post-Processing" utility.
- Changed the "Translation Package Creation" utility to not output
non-translatable entries in OmegaT packages.
- Fixed which page was called for the command-line help.
- Tikal:
- Fixed the -lfc command so it includes custom configurations as
well as the default ones.
- Added the -noalttrans option to the -x command.
- Fixed problem with application's path containing spaces. This
fixes issue #162.
- Integrated Transifex, MIF and Archive filters.
- CheckMate:
- Added the check for absolute maximum length.
- Added option to automatically recheck the documents when they
change.
- Steps:
- Added the "Rainbow Translation Kit Creation" step, to create
translation packages in various formats from a pipeline.
- Added the "Rainbow Translation Kit Merging" step, to merge back
extracted files.
- XML Characters Fixing Step:
- Implemented support for decimal and hexadecimal NCRs in
addition to raw characters.
- Added the "Id-Based Copy" step, to copy the source text from one
file into the target of another for the entries with the same id.
- Scoping Report Step:
- Many changes have been implemented: better support for GMX and
Okapi categories, option for custom templates, etc.
- RTF Conversion:
- Added warning when a character cannot be encoded in the
output encoding.
- Added option to update the encoding declaration in XML/HTML
files when possible.
- Filters:
- Added the MIF Filter (Beta)
- Added the Rainbow Translation Kit Filter, to process translation
packages.
- Added the Transifex Filter, to process remote Transifex projects.
- Added the Archive Filter, to process any files inside a zip or jar
file.
- HTML Filter:
- Fixed the case where non-quoted one-word translatable
attributes could be merged back as non-quoted multi-words. This
resolves issue #126.
- TS Filter:
- Added resname for group (<context> elements)
- Changed to use Woodstox StAX parser instead of the default VM
parser.
- TTX Filter:
- Improved handling of unsegmented content to mimic TagEditor's
behavior more closely: Leading whitespace characters are now
excluded from the entries. This addresses issue #164.
- Changed to use Woodstox StAX parser instead of the default VM
parser.
- PO Filter:
- Added mapping for msgctxt to the context property.
- Added the option to protect approved entries (i.e. not empty
and not fuzzy).
- TMX Filter:
- Changed to use Woodstox StAX parser instead of the default VM
parser.
- XLIFF Filter:
- Fixed issue with out-of-segment inline codes collapsing in
previous empty segmented target entry.
- Fixed the bug where the approved property was not writeable.
Now you can add, delete or modify it.
- Changed to use Woodstox StAX parser instead of the default VM
parser.
- OpenXML Filter:
- Fixed the problem of < and/or > in text boxes causing
a merging error. This resolves issue #142.
- Table Filter:
- Fixed the issue with incorrectly setting the inline code
finder rules.
- Plain Text Filter:
- Fixed the issue with incorrectly setting the inline code
finder rules.
- Connectors:
- Libraries:
- Fixed the short search cases in NGramTokenizer/Analyzer. This
fixes issue #159.
- Added Transifex client library.
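The XML Characters Fixing Step entry above (raw characters plus decimal
and hexadecimal NCRs) can be illustrated with a short sketch. It is not
the Okapi implementation: the validity test follows the XML 1.0 Char
production, and the marker format _#xHHHH; as well as the function names
are assumptions chosen for the example.

```python
import re

# Decimal (&#31;) and hexadecimal (&#x1F;) numeric character references.
NCR = re.compile(r'&#(?:x([0-9A-Fa-f]+)|([0-9]+));')


def is_valid_xml_char(cp: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def marker(cp: int) -> str:
    # Hypothetical marker format; the real step may use another one.
    return '_#x%04X;' % cp


def fix_xml_chars(text: str) -> str:
    """Replace invalid characters, raw or written as NCRs, by markers."""
    def fix_ncr(match):
        cp = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        return match.group(0) if is_valid_xml_char(cp) else marker(cp)

    text = NCR.sub(fix_ncr, text)
    return ''.join(ch if is_valid_xml_char(ord(ch)) else marker(ord(ch))
                   for ch in text)


print(fix_xml_chars('a\x01b&#2;c&#x1F;d&#65;'))
```

Valid references such as &#65; are left untouched; only references and
raw characters outside the allowed ranges are rewritten.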
Changes from M9 to M10
- Rainbow:
- Changed the help system to use the wiki (the plan is to have a
snapshot of the wiki also available as local help).
- Added the "Tools" > "Plugins Manager" command.
- Added the "Plugins Location" option in the "User Preferences"
dialog box.
- Fixed locale variables (${TrgLoc}, etc.) to get a consistent
casing regardless of the casing of the value in the Languages and
Encodings tab.
- Fixed corrupted path when dropping file on output using ${TrgLoc}.
- Added the Code Finder Editor (in Tools menu) to edit code finder
rules for filters using them but not having any UI to define them.
- The root of a list is now automatically adjusted to the longest
root possible when a document above the current root is added. The
documents already listed have their relative path adjusted as well.
- Create Translation package:
- Refactored heavily the leveraging mechanism and the output
(e.g. added <alt-trans> output in Generic XLIFF)
- Re-organized the Utilities menu with sub-menus.
- Added the "XML Analysis", the "XML Characters Fixing", and the
"XML Validation" pre-defined pipelines.
- Changed command-line processing to handle pre-defined pipelines in
addition to utilities (and pipeline).
- CheckMate:
- Added option in the Term tab to match strings only when enclosed
in codes.
- Added the "Reset to Defaults" command in the Configuration dialog.
- Added the "Accept all next documents with their defaults" option
when adding documents to the session.
- Implemented tab-delimited format for the report, and added choice
between HTML or tab-delimited in Configuration dialog.
- Added support for term list in CSV format.
- Added warning when the target from RTF contains a hidden part.
- OmegaT plugin:
- A new component, the Okapi Filters for OmegaT plugin, is now part
of the distributions. It allows you to use some of the Okapi filters
directly in OmegaT. Currently the filters for the following formats
are included: JSON, TTX, Qt-TS and IDML.
- Tikal:
- Added the -xm command to extract files to Moses InlineText format.
- Added the -lm command to leverage files from their Moses
InlineText corresponding files.
- Updated handling of default for source and target languages
(allows autodetection before using system defaults).
- Added the -s command to segment files.
- Steps:
- Added the RTF Conversion Step, and replaced the
"RTF Conversion" utility by a pre-defined pipeline.
- Added the BOM Conversion Step, and replaced the
"Byte-Order-Mark Conversion" utility by a pre-defined pipeline.
- Added the Encoding Conversion Step, and replaced
the "Encoding Conversion" utility by a pre-defined pipeline.
- Added the Create Target Step (allows you to
create target from the source).
- Added the Scoping Report Step (generates
word-count reports).
- Added the XML Characters Fixing Step (replaces
invalid XML characters by markers).
- Added the XLIFF Joiner Step (allows you to re-join
XLIFF documents created by the XLIFF Splitter step).
- Added the Moses InlineText Extraction Step (to
extract entries to Moses text files).
- Added the Moses InlineText Leveraging Step (to
leverage translation from a Moses text file).
- Added the XML Analysis Step (to generate the
list of elements in a set of XML documents and guess their
localization-related properties). This also closes issue #153.
- Leveraging Step:
- Added the threshold option for the match to
copy into the target.
- Segmentation Step:
- Added option to overwrite type of output
segmentation in files such as XLIFF.
- Added option "Overwrite existing
segmentation".
- Sentence Alignment Step:
- Improved handling of whitespaces.
- Updated default SRX rules for the step.
- Search and Replace Step:
- Added option to replace on source and/or target when using
filter events.
- Added support for \uHHHH notation on all modes.
- IMPORTANT: The behavior of this step has changed
when no target is in the text unit (in filter event mode):
- Before M10: The source was automatically copied as the
target, and the search and replace was performed on that
text.
- Starting with M10: text units with an empty target are simply
not processed. You must now have a Create Target step before
this step to copy the source into the target.
- XLIFF Splitter Step:
- Improved support for large documents. This fixes issues #146
and #147.
- Translation resources:
- Microsoft MT Connector:
- Updated the connector to use the v2 HTTP API instead of the v1
SOAP one (which is no longer accessible)
- Filters:
- Added the Moses Text Filter for processing Moses MT system data
files.
- XML Stream Filter:
- Fixed issue #145 about PI being moved.
- Fixed issue #150 about inline codes being incorrectly escaped.
- Changed default so apostrophes are not escaped in output.
- Implemented default extraction for CDATA sections when no
sub-filter is defined.
- Fixed null pointer issue with empty CDATA sections.
- Fixed issue of conditions not being applied on CDATA sections.
- HTML Filter:
- Fixed escaping of inline codes detected using the codeFinder
option.
- Fixed issue #90 (CDATA section not extracted)
- IDML Filter:
- Refactored the filter completely. The filter is still beta,
but has been improved significantly.
- TS Filter:
- Fixed issue with null character output on string with inline
codes when using TS encoder.
- Finished implementation of the <byte> element as inline
code.
- Table Filter:
- Improved handling of qualifiers.
- Added pre-defined filter configuration for Haiku catkeys file
format.
- TTX Filter:
- Improved mapping of leveraged entries with a score to Okapi
annotations.
- Fixed filter-writer for pre-segmented RTF output.
- Fixed handling of split opening tags (TTX tags with
Type="start" and leftEdge="split").
- Improved handling of isolated </df> tags in un-segmented
content.
- Changed extraction to extract non-segmented parts of text
entries (this closes issues #151 and #157).
- Trados-Tagged RTF Filter:
- Improved parsing for fldinst and xmlopen fields: their content
is not included in the extracted text.
- Added warning when part of the target segment is hidden.
- XLIFF Filter:
- Added the option to override the original target language.
- Improved mapping of alt-trans attributes to Okapi annotations.
- Implemented options to add possible new <alt-trans>
elements in output files (in addition to the one in the
original document), and to include or not extension information
in the new <alt-trans>.
- Added the output option: "Segment only if the entry is
segmented and regardless how the input was".
- Libraries:
- Completely refactored the IQuery, QueryResult, QueryManager and
related classes.
- Removed the ScoreAnnotation and ScoreInfo classes that were
deprecated in M7. Use AltTranslationsAnnotation instead.
- Added capability to write extra data in header and phase-name in
trans-unit for XLIFFWriter class.
- Upgraded the SWT libraries to 3.6.1.
- Fixed the handling of literal carriage returns and their numeric
character references in most XML-based filters and encoders, so \r
is not stripped out or converted to \n.
- Added setBoolean, setString and setInteger by name in IParameters.
- Upgraded ICU4J library from 4.0.1 to 4.6.
- Fixed issue with not re-balancing codes after insert/append in
TextFragment (this solves several bx/ex-related issues).
- Changed transferCodes() method used in merging in Rainbow, Tikal
and xliffkit, to fix merging issue with <g>-type XLIFF codes.
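The M10 behavior change of the Search and Replace Step noted above
(text units with an empty target are skipped unless a Create Target step
runs first) can be sketched as follows. The TextUnit class and both
functions are hypothetical stand-ins written for this illustration; they
are not the Okapi classes or steps.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextUnit:
    """Hypothetical stand-in for a text unit: a source and an optional target."""
    source: str
    target: Optional[str] = None


def create_target(tu: TextUnit) -> TextUnit:
    """Mimics a Create Target step: copy the source when no target exists."""
    if not tu.target:
        tu.target = tu.source
    return tu


def search_replace_target(tu: TextUnit, pattern: str, replacement: str) -> TextUnit:
    """Mimics the M10 behavior: an empty target is left unprocessed."""
    if not tu.target:
        return tu
    tu.target = re.sub(pattern, replacement, tu.target)
    return tu


skipped = search_replace_target(TextUnit('colour'), 'colour', 'color')
processed = search_replace_target(create_target(TextUnit('colour')), 'colour', 'color')
print(skipped.target, processed.target)
```

The first text unit keeps its empty target; the second is processed
because the target was created beforehand.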
Changes from M8 to M9
- Rainbow:
- Translation package Post-Processing utility:
- Fixed the bug where pre-translated XLIFF entries with
translate='no' could not be merged back properly, for example
for PO files.
- Added the user option "Always show the Log when starting a
process".
- Tikal:
- Fixed the bug in the Merge command where pre-translated XLIFF
entries with translate='no' could not be merged back properly, for
example for PO files.
- Switched help to use the wiki.
- Ratel:
- Windows position and size are now saved for the next session.
- CheckMate:
- Added capability to save and load configurations outside the
session.
- Improved pattern checks defaults and processing.
- Added support for short vs. long text in text length verification
(new Length tab)
- Added experimental support for terminology verification.
- Added support for exceptions in verification of double-words.
- Added some limited support for string-based term verification.
- Translation resources:
- Added batchQuery method to the IQuery interface.
- Added leverage method to the IQuery interface.
- Open-Tran connector:
- Changed implementation to use the REST API instead of the
XML-RPC.
- Improved support for queries with inline codes.
- SimpleTM connector:
- IMPORTANT: Changed the H2 database dependency
from version 1.1.103 (.data.db files) to 1.2.135 (.h2.db files).
This breaks backward compatibility: the new SimpleTM connector
cannot open the old .data.db files. To convert an older TM: Use
an M8 or prior version of Rainbow to run the SimpleTM to TMX
step to export your database to TMX. Then use this version of
Rainbow to run the Generate SimpleTM step to convert
your TMX document into a new .h2.db file.
- Steps:
- Added the Resource Simplifier Step. It modifies normal resources of
filter events into simpler resources for some third-party tools.
- Added the XLIFF Splitter Step. It splits several
<file> elements inside an XLIFF document into separate
documents.
- Added the Id-Based Aligner Step. It aligns text units from two
input files, based on their unique IDs (resname).
- Added the XML Validation Step. It performs well-formedness XML
verification and, optionally, DTD or schema validation.
- Sentence Aligner Step:
- Updated so entries with empty text are skipped and don't cause
an error.
- Diff Leverage Step:
- Added support for 3 input files: new source, old source, old
translation. The second and third files must have the same text
units (same number and same order).
- Filters:
- Modified several filters to generate unique extraction ids in
non-text-unit events.
- Vignette Filter:
- Added support for monolingual documents.
- XML Filter:
- Fixed the bug where text extracted from attribute values was
not processed for the codeFinder option.
- Libraries:
- Implemented the Appendable and CharSequence interfaces for
TextFragment.
- IMPORTANT: Changed TextFragment.toString()
to return the coded text instead of the original content of the
fragment. The previous behavior of toString() is now
accessible using text().
- The net.sf.okapi.lib.extra.pipelinebuilder package
has been added. It allows you to easily script run pipelines, for
example using Jython.
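The well-formedness part of the XML Validation Step listed above can be
illustrated with a small sketch. It uses the parser from the Python
standard library rather than anything from Okapi, the function name is
hypothetical, and DTD or schema validation is outside its scope.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def check_well_formed(xml_text: str) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the text parses as XML,
    otherwise (False, parser error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


print(check_well_formed('<a><b/></a>'))
print(check_well_formed('<a><b></a>')[0])
```

The error message returned for a failed check carries the line and
column reported by the parser, which is usually enough to locate the
problem in the document.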
Changes from M7 to M8
- Installation:
- Added a distribution for the Windows 64-bit platform.
- Rainbow:
- Fixed the bug where the initial character of input file was
truncated if root had a final slash or backslash.
- Replaced the Line-Break conversion utility by the "Line-Break
conversion" pre-defined pipeline.
- Added the "Run Quality Check Session" command to the Tools menu.
- Fixed issue #139 where a target SRX was required for
segmentation in "Translation Package Creation".
- CheckMate:
- Added CheckMate: a standalone application to run the quality
checker.
- Translation resources:
- Added a first simple connector implementation for the TDA Search
services.
- Steps:
- Added the Term Extraction step.
- Added the Quality Check step, including support for Language
Tool Checker.
- Added the Line-Break Conversion step.
- Added the Image Modification step.
- Full-Width Conversion step:
- Added the option to convert the Squared Latin Abbreviations
part of the CJK Compatibility block to non-CJK.
- Added the option to convert some of the Letter-Like Symbols
block to simple character sequences.
- Format Conversion step:
- Added the output to Parallel Corpus Files (for example to use
as input for training MT systems)
- Added the option "Output only approved entries".
- Search and Replace step:
- Added support for \n, \r, \t,
and \N in the replacement field when in regex mode.
This resolves issue #123.
- Filters:
- XML Filter:
- Added support for unique ID in pre-defined configuration for
RESX files.
- Added the omitXMLDeclaration option to the
parameters file.
- XMLStream Filter:
- Added new filter for streamed XML, e.g. to handle large
documents.
- TTX Filter:
- Replaced ScoreInfo annotation by AltTranslation annotation.
- Added the option of escaping the character "greater-than" in
output.
- Improved the support for overlapping TTX <df> tags.
- Trados RTF:
- Improved the RTF filter.
- Integrated it as Trados RTF filter (Reading mode only, and
inline codes only when represented with Trados styles). This
filter cannot be used for normal extract/merge operations, but
is usable for any function that requires only extraction.
- Table Filter:
- Fixed issue #138 where tab was not usable as separator in
"csv" mode.
- Fixed issue #136 where a defined Record ID was not set
properly.
- Fixed issue #137 where the Source column of Source was
incorrectly set
- Libraries:
- Added getDefaultConfigurationFromExtension() to
filter configuration mapper.
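The two Full-Width Conversion options listed above (Squared Latin
Abbreviations and Letter-Like Symbols converted to simple character
sequences) can be approximated with Unicode compatibility
normalization. This is only a sketch under assumptions: the step's own
mapping tables are not shown in this log and may differ, the function
name is hypothetical, and the CJK Compatibility range used here also
contains squared katakana words, which the sketch does not exclude.

```python
import unicodedata

# Assumed block ranges: CJK Compatibility and Letterlike Symbols.
RANGES = ((0x3300, 0x33FF), (0x2100, 0x214F))


def simplify_symbols(text: str) -> str:
    """Apply NFKC only to characters inside the selected blocks,
    leaving every other character (including full-width letters) as-is."""
    out = []
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in RANGES):
            out.append(unicodedata.normalize('NFKC', ch))
        else:
            out.append(ch)
    return ''.join(out)


print(simplify_symbols('5\u338f'))   # U+338F SQUARE KG
print(simplify_symbols('\u2121'))    # U+2121 TELEPHONE SIGN
```

Restricting normalization to explicit ranges keeps the conversion
opt-in per block, in the same spirit as the separate options of the
step.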
Changes from M6 to M7
- Rainbow:
- Fixed issue where output example was not updated when the top
input file was removed.
- Fixed issue where pipeline file was not written as UTF-8.
- Translation package Creation:
- Fixed issue #132 where we generated segment <mrk> in
XLIFF if the text was pre-translated but not segmented.
- ID-Based Alignment:
- Implemented request #134: A TMX output can now be created for
un-aligned entries.
- Libraries:
- Changed the SVN structure to allow checking-out and building the
libraries separately from the UI and apps.
To get the base libraries only: http://okapi.googlecode.com/svn/trunk/okapi.
To get everything: http://okapi.googlecode.com/svn/trunk.
- Changed the TextContainer class and refactored all dependencies.
This modification is a major code change.
- Added the setRootDirectory() method to the IQuery
interface.
- Updated QueryManager to handle empty inline codes and inline codes
with references when leveraging fuzzy matches
- Added spin-like input part to the generic editor.
- Fixed bug where platform type for "cocoa" was not handled, and
therefore Mac not detected in some occurrences.
- Added support for ftnsep, ftnsepc, aftnsep and aftnsepc control
words in the RTF parser, so any defined paragraph or character is
skipped.
- Added the following generic UI parts to the generic editor:
SpinInputPart and SeparatorPart.
- The ScoresAnnotation class has been deprecated. Use the new
AltTranslationsAnnotation instead.
- Fixed help location issue for SRX editor (Ratel).
- Translation resources:
- Updated all the connectors for the IQuery change and implemented
${rootDir} for all the connectors using local files: SimpleTM and
Pensieve.
- Apertium MT:
- Cross-Language Gateway MT services:
- Filters:
- OpenXML filter:
- Fixed an issue with open/closing group in some conditions.
- Fixed an issue with a case of text box resulting in hanging.
- XML filter:
- Added pre-defined configuration for WiX (Windows Installer
XML) Localization files.
- Improved handling of empty elements.
- XLIFF filter:
- Improved the reading of pre-segmented content, so the segment
Ids are now preserved instead of re-generated.
- Fixed parent-id for StartSubDocument event/resource.
- Implemented read-only property for build-num in <file>
and extradata in <trans-unit>.
- Improved support for segmentation choices in output. Now the
filter can remove, add or keep the segmentation for each
trans-unit.
- Vignette filter:
- Fixed issue of 64K limit of blocks (due to Java
DataOutputStream writeUTF() limitation): added multi-chunks
write/read function.
- Ruby on Rails YAML filter:
- Added support for Ruby on Rails YAML filter. It offers partial
support of YAML files.
- Versified Text Filter:
- Added support for filter on versified text documents.
- HTML filter:
- Fixed default configurations to extract ALT attribute of AREA
elements.
- TMX filter:
- Fixed the bug where the option "escape greater-than
characters" was not working.
- Steps:
- Implemented ${rootDir} for the following steps: Format Conversion,
Generate SimpleTM, Segmentation, TM Import, Leveraging, Batch
Translation.
- Segmentation step:
- Made copy of source into empty target an option.
- Added the option of verifying source and target segments match
after segmentation.
- Added the "Diff Leverage" step.
- Added the "External Command" step.
- Sentence Alignment step:
- Added support to use a single bilingual input file.
- Format Conversion step:
- Added the option to generate output files with automated
extension.
- Text Modification step:
- Implemented request #100: An option to modify or not entries
without text.
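The Vignette filter fix above works around the 65535-byte limit that
Java's DataOutputStream.writeUTF() places on the encoded length of a
string by writing and reading several chunks. The splitting idea can be
sketched as follows. This is an illustration only, not the filter's
code: the function names are hypothetical, and the byte counts follow
the modified UTF-8 rules used by writeUTF().

```python
MAX_BYTES = 65535  # encoded-length limit of DataOutputStream.writeUTF()


def modified_utf8_len(ch: str) -> int:
    """Number of bytes one character takes in Java's modified UTF-8."""
    cp = ord(ch)
    if 0x0001 <= cp <= 0x007F:
        return 1
    if cp <= 0x07FF:        # includes U+0000, stored as two bytes
        return 2
    if cp <= 0xFFFF:
        return 3
    return 6                # supplementary: surrogate pair, 3 bytes each


def split_chunks(text: str, limit: int = MAX_BYTES) -> list:
    """Split text into pieces whose encoded size never exceeds the limit."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = modified_utf8_len(ch)
        if current and size + n > limit:
            chunks.append(''.join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    chunks.append(''.join(current))
    return chunks


def join_chunks(chunks: list) -> str:
    """Reading side: concatenate the chunks back into the original text."""
    return ''.join(chunks)


big = '\u00e9' * 40000      # 80000 bytes once encoded
parts = split_chunks(big)
print(len(parts), join_chunks(parts) == big)
```

A writer would emit the number of chunks first and then each chunk with
its own length prefix, so the reader knows how many pieces to join.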
Changes from M5 (0.5.1) to M6
- Installation:
- Updated the Macintosh distributions with application bundles for
Rainbow and Ratel.
- Changed the Macintosh distributions to GunZIP files to preserve
executable flag of the shell scripts.
- Rainbow:
- Translation Package Creation:
- Fixed the issue where pre-segmented RTF output was losing
referents in target.
- Fixed the deletion of the empty TMX files when the package is
zipped.
- Added English-India in the locales list.
- Fixed bug where steps using 3 input lists for more than 3 input
files were getting null values instead of raw documents.
- Added support for plugins for steps, filters and parameters
editors. Just drop the JAR in the dropins folder.
- Updated the way the utilities menu is stored.
- Replaced the "URI Conversion" utility by a pre-defined pipeline
using the "URI Conversion" step.
- Tikal:
- Added support for plugins for filters and parameters editors. Just
drop the JAR in the dropins folder.
- Steps:
- Format Conversion step:
- Fixed the issue where monolingual segmented input was not
output properly in tab-delimited format.
- Added the "Desegmentation" step.
- Added the "URI Conversion" step.
- Added Import/Export functions to the dialog box of the "Search and
Replace" step
- Libraries:
- Changed QueryManager:
- Allowed code changes in target for the non-segmented queries.
- Prevented exact matches from having the target codes "adjusted"
from the source.
- Added setReferentCopies() to GenericSkeletonWriter to allow
correct output for writers referring more than once to the referents
(e.g. when creating pre-segmented RTF with source and target).
- Moved lib-plugins to common.
- Translation resources:
- Added in SimpleTM an option for code content and order difference
between query and source text
- Filters:
- HTML filter:
- Added support for inline codes using regular expressions.
- Table filter:
- Fixed issue #124 where part of the copy of the file
configuration was dropped for TSV files when creating package for
XLIFF.
- TTX Filter:
- Fixed issue #130 where empty TargetLanguage attributes were
not updated with the target language code.
- XML filter:
- Improved the pre-defined configuration for Android resources
files.
- Fixed issue #128: in the help example for codeFinder, count=1 is
now count.i=1.
Changes from M5 (0.5) to M5 (0.5.1)
- Rainbow:
- Translation Package Creation:
- Fixed the bug where the encoder manager for RTF output was not
properly set and caused some formats like HTML, TMX, etc. to have
un-escaped characters.
- Changed the RTF writer to allow other skeleton writers than
GenericSkeletonWriter.
- Replaced the Search and Replace utility by the "Search and Replace
with Filter" and the "Search and Replace without Filter" pre-defined
pipelines.
- Replaced the Text Rewriting utility by the "Text Rewriting"
pre-defined pipeline.
- Tikal:
- Fixed the issue of not having the HTML filter mapped when using
the Vignette filter.
- Added support for accessing Microsoft MT engine (-ms option).
- Translation resources:
- Added a connector for Microsoft MT Web services (http://api.microsofttranslator.com/V1/SOAP.svc).
A Microsoft Bing AppID is needed to use it. You can obtain one at
http://www.bing.com/developers/appids.aspx.
- Google MT: Made it consistent with the other connectors: when the
result is the same as the target, the result is now returned.
- SimpleTM: Made the feature "penalize exact matches when target has
different codes than the query" an option (default is true, backward
compatible).
- Libraries:
- Fixed issue with GenericSkeletonWriter and in-line codes in
segmented text unit that were outside any segment.
- Fixed issue with GenericFilterWriter output stream not nullified
in close() (causing for example no output using
FilterEventsToRawDocument).
- Steps / Pipeline:
- Added MULTI_EVENT (new resource and Event) handling to pipeline.
- Changed step handlers to return Event by default.
- Fixed the parameters setting bug that prevented saving the
parameters of pre-defined pipelines from one session to the next.
- Leveraging step:
- Fixed the bug that prevented entering a TMX path.
- Made adding an MT! prefix to the TMX entries an option.
- Added an option to enable/disable the step.
- Search and Replace step: Improved the behavior of the dialog box
for add/edit item.
- Format Conversion step: Fixed bug where the tab-delimited output
was not closed properly for "one output per input" use case.
- Added Text Modification step.
- Filters:
- PHP Content filter: Added UI for the localization directives
options (default behaviour is the same).
- OpenXML filter: Changed the parameters editor to use GridLayout
instead of BorderLayout.
- TMX filter: Fixed losing original line-breaks between <tu>
when re-writing.
- Vignette filter: Fixed bug of un-escaped and non-CDATA RTF output.
- Properties filter: Added the option "Convert \n and \t to
line-break and tab".
- Table filter:
- Fixed issue #119 where csv action "Exclude
leading/trailing..." was not updated properly in the parameters
editor
- Fixed issue #118 where some csv cases were not extracted
properly
- Installation:
- Updated licence information for third-party packages.
- Removed all the dependencies to swing2swt.
Changes from M4 to M5
- Libraries:
- Changed minimum requirement to Java 1.6 instead of Java 1.5.
- Removed distribution for Mac Carbon, added distribution for Mac
Cocoa-64-bit.
- Updated to Lucene 3.0.0
- Refactored Pensieve TM engine, added new API.
- Rainbow:
- Added the duration of the process in the log.
- Updated the UI of the Pipeline Edit / Execute facility to make the
panels of each step accessible without clicking.
- Replaced the utility "Generate SimpleTM Database" by the pre-defined
pipeline "Import Into Pensieve TM" (the previous utility's
functionality is still available using a custom pipeline).
- Replaced the utility "Export SimpleTM Database" by the pre-defined
pipeline "Convert File Format" (the previous utility's functionality
is still available using a custom pipeline).
- Fixed issue with Text Rewriting and empty <target> for XLIFF
input.
- Replaced the utility "Translation Comparison" by the pre-defined
pipeline "Translation Comparison".
- Added the pre-defined pipeline "Create Translations in Batch Mode"
- Replaced the utility "XSL Transformation" by the pre-defined
pipeline "XSL Transformation".
- Replaced the utility "Used Characters Listing" by the pre-defined
pipeline "Used Characters Listing".
- Ratel:
- Fixed selection bug in UI.
- Updated the default segmentation rules.
- Steps:
- Added Batch Translation step (tested with ProMT and Apertium).
- Added Codes Removal step
- Added Leveraging step
- Completed initial Tokenization and Word-Count steps.
- Added the Sentence Alignment step.
- Translation resources:
- Fixed issue with score > 100 in Pensieve TM.
- Added NCR support for Apertium connector.
- Filters:
- In the Properties Filter: Added pre-defined configuration for
Skype's .lang format.
- In the RTF parser:
- Fixed the issue with \'HHc being read as \'HH\'HH in some
cases.
- Added support for additional DBCS encodings.
- Added TTX Filter for Trados TagEditor documents (Beta).
- In the HTML Filter: Added pre-defined configuration for
well-formed files, providing groups and extra meta-data.
- In the XML Filter: Changed the ITS extension idPointer to idValue and
modified its behavior to allow ID values to be generated from the
expression, not just from the content pointed to by the expression.
The values are backward compatible, but any reference to idPointer in
existing parameters files must be renamed to idValue.
- Added the Vignette Filter for Vignette export XML documents
(Alpha)
- Added the Pensieve Filter for reading and writing Pensieve
translation memories.
Changes from M3 to M4
- Filters:
- XLIFF filter: Added property for target-language and option to add
it. Changed some of the language selection behaviors and set
fall-back to ID option to false.
- Fixed several bugs in the OpenXML filter (MS Office 2007
documents)
- The JSON Filter has been added, to support for example AJAX or
Palm WebOS applications.
- The PHP Content Filter has been added, to support PHP include
files.
- Added default DITA configuration to the XML Filter.
- Fixed several issues with the TS, Table, TMX, and XLIFF filters.
- Added whiteSpaces ITS extension support in the XML Filter.
- Library, Translation resources:
- All the TM and MT connectors have been moved to the package
net.sf.okapi.connectors.
- Modified the OpenTran connector to use the REST interface instead
of RPC.
- Added the connector to the MyMemory server (http://mymemory.translated.net)
- Improved Google MT connector.
- Improved GlobalSight TM connector for inline codes, and adjusted
it for GS version 7.1.6.
- Added Pensieve TM engine and its connector.
- Added the connector for the open-source Apertium MT system web
service (http://wiki.apertium.org/wiki/Main_Page)
- Changed language identification from String to LocaleId objects
across the whole framework.
- Steps and Rainbow utilities:
- Added the SimpleTM2TMX step.
- Added Import and Export utilities for SimpleTM files.
- Continued improving the Tokenization and WordCount steps.
- Implemented an option to select the XSLT processor to use with the
XSL Transformation utility.
- Updated the Translation Package Creation utility to select from
several resources for the pre-translation options, and to allow
specifying a threshold instead of "exact match only".
- Updated the Text Rewriting utility to select from
several resources for the translation options.
- Added the FormatConversion step.
- Improved inline compatibility in projects generated for OmegaT.
- Tikal:
- Added support for accessing the MyMemory repository (-mm option).
- Corrected display of extended characters on the console for some
languages/platforms.
- Added threshold and max-hits options for TM query command (-opt
option).
- Added a command to create PO files from any input (-2po command).
- Added a command to create TMX files from any input (-2tmx command).
- Added a command to create Table files from any input (-2tbl
command).
- Added capability to query a Pensieve TM (-pen option).
- Added support for accessing GlobalSight TM servers (-gs option).
- Added support for accessing Apertium MT servers (-apertium option).
- Added segmentation and leveraging options for the extraction
command.
- Added a command to import any file into a Pensieve TM (-imp
command).
- Added a command to export a Pensieve TM to a TMX file (-exp
command).
Changes from M2 to M3
- The build system has been completely redone and now uses Maven as its
main builder. This has resulted in several changes in the structure of
the Okapi classes, and in the way the files are distributed.
- Filters:
- Added the TS Filter (beta) for Qt translation files.
- Fixed handling of fuzzy flag for plural entries in the PO filter.
- Fixed handling of approved, state and coord properties in the XLIFF
Filter.
- Improved XML Filter:
- Improved rewriting of document type subset declaration.
- Added support for protecting custom entity references.
- Added support for ID defined using xml:id or the idPointer ITS
extension feature.
- Properties Filter:
- Changed the default configuration to always escape output.
- Added pre-defined configuration for non-escaped output.
- Fixed various issues in the OpenXML Filter (docx, pptx, etc.), and
PO Filter.
- Libraries:
- The Google MT connector has been enhanced to have the inline codes
taken into account, not simply pushed to the end of the text.
- Fixed one error in default segmentation rules.
- Added a connector component for the Translate Toolkit TM server.
- Added steps such as Word-count and Tokenizer.
- The command-line tool Tikal has been added.
- Rainbow (okapi-apps distribution only):
- Improved handling of un-approved translations in TMX generated
during a translation package creation.
- Added option to choose to merge only approved translations in
translation package post-processing.
Changes from M1 to M2
- Filters:
- The DTD Filter has been added.
- The PlainText Filter has been added.
- The Table Filter has been added.
- Several pre-defined filter configurations have been added or
updated: Mozilla-RDF, XML Android Strings, XML Java properties,
RESX, Monolingual PO, SRT (Sub-titles), plain-text lines, plain-text
paragraphs, CSV, etc.
- The OpenXML Filter (DOCX, PPTX, XLSX files) has been improved and
now provides much inline code simplification.
- The definition of the parameters for the RegEx Filter has been
modified to allow the support of target text, ID, etc. This new
format is not compatible with that of M1.
- Other filters (HTML, Properties, XLIFF, TMX, PO, and OpenDocument
filters) have been improved.
- Libraries:
- A new TM connector to query remote GlobalSight TM servers has been
added. (See the Java Example05 of the okapi-lib distribution for an
illustration on how to use this component).
- A connector to query the remote OpenTran server has been added.
(See the Java Example05 of the okapi-lib distribution for an
illustration on how to use this component).
- New RawDocument object model.
- The events mechanism has been augmented to work with batch items
in the pipeline.
- The encoding detection and handling of BOM has been modified in
most filters and utilities.
- The pipeline mechanism has been extensively re-written.
- Many steps for the pipeline have been created; they are
experimental for now.
- Rainbow:
- The selection of the filter settings is now done using the new
filter configuration mapping system integrated in the library.
- An experimental interface for creating and executing pipelines has
been added (see Utilities > Edit / Execute Pipeline)
- The creation of OmegaT, XLIFF and RTF translation packages has
been modified to handle pre-segmentation and pre-leveraging.
- Uses the latest libraries.
- Ratel:
- Better preservation of comments in SRX files; and capability to
add comments from within Ratel.
- Uses the latest libraries.