Rainbow TKit - Translation Table

From Okapi Framework
Revision as of 19:19, 4 June 2016 by Ysavourel (talk | contribs) (1 revision imported)

Overview

The Translation Table package is one of the types of translation packages you can create with the Rainbow Translation Kit Creation Step. Such a package can be post-processed using the Rainbow Translation Kit Merging Step.

In this package each input document is extracted into a corresponding tab-delimited file.

  • The first existing or leveraged translation is copied into the translation field.
  • By default, text units are not segmented: each text unit corresponds to a single entry in the table. Set the Allow segmentation option to get one row per segment instead.
  • Inline codes are represented using placeholders similar to the ones used in OmegaT (e.g. <g1>/<x2/>/<b3>/<e3>).

Assuming the source language is English and the target is French, the table has the following format:

TransTableV1{tab}en{tab}fr
"okpCtx:tu=1"{tab}"Text of the first entry"{tab}
etc...
  • The symbol {tab} represents a tab character.
  • The first line must be the header line, with the signature of the file format, the code of the source locale and the code of the target locale.
  • The next lines are the entries. Each one has at least two fields:
    • The ID used to merge the entry back
    • The source text
  • Real tabs are escaped as \t.
  • Line-breaks are escaped as \n.
  • Back-slashes are escaped as \\.
  • Double quotes are escaped as \".
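
The escaping rules above can be sketched in a few lines of Python. This is only an illustration of the format as described on this page, not the code used by Okapi: the function names (escape_field, unescape_field, parse_header, parse_entry) are hypothetical, and the sketch assumes that each non-empty field is wrapped in double quotes, as in the sample entry.

```python
# Hypothetical helpers illustrating the table format described above.
# They are not part of the Okapi API.

SIGNATURE = "TransTableV1"
_UNESCAPES = {"t": "\t", "n": "\n", "\\": "\\", '"': '"'}


def escape_field(text: str) -> str:
    """Escape a field value using the rules listed above."""
    # Back-slashes go first so the escapes added afterwards are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace('"', '\\"'))


def unescape_field(text: str) -> str:
    """Reverse escape_field by reading the text one character at a time."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Unknown escape sequences are kept unchanged (an assumption).
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_header(line: str) -> tuple[str, str]:
    """Return (source locale, target locale) from the header line."""
    signature, source, target = line.rstrip("\r\n").split("\t")
    if signature != SIGNATURE:
        raise ValueError(f"Unexpected signature: {signature!r}")
    return source, target


def parse_entry(line: str) -> tuple[str, str, str]:
    """Return (ID, source text, translation) from an entry line."""
    values = []
    for field in line.rstrip("\r\n").split("\t"):
        # Real tabs inside a field are escaped, so splitting on tabs is safe.
        if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
            field = field[1:-1]
        values.append(unescape_field(field))
    target = values[2] if len(values) > 2 else ""
    return values[0], values[1], target
```

For example, parse_entry applied to the sample entry shown earlier returns the ID okpCtx:tu=1, the source text, and an empty translation.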

The files generated in this package can be translated using any spreadsheet or word-processor capable of reading and writing tab-delimited files.

Options

Allow segmentation — Set this option to allow the segments of segmented text units to be represented as separate rows in the table.

Details

Sub-Directories

The extracted files are stored in the same directory structure as the original files, relative to the root of the file set.

For example, if you have two files named index.html in two different sub-directories, both are extracted as index.html.txt, each in its corresponding sub-directory.

Inline Codes

Inline codes are represented by letter-codes (similar to the ones used in OmegaT).

For example:

Text in <b>bold</b><br />
with a <a href='myfile.html'>link</a>.

is encoded as:

Text in <g1>bold</g1><x2/>
with a <g3>link</g3>.

This notation usually works well with MT engines such as Google Translate or Microsoft Translator.
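
Because translators and MT engines can drop or alter these placeholders, it can be useful to compare the codes of a translation with those of its source before merging. The following is a minimal sketch of such a check, not a feature of the package: the pattern and the function names are assumptions based on the placeholder forms shown on this page.

```python
import re
from collections import Counter

# Assumed pattern for the letter-codes shown on this page:
# <g1>, </g1>, <x2/>, <b3>, <e3> (with or without a trailing slash).
CODE_PATTERN = re.compile(r"</?[gxbe]\d+/?>")


def codes_in(text: str) -> Counter:
    """Count each placeholder found in the text."""
    return Counter(CODE_PATTERN.findall(text))


def compare_codes(source: str, target: str) -> tuple[list[str], list[str]]:
    """Return (codes missing from the target, codes added in the target)."""
    src, trg = codes_in(source), codes_in(target)
    missing = sorted((src - trg).elements())
    extra = sorted((trg - src).elements())
    return missing, extra


source = "Text in <g1>bold</g1><x2/>"
print(compare_codes(source, "Texte en <g1>gras</g1><x2/>"))  # nothing missing or added
print(compare_codes(source, "Texte en <g1>gras</g1>"))       # <x2/> is missing
```

The check only compares which codes are present. It does not verify that they are properly nested or in a sensible order.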

Segmentation

The representation of segmented text units depends on the Allow segmentation option.

  • If the option is set: each segment is represented in a separate row. Inter-segment parts are not stored in the table. Use a segmentation that does not trim leading or trailing spaces to avoid losing spaces when merging back the translation.
  • If the option is not set: each text unit is represented as an un-segmented entry in a single row.
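
The effect of trimming on the merged text can be illustrated with a small model. This sketch is not the Okapi merging code: rebuild_text_unit is a hypothetical function that only mirrors the behavior described above, where rows are concatenated and nothing is stored between segments.

```python
def rebuild_text_unit(segments: list[str]) -> str:
    """Model of the merge: segment rows are joined with nothing in between."""
    return "".join(segments)


# Segmented with trimming: the space between the sentences is an
# inter-segment part, so it never reaches the table.
trimmed = ["First sentence.", "Second sentence."]

# Segmented without trimming: the space stays inside the first segment.
untrimmed = ["First sentence. ", "Second sentence."]

print(rebuild_text_unit(trimmed))    # First sentence.Second sentence.
print(rebuild_text_unit(untrimmed))  # First sentence. Second sentence.
```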

Package Layout

Assuming that your package name is pack1, your input root ends with main, the target language is French, you have selected to use the same filenames as the input files for the output files, and you have the following source files:

--- main
    |
    +--- index.html
    +--- myFile.idml
    +--- subDir
         |
         +--- index.html 

The layout of this package after creation will be:

--- pack1
    |
    +--- manifest.rkm
    +--- *.tmx
    +--- original
    |    |
    |    +--- index.html
    |    +--- myFile.idml
    |    +--- subDir
    |         |
    |         +--- index.html
    |
    +--- work
         |
         +--- index.html.txt
         +--- myFile.idml.txt
         +--- subDir
              |
              +--- index.html.txt
  • original contains a copy of the original source documents. You need those files for post-processing.
  • work contains the documents that are to be translated. The translation is expected to be saved into those files.

After post-processing it will be:

--- pack1
    |
    +--- done
    |    |
    |    +--- index.html
    |    +--- myFile.idml
    |    +--- subDir
    |         |
    |         +--- index.html
    |
    +... (same as after creation)
  • done contains the merged translated documents. This directory is created during post-processing.
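
The mapping between a source file and its locations in the package can be sketched as follows. This is an illustration of the layout shown above under the same assumptions (output files keep the input filenames), not Okapi code. The function name package_locations is hypothetical.

```python
from pathlib import PurePosixPath


def package_locations(package_dir: str, relative_file: str) -> dict[str, PurePosixPath]:
    """Return where a source file ends up in the package.

    relative_file is the path of the source file relative to the input root.
    """
    package = PurePosixPath(package_dir)
    relative = PurePosixPath(relative_file)
    return {
        # Copy of the source document, needed for post-processing.
        "original": package / "original" / relative,
        # Tab-delimited file to translate: same sub-directory, ".txt" appended.
        "work": package / "work" / relative.with_name(relative.name + ".txt"),
        # Merged translated document, created during post-processing.
        "done": package / "done" / relative,
    }


for name in ("index.html", "myFile.idml", "subDir/index.html"):
    print(package_locations("pack1", name)["work"])
```

With the files of the example, the loop prints pack1/work/index.html.txt, pack1/work/myFile.idml.txt and pack1/work/subDir/index.html.txt, which shows that the two index.html files do not collide.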

Limitations

  • When working with segmented entries, inter-segment parts (e.g. the space between sentences) are not stored in the table. When the translated document is read back, each text unit is rebuilt by concatenating its segments. If the segments do not include separator spaces, the sentences are pasted together without spaces. The recommended work-around is to segment without trimming white spaces.
  • This package is in BETA.