How to Extract Text for Translation

From Okapi Framework

Using Rainbow

This example illustrates how to prepare a simple translation package. It assumes that you do not have an existing translation memory and that you do not want to pre-translate or pre-segment the text. It simply creates a basic set of XLIFF files that you can translate with any of the several editors that support XLIFF.

For this example, perform the following steps:

1. Start Rainbow.

2. Drag and drop the files you want to prepare into the Input List 1 tab of the main window.

3. Associate each input document with its appropriate filter configuration. In many cases the configuration selected by default is fine. In some cases you may need to define custom configurations (for an example, see the article "How to Create a Custom Configuration for the XML Filter").

4. In the Languages and Encodings tab: Set the source and target languages and the encodings. You do not need to set the encoding for file formats where it can be detected automatically (e.g. XML documents).

5. In the Other Settings tab: Specify how the translated files should be named. By default they have the same name as the source document, with an extra ".out" extension just before the original extension (e.g. myFile.txt becomes myFile.out.txt).

6. Select Utilities > Translation Kit Creation. This opens the Translation Kit Creation pre-defined pipeline.

7. Go to the Rainbow Translation Kit Creation entry.

8. In the Package Format tab: Select the type of package to create. For this example choose Generic XLIFF.

9. In the Output Location tab: Enter the root of the output directory, for example on Windows: C:\Tmp\Extraction.

10. Also enter the name of the package to create. It is appended to the root directory to form the full output path. For example, if you enter pack1, the output directory will be C:\Tmp\Extraction\pack1.

11. Because you are creating a simple package without any additional options, you do not need to set other parameters. Simply click Execute to start the process. Any errors or warnings are displayed in the Log window.
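The default naming rule described in step 5 (an extra ".out" inserted just before the original extension) can be illustrated with a small POSIX shell function. This is only a sketch of the rule, not Rainbow's own code: the function name out_name is hypothetical, and the handling of names without any extension (simply appending ".out") is an assumption, since the article does not cover that case.

```shell
#!/bin/sh
# Hypothetical helper illustrating Rainbow's default output naming (step 5):
# ".out" is inserted just before the original (last) extension.
out_name() {
  # Keep only the file name part of the argument (strip any directories).
  name=${1##*/}
  case "$name" in
    *.*)
      # Split at the last dot: "myFile.txt" -> "myFile" + "txt"
      printf '%s.out.%s\n' "${name%.*}" "${name##*.}"
      ;;
    *)
      # Assumption: a name without an extension just gets ".out" appended.
      printf '%s.out\n' "$name"
      ;;
  esac
}

out_name myFile.txt    # prints myFile.out.txt
```

Splitting at the last dot matches the wording "just before the original extension", so a name such as report.html becomes report.out.html.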

Assuming you have prepared a single file called myFile.html, you should have something like this:

 C:\Tmp\Extraction\pack1
 |
 +--- original
 |    |
 |    +--- myFile.html
 |
 +--- work
 |    |
 |    +--- myFile.html.xlf
 |
 +--- manifest.rkm
  • The original directory contains the data needed to merge the extracted text back into its original format. You should not modify these files.
  • The work directory contains the XLIFF documents that need to be translated.
  • The file manifest.rkm is used later to merge the extracted data back. You should not modify this file.
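Before handing the work directory to a translator, you may want to confirm that the package has the layout described above. The following POSIX shell sketch assumes the Generic XLIFF layout of this example (original, work and manifest.rkm in the package directory); the function name check_package is hypothetical and not part of Rainbow. It only checks that these entries exist and lists the .xlf files found under work; it does not validate their content.

```shell
#!/bin/sh
# Hypothetical sanity check for a Generic XLIFF package created by Rainbow.
# Returns 0 if original/, work/ and manifest.rkm are all present, 1 otherwise,
# and prints the XLIFF documents found under work/ on standard output.
check_package() {
  pkg=$1
  status=0
  for d in original work; do
    if [ ! -d "$pkg/$d" ]; then
      echo "missing directory: $pkg/$d" >&2
      status=1
    fi
  done
  if [ ! -f "$pkg/manifest.rkm" ]; then
    echo "missing file: $pkg/manifest.rkm" >&2
    status=1
  fi
  # List the XLIFF documents that need to be translated.
  if [ -d "$pkg/work" ]; then
    find "$pkg/work" -type f -name '*.xlf'
  fi
  return $status
}

# Example (the path is an assumption; use your own output directory):
# check_package /tmp/Extraction/pack1
```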

At this point you are ready to translate your package. See: "How to Translate XLIFF Documents".

Once the XLIFF documents are translated you need to post-process the package to generate the translated documents in their original format. See: "How to Post-Process Extracted Text".

Using Tikal

1. Open a command prompt (Windows) or a shell console from which you can run Tikal.

2. Assuming:

  • You have a file myFile.html in the current directory.
  • The file is in English and you want to translate it into Japanese.

Execute the following command:

tikal -x myFile.html -sl en -tl ja 

A file myFile.html.xlf is created. It contains the extracted text for the original document.

Here the filter is selected automatically based on the extension of the input file. If you need to use a specific filter configuration, use the -fc parameter. For example, if you have a file myFile.txt that is a tab-delimited document with specific columns to extract, and you have created a filter configuration for it, you would run:

tikal -x myFile.txt -sl en -tl ja -fc okf_table@myConfig

Where okf_table@myConfig is the filter configuration identifier to use.
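When you have several files to prepare, the two commands above can be combined in a small loop. This is a minimal sketch, not an official Okapi script, and it rests on several assumptions: TIKAL names the Tikal launcher and defaults to "tikal" on the PATH (your installation may use a script name such as tikal.sh instead), okf_table@myConfig is your own configuration for tab-delimited .txt files, and every other file type is left to Tikal's extension-based detection. The function name extract_all is hypothetical.

```shell
#!/bin/sh
# Minimal batch-extraction sketch (assumptions, not an official Okapi script):
# - TIKAL names the Tikal launcher; it defaults to "tikal" on the PATH.
# - okf_table@myConfig is your own configuration for tab-delimited .txt files.
# - Every other file type is left to Tikal's extension-based detection.
TIKAL=${TIKAL:-tikal}
SRC_LANG=en
TGT_LANG=ja

extract_all() {
  status=0
  for f in "$@"; do
    case "$f" in
      *.txt)
        # Tab-delimited input: force the custom table filter configuration.
        "$TIKAL" -x "$f" -sl "$SRC_LANG" -tl "$TGT_LANG" -fc okf_table@myConfig
        ;;
      *)
        # Let Tikal choose the filter from the file extension.
        "$TIKAL" -x "$f" -sl "$SRC_LANG" -tl "$TGT_LANG"
        ;;
    esac || { echo "extraction failed: $f" >&2; status=1; }
  done
  return $status
}

# Usage example (file names are illustrative):
# extract_all myFile.html myFile.txt
```

As in the single-file example, each successful run should produce a .xlf file named after its input (for example myFile.html.xlf); the function returns a non-zero status if any extraction fails.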

Tikal also offers other extraction options, such as pre-segmentation and leveraging from a translation memory.

At this point you are ready to translate your XLIFF file. See: "How to Translate XLIFF Documents".

Once the XLIFF documents are translated you need to post-process them to generate the translated documents in their original format. See: "How to Post-Process Extracted Text".