Tikal - Translation Commands

From Okapi Framework

Translate Files

This command creates a pre-translated version of the input files. It is basically the same thing as running an Extract Files command (with pre-translation) immediately followed by a Merge Files command.

By default, some extensions are mapped to a specific filter configuration (for example, .docx to okf_openxml, .odt to okf_openoffice, and .po to okf_po). You can also define your own configuration and specify it with the -fc option. To list all available filter configurations, use the List Filter Configurations command. For more details on the available filters and their configurations, see each filter's documentation.

Use the -seg option to segment the extracted text. Use -seg without a file name to apply the default segmentation rules, or "-seg myRules.srx" to apply your own rules. The rules file must be in SRX format.

The output files have .out inserted before the original extension. For example, if your original file is myFile.html, the translated document is myFile.out.html.
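The naming rule above can be sketched as a small shell function, which is handy when a script needs to find the translated file afterwards. The name out_name is a hypothetical helper, not part of Tikal, and the sketch only covers files that have an extension, because that is the only case described here.

```shell
# out_name FILE
# Prints the name a translated copy of FILE is expected to get:
# ".out" is inserted before the last extension.
# Hypothetical helper for scripts; it does not call Tikal.
out_name() {
  case "${1##*/}" in
    *.*)
      # ${1%.*} is everything before the last dot, ${1##*.} the extension.
      printf '%s.out.%s\n' "${1%.*}" "${1##*.}"
      ;;
    *)
      # Files without an extension are not covered by the rule above.
      echo "out_name: no extension in '$1'" >&2
      return 1
      ;;
  esac
}
```

For example, out_name myFile.html prints myFile.out.html.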

The syntax of this command is:

-t [options] inputFile [inputFile2...]

Where the options are:

-fc configId The identifier of the filter configuration to use for the extraction.
-ie encoding The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself.
-sl srcLang The code of the source language of the input files. See more details...
-tl trgLang The code of the target language for the output (also used in the input if the input documents are multilingual). See more details...
-seg [srxFile] The segmentation rules to use. To use the default rules that come with the installation, specify -seg without a file name. The default rules are in config/defaultSegmentation.srx in your Okapi main directory.
The translation resource connector to use to translate the document. Specify one of the following:
-pen tmDirectory The Pensieve TM Connector.
-tt [hostname[:port]] The Translate Toolkit TM Connector.
-gs configFile The GlobalSight TM Connector.
-mm [key] The MyMemory TM Connector.
-gg configFile The Google MT v2 Connector.
-apertium [configFile] The Apertium MT Connector.
-ms configFile The Microsoft Translator Connector.
-tda configFile The TDA Translation Repository Connector.
-lingo24 configFile The Lingo24 Premium MT Connector.
-mmt url [context] The ModernMT API Connector.
-bi bilingualFile The Bilingual File Connector.

The leveraging occurs after segmentation, if you have specified segmentation rules.

Note that some Internet-based resources may be slow and result in lengthy processing times. Also be aware that some translation resources may not handle inline codes well.

-opt threshold TM query option: The threshold is a number between 0 and 100. If this option is not set, the default is 95. Note that this option may be limited for some search engines because of the way they are configured.
-maketmx [tmxFile] Generates a TMX document with all the entries leveraged. You can specify the name of the document; if you do not, it is named pretrans.tmx.
-rd rootDirectory The root directory (by default the user's home directory).

For example:

tikal -t *.html -sl en -tl eo -apertium

Translates all .html files in the current directory from English to Esperanto, using the default Apertium MT server. No segmentation is used.
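As a further illustration, the options listed above can be combined in a small wrapper. This is a minimal sketch, not an official recipe: the function name pretranslate, the TM directory argument, the threshold of 80, the en/fr language pair and the file name leveraged.tmx are all illustrative choices. It also assumes that -seg without a file name can be directly followed by another option. The TIKAL variable is a convention of this sketch only; setting it to echo prints the command line instead of running it.

```shell
# pretranslate TM_DIR FILE...
# Hypothetical wrapper around "tikal -t": segments with the default rules,
# leverages a Pensieve TM at threshold 80, and writes the leveraged
# entries to leveraged.tmx.
# Set TIKAL=echo to preview the command line without running Tikal.
pretranslate() {
  tm_dir=$1
  shift
  "${TIKAL:-tikal}" -t "$@" -sl en -tl fr \
    -seg -pen "$tm_dir" -opt 80 -maketmx leveraged.tmx
}
```

A call such as pretranslate myTM *.html would then produce one .out file per input file, plus the TMX document.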

Query Translation Resources

This command queries one or more translation resources for a given text.

You can query all resources at once. When querying several resources, the results are shown per resource, not sorted by best score as a whole.

The syntax of this command is:

-q "text" [options]

Where the options are:

-sl srcLang The code of the source language (language of the text queried). See more details...
-tl trgLang The code of the target language (language of the requested translation). See more details...
-pen directory Queries a Pensieve TM stored in a given directory.
-opentran Queries the Open-Tran translation repository. This requires Internet access.
-gs configFile Queries a GlobalSight TM server. This requires Internet access.
-tt [hostname[:port]] Queries the specified Translate Toolkit TM server. The server can be local or remote.
-mm [key] Queries the MyMemory TM with an optional access key (use mmDemo123 for demo). The key is for backward compatibility. This requires Internet access.
-gg configFile Queries the Google MT paid service. This requires Internet access. The -google parameter also works like -gg (it used to invoke v1 of Google Translate, which has been discontinued).
-apertium [configFile] Queries the specified Apertium MT server (local or remote). A default remote server is provided.
-ms configFile Queries the Microsoft Translator service. This requires Internet access.
-tda configFile Queries the TDA translation repository. This requires Internet access.
-bi bilingualFile Queries a bilingual file.
-lingo24 configFile Queries the Lingo24 Premium MT. This requires Internet access.
-mmt url [context] Queries a ModernMT server. This may require Internet access.
-opt threshold[:maxhits] TM query options: The threshold is a number between 0 and 100. The maximum number of hits is a number above 0. If this option is not set each TM engine uses its own defaults. If this option is set, all TM engines are set to use the specified options. Note that parameters of some engines may be limited by their server-side configuration.
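The constraints on the -opt value can be checked before calling Tikal. The sketch below uses a hypothetical helper named opt_value, which is not part of Tikal, and assumes that both the threshold and the maximum number of hits are whole numbers.

```shell
# opt_value THRESHOLD [MAXHITS]
# Prints a value suitable for -opt ("60" or "60:20"), or returns 1 when
# the threshold is not in 0..100 or the maximum hits is not above 0.
# Hypothetical helper; assumes whole numbers.
opt_value() {
  # Reject empty or non-numeric thresholds.
  case "$1" in ''|*[!0-9]*) return 1 ;; esac
  [ "$1" -le 100 ] || return 1
  if [ "$#" -ge 2 ]; then
    case "$2" in ''|*[!0-9]*) return 1 ;; esac
    [ "$2" -gt 0 ] || return 1
    printf '%s:%s\n' "$1" "$2"
  else
    printf '%s\n' "$1"
  fi
}
```

A script could then pass -opt "$(opt_value 60 20)" on the command line.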

Note: Because the text of the query cannot be associated with a given file format, there is no support for format-specific inline codes. However, when querying a resource that is inline-code aware, you can use HTML-like tags to represent codes. For example, in "Open the <x>window</x><x/>." the tags "<x>", "</x>" and "<x/>" are interpreted as opening, closing and placeholder inline codes, and the query is processed as such. When querying resources that are not inline-code aware, the tags are treated as plain text. You can use any well-formed XML syntax, not necessarily an element x.

Examples:

tikal -q "open file" -sl en

Queries the default translation resource (Open-Tran) for the text "open file" in English. The target language by default is French. Note: You could omit the -sl option if you are running from an English system.

tikal -q "open <x>file</x>" -sl en -pen mytm -opt 60:20

Queries the Pensieve TM located in mytm for the text "open <x>file</x>" in English. The target language by default is French. Because Pensieve TM can work with inline codes, the tags "<x>" and "</x>" are processed as inline codes. The threshold is set to 60 and the maximum hits is set to 20.

tikal -q "open file" -ms appid.key

Queries the Microsoft Translator engine for the English text "open file" in French. The file appid.key contains your Microsoft credentials.

tikal -q "open file" -opentran -sl en -tl zu

Queries the Open-Tran translation repository for the English text "open file" in Zulu.

tikal -q "open file" -tt localhost:8080 -sl en -tl af

Queries a local Translate Toolkit TM server at http://localhost:8080. The source is English and the requested translation is Afrikaans.

tikal -q "Data type" -tda myTDAInfo.cfg -sl en-us -tl fr-fr

Queries the TDA translation repository to get the French translation of the US English text "Data type". The file myTDAInfo.cfg holds the options and credentials to access the repository.
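The description of this command states that several resources can be queried at once and that -opt then applies to all TM engines. A minimal sketch of such a call follows, assuming that several resource options may simply be given on the same command line. The function name query_both, the directory mytm and the server address are illustrative, and the TIKAL variable is a convention of this sketch (set it to echo to print the command instead of running it).

```shell
# query_both TEXT
# Hypothetical wrapper around "tikal -q": queries a Pensieve TM and a
# Translate Toolkit TM server for the same English text, asking both for
# French matches with a threshold of 60 and at most 20 hits.
# Set TIKAL=echo to preview the command line without running Tikal.
query_both() {
  "${TIKAL:-tikal}" -q "$1" -sl en -tl fr \
    -pen mytm -tt localhost:8080 -opt 60:20
}
```

The results are listed per resource, so the Pensieve hits and the Translate Toolkit hits appear as separate groups rather than as one list sorted by score.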

Add Translation to a Resource

This command adds a translation (source and target text) to a given translation resource.

For now this command is implemented only for the Microsoft Translator resource.

The syntax of this command is:

-a "source text" "target text" [options] -ms configFile

Where the options are:

N The rating to associate with the translation. The value must be between 1 and 10 (inclusive). By default it is set to 6. MT results generally have a rating of 5.
-sl srcLang The code of the source language (language of the source text). See more details...
-tl trgLang The code of the target language (language of the target text). See more details...


Note: Because the provided text cannot be associated with a given file format, you should use HTML-like tags to represent codes. For example, in "Open the <x>window</x><x/>." the tags "<x>", "</x>" and "<x/>" are interpreted as opening, closing and placeholder inline codes, and the text is processed as such.

Examples:

tikal -a "Text to add" "Texte à ajouter" -sl en -tl fr -ms myConfig.cfg

Adds the pair "Text to add" + "Texte à ajouter" to the Microsoft Translator's repository. The source language is English and the target language is French. The file myConfig.cfg contains the parameters to access the engine.
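The rating range can be checked in a script before the entry is submitted. The sketch below is only an illustration: add_translation is a hypothetical wrapper, the language pair and myConfig.cfg are taken from the example above, and placing the rating directly after the target text is an assumption based on the syntax line, where the options sit between the target text and -ms. Setting the sketch's TIKAL variable to echo prints the command instead of running it.

```shell
# add_translation SOURCE TARGET [RATING]
# Hypothetical wrapper around "tikal -a": validates the rating (1..10,
# default 6) before calling Tikal.
# Assumption: the rating is passed as a bare number after the target text.
# Set TIKAL=echo to preview the command line without running Tikal.
add_translation() {
  src=$1
  trg=$2
  rating=${3:-6}
  # Reject empty or non-numeric ratings, then check the range.
  case "$rating" in
    ''|*[!0-9]*) echo "rating must be a number from 1 to 10" >&2; return 1 ;;
  esac
  if [ "$rating" -lt 1 ] || [ "$rating" -gt 10 ]; then
    echo "rating must be a number from 1 to 10" >&2
    return 1
  fi
  "${TIKAL:-tikal}" -a "$src" "$trg" "$rating" -sl en -tl fr -ms myConfig.cfg
}
```

For instance, add_translation "Text to add" "Texte à ajouter" 8 would submit the pair with a rating of 8, while a rating of 11 is rejected before Tikal is called.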