Tikal User Guide
If you are using an Okapi Tool after the M9 release, you should be using the wiki online help:
http://www.opentag.com/okapi/wiki/index.php?title=Tikal
Tikal is a command-line tool that provides the following basic localization-related utilities:
Note: The command -e (Edit Filter Configurations) requires
access to UI editors that are available only if you have one of the okapi-apps platform-specific distributions. It is not
available with the okapi-lib cross-platform distribution.
This command extracts the translatable content of one or more given files into an XLIFF document. You can then use any XLIFF-aware translation tool to translate the document. Open-source tools that are XLIFF-capable include (among others) OmegaT, Virtaal, Qt Linguist, and Lokalize. When the translation is done, you can use the Merge command to create a new translated file in its original format.
The XLIFF documents created are placed in the same directories as the
original files, and have the same name with an additional .xlf
extension.
By default, some extensions are mapped to a specific filter configuration
(for example: .docx to okf_openxml, .odt
to okf_openoffice, .po to okf_po, etc.). You can also define your own configuration and specify it
using the -fc option. To get a list
of all available filter configurations, use the List
Configurations command. For more details on the available filters and their
configurations, see each filter's documentation.
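The lookup described above can be pictured with a short Python sketch. It is illustrative only and not Tikal's actual code: the table holds just the three mappings named in this section, and the helper name pick_config and the case-insensitive extension match are assumptions made for the example.

```python
import os

# Only the three default mappings named in this guide; Tikal knows more.
DEFAULT_CONFIGS = {
    ".docx": "okf_openxml",
    ".odt": "okf_openoffice",
    ".po": "okf_po",
}


def pick_config(filename, fc=None):
    """Return the filter configuration id for a file (hypothetical helper).

    An explicit -fc value wins; otherwise the extension is looked up.
    Returns None when the extension is not in this partial table.
    """
    if fc:
        return fc
    # Assumption: extensions are compared case-insensitively.
    extension = os.path.splitext(filename)[1].lower()
    return DEFAULT_CONFIGS.get(extension)


print(pick_config("manual.docx"))                        # okf_openxml
print(pick_config("subtitles.srt", fc="okf_regex-srt"))  # okf_regex-srt
```

For the authoritative list of configurations, rely on the List Configurations command rather than on a hard-coded table like this one.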
You can use the -seg option to specify that the extracted text
should be segmented. Use -seg without a file name to apply the default segmentation
rules, or use "-seg myRules.srx" to specify your own rules. The rules
file must be in SRX format. The segments are marked up according to the
XLIFF 1.2 specification.
The syntax of this command is:
-x [options] inputFile [inputFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the extraction. |
-ie encoding |
The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself. |
-sl srcLang |
The code of the source language of the input files. |
-tl trgLang |
The code of the target language for the output (also used in the input if the input documents are multilingual). |
-seg [srxFile] |
The segmentation rules to use. To specify the default rules
that come with the installation, use -seg without a file name. |
-pen tmDirectory, -tt hostname[:port], -gs configFile, -mm key, -google, -apertium [configFile], -ms configFile, -tda configFile |
A translation resource to use to
translate the document: -pen for a
Pensieve TM, -tt for a
Translate Toolkit TM, -gs for a
GlobalSight TM, -mm for the MyMemory
repository, -google for Google MT, -apertium for an Apertium server,
-ms for Microsoft MT, and
-tda for the TDA translation repository. The leveraging occurs after
segmentation, if you have specified segmentation rules. Note that some Internet-based resources may be slow and result in lengthy processing times. Also be aware that some translation resources may not always handle inline codes well. |
-opt threshold |
TM query option: The threshold is a number between 0 and 100. If this option is not set the default is 95. Note that this option may be limited for some search engines because of the way they are configured. |
-maketmx [tmxFile] |
Generates a TMX document with all the
entries leveraged. You can specify the name of the document; if you do
not, it is named pretrans.tmx. |
-nocopy |
Ensures that the generated XLIFF files do not have a copy of the source text in the target entries if the original target does not exist. |
For example:
tikal -x *.docx *.html
Extracts all .docx and .html files in the current
directory into corresponding .docx.xlf and .html.xlf
XLIFF documents. The source language here is the default, which is the current
language of the system. The target language by default is fr. No
segmentation is done.
tikal -x -sl EN -tl DE -fc okf_regex-srt -ie iso-8859-1 findingNemo.srt
Extracts the sub-title file findingNemo.srt into a
findingNemo.srt.xlf XLIFF document. The encoding iso-8859-1
is used to process the input file. The filter used is the Regex filter with the
predefined configuration for SRT documents. The source language is English (EN)
and the target language is German (DE). No segmentation is done.
tikal -x *.docx -seg -tl BR
Extracts all .docx files in the current
directory into corresponding .docx.xlf
XLIFF documents. The source language here is the default, which is the current
language of the system. The target language is Breton. The extracted text units
will be segmented according to the rules defined in the default SRX segmentation
rules file (located in the config sub-directory in your Okapi main
directory).
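When Extract is driven from a script, the command line can be assembled programmatically. The following Python sketch is one way to do it, using only the options documented above; the helper name extract_command and its parameters are hypothetical, and the function only builds the argument list without running Tikal.

```python
def extract_command(files, src=None, trg=None, config=None, encoding=None, seg=None):
    """Build a 'tikal -x' argument list (hypothetical helper).

    seg=None: no segmentation; seg=True: bare -seg (default rules);
    seg="rules.srx": -seg followed by that SRX file.
    """
    # Input files go first, as in the "tikal -x *.docx -seg -tl BR" example.
    args = ["tikal", "-x", *files]
    if seg is True:
        # Because [srxFile] is optional, a bare -seg is placed where the
        # next token is another option or the end of the command.
        args.append("-seg")
    elif seg:
        args += ["-seg", seg]
    for flag, value in (("-sl", src), ("-tl", trg), ("-fc", config), ("-ie", encoding)):
        if value:
            args += [flag, value]
    return args


print(extract_command(["report.docx"], trg="BR", seg=True))
# ['tikal', '-x', 'report.docx', '-seg', '-tl', 'BR']
```

The resulting list can be passed to subprocess.run. Note that wildcards such as *.docx are expanded by the shell, not by subprocess, so a script should pass concrete file names.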
This command merges back into their original format one or more XLIFF documents that were created using the Extract command. You must have the original files in the same directories as their corresponding XLIFF documents.
The XLIFF document names must be the name of the original files with an
additional .xlf extension. The new documents are created in the
directories where the XLIFF documents are, with a .out extension
pre-pended to the original extension. For example, if your original file is
myFile.html, the XLIFF document should be myFile.html.xlf,
and the merged file will be myFile.out.html.
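The naming rules above are easy to get wrong in batch scripts, so here is a minimal Python sketch of them. The function names are hypothetical, and the behavior for files without any extension is an assumption, since the guide does not cover that case.

```python
from pathlib import Path


def xliff_name(original):
    """Name of the XLIFF document for an original file: name plus '.xlf'."""
    return original + ".xlf"


def merged_name(xliff):
    """Name of the merged file: '.out' inserted before the original extension."""
    if not xliff.endswith(".xlf"):
        raise ValueError("expected a name ending in .xlf")
    original = Path(xliff[: -len(".xlf")])
    # with_suffix replaces only the last extension: '.html' -> '.out.html'.
    # Assumption: a file with no extension simply gets '.out' appended.
    return str(original.with_suffix(".out" + original.suffix))


print(xliff_name("myFile.html"))       # myFile.html.xlf
print(merged_name("myFile.html.xlf"))  # myFile.out.html
```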
The syntax of this command is:
-m [options] xliffFile [xliffFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the re-extraction of the original file. |
-ie encoding |
The encoding name of the original files. This is used only if the filter cannot detect the encoding from the input file itself. |
-oe encoding |
The encoding name of the file to generate. The same encoding as the input file will be used if this option is not specified. |
-sl srcLang |
The code of the source language. |
-tl trgLang |
The code of the target language. |
For example:
tikal -m *.xlf -sl EN -tl DE
Merges all XLIFF documents in the directory. The original files should be in the same directory as well. The source language is English and the target language is German.
This command creates a pre-translated version of the input files. It is equivalent to running an Extract command (with pre-translation) immediately followed by a Merge command.
By default, some extensions are mapped to a specific filter configuration
(for example: .docx to okf_openxml, .odt
to okf_openoffice, .po to okf_po, etc.). You can also define your own configuration and specify it
using the -fc option. To get a list
of all available filter configurations, use the List
Configurations command. For more details on the available filters and their
configurations, see each filter's documentation.
You can use the -seg option to specify that the extracted text
should be segmented. Use -seg without a file name to apply the default segmentation
rules, or use "-seg myRules.srx" to specify your own rules. The rules
file must be in SRX format.
The syntax of this command is:
-t [options] inputFile [inputFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the extraction. |
-ie encoding |
The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself. |
-oe encoding |
The encoding name of the output file to generate. The same encoding as the input file will be used if this option is not specified. |
-sl srcLang |
The code of the source language of the input files. |
-tl trgLang |
The code of the target language for the output (also used in the input if the input documents are multilingual). |
-seg [srxFile] |
The segmentation rules to use. To specify the default rules
that come with the installation, use -seg without a file name. |
-pen tmDirectory, -tt hostname[:port], -gs configFile, -mm key, -google, -apertium [configFile], -ms configFile, -tda configFile |
A translation resource to use to
translate the document: -pen for a
Pensieve TM, -tt for a
Translate Toolkit TM, -gs for a
GlobalSight TM, -mm for the MyMemory
repository, -google for Google MT,
-apertium for an Apertium server,
-ms for Microsoft MT, and
-tda for the TDA translation repository. The leveraging occurs after
segmentation, if you have specified segmentation rules. Note that some Internet-based resources may be slow and result in lengthy processing times. Also be aware that some translation resources may not always handle inline codes well. |
-opt threshold |
TM query option: The threshold is a number between 0 and 100. If this option is not set the default is 95. Note that this option may be limited for some search engines because of the way they are configured. |
-maketmx [tmxFile] |
Generates a TMX document with all the
entries leveraged. You can specify the name of the document; if you do
not, it is named pretrans.tmx. |
For example:
tikal -t *.html -sl en -tl eo -apertium
Translates all .html files in the current directory from English to
Esperanto, using the default Apertium MT demonstration server. No segmentation
is used.
This command queries one or more translation resources for a given text. By default the query is sent to the Google MT engine, but you can also query Pensieve TMs, GlobalSight TMs, Translate Toolkit TMs, the Open-Tran repository, the MyMemory repository, as well as any Apertium MT server. See the section Translation Resources Details for more information.
You can query all resources at once. When querying several resources, the results are shown per resource, not sorted by best score as a whole.
The syntax of this command is:
-q "text" [options]
Where the options are:
-sl srcLang |
The code of the source language (language of the text queried) |
-tl trgLang |
The code of the target language (language of the requested translation) |
-pen directory |
Queries a Pensieve TM stored in a given directory. |
-opentran |
Queries the OpenTran translation repository. This requires Internet access. |
-gs configFile |
Queries a GlobalSight TM server. This requires Internet access. |
-tt hostname[:port] |
Queries the specified Translate Toolkit TM server. This assumes you have access to the server (local or remote). |
-mm key |
Queries the MyMemory translation repository
with a given access key (use mmDemo123 for demo). This
requires Internet access. |
-google |
Queries the Google MT server. If no other type of resource is specified, this is used by default. This requires Internet access. |
-apertium [configFile] |
Queries the specified Apertium MT server (local or remote). A default remote server is provided. |
-ms configFile |
Queries the Microsoft MT server. This requires Internet access. |
-tda configFile |
Queries the TDA translation repository. This requires Internet access. |
-opt threshold[:maxhits] |
TM query options: The threshold is a number between 0 and 100. The maximum number of hits is a number above 0. If this option is not set each TM engine uses its own defaults. If this option is set, all TM engines are set to use the specified options. Note that parameters of some engines may be limited by their configuration. |
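To make the threshold[:maxhits] syntax concrete, the following Python sketch parses and validates such a value according to the ranges stated above. The helper name parse_opt is hypothetical, and returning None for an omitted maxhits is a choice made for the example, not a description of Tikal's internals.

```python
def parse_opt(value):
    """Parse a 'threshold[:maxhits]' string into (threshold, maxhits).

    maxhits is None when it is omitted (hypothetical convention).
    """
    threshold_text, separator, maxhits_text = value.partition(":")
    threshold = int(threshold_text)
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    maxhits = None
    if separator:
        maxhits = int(maxhits_text)
        if maxhits <= 0:
            raise ValueError("maxhits must be above 0")
    return threshold, maxhits


print(parse_opt("60:20"))  # (60, 20)
print(parse_opt("95"))     # (95, None)
```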
Because the text of the query cannot be associated with a given file format,
there is no support for format-specific inline codes. However, when querying
a resource that is inline-code aware, you can use HTML-like tags to stand in for
codes. For example, in "Open the <x>window</x><x/>." the tags "<x>",
"</x>" and "<x/>" are interpreted as opening,
closing and placeholder inline codes, and the query is processed as such. When
querying resources that are not inline-code aware, the tags are treated as plain
text.
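The three tag forms can be told apart mechanically. The Python sketch below shows one way to classify them; it only illustrates the convention described above and is not how Tikal itself parses the query. The regular expression and the function name classify_tags are assumptions.

```python
import re

# Matches <name>, </name>, and <name/>; the tag name itself is arbitrary.
TAG = re.compile(r"<(/?)(\w+)(/?)>")


def classify_tags(text):
    """Return (tag, kind) pairs for the HTML-like tags found in a query."""
    found = []
    for match in TAG.finditer(text):
        closing, _name, empty = match.groups()
        if closing and empty:
            continue  # "</x/>" is not one of the three documented forms
        if closing:
            kind = "closing"
        elif empty:
            kind = "placeholder"
        else:
            kind = "opening"
        found.append((match.group(0), kind))
    return found


print(classify_tags("Open the <x>window</x><x/>."))
# [('<x>', 'opening'), ('</x>', 'closing'), ('<x/>', 'placeholder')]
```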
For example:
tikal -q "open file" -sl en
Queries the default translation resource (Google MT system) for the text
"open file" in English. The target language by default is French. Note: You
could omit the -sl option if you are running from an English system.
tikal -q "open <x>file</x>" -sl en -pen mytm -opt 60:20
Queries the Pensieve TM located in mytm for the text
"open <x>file</x>" in English. The target language by default is French.
Because Pensieve TM can work with inline codes, the tags "<x>" and
"</x>" are processed as inline codes. The threshold
is set to 60 and the maximum hits is set to 20.
tikal -q "open file" -opentran -sl en -tl zu
Queries the OpenTran translation repository for Zulu translations of the English text "open file".
tikal -q "open file" -tt localhost -sl en -tl af
Queries a local Translate Toolkit TM server located at
http://localhost:8080 (note that 8080 is omitted in the command line because
it is the default port). The source is
English and the requested translation is Afrikaans.
This command lists all the filter configurations available for Tikal. The
configurations listed are the ones you can use as filter configurations for the
input files (-fc option). A configuration indicates how to
extract a document.
The syntax of this command is:
-lfc | --listconf
For example:
tikal --listconf
Lists all the configurations currently available.
This command edits or views filter configurations.
Note: This command requires access to UI editors that are available only if
you have one of the okapi-apps platform-specific distributions. If you run
this command from the okapi-lib cross-platform distribution, you will get
an error. To edit filter configurations in the okapi-lib distribution,
edit the .fprm files directly. Make sure to always save your modifications
in UTF-8.
The syntax of this command is:
-e [[-fc] configId]
For example:
tikal -e okf_regex@myConfig
Edits the filter configuration okf_regex@myConfig. This is a
user configuration for the RegEx Filter.
tikal -e
Opens the Filter Configurations dialog box, where all the available configurations are listed and can be viewed or edited, and from where you can create new configurations.
Creates a PO file for the given input file. If the input file is multilingual (like a TMX or a TS file), the source and target will be in the PO file.
The syntax of this command is:
-2po [options] inputFile [inputFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the input files. |
-ie encoding |
The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself. |
-sl srcLang |
The code of the source language of the input files. |
-tl trgLang |
The code of the target language. |
-generic |
Uses generic notation for inline codes in the generated
PO file, for example <1/> instead of <br/>. If this
option is not specified, the inline codes are output in their original
form. |
-trgsource or -trgempty |
Forces the content of the output target field to be either a copy of the source (-trgsource) or empty (-trgempty). If neither option is set, the content of the target field is the target text or empty. |
-all |
Allows entries that have no text to be converted. If this option is not set, entries that are empty or contain only codes or whitespace are not included in the output file. If this option is set, all entries are included in the output. |
For example:
tikal -2po data.tmx -sl EN -tl ZU
Creates a PO file from the TMX document data.tmx. The source
language will be English and the target Zulu.
Creates a TMX document for the given input file. If the input file is multilingual (like a PO or a TS file), the source and target will be in the TMX document.
The syntax of this command is:
-2tmx [options] inputFile [inputFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the input files. |
-ie encoding |
The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself. |
-sl srcLang |
The code of the source language of the input files. |
-tl trgLang |
The code of the target language. |
-trgsource or -trgempty |
Forces the content of the output target field to be either a copy of the source (-trgsource) or empty (-trgempty). If neither option is set, the content of the target field is the target text or empty. |
-all |
Allows entries that have no text to be converted. If this option is not set (the default), entries that are empty or contain only codes or whitespace are not included in the output file. If this option is set, all entries are included in the output. |
For example:
tikal -2tmx data.po -sl EN -tl ZU
Creates a TMX document from the PO file data.po. The source
language will be English and the target Zulu.
tikal -2tmx data.tmx -sl EN -tl DE -trgempty
Creates a TMX document from another TMX document named data.tmx. The source
language will be English and the target German. The content of the German <tuv>
elements will be empty.
Creates a table-like output for the given input file. If the input file is multilingual (like a PO or a TS file), the source and target will be in the output table.
The syntax of this command is:
-2tbl [options] inputFile [inputFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the input files. |
-ie encoding |
The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself. |
-sl srcLang |
The code of the source language of the input files. |
-tl trgLang |
The code of the target language. |
-trgsource or -trgempty |
Forces the content of the output target field to be either a copy of the source (-trgsource) or empty (-trgempty). If neither option is set, the content of the target field is the target text or empty. |
-csv or -tab |
Output format: -csv for comma-separated values, or -tab
for tab-delimited values. |
-xliff, -xliffgx, -tmx, or -generic |
Inline codes format: -xliff for XLIFF, -xliffgx for
XLIFF with g/x notation, -tmx for TMX, or -generic
for generic placeholders. |
-all |
Allows entries that have no text to be converted. If this option is not set, entries that are empty or contain only codes or whitespace are not included in the output file. If this option is set, all entries are included in the output. |
For example:
tikal -2tbl data.tmx -sl EN -tl ZU
Creates a tab-delimited file from the TMX document data.tmx.
Inline codes are output in their original form. The source
language is English and the target Zulu. Any tab character within the text is
escaped with a backslash prefix.
tikal -2tbl data.po -sl EN -tl ES -csv -xliffgx -trgsource
Creates a comma-separated values output file from the PO file data.po.
The inline codes are represented as XLIFF elements using the <g> and <x>
notation. The text is between double quotes, and any double-quote and backslash
characters within the text are escaped with a backslash prefix. The source
language is English and the target Spanish. The content of the target column is
a copy of the source.
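The CSV quoting rule just described differs from the more common convention of doubling the quote character, which matters when you read the file back. The Python sketch below reproduces the rule as stated in this guide; the function names are hypothetical, and details such as the line terminator are assumptions.

```python
def csv_field(text):
    """Quote one field: backslash-escape backslashes and double quotes."""
    # Backslashes are escaped first so the ones added for quotes stay single.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def csv_row(source, target):
    """Join a source/target pair into one comma-separated line."""
    return csv_field(source) + "," + csv_field(target)


print(csv_row('Click "OK"', 'Cliquez sur "OK"'))
```

When loading such a file with Python's csv module, a reader configured with escapechar="\\" and doublequote=False is likely to be a closer match than the default dialect.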
Imports the specified input documents into a Pensieve TM database. If the specified TM does not exist, it is created. If it does exist, the input files are added to it.
The syntax of this command is:
-imp myTMdirectory [options] inputFile [inputFile2...]
Where the options are:
-fc configId |
The identifier of the filter configuration to use for the input files. |
-ie encoding |
The encoding name of the input files. This is used only if the filter cannot detect the encoding from the input file itself. |
-sl srcLang |
The code of the source language of the input files. |
-tl trgLang |
The code of the target language. |
-trgsource or -trgempty |
Forces the content of the output target field to be either a copy of the source (-trgsource) or empty (-trgempty). If neither option is set, the content of the target field is the target text or empty. |
-all |
Allows entries that have no text to be imported. If this option is not set (the default), entries that are empty or contain only codes or whitespace are not included in the output. If this option is set, all entries are included in the output. |
For example:
tikal -imp myTMdir data.po -sl JA -tl FR
Imports the PO file data.po into the TM database located in
myTMdir. If the directory does not exist, it will be created. If a
TM exists, the input file(s) will be added to it. The
source language of the PO file is Japanese and the target French.
Creates a TMX output file for the given Pensieve TMs.
The syntax of this command is:
-exp tmDirectory [tmDirectory2...] [options]
Where the options are:
-sl srcLang |
The code of the source language. |
-tl trgLang |
The code of the target language. |
-trgsource or -trgempty |
Forces the content of the output target field to be either a copy of the source (-trgsource) or empty (-trgempty). If neither option is set, the content of the target field is the target text or empty. |
-all |
Allows entries that have no text to be exported. If this option is not set (the default), entries that are empty or contain only codes or whitespace are not included in the output file. If this option is set, all entries are included in the output. |
For example:
tikal -exp myProjectTM -sl en -tl IT
Creates a TMX document from the Pensieve TM stored in myProjectTM. The source language is English and the target Italian.
Note that the command -exp is a shortcut for the -2tmx
command (Convert to TMX Format). You can export Pensieve
TM entries in table, TMX or PO format using the filter configuration
okf_pensieve. For instance, the example above can also be executed using
the following command:
tikal -2tmx myProjectTM -fc okf_pensieve -sl en -tl IT
Tikal provides access to several translation resources: some are machine translation (MT) systems, some are translation memory (TM) systems, and others are searchable translation repositories of some other kind. The available resources are described below.
Note that some of these resources are Web-based and require an internet connection, others can be installed and used locally.
Pensieve is the Okapi framework's own TM engine. It is still under development, but
can be used. This resource is indicated by the option -pen in
Tikal and takes one argument: the directory of the TM for a local TM, or the URL
of the server for a remote TM. If no argument is provided, the host
http://localhost:8080 is used by default.
tikal -q "text to translate" -pen myTmDir
The Translate Toolkit TM is an engine that comes with the Translate Toolkit,
a nicely designed and well supported set of open-source localization tools. You
can run the tool called tmserver on your own machine, or use it
through the Web. You can find more information about the Translate Toolkit here:
http://translate.sourceforge.net/wiki/toolkit/index and the help for
tmserver is here:
http://translate.sourceforge.net/wiki/toolkit/tmserver.
Note that the Translate Toolkit TM can take PO files as input. You can create such a PO file from TMX or other formats using the Convert to PO Format command.
This resource is indicated by the option -tt in Tikal and takes
one argument: the host of the server, and optionally the port to use. Use
localhost for a server running locally. If the port is omitted it is set
to 8080 by default.
tikal -q "text to translate" -tt localhost
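The host[:port] argument and its default port can be summarized in a few lines of Python. This is only a sketch of the rule stated above: the function name tm_server_url is hypothetical, and the http scheme is taken from the http://localhost:8080 address shown in the Query examples.

```python
def tm_server_url(argument, default_port=8080):
    """Expand a '-tt' style 'host[:port]' argument into a base URL."""
    host, separator, port_text = argument.partition(":")
    # The port falls back to 8080 when it is omitted, as the guide states.
    port = int(port_text) if separator else default_port
    return f"http://{host}:{port}"


print(tm_server_url("localhost"))       # http://localhost:8080
print(tm_server_url("tm.example:9000")) # http://tm.example:9000
```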
The GlobalSight TM engine is part of the open-source GlobalSight System. You need to have access to a server with the system installed in order to use this resource. You can find more information about the GlobalSight TMS here: http://www.globalsight.com/
This resource is indicated by the option -gs in Tikal and takes one
argument: a configuration file that contains the information about the server to
connect to, username and password, as well as the TM profile to use. The
configuration file must look like this:
#v1
username=myusername
password=mypassword
serverURL=http://myhost:8080/globalsight/services/AmbassadorWebService?wsdl
tmProfile=myprofile
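Connector configuration files like this one can be generated or checked from a script. The Python sketch below assumes the layout is a #v1 header followed by one key=value pair per line; the function names are hypothetical, and the parser is a simplification that ignores any escaping rules the real format may have.

```python
def render_config(params):
    """Render a dict as a '#v1' configuration file (assumed line layout)."""
    lines = ["#v1"]
    for key, value in params.items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def parse_config(text):
    """Read key=value pairs back, skipping blank lines and '#' lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        params[key] = value
    return params


config = {
    "username": "myusername",
    "password": "mypassword",
    "serverURL": "http://myhost:8080/globalsight/services/AmbassadorWebService?wsdl",
    "tmProfile": "myprofile",
}
text = render_config(config)
print(text)
```

Because the file holds a password in clear text, it is worth restricting its file permissions when it is written to disk.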
The Open-Tran project is an open-source repository of translations of open-source software. It provides access to its entries through Web services. You can find more information about Open-Tran here: http://open-tran.eu/
This resource is indicated by the option -opentran in
Tikal (without argument). Note that the results returned by the Open-Tran server
have no meaningful scores from a TM viewpoint, so the Okapi connector
recalculates, re-sorts and re-filters the results. Note also that this resource is
available only for the Query command, because it would be too slow
to use for the other commands.
tikal -q "text to translate" -opentran
The MyMemory project is a central repository of public and private TMs. It offers access through a Web service interface, and can provide MT fall-back translation when no good matches are found. You can find more information about MyMemory here: http://mymemory.translated.net/
This resource is indicated by the option -mm and takes one
argument: an access key. You can use the default access key mmDemo123
for testing the resource. For real work you may want to open an account at
MyMemory.
tikal -q "text to translate" -mm mmDemo123
The Google MT engine is well-known and widely used. Okapi provides a connector for the Ajax API. You can find more information about Google MT here: http://code.google.com/apis/ajaxlanguage/.
This resource is indicated by the option -google in Tikal
(without argument). It is the default for the Query command.
tikal -q "text to translate" -google
Apertium is an open-source Rule-based MT project. It provides translation for many language pairs, including for less-common languages such as Catalan, Galician, Welsh, Esperanto, Occitan, etc. You can find more information about Apertium here: http://wiki.apertium.org/. The connector uses the JSONP REST API described here: http://wiki.apertium.org/wiki/Apertium_web_service
This resource is indicated by the option -apertium and takes one
optional argument: a configuration file that contains the information about the
server to connect to. The configuration file looks something like this:
#v1
server=http://api.apertium.org/json/translate
apiKey=myApiKey
Note that the apiKey parameter is optional. However, using an API key is highly recommended. See http://api.apertium.org/register.jsp for details and to register one.
If the configuration file is omitted, the default main public apertium.org server is used, without API key.
tikal -q "text to translate" -apertium myApertium.cfg
The Microsoft MT engine is freely available from Microsoft (http://www.microsofttranslator.com/). Okapi provides a connector for the SOAP API. You can get more information about this API and its terms here: http://sdk.microsofttranslator.com/. To use this connector you need an AppID from Microsoft. You can get one at http://www.bing.com/developers/appids.aspx.
This resource is indicated by the option -ms in Tikal and takes one
argument: a configuration file that contains the information to connect to and use
the Microsoft MT service. The
configuration file must look like this:
#v1
appId=mypersonalappid
To use the Microsoft MT engine, call Tikal like this (where myMS.cfg
is your configuration file):
tikal -q "text to translate" -ms myMS.cfg
The TAUS Data Association (TDA) offers a public Search facility on a large corpus of translations. Okapi provides a connector for the REST API provided by TDA. The TDA Web Search is accessible from here: http://www.tausdata.org/index.php/taus-search.
This resource is indicated by the option -tda in Tikal and takes
one argument: a configuration file that contains the information to access the
TDA search service. The configuration file must look like this:
#v1
server=http://www.tausdata.org/api
appKey=myAppKey
username=myTDAUsername
password=myTDAPassword
industry.i=0
contentType.i=0
For example, to access TDA Search, call Tikal like this (where
myTDA.cfg is your configuration file):
tikal -q "terms to search" -tda myTDA.cfg -sl en-us -tl fr-fr