JSON Filter

2021-01-04T23:55:30Z

Ctingley: Update metadata param description

{{Filters Header}}
==Overview==

The JSON Filter is an Okapi component that implements the IFilter interface for JSON (JavaScript Object Notation).

The implementation is based on the JSON specifications: http://www.json.org/

The following is an example of a very simple JSON file. The translatable text is highlighted:

{"menu": {
"value": "File",
"popup": {
"menuitem": [
{"value": "New"},
{"value": "Open"},
{"value": "Close"}
]
}
}}

==Processing Details==

===Input Encoding===

JSON files are normally in one of the Unicode encodings, but the filter supports any encoding. It decides which encoding to use for the input file using the following logic:

* If the file has a Unicode Byte-Order-Mark:
** Then, the corresponding encoding (e.g. UTF-8, UTF-16, etc.) is used.
* Else, if a header entry with a <code>charset</code> declaration exists in the first 1000 characters of the file:
** If the value of the <code>charset</code> is "<code>charset</code>" (case insensitive):
*** Then the file is likely to be a template with no encoding declared, so the current encoding (auto-detected or default) is used.
*** Else, the declared encoding is used. Note that if the encoding has been detected from a Byte-Order-Mark and the encoding declared in the header entry does not match, a warning is generated and the encoding of the Byte-Order-Mark is used.
* Otherwise, the input encoding used is the default encoding that was specified when setting the filter options.
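
The decision logic above can be sketched as follows. This is a minimal Python illustration, not the filter's actual Java code: the function name <code>choose_input_encoding</code>, the <code>charset=value</code> declaration syntax, and the choice to decode the start of the file with the default encoding before searching it are all assumptions made for the example.

```python
import codecs
import re

# Byte-Order-Marks, with the longer UTF-32 marks tested before UTF-16
# (the UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumed declaration syntax: charset=NAME (not confirmed by the page).
_CHARSET = re.compile(r"charset\s*=\s*([\w.:-]+)", re.IGNORECASE)


def choose_input_encoding(data, default_encoding):
    """Return the encoding name to use for the raw bytes of an input file."""
    # 1. A Byte-Order-Mark wins over everything else.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Look for a charset declaration in the first 1000 characters.
    head = data[:4000].decode(default_encoding, errors="replace")[:1000]
    match = _CHARSET.search(head)
    if match and match.group(1).lower() != "charset":
        return match.group(1)
    # 3. No declaration, or a template placeholder: use the default.
    return default_encoding
```

For instance, <code>choose_input_encoding(b'# charset=CHARSET\n{}', "utf-8")</code> falls through to the default because the declared value is the placeholder word itself.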

===Output Encoding===

If the output encoding is UTF-8:

* If the input encoding was also UTF-8, a Byte-Order-Mark is used for the output document only if one was detected in the input document.
* If the input encoding was not UTF-8, no Byte-Order-Mark is used in the output document.
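
The two rules above reduce to a single condition. The following Python sketch only covers the case where the output encoding is UTF-8; the function name and the way encoding names are normalized are assumptions for illustration.

```python
def utf8_output_has_bom(input_encoding, input_had_bom):
    """Decide whether a UTF-8 output document starts with a Byte-Order-Mark."""
    # Treat "UTF-8", "utf8" and "utf_8" as the same encoding name.
    normalized = input_encoding.lower().replace("_", "-")
    input_is_utf8 = normalized in ("utf-8", "utf8")
    # A BOM is written only when the input was UTF-8 and had one itself.
    return input_is_utf8 and bool(input_had_bom)
```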

===Line-Breaks===

The type of line-breaks of the output is the same as the one of the original input.

===Comments===

Though not technically legal in JSON, the following comment types are supported:
* <code>// comment</code>
* <code># comment</code>
* <code>/* comment */</code>


==Parameters==

===Options Tab===

====Stand-alone strings====

<cite>Extract strings without associated key</cite> — Set this option to extract strings that are not directly associated with a key.

====Strings with keys====

<cite>Extract all key/strings pairs</cite> — Set this option to extract all strings that have a key associated. If a regular expression for exceptions is defined, the strings that have a key matching the expression are not extracted.

<cite>Do not extract key/string pairs</cite> — Set the option to not extract any string that has an associated key. If a regular expression for exceptions is defined, the strings that have a key matching the expression are extracted.

<cite>Excepted when the key matches the following regular expression</cite> — Enter a regular expression that corresponds to the keys that should behave in the inverse way to the default behavior you have selected for the key/string pairs.
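
The interaction between the default behavior and the exception expression can be sketched as below. This is a hypothetical Python illustration: the function name is invented, and whether the filter requires the expression to match the whole key or only part of it is not stated on this page, so the full-match used here is an assumption.

```python
import re


def should_extract(key, extract_all_pairs, exception_pattern=None):
    """Return True if the string associated with `key` should be extracted.

    extract_all_pairs -- True for "Extract all key/strings pairs",
                         False for "Do not extract key/string pairs".
    exception_pattern -- optional regular expression; keys matching it get
                         the inverse of the default behavior.
    """
    is_exception = (
        bool(exception_pattern)
        and re.fullmatch(exception_pattern, key) is not None
    )
    # An exception flips the default decision.
    return extract_all_pairs != is_exception
```

With <code>extract_all_pairs=True</code> and the expression <code>id|code</code>, the key <code>id</code> is skipped while <code>value</code> is still extracted.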

<cite>Use the key as the resname</cite> — Set this option to use the value of the key as the value of the name of the extracted item (<code>resname</code> in XLIFF).

<cite>Use the full key path</cite> — Set this option to use the full key path in the <code>resname</code>. For example: <code>/menu/value/popup/menuitem/value</code>. The <cite>Use the key as the resname</cite> option must be set for this option to take effect. If enabled, exception regular expressions apply to the full path.

<cite>Include leading "/" on key path</cite> — Set this option to have a leading character '/' in the full key path.
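
To illustrate what a full key path looks like, the sketch below walks a JSON structure and pairs each string with the path of keys leading to it. It is only an approximation written in Python: the function name is hypothetical, and the assumption that array items add no segment of their own to the path is not confirmed by this page.

```python
import json

SAMPLE = """{"menu": {
  "value": "File",
  "popup": {"menuitem": [{"value": "New"}, {"value": "Open"}, {"value": "Close"}]}
}}"""


def key_paths(node, path="", leading_slash=True):
    """Yield (key path, string) pairs for every string found under `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Add a separator unless this is the first segment of a path
            # built without the leading "/".
            child = path + "/" + key if (path or leading_slash) else key
            yield from key_paths(value, child, leading_slash)
    elif isinstance(node, list):
        # Assumption: array items inherit the path of the array's key.
        for item in node:
            yield from key_paths(item, path, leading_slash)
    elif isinstance(node, str):
        yield path, node


for path, text in key_paths(json.loads(SAMPLE)):
    print(path, "=", text)
```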

<cite>Regex matching keys that are notes, values of which to appear as <note> in XLIFF</cite> — Specify regular expression. The values of the matching keys will be transferred to <note> elements in XLIFF.

<cite>Regex matching keys who's values are added as TextUnit Metadata</cite> — Specify regular expression. The values of the matching keys will be written out as <context-group> elements in XLIFF.

===New Extraction Rules >= version M39===
If specified, these will override the corresponding rules above.

<cite>Regex matching keys which are ID's (resname in XLIFF), overrides "use the key as resname"</cite> — Specify regular expression. The value of the matching key will be used as <code>resname</code> in XLIFF.

<cite>Regex matching keys who's values are extracted (overrides "extraction exceptions")</cite> — Specify regular expression. The values of the matching keys will be extracted.

===Content Processing Tab===

<cite>Process text content with this sub-filter</cite> — Specify an Okapi filter ID (e.g. <code>okf_html</code>) to process the content of all translatable text with that filter. Leave this field blank for default behavior.

<cite>Find inline codes by patterns defined below</cite> — Set this option to use the specified regular expressions on the text of the extracted items. Any match will be converted to an inline code.

'''Note:''' This option cannot be used together with the sub-filtering option.

By default the expression is:

((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])
|((\\r\\n)|\\a|\\b|\\f|\\n|\\r|\\t|\\v)
|(\{\d.*?\}))

{{CodeFinder Help}}
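
To get a feel for what the default expression captures, it can be tried outside Okapi. The sketch below uses Python's <code>re</code> module rather than the Java regular expression engine used by the filter, and it assumes that the three lines of the expression shown above are meant to be joined into one pattern; the function name is hypothetical.

```python
import re

# The default expression, joined into a single pattern.
DEFAULT_CODE_FINDER = re.compile(
    r"((%(([-0+#]?)[-0+#]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn])"
    r"|((\\r\\n)|\\a|\\b|\\f|\\n|\\r|\\t|\\v)"
    r"|(\{\d.*?\}))"
)


def find_inline_codes(text):
    """Return the spans of `text` that would be turned into inline codes."""
    return [match.group(0) for match in DEFAULT_CODE_FINDER.finditer(text)]


# A printf-style specifier, a numbered placeholder and an escaped line break.
print(find_inline_codes(r"Copied %d files to {0}\n"))
```

The three alternatives cover printf-style specifiers, backslash escape sequences written literally in the text, and placeholders such as <code>{0}</code>.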

==Limitations==

Comments within a JSON string are parsed as part of the string content, not as comments. A configured subfilter will then process these as true comments (they will become part of the skeleton or whatever the filter is configured to do).
[[Category:Filters]]

FAQ

2018-05-11T20:40:17Z

Ctingley: /* Is there a users group or a support mailing list? */

==Capabilities==

====What formats are supported?====

The framework offers filters for many file formats, including XML, XLIFF, TMX, HTML, DOCX, ODT, Properties, PO, and many more. 
For a more complete list of the supported formats, see the "[[Filters]]" page.

Note that you can also create your own filter configurations to support some formats. You can also create your own filters and use them seamlessly with the Okapi tools.

====How do I extract text for translation?====

See the article "[[How to Extract Text for Translation]]" in the [[Knowledge Base]].

====Does Okapi provide a translation editor?====

Not at this time. The Okapi tools allow you to create translation packages in various formats that can be opened in different translation editors such as OmegaT, MemoQ, Trados Workbench, Swordfish, Wordfast, etc.

For translating XLIFF files see: "[[How to Translate XLIFF Documents]]".

====Does Okapi provide a TM (Translation Memory)?====

Yes. There are currently two TM engines implemented in the framework:

* [[Pensieve TM]] is the main TM engine.
* [[SimpleTM TM]] is a limited and older engine that '''is being progressively phased out'''.

You can also use third-party TM engines through the different [[Connectors|connectors]] that the framework provides. For example: the [[Translate Toolkit TM Connector|Translate Toolkit TM]], [[GlobalSight TM Connector|GlobalSight TM]], the [[OpenTran Translation Repository Connector|OpenTran Translation Repository]], [[MyMemory TM Connector|MyMemory]], etc. For a complete list and more details see the "[[Connectors]]" page.

====Does Okapi provide a MT (Machine Translation) system?====

Not at this time. But you can use different third-party MT systems through one of the connectors distributed with the framework. For example, you can work with [[Google MT v2 Connector|Google MT]], [[Apertium MT Connector|Apertium MT]], [[Microsoft Translator Connector|Microsoft Translator]], etc. For a complete list, see the [[Connectors|Connectors page]].

====Why are there several distributions? Isn't Java cross-platform?====

Yes, Java is cross-platform, and most of the Okapi code runs anywhere Java runs.
However, for better internationalization support and more seamless integration with each platform, we chose Eclipse SWT (http://www.eclipse.org/swt) as the foundation for the UI of our applications. That library requires a different distribution for each platform and architecture.

Okapi's source code has been carefully designed to separate UI-dependent code and non-UI code, so most of the components (such as the [[Filters]], the [[Steps]] and the [[Connectors]]) can be used on any platform.

====Can I change the Java VM settings when running the tools?====

Yes. See [[How to Change the Java Parameters for Rainbow]]. You can follow the same steps for all Okapi tools.

==Simple Troubleshooting==

====Is there a Getting Started guide?====

Yes. See the "[[Getting Started]]" page.

====When I try to start Rainbow/Ratel/CheckMate nothing happens. What is wrong?====

* Check that you have the proper version of Java (1.7 or above).
* Make sure you have installed the correct distribution for your platform.
* If your machine is 32-bit make sure to have installed the 32-bit distribution.
* If your machine is 64-bit make sure to have installed the 64-bit distribution.

==Licenses==

====Under what license is the Okapi Framework developed?====

* The source code is under [https://www.apache.org/licenses/LICENSE-2.0 Apache License version 2.0].
* The documentation is under [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike License (CC-BY-SA)].

====Can I use Okapi's components in my applications?====

Yes. The project uses the Apache license, which allows open-source or commercial products to use our applications and components. See more information on the license at [https://www.apache.org/licenses/LICENSE-2.0].

==Support==

====Is there a users group or a support mailing list?====

Yes. There are two main mailing lists. Both have public archives, and both require registration to post a message:

* [http://tech.groups.yahoo.com/group/okapitools/ https://groups.yahoo.com/group/okapitools/] is the group and mailing list '''for the end users'''.
* [http://groups.google.com/group/okapi-devel https://groups.google.com/group/okapi-devel] is the group and mailing list '''for the developers''' working on the source code.

====How do I report bugs or request an enhancement?====

* You can post a bug report or an enhancement request in the issues tracking page: http://code.google.com/p/okapi/issues/entry if you have a Google account.

* You can post a message to the [http://tech.groups.yahoo.com/group/okapitools/ Okapi Tools users group] if you are part of the group.

* You can just [mailto:okapitools@opentag.com?subject=Feedback send feedback by email].

==Miscellaneous==

====What does 'Okapi' mean?====

An okapi is an African animal looking somewhat like [http://en.wikipedia.org/wiki/Okapi a cross between a zebra and a giraffe]. Okapi is pronounced [http://en.wikipedia.org/wiki/Wikipedia:IPA_for_English /oʊˈkɑːpɪ/] ([http://www.m-w.com/cgi-bin/audio.pl?okapi001.wav=okapi hear it])

The use of this name for the framework has its roots in much older projects. At some point it was an acronym for "Open Kit API".

====What happened to the .NET Okapi?====

The older version of the Okapi Framework for .NET is no longer developed. Its distribution and source code are still available here: http://sourceforge.net/projects/okapi/. All new development is now done in the Java branch.

====Where is Olifant?====

Olifant, the TMX editor, is currently only part of the .NET Okapi. It is still available [http://sourceforge.net/projects/okapi/files/ from the SourceForge project]. Note that Olifant is for Windows only.

==For developers==

====Getting set up====

* Check out the source code from Bitbucket using git clone: https://bitbucket.org/okapiframework/okapi
* Or, if you want to submit pull requests, first create a fork of the Okapi project.
* Import into your IDE. For example, in Eclipse go to File > Import > Maven > Existing Maven project.
If you want to keep several distinct Okapi repositories in the same Eclipse workspace (for instance, your fork and the main Okapi project), you need to assign a name template under the "Advanced" section in the first step of the import wizard.
* The "master" branch contains the latest release version. The "dev" branch contains the current work (the "snapshot" in Maven terms).
* See also: https://bitbucket.org/okapiframework/okapi/wiki/How%20to%20Contribute
Happy coding!

====How to build okapi-lib locally====

The Okapi Framework consists of Maven projects. However, in order to build the apps and lib projects locally, you need to use the Ant build configurations.

For instance, to create a local version of <code>okapi-lib.jar</code>, go to <code><OKAPI_HOME>/deployment/maven/</code> and run <code>ant -f build_okapi-lib.xml init okapiLib</code>. The jar will be generated in <code><OKAPI_HOME>/deployment/maven/dist_common/lib/</code>.

If you use the default <code>build.xml</code> by running the above command without the <code>-f</code> option, platform-specific distributions of the apps are created in addition to the platform-independent <code>okapi-lib.jar</code>.

Longhorn

2018-01-23T19:54:19Z

Ctingley:

__TOC__
==Overview==

Longhorn is a server application that allows you to execute Batch Configurations remotely on any set of input files. Batch Configurations, which include pre-defined pipelines and filter configurations, can be exported from [[Rainbow]].

The distribution also includes a client library to access the Longhorn Web services.

==Download and Installation==

* '''Stable release:''' http://bintray.com/okapi/Distribution/Longhorn

* <del>Development release (snapshot): http://okapiframework.org/snapshots</del> Development snapshots are not currently available.

To install Longhorn:

* Unzip the distribution file on your server.
* Follow the instructions provided with the <code>readme</code> file of the distribution.
* Starting with m24, Longhorn requires Java 1.7.

==Functionality==

To process files with Longhorn these steps are required:
# Create a temporary project
# Upload a Batch Configuration file into that project
# Upload the input files into that project
# Execute the project
# Download the output files
# Delete the project

==Usage==

There are three ways to access Longhorn's functionality:
* a REST interface,
* a Java API, and
* an HTML client.

They can be used as described below.

===REST-Interface===

Longhorn can be accessed directly via HTTP methods:
;POST http://{host}/okapi-longhorn/projects/new : Creates a new temporary project and returns its URI (e.g. <code>http://localhost/okapi-longhorn/projects/1</code>) in the <tt>Location</tt> header of the response.
;POST http://{host}/okapi-longhorn/projects/1/batchConfiguration : Uploads a Batch Configuration file
;POST http://{host}/okapi-longhorn/projects/1/inputFiles.zip : Adds input files as a zip archive (the zip will be extracted and the included files will be used as input files)
;PUT http://{host}/okapi-longhorn/projects/1/inputFiles/help.html : Uploads a file that will have the name 'help.html'
;GET http://{host}/okapi-longhorn/projects/1/inputFiles/help.html : Retrieves an input file that was previously added with PUT or POST
;POST http://{host}/okapi-longhorn/projects/1/tasks/execute : Executes the Batch Configuration on the uploaded input files
;POST http://{host}/okapi-longhorn/projects/1/tasks/execute/en-US/de-DE : Executes the Batch Configuration on the uploaded input files with the source language set to 'en-US' and the target language set to 'de-DE'
;POST http://{host}/okapi-longhorn/projects/1/tasks/execute/en-US?targets=de-DE&targets=fr-FR : Executes the Batch Configuration on the uploaded input files with the source language set to 'en-US' and multiple target languages, 'de-DE' and 'fr-FR'
;GET http://{host}/okapi-longhorn/projects/1/outputFiles : Returns a list of the output files generated
;GET http://{host}/okapi-longhorn/projects/1/outputFiles/help.out.html : Accesses the output file 'help.out.html' directly
;GET http://{host}/okapi-longhorn/projects/1/outputFiles.zip : Returns all output files in a zip archive
;DELETE http://{host}/okapi-longhorn/projects/1 : Deletes the project
;GET http://{host}/okapi-longhorn/projects : Returns a list of all projects on the server
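
The endpoints listed above follow a regular pattern, so client code can build them with a couple of small helpers. The following Python sketch uses only the standard library; the helper names (<code>project_url</code>, <code>execute_url</code>, <code>send</code>) are hypothetical and are not part of any Longhorn client library, and the sketch leaves out file uploads, which need a multipart request body.

```python
import urllib.parse
import urllib.request


def project_url(base, project_id, *segments):
    """Build the URL of a project or of a resource below it."""
    parts = [base.rstrip("/"), "projects", str(project_id)]
    parts += [urllib.parse.quote(segment) for segment in segments]
    return "/".join(parts)


def execute_url(base, project_id, source=None, targets=()):
    """Build the URL that executes the Batch Configuration of a project."""
    url = project_url(base, project_id, "tasks", "execute")
    if source is None:
        return url  # languages are not set
    if not targets:
        raise ValueError("at least one target language is needed with a source")
    url += "/" + urllib.parse.quote(source)
    if len(targets) == 1:
        return url + "/" + urllib.parse.quote(targets[0])
    # Several targets are passed as repeated "targets" query parameters.
    return url + "?" + urllib.parse.urlencode([("targets", t) for t in targets])


def send(method, url, body=None):
    """Send one HTTP request and return the response headers and body."""
    request = urllib.request.Request(url, data=body, method=method)
    with urllib.request.urlopen(request) as response:
        return response.headers, response.read()
```

For example, <code>send("POST", execute_url(base, 1, "en-US", ["de-DE", "fr-FR"]))</code> would run project 1 with two target languages, and <code>send("DELETE", project_url(base, 1))</code> would delete it afterwards.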

===REST-Interface Sample code: Python===

This example uses the <code>requests</code> package; <code>minidom</code> is used to parse the XML project list.

import requests
from xml.dom import minidom

url = 'http://localhost:8080/okapi-longhorn/'

Code to create a new project

r = requests.post(url+'projects/new')
print(r.text)

Code to '''list''' existing projects (i.e.: to check if the project was created, and to get the ID of the last project)

r = requests.get(url+'projects/')

xmlstring = minidom.parseString(r.text)
itemlist = xmlstring.getElementsByTagName('e')
lastproject = len(itemlist)

Code to '''post''' a '''batch config file'''

batchfile = open('/home/user/batchconfig.bconf', 'rb')
r = requests.post(url+'projects/'+str(lastproject)+'/batchConfiguration', files=dict(batchConfiguration=batchfile))

Code to '''put''' a string as a '''file'''

payload = "hello world!"
r = requests.put(url+'projects/'+str(lastproject)+'/inputFiles/test.txt', files=dict(inputFile=payload))

Code to '''post''' a '''file'''

payload = open('/home/user/test.txt', 'rb')
r = requests.post(url+'projects/'+str(lastproject)+'/inputFiles/test.txt', files=dict(inputFile=payload))

===Java API===

The API is distributed as a <code>.jar</code> file in the Longhorn distribution package. You can also build it from the Okapi source code via Maven from the project <code>lib-longhorn-api</code>.

====Maven====
The API is available as a Maven dependency. Add this repository to your <tt>pom.xml</tt>:
<repository>
<id>okapi-longhorn-release</id>
<name>Okapi Longhorn Release</name>
<url>http://repository-opentag.forge.cloudbees.com/release/</url>
</repository>

Add this dependency as well, substituting a valid version number (e.g., <tt>0.27</tt>):
<dependency>
<groupId>net.sf.okapi.lib</groupId>
<artifactId>okapi-lib-longhorn-api</artifactId>
<version>${okapi.version}</version>
</dependency>

====Sample Code====

LonghornService ws = new RESTService(new URI("http://localhost:9095/okapi-longhorn"));

// Create project
LonghornProject proj = ws.createProject();

// Post batch configuration
File bconfFile = new File("C:\\setup.bconf");
proj.addBatchConfiguration(bconfFile);

// Send input files

// First by single upload...
File file1 = new File("C:\\help.html");
// * in the root directory
proj.addInputFile(file1, file1.getName());
// * and in a sub-directory
proj.addInputFile(file1, "samefile/" + file1.getName());

// ...then by package upload
File inputPackage = new File("C:\\more_files.zip");
proj.addInputFilesFromZip(inputPackage);

// Execute pipeline
// Languages don't matter
proj.executePipeline();
// Languages matter
proj.executePipeline("en-US", "de-DE");

// Get output files
ArrayList<LonghornFile> outputFiles = proj.getOutputFiles();

// Does the fetching of files work?
for (LonghornFile of : outputFiles) {
InputStream is = of.openStream();
//TODO save InputStream to local file
}

// Delete project
proj.delete();

===HTML-Client===

You can create projects and upload/download files via an integrated HTML client, too. Uploading input files (and downloading output files) as a zip archive is currently not implemented for the HTML client.

[[File:longhorn_html_client.png]]

===Configuration===
Since Okapi M22, Longhorn can be built so that multiple instances run in one JBoss application server. To do so, call the build with an additional parameter:

mvn clean verify -DuseUniqueContextRoot

====Configure working directory path====
Longhorn offers two ways to configure its working directory, listed in order of priority:
# the system parameter <code>LONGHORN_WORKDIR</code>
# the configuration file <code>okapi-longhorn-configuration.xml</code> in <code>user.home</code>
If neither is defined, the working directory is the folder <code>Okapi-Longhorn-Files</code> in <code>user.home</code>.

Longhorn configuration file example:

<longhorn-config>
<use-unique-working-directory>True</use-unique-working-directory>
<working-directory>D:\testData\longhorn-files</working-directory>
</longhorn-config>

====Configuration Options====

{| class="wikitable"
! option
! description
! data type
|-
| working-directory
| path of the working directory
| string
|-
| use-unique-working-directory
| if set to true, the Longhorn version is appended to the working directory name, e.g. <code>path/to/working/directory_M0.21</code>
| boolean (True or False)
|}
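
The lookup order for the working directory can be sketched as follows. This Python example is an approximation of the rules described in this section, not Longhorn's own code: the function name is hypothetical, the system parameters are modelled as a plain dictionary, and the <code>_</code> separator before the version as well as the choice to apply the version suffix only to the path from the configuration file are assumptions.

```python
import os
import xml.etree.ElementTree as ET

CONFIG_NAME = "okapi-longhorn-configuration.xml"
DEFAULT_FOLDER = "Okapi-Longhorn-Files"


def resolve_working_directory(system_params, user_home, version):
    """Return the working directory path following the documented priority."""
    # 1. The system parameter has the highest priority.
    explicit = system_params.get("LONGHORN_WORKDIR")
    if explicit:
        return explicit
    # 2. Then the configuration file in the user's home directory.
    config_path = os.path.join(user_home, CONFIG_NAME)
    if os.path.isfile(config_path):
        root = ET.parse(config_path).getroot()
        workdir = (root.findtext("working-directory") or "").strip()
        unique = (root.findtext("use-unique-working-directory") or "").strip()
        if workdir:
            if unique.lower() == "true":
                # Assumed suffix format, e.g. directory_M0.21
                workdir = workdir + "_" + version
            return workdir
    # 3. Otherwise fall back to the default folder in the home directory.
    return os.path.join(user_home, DEFAULT_FOLDER)
```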

[[Category:Longhorn]]

XML Filter

2017-10-18T18:27:08Z

Ctingley: /* codeFinder */

{{Filters Header}}
==Overview==

This filter allows you to process XML documents. It uses a DOM-based parser, which allows it to implement [[ITS]]. If you need to process very large XML documents and have no need for ITS, you may want to look at using the [[XML Stream Filter]].

The following is an example of a simple XML document. The translatable text is highlighted. Because each format based on XML is different, you need information on what are the translatable parts, what are the inline elements, etc. The XML Filter [[#ITS Support|implements the ITS W3C Recommendation]] to address this issue.

<?xml version="1.0" encoding="utf-8"?>
<myDoc>
<prolog>
<author>Zebulon Fairfield</author>
<version>version 12, revision 2 - 2006-08-14</version>
<keywords><kw>horse</kw><kw>appaloosa</kw></keywords>
<storageKey>articles-6D272BA9-3B89CAD8</storageKey>
</prolog>
<body>
<title>Appaloosa</title>
The Appaloosas are rugged horses originally bred by
the <kw>Nez-Perce</kw> tribe in the US Northwest.
They are often characterized by their spotted coats.
</body>
</myDoc>

This filter is implemented in the class <code>net.sf.okapi.filters.xml.XMLFilter</code> of the library.

==Processing Details==

===Input Encoding===

The filter decides which encoding to use for the input document using the following logic:

* If the document has an encoding declaration it is used.
* Otherwise, UTF-8 is used as the default encoding (regardless of the actual default encoding that was specified when opening the document).
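
These two rules can be sketched in a few lines of Python. The example is a simplification for illustration only: the function name is hypothetical, and the regular expression only finds declarations in documents whose first bytes are ASCII-compatible (it would not find one in a UTF-16 document, which a real XML parser handles).

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration.
_DECLARATION = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def xml_input_encoding(data):
    """Return the declared encoding of an XML document, or UTF-8 if none."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii")
    # The default passed by the caller is deliberately ignored.
    return "UTF-8"
```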

===Output Encoding===

If the output encoding is UTF-8:

* If the input encoding was also UTF-8, a Byte-Order-Mark is used for the output document only if one was detected in the input document.
* If the input encoding was not UTF-8, no Byte-Order-Mark is used in the output document.

If the original document had an XML encoding declaration, it is updated; if it did not, one is automatically added.

===Line-Breaks===

The type of line-breaks of the output is the same as the one of the original input.

==Parameters==

This filter stores its parameters in an XML file and does not provide an editor to modify it. You can edit the file in a simple text editor, or with an XML editor. For an example, see the article "[[How to Create a Custom Configuration for the XML Filter]]".

===ITS Support===

By default the filter processes XML documents based on the '''ITS defaults'''. That is:

* the content of all elements is translatable,
* and none of the attribute values are translatable.

Different behavior can occur if the input document contains ITS markup, or if a filter parameters file is specified. The parameters file used by the XML Filter is [[ITS|an ITS document]].

The '''Internationalization Tag Set (ITS)''' is a W3C recommendation that defines a set of elements and attributes you can use to specify different internationalization- and localization-related aspects of your XML document. For instance, ITS defines which attribute values are translatable, which element content should be protected, which elements should be treated as a nested sub-flow of text, and much more.

The filter supports ITS 1.0 and ITS 2.0 (2.0 is backward compatible with 1.0).

* The ITS 1.0 specification is available at http://www.w3.org/TR/its/.
* The ITS 2.0 specification is available at http://www.w3.org/TR/its20/.

See the "[[ITS]]" page for more details on the format.

The filter supports global and local rules and most data categories. See the '''[[ITS Components]]''' page for a detailed list of how the data categories are supported and other information on the implementation.

===ITS Extensions===

The filter supports extensions to the ITS specification. These extensions use the namespace URI http://www.w3.org/2008/12/its-extensions.

* [[#idValue and xml:id|idValue and xml:id]]
* [[#whiteSpaces|whiteSpaces]]

====idValue and xml:id====

{{NoteBox|This extension was defined for ITS 1.0. ITS 2.0 offers the new [http://www.w3.org/TR/its20/#idvalue Id Value] data category, which should be used instead of this extension.}}

When the attribute <code>xml:id</code> is found on a translatable element, it is used as the name of the text unit generated for that element.

In the example below, the resource name associated with the text unit for the <code>&lt;p></code> element is "<code>id1</code>".

 &lt;p xml:id="id1">Text&lt;/p>

The attribute <code>idValue</code> used in the ITS <code>translateRule</code> element allows you to define an XPath expression that corresponds to the identifier value for the given selection. The value of <code>idValue</code> must be an expression that can return a string. A node location is a valid expression: it will return the value of the first node at the given location.

In the example below, the resource name associated with the text unit for the <code>&lt;p></code> element is "<code>id1</code>":

<pre><doc>
<its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
<its:translateRule selector="//p" translate="yes" itsx:idValue="@name"/>
</its:rules>
&lt;p name="id1">text 1&lt;/p>
</doc></pre>

Note that <code>xml:id</code> has precedence over the <code>idValue</code> declaration. In the example below, the resource name associated with the text unit for the <code>&lt;p></code> element is "<code>xid1</code>", not "<code>id1</code>".

<pre><doc>
<its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
<its:translateRule selector="//p" translate="yes" itsx:idValue="@name"/>
</its:rules>
&lt;p xml:id="xid1" name="id1">text 1&lt;/p>
</doc></pre>

You can build complex IDs based on different attributes, elements, or even hard-coded text. Any of the string functions offered by XPath can be used.

For example, in the file below, the two elements <code><text></code> and <code><desc></code> are translatable, but they have only one corresponding ID, the <code>name</code> attribute in their parent element. To make sure you have a unique identifier for both the content of <code><text></code> and the content of <code><desc></code>, you can use the rules set in the example. The XPath expression "<code>concat(../@name, '_t')</code>" will give the ID "<code>id1_t</code>" and the expression "<code>concat(../@name, '_d')</code>" will give the ID "<code>id1_d</code>".

<pre><doc>
<its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
<its:translateRule selector="//text" translate="yes" itsx:idValue="concat(../@name, '_t')"/>
<its:translateRule selector="//desc" translate="yes" itsx:idValue="concat(../@name, '_d')"/>
</its:rules>
<msg name="id1">
<text>Value of text</text>
<desc>Value of desc</desc>
</msg>
</doc></pre>
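
The effect of the two <code>concat()</code> rules can be reproduced outside the filter to check which IDs they yield. The Python sketch below does not evaluate XPath or ITS rules: it hard-codes the equivalent of the two expressions with the standard <code>xml.etree.ElementTree</code> module, and the names <code>SUFFIXES</code> and <code>text_unit_ids</code> are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
 <msg name="id1">
  <text>Value of text</text>
  <desc>Value of desc</desc>
 </msg>
</doc>"""

# Equivalent of concat(../@name, '_t') and concat(../@name, '_d').
SUFFIXES = {"text": "_t", "desc": "_d"}


def text_unit_ids(xml_text):
    """Map each computed ID to the content of its element."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    ids = {}
    for element in root.iter():
        if element.tag in SUFFIXES:
            parent_name = parents[element].get("name")
            ids[parent_name + SUFFIXES[element.tag]] = element.text
    return ids


print(text_unit_ids(DOC))
```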

====whiteSpaces====

{{NoteBox|This extension was defined for ITS 1.0. ITS 2.0 offers the new [http://www.w3.org/TR/its20/#preservespace Preserve Space] data category, which should be used instead of this extension.}}

The extension attribute <code>whiteSpaces</code> allows you to apply globally the equivalent of a local <code>xml:space</code> attribute.

For example, if you have a format where all <code><pre></code> elements must have their spaces, tabs and line breaks preserved, you can specify the attribute <code>whiteSpaces="preserve"</code> in a <code><its:translateRule></code> element for the <code><pre></code> elements. In the example below, the spaces in the <code><pre></code> element will be preserved on extraction.

<doc>
<nowiki><its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:itsx="http://www.w3.org/2008/12/its-extensions"></nowiki>
<its:translateRule selector="//pre" translate="yes" itsx:whiteSpaces="preserve"/>
</its:rules>
<pre>Some txt with many spaces. </pre>
</doc>

Note that the <code>xml:space</code> attribute has precedence over <code>whiteSpaces</code>. For example, in the following example, the white spaces in the content of <code><pre></code> may '''not''' be preserved because the attribute <code>xml:space</code> has the value <code>default</code>:

<doc>
<nowiki><its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its"
xmlns:itsx="http://www.w3.org/2008/12/its-extensions"></nowiki>
<its:translateRule selector="//pre" translate="yes" itsx:whiteSpaces="preserve"/>
</its:rules>
<pre xml:space="default">Some txt with many spaces. </pre>
</doc>

===Filter Options===

The filter also supports options in addition to ITS and the ITS extensions. These options use the namespace URI <code>okapi-framework:xmlfilter-options</code>.

{{NoteBox|The filter options must be placed in the parameters file (.fprm) used with the filter, not in embedded or linked ITS rules. Options placed in embedded or linked ITS rules have no effect.}}

When you use several options, they must be set in a single <code><okp:options></code> element, as shown below:

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options lineBreakAsCode="yes"
escapeQuotes="no"
escapeGT="yes"
/>
</its:rules></pre>

The following options are available:

* [[#lineBreakAsCode|lineBreakAsCode]]
* [[#codeFinder|codeFinder]]
* [[#omitXMLDeclaration|omitXMLDeclaration]]
* [[#escapeQuotes|escapeQuotes]]
* [[#escapeGT|escapeGT]]
* [[#escapeNbsp|escapeNbsp]]
* [[#extractIfOnlyCodes|extractIfOnlyCodes]]
* [[#inlineCdata|inlineCdata]]

====lineBreakAsCode====

In some cases the content of an element includes line-breaks that need to be included as part of the content but without using an actual line-break in the extracted text. For example in some XML documents generated by Excel, the formatting of the cells is marked up with <code>&#10;</code> entity references. They need to be passed as inline codes.

By default this option is set to false.

To specify this, use the <code>lineBreakAsCode</code> extension attribute. It affects all the extracted content.

For example: The following code is an ITS document with the option to treat line-breaks as code. It can be used along with the example of XML document listed below.

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options lineBreakAsCode="yes"/>
</its:rules></pre>

<doc>
<data>line 1&#10;line 2.</data>
</doc>

====codeFinder====

You can define a set of regular expressions to capture spans of extracted text that should be treated as inline codes. For example, some element content may have variables or HTML tags that need to be protected from modification and treated as codes. Use the <code>codeFinder</code> element for this.

In the following parameters file, the <code>codeFinder</code> element defines two rules:

* The first one (rule0) is "<code><(/?)\w+[^>]*?></code>" and matches any XML-type tag (e.g. "<code>&lt;b></code>", "<code>&lt;/b></code>", "<code>&lt;br /></code>")
* The second one (rule1) is "<code>(#\w+?\#)|(%\d+?%)</code>" and matches any word enclosed in <code>#</code> (e.g. "<code>#VAR#</code>") or number enclosed in <code>%</code> (e.g. "<code>%1%</code>").

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:codeFinder useCodeFinder="yes">#v1
count.i=2
rule0=&lt;(/?)\w+[^&gt;]*?&gt;
rule1=(#\w+?\#)|(%\d+?%)
</okp:codeFinder>
</its:rules></pre>

Some important details:

* Set <code>useCodeFinder</code> to "yes" to have the rules used; if the attribute is missing, its value is assumed to be "no".
* Make sure the first line of the <code><codeFinder></code> element content is <code>#v1</code>.
* Each entry in the content must be on a separate line.
* <code>count.i=N</code> must be before any rules and <code>N</code> must be the number of rules.
* <code>ruleN</code> must be incremented starting at 0.
* The pattern for a rule must be escaped for XML, for example: "<code><(/?)\w[^>]*?></code>" must be entered as "<code>&lt;(/?)\w[^&gt;]*?&gt;</code>" in the parameters file.
* Do not put spaces before <code>count.i</code> or <code>ruleN</code>, nor after your expressions.

To facilitate the creation of code finder rules, [[Rainbow - Code Finder Editor|Rainbow provides the Code Finder Editor]].
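
To check what a set of rules captures before putting it in a parameters file, the expressions can be tried with the standard Java regex classes. The sketch below uses the two rules from the parameters file above, unescaped from XML. Combining the rules with <code>|</code> is a simplification chosen for this illustration and is not necessarily how the filter applies them; the class name, method name, and sample sentence are also illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodeFinderSketch {

    // The two rules from the parameters file, as plain (non-XML-escaped) regular expressions.
    static final String RULE0 = "<(/?)\\w+[^>]*?>";
    static final String RULE1 = "(#\\w+?\\#)|(%\\d+?%)";

    // Returns every span matched by any of the rules, in the order they occur in the text.
    static List<String> findCodes(String text, String... rules) {
        StringBuilder combined = new StringBuilder();
        for (String rule : rules) {
            if (combined.length() > 0) {
                combined.append('|');
            }
            // Wrap each rule in a non-capturing group so its own alternations stay local.
            combined.append("(?:").append(rule).append(')');
        }
        Matcher m = Pattern.compile(combined.toString()).matcher(text);
        List<String> codes = new ArrayList<>();
        while (m.find()) {
            codes.add(m.group());
        }
        return codes;
    }

    public static void main(String[] args) {
        // prints [<b>, #KEY#, </b>, %1%]
        System.out.println(findCodes("Press <b>#KEY#</b> to save %1% files", RULE0, RULE1));
    }
}
```

Each returned span is a candidate inline code; everything else in the sentence would remain translatable text.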

====omitXMLDeclaration====

By default an XML declaration is always written at the top of the output document (regardless of whether the original document has one or not). It is an important part of the XML document, and it is especially needed when the encoding of the output document is not UTF-8, UTF-16 or UTF-32, because the encoding name must then be specified in the XML declaration. However, there are a few special cases when the declaration is better left off. To handle those rare cases, you can use <code>omitXMLDeclaration</code> to tell the filter not to output the XML declaration.

For example:

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options omitXMLDeclaration="yes"/>
</its:rules></pre>

Remember that XML documents without an XML declaration may be read incorrectly if the encoding of the document is not UTF-8, UTF-16 or UTF-32.

====escapeQuotes====

By default, when processing the document, the filter uses double-quotes to enclose all attributes (translatable or not) and uses the following rules for escaping literal quotes:

* Inside the attribute values:
** Single-quotes (=apostrophes) are never escaped
** Double-quotes are always escaped
* In element content:
** Single-quotes (=apostrophes) are not escaped
** Double-quotes are escaped by default

You cannot change the escaping rules for attributes.

For element content: the filter takes the <code>escapeQuotes</code> option into account only if the document is processed without triggering any rule that allows the translation of an attribute. In that case, the option determines whether double-quotes in the translatable content are escaped.

For example, the following parameters file tells the filter not to escape double-quotes in element content (for documents where no rule for translatable attributes is triggered):

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options escapeQuotes="no"/>
</its:rules></pre>

====escapeGT====

By default the character '<code>></code>' is escaped. You can indicate to the filter to not escape it using the <code>escapeGT</code> option.

For example, the following parameters file indicates to not escape greater-than characters:

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options escapeGT="no"/>
</its:rules></pre>

====escapeNbsp====

By default the non-breaking space character is escaped (in the form <code>&#x00a0;</code>). You can indicate to the filter to not escape it using the <code>escapeNbsp</code> option.

For example, the following parameters file indicates to not escape the non-breaking space characters:

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options escapeNbsp="no"/>
</its:rules></pre>
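
The combined effect of the <code>escapeQuotes</code>, <code>escapeGT</code> and <code>escapeNbsp</code> options on element content can be pictured with the following sketch. It is not the filter's code: the class and method names are illustrative, and the use of <code>&amp;quot;</code> and <code>&amp;gt;</code> as the escaped forms is an assumption. It also ignores the condition that <code>escapeQuotes</code> applies only when no rule for translatable attributes is triggered.

```java
final class ContentEscapeSketch {

    // Escapes element content. The three flags mirror the escapeQuotes,
    // escapeGT and escapeNbsp options; "true" corresponds to the default behavior.
    static String escapeContent(String text, boolean escapeQuotes,
                                boolean escapeGT, boolean escapeNbsp) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");   // always required in XML content
                    break;
                case '<':
                    out.append("&lt;");    // always required in XML content
                    break;
                case '>':
                    out.append(escapeGT ? "&gt;" : ">");
                    break;
                case '"':
                    out.append(escapeQuotes ? "&quot;" : "\"");
                    break;
                case '\u00a0':
                    out.append(escapeNbsp ? "&#x00a0;" : "\u00a0");
                    break;
                default:
                    out.append(c);         // apostrophes and all other characters are left as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "a \"b\" > c\u00a0d";
        System.out.println(escapeContent(sample, true, true, true));
        System.out.println(escapeContent(sample, false, false, false));
    }
}
```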

====extractIfOnlyCodes====

By default all extractable entries are extracted even when they contain only white-spaces and/or inline codes. You can indicate to the filter to not extract such entries using the <code>extractIfOnlyCodes</code> option.

For example, the following parameters file indicates to not extract entries with only white-spaces and/or inline codes:

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options extractIfOnlyCodes="no"/>
</its:rules></pre>

====inlineCdata====

By default, CDATA sections will be exposed as regular content, and the CDATA markers themselves will be discarded. When the <code>inlineCdata</code> option is set,
the CDATA markers will be exposed as inline codes.

For example, the following parameters file will expose CDATA markers as inline codes:

<pre><its:rules version="1.0"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:okp="okapi-framework:xmlfilter-options">
<okp:options inlineCdata="yes"/>
</its:rules></pre>

==Limitations==

* Currently, in some cases, the ITS rule <code>withinTextRule</code> with the value <code>nested</code> may act like it has a value <code>yes</code> instead.
* In output, the values of the <code>xml:lang</code> attributes are not updated to reflect the target language.
* When doing the extraction, the whole input file is loaded into memory. You may run into memory limitations if the document is very large.

[[Category:Filters]] [[Category:ITS]]

IDML Filter

2017-10-03T18:51:51Z

Ctingley: Update options/limitations for new (post-M34) rewrite of IDMLFilter

{{Filters Header}}
==Overview==

This filter allows you to process IDML documents. IDML (InDesign Markup Language) is an XML-based format, introduced in Adobe InDesign CS4, for representing InDesign content. IDML is used in several InDesign and InCopy file types. The specification can be found [http://www.adobe.com/content/dam/Adobe/en/devnet/indesign/cs5_docs/idml/idml-specification.pdf on the Adobe Web site].

==Processing Details==

When processing an IDML document, the filter looks at all the spreads in the document and, for each of them, gathers the list of the stories used in <code><TextFrame></code> and <code><TextPath></code>. The text is extracted by spread, and for each spread by story, in the order they appear in the spread.

Stories embedded inside other stories and not declared at a spread level are extracted in a special group.

==Parameters==

<cite>Untag XML Structures</cite> — Set this option to skip embedded XML structural information when extracting translatable content.

<cite>Extract notes</cite> — Set this option to extract the content of notes (<code><Note></code> elements).

<cite>Extract master spreads</cite> — Set this option to extract the content of the master spreads if they exist. If this option is not set only the normal spreads are extracted.

<cite>Extract hidden layers</cite> — Set this option to also extract the content of hidden layers.

==Deprecated Parameters==

Prior to release M34, the filter supported several additional parameters. The behavior of these has been subsumed by the more intelligent content processing performed by the updated version of the filter in versions M34 and later.

<cite>Simplify inline codes when possible</cite> — Set this option to reduce the number of inline codes by re-grouping adjacent codes when it is possible.

<cite>Create new text units on hard returns</cite> — Set this option to create separate text units when a hard return element (<code> </code>) is found. '''IMPORTANT: This option is not completed yet. Setting it may create extracted documents you will not be able to merge back. Always test the merge before using this option in production.'''

<cite>Maximum spread size</cite> — Set the maximum size for the spread files (in KBytes). Any spread file above the given value will either generate an error or be skipped from extraction, depending on the specified option. This allows you to skip over large spread files that may contain only graphics and require too much memory to be opened. Note that the skipped files are not checked for translatable text.

<cite>Generate an error when a spread is larger than the specified value</cite> — Set this option to generate an error if a spread size is above the specified <cite>Maximum spread size</cite>. If this option is not set, the spread is skipped with a warning message.

[[Category:Filters]]

Markdown Filter

2017-09-08T16:30:35Z

Ctingley: /* Inline Codes */

OpenXML Filter

2015-11-28T06:15:35Z

Ctingley:

KantanMT Connector

2015-11-13T01:28:03Z

Ctingley: Created page with "{{Connectors Header}} __TOC__ ==Overview== The commercial [https://kantanmt.com KantanMT] service can be accessed via API, which is documented at http://docs.kantanmt.apiary.io...."

{{Connectors Header}}
__TOC__
==Overview==

The commercial [https://kantanmt.com KantanMT] service can be accessed via API, which is documented at http://docs.kantanmt.apiary.io.

The connector assumes that the specified KantanMT engine is running, which is not always the case. Users must start the appropriate engine before using the connector, either via the KantanMT dashboard or using the API (for example, with <tt>curl</tt>).

==Using the Connector==

In [[Rainbow]], the connector can be accessed through the [[Leveraging Step]]. It can also be called programmatically.

==Parameters==

<cite>KantanMT Client Profile</cite> (internal name: <tt>profileName</tt>) — the client profile to use. (Sample value: "Test-EN-DE")

<cite>KantanMT Authorization Token</cite> (internal name: <tt>apiToken</tt>) — the authorization token. (Sample value: "ABCdef123467")

==Limitations==

* The connector assumes that the specified KantanMT engine is running, which is not always the case. Users must start the appropriate engine before using the connector, either via the KantanMT dashboard or using the API (for example, with <tt>curl</tt>).

[[Category:Connectors]]

Longhorn

2015-06-16T21:19:49Z

Ctingley: /* Download and Installation */

__TOC__
==Overview==

Longhorn is a server application that allows you to execute Batch Configurations remotely on any set of input files. Batch Configurations, which include pre-defined pipelines and filter configurations, can be exported from [[Rainbow]].

The distribution also includes a client library to access the Longhorn Web services.

==Download and Installation==

* '''Stable release''': http://bintray.com/okapi/Distribution/Longhorn

* <del>Development release (snapshot): http://okapi.opentag.com/snapshots</del> Development snapshots are not currently available.

To install Longhorn:

* Unzip the distribution file on your server.
* Follow the instructions provided with the <code>readme</code> file of the distribution.
* Starting with M24, Longhorn requires Java 1.7.

==Functionality==

To process files with Longhorn these steps are required:
# Create a temporary project
# Upload a Batch Configuration file into that project
# Upload the input files into that project
# Execute the project
# Download the output files
# Delete the project

==Usage==

There are three ways to access Longhorn's functionality:
* a REST interface,
* a Java API and
* an HTML client.

They can be used as described below.

===REST-Interface===

Longhorn can be accessed directly via HTTP methods:
;POST http://{host}/okapi-longhorn/projects/new : Creates a new temporary project and returns its URI (e.g. <code>http://localhost/okapi-longhorn/projects/1</code>)
;POST http://{host}/okapi-longhorn/projects/1/batchConfiguration : Uploads a Batch Configuration file
;POST http://{host}/okapi-longhorn/projects/1/inputFiles.zip : Adds input files as a zip archive (the zip will be extracted and the included files will be used as input files)
;PUT http://{host}/okapi-longhorn/projects/1/inputFiles/help.html : Uploads a file that will have the name 'help.html'
;GET http://{host}/okapi-longhorn/projects/1/inputFiles/help.html : Retrieves an input file that was previously added with PUT or POST
;POST http://{host}/okapi-longhorn/projects/1/tasks/execute : Executes the Batch Configuration on the uploaded input files
;POST http://{host}/okapi-longhorn/projects/1/tasks/execute/en-US/de-DE : Executes the Batch Configuration on the uploaded input files with the source language set to 'en-US' and the target language set to 'de-DE'
;POST http://{host}/okapi-longhorn/projects/1/tasks/execute/en-US?targets=de-DE&targets=fr-FR : Executes the Batch Configuration on the uploaded input files with the source language set to 'en-US' and multiple target languages, 'de-DE' and 'fr-FR'
;GET http://{host}/okapi-longhorn/projects/1/outputFiles : Returns a list of the output files generated
;GET http://{host}/okapi-longhorn/projects/1/outputFiles/help.out.html : Accesses the output file 'help.out.html' directly
;GET http://{host}/okapi-longhorn/projects/1/outputFiles.zip : Returns all output files in a zip archive
;DELETE http://{host}/okapi-longhorn/projects/1 : Deletes the project
;GET http://{host}/okapi-longhorn/projects : Returns a list of all projects on the server
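
When calling the REST interface from your own code, it can help to build these URLs in one place. The following is a small sketch that assembles some of the URLs listed above from a base address and a project number. The class and method names are hypothetical (this is not part of the Longhorn API), and no URL-encoding of file names is performed.

```java
import java.util.List;
import java.util.stream.Collectors;

final class LonghornUrlSketch {

    private final String base; // for example "http://localhost/okapi-longhorn"

    LonghornUrlSketch(String base) {
        this.base = base;
    }

    String newProject() {
        return base + "/projects/new";
    }

    String project(int id) {
        return base + "/projects/" + id;
    }

    String batchConfiguration(int id) {
        return project(id) + "/batchConfiguration";
    }

    String inputFile(int id, String name) {
        return project(id) + "/inputFiles/" + name;
    }

    String outputFilesZip(int id) {
        return project(id) + "/outputFiles.zip";
    }

    // One target language goes into the path; several go into repeated "targets" parameters.
    String execute(int id, String source, List<String> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("At least one target language is required");
        }
        if (targets.size() == 1) {
            return project(id) + "/tasks/execute/" + source + "/" + targets.get(0);
        }
        String query = targets.stream()
                .map(t -> "targets=" + t)
                .collect(Collectors.joining("&"));
        return project(id) + "/tasks/execute/" + source + "?" + query;
    }

    public static void main(String[] args) {
        LonghornUrlSketch urls = new LonghornUrlSketch("http://localhost/okapi-longhorn");
        System.out.println(urls.execute(1, "en-US", List.of("de-DE", "fr-FR")));
    }
}
```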

===Java API===

The API is distributed as a <code>.jar</code> file in the Longhorn distribution package. You can also build it from the Okapi source code via Maven from the project <code>lib-longhorn-api</code>.

====Maven====
The API is available as a Maven dependency. Add this repository to your <tt>pom.xml</tt>:
<repository>
<id>okapi-longhorn-release</id>
<name>Okapi Longhorn Release</name>
<url>http://repository-opentag.forge.cloudbees.com/release/</url>
</repository>

Then add this dependency, substituting a valid version number (e.g., <tt>0.27</tt>):
<dependency>
<groupId>net.sf.okapi.lib</groupId>
<artifactId>okapi-lib-longhorn-api</artifactId>
<version>${okapi.version}</version>
</dependency>

====Sample Code====

LonghornService ws = new RESTService(new URI("http://localhost:9095/okapi-longhorn"));

// Create project
LonghornProject proj = ws.createProject();

// Post batch configuration
File bconfFile = new File("C:\\setup.bconf");
proj.addBatchConfiguration(bconfFile);

// Send input files

// First by single upload...
File file1 = new File("C:\\help.html");
// * in the root directory
proj.addInputFile(file1, file1.getName());
// * and in a sub-directory
proj.addInputFile(file1, "samefile/" + file1.getName());

// ...then by package upload
File inputPackage = new File("C:\\more_files.zip");
proj.addInputFilesFromZip(inputPackage);

// Execute pipeline
// ...when the pipeline does not depend on the languages
proj.executePipeline();
// ...or when the source and target languages matter
proj.executePipeline("en-US", "de-DE");

// Get output files
ArrayList<LonghornFile> outputFiles = proj.getOutputFiles();

// Fetch the content of each output file
for (LonghornFile of : outputFiles) {
try (InputStream is = of.openStream()) {
//TODO save InputStream to local file
}
}

// Delete project
proj.delete();

===HTML-Client===

You can create projects and upload/download files via an integrated HTML client, too. Uploading input files (and downloading output files) as a zip archive is currently not implemented for the HTML client.

[[File:longhorn_html_client.png]]

===Configuration===
Since Okapi M22, Longhorn can be built so that multiple Longhorn instances can run in one JBoss application server. To do so, call the build with an additional parameter:

mvn clean verify -DuseUniqueContextRoot

====Configure working directory path====
There are two ways to configure Longhorn's working directory, listed in order of priority:
# System parameter <code>LONGHORN_WORKDIR</code>
# Configuration file <code>okapi-longhorn-configuration.xml</code> in <code>user.home</code>
If neither is defined, the working directory is the folder <code>Okapi-Longhorn-Files</code> in <code>user.home</code>.
Longhorn configuration file example:

<longhorn-config>
<use-unique-working-directory>True</use-unique-working-directory>
<working-directory>D:\testData\longhorn-files</working-directory>
</longhorn-config>
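
The priority order described above can be sketched as follows. This is an illustration only, not Longhorn's actual code: the class and method names are hypothetical, and reading <code>LONGHORN_WORKDIR</code> as a Java system property is an assumption about what "system parameter" means here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

final class WorkDirSketch {

    // systemParam: value of LONGHORN_WORKDIR, or null if it is not set.
    // configXml: content of okapi-longhorn-configuration.xml, or null if the file is absent.
    static Path resolve(String systemParam, String configXml, Path userHome) {
        if (systemParam != null && !systemParam.isEmpty()) {
            return Paths.get(systemParam);                  // priority 1: system parameter
        }
        String configured = (configXml == null) ? null : workingDirectoryFrom(configXml);
        if (configured != null && !configured.isEmpty()) {
            return Paths.get(configured);                   // priority 2: configuration file
        }
        return userHome.resolve("Okapi-Longhorn-Files");    // default location
    }

    // Returns the text of the first <working-directory> element, or null if there is none.
    static String workingDirectoryFrom(String configXml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("working-directory");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the configuration", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the parameter is passed as -DLONGHORN_WORKDIR=... on the command line.
        String param = System.getProperty("LONGHORN_WORKDIR");
        Path home = Paths.get(System.getProperty("user.home"));
        System.out.println(resolve(param, null, home));
    }
}
```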

====Configuration Options====

{| class="wikitable"
! option
! description
! data type
|-
| working-directory
| path of the working directory
| string
|-
| use-unique-working-directory
| If set to true, the version of Longhorn is appended to the working directory name, e.g. <code>path/to/working/directory_M0.21</code>
| boolean (True or False)
|}

[[Category:Longhorn]]

Knowledge Base

2014-07-28T21:18:08Z

Ctingley: /* For Developers */

__NOTOC__
For larger tutorials see the [[Tutorials|Tutorials page]].
{| border="0" cellspacing="0" cellpadding="8" width="100%"
|- valign="top"
|
==Overview==
* [[Getting Started|Installing the tools]]
* [[Filters|List of the file formats supported]]
* [[Steps|List of the functions available]]
* [[Connectors|List of the connectors to TM and MT systems]]

==Filters==
* [[Understanding Filter Configurations]]
* [[How to Create a Custom Configuration for the XML Filter]]
* [[How to Extract Text for Translation]]
* [[How to Translate XLIFF Documents]]
* [[How to Post-Process Extracted Text]]
* [[Okapi Filters Plugin for OmegaT]]
* [[How to Translate Transifex Projects with OmegaT]]
* [[How to create an XLIFF file from Excel]]

==Pipelines and Steps==
* [[How to Create a Pipeline in Rainbow]]

==Standards==
* [[Open Standards|Open Standards used in translation and localization]]
* [[SRX and Java]]
|
==Translation Resource Connectors==
* [[How to Machine-Translate a TMX File]]
* [[Match Types|List of the types of match]]
* [[Trying out the Microsoft Translator Connector]]

==Translation Memories==
* [[How to Create a Pensieve TM]]
** [[How to Create a Pensieve TM#Using Rainbow|Using Rainbow]]
** [[How to Create a Pensieve TM#Using Tikal|Using Tikal]]
* [[How to Query a Pensieve TM]]
* [[How to Create a TMX File from a Transifex Project]]

==Miscellaneous==
* [[How to Change the Java Parameters for Rainbow]]
* [[How to Add Languages to Rainbow]]
* [[How to Use CheckMate with OmegaT]]

==For Developers==
* [[Maven Basics]]
* [http://okapi.opentag.com/devguide/ Okapi Developer's Guide]
* [http://okapi.opentag.com/javadoc/ Okapi Javadoc]
* [[Okapi Java Persistence API]]
* [[Okapi Subfilters]]
* [[Creating UI with the net.sf.okapi.common.ui.abstracteditor Package]]
|}