Trying out the Microsoft Translator Connector

From Okapi Framework

The Microsoft Translator Connector allows you to access the Microsoft Translator system through its API.

You must have a "Client ID" and a "Client Secret" from Microsoft to use it. You obtain these by getting a Windows Live ID and then registering an application in your Live account. See the MSDN pages for more information.

Note that for commercial or high-volume usage you must have a license with Microsoft. The API restricts throughput for callers without a license. Those restrictions may change without notice and vary based on service utilization, in order to ensure fairness. More information can be found on the Microsoft Translator forums.

Note: You need to have a release between M13 and M17 to try out the batch translation and the submission features using the Microsoft AppID authentication.
Starting with M18, the library uses Client ID/Secret authentication.

Searching Translations

Manual Queries

Tikal provides a way to try out the connector easily.

First, create a configuration file that contains your credentials. You can create it with any plain text editor; it should look as follows:

To use the connector with an AppID (obsolete):

#v1
appId=yourAppID

To use the connector with a Client ID/Secret:

#v1
clientId=myPersonalClientID
secret=theSecretForThatClientID

Name the file config.cfg, for example.
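If you prefer to generate this file from a script, the following Python sketch writes a file in the Client ID/Secret layout shown above and reads it back. It assumes the format is simply a `#v1` header line followed by `key=value` lines, as in the examples; the helper names `write_config` and `read_config` are hypothetical and not part of Okapi.

```python
from pathlib import Path


def write_config(path, client_id, secret):
    """Write a connector configuration file (hypothetical helper).

    Assumes the layout shown above: a '#v1' header line followed by
    key=value lines for clientId and secret.
    """
    lines = ["#v1", f"clientId={client_id}", f"secret={secret}"]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_config(path):
    """Read key=value pairs back, skipping blank lines and '#' lines."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

For example, `write_config("config.cfg", "myPersonalClientID", "theSecretForThatClientID")` produces the same content as the Client ID/Secret example above.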

Now you can use the connector with Tikal. Try for instance:

tikal -q "This is a test" -sl en -tl fr -ms config.cfg

This command line uses the following parameters:

  • -q "This is a test" indicates that we want to search for a translation (i.e. do a query) and the source text to search for is "This is a test".
  • -sl en indicates that the source language is English.
  • -tl fr indicates that the target language is French.
  • -ms config.cfg specifies to use the Microsoft Translator Connector and to use config.cfg for the connector's configuration.

This should give you back something like:

= From Microsoft-Translator (en->fr)
  Threshold=95, Maximum hits=1
score: 95, origin: 'Microsoft-Translator'
  Source: "This is a test"
  Target: "Il s'agit d'un test."
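When calling Tikal from a script, it can help to build the argument list explicitly rather than concatenating a string, so that the query text is always passed as a single argument. The following Python sketch uses only the -q, -sl, -tl and -ms parameters described above. The function names are hypothetical, and it assumes a `tikal` command is available on your `PATH` (depending on your installation, the launcher script may have a different name).

```python
import subprocess


def tikal_query_args(text, source_lang, target_lang, config_path):
    """Build the Tikal argument list for a Microsoft Translator query.

    Uses only the -q, -sl, -tl and -ms parameters described above.
    """
    return ["tikal", "-q", text, "-sl", source_lang, "-tl", target_lang,
            "-ms", config_path]


def run_query(text, source_lang, target_lang, config_path):
    """Run the query and return Tikal's standard output as text.

    Assumes 'tikal' can be found on the PATH; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        tikal_query_args(text, source_lang, target_lang, config_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing a list (rather than a single shell string) avoids quoting problems when the source text contains spaces or quotes.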

By default the query is done with a threshold of 95. The threshold is the score below which matches (or hits) are discarded. The default maximum number of hits displayed is 1.

You can change those options with the parameter -opt. For example:

tikal -q "This is a test" -sl en -tl fr -ms config.cfg -opt 70:5

This will set the threshold to 70 and the maximum number of hits to 5.
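To make the effect of these two values concrete, the following Python sketch parses a `threshold:maxHits` string such as `70:5` and applies it to a list of scored hits: hits scoring below the threshold are dropped, and at most the maximum number of hits are kept. It only models the behavior described here; the assumption that hits are ranked from best to worst score, and the names `parse_opt` and `filter_hits`, are hypothetical and not taken from Okapi's code.

```python
def parse_opt(value):
    """Parse a 'threshold:maxHits' string such as '70:5' into two ints."""
    threshold, max_hits = value.split(":")
    return int(threshold), int(max_hits)


def filter_hits(hits, threshold=95, max_hits=1):
    """Keep at most max_hits hits whose score is at least threshold.

    hits is a list of (score, target_text) tuples. Defaults mirror the
    documented defaults (threshold 95, one hit). The result is sorted
    from best to worst score, which is an assumption about ranking.
    """
    kept = [hit for hit in hits if hit[0] >= threshold]
    kept.sort(key=lambda hit: hit[0], reverse=True)
    return kept[:max_hits]
```

A hit whose score equals the threshold is kept, since only matches under the threshold are discarded.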

With the Leveraging Step

The connector is available in the Leveraging Step, so you can use it in any pipeline you need.

You can also use Tikal's Translate Files command to directly process a file supported by Okapi. For example, the following command creates an output file myFile.out.docx translated into Japanese, provided the file is small enough to be processed within the limitations of the API for non-licensed users.

tikal -t myFile.docx -sl en -tl ja -ms config.cfg

Both options use the GetTranslations method of the API, which works segment by segment and can therefore be slow.

With the Microsoft Batch Translation Step

[Screenshot: Microsoft Batch Translation Step (Windows 7)]

The Microsoft Batch Translation Step takes advantage of the GetTranslationsArray method of the API and allows you to process your input much faster.
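In rough terms, a segment-by-segment method needs one request per segment, while an array method such as GetTranslationsArray can carry several segments per request. The following Python sketch shows that grouping idea: it splits a list of segments into batches bounded by an item count and a character count. The limits `max_items` and `max_chars` are hypothetical parameters (the real limits of the API are set by Microsoft and are not given here), and this is not the step's actual code.

```python
def make_batches(segments, max_items, max_chars):
    """Group segments into batches for a batch translation request.

    max_items and max_chars are hypothetical limits. A segment longer
    than max_chars still gets a batch of its own rather than being
    dropped or split.
    """
    batches = []
    current = []
    current_chars = 0
    for segment in segments:
        too_many = len(current) >= max_items
        too_long = current_chars + len(segment) > max_chars
        # Close the current batch before it would exceed either limit.
        if current and (too_many or too_long):
            batches.append(current)
            current, current_chars = [], 0
        current.append(segment)
        current_chars += len(segment)
    if current:
        batches.append(current)
    return batches
```

With 100 short segments and a limit of 25 items per batch, for instance, this yields 4 requests instead of 100.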

For example, to translate any document for which Okapi has a filter you can use the following pipeline:

= Raw Document to Filter Events Step
+ Microsoft Batch Translation Step
+ Filter Events to Raw Document Step

(See the article "How to Create a Pipeline in Rainbow" to learn about pipelines)

The step can perform several actions:

  • Annotate the text units with the matches found.
  • Copy the best translation into the target.
  • Generate a TMX document.
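For reference, a TMX document pairs each source segment with its translation inside `<tu>` and `<tuv>` elements. The following Python sketch builds a minimal TMX 1.4 file from (source, target) pairs with the standard library. It only illustrates the format: the TMX produced by the step may carry more information (for example, scores or properties), and the function names and the `creationtool` value are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang, target_lang):
    """Build a minimal TMX 1.4 tree from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # The header attributes below are the ones TMX 1.4 requires.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def write_tmx(path, pairs, source_lang, target_lang):
    """Write the TMX tree to path as UTF-8 with an XML declaration."""
    build_tmx(pairs, source_lang, target_lang).write(
        path, encoding="utf-8", xml_declaration=True)
```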

As always, this step is subject to the limitations of the service.

If you set the Maximum matches value to more than 1, you may get several results: the MT-generated translation as well as one or more translations added to the repository. Use the Threshold value to filter out matches below a given score.

Adding Translations

One interesting aspect of Microsoft Translator is that anyone can contribute translations. This is done through Microsoft's Collaborative Translation Framework, which provides the API needed to add translations to the repository.

Note that the entries you are submitting must be single sentences. Any entry containing multiple sentences will be rejected automatically.
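Before submitting a batch, it may be worth flagging entries that look like they contain more than one sentence, since those will be rejected. The following Python sketch uses a deliberately naive heuristic (sentence-ending punctuation, optionally followed by a closing quote or bracket, then whitespace and more text). This heuristic is an assumption for illustration only and does not reproduce the rule Microsoft applies: it can miss some multi-sentence entries and will flag abbreviations such as "Dr. Smith". The function names are hypothetical.

```python
import re

# Naive heuristic, not Microsoft's actual rule: '.', '!' or '?', then
# optional closing quotes/brackets, then whitespace, then more text.
_SENTENCE_BREAK = re.compile(r"[.!?][\"')\]]*\s+\S")


def looks_multi_sentence(text):
    """Return True if text appears to contain more than one sentence."""
    return bool(_SENTENCE_BREAK.search(text.strip()))


def split_submittable(pairs):
    """Split (source, target) pairs into (ok, suspicious) lists."""
    ok, suspicious = [], []
    for source, target in pairs:
        if looks_multi_sentence(source) or looks_multi_sentence(target):
            suspicious.append((source, target))
        else:
            ok.append((source, target))
    return ok, suspicious
```

Entries in the suspicious list can then be reviewed by hand or split into single sentences before submission.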

In Okapi you can use the feature through:

  • Tikal (to enter one translation at a time),
  • or the Microsoft Batch Submission Step (to provide a batch of aligned sentences from a TMX file or any other bilingual format supported by the framework).

The entries you add to the system can be accessed immediately. They are ranked higher than the default MT-generated entry only if they have been submitted with a rating greater than 5 (which is the default for MT-generated results).

Warning: Be extremely cautious when using this feature, as there is no way to remove a translation once it has been added to Microsoft Translator. You can only resubmit the same translation with a low rating to push it down the list of query results.

Manual Additions

Tikal lets you add translations to Microsoft Translator using the -a command:

tikal -a "This is my test" "C'est mon essai" -sl en -tl fr -ms config.cfg

This adds "C'est mon essai" as a French translation of the English text "This is my test". You can verify this by querying it:

tikal -q "This is my test" -sl en -tl fr -ms config.cfg -opt 70:5

This should give you something like:

= From Microsoft-Translator (en->fr)
  Threshold=70, Maximum hits=5
score: 96, origin: 'Microsoft-Translator'
  Source: "This is my test"
  Target: "C'est mon essai"
score: 95, origin: 'Microsoft-Translator'
  Source: "This is my test"
  Target: "Il s'agit de mon test"
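If you call Tikal from a script, you may want to turn this output into structured data. The following Python sketch parses text laid out like the example above into (score, origin, source, target) records. It relies only on the layout shown here, which is an assumption: the exact output format may differ between Okapi releases. The names `Hit` and `parse_hits` are hypothetical.

```python
import re
from typing import NamedTuple

# Patterns based only on the sample output shown above.
_SCORE = re.compile(r"^score:\s*(\d+),\s*origin:\s*'(.*)'$")
_FIELD = re.compile(r'^(Source|Target):\s*"(.*)"$')


class Hit(NamedTuple):
    score: int
    origin: str
    source: str
    target: str


def parse_hits(output):
    """Parse Tikal query output shaped like the example above."""
    hits = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = _SCORE.match(line)
        if match:
            # A 'score:' line starts a new hit.
            current = {"score": int(match.group(1)), "origin": match.group(2)}
            continue
        match = _FIELD.match(line)
        if match and current is not None:
            current[match.group(1).lower()] = match.group(2)
            # A hit is complete once both Source and Target were seen.
            if "source" in current and "target" in current:
                hits.append(Hit(**current))
                current = None
    return hits
```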

Microsoft Translator results come with two values:

  • The MatchDegree is a value between 0 and 100 indicating how close the source of the result is to the source of the query.
  • The Rating is a value between -10 and 10 indicating how good or bad the translation is. The lower the value, the worse the translation. This value is not always present and its default is 5.

The Okapi connector currently has only one score to carry both pieces of information. So, for any MatchDegree above 90, we add the Rating minus 10 to it. For example, a normal MT result has a MatchDegree of 100 and a Rating of 5, so its score is 95: 100+(5-10). An exact match rated 6 (better than 5) scores 96: 100+(6-10), and so on. For results below 90, the Rating is not taken into account.
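The combination rule above can be written as a small function. This Python sketch mirrors the description: the Rating (5 when absent) adjusts the score only when the MatchDegree is above 90. The text does not say what happens at exactly 90; the sketch assumes no adjustment there. The function name `combined_score` is hypothetical and this is not the connector's source code.

```python
def combined_score(match_degree, rating=5):
    """Combine MatchDegree (0-100) and Rating (-10 to 10) into one score.

    For a MatchDegree above 90, (rating - 10) is added; otherwise the
    MatchDegree is used as-is (assumed to include exactly 90).
    The Rating defaults to 5 when it is absent.
    """
    if match_degree > 90:
        return match_degree + (rating - 10)
    return match_degree
```

This also shows why resubmitting a translation with a very low rating pushes it down: an exact match rated -10 scores 100+(-10-10) = 80, below the default threshold of 95.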

With the Microsoft Batch Submission Step

The Microsoft Batch Submission Step takes advantage of the AddTranslationArray method of the API and allows you to submit human or post-edited translations to Microsoft Translator's repository.

For example, to submit the segments of a TMX file for which Okapi has a filter, you can use the following pipeline:

= Raw Document to Filter Events Step
+ Microsoft Batch Submission Step

(See the article "How to Create a Pipeline in Rainbow" to learn about pipelines)

See the video "Importing TMX File into Microsoft Translator Engine" for a short demonstration of how to use such a pipeline to feed a TMX file into Microsoft Translator.