Trying out the Microsoft Translator Connector
Retirement of version 2 API
The MICROSOFT CONNECTOR in the Okapi stable releases will STOP WORKING at the end of April 2019.
Microsoft will retire their version 2 API on 2019-04-30 as described in this page. Because of this, the Microsoft Connector found in the latest stable release, M37, will no longer work from 2019-05-01 onward.
Support for the version 3 API was added to Okapi in mid-April, after the M37 release. If you need to use Microsoft's machine translation service, please pick up the M38 snapshot version from here. Please note that this is a minimal implementation and does not support any of the new features, such as profanity filtering.
Because the version 3 API no longer supports translation memory, that functionality is not available even if you use the latest Okapi M38 snapshot version.
You will need an "Azure key" to use the version 3 API. If you already have a key for version 2, the same key should work. For information on how to obtain an Azure key, please see this page.
The information below is mostly out of date. It is kept for reference until this page is fully updated.
The Microsoft Translator Connector allows you to access the Microsoft Translator system through its API.
You must have a "Client ID" and a "Client Secret" from Microsoft to use it. You get those by obtaining a Windows Live ID and then registering an application in your Live account. See the MSDN pages for more information.
Note that for commercial or high-volume usage you must have a license with Microsoft. The API restricts throughput for callers without a license. Those restrictions may change without notice and vary with service utilization, to ensure fairness. More information can be found on the Microsoft Translator forums.
Tikal provides a way to try out the connector easily.
First you need to create a configuration file that has your credentials. You can create the file with a simple text editor; it should look as follows:
To use the connector with an AppID (obsolete):
To use the connector with a Client ID/Secret:
#v1
clientId=myPersonalClientID
secret=theSecretForThatClientID
Name the file, for example, config.cfg.
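If you prefer to generate the file from a script, here is a minimal Python sketch. The function names `build_connector_config` and `write_connector_config` are hypothetical and not part of Okapi. The sketch assumes that the `#v1` header and each `key=value` pair sit on their own line.

```python
from pathlib import Path


def build_connector_config(client_id: str, secret: str) -> str:
    """Return the text of a connector configuration file.

    Assumption: a '#v1' header line followed by one key=value pair per line.
    """
    return f"#v1\nclientId={client_id}\nsecret={secret}\n"


def write_connector_config(path: str, client_id: str, secret: str) -> Path:
    """Write the configuration text to `path` and return the path written."""
    target = Path(path)
    target.write_text(build_connector_config(client_id, secret), encoding="utf-8")
    return target
```

For instance, `write_connector_config("config.cfg", "myPersonalClientID", "theSecretForThatClientID")` would create the file used in the commands on this page, with the placeholder credentials replaced by your own.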
Now you can use the connector with Tikal. Try for instance:
tikal -q "This is a test" -sl en -tl fr -ms config.cfg
This command line uses the following parameters:
-q "This is a test" indicates that we want to search for a translation (i.e. do a query) and that the source text to search for is "This is a test".
-sl en indicates that the source language is English.
-tl fr indicates that the target language is French.
-ms config.cfg specifies to use the Microsoft Translator Connector and to use config.cfg for the connector's configuration.
This should give you back something like:
= From Microsoft-Translator (en->fr)
Threshold=95, Maximum hits=1
score: 95, origin: 'Microsoft-Translator'
Source: "This is a test"
Target: "Il s'agit d'un test."
By default the query is done with a threshold of 95. The threshold is the value under which the matches (or hits) are not retained. The default maximum number of hits displayed is 1.
You can change those options with the parameter -opt. For example:
tikal -q "This is a test" -sl en -tl fr -ms config.cfg -opt 70:5
This will set the threshold to 70 and the maximum number of hits to 5.
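To make the effect of the two values concrete, here is a small Python sketch of how a threshold and a maximum number of hits could be applied to a list of results. It is an illustration only, not Tikal's actual implementation: the names `Hit`, `parse_opt` and `select_hits` are hypothetical, and the sketch assumes that a hit scoring exactly at the threshold is retained and that the best hits are listed first.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    score: int
    target: str


def parse_opt(value: str) -> Tuple[int, int]:
    """Split a 'threshold:maxhits' string such as '70:5' into two integers."""
    threshold_text, max_hits_text = value.split(":")
    return int(threshold_text), int(max_hits_text)


def select_hits(hits: List[Hit], threshold: int = 95, max_hits: int = 1) -> List[Hit]:
    """Drop hits scoring under the threshold, then keep the best max_hits."""
    retained = [hit for hit in hits if hit.score >= threshold]
    retained.sort(key=lambda hit: hit.score, reverse=True)
    return retained[:max_hits]


hits = [Hit(95, "Il s'agit de mon test"), Hit(96, "C'est mon essai")]
default_selection = select_hits(hits)                      # threshold 95, 1 hit
wider_selection = select_hits(hits, *parse_opt("70:5"))    # threshold 70, 5 hits
```

With the defaults only the hit scored 96 is kept; with `70:5` both hits are kept, the higher score first.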
With the Leveraging Step
The connector is available in the Leveraging Step, so you can use it in any pipeline you need.
You can also use Tikal's Translate Files command to directly process any file supported by Okapi. For example, the following command creates an output file myFile.out.docx translated into Japanese, provided the file is small enough to be processed within the limitations of the API for non-licensed users.
tikal -t myFile.docx -sl en -tl ja -ms config.cfg
Both options use the GetTranslations method of the API, which works segment by segment and may therefore be slower.
With the Microsoft Batch Translation Step
The Microsoft Batch Translation Step takes advantage of the GetTranslationsArray method of the API and allows you to process your input much faster.
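The gain comes from sending several segments per request instead of one. The Python sketch below only illustrates that idea and does not call the service. The helper names `batches` and `request_count` are hypothetical, and the batch size of 10 is an arbitrary value chosen for the example, not a documented limit of the API.

```python
from typing import Iterator, List, Sequence


def batches(segments: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size segments."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(segments), batch_size):
        yield list(segments[start:start + batch_size])


def request_count(segment_total: int, batch_size: int) -> int:
    """Number of requests needed to send all segments (ceiling division)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return -(-segment_total // batch_size)


segments = [f"segment {i}" for i in range(25)]
one_by_one = request_count(len(segments), 1)    # segment-by-segment calls
batched = request_count(len(segments), 10)      # hypothetical batch size of 10
```

For 25 segments, the segment-by-segment approach needs 25 requests, while batches of 10 need only 3.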
For example, to translate any document for which Okapi has a filter you can use the following pipeline:
- = Raw Document to Filter Events Step
- + Microsoft Batch Translation Step
- + Filter Events to Raw Document Step
(See the article "How to Create a Pipeline in Rainbow" to learn about pipelines)
The step can perform several actions:
- Annotate the text units with the matches found.
- Copy the best translation into the target.
- Generate a TMX document.
As always, this step is subject to the limitations of the service.
If you set the Maximum matches value to more than 1, you may get several results: the MT-generated translation as well as one or more translations added to the repository. Use the Threshold value to filter out matches below a given score.
One interesting aspect of Microsoft Translator is that anyone can contribute to the translation. This is done using Microsoft's Collaborative Translation Framework, which provides the necessary API to add translations to the repository.
Note that the entries you are submitting must be single sentences. Any entry containing multiple sentences will be rejected automatically.
In Okapi you can use the feature through:
- Tikal (to enter one translation at a time),
- or the Microsoft Batch Submission Step (to provide a batch of aligned sentences from a TMX file or any other bilingual format supported by the framework).
The entries you add to the system can be accessed immediately. They are ranked higher than the default MT-generated entry only if they have been submitted with a rating value greater than 5 (which is the default for MT-generated results).
Tikal lets you add a translation to Microsoft Translator using the -a command:
tikal -a "This is my test" "C'est mon essai" -sl en -tl fr -ms config.cfg
This will add the French "C'est mon essai" as a translation of the English text "This is my test". You can verify this by querying it:
tikal -q "This is my test" -sl en -tl fr -ms config.cfg -opt 70:5
This should give you something like:
= From Microsoft-Translator (en->fr)
Threshold=70, Maximum hits=5
score: 96, origin: 'Microsoft-Translator'
Source: "This is my test"
Target: "C'est mon essai"
score: 95, origin: 'Microsoft-Translator'
Source: "This is my test"
Target: "Il s'agit de mon test"
Microsoft Translator results come back with two possible values:
MatchDegree is a value between 0 and 100 indicating how close the source of the result is to the source of the query.
Rating is a value between -10 and 10 indicating how good or bad the translation is. The lower the value, the worse the translation. This value is not always present and its default is 5.
The Okapi connector currently has only one score to carry both pieces of information. So, for any MatchDegree above 90, we add the Rating minus 10. For example, a normal MT result will have a MatchDegree of 100 and a Rating of 5. Therefore its score is 95: 100+(5-10). An exact match rated at 6 (so better than 5) will be 96: 100+(6-10), etc. For results below 90, the Rating is not taken into account.
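The rule can be written down in a few lines. This Python sketch restates the arithmetic described above and is not the connector's actual source code. The function name `combined_score` is hypothetical, and since the text does not say how a MatchDegree of exactly 90 is handled, the sketch assumes the Rating is ignored in that case.

```python
from typing import Optional

DEFAULT_RATING = 5  # default stated in the text when no Rating is returned


def combined_score(match_degree: int, rating: Optional[int] = None) -> int:
    """Fold MatchDegree (0..100) and Rating (-10..10) into one score."""
    if rating is None:
        rating = DEFAULT_RATING
    if match_degree > 90:
        # Above 90, the Rating shifts the score: MatchDegree + (Rating - 10).
        return match_degree + (rating - 10)
    # Assumption: at 90 or below, the Rating is not taken into account.
    return match_degree


mt_result = combined_score(100, 5)      # normal MT result
rated_match = combined_score(100, 6)    # exact match rated at 6
fuzzy_match = combined_score(85, 10)    # Rating ignored for a low MatchDegree
```

The three calls reproduce the cases above: 95 for a normal MT result, 96 for an exact match rated at 6, and 85 for a match whose MatchDegree is too low for the Rating to count.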
With the Microsoft Batch Submission Step
The Microsoft Batch Submission Step takes advantage of the AddTranslationArray method of the API and allows you to submit human or post-edited translations to Microsoft Translator's repository.
For example, to submit the segments of a TMX file for which Okapi has a filter, you can use the following pipeline:
See the article "How to Create a Pipeline in Rainbow" to learn about pipelines
See the video "Importing TMX File into Microsoft Translator Engine" for a short demonstration of how to use such a pipeline to feed a TMX file into Microsoft Translator.