Difference between revisions of "Trying out the Microsoft Translator Connector"

From Okapi Framework
(API v3 update)
(Major update)
 
__TOC__
 
==Retirement of version 2 API==
 
<span class="red">MICROSOFT CONNECTOR of the Okapi stable releases will STOP WORKING</span> at the end of April, 2019.
 
  
Microsoft will retire their version 2 API on 2019-4-30 as described in [https://docs.microsoft.com/en-us/azure/cognitive-services/translator/migrate-to-v3 this page].  
+
Warning: This page is being updated and not fully accurate. (2019-8-15)
Because of this, the Microsoft Connector found in the latest stable release, M37, will no longer work on and after 2019-5-01.
 
  
Support for the version 3 API was added to Okapi in mid-April, after the M37 release. If you need to use Microsoft's machine translation service, please pick up the M38 snapshot version from [http://okapiframework.org/snapshots/ here].
+
==Overview==
Please note this is a minimal implementation and it does not support any new features such as profanity filtering,
+
The [[Microsoft Translator Connector]] is an Okapi component that connects to [https://docs.microsoft.com/en-us/azure/cognitive-services/translator/ Microsoft Translator Text Service] (referred to as '''Translator Service''' hereafter), which is part of the Microsoft Cognitive Services.
  
Because the version 3 API no longer supports the translation memory, that function is not available even if you use the latest Okapi M38 snapshot version.
+
This wiki page explains how to try out the Translator Service using the Tikal command line utility.
  
You will need an "azure key" to use the version 3 API. If you already have a key for version 2, the same key should work.
+
==Retirement of version 2 API==
For information on how to obtain an azure key, please see [https://azure.microsoft.com/en-us/pricing/details/cognitive-services/ this page].  
+
Microsoft retired its version 2 API on 2019-4-30, as described in [https://docs.microsoft.com/en-us/azure/cognitive-services/translator/migrate-to-v3 this page].
 +
Because of this, the Microsoft Connector found in the latest stable release, M37, no longer works on and after 2019-5-01.
  
'''Information below is mostly out of date. It is kept as reference until full update of this page is done.'''
+
Support for the version 3 API was added to Okapi in mid-April, after the M37 release. To use Microsoft's machine translation service, please pick up the M38 snapshot version from [http://okapiframework.org/snapshots/ here].  
<hr/>
 
The [[Microsoft Translator Connector]] allows you to access the Microsoft Translator system through its API.
 
  
You must have a "Client ID" and a "Client Secret" from Microsoft to use it. You get those by obtaining a Windows Live ID and then registering an application in your Live account. See [http://msdn.microsoft.com/en-us/library/hh454950.aspx the MSDN pages] for more information.
+
The rest of this page assumes that you are using the M38 snapshot version built after mid April, 2019, the M38 stable release (which has not been released as of this writing in mid August, 2019), or later.
  
Note that for commercial or high volume usage you must have a license with Microsoft. The API has some restrictions in its throughput for callers without license. Those restrictions may change without notice and vary based on service utilization, trying to ensure fairness. More information can be found on the [http://social.msdn.microsoft.com/Forums/en-US/category/translation Microsoft Translator forums].
 
  
{{NoteBox|You need to have a release between M13 and M17 to try out the batch translation and the submission features using the Microsoft AppID authentication.<br/>Starting at M18 the library uses the Client ID/Secret authentication.}}
+
==Obtaining Azure Key==
 +
To use the Microsoft Translator Connector, you need an Azure Key.
 +
If you already have a key for the version 2 API, the same key should work.
 +
Otherwise, please read [https://azure.microsoft.com/en-us/pricing/details/cognitive-services/ this page].
 +
Microsoft issues a key free of charge with certain limitations, which is enough to try out the connector as described in this page.
  
 
== Searching Translations ==
 
 
[[Tikal]] provides a way to try out the connector easily.
 
  
First you need to create a configuration file that has your credentials. You can create the file with a simple text editor; it should look as follows:
+
First you need to create a configuration file that looks like:
 
 
To use the connector with an AppID (obsolete):  
 
  
 
  #v1
 
  appId=yourAppID
+
  azureKey=your-azure-key
 +
baseURL=the-base-url
  
To use the connector with a Client ID/Secret:
+
using a text editor. 
 +
Here ''your-azure-key'' is the Azure Key that was obtained from Microsoft.
 +
''the-base-url'' is one of the URLs listed in the [https://docs.microsoft.com/en-us/azure/cognitive-services/translator/reference/v3-0-reference#base-urls Base URLs] section of the API Reference.
  
 +
For example (warning: the Azure Key here is not valid):
 
  #v1
 
  clientId=myPersonalClientID
+
  azureKey=4f4cfe47becf471a0123456789abcdef
  secret=theSecretForThatClientID
+
  baseURL=https://api-nam.cognitive.microsofttranslator.com
  
Name the file, for example, <code>config.cfg</code>.
+
We assume you have saved this file as <code>config.cfg</code>.
  
 
Now you can use the connector with Tikal. Try for instance:
 
  
  tikal -q "This is a test" -sl en -tl fr -ms config.cfg
+
  tikal.sh -q "This is a test" -sl en -tl fr -ms config.cfg
 +
 
 +
(On a Windows system, type "tikal" instead of "tikal.sh".)
 +
 
 +
(On a Linux/Unix/macOS system where PATH doesn't include ".", type "./tikal.sh" instead.)
  
 
This command line uses the following parameters:
 
 
This should give you back something like:
 
  
  = From Microsoft-Translator (en->fr)
+
 
   Threshold=95, Maximum hits=1
+
  = From net.sf.okapi.connectors.microsoft.MicrosoftMTConnector (en->fr)
  score: 95, origin: 'Microsoft-Translator'
+
   Threshold=-10, Maximum hits=1
 +
  Engine: 'general'
 +
  score: 95, origin: 'Microsoft-Translator' (from MT)
 
   Source: "This is a test"
 
   Target: "Il s'agit d'un test."
+
   Target: "C'est un test"
 
 
By default the query is done with a threshold of 95. The threshold is the value under which the matches (or hits) are not retained. The default maximum number of hits displayed is 1.
 
 
 
You can change those options with the parameter <code>-opt</code>. For example:
 
  
tikal -q "This is a test" -sl en -tl fr -ms config.cfg -opt 70:5
 
 
This will set the threshold to 70 and the maximum number of hits to 5.
 
  
 
=== With the [[Leveraging Step]] ===
 
 
The connector is available in the [[Leveraging Step]], so you can use it on any pipeline you need.
 
  
You can also use Tikal's [[Tikal - Translation Commands#Translate Files|Translate Files]] command to directly process any file supported by Okapi. For example, the following command creates an output file <code>myFile.out.docx</code> translated into Japanese, provided the file is small enough to be processed within the limitations of the API for non-licensed users.
+
You can also use Tikal's [[Tikal - Translation Commands#Translate Files|Translate Files]] command to directly process any file supported by Okapi. For example, the following command creates an output file <code>myFile.out.docx</code> translated into Japanese, provided the file is small enough to be processed within the limits of your license.
  
  tikal -t myFile.docx -sl en -tl ja -ms config.cfg
+
  tikal.sh -t myFile.docx -sl en -tl ja -ms config.cfg
  
Both options use the <code>GetTranslations</code> method of the API, which works segment by segment and may therefore be slower.
 
  
 
=== With the [[Microsoft Batch Translation Step]] ===
 
  
 
[[Image:MSBatchTranslation.png|thumb|600px|Microsoft Batch Translation Step (Windows&nbsp;7)]]
 
The [[Microsoft Batch Translation Step]] takes advantage of the <code>GetTranslationsArray</code> method of the API and allows you to process your input much faster.
+
The [[Microsoft Batch Translation Step]] can also be used to generate the target text using the Translator Service.
  
 
For example, to translate any document for which Okapi has a filter you can use the following pipeline:
 
 
: + [[Filter Events to Raw Document Step]]
 
  
(See the article "[[How to Create a Pipeline in Rainbow]]" to learn about pipelines)
 
 
The step can perform several actions:
 
 
* Annotate the text units with the matches found.
 
* Copy the best translation in the target
 
* Generate a [[TMX]] document
 
 
As always, this step is subject to the limitations of the service.
 
 
If you set the <cite>Maximum matches</cite> value to more than 1, you may get several results: The MT-generated translation as well as one or more translations added to the repository. Use the <cite>Threshold</cite> value to filter out matches below a given score.
 
 
== Adding Translations ==
 
 
One interesting aspect of the Microsoft Translator is that anyone can contribute to the translation. This is done using Microsoft's [http://blogs.msdn.com/b/translation/archive/2010/03/15/collaborative-translations-announcing-the-next-version-of-microsoft-translator-technology-v2-apis-and-widget.aspx Collaborative Translation Framework] which provides the necessary API to add translations to the repository.
 
 
Note that the entries you are submitting must be single sentences. Any entry containing multiple sentences will be rejected automatically.
 
 
In Okapi you can use the feature through:
 
* Tikal (to enter one translation at a time),
 
* or the [[Microsoft Batch Submission Step]] (to provide a batch of aligned sentences from a [[TMX]] file or any other bilingual format supported by the framework).
 
 
The entries you add to the system can be accessed immediately. They are ranked higher than the default MT-generated entry only if they have been submitted with a rating value greater than 5 (which is the default for MT-generated results).
 
 
{{WarningBox|Be '''extremely cautious''' when using this feature as '''you have no way to remove a translation once it has been added''' to Microsoft Translator. You can only re-submit the same translation with a low rating to push it down the list of query results.}}
 
 
=== Manual Additions ===
 
 
Tikal lets you add translations to Microsoft Translator using the [[Tikal - Translation Commands#Add Translation to a Resource|<code>-a</code> command]]:
 
 
tikal -a "This is my test" "C'est mon essai" -sl en -tl fr -ms config.cfg
 
 
This will add the French "C'est mon essai" as a translation of the English text "This is my test". You can verify this by querying it:
 
 
tikal -q "This is my test" -sl en -tl fr -ms config.cfg -opt 70:5
 
 
This should give you something like:
 
 
= From Microsoft-Translator (en->fr)
 
  Threshold=70, Maximum hits=5
 
score: 96, origin: 'Microsoft-Translator'
 
  Source: "This is my test"
 
  Target: "C'est mon essai"
 
score: 95, origin: 'Microsoft-Translator'
 
  Source: "This is my test"
 
  Target: "Il s'agit de mon test"
 
 
Microsoft Translator results come back with two possible values:
 
 
* The <code>MatchDegree</code> is a value between 0 and 100 indicating how close the source of the result is to the source of the query.
 
* The <code>Rating</code> is a value between -10 and 10 indicating how good or bad the translation is. The lower the value, the worse the translation. This value is not always present; its default is 5.
 
 
The Okapi connector currently has only one score to carry both pieces of information. So, for any <code>MatchDegree</code> above 90, we add the <code>Rating</code> minus 10. For example, a normal MT result has a <code>MatchDegree</code> of 100 and a <code>Rating</code> of 5, so its score is 95: 100+(5-10). An exact match rated at 6 (so better than 5) scores 96: 100+(6-10), etc. For results below 90, the <code>Rating</code> is not taken into account.
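The combined score described above can be written as a small function. This is a minimal Python sketch of the rule as stated on this page (for the version 2 era connector), not the connector's actual code. The name <code>okapi_score</code> is hypothetical, and because the text only covers values "above 90" and "below 90", leaving a <code>MatchDegree</code> of exactly 90 unadjusted is an assumption.

```python
def okapi_score(match_degree: int, rating: int = 5) -> int:
    """Combine MatchDegree (0 to 100) and Rating (-10 to 10) into one score.

    Hypothetical helper following the rule described above: for a
    MatchDegree above 90, add (Rating - 10); otherwise ignore the Rating.
    A MatchDegree of exactly 90 is left unadjusted (an assumption).
    """
    if match_degree > 90:
        return match_degree + (rating - 10)
    return match_degree


if __name__ == "__main__":
    print(okapi_score(100))     # normal MT result, default rating 5: prints 95
    print(okapi_score(100, 6))  # exact match rated 6: prints 96
```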
 
 
=== With the [[Microsoft Batch Submission Step]] ===
 
 
The [[Microsoft Batch Submission Step]] takes advantage of the <code>AddTranslationArray</code> method of the API and allows you to submit human or post-edited translations to Microsoft Translator's repository.
 
 
For example, to submit the segments of a TMX file for which Okapi has a filter, you can use the following pipeline:
 
 
: = [[Raw Document to Filter Events Step]]
 
: + [[Microsoft Batch Submission Step]]
 
  
See the article "[[How to Create a Pipeline in Rainbow]]" to learn about pipelines.
+
The Microsoft Batch Translation Step is preferred over the [[Leveraging Step]] because it sends many pieces of text (paragraphs) in one batch, which is more efficient. However, a batch may then contain more text, or larger pieces of text, than the Translator Service's limits allow. If that happens, a possible workaround is to use the Leveraging Step instead.
  
See the video "[http://youtu.be/mAjwczqfvAA Importing TMX File into Microsoft Translator Engine]" for a short demonstration of how to use such a pipeline to feed a TMX file into Microsoft Translator.
+
==Obsolete Features==
 +
The following features are no longer supported because the Translator Service no longer supports the underlying features:
 +
* The Translator Service no longer has a built-in translation memory feature.
 +
* [[Microsoft Batch Submission Step]]
 +
* The threshold and the maximum number of hits that could be specified with the <code>-opt</code> command line flag for Tikal or in the Microsoft Batch Translation Step UI have no effect.
  
 
[[Category:Connectors]] [[Category:Tikal]]
 

Revision as of 20:19, 15 August 2019

Warning: This page is being updated and not fully accurate. (2019-8-15)

Overview

The Microsoft Translator Connector is an Okapi component that connects to Microsoft Translator Text Service (referred to as Translator Service hereafter), which is part of the Microsoft Cognitive Services.

This wiki page explains how to try out the Translator Service using the Tikal command line utility.

Retirement of version 2 API

Microsoft retired its version 2 API on 2019-04-30, as described on this page. Because of this, the Microsoft Connector found in the latest stable release, M37, no longer works as of 2019-05-01.

Support for the version 3 API was added to Okapi in mid-April 2019, after the M37 release. To use Microsoft's machine translation service, please pick up the M38 snapshot version from here.

The rest of this page assumes that you are using an M38 snapshot built after mid-April 2019, the M38 stable release (not yet released as of this writing in mid-August 2019), or later.


Obtaining Azure Key

To use the Microsoft Translator Connector, you need an Azure Key. If you already have a key for the version 2 API, the same key should work. Otherwise, please read this page. Microsoft issues a free key with certain limitations, which is enough to try out the connector as described in this wiki page.

Searching Translations

Manual Queries

Tikal provides a way to try out the connector easily.

First, use a text editor to create a configuration file that looks like this:

#v1
azureKey=your-azure-key
baseURL=the-base-url

Here your-azure-key is the Azure Key you obtained from Microsoft, and the-base-url is one of the URLs listed in the Base URLs section of the API Reference.
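The baseURL value is the root address of Microsoft's version 3 API. As an illustration that is independent of Okapi (it is not the connector's code), the following minimal Python sketch builds, but does not send, a version 3 translate request. The /translate path, the api-version, from and to query parameters, the Ocp-Apim-Subscription-Key header and the JSON array body follow Microsoft's v3 reference; the helper names are hypothetical, and some Azure resources may require additional headers.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_translate_request(base_url: str, azure_key: str, texts: List[str],
                            source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a Translator v3 'translate' request."""
    query = urllib.parse.urlencode({"api-version": "3.0", "from": source, "to": target})
    url = f"{base_url.rstrip('/')}/translate?{query}"
    # The v3 body is a JSON array with one {"Text": ...} object per input text.
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": azure_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_translations(response_body: str) -> List[str]:
    """Return the first translation of each input text from a v3 JSON response."""
    return [item["translations"][0]["text"] for item in json.loads(response_body)]
```

To actually send it, you could pass the request to <code>urllib.request.urlopen</code> and give the decoded response body to <code>extract_translations</code>.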

For example (warning: the Azure Key here is not valid):

#v1
azureKey=4f4cfe47becf471a0123456789abcdef
baseURL=https://api-nam.cognitive.microsofttranslator.com

We assume you have saved this file as config.cfg.
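If you prefer to generate this file from a script rather than by hand, here is a minimal Python sketch that writes the three lines shown above and reads them back. The helper names are hypothetical, and <code>parse_config</code> only mirrors the simple key=value layout of this example; it is not Okapi's own parser.

```python
from pathlib import Path
from typing import Dict


def render_config(azure_key: str, base_url: str) -> str:
    """Return the configuration text in the format shown above."""
    return f"#v1\nazureKey={azure_key}\nbaseURL={base_url}\n"


def parse_config(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' lines such as '#v1'."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def write_config(path: str, azure_key: str, base_url: str) -> None:
    """Save the configuration (for example as config.cfg) in UTF-8."""
    Path(path).write_text(render_config(azure_key, base_url), encoding="utf-8")
```

For example, <code>write_config("config.cfg", key, "https://api-nam.cognitive.microsofttranslator.com")</code>, where <code>key</code> comes from wherever you keep your Azure Key (such as an environment variable of your choosing), keeps the key out of your scripts.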

Now you can use the connector with Tikal. Try for instance:

tikal.sh -q "This is a test" -sl en -tl fr -ms config.cfg

(On Windows, type "tikal" instead of "tikal.sh".)

(On Linux/Unix/macOS, if your PATH does not include the current directory ".", type "./tikal.sh" instead.)

This command line uses the following parameters:

  • -q "This is a test" indicates that we want to search for a translation (i.e., do a query) and that the source text to search for is "This is a test".
  • -sl en indicates that the source language is English.
  • -tl fr indicates that the target language is French.
  • -ms config.cfg tells Tikal to use the Microsoft Translator Connector, with config.cfg as the connector's configuration.
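To run the same query from a script, a minimal Python sketch could assemble these parameters as an argument list. It assumes the Windows launcher is named tikal.bat and that on other systems tikal.sh sits in the current directory; the helper names are hypothetical.

```python
import platform
import subprocess
from typing import List, Optional


def tikal_query_command(text: str, source: str, target: str,
                        config: str = "config.cfg",
                        system: Optional[str] = None) -> List[str]:
    """Build the argument list for a Tikal query through the Microsoft connector."""
    system = system or platform.system()
    # Assumptions: the Windows launcher is tikal.bat; elsewhere tikal.sh is in
    # the current directory, which is not on PATH.
    launcher = "tikal.bat" if system == "Windows" else "./tikal.sh"
    return [launcher, "-q", text, "-sl", source, "-tl", target, "-ms", config]


def run_query(text: str, source: str, target: str, config: str = "config.cfg") -> str:
    """Run the query and return Tikal's standard output."""
    command = tikal_query_command(text, source, target, config)
    # Passing a list (no shell) means the query text needs no extra quoting.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

Building a list instead of a single command string avoids shell quoting problems when the query text contains spaces or quotes.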

This should give you back something like:


= From net.sf.okapi.connectors.microsoft.MicrosoftMTConnector (en->fr)
  Threshold=-10, Maximum hits=1
  Engine: 'general'
score: 95, origin: 'Microsoft-Translator' (from MT)
  Source: "This is a test"
  Target: "C'est un test"
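If you capture this output in a script, a minimal Python sketch like the following can pull out each score, origin, source and target. The pattern is inferred from the sample output above and is an assumption, not a documented, stable format; the names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    score: int
    origin: str
    source: str
    target: str


# Pattern inferred from the sample output above; not a documented format.
HIT_PATTERN = re.compile(
    r"score:\s*(?P<score>-?\d+),\s*origin:\s*'(?P<origin>[^']*)'[^\n]*\n"
    r"\s*Source:\s*\"(?P<source>[^\n]*)\"[ \t]*\n"
    r"\s*Target:\s*\"(?P<target>[^\n]*)\""
)


def parse_hits(output: str) -> List[Hit]:
    """Extract every score/origin/source/target group from Tikal's query output."""
    return [
        Hit(int(m["score"]), m["origin"], m["source"], m["target"])
        for m in HIT_PATTERN.finditer(output)
    ]
```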


With the Leveraging Step

The connector is available in the Leveraging Step, so you can use it on any pipeline you need.

You can also use Tikal's Translate Files command to directly process any file supported by Okapi. For example, the following command creates an output file myFile.out.docx translated into Japanese, provided the file is small enough to be processed within the limits of your license.

tikal.sh -t myFile.docx -sl en -tl ja -ms config.cfg
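To translate several files this way, a minimal Python sketch can generate one such command per file along with the output name to expect. The <code>.out</code> naming is inferred from the myFile.docx example above and may not hold for every format; the launcher default and helper name are assumptions.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def translate_commands(files: Iterable[str], source: str, target: str,
                       config: str = "config.cfg",
                       launcher: str = "./tikal.sh") -> Iterator[Tuple[List[str], Path]]:
    """Yield (command, expected output path) pairs for Tikal's -t command."""
    for name in files:
        path = Path(name)
        # Output naming inferred from the example above:
        # myFile.docx -> myFile.out.docx (an assumption for other formats).
        expected = path.with_name(f"{path.stem}.out{path.suffix}")
        command = [launcher, "-t", str(path), "-sl", source, "-tl", target, "-ms", config]
        yield command, expected
```

Each command list can then be passed to <code>subprocess.run(command, check=True)</code>; running files one at a time also makes it easy to see which file, if any, is too large for the limits of your license.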


With the Microsoft Batch Translation Step

Microsoft Batch Translation Step (Windows 7)

The Microsoft Batch Translation Step can also be used to generate the target text using the Translator Service.

For example, to translate any document for which Okapi has a filter you can use the following pipeline:

= Raw Document to Filter Events Step
+ Microsoft Batch Translation Step
+ Filter Events to Raw Document Step


The Microsoft Batch Translation Step is preferred over the Leveraging Step because it sends many pieces of text (paragraphs) in one batch, which is more efficient. However, a batch may then contain more text, or larger pieces of text, than the Translator Service's limits allow. If that happens, a possible workaround is to use the Leveraging Step instead.
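To see why batching can run into the service's limits, here is a minimal Python sketch of greedy batching under two hypothetical per-request limits, <code>max_items</code> and <code>max_chars</code>. It is not the step's implementation, and any limit values you pass are placeholders; check Microsoft's current API documentation for the real ones.

```python
from typing import Iterable, List


def make_batches(paragraphs: Iterable[str], max_items: int, max_chars: int) -> List[List[str]]:
    """Greedily group paragraphs so that no batch exceeds either limit.

    max_items and max_chars are hypothetical placeholders for the service's
    per-request limits on the number of texts and their total length.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in paragraphs:
        if len(text) > max_chars:
            # A single paragraph this long can never fit in one request.
            raise ValueError(f"paragraph of {len(text)} characters exceeds max_chars")
        if current and (len(current) == max_items or size + len(text) > max_chars):
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches
```

A single paragraph longer than the per-request limit cannot be fixed by regrouping, which is one case where sending smaller units, as the Leveraging Step does segment by segment, may help.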

Obsolete Features

The following features are no longer supported because the Translator Service no longer provides the underlying functionality:

  • Translation memory: the Translator Service no longer has a built-in translation memory.
  • The Microsoft Batch Submission Step.
  • The threshold and the maximum number of hits: values set with Tikal's -opt option or in the Microsoft Batch Translation Step UI have no effect.