Trying out the Microsoft Translator Connector

From Okapi Framework
Revision as of 20:19, 15 August 2019 by Kuro2 (talk | contribs) (Major update)

Warning: This page is being updated and is not yet fully accurate. (2019-08-15)

Overview

The Microsoft Translator Connector is an Okapi component that connects to the Microsoft Translator Text Service (referred to as the Translator Service hereafter), which is part of Microsoft Cognitive Services.

This wiki page explains how to try out the Translator Service using the Tikal command line utility.

Retirement of version 2 API

Microsoft retired its version 2 API on 2019-04-30, as described in this page. As a result, the Microsoft Connector found in the latest stable release, M37, has not worked since 2019-05-01.

Support for the version 3 API was added to Okapi in mid-April 2019, after the M37 release. To use Microsoft's machine translation service, please pick up the M38 snapshot version from here.

The rest of this page assumes that you are using an M38 snapshot built after mid-April 2019, the M38 stable release (which has not been released as of this writing in mid-August 2019), or a later version.


Obtaining Azure Key

To use the Microsoft Translator Connector, you need an Azure Key. If you already have a key for the version 2 API, the same key should work. Otherwise, please read this page. Microsoft issues a key free of charge with certain limitations, which is enough to try out the connector as described on this page.

Searching Translations

Manual Queries

Tikal provides a way to try out the connector easily.

First, create a configuration file that looks like this:

#v1
azureKey=your-azure-key
baseURL=the-base-url

You can create this file with any text editor. Here, your-azure-key is the Azure Key that you obtained from Microsoft, and the-base-url is one of the URLs listed in the Base URLs section of the API Reference.

For example (warning: the Azure Key here is not valid):

#v1
azureKey=4f4cfe47becf471a0123456789abcdef
baseURL=https://api-nam.cognitive.microsofttranslator.com

We assume you have saved this file as config.cfg.
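If you would rather not type the key into the file by hand, the configuration file can also be generated from a script. The following is a minimal sketch, not part of Okapi: the function name write_config and the environment variable AZURE_KEY are hypothetical names chosen for this example, and the file layout is simply the three lines shown above.

```shell
#!/bin/sh
# write_config KEY BASE_URL [FILE]
# Writes a connector configuration file in the "#v1" layout shown above.
# write_config is a hypothetical helper for this sketch, not an Okapi tool.
write_config() {
    key=$1
    base_url=$2
    out=${3:-config.cfg}
    # Refuse to write an unusable file when no key was given.
    if [ -z "$key" ]; then
        echo "error: no Azure Key given" >&2
        return 1
    fi
    printf '#v1\nazureKey=%s\nbaseURL=%s\n' "$key" "$base_url" > "$out"
    # The file contains a secret, so make it readable by the current user only.
    chmod 600 "$out"
}

# Example call (AZURE_KEY is assumed to be set in your environment):
# write_config "$AZURE_KEY" "https://api-nam.cognitive.microsofttranslator.com"
```

Keeping the key in an environment variable means the script itself can be shared without exposing the key.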

Now you can use the connector with Tikal. Try for instance:

tikal.sh -q "This is a test" -sl en -tl fr -ms config.cfg

(On a Windows system, type "tikal" instead of "tikal.sh".)

(On a Linux/Unix/macOS system where PATH does not include ".", type "./tikal.sh" instead.)

This command line uses the following parameters:

  • -q "This is a test" indicates that we want to search for a translation (i.e. do a query) and that the source text to search for is "This is a test".
  • -sl en indicates that the source language is English.
  • -tl fr indicates that the target language is French.
  • -ms config.cfg tells Tikal to use the Microsoft Translator Connector, with config.cfg as the connector's configuration.

This should give you back something like:


= From net.sf.okapi.connectors.microsoft.MicrosoftMTConnector (en->fr)
  Threshold=-10, Maximum hits=1
  Engine: 'general'
score: 95, origin: 'Microsoft-Translator' (from MT)
  Source: "This is a test"
  Target: "C'est un test"
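
If you only need the translated text, for example to feed it to another script, the Target line can be filtered out of this report. The sketch below assumes the output layout shown above (a line of the form Target: "..."); the layout may differ between Okapi versions, and extract_target is a hypothetical helper name, not a Tikal feature.

```shell
#!/bin/sh
# extract_target reads a Tikal query report on standard input and prints
# the text between the quotes of every line of the form:   Target: "..."
# extract_target is a hypothetical helper for this sketch.
extract_target() {
    sed -n 's/^[[:space:]]*Target: "\(.*\)"$/\1/p'
}

# Example call:
# tikal.sh -q "This is a test" -sl en -tl fr -ms config.cfg | extract_target
```

Lines that do not match the pattern are dropped, so an empty result means either that no translation came back or that the report layout has changed.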


With the Leveraging Step

The connector is available in the Leveraging Step, so you can use it in any pipeline you need.

You can also use Tikal's Translate Files command to directly process any file supported by Okapi. For example, the following command creates an output file, myFile.out.docx, translated into Japanese, provided that the file is small enough to be processed within the limitations of your license.

tikal.sh -t myFile.docx -sl en -tl ja -ms config.cfg
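
To translate several files in one go, the same command can be wrapped in a loop. This is a minimal sketch under a few assumptions: translate_files and the TIKAL variable are hypothetical names introduced here, the source language is fixed to English as in the example above, and config.cfg is in the current directory.

```shell
#!/bin/sh
# TIKAL names the launcher to run; override it if needed,
# e.g. TIKAL=./tikal.sh when "." is not in PATH.
TIKAL=${TIKAL:-tikal.sh}

# translate_files TARGET_LANG FILE...
# Runs the Translate Files command shown above once per file.
# translate_files is a hypothetical helper for this sketch.
translate_files() {
    tl=$1
    shift
    for f in "$@"; do
        # Skip arguments that are not existing regular files.
        if [ ! -f "$f" ]; then
            echo "skipping $f: not a file" >&2
            continue
        fi
        "$TIKAL" -t "$f" -sl en -tl "$tl" -ms config.cfg
    done
}

# Example call:
# translate_files ja myFile.docx otherFile.docx
```

Each file is sent separately, so the limitations of your license apply to every file in the loop, and to the total amount of text translated.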


With the Microsoft Batch Translation Step

[Figure: Microsoft Batch Translation Step (Windows 7)]

The Microsoft Batch Translation Step can also be used to generate the target text using the Translator Service.

For example, to translate any document for which Okapi has a filter you can use the following pipeline:

= Raw Document to Filter Events Step
+ Microsoft Batch Translation Step
+ Filter Events to Raw Document Step


The Microsoft Batch Translation Step is preferred over the Leveraging Step because it sends many pieces of text (paragraphs) in one batch, which is more efficient. However, a batch may contain more pieces of text, or larger ones, than the Translator Service's limits allow. If that happens, a possible workaround is to use the Leveraging Step instead.

Obsolete Features

The following features are no longer supported because the Translator Service no longer provides the functionality they relied on:

  • The built-in translation memory feature, which the Translator Service no longer has.
  • The Microsoft Batch Submission Step.
  • The threshold and the maximum number of hits. They could be specified with the -opt command line flag for Tikal or in the Microsoft Batch Translation Step UI, but they now have no effect.