GIZA++ Installation and Running Tutorial

From Okapi Framework

Overview

This tutorial guides you through installing and running GIZA++. It describes a customized installation that requires a few additional steps compared with the standard installation method. It also shows how to run GIZA++ to produce the Alignment probability file for the QuEst SVM Model Builder Step and QuEst Quality Estimation Step.


Installation

Note: This tutorial has been tested only on 64-bit operating systems.

If you use a Linux or Mac system, just follow the steps below. Windows users must first download and install Cygwin, which emulates a Unix environment on Windows and greatly simplifies the installation of GIZA++. Make sure to select all g++ and gcc related packages while installing Cygwin, so that GIZA++ compiles properly. If Cygwin reports a missing application during the installation of GIZA++, re-install Cygwin and select the missing packages.


Step 1: Download GIZA++. This tutorial was tested with GIZA++ 1.0.7.

Step 2: Untar the package in the folder where you wish to install GIZA++.

Step 3: In the Makefile located at .\giza-pp\GIZA++-v2\, substitute the line:

CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DBINARY_SEARCH_FOR_TTABLE -DWORDINDEX_WITH_4_BYTE

with the line (identical except that -DBINARY_SEARCH_FOR_TTABLE is removed):

CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DWORDINDEX_WITH_4_BYTE

Step 4: In your Linux/Mac terminal or your Cygwin terminal, navigate to the folder .\giza-pp.

Step 5: Run the command make.


If you followed all the steps correctly, GIZA++ should compile without errors.
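Steps 3 to 5 can also be scripted. The following is a minimal sketch, not part of the original tutorial: the function names are hypothetical, the path passed to build_giza is assumed to be the giza-pp folder created in Step 2, and the Makefile edit is done with sed instead of by hand. A backup of the original Makefile is kept as Makefile.bak.

```shell
#!/bin/sh
# Sketch of Steps 3 to 5. Function names are hypothetical.

# Step 3: remove -DBINARY_SEARCH_FOR_TTABLE from the CFLAGS_OPT line of the
# Makefile given as $1. The untouched original is saved as <Makefile>.bak.
remove_binary_search_flag() {
    sed -i.bak '/^CFLAGS_OPT/s/ -DBINARY_SEARCH_FOR_TTABLE//' "$1"
}

# Steps 4 and 5: patch the GIZA++ Makefile, then run make in the giza-pp
# folder given as $1.
build_giza() {
    remove_binary_search_flag "$1/GIZA++-v2/Makefile" &&
        make -C "$1"
}

# Example call (the path is an assumption, adjust it to your installation):
# build_giza "$HOME/tools/giza-pp"
```

Restricting the substitution to the line that starts with CFLAGS_OPT keeps the rest of the Makefile untouched, so the result should match the manual edit described in Step 3.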


Running

Once you have installed GIZA++ by following the steps above, you can produce the Alignment probability file for the QuEst SVM Model Builder Step and QuEst Quality Estimation Step as follows:

Step 1: Open your Linux/Mac/Cygwin terminal and navigate to the folder .\giza-pp\GIZA++-v2\ of your GIZA++ installation.

Step 2: Run the command:

./plain2snt.out [source_language_corpus] [target_language_corpus]

This will generate the files:

[source_language_corpus].vcb
[target_language_corpus].vcb
[source_language_corpus]_[target_language_corpus].snt
[target_language_corpus]_[source_language_corpus].snt

Step 3: Navigate to the folder .\mkcls-v2\ of your GIZA++ installation.

Step 4: Run the following commands:

./mkcls -p[source_language_corpus] -V[source_language_corpus].vcb.classes
./mkcls -p[target_language_corpus] -V[target_language_corpus].vcb.classes

This will produce the files:

[source_language_corpus].vcb.classes
[source_language_corpus].vcb.classes.cats
[target_language_corpus].vcb.classes
[target_language_corpus].vcb.classes.cats

Step 5: Navigate to the folder .\GIZA++-v2\ of your GIZA++ installation.

Step 6: Run the following command:

./GIZA++ -S [target_language_corpus].vcb -T [source_language_corpus].vcb -C [target_language_corpus]_[source_language_corpus].snt -o [prefix] -outputpath [output_folder]


If you followed the steps correctly, you should find a file named:

[prefix].actual.ti.final

located at [output_folder].

Note: This file is the one you must use as the Alignment probability file for the QuEst SVM Model Builder Step and the QuEst Quality Estimation Step.
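The running steps above can be chained in a single shell function. This is a sketch under stated assumptions rather than an official script: the function name and its parameters are hypothetical, both corpus files are assumed to sit in the current directory, the tools are called by their full path instead of navigating between folders, and the intermediate file names are assumed to follow the patterns listed in this tutorial. Creating the output folder and adding a trailing slash to it are defensive additions that are not part of the tutorial.

```shell
#!/bin/sh
# Sketch of Steps 2, 4 and 6. All names are hypothetical.
# Arguments: giza-pp folder, source corpus, target corpus, prefix, output folder.
run_giza_pipeline() {
    giza_root=$1
    src=$2
    tgt=$3
    prefix=$4
    outdir=$5

    # Step 2: build the vocabulary (.vcb) and sentence (.snt) files.
    "$giza_root/GIZA++-v2/plain2snt.out" "$src" "$tgt" || return 1

    # Step 4: build the word classes for both corpora.
    "$giza_root/mkcls-v2/mkcls" "-p$src" "-V$src.vcb.classes" || return 1
    "$giza_root/mkcls-v2/mkcls" "-p$tgt" "-V$tgt.vcb.classes" || return 1

    # Step 6: run GIZA++ with the target vocabulary as -S and the source
    # vocabulary as -T, as in the tutorial. The mkdir and the trailing
    # slash are assumptions added for safety.
    mkdir -p "$outdir" || return 1
    "$giza_root/GIZA++-v2/GIZA++" -S "$tgt.vcb" -T "$src.vcb" \
        -C "${tgt}_${src}.snt" -o "$prefix" -outputpath "$outdir/"
}

# Example call (all values are placeholders):
# run_giza_pipeline "$HOME/tools/giza-pp" corpus.src corpus.tgt run1 ./giza_out
```

Each step stops the function if the previous tool fails, which makes it easier to see which command needs attention.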

If you do not find this file, re-run GIZA++, following the steps carefully. If the problem persists, consult online forums or the developers of GIZA++.
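A quick way to confirm that the file was produced is a small check like the one below. It is only a sketch: the function name is hypothetical, and treating an empty file as a failure is an assumption that goes slightly beyond what the tutorial states.

```shell
#!/bin/sh
# Report whether the alignment probability file exists and is non-empty.
# $1 = prefix passed to -o, $2 = folder passed to -outputpath.
check_alignment_file() {
    file="$2/$1.actual.ti.final"
    if [ -s "$file" ]; then
        echo "found: $file"
    else
        echo "missing or empty: $file" >&2
        return 1
    fi
}

# Example call (values are placeholders):
# check_alignment_file run1 ./giza_out
```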