Scoping Report Step


Overview

This step creates a template-based report on various counts (word count, character count, etc.) and optionally leveraged data.

Takes: Filter events. Sends: Filter events.

To include leveraging statistics in the report, your pipeline must contain, before this step, one or more steps that leverage translations, such as the Leveraging Step. Some filters, such as the XLIFF Filter, may also generate resources with leveraged data. To generate only word-count or character-count annotations, without a report, use the Word Count Step or Character Count Step.

For a list of the types of matches possible in the counts, see the "Match Types" page.

Parameters

Project name: Enter the name that is placed in the title of the report.

Custom template: Enter the URI or the full path of the custom template used to generate the report. If the custom template field is left empty, or if the specified URI is not found, the default template is used.

Output path: Enter the full path of the report file to generate. You can use the ${rootDir} variable, as well as any of the source or target locale variables (${srcLoc}, ${trgLoc}, etc.).
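As a rough illustration of how such path variables behave, the following Python sketch expands ${...} placeholders in an output path. It is not Okapi's implementation: the function name expand_path_variables, the sample values, and the choice to leave unknown variables untouched are all assumptions made for this example.

```python
import re

def expand_path_variables(path, values):
    """Replace ${name} placeholders in path with entries from values.

    Unknown variables are left as-is (an assumption of this sketch,
    not documented Okapi behavior).
    """
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return re.sub(r"\$\{(\w+)\}", substitute, path)

# Hypothetical values, for illustration only.
values = {"rootDir": "/projects/site", "srcLoc": "en"}
report_path = expand_path_variables("${rootDir}/reports/scoping_${srcLoc}.html", values)
print(report_path)  # /projects/site/reports/scoping_en.html
```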

Templates

Templates let the Scoping Report Step generate reports in the layout you want. Plain text and HTML templates are currently supported. The step includes a default HTML template that displays general information about the project and its items. To use your own template, set the Custom template parameter.

Templates contain text and report fields. Each report field is enclosed in square brackets. A table row is a line of column fields that is itself enclosed in an outer pair of square brackets; the step repeats that line once for each item. A template can look like this:

Project Name: [PROJECT_NAME]
Creation Date: [PROJECT_DATE]
Target Locale: [PROJECT_TARGET_LOCALE]

File,Exact Previous Version Matches,Exact Local Context Matches,100% Matches,Fuzzy Matches,Repetitions,Total,
[[ITEM_NAME],[ITEM_EXACT_PREVIOUS_VERSION],[ITEM_EXACT_LOCAL_CONTEXT],[ITEM_EXACT],[ITEM_FUZZY],[ITEM_GMX_REPETITION_MATCHED_WORD_COUNT],[ITEM_TOTAL_WORD_COUNT],]
Total,[PROJECT_EXACT_PREVIOUS_VERSION],[PROJECT_EXACT_LOCAL_CONTEXT],[PROJECT_EXACT],[PROJECT_FUZZY],[PROJECT_GMX_REPETITION_MATCHED_WORD_COUNT],[PROJECT_TOTAL_WORD_COUNT]

This template will produce something similar to this:

Project Name: Community website
Creation Date: 17.03.2011 23:21:23 CET
Target Locale: fr-ca

File,Exact Previous Version Matches,Exact Local Context Matches,100% Matches,Fuzzy Matches,Repetitions,Total,
D:\SVN\OKAPI\steps\scopingreport\target\test-classes\net\sf\okapi\steps\scopingreport\aa324.html,10,23,12,57,132,23,
D:\SVN\OKAPI\steps\scopingreport\target\test-classes\net\sf\okapi\steps\scopingreport\form.html,31,22,13,17,19,17,
D:\SVN\OKAPI\steps\scopingreport\target\test-classes\net\sf\okapi\steps\scopingreport\W3CHTMHLTest1.html,10,23,12,57,12,54,
Total,210,323,512,357,312,154
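The substitution shown above can be sketched in a few lines of Python. This is only an approximation of the behavior described on this page, not the step's actual code: the names fill and render are hypothetical, the row detection (a line that starts with "[[" and ends with "]") is a simplification, and rendering missing fields as 0 is an assumption.

```python
import re

# A report field is an upper-case name in square brackets, e.g. [ITEM_NAME].
FIELD = re.compile(r"\[([A-Z0-9_]+)\]")

def fill(text, values):
    """Replace each [FIELD] in text; missing fields render as 0 (assumption)."""
    return FIELD.sub(lambda m: str(values.get(m.group(1), 0)), text)

def render(template, project, items):
    """Fill project fields once and repeat bracketed row lines once per item."""
    out = []
    for line in template.splitlines():
        if line.startswith("[[") and line.endswith("]"):
            row = line[1:-1]  # drop the outer pair of brackets
            out.extend(fill(row, item) for item in items)
        else:
            out.append(fill(line, project))
    return "\n".join(out)

template = (
    "Project Name: [PROJECT_NAME]\n"
    "File,Total,\n"
    "[[ITEM_NAME],[ITEM_TOTAL_WORD_COUNT],]\n"
    "Total,[PROJECT_TOTAL_WORD_COUNT]"
)
# Hypothetical counts, for illustration only.
project = {"PROJECT_NAME": "Community website", "PROJECT_TOTAL_WORD_COUNT": 40}
items = [
    {"ITEM_NAME": "aa324.html", "ITEM_TOTAL_WORD_COUNT": 23},
    {"ITEM_NAME": "form.html", "ITEM_TOTAL_WORD_COUNT": 17},
]
report = render(template, project, items)
print(report)
```

With the sample data, the row line is emitted twice (once per item), while the header and total lines are emitted once.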

Report fields

Templates should contain placeholders for calculated report data. These placeholders are called report fields, and the Scoping Report Step fills them in automatically.

Note that most field values are calculated by separate steps, such as the Word Count Step, Character Count Step, or Leveraging Step. The Scoping Report Step is essentially a presentation layer that displays information provided by other steps. If you omit a required step from your pipeline, the corresponding fields show zeros in the generated report.
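One way to catch a missing step before running a pipeline is to scan the template for report fields and look up which steps usually provide them. The Python sketch below does this for a handful of fields; the function name required_steps is hypothetical, and the mapping is a partial copy of the "Example of provider" columns on this page, so it lists example providers rather than the only possible ones.

```python
import re

# Partial mapping from field name (without the PROJECT_/ITEM_ prefix) to an
# example provider step, copied from the tables on this page.
PROVIDERS = {
    "TOTAL_WORD_COUNT": "Word Count Step",
    "TOTAL_CHARACTER_COUNT": "Character Count Step",
    "EXACT": "Leveraging Step",
    "FUZZY": "Leveraging Step",
    "GMX_REPETITION_MATCHED_WORD_COUNT": "Repetition Analysis Step",
}

def required_steps(template):
    """Return the example provider steps for the fields used in template."""
    steps = set()
    for _scope, name in re.findall(r"\[(PROJECT|ITEM)_([A-Z0-9_]+)\]", template):
        if name in PROVIDERS:
            steps.add(PROVIDERS[name])
    return steps

template = (
    "[[ITEM_NAME],[ITEM_EXACT],[ITEM_TOTAL_WORD_COUNT],]\n"
    "Total,[PROJECT_EXACT],[PROJECT_TOTAL_WORD_COUNT]"
)
needed = required_steps(template)
print(sorted(needed))  # ['Leveraging Step', 'Word Count Step']
```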

Report fields can contain word or character counts for the entire project or for an individual item in the project. Report fields for these two scopes are prefixed with PROJECT_ and ITEM_, respectively.

The tables below show how report fields are related to count categories, and list example steps that provide information for related word or character counts.

General project fields

Report field | Example of provider | Description
PROJECT_NAME | | Name of the project as set in the step parameters.
PROJECT_DATE | | Date and time when the report was generated.
PROJECT_SOURCE_LOCALE | | Source locale, obtained automatically.
PROJECT_TARGET_LOCALE | | Target locale, obtained automatically.
PROJECT_TOTAL_WORD_COUNT | Word Count Step | Total number of words, both translatable and non-translatable, in all items of the project.
PROJECT_TOTAL_CHARACTER_COUNT | Character Count Step | Total number of characters, excluding whitespace and punctuation, both translatable and non-translatable, in all items of the project.
PROJECT_WHITESPACE_CHARACTER_COUNT | Character Count Step | Total number of whitespace characters, both translatable and non-translatable, in all items of the project.
PROJECT_PUNCTUATION_CHARACTER_COUNT | Character Count Step | Total number of punctuation characters, both translatable and non-translatable, in all items of the project.
PROJECT_OVERALL_CHARACTER_COUNT | Character Count Step | Total number of characters, including whitespace and punctuation, both translatable and non-translatable, in all items of the project.


General item fields

Report field | Example of provider | Description
ITEM_NAME | | Name of the item (full file name).
ITEM_SOURCE_LOCALE | | Source locale, obtained automatically.
ITEM_TARGET_LOCALE | | Target locale, obtained automatically.
ITEM_TOTAL_WORD_COUNT | Word Count Step | Total number of words, both translatable and non-translatable, in the current item.
ITEM_TOTAL_CHARACTER_COUNT | Character Count Step | Total number of characters, excluding whitespace and punctuation, both translatable and non-translatable, in the current item.
ITEM_WHITESPACE_CHARACTER_COUNT | Character Count Step | Total number of whitespace characters, both translatable and non-translatable, in the current item.
ITEM_PUNCTUATION_CHARACTER_COUNT | Character Count Step | Total number of punctuation characters, both translatable and non-translatable, in the current item.
ITEM_OVERALL_CHARACTER_COUNT | Character Count Step | Total number of characters, including whitespace and punctuation, both translatable and non-translatable, in the current item.


Project fields for Okapi count categories

Report field | Example of provider | Okapi word count category | Description
PROJECT_EXACT_UNIQUE_ID | Leveraging Step | EXACT_UNIQUE_ID | Matches EXACT and matches a unique id.
PROJECT_EXACT_PREVIOUS_VERSION | Leveraging Step | EXACT_PREVIOUS_VERSION | Matches EXACT and comes from the preceding version of the same document (i.e., if v4 is leveraged, this match must come from v3, not v2 or v1).
PROJECT_EXACT_LOCAL_CONTEXT | Leveraging Step | EXACT_LOCAL_CONTEXT | Matches EXACT and a small number of segments before and/or after.
PROJECT_EXACT_DOCUMENT_CONTEXT | Repetition Analysis Step | EXACT_DOCUMENT_CONTEXT | Matches EXACT and comes from the same document.
PROJECT_EXACT_STRUCTURAL | Leveraging Step | EXACT_STRUCTURAL | Matches EXACT and the structural type of the segment (title, paragraph, list element, etc.).
PROJECT_EXACT | Leveraging Step | EXACT | Matches text and codes exactly.
PROJECT_EXACT_TEXT_ONLY_UNIQUE_ID | Leveraging Step | EXACT_TEXT_ONLY_UNIQUE_ID | Matches EXACT_TEXT_ONLY and matches a unique id.
PROJECT_EXACT_TEXT_ONLY_PREVIOUS_VERSION | Leveraging Step | EXACT_TEXT_ONLY_PREVIOUS_VERSION | Matches EXACT_TEXT_ONLY and comes from a previous version of the same document.
PROJECT_EXACT_TEXT_ONLY | Leveraging Step | EXACT_TEXT_ONLY | Matches text exactly, but there is a difference in one or more codes.
PROJECT_EXACT_REPAIRED | Leveraging Step | EXACT_REPAIRED | Matches text and codes exactly, but only after some automated repair (e.g. number replacement, code repair, capitalization, punctuation, etc.).
PROJECT_FUZZY_UNIQUE_ID | Leveraging Step | FUZZY_UNIQUE_ID | Matches FUZZY and matches a unique id.
PROJECT_FUZZY_PREVIOUS_VERSION | Leveraging Step | FUZZY_PREVIOUS_VERSION | Matches FUZZY and comes from a previous version of the same document.
PROJECT_FUZZY | Leveraging Step | FUZZY | Matches text and/or codes partially.
PROJECT_FUZZY_REPAIRED | Leveraging Step | FUZZY_REPAIRED | Matches text and/or codes partially, and some automated repair (e.g. number replacement, code repair, capitalization, punctuation, etc.) was applied to the target.
PROJECT_PHRASE_ASSEMBLED | - | PHRASE_ASSEMBLED | Matches assembled from phrases in the TM or other resources (different algorithms could be used).
PROJECT_MT | Leveraging Step | MT | Indicates a translation coming from an MT engine.
PROJECT_CONCORDANCE | - | CONCORDANCE | TM concordance or phrase match (usually a word or term only).
PROJECT_NOCATEGORY | | n/a | Does not match any of the Okapi word count categories. This field is calculated by subtracting the sum of all words in all categories above from the total word count.
PROJECT_NONTRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into any of the non-translatable Okapi word count categories.
PROJECT_TRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into none of the non-translatable Okapi word count categories, and thus need translation.

Character count categories are also available; replace WORD with CHARACTER or add the suffix _CHARACTER to the fields above to yield the character equivalent. Character counts exclude whitespace and punctuation characters.
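The PROJECT_NOCATEGORY calculation described in the table above is a plain subtraction. The Python sketch below works through it with made-up numbers; the function name no_category_count and all counts are hypothetical and chosen only to show the arithmetic.

```python
def no_category_count(total_word_count, category_counts):
    """Words left over after subtracting every category count from the total."""
    return total_word_count - sum(category_counts.values())

# Hypothetical per-category word counts for one project.
category_counts = {
    "PROJECT_EXACT": 300,
    "PROJECT_FUZZY": 250,
    "PROJECT_MT": 100,
}
remaining = no_category_count(1000, category_counts)
print(remaining)  # 350, i.e. 1000 - (300 + 250 + 100)
```

The same subtraction applies per item, using the ITEM_ fields and the item's total word count.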


Item fields for Okapi count categories

Report field | Example of provider | Okapi word count category | Description
ITEM_EXACT_UNIQUE_ID | Leveraging Step | EXACT_UNIQUE_ID | Matches EXACT and matches a unique id.
ITEM_EXACT_PREVIOUS_VERSION | Leveraging Step | EXACT_PREVIOUS_VERSION | Matches EXACT and comes from the preceding version of the same document (i.e., if v4 is leveraged, this match must come from v3, not v2 or v1).
ITEM_EXACT_LOCAL_CONTEXT | Leveraging Step | EXACT_LOCAL_CONTEXT | Matches EXACT and a small number of segments before and/or after.
ITEM_EXACT_DOCUMENT_CONTEXT | Repetition Analysis Step | EXACT_DOCUMENT_CONTEXT | Matches EXACT and comes from the same document.
ITEM_EXACT_STRUCTURAL | Leveraging Step | EXACT_STRUCTURAL | Matches EXACT and the structural type of the segment (title, paragraph, list element, etc.).
ITEM_EXACT | Leveraging Step | EXACT | Matches text and codes exactly.
ITEM_EXACT_TEXT_ONLY_UNIQUE_ID | Leveraging Step | EXACT_TEXT_ONLY_UNIQUE_ID | Matches EXACT_TEXT_ONLY and matches a unique id.
ITEM_EXACT_TEXT_ONLY_PREVIOUS_VERSION | Leveraging Step | EXACT_TEXT_ONLY_PREVIOUS_VERSION | Matches EXACT_TEXT_ONLY and comes from a previous version of the same document.
ITEM_EXACT_TEXT_ONLY | Leveraging Step | EXACT_TEXT_ONLY | Matches text exactly, but there is a difference in one or more codes.
ITEM_EXACT_REPAIRED | Leveraging Step | EXACT_REPAIRED | Matches text and codes exactly, but only after some automated repair (e.g. number replacement, code repair, capitalization, punctuation, etc.).
ITEM_FUZZY_UNIQUE_ID | Leveraging Step | FUZZY_UNIQUE_ID | Matches FUZZY and matches a unique id.
ITEM_FUZZY_PREVIOUS_VERSION | Leveraging Step | FUZZY_PREVIOUS_VERSION | Matches FUZZY and comes from a previous version of the same document.
ITEM_FUZZY | Leveraging Step | FUZZY | Matches text and/or codes partially.
ITEM_FUZZY_REPAIRED | Leveraging Step | FUZZY_REPAIRED | Matches text and/or codes partially, and some automated repair (e.g. number replacement, code repair, capitalization, punctuation, etc.) was applied to the target.
ITEM_PHRASE_ASSEMBLED | - | PHRASE_ASSEMBLED | Matches assembled from phrases in the TM or other resources (different algorithms could be used).
ITEM_MT | Leveraging Step | MT | Indicates a translation coming from an MT engine.
ITEM_CONCORDANCE | - | CONCORDANCE | TM concordance or phrase match (usually a word or term only).
ITEM_NOCATEGORY | | n/a | Does not match any of the Okapi word count categories. This field is calculated by subtracting the sum of all words in all categories above from the total word count.
ITEM_NONTRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into any of the non-translatable Okapi word count categories.
ITEM_TRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into none of the non-translatable Okapi word count categories, and thus need translation.

Character count categories are also available; replace WORD with CHARACTER or add the suffix _CHARACTER to the fields above to yield the character equivalent. Character counts exclude whitespace and punctuation characters.

Project fields for GMX count categories

Report field | Example of provider | GMX word count category | Description
PROJECT_GMX_PROTECTED_WORD_COUNT | | ProtectedWordCount | An accumulation of the word count for text that has been marked as 'protected', or otherwise not translatable (XLIFF text enclosed in <mrk mtype="protected"> elements).
PROJECT_GMX_EXACT_MATCHED_WORD_COUNT | Leveraging Step | ExactMatchedWordCount | An accumulation of the word count for text units that have been matched unambiguously with a prior translation and thus require no translator input.
PROJECT_GMX_LEVERAGED_MATCHED_WORD_COUNT | Leveraging Step | LeveragedMatchedWordCount | An accumulation of the word count for text units that have been matched against a leveraged translation memory database.
PROJECT_GMX_REPETITION_MATCHED_WORD_COUNT | Repetition Analysis Step | RepetitionMatchedWordCount | An accumulation of the word count for repeating text units that have not been matched in any other form. Repetition matching is deemed to take precedence over fuzzy matching.
PROJECT_GMX_FUZZY_MATCHED_WORD_COUNT | Leveraging Step | FuzzyMatchedWordCount | An accumulation of the word count for text units that have been fuzzy matched against a leveraged translation memory database.
PROJECT_GMX_ALPHANUMERIC_ONLY_TEXT_UNIT_WORD_COUNT | | AlphanumericOnlyTextUnitWordCount | An accumulation of the word count for text units that have been identified as containing only alphanumeric words.
PROJECT_GMX_NUMERIC_ONLY_TEXT_UNIT_WORD_COUNT | | NumericOnlyTextUnitWordCount | An accumulation of the word count for text units that have been identified as containing only numeric words.
PROJECT_GMX_MEASUREMENT_ONLY_TEXT_UNIT_WORD_COUNT | | MeasurementOnlyTextUnitWordCount | An accumulation of the word count from measurement-only text units.
PROJECT_GMX_NOCATEGORY | | n/a | Does not match any of the GMX word count categories. This field is calculated by subtracting the sum of all words in all categories above from the total word count.
PROJECT_GMX_NONTRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into any of the non-translatable GMX word count categories.
PROJECT_GMX_TRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into none of the non-translatable GMX word count categories, and thus need translation.

Character count categories are also available; replace WORD with CHARACTER or add the suffix _CHARACTER to the fields above to yield the character equivalent. Character counts exclude whitespace and punctuation characters.

Item fields for GMX count categories

Report field | Example of provider | GMX word count category | Description
ITEM_GMX_PROTECTED_WORD_COUNT | | ProtectedWordCount | An accumulation of the word count for text that has been marked as 'protected', or otherwise not translatable (XLIFF text enclosed in <mrk mtype="protected"> elements).
ITEM_GMX_EXACT_MATCHED_WORD_COUNT | Leveraging Step | ExactMatchedWordCount | An accumulation of the word count for text units that have been matched unambiguously with a prior translation and thus require no translator input.
ITEM_GMX_LEVERAGED_MATCHED_WORD_COUNT | Leveraging Step | LeveragedMatchedWordCount | An accumulation of the word count for text units that have been matched against a leveraged translation memory database.
ITEM_GMX_REPETITION_MATCHED_WORD_COUNT | Repetition Analysis Step | RepetitionMatchedWordCount | An accumulation of the word count for repeating text units that have not been matched in any other form. Repetition matching is deemed to take precedence over fuzzy matching.
ITEM_GMX_FUZZY_MATCHED_WORD_COUNT | Leveraging Step | FuzzyMatchedWordCount | An accumulation of the word count for text units that have been fuzzy matched against a leveraged translation memory database.
ITEM_GMX_ALPHANUMERIC_ONLY_TEXT_UNIT_WORD_COUNT | | AlphanumericOnlyTextUnitWordCount | An accumulation of the word count for text units that have been identified as containing only alphanumeric words.
ITEM_GMX_NUMERIC_ONLY_TEXT_UNIT_WORD_COUNT | | NumericOnlyTextUnitWordCount | An accumulation of the word count for text units that have been identified as containing only numeric words.
ITEM_GMX_MEASUREMENT_ONLY_TEXT_UNIT_WORD_COUNT | | MeasurementOnlyTextUnitWordCount | An accumulation of the word count from measurement-only text units.
ITEM_GMX_NOCATEGORY | | n/a | Does not match any of the GMX word count categories. This field is calculated by subtracting the sum of all words in all categories above from the total word count.
ITEM_GMX_NONTRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into any of the non-translatable GMX word count categories.
ITEM_GMX_TRANSLATABLE_WORD_COUNT | Word Count Step | n/a | Number of words that fall into none of the non-translatable GMX word count categories, and thus need translation.

Character count categories are also available; replace WORD with CHARACTER or add the suffix _CHARACTER to the fields above to yield the character equivalent. Character counts exclude whitespace and punctuation characters.

Limitations

None known.