Package net.sf.okapi.steps.tokenization
Class TokenizationStep
java.lang.Object
  net.sf.okapi.common.pipeline.BasePipelineStep
    net.sf.okapi.steps.tokenization.TokenizationStep
All Implemented Interfaces: AutoCloseable, Function<Stream<Event>,Stream<Event>>, IPipelineStep
public class TokenizationStep extends BasePipelineStep

Constructor Summary
Constructor: TokenizationStep()

Method Summary
List<Token> apostrophe(Token token, LocaleId locale)
    Break French and Italian words with apostrophe into three tokens: WORD, PUNCTUATION, WORD.
String getDescription()
    Gets a short localizable description of what this step does.
String getName()
    Gets the localizable name of this step.
LocaleId getSourceLocale()
    Delegate to concrete class.
LocaleId getTargetLocale()
    Delegate to concrete class.
protected Event handleStartDocument(Event event)
    Handles the EventType.START_DOCUMENT event.
protected Event handleTextUnit(Event event)
    Handles the EventType.TEXT_UNIT event.
Collection<? extends Token> postProcess(Token t, LocaleId language)
    Various rules to make corrections to RbbiTokenizer.
void setSourceLocale(LocaleId sourceLocale)
    Delegate to concrete class.
void setTargetLocale(LocaleId targetLocale)
-
Methods inherited from class net.sf.okapi.common.pipeline.BasePipelineStep
cancel, destroy, getHelpLocation, getParameters, handleCustom, handleDocumentPart, handleEndBatch, handleEndBatchItem, handleEndDocument, handleEndGroup, handleEndSubDocument, handleEndSubfilter, handleEvent, handleMultiEvent, handlePipelineParameters, handleRawDocument, handleStartBatch, handleStartBatchItem, handleStartGroup, handleStartSubDocument, handleStartSubfilter, isDone, isLastOutputStep, setLastOutputStep, setParameters
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface net.sf.okapi.common.pipeline.IPipelineStep
apply, close, handleStream
-
Method Detail
-
handleStartDocument
protected Event handleStartDocument(Event event)
Description copied from class: BasePipelineStep
Handles the EventType.START_DOCUMENT event.
- Overrides: handleStartDocument in class BasePipelineStep
- Parameters: event - event to handle.
- Returns: the event returned.
-
handleTextUnit
protected Event handleTextUnit(Event event)
Description copied from class: BasePipelineStep
Handles the EventType.TEXT_UNIT event.
- Overrides: handleTextUnit in class BasePipelineStep
- Parameters: event - event to handle.
- Returns: the event returned.
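
The two handlers above are only called when an event of the matching type arrives. The following is a minimal, self-contained sketch of that dispatch pattern, assuming the base class routes events by type and that its default handlers return the event unchanged. Every type here (EventDispatchSketch, BaseStep, TokenizingStep, the simplified Event and EventType) is a hypothetical stand-in rather than an Okapi class, and the whitespace split is only a placeholder for real tokenization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the Okapi classes.
class EventDispatchSketch {

    enum EventType { START_DOCUMENT, TEXT_UNIT, END_DOCUMENT }

    // Simplified event: a type plus a text payload (assumed shape).
    static final class Event {
        final EventType type;
        final String payload;

        Event(EventType type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // Base step: routes each event to a handler; default handlers pass the event through.
    static class BaseStep {
        Event handleEvent(Event event) {
            switch (event.type) {
                case START_DOCUMENT:
                    return handleStartDocument(event);
                case TEXT_UNIT:
                    return handleTextUnit(event);
                default:
                    return event;
            }
        }

        protected Event handleStartDocument(Event event) {
            return event;
        }

        protected Event handleTextUnit(Event event) {
            return event;
        }
    }

    // Subclass that overrides only the two handlers it cares about.
    static class TokenizingStep extends BaseStep {
        final List<String> tokens = new ArrayList<>();

        @Override
        protected Event handleStartDocument(Event event) {
            tokens.clear(); // start each document with a clean state
            return event;
        }

        @Override
        protected Event handleTextUnit(Event event) {
            // Placeholder tokenization: split on whitespace.
            for (String piece : event.payload.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
            return event;
        }
    }
}
```

In this sketch the step returns the same event object it received, which matches the documented "the event returned" contract only in the loosest sense; the real step may annotate the text unit before returning it.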
-
getSourceLocale
public LocaleId getSourceLocale()
Description copied from interface: IPipelineStep
Delegate to concrete class.
- Specified by: getSourceLocale in interface IPipelineStep
- Overrides: getSourceLocale in class BasePipelineStep
- Returns: LocaleId
-
setSourceLocale
public void setSourceLocale(LocaleId sourceLocale)
Description copied from interface: IPipelineStep
Delegate to concrete class.
- Specified by: setSourceLocale in interface IPipelineStep
- Overrides: setSourceLocale in class BasePipelineStep
-
getTargetLocale
public LocaleId getTargetLocale()
Description copied from interface: IPipelineStep
Delegate to concrete class.
- Specified by: getTargetLocale in interface IPipelineStep
- Overrides: getTargetLocale in class BasePipelineStep
- Returns: LocaleId
-
setTargetLocale
public void setTargetLocale(LocaleId targetLocale)
- Specified by: setTargetLocale in interface IPipelineStep
- Overrides: setTargetLocale in class BasePipelineStep
-
postProcess
public Collection<? extends Token> postProcess(Token t, LocaleId language)
Applies various rules to make corrections to RbbiTokenizer.
- Parameters: t - the Token
- Returns: list of corrected tokens, or the original token if no changes were made
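
Because the return value is a collection holding either the corrected tokens or the original token, a caller can flatten the results of every token into one list. The sketch below illustrates that calling pattern under stated assumptions: tokens are modelled as plain strings rather than Okapi Token objects, and the names PostProcessSketch, correctAll and splitTrailingPeriod are hypothetical. The trailing-period rule is an invented example, not one of the step's actual rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; tokens are plain strings, not Okapi Token objects.
class PostProcessSketch {

    // Runs a correction function over every token and flattens the results in order.
    static List<String> correctAll(List<String> tokens,
                                   Function<String, Collection<String>> postProcess) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(postProcess.apply(token));
        }
        return out;
    }

    // Invented example rule: detach a trailing period from a longer token.
    // Returns the original token alone when the rule does not apply.
    static Collection<String> splitTrailingPeriod(String token) {
        if (token.length() > 1 && token.endsWith(".")) {
            return Arrays.asList(token.substring(0, token.length() - 1), ".");
        }
        return Collections.singletonList(token);
    }
}
```

Returning a single-element collection for the unchanged case keeps the caller free of special cases: it always adds whatever comes back.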
-
apostrophe
public List<Token> apostrophe(Token token, LocaleId locale)
Breaks French and Italian words containing an apostrophe into three tokens: WORD, PUNCTUATION, WORD.
- Parameters: token
- Returns: list of transformed tokens, if any
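
The WORD, PUNCTUATION, WORD split can be pictured with a small standalone sketch. It is not the Okapi implementation: SimpleToken is a hypothetical stand-in for Token, java.util.Locale replaces LocaleId, and the handling of the typographic apostrophe and of tokens without text on both sides is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the Okapi implementation.
class ApostropheSplitSketch {

    // Simplified token: a type label and the text it covers (assumed shape).
    static final class SimpleToken {
        final String type;
        final String text;

        SimpleToken(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    // Splits a WORD token around its first apostrophe for French and Italian.
    // Any other token is returned unchanged in a single-element list.
    static List<SimpleToken> splitApostrophe(SimpleToken token, Locale locale) {
        String lang = locale.getLanguage();
        boolean applies = lang.equals("fr") || lang.equals("it");
        if (!applies || !token.type.equals("WORD")) {
            return Collections.singletonList(token);
        }
        String text = token.text;
        int pos = indexOfApostrophe(text);
        // Assumption: require text on both sides of the apostrophe.
        if (pos <= 0 || pos >= text.length() - 1) {
            return Collections.singletonList(token);
        }
        List<SimpleToken> out = new ArrayList<>();
        out.add(new SimpleToken("WORD", text.substring(0, pos)));
        out.add(new SimpleToken("PUNCTUATION", text.substring(pos, pos + 1)));
        out.add(new SimpleToken("WORD", text.substring(pos + 1)));
        return out;
    }

    // Finds the first straight (') or typographic (U+2019) apostrophe, or -1.
    private static int indexOfApostrophe(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\'' || c == '\u2019') {
                return i;
            }
        }
        return -1;
    }
}
```

With a French locale, a WORD token covering l'homme comes back as WORD(l), PUNCTUATION('), WORD(homme); with an English locale the sketch returns the token unchanged.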
-
getName
public String getName()
Description copied from interface: IPipelineStep
Gets the localizable name of this step.
- Returns: the localizable name of this step.
-
getDescription
public String getDescription()
Description copied from interface: IPipelineStep
Gets a short localizable description of what this step does.
- Returns: the text of a short description of what this step does.