Class RegexPlainTextFilter
- java.lang.Object
  - net.sf.okapi.common.filters.AbstractFilter
    - net.sf.okapi.filters.plaintext.regex.RegexPlainTextFilter
- All Implemented Interfaces:
AutoCloseable, Iterator<Event>, IFilter
public class RegexPlainTextFilter extends AbstractFilter
PlainTextFilter extracts lines of input text, separated by line terminators. The filter is aware of the following line terminators:
- Carriage return character followed immediately by a newline character ("\r\n")
- Newline (line feed) character ("\n")
- Stand-alone carriage return character ("\r")
- Next line character ("\u0085")
- Line separator character ("\u2028")
- Paragraph separator character ("\u2029")
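The terminator list above can be illustrated with plain java.util.regex, independent of Okapi. This is a minimal sketch, not the filter's actual implementation: the class name LineTerminatorDemo and the method splitLines are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Okapi: splits text on the six
// line terminators listed in the class description.
final class LineTerminatorDemo {
    // "\r\n" is listed first so that a CRLF pair counts as one terminator.
    static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]");

    static List<String> splitLines(String text) {
        // A limit of -1 keeps trailing empty lines instead of dropping them.
        return Arrays.asList(LINE_BREAK.split(text, -1));
    }

    public static void main(String[] args) {
        System.out.println(splitLines("one\r\ntwo\nthree\rfour"));
    }
}
```

Putting the two-character alternative before the character class matters: with the order reversed, "\r\n" would be split twice and produce a spurious empty line.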
- Version: 0.1, 09.06.2009
-
Field Summary
Fields
Modifier and Type, Field
static String FILTER_CONFIG
static String FILTER_CONFIG_LINES
static String FILTER_CONFIG_PARAGRAPHS
static String FILTER_MIME
static String FILTER_NAME
-
Fields inherited from interface net.sf.okapi.common.filters.IFilter
SUB_FILTER
-
-
Constructor Summary
Constructors
RegexPlainTextFilter()
-
Method Summary
Modifier and Type, Method: Description
void cancel(): Cancels the current process.
void close(): Closes the input document.
IFilterWriter createFilterWriter(): Default case.
ISkeletonWriter createSkeletonWriter(): Default case.
String getMimeType(): Gets the input document mime type.
String getName(): Gets the name/identifier of this filter.
Parameters getParameters(): Gets the current parameters for this filter.
Parameters getRegexParameters(): Provides access to the internal line extractor's Parameters object.
boolean hasNext(): Indicates if there is an event to process.
Event next(): Gets the next event available.
void open(RawDocument input): Opens the input document described in a given RawDocument object.
void open(RawDocument input, boolean generateSkeleton): Opens the input document described in a given RawDocument object, and optionally creates skeleton information.
void setParameters(IParameters params): Sets new parameters for this filter.
void setRule(String rule, int sourceGroup, int regexOptions): Configures an internal line extractor.
-
Methods inherited from class net.sf.okapi.common.filters.AbstractFilter
addConfiguration, addConfiguration, addConfiguration, addConfigurations, createEndFilterEvent, createStartFilterEvent, findConfiguration, getConfiguration, getConfigurations, getDisplayName, getDocumentId, getDocumentName, getEncoderManager, getEncoding, getFilterConfigurationMapper, getNewlineType, getParameters, getParametersClassName, getParentId, getSrcLoc, getTrgLoc, isCanceled, isGenerateSkeleton, isMultilingual, isUtf8Bom, isUtf8Encoding, removeConfiguration, setDisplayName, setDocumentName, setEncoding, setFilterConfigurationMapper, setGenerateSkeleton, setMimeType, setMultilingual, setName, setNewlineType, setOptions, setParentId, setSrcLoc, setTrgLoc
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.util.Iterator
forEachRemaining, remove
-
Field Detail
-
FILTER_NAME
public static final String FILTER_NAME
- See Also:
- Constant Field Values
-
FILTER_MIME
public static final String FILTER_MIME
- See Also:
- Constant Field Values
-
FILTER_CONFIG
public static final String FILTER_CONFIG
- See Also:
- Constant Field Values
-
FILTER_CONFIG_LINES
public static final String FILTER_CONFIG_LINES
- See Also:
- Constant Field Values
-
FILTER_CONFIG_PARAGRAPHS
public static final String FILTER_CONFIG_PARAGRAPHS
- See Also:
- Constant Field Values
-
-
Method Detail
-
setRule
public void setRule(String rule, int sourceGroup, int regexOptions)
Configures an internal line extractor. To use a custom rule, call this method with the modified rule.
- Parameters:
rule - Java regex rule used to extract lines of text. Default: "^(.*?)$".
sourceGroup - regex capturing group denoting text to be extracted. Default: 1.
regexOptions - Java regex options. Default: Pattern.MULTILINE.
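To see what the documented defaults mean in practice, the sketch below applies a rule, a source group, and regex options using only java.util.regex. It is an assumption about how such a rule behaves when matched repeatedly against the input, not a copy of the filter's internal extractor; RegexRuleDemo and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Okapi: applies a rule the way the
// setRule parameters are described (rule, capturing group, options).
final class RegexRuleDemo {
    static List<String> extract(String text, String rule,
                                int sourceGroup, int regexOptions) {
        Matcher matcher = Pattern.compile(rule, regexOptions).matcher(text);
        List<String> extracted = new ArrayList<>();
        while (matcher.find()) {
            // Only the selected capturing group is treated as extracted text.
            extracted.add(matcher.group(sourceGroup));
        }
        return extracted;
    }

    public static void main(String[] args) {
        // Documented defaults: rule "^(.*?)$", group 1, Pattern.MULTILINE.
        System.out.println(
                extract("first\nsecond\nthird", "^(.*?)$", 1, Pattern.MULTILINE));
    }
}
```

With Pattern.MULTILINE, "^" and "$" match at line boundaries rather than only at the start and end of the whole input, which is why the default rule yields one match per line. A custom rule such as "^(\\w+)=(.*?)$" with source group 2 would, under the same assumptions, extract only the text after each "=".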
-
getRegexParameters
public Parameters getRegexParameters()
Provides access to the internal line extractor's Parameters object.
- Returns:
- Parameters object; with this object you can access the line extraction rule, source group, regex options, etc.
-
cancel
public void cancel()
Description copied from interface: IFilter
Cancels the current process.
- Specified by:
cancel in interface IFilter
- Overrides:
cancel in class AbstractFilter
-
close
public void close()
Description copied from interface: IFilter
Closes the input document. Developers should call this method from within their code before sending the last event, so that writer objects can overwrite the input file when they receive the last event. This method must also be safe to call even if the input document has not been opened.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface IFilter
- Overrides:
close in class AbstractFilter
-
createFilterWriter
public IFilterWriter createFilterWriter()
Description copied from class: AbstractFilter
Default case. Override if needed.
- Specified by:
createFilterWriter in interface IFilter
- Overrides:
createFilterWriter in class AbstractFilter
- Returns:
- new instance of GenericFilterWriter
-
createSkeletonWriter
public ISkeletonWriter createSkeletonWriter()
Description copied from class: AbstractFilter
Default case. Override if needed.
- Specified by:
createSkeletonWriter in interface IFilter
- Overrides:
createSkeletonWriter in class AbstractFilter
- Returns:
- new instance of GenericSkeletonWriter
-
getMimeType
public String getMimeType()
Description copied from class: AbstractFilter
Gets the input document mime type.
- Specified by:
getMimeType in interface IFilter
- Overrides:
getMimeType in class AbstractFilter
- Returns:
- the mime type
-
getName
public String getName()
Description copied from interface: IFilter
Gets the name/identifier of this filter.
- Specified by:
getName in interface IFilter
- Overrides:
getName in class AbstractFilter
- Returns:
- The name/identifier of the filter.
-
getParameters
public Parameters getParameters()
Description copied from interface: IFilter
Gets the current parameters for this filter.
- Specified by:
getParameters in interface IFilter
- Overrides:
getParameters in class AbstractFilter
- Returns:
- The current parameters for this filter, or DefaultParameters if this filter has no parameters.
-
hasNext
public boolean hasNext()
Description copied from interface: IFilter
Indicates if there is an event to process.
Implementer Note: The caller must be able to call this method several times without changing state.
- Returns:
- True if there is at least one event to process, false if not.
-
next
public Event next()
Description copied from interface: IFilter
Gets the next event available. This method can be called only once for each event.
- Returns:
- The next event available or null if there are no events.
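The contracts described for close, hasNext, and next can be sketched with a small stand-in class. This example deliberately does not use the Okapi API: LineEventSource, its open(String) method, and drain are hypothetical, and String stands in for Event. It only shows one way to satisfy the stated rules, under those assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a filter; String plays the role of Event.
final class LineEventSource implements Iterator<String>, AutoCloseable {
    private String[] lines; // null until open() has been called
    private int index;

    void open(String text) {
        lines = text.split("\n", -1);
        index = 0;
    }

    @Override
    public boolean hasNext() {
        // Pure query: repeated calls never advance the cursor.
        return lines != null && index < lines.length;
    }

    @Override
    public String next() {
        // Each call consumes one item; null signals that nothing is left.
        return hasNext() ? lines[index++] : null;
    }

    @Override
    public void close() {
        // Safe to call even if open() was never called.
        lines = null;
        index = 0;
    }

    // Typical consumer loop: open, iterate, and close via try-with-resources.
    static List<String> drain(String text) {
        List<String> events = new ArrayList<>();
        try (LineEventSource source = new LineEventSource()) {
            source.open(text);
            while (source.hasNext()) {
                events.add(source.next());
            }
        }
        return events;
    }
}
```

Because the class implements AutoCloseable, try-with-resources guarantees that close runs even if processing an event throws.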
-
open
public void open(RawDocument input)
Description copied from interface: IFilter
Opens the input document described in a given RawDocument object. Skeleton information is always created when you use this method.
- Parameters:
input - The RawDocument object to use to open the document.
-
open
public void open(RawDocument input, boolean generateSkeleton)
Description copied from interface: IFilter
Opens the input document described in a given RawDocument object, and optionally creates skeleton information.
- Specified by:
open in interface IFilter
- Overrides:
open in class AbstractFilter
- Parameters:
input - The RawDocument object to use to open the document.
generateSkeleton - true to generate the skeleton data, false otherwise.
-
setParameters
public void setParameters(IParameters params)
Description copied from interface: IFilter
Sets new parameters for this filter.
- Specified by:
setParameters in interface IFilter
- Overrides:
setParameters in class AbstractFilter
- Parameters:
params - The new parameters to use.
-
-