Package net.sf.okapi.common.encoder
Interface IEncoder
- All Known Implementing Classes:
BypassEncoder, CDATAEncoder, CsvEncoder, DefaultEncoder, DTDEncoder, EncoderManager, HtmlEncoder, IcuMessageEncoder, JSONEncoder, MarkdownEncoder, MIFEncoder, MosesTextEncoder, OpenXMLEncoder, PHPContentEncoder, POEncoder, PropertiesEncoder, RegexEncoder, TEXEncoder, TSEncoder, XMLEncoder, YamlEncoder
public interface IEncoder extends Function<ITextUnit,ITextUnit>
Provides common methods to encode/escape text to a specific format. Important: Each class implementing this interface must have a nullary (no-argument) constructor, so that the EncoderManager can instantiate the object from a class obtained with Class.forName().
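The constructor requirement can be illustrated with a minimal sketch. The PassThroughEncoder class below is hypothetical and does not implement the real IEncoder interface; it only shows why instantiation by class name needs a nullary constructor. How EncoderManager actually performs the lookup is not described on this page, so the instantiate helper is an assumption.

```java
// Hypothetical stand-in for an encoder class; NOT part of Okapi.
class PassThroughEncoder {
    // Nullary constructor: needed so the class can be created by name.
    public PassThroughEncoder() {
    }

    String encode(String text) {
        return text;
    }
}

class ReflectiveInstantiationSketch {
    // Creates an instance from a fully qualified class name, as a manager
    // class might do (assumption: the real lookup may differ).
    static Object instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // Throws NoSuchMethodException if there is no nullary constructor.
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object encoder = instantiate(PassThroughEncoder.class.getName());
        System.out.println(encoder.getClass().getSimpleName()); // prints "PassThroughEncoder"
    }
}
```

A class that only declares constructors with arguments would make instantiate fail with an IllegalStateException wrapping a NoSuchMethodException.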
- Filters (and subfilters) decode any special sequences for their format. The goal is 100% Unicode inside Okapi. This includes normalizing newlines to \n.
- The exception to the rule above is Skeleton and Code.data. This content should remain unaltered to the extent possible. For example, XML processors will decode everything and this is out of our control.
- For non-problematic formats that use GenericFilterWriter/GenericSkeleton, only an IEncoder implementation is needed. Encoding is handled automatically in this case based on MimeTypeMapper.
- The IEncoder implementation should reside in the encoders package in core.
- The IEncoder implementation should take into account EncoderContext. Normally the encoder shouldn't be run on SKELETON or INLINE content, or should only run with a small subset of cases as compared to TEXT (the goal is to keep SKELETON/INLINE as close to the original as possible).
- Special IParameters can be passed to the IEncoder if more configuration is needed.
- QuoteMode is also provided to help guide logic around double and single quotes. Some encoders take more parameters in their constructor (XMLEncoder).
- For "problematic" formats an IFilterWriter should be implemented. This will give the full context of the TextUnit and surrounding Events. Encoding can be applied with more nuance. However, an IEncoder can still be implemented for default cases.
- ALL encoder logic should reside within the IFilterWriter and/or IEncoder implementations, not handled in ad hoc ways.
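The guideline about EncoderContext can be sketched as follows. This is a self-contained illustration, not Okapi code: SketchContext is a stand-in for EncoderContext, and the escaping rules (only & and <) are an assumption chosen to keep the example short.

```java
// Stand-in for Okapi's EncoderContext, declared here only so the sketch compiles alone.
enum SketchContext { TEXT, SKELETON, INLINE }

class ContextAwareEncoderSketch {
    // Escapes markup characters in TEXT; returns SKELETON and INLINE content unchanged.
    static String encode(String text, SketchContext context) {
        if (context != SketchContext.TEXT) {
            return text; // keep skeleton and inline code as close to the original as possible
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a < b & c", SketchContext.TEXT)); // prints "a &lt; b &amp; c"
        System.out.println(encode("<br/>", SketchContext.INLINE));   // prints "<br/>"
    }
}
```

A real encoder might still handle a small subset of cases for SKELETON or INLINE content instead of returning it untouched.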
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| default ITextUnit | apply(ITextUnit tu) | |
| String | encode(char value, EncoderContext context) | Encodes a given character with this encoding. |
| String | encode(int codePoint, EncoderContext context) | Encodes a given code-point with this encoding. |
| String | encode(String text, EncoderContext context) | Encodes a given text with this encoder. |
| default CharsetEncoder | getCharsetEncoder() | Gets the character set encoder used for this encoder. |
| default String | getEncoding() | Gets the name of the charset encoding to use. |
| default String | getLineBreak() | Gets the line-break to use for this encoder. |
| IParameters | getParameters() | Gets the parameters object with all the configuration information specific to this encoder. |
| void | reset() | Reset state in this encoder in preparation for processing new content. |
| static <T> T | runEncoders(List<Function<T,T>> functions, T t) | Given a collection of IPipelineStep, execute them in sequence given a Stream input. |
| void | setOptions(IParameters params, String encoding, String lineBreak) | Sets the options for this encoder. |
| default String | toNative(String propertyName, String value) | Converts any property values from its standard representation to the native representation for this encoder. |
Method Detail
-
runEncoders
static <T> T runEncoders(List<Function<T,T>> functions, T t)
Given a collection of IPipelineStep, execute them in sequence given a Stream input. Each function takes the output of the previous step as its input, and so on, similar to running a pipeline.
- Parameters:
- functions - the list of Function objects to run, in order.
- t - the input given to the first function.
- Returns:
- the output of the last function.
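The chaining behavior described for runEncoders can be sketched with plain java.util.function.Function objects. The runAll helper below is a hypothetical re-creation written for illustration; it is not taken from the Okapi source, and the actual implementation may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FunctionChainSketch {
    // Applies each function to the result of the previous one, in list order.
    static <T> T runAll(List<Function<T, T>> functions, T input) {
        T current = input;
        for (Function<T, T> f : functions) {
            current = f.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Order matters: escaping '&' first avoids re-escaping the '&' of "&lt;".
        List<Function<String, String>> steps = Arrays.asList(
                s -> s.replace("&", "&amp;"),
                s -> s.replace("<", "&lt;"));
        System.out.println(runAll(steps, "a<b & c")); // prints "a&lt;b &amp; c"
    }
}
```

Since IEncoder extends Function<ITextUnit,ITextUnit>, a list of encoders could presumably be passed in the same way, with each encoder receiving the text unit produced by the previous one.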
-
reset
void reset()
Reset state in this encoder in preparation for processing new content.
-
setOptions
void setOptions(IParameters params, String encoding, String lineBreak)
Sets the options for this encoder.
- Parameters:
- params - the parameters object with all the configuration information specific to this encoder.
- encoding - the name of the charset encoding to use.
- lineBreak - the type of line break to use.
-
encode
String encode(String text, EncoderContext context)
Encodes a given text with this encoder.
- Parameters:
- text - the text to encode.
- context - the context of the text: 0=text, 1=skeleton, 2=inline.
- Returns:
- the encoded text.
-
encode
String encode(int codePoint, EncoderContext context)
Encodes a given code-point with this encoding. If this method is called from a loop, it is assumed that the caller tests whether the code point is a supplementary one, and that the caller also performs any index update needed to skip the low-surrogate part of the pair.
- Parameters:
- codePoint - the code-point to encode.
- context - the context of the character: 0=text, 1=skeleton, 2=inline.
- Returns:
- the encoded character (as a string, since it can now be made up of more than one character).
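The caller-side loop described above can be sketched as follows. The encodeCodePoint method is a hypothetical stand-in for an IEncoder.encode(int, EncoderContext) call (the context argument is omitted for brevity), and escaping non-ASCII code points as hexadecimal character references is an assumption made only for the example.

```java
class CodePointLoopSketch {
    // Hypothetical per-code-point encoder: ASCII passes through,
    // everything else becomes a hexadecimal character reference.
    static String encodeCodePoint(int codePoint) {
        if (codePoint < 0x80) {
            return String.valueOf((char) codePoint);
        }
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    // Caller-side loop: reads full code points and advances the index by
    // one or two chars so the low surrogate of a pair is never visited alone.
    static String encodeAll(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            sb.append(encodeCodePoint(cp));
            i += Character.charCount(cp); // 2 for supplementary code points, 1 otherwise
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is stored as the surrogate pair \uD83D\uDE00.
        System.out.println(encodeAll("a\uD83D\uDE00")); // prints "a&#x1f600;"
    }
}
```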
-
encode
String encode(char value, EncoderContext context)
Encodes a given character with this encoding.
- Parameters:
- value - the character to encode.
- context - the context of the character: 0=text, 1=skeleton, 2=inline.
- Returns:
- the encoded character (as a string, since it can now be made up of more than one character).
-
toNative
default String toNative(String propertyName, String value)
Converts any property values from its standard representation to the native representation for this encoder.
- Parameters:
- propertyName - the name of the property.
- value - the standard value to convert.
- Returns:
- the native representation of the given value.
-
getLineBreak
default String getLineBreak()
Gets the line-break to use for this encoder.
- Returns:
- the line-break used for this encoder.
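Because newlines are normalized to \n inside Okapi, a writer would typically need to convert them back to the configured line break on output. The sketch below shows one way this could look; applyLineBreak is a hypothetical helper, and whether a given encoder performs this conversion itself is an assumption.

```java
class LineBreakSketch {
    // Replaces the internal "\n" with the line break configured for the output.
    static String applyLineBreak(String text, String lineBreak) {
        return text.replace("\n", lineBreak);
    }

    public static void main(String[] args) {
        String out = applyLineBreak("line 1\nline 2", "\r\n");
        System.out.println(out.equals("line 1\r\nline 2")); // prints "true"
    }
}
```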
-
getEncoding
default String getEncoding()
Gets the name of the charset encoding to use.
- Returns:
- the charset encoding used for this encoder.
-
getCharsetEncoder
default CharsetEncoder getCharsetEncoder()
Gets the character set encoder used for this encoder.
- Returns:
- the character set encoder used for this encoder. This can be null.
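One plausible use of the character set encoder is to escape only the characters that the output charset cannot represent. The sketch below relies on the standard java.nio.charset.CharsetEncoder.canEncode method; the escapeUnencodable helper and the choice of hexadecimal character references are assumptions, not behavior documented for any particular Okapi encoder. A null encoder is treated as "no restriction", since the returned value can be null.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class CharsetFallbackSketch {
    // Keeps characters the charset can represent and escapes the rest.
    static String escapeUnencodable(String text, CharsetEncoder charsetEncoder) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            CharSequence chars = text.subSequence(i, i + len);
            if (charsetEncoder == null || charsetEncoder.canEncode(chars)) {
                sb.append(chars);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        // U+00E9 exists in ISO-8859-1; U+20AC (euro sign) does not.
        String out = escapeUnencodable("\u00e9\u20ac", latin1);
        System.out.println(out.equals("\u00e9&#x20ac;")); // prints "true"
    }
}
```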
-
getParameters
IParameters getParameters()
Gets the parameters object with all the configuration information specific to this encoder.
- Returns:
- the parameters object used for this encoder. This can be null.