Class RawDocument
- java.lang.Object
  - net.sf.okapi.common.resource.RawDocument
- All Implemented Interfaces:
Closeable, AutoCloseable, Cloneable, IResource, IWithAnnotations, IWithProperties, IWithSkeleton
public class RawDocument extends Object implements IResource, Closeable
Resource that carries all the information needed for a filter to open a given document, and also the resource associated with the event RAW_DOCUMENT. Documents are passed through the pipeline either as a RawDocument or as filter events. Specialized steps convert one form to the other and back. The RawDocument object has one (and only one) of three input objects: a CharSequence, a URI, or an InputStream.
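The "one (and only one) of three input objects" rule can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written against the JDK only; it is not the Okapi implementation, and the names SingleInputSketch, fromText, fromUri, fromStream, inputCount and kind are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.util.Objects;

/** Hypothetical stand-in (not an Okapi class) that holds exactly one of three inputs. */
final class SingleInputSketch {
    private final CharSequence chars; // non-null only for in-memory text
    private final URI uri;            // non-null only for an addressable document
    private final InputStream stream; // non-null only for a caller-supplied stream

    private SingleInputSketch(CharSequence chars, URI uri, InputStream stream) {
        this.chars = chars;
        this.uri = uri;
        this.stream = stream;
    }

    static SingleInputSketch fromText(CharSequence chars) {
        return new SingleInputSketch(Objects.requireNonNull(chars), null, null);
    }

    static SingleInputSketch fromUri(URI uri) {
        return new SingleInputSketch(null, Objects.requireNonNull(uri), null);
    }

    static SingleInputSketch fromStream(InputStream stream) {
        return new SingleInputSketch(null, null, Objects.requireNonNull(stream));
    }

    /** Counts the non-null inputs; the factories above guarantee this is always 1. */
    int inputCount() {
        int n = 0;
        if (chars != null) n++;
        if (uri != null) n++;
        if (stream != null) n++;
        return n;
    }

    /** Names the input that is set, so a consumer can branch on it. */
    String kind() {
        if (chars != null) return "CharSequence";
        if (uri != null) return "URI";
        return "InputStream";
    }

    public static void main(String[] args) {
        System.out.println(fromText("Hello").kind());
        System.out.println(fromUri(URI.create("file:///tmp/doc.html")).kind());
        System.out.println(fromStream(new ByteArrayInputStream(new byte[0])).kind());
    }
}
```

This mirrors why getInputURI() and getInputCharSequence() may return null: only the accessor matching the input that was supplied has a value.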
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
static String UNKOWN_ENCODING
-
Fields inherited from interface net.sf.okapi.common.IResource
COPY_ALL, COPY_CONTENT, COPY_PROPERTIES, COPY_SEGMENTATION, COPY_SEGMENTED_CONTENT, CREATE_EMPTY
-
-
Constructor Summary
Constructors (Constructor, Description)
RawDocument()
RawDocument(InputStream inputStream, String defaultEncoding, LocaleId sourceLocale)
Creates a new RawDocument object with a given InputStream, a default encoding and a source locale.
RawDocument(InputStream inputStream, String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale)
Creates a new RawDocument object with a given InputStream, a default encoding, a source locale and a target locale.
RawDocument(CharSequence inputCharSequence, LocaleId sourceLocale)
Creates a new RawDocument object with a given CharSequence and a source locale.
RawDocument(CharSequence inputCharSequence, LocaleId sourceLocale, LocaleId targetLocale)
Creates a new RawDocument object with a given CharSequence, a source locale and a target locale.
RawDocument(URI inputURI, String defaultEncoding, LocaleId sourceLocale)
Creates a new RawDocument object with a given URI, a default encoding and a source locale.
RawDocument(URI inputURI, String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale)
Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale.
RawDocument(URI inputURI, String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale, String filterConfigId)
Creates a new RawDocument object with a given URI, a default encoding, a source locale, a target locale and the filter configuration id.
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description)
void close()
Destroys the underlying stream of this RawDocument and deletes all temporary resources.
File createOutputFile(URI outputURI)
Creates a new output file object based on a given output URI and the URI of the raw document.
void finalizeOutput()
Finalizes the name for this output file.
<A extends IAnnotation> A getAnnotation(Class<A> annotationType)
Gets the annotation object for a given class for this resource.
Annotations getAnnotations()
String getEncoding()
Gets the default encoding associated with this resource.
String getFilterConfigId()
Gets the identifier of the filter configuration to use with this document.
String getId()
Gets the identifier of the resource.
CharSequence getInputCharSequence()
Gets the CharSequence associated with this resource.
URI getInputURI()
Gets the URI object associated with this resource.
Map<String,Property> getProperties()
Reader getReader()
Returns a Reader based on the current Stream returned from getStream().
LocaleId getSourceLocale()
Gets the source locale associated with this resource.
InputStream getStream()
Returns an InputStream based on the current input.
LocaleId getTargetLocale()
Gets the target locale associated with this resource.
List<LocaleId> getTargetLocales()
Gets the list of target locales associated with this resource.
void setEncoding(String encoding)
Sets the input encoding.
void setEncoding(Charset encoding)
void setFilterConfigId(String filterConfigId)
Sets the identifier of the filter configuration to use with this document.
void setId(String id)
Sets the identifier of this resource.
void setSourceLocale(LocaleId locId)
Sets the source locale associated with this document.
void setTargetLocale(LocaleId locId)
Sets the target locale associated with this document.
void setTargetLocales(List<LocaleId> locIds)
Sets the list of target locales associated with this document.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface net.sf.okapi.common.resource.IWithAnnotations
annotationIterator, clear, getAnnotationsTypesAsSet, hasAnnotation, hasAnnotations, remove, setAnnotation
-
Methods inherited from interface net.sf.okapi.common.resource.IWithProperties
getProperty, getPropertyNames, hasProperty, propertyIterator, removeProperty, setProperty
-
Methods inherited from interface net.sf.okapi.common.resource.IWithSkeleton
getSkeleton, setSkeleton
-
-
-
-
Field Detail
-
UNKOWN_ENCODING
public static final String UNKOWN_ENCODING
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
RawDocument
public RawDocument()
-
RawDocument
public RawDocument(CharSequence inputCharSequence, LocaleId sourceLocale)
Creates a new RawDocument object with a given CharSequence and a source locale.
- Parameters:
inputCharSequence - the CharSequence for this RawDocument.
sourceLocale - the source locale for this RawDocument.
-
RawDocument
public RawDocument(CharSequence inputCharSequence, LocaleId sourceLocale, LocaleId targetLocale)
Creates a new RawDocument object with a given CharSequence, a source locale and a target locale.
- Parameters:
inputCharSequence - the CharSequence for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.
-
RawDocument
public RawDocument(URI inputURI, String defaultEncoding, LocaleId sourceLocale)
Creates a new RawDocument object with a given URI, a default encoding and a source locale.
- Parameters:
inputURI - the URI for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
-
RawDocument
public RawDocument(URI inputURI, String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale)
Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale.
- Parameters:
inputURI - the URI for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.
-
RawDocument
public RawDocument(InputStream inputStream, String defaultEncoding, LocaleId sourceLocale)
Creates a new RawDocument object with a given InputStream, a default encoding and a source locale.
- Parameters:
inputStream - the InputStream for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
-
RawDocument
public RawDocument(URI inputURI, String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale, String filterConfigId)
Creates a new RawDocument object with a given URI, a default encoding, a source locale, a target locale and the filter configuration id.
- Parameters:
inputURI - the URI for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.
filterConfigId - the filter configuration id.
-
RawDocument
public RawDocument(InputStream inputStream, String defaultEncoding, LocaleId sourceLocale, LocaleId targetLocale)
Creates a new RawDocument object with a given InputStream, a default encoding, a source locale and a target locale.
- Parameters:
inputStream - the InputStream for this RawDocument.
defaultEncoding - the default encoding for this RawDocument.
sourceLocale - the source locale for this RawDocument.
targetLocale - the target locale for this RawDocument.
-
-
Method Detail
-
getId
public String getId()
Description copied from interface: IResource
Gets the identifier of the resource. This identifier is unique per extracted document and by type of resource. This value is filter-specific and may differ from one extraction of the same document to the next. It may or may not be a sequential number, may or may not be incremental, and need not be a number at all. It has no correspondence in the source document ("IDs" coming from the source document are "names" and are not available for all resources).
-
setId
public void setId(String id)
Description copied from interface: IResource
Sets the identifier of this resource.
- Specified by:
setId in interface IResource
- Parameters:
id - the new identifier value.
- See Also:
IResource.getId()
-
getReader
public Reader getReader()
Returns a Reader based on the current Stream returned from getStream().
- Returns:
a Reader
-
getStream
public InputStream getStream()
Returns an InputStream based on the current input. The underlying FileCachedInputStream is reset and reopened if needed.
- Returns:
the InputStream
- Throws:
OkapiIOException - if there was any problem creating the stream.
-
getAnnotation
public <A extends IAnnotation> A getAnnotation(Class<A> annotationType)
Description copied from interface: IWithAnnotations
Gets the annotation object for a given class for this resource.
- Specified by:
getAnnotation in interface IWithAnnotations
- Returns:
-
getInputURI
public URI getInputURI()
Gets the URI object associated with this resource. It may be null if either the CharSequence or the InputStream input is not null.
- Returns:
the URI object for this resource (may be null).
-
getInputCharSequence
public CharSequence getInputCharSequence()
Gets the CharSequence associated with this resource. It may be null if either the URI or the InputStream input is not null.
- Returns:
the CharSequence
-
getEncoding
public String getEncoding()
Gets the default encoding associated with this resource.
- Returns:
the default encoding associated with this resource.
-
getSourceLocale
public LocaleId getSourceLocale()
Gets the source locale associated with this resource.
- Returns:
the source locale associated with this resource.
-
setSourceLocale
public void setSourceLocale(LocaleId locId)
Sets the source locale associated with this document.
- Parameters:
locId - the locale to set.
-
getTargetLocale
public LocaleId getTargetLocale()
Gets the target locale associated with this resource. If several targets are set, this method returns the first one.
- Returns:
the sole or first target locale associated with this resource, or null if no target locale is set.
-
setTargetLocale
public void setTargetLocale(LocaleId locId)
Sets the target locale associated with this document. This call overrides any existing target locale or list of target locales.
- Parameters:
locId - the locale to set.
-
getTargetLocales
public List<LocaleId> getTargetLocales()
Gets the list of target locales associated with this resource. If the target locale was set using a constructor or setTargetLocale(LocaleId), this list contains that locale.
- Returns:
the target locales associated with this resource. Never null.
-
setTargetLocales
public void setTargetLocales(List<LocaleId> locIds)
Sets the list of target locales associated with this document. If the target locale was set with a constructor or setTargetLocale(LocaleId), this method overrides that locale.
- Parameters:
locIds - the locales to set. If the value is null, an empty list will be associated.
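The interplay of the four target-locale accessors described above can be sketched as follows. This is a hypothetical stand-in that uses plain String locale codes instead of LocaleId and does not reproduce Okapi's code. Treating a null argument to setTargetLocale as "no target" and returning an unmodifiable list are assumptions of the sketch; the documentation does not state either.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in (not Okapi code) for the documented target-locale rules. */
final class TargetLocalesSketch {
    private List<String> targets = new ArrayList<>();

    /** Overrides any existing target locale or list of target locales. */
    void setTargetLocale(String locale) {
        targets = new ArrayList<>();
        if (locale != null) { // assumption: null means "no target"
            targets.add(locale);
        }
    }

    /** Overrides whatever was set before; null is stored as an empty list. */
    void setTargetLocales(List<String> locales) {
        targets = (locales == null) ? new ArrayList<>() : new ArrayList<>(locales);
    }

    /** Returns the sole or first target locale, or null if none is set. */
    String getTargetLocale() {
        return targets.isEmpty() ? null : targets.get(0);
    }

    /** Never returns null; the read-only view is an assumption of this sketch. */
    List<String> getTargetLocales() {
        return Collections.unmodifiableList(targets);
    }

    public static void main(String[] args) {
        TargetLocalesSketch doc = new TargetLocalesSketch();
        doc.setTargetLocales(Arrays.asList("fr", "de"));
        System.out.println(doc.getTargetLocale());  // first of several targets
        doc.setTargetLocale("ja");                  // replaces the whole list
        System.out.println(doc.getTargetLocales());
    }
}
```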
-
setEncoding
public void setEncoding(String encoding)
Sets the input encoding.
WARNING: Any Reader previously obtained via getReader() becomes invalid. Call getReader() again after calling setEncoding. In some cases it may not be possible to create a new Reader, so it is best to set the encoding before any call to getReader().
- Parameters:
encoding - the encoding to use with the reader.
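The warning is easier to see with plain JDK classes: an InputStreamReader fixes its charset when it is constructed, so a reader created before the encoding changes keeps decoding with the old one. The sketch below assumes, without confirmation from this page, that getReader() behaves like a standard JDK reader in this respect; EncodingSketch and decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** JDK-only illustration: a Reader is bound to the charset it was created with. */
final class EncodingSketch {

    /** Decodes all bytes through a fresh Reader created for the given charset. */
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as ISO-8859-1: the last byte is not valid UTF-8 on its own.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // A reader created with the wrong charset yields garbled text...
        String wrong = decode(latin1, StandardCharsets.UTF_8);
        // ...and only a new reader, created after the charset is corrected, fixes it.
        String right = decode(latin1, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.equals("caf\u00e9"));
        System.out.println(right.equals("caf\u00e9"));
    }
}
```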
-
setEncoding
public void setEncoding(Charset encoding)
-
setFilterConfigId
public void setFilterConfigId(String filterConfigId)
Sets the identifier of the filter configuration to use with this document.
- Parameters:
filterConfigId - the filter configuration identifier to set.
-
getFilterConfigId
public String getFilterConfigId()
Gets the identifier of the filter configuration to use with this document.
- Returns:
the filter configuration identifier for this document, or null if none is set.
-
close
public void close()
Destroys the underlying stream of this RawDocument and deletes all temporary resources.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
-
createOutputFile
public File createOutputFile(URI outputURI)
Creates a new output file object based on a given output URI and the URI of the raw document. If the path of the raw document is the same as the path of the output, a temporary file is created; otherwise the output URI is used directly. You must call finalizeOutput() when all writing is done and both the input file and output file are closed, to make sure the proper output file name is used.
If one or more directories of the output path do not exist, they are created automatically.
If the input of the raw document is a CharSequence or a Stream, the method assumes it can use the path of the output URI directly.
- Parameters:
outputURI - the URI of the output file.
- Returns:
the output file.
- Throws:
OkapiIOException - if an error occurs when creating the work file or its directory.
- See Also:
finalizeOutput()
-
finalizeOutput
public void finalizeOutput()
Finalizes the name for this output file. If a temporary file was used, this call deletes the existing file and then renames the temporary file to the existing file's name. This method must always be called after both input and output files are closed.
- Throws:
OkapiIOException - if the original input file cannot be deleted or if the work file cannot be renamed.
- See Also:
createOutputFile(URI)
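The createOutputFile(URI) and finalizeOutput() pair describes a write-to-temporary-then-rename pattern. The sketch below shows that pattern with java.nio.file only. It is a hypothetical stand-in, not Okapi's implementation: it takes Path arguments rather than URIs, throws IOException rather than OkapiIOException, and uses a single Files.move with REPLACE_EXISTING in place of the separate delete and rename steps described above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in (not Okapi code) for the temp-file-then-rename pattern. */
final class OutputFileSketch {
    private final Path input; // may be null when the input is not a file
    private Path output;      // destination requested by the caller
    private Path work;        // temporary file, set only when input and output collide

    OutputFileSketch(Path input) {
        this.input = input;
    }

    /** Returns the file to write to: a temp file if the output is the input itself. */
    Path createOutputFile(Path requested) throws IOException {
        output = requested.toAbsolutePath().normalize();
        work = null;
        Path parent = output.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // missing directories are created
        }
        if (parent != null && input != null
                && output.equals(input.toAbsolutePath().normalize())) {
            work = Files.createTempFile(parent, "~sketch", ".tmp");
            return work;
        }
        return output;
    }

    /** Call after all streams are closed: moves the temp file over the original. */
    void finalizeOutput() throws IOException {
        if (work != null) {
            Files.move(work, output, StandardCopyOption.REPLACE_EXISTING);
            work = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path doc = Files.createTempFile("sketch", ".txt");
        Files.write(doc, "old".getBytes(StandardCharsets.UTF_8));

        OutputFileSketch sketch = new OutputFileSketch(doc);
        Path target = sketch.createOutputFile(doc); // same path, so a temp file
        Files.write(target, "new".getBytes(StandardCharsets.UTF_8));
        sketch.finalizeOutput();                    // temp file replaces the original

        System.out.println(new String(Files.readAllBytes(doc), StandardCharsets.UTF_8));
        Files.delete(doc);
    }
}
```

Writing to a separate file first keeps the input readable while the output is produced, which is why finalization has to wait until both files are closed.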
-
getProperties
public Map<String,Property> getProperties()
- Specified by:
getProperties in interface IWithProperties
- Returns:
Map of properties for the implementer of the interface
-
getAnnotations
public Annotations getAnnotations()
- Specified by:
getAnnotations in interface IWithAnnotations
- Returns:
Annotations for the implementer of the interface
-
-