Class RawDocument

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Cloneable, IResource, IWithAnnotations, IWithProperties, IWithSkeleton

    public class RawDocument
    extends Object
    implements IResource, Closeable
    Resource that carries all the information needed for a filter to open a given document, and also the resource associated with the event RAW_DOCUMENT. Documents are passed through the pipeline either as a RawDocument or as filter events. Specialized steps allow converting one into the other and back. A RawDocument holds one (and only one) of three input objects: a CharSequence, a URI, or an InputStream.
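The "one and only one input" rule can be illustrated with a small JDK-only model. `DocumentInput`, its `of(...)` factories and `getInputStream()` are hypothetical stand-ins, not part of the Okapi API. The sketch only shows how exactly one of the three inputs is held while the getters for the other two return null, which matches the documented behavior of getInputURI() and getInputCharSequence().

```java
import java.io.InputStream;
import java.net.URI;

/**
 * Hypothetical model (not Okapi's RawDocument) of the documented rule that
 * exactly one of a CharSequence, a URI, or an InputStream is set.
 */
final class DocumentInput {
    private final CharSequence charSequence;
    private final URI uri;
    private final InputStream stream;

    private DocumentInput(CharSequence cs, URI uri, InputStream stream) {
        // Count the non-null inputs: anything other than exactly one is rejected.
        int count = (cs != null ? 1 : 0) + (uri != null ? 1 : 0) + (stream != null ? 1 : 0);
        if (count != 1) {
            throw new IllegalArgumentException("Exactly one input must be provided, got " + count);
        }
        this.charSequence = cs;
        this.uri = uri;
        this.stream = stream;
    }

    static DocumentInput of(CharSequence cs) { return new DocumentInput(cs, null, null); }
    static DocumentInput of(URI uri) { return new DocumentInput(null, uri, null); }
    static DocumentInput of(InputStream in) { return new DocumentInput(null, null, in); }

    // Getters for the inputs that were not supplied return null.
    CharSequence getInputCharSequence() { return charSequence; }
    URI getInputURI() { return uri; }
    InputStream getInputStream() { return stream; }
}
```

Enforcing the rule in a single private constructor keeps each public entry point trivial; the real class exposes separate constructors instead of factory methods.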
    • Constructor Detail

      • RawDocument

        public RawDocument()
      • RawDocument

        public RawDocument(CharSequence inputCharSequence,
                           LocaleId sourceLocale)
        Creates a new RawDocument object with a given CharSequence and a source locale.
        Parameters:
        inputCharSequence - the CharSequence for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
      • RawDocument

        public RawDocument(CharSequence inputCharSequence,
                           LocaleId sourceLocale,
                           LocaleId targetLocale)
        Creates a new RawDocument object with a given CharSequence, a source locale and a target locale.
        Parameters:
        inputCharSequence - the CharSequence for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
        targetLocale - the target locale for this RawDocument.
      • RawDocument

        public RawDocument(URI inputURI,
                           String defaultEncoding,
                           LocaleId sourceLocale)
        Creates a new RawDocument object with a given URI, a default encoding and a source locale.
        Parameters:
        inputURI - the URI for this RawDocument.
        defaultEncoding - the default encoding for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
      • RawDocument

        public RawDocument(URI inputURI,
                           String defaultEncoding,
                           LocaleId sourceLocale,
                           LocaleId targetLocale)
        Creates a new RawDocument object with a given URI, a default encoding, a source locale and a target locale.
        Parameters:
        inputURI - the URI for this RawDocument.
        defaultEncoding - the default encoding for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
        targetLocale - the target locale for this RawDocument.
      • RawDocument

        public RawDocument(InputStream inputStream,
                           String defaultEncoding,
                           LocaleId sourceLocale)
        Creates a new RawDocument object with a given InputStream, a default encoding and a source locale.
        Parameters:
        inputStream - the InputStream for this RawDocument.
        defaultEncoding - the default encoding for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
      • RawDocument

        public RawDocument(URI inputURI,
                           String defaultEncoding,
                           LocaleId sourceLocale,
                           LocaleId targetLocale,
                           String filterConfigId)
        Creates a new RawDocument object with a given URI, a default encoding, a source locale, a target locale and a filter configuration identifier.
        Parameters:
        inputURI - the URI for this RawDocument.
        defaultEncoding - the default encoding for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
        targetLocale - the target locale for this RawDocument.
        filterConfigId - the filter configuration id.
      • RawDocument

        public RawDocument(InputStream inputStream,
                           String defaultEncoding,
                           LocaleId sourceLocale,
                           LocaleId targetLocale)
        Creates a new RawDocument object with a given InputStream, a default encoding, a source locale and a target locale.
        Parameters:
        inputStream - the InputStream for this RawDocument.
        defaultEncoding - the default encoding for this RawDocument.
        sourceLocale - the source locale for this RawDocument.
        targetLocale - the target locale for this RawDocument.
    • Method Detail

      • getId

        public String getId()
        Description copied from interface: IResource
        Gets the identifier of the resource. This identifier is unique per extracted document and by type of resource. This value is filter-specific, and it may differ from one extraction of the same document to the next. It may or may not be a sequential number, incremental or not, and it need not be a number at all. It has no correspondence in the source document ("IDs" coming from the source document are "names" and are not available for all resources).
        Specified by:
        getId in interface IResource
        Returns:
        the identifier of this resource.
      • setId

        public void setId(String id)
        Description copied from interface: IResource
        Sets the identifier of this resource.
        Specified by:
        setId in interface IResource
        Parameters:
        id - the new identifier value.
        See Also:
        IResource.getId()
      • getReader

        public Reader getReader()
        Returns a Reader based on the current Stream returned from getStream().

        Returns:
        a Reader
      • getStream

        public InputStream getStream()
        Returns an InputStream based on the current input. The underlying FileCachedInputStream is reset and reopened if needed.
        Returns:
        the InputStream
        Throws:
        OkapiIOException - if there was any problem creating the stream.
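getReader() is documented as a Reader layered on the stream returned by getStream(), and getStream() can hand out the input again because it is cached. The sketch below models both ideas with the JDK only, under the simplifying assumption that the input is cached in memory rather than in a FileCachedInputStream. `CachedInput` and its `readText()` helper are hypothetical and not part of Okapi.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/**
 * Hypothetical in-memory stand-in for a re-readable input. The original stream
 * is consumed once; every getStream() call returns a new stream positioned at
 * the start. (The documented class caches through a FileCachedInputStream.)
 */
final class CachedInput {
    private final byte[] data;
    private final String encoding;

    CachedInput(InputStream original, String encoding) {
        this.data = readAllBytes(original);
        this.encoding = encoding;
    }

    /** Returns a fresh stream over the cached bytes, so the input can be read again. */
    InputStream getStream() {
        return new ByteArrayInputStream(data);
    }

    /** Builds a Reader on top of getStream(), decoding with the default encoding. */
    Reader getReader() {
        return new InputStreamReader(getStream(), Charset.forName(encoding));
    }

    /** Hypothetical convenience helper: reads the whole input as text. */
    String readText() {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = getReader()) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Drains and closes the original stream, returning its bytes.
    private static byte[] readAllBytes(InputStream in) {
        try (InputStream src = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling readText() twice returns the same text, because each call opens a new Reader over a new stream instead of reusing an exhausted one.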
      • getInputURI

        public URI getInputURI()
        Gets the URI object associated with this resource. It may be null if the input is a CharSequence or an InputStream instead.
        Returns:
        the URI object for this resource (may be null).
      • getInputCharSequence

        public CharSequence getInputCharSequence()
        Gets the CharSequence associated with this resource. It may be null if either URI or InputStream inputs are not null.
        Returns:
        the CharSequence for this resource (may be null).
      • getEncoding

        public String getEncoding()
        Gets the default encoding associated with this resource.
        Returns:
        the default encoding associated with this resource.
      • getSourceLocale

        public LocaleId getSourceLocale()
        Gets the source locale associated with this resource.
        Returns:
        the source locale associated with this resource.
      • setSourceLocale

        public void setSourceLocale(LocaleId locId)
        Sets the source locale associated with this document.
        Parameters:
        locId - the locale to set.
      • getTargetLocale

        public LocaleId getTargetLocale()
        Gets the target locale associated with this resource.

        If several targets are set, this method returns the first one.

        Returns:
        the sole or first target locale associated with this resource, or null if no target locale is set.
      • setTargetLocale

        public void setTargetLocale(LocaleId locId)
        Sets the target locale associated with this document.

        This call overrides any existing target locale or list of target locales.

        Parameters:
        locId - the locale to set.
      • getTargetLocales

        public List<LocaleId> getTargetLocales()
        Gets the list of target locales associated with this resource.

        If the target locale was set using a constructor or setTargetLocale(LocaleId), this list contains only that locale.

        Returns:
        the target locales associated with this resource. Never null.
      • setTargetLocales

        public void setTargetLocales(List<LocaleId> locIds)
        Sets the list of target locales associated with this document.

        If the target locale was set with a constructor or setTargetLocale(LocaleId), this method overrides that locale.

        Parameters:
        locIds - the locales to set. If the value is null, an empty list will be associated.
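The four target-locale methods describe a single underlying list. The following model uses plain String codes in place of LocaleId so that it runs with the JDK alone; `TargetLocales` is a hypothetical class, not Okapi's code. It encodes the documented rules: setTargetLocale replaces any existing list, getTargetLocale returns the first entry or null, getTargetLocales never returns null, and setTargetLocales(null) leaves an empty list. How setTargetLocale(null) behaves is not documented; treating it as "clear" is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the documented target-locale rules (Strings stand in for LocaleId). */
final class TargetLocales {
    private List<String> targets = new ArrayList<>();

    /** Overrides any existing target locale or list. Assumption: null clears the list. */
    void setTargetLocale(String locale) {
        targets = new ArrayList<>();
        if (locale != null) {
            targets.add(locale);
        }
    }

    /** Returns the sole or first target locale, or null if none is set. */
    String getTargetLocale() {
        return targets.isEmpty() ? null : targets.get(0);
    }

    /** Replaces the list; a null argument results in an empty list. */
    void setTargetLocales(List<String> locales) {
        targets = (locales == null) ? new ArrayList<>() : new ArrayList<>(locales);
    }

    /** Never returns null; a read-only view is returned here as a design choice of the sketch. */
    List<String> getTargetLocales() {
        return Collections.unmodifiableList(targets);
    }
}
```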
      • setEncoding

        public void setEncoding(String encoding)
        Sets the input encoding.

        WARNING:

        Any Reader obtained through getReader() before this call becomes invalid; call getReader() again after calling setEncoding(). In some cases it may not be possible to create a new Reader, so it is best to set the encoding before any call to getReader().
        Parameters:
        encoding - the encoding to use with the reader.
      • setEncoding

        public void setEncoding(Charset encoding)
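The warning about stale Readers follows from the fact that a Java Reader fixes its charset when it is created. This JDK-only sketch decodes the same bytes with two charsets to show the kind of text a Reader created before setEncoding() can produce. `EncodingDemo.decode` is a hypothetical helper, and the byte array stands in for the document input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class EncodingDemo {
    /** Decodes the bytes with the charset chosen when the Reader is created. */
    static String decode(byte[] bytes, String encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), Charset.forName(encoding))) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

UTF-8 bytes for "été" read through an ISO-8859-1 Reader come out as "Ã©tÃ©", which is why the encoding should be set before the first call to getReader().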
      • setFilterConfigId

        public void setFilterConfigId(String filterConfigId)
        Sets the identifier of the filter configuration to use with this document.
        Parameters:
        filterConfigId - the filter configuration identifier to set.
      • getFilterConfigId

        public String getFilterConfigId()
        Gets the identifier of the filter configuration to use with this document.
        Returns:
        the filter configuration identifier for this document, or null if none is set.
      • close

        public void close()
        Destroys the underlying stream of this RawDocument and deletes all temporary resources.
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface Closeable
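Because RawDocument implements Closeable and AutoCloseable, a try-with-resources block is a natural way to make sure close() runs and temporary resources are removed even when processing fails. `TempBackedDocument` below is a hypothetical stand-in that owns one temporary file and deletes it in close(); it only illustrates the pattern and is not Okapi's implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical resource that creates a temp file and removes it in close(). */
final class TempBackedDocument implements Closeable {
    private final Path tempFile;

    TempBackedDocument() {
        try {
            tempFile = Files.createTempFile("rawdoc-", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path getTempFile() {
        return tempFile;
    }

    /** Deletes the temporary resource; calling it more than once is harmless. */
    @Override
    public void close() {
        try {
            Files.deleteIfExists(tempFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Declaring close() without `throws IOException` lets callers use try-with-resources without a catch clause; the documented close() signature is likewise declared without a throws clause.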
      • createOutputFile

        public File createOutputFile(URI outputURI)
        Creates a new output file object based on a given output URI and the URI of the raw document.

        If the path of the raw document is the same as the path of the output, a temporary file is created; otherwise the output URI is used directly. You must call finalizeOutput() when all writing is done and both the input and output files are closed, to make sure the proper output file name is used.

        If one or more directories of the output path do not exist, they are created automatically.

        If the input of the raw document is a CharSequence or a Stream, the method assumes it can use the path of the output URI directly.

        Parameters:
        outputURI - the URI of the output file.
        Returns:
        the output file.
        Throws:
        OkapiIOException - if an error occurs when creating the work file or its directory.
        See Also:
        finalizeOutput()
      • finalizeOutput

        public void finalizeOutput()
        Finalizes the name of this output file. If a temporary file was used, this call deletes the existing file and then renames the temporary file to the existing file's name. This method must always be called after both the input and output files are closed.
        Throws:
        OkapiIOException - if the original input file cannot be deleted or if the work file cannot be renamed.
        See Also:
        createOutputFile(URI)
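The createOutputFile()/finalizeOutput() pair describes a write-to-temp-then-rename pattern, needed when the output overwrites the input. The java.nio sketch below mirrors only the documented steps: create missing directories, write to a temporary work file when the two paths match, and later delete the original and rename the work file. `OutputPlan` and its methods are hypothetical; it uses Path instead of URI/File for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical java.nio sketch of the documented create/finalize steps (not Okapi's code). */
final class OutputPlan {
    private final Path input;   // path of the raw document, or null for CharSequence/stream input
    private Path finalOutput;   // the name the output must end up with
    private Path workFile;      // the file actually written to

    OutputPlan(Path input) {
        this.input = input;
    }

    /** Returns a temp work file if output and input are the same path, else the output path itself. */
    Path createOutputFile(Path output) {
        try {
            Path target = output.toAbsolutePath().normalize();
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent); // create any missing directories
            }
            finalOutput = target;
            boolean samePath = input != null && input.toAbsolutePath().normalize().equals(target);
            workFile = samePath ? Files.createTempFile(parent, "work-", ".tmp") : target;
            return workFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call after input and output are closed: deletes the original and renames the work file. */
    void finalizeOutput() {
        if (workFile == null || workFile.equals(finalOutput)) {
            return; // the output was written directly; nothing to rename
        }
        try {
            Files.deleteIfExists(finalOutput);
            Files.move(workFile, finalOutput);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Creating the work file in the same directory as the final output keeps the final rename on one file system, which avoids a cross-device copy.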