Class BOMNewlineEncodingDetector


  • public final class BOMNewlineEncodingDetector
    extends Object
    Helper class that detects a byte-order mark (BOM) and other easily guessed encodings, as well as the type of line break used in a given input. Based on information in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info and http://www.w3.org/TR/html401/charset.html#h-5.2
    • Constructor Detail

      • BOMNewlineEncodingDetector

        public BOMNewlineEncodingDetector(InputStream inputStream,
                                          String defaultEncoding)
        Create a new BOMNewlineEncodingDetector from an InputStream and a user-provided encoding. The detector can convert the input bytes to Unicode in order to detect the BOMNewlineEncodingDetector.NewlineType.
        Parameters:
        inputStream - the input stream
        defaultEncoding - the default encoding
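
The point about converting bytes to Unicode before looking for line breaks can be illustrated with a minimal sketch. It is not the class's actual implementation: `NewlineSketch`, `Kind`, and `firstLineBreak` are hypothetical names, and the sketch assumes that only the first line break in the decoded text is classified.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not the BOMNewlineEncodingDetector implementation. */
final class NewlineSketch {
    /** Hypothetical stand-in for BOMNewlineEncodingDetector.NewlineType. */
    enum Kind { CR, LF, CRLF, UNKNOWN }

    /** Classifies the first line break found in already-decoded text. */
    static Kind firstLineBreak(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return Kind.LF;
            }
            if (c == '\r') {
                // A carriage return directly followed by a line feed is one CRLF break.
                boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? Kind.CRLF : Kind.CR;
            }
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Decode first, then inspect: in UTF-16LE a line feed is the byte pair 0A 00,
        // so scanning raw bytes for a single 0A byte would be unreliable.
        byte[] raw = "a\r\nb".getBytes(StandardCharsets.UTF_16LE);
        String decoded = new String(raw, StandardCharsets.UTF_16LE);
        System.out.println(firstLineBreak(decoded)); // prints CRLF
    }
}
```

Working on decoded characters rather than raw bytes keeps the check independent of the byte width of the encoding.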
      • BOMNewlineEncodingDetector

        public BOMNewlineEncodingDetector(InputStream inputStream,
                                          Charset defaultEncoding)
    • Method Detail

      • getInputStream

        public InputStream getInputStream()
        Get the input stream passed in to the constructor.
        Returns:
        the InputStream
      • getEncoding

        public String getEncoding()
        Get the guessed encoding or, if the encoding could not be guessed, the user-supplied encoding. If no user-supplied encoding was provided, ISO_8859_1 is used.
        Returns:
        the guessed or user supplied encoding.
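
The fallback order described for getEncoding can be sketched as follows. This is an illustration only, under the assumption that a missing guess or a missing user default is represented by `null`; `EncodingFallbackSketch` and `resolve` are hypothetical names, not part of the class.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the fallback order; not the library's code. */
final class EncodingFallbackSketch {
    /**
     * Returns the first available encoding name: the detected one, then the
     * user-supplied default, then ISO-8859-1. Either argument may be null.
     */
    static String resolve(String detected, String userDefault) {
        if (detected != null) {
            return detected;
        }
        if (userDefault != null) {
            return userDefault;
        }
        return StandardCharsets.ISO_8859_1.name();
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-16LE", "windows-1252")); // prints UTF-16LE
        System.out.println(resolve(null, "windows-1252"));       // prints windows-1252
        System.out.println(resolve(null, null));                 // prints ISO-8859-1
    }
}
```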
      • getEncodingSpecificationInfo

        public String getEncodingSpecificationInfo()
        Return a short description of the encoding.
        Returns:
        String containing the specification.
      • isDefinitive

        public boolean isDefinitive()
        Are we confident of the document encoding?
        Returns:
        true if the encoding is obvious from the BOM or bytes, false if the encoding must be guessed.
      • detectBom

        public void detectBom()
      • detectAndRemoveBom

        public void detectAndRemoveBom()
      • getDefaultEncoding

        public String getDefaultEncoding()
        Get the default encoding set by the user.
        Returns:
        String representation of the encoding
      • setDefaultEncoding

        public void setDefaultEncoding(String defaultEncoding)
        Set the default encoding.
        Parameters:
        defaultEncoding - default encoding
      • hasBom

        public boolean hasBom()
        Does this document have a byte order mark?
        Returns:
        true if there is a BOM, false otherwise.
      • hasUtf8Bom

        public boolean hasUtf8Bom()
        Indicates if the guessed encoding is UTF-8 and this file has a BOM.
        Returns:
        True if the guessed encoding is UTF-8 and this file has a BOM, false otherwise.
      • hasUtf7Bom

        public boolean hasUtf7Bom()
        Does this document have a UTF-7 byte order mark?
        Returns:
        true if there is a UTF-7 BOM, false otherwise.
      • isAutodetected

        public boolean isAutodetected()
        Indicates whether the guessed encoding was auto-detected. If not, it is the default encoding that was provided.
        Returns:
        true if the guessed encoding was auto-detected, false if not.
      • getBomSize

        public int getBomSize()
        Gets the number of bytes used by the byte-order mark in this document.
        Returns:
        The byte size of the BOM in this document.
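
To make the BOM sizes concrete, here is a minimal sketch of recognizing the standard Unicode byte-order marks and skipping them on a stream. It is not how BOMNewlineEncodingDetector is implemented: `BomSketch`, `bomSize`, and `skipBom` are hypothetical names, and UTF-7 marks are deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Hypothetical sketch of BOM recognition; not the library's code. */
final class BomSketch {
    /** Returns the BOM length in bytes for the first len bytes of head, or 0 if none is recognized. */
    static int bomSize(byte[] head, int len) {
        // Test the 4-byte marks first: the UTF-32LE mark (FF FE 00 00) begins with
        // the UTF-16LE mark (FF FE). UTF-16LE text that starts with U+0000 after
        // its BOM has the same bytes; this sketch reports it as UTF-32LE.
        if (startsWith(head, len, 0x00, 0x00, 0xFE, 0xFF)) return 4; // UTF-32BE
        if (startsWith(head, len, 0xFF, 0xFE, 0x00, 0x00)) return 4; // UTF-32LE
        if (startsWith(head, len, 0xEF, 0xBB, 0xBF)) return 3;       // UTF-8
        if (startsWith(head, len, 0xFE, 0xFF)) return 2;             // UTF-16BE
        if (startsWith(head, len, 0xFF, 0xFE)) return 2;             // UTF-16LE
        return 0;
    }

    private static boolean startsWith(byte[] head, int len, int... mark) {
        if (len < mark.length) {
            return false;
        }
        for (int i = 0; i < mark.length; i++) {
            if ((head[i] & 0xFF) != mark[i]) {
                return false;
            }
        }
        return true;
    }

    /** Wraps the stream so that a leading BOM, if present, is not returned to the caller. */
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 4);
        byte[] head = new byte[4];
        int len = 0;
        while (len < head.length) {
            int n = pushback.read(head, len, head.length - len);
            if (n < 0) {
                break; // end of stream before four bytes were available
            }
            len += n;
        }
        int bom = bomSize(head, len);
        if (len > bom) {
            // Push back everything that was read beyond the BOM.
            pushback.unread(head, bom, len - bom);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x68, 0x69}; // UTF-8 BOM + "hi"
        InputStream in = skipBom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // prints h
    }
}
```

A PushbackInputStream is used here so the sketch also works with streams that do not support mark/reset.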
      • hasUtf8Encoding

        public boolean hasUtf8Encoding()