Class TextUnitUtil

    • Constructor Detail

      • TextUnitUtil

        public TextUnitUtil()
    • Method Detail

      • trimLeading

        public static void trimLeading​(TextFragment textFragment)
        Removes leading whitespace from a given text fragment.
        Parameters:
        textFragment - the text fragment whose leading whitespace is to be removed.
      • trimLeading

        public static void trimLeading​(TextFragment textFragment,
                                       GenericSkeleton skel)
        Removes leading whitespace from a given text fragment and appends the removed whitespace to the given skeleton.
        Parameters:
        textFragment - the text fragment whose leading whitespace is to be removed.
        skel - the skeleton that receives the removed whitespace.
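        The idea of moving leading whitespace out of the text and into a skeleton can be sketched in plain Java. This is only an illustration under stated assumptions: a String stands in for the TextFragment, a StringBuilder stands in for the GenericSkeleton, and the class and method names are hypothetical, not part of the library.

```java
final class TrimLeadingSketch {
    /**
     * Returns the text without its leading whitespace and appends the
     * removed whitespace to the skeleton stand-in (if one is given).
     */
    static String trimLeading(String text, StringBuilder skeleton) {
        int i = 0;
        // Find the first non-whitespace character.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        if (skeleton != null) {
            // Keep the removed whitespace so that nothing is lost.
            skeleton.append(text, 0, i);
        }
        return text.substring(i);
    }

    public static void main(String[] args) {
        StringBuilder skeleton = new StringBuilder();
        String trimmed = trimLeading("  \tHello world ", skeleton);
        System.out.println("[" + trimmed + "] removed chars: " + skeleton.length());
    }
}
```

        Keeping the removed characters in a separate buffer is what allows the original layout to be rebuilt later; the variant without a skeleton simply discards them.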
      • trimTrailing

        public static void trimTrailing​(TextFragment textFragment)
        Removes trailing whitespace from a given text fragment.
        Parameters:
        textFragment - the text fragment whose trailing whitespace is to be removed.
      • trimTrailing

        public static void trimTrailing​(TextFragment textFragment,
                                        GenericSkeleton skel)
        Removes trailing whitespace from a given text fragment and appends the removed whitespace to the given skeleton.
        Parameters:
        textFragment - the text fragment whose trailing whitespace is to be removed.
        skel - the skeleton that receives the removed whitespace.
      • endsWith

        public static boolean endsWith​(TextFragment textFragment,
                                       String substr)
        Indicates whether a given text fragment ends with a given substring. Trailing spaces are ignored.
        Parameters:
        textFragment - the text fragment to examine.
        substr - the text to look up.
        Returns:
        true if the given text fragment ends with the given substring.
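        A minimal sketch of an "ends with, ignoring trailing spaces" check on a plain String. The names are hypothetical; treating every Character.isWhitespace character as a trailing space and returning false for null or empty arguments are assumptions of this sketch, not documented behavior of the library method.

```java
final class EndsWithSketch {
    static boolean endsWithIgnoringTrailingSpaces(String text, String substr) {
        if (text == null || substr == null || substr.isEmpty()) {
            return false; // assumption: nothing meaningful to match
        }
        int end = text.length();
        // Step back over trailing whitespace so it does not take part in the comparison.
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end).endsWith(substr);
    }

    public static void main(String[] args) {
        System.out.println(endsWithIgnoringTrailingSpaces("Total: 10 %   ", "%"));
    }
}
```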
      • isEmpty

        public static boolean isEmpty​(ITextUnit textUnit)
        Indicates if a given text unit resource is null, or its source part is null or empty.
        Parameters:
        textUnit - the text unit to check.
        Returns:
        true if the given text unit resource is null, or its source part is null or empty.
      • hasSource

        public static boolean hasSource​(ITextUnit textUnit)
        Indicates whether a given text unit resource has a non-empty source part. Whitespace is not taken into account, e.g. if the source contains only whitespace, the text unit is considered to have no source.
        Parameters:
        textUnit - the text unit to check.
        Returns:
        true if the given text unit resource is not null and its source part is neither null nor empty (ignoring whitespace).
      • isEmpty

        public static boolean isEmpty​(ITextUnit textUnit,
                                      boolean ignoreWS)
        Indicates if a given text unit resource is null, or its source part is null or empty. If ignoreWS = true, whitespace is not taken into account, e.g. a text unit that contains only whitespace is considered empty.
        Parameters:
        textUnit - the text unit to check.
        ignoreWS - if true and the text unit contains only whitespace, the text unit is considered empty.
        Returns:
        true if the given text unit resource is null, or its source part is null or empty.
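        The effect of the ignoreWS flag can be sketched on a plain String standing in for the source text of a text unit. The class and method names are hypothetical, and String.trim() is used here as a simple stand-in for whatever whitespace test the library applies.

```java
final class IsEmptySketch {
    static boolean isEmpty(String sourceText, boolean ignoreWS) {
        if (sourceText == null || sourceText.isEmpty()) {
            return true; // null or zero-length text is always empty
        }
        // Whitespace-only text counts as empty only when ignoreWS is set.
        return ignoreWS && sourceText.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isEmpty("   ", false) + " " + isEmpty("   ", true));
    }
}
```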
      • getSourceText

        public static String getSourceText​(ITextUnit textUnit)
        Gets the coded text of the first part of the source of a given text unit resource.
        Parameters:
        textUnit - the text unit resource whose source text should be returned.
        Returns:
        the source part of the given text unit resource.
      • getSourceText

        public static String getSourceText​(ITextUnit textUnit,
                                           boolean removeCodes)
        Gets the coded text of the first part of the source of a given text unit resource. If removeCodes = true and the text contains inline codes, the codes are removed.
        Parameters:
        textUnit - the text unit resource whose source text should be returned.
        removeCodes - true if possible inline codes should be removed.
        Returns:
        the source part of the given text unit resource.
      • getTargetText

        public static String getTargetText​(ITextUnit textUnit,
                                           LocaleId locId)
        Gets the text of the first part of the target of a given text unit resource in the given locale.
        Parameters:
        textUnit - the text unit resource whose target text should be returned.
        locId - the locale of the target part being sought.
        Returns:
        the target part of the given text unit resource in the given locale, or an empty string if the text unit doesn't contain one.
      • getCodedText

        public static String getCodedText​(TextFragment textFragment)
        Gets the text of a given text fragment object, possibly containing inline codes.
        Parameters:
        textFragment - the given text fragment object.
        Returns:
        the text of the given text fragment object possibly containing inline codes.
      • getText

        public static String getText​(TextFragment textFragment,
                                     List<Integer> markerPositions)
        Extracts text from the given text fragment. Used to create a copy of the original string but without code markers. The original string is not stripped of code markers, and remains intact.
        Parameters:
        textFragment - TextFragment object with possible codes inside
        markerPositions - list to store the initial positions of the removed code markers. Use null to not store the positions.
        Returns:
        The copy of the string, contained in TextFragment, but without code markers
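        The stripping of code markers can be sketched on a plain coded-text String. This sketch assumes that each inline code is stored as a two-character sequence (a marker character followed by an index character) and that the markers are U+E101, U+E102 and U+E103; verify these values against the TextFragment constants before relying on them. The class name is hypothetical, and recording positions as indexes into the original coded text is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class GetTextSketch {
    // Assumed marker characters (check TextFragment's constants).
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    /** Returns a copy of codedText without code markers; the input is left untouched. */
    static String getText(String codedText, List<Integer> markerPositions) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (isMarker(c)) {
                if (markerPositions != null) {
                    markerPositions.add(i); // position in the original coded text
                }
                i++; // skip the index character that follows the marker
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String coded = "A\uE101\uE110B\uE102\uE111C";
        List<Integer> positions = new ArrayList<>();
        System.out.println(getText(coded, positions) + " " + positions);
    }
}
```

        Passing null for the list skips the bookkeeping, mirroring the documented option of not storing the marker positions.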
      • printMarkerIndexes

        public static String printMarkerIndexes​(TextFragment textFragment)
      • getText

        public static String getText​(TextFragment textFragment)
        Extracts text from the given text fragment. Used to create a copy of the original string but without code markers. The original string is not stripped of code markers, and remains intact.
        Parameters:
        textFragment - TextFragment object with possible codes inside
        Returns:
        The copy of the string, contained in TextFragment, but without code markers
      • getLastChar

        public static char getLastChar​(TextFragment textFragment)
        Gets the last character of a given text fragment.
        Parameters:
        textFragment - the text fragment to examine.
        Returns:
        the last character of the given text fragment, or '\0' if the fragment is null or contains no text.
      • deleteLastChar

        public static void deleteLastChar​(TextFragment textFragment)
        Deletes the last non-whitespace and non-code character of a given text fragment.
        Parameters:
        textFragment - the text fragment to examine.
      • lastIndexOf

        public static int lastIndexOf​(TextFragment textFragment,
                                      String findWhat)
        Returns the index (within a given text fragment object) of the rightmost occurrence of the specified substring.
        Parameters:
        textFragment - the text fragment to examine.
        findWhat - the substring to search for.
        Returns:
        if the string argument occurs one or more times as a substring within this object, then the index of the first character of the last such substring is returned. If it does not occur as a substring, -1 is returned.
      • isEmpty

        public static boolean isEmpty​(TextFragment textFragment)
        Indicates if a given text fragment object is null, or the text it contains is null or empty.
        Parameters:
        textFragment - the text fragment to examine.
        Returns:
        true if the given text fragment object is null, or the text it contains is null or empty.
      • buildGenericTU

        public static ITextUnit buildGenericTU​(TextContainer source)
        Creates a new generic text unit resource based on a given text container object becoming the source part of the text unit. WARNING: Not all filters use GenericSkeleton. Use with caution.
        Parameters:
        source - the given text container becoming the source part of the text unit.
        Returns:
        a new text unit resource with the given text container object being its source part.
      • buildGenericTU

        public static ITextUnit buildGenericTU​(String source)
        Creates a new generic text unit resource based on a given string becoming the source text of the text unit. WARNING: Not all filters use GenericSkeleton. Use with caution.
        Parameters:
        source - the given string becoming the source text of the text unit.
        Returns:
        a new text unit resource with the given string being its source text.
      • buildGenericTU

        public static ITextUnit buildGenericTU​(String srcPart,
                                               String skelPart)
        Creates a new generic text unit resource based on a given string becoming the source text of the text unit, and a skeleton string, which is appended to the new text unit's skeleton. WARNING: Not all filters use GenericSkeleton. Use with caution.
        Parameters:
        srcPart - the given string becoming the source text of the created text unit.
        skelPart - the skeleton string appended to the new text unit's skeleton.
        Returns:
        a new text unit resource with the given string being its source text, and the skeleton string in the skeleton.
      • buildGenericTU

        public static ITextUnit buildGenericTU​(ITextUnit textUnit,
                                               String name,
                                               TextContainer source,
                                               TextContainer target,
                                               LocaleId locId,
                                               String comment)
        Creates a new generic text unit resource or updates the one passed as the parameter. You can use this method to create a new text unit or modify an existing one (adding or modifying the values of its fields). WARNING: Not all filters use GenericSkeleton. Use with caution.
        Parameters:
        textUnit - the text unit to be modified, or null to create a new text unit.
        name - name of the new text unit, or a new name for the existing one.
        source - the text container object becoming the source part of the text unit.
        target - the text container object becoming the target part of the text unit.
        locId - the locale of the target part (passed in the target parameter).
        comment - the optional comment becoming a NOTE property of the text unit.
        Returns:
        a reference to the original or newly created text unit.
      • forceSkeleton

        public static GenericSkeleton forceSkeleton​(ITextUnit tu)
        Makes sure that a given text unit contains a skeleton. If there's no skeleton already attached to the text unit, a new skeleton object is created and attached to the text unit.
        Parameters:
        tu - the given text unit to have a skeleton.
        Returns:
        the skeleton of the text unit.
      • convertToSkeleton

        public static GenericSkeleton convertToSkeleton​(ITextUnit textUnit)
        Copies the source and target text of a given text unit into a newly created skeleton. The original text unit remains intact and serves as a pattern for the contents of the newly created skeleton.
        Parameters:
        textUnit - the text unit to be copied into a skeleton.
        Returns:
        the newly created skeleton, whose contents reflect the given text unit.
      • getSourceAnnotation

        public static <A extends IAnnotation> A getSourceAnnotation​(ITextUnit textUnit,
                                                                    Class<A> type)
        Gets an annotation attached to the source part of a given text unit resource.
        Type Parameters:
        A - a class implementing IAnnotation
        Parameters:
        textUnit - the given text unit resource.
        type - reference to the requested annotation type.
        Returns:
        the annotation or null if not found.
      • setSourceAnnotation

        public static void setSourceAnnotation​(ITextUnit textUnit,
                                               IAnnotation annotation)
        Attaches an annotation to the source part of a given text unit resource.
        Parameters:
        textUnit - the given text unit resource.
        annotation - the annotation to be attached to the source part of the text unit.
      • getTargetAnnotation

        public static <A extends IAnnotation> A getTargetAnnotation​(ITextUnit textUnit,
                                                                    LocaleId locId,
                                                                    Class<A> type)
        Gets an annotation attached to the target part of a given text unit resource in a given locale.
        Type Parameters:
        A - a class implementing IAnnotation
        Parameters:
        textUnit - the given text unit resource.
        locId - the locale of the target part being sought.
        type - reference to the requested annotation type.
        Returns:
        the annotation or null if not found.
      • setTargetAnnotation

        public static void setTargetAnnotation​(ITextUnit textUnit,
                                               LocaleId locId,
                                               IAnnotation annotation)
        Attaches an annotation to the target part of a given text unit resource in a given locale.
        Parameters:
        textUnit - the given text unit resource.
        locId - the locale of the target part being attached to.
        annotation - the annotation to be attached to the target part of the text unit.
      • setSourceText

        public static void setSourceText​(ITextUnit textUnit,
                                         String text)
        Sets the coded text of the un-segmented source of a given text unit resource.
        Parameters:
        textUnit - the given text unit resource.
        text - the text to be set.
      • setTargetText

        public static void setTargetText​(ITextUnit textUnit,
                                         LocaleId locId,
                                         String text)
        Sets the coded text of the target part of a given text unit resource in a given locale.
        Parameters:
        textUnit - the given text unit resource.
        locId - the locale of the target part being set.
        text - the text to be set.
      • trimTU

        public static void trimTU​(ITextUnit textUnit,
                                  boolean trimLeading,
                                  boolean trimTrailing)
        Removes leading and/or trailing whitespace from the source part of a given text unit resource.
        Parameters:
        textUnit - the given text unit resource.
        trimLeading - true to remove leading whitespace if there is any.
        trimTrailing - true to remove trailing whitespace if there is any.
      • addQualifiers

        public static void addQualifiers​(ITextUnit textUnit,
                                         String startQualifier,
                                         String endQualifier)
        Adds qualifiers (quotation marks, etc.) to the skeleton of a given text unit resource so that they appear around the text. This method is useful when the starting and ending qualifiers are different.
        Parameters:
        textUnit - the given text unit resource
        startQualifier - the qualifier to be added before text
        endQualifier - the qualifier to be added after text
      • addQualifiers

        public static void addQualifiers​(ITextUnit textUnit,
                                         String qualifier)
        Adds qualifiers (quotation marks, etc.) to the skeleton of a given text unit resource so that they appear around the text.
        Parameters:
        textUnit - the given text unit resource
        qualifier - the qualifier to be added before and after text
      • removeQualifiers

        public static boolean removeQualifiers​(ITextUnit textUnit,
                                               String startQualifier,
                                               String endQualifier)
        Removes qualifiers (parentheses, quotation marks, etc.) surrounding the text from the source part of a given un-segmented text unit resource. This method is useful when the starting and ending qualifiers are different.
        Parameters:
        textUnit - the given text unit resource.
        startQualifier - the qualifier to be removed before source text.
        endQualifier - the qualifier to be removed after source text.
        Returns:
        true if the qualifiers were found and removed
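        A sketch of qualifier removal on a mutable StringBuilder standing in for the source text. The names are hypothetical. The sketch only strips the qualifiers and reports whether both were present; anything the library method does beyond that (for example, what happens to the removed qualifiers) is not modeled here.

```java
final class RemoveQualifiersSketch {
    /** Strips start/end qualifiers in place; returns true only if both were found. */
    static boolean removeQualifiers(StringBuilder text, String start, String end) {
        int len = text.length();
        // The text must be long enough to hold both qualifiers without overlap.
        if (len < start.length() + end.length()) {
            return false;
        }
        boolean hasStart = text.substring(0, start.length()).equals(start);
        boolean hasEnd = text.substring(len - end.length()).equals(end);
        if (!hasStart || !hasEnd) {
            return false;
        }
        // Delete the tail first so the head indexes remain valid.
        text.delete(len - end.length(), len);
        text.delete(0, start.length());
        return true;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("(hello)");
        System.out.println(removeQualifiers(sb, "(", ")") + " " + sb);
    }
}
```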
      • simplifyCodes

        public static void simplifyCodes​(ITextUnit textUnit,
                                         String rules,
                                         boolean removeLeadingTrailingCodes)
        Simplifies all possible tags in the source part of a given text unit resource.
        Parameters:
        textUnit - the given text unit
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
      • simplifyCodes

        public static void simplifyCodes​(ITextUnit textUnit,
                                         String rules,
                                         boolean removeLeadingTrailingCodes,
                                         boolean mergeCodes)
        Simplifies all possible tags in the source part of a given text unit resource.
        Parameters:
        textUnit - the given text unit
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
        mergeCodes - true to merge adjacent codes, false to leave as-is
      • simplifyCodesPostSegmentation

        public static void simplifyCodesPostSegmentation​(ITextUnit textUnit,
                                                         String rules,
                                                         boolean removeLeadingTrailingCodes,
                                                         boolean mergeCodes)
        Simplifies all possible tags in the source part of a given text unit resource. If the TextUnit has a target then skip simplification.
        Parameters:
        textUnit - the given text unit
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the corresponding inter-segment TextPart.
        mergeCodes - true to merge adjacent codes, false to leave as-is
      • simplifyCodesPostSegmentation

        public static void simplifyCodesPostSegmentation​(TextContainer tc,
                                                         String rules,
                                                         boolean removeLeadingTrailingCodes,
                                                         boolean mergeCodes)
        Simplifies all possible tags in the source part of a given text unit resource. If the TextUnit has a target then skip simplification.
        Parameters:
        tc - the given text container
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the corresponding inter-segment TextPart.
        mergeCodes - true to merge adjacent codes, false to leave as-is
      • expandCodes

        public static TextFragment expandCodes​(TextFragment tf)
        Expands codes that were previously merged.
        Parameters:
        tf - The original TextFragment with possibly merged codes.
        Returns:
        a new TextFragment with expanded codes, or the original if there are no codes or they have not been merged.
      • hasMergedCode

        public static boolean hasMergedCode​(TextFragment tf)
      • removeCodes

        public static void removeCodes​(ITextUnit textUnit,
                                       boolean removeTargetCodes)
        Removes all inline tags in the source (or optionally the target) of a given text unit resource.
        Parameters:
        textUnit - the given text unit
        removeTargetCodes - true to remove the target codes.
      • removeCodes

        public static void removeCodes​(TextContainer tc)
        Removes all inline tags from the given TextContainer
        Parameters:
        tc - the given text container
      • removeCodes

        public static void removeCodes​(TextFragment tf)
        Removes all inline tags from the given TextFragment
        Parameters:
        tf - the given text fragment
      • removeCodes

        public static String removeCodes​(String codedText)
        Removes all inline tags from a given coded text.
        Parameters:
        codedText - the given coded text string
        Returns:
        the string without code markers
      • removeAndReplaceCodes

        public static String removeAndReplaceCodes​(String codedText,
                                                   String isolatedCodeReplacement)
        Removes the opening and closing codes and replaces the isolated codes in text with the specified string.
        Parameters:
        codedText - The given coded text string
        isolatedCodeReplacement - The isolated code replacement
        Returns:
        The string without code markers
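        The difference between paired and isolated codes in this method can be sketched on a plain coded-text String. As an assumption of this sketch, each code is a marker character (U+E101 opening, U+E102 closing, U+E103 isolated) followed by one index character; check the TextFragment constants for the real values. The class name is hypothetical.

```java
final class RemoveAndReplaceSketch {
    // Assumed marker characters (check TextFragment's constants).
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';

    static String removeAndReplaceCodes(String codedText, String isolatedCodeReplacement) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            switch (c) {
                case MARKER_OPENING:
                case MARKER_CLOSING:
                    i++; // drop the marker and its index character
                    break;
                case MARKER_ISOLATED:
                    sb.append(isolatedCodeReplacement); // e.g. a space for a line break
                    i++;
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String coded = "\uE101\uE110bold\uE102\uE111 line\uE103\uE112end";
        System.out.println(removeAndReplaceCodes(coded, " "));
    }
}
```

        Replacing isolated codes rather than dropping them keeps words apart when the code stood for something like a line break.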
      • simplifyCodes

        public static TextFragment[] simplifyCodes​(TextFragment tf,
                                                   String rules,
                                                   boolean removeLeadingTrailingCodes)
        Simplifies all possible tags in a given text fragment.
        Parameters:
        tf - the given text fragment
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
        Returns:
        null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second element if there was a trailing code. Either or both elements can be null.
      • simplifyCodes

        public static TextFragment[] simplifyCodes​(TextFragment tf,
                                                   String rules,
                                                   boolean removeLeadingTrailingCodes,
                                                   boolean mergeCodes)
        Simplifies all possible tags in a given text fragment.
        Parameters:
        tf - the given text fragment
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
        mergeCodes - true to merge adjacent codes, false to leave as-is
        Returns:
        null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second element if there was a trailing code. Either or both elements can be null.
      • simplifyCodes

        public static TextFragment[] simplifyCodes​(TextContainer tc,
                                                   String rules,
                                                   boolean removeLeadingTrailingCodes)
        Simplifies all possible tags in a given text container.
        Parameters:
        tc - the given text container
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
        Returns:
        null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second element if there was a trailing code. Either or both elements can be null.
      • simplifyCodes

        public static TextFragment[] simplifyCodes​(TextContainer tc,
                                                   String rules,
                                                   boolean removeLeadingTrailingCodes,
                                                   boolean mergeCodes)
        Simplifies all possible tags in a given text container.
        Parameters:
        tc - the given text container
        rules - rules for the data-driven simplification
        removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
        mergeCodes - true to merge adjacent codes, false to leave as-is
        Returns:
        null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second element if there was a trailing code. Either or both elements can be null.
      • removeQualifiers

        public static boolean removeQualifiers​(ITextUnit textUnit,
                                               String qualifier)
        Removes qualifiers (quotation marks, etc.) surrounding the text from the source part of a given text unit resource.
        Parameters:
        textUnit - the given text unit resource.
        qualifier - the qualifier to be removed before and after source text.
        Returns:
        true if the qualifiers were found and removed
      • addAltTranslation

        public static AltTranslationsAnnotation addAltTranslation​(Segment seg,
                                                                  AltTranslation alt)
        Adds an AltTranslation object to a given Segment. The AltTranslationsAnnotation annotation is created if it does not exist already.
        Parameters:
        seg - the segment where to add the object.
        alt - alternate translation to add.
        Returns:
        the annotation where the object was added; it may be a new annotation or the one already associated with the segment.
      • trimSegments

        public static void trimSegments​(TextContainer tc,
                                        boolean trimLeading,
                                        boolean trimTrailing)
        Trims segments of a given text container that contain leading or trailing whitespace. Removed whitespace is placed in newly created whitespace-only text parts before and after a trimmed segment.
        Parameters:
        tc - the given text container
        trimLeading - true to remove leading whitespace of a segment
        trimTrailing - true to remove trailing whitespace of a segment
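        The splitting of one segment into a whitespace-only part, the trimmed segment, and another whitespace-only part can be sketched with plain strings. This is an illustration only: a List of String stands in for the parts of a TextContainer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TrimSegmentSketch {
    /** Splits one segment into [leading whitespace], trimmed text, [trailing whitespace]. */
    static List<String> split(String segment, boolean trimLeading, boolean trimTrailing) {
        int start = 0;
        int end = segment.length();
        if (trimLeading) {
            while (start < end && Character.isWhitespace(segment.charAt(start))) {
                start++;
            }
        }
        if (trimTrailing) {
            while (end > start && Character.isWhitespace(segment.charAt(end - 1))) {
                end--;
            }
        }
        List<String> parts = new ArrayList<>();
        if (start > 0) {
            parts.add(segment.substring(0, start)); // whitespace-only part before
        }
        parts.add(segment.substring(start, end));   // the trimmed segment
        if (end < segment.length()) {
            parts.add(segment.substring(end));      // whitespace-only part after
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("  Hi there. ", true, true));
    }
}
```

        Because the whitespace moves into separate parts instead of being deleted, joining the parts again yields the original text.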
      • trimSegments

        public static void trimSegments​(TextContainer tc)
      • extractSegMarkers

        public static TextFragment extractSegMarkers​(TextFragment tf,
                                                     TextFragment original,
                                                     boolean removeFromOriginal)
        Extracts segment and text part markers from a given original text fragment, creates codes (place-holder type) for those markers, and appends them to a given text fragment.
        Parameters:
        tf - the given text fragment to append extracted codes
        original - the given original text fragment
        removeFromOriginal - true to remove found markers from the original
        Returns:
        the original if removeFromOriginal == false, or the modified original with markers removed otherwise
      • hasSegOrTpMarker

        public static boolean hasSegOrTpMarker​(Code code)
      • hasSegStartMarker

        public static boolean hasSegStartMarker​(Code code)
      • hasSegEndMarker

        public static boolean hasSegEndMarker​(Code code)
      • hasTpStartMarker

        public static boolean hasTpStartMarker​(Code code)
      • hasTpEndMarker

        public static boolean hasTpEndMarker​(Code code)
      • hasExternalRefMarker

        public static boolean hasExternalRefMarker​(Code code)
      • restoreSegmentation

        public static String restoreSegmentation​(TextContainer tc,
                                                 TextFragment segStorage)
        Restores original segmentation of a given text container from a given text fragment created with storeSegmentation().
        Parameters:
        tc - the given text container
        segStorage - the text fragment created with storeSegmentation() and containing the original segmentation info
        Returns:
        a test string containing a sequence of markers created by the internal algorithm. Used for tests only.
      • testMarkers

        public static String testMarkers()
      • toText

        public static String toText​(TextFragment tf)
        Returns the content of a given text fragment, including the original codes whenever possible. Codes are decorated with '[' and ']' to tell them from regular text.
        Parameters:
        tf - the given text fragment
        Returns:
        the content of the given fragment
      • toText

        public static String toText​(String text,
                                    List<Code> codes)
        Returns a representation of a given coded text with code data enclosed in brackets.
        Parameters:
        text - the given coded text
        codes - the given list of codes
        Returns:
        content of the given coded text
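        A sketch of how a coded text and its list of codes could be rendered with bracketed code data. Several things here are assumptions: a List of String stands in for the list of Code objects, markers are U+E101, U+E102 and U+E103, and the character after a marker encodes the code's list index as an offset from 0xE110. Check the TextFragment constants for the real values. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ToTextSketch {
    // Assumed marker characters and index offset (check TextFragment's constants).
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';
    static final int CHARBASE = 0xE110;

    static String toText(String codedText, List<String> codeData) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            boolean marker = c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
            if (marker && i + 1 < codedText.length()) {
                // The next character identifies which code this marker refers to.
                int index = codedText.charAt(i + 1) - CHARBASE;
                sb.append('[').append(codeData.get(index)).append(']');
                i++;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String coded = "Click \uE101\uE110here\uE102\uE111";
        System.out.println(toText(coded, Arrays.asList("<b>", "</b>")));
    }
}
```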
      • isApproved

        public static boolean isApproved​(ITextUnit tu,
                                         LocaleId targetLocale)
      • convertTextPartsToCodes

        public static void convertTextPartsToCodes​(TextContainer tc)
        Converts all TextParts (not Segments) in a given TextContainer so that each contains a single code with the part's text. Needed to protect the text of a text part (e.g. created from original codes) against being escaped by an encoder.
        Parameters:
        tc - the given TextContainer
      • convertTextPartToCode

        public static void convertTextPartToCode​(TextPart textPart)
        Creates a single code with a given TextPart's text. Needed to protect the text of the text part from being escaped by an encoder. If the TextPart already has codes, no conversion is performed.
        Parameters:
        textPart - the given TextPart
      • convertTextParts_whitespaceCodesToText

        public static void convertTextParts_whitespaceCodesToText​(TextContainer tc)
      • convertTextPart_whitespaceCodesToText

        public static void convertTextPart_whitespaceCodesToText​(TextPart textPart)
      • isStandalone

        public static boolean isStandalone​(ITextUnit tu)
      • renumberCodes

        public static void renumberCodes​(TextContainer tc)
      • needsPreserveWhitespaces

        public static boolean needsPreserveWhitespaces​(TextContainer tc)
        Detects whether a given TextContainer contains whitespace characters that must be preserved in XML. A single space (0x20) doesn't need to be preserved; any other whitespace character, or a sequence of two or more spaces, does.
        Parameters:
        tc - the given TextContainer object.
        Returns:
        true if the given TextContainer has whitespace sequences that need to be preserved.
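        The rule stated above can be sketched as a scan over a plain String standing in for the container's text. The class and method names are hypothetical, and Character.isWhitespace is an assumed stand-in for the library's notion of a whitespace character.

```java
final class PreserveWhitespaceSketch {
    static boolean needsPreserve(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ') {
                // A lone space is fine; two or more in a row must be preserved.
                if (i + 1 < text.length() && text.charAt(i + 1) == ' ') {
                    return true;
                }
            } else if (Character.isWhitespace(c)) {
                return true; // tab, line break, or another whitespace character
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsPreserve("a b") + " " + needsPreserve("a  b"));
    }
}
```

        A check like this is typically used to decide whether an attribute such as xml:space="preserve" is needed when writing the text out.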
      • needsPreserveWhitespaces

        public static boolean needsPreserveWhitespaces​(ITextUnit tu)
      • isWellformed

        public static boolean isWellformed​(TextFragment tf)
      • isWellformed

        public static boolean isWellformed​(TextContainer tc)
      • unsegmentTU

        public static void unsegmentTU​(ITextUnit tu)