Package net.sf.okapi.common.resource
Class TextUnitUtil
java.lang.Object
net.sf.okapi.common.resource.TextUnitUtil

public class TextUnitUtil extends Object

Constructor Summary
TextUnitUtil()
Method Summary
static AltTranslationsAnnotation addAltTranslation(Segment seg, AltTranslation alt)
Adds an AltTranslation object to a given Segment.
static AltTranslationsAnnotation addAltTranslation(TextContainer targetContainer, AltTranslation alt)
Adds an AltTranslation object to a given TextContainer.
static void addQualifiers(ITextUnit textUnit, String qualifier)
Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around text.
static void addQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around text.
static ITextUnit buildGenericTU(String source)
Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit.
static ITextUnit buildGenericTU(String srcPart, String skelPart)
Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit, and a skeleton string, which is appended to the new text unit's skeleton.
static ITextUnit buildGenericTU(ITextUnit textUnit, String name, TextContainer source, TextContainer target, LocaleId locId, String comment)
Creates a new generic text unit resource or updates the one passed as the parameter.
static ITextUnit buildGenericTU(TextContainer source)
Creates a new generic text unit resource based on a given text container object, which becomes the source part of the text unit.
static void convertTextPart_whitespaceCodesToText(TextPart textPart)
static void convertTextParts_whitespaceCodesToText(TextContainer tc)
static void convertTextPartsToCodes(TextContainer tc)
Converts all TextParts (not Segments) in a given TextContainer so that each contains a single code with the part's text.
static void convertTextPartToCode(TextPart textPart)
Creates a single code with a given TextPart's text.
static GenericSkeleton convertToSkeleton(ITextUnit textUnit)
Copies source and target text of a given text unit into a newly created skeleton.
static void deleteLastChar(TextFragment textFragment)
Deletes the last non-whitespace and non-code character of a given text fragment.
static boolean endsWith(TextFragment textFragment, String substr)
Indicates if a given text fragment ends with a given substring.
static TextFragment expandCodes(TextFragment tf)
Expands codes that have been previously merged.
static TextFragment extractSegMarkers(TextFragment tf, TextFragment original, boolean removeFromOriginal)
Extracts segment and text part markers from a given string, creates codes (placeholder type) for those markers, and appends them to a given text fragment.
static GenericSkeleton forceSkeleton(ITextUnit tu)
Makes sure that a given text unit contains a skeleton.
static String getCodedText(TextFragment textFragment)
Gets the text of a given text fragment object, possibly containing inline codes.
static char getLastChar(TextFragment textFragment)
Gets the last character of a given text fragment.
static <A extends IAnnotation> A getSourceAnnotation(ITextUnit textUnit, Class<A> type)
Gets an annotation attached to the source part of a given text unit resource.
static String getSourceText(ITextUnit textUnit)
Gets the coded text of the first part of the source of a given text unit resource.
static String getSourceText(ITextUnit textUnit, boolean removeCodes)
Gets the coded text of the first part of the source of a given text unit resource.
static <A extends IAnnotation> A getTargetAnnotation(ITextUnit textUnit, LocaleId locId, Class<A> type)
Gets an annotation attached to the target part of a given text unit resource in a given locale.
static String getTargetText(ITextUnit textUnit, LocaleId locId)
Gets the text of the first part of the target of a given text unit resource in the given locale.
static String getText(TextFragment textFragment)
Extracts text from the given text fragment.
static String getText(TextFragment textFragment, List<Integer> markerPositions)
Extracts text from the given text fragment.
static boolean hasExternalRefMarker(Code code)
static boolean hasMergedCode(TextFragment tf)
static boolean hasSegEndMarker(Code code)
static boolean hasSegOrTpMarker(Code code)
static boolean hasSegStartMarker(Code code)
static boolean hasSource(ITextUnit textUnit)
Indicates if a given text unit resource is null, or its source part is null or empty.
static boolean hasTpEndMarker(Code code)
static boolean hasTpStartMarker(Code code)
static boolean isApproved(ITextUnit tu, LocaleId targetLocale)
static boolean isEmpty(ITextUnit textUnit)
Indicates if a given text unit resource is null, or its source part is null or empty.
static boolean isEmpty(ITextUnit textUnit, boolean ignoreWS)
Indicates if a given text unit resource is null, or its source part is null or empty.
static boolean isEmpty(TextFragment textFragment)
Indicates if a given text fragment object is null, or the text it contains is null or empty.
static boolean isStandalone(ITextUnit tu)
static boolean isWellformed(TextContainer tc)
static boolean isWellformed(TextFragment tf)
static int lastIndexOf(TextFragment textFragment, String findWhat)
Returns the index (within a given text fragment object) of the rightmost occurrence of the specified substring.
static boolean needsPreserveWhitespaces(ITextUnit tu)
static boolean needsPreserveWhitespaces(TextContainer tc)
Detects if a given TextContainer contains whitespace characters to be preserved in XML.
static String printMarkerIndexes(TextFragment textFragment)
static String printMarkers(TextFragment textFragment)
static String removeAndReplaceCodes(String codedText, String isolatedCodeReplacement)
Removes the opening and closing codes, and replaces the isolated codes in the text with the specified string.
static String removeCodes(String codedText)
Removes all inline tags from a given coded text.
static void removeCodes(ITextUnit textUnit, boolean removeTargetCodes)
Removes all inline tags in the source (or optionally the target) of a given text unit resource.
static void removeCodes(TextContainer tc)
Removes all inline tags from the given TextContainer.
static void removeCodes(TextFragment tf)
Removes all inline tags from the given TextFragment.
static boolean removeQualifiers(ITextUnit textUnit, String qualifier)
Removes qualifiers (quotation marks etc.) around text from the source part of a given text unit resource.
static boolean removeQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
Removes qualifiers (parentheses, quotation marks etc.) around text from the source part of a given un-segmented text unit resource.
static void renumberCodes(TextContainer tc)
static String restoreSegmentation(TextContainer tc, TextFragment segStorage)
Restores the original segmentation of a given text container from a given text fragment created with storeSegmentation().
static void setSourceAnnotation(ITextUnit textUnit, IAnnotation annotation)
Attaches an annotation to the source part of a given text unit resource.
static void setSourceText(ITextUnit textUnit, String text)
Sets the coded text of the un-segmented source of a given text unit resource.
static void setTargetAnnotation(ITextUnit textUnit, LocaleId locId, IAnnotation annotation)
Attaches an annotation to the target part of a given text unit resource in a given language.
static void setTargetText(ITextUnit textUnit, LocaleId locId, String text)
Sets the coded text of the target part of a given text unit resource in a given language.
static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in the source part of a given text unit resource.
static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource.
static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in a given text container.
static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in a given text container.
static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in a given text fragment.
static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in a given text fragment.
static void simplifyCodesPostSegmentation(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource.
static void simplifyCodesPostSegmentation(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource.
static TextFragment storeSegmentation(TextContainer tc)
static String testMarkers()
static String toText(String text, List<Code> codes)
Returns a representation of a given coded text with code data enclosed in brackets.
static String toText(TextFragment tf)
Returns the content of a given text fragment, including the original codes whenever possible.
static void trimLeading(TextFragment textFragment)
Removes leading whitespace from a given text fragment.
static void trimLeading(TextFragment textFragment, GenericSkeleton skel)
Removes leading whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
static void trimSegments(TextContainer tc)
static void trimSegments(TextContainer tc, boolean trimLeading, boolean trimTrailing)
Trims the segments of a given text container that contain leading or trailing whitespace.
static void trimTrailing(TextFragment textFragment)
Removes trailing whitespace from a given text fragment.
static void trimTrailing(TextFragment textFragment, GenericSkeleton skel)
Removes trailing whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
static void trimTU(ITextUnit textUnit, boolean trimLeading, boolean trimTrailing)
Removes leading and/or trailing whitespace from the source part of a given text unit resource.
static void unsegmentTU(ITextUnit tu)
Method Detail
-
trimLeading
public static void trimLeading(TextFragment textFragment)
Removes leading whitespace from a given text fragment.
- Parameters:
textFragment
- the text fragment whose leading whitespace is to be removed.
-
trimLeading
public static void trimLeading(TextFragment textFragment, GenericSkeleton skel)
Removes leading whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
- Parameters:
textFragment
- the text fragment whose leading whitespace is to be removed.
skel
- the skeleton that receives the removed whitespace.
-
trimTrailing
public static void trimTrailing(TextFragment textFragment)
Removes trailing whitespace from a given text fragment.
- Parameters:
textFragment
- the text fragment whose trailing whitespace is to be removed.
-
trimTrailing
public static void trimTrailing(TextFragment textFragment, GenericSkeleton skel)
Removes trailing whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
- Parameters:
textFragment
- the text fragment whose trailing whitespace is to be removed.
skel
- the skeleton that receives the removed whitespace.
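The two-argument trimLeading/trimTrailing variants keep the removed whitespace in a skeleton instead of discarding it. The following minimal sketch illustrates only that idea. It assumes a plain String in place of a TextFragment's coded text, a StringBuilder in place of GenericSkeleton, and Character.isWhitespace as the whitespace test; the class WhitespaceTrimSketch is hypothetical and is not Okapi's implementation.

```java
/**
 * Hypothetical illustration of trimLeading/trimTrailing(TextFragment, GenericSkeleton).
 * A String stands in for the fragment's coded text and a StringBuilder for the skeleton.
 */
final class WhitespaceTrimSketch {
    private WhitespaceTrimSketch() {}

    /** Returns text without leading whitespace; removed whitespace is appended to skel if non-null. */
    static String trimLeading(String text, StringBuilder skel) {
        int start = 0;
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        if (skel != null) {
            skel.append(text, 0, start); // keep the removed whitespace in the "skeleton"
        }
        return text.substring(start);
    }

    /** Returns text without trailing whitespace; removed whitespace is appended to skel if non-null. */
    static String trimTrailing(String text, StringBuilder skel) {
        int end = text.length();
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        if (skel != null) {
            skel.append(text, end, text.length());
        }
        return text.substring(0, end);
    }
}
```

In this sketch, passing null for skel mimics the one-argument overloads, which simply remove the whitespace.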
-
endsWith
public static boolean endsWith(TextFragment textFragment, String substr)
Indicates if a given text fragment ends with a given substring. Trailing spaces are not counted.
- Parameters:
textFragment
- the text fragment to examine.
substr
- the text to look up.
- Returns:
- true if the given text fragment ends with the given substring.
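A minimal sketch of the "trailing spaces are not counted" rule, assuming plain text without inline codes and Character.isWhitespace as the definition of a trailing space. EndsWithSketch is a hypothetical class; the real method operates on a TextFragment and may treat codes differently.

```java
/** Hypothetical sketch of endsWith(TextFragment, String) on plain text; inline codes are ignored. */
final class EndsWithSketch {
    private EndsWithSketch() {}

    static boolean endsWith(String text, String substr) {
        if (text == null || substr == null) {
            return false;
        }
        // Trailing whitespace in the text is not counted, so skip it before comparing.
        int end = text.length();
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end).endsWith(substr);
    }
}
```

With this rule, endsWith("Hello world.  ", "world.") is true even though the text itself ends in spaces.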
-
isEmpty
public static boolean isEmpty(ITextUnit textUnit)
Indicates if a given text unit resource is null, or its source part is null or empty.
- Parameters:
textUnit
- the text unit to check.
- Returns:
- true if the given text unit resource is null, or its source part is null or empty.
-
hasSource
public static boolean hasSource(ITextUnit textUnit)
Indicates if a given text unit resource is null, or its source part is null or empty. Whitespace is not taken into account: if the text unit contains only whitespace, it is considered empty.
- Parameters:
textUnit
- the text unit to check.
- Returns:
- true if the given text unit resource is null, or its source part is null or empty.
-
isEmpty
public static boolean isEmpty(ITextUnit textUnit, boolean ignoreWS)
Indicates if a given text unit resource is null, or its source part is null or empty. If ignoreWS is true, whitespace is not taken into account: a text unit that contains only whitespace is considered empty.
- Parameters:
textUnit
- the text unit to check.
ignoreWS
- if true and the text unit contains only whitespace, the text unit is considered empty.
- Returns:
- true if the given text unit resource is null, or its source part is null or empty.
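The effect of ignoreWS can be sketched on a plain source string, assuming null counts as empty and Character.isWhitespace decides what whitespace is. IsEmptySketch is hypothetical; it does not model ITextUnit or its segments.

```java
/** Hypothetical sketch of isEmpty(textUnit, ignoreWS), applied to a plain source string. */
final class IsEmptySketch {
    private IsEmptySketch() {}

    static boolean isEmpty(String source, boolean ignoreWS) {
        if (source == null || source.isEmpty()) {
            return true; // null or empty source is always empty
        }
        if (!ignoreWS) {
            return false; // any character, even whitespace, makes it non-empty
        }
        for (int i = 0; i < source.length(); i++) {
            if (!Character.isWhitespace(source.charAt(i))) {
                return false;
            }
        }
        return true; // whitespace-only counts as empty when ignoreWS is true
    }
}
```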
-
getSourceText
public static String getSourceText(ITextUnit textUnit)
Gets the coded text of the first part of the source of a given text unit resource.
- Parameters:
textUnit
- the text unit resource whose source text should be returned.
- Returns:
- the source part of the given text unit resource.
-
getSourceText
public static String getSourceText(ITextUnit textUnit, boolean removeCodes)
Gets the coded text of the first part of the source of a given text unit resource. If removeCodes is true and the text contains inline codes, the codes are removed.
- Parameters:
textUnit
- the text unit resource whose source text should be returned.
removeCodes
- true if possible inline codes should be removed.
- Returns:
- the source part of the given text unit resource.
-
getTargetText
public static String getTargetText(ITextUnit textUnit, LocaleId locId)
Gets the text of the first part of the target of a given text unit resource in the given locale.
- Parameters:
textUnit
- the text unit resource whose target text should be returned.
locId
- the locale of the target part being sought.
- Returns:
- the target part of the given text unit resource in the given locale, or an empty string if the text unit doesn't contain one.
-
getCodedText
public static String getCodedText(TextFragment textFragment)
Gets the text of a given text fragment object, possibly containing inline codes.
- Parameters:
textFragment
- the given text fragment object.
- Returns:
- the text of the given text fragment object, possibly containing inline codes.
-
getText
public static String getText(TextFragment textFragment, List<Integer> markerPositions)
Extracts text from the given text fragment. Used to create a copy of the original string without code markers. The original string is not stripped of code markers and remains intact.
- Parameters:
textFragment
- TextFragment object with possible codes inside.
markerPositions
- list to store the initial positions of removed code markers. Use null to not store the markers.
- Returns:
- the copy of the string contained in the TextFragment, but without code markers.
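To illustrate what getText(TextFragment, List<Integer>) does, here is a sketch that works directly on a coded-text string. It assumes each inline code appears as a marker character followed by one index character, with marker values chosen to mirror TextFragment.MARKER_OPENING, MARKER_CLOSING and MARKER_ISOLATED; it also assumes "initial positions" means each marker's offset in the coded text. CodedTextSketch is a hypothetical class, not Okapi code.

```java
import java.util.List;

/** Hypothetical sketch of getText(TextFragment, List<Integer>) on a coded-text string. */
final class CodedTextSketch {
    // Assumed to mirror TextFragment.MARKER_OPENING / MARKER_CLOSING / MARKER_ISOLATED.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';

    private CodedTextSketch() {}

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    /** Returns codedText without code markers; the input string itself is not modified. */
    static String getText(String codedText, List<Integer> markerPositions) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (isMarker(c)) {
                if (markerPositions != null) {
                    markerPositions.add(i); // offset of the marker in the coded text
                }
                i++; // skip the index character that follows each marker
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```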
-
printMarkerIndexes
public static String printMarkerIndexes(TextFragment textFragment)
-
printMarkers
public static String printMarkers(TextFragment textFragment)
-
getText
public static String getText(TextFragment textFragment)
Extracts text from the given text fragment. Used to create a copy of the original string without code markers. The original string is not stripped of code markers and remains intact.
- Parameters:
textFragment
- TextFragment object with possible codes inside.
- Returns:
- the copy of the string contained in the TextFragment, but without code markers.
-
getLastChar
public static char getLastChar(TextFragment textFragment)
Gets the last character of a given text fragment.
- Parameters:
textFragment
- the text fragment to examine.
- Returns:
- the last character of the given text fragment, or '\0'.
-
deleteLastChar
public static void deleteLastChar(TextFragment textFragment)
Deletes the last non-whitespace and non-code character of a given text fragment.
- Parameters:
textFragment
- the text fragment to examine.
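Finding "the last non-whitespace and non-code character" is easiest with a forward scan, so that a code marker and its index character are always skipped together. This sketch assumes each inline code appears in coded text as a marker character followed by one index character, with marker values mirroring TextFragment.MARKER_OPENING, MARKER_CLOSING and MARKER_ISOLATED; LastCharSketch is hypothetical and ignores surrogate pairs.

```java
/**
 * Hypothetical sketch of deleteLastChar(TextFragment) on coded text.
 * Assumes each inline code is a marker character followed by one index character.
 */
final class LastCharSketch {
    // Assumed to mirror TextFragment.MARKER_OPENING / MARKER_CLOSING / MARKER_ISOLATED.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';

    private LastCharSketch() {}

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    /** Index of the last character that is neither whitespace nor part of a code, or -1 if none. */
    static int lastTextCharIndex(String codedText) {
        int last = -1;
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (isMarker(c)) {
                i++; // skip the marker's index character as well
            } else if (!Character.isWhitespace(c)) {
                last = i;
            }
        }
        return last;
    }

    /** Returns codedText with its last non-whitespace, non-code character deleted. */
    static String deleteLastChar(String codedText) {
        int i = lastTextCharIndex(codedText);
        if (i < 0) {
            return codedText; // nothing eligible to delete
        }
        return codedText.substring(0, i) + codedText.substring(i + 1);
    }
}
```

For example, in "Hi." followed by an isolated code and two spaces, the period is deleted while the code and the trailing spaces stay in place.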
-
lastIndexOf
public static int lastIndexOf(TextFragment textFragment, String findWhat)
Returns the index (within a given text fragment object) of the rightmost occurrence of the specified substring.
- Parameters:
textFragment
- the text fragment to examine.
findWhat
- the substring to search for.
- Returns:
- if the string argument occurs one or more times as a substring within this object, the index of the first character of the last such substring; if it does not occur as a substring, -1.
-
isEmpty
public static boolean isEmpty(TextFragment textFragment)
Indicates if a given text fragment object is null, or the text it contains is null or empty.
- Parameters:
textFragment
- the text fragment to examine.
- Returns:
- true if the given text fragment object is null, or the text it contains is null or empty.
-
buildGenericTU
public static ITextUnit buildGenericTU(TextContainer source)
Creates a new generic text unit resource based on a given text container object, which becomes the source part of the text unit. WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
source
- the given text container that becomes the source part of the text unit.
- Returns:
- a new text unit resource with the given text container object as its source part.
-
buildGenericTU
public static ITextUnit buildGenericTU(String source)
Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit. WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
source
- the given string that becomes the source text of the text unit.
- Returns:
- a new text unit resource with the given string as its source text.
-
buildGenericTU
public static ITextUnit buildGenericTU(String srcPart, String skelPart)
Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit, and a skeleton string, which is appended to the new text unit's skeleton. WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
srcPart
- the given string that becomes the source text of the created text unit.
skelPart
- the skeleton string appended to the new text unit's skeleton.
- Returns:
- a new text unit resource with the given string as its source text and the skeleton string in its skeleton.
-
buildGenericTU
public static ITextUnit buildGenericTU(ITextUnit textUnit, String name, TextContainer source, TextContainer target, LocaleId locId, String comment)
Creates a new generic text unit resource or updates the one passed as the parameter. You can use this method to create a new text unit or to modify an existing one (adding or modifying its field values). WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
textUnit
- the text unit to be modified, or null to create a new text unit.
name
- the name of the new text unit, or a new name for the existing one.
source
- the text container object that becomes the source part of the text unit.
target
- the text container object that becomes the target part of the text unit.
locId
- the locale of the target part (passed in the target parameter).
comment
- the optional comment that becomes a NOTE property of the text unit.
- Returns:
- a reference to the original or newly created text unit.
-
forceSkeleton
public static GenericSkeleton forceSkeleton(ITextUnit tu)
Makes sure that a given text unit contains a skeleton. If no skeleton is attached to the text unit yet, a new skeleton object is created and attached to it.
- Parameters:
tu
- the text unit that must have a skeleton.
- Returns:
- the skeleton of the text unit.
-
convertToSkeleton
public static GenericSkeleton convertToSkeleton(ITextUnit textUnit)
Copies source and target text of a given text unit into a newly created skeleton. The original text unit remains intact and serves as the pattern for the new skeleton's contents.
- Parameters:
textUnit
- the text unit to be copied into a skeleton.
- Returns:
- the newly created skeleton, whose contents reflect the given text unit.
-
getSourceAnnotation
public static <A extends IAnnotation> A getSourceAnnotation(ITextUnit textUnit, Class<A> type)
Gets an annotation attached to the source part of a given text unit resource.
- Type Parameters:
A
- a class implementing IAnnotation
- Parameters:
textUnit
- the given text unit resource.
type
- reference to the requested annotation type.
- Returns:
- the annotation, or null if not found.
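getSourceAnnotation and getTargetAnnotation take a Class<A> token so that the caller receives a typed result without an explicit cast. The following sketch shows that typed-lookup pattern under the assumption that annotations are kept in a map keyed by their concrete class. AnnotatedPartSketch, IAnnotationLike and NoteAnnotation are hypothetical stand-ins for a text unit part, Okapi's IAnnotation, and a real annotation type.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a text unit part that carries annotations keyed by their class. */
final class AnnotatedPartSketch {
    /** Stands in for Okapi's IAnnotation marker interface. */
    interface IAnnotationLike {}

    /** Example annotation type, for illustration only. */
    static final class NoteAnnotation implements IAnnotationLike {
        final String text;

        NoteAnnotation(String text) {
            this.text = text;
        }
    }

    private final Map<Class<?>, IAnnotationLike> annotations = new HashMap<>();

    /** Attaches an annotation, replacing any existing one of the same concrete class. */
    void setAnnotation(IAnnotationLike annotation) {
        annotations.put(annotation.getClass(), annotation);
    }

    /** Returns the annotation of the requested type, or null if none is attached. */
    <A extends IAnnotationLike> A getAnnotation(Class<A> type) {
        // Class.cast converts the stored value to A without an unchecked cast; cast(null) is null.
        return type.cast(annotations.get(type));
    }
}
```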
-
setSourceAnnotation
public static void setSourceAnnotation(ITextUnit textUnit, IAnnotation annotation)
Attaches an annotation to the source part of a given text unit resource.
- Parameters:
textUnit
- the given text unit resource.
annotation
- the annotation to be attached to the source part of the text unit.
-
getTargetAnnotation
public static <A extends IAnnotation> A getTargetAnnotation(ITextUnit textUnit, LocaleId locId, Class<A> type)
Gets an annotation attached to the target part of a given text unit resource in a given locale.
- Type Parameters:
A
- a class implementing IAnnotation
- Parameters:
textUnit
- the given text unit resource.
locId
- the locale of the target part being sought.
type
- reference to the requested annotation type.
- Returns:
- the annotation, or null if not found.
-
setTargetAnnotation
public static void setTargetAnnotation(ITextUnit textUnit, LocaleId locId, IAnnotation annotation)
Attaches an annotation to the target part of a given text unit resource in a given language.
- Parameters:
textUnit
- the given text unit resource.
locId
- the locale of the target part being attached to.
annotation
- the annotation to be attached to the target part of the text unit.
-
setSourceText
public static void setSourceText(ITextUnit textUnit, String text)
Sets the coded text of the un-segmented source of a given text unit resource.
- Parameters:
textUnit
- the given text unit resource.
text
- the text to be set.
-
setTargetText
public static void setTargetText(ITextUnit textUnit, LocaleId locId, String text)
Sets the coded text of the target part of a given text unit resource in a given language.
- Parameters:
textUnit
- the given text unit resource.
locId
- the locale of the target part being set.
text
- the text to be set.
-
trimTU
public static void trimTU(ITextUnit textUnit, boolean trimLeading, boolean trimTrailing)
Removes leading and/or trailing whitespace from the source part of a given text unit resource.
- Parameters:
textUnit
- the given text unit resource.
trimLeading
- true to remove leading whitespace, if there is any.
trimTrailing
- true to remove trailing whitespace, if there is any.
-
addQualifiers
public static void addQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around text. This method is useful when the starting and ending qualifiers are different.
- Parameters:
textUnit
- the given text unit resource.
startQualifier
- the qualifier to be added before the text.
endQualifier
- the qualifier to be added after the text.
-
addQualifiers
public static void addQualifiers(ITextUnit textUnit, String qualifier)
Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around text.
- Parameters:
textUnit
- the given text unit resource.
qualifier
- the qualifier to be added before and after the text.
-
removeQualifiers
public static boolean removeQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
Removes qualifiers (parentheses, quotation marks etc.) around text from the source part of a given un-segmented text unit resource. This method is useful when the starting and ending qualifiers are different.
- Parameters:
textUnit
- the given text unit resource.
startQualifier
- the qualifier to be removed before the source text.
endQualifier
- the qualifier to be removed after the source text.
- Returns:
- true if the qualifiers were found and removed.
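The qualifier check can be sketched on a plain string. This version assumes both qualifiers must be present (start at the beginning, end at the end) for anything to be removed, and it returns the stripped text, or null when the qualifiers were not found, instead of modifying a text unit and returning a boolean. QualifierSketch is a hypothetical class.

```java
/** Hypothetical sketch of removeQualifiers(textUnit, startQualifier, endQualifier) on a plain string. */
final class QualifierSketch {
    private QualifierSketch() {}

    /** Returns text without the surrounding qualifiers, or null if they are not both present. */
    static String removeQualifiers(String text, String startQualifier, String endQualifier) {
        if (text == null || startQualifier == null || endQualifier == null) {
            return null;
        }
        int minLen = startQualifier.length() + endQualifier.length();
        // The qualifiers must not overlap, so the text must be long enough to hold both.
        if (text.length() < minLen || !text.startsWith(startQualifier) || !text.endsWith(endQualifier)) {
            return null; // qualifiers not found: nothing removed
        }
        return text.substring(startQualifier.length(), text.length() - endQualifier.length());
    }
}
```

The length check makes a lone quotation mark fail rather than being treated as both the start and end qualifier.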
-
simplifyCodes
public static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in the source part of a given text unit resource.
- Parameters:
textUnit
- the given text unit.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
-
simplifyCodes
public static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource.
- Parameters:
textUnit
- the given text unit.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
mergeCodes
- true to merge adjacent codes, false to leave them as-is.
-
simplifyCodesPostSegmentation
public static void simplifyCodesPostSegmentation(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource. If the text unit has a target, simplification is skipped.
- Parameters:
textUnit
- the given text unit.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the corresponding inter-segment TextPart.
mergeCodes
- true to merge adjacent codes, false to leave them as-is.
-
simplifyCodesPostSegmentation
public static void simplifyCodesPostSegmentation(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource. If the text unit has a target, simplification is skipped.
- Parameters:
tc
- the given text container.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the corresponding inter-segment TextPart.
mergeCodes
- true to merge adjacent codes, false to leave them as-is.
-
expandCodes
public static TextFragment expandCodes(TextFragment tf)
Expands codes that have been previously merged.
- Parameters:
tf
- the original TextFragment with possibly merged codes.
- Returns:
- a new TextFragment with expanded codes, or the original one if there are no codes or they have not been merged.
-
hasMergedCode
public static boolean hasMergedCode(TextFragment tf)
-
removeCodes
public static void removeCodes(ITextUnit textUnit, boolean removeTargetCodes)
Removes all inline tags in the source (or optionally the target) of a given text unit resource.
- Parameters:
textUnit
- the given text unit.
removeTargetCodes
- true to remove target codes.
-
removeCodes
public static void removeCodes(TextContainer tc)
Removes all inline tags from the given TextContainer.
- Parameters:
tc
- the given text container.
-
removeCodes
public static void removeCodes(TextFragment tf)
Removes all inline tags from the given TextFragment.
- Parameters:
tf
- the given text fragment.
-
removeCodes
public static String removeCodes(String codedText)
Removes all inline tags from a given coded text.
- Parameters:
codedText
- the given coded text string.
- Returns:
- the string without code markers.
-
removeAndReplaceCodes
public static String removeAndReplaceCodes(String codedText, String isolatedCodeReplacement)
Removes the opening and closing codes, and replaces the isolated codes in the text with the specified string.
- Parameters:
codedText
- the given coded text string.
isolatedCodeReplacement
- the replacement for isolated codes.
- Returns:
- the string without code markers.
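The distinction between paired and isolated codes can be sketched on coded text. The sketch assumes each inline code is a marker character followed by one index character, with marker values mirroring TextFragment.MARKER_OPENING, MARKER_CLOSING and MARKER_ISOLATED. ReplaceCodesSketch is hypothetical; replacing an isolated code (for example a line break) keeps words from running together.

```java
/** Hypothetical sketch of removeAndReplaceCodes(String, String) on coded text. */
final class ReplaceCodesSketch {
    // Assumed to mirror TextFragment.MARKER_OPENING / MARKER_CLOSING / MARKER_ISOLATED.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';

    private ReplaceCodesSketch() {}

    static String removeAndReplaceCodes(String codedText, String isolatedCodeReplacement) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (c == MARKER_OPENING || c == MARKER_CLOSING) {
                i++; // drop the paired-code marker and its index character
            } else if (c == MARKER_ISOLATED) {
                sb.append(isolatedCodeReplacement); // isolated code becomes the replacement
                i++;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```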
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in a given text fragment.
- Parameters:
tf
- the given text fragment.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
- Returns:
- null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second if there was a trailing code. Either or both can be null.
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in a given text fragment.
- Parameters:
tf
- the given text fragment.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
mergeCodes
- true to merge adjacent codes, false to leave them as-is.
- Returns:
- null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second if there was a trailing code. Either or both can be null.
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in a given text container.
- Parameters:
tc
- the given text container.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
- Returns:
- null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second if there was a trailing code. Either or both can be null.
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in a given text container.
- Parameters:
tc
- the given text container.
rules
- rules for the data-driven simplification.
removeLeadingTrailingCodes
- true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
mergeCodes
- true to merge adjacent codes, false to leave them as-is.
- Returns:
- null if no leading or trailing code was removed, or an array with the original data of the removed codes: the first element if there was a leading code, the second if there was a trailing code. Either or both can be null.
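The removeLeadingTrailingCodes step of simplifyCodes can be sketched in isolation on coded text. The sketch assumes each inline code is a marker character followed by one index character, with marker values mirroring TextFragment.MARKER_OPENING, MARKER_CLOSING and MARKER_ISOLATED. It deliberately skips the rules-driven simplification and code merging, does not check whether leading and trailing codes pair up, and returns Strings rather than TextFragments. EdgeCodesSketch is a hypothetical class, not Okapi's algorithm.

```java
/** Hypothetical sketch of removing leading/trailing codes from coded text. */
final class EdgeCodesSketch {
    // Assumed to mirror TextFragment.MARKER_OPENING / MARKER_CLOSING / MARKER_ISOLATED.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';

    private EdgeCodesSketch() {}

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    /**
     * Returns {remainingText, leadingCodes, trailingCodes}; the last two are null
     * when there is no code at that edge.
     */
    static String[] removeLeadingTrailingCodes(String codedText) {
        int start = 0;
        // Each code occupies two characters: the marker and its index character.
        while (start + 1 < codedText.length() && isMarker(codedText.charAt(start))) {
            start += 2;
        }
        int end = codedText.length();
        // Index characters never equal a marker value, so checking end - 2 is safe.
        while (end - 2 >= start && isMarker(codedText.charAt(end - 2))) {
            end -= 2;
        }
        String leading = start > 0 ? codedText.substring(0, start) : null;
        String trailing = end < codedText.length() ? codedText.substring(end) : null;
        return new String[] { codedText.substring(start, end), leading, trailing };
    }
}
```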
-
removeQualifiers
public static boolean removeQualifiers(ITextUnit textUnit, String qualifier)
Removes qualifiers (quotation marks etc.) around text from the source part of a given text unit resource.
- Parameters:
textUnit
- the given text unit resource.
qualifier
- the qualifier to be removed before and after the source text.
- Returns:
- true if the qualifiers were found and removed.
-
addAltTranslation
public static AltTranslationsAnnotation addAltTranslation(TextContainer targetContainer, AltTranslation alt)
Adds an AltTranslation object to a given TextContainer. The AltTranslationsAnnotation annotation is created if it does not exist already.
- Parameters:
targetContainer
- the container to which the object is added.
alt
- the alternate translation to add.
- Returns:
- the annotation to which the object was added; it may be a new annotation or the one already associated with the container.
-
addAltTranslation
public static AltTranslationsAnnotation addAltTranslation(Segment seg, AltTranslation alt)
Adds an AltTranslation object to a given Segment. The AltTranslationsAnnotation annotation is created if it does not already exist.
- Parameters:
seg
- the segment to which the object is added.
alt
- the alternate translation to add.
- Returns:
- the annotation to which the object was added; it may be a new annotation or the one already associated with the segment.
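Both addAltTranslation overloads follow a create-if-absent pattern: reuse the annotation already attached to the target, or attach a new one on first use, then append the alternate. The sketch below shows that pattern with `Map.computeIfAbsent`. `Alt`, `AltAnnotation` and `Annotated` are hypothetical stand-ins, not Okapi's `AltTranslation`, `AltTranslationsAnnotation`, `TextContainer` or `Segment` classes, and the library's internal storage may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins illustrating the create-if-absent annotation pattern; NOT Okapi classes.
final class AltTranslationSketch {

    static final class Alt {
        final String target;
        final int score;

        Alt(String target, int score) {
            this.target = target;
            this.score = score;
        }
    }

    /** Plays the role of AltTranslationsAnnotation: an ordered list of alternates. */
    static final class AltAnnotation {
        final List<Alt> alts = new ArrayList<>();
    }

    /** Plays the role of an annotatable container or segment: at most one annotation per type. */
    static final class Annotated {
        final Map<Class<?>, Object> annotations = new HashMap<>();
    }

    /** Reuses the existing annotation or creates it on first use, then adds the alternate. */
    static AltAnnotation addAltTranslation(Annotated target, Alt alt) {
        AltAnnotation ann = (AltAnnotation) target.annotations
                .computeIfAbsent(AltAnnotation.class, k -> new AltAnnotation());
        ann.alts.add(alt);
        return ann;
    }

    public static void main(String[] args) {
        Annotated seg = new Annotated();
        AltAnnotation first = addAltTranslation(seg, new Alt("Bonjour", 100));
        AltAnnotation second = addAltTranslation(seg, new Alt("Salut", 85));
        System.out.println((first == second) + " " + second.alts.size()); // prints true 2
    }
}
```

Returning the annotation lets the caller keep adding alternates or inspect them without looking the annotation up again.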
-
storeSegmentation
public static TextFragment storeSegmentation(TextContainer tc)
-
trimSegments
public static void trimSegments(TextContainer tc, boolean trimLeading, boolean trimTrailing)
Trims leading and/or trailing whitespace from the segments of a given text container. The removed whitespace is placed in newly created whitespace-only text parts before and after each trimmed segment.
- Parameters:
tc
- the given text container
trimLeading
- true to remove the leading whitespace of a segment
trimTrailing
- true to remove the trailing whitespace of a segment
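The splitting performed for a single segment can be sketched on a plain string. In this minimal illustration the hypothetical `trim` helper returns the pieces in order: an optional whitespace-only piece (standing in for a new text part), the trimmed segment text, and an optional trailing whitespace-only piece. Using `Character.isWhitespace` and leaving whitespace-only segments untouched are assumptions, not documented behavior of trimSegments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical string-level sketch of trimming one segment into [leading ws part, segment, trailing ws part].
final class TrimSketch {

    static List<String> trim(String segment, boolean trimLeading, boolean trimTrailing) {
        int start = 0;
        int end = segment.length();
        if (trimLeading) {
            while (start < end && Character.isWhitespace(segment.charAt(start))) start++;
        }
        if (trimTrailing) {
            while (end > start && Character.isWhitespace(segment.charAt(end - 1))) end--;
        }
        List<String> parts = new ArrayList<>();
        if (start == end) {
            parts.add(segment); // whitespace-only or empty: assumed to be left as-is
            return parts;
        }
        if (start > 0) parts.add(segment.substring(0, start));                 // new whitespace-only part
        parts.add(segment.substring(start, end));                              // trimmed segment
        if (end < segment.length()) parts.add(segment.substring(end));         // new whitespace-only part
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(trim("  Hello world. ", true, true).size()); // prints 3
    }
}
```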
-
trimSegments
public static void trimSegments(TextContainer tc)
-
extractSegMarkers
public static TextFragment extractSegMarkers(TextFragment tf, TextFragment original, boolean removeFromOriginal)
Extracts segment and text part markers from an original text fragment, creates codes (placeholder type) for those markers, and appends them to another given text fragment.
- Parameters:
tf
- the given text fragment to which the extracted codes are appended
original
- the given original text fragment
removeFromOriginal
- true to remove the found markers from the original text fragment
- Returns:
- the original text fragment unchanged if removeFromOriginal is false; otherwise the original text fragment with the markers removed
-
hasSegOrTpMarker
public static boolean hasSegOrTpMarker(Code code)
-
hasSegStartMarker
public static boolean hasSegStartMarker(Code code)
-
hasSegEndMarker
public static boolean hasSegEndMarker(Code code)
-
hasTpStartMarker
public static boolean hasTpStartMarker(Code code)
-
hasTpEndMarker
public static boolean hasTpEndMarker(Code code)
-
hasExternalRefMarker
public static boolean hasExternalRefMarker(Code code)
-
restoreSegmentation
public static String restoreSegmentation(TextContainer tc, TextFragment segStorage)
Restores the original segmentation of a given text container from a given text fragment created with storeSegmentation().
- Parameters:
tc
- the given text container
segStorage
- the text fragment created with storeSegmentation() that contains the original segmentation info
- Returns:
- a test string containing a sequence of markers created by the internal algorithm; used for tests only.
-
testMarkers
public static String testMarkers()
-
toText
public static String toText(TextFragment tf)
Returns the content of a given text fragment, including the original codes whenever possible. Codes are enclosed in '[' and ']' to distinguish them from regular text.
- Parameters:
tf
- the given text fragment
- Returns:
- the content of the given fragment
-
toText
public static String toText(String text, List<Code> codes)
Returns a representation of a given coded text, with code data enclosed in brackets.
- Parameters:
text
- the given coded text
codes
- the given list of codes
- Returns:
- the content of the given coded text
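The bracket decoration done by toText can be sketched on a plain coded-text string. This sketch assumes a coded-text convention in which each code is represented by a marker character followed by one character encoding the code's index (index + `CHARBASE`). The constant values below are believed to mirror Okapi's `TextFragment` marker constants but are an assumption to verify against your Okapi version. Code data is passed as plain strings instead of `Code` objects, and `code` is a hypothetical helper for building examples.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of bracket-decorating codes in coded text.
// Assumption: marker char + index char (index + CHARBASE), as in Okapi's TextFragment; verify the values.
final class CodedTextSketch {
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';
    static final int CHARBASE = 0xE110;

    /** Replaces each marker/index pair with "[" + code data + "]"; other characters are copied as-is. */
    static String toText(String codedText, List<String> codeData) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if ((ch == MARKER_OPENING || ch == MARKER_CLOSING || ch == MARKER_ISOLATED)
                    && i + 1 < codedText.length()) {
                int index = codedText.charAt(++i) - CHARBASE; // the next char holds the code index
                out.append('[').append(codeData.get(index)).append(']');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    /** Builds the marker/index pair for code number {@code index} (example helper only). */
    static String code(char marker, int index) {
        return "" + marker + (char) (CHARBASE + index);
    }

    public static void main(String[] args) {
        String coded = "Hello " + code(MARKER_OPENING, 0) + "world" + code(MARKER_CLOSING, 1) + "!";
        System.out.println(toText(coded, Arrays.asList("<b>", "</b>"))); // prints Hello [<b>]world[</b>]!
    }
}
```

Decorating codes this way makes the output convenient for debugging and test assertions, since code data and translatable text remain visually distinct.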
-
convertTextPartsToCodes
public static void convertTextPartsToCodes(TextContainer tc)
Converts all TextParts (not Segments) in a given TextContainer so that each one contains a single code holding the part's text. This protects the text of a text part (e.g., one created from original codes) from being escaped by an encoder.
- Parameters:
tc
- the given TextContainer
-
convertTextPartToCode
public static void convertTextPartToCode(TextPart textPart)
Creates a single code holding a given TextPart's text. This protects the text of the text part from being escaped by an encoder. If the TextPart already has codes, no conversion is performed.
- Parameters:
textPart
- the given TextPart
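Why converting a text part to a code protects it can be shown with a toy encoder that escapes plain text but writes code data verbatim. In this minimal sketch, `Part`, `encode` and `escapeXml` are hypothetical stand-ins, not Okapi's `TextPart` or encoder classes. The `convertTextPartToCode` method here mirrors only the documented behavior: the text moves into a single code, and parts that already have a code are left alone.

```java
// Hypothetical model: an encoder escapes plain text but writes code data verbatim.
// Part, encode() and escapeXml() are illustrative stand-ins, not Okapi's TextPart or encoder classes.
final class ProtectSketch {

    static final class Part {
        String text;     // plain text, escaped on output
        String codeData; // null if the part has no code; written verbatim on output

        Part(String text, String codeData) {
            this.text = text;
            this.codeData = codeData;
        }
    }

    /** Moves the part's text into a single code, unless the part already has a code. */
    static void convertTextPartToCode(Part part) {
        if (part.codeData != null) {
            return; // already has codes: no conversion is performed
        }
        part.codeData = part.text;
        part.text = "";
    }

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Toy encoder: escapes the text and appends the code data untouched. */
    static String encode(Part part) {
        return escapeXml(part.text) + (part.codeData == null ? "" : part.codeData);
    }

    public static void main(String[] args) {
        Part part = new Part("<br/>", null);
        System.out.println(encode(part)); // prints &lt;br/&gt;
        convertTextPartToCode(part);
        System.out.println(encode(part)); // prints <br/>
    }
}
```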
-
convertTextParts_whitespaceCodesToText
public static void convertTextParts_whitespaceCodesToText(TextContainer tc)
-
convertTextPart_whitespaceCodesToText
public static void convertTextPart_whitespaceCodesToText(TextPart textPart)
-
isStandalone
public static boolean isStandalone(ITextUnit tu)
-
renumberCodes
public static void renumberCodes(TextContainer tc)
-
needsPreserveWhitespaces
public static boolean needsPreserveWhitespaces(TextContainer tc)
Detects whether a given TextContainer contains whitespace characters that need to be preserved in XML. A single space (0x20) does not need to be preserved, but any other whitespace character does, as does a sequence of two or more spaces.
- Parameters:
tc
- the given TextContainer object.
- Returns:
- true if the given TextContainer has whitespace that needs to be preserved.
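The rule stated above can be expressed as a single pass over the text. This sketch works on a plain `String` rather than a `TextContainer`, and it assumes that "whitespace" means `Character.isWhitespace`. The library may use a different character set (for example, `Character.isWhitespace` does not treat U+00A0 as whitespace).

```java
// Hypothetical string-level sketch of the documented rule; the real method inspects a TextContainer.
// Assumption: "whitespace" means Character.isWhitespace(); the library may use a different set.
final class WhitespaceSketch {

    static boolean needsPreserveWhitespaces(String text) {
        boolean previousWasSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == ' ') {
                if (previousWasSpace) {
                    return true; // two or more consecutive spaces
                }
                previousWasSpace = true;
            } else if (Character.isWhitespace(ch)) {
                return true; // tab, newline, carriage return, etc.
            } else {
                previousWasSpace = false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsPreserveWhitespaces("a b c")); // prints false
        System.out.println(needsPreserveWhitespaces("a  b"));  // prints true
        System.out.println(needsPreserveWhitespaces("a\tb"));  // prints true
    }
}
```

A writer can use such a check to decide whether to emit `xml:space="preserve"` on the enclosing element.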
-
needsPreserveWhitespaces
public static boolean needsPreserveWhitespaces(ITextUnit tu)
-
isWellformed
public static boolean isWellformed(TextFragment tf)
-
isWellformed
public static boolean isWellformed(TextContainer tc)
-
unsegmentTU
public static void unsegmentTU(ITextUnit tu)
-
-