Class TextFragment
- java.lang.Object
  - net.sf.okapi.common.resource.TextFragment
- All Implemented Interfaces:
  Appendable, CharSequence, Comparable<TextFragment>
public class TextFragment extends Object implements Appendable, CharSequence, Comparable<TextFragment>
Implements the methods for creating and manipulating a pre-parsed flat representation of content with in-line codes. The model uses two objects to store the data:
- a coded text string
- a list of Code objects.
The coded text string is composed of normal characters and markers.
A marker is a sequence of two special characters (in the Unicode PUA): the first indicates the type of the underlying code (opening, closing, isolated), and the second encodes the index of the corresponding Code object, where more information can be found. You can use the toChar(int) and toIndex(char) methods to encode and decode the index value.
To get the coded text of a TextFragment object use getCodedText(), and to get its list of codes use getCodes().
You can also modify the coded text or the codes directly and re-apply them to the TextFragment object using setCodedText(String) and setCodedText(String, List).
Adding a code to the coded text can be done by:
- appending the code with append(TagType, String, String)
- changing a section of existing text to code with changeToCode(int, int, TagType, String)
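The following Java sketch illustrates the two-part model (a coded text string plus a parallel list of codes) without depending on Okapi. CodedTextSketch is a hypothetical, simplified class, not the library's implementation: it stores only each code's raw data, takes the marker values 0xE101 (opening), 0xE102 (closing) and 0xE103 (isolated) from the description of append(String, Function), and assumes an index base of 0xE110 (the actual CHARBASE value may differ).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a coded text string plus its parallel code list. */
class CodedTextSketch {
    // Marker type characters, as documented for MARKER_OPENING/CLOSING/ISOLATED.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';
    // Assumption: base used to encode a code index as a PUA character.
    static final int CHARBASE = 0xE110;

    final StringBuilder text = new StringBuilder(); // plain chars + two-char markers
    final List<String> codes = new ArrayList<>();    // raw data of each code, by index

    CodedTextSketch appendText(String s) {
        text.append(s);
        return this;
    }

    /** Appends a code: stores its data, then writes marker type + encoded index. */
    CodedTextSketch appendCode(char markerType, String data) {
        codes.add(data);
        text.append(markerType).append((char) (codes.size() - 1 + CHARBASE));
        return this;
    }

    String getCodedText() {
        return text.toString();
    }

    /** Rebuilds the original content by replacing each marker with its code data. */
    String toText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == MARKER_OPENING || ch == MARKER_CLOSING || ch == MARKER_ISOLATED) {
                int index = text.charAt(++i) - CHARBASE; // second char holds the index
                out.append(codes.get(index));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CodedTextSketch frag = new CodedTextSketch()
            .appendText("Hello ")
            .appendCode(MARKER_OPENING, "<b>")
            .appendText("world")
            .appendCode(MARKER_CLOSING, "</b>");
        System.out.println(frag.getCodedText().length()); // 15: 11 text chars + 2 markers of 2 chars
        System.out.println(frag.toText());                // Hello <b>world</b>
    }
}
```

Each code occupies exactly two characters of coded text regardless of how long its raw data is; the data itself lives only in the code list.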
-
Nested Class Summary
- static class TextFragment.CompareMode: Enum constants to specify how compareTo should work.
- static class TextFragment.Marker: List of the marker types as an Enum.
- static class TextFragment.TagType: List of the types of tag usable for in-line codes.
-
Field Summary
- static int CHARBASE: Special value used as the base of inline code indices.
- protected List<Code> codes: List of the inline codes for this fragment.
- protected boolean isBalanced: Flag indicating if the opening/closing inline codes of this fragment have been balanced or not.
- protected int lastCodeID: Value of the last inline code ID in this fragment.
- static int MARKER_CLOSING: Special character marker for a closing inline code.
- static int MARKER_ISOLATED: Special character marker for an isolated inline code.
- static int MARKER_OPENING: Special character marker for an opening inline code.
- static Pattern MARKERS_REGEX
- static String REFMARKER_END: Marker for end of reference.
- static String REFMARKER_SEP: Marker for reference separator.
- static String REFMARKER_START: Marker for start of reference.
- protected StringBuilder text: Coded text buffer of this fragment.
-
Constructor Summary
- TextFragment(): Creates an empty TextFragment.
- TextFragment(String text): Creates a TextFragment with a given text.
- TextFragment(String text, int lastCodeId): Creates a TextFragment with a given text and an initial ID value for codes.
- TextFragment(String codedText, List<Code> codes): Creates a TextFragment with the content made of a given coded text and a list of codes.
- TextFragment(TextFragment fragment): Creates a TextFragment with the content of a given TextFragment.
-
Method Summary
- void alignCodeIds(TextFragment base)
- void alignCodeIds(TextFragment base, CodeMatchStrategy strategy): Aligns the code IDs of this fragment with the ones of a given fragment.
- int annotate(int start, int end, String type, InlineAnnotation annotation): Annotates a section of this text.
- TextFragment append(char value): Appends a character to the fragment.
- TextFragment append(CharSequence csq): Appends the specified character sequence to this fragment.
- TextFragment append(CharSequence csq, int start, int end): Appends a subsequence of the specified character sequence to this fragment.
- void append(CharSequence text, Function<Code,Code> codeProcessor): Appends a CharSequence.
- TextFragment append(String text): Appends a string to the fragment.
- void append(String text, Function<Code,Code> codeProcessor): Appends a string.
- TextFragment append(Code code): Appends an existing code to this fragment.
- TextFragment append(TextFragment fragment): Appends a TextFragment object to this fragment.
- Code append(TextFragment.TagType tagType, String type, String data): Appends a new code to the text.
- Code append(TextFragment.TagType tagType, String type, String data, int id): Appends a new code with a defined identifier to the text.
- Code append(TextFragment.TagType tagType, String type, InlineAnnotation annotation): Appends an annotation-type code to this text.
- TextFragment append(TextFragment fragment, boolean keepCodeIds): Appends a TextFragment object to this fragment.
- void balanceMarkers(): Balances the markers based on the tag type of the codes.
- int changeToCode(int start, int end, TextFragment.TagType tagType, String type): Changes a section of the coded text into a single code.
- int changeToCode(int start, int end, TextFragment.TagType tagType, String type, boolean setDisplayText): Changes a section of the coded text into a single code.
- char charAt(int index): Returns the character at the specified index in the coded text of this fragment.
- TextFragment cleanCodes(): Removes all codes, both in the codes list and the markers.
- void cleanUnusedCodes(): Removes all codes that have no data and no annotation.
- void clear(): Clears the fragment of all content.
- TextFragment clone(): Clones this TextFragment.
- int compareTo(TextFragment tf): Compares an object with this TextFragment.
- int compareTo(TextFragment frag, TextFragment.CompareMode compMode): Compares with another TextFragment.
- boolean equals(Object object)
- int findClosingCodePosition(int id, int indexOfOpening): Finds the position in this coded text of the closing code for a given opening code.
- int findOpeningCodePosition(int id, int indexOfClosing): Finds the position in this coded text of the opening code for a given closing code.
- static int fromFragmentToString(TextFragment frag, int pos): Converts a position in a fragment into the corresponding position in the fragment's string representation.
- List<AnnotatedSpan> getAnnotatedSpans(String type): Gets the list of all spans of text annotated with a given type of annotation.
- List<Code> getClonedCodes(): Gets a list of copies of the codes for this fragment.
- Code getCode(char indexAsChar): Gets the code for a given index formatted as a character (the second special character in a marker in a coded text string).
- Code getCode(int index): Gets the code for a given index.
- Code getCode(Code fc): Finds the first code with a given ID and tagType in this fragment, or null if there is no such code.
- String getCodedText(): Gets the coded text representation of the fragment.
- String getCodedText(int start, int end): Gets the portion of coded text for a given section of the coded text.
- int getCodePosition(int index)
- List<Code> getCodes(): Gets the list of all codes for the fragment.
- List<Code> getCodes(int start, int end): Gets a copy of the list of the codes that are within a given section of coded text.
- int getIndex(int id): Gets the index value for the first in-line code (in the codes list) with a given identifier.
- int getIndexForClosing(int id): Gets the index value for the closing in-line code (in the codes list) with a given identifier.
- int getIndexForOpening(int id): Gets the index value for the opening in-line code (in the codes list) with a given identifier.
- Code getLastCode(): Returns the last code appended to this fragment, or null if there are no codes.
- int getLastCodeId(): Gets the last value used for code ID.
- static Object[] getRefMarker(StringBuilder text): Helper method to retrieve a reference marker from a string.
- String getText(): Gets the text of the fragment (all codes are removed).
- static String getText(String codedText): Helper method that takes a coded string and returns a text-only version.
- boolean hasAnnotation(): Indicates if this text has at least one annotation.
- boolean hasAnnotation(String type): Indicates if this text has at least one annotation of a given type.
- boolean hasCode(): Indicates if the fragment contains at least one code.
- int hashCode()
- boolean hasReference(): Indicates if this TextFragment contains any in-line code with a reference.
- boolean hasText(): Indicates if this fragment contains at least one non-whitespace character.
- boolean hasText(boolean whiteSpacesAreText): Indicates if this fragment contains at least one character (inline codes, segment markers, and annotation markers do not count as characters).
- static int indexOfFirstNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS): Helper method to find the first non-whitespace character of a coded text, starting at a given position and going no farther than another given position.
- static int indexOfLastNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS): Helper method to find, from the back, the first non-whitespace character of a coded text, starting at a given position and going no farther than another given position.
- void insert(int offset, String str): Inserts a String object into this fragment.
- void insert(int offset, Code code): Inserts a Code object into this fragment.
- void insert(int offset, TextFragment fragment): Inserts a TextFragment object into this fragment.
- void insert(int offset, TextFragment fragment, boolean keepCodeIds): Inserts a TextFragment object into this fragment.
- void invalidate(): Sets the fragment in a state where it has to be re-balanced before being used for output.
- boolean isEmpty(): Indicates if the fragment is empty (no text and no codes).
- static boolean isMarker(char ch): Helper method that checks if a given character is an inline code marker.
- int length(): Returns the number of characters in the coded text of this fragment.
- static String makeRefMarker(String id): Helper method to build a reference marker string from a given identifier.
- static String makeRefMarker(String id, String propertyName): Helper method to build a reference marker string from a given identifier and a property name.
- int minimumIdValue(): Returns the smallest ID value.
- void remove(int start, int end): Removes a section of the fragment (including its codes).
- void removeAnnotations(): Removes all annotations in this text.
- void removeAnnotations(String type): Removes all annotations of a given type in this text.
- void removeCode(Code code): Removes the given Code from this fragment.
- int renumberCodes(): Renumbers the IDs of the codes in the fragment.
- int renumberCodes(int idBase): Re-assigns the IDs of the codes in this fragment to be in sequential order starting from a given base.
- int renumberCodes(int idBase, boolean mindPosition): Re-assigns the IDs of the codes in this fragment to be in sequential order starting from a given base.
- void setCodedText(String newCodedText): Sets the coded text of the fragment, using its existing codes.
- void setCodedText(String newCodedText, boolean allowCodeDeletion): Sets the coded text of the fragment, using its existing codes.
- void setCodedText(String newCodedText, List<Code> newCodes): Sets the coded text of the fragment and its corresponding codes.
- void setCodedText(String newCodedText, List<Code> newCodes, boolean allowCodeDeletion): Sets the coded text of the fragment and its corresponding codes.
- protected void setCodes(List<Code> codes)
- TextFragment subSequence(int start, int end): Gets a copy of a sub-sequence of this object.
- static char toChar(int index): Helper method to convert a marker index to its character value in the coded text string.
- static int toIndex(char index): Helper method to convert the index-coded-as-character part of a marker into its index value.
- String toOuterText()
- String toString(): Gets the coded text for this fragment.
- String toText(): Returns the content of this fragment, including the original codes whenever possible.
- static void unwrap(TextFragment frag): Unwraps the content of a TextFragment.
-
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.CharSequence
chars, codePoints
-
Field Detail
-
MARKER_OPENING
public static final int MARKER_OPENING
Special character marker for an opening inline code.
- See Also:
- Constant Field Values
-
MARKER_CLOSING
public static final int MARKER_CLOSING
Special character marker for a closing inline code.
- See Also:
- Constant Field Values
-
MARKER_ISOLATED
public static final int MARKER_ISOLATED
Special character marker for an isolated inline code.
- See Also:
- Constant Field Values
-
CHARBASE
public static final int CHARBASE
Special value used as the base of inline code indices.
- See Also:
- Constant Field Values
-
REFMARKER_START
public static final String REFMARKER_START
Marker for start of reference.
- See Also:
- Constant Field Values
-
REFMARKER_END
public static final String REFMARKER_END
Marker for end of reference.
- See Also:
- Constant Field Values
-
REFMARKER_SEP
public static final String REFMARKER_SEP
Marker for reference separator.
- See Also:
- Constant Field Values
-
MARKERS_REGEX
public static final Pattern MARKERS_REGEX
-
text
protected StringBuilder text
Coded text buffer of this fragment.
-
isBalanced
protected boolean isBalanced
Flag indicating if the opening/closing inline codes of this fragment have been balanced or not.
-
lastCodeID
protected int lastCodeID
Value of the last inline code ID in this fragment.
-
Constructor Detail
-
TextFragment
public TextFragment()
Creates an empty TextFragment.
-
TextFragment
public TextFragment(String text)
Creates a TextFragment with a given text.
- Parameters:
text
- the text to use.
-
TextFragment
public TextFragment(String text, int lastCodeId)
Creates a TextFragment with a given text and an initial ID value for codes. This constructor can be used to create fragments that will be appended to an existing one.
- Parameters:
text
- the text to use.
lastCodeId
- the value from which code IDs start: the first new code gets this value + 1. The value should be -1 or a positive number; values below -1 are automatically reset to -1.
-
TextFragment
public TextFragment(TextFragment fragment)
Creates a TextFragment with the content of a given TextFragment.
- Parameters:
fragment
- the content to use.
-
Method Detail
-
toChar
public static char toChar(int index)
Helper method to convert a marker index to its character value in the coded text string.
- Parameters:
index
- the index value to encode.
- Returns:
- the corresponding character value.
-
toIndex
public static int toIndex(char index)
Helper method to convert the index-coded-as-character part of a marker into its index value.
- Parameters:
index
- the character to decode.
- Returns:
- the corresponding index value.
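A minimal sketch of how toChar(int) and toIndex(char) can act as inverses of each other, assuming the index is simply offset by CHARBASE. The value 0xE110 used here for CHARBASE is an assumption, and the range check is an addition of this sketch rather than documented behavior; MarkerIndexCodec is a hypothetical name.

```java
/** Hypothetical stand-ins for TextFragment.toChar(int) and toIndex(char). */
class MarkerIndexCodec {
    // Assumption: CHARBASE is 0xE110; consult the Constant Field Values page for the real value.
    static final int CHARBASE = 0xE110;

    /** Encodes a code-list index as the second character of a marker. */
    static char toChar(int index) {
        if (index < 0 || index + CHARBASE > Character.MAX_VALUE) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        return (char) (index + CHARBASE);
    }

    /** Decodes the second character of a marker back into a code-list index. */
    static int toIndex(char indexAsChar) {
        return indexAsChar - CHARBASE;
    }
}
```

With a base of 0xE110, index 0 maps to U+E110 and index 5 to U+E115, so a single UTF-16 character can address several thousand codes.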
-
makeRefMarker
public static String makeRefMarker(String id)
Helper method to build a reference marker string from a given identifier.
- Parameters:
id
- the identifier to use.
- Returns:
- the reference marker constructed from the ID.
-
makeRefMarker
public static String makeRefMarker(String id, String propertyName)
Helper method to build a reference marker string from a given identifier and a property name. Any "\" and "]" characters in the identifier and the property name are escaped with "\".
- Parameters:
id
- the identifier to use.
propertyName
- the name of the property to use.
- Returns:
- the reference marker constructed from the identifier and the property name.
-
getRefMarker
public static Object[] getRefMarker(StringBuilder text)
Helper method to retrieve a reference marker from a string. The identifier and the property parts are unescaped.
- Parameters:
text
- the text to search for a reference marker.
- Returns:
- null if no reference marker has been found; otherwise, an array of four objects:
- Object 0: The identifier of the reference.
- Object 1: The start position of the reference marker in the string.
- Object 2: The end position of the reference marker in the string.
- Object 3: The name of the property if there is one, null otherwise.
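The sketch below shows one way makeRefMarker(id, propertyName) and getRefMarker(text) could fit together. It is hypothetical: the values of REFMARKER_START, REFMARKER_SEP and REFMARKER_END ("[#$", "@%" and "]") are assumptions, the end position is taken to be exclusive, and only the escaping of "\" and "]" described above is modeled (an identifier that contains the separator would be split incorrectly).

```java
/** Hypothetical sketch of makeRefMarker()/getRefMarker(); not Okapi's implementation. */
class RefMarkerSketch {
    // Assumed constant values; check the Constant Field Values page for the real ones.
    static final String REFMARKER_START = "[#$";
    static final String REFMARKER_END = "]";
    static final String REFMARKER_SEP = "@%";

    /** Escapes "\" and "]" with a backslash (backslash first, so it is not escaped twice). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("]", "\\]");
    }

    static String makeRefMarker(String id, String propertyName) {
        StringBuilder sb = new StringBuilder(REFMARKER_START).append(escape(id));
        if (propertyName != null) {
            sb.append(REFMARKER_SEP).append(escape(propertyName));
        }
        return sb.append(REFMARKER_END).toString();
    }

    /** Returns {id, start, end (exclusive), propertyName or null}, or null if no marker is found. */
    static Object[] getRefMarker(CharSequence text) {
        String s = text.toString();
        int start = s.indexOf(REFMARKER_START);
        if (start == -1) return null;
        StringBuilder part = new StringBuilder();
        String id = null;
        for (int i = start + REFMARKER_START.length(); i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()) {
                part.append(s.charAt(++i)); // unescape "\x" into "x"
            } else if (s.startsWith(REFMARKER_END, i)) {
                String property = (id == null) ? null : part.toString();
                if (id == null) id = part.toString();
                return new Object[] {id, start, i + REFMARKER_END.length(), property};
            } else if (id == null && s.startsWith(REFMARKER_SEP, i)) {
                id = part.toString(); // text before the separator is the identifier
                part.setLength(0);
                i += REFMARKER_SEP.length() - 1;
            } else {
                part.append(ch);
            }
        }
        return null; // unterminated marker
    }
}
```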
-
fromFragmentToString
public static int fromFragmentToString(TextFragment frag, int pos)
Converts a position in a fragment's coded text into the corresponding position in the fragment's string representation. For example, if you find a match in a coded text string, use this method to convert the boundaries of the match into character positions in the string representing the fragment: position 4 in "xxyyMATCHyyxx" (where "xx" and "yy" stand for two-character code markers) becomes position 6 in "{b}{i}MATCH{/i}{/b}".
- Parameters:
frag
- the fragment where the position is located.
pos
- the position in the coded text.
- Returns:
- the same position, but in the string representation of the fragment.
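To illustrate the position mapping, here is a hypothetical sketch that works on a raw coded string plus the display string of each code (for example "{b}"), rather than on a TextFragment. It assumes markers are 0xE101 to 0xE103 followed by an index character offset from an assumed CHARBASE of 0xE110; each marker before the position shifts it by the length of its display string minus 2.

```java
import java.util.List;

/** Hypothetical sketch of mapping a coded-text position to a display-string position. */
class PositionMapSketch {
    static final int CHARBASE = 0xE110; // assumption, see toChar/toIndex

    static boolean isMarker(char ch) {
        return ch >= '\uE101' && ch <= '\uE103';
    }

    /**
     * codedText: text with two-character markers; display: the string shown for each
     * code index. Returns the position in the expanded string that matches codedPos.
     */
    static int fromFragmentToString(String codedText, List<String> display, int codedPos) {
        int shift = 0;
        for (int i = 0; i < codedPos && i < codedText.length(); i++) {
            if (isMarker(codedText.charAt(i)) && i + 1 < codedText.length()) {
                int index = codedText.charAt(i + 1) - CHARBASE;
                shift += display.get(index).length() - 2; // a marker is 2 chars wide
                i++; // skip the index character
            }
        }
        return codedPos + shift;
    }
}
```

With the coded text from the example above and the display strings {b}, {i}, {/i} and {/b}, position 4 maps to 6, and position 9 (just after MATCH) maps to 11.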
-
indexOfLastNonWhitespace
public static int indexOfLastNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS)
Helper method to find, from the back, the first non-whitespace character of a coded text, starting at a given position and going no farther than another given position.
- Parameters:
codedText
- the coded text to process.
fromIndex
- the first position to check (must be greater than or equal to untilIndex). Use -1 to point to the last position of the text.
untilIndex
- the last position to check (must be less than or equal to fromIndex).
openingMarkerIsWS
- indicates if opening markers count as whitespace.
closingMarkerIsWS
- indicates if closing markers count as whitespace.
isolatedMarkerIsWS
- indicates if isolated markers count as whitespace.
whitespaceIsWS
- indicates if whitespace characters count as whitespace.
- Returns:
- the position of the first non-whitespace character found from the back, given the parameters, or -1 if the text is null or empty, or if no non-whitespace character has been found once the character at position untilIndex has been checked. If the non-whitespace item found is a code, the position returned is the index of the second special character of that code's marker.
-
indexOfFirstNonWhitespace
public static int indexOfFirstNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS)
Helper method to find the first non-whitespace character of a coded text, starting at a given position and going no farther than another given position.
- Parameters:
codedText
- the coded text to process.
fromIndex
- the first position to check (must be less than or equal to untilIndex).
untilIndex
- the last position to check (must be greater than or equal to fromIndex). Use -1 to point to the last position of the text.
openingMarkerIsWS
- indicates if opening markers count as whitespace.
closingMarkerIsWS
- indicates if closing markers count as whitespace.
isolatedMarkerIsWS
- indicates if isolated markers count as whitespace.
whitespaceIsWS
- indicates if whitespace characters count as whitespace.
- Returns:
- the position of the first non-whitespace character, given the parameters, or -1 if the text is null or empty, or if no non-whitespace character has been found once the character at position untilIndex has been checked.
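The two scanning helpers can be sketched as follows. This hypothetical version treats a marker as one two-character unit, uses Character.isWhitespace() as its definition of whitespace (the real helpers may use a different set), and, like the documented indexOfLastNonWhitespace, returns the position of a marker's second character when scanning backwards. The marker values 0xE101 to 0xE103 are taken from the description of append(String, Function).

```java
/** Hypothetical sketch of indexOfFirstNonWhitespace / indexOfLastNonWhitespace. */
class WhitespaceScanSketch {
    static final char OPENING = '\uE101', CLOSING = '\uE102', ISOLATED = '\uE103';

    static boolean isMarkerType(char ch) {
        return ch == OPENING || ch == CLOSING || ch == ISOLATED;
    }

    /** Decides whether one unit (a plain char or a marker's type char) counts as whitespace. */
    static boolean countsAsWs(char ch, boolean openingWs, boolean closingWs,
                              boolean isolatedWs, boolean whitespaceWs) {
        if (ch == OPENING) return openingWs;
        if (ch == CLOSING) return closingWs;
        if (ch == ISOLATED) return isolatedWs;
        return whitespaceWs && Character.isWhitespace(ch);
    }

    static int indexOfFirstNonWhitespace(String codedText, int fromIndex, int untilIndex,
            boolean openingWs, boolean closingWs, boolean isolatedWs, boolean whitespaceWs) {
        if (codedText == null || codedText.isEmpty()) return -1;
        if (untilIndex == -1) untilIndex = codedText.length() - 1;
        for (int i = fromIndex; i <= untilIndex; i++) {
            char ch = codedText.charAt(i);
            if (!countsAsWs(ch, openingWs, closingWs, isolatedWs, whitespaceWs)) return i;
            if (isMarkerType(ch)) i++; // skip the marker's index character
        }
        return -1;
    }

    static int indexOfLastNonWhitespace(String codedText, int fromIndex, int untilIndex,
            boolean openingWs, boolean closingWs, boolean isolatedWs, boolean whitespaceWs) {
        if (codedText == null || codedText.isEmpty()) return -1;
        if (fromIndex == -1) fromIndex = codedText.length() - 1;
        for (int i = fromIndex; i >= untilIndex; i--) {
            // Scanning backwards, a marker is reached on its second (index) character.
            boolean onMarker = i > 0 && isMarkerType(codedText.charAt(i - 1));
            char unit = onMarker ? codedText.charAt(i - 1) : codedText.charAt(i);
            if (!countsAsWs(unit, openingWs, closingWs, isolatedWs, whitespaceWs)) return i;
            if (onMarker) i--; // skip the marker's type character
        }
        return -1;
    }
}
```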
-
unwrap
public static void unwrap(TextFragment frag)
Unwraps the content of a TextFragment. All sequences of consecutive white spaces are replaced by a single space character, and any white space at the start or end of the text is trimmed. White spaces here are: space, tab, CR and LF. Existing segments are not unwrapped.
- Parameters:
frag
- the text fragment to unwrap.
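A hypothetical sketch of the unwrapping rule applied to a plain string. It collapses runs of space, tab, CR and LF into one space and trims both ends; unlike the real unwrap(), it does not deal with inline codes or segments.

```java
/** Hypothetical sketch of unwrap() applied to plain text only (no codes, no segments). */
class UnwrapSketch {
    static boolean isUnwrapWs(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    static String unwrap(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (isUnwrapWs(ch)) {
                pendingSpace = out.length() > 0; // leading whitespace is dropped
            } else {
                if (pendingSpace) out.append(' '); // one space replaces the whole run
                pendingSpace = false;
                out.append(ch);
            }
        }
        return out.toString(); // a trailing run is never flushed, so the end is trimmed
    }
}
```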
-
isMarker
public static boolean isMarker(char ch)
Helper method that checks if a given character is an inline code marker.
- Parameters:
ch
- the character to check.
- Returns:
- true if the character is a code marker, false if it is not.
-
getText
public static String getText(String codedText)
Helper method that takes a coded string and returns a text-only version.
- Parameters:
codedText
- string with possible TextFragment codes.
- Returns:
- the given string stripped of any codes.
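A minimal sketch of isMarker(char) and the static getText(String), assuming the only markers are the three inline code markers 0xE101 to 0xE103, each followed by one index character. The real isMarker() may recognize additional marker characters (for example for segments or annotations); StripCodesSketch is a hypothetical name.

```java
/** Hypothetical sketch of isMarker(char) and the static getText(String). */
class StripCodesSketch {
    // Marker type characters documented as 0xE101 (opening), 0xE102 (closing), 0xE103 (isolated).
    static boolean isMarker(char ch) {
        return ch == '\uE101' || ch == '\uE102' || ch == '\uE103';
    }

    /** Removes every two-character marker from a coded string, keeping only the text. */
    static String getText(String codedText) {
        StringBuilder out = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if (isMarker(ch)) {
                i++; // also skip the index character that follows the marker type
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```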
-
clone
public TextFragment clone()
Clones this TextFragment.
-
hasReference
public boolean hasReference()
Indicates if this TextFragment contains any in-line code with a reference.
- Returns:
- true if there is one or more in-line codes with a reference, false if there is no reference.
-
append
public TextFragment append(String text)
Appends a string to the fragment. If the string is null, it is ignored.
- Parameters:
text
- the string to append.
-
append
public void append(String text, Function<Code,Code> codeProcessor)
Appends a string. If the string is null, it is ignored. If the string contains Okapi markers (Unicode 0xE101, 0xE102 or 0xE103), they are replaced by a Code "masking" the markers (which will result in MARKER_OPENING (0xE101) being in the coded text).
- Parameters:
text
- the string to append.
codeProcessor
- when a Code is generated to mask an Okapi marker, this function is called on it and can modify or replace the generated code.
-
append
public void append(CharSequence text, Function<Code,Code> codeProcessor)
Appends a CharSequence. If the sequence is null, it is ignored. If the sequence contains Okapi markers (Unicode 0xE101, 0xE102 or 0xE103), they are replaced by a Code "masking" the markers (which will result in MARKER_OPENING (0xE101) being in the coded text).
- Parameters:
text
- the character sequence to append.
codeProcessor
- when a Code is generated to mask an Okapi marker, this function is called on it and can modify or replace the generated code.
-
append
public TextFragment append(TextFragment fragment)
Appends a TextFragment object to this fragment. If the fragment is null, it is ignored.
- Parameters:
fragment
- the TextFragment to append.
- Returns:
- this fragment.
-
append
public TextFragment append(TextFragment fragment, boolean keepCodeIds)
Appends a TextFragment object to this fragment. If the fragment is null, it is ignored.
- Parameters:
fragment
- the TextFragment to append.
keepCodeIds
- if true, the Code.id values of the appended codes are not renumbered.
- Returns:
- this fragment.
-
append
public TextFragment append(Code code)
Appends an existing code to this fragment.
- Parameters:
code
- the existing code to append.
- Returns:
- a reference to this fragment
-
append
public Code append(TextFragment.TagType tagType, String type, InlineAnnotation annotation)
Appends an annotation-type code to this text.
- Parameters:
tagType
- the tag type of the code (e.g. TagType.OPENING).
type
- the type of the annotation (e.g. "protected").
annotation
- the annotation to add (can be null).
- Returns:
- the new code that was added to this text.
-
append
public Code append(TextFragment.TagType tagType, String type, String data)
Appends a new code to the text.
- Parameters:
tagType
- the tag type of the code (e.g. TagType.OPENING).
type
- the type of the code (e.g. "bold").
data
- the raw code itself (e.g. "<b>").
- Returns:
- the new code that was added to the text.
-
append
public Code append(TextFragment.TagType tagType, String type, String data, int id)
Appends a new code with a defined identifier to the text.
- Parameters:
tagType
- the tag type of the code (e.g. TagType.OPENING).
type
- the type of the code (e.g. "bold").
data
- the raw code itself (e.g. "<b>").
id
- the identifier to use for this code.
- Returns:
- the new code that was added to the text.
-
insert
public void insert(int offset, String str)
Inserts a String object into this fragment.
- Parameters:
offset
- position in the coded text where to insert the new String. You can use -1 to append at the end of the current content.
str
- String to insert.
- Throws:
InvalidPositionException
- when offset points inside a marker.
-
insert
public void insert(int offset, Code code)
Inserts a Code object into this fragment.
- Parameters:
offset
- position in the coded text where to insert the new Code. You can use -1 to append at the end of the current content.
code
- the Code to insert.
- Throws:
InvalidPositionException
- when offset points inside a marker.
-
insert
public void insert(int offset, TextFragment fragment)
Inserts a TextFragment object into this fragment.
- Parameters:
offset
- position in the coded text where to insert the new fragment. You can use -1 to append at the end of the current content.
fragment
- the TextFragment to insert.
- Throws:
InvalidPositionException
- when offset points inside a marker.
-
insert
public void insert(int offset, TextFragment fragment, boolean keepCodeIds)
Inserts a TextFragment object into this fragment.
- Parameters:
offset
- position in the coded text where to insert the new fragment. You can use -1 to append at the end of the current content.
fragment
- the TextFragment to insert.
keepCodeIds
- true to keep the IDs of the codes of the inserted TextFragment unchanged.
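The sketch below shows what "offset points inside a marker" means for the insert methods: a marker spans two characters, so an offset immediately after a marker's first character would split it. It is hypothetical: IllegalArgumentException stands in for Okapi's InvalidPositionException, OffsetCheckSketch is an invented name, and the marker values 0xE101 to 0xE103 come from the description of append(String, Function).

```java
/** Hypothetical sketch of the "offset points inside a marker" check used by insert(). */
class OffsetCheckSketch {
    static boolean isMarker(char ch) {
        return ch == '\uE101' || ch == '\uE102' || ch == '\uE103';
    }

    /** Resolves -1 to the end of the text and rejects offsets that would split a marker. */
    static int checkOffset(CharSequence codedText, int offset) {
        if (offset == -1) return codedText.length(); // -1 means "append at the end"
        if (offset < 0 || offset > codedText.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        // An offset just after a marker's first (type) character falls inside the marker.
        if (offset > 0 && isMarker(codedText.charAt(offset - 1))) {
            throw new IllegalArgumentException("offset " + offset + " points inside a marker");
        }
        return offset;
    }

    static String insert(String codedText, int offset, String str) {
        int pos = checkOffset(codedText, offset);
        return codedText.substring(0, pos) + str + codedText.substring(pos);
    }
}
```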
-
clear
public void clear()
Clears the fragment of all content. The parent is not modified.
-
getText
public String getText()
Gets the text of the fragment (all codes are removed).
- Returns:
- the content of the fragment without codes.
-
getCodedText
public String getCodedText()
Gets the coded text representation of the fragment.
- Returns:
- the coded text for the fragment.
-
setCodedText
public void setCodedText(String newCodedText)
Sets the coded text of the fragment, using its existing codes. The coded text must be valid for the existing codes.
- Parameters:
newCodedText
- the coded text to apply.
- Throws:
InvalidContentException
- when the coded text is not valid, or does not correspond to the existing codes.
-
getCodedText
public String getCodedText(int start, int end)
Gets the portion of coded text for a given section of the coded text.
- Parameters:
start
- the position of the first character or marker of the section (in the coded text representation).
end
- the position just after the last character or marker of the section (in the coded text representation). You can use -1 for ending the section at the end of the fragment.
- Returns:
- the portion of coded text for the given range. It can be empty but never null.
- Throws:
InvalidPositionException
- when start or end points inside a marker.
-
getCode
public Code getCode(char indexAsChar)
Gets the code for a given index formatted as a character (the second special character in a marker in a coded text string).
- Parameters:
indexAsChar
- the index value coded as a character.
- Returns:
- the corresponding code.
-
getCode
public Code getCode(int index)
Gets the code for a given index.
- Parameters:
index
- the index of the code.
- Returns:
- the code for the given index.
-
getCodes
public List<Code> getCodes()
Gets the list of all codes for the fragment.
- Returns:
- the list of all codes for the fragment. If there is no code, an empty list is returned.
-
getClonedCodes
public List<Code> getClonedCodes()
Gets a list of copies of the codes for this fragment.
- Returns:
- the list of copies of the codes for this fragment. If there is no code, an empty list is returned.
-
getCodes
public List<Code> getCodes(int start, int end)
Gets a copy of the list of the codes that are within a given section of coded text.
- Parameters:
start
- the position of the first character or marker of the section (in the coded text representation).
end
- the position just after the last character or marker of the section (in the coded text representation).
- Returns:
- a new list of all codes within the given range.
- Throws:
InvalidPositionException
- when start or end points inside a marker.
-
getIndex
public int getIndex(int id)
Gets the index value for the first in-line code (in the codes list) with a given identifier.
- Parameters:
id
- the identifier to look for.
- Returns:
- the index of the found code, or -1 if none is found.
-
getIndexForOpening
public int getIndexForOpening(int id)
Gets the index value for the opening in-line code (in the codes list) with a given identifier.
- Parameters:
id
- the identifier of the opening tag to look for.
- Returns:
- the index of the found opening code, or -1 if none is found.
-
getIndexForClosing
public int getIndexForClosing(int id)
Gets the index value for the closing in-line code (in the codes list) with a given identifier.
- Parameters:
id
- the identifier of the closing tag to look for.
- Returns:
- the index of the found closing code, or -1 if none is found.
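The three lookups above differ only in which tag type they accept. A hypothetical sketch with local stand-ins for Code and TextFragment.TagType (not the Okapi classes) could look like this; like the documented methods, it returns a position in the codes list, not in the coded text.

```java
import java.util.List;

/** Hypothetical sketch of looking up a code's position in the codes list by ID. */
class CodeLookupSketch {
    enum TagType { OPENING, CLOSING, PLACEHOLDER } // local stand-in, not Okapi's enum

    /** Minimal stand-in for an inline code: tag type, ID and raw data. */
    static final class Code {
        final TagType tagType;
        final int id;
        final String data;

        Code(TagType tagType, int id, String data) {
            this.tagType = tagType;
            this.id = id;
            this.data = data;
        }
    }

    /** Index of the first code with the given ID (any tag type), or -1. */
    static int getIndex(List<Code> codes, int id) {
        return find(codes, id, null);
    }

    static int getIndexForOpening(List<Code> codes, int id) {
        return find(codes, id, TagType.OPENING);
    }

    static int getIndexForClosing(List<Code> codes, int id) {
        return find(codes, id, TagType.CLOSING);
    }

    /** Linear scan over the list; a null tagType matches any tag type. */
    private static int find(List<Code> codes, int id, TagType tagType) {
        for (int i = 0; i < codes.size(); i++) {
            Code c = codes.get(i);
            if (c.id == id && (tagType == null || c.tagType == tagType)) return i;
        }
        return -1;
    }
}
```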
-
isEmpty
public boolean isEmpty()
Indicates if the fragment is empty (no text and no codes).
- Returns:
- true if the fragment is empty.
-
hasText
public boolean hasText()
Indicates if this fragment contains at least one non-whitespace character (inline codes and other markers do not count as characters).
- Returns:
- true if this fragment contains at least one character, excluding whitespace.
-
hasText
public boolean hasText(boolean whiteSpacesAreText)
Indicates if this fragment contains at least one character (inline codes, segment markers, and annotation markers do not count as characters).
- Parameters:
whiteSpacesAreText
- indicates if whitespace should be considered characters or not for the purpose of checking if this fragment is empty.
- Returns:
- true if this fragment contains at least one character (that character could be a whitespace if whiteSpacesAreText is set to true).
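A hypothetical sketch of hasText(boolean) working directly on a coded string. It skips only the three inline code markers (0xE101 to 0xE103) and their index characters, and uses Character.isWhitespace(); the real method also ignores segment and annotation markers and may define whitespace differently. Judging from its description, hasText() appears to correspond to hasText(false).

```java
/** Hypothetical sketch of hasText(boolean) over a coded text string. */
class HasTextSketch {
    static boolean isMarker(char ch) {
        return ch == '\uE101' || ch == '\uE102' || ch == '\uE103';
    }

    static boolean hasText(String codedText, boolean whiteSpacesAreText) {
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if (isMarker(ch)) {
                i++; // markers (and their index characters) never count as text
            } else if (whiteSpacesAreText || !Character.isWhitespace(ch)) {
                return true;
            }
        }
        return false;
    }
}
```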
-
hasCode
public boolean hasCode()
Indicates if the fragment contains at least one code.
- Returns:
- true if the fragment contains at least one code.
-
remove
public void remove(int start, int end)
Removes a section of the fragment (including its codes).
- Parameters:
start
- the position of the first character or marker of the section (in the coded text representation).
end
- the position just after the last character or marker of the section (in the coded text representation). You can use -1 to indicate the end of the fragment.
- Throws:
InvalidPositionException
- when start or end points inside a marker.
-
subSequence
public TextFragment subSequence(int start, int end)
Gets a copy of a sub-sequence of this object.
- Specified by:
subSequence in interface CharSequence
- Parameters:
start
- the position of the first character or marker of the section (in the coded text representation).
end
- the position just after the last character or marker of the section (in the coded text representation). You can use -1 for ending the section at the end of the fragment.
- Returns:
- a new TextFragment object with a copy of the given sub-sequence.
-
setCodedText
public void setCodedText(String newCodedText, boolean allowCodeDeletion)
Sets the coded text of the fragment, using its existing codes. The coded text must be valid for the existing codes.
- Parameters:
newCodedText
- the coded text to apply.
allowCodeDeletion
- true when in-line codes missing from the coded text mean the corresponding codes should be deleted from the fragment.
- Throws:
InvalidContentException
- when the coded text is not valid, or does not correspond to the existing codes.
-
setCodedText
public void setCodedText(String newCodedText, List<Code> newCodes)
Sets the coded text of the fragment and its corresponding codes.
- Parameters:
newCodedText
- the coded text to apply.
newCodes
- the list of the corresponding codes.
- Throws:
InvalidContentException
- when the coded text is not valid or does not correspond to the new codes.
-
setCodedText
public void setCodedText(String newCodedText, List<Code> newCodes, boolean allowCodeDeletion)
Sets the coded text of the fragment and its corresponding codes.
- Parameters:
newCodedText
- the coded text to apply.
newCodes
- the list of the corresponding codes.
allowCodeDeletion
- true when in-line codes missing from the coded text mean the corresponding codes should be deleted from the fragment.
- Throws:
InvalidContentException
- when the coded text is not valid or does not correspond to the new codes.
-
toString
public String toString()
Gets the coded text for this fragment. This method returns the same data as getCodedText(). Each code is represented by a placeholder made of two special characters. To get the content with the codes expanded to their original data, use toText().
- Specified by:
toString in interface CharSequence
- Overrides:
toString in class Object
- Returns:
- the coded text for this fragment.
-
toText
public String toText()
Returns the content of this fragment, including the original codes whenever possible. To get the coded text for this fragment, use getCodedText() or toString().
- Returns:
- the content of this fragment.
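The relation between the coded text (what toString() returns) and the expanded content (what toText() returns) can be sketched with a hypothetical model in which each two-character placeholder is replaced by the data of the code it points to. The toChar/toIndex helpers echo the methods named in the class overview but are reimplemented here with assumed marker and index-base values; they are not Okapi's code.

```java
import java.util.List;

// Hypothetical model: every placeholder (marker + index character) is
// replaced by the data of the code stored at that index.
final class CodedTextExpander {
    // Assumed marker and index-base values, for illustration only.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';
    static final int CHARBASE = 0xE110;

    static char toChar(int index) {
        return (char) (index + CHARBASE);
    }

    static int toIndex(char indexChar) {
        return indexChar - CHARBASE;
    }

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    static String expand(String codedText, List<String> codeData) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (isMarker(c)) {
                // The next character encodes the index of the code.
                out.append(codeData.get(toIndex(codedText.charAt(++i))));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For the coded text "A", placeholder 0, "bold", placeholder 1, "Z" with code data "<b>" and "</b>", the coded text is 10 characters long and expands to "A<b>bold</b>Z".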
-
toOuterText
public String toOuterText()
-
compareTo
public int compareTo(TextFragment tf)
Compares another TextFragment with this one. This returns the same result as compareTo(tf, CompareMode.IGNORE_CODE).
Note that the inline codes themselves are not compared by this method, but the markers and code indices embedded in the coded text are.
- Specified by:
compareTo
in interface Comparable<TextFragment>
- Parameters:
tf
- the TextFragment to compare with this one.
- Returns:
- 0 if the two fragments are equal, a non-zero value otherwise.
-
compareTo
public int compareTo(TextFragment frag, TextFragment.CompareMode compMode)
Compares this fragment with another TextFragment. This first compares the text members of the two fragments and returns the result if they are not equal.
If the text members are equal, the action taken depends on compMode:
- IGNORE_CODE: 0 is returned.
- CODE_DATA_ONLY: the data members of the codes in the codes list are concatenated for each TextFragment, and the result of comparing the two strings is returned.
- CODE_ALL: the codes list of each TextFragment is processed by Codes.codesToString(), and the result of comparing the two strings is returned.
Caveat #1:
The current implementation assumes that code indexes are in the normal ascending order in the coded text. For example, if
tf1.text="ABC", tf1.codes={{tagType:OPENING,id:1,data:"<em>"}, {tagType:CLOSING,id:1,data:"</em>"}}
and
tf2.text="ABC", tf2.codes={{tagType:CLOSING,id:1,data:"</em>"}, {tagType:OPENING,id:1,data:"<em>"}}
then tf1.equals(tf2) returns false in all comparison modes, although the two fragments are semantically equal.
- Parameters:
frag
- the TextFragment to compare with this one.
compMode
- the comparison mode to use.
- Returns:
- 0 if the fragments are equal under the given mode, a non-zero value otherwise.
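The three comparison modes can be sketched as follows. This is a hypothetical model, not Okapi's code: the Code class here is a simplified stand-in, and the CODE_ALL branch uses a made-up serialization in place of Codes.codesToString(), whose real format may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the three comparison modes.
final class FragmentComparator {
    enum CompareMode { IGNORE_CODE, CODE_DATA_ONLY, CODE_ALL }

    static final class Code {
        final String tagType;
        final int id;
        final String data;

        Code(String tagType, int id, String data) {
            this.tagType = tagType;
            this.id = id;
            this.data = data;
        }
    }

    static int compare(String text1, List<Code> codes1, String text2, List<Code> codes2, CompareMode mode) {
        int result = text1.compareTo(text2); // the coded text is compared first
        if (result != 0 || mode == CompareMode.IGNORE_CODE) {
            return result;
        }
        boolean all = (mode == CompareMode.CODE_ALL);
        return flatten(codes1, all).compareTo(flatten(codes2, all));
    }

    // CODE_DATA_ONLY concatenates the code data; CODE_ALL uses a made-up
    // serialization standing in for Codes.codesToString().
    private static String flatten(List<Code> codes, boolean all) {
        return codes.stream()
                .map(c -> all ? c.tagType + "|" + c.id + "|" + c.data + ";" : c.data)
                .collect(Collectors.joining());
    }
}
```

Two fragments whose codes differ only in their IDs compare as equal under IGNORE_CODE and CODE_DATA_ONLY in this sketch, but not under CODE_ALL.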
-
changeToCode
public int changeToCode(int start, int end, TextFragment.TagType tagType, String type)
Changes a section of the coded text into a single code. Any code already existing that is within the range will become part of the new code.
- Parameters:
start
- the position of the first character or marker of the section (in the coded text representation).
end
- the position just after the last character or marker of the section (in the coded text representation).
tagType
- the tag type of the new code.
type
- the type of the new code.
- Returns:
- the difference between the coded text length before and after the operation. This value can be used to adjust further start and end positions that have been calculated on the coded text before the changes are applied.
- Throws:
InvalidPositionException
- when start or end points inside a marker.
-
changeToCode
public int changeToCode(int start, int end, TextFragment.TagType tagType, String type, boolean setDisplayText)
Changes a section of the coded text into a single code. Any code already existing that is within the range will become part of the new code.
- Parameters:
start
- the position of the first character or marker of the section (in the coded text representation).
end
- the position just after the last character or marker of the section (in the coded text representation).
tagType
- the tag type of the new code.
type
- the type of the new code.
setDisplayText
- if true, the section being converted is set as the display text of the new code.
- Returns:
- the difference between the coded text length before and after the operation. This value can be used to adjust further start and end positions that have been calculated on the coded text before the changes are applied.
- Throws:
InvalidPositionException
- when start or end points inside a marker.
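The returned length difference and the "inside a marker" check can be illustrated with a hypothetical model that replaces a plain-text section with one isolated placeholder. This is not Okapi's implementation: marker values are assumptions, the sketch reads "difference" as length before minus length after, and, unlike the real method, it assumes the section holds plain text only and does not fold existing codes into the new one.

```java
import java.util.List;

// Hypothetical model of changeToCode(): a section of the coded text becomes
// one isolated placeholder, and the shrinkage in length is returned.
final class ChangeToCodeSketch {
    // Assumed marker and index-base values, for illustration only.
    static final char MARKER_ISOLATED = '\uE103';
    static final int CHARBASE = 0xE110;

    static boolean isMarker(char c) {
        return c >= '\uE101' && c <= '\uE103';
    }

    // A position is "inside a marker" when it would separate a marker from
    // its index character.
    static boolean insideMarker(CharSequence text, int pos) {
        return pos > 0 && pos <= text.length() && isMarker(text.charAt(pos - 1));
    }

    /** Returns lengthBefore - lengthAfter (this sketch's reading of "difference"). */
    static int changeToCode(StringBuilder codedText, List<String> codeData, int start, int end) {
        if (insideMarker(codedText, start) || insideMarker(codedText, end)) {
            throw new IllegalArgumentException("start or end points inside a marker");
        }
        int before = codedText.length();
        codeData.add(codedText.substring(start, end)); // the section becomes the code's data
        int index = codeData.size() - 1;
        codedText.replace(start, end, "" + MARKER_ISOLATED + (char) (CHARBASE + index));
        return before - codedText.length();
    }
}
```

Converting "{name}" (positions 6 to 12) in "Hello {name}!" returns 4, so the "!" previously computed at position 12 is now found at 12 - 4 = 8.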
-
findClosingCodePosition
public int findClosingCodePosition(int id, int indexOfOpening)
Finds the position in this coded text of the closing code for a given opening code.
- Parameters:
id
- identifier of the opening code.
indexOfOpening
- index of the opening code.
- Returns:
- the position in this text of the closing code for the given opening code, or -1 if it could not be found.
-
findOpeningCodePosition
public int findOpeningCodePosition(int id, int indexOfClosing)
Finds the position in this coded text of the opening code for a given closing code.
- Parameters:
id
- identifier of the closing code.
indexOfClosing
- index of the closing code.
- Returns:
- the position in this text of the opening code for the given closing code, or -1 if it could not be found.
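A search for matching opening and closing markers can be sketched as below. This is a hypothetical model, not Okapi's code: marker values are assumptions, and treating the second argument as a coded-text position from which to start scanning is an interpretation of "index" that the description above does not confirm.

```java
import java.util.List;

// Hypothetical model of findClosingCodePosition() / findOpeningCodePosition().
final class CodePositionFinder {
    // Assumed marker and index-base values, for illustration only.
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final int CHARBASE = 0xE110;

    // Builds a two-character placeholder for the code at the given index.
    static String placeholder(char marker, int index) {
        return "" + marker + (char) (CHARBASE + index);
    }

    // codeIds.get(i) is the id of the code stored at index i.
    static int findClosing(String codedText, List<Integer> codeIds, int id, int from) {
        for (int i = Math.max(0, from); i < codedText.length() - 1; i++) {
            if (codedText.charAt(i) == MARKER_CLOSING
                    && codeIds.get(codedText.charAt(i + 1) - CHARBASE) == id) {
                return i;
            }
        }
        return -1;
    }

    // Scans backwards for the opening marker whose code has the given id.
    static int findOpening(String codedText, List<Integer> codeIds, int id, int from) {
        for (int i = Math.min(from, codedText.length() - 2); i >= 0; i--) {
            if (codedText.charAt(i) == MARKER_OPENING
                    && codeIds.get(codedText.charAt(i + 1) - CHARBASE) == id) {
                return i;
            }
        }
        return -1;
    }
}
```

Because an opening code and its closing code share the same id, the sketch only needs the id to pair them; nested pairs with different ids are skipped over.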
-
annotate
public int annotate(int start, int end, String type, InlineAnnotation annotation)
Annotates a section of this text.
- Parameters:
start
- the position of the first character or marker of the section to annotate (in the coded text representation).
end
- the position just after the last character or marker of the section to annotate (in the coded text representation).
type
- the type of annotation to set.
annotation
- the annotation to set (can be null).
- Returns:
- the difference between the coded text length before and after the operation. This value can be used to adjust further start and end positions that have been calculated on the coded text before the changes are applied.
- Throws:
InvalidPositionException
- when start or end points inside a marker.
-
removeAnnotations
public void removeAnnotations()
Removes all annotations in this text. This also removes any code that is or was there only for holding an annotation.
-
removeAnnotations
public void removeAnnotations(String type)
Removes all annotations of a given type in this text. This also removes any code that exists only to hold an annotation of the given type, as well as any code that has neither an annotation nor data.
- Parameters:
type
- the type of annotation to remove.
-
hasAnnotation
public boolean hasAnnotation()
Indicates if this text has at least one annotation.
- Returns:
- true if there is at least one annotation, false otherwise.
-
hasAnnotation
public boolean hasAnnotation(String type)
Indicates if this text has at least one annotation of a given type.
- Parameters:
type
- the type of annotation to look for.
- Returns:
- true if there is at least one annotation of the given type, false otherwise.
-
cleanUnusedCodes
public void cleanUnusedCodes()
Removes all codes that have no data and no annotation.
-
cleanCodes
public TextFragment cleanCodes()
Removes all codes, both from the codes list and from the coded text (their markers).
- Returns:
- this TextFragment, with the codes removed.
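On the coded-text side, removing every code amounts to dropping each two-character placeholder. The following hypothetical sketch shows only that part (the code list itself is not modeled), with marker values assumed for illustration.

```java
// Hypothetical model of the coded-text side of cleanCodes(): every
// placeholder (marker + index character) is dropped, leaving plain text.
final class PlaceholderStripper {
    // Assumed marker range, for illustration only.
    static boolean isMarker(char c) {
        return c >= '\uE101' && c <= '\uE103';
    }

    static String strip(String codedText) {
        StringBuilder out = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            if (isMarker(codedText.charAt(i))) {
                i++; // also skip the index character that follows the marker
                continue;
            }
            out.append(codedText.charAt(i));
        }
        return out.toString();
    }
}
```

A coded text of "A", an opening placeholder, "bold", a closing placeholder and "Z" becomes "AboldZ".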
-
getCodePosition
public int getCodePosition(int index)
-
getAnnotatedSpans
public List<AnnotatedSpan> getAnnotatedSpans(String type)
Gets the list of all spans of text annotated with a given type of annotation.
- Parameters:
type
- the type of annotation to look for.
- Returns:
- a list of annotated spans for the given type (it may be empty).
-
renumberCodes
public int renumberCodes()
Renumbers the IDs of the codes in the fragment.
- Returns:
- The last value used for code ID or 0 if this fragment has no codes.
-
renumberCodes
public int renumberCodes(int idBase)
Re-assigns IDs of the codes in this fragment to be in a sequential order starting from a given base.
- Parameters:
idBase
- the base from which code IDs start numbering.
- Returns:
- The last value used for code ID or idBase-1 if this fragment has no codes.
-
renumberCodes
public int renumberCodes(int idBase, boolean mindPosition)
Re-assigns IDs of the codes in this fragment to be in a sequential order starting from a given base.
- Parameters:
idBase
- the base from which code IDs start numbering.
mindPosition
- if true, codes with lower positions in this text fragment get lower IDs; if false, codes with lower original IDs get lower IDs.
- Returns:
- The last value used for code ID or idBase-1 if this fragment has no codes.
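The effect of mindPosition can be sketched with a hypothetical model in which ids[] lists the id of each code in coded-text order. An opening code and its closing code share an id, so they also share the new one. This is an illustration, not Okapi's implementation.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical model of renumberCodes(idBase, mindPosition).
final class CodeRenumberer {
    static int renumber(int[] ids, int idBase, boolean mindPosition) {
        // Distinct ids ordered by first position, or by original value.
        Collection<Integer> order = mindPosition ? new LinkedHashSet<Integer>() : new TreeSet<Integer>();
        for (int id : ids) {
            order.add(id);
        }
        Map<Integer, Integer> newIds = new HashMap<>();
        int next = idBase;
        for (int oldId : order) {
            newIds.put(oldId, next++);
        }
        for (int i = 0; i < ids.length; i++) {
            ids[i] = newIds.get(ids[i]);
        }
        return next - 1; // idBase - 1 when there are no codes
    }
}
```

For codes in the order 5, 3, 3, 5 and a base of 1, mindPosition = true yields 1, 2, 2, 1, while mindPosition = false yields 2, 1, 1, 2; both return 2.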
-
removeCode
public void removeCode(Code code)
Removes the Code from this fragment.
- Parameters:
code
- the Code to remove.
-
balanceMarkers
public void balanceMarkers()
Balances the markers based on the tag type of the codes. Closing codes can have -1 as their ID; they will get the ID of their matching opening code, or a new ID if they are isolated. Closing codes with an existing ID that find themselves isolated keep the same ID. This method also resets the last code ID value to the highest code ID found. The method does nothing if the TextFragment is already balanced. To force it to run its logic on a TextFragment that is already balanced, call invalidate() prior to calling this method.
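The ID rules described for balanceMarkers can be pictured with the following hypothetical model. It is not Okapi's algorithm: pairing a closing code that has ID -1 with the nearest unmatched opening code of the same type is an assumption of this sketch, and the returned value stands in for the "last code id" that the real method resets internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical model of balancing codes listed in coded-text order.
final class MarkerBalancer {
    enum TagType { OPENING, CLOSING, PLACEHOLDER }

    static final class Code {
        final TagType tagType;
        final String type;
        int id;
        boolean isolated;

        Code(TagType tagType, String type, int id) {
            this.tagType = tagType;
            this.type = type;
            this.id = id;
        }
    }

    /** Returns the highest code id after balancing (the new "last code id"). */
    static int balance(List<Code> codes) {
        int lastId = 0;
        for (Code c : codes) {
            lastId = Math.max(lastId, c.id);
        }
        Deque<Code> open = new ArrayDeque<>(); // unmatched openings, most recent first
        for (Code c : codes) {
            if (c.tagType == TagType.OPENING) {
                open.push(c);
            } else if (c.tagType == TagType.CLOSING) {
                Code match = null;
                for (Code o : open) {
                    // A closing code with id -1 is matched by type; otherwise by id.
                    if (c.id == -1 ? o.type.equals(c.type) : o.id == c.id) {
                        match = o;
                        break;
                    }
                }
                if (match != null) {
                    open.remove(match);
                    c.id = match.id;
                } else {
                    c.isolated = true;
                    if (c.id == -1) {
                        c.id = ++lastId; // an isolated closing code without an id gets a new one
                    }
                }
            }
        }
        for (Code o : open) {
            o.isolated = true; // openings that were never closed
        }
        return lastId;
    }
}
```

An isolated closing code that already has an ID keeps it, matching the rule stated above.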
-
alignCodeIds
public void alignCodeIds(TextFragment base, CodeMatchStrategy strategy)
Aligns the code IDs of this fragment with the ones of a given fragment. This method re-assigns the IDs of the in-line codes of this fragment based on the code data of the provided fragment. If several codes have the same data, the first one is preferred, as it is the matching target code in the majority of cases. An example of usage is when source and target fragments have codes generated from regular expressions and not in the same order. For example, if the source is "%d equals %s", the target is "%s equals %d", and %s and %d are codes, you want the IDs to match for the codes with the same content.
- Parameters:
base
- the fragment to use as the base for the synchronization.
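The "%d equals %s" example can be worked through with a hypothetical model that matches codes by data and takes the first not-yet-used base code. The CodeMatchStrategy parameter is not modeled, and leaving unmatched codes untouched is this sketch's choice; the real method may handle them differently.

```java
import java.util.List;

// Hypothetical model of aligning code IDs by code data.
final class CodeIdAligner {
    static final class Code {
        final String data;
        int id;

        Code(String data, int id) {
            this.data = data;
            this.id = id;
        }
    }

    // For each code, take the id of the first not-yet-used base code with
    // the same data. Codes without a match keep their id in this sketch.
    static void alignCodeIds(List<Code> codes, List<Code> base) {
        boolean[] used = new boolean[base.size()];
        for (Code code : codes) {
            for (int i = 0; i < base.size(); i++) {
                if (!used[i] && base.get(i).data.equals(code.data)) {
                    used[i] = true;
                    code.id = base.get(i).id;
                    break;
                }
            }
        }
    }
}
```

With source codes %d (ID 1) and %s (ID 2), and target codes %s (ID 1) and %d (ID 2), the target's %s receives ID 2 and its %d receives ID 1.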
-
alignCodeIds
public void alignCodeIds(TextFragment base)
-
append
public TextFragment append(char value)
Appends a character to the fragment.
- Specified by:
append
in interface Appendable
- Parameters:
value
- the character to append.
- Returns:
- a reference to this fragment.
-
append
public TextFragment append(CharSequence csq)
Appends the specified character sequence to this fragment.
- Specified by:
append
in interface Appendable
- Parameters:
csq
- the character sequence to append. If the parameter is null, the string "null" is appended.
- Returns:
- a reference to this fragment.
-
append
public TextFragment append(CharSequence csq, int start, int end)
Appends a subsequence of the specified character sequence to this fragment.
- Specified by:
append
in interface Appendable
- Parameters:
csq
- the character sequence to append. If csq is null, then characters will be appended as if csq contained the string "null".
start
- the index of the first character in the subsequence.
end
- the index of the character following the last character in the subsequence.
- Returns:
- a reference to this fragment.
-
charAt
public char charAt(int index)
Returns the character at the specified index in the coded text of this fragment. Each code in the coded text string takes 2 characters, regardless of the size of the code. For example, if the fragment is "A[xy]B" and "[xy]" is a code, charAt(3) returns 'B', not 'x'.
If the specified index falls on a code placeholder, the character returned is either a marker (first character of the placeholder) or a special index used to access the underlying code (second character of the placeholder). Markers can be identified using isMarker(char).
- Specified by:
charAt
in interface CharSequence
- Parameters:
index
- the index of the character to be returned.
- Returns:
- the specified character.
- Throws:
IndexOutOfBoundsException
- if the index argument is negative or not less than the length of the coded text.
- See Also:
isMarker(char)
-
length
public int length()
Returns the number of characters in the coded text of this fragment. This is not the length of the content with all its codes. In the coded text, each code is represented by a placeholder made of two characters, regardless of the size of the code. For example, if the fragment is "A[xy]B" and "[xy]" is a code, length() returns 4, not 6.
To get the length of the content including codes, use toText().length(). Note that codes with references are not expanded by toText().
- Specified by:
length
in interface CharSequence
- Returns:
- the number of characters in the coded text of this fragment.
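The "A[xy]B" example used by charAt and length can be checked against a hypothetical model of its coded text. The marker and index-base values are assumptions for illustration; the point is only that the code "[xy]" occupies two characters in the coded text however long its data is.

```java
// Hypothetical model of the "A[xy]B" example: one isolated code whose data
// is "[xy]", stored as a two-character placeholder in the coded text.
final class CodedLengthDemo {
    // Assumed marker and index-base values, for illustration only.
    static final char MARKER_ISOLATED = '\uE103';
    static final int CHARBASE = 0xE110;

    static final String CODE_DATA = "[xy]";
    // "A", marker, index character for code 0, "B"
    static final String CODED_TEXT = "A" + MARKER_ISOLATED + (char) (CHARBASE + 0) + "B";

    // Equivalent of toText() for this single-code example.
    static String expanded() {
        return CODED_TEXT.substring(0, 1) + CODE_DATA + CODED_TEXT.substring(3);
    }
}
```

Here CODED_TEXT.length() is 4 and CODED_TEXT.charAt(3) is 'B', while the expanded text "A[xy]B" has a length of 6; charAt(1) is the marker and charAt(2) decodes to index 0.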
-
invalidate
public void invalidate()
Sets the fragment in a state where it has to be re-balanced before being used for output. This method is not harmful, but should preferably be used only when adding unbalanced paired codes.
-
getLastCodeId
public int getLastCodeId()
Gets the last value used for code id.
- Returns:
- the last value used for code id.
-
getLastCode
public Code getLastCode()
Returns the last code appended to this fragment, or null if there are no codes.
- Returns:
- the last code, or null.
-
getCode
public Code getCode(Code fc)
Finds the first code with a given ID and tagType in this fragment, or null if there is no such code.
- Parameters:
fc
- the Code to look for.
- Returns:
- the matching code, or null.
-
minimumIdValue
public int minimumIdValue()
Returns the smallest code id value.
- Returns:
- the smallest code id, or 0 if there are no codes.
-
-