Class TextContainer
- java.lang.Object
-
- net.sf.okapi.common.resource.BaseNameable
-
- net.sf.okapi.common.resource.TextContainer
-
- All Implemented Interfaces:
Cloneable, Iterable<TextPart>, IResource, INameable, IWithAnnotations, IWithProperties, IWithSkeleton
public class TextContainer extends BaseNameable implements Iterable<TextPart>
Provides methods for storing the content of a paragraph-type unit and for handling its properties, annotations and segmentation.
The TextContainer is made of a collection of parts: some are simple TextPart objects, others are special TextPart objects called Segment.
A TextContainer always has at least one Segment part.
-
-
Field Summary
-
Fields inherited from class net.sf.okapi.common.resource.BaseNameable
id, isTranslatable, mimeType, name, preserveWS, type
-
Fields inherited from interface net.sf.okapi.common.IResource
COPY_ALL, COPY_CONTENT, COPY_PROPERTIES, COPY_SEGMENTATION, COPY_SEGMENTED_CONTENT, CREATE_EMPTY
-
-
Constructor Summary
Constructor - Description
TextContainer() - Creates a new empty TextContainer object.
TextContainer(String text) - Creates a new TextContainer object with some initial text.
TextContainer(Segment segment) - Creates a new TextContainer object with an initial segment.
TextContainer(TextFragment fragment) - Creates a new TextContainer object with an initial TextFragment.
TextContainer(TextPart... parts) - Creates a new TextContainer object with initial TextParts (segment or non-segment) appended.
-
Method Summary
Modifier and Type Method - Description
void append(String text) - Appends a part with a given text at the end of this container.
void append(String text, boolean collapseIfPreviousEmpty) - Appends a part with a given text at the end of this container.
void append(TextFragment fragment) - Appends a part at the end of this container.
void append(TextFragment fragment, boolean collapseIfPreviousEmpty) - Appends a part at the end of this container.
void append(TextFragment fragment, boolean collapseIfPreviousEmpty, boolean keepCodeIds) - Appends a part at the end of this container.
void append(TextPart part) - Appends a TextPart (segment or non-segment) at the end of this container.
void append(TextPart part, boolean collapseIfPreviousEmpty) - Appends a TextPart (segment or non-segment) at the end of this container.
void changePart(int partIndex) - Changes the type of a given part.
void clear() - Clears this TextContainer, removes any existing segments.
TextContainer clone() - Clones this TextContainer, including the properties.
TextContainer clone(boolean cloneProperties) - Clones this container, with or without its properties.
int compareTo(TextContainer cont, TextFragment.CompareMode compareMode) - Compares this container with another one.
boolean contentIsOneSegment() - Indicates if this container is made of a single segment that holds the whole content (i.e. there are no other parts).
static String[] contentToSplitStorage(TextContainer tc) - Creates two storage strings to serialize a given TextContainer.
static String contentToString(TextContainer tc) - Creates a string that stores the content of a given container.
int count() - Gets the number of parts (segments and non-segments) in this container.
TextFragment createJoinedContent()
TextFragment createJoinedContent(boolean keepCodeIds)
TextPart get(int index) - Gets the part (segment or non-segment) for a given part index.
String getCodedText() - Gets the coded text of the whole content (segmented or not).
String getCodedText(boolean keepCodeIds) - Gets the coded text of the whole content (segmented or not).
TextFragment getFirstContent() - Gets the content of the first part (segment or non-segment) of this container.
Segment getFirstSegment() - Returns the first Segment of this container.
TextFragment getLastContent() - Gets the content of the last part (segment or non-segment) of this container.
List<TextPart> getParts()
ISegments getSegments() - Creates a new ISegments object to access the segments of this container.
TextFragment getUnSegmentedContentCopy() - Gets a new TextFragment representing the un-segmented content of this container.
TextFragment getUnSegmentedContentCopy(boolean keepCodeIds) - Gets a new TextFragment representing the un-segmented content of this container.
boolean hasBeenSegmented() - Indicates if a segmentation has been applied to this container.
boolean hasCode() - Indicates if this container has Codes.
boolean hasText() - Indicates if this container contains at least one character that is 'text' (inline codes, segment markers, and annotation markers do not count as 'text' characters).
boolean hasText(boolean whiteSpacesAreText) - Indicates if this container contains at least one character that is not a whitespace.
boolean hasText(boolean lookInSegments, boolean whiteSpacesAreText) - Indicates if this container contains at least one character.
void insert(int partIndex, TextPart part) - Inserts a given part (segment or non-segment) at a given position.
boolean isEmpty() - Indicates if this container is empty (no text and no codes).
Iterator<TextPart> iterator() - Creates an iterator to loop through the parts (segments and non-segments) of this container.
void joinAll() - Merges back together all parts (segments and non-segments) of this container, and clears the list of segments.
int joinWithNext(int partIndex, int partCount) - Joins a given part with a specified number of its following parts.
void remove(int partIndex) - Removes the part at a given position.
void setContent(TextFragment content) - Sets the content of this TextContainer.
TextContainer setContentFromString(String data) - Sets the content of this TextContainer from a string created by contentToString(TextContainer).
void setHasBeenSegmentedFlag(boolean hasBeenSegmented) - Sets the flag indicating if the content of this container has been segmented.
void setParts(TextPart... parts)
void split(int partIndex, int start, int end, boolean spannedPartIsSegment) - Splits a given part into two or three parts.
static TextContainer splitStorageToContent(String ctext, String codes) - Creates a new TextContainer object from two strings generated with contentToSplitStorage(TextContainer).
static TextContainer stringToContent(String data) - Converts a string created by contentToString(TextContainer) back into a TextContainer.
String toString() - Gets the string representation of this container.
void unwrap(boolean trimEnds, boolean collapseMode) - Unwraps the content of this container.
-
Methods inherited from class net.sf.okapi.common.resource.BaseNameable
getAnnotation, getAnnotations, getId, getMimeType, getName, getProperties, getProperty, getPropertyNames, getSkeleton, getType, hasProperty, isTranslatable, preserveWhitespaces, removeProperty, setAnnotation, setId, setIsTranslatable, setMimeType, setName, setPreserveWhitespaces, setProperty, setSkeleton, setType
-
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Methods inherited from interface net.sf.okapi.common.resource.IWithAnnotations
annotationIterator, getAnnotationsTypesAsSet, hasAnnotation, hasAnnotations, remove
-
Methods inherited from interface net.sf.okapi.common.resource.IWithProperties
propertyIterator
-
-
-
-
Constructor Detail
-
TextContainer
public TextContainer()
Creates a new empty TextContainer object.
-
TextContainer
public TextContainer(String text)
Creates a new TextContainer object with some initial text.
- Parameters:
text - the initial text.
-
TextContainer
public TextContainer(TextFragment fragment)
Creates a new TextContainer object with an initial TextFragment.
- Parameters:
fragment - the initial TextFragment.
-
TextContainer
public TextContainer(TextPart... parts)
Creates a new TextContainer object with initial TextParts (segment or non-segment) appended.
- Parameters:
parts - the given initial parts.
-
TextContainer
public TextContainer(Segment segment)
Creates a new TextContainer object with an initial segment. If the id of the segment is null, it is set automatically.
- Parameters:
segment - the initial segment.
-
-
Method Detail
-
getSegments
public ISegments getSegments()
Creates a new ISegments object to access the segments of this container.
- Returns:
a new ISegments object.
-
contentToString
public static String contentToString(TextContainer tc)
Creates a string that stores the content of a given container. Use stringToContent(String) to re-create the container from the string.
IMPORTANT: Only the content is saved (not the properties, annotations, etc.).
- Parameters:
tc - the container holding the content to store.
- Returns:
a string representing the content of the given container.
-
stringToContent
public static TextContainer stringToContent(String data)
Converts a string created by contentToString(TextContainer) back into a TextContainer.
- Parameters:
data - the string to process.
- Returns:
a new TextContainer with the stored content re-created.
-
setContentFromString
public TextContainer setContentFromString(String data)
Sets the content of this TextContainer from a string created by contentToString(TextContainer).
- Parameters:
data - the string to process.
- Returns:
this TextContainer.
-
contentToSplitStorage
public static String[] contentToSplitStorage(TextContainer tc)
Creates two storage strings to serialize a given TextContainer. Use splitStorageToContent(String, String) to re-create the container from the strings.
IMPORTANT: Only the content is saved (not the properties, annotations, etc.).
- Parameters:
tc - the text container to store.
- Returns:
an array of two String objects: the first one contains the coded text parts, the second one contains the codes.
- See Also:
splitStorageToContent(String, String)
-
splitStorageToContent
public static TextContainer splitStorageToContent(String ctext, String codes)
Creates a new TextContainer object from two strings generated with contentToSplitStorage(TextContainer).
- Parameters:
ctext - the string holding the coded text parts.
codes - the string holding the codes.
- Returns:
a new TextContainer object created from the strings.
- See Also:
contentToSplitStorage(TextContainer)
-
toString
public String toString()
Gets the string representation of this container. If the container is segmented, the representation shows the merged segments. Inline codes are also included.
- Overrides:
toString in class BaseNameable
- Returns:
the string representation of this container.
-
iterator
public Iterator<TextPart> iterator()
Creates an iterator to loop through the parts (segments and non-segments) of this container.
-
compareTo
public int compareTo(TextContainer cont, TextFragment.CompareMode compareMode)
Compares this container with another one. Note: This is a costly operation if the two containers have segments and no text differences.
- Parameters:
cont - the other container to compare this one with.
compareMode - the TextFragment.CompareMode to use.
- Returns:
0 if the two containers are equal.
-
hasBeenSegmented
public boolean hasBeenSegmented()
Indicates if a segmentation has been applied to this container. Note that it does not mean there is more than one segment or one part. Use contentIsOneSegment() to check if the container counts only one segment (whether it is the result of a segmentation or simply the default single segment).
This method returns true if any method that may cause the content to be segmented has been called, and no operation has resulted in un-segmenting the content since that call, or if the content has more than one part.
- Returns:
true if a segmentation has been applied to this container.
- See Also:
setHasBeenSegmentedFlag(boolean)
-
setHasBeenSegmentedFlag
public void setHasBeenSegmentedFlag(boolean hasBeenSegmented)
Sets the flag indicating if the content of this container has been segmented.
- Parameters:
hasBeenSegmented - true to flag the content as having been segmented, false to flag it as not having been segmented.
- See Also:
hasBeenSegmented()
-
contentIsOneSegment
public boolean contentIsOneSegment()
Indicates if this container is made of a single segment that holds the whole content (i.e. there are no other parts).
When this method returns true, the methods getFirstContent(), ISegments.getFirstContent(), getLastContent() and ISegments.getLastContent() return the same result.
- Returns:
true if the whole content of this container is in a single segment.
- See Also:
count(), ISegments.count()
-
changePart
public void changePart(int partIndex)
Changes the type of a given part. If the part was a segment, this makes it a non-segment (except if this is the only part in the content, in which case the part remains unchanged). If the part was not a segment, this makes it a segment (with its identifier automatically set).
- Parameters:
partIndex - the index of the part to change. Note that even if the part is a segment, this index must be the part index, not the segment index.
-
insert
public void insert(int partIndex, TextPart part)
Inserts a given part (segment or non-segment) at a given position. If the position is already occupied, that part and all the parts to its right are shifted to the right.
If the part to insert is a segment, its id is validated.
- Parameters:
partIndex - the position where to insert the new part.
part - the part to insert.
-
remove
public void remove(int partIndex)
Removes the part at a given position.
If the selected part is the last segment in the content, the part is only cleared, not removed.
- Parameters:
partIndex - the position of the part to remove.
-
append
public void append(TextFragment fragment, boolean collapseIfPreviousEmpty)
Appends a part at the end of this container.
If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the text fragment is appended to the last part. Otherwise the text fragment is appended to the content as a new non-segment part.
Important: If the container is empty, the appended part becomes a segment, as the container always has at least one segment.
- Parameters:
fragment - the text fragment to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.
-
append
public void append(TextFragment fragment, boolean collapseIfPreviousEmpty, boolean keepCodeIds)
Appends a part at the end of this container.
If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the text fragment is appended to the last part. Otherwise the text fragment is appended to the content as a new non-segment part.
Important: If the container is empty, the appended part becomes a segment, as the container always has at least one segment.
- Parameters:
fragment - the text fragment to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.
keepCodeIds - true to block code balancing.
-
append
public void append(TextFragment fragment)
Appends a part at the end of this container.
This call is the same as calling append(TextFragment, boolean) with collapseIfPreviousEmpty set to true.
- Parameters:
fragment - the text fragment to append.
-
append
public void append(String text, boolean collapseIfPreviousEmpty)
Appends a part with a given text at the end of this container.
If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the new text is appended to the last part. Otherwise the text is appended to the content as a new non-segment part.
- Parameters:
text - the text to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.
-
append
public void append(String text)
Appends a part with a given text at the end of this container.
This call is the same as calling append(String, boolean) with collapseIfPreviousEmpty set to true.
- Parameters:
text - the text to append.
-
append
public void append(TextPart part, boolean collapseIfPreviousEmpty)
Appends a TextPart (segment or non-segment) at the end of this container.
If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the new part replaces the last part. Otherwise the part is appended to the container as is. If the operation would result in a container without any segment, the first part is automatically converted to a segment.
- Parameters:
part - the TextPart to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.
-
append
public void append(TextPart part)
Appends a TextPart (segment or non-segment) at the end of this container.
This call is the same as calling append(TextPart, boolean) with collapseIfPreviousEmpty set to true.
- Parameters:
part - the TextPart to append.
-
getCodedText
public String getCodedText(boolean keepCodeIds)
Gets the coded text of the whole content (segmented or not). Use this method to compute segment boundaries that will be applied using ISegments.create(int, int), ISegments.create(List) or other methods.
- Parameters:
keepCodeIds - if true, keeps the id of the original Code.
- Returns:
the coded text of the whole content, to use as a template for segmentation.
- See Also:
ISegments.create(int, int), ISegments.create(List)
-
getCodedText
public String getCodedText()
Gets the coded text of the whole content (segmented or not). Use this method to compute segment boundaries that will be applied using ISegments.create(int, int), ISegments.create(List) or other methods.
- Returns:
the coded text of the whole content, to use as a template for segmentation.
- See Also:
ISegments.create(int, int), ISegments.create(List)
-
split
public void split(int partIndex, int start, int end, boolean spannedPartIsSegment)
Splits a given part into two or three parts.
- If end == start or end == -1: A new part is created on the right side of the position. It has the same type as the original part.
- If start == 0: A new part is created on the left side of the original part. It has the type specified by spannedPartIsSegment.
- If the specified span is empty at either end of the part, or if it is equal to the whole length of the part: No change (it would result in an empty part).
- Parameters:
partIndex - index of the part to split.
start - start of the middle part to create.
end - position just after the last character of the middle part to create.
spannedPartIsSegment - true if the new middle part should be a segment, false if it should be a non-segment.
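The cases listed for split can be easier to follow on plain strings. The following is only a sketch under stated assumptions, not the Okapi implementation: SplitSketch and splitText are hypothetical names, a part is modeled as a String rather than a TextPart, part types are ignored, and end == -1 is assumed to mean an empty span at start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: models how a span [start, end) cuts the text of one part.
// It does not use or reproduce the Okapi classes.
class SplitSketch {

    /** Returns the pieces that the span [start, end) would cut the given text into. */
    static List<String> splitText(String text, int start, int end) {
        if (end == -1) {
            end = start; // assumption: -1 is treated like an empty span at 'start'
        }
        List<String> pieces = new ArrayList<>();
        boolean emptyAtAnEnd = (start == end) && (start == 0 || start == text.length());
        boolean wholePart = (start == 0) && (end == text.length());
        if (emptyAtAnEnd || wholePart) {
            pieces.add(text); // no change: splitting would create an empty part
            return pieces;
        }
        if (start > 0) {
            pieces.add(text.substring(0, start)); // left remainder
        }
        if (end > start) {
            pieces.add(text.substring(start, end)); // spanned (middle) part
        }
        if (end < text.length()) {
            pieces.add(text.substring(end)); // right remainder
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitText("abcdef", 2, 4)); // three pieces
        System.out.println(splitText("abcdef", 0, 3)); // two pieces, span on the left
        System.out.println(splitText("abcdef", 3, 3)); // two pieces, cut at a position
        System.out.println(splitText("abcdef", 0, 6)); // unchanged
    }
}
```

In the real method, the spanned piece would additionally receive the type selected by spannedPartIsSegment, while the remainders keep the type of the original part.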
-
unwrap
public void unwrap(boolean trimEnds, boolean collapseMode)
Unwraps the content of this container.
This method replaces any sequence of white-spaces by a single space character. It also removes leading and trailing white-spaces if the parameter trimEnds is set to true.
White spaces in this context are #x9, #xA and #x20. #xD is not considered a whitespace as the content of a text container must have its line-breaks normalized to #xA.
If the container has more than one segment and if collapseMode is set: non-segment parts are normalized and removed if they end up empty. If the option is not set: the method preserves at least one space between segments, even if the segments are empty.
Empty segments are always left.
Currently there is no provision to not unwrap a given span of the content.
- Parameters:
trimEnds - true to remove leading and trailing white-spaces.
collapseMode - true to remove non-segment parts that end up empty after the unwrapping.
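The whitespace rule described for unwrap can be illustrated on a plain string. This is a minimal sketch, assuming a single unsegmented run of text with no inline codes: UnwrapSketch and its unwrap method are hypothetical stand-ins, and the segment-related behavior controlled by collapseMode is not modeled.

```java
// Hypothetical stand-in: applies the documented whitespace rule to a plain String.
// It does not use or reproduce the Okapi classes.
class UnwrapSketch {

    /** Collapses runs of #x9, #xA and #x20 into one space; optionally trims both ends. */
    static String unwrap(String text, boolean trimEnds) {
        // #xD (\r) is deliberately absent from the character class, as in the description.
        String collapsed = text.replaceAll("[\\t\\n ]+", " ");
        if (trimEnds) {
            // After collapsing, at most one space can remain at each end.
            if (collapsed.startsWith(" ")) {
                collapsed = collapsed.substring(1);
            }
            if (collapsed.endsWith(" ")) {
                collapsed = collapsed.substring(0, collapsed.length() - 1);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        System.out.println("[" + unwrap("  Hello \n\t world  ", true) + "]");  // prints [Hello world]
        System.out.println("[" + unwrap("  Hello \n\t world  ", false) + "]"); // prints [ Hello world ]
    }
}
```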
-
getFirstContent
public TextFragment getFirstContent()
Gets the content of the first part (segment or non-segment) of this container.
This method always returns the same result as ISegments.getFirstContent() if contentIsOneSegment() is true.
- Returns:
the content of the first part (segment or non-segment) of this container.
- See Also:
ISegments.getFirstContent(), getLastContent(), ISegments.getLastContent()
-
getLastContent
public TextFragment getLastContent()
Gets the content of the last part (segment or non-segment) of this container.
This method always returns the same result as ISegments.getLastContent() if contentIsOneSegment() is true.
- Returns:
the content of the last part (segment or non-segment) of this container.
- See Also:
ISegments.getLastContent(), getFirstContent(), ISegments.getFirstContent()
-
clone
public TextContainer clone()
Clones this TextContainer, including the properties.
-
clone
public TextContainer clone(boolean cloneProperties)
Clones this container, with or without its properties.
- Parameters:
cloneProperties - indicates if the properties should be cloned.
- Returns:
a new TextContainer object that is a copy of this one.
-
getUnSegmentedContentCopy
public TextFragment getUnSegmentedContentCopy()
Gets a new TextFragment representing the un-segmented content of this container.
Important: This is an expensive method.
- Returns:
an un-segmented copy of the content of this container.
-
getUnSegmentedContentCopy
public TextFragment getUnSegmentedContentCopy(boolean keepCodeIds)
Gets a new TextFragment representing the un-segmented content of this container.
Important: This is an expensive method.
- Returns:
an un-segmented copy of the content of this container.
-
setContent
public void setContent(TextFragment content)
Sets the content of this TextContainer. Any existing segmentation is removed. The content becomes a single segment.
- Parameters:
content - the new content to set.
-
setParts
public void setParts(TextPart... parts)
-
clear
public void clear()
Clears this TextContainer and removes any existing segments. The content becomes a single empty segment. Annotations are kept.
- Specified by:
clear in interface IWithAnnotations
-
hasText
public boolean hasText(boolean lookInSegments, boolean whiteSpacesAreText)
Indicates if this container contains at least one character. Inline codes and annotation markers do not count as characters.
- If the whole content is a single segment, the check is performed on that content and the option lookInSegments is ignored.
- If the content has several segments or if the single segment is not the whole content, each segment is checked only if lookInSegments is set.
- The holder is always checked if no text is found in the segments.
- Parameters:
lookInSegments - indicates if the possible segments in this container should be looked at. If this parameter is set to false, the segment markers are treated as codes.
whiteSpacesAreText - indicates if whitespaces should be considered text characters or not.
- Returns:
true if this container contains at least one character according to the given options.
-
hasText
public boolean hasText(boolean whiteSpacesAreText)
Indicates if this container contains at least one character that is not a whitespace. All parts (segments and non-segments) are checked.
- Parameters:
whiteSpacesAreText - indicates if whitespaces should be considered text characters or not.
- Returns:
true if this container contains at least one character that is not a whitespace.
-
hasText
public boolean hasText()
Indicates if this container contains at least one character that is 'text' (inline codes, segment markers, and annotation markers do not count as 'text' characters). This method has the same result as calling hasText(boolean, boolean) with the parameters true and false.
- Returns:
true if this container contains at least one character that is not a whitespace.
-
isEmpty
public boolean isEmpty()
Indicates if this container is empty (no text and no codes).
- Returns:
true if this container is empty.
-
hasCode
public boolean hasCode()
Indicates if this container has Codes.
- Returns:
true if this container has codes.
-
get
public TextPart get(int index)
Gets the part (segment or non-segment) for a given part index.
- Parameters:
index - the index of the part to retrieve. The first part has the index 0, the second has the index 1, etc.
- Returns:
the part (segment or non-segment) for the given index.
- Throws:
IndexOutOfBoundsException - if the index is out of bounds.
- See Also:
ISegments.get(int)
-
count
public int count()
Gets the number of parts (segments and non-segments) in this container. This method always returns at least 1.
- Returns:
the number of parts (segments and non-segments) in this container.
- See Also:
ISegments.count()
-
createJoinedContent
public TextFragment createJoinedContent(boolean keepCodeIds)
-
createJoinedContent
public TextFragment createJoinedContent()
-
joinAll
public void joinAll()
Merges back together all parts (segments and non-segments) of this container, and clears the list of segments. The content becomes a single segment. WARNING: All TextPart annotations and properties are lost after joining.
-
joinWithNext
public int joinWithNext(int partIndex, int partCount)
Joins a given part with a specified number of its following parts.
If the resulting part is the only part in the container and is not a segment, it is automatically changed into a segment.
joinWithNext(0, -1) has the same effect as joinAll().
- Parameters:
partIndex - the index of the part where to append the following parts.
partCount - the number of parts to join. You can use -1 to indicate all the parts after the initial one.
- Returns:
the number of parts joined to the given part (and removed from the list of parts).
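The index arithmetic of joinWithNext can be sketched on a list of plain strings. This is an illustration under stated assumptions, not the Okapi implementation: JoinSketch is a hypothetical name, parts are modeled as String values, the segment/non-segment distinction is ignored, and clamping partCount to the number of available parts (as well as treating any negative count like -1) is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: joins entries of a List<String> the way the description
// joins parts. It does not use or reproduce the Okapi classes.
class JoinSketch {

    /** Joins the part at partIndex with its next partCount parts and returns how many were joined. */
    static int joinWithNext(List<String> parts, int partIndex, int partCount) {
        int available = parts.size() - partIndex - 1;
        // Assumption: any negative count means "all following parts" (the documentation defines only -1),
        // and a count larger than what is available is clamped.
        int toJoin = (partCount < 0) ? available : Math.min(partCount, available);
        StringBuilder joined = new StringBuilder(parts.get(partIndex));
        for (int i = 0; i < toJoin; i++) {
            // The next part always sits at partIndex + 1 because each removal shifts the list left.
            joined.append(parts.remove(partIndex + 1));
        }
        parts.set(partIndex, joined.toString());
        return toJoin;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>(List.of("One. ", "Two. ", "Three. ", "Four."));
        int removed = joinWithNext(parts, 1, 2);
        System.out.println(removed + " " + parts.size()); // prints 2 2
        joinWithNext(parts, 0, -1); // same effect as joining everything
        System.out.println(parts.get(0)); // prints One. Two. Three. Four.
    }
}
```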
-
-