java.lang.Object
- net.sf.okapi.lib.segmentation.SRXDocument

```
public class SRXDocument
extends Object
```
Provides facilities to load, save, and manage segmentation rules in SRX format. This class also implements several extensions to the standard SRX behavior.

Field Summary

Fields
Modifier and Type	Field	Description
`static String`	`ANYCODE`	Marker for INLINECODE_PATTERN in the given pattern.
`static String`	`DEFAULT_SRX_RULES`
`static String`	`INLINECODE_PATTERN`	Represents the pattern for an inline code (both special characters).
`static String`	`NOAUTO`	Placed at the end of the 'after' expression, this marker indicates the given pattern should not have auto-insertion of AUTO_INLINECODES.

Constructor Summary

Constructors
Constructor Description

SRXDocument()
Creates an empty SRX document.

Method Summary

Modifier and Type	Method	Description
`void`	`addLanguageMap(LanguageMap langMap)`	Adds a language map to this document.
`void`	`addLanguageRule(String name, ArrayList<Rule> langRule)`	Adds a language rule to this SRX document.
`boolean`	`cascade()`	Indicates if cascading must be applied when selecting the rules for a given language pattern.
`ISegmenter`	`compileLanguageRules(LocaleId languageCode, ISegmenter existingSegmenter)`	Compiles all the language rules applicable to a given language code and assigns them to a segmenter.
`ISegmenter`	`compileSingleLanguageRule(String ruleName, ISegmenter existingSegmenter)`	Compiles a single language rule group and assigns it to a segmenter.
`String`	`generateRuleRegex(Rule rule)`
`LinkedHashMap<String,ArrayList<Rule>>`	`getAllLanguageRules()`	Gets a map of all the language rules in this document.
`ArrayList<LanguageMap>`	`getAllLanguagesMaps()`	Gets the list of all the language maps in this document.
`String`	`getComments()`	Gets the comments associated with this document.
`String`	`getHeaderComments()`	Gets the comments associated with the header of this document.
`ArrayList<Rule>`	`getLanguageRules(String ruleName)`	Gets the list of rules for a given <languagerule> element.
`String`	`getMaskRule()`	Gets the current pattern of the mask rule.
`String`	`getSampleLanguage()`	Gets the current sample language code.
`String`	`getSampleText()`	Gets the current sample text.
`String`	`getVersion()`	Gets the version of this SRX document.
`String`	`getWarning()`	Gets the last warning that was issued while loading a document.
`boolean`	`hasWarning()`	Indicates if a warning was issued last time a document was read.
`boolean`	`includeEndCodes()`	Indicates if end codes should be included (See SRX implementation notes).
`boolean`	`includeIsolatedCodes()`	Indicates if isolated codes should be included (See SRX implementation notes).
`boolean`	`includeStartCodes()`	Indicates if start codes should be included (See SRX implementation notes).
`boolean`	`isModified()`	Indicates if the document has been modified since the last load or save.
`void`	`loadRules(InputStream inputStream)`	Loads an SRX document from an input stream.
`void`	`loadRules(CharSequence data)`	Loads an SRX document from a CharSequence object.
`void`	`loadRules(String pathOrURL)`	Loads an SRX document from a file.
`boolean`	`oneSegmentIncludesAll()`	Indicates if, when a text contains a single segment, that segment should include the whole text (no trimming of spaces or codes on the left or right).
`void`	`resetAll()`	Resets the document to its default empty initial state.
`void`	`saveRules(String rulesPath, boolean saveExtensions, boolean saveNonValidInfo)`	Saves the current rules to an SRX rules document.
`String`	`saveRulesToString(boolean saveExtensions, boolean saveNonValidInfo)`	Saves the current rules to an SRX string.
`boolean`	`segmentSubFlows()`	Indicates if sub-flows must be segmented.
`void`	`setCascade(boolean value)`	Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern.
`void`	`setComments(String text)`	Sets the comments for this document.
`void`	`setHeaderComments(String text)`	Sets the comments for the header of this document.
`void`	`setIncludeEndCodes(boolean value)`	Sets the indicator that tells if end codes should be included or not.
`void`	`setIncludeIsolatedCodes(boolean value)`	Sets the indicator that tells if isolated codes should be included or not.
`void`	`setIncludeStartCodes(boolean value)`	Sets the indicator that tells if start codes should be included or not.
`void`	`setMaskRule(String pattern)`	Sets the pattern for the mask rule.
`void`	`setModified(boolean value)`	Sets the flag indicating if the document has been modified since the last load or save.
`void`	`setOneSegmentIncludesAll(boolean value)`	Sets the indicator that tells if, when a text contains a single segment, that segment should include the whole text (no trimming of spaces or codes on the left or right).
`void`	`setSampleLanguage(String value)`	Sets the sample language code.
`void`	`setSampleText(String value)`	Sets the sample text.
`void`	`setSegmentSubFlows(boolean value)`	Sets the flag indicating if sub-flows must be segmented.
`void`	`setTestOnSelectedGroup(boolean value)`	Sets the indicator on how to apply rules for samples.
`void`	`setTreatIsolatedCodesAsWhitespace(boolean value)`	Sets the indicator if this document should treat isolated codes as whitespace when matching SRX rules.
`void`	`setTrimLeadingWhitespaces(boolean value)`	Sets the indicator that tells if leading white-spaces should be left outside the segments.
`void`	`setTrimTrailingWhitespaces(boolean value)`	Sets the indicator that tells if trailing white-spaces should be left outside the segments.
`void`	`setUseICU4JBreakRules(boolean value)`	Sets the indicator that tells if this document uses ICU4J BreakIterator rules.
`boolean`	`testOnSelectedGroup()`	Indicates that, when sampling the rules, the sample should be computed using only a selected group of rules.
`boolean`	`treatIsolatedCodesAsWhitespace()`	Indicates if this document should treat isolated codes as whitespace when matching SRX rules.
`boolean`	`trimLeadingWhitespaces()`	Indicates if leading white-spaces should be left outside the segments.
`boolean`	`trimTrailingWhitespaces()`	Indicates if trailing white-spaces should be left outside the segments.
`boolean`	`useIcu4JBreakRules()`	Indicates if this document uses ICU4J break rules.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_SRX_RULES
```
public static final String DEFAULT_SRX_RULES
```
    See Also:
    
    Constant Field Values
  - INLINECODE_PATTERN
```
public static final String INLINECODE_PATTERN
```
    Represents the pattern for an inline code (both special characters).
  - ANYCODE
```
public static final String ANYCODE
```
    Marker for INLINECODE_PATTERN in the given pattern. \Y+ = one or more codes, \Y* = zero or more codes, etc.
    
    See Also:
    
    Constant Field Values
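    To illustrate how a \Y marker could be expanded before a rule is compiled, the sketch below replaces each \Y with a group matching one inline code. It is a hypothetical model, not Okapi's code: it assumes an inline code is a marker character (U+E101, U+E102 or U+E103) followed by one index character, whereas the real value of INLINECODE_PATTERN may differ. A plain textual replacement also ignores the corner case of an escaped backslash followed by Y.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of \Y expansion; not the Okapi implementation.
class AnyCodeSketch {
    // Assumption: one inline code = marker char (opening/closing/isolated) + one index char.
    // This string stands in for INLINECODE_PATTERN.
    static final String CODE = "[\uE101\uE102\uE103].";

    // Replaces every literal \Y with a non-capturing group that matches one inline code,
    // so quantifiers such as + and * apply to whole codes.
    static String expandAnyCode(String pattern) {
        return pattern.replace("\\Y", "(?:" + CODE + ")");
    }

    // Compiles the expanded pattern and tests it against the whole text.
    static boolean matches(String srxPattern, String text) {
        return Pattern.compile(expandAnyCode(srxPattern)).matcher(text).matches();
    }
}
```

    With this expansion, a rule such as \.\Y*\s matches a period followed by whitespace whether or not inline codes sit between them.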
  - NOAUTO
```
public static final String NOAUTO
```
    Placed at the end of the 'after' expression, this marker indicates the given pattern should not have auto-insertion of AUTO_INLINECODES.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - SRXDocument
```
public SRXDocument()
```
    Creates an empty SRX document.
- Method Detail
  - getVersion
```
public String getVersion()
```
    Gets the version of this SRX document.
    
    Returns:
    
    the version of this SRX document.
  - hasWarning
```
public boolean hasWarning()
```
    Indicates if a warning was issued last time a document was read.
    
    Returns:
    
    true if a warning was issued, false otherwise.
  - getWarning
```
public String getWarning()
```
    Gets the last warning that was issued while loading a document.
    
    Returns:
    
    the text of the last warning issued, or an empty string.
  - getHeaderComments
```
public String getHeaderComments()
```
    Gets the comments associated with the header of this document.
    
    Returns:
    
    the comments for the header of this document, or null if there are none.
  - setHeaderComments
```
public void setHeaderComments(String text)
```
    Sets the comments for the header of this document.
    
    Parameters:
    
    text - the new comments; use null or an empty string to remove the comments.
  - getComments
```
public String getComments()
```
    Gets the comments associated with this document.
    
    Returns:
    
    the comments for this document, or null if there are none.
  - setComments
```
public void setComments(String text)
```
    Sets the comments for this document.
    
    Parameters:
    
    text - the new comments; use null or an empty string to remove the comments.
  - resetAll
```
public void resetAll()
```
    Resets the document to its default empty initial state.
  - getAllLanguageRules
```
public LinkedHashMap<String,ArrayList<Rule>> getAllLanguageRules()
```
    Gets a map of all the language rules in this document.
    
    Returns:
    
    a map of all the language rules.
  - getLanguageRules
```
public ArrayList<Rule> getLanguageRules(String ruleName)
```
    Gets the list of rules for a given <languagerule> element.
    
    Parameters:
    
    ruleName - the name of the <languagerule> element to query.
    
    Returns:
    
    the list of rules for the given <languagerule> element.
  - getAllLanguagesMaps
```
public ArrayList<LanguageMap> getAllLanguagesMaps()
```
    Gets the list of all the language maps in this document.
    
    Returns:
    
    the list of all the language maps.
  - segmentSubFlows
```
public boolean segmentSubFlows()
```
    Indicates if sub-flows must be segmented.
    
    Returns:
    
    true if sub-flows must be segmented, false otherwise.
  - setSegmentSubFlows
```
public void setSegmentSubFlows(boolean value)
```
    Sets the flag indicating if sub-flows must be segmented.
    
    Parameters:
    
    value - true if sub-flows must be segmented, false otherwise.
  - cascade
```
public boolean cascade()
```
    Indicates if cascading must be applied when selecting the rules for a given language pattern.
    
    Returns:
    
    true if cascading must be applied, false otherwise.
  - setCascade
```
public void setCascade(boolean value)
```
    Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern.
    
    Parameters:
    
    value - true if cascading must be applied, false otherwise.
  - oneSegmentIncludesAll
```
public boolean oneSegmentIncludesAll()
```
    Indicates if, when a text contains a single segment, that segment should include the whole text (no trimming of spaces or codes on the left or right).
    
    Returns:
    
    true if a text with a single segment should include the whole text.
  - setOneSegmentIncludesAll
```
public void setOneSegmentIncludesAll(boolean value)
```
    Sets the indicator that tells if, when a text contains a single segment, that segment should include the whole text (no trimming of spaces or codes on the left or right).
    
    Parameters:
    
    value - true if a text with a single segment should include the whole text.
  - useIcu4JBreakRules
```
public boolean useIcu4JBreakRules()
```
    Indicates if this document uses ICU4J break rules.
    
    Returns:
    
    true if ICU4J break rules are used, false otherwise.
  - setUseICU4JBreakRules
```
public void setUseICU4JBreakRules(boolean value)
```
    Sets the indicator that tells if this document uses ICU4J BreakIterator rules. BreakIterator break positions are converted to SRX-like rules and used as default rules for all languages.
    
    Parameters:
    
    value - true if ICU4J rules should be used as the default rules, false if no ICU4J rules should be used.
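    To get a feel for the break positions a BreakIterator produces, the sketch below uses the JDK's java.text.BreakIterator as a stand-in for ICU4J (whose com.ibm.icu.text.BreakIterator offers a similar getSentenceInstance/setText/next API). It does not show how Okapi converts these positions into SRX-like rules; the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lists sentence boundaries found by a BreakIterator.
class BreakIteratorSketch {
    // Returns the internal break offsets, excluding 0 and text.length().
    static List<Integer> sentenceBreaks(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<Integer> breaks = new ArrayList<>();
        // After setText the iterator sits on the first boundary (0); walk the rest.
        for (int pos = it.next(); pos != BreakIterator.DONE; pos = it.next()) {
            if (pos < text.length()) {
                breaks.add(pos);
            }
        }
        return breaks;
    }
}
```

    Note that the sentence iterator attaches the space after "world." to the first sentence, which is one reason the trimming options of this class matter.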
  - treatIsolatedCodesAsWhitespace
```
public boolean treatIsolatedCodesAsWhitespace()
```
    Indicates if this document should treat isolated codes as whitespace when matching SRX rules.
    
    Returns:
    
    true if isolated codes should be treated as whitespace
  - setTreatIsolatedCodesAsWhitespace
```
public void setTreatIsolatedCodesAsWhitespace(boolean value)
```
    Sets the indicator if this document should treat isolated codes as whitespace when matching SRX rules.
    
    Parameters:
    
    value - true if isolated codes should be treated as whitespace
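    One way to picture this option is to mask isolated codes with spaces before the rules are matched, so that a rule expecting whitespace after a period still fires when a code such as a line break sits there. The sketch below is hypothetical, not Okapi's implementation: it assumes an isolated code is stored as the marker U+E103 followed by one index character, and it replaces both characters with spaces so that offsets in the masked text still line up with the original.

```java
// Hypothetical sketch: treat isolated inline codes as whitespace for rule matching.
class IsolatedCodeSketch {
    // Assumption: an isolated code is this marker followed by one index character.
    static final char ISOLATED_MARKER = '\uE103';

    // Returns a copy of codedText in which each isolated code (2 chars) becomes 2 spaces,
    // keeping the length unchanged so break offsets map back to the original text.
    static String maskIsolatedCodes(String codedText) {
        StringBuilder sb = new StringBuilder(codedText);
        for (int i = 0; i < sb.length() - 1; i++) {
            if (sb.charAt(i) == ISOLATED_MARKER) {
                sb.setCharAt(i, ' ');
                sb.setCharAt(i + 1, ' ');
                i++; // skip the index character we just masked
            }
        }
        return sb.toString();
    }
}
```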
  - trimLeadingWhitespaces
```
public boolean trimLeadingWhitespaces()
```
    Indicates if leading white-spaces should be left outside the segments.
    
    Returns:
    
    true if the leading white-spaces should be trimmed.
  - setTrimLeadingWhitespaces
```
public void setTrimLeadingWhitespaces(boolean value)
```
    Sets the indicator that tells if leading white-spaces should be left outside the segments.
    
    Parameters:
    
    value - true if the leading white-spaces should be trimmed.
  - trimTrailingWhitespaces
```
public boolean trimTrailingWhitespaces()
```
    Indicates if trailing white-spaces should be left outside the segments.
    
    Returns:
    
    true if the trailing white-spaces should be trimmed.
  - setTrimTrailingWhitespaces
```
public void setTrimTrailingWhitespaces(boolean value)
```
    Sets the indicator that tells if trailing white-spaces should be left outside the segments.
    
    Parameters:
    
    value - true if the trailing white-spaces should be trimmed.
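    The effect of the trimming flags, together with oneSegmentIncludesAll(), can be modeled on plain segment ranges. In this hypothetical sketch each segment is a half-open [start, end) range over the text; only whitespace is trimmed (inline codes are ignored), and the class and method names are illustrative, not part of the Okapi API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the trim and one-segment options over [start, end) ranges.
class TrimSketch {
    static List<int[]> apply(String text, List<int[]> ranges,
                             boolean trimLeading, boolean trimTrailing,
                             boolean oneSegmentIncludesAll) {
        List<int[]> result = new ArrayList<>();
        // A lone segment covers the whole text, with no trimming at all.
        if (oneSegmentIncludesAll && ranges.size() == 1) {
            result.add(new int[] {0, text.length()});
            return result;
        }
        for (int[] r : ranges) {
            int start = r[0];
            int end = r[1];
            if (trimLeading) { // leave leading whitespace outside the segment
                while (start < end && Character.isWhitespace(text.charAt(start))) start++;
            }
            if (trimTrailing) { // leave trailing whitespace outside the segment
                while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
            }
            result.add(new int[] {start, end});
        }
        return result;
    }
}
```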
  - includeStartCodes
```
public boolean includeStartCodes()
```
    Indicates if start codes should be included (See SRX implementation notes).
    
    Returns:
    
    true if start codes should be included, false otherwise.
  - setIncludeStartCodes
```
public void setIncludeStartCodes(boolean value)
```
    Sets the indicator that tells if start codes should be included or not (see SRX implementation notes).
    
    Parameters:
    
    value - true if start codes should be included, false otherwise.
  - includeEndCodes
```
public boolean includeEndCodes()
```
    Indicates if end codes should be included (See SRX implementation notes).
    
    Returns:
    
    true if end codes should be included, false otherwise.
  - setIncludeEndCodes
```
public void setIncludeEndCodes(boolean value)
```
    Sets the indicator that tells if end codes should be included or not (see SRX implementation notes).
    
    Parameters:
    
    value - true if end codes should be included, false otherwise.
  - includeIsolatedCodes
```
public boolean includeIsolatedCodes()
```
    Indicates if isolated codes should be included (See SRX implementation notes).
    
    Returns:
    
    true if isolated codes should be included, false otherwise.
  - setIncludeIsolatedCodes
```
public void setIncludeIsolatedCodes(boolean value)
```
    Sets the indicator that tells if isolated codes should be included or not (see SRX implementation notes).
    
    Parameters:
    
    value - true if isolated codes should be included, false otherwise.
  - getMaskRule
```
public String getMaskRule()
```
    Gets the current pattern of the mask rule.
    
    Returns:
    
    the current pattern of the mask rule.
  - setMaskRule
```
public void setMaskRule(String pattern)
```
    Sets the pattern for the mask rule.
    
    Parameters:
    
    pattern - the new pattern to use for the mask rule.
  - getSampleText
```
public String getSampleText()
```
    Gets the current sample text. This text is an example string that can be used to test the various rules. It can be handy to save it along with the SRX document.
    
    Returns:
    
    the sample text, or an empty string.
  - setSampleText
```
public void setSampleText(String value)
```
    Sets the sample text.
    
    Parameters:
    
    value - the new sample text.
  - getSampleLanguage
```
public String getSampleLanguage()
```
    Gets the current sample language code.
    
    Returns:
    
    the current sample language code.
  - setSampleLanguage
```
public void setSampleLanguage(String value)
```
    Sets the sample language code. Null or empty strings are changed to the default language.
    
    Parameters:
    
    value - the new sample language code.
  - testOnSelectedGroup
```
public boolean testOnSelectedGroup()
```
    Indicates if, when sampling the rules, the sample should be computed using only a selected group of rules.
    
    Returns:
    
    true to test using only a selected group of rules, false to test using all the rules matching a given language.
  - setTestOnSelectedGroup
```
public void setTestOnSelectedGroup(boolean value)
```
    Sets the indicator on how to apply rules for samples.
    
    Parameters:
    
    value - true to test using only a selected group of rules, false to test using all the rules matching a given language.
  - isModified
```
public boolean isModified()
```
    Indicates if the document has been modified since the last load or save.
    
    Returns:
    
    true if the document has been modified, false otherwise.
  - setModified
```
public void setModified(boolean value)
```
    Sets the flag indicating if the document has been modified since the last load or save. If you modify the lists of rules or language maps directly, make sure to set this flag to true.
    
    Parameters:
    
    value - true if the document has been changed, false otherwise.
  - addLanguageRule
```
public void addLanguageRule(String name,
                            ArrayList<Rule> langRule)
```
    Adds a language rule to this SRX document. If a language rule with the same name already exists, it is replaced by the new one without warning.
    
    Parameters:
    
    name - name of the language rule to add.
    
    langRule - the list of rules that makes up the language rule to add.
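    Each language rule is an ordered list of SRX rules. Under the SRX semantics, at every candidate position the first rule whose beforebreak expression matches the text ending there and whose afterbreak expression matches the text starting there decides the outcome, which is why break="no" exceptions are listed before the break="yes" rules. The sketch below models that with java.util.regex; LanguageRuleSketch and SimpleRule are hypothetical names unrelated to Okapi's Rule class, and the quadratic scan favors clarity over speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of one SRX <languagerule>; not Okapi's Rule class or segmenter.
class LanguageRuleSketch {

    // One SRX <rule>: a break flag plus the beforebreak/afterbreak expressions.
    static final class SimpleRule {
        final boolean isBreak;
        final Pattern before;
        final Pattern after;

        SimpleRule(boolean isBreak, String before, String after) {
            this.isBreak = isBreak;
            this.before = Pattern.compile(before);
            this.after = Pattern.compile(after);
        }
    }

    // True if 'before' matches some span of the text that ends exactly at pos.
    static boolean beforeMatches(Pattern before, String text, int pos) {
        Matcher m = before.matcher(text);
        m.useTransparentBounds(true); // let \b and lookarounds see past the region edges
        for (int start = pos; start >= 0; start--) {
            if (m.region(start, pos).matches()) {
                return true;
            }
        }
        return false;
    }

    // True if 'after' matches a span of the text that starts exactly at pos.
    static boolean afterMatches(Pattern after, String text, int pos) {
        Matcher m = after.matcher(text);
        m.useTransparentBounds(true);
        return m.region(pos, text.length()).lookingAt();
    }

    // Checks each inner position; the first rule whose before/after pair matches decides.
    static List<Integer> breakPositions(List<SimpleRule> rules, String text) {
        List<Integer> breaks = new ArrayList<>();
        for (int pos = 1; pos < text.length(); pos++) {
            for (SimpleRule rule : rules) {
                if (beforeMatches(rule.before, text, pos) && afterMatches(rule.after, text, pos)) {
                    if (rule.isBreak) {
                        breaks.add(pos);
                    }
                    break; // first matching rule wins
                }
            }
        }
        return breaks;
    }
}
```

    With a no-break exception for "Mr." placed before a generic sentence-end rule, "Mr. Smith left. He came." breaks only after "left.".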
  - addLanguageMap
```
public void addLanguageMap(LanguageMap langMap)
```
    Adds a language map to this document. The new map is added after the ones already there.
    
    Parameters:
    
    langMap - the language map object to add.
  - compileLanguageRules
```
public ISegmenter compileLanguageRules(LocaleId languageCode,
                                       ISegmenter existingSegmenter)
```
    Compiles all the language rules applicable to a given language code and assigns them to a segmenter. This method matches the language code you specify against the language maps currently available in the document and compiles the rules when one or more language maps match. The matching is done in the order of the list of language maps, and more than one map can be selected if cascade() is true.
    
    Parameters:
    
    languageCode - the language code. The value should be a BCP-47 value (e.g. "de", "fr-ca", etc.).
    
    existingSegmenter - optional existing SRXSegmenter object to re-use. Use null if there is nothing to re-use.
    
    Returns:
    
    the instance of the segmenter with the new compiled rules.
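    The selection of rule groups through the language maps, with and without cascading, can be modeled in a few lines. This is a simplified sketch, not Okapi's code: LanguageMapSketch and Mapping are hypothetical names, and matching the whole language code case-insensitively is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of language-map selection; not the Okapi implementation.
class LanguageMapSketch {
    // Stand-in for an SRX <languagemap>: a language pattern and a rule-group name.
    static final class Mapping {
        final Pattern languagePattern;
        final String ruleName;

        Mapping(String languagePattern, String ruleName) {
            // Assumption: the pattern must match the whole code, ignoring case.
            this.languagePattern = Pattern.compile(languagePattern, Pattern.CASE_INSENSITIVE);
            this.ruleName = ruleName;
        }
    }

    // Returns the names of the rule groups to compile, in language-map order.
    static List<String> selectRuleNames(List<Mapping> maps, String languageCode, boolean cascade) {
        List<String> selected = new ArrayList<>();
        for (Mapping m : maps) {
            if (m.languagePattern.matcher(languageCode).matches()) {
                selected.add(m.ruleName);
                if (!cascade) {
                    break; // without cascading, only the first matching map is used
                }
            }
        }
        return selected;
    }
}
```

    Because the maps are scanned in list order, a catch-all pattern such as .* is normally placed last, where it acts as the fallback without cascading and as the shared base rules with cascading.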
  - compileSingleLanguageRule
```
public ISegmenter compileSingleLanguageRule(String ruleName,
                                            ISegmenter existingSegmenter)
```
    Compiles a single language rule group and assigns it to a segmenter.
    
    Parameters:
    
    ruleName - the name of the rule group to apply.
    
    existingSegmenter - optional existing SRXSegmenter object to re-use. Use null if there is nothing to re-use.
    
    Returns:
    
    the instance of the segmenter with the new compiled rules.
  - generateRuleRegex
```
public String generateRuleRegex(Rule rule)
```
  - loadRules
```
public void loadRules(CharSequence data)
```
    Loads an SRX document from a CharSequence object. Calling this method resets all settings and rules to their default state and then populates them with the data stored in the document being loaded. The rules can be embedded inside another vocabulary.
    
    Parameters:
    
    data - the string containing the SRX document to load.
  - loadRules
```
public void loadRules(String pathOrURL)
```
    Loads an SRX document from a file. Calling this method resets all settings and rules to their default state and then populates them with the data stored in the document being loaded. The rules can be embedded inside another vocabulary.
    When pathOrURL is SRXDocument.DEFAULT_SRX_RULES (the string "DEFAULT_SRX_RULES" in serialized parameters), this method loads the Okapi-recommended .srx file embedded in the library JAR.
    
    Parameters:
    
    pathOrURL - the full path or URL of the document to load.
  - loadRules
```
public void loadRules(InputStream inputStream)
```
    Loads an SRX document from an input stream. Calling this method resets all settings and rules to their default state and then populates them with the data stored in the document being loaded. The rules can be embedded inside another vocabulary.
    
    Parameters:
    
    inputStream - the input stream to read from.
  - saveRulesToString
```
public String saveRulesToString(boolean saveExtensions,
                                boolean saveNonValidInfo)
```
    Saves the current rules to an SRX string.
    
    Parameters:
    
    saveExtensions - true to save Okapi SRX extensions, false otherwise.
    
    saveNonValidInfo - true to save non-SRX-valid attributes, false otherwise.
    
    Returns:
    
    the string containing the saved SRX rules.
  - saveRules
```
public void saveRules(String rulesPath,
                      boolean saveExtensions,
                      boolean saveNonValidInfo)
```
    Saves the current rules to an SRX rules document.
    
    Parameters:
    
    rulesPath - the full path of the file where to save the rules.
    
    saveExtensions - true to save Okapi SRX extensions, false otherwise.
    
    saveNonValidInfo - true to save non-SRX-valid attributes, false otherwise.
