Package net.sf.okapi.common
Class StringUtil
java.lang.Object
    net.sf.okapi.common.StringUtil
public final class StringUtil extends Object
Helper methods to manipulate strings.
-
-
Constructor Summary
StringUtil()
-
Method Summary
static String charsToString(Set<Character> set)
static String chomp(String str)
    Removes a single newline combination from the end of a string. Copied from Apache Commons StringUtils (http://www.apache.org/licenses/LICENSE-2.0).
static String collapseNewline(String text)
static String collapseWhitespace(String text)
static boolean containsWildcards(String string)
    Detects if a given string contains shell wildcard characters (e.g. * and ?).
static int generateIntId(String s)
    JVM-independent hashCode implementation used to generate numeric IDs from strings.
static int getNumOccurrences(String str, String substr)
    Returns the number of occurrences of a given substring in a given string.
static String getQualifier(String st)
    Returns any qualifiers (quotation marks etc.) around text in a given string.
static String getString(int length, char c)
    Returns a new string consisting of a given character repeated a given number of times.
static boolean hasQualifiers(String st)
    Tests if there are qualifiers (quotation marks etc.) around text in a given string.
static boolean isWhitespace(String str)
    Checks if a given string contains only whitespace characters.
static float LcsEditDistance(CharSequence seq1, CharSequence seq2)
    Longest Common Subsequence algorithm on CharSequences.
static boolean matchesWildcard(String string, String pattern)
    Detects if a given string matches a given pattern (not necessarily a regex), possibly containing wildcards.
static boolean matchesWildcard(String string, String pattern, boolean filenameMode)
    Detects if a given string matches a given pattern (not necessarily a regex), possibly containing wildcards.
static String mirrorString(String str)
    Returns the reversed version of a given string, e.g. "cba" for "abc".
static String normalizeLineBreaks(String string)
    Converts line breaks in a given string to the Unix standard (\n).
static String normalizeWildcards(String string)
    Converts shell wildcards (e.g. * and ?) in a given string to its Java regex representation.
static String padString(String string, int startPos, int endPos, char padder)
    Pads a range of a given string with a given character.
static String readString(File path)
static String readString(URL url)
static String removeAnyQualifiers(String st)
    Removes quotation marks (double or single) around text in a given string.
static String removeQualifiers(String st)
    Removes quotation marks around text in a given string.
static String removeQualifiers(String st, String qualifier)
    Removes qualifiers (quotation marks etc.) around text in a given string.
static String removeQualifiers(String st, String startQualifier, String endQualifier)
    Removes qualifiers (quotation marks etc.) around text in a given string.
static String removeWhiteSpace(String s)
    Removes whitespace from a string; a fast alternative to a regex-based version.
static String repeatChar(char c, int rep)
    Builds a string made of the given character repeated the specified number of times.
static String[] split(String string, String delimRegex)
static String[] split(String string, String delimRegex, int group)
static String substring(String string, int start, int end)
static String titleCase(String st)
    Returns a title-case representation of a given string.
static void writeString(String str, File path)
static void writeString(String str, OutputStream os)
-
-
-
Method Detail
-
LcsEditDistance
public static float LcsEditDistance(CharSequence seq1, CharSequence seq2)
Longest Common Subsequence algorithm on CharSequences.
Parameters:
seq1 - CharSequence one
seq2 - CharSequence two
Returns:
the score based on the length of the common subsequence and the input sequences
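The description above does not say how the score is derived from the subsequence length. As a minimal sketch, the class below computes the LCS length with the classic dynamic-programming table and then applies a hypothetical normalization, 2 * LCS / (len1 + len2). The class name `LcsSketch`, the method names, and the normalization formula are all assumptions for illustration, not Okapi's confirmed implementation.

```java
final class LcsSketch {
    /** Length of the longest common subsequence, via a dynamic-programming table. */
    static int lcsLength(CharSequence a, CharSequence b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Matching characters extend the subsequence found so far.
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Otherwise keep the better of skipping a character on either side.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Hypothetical normalization into [0, 1]; the real scoring formula may differ. */
    static float score(CharSequence a, CharSequence b) {
        int total = a.length() + b.length();
        if (total == 0) {
            return 1.0f; // assumption: two empty sequences count as identical
        }
        return (2.0f * lcsLength(a, b)) / total;
    }
}
```

With this normalization, identical sequences score 1 and sequences with no common characters score 0.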
-
titleCase
public static String titleCase(String st)
Returns a title-case representation of a given string. The first character is capitalized, following characters are in lower case.
Parameters:
st - the given string.
Returns:
a copy of the given string normalized to the title case.
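The rule described above (first character upper case, the rest lower case) can be sketched as follows. `TitleCaseSketch` is a hypothetical stand-in, and both the null/empty handling and the use of `Locale.ROOT` are assumptions rather than documented behavior.

```java
import java.util.Locale;

final class TitleCaseSketch {
    /** Capitalizes the first character and lower-cases everything after it. */
    static String titleCase(String st) {
        if (st == null || st.isEmpty()) {
            return st; // assumption: null and empty inputs are returned unchanged
        }
        return st.substring(0, 1).toUpperCase(Locale.ROOT)
                + st.substring(1).toLowerCase(Locale.ROOT);
    }
}
```

Note that under this reading only the first character of the whole string is capitalized, not the first character of each word.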
-
removeQualifiers
public static String removeQualifiers(String st, String startQualifier, String endQualifier)
Removes qualifiers (quotation marks etc.) around text in a given string.
Parameters:
st - the given string.
startQualifier - the qualifier to be removed before the given string.
endQualifier - the qualifier to be removed after the given string.
Returns:
a copy of the given string without qualifiers.
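As an illustration of qualifier removal, the sketch below strips a start and end qualifier only when both are present. That both-or-nothing rule, the null handling, and the class name `QualifierSketch` are assumptions; the Javadoc does not say what happens when only one qualifier matches.

```java
final class QualifierSketch {
    /** Strips startQualifier and endQualifier when the string is wrapped in both. */
    static String removeQualifiers(String st, String startQualifier, String endQualifier) {
        if (st == null || startQualifier == null || endQualifier == null) {
            return st; // assumption: nothing to do for null arguments
        }
        boolean longEnough = st.length() >= startQualifier.length() + endQualifier.length();
        if (longEnough && st.startsWith(startQualifier) && st.endsWith(endQualifier)) {
            return st.substring(startQualifier.length(), st.length() - endQualifier.length());
        }
        return st; // assumption: unqualified or half-qualified text is left as-is
    }
}
```

The length check prevents a single quotation mark from being treated as both the opening and the closing qualifier.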
-
getQualifier
public static String getQualifier(String st)
Returns any qualifiers (quotation marks etc.) around text in a given string.
Parameters:
st - the given string.
Returns:
the qualifier
-
hasQualifiers
public static boolean hasQualifiers(String st)
Tests if there are qualifiers (quotation marks etc.) around text in a given string.
Parameters:
st - the given string.
Returns:
true if the string has qualifiers
-
removeQualifiers
public static String removeQualifiers(String st, String qualifier)
Removes qualifiers (quotation marks etc.) around text in a given string.
Parameters:
st - the given string.
qualifier - the qualifier to be removed before and after text in the string.
Returns:
a copy of the given string without qualifiers.
-
removeAnyQualifiers
public static String removeAnyQualifiers(String st)
Removes quotation marks (double or single) around text in a given string.
Parameters:
st - the given string.
Returns:
a copy of the given string without quotation marks.
-
removeQualifiers
public static String removeQualifiers(String st)
Removes quotation marks around text in a given string.
Parameters:
st - the given string.
Returns:
a copy of the given string without quotation marks.
-
normalizeLineBreaks
public static String normalizeLineBreaks(String string)
Converts line breaks in a given string to the Unix standard (\n).
Parameters:
string - the given string.
Returns:
a copy of the given string in which all line breaks are \n.
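One straightforward way to get this effect is to rewrite Windows (\r\n) breaks first and then any remaining lone carriage returns. The sketch below assumes those are the only two non-Unix forms to handle; `LineBreakSketch` is a hypothetical name.

```java
final class LineBreakSketch {
    /** Converts \r\n and lone \r line breaks to \n. */
    static String normalizeLineBreaks(String string) {
        if (string == null) {
            return null; // assumption: null passes through unchanged
        }
        // Replace the two-character form first so it does not become two line breaks.
        return string.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```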
-
normalizeWildcards
public static String normalizeWildcards(String string)
Converts shell wildcards (e.g. * and ?) in a given string to their Java regex representation.
Parameters:
string - the given string.
Returns:
a copy of the given string in which all wildcards are converted into a correct Java regular expression. The result is checked for being a correct regex pattern. If it is not, the original string is returned, as it is most likely already a regex pattern.
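The convert-then-validate behavior described above can be sketched with `java.util.regex.Pattern`. The mapping used here (`?` to `.` and `*` to `.*`) is an assumption about the conversion, and `WildcardRegexSketch` is a hypothetical class; the actual method may treat more cases.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class WildcardRegexSketch {
    /** Converts shell wildcards to regex syntax, falling back to the input if the result is invalid. */
    static String normalizeWildcards(String string) {
        if (string == null) {
            return null;
        }
        // Assumed mapping: '?' matches one character, '*' matches any run of characters.
        String candidate = string.replace("?", ".").replace("*", ".*");
        try {
            Pattern.compile(candidate); // validity check only; the compiled pattern is discarded
            return candidate;
        } catch (PatternSyntaxException e) {
            return string; // invalid result: hand back the original string
        }
    }
}
```

This simple mapping does not escape other regex metacharacters such as a literal dot, so `*.txt` becomes `.*.txt`, where the second dot still matches any character.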
-
containsWildcards
public static boolean containsWildcards(String string)
Detects if a given string contains shell wildcard characters (e.g. * and ?).
Parameters:
string - the given string.
Returns:
true if the string contains the asterisk (*) or question mark (?).
-
matchesWildcard
public static boolean matchesWildcard(String string, String pattern, boolean filenameMode)
Detects if a given string matches a given pattern (not necessarily a regex), possibly containing wildcards.
Parameters:
string - the given string (no wildcards)
pattern - the pattern containing wildcards to match against
filenameMode - indicates if the given string should be considered a file name
Returns:
true if the given string matches the given pattern
-
matchesWildcard
public static boolean matchesWildcard(String string, String pattern)
Detects if a given string matches a given pattern (not necessarily a regex), possibly containing wildcards.
Parameters:
string - the given string (no wildcards)
pattern - the pattern containing wildcards to match against
Returns:
true if the given string matches the given pattern
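To illustrate wildcard matching, the sketch below translates `*` and `?` into regex tokens and quotes everything else so it matches literally. Treating all non-wildcard characters as literals is an assumption (the description allows patterns that are already regexes), the `filenameMode` flag is not modeled because its semantics are not documented here, and `WildcardMatchSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class WildcardMatchSketch {
    /** Returns true if string matches pattern, where '*' and '?' are shell wildcards. */
    static boolean matchesWildcard(String string, String pattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char ch : pattern.toCharArray()) {
            if (ch == '*' || ch == '?') {
                flushLiteral(literal, regex);
                regex.append(ch == '*' ? ".*" : ".");
            } else {
                literal.append(ch);
            }
        }
        flushLiteral(literal, regex);
        // DOTALL lets wildcards span line breaks; matches() requires a full-string match.
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(string).matches();
    }

    /** Appends any pending literal text to the regex in quoted form. */
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }
}
```

Quoting the literal runs means a dot in the pattern matches only a dot in the string.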
-
getNumOccurrences
public static int getNumOccurrences(String str, String substr)
Returns the number of occurrences of a given substring in a given string.
Parameters:
str - the given string.
substr - the given substring being sought.
Returns:
the number of occurrences of the substring in the string.
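A simple way to count occurrences is a repeated `indexOf` scan. The sketch below counts non-overlapping matches and returns 0 for null or empty arguments; both choices, and the name `OccurrenceSketch`, are assumptions, since the description does not specify them.

```java
final class OccurrenceSketch {
    /** Counts non-overlapping occurrences of substr in str. */
    static int getNumOccurrences(String str, String substr) {
        if (str == null || substr == null || substr.isEmpty()) {
            return 0; // assumption: nothing to count for null or empty arguments
        }
        int count = 0;
        int from = 0;
        while ((from = str.indexOf(substr, from)) != -1) {
            count++;
            from += substr.length(); // skip past this match, so matches never overlap
        }
        return count;
    }
}
```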
-
isWhitespace
public static boolean isWhitespace(String str)
Checks if a given string contains only whitespace characters.
Parameters:
str - the given string
Returns:
true if the given string consists only of whitespace
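A minimal sketch of such a check, using `Character.isWhitespace` as the definition of whitespace. The result for null and empty strings (false here) is an assumption, as is the class name `WhitespaceCheckSketch`.

```java
final class WhitespaceCheckSketch {
    /** Returns true only if every character of a non-empty string is whitespace. */
    static boolean isWhitespace(String str) {
        if (str == null || str.isEmpty()) {
            return false; // assumption: null and empty strings do not count as whitespace
        }
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isWhitespace(str.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```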
-
getString
public static String getString(int length, char c)
Returns a new string consisting of a given character repeated a given number of times.
Parameters:
length - length of the new string
c - the character to fill the string with
Returns:
the new string
-
padString
public static String padString(String string, int startPos, int endPos, char padder)
Pads a range of a given string with a given character.
Parameters:
string - the given string
startPos - start position of the pad range (inclusive)
endPos - end position of the pad range (exclusive)
padder - the character to pad the range with
Returns:
the given string with the given range padded with the given character
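One reading of "padding a range" is that every character from the start position up to, but not including, the end position is overwritten with the pad character. The sketch below follows that reading and clamps out-of-range positions; both are assumptions, and `PadSketch` is a hypothetical name.

```java
final class PadSketch {
    /** Overwrites positions startPos (inclusive) to endPos (exclusive) with padder. */
    static String padString(String string, int startPos, int endPos, char padder) {
        if (string == null) {
            return null;
        }
        // Assumption: positions outside the string are clamped rather than rejected.
        int start = Math.max(0, startPos);
        int end = Math.min(string.length(), endPos);
        StringBuilder sb = new StringBuilder(string);
        for (int i = start; i < end; i++) {
            sb.setCharAt(i, padder);
        }
        return sb.toString();
    }
}
```

Under this interpretation the result always has the same length as the input.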
-
mirrorString
public static String mirrorString(String str)
Returns the reversed version of a given string, e.g. "cba" for "abc".
Parameters:
str - the given string
Returns:
the reversed string
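Reversal of this kind is available in the JDK through `StringBuilder.reverse()`. The sketch below uses it; whether the library method does the same internally is not stated, and `MirrorSketch` is a hypothetical name.

```java
final class MirrorSketch {
    /** Returns the characters of str in reverse order. */
    static String mirrorString(String str) {
        if (str == null) {
            return null; // assumption: null passes through unchanged
        }
        // StringBuilder.reverse() keeps surrogate pairs in the correct order.
        return new StringBuilder(str).reverse().toString();
    }
}
```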
-
writeString
public static void writeString(String str, OutputStream os)
-
removeWhiteSpace
public static String removeWhiteSpace(String s)
Removes whitespace from a string; a fast alternative to a regex-based version.
Parameters:
s - string with whitespace
Returns:
string without whitespace
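A regex-free removal can be done in a single pass over the characters, which avoids compiling and running a pattern such as `\s+`. The sketch below assumes `Character.isWhitespace` as the whitespace test; `WhitespaceRemovalSketch` is a hypothetical name.

```java
final class WhitespaceRemovalSketch {
    /** Copies every non-whitespace character into a new string. */
    static String removeWhiteSpace(String s) {
        if (s == null) {
            return null; // assumption: null passes through unchanged
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```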
-
chomp
public static String chomp(String str)
Removes a single newline combination from the end of a string. Copied from Apache Commons StringUtils (http://www.apache.org/licenses/LICENSE-2.0).
Parameters:
str - the string to chomp.
Returns:
the chomped string.
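The following sketch shows the "single newline combination" rule: at most one trailing \r\n, \n, or \r is removed. It is written from that description rather than copied from the library source, and `ChompSketch` is a hypothetical name.

```java
final class ChompSketch {
    /** Removes one trailing line break (\r\n, \n, or \r) if present. */
    static String chomp(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        int len = str.length();
        if (str.endsWith("\r\n")) {
            return str.substring(0, len - 2); // the two-character break counts as one newline
        }
        char last = str.charAt(len - 1);
        if (last == '\n' || last == '\r') {
            return str.substring(0, len - 1);
        }
        return str; // no trailing line break
    }
}
```

Only one line break is removed per call, so a string ending in two newlines keeps the first of them.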
-
repeatChar
public static String repeatChar(char c, int rep)
Builds a string made of the given character repeated the specified number of times.
Parameters:
c - character to repeat
rep - the number of repetitions
Returns:
string of length rep made only of the character c
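A compact way to build such a string is to fill a character array. The sketch below assumes that a non-positive count yields an empty string, which the description does not specify; `RepeatSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class RepeatSketch {
    /** Returns a string of length rep made only of the character c. */
    static String repeatChar(char c, int rep) {
        if (rep <= 0) {
            return ""; // assumption: zero or negative counts produce an empty string
        }
        char[] buffer = new char[rep];
        Arrays.fill(buffer, c);
        return new String(buffer);
    }
}
```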
-
generateIntId
public static int generateIntId(String s)
JVM-independent hashCode implementation used to generate numeric IDs from strings.
Parameters:
s - the given string
Returns:
an integer calculated from the given string. Some collisions are expected but should be rare for longer strings.
-
-