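Guava Source Code Analysis: CharMatcher

CharMatcher Source Code Analysis

Introduction to the CharMatcher class

CharMatcher provides a variety of methods for processing strings. It is mainly used to operate on characters and to find the characters that match a given rule.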

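The internal implementation of CharMatcher consists of two main parts:
1. A large set of shared nested classes that make matching convenient, for example JAVA_DIGIT for digits and JAVA_LETTER for letters.
2. A large set of string-processing methods that apply a specific CharMatcher to the matched characters, for example removeFrom(), replaceFrom(), trimFrom(), and retainFrom().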

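CharMatcher itself is an abstract class whose core method, matches(char), is abstract. Nested subclasses that extend CharMatcher implement matches and override some of the operation methods, because each matching rule places different requirements on how those operations are best implemented.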
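The pattern described above can be sketched in a few lines. This is a simplified illustration, not Guava code: MiniMatcher, ANY, and ASCII_DIGIT are hypothetical names, and only removeFrom is shown.

```java
// Hypothetical miniature of the pattern; MiniMatcher is not a Guava class.
abstract class MiniMatcher {
    // The only abstract method: does this rule accept the character?
    abstract boolean matches(char c);

    // Generic implementation: walks the sequence and keeps non-matching characters.
    String removeFrom(CharSequence sequence) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (!matches(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // A rule that matches everything can skip the loop entirely.
    static final MiniMatcher ANY = new MiniMatcher() {
        @Override boolean matches(char c) { return true; }
        @Override String removeFrom(CharSequence sequence) { return ""; }
    };

    // A rule with no special shortcut just inherits the generic loop.
    static final MiniMatcher ASCII_DIGIT = new MiniMatcher() {
        @Override boolean matches(char c) { return c >= '0' && c <= '9'; }
    };
}
```

Calling MiniMatcher.ASCII_DIGIT.removeFrom("a1b22c") runs the generic loop, while MiniMatcher.ANY.removeFrom(...) returns immediately; the same split between a general default and cheaper overrides appears in the Any and None classes discussed below.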

Class diagram

[Figure: CharMatcher class diagram]

Main fields

private static final int DISTINCT_CHARS = Character.MAX_VALUE - Character.MIN_VALUE + 1;

Main methods

any()

public static CharMatcher any() {
return Any.INSTANCE;
}

Matches any character.

The returned object is the instance of Any, one of the CharMatcher implementation classes.

Any implementation class

Main fields

static final Any INSTANCE = new Any();

Declares the singleton Any instance.

Main methods

Any()
private Any() {
super("CharMatcher.any()");
}

Private constructor; it prevents code outside the class from creating Any instances.

matches(char)
@Override
public boolean matches(char c) {
return true;
}

Any is meant to match every character, so this method always returns true.

indexIn(CharSequence)
@Override
public int indexIn(CharSequence sequence) {
return (sequence.length() == 0) ? -1 : 0;
}

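Returns the index of the first matching character in the sequence.

If the sequence is empty, it returns -1. Otherwise it returns 0, the index of the first character, because every character matches.

indexIn(CharSequence, int)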
@Override
public int indexIn(CharSequence sequence, int start) {
int length = sequence.length();
checkPositionIndex(start, length);
return (start == length) ? -1 : start;
}

Returns the index of the first matching character at or after position start. If start equals the length of the sequence, it returns -1; otherwise it returns start itself.

lastIndexIn(CharSequence)
@Override
public int lastIndexIn(CharSequence sequence) {
return sequence.length() - 1;
}

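Searches backward from the end of the sequence and returns the index of the last matching character.

Here it simply returns the index of the last character, which is -1 for an empty sequence.

matchesAllOf(CharSequence)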
@Override
public boolean matchesAllOf(CharSequence sequence) {
checkNotNull(sequence);
return true;
}

Returns whether every character in the sequence matches.

Here it always returns true.

matchesNoneOf(CharSequence)
@Override
public boolean matchesNoneOf(CharSequence sequence) {
return sequence.length() == 0;
}

Returns whether no character in the sequence matches.

It returns true if the sequence is empty and false otherwise.

removeFrom(CharSequence)
@Override
public String removeFrom(CharSequence sequence) {
checkNotNull(sequence);
return "";
}

Removes all matching characters from the sequence.

Here it returns the empty string.

replaceFrom(CharSequence, char)
@Override
public String replaceFrom(CharSequence sequence, char replacement) {
char[] array = new char[sequence.length()];
Arrays.fill(array, replacement);
return new String(array);
}

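Replaces each matching character in the sequence with the given character.

Here it returns a string of the same length as the sequence that consists only of replacement.

replaceFrom(CharSequence, CharSequence)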
@Override
public String replaceFrom(CharSequence sequence, CharSequence replacement) {
StringBuilder result = new StringBuilder(sequence.length() * replacement.length());
for (int i = 0; i < sequence.length(); i++) {
result.append(replacement);
}
return result.toString();
}

Replaces each matching character in the sequence with the given string.

Here it returns replacement repeated sequence.length() times.

collapseFrom(CharSequence, char)
@Override
public String collapseFrom(CharSequence sequence, char replacement) {
return (sequence.length() == 0) ? "" : String.valueOf(replacement);
}

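Replaces each run of consecutive matching characters with a single replacement character.

If the sequence is empty, it returns the empty string; otherwise it returns a string containing a single replacement.

trimFrom(CharSequence)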
@Override
public String trimFrom(CharSequence sequence) {
checkNotNull(sequence);
return "";
}

Removes matching characters from the beginning and end of the sequence.

Here it returns the empty string.

countIn(CharSequence)
@Override
public int countIn(CharSequence sequence) {
return sequence.length();
}

Counts the matching characters in the sequence. Here that is simply the length of the sequence.

and(CharMatcher)
@Override
public CharMatcher and(CharMatcher other) {
return checkNotNull(other);
}

Returns a CharMatcher that matches the characters matched by both this matcher and other (logical AND).

Because Any matches everything, it returns other directly.

or(CharMatcher)
@Override
public CharMatcher or(CharMatcher other) {
checkNotNull(other);
return this;
}

Returns a CharMatcher that matches the characters matched by either this matcher or other (logical OR).

Because Any already matches everything, it returns this.

negate()
@Override
public CharMatcher negate() {
return none();
}

The negation of any is none.

none()

/**
* Matches no characters.
*
* @since 19.0 (since 1.0 as constant {@code NONE})
*/
public static CharMatcher none() {
return None.INSTANCE;
}

Returns a CharMatcher instance that matches no characters.

None implementation class

Main fields

static final None INSTANCE = new None();

INSTANCE is the singleton instance of this class.

Main methods

None()
private None() {
super("CharMatcher.none()");
}

Private constructor; it prevents code outside the class from creating None instances.

matches(char)
@Override
public boolean matches(char c) {
return false;
}

Returns whether the given character matches.

Here it returns false, because no character matches.

indexIn(CharSequence)
@Override
public int indexIn(CharSequence sequence) {
checkNotNull(sequence);
return -1;
}

Returns the index of the first matching character in the sequence.

Here it returns -1, because no character matches.

indexIn(CharSequence, int)
public int indexIn(CharSequence sequence, int start) {
int length = sequence.length();
checkPositionIndex(start, length);
return -1;
}

Returns the index of the first matching character at or after the given position.

Here it returns -1, because no character matches.

lastIndexIn(CharSequence)
@Override
public int lastIndexIn(CharSequence sequence) {
checkNotNull(sequence);
return -1;
}

Searches backward from the end and returns the index of the last matching character in the sequence.

Here it returns -1, because no character matches.

matchesAllOf(CharSequence)
@Override
public boolean matchesAllOf(CharSequence sequence) {
return sequence.length() == 0;
}

Returns whether every character in the sequence matches.

It returns true if the sequence is empty and false otherwise.

matchesNoneOf(CharSequence)
@Override
public boolean matchesNoneOf(CharSequence sequence) {
checkNotNull(sequence);
return true;
}

Returns whether no character in the sequence matches.

Here it always returns true.

removeFrom(CharSequence)
@Override
public String removeFrom(CharSequence sequence) {
return sequence.toString();
}

Returns the string that remains after removing the matching characters.

Here it returns the original string.

replaceFrom(CharSequence, char)
@Override
public String replaceFrom(CharSequence sequence, char replacement) {
return sequence.toString();
}

Replaces each matching character in the sequence with the given character.

Here it returns the original string.

replaceFrom(CharSequence, CharSequence)
@Override
public String replaceFrom(CharSequence sequence, CharSequence replacement) {
checkNotNull(replacement);
return sequence.toString();
}

Replaces each matching character in the sequence with the given string.

Here it returns the original string.

collapseFrom(CharSequence, char)
@Override
public String collapseFrom(CharSequence sequence, char replacement) {
return sequence.toString();
}

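Collapses each run of consecutive matching characters into the given replacement character.

Here it returns the original string.

trimFrom(CharSequence)
@Override
public String trimFrom(CharSequence sequence) {
return sequence.toString();
}

Removes matching characters from the beginning and end of the sequence and returns what remains.

Here it returns the original string.

trimLeadingFrom(CharSequence)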
@Override
public String trimLeadingFrom(CharSequence sequence) {
return sequence.toString();
}

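Removes all matching characters from the beginning of the sequence, stopping at the first character that does not match.

Here it returns the original string.

trimTrailingFrom(CharSequence)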
@Override
public String trimTrailingFrom(CharSequence sequence) {
return sequence.toString();
}

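Removes all matching characters from the end of the sequence, stopping at the last character that does not match.

Here it returns the original string.

countIn(CharSequence)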
@Override
public int countIn(CharSequence sequence) {
checkNotNull(sequence);
return 0;
}

Counts the matching characters in the sequence.

Here it returns 0.

and(CharMatcher)
@Override
public CharMatcher and(CharMatcher other) {
checkNotNull(other);
return this;
}

Returns a CharMatcher that is the logical AND of this matcher and other.

Here it returns this, because None matches nothing.

or(CharMatcher)
@Override
public CharMatcher or(CharMatcher other) {
return checkNotNull(other);
}

Returns a CharMatcher that is the logical OR of this matcher and other.

Here it returns other.

negate()
@Override
public CharMatcher negate() {
return any();
}

Returns the negation of this matcher. The negation of none is any.

whitespace()

/**
* Determines whether a character is whitespace according to the latest Unicode standard, as
* illustrated <a
* href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp%7Bwhitespace%7D">here</a>.
* This is not the same definition used by other Java APIs. (See a <a
* href="https://goo.gl/Y6SLWx">comparison of several definitions of "whitespace"</a>.)
*
* <p>All Unicode White_Space characters are on the BMP and thus supported by this API.
*
* <p><b>Note:</b> as the Unicode definition evolves, we will modify this matcher to keep it up to
* date.
*
* @since 19.0 (since 1.0 as constant {@code WHITESPACE})
*/
public static CharMatcher whitespace() {
return Whitespace.INSTANCE;
}

Returns the Whitespace instance. Whitespace matches whitespace as defined by the latest Unicode standard.

Whitespace implementation class

Main fields

// TABLE is a precomputed hashset of whitespace characters. MULTIPLIER serves as a hash function
// whose key property is that it maps 25 characters into the 32-slot table without collision.
// Basically this is an opportunistic fast implementation as opposed to "good code". For most
// other use-cases, the reduction in readability isn't worth it.
static final String TABLE =
"\u2002\u3000\r\u0085\u200A\u2005\u2000\u3000"
+ "\u2029\u000B\u3000\u2008\u2003\u205F\u3000\u1680"
+ "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
+ "\u3000\u2004\u3000\u3000\u2028\n\u2007\u3000";
static final int MULTIPLIER = 1682554634;
static final int SHIFT = Integer.numberOfLeadingZeros(TABLE.length() - 1);

static final Whitespace INSTANCE = new Whitespace();

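TABLE is a precomputed 32-slot hash table that holds the 25 Unicode whitespace characters; the remaining slots repeat \u3000. MULTIPLIER is a precomputed magic number. The value 1682554634 and the layout of TABLE were deliberately chosen together. The source code does not explain how they were generated. Someone asked about this on GitHub, and a Guava owner replied that they do have a generator, but it has not been open-sourced because of some of its dependencies.

The length of TABLE is 32, and Integer.numberOfLeadingZeros(32 - 1) is 27: the binary form of 31 is 11111, which occupies 5 bits, and a Java int has 32 bits, so SHIFT is 32 - 5 = 27.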

Main methods

Whitespace()
Whitespace() {
super("CharMatcher.whitespace()");
}

Constructor; it calls the superclass constructor.

matches(char)
@Override
public boolean matches(char c) {
return TABLE.charAt((MULTIPLIER * c) >>> SHIFT) == c;
}

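The algorithm is simple: it checks whether the slot of TABLE that c hashes to contains c itself. The product of c and the magic number (truncated to the low 32 bits when it overflows int) is shifted right by 27 bits, and the result is the index into TABLE. For example, '\u2002' has the value 8194; its product with 1682554634, shifted right by 27 bits, is 0, and the character at index 0 of TABLE is '\u2002', so it matches. '\u3000' has the value 12288; the same computation gives 26, and the character at index 26 of TABLE is also '\u3000', so it matches as well.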
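The lookup can be reproduced outside Guava to check the numbers quoted above. The sketch below copies TABLE, MULTIPLIER, and SHIFT from the listing; the class name WhitespaceTableSketch and the helper slot are made up for this illustration.

```java
final class WhitespaceTableSketch {
    // Copied from the TABLE, MULTIPLIER and SHIFT definitions quoted above.
    static final String TABLE =
        "\u2002\u3000\r\u0085\u200A\u2005\u2000\u3000"
            + "\u2029\u000B\u3000\u2008\u2003\u205F\u3000\u1680"
            + "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
            + "\u3000\u2004\u3000\u3000\u2028\n\u2007\u3000";
    static final int MULTIPLIER = 1682554634;
    static final int SHIFT = Integer.numberOfLeadingZeros(TABLE.length() - 1);

    // Slot that c hashes to: the int product wraps to 32 bits,
    // then the unsigned shift keeps only the top 5 bits (0..31).
    static int slot(char c) {
        return (MULTIPLIER * c) >>> SHIFT;
    }

    // c is whitespace exactly when its own slot holds c.
    static boolean matches(char c) {
        return TABLE.charAt(slot(c)) == c;
    }
}
```

With these definitions, slot('\u2002') evaluates to 0 and slot('\u3000') to 26, which agrees with the worked example. A non-whitespace character such as 'a' also hashes to some slot, but the character stored there differs, so matches returns false.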

setBits(BitSet table)
@GwtIncompatible // used only from other GwtIncompatible code
@Override
void setBits(BitSet table) {
for (int i = 0; i < TABLE.length(); i++) {
table.set(TABLE.charAt(i));
}
}

Sets the bit for every matching character in the given BitSet.

breakingWhitespace()

/**
* Determines whether a character is a breaking whitespace (that is, a whitespace which can be
* interpreted as a break between words for formatting purposes). See {@link #whitespace()} for a
* discussion of that term.
*
* @since 19.0 (since 2.0 as constant {@code BREAKING_WHITESPACE})
*/
public static CharMatcher breakingWhitespace() {
return BreakingWhitespace.INSTANCE;
}

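Returns the BreakingWhitespace instance. It determines whether a character is a breaking whitespace, that is, whitespace that can be interpreted as a break between words for formatting purposes.

BreakingWhitespace implementation class

Main fields
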
static final CharMatcher INSTANCE = new BreakingWhitespace();

The singleton BreakingWhitespace instance.

Main methods

matches(char)
@Override
public boolean matches(char c) {
switch (c) {
case '\t':
case '\n':
case '\013':
case '\f':
case '\r':
case ' ':
case '\u0085':
case '\u1680':
case '\u2028':
case '\u2029':
case '\u205f':
case '\u3000':
return true;
case '\u2007':
return false;
default:
return c >= '\u2000' && c <= '\u200a';
}
}

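A switch statement decides whether the character matches. Note that '\u2007' is explicitly excluded even though it lies in the range '\u2000' to '\u200a' that the default branch accepts.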

toString()
@Override
public String toString() {
return "CharMatcher.breakingWhitespace()";
}

Returns the fixed string CharMatcher.breakingWhitespace().

ascii()

/**
* Determines whether a character is ASCII, meaning that its code point is less than 128.
*
* @since 19.0 (since 1.0 as constant {@code ASCII})
*/
public static CharMatcher ascii() {
return Ascii.INSTANCE;
}

Returns the Ascii instance. It determines whether a character is an ASCII character.

Ascii implementation class

Main fields

static final Ascii INSTANCE = new Ascii();

The singleton Ascii instance.

Main methods

Ascii()
Ascii() {
super("CharMatcher.ascii()");
}

Constructor; it calls the superclass constructor directly.

matches(char)
@Override
public boolean matches(char c) {
return c <= '\u007f';
}

Checks whether the character is less than or equal to 127 ('\u007f').

digit()

/**
* Determines whether a character is a BMP digit according to <a
* href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp%7Bdigit%7D">Unicode</a>. If
* you only care to match ASCII digits, you can use {@code inRange('0', '9')}.
*
* @deprecated Many digits are supplementary characters; see the class documentation.
* @since 19.0 (since 1.0 as constant {@code DIGIT})
*/
@Deprecated
public static CharMatcher digit() {
return Digit.INSTANCE;
}

Returns the Digit instance. It determines whether a character is a BMP digit according to Unicode.

Digit implementation class

Main fields

// Must be in ascending order.
private static final String ZEROES =
"0\u0660\u06f0\u07c0\u0966\u09e6\u0a66\u0ae6\u0b66\u0be6\u0c66\u0ce6\u0d66\u0de6"
+ "\u0e50\u0ed0\u0f20\u1040\u1090\u17e0\u1810\u1946\u19d0\u1a80\u1a90\u1b50\u1bb0"
+ "\u1c40\u1c50\ua620\ua8d0\ua900\ua9d0\ua9f0\uaa50\uabf0\uff10";

static final Digit INSTANCE = new Digit();

ZEROES holds the zero character of each decimal digit set that Unicode defines in the BMP, in ascending order.

Main methods

zeroes()
private static char[] zeroes() {
return ZEROES.toCharArray();
}

Returns the zero characters as an array.

nines()
private static char[] nines() {
char[] nines = new char[ZEROES.length()];
for (int i = 0; i < ZEROES.length(); i++) {
nines[i] = (char) (ZEROES.charAt(i) + 9);
}
return nines;
}

Returns the array of the corresponding nine characters, each computed as the zero character plus 9.

Digit()
private Digit() {
super("CharMatcher.digit()", zeroes(), nines());
}

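Calls the superclass constructor, passing the lower bounds (the zeroes) and the upper bounds (the nines) of the digit ranges.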
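How a pair of start and end arrays can answer a matches query is sketched below. This is an illustration under stated assumptions rather than the Guava superclass itself: RangesSketch is a hypothetical name, the binary-search lookup is one plausible way to use sorted range starts, and the zero list is shortened to four entries.

```java
import java.util.Arrays;

final class RangesSketch {
    private final char[] rangeStarts; // ascending lower bounds
    private final char[] rangeEnds;   // inclusive upper bounds, same order

    RangesSketch(char[] rangeStarts, char[] rangeEnds) {
        this.rangeStarts = rangeStarts;
        this.rangeEnds = rangeEnds;
    }

    boolean matches(char c) {
        int index = Arrays.binarySearch(rangeStarts, c);
        if (index >= 0) {
            return true; // c is exactly the start of a range
        }
        // Not found: ~index is the insertion point, so the range that
        // could contain c is the one just before it.
        index = ~index - 1;
        return index >= 0 && c <= rangeEnds[index];
    }

    // Builds a matcher from a shortened zero list, deriving each end as zero + 9.
    static RangesSketch digits() {
        char[] zeroes = "0\u0660\u06f0\uff10".toCharArray();
        char[] nines = new char[zeroes.length];
        for (int i = 0; i < zeroes.length; i++) {
            nines[i] = (char) (zeroes[i] + 9);
        }
        return new RangesSketch(zeroes, nines);
    }
}
```

This also shows why the comment on ZEROES insists on ascending order: a binary search over the range starts only works on sorted input.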

javaDigit()

/**
* Determines whether a character is a BMP digit according to {@linkplain Character#isDigit(char)
* Java's definition}. If you only care to match ASCII digits, you can use {@code inRange('0',
* '9')}.
*
* @deprecated Many digits are supplementary characters; see the class documentation.
* @since 19.0 (since 1.0 as constant {@code JAVA_DIGIT})
*/
@Deprecated
public static CharMatcher javaDigit() {
return JavaDigit.INSTANCE;
}

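Returns the JavaDigit instance. It determines whether a character is a digit according to Java's definition.

JavaDigit implementation class

Main fields
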
static final JavaDigit INSTANCE = new JavaDigit();

Main methods

matches(char)
@Override
public boolean matches(char c) {
return Character.isDigit(c);
}

Delegates directly to Character.isDigit.

toString()
@Override
public String toString() {
return "CharMatcher.javaDigit()";
}

javaLetter()

/**
* Determines whether a character is a BMP letter according to {@linkplain
* Character#isLetter(char) Java's definition}. If you only care to match letters of the Latin
* alphabet, you can use {@code inRange('a', 'z').or(inRange('A', 'Z'))}.
*
* @deprecated Most letters are supplementary characters; see the class documentation.
* @since 19.0 (since 1.0 as constant {@code JAVA_LETTER})
*/
@Deprecated
public static CharMatcher javaLetter() {
return JavaLetter.INSTANCE;
}

Returns the JavaLetter instance. It determines whether a BMP character is a letter according to Java's definition.

JavaLetter implementation class

Main fields

static final JavaLetter INSTANCE = new JavaLetter();

Main methods

matches(char)
@Override
public boolean matches(char c) {
return Character.isLetter(c);
}

Delegates directly to Java's Character.isLetter.

toString()
@Override
public String toString() {
return "CharMatcher.javaLetter()";
}

javaLetterOrDigit()

/**
* Determines whether a character is a BMP letter or digit according to {@linkplain
* Character#isLetterOrDigit(char) Java's definition}.
*
* @deprecated Most letters and digits are supplementary characters; see the class documentation.
* @since 19.0 (since 1.0 as constant {@code JAVA_LETTER_OR_DIGIT}).
*/
@Deprecated
public static CharMatcher javaLetterOrDigit() {
return JavaLetterOrDigit.INSTANCE;
}

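Returns the JavaLetterOrDigit instance. It determines whether a character is a letter or digit according to Java's definition.

JavaLetterOrDigit implementation class

Main fields
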
static final JavaLetterOrDigit INSTANCE = new JavaLetterOrDigit();

Main methods

matches(char)
@Override
public boolean matches(char c) {
return Character.isLetterOrDigit(c);
}

Delegates directly to Java's Character.isLetterOrDigit.

toString()
@Override
public String toString() {
return "CharMatcher.javaLetterOrDigit()";
}

javaUpperCase()

/**
* Determines whether a BMP character is upper case according to {@linkplain
* Character#isUpperCase(char) Java's definition}.
*
* @deprecated Some uppercase characters are supplementary characters; see the class
* documentation.
* @since 19.0 (since 1.0 as constant {@code JAVA_UPPER_CASE})
*/
@Deprecated
public static CharMatcher javaUpperCase() {
return JavaUpperCase.INSTANCE;
}

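Returns the JavaUpperCase instance. It determines whether the given character is upper case according to Java's definition.

JavaUpperCase implementation class

Main fields
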
static final JavaUpperCase INSTANCE = new JavaUpperCase();

Main methods

matches(char)
@Override
public boolean matches(char c) {
return Character.isUpperCase(c);
}

Delegates to Java's Character.isUpperCase.

toString()
@Override
public String toString() {
return "CharMatcher.javaUpperCase()";
}

javaLowerCase()

/**
* Determines whether a BMP character is lower case according to {@linkplain
* Character#isLowerCase(char) Java's definition}.
*
* @deprecated Some lowercase characters are supplementary characters; see the class
* documentation.
* @since 19.0 (since 1.0 as constant {@code JAVA_LOWER_CASE})
*/
@Deprecated
public static CharMatcher javaLowerCase() {
return JavaLowerCase.INSTANCE;
}

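Returns the JavaLowerCase instance. It determines whether the given character is lower case according to Java's definition.

JavaLowerCase implementation class

Main fields
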
static final JavaLowerCase INSTANCE = new JavaLowerCase();

Main methods

matches(char)
@Override
public boolean matches(char c) {
return Character.isLowerCase(c);
}

Delegates to Java's Character.isLowerCase.

toString()
@Override
public String toString() {
return "CharMatcher.javaLowerCase()";
}

javaIsoControl()

/**
* Determines whether a character is an ISO control character as specified by {@link
* Character#isISOControl(char)}.
*
* <p>All ISO control codes are on the BMP and thus supported by this API.
*
* @since 19.0 (since 1.0 as constant {@code JAVA_ISO_CONTROL})
*/
public static CharMatcher javaIsoControl() {
return JavaIsoControl.INSTANCE;
}

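Returns the JavaIsoControl instance. It determines whether the given character is an ISO control character as defined by Java.

JavaIsoControl implementation class

Main fields
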
static final JavaIsoControl INSTANCE = new JavaIsoControl();

Main methods

matches(char)
@Override
public boolean matches(char c) {
return c <= '\u001f' || (c >= '\u007f' && c <= '\u009f');
}

Checks the two control-character ranges directly: '\u0000' to '\u001f' and '\u007f' to '\u009f'.
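A quick way to convince yourself that this range test agrees with the JDK is to compare it against Character.isISOControl for every char value. The sketch below does that; IsoControlSketch and disagreements are names invented for the illustration.

```java
final class IsoControlSketch {
    // Same range test as the matches method above.
    static boolean matches(char c) {
        return c <= '\u001f' || (c >= '\u007f' && c <= '\u009f');
    }

    // Counts the chars on which the range test and the JDK method disagree.
    static int disagreements() {
        int count = 0;
        // An int loop variable avoids char overflow at Character.MAX_VALUE.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (matches((char) c) != Character.isISOControl((char) c)) {
                count++;
            }
        }
        return count;
    }
}
```

If the two definitions are equivalent, disagreements() returns 0, which is what the documented definition of Character.isISOControl implies.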

JavaIsoControl()
private JavaIsoControl() {
super("CharMatcher.javaIsoControl()");
}

invisible()

/**
* Determines whether a character is invisible; that is, if its Unicode category is any of
* SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and
* PRIVATE_USE according to ICU4J.
*
* <p>See also the Unicode Default_Ignorable_Code_Point property (available via ICU).
*
* @deprecated Most invisible characters are supplementary characters; see the class
* documentation.
* @since 19.0 (since 1.0 as constant {@code INVISIBLE})
*/
@Deprecated
public static CharMatcher invisible() {
return Invisible.INSTANCE;
}

Returns the Invisible instance. It determines whether the given character is invisible.

Invisible implementation class

Main fields

// Plug the following UnicodeSet pattern into
// https://unicode.org/cldr/utility/list-unicodeset.jsp
// [[[:Zs:][:Zl:][:Zp:][:Cc:][:Cf:][:Cs:][:Co:]]&[\u0000-\uFFFF]]
// with the "Abbreviate" option, and get the ranges from there.
private static final String RANGE_STARTS =
"\u0000\u007f\u00ad\u0600\u061c\u06dd\u070f\u08e2\u1680\u180e\u2000\u2028\u205f\u2066"
+ "\u3000\ud800\ufeff\ufff9";
private static final String RANGE_ENDS = // inclusive ends
"\u0020\u00a0\u00ad\u0605\u061c\u06dd\u070f\u08e2\u1680\u180e\u200f\u202f\u2064\u206f"
+ "\u3000\uf8ff\ufeff\ufffb";

static final Invisible INSTANCE = new Invisible();

Main methods

Invisible()
private Invisible() {
super("CharMatcher.invisible()", RANGE_STARTS.toCharArray(), RANGE_ENDS.toCharArray());
}

Calls the constructor of the superclass RangesMatcher, passing the lower bounds and the inclusive upper bounds of the ranges.

singleWidth()

/**
* Determines whether a character is single-width (not double-width). When in doubt, this matcher
* errs on the side of returning {@code false} (that is, it tends to assume a character is
* double-width).
*
* <p><b>Note:</b> as the reference file evolves, we will modify this matcher to keep it up to
* date.
*
* <p>See also <a href="http://www.unicode.org/reports/tr11/">UAX #11 East Asian Width</a>.
*
* @deprecated Many such characters are supplementary characters; see the class documentation.
* @since 19.0 (since 1.0 as constant {@code SINGLE_WIDTH})
*/
@Deprecated
public static CharMatcher singleWidth() {
return SingleWidth.INSTANCE;
}

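Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false, that is, it tends to assume that a character is double-width.

SingleWidth implementation class

Main fields
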
static final SingleWidth INSTANCE = new SingleWidth();

Main methods

SingleWidth()
private SingleWidth() {
super(
"CharMatcher.singleWidth()",
"\u0000\u05be\u05d0\u05f3\u0600\u0750\u0e00\u1e00\u2100\ufb50\ufe70\uff61".toCharArray(),
"\u04f9\u05be\u05ea\u05f4\u06ff\u077f\u0e7f\u20af\u213a\ufdff\ufeff\uffdc".toCharArray());
}

Passes the lower and upper bounds of the ranges to the superclass constructor.

is(final char)

/** Returns a {@code char} matcher that matches only one specified BMP character. */
public static CharMatcher is(final char match) {
return new Is(match);
}

Returns an Is instance that matches only the specified character.

Is implementation class

Main fields

private final char match;

The character to match.

Main methods

Is(char)
Is(char match) {
this.match = match;
}

Constructor; it stores the character to match.

matches(char)
@Override
public boolean matches(char c) {
return c == match;
}

Returns whether the given character equals the character held by this Is.

replaceFrom(CharSequence, char)
@Override
public String replaceFrom(CharSequence sequence, char replacement) {
return sequence.toString().replace(match, replacement);
}

Replaces every occurrence of match in the sequence with replacement.

and(CharMatcher)
@Override
public CharMatcher and(CharMatcher other) {
return other.matches(match) ? this : none();
}

If other matches match, it returns this matcher; otherwise it returns none(), because the intersection is empty.

or(CharMatcher)
@Override
public CharMatcher or(CharMatcher other) {
return other.matches(match) ? other : super.or(other);
}

If other already matches match, it returns other as a shortcut; otherwise it falls back to super.or(other).

negate()
@Override
public CharMatcher negate() {
return isNot(match);
}

Negation: returns isNot(match).

setBits(BitSet)
@GwtIncompatible // used only from other GwtIncompatible code
@Override
void setBits(BitSet table) {
table.set(match);
}

Sets the bit at the index of match to 1.

toString()
@Override
public String toString() {
return "CharMatcher.is('" + showCharacter(match) + "')";
}

isNot(final char)

/**
* Returns a {@code char} matcher that matches any character except the BMP character specified.
*
* <p>To negate another {@code CharMatcher}, use {@link #negate()}.
*/
public static CharMatcher isNot(final char match) {
return new IsNot(match);
}

Returns a CharMatcher that matches every character except the given one.

IsNot implementation class

Main fields

private final char match;

The character to exclude. IsNot matches every character that is not match.

Main methods

IsNot(char)
IsNot(char match) {
this.match = match;
}

Constructor.

matches(char)
@Override
public boolean matches(char c) {
return c != match;
}

Returns whether the given character differs from the character held by this IsNot.

and(CharMatcher)
@Override
public CharMatcher and(CharMatcher other) {
return other.matches(match) ? super.and(other) : other;
}

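If other matches match, it falls back to the and method of the superclass; otherwise other never matches match, so other alone is already the intersection and is returned.

or(CharMatcher)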
@Override
public CharMatcher or(CharMatcher other) {
return other.matches(match) ? any() : this;
}

If other matches match, the union covers every character, so it returns any(); otherwise it returns this.

negate()
@Override
public CharMatcher negate() {
return is(match);
}

Returns is(match).

setBits(BitSet table)
@GwtIncompatible // used only from other GwtIncompatible code
@Override
void setBits(BitSet table) {
table.set(0, match);
table.set(match + 1, Character.MAX_VALUE + 1);
}

Sets every bit in the char range except the one at the index of match.

toString()
@Override
public String toString() {
return "CharMatcher.isNot('" + showCharacter(match) + "')";
}

anyOf(final CharSequence)

/**
* Returns a {@code char} matcher that matches any BMP character present in the given character
* sequence. Returns a bogus matcher if the sequence contains supplementary characters.
*/
public static CharMatcher anyOf(final CharSequence sequence) {
switch (sequence.length()) {
case 0:
return none();
case 1:
return is(sequence.charAt(0));
case 2:
return isEither(sequence.charAt(0), sequence.charAt(1));
default:
// TODO(lowasser): is it potentially worth just going ahead and building a precomputed
// matcher?
return new AnyOf(sequence);
}
}

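Returns a CharMatcher that determines whether a given character equals any of the characters in sequence.

If the length of sequence is 0, it returns none().

If the length is 1, it returns is() for the first character of sequence.

If the length is 2, it returns isEither() for the first and second characters of sequence.

Otherwise it creates a new AnyOf object.

AnyOf implementation class

Main fields
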
private final char[] chars;

Holds the characters to match.

Main methods

AnyOf(CharSequence)
public AnyOf(CharSequence chars) {
this.chars = chars.toString().toCharArray();
Arrays.sort(this.chars);
}

Constructor; it converts the given sequence to a char array and sorts it.

The array is sorted so that matches can later use binary search.

matches(char)
@Override
public boolean matches(char c) {
return Arrays.binarySearch(chars, c) >= 0;
}

Looks up the character with binary search; a non-negative result means it was found.
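The sort-then-search idea is easy to try in isolation. The following minimal sketch mirrors the constructor and matches method shown above; the class name AnyOfSketch is made up, and everything else uses only java.util.Arrays.

```java
import java.util.Arrays;

final class AnyOfSketch {
    private final char[] chars;

    AnyOfSketch(CharSequence sequence) {
        this.chars = sequence.toString().toCharArray();
        // Arrays.binarySearch is only defined for sorted input.
        Arrays.sort(this.chars);
    }

    boolean matches(char c) {
        // binarySearch returns a negative value when c is absent.
        return Arrays.binarySearch(chars, c) >= 0;
    }
}
```

For example, new AnyOfSketch("zebra") stores the sorted array a, b, e, r, z, so a lookup costs O(log n) comparisons instead of a linear scan over the original string.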

setBits(BitSet)
@Override
@GwtIncompatible // used only from other GwtIncompatible code
void setBits(BitSet table) {
for (char c : chars) {
table.set(c);
}
}

toString()
@Override
public String toString() {
StringBuilder description = new StringBuilder("CharMatcher.anyOf(\"");
for (char c : chars) {
description.append(showCharacter(c));
}
description.append("\")");
return description.toString();
}

noneOf(CharSequence)

/**
* Returns a {@code char} matcher that matches any BMP character not present in the given
* character sequence. Returns a bogus matcher if the sequence contains supplementary characters.
*/
public static CharMatcher noneOf(CharSequence sequence) {
return anyOf(sequence).negate();
}

Returns the negation of anyOf(sequence).

inRange(final char, final char)

/**
* Returns a {@code char} matcher that matches any character in a given BMP range (both endpoints
* are inclusive). For example, to match any lowercase letter of the English alphabet, use {@code
* CharMatcher.inRange('a', 'z')}.
*
* @throws IllegalArgumentException if {@code endInclusive < startInclusive}
*/
public static CharMatcher inRange(final char startInclusive, final char endInclusive) {
return new InRange(startInclusive, endInclusive);
}

Creates an InRange object with the given lower and upper bounds.

InRange implementation class

Main fields

private final char startInclusive;
private final char endInclusive;

The lower and upper bounds of a closed interval.

Main methods

InRange(char, char)
InRange(char startInclusive, char endInclusive) {
checkArgument(endInclusive >= startInclusive);
this.startInclusive = startInclusive;
this.endInclusive = endInclusive;
}

Constructor; the lower bound must be less than or equal to the upper bound.

matches(char)
@Override
public boolean matches(char c) {
return startInclusive <= c && c <= endInclusive;
}

Returns whether the character lies within the given bounds.

setBits(BitSet)
@GwtIncompatible // used only from other GwtIncompatible code
@Override
void setBits(BitSet table) {
table.set(startInclusive, endInclusive + 1);
}

toString()
@Override
public String toString() {
return "CharMatcher.inRange('"
+ showCharacter(startInclusive)
+ "', '"
+ showCharacter(endInclusive)
+ "')";
}

forPredicate(Predicate)

/**
* Returns a matcher with identical behavior to the given {@link Character}-based predicate, but
* which operates on primitive {@code char} instances instead.
*/
public static CharMatcher forPredicate(final Predicate<? super Character> predicate) {
return predicate instanceof CharMatcher ? (CharMatcher) predicate : new ForPredicate(predicate);
}

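Returns a CharMatcher that behaves like the given predicate. If the predicate is already a CharMatcher, it is returned as is.

ForPredicate implementation class

Main fields
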
private final Predicate<? super Character> predicate;

The wrapped predicate.

Main methods

ForPredicate(Predicate)
ForPredicate(Predicate<? super Character> predicate) {
this.predicate = checkNotNull(predicate);
}

Constructor; it rejects a null predicate.

matches(char)
@Override
public boolean matches(char c) {
return predicate.apply(c);
}

Returns the result of calling the predicate's apply method on the character.

apply(Character)
@SuppressWarnings("deprecation") // intentional; deprecation is for callers primarily
@Override
public boolean apply(Character character) {
return predicate.apply(checkNotNull(character));
}

Returns the result of calling the predicate's apply method on the character, after checking that it is not null.

toString()
@Override
public String toString() {
return "CharMatcher.forPredicate(" + predicate + ")";
}

CharMatcher()

/**
* Constructor for use by subclasses. When subclassing, you may want to override {@code
* toString()} to provide a useful description.
*/
protected CharMatcher() {}

Constructor.

matches(char)

/** Determines a true or false value for the given character. */
public abstract boolean matches(char c);

Returns whether this CharMatcher matches the character c.

negate()

/** Returns a matcher that matches any character not matched by this matcher. */
// @Override under Java 8 but not under Java 7
@Override
public CharMatcher negate() {
return new Negated(this);
}

Returns a CharMatcher that matches every character this matcher does not match.

and(CharMatcher)

/**
* Returns a matcher that matches any character matched by both this matcher and {@code other}.
*/
public CharMatcher and(CharMatcher other) {
return new And(this, other);
}

Returns a CharMatcher that matches the characters matched by both matchers (logical AND).

or(CharMatcher)

/**
* Returns a matcher that matches any character matched by either this matcher or {@code other}.
*/
public CharMatcher or(CharMatcher other) {
return new Or(this, other);
}

Returns a CharMatcher that matches the union of the characters matched by the two matchers (logical OR).

precomputed()

/**
* Returns a {@code char} matcher functionally equivalent to this one, but which may be faster to
* query than the original; your mileage may vary. Precomputation takes time and is likely to be
* worthwhile only if the precomputed matcher is queried many thousands of times.
*
* <p>This method has no effect (returns {@code this}) when called in GWT: it's unclear whether a
* precomputed matcher is faster, but it certainly consumes more memory, which doesn't seem like a
* worthwhile tradeoff in a browser.
*/
public CharMatcher precomputed() {
return Platform.precomputeCharMatcher(this);
}

Returns a CharMatcher that is functionally equivalent to this one but may be faster to query.

precomputedInternal()

/**
* This is the actual implementation of {@link #precomputed}, but we bounce calls through a method
* on {@link Platform} so that we can have different behavior in GWT.
*
* <p>This implementation tries to be smart in a number of ways. It recognizes cases where the
* negation is cheaper to precompute than the matcher itself; it tries to build small hash tables
* for matchers that only match a few characters, and so on. In the worst-case scenario, it
* constructs an eight-kilobyte bit array and queries that. In many situations this produces a
* matcher which is faster to query than the original.
*/
@GwtIncompatible // SmallCharMatcher
CharMatcher precomputedInternal() {
final BitSet table = new BitSet();
setBits(table);
int totalCharacters = table.cardinality();
if (totalCharacters * 2 <= DISTINCT_CHARS) {
return precomputedPositive(totalCharacters, table, toString());
} else {
// TODO(lowasser): is it worth it to worry about the last character of large matchers?
table.flip(Character.MIN_VALUE, Character.MAX_VALUE + 1);
int negatedCharacters = DISTINCT_CHARS - totalCharacters;
String suffix = ".negate()";
final String description = toString();
String negatedDescription =
description.endsWith(suffix)
? description.substring(0, description.length() - suffix.length())
: description + suffix;
return new NegatedFastMatcher(
precomputedPositive(negatedCharacters, table, negatedDescription)) {
@Override
public String toString() {
return description;
}
};
}
}
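The branch on totalCharacters * 2 <= DISTINCT_CHARS decides whether to precompute the matcher itself or its negation. The following sketch replays that decision for a matcher like isNot('a') using only java.util.BitSet; PrecomputeSketch and its method names are hypothetical, and the small-table and bit-set matcher construction is left out.

```java
import java.util.BitSet;

final class PrecomputeSketch {
    // 65536 distinct char values, as in the DISTINCT_CHARS field above.
    static final int DISTINCT_CHARS = Character.MAX_VALUE - Character.MIN_VALUE + 1;

    // Bit table for a matcher that accepts everything except 'excluded'.
    static BitSet tableForIsNot(char excluded) {
        BitSet table = new BitSet();
        table.set(0, excluded);                           // bits [0, excluded)
        table.set(excluded + 1, Character.MAX_VALUE + 1); // bits (excluded, 65535]
        return table;
    }

    // True when more than half of all chars match, so the negation is smaller.
    static boolean negationIsCheaper(BitSet table) {
        return table.cardinality() * 2 > DISTINCT_CHARS;
    }

    // Flips every bit in the char range, giving the table of the negated matcher.
    static BitSet negatedTable(BitSet table) {
        BitSet copy = (BitSet) table.clone();
        copy.flip(Character.MIN_VALUE, Character.MAX_VALUE + 1);
        return copy;
    }
}
```

For tableForIsNot('a') the table has 65535 bits set, so the negation is the cheaper side: after the flip only the bit for 'a' remains, and a single-character matcher wrapped in a negation is enough.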

precomputedPositive

/**
* Helper method for {@link #precomputedInternal} that doesn't test if the negation is cheaper.
*/
@GwtIncompatible // SmallCharMatcher
private static CharMatcher precomputedPositive(
int totalCharacters, BitSet table, String description) {
switch (totalCharacters) {
case 0:
return none();
case 1:
return is((char) table.nextSetBit(0));
case 2:
char c1 = (char) table.nextSetBit(0);
char c2 = (char) table.nextSetBit(c1 + 1);
return isEither(c1, c2);
default:
return isSmall(totalCharacters, table.length())
? SmallCharMatcher.from(table, description)
: new BitSetMatcher(table, description);
}
}

isSmall(int, int)

@GwtIncompatible // SmallCharMatcher
private static boolean isSmall(int totalCharacters, int tableLength) {
return totalCharacters <= SmallCharMatcher.MAX_SIZE
&& tableLength > (totalCharacters * 4 * Character.SIZE);
// err on the side of BitSetMatcher
}

setBits(BitSet)

/** Sets bits in {@code table} matched by this matcher. */
@GwtIncompatible // used only from other GwtIncompatible code
void setBits(BitSet table) {
for (int c = Character.MAX_VALUE; c >= Character.MIN_VALUE; c--) {
if (matches((char) c)) {
table.set(c);
}
}
}

matchesAnyOf(CharSequence)

/**
 * Returns {@code true} if a character sequence contains at least one matching BMP character.
 * Equivalent to {@code !matchesNoneOf(sequence)}.
 *
 * <p>The default implementation iterates over the sequence, invoking {@link #matches} for each
 * character, until this returns {@code true} or the end is reached.
 *
 * @param sequence the character sequence to examine, possibly empty
 * @return {@code true} if this matcher matches at least one character in the sequence
 * @since 8.0
 */
public boolean matchesAnyOf(CharSequence sequence) {
  return !matchesNoneOf(sequence);
}

Returns whether this CharMatcher matches at least one character in the given sequence. It simply negates matchesNoneOf.

matchesAllOf(CharSequence)

/**
 * Returns {@code true} if a character sequence contains only matching BMP characters.
 *
 * <p>The default implementation iterates over the sequence, invoking {@link #matches} for each
 * character, until this returns {@code false} or the end is reached.
 *
 * @param sequence the character sequence to examine, possibly empty
 * @return {@code true} if this matcher matches every character in the sequence, including when
 *     the sequence is empty
 */
public boolean matchesAllOf(CharSequence sequence) {
  for (int i = sequence.length() - 1; i >= 0; i--) {
    if (!matches(sequence.charAt(i))) {
      return false;
    }
  }
  return true;
}

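Returns whether this CharMatcher matches every character in the given sequence. An empty sequence yields true, because the loop never encounters a non-matching character.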

matchesNoneOf(CharSequence)

/**
 * Returns {@code true} if a character sequence contains no matching BMP characters. Equivalent to
 * {@code !matchesAnyOf(sequence)}.
 *
 * <p>The default implementation iterates over the sequence, invoking {@link #matches} for each
 * character, until this returns {@code true} or the end is reached.
 *
 * @param sequence the character sequence to examine, possibly empty
 * @return {@code true} if this matcher matches no characters in the sequence, including when the
 *     sequence is empty
 */
public boolean matchesNoneOf(CharSequence sequence) {
  return indexIn(sequence) == -1;
}

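Returns whether this CharMatcher matches none of the characters in the given sequence. It delegates to indexIn and checks for a result of -1.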
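The relationship between the three queries can be tried out without Guava. The following is a minimal sketch, assuming a plain java.util.function.IntPredicate as a stand-in for CharMatcher.matches; the class name MatchQueriesSketch and the matcher parameter are hypothetical and not part of Guava.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in for CharMatcher: the matcher is a predicate over UTF-16 code units. */
final class MatchQueriesSketch {
  /** Index of the first matching char, or -1; plays the role of indexIn(sequence). */
  static int indexIn(IntPredicate matcher, CharSequence sequence) {
    for (int i = 0; i < sequence.length(); i++) {
      if (matcher.test(sequence.charAt(i))) {
        return i;
      }
    }
    return -1;
  }

  /** No match exists exactly when the forward search finds nothing. */
  static boolean matchesNoneOf(IntPredicate matcher, CharSequence sequence) {
    return indexIn(matcher, sequence) == -1;
  }

  /** Defined purely as the negation of matchesNoneOf. */
  static boolean matchesAnyOf(IntPredicate matcher, CharSequence sequence) {
    return !matchesNoneOf(matcher, sequence);
  }

  /** Scans from the end and fails fast on the first non-matching char. */
  static boolean matchesAllOf(IntPredicate matcher, CharSequence sequence) {
    for (int i = sequence.length() - 1; i >= 0; i--) {
      if (!matcher.test(sequence.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    IntPredicate digit = c -> c >= '0' && c <= '9';
    System.out.println(matchesAnyOf(digit, "abc1"));  // true
    System.out.println(matchesAllOf(digit, "abc1"));  // false
    System.out.println(matchesNoneOf(digit, ""));     // true
    System.out.println(matchesAllOf(digit, ""));      // true
  }
}
```

Note the empty-sequence case: both matchesAllOf and matchesNoneOf return true, while matchesAnyOf returns false.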

indexIn(CharSequence)

/**
 * Returns the index of the first matching BMP character in a character sequence, or {@code -1} if
 * no matching character is present.
 *
 * <p>The default implementation iterates over the sequence in forward order calling {@link
 * #matches} for each character.
 *
 * @param sequence the character sequence to examine from the beginning
 * @return an index, or {@code -1} if no character matches
 */
public int indexIn(CharSequence sequence) {
  return indexIn(sequence, 0);
}

Finds the index of the first character in the given sequence that matches this CharMatcher, or -1 if no character matches.

indexIn(CharSequence, int)

/**
 * Returns the index of the first matching BMP character in a character sequence, starting from a
 * given position, or {@code -1} if no character matches after that position.
 *
 * <p>The default implementation iterates over the sequence in forward order, beginning at {@code
 * start}, calling {@link #matches} for each character.
 *
 * @param sequence the character sequence to examine
 * @param start the first index to examine; must be nonnegative and no greater than {@code
 *     sequence.length()}
 * @return the index of the first matching character, guaranteed to be no less than {@code start},
 *     or {@code -1} if no character matches
 * @throws IndexOutOfBoundsException if start is negative or greater than {@code
 *     sequence.length()}
 */
public int indexIn(CharSequence sequence, int start) {
  int length = sequence.length();
  checkPositionIndex(start, length);
  for (int i = start; i < length; i++) {
    if (matches(sequence.charAt(i))) {
      return i;
    }
  }
  return -1;
}

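Finds the index of the first matching character at or after position start, or -1 if no character matches. checkPositionIndex rejects a start that is negative or greater than the sequence length; a start equal to the length is allowed and always yields -1.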
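Because start may equal the sequence length, the two-argument form can be called in a loop to enumerate every match. A minimal sketch of that pattern follows, assuming an IntPredicate stand-in for the matcher; IndexInSketch and allIndexes are hypothetical names, and the explicit bounds check approximates what checkPositionIndex does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch of indexIn(sequence, start) and a loop built on top of it. */
final class IndexInSketch {
  static int indexIn(IntPredicate matcher, CharSequence sequence, int start) {
    int length = sequence.length();
    // Approximates checkPositionIndex: start == length is legal, start > length is not.
    if (start < 0 || start > length) {
      throw new IndexOutOfBoundsException("start: " + start + ", length: " + length);
    }
    for (int i = start; i < length; i++) {
      if (matcher.test(sequence.charAt(i))) {
        return i;
      }
    }
    return -1;
  }

  /** Collects the index of every matching char by resuming one past the previous hit. */
  static List<Integer> allIndexes(IntPredicate matcher, CharSequence sequence) {
    List<Integer> result = new ArrayList<>();
    int pos = indexIn(matcher, sequence, 0);
    while (pos != -1) {
      result.add(pos);
      // pos is a valid index, so pos + 1 <= length and the bounds check passes.
      pos = indexIn(matcher, sequence, pos + 1);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(allIndexes(c -> c == 'a', "bazaar")); // [1, 3, 4]
  }
}
```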

lastIndexIn(CharSequence)

/**
 * Returns the index of the last matching BMP character in a character sequence, or {@code -1} if
 * no matching character is present.
 *
 * <p>The default implementation iterates over the sequence in reverse order calling {@link
 * #matches} for each character.
 *
 * @param sequence the character sequence to examine from the end
 * @return an index, or {@code -1} if no character matches
 */
public int lastIndexIn(CharSequence sequence) {
  for (int i = sequence.length() - 1; i >= 0; i--) {
    if (matches(sequence.charAt(i))) {
      return i;
    }
  }
  return -1;
}

Finds the index of the last matching character by scanning the sequence from the end, or -1 if no character matches.

countIn(CharSequence)

/**
 * Returns the number of matching {@code char}s found in a character sequence.
 *
 * <p>Counts 2 per supplementary character, such as for {@link #whitespace}().{@link #negate}().
 */
public int countIn(CharSequence sequence) {
  int count = 0;
  for (int i = 0; i < sequence.length(); i++) {
    if (matches(sequence.charAt(i))) {
      count++;
    }
  }
  return count;
}

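Counts the matching chars in the sequence. The count is in UTF-16 code units, so a supplementary character whose two surrogates both match contributes 2.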
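The "counts 2 per supplementary character" remark in the Javadoc can be illustrated with a small sketch. It assumes an IntPredicate stand-in for the matcher and uses !Character.isWhitespace as a rough substitute for whitespace().negate(); Guava's own whitespace definition differs in detail, and CountInSketch is a hypothetical name.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch showing that countIn works on chars, not code points. */
final class CountInSketch {
  static int countIn(IntPredicate matcher, CharSequence sequence) {
    int count = 0;
    for (int i = 0; i < sequence.length(); i++) {
      if (matcher.test(sequence.charAt(i))) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Stand-in for whitespace().negate(); an assumption, not Guava's exact definition.
    IntPredicate notWhitespace = c -> !Character.isWhitespace((char) c);
    // Code point U+1F600 lies outside the BMP, so it is stored as two surrogate chars.
    String s = "a" + new String(Character.toChars(0x1F600));
    System.out.println(s.codePointCount(0, s.length())); // 2 code points
    System.out.println(countIn(notWhitespace, s));       // 3 matching chars
  }
}
```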

removeFrom(CharSequence)

/**
 * Returns a string containing all non-matching characters of a character sequence, in order. For
 * example:
 *
 * <pre>{@code
 * CharMatcher.is('a').removeFrom("bazaar")
 * }</pre>
 *
 * ... returns {@code "bzr"}.
 */
public String removeFrom(CharSequence sequence) {
  String string = sequence.toString();
  int pos = indexIn(string);
  if (pos == -1) {
    return string;
  }

  char[] chars = string.toCharArray();
  int spread = 1;

  // This unusual loop comes from extensive benchmarking
  OUT:
  while (true) {
    pos++;
    while (true) {
      if (pos == chars.length) {
        break OUT;
      }
      if (matches(chars[pos])) {
        break;
      }
      chars[pos - spread] = chars[pos];
      pos++;
    }
    spread++;
  }
  return new String(chars, 0, pos - spread);
}

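Removes every matching character from the given sequence. If nothing matches, the original string is returned as is. Otherwise the loop compacts the char array in place: spread is the number of matches seen so far, and each later non-matching character is shifted left by that amount.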
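The in-place compaction is easier to follow in its plain two-index form. The sketch below is a simplified equivalent, not the benchmark-tuned loop from Guava: write plays the role of pos - spread, and the difference read - write equals the number of matches seen so far. The IntPredicate matcher and the class name RemoveFromSketch are assumptions made for illustration.

```java
import java.util.function.IntPredicate;

/** Hypothetical two-index version of the in-place compaction used by removeFrom. */
final class RemoveFromSketch {
  static String removeFrom(IntPredicate matcher, CharSequence sequence) {
    char[] chars = sequence.toString().toCharArray();
    int write = 0; // next slot for a char that is kept
    for (int read = 0; read < chars.length; read++) {
      if (!matcher.test(chars[read])) {
        // Kept chars slide left over the slots freed by removed ones.
        chars[write++] = chars[read];
      }
    }
    return new String(chars, 0, write);
  }

  public static void main(String[] args) {
    System.out.println(removeFrom(c -> c == 'a', "bazaar")); // bzr
  }
}
```

Unlike the Guava method, this sketch always copies the input into a new array, even when nothing matches.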

retainFrom(CharSequence)

/**
 * Returns a string containing all matching BMP characters of a character sequence, in order. For
 * example:
 *
 * <pre>{@code
 * CharMatcher.is('a').retainFrom("bazaar")
 * }</pre>
 *
 * ... returns {@code "aaa"}.
 */
public String retainFrom(CharSequence sequence) {
  return negate().removeFrom(sequence);
}

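Keeps only the matching characters and removes everything else. It is implemented as negate().removeFrom(sequence), that is, removal with the inverted matcher.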
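The "retain is remove with the negated matcher" idea carries over directly to a predicate. A minimal sketch, assuming an IntPredicate stand-in whose built-in negate() plays the role of CharMatcher.negate(); RetainFromSketch is a hypothetical name.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch: retainFrom expressed through removeFrom and negation. */
final class RetainFromSketch {
  static String removeFrom(IntPredicate matcher, CharSequence sequence) {
    StringBuilder out = new StringBuilder(sequence.length());
    for (int i = 0; i < sequence.length(); i++) {
      char c = sequence.charAt(i);
      if (!matcher.test(c)) {
        out.append(c); // keep only chars the matcher rejects
      }
    }
    return out.toString();
  }

  static String retainFrom(IntPredicate matcher, CharSequence sequence) {
    // Removing everything the negated matcher accepts leaves exactly the matches.
    return removeFrom(matcher.negate(), sequence);
  }

  public static void main(String[] args) {
    System.out.println(retainFrom(c -> c == 'a', "bazaar")); // aaa
  }
}
```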

replaceFrom(CharSequence, char)

/**
 * Returns a string copy of the input character sequence, with each matching BMP character
 * replaced by a given replacement character. For example:
 *
 * <pre>{@code
 * CharMatcher.is('a').replaceFrom("radar", 'o')
 * }</pre>
 *
 * ... returns {@code "rodor"}.
 *
 * <p>The default implementation uses {@link #indexIn(CharSequence)} to find the first matching
 * character, then iterates the remainder of the sequence calling {@link #matches(char)} for each
 * character.
 *
 * @param sequence the character sequence to replace matching characters in
 * @param replacement the character to append to the result string in place of each matching
 *     character in {@code sequence}
 * @return the new string
 */
public String replaceFrom(CharSequence sequence, char replacement) {
  String string = sequence.toString();
  int pos = indexIn(string);
  if (pos == -1) {
    return string;
  }
  char[] chars = string.toCharArray();
  chars[pos] = replacement;
  for (int i = pos + 1; i < chars.length; i++) {
    if (matches(chars[i])) {
      chars[i] = replacement;
    }
  }
  return new String(chars);
}

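Replaces each matching character with the given replacement character. Because the length does not change, the replacement is done in place on a copy of the char array.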

replaceFrom(CharSequence, CharSequence)

/**
 * Returns a string copy of the input character sequence, with each matching BMP character
 * replaced by a given replacement sequence. For example:
 *
 * <pre>{@code
 * CharMatcher.is('a').replaceFrom("yaha", "oo")
 * }</pre>
 *
 * ... returns {@code "yoohoo"}.
 *
 * <p><b>Note:</b> If the replacement is a fixed string with only one character, you are better
 * off calling {@link #replaceFrom(CharSequence, char)} directly.
 *
 * @param sequence the character sequence to replace matching characters in
 * @param replacement the characters to append to the result string in place of each matching
 *     character in {@code sequence}
 * @return the new string
 */
public String replaceFrom(CharSequence sequence, CharSequence replacement) {
  int replacementLen = replacement.length();
  if (replacementLen == 0) {
    return removeFrom(sequence);
  }
  if (replacementLen == 1) {
    return replaceFrom(sequence, replacement.charAt(0));
  }

  String string = sequence.toString();
  int pos = indexIn(string);
  if (pos == -1) {
    return string;
  }

  int len = string.length();
  StringBuilder buf = new StringBuilder((len * 3 / 2) + 16);

  int oldpos = 0;
  do {
    buf.append(string, oldpos, pos);
    buf.append(replacement);
    oldpos = pos + 1;
    pos = indexIn(string, oldpos);
  } while (pos != -1);

  buf.append(string, oldpos, len);
  return buf.toString();
}

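Replaces each matching character with the given replacement sequence. An empty replacement delegates to removeFrom, and a one-character replacement delegates to replaceFrom(CharSequence, char); otherwise the result is assembled in a StringBuilder, copying the unmatched stretch between consecutive matches.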
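The do/while loop alternates between copying an unmatched stretch and appending the replacement. The following runnable sketch reproduces that loop under the assumption of an IntPredicate stand-in for the matcher; ReplaceFromSketch is a hypothetical name, and the short-replacement shortcuts of the real method are left out.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of the copy-then-replace loop in replaceFrom(CharSequence, CharSequence). */
final class ReplaceFromSketch {
  static int indexIn(IntPredicate matcher, String string, int start) {
    for (int i = start; i < string.length(); i++) {
      if (matcher.test(string.charAt(i))) {
        return i;
      }
    }
    return -1;
  }

  static String replaceFrom(IntPredicate matcher, CharSequence sequence, CharSequence replacement) {
    String string = sequence.toString();
    int pos = indexIn(matcher, string, 0);
    if (pos == -1) {
      return string; // nothing to replace
    }
    StringBuilder buf = new StringBuilder(string.length() * 3 / 2 + 16);
    int oldpos = 0;
    do {
      buf.append(string, oldpos, pos); // unmatched stretch before the match
      buf.append(replacement);         // the match itself is dropped
      oldpos = pos + 1;
      pos = indexIn(matcher, string, oldpos);
    } while (pos != -1);
    buf.append(string, oldpos, string.length()); // tail after the last match
    return buf.toString();
  }

  public static void main(String[] args) {
    System.out.println(replaceFrom(c -> c == 'a', "yaha", "oo")); // yoohoo
  }
}
```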

trimFrom(CharSequence)

/**
 * Returns a substring of the input character sequence that omits all matching BMP characters from
 * the beginning and from the end of the string. For example:
 *
 * <pre>{@code
 * CharMatcher.anyOf("ab").trimFrom("abacatbab")
 * }</pre>
 *
 * ... returns {@code "cat"}.
 *
 * <p>Note that:
 *
 * <pre>{@code
 * CharMatcher.inRange('\0', ' ').trimFrom(str)
 * }</pre>
 *
 * ... is equivalent to {@link String#trim()}.
 */
public String trimFrom(CharSequence sequence) {
  int len = sequence.length();
  int first;
  int last;

  for (first = 0; first < len; first++) {
    if (!matches(sequence.charAt(first))) {
      break;
    }
  }
  for (last = len - 1; last > first; last--) {
    if (!matches(sequence.charAt(last))) {
      break;
    }
  }

  return sequence.subSequence(first, last + 1).toString();
}

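Removes matching characters from both the beginning and the end of the sequence. Matching characters in the middle are kept.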
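The two scans can be sketched with a pair of indexes that move toward each other. This is a minimal illustration, assuming an IntPredicate stand-in for the matcher; TrimFromSketch is a hypothetical name, and the loops are written as while loops rather than the for/break form in the Guava source.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of trimming matching chars from both ends. */
final class TrimFromSketch {
  static String trimFrom(IntPredicate matcher, CharSequence sequence) {
    int first = 0;
    int last = sequence.length() - 1;
    // Skip the leading run of matching chars.
    while (first <= last && matcher.test(sequence.charAt(first))) {
      first++;
    }
    // Skip the trailing run, never crossing the first kept char.
    while (last > first && matcher.test(sequence.charAt(last))) {
      last--;
    }
    return sequence.subSequence(first, last + 1).toString();
  }

  public static void main(String[] args) {
    System.out.println(trimFrom(c -> c == 'a' || c == 'b', "abacatbab")); // cat
    System.out.println(trimFrom(c -> c <= ' ', "  hi  ").equals("  hi  ".trim())); // true
  }
}
```

The 'a' inside "cat" survives because only the outer runs are scanned; a sequence made up entirely of matching characters yields the empty string.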

trimLeadingFrom(CharSequence)

/**
 * Returns a substring of the input character sequence that omits all matching BMP characters from
 * the beginning of the string. For example:
 *
 * <pre>{@code
 * CharMatcher.anyOf("ab").trimLeadingFrom("abacatbab")
 * }</pre>
 *
 * ... returns {@code "catbab"}.
 */
public String trimLeadingFrom(CharSequence sequence) {
  int len = sequence.length();
  for (int first = 0; first < len; first++) {
    if (!matches(sequence.charAt(first))) {
      return sequence.subSequence(first, len).toString();
    }
  }
  return "";
}

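Removes matching characters from the beginning of the sequence, stopping at the first non-matching character. If every character matches, the result is the empty string.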

trimTrailingFrom(CharSequence)

/**
 * Returns a substring of the input character sequence that omits all matching BMP characters from
 * the end of the string. For example:
 *
 * <pre>{@code
 * CharMatcher.anyOf("ab").trimTrailingFrom("abacatbab")
 * }</pre>
 *
 * ... returns {@code "abacat"}.
 */
public String trimTrailingFrom(CharSequence sequence) {
  int len = sequence.length();
  for (int last = len - 1; last >= 0; last--) {
    if (!matches(sequence.charAt(last))) {
      return sequence.subSequence(0, last + 1).toString();
    }
  }
  return "";
}

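Removes matching characters from the end of the sequence, stopping at the first non-matching character found when scanning backward. If every character matches, the result is the empty string.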

collapseFrom(CharSequence, char)

/**
 * Returns a string copy of the input character sequence, with each group of consecutive matching
 * BMP characters replaced by a single replacement character. For example:
 *
 * <pre>{@code
 * CharMatcher.anyOf("eko").collapseFrom("bookkeeper", '-')
 * }</pre>
 *
 * ... returns {@code "b-p-r"}.
 *
 * <p>The default implementation uses {@link #indexIn(CharSequence)} to find the first matching
 * character, then iterates the remainder of the sequence calling {@link #matches(char)} for each
 * character.
 *
 * @param sequence the character sequence to replace matching groups of characters in
 * @param replacement the character to append to the result string in place of each group of
 *     matching characters in {@code sequence}
 * @return the new string
 */
public String collapseFrom(CharSequence sequence, char replacement) {
  // This implementation avoids unnecessary allocation.
  int len = sequence.length();
  for (int i = 0; i < len; i++) {
    char c = sequence.charAt(i);
    if (matches(c)) {
      if (c == replacement && (i == len - 1 || !matches(sequence.charAt(i + 1)))) {
        // a no-op replacement
        i++;
      } else {
        StringBuilder builder = new StringBuilder(len).append(sequence, 0, i).append(replacement);
        return finishCollapseFrom(sequence, i + 1, len, replacement, builder, true);
      }
    }
  }
  // no replacement needed
  return sequence.toString();
}

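Replaces each run of consecutive matching characters with a single replacement character. A StringBuilder is allocated only once a real change is needed: a single matching character that already equals the replacement is skipped as a no-op.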
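The collapsing rule itself fits in a short loop with one flag. The sketch below is a simplified illustration that always allocates a builder, unlike the allocation-avoiding Guava method; it assumes an IntPredicate stand-in for the matcher, and CollapseFromSketch is a hypothetical name.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch: each run of matching chars becomes one replacement char. */
final class CollapseFromSketch {
  static String collapseFrom(IntPredicate matcher, CharSequence sequence, char replacement) {
    StringBuilder out = new StringBuilder(sequence.length());
    boolean inMatchingRun = false;
    for (int i = 0; i < sequence.length(); i++) {
      char c = sequence.charAt(i);
      if (matcher.test(c)) {
        if (!inMatchingRun) {
          out.append(replacement); // emit once per run, at its first char
          inMatchingRun = true;
        }
      } else {
        out.append(c);
        inMatchingRun = false;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    IntPredicate eko = c -> c == 'e' || c == 'k' || c == 'o';
    System.out.println(collapseFrom(eko, "bookkeeper", '-')); // b-p-r
  }
}
```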

trimAndCollapseFrom(CharSequence, char)

/**
 * Collapses groups of matching characters exactly as {@link #collapseFrom} does, except that
 * groups of matching BMP characters at the start or end of the sequence are removed without
 * replacement.
 */
public String trimAndCollapseFrom(CharSequence sequence, char replacement) {
  // This implementation avoids unnecessary allocation.
  int len = sequence.length();
  int first = 0;
  int last = len - 1;

  while (first < len && matches(sequence.charAt(first))) {
    first++;
  }

  while (last > first && matches(sequence.charAt(last))) {
    last--;
  }

  return (first == 0 && last == len - 1)
      ? collapseFrom(sequence, replacement)
      : finishCollapseFrom(
          sequence, first, last + 1, replacement, new StringBuilder(last + 1 - first), false);
}
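The combined behavior (trim the outer runs, collapse the inner ones) can be sketched in a single pass by deferring the replacement until the next kept character arrives. This is an illustrative alternative, not the structure of the Guava method, which trims first and then hands off to finishCollapseFrom; the IntPredicate matcher and the name TrimAndCollapseSketch are assumptions.

```java
import java.util.function.IntPredicate;

/** Hypothetical single-pass sketch of trim-and-collapse semantics. */
final class TrimAndCollapseSketch {
  static String trimAndCollapseFrom(
      IntPredicate matcher, CharSequence sequence, char replacement) {
    StringBuilder out = new StringBuilder(sequence.length());
    boolean pendingRun = false; // a matching run was seen but nothing emitted for it yet
    for (int i = 0; i < sequence.length(); i++) {
      char c = sequence.charAt(i);
      if (matcher.test(c)) {
        pendingRun = true;
      } else {
        // Only an interior run (one with kept chars on both sides) is replaced.
        if (pendingRun && out.length() > 0) {
          out.append(replacement);
        }
        out.append(c);
        pendingRun = false;
      }
    }
    // A run still pending here is trailing and is dropped.
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(trimAndCollapseFrom(c -> c == ' ', "  hello   world  ", '_')); // hello_world
  }
}
```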

apply(Character)

/**
 * @deprecated Provided only to satisfy the {@link Predicate} interface; use {@link #matches}
 *     instead.
 */
@Deprecated
@Override
public boolean apply(Character character) {
  return matches(character);
}

Implemented only to satisfy the Predicate interface; internally it delegates to matches.

toString()

/**
 * Returns a string representation of this {@code CharMatcher}, such as {@code
 * CharMatcher.or(WHITESPACE, JAVA_DIGIT)}.
 */
@Override
public String toString() {
  return super.toString();
}

showCharacter(char)

/**
 * Returns the Java Unicode escape sequence for the given {@code char}, in the form "\u12AB" where
 * "12AB" is the four hexadecimal digits representing the 16-bit code unit.
 */
private static String showCharacter(char c) {
  String hex = "0123456789ABCDEF";
  char[] tmp = {'\\', 'u', '\0', '\0', '\0', '\0'};
  for (int i = 0; i < 4; i++) {
    tmp[5 - i] = hex.charAt(c & 0xF);
    c = (char) (c >> 4);
  }
  return String.copyValueOf(tmp);
}
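showCharacter fills the four hex digits from right to left, taking the low nibble of c on each step and then shifting c right by four bits. The sketch below copies that loop into a standalone class and compares it with a String.format call that should produce the same text; ShowCharacterSketch and viaFormat are hypothetical names used only for this comparison.

```java
/** Hypothetical standalone copy of showCharacter plus a String.format equivalent. */
final class ShowCharacterSketch {
  static String showCharacter(char c) {
    String hex = "0123456789ABCDEF";
    // Slots 2..5 are placeholders for the four hex digits.
    char[] tmp = {'\\', 'u', '\0', '\0', '\0', '\0'};
    for (int i = 0; i < 4; i++) {
      tmp[5 - i] = hex.charAt(c & 0xF); // lowest 4 bits pick the rightmost free digit
      c = (char) (c >> 4);              // move the next nibble into position
    }
    return String.copyValueOf(tmp);
  }

  /** Same output via the formatter: four upper-case hex digits, zero-padded. */
  static String viaFormat(char c) {
    return String.format("\\u%04X", (int) c);
  }

  public static void main(String[] args) {
    System.out.println(showCharacter('A'));  // prints a backslash, then u0041
    System.out.println(viaFormat('A'));      // same text
  }
}
```

The manual loop avoids the formatter's parsing overhead, which is a plausible reason for the hand-written version, though the source does not state its motivation.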