Regular expression syntax

A regular expression is a way to match patterns in data using placeholder characters, called operators.

Elasticsearch supports regular expressions in the following queries: regexp and query_string.

Elasticsearch uses Apache Lucene's regular expression engine to parse these queries.

Lucene’s regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:

. ? + * | { } [ ] ( ) " \

Depending on the optional operators enabled, the following characters may also be reserved:

# @ & < > ~

To use one of these characters literally, escape it with a preceding backslash or surround it with double quotes. For example:

\@
\\
"john@smith.com"
  1. renders as a literal '@'
  2. renders as a literal '\'
  3. renders as 'john@smith.com'
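
As a minimal sketch of the backslash-escaping rule above, the following Python helper treats its whole input as literal text and escapes every reserved character. The name escape_lucene_regex is hypothetical and not part of any Elasticsearch client; the character set is copied from the two lists above.

```python
# Reserved characters from the lists above: the always-reserved set plus
# the characters reserved when optional operators are enabled.
RESERVED = set('.?+*|{}[]()"\\') | set('#@&<>~')


def escape_lucene_regex(text: str) -> str:
    """Backslash-escape every reserved character (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


# Both '@' and '.' are reserved, so each gets a preceding backslash.
print(escape_lucene_regex("john@smith.com"))
```

The result is the regular expression itself. If you build the request body by hand rather than through a JSON serializer, the JSON-level escaping described in the note below still applies on top of it.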
Note

The backslash is an escape character in both JSON strings and regular expressions. You need to escape both backslashes in a query, unless you use a language client, which takes care of this. For example, the string a\b needs to be indexed as "a\\b":

PUT my-index-000001/_doc/1
{
  "my_field": "a\\b"
}

This document matches the following regexp query:

GET my-index-000001/_search
{
  "query": {
    "regexp": {
      "my_field.keyword": "a\\\\.*"
    }
  }
}
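
To illustrate the two escaping layers, here is a small Python sketch that uses the standard json module instead of a language client. The pattern is written once, as the regular expression engine should see it, and the serializer adds the JSON-level escaping. The request body mirrors the query above; no cluster is contacted.

```python
import json

# The regular expression as Lucene should see it: an escaped backslash
# (matching one literal '\') followed by '.*'.
pattern = r"a\\.*"

# Request body mirroring the regexp query above.
body = {"query": {"regexp": {"my_field.keyword": pattern}}}

# json.dumps applies JSON string escaping, doubling each backslash.
encoded = json.dumps(body)
print(encoded)
```

The serialized pattern contains four backslashes, matching the hand-written request: two at the regular expression level, each doubled again for JSON.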

Lucene’s regular expression engine does not use the Perl Compatible Regular Expressions (PCRE) library, but it does support the following standard operators.

.
Matches any character. For example:
ab.
  1. matches 'aba', 'abb', 'abz', etc.
?
Repeat the preceding character zero or one times. Often used to make the preceding character optional. For example:
abc?
  1. matches 'ab' and 'abc'
+
Repeat the preceding character one or more times. For example:
ab+
  1. matches 'ab', 'abb', 'abbb', etc.
*
Repeat the preceding character zero or more times. For example:
ab*
  1. matches 'a', 'ab', 'abb', 'abbb', etc.
{}
Minimum and maximum number of times the preceding character can repeat. For example:
a{2}
a{2,4}
a{2,}
  1. matches 'aa'
  2. matches 'aa', 'aaa', and 'aaaa'
  3. matches 'a' repeated two or more times
|
OR operator. The match will succeed if the longest pattern on either the left side OR the right side matches. For example:
abc|xyz
  1. matches 'abc' and 'xyz'
( … )
Forms a group. You can use a group to treat part of the expression as a single character. For example:
abc(def)?
  1. matches 'abc' and 'abcdef' but not 'abcd'
[ … ]
Match one of the characters in the brackets. For example:
[abc]
  1. matches 'a', 'b', 'c'

Inside the brackets, - indicates a range unless - is the first character or escaped. For example:

[a-c]
[-abc]
[abc\-]
  1. matches 'a', 'b', or 'c'
  2. '-' is first character. Matches '-', 'a', 'b', or 'c'
  3. Escapes '-'. Matches 'a', 'b', 'c', or '-'

A ^ before a character in the brackets negates the character or range. For example:

[^abc]
[^a-c]
[^-abc]
[^abc\-]
  1. matches any character except 'a', 'b', or 'c'
  2. matches any character except 'a', 'b', or 'c'
  3. matches any character except '-', 'a', 'b', or 'c'
  4. matches any character except 'a', 'b', 'c', or '-'
Note

Character range classes such as [a-c] do not behave as expected when using case_insensitive: true; they remain case sensitive. For example, [a-c]+ with case_insensitive: true will match strings containing only the characters 'a', 'b', and 'c', but not 'A', 'B', or 'C'. Use [a-zA-Z] to match both uppercase and lowercase characters.

This is due to a known limitation in Lucene's regular expression engine. See Lucene issue #14378 for details.

You can use the flags parameter to enable more optional operators for Lucene’s regular expression engine.

To enable multiple operators, use a | separator. For example, a flags value of COMPLEMENT|INTERVAL enables the COMPLEMENT and INTERVAL operators.
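
One way to assemble such a flags value is sketched below in Python. The helper regexp_query is hypothetical. The body uses the expanded form of the regexp query, with value and flags keys under the field name, and the field name and pattern are only placeholders.

```python
import json


def regexp_query(field: str, value: str, flags: list[str]) -> dict:
    """Build a regexp query body, joining operator names with '|' (hypothetical helper)."""
    return {
        "query": {
            "regexp": {
                field: {"value": value, "flags": "|".join(flags)}
            }
        }
    }


# Enable only the COMPLEMENT and INTERVAL operators for this query.
body = regexp_query("my_field.keyword", "foo<1-100>", ["COMPLEMENT", "INTERVAL"])
print(json.dumps(body, indent=2))
```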

ALL (Default)
Enables all optional operators.
"" (empty string)
Alias for the ALL value.
COMPLEMENT
Enables the ~ operator. You can use ~ to negate the shortest following pattern. For example:
a~bc
  1. matches 'adc' and 'aec' but not 'abc'
EMPTY
Enables the # (empty language) operator. The # operator doesn’t match any string, not even an empty string.

If you create regular expressions by programmatically combining values, you can pass # to specify "no string." This lets you avoid accidentally matching empty strings or other unwanted strings. For example:

#|abc
  1. matches 'abc' but nothing else, not even an empty string
INTERVAL
Enables the <> operators. You can use <> to match a numeric range. For example:
foo<1-100>
foo<01-100>
  1. matches 'foo1', 'foo2' ... 'foo99', 'foo100'
  2. matches 'foo01', 'foo02' ... 'foo99', 'foo100'
INTERSECTION
Enables the & operator, which acts as an AND operator. The match will succeed if patterns on both the left side AND the right side match. For example:
aaa.+&.+bbb
  1. matches 'aaabbb'
ANYSTRING
Enables the @ operator. You can use @ to match any entire string.

You can combine the @ operator with & and ~ operators to create an "everything except" logic. For example:

@&~(abc.+)
  1. matches everything except terms beginning with 'abc'
NONE
Disables all optional operators.
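
The EMPTY entry above mentions building regular expressions by programmatically combining values. A minimal Python sketch of that idea follows. The alternation helper is hypothetical, and it assumes that each value is already escaped and that the EMPTY operator is enabled (it is under the default ALL).

```python
def alternation(patterns: list[str]) -> str:
    """Join patterns with '|', always leading with '#' (hypothetical helper).

    '#' matches no string, so it adds nothing to the set of matches, and an
    empty input still yields a pattern that matches nothing at all.
    """
    return "|".join(["#"] + patterns)


print(alternation(["abc"]))         # same shape as the '#|abc' example above
print(alternation(["abc", "xyz"]))
print(alternation([]))
```

Leading with # keeps the empty-input case from turning into an empty pattern, which is the accidental match the EMPTY entry warns about.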

Lucene’s regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
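
The whole-string rule can be approximated locally with Python's re.fullmatch. This is only an analogy: Python's re module is a different engine from Lucene's, and the sketch is limited to patterns whose syntax the two share. The helper name lucene_style_match is hypothetical.

```python
import re


def lucene_style_match(pattern: str, term: str) -> bool:
    """Approximate whole-term matching with re.fullmatch (an analogy only)."""
    return re.fullmatch(pattern, term) is not None


# 'abc' must cover the entire term, so a term that merely contains it fails.
print(lucene_style_match("abc", "xabcx"))      # False
# Wrapping the pattern in '.*' makes the "contains" intent explicit.
print(lucene_style_match(".*abc.*", "xabcx"))  # True
```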

