Comparison of regular expression engines

From Wikipedia, the free encyclopedia

This is a comparison of regular expression engines.

Libraries

List of regular expression libraries
| Name | Official website | Programming language | Software license | Used by |
| Boost.Regex [Note 1] | Boost C++ Libraries | C++ | Boost | Notepad++ >= 6.0.0, EmEditor |
| Boost.Xpressive | Boost C++ Libraries | C++ | Boost | |
| DEELX | RegExLab | C++ | Proprietary | |
| FREJ [Note 2] | Fuzzy Regular Expressions for Java | Java | LGPL | |
| GLib/GRegex [Note 3] | GLib reference manual | C | LGPL | |
| GNU regex | Gnulib reference manual | C | LGPL | GNU libc, GNU programs |
| GRETA | Microsoft Research | C++ | Proprietary | |
| Gregex | Grovf Inc. | RTL, HLS | Proprietary | FPGA accelerated >100 Gbit/s regex engine for cybersecurity, financial, e-commerce industries. |
| Hyperscan | Intel | C, x86-specific assembly (SSSE3+ [1]) | 3-clause BSD | Rspamd |
| ICU | International Components for Unicode | C, C++ [Note 4] | ICU | Foundation (Apple and Swift open-source versions) |
| Jakarta Regexp | The Apache Jakarta Project | Java | Apache | |
| java.util.regex | Java's User manual | Java | GNU GPLv2 with Classpath exception | jEdit |
| JRegex | JRegex | Java | BSD | |
| MATLAB | Regular Expressions | MATLAB Language | Proprietary | |
| Oniguruma | Kosako | C | BSD | Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor, jq, Ruby |
| Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL | |
| PCRE | pcre.org | C, C++ [Note 5] | BSD | Apache HTTP Server, Nginx, BBEdit, Edbrowse, Julia, HHVM, Notepad++ < 6.0.0, PHP, Delphi, R, Exim, SWI-Prolog, Elixir, Erlang |
| Qt/QRegExp | Digia (archived 2013-12-12 at the Wayback Machine) | C++ | Qt GNU GPL v. 3.0, Qt GNU LGPL v. 2.1, Qt Commercial | Kate, Kile |
| regex - Henry Spencer's regular expression libraries | ArgList | C | BSD | |
| RE2 | RE2 | C++ | BSD | Go, Google Sheets, Gmail, G Suite |
| Henry Spencer's Advanced Regular Expressions | Tcl | C | BSD | |
| RGX | RGX | C++ based component library | P6R | |
| RXP | Titan IC | RTL | Proprietary | Hardware-accelerated search acceleration using RegEx available for ASIC, FPGA and cloud. Enables massively parallel content processing at ultra-high speeds. |
| SubReg | Matt Bucknall | C | MIT | |
| TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 | |
| TRE [Note 2] | Ville Laurikari | C | BSD | musl |
| TRegExpr | TRegExpr, documentation, (RegExp Studio) | Object Pascal | Dual-license: freeware, or LGPL with static linking exception | Total Commander |
| Wolfram Language (Mathematica) | Wolfram Language Documentation Center | Wolfram Language | Proprietary | Mathematica, the Wolfram Development Platform |
| XRegExp | XRegExp | JavaScript | MIT | |
  1. Formerly called Regex++.
  2. One of the fuzzy regular expression engines.
  3. Included since version 2.13.0.
  4. ICU4J, the Java version, does not support regular expressions.
  5. C++ bindings were developed by Google and became officially part of PCRE in 2006.

Languages

List of languages and frameworks including regular expression support
| Language | Official website | Software license | Remarks |
| ActionScript 3 | ActionScript Technology Center | Free | |
| APL (APLX, Dyalog, GNU) | APL Wiki | Licensed by the respective implementation | ⎕SS (PCRE), ⎕R/⎕S (PCRE), ⎕SS (PCRE2), respectively |
| C++11 (C++) | C++ standards website | Licensed by the respective implementation | Since ISO/IEC 14882:2011(E); the default grammar is similar to ECMAScript (Grammar Description) |
| D | D | Boost Software License [Note 1] | |
| Elixir | elixir-lang.org | Apache 2.0 | Standard library includes PCRE-based Regex module. The matching algorithms of the library are based on the PCRE library, but not all of the PCRE library is interfaced and some parts of the library go beyond what PCRE offers. Currently PCRE version 8.40 (release date 2017-01-11) is used. |
| Erlang | erlang.org | Apache 2.0 | Standard library includes PCRE-based re module. The matching algorithms of the library are based on the PCRE library, but not all of the PCRE library is interfaced and some parts of the library go beyond what PCRE offers. Currently PCRE version 8.40 (release date 2017-01-11) is used. |
| Free Pascal (Object Pascal) | freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin and two other regular expression libraries; see wiki.lazarus.freepascal.org/Regexpr. |
| Go | go.dev | BSD-style | |
| Haskell | Haskell.org | BSD3 | Omitted in the language report, and in GHC's Hierarchical Libraries |
| Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability. |
| JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax. |
| Julia | JuliaLang.org | MIT License | REs are part of the language core library, using built-in PCRE; an optional wrapper for ICU (C code) is available. |
| Lua | Lua.org | MIT License | Uses a simplified, limited dialect; can be bound to a more powerful library, like PCRE, or an alternative parser like LPeg. |
| Mathematica | Wolfram | Proprietary | |
| .NET | MSDN | MIT License [Note 2][Note 3] | |
| Nim | nim-lang.org | MIT License | Standard library includes PCRE-based re and nre modules, as well as various alternatives (e.g. strutils, pegs (Parsing Expression Grammar matching), strscans, parseutils, etc.). |
| OCaml | Caml | LGPL | As of 2010, the standard module is generally regarded as deprecated; [2] often recommended libraries are pcre (with full support for PCRE) and re (which is not as complete but claims better performance and provides frontends to popular syntaxes: PCRE, Perl, Posix, Emacs, shell globbing). |
| Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language |
| PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient in speed, functions |
| POSIX C (C) | POSIX.1 web publication | Licensed by the respective implementation | Supports POSIX BRE and ERE syntax |
| Python | python.org | Python Software Foundation License | Python has two major implementations, the built-in re and the regex library. |
| Ruby | ruby-lang.org | GNU Library General Public License | Ruby 1.8, Ruby 1.9, and Ruby 2.0 and later versions use different engines; Ruby 1.9 integrates Oniguruma, Ruby 2.0 and later integrate Onigmo, a fork of Oniguruma. |
| Rust | docs.rs | MIT License | The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does. |
| SAP ABAP | SAP.com | Proprietary | |
| Tcl | tcl.tk | Tcl/Tk License (BSD-style) | Tcl library doubles as a regular expression library. |
| Wolfram Language | Wolfram Research | Proprietary: usable for free on a limited scale on the Wolfram Development platform | |
| XML Schema | W3C | Licensed by the respective implementation | |
| XPath 3/XQuery | W3C | Licensed by the respective implementation | |
  1. "STD.regex - D Programming Language - Digital Mars".
  2. "Dotnet/Corefx". GitHub. 16 February 2022.
  3. "Dotnet/Corefx". GitHub. 16 February 2022.
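
The remark in the Java row about doubled backslashes applies to any language where patterns are ordinary string literals. As a small illustration in Python (chosen here only as an example language, not taken from the table), the sketch below compares an ordinary literal with a raw string literal; the sample text and variable names are made up for the example.

```python
import re

# In an ordinary string literal every regex backslash must be doubled,
# which is the readability problem the Java remark describes.
doubled = "\\d+\\.\\d+"

# A raw string literal passes backslashes through unchanged.
raw = r"\d+\.\d+"

# Both literals produce the same pattern text.
same_pattern = doubled == raw

# Hypothetical sample text: extract a dotted version number.
version = re.search(raw, "PCRE version 8.40").group(0)
print(same_pattern, version)
```

Languages with a dedicated regex literal syntax, such as the /.../ form noted for JavaScript, avoid the problem in a different way.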

Language features


NOTE: An application using a library for regular expression support does not necessarily support the full set of features of the library, e.g., GNU grep uses PCRE, but supports no lookahead, though PCRE does.

Part 1

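Language feature comparison (part 1)
| Engine | "+" quantifier | Negated character classes | Non-greedy quantifiers [Note 1] | Shy groups [Note 2] | Recursion | Look-ahead | Look-behind | Backreferences [Note 3] | >9 indexable captures |
| Boost.Regex | Yes | Yes | Yes | Yes | Yes [Note 4] | Yes | Yes | Yes | Yes |
| Boost.Xpressive | Yes | Yes | Yes | Yes | Yes [Note 5] | Yes | Yes | Yes | Yes |
| CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| FREJ | No [Note 6] | No | Some [Note 6] | Yes | No | No | No | Yes | Yes |
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| GNU grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | |
| Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| RXP | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
| ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | Yes [Note 7] | Yes | Yes |
| JGsoft | Yes | Yes | Yes | Yes | Yes [3] | Yes | Yes | Yes | Yes |
| Lua | Yes | Yes | Some [Note 8] | No | No | No | No | Yes | No |
| .NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Python | Yes | Yes | Yes | Yes | Yes [Note 9] | Yes | Yes | Yes | Yes |
| Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| RE2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
| Vim | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Tcl | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
| XML Schema | Yes | Yes | No | No | No | No | No | | |
| XPath 3/XQuery | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
| XRegExp | Yes | Yes | Yes | Yes | No | Yes | Yes [Note 7] | Yes | Yes |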
  1. Non-greedy quantifiers match as few characters as possible, instead of the default as many as possible. Note that many older, pre-POSIX engines were non-greedy and didn't have greedy quantifiers at all.
  2. Shy groups, also called non-capturing groups, cannot be referred to with backreferences; non-capturing groups are used to speed up matching where the group's content does not need to be accessed later.
  3. Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches "abab" but not "abaab".
  4. "Perl Regular Expression Syntax - 1.47.0".
  5. "User's Guide - 1.47.0".
  6. FREJ has no repetition quantifiers, but has an "optional" element which behaves similarly to a simple "?" quantifier.
  7. As of ES2018.
  8. Lua's only non-greedy quantifier is -, which is a non-greedy version of *. It does not have non-greedy versions of + or ?; in the former case, the non-greedy effect can be achieved by repeating the token followed by -, but in the latter case, there is no equivalent.
  9. Supported by the optional regex library only.
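
Several of the features in this table can be tried directly. The following is a minimal sketch using Python's built-in re module, which the table lists as supporting everything here except recursion; the sample strings and variable names are invented for illustration.

```python
import re

# Greedy vs. non-greedy: ".+" takes as much as possible, ".+?" as little.
text = "<a><b>"
greedy = re.match(r"<.+>", text).group(0)
lazy = re.match(r"<.+?>", text).group(0)

# Backreference: \1 must repeat exactly what group 1 matched
# (the ([ab]+)\1 example from Note 3).
backref = re.compile(r"([ab]+)\1")
matches_abab = backref.fullmatch("abab") is not None
matches_abaab = backref.fullmatch("abaab") is not None

# Shy (non-capturing) group: (?:...) groups without creating a capture.
shy_groups = re.match(r"(?:ab)+(c)", "ababc").groups()

# Look-ahead and look-behind assert context without consuming it.
ahead = re.findall(r"\w+(?=:)", "key: value")
behind = re.findall(r"(?<=\$)\d+", "cost $42")

# More than nine indexable captures: group 10 is addressable.
ten = re.match(r"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)", "abcdefghij")
tenth = ten.group(10)

print(greedy, lazy, matches_abab, matches_abaab, shy_groups, ahead, behind, tenth)
```

Only the second bracketed tag is excluded from the lazy match, and the shy group leaves a single capture, which is why such groups are useful when the grouped text is not needed later.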

Part 2

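Language feature comparison (part 2)
| Engine | Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Unicode property support [4] | Balancing groups [Note 4] | Variable-length look-behinds [Note 5] |
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | No |
| Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No | No | No |
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | Some [Note 6] | No | No |
| EmEditor | Yes | Yes | ? | ? | Yes | No | ? | No | No |
| FREJ | No | No | Yes | Yes | Yes | No | ? | No | No |
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | No |
| GNU grep | Yes | Yes | ? | Yes | Yes | No | No | No | No |
| Haskell | ? | ? | ? | ? | ? | No | No | No | No |
| RXP | Yes | Yes | No | Yes | Yes | No | No | No | No |
| ICU Regex | Yes | No | Yes | Yes [Note 7] | Yes | No | Yes | No | No |
| Java | Yes | No | Yes | Yes [Note 8] | Yes | No | Some [Note 6] | No | No |
| JavaScript (ECMAScript) | No | No | No | Yes | No | No | Some [Note 6][Note 9][5] | No | Yes |
| JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | Yes |
| Lua | No | No | No | No | No | No | No | No | No |
| .NET | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | Yes | Yes |
| OCaml | No | No | No | No | No | No | No | No | No |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No [Note 10] |
| PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
| Python | Yes | Yes | Yes [Note 11] | Yes | Yes | No | Yes [Note 12] | No | Yes [Note 13] |
| Qt/QRegExp | No | No | No | No | No | No | No | No | No |
| RE2 | Yes | No | ? | Yes | No | No | Some [Note 6] | No | No |
| Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | No |
| Tcl | Yes | No | Yes | No | Yes | No | Yes | No | No |
| TRE | Yes | No | No | No | Yes | No | ? | No | No |
| Vim | Yes | No | Yes | No | No | No | No | No | Yes |
| RGX | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No |
| XML Schema | No | No | No | No | No | No | Yes | No | No |
| XPath 3/XQuery | No | No | No | No | No | No | Yes | No | No |
| XRegExp | Leading only | No | No | Yes | Yes | No | Yes | No | Yes |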
  1. Also known as flags modifiers, modes modifiers or option letters. Example pattern: "(?i:test)".
  2. Also called independent sub-expressions.
  3. Similar to backreferences, but with names instead of indices.
  4. Special feature that allows matching balanced constructs without recursion.
  5. Refers to the possibility of including quantifiers in look-behinds, thus making their length unpredictable.
  6. Unicode property support may be incomplete (products are continuously updated). All will be incomplete when a new Unicode revision is released, until they are updated to comply.
  7. Available as of ICU55.
  8. Available as of JDK7.
  9. The support and range of properties is dependent on implementation.
  10. Experimental support added in v5.29.9.
  11. Supported by Python v3.11 and later, and the optional regex library only.
  12. May only be available in the regex library when used with Python versions after 3.3.
  13. Supported by the optional regex library only.
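
As a hedged illustration of a few rows in this table, the sketch below uses Python's built-in re module to show a scoped directive (the "(?i:test)" form from Note 1), named capture with comments, and the rejection of a variable-length look-behind that Note 13 implies for the built-in module. The patterns and sample strings are invented for the example.

```python
import re

# Directive scoped to a group: only "test" is matched case-insensitively.
scoped = re.compile(r"(?i:test)ing")
scoped_hit = scoped.fullmatch("TESTing") is not None
scoped_miss = scoped.fullmatch("TESTING") is not None

# Named capture plus comments: re.VERBOSE ignores pattern whitespace
# and treats "#" to end of line as a comment.
date = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    """,
    re.VERBOSE,
)
m = date.match("2018-06")
year, month = m.group("year"), m.group("month")

# Variable-length look-behind: the built-in module rejects a quantifier
# inside (?<=...) at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

print(scoped_hit, scoped_miss, year, month, variable_lookbehind_ok)
```

The compile-time error in the last step is specific to the built-in module; engines marked "Yes" in the last column accept such patterns.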

API features

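API feature comparison
| Engine | Native UTF-16 support [Note 1] | Native UTF-8 support [Note 1] | Multi-line matching | Partial match [Note 2] |
| Boost.Regex | No | No | Yes | Yes |
| GLib/GRegex | Yes | Yes | Yes | Yes |
| RXP | Yes | Yes | No | Yes |
| ICU Regex | Yes | No | Yes | ? |
| Java | Yes [Note 3] | Yes [Note 3] | Yes | Yes |
| .NET | No [Note 4] | Yes | Yes | ? |
| PCRE | Yes [Note 5] | Yes | Yes | Yes |
| Qt/QRegExp | Yes | No | No | Yes [Note 6] |
| Qt/QRegularExpression | Yes | Yes | Yes | Yes |
| Tcl | Yes | Yes [Note 7] | Yes | ? |
| TRE | Yes | Yes | Yes | ? |
| RGX | No | No | Yes | ? |
| wxWidgets::wxRegEx [Note 8] | Yes | Yes | Yes | ? |
| XRegExp | Yes | Yes | Yes | No |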
  1. Means the format can be used internally without explicit conversion.
  2. Partial match of the whole regular expression. For example, the pattern ".*END$" will match any string partially, but only strings ending with END fully. [1]
  3. Supports Unicode 15.0 standard from 2023. [2]
  4. Implementation uses original UCS-2 support/features, so it only recognizes 64K characters total (vs. UTF-16's 1,112,064 characters). A Microsoft developer-representative answered a bug report on this as "will not fix" in 2010. [3]
  5. Since version 8.30.
  6. Partial matching is performed implicitly, requiring a separate call to matchedLength() if an exact match fails.
  7. Tcl includes facilities to convert to and from UTF-8.
  8. wxRegEx uses any system-supplied POSIX library or, if none is available and for Unicode mode, Henry Spencer's library.


References

  1. "Getting Started – Hyperscan 5.4.0 documentation".
  2. "Regex - Regular Expressions in OCaml".
  3. "Recursive Regex—Tutorial".
  4. "UTS #18: Unicode Regular Expressions".
  5. "ECMA-262, 9th edition, June 2018 ECMAScript® 2018 Language Specification". www.ecma-international.org. Retrieved 4 August 2020.
