⏱ Superfast ^Advanced wildcards++? | Unique algorithms implemented in native unmanaged C++ but easily accessible in .NET via Conari (with caching of 0x29 opcodes + optimizations), etc.

3F/regXwild

⏱ Superfast ^Advanced wildcards++? *, |, ?, ^, $, +, #, >, ++??, ##??, >c in addition to slow regex engines and more.

✔ regex-like quantifiers, amazing meta symbols, and speed...

Unique algorithms implemented in native unmanaged C++ but easily accessible in .NET through Conari (recommended due to caching of 0x29 opcodes + related optimizations), and from other environments such as Python, etc.

Samples             regXwild filter     n
number = '1271';    number = '????';    0 - 4
year = '2020';      '##'|'####'         2 | 4
year = '20';        = '##??'            2 | 4
number = 888;       number = +??;       1 - 3

Samples                regXwild filter
everything is ok       ^everything*ok$
systems                system?
systems                sys###s
A new 'X1' project     ^A*'+' pro?ect
professional system    pro*system
regXwild in action     pro?ect$|open*source+act|^regXwild

Why regXwild ?

It was designed to be faster than just fast for features that usually go beyond typical wildcards. Seriously: we love regex, I love it, you love it. 2013 is far behind us, but regXwild is still relevant for its speed and its powerful wildcard-like features, such as ##?? (which means 2 or 4) ...

🔍 Easy to start

Unmanaged native C++ or managed .NET project. It doesn't matter, just use it:

C++

    #include <regXwild.h>
    using namespace net::r_eg::regXwild;
    ...
    EssRxW rxw;
    if(rxw.match(_T("regXwild"), _T("reg?wild"))) {
        // ...
    }

C# if Conari

    using dynamic l = new ConariX("regXwild.dll");
    ...
    if(l.match<bool>("regXwild", "reg?wild")) {
        // ...
    }

🏄 Amazing meta symbols

ESS version (advanced EXT version)

metasymbol    meaning
*             {0, ~}
|             str1 or str2 or ...
?             {0, 1}, ??? {0, 3}, ...
^             [str... or [str1...
$             ...str] or ...str1]
+             {1, ~}, +++ {3, ~}, ...
#             {1}, ## {2}, ### {3}, ...
>             Legacy > (F_LEGACY_ANYSP = 0x008) as [^/]*str | [^/]*$
>c            1.4+ Modern > as [^c]*str | [^c]*$
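
To make the anchors and the alternation concrete, here is a small sketch that checks the sample filters from the top of this README against hand-written regex readings of them. It uses only std::regex, not regXwild itself. The names Reading, kReadings, matchesAsRegex and allReadingsMatch are hypothetical, and the regex column is our own translation under the assumption that a filter without ^ or $ may match anywhere in the string.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical check (not part of regXwild): true when the hand-written
// regex reading of a filter is found anywhere in `sample`.
bool matchesAsRegex(const std::string& sample, const std::string& regexReading)
{
    return std::regex_search(sample, std::regex(regexReading));
}

// Sample, regXwild filter, and our assumed regex reading of that filter.
struct Reading { const char* sample; const char* filter; const char* regex; };

const Reading kReadings[] = {
    { "everything is ok",    "^everything*ok$", "^everything.*ok$"  },
    { "systems",             "system?",         "system.?"          },
    { "systems",             "sys###s",         "sys.{3}s"          },
    { "A new 'X1' project",  "^A*'+' pro?ect",  "^A.*'.+' pro.?ect" },
    { "professional system", "pro*system",      "pro.*system"       },
    { "regXwild in action",  "pro?ect$|open*source+act|^regXwild",
                             "pro.?ect$|open.*source.+act|^regXwild" },
};

// True when every sample matches the regex reading of its filter.
bool allReadingsMatch()
{
    for(const Reading& r : kReadings) {
        if(!matchesAsRegex(r.sample, r.regex)) return false;
    }
    return true;
}
```

The last row shows | splitting the filter into three alternatives, of which only ^regXwild applies to that sample.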

EXT version (more simplified than ESS)

metasymbol    meaning
*             {0, ~}
>             as [^/\\]+
|             str1 or str2 or ...
?             {0, 1}, ??? {0, 3}, ...

🧮 Quantifiers

1.3+ ++??; ##??

regex            regXwild   n
.*               *          0+
.+               +          1+
.?               ?          0 | 1
.{1}             #          1
.{2}             ##         2
.{2, }           ++         2+
.{0, 2}          ??         0 - 2
.{2, 4}          ++??       2 - 4
(?:.{2}|.{4})    ##??       2 | 4
.{3, 4}          +++?       3 - 4
(?:.{1}|.{3})    #??        1 | 3

and similar ...

Play with our actual Unit-Tests.
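
The rows of the quantifier table can be read as a few rules: a run of + followed by ? gives a range, a run of # followed by ? gives exactly two alternatives, and a pure run gives the plain form. The sketch below encodes that reading. quantifierToRegex is a hypothetical helper, not part of regXwild, and the rules are inferred only from the rows shown above. It emits the braces without inner spaces (for example .{2,4}) so that the result is usable with std::regex.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of regXwild): translates one quantifier run
// into the regex fragment the table lists for it.
// Returns an empty string for runs the table does not cover.
std::string quantifierToRegex(const std::string& run)
{
    if(run == "*") return ".*";
    if(run.empty()) return "";

    const char head = run[0];
    if(head != '+' && head != '#' && head != '?') return "";

    // Count the leading '+' or '#' symbols, then the trailing '?' symbols.
    std::size_t pos = 0;
    if(head != '?') {
        while(pos < run.size() && run[pos] == head) ++pos;
    }
    const std::size_t lead = pos;
    while(pos < run.size() && run[pos] == '?') ++pos;
    if(pos != run.size()) return ""; // mixed run, not described by the table

    const std::size_t opt = run.size() - lead;
    const std::string lo = std::to_string(lead);
    const std::string hi = std::to_string(lead + opt);

    if(lead == 0) return opt == 1 ? ".?" : ".{0," + hi + "}"; // ?, ??, ...
    if(opt == 0) {
        if(head == '#') return ".{" + lo + "}";               // #, ##, ...
        return lead == 1 ? ".+" : ".{" + lo + ",}";           // +, ++, ...
    }
    if(head == '+') return ".{" + lo + "," + hi + "}";        // ++?? : 2 - 4
    return "(?:.{" + lo + "}|.{" + hi + "})";                 // ##?? : 2 | 4
}
```

For instance, quantifierToRegex("##??") yields (?:.{2}|.{4}), which accepts '20' and '2020' but not '202'.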

🚀 Awesome speed

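  • ~2000 times faster than the C++11 regex engine when used from native C++ (compare the x64 regex_search row with the final algorithms in the Speed tables).
  • For .NET (including modern .NET Core), Conari provides optional caching of 0x29 opcodes (Calli) and more, to get as close to the C++ result as possible.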

Match result and Replacements

1.4+

    EssRxW::MatchResult m;
    rxw.match
    (
        _T("number = '8888'; //TODO: up"),
        _T("'+'"),
        EssRxW::EngineOptions::F_MATCH_RESULT,
        &m
    );
    //m.start = 9
    //m.end = 15
    ...
    input.replace(m.start, m.end - m.start, _T("'9777'"));

    tstring str = _T("year = 2021; dd = 17;");
    ...
    if(rxw.replace(str, _T(" ##;"), _T(" 00;"))) {
        // year = 2021; dd = 00;
    }
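
The start/end arithmetic in the first snippet can be tried without the library. In the sketch below, Span is a hypothetical stand-in for the start and end fields of EssRxW::MatchResult (their real types are not shown in this README, so std::size_t is an assumption), and replaceSpan is an illustrative helper that treats end as one past the last matched character, as the values 9 and 15 suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the start/end pair of EssRxW::MatchResult.
struct Span
{
    std::size_t start; // index of the first matched character
    std::size_t end;   // index one past the last matched character (assumed)
};

// Returns a copy of `input` with the characters in [m.start, m.end)
// replaced by `text`. An out-of-range span leaves the input unchanged.
std::string replaceSpan(std::string input, const Span& m, const std::string& text)
{
    if(m.start > m.end || m.end > input.size()) return input;
    input.replace(m.start, m.end - m.start, text);
    return input;
}
```

With the values from the snippet, replaceSpan("number = '8888'; //TODO: up", Span{9, 15}, "'9777'") replaces the six characters of '8888' (quotes included) and leaves the rest of the line untouched.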

🍰 Open and Free

Open Source project; MIT License, Enjoy 🎉

License

The MIT License (MIT)

Copyright (c) 2013-2021  Denis Kuzmin <x-3F@outlook.com> github/3F

[ ☕ Make a donation ]

regXwild contributors: https://github.com/3F/regXwild/graphs/contributors

We're waiting for your awesome contributions!

Speed

Procedure of testing

  • Use the algo subproject as the tester of the main algorithms (Release cfg - x32 & x64)
  • In general, the calculation is simple and uses an average: i = (t2 - t1); (sum(i) / n) where:
    • i - one iteration of searching by filter. Represents the time delta t2 - t1
    • n - the number of repeats of the matching used to get the average.

e.g.:

    {
        Meter meter;
        int results = 0;

        for(int total = 0; total < average; ++total)
        {
            meter.start();
            for(int i = 0; i < iterations; ++i)
            {
                if((alg.*method)(data, filter)) {
                    //...
                }
            }
            results += meter.delta();
        }

        TRACE((results / average) << "ms");
    }
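
Meter is not shown in this README, so here is a minimal self-contained sketch of the same procedure built on std::chrono. The Meter class below is a hypothetical stand-in for the one used by the algo subproject, and averageMs is an illustrative name. It follows the formula above: each sample i is the delta t2 - t1 around the inner loop, and the result is sum(i) / n.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the Meter used by the tester, based on std::chrono.
class Meter
{
public:
    void start() { t1 = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since the last start(), i.e. t2 - t1.
    long long delta() const
    {
        const auto t2 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    }

private:
    std::chrono::steady_clock::time_point t1;
};

// Takes `n` samples, each timing `iterations` calls of `body`,
// and returns sum(i) / n in milliseconds.
template<typename F>
long long averageMs(int n, int iterations, F body)
{
    if(n <= 0) return 0; // nothing to average

    Meter meter;
    long long sum = 0;
    for(int total = 0; total < n; ++total)
    {
        meter.start();
        for(int i = 0; i < iterations; ++i) body();
        sum += meter.delta();
    }
    return sum / n;
}
```

A call such as averageMs(10, 10000, [&]{ /* one match call */ }) would mirror the 10^4 iterations used in the tables; the number of samples (10 here) is an arbitrary choice for illustration.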

For regex results, it also prepares an additional basic_regex from the filter, but of course only once for all iterations:

    meter.start();
    auto rfilter = tregex
    (
        filter,
        regex_constants::icase | regex_constants::optimize
    );
    results += meter.delta();
    ...
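
The point of that snippet is that the regex object is constructed once and reused by every iteration. A minimal sketch of the pattern, using std::wregex in place of tregex (which is not defined in this README) and a hypothetical function name countRegexHits:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Counts how many of `iterations` searches succeed.
// The pattern is compiled once, outside the loop, with the same flags
// as in the snippet above.
int countRegexHits(const std::wstring& data, const std::wstring& pattern, int iterations)
{
    const std::wregex rfilter(
        pattern,
        std::regex_constants::icase | std::regex_constants::optimize
    );

    int hits = 0;
    for(int i = 0; i < iterations; ++i) {
        if(std::regex_search(data, rfilter)) ++hits;
    }
    return hits;
}
```

Moving the std::wregex construction inside the loop would add the compilation cost to every iteration, which is what the tester avoids.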

Please note:

  • +icase means ignoring case when matching the filter (pattern) within the searched string, i.e. ignoreCase = true. Without this, everything will be much faster, of course. That is, icase always adds complexity.
  • Below, MultiByte can be faster than Unicode (for the same platform and the same way of using the module), but it depends on the specific architecture: about ~2 times faster in native C++, and about ~4 times faster with .NET + Conari and related.
  • The results below can differ between machines. You only need to look at the difference (in milliseconds) between algorithms for a specific target.
  • To calculate the data, as in the tables below, you need to execute algo.exe

Sample of speed for Unicode

340 Unicode Symbols and 10^4 iterations (340 x 10000); Filter: L"nime**haru*02*Magica"

algorithms (see impl. from algo)             +icase [x32]   +icase [x64]
Find + Find                                  ~58ms          ~44ms
Iterator + Find                              ~57ms          ~46ms
Getline + Find                               ~59ms          ~54ms
Iterator + Substr                            ~165ms         ~132ms
Iterator + Iterator                          ~136ms         ~118ms
main :: based on Iterator + Find             ~53ms          ~45ms

Final algorithm - EXT version:               ~50ms          ~26ms
Final algorithm - ESS version:               ~50ms          ~27ms

regexp-c++11(regex_search)                   ~59309ms       ~53334ms
regexp-c++11(only as ^match$ like a '==')    ~12ms          ~5ms
regexp-c++11(regex_match with endings .*)    ~59503ms       ~53817ms

ESS vs EXT

350 Unicode Symbols and 10^4 iterations (350 x 10000);

Operation (+icase)   EXT [x32]   ESS [x32]   EXT [x64]   ESS [x64]
ANY                  ~54ms       ~55ms       ~32ms       ~34ms
ANYSP                ~60ms       ~59ms       ~37ms       ~38ms
ONE                  ~56ms       ~56ms       ~33ms       ~35ms
SPLIT                ~92ms       ~94ms       ~58ms       ~63ms
BEGIN                ---         ~38ms       ---         ~19ms
END                  ---         ~39ms       ---         ~21ms
MORE                 ---         ~44ms       ---         ~23ms
SINGLE               ---         ~43ms       ---         ~22ms

For .NET users through the Conari engine:

Same test Data & Filter: 10^4 iterations

Release cfg; x32 or x64 regXwild (Unicode)

Attention: for more speed, you need to upgrade to Conari 1.3 or higher!

algorithms (see impl. from snet)              +icase [x32]   +icase [x64]
regXwild via Conari v1.2 (Lambda) - ESS       ~1032ms        ~1418ms        x
regXwild via Conari v1.2 (DLR) - ESS          ~1238ms        ~1609ms        x
regXwild via Conari v1.2 (Lambda) - EXT       ~1117ms        ~1457ms        x
regXwild via Conari v1.2 (DLR) - EXT          ~1246ms        ~1601ms        x

regXwild via Conari v1.3 (Lambda) - ESS       ~58ms          ~42ms          <<
regXwild via Conari v1.3 (DLR) - ESS          ~218ms         ~234ms
regXwild via Conari v1.3 (Lambda) - EXT       ~54ms          ~35ms          <<
regXwild via Conari v1.3 (DLR) - EXT          ~214ms         ~226ms

.NET Regex engine [Compiled]                  ~38310ms       ~37242ms
.NET Regex engine [Compiled]{only ^match$}    < 1ms          ~3ms
.NET Regex engine                             ~31565ms       ~30975ms
.NET Regex engine {only ^match$}              < 1ms          ~1ms

How to get regXwild

regXwild v1.1+ can also be installed through NuGet, in the same way for both unmanaged and managed projects.

For .NET it will put x32 & x64 regXwild into $(TargetDir). Use it with your .NET modules through Conari and so on.

x64 + x32 Unicode + MultiByte modules;

Please note: Modern regXwild packages will no longer be distributed together with Conari. Please consider using Conari separately, through its NuGet packages.
