- Notifications
You must be signed in to change notification settings - Fork380
Matcher: pass _regionStart to RE2, pos 0 special#4199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
- instead of _hasMatch, use _lastMatchXXX as sufficient- in find(), also use _lastMatchEnd instead of _groups- instead of _inputLength, use _regionEnd for matching
Instead of telling RE2 "machine" we start at pos > 0, simply modify theinput slice and use position relative to region start.
@WojciechMazur@LeeTibbert sorry for multiple attempts to run checks, couldn't figure out how to test locally. helps improve#2324. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM,@LeeTibbert, you're the most experience with RE2, what do you think about these changes?
kitbellew, Thank you for the Issue and, especially, for this PR. It is encouraging to meet a first time Please do not be discouraged or put off by the review process. I find it generally The Please be understanding of my limitations. I speak a number of variants of regex but |
LeeTibbert commentedFeb 1, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'm 6+ hours into looking at the Issue and this PR and have run out of time for today. Let
More as it develops (days). |
Thank you for detailed, careful consideration. Separately, I was thinking that we could find a suite of unit tests used for the original JVM version of matcher and see what this implementation (not just region) fails on. |
I have another commit I prepared as an option, I like it a little better but didn't test it as much: |
kitbellew
I believe that running locally means cloning the Scala Native repository, making the I know of no easy way to restart Continuous Integration, but then again, that is not local. Call out if I you can not get what you want/need in SN build done. |
Yes, scala-native is close to being ready but I am struggling with speed. See comments inthis attempt to run scala-native on massive number of files. It's barely faster than JVM, if we exclude the compilation time. |
Good idea but may be yak shaving. Depends on your time & interests. As a Scala Native contributor, one must beexquisitely careful with license conditions and what you Unfortunately, the SN Incomplete, short story:
A whack of RE2mumble tests where ported when RE2 Scala Native support what added. Those engine At the same time, I believe that some SN javalib tests were created. The 2022 PR I mentioned above Those SN javalib tests may or may not have Scala.js tests as ancestors. These days I usually |
A friend of mine who lives in New Hampshire (USA) often uses the local expression to express It takes time but I'd like to see (and capture) both the current SN failure and understand I realize that you are trying to fix this real time and delay must be frustrating. As part of my archeology, I noticed that RE2j is now at its version 1.8, of approximately
|
LeeTibbert commentedFeb 1, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Both the current work and the closeness are good news.
Ah the culprits caughtin flagrante delicito Good data, good catch. I took a quick look at the scalameta url you provided. I'll certainly take two minutes wall-clock time off a I think that earlier reports from scalafmt about speed/correctness issues with I suggest that you create a new SN Issue with that URL and we use this thread to fix Looks like there, viewed from SN, are two separate concerns in that new Issue.
|
Regarding the "see and capture", I just want to make sure you are aware: my commits should be reviewed one at a time; there are three, and each does something separate, including the first one whichshows the failure and the last one which shows how it is fixed. If you ask me to fix something, I will not just add another commit on top; I will, if necessary, amend the existing ones. |
For this case, native is slower overall ; build time is 8 minutes and run time is another 8 (16 altogether); whereas for JVM build time is 1.5 minutes and run time about 10 (so 11.5 minutes together). If you mean for people using a pre-built binary, then native might get a 20% speed up; it's good but I wouldn't call it blazing fast, especially since the formatting step for most builds I have seen is a separate task and it usually finishes in a fraction of the time anything else does. |
LeeTibbert commentedFeb 5, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
kitbellew@WojciechMazur I'm 4+ days into studying this PR, an alternate commit in this PR, the existing SN javalib and RE2 code, Let me float an approach for your consideration. Phase 1
Phase 2If Phase 2 above proves out, then in a later submission, PR#2619 could be DiscussionMy main concerns are one of grand tactics, not of point edits. They are
Footnotes
|
If you are open to the approach I described and to creating a separate draft PR based on it, let me know If you do not like the approach or have run out of time, I can do the work described. In my My intent is to do collaborative development & iterative improvement of Scala Native. I hope |
Forgot about this, my apologies,@LeeTibbert .
I would call it a deal-breaker. regex libraries are not expected to duplicate any input, which is why there are so many methods returning indices to input allowing the user to extract matches without copying. If you really wanted to go this route, you can wrap the existing input (CharSequence) in a java.nio.CharBuffer (also a CharSequence). However, it would appear and in order to support transparent or anchoring bounds, the underlying (RE2) implementation would have to know something these positions, so pretending region covers the entire original input would be misleading. I understand your comment about not deviating from RE2 too much, but I would say the proposed change is minor. As I mentioned earlier, I would consider choosing between this implementation and the one withone extra commit. |
Kitbellew Thank you for reviving this discussion. I have been looking at this on my work list and it has been falling I'll have to study your observation "regex libraries are not expected to duplicate any input," and I am pretty certain I will not be able to do that before the snow flies. I do not want to stand in the Is this holding up any of your I do not mean to malign the current RE2 implementation. It has served well for a number of years after For awhile now I have been getting the feeling that it might be worth spending at least a few hours of looking Fear and horror are the wrong words but the right direction. I live in dread of the day when I or someone The wonder is not that the dog sings well, it is that the dog sings at all, and for so long. |
Instead of telling RE2 "machine" we start at pos > 0, simply modify the input slice and use position relative to region start.Fixes#4198.