WIP: Various updates to the Regex HOWTO #107825

Open
akuchling wants to merge 26 commits into python:main from akuchling:update-regex-howto

akuchling (Contributor) commented Aug 9, 2023

As people sent me comments over the years, I've been collecting user feedback on the Regex HOWTO. This PR will contain the resulting set of changes. It is currently still work-in-progress; I have a lengthy list of changes that I'm making.

I'll try very hard to keep each commit completely and logically separated, so you may want to proofread commit-by-commit. Feel free to cherry-pick particular commits into main if you like while other commits get worked on; I can rebase or merge and try to keep things coherent.


📚 Documentation preview 📚: https://cpython-previews--107825.org.readthedocs.build/

serhiy-storchaka reacted with thumbs up emoji
bedevere-bot added the awaiting review, docs (Documentation in the Doc dir), and skip news labels Aug 9, 2023
akuchling changed the title from "Various updates to the Regex HOWTO" to "WIP: Various updates to the Regex HOWTO" Aug 10, 2023

Comment on lines 556 to 558
To specify them in the pattern, you can write them as an embedded
modifier at the start of the pattern that uses the short one-letter
form: `(?i)` for a single flag or `(?mxi)` to enable multiple flags.

It is worth mentioning "modifier spans" like `(?i:...)`. They are more powerful than global flags and modifiers.
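
As a minimal sketch of the difference (the patterns and sample strings are illustrative, not taken from the HOWTO): a global inline flag affects the whole pattern, while a modifier span limits the flag to a single group.

```python
import re

# Global inline flag: (?i) at the start makes the whole pattern case-insensitive.
global_flag = re.compile(r'(?i)python rocks')

# Modifier span: (?i:...) applies IGNORECASE only to the text inside the group;
# the rest of the pattern (" rocks") stays case-sensitive.
scoped_flag = re.compile(r'(?i:python) rocks')

print(bool(global_flag.fullmatch('PYTHON ROCKS')))  # whole pattern ignores case
print(bool(scoped_flag.fullmatch('PYTHON rocks')))  # only "python" ignores case
print(bool(scoped_flag.fullmatch('PYTHON ROCKS')))  # " ROCKS" must match exactly
```

The scoped form is handy when only part of a pattern, such as a keyword, should ignore case.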

gvanrossum (Member):

I think so, but it's also okay to do that in a separate PR. We can iterate and work incrementally.


For example, the following RE detects doubled words in a string. ::

- >>> p = re.compile(r'\b(\w+)\s+\1\b')
+ >>> p = re.compile(r'\b(\w+)\b\s+\1\b')

The second `\b` was removed intentionally. It is not needed here.

It would also be worth using possessive qualifiers here.

gvanrossum (Member):

But it's fine to keep the second `\b`, and when modifying the example for some other context it might be useful. So I'd be fine with keeping it too. (Note that it's mentioned in the text below also.)

(Also, what's a possessive qualifier?)

Not exactly this example, but see the conversation in #21420 about redundant `\b`.

This example was fixed in #4443. It was incorrect without `\b` at the end, but `\b` between `\w` and `\s` is redundant by definition.
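
A quick way to check the redundancy claim is to run both variants over a few strings and compare the results. This is only an illustrative sketch; the sample strings and the `matched_text` helper are made up for the check.

```python
import re

with_extra_b = re.compile(r'\b(\w+)\b\s+\1\b')
without_extra_b = re.compile(r'\b(\w+)\s+\1\b')

# Hypothetical sample inputs, chosen to cover a match and several near-misses.
samples = [
    'Paris in the the spring',
    'then the',
    'the theme',
    'no doubles here',
]

def matched_text(pattern, text):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group() if m else None

# Map each sample to the pair (result with extra \b, result without it).
results = {
    text: (matched_text(with_extra_b, text), matched_text(without_extra_b, text))
    for text in samples
}

for text, (with_b, without_b) in results.items():
    print(f'{text!r}: {with_b!r} vs {without_b!r}')
```

Both patterns agree on every sample because a word character followed by whitespace is always at a word boundary, so the extra `\b` can never change the outcome.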

Sorry, not "possessive qualifier" but "possessive quantifier" (although some documents call them "qualifiers"). A possessive quantifier is a quantifier that does not backtrack. It is written by adding `+` to the quantifier (just as non-greedy quantifiers are written by adding `?`). For example, when trying to match the greedy pattern `\b(\w+)\s+\1\b` against "then the", a naive backtracking engine will try to match "then then", fail, then backtrack and try "the ", "th ", "t " in turn until it gives up. With possessive quantifiers, `\b(\w++)\s++\1\b` does not backtrack and fails sooner. This is a new feature in Python 3.11. Although most modern RE engines support it, it is relatively little known because older RE engines did not support it initially.

See https://www.regular-expressions.info/possessive.html
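
A minimal sketch of the two patterns side by side, assuming Python 3.11 or later for the possessive form (the version guard is an addition for illustration; older versions reject `++` with `re.error`). Both patterns accept and reject the same strings here; the difference is only in how much backtracking happens before a failure.

```python
import re
import sys

greedy = re.compile(r'\b(\w+)\s+\1\b')

# Possessive quantifiers were added in Python 3.11; guard so the sketch
# still runs (using only the greedy pattern) on older interpreters.
if sys.version_info >= (3, 11):
    possessive = re.compile(r'\b(\w++)\s++\1\b')
else:
    possessive = None

for text in ('the the', 'then the'):
    print(f'greedy     {text!r}: {bool(greedy.search(text))}')
    if possessive is not None:
        print(f'possessive {text!r}: {bool(possessive.search(text))}')
```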

akuchling (Contributor, Author):

OK, I've removed the second \b and edited the text below a bit.

serhiy-storchaka (Member):

It would be nice to add more about possessive qualifiers and atomic grouping. Modifier spans are also underrated.
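
For readers unfamiliar with atomic grouping, here is a small sketch of how `(?>...)` relates to a possessive quantifier, assuming Python 3.11 or later (the toy pattern `a+ab` is made up for illustration and is not from the HOWTO). Once the atomic group has matched, the engine will not backtrack into it.

```python
import re
import sys

# Ordinary greedy quantifier: a+ can give back one "a" so that "ab" matches.
backtracking = re.compile(r'a+ab')

# Atomic groups and possessive quantifiers need Python 3.11+.
if sys.version_info >= (3, 11):
    atomic = re.compile(r'(?>a+)ab')   # a+ keeps every "a" it consumed
    possessive = re.compile(r'a++ab')  # shorthand with the same effect here
else:
    atomic = possessive = None

text = 'aaab'
print('backtracking:', bool(backtracking.search(text)))
if atomic is not None:
    print('atomic:      ', bool(atomic.search(text)))
    print('possessive:  ', bool(possessive.search(text)))
```

The greedy pattern finds a match, while the atomic and possessive forms do not, because `a+` has consumed every "a" and nothing is left for the literal `a` that follows.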

gvanrossum (Member) left a comment:

Hi Andrew! Here are some small suggestions. I recommend merging this rather than sitting on it for much longer. If there are improvements you're still planning to make but don't feel you have time for right now, feel free to open another PR. I promise to review and merge quickly -- this looks like almost everything is uncontroversial.

akuchling and others added 8 commits September 24, 2024 21:43
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
akuchling marked this pull request as ready for review September 25, 2024 02:00

akuchling (Contributor, Author):

OK, I've applied a bunch of suggested revisions, and also added comments listing future topics such as the possessive quantifiers and spanning modifiers. Let's work on those in future PRs, since this one has already taken long enough! 🕙

Reviewers: gvanrossum, serhiy-storchaka, picnixz (all left review comments)
Assignees: No one assigned
Labels: awaiting review, docs (Documentation in the Doc dir), skip news
Projects: Status: Todo
Milestone: No milestone
Participants (5): akuchling, serhiy-storchaka, gvanrossum, picnixz, bedevere-bot
