Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue28727

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:Implement comparison (x==y and x!=y) for _sre.SRE_Pattern
Type:enhancementStage:commit review
Components:Regular ExpressionsVersions:Python 3.7
process
Status:closedResolution:fixed
Dependencies:Superseder:
Assigned To: vstinnerNosy List: SilentGhost, ezio.melotti, mrabarnett, python-dev, serhiy.storchaka, vstinner
Priority:normalKeywords:patch

Created on2016-11-17 16:02 byvstinner, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
pattern_compare.patchvstinner,2016-11-17 16:02review
pattern_compare-2.patchvstinner,2016-11-17 17:20review
pattern_compare-3.patchvstinner,2016-11-18 07:57review
pattern_compare-4.patchvstinner,2016-11-18 08:47review
pattern_compare-5.patchvstinner,2016-11-18 23:04review
pattern_compare-6.patchvstinner,2016-11-21 14:43review
Pull Requests
URLStatusLinkedEdit
PR 552closeddstufft,2017-03-31 16:36
Messages (23)
msg281046 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-17 16:02
Attached patch implements rich comparison for _sre.SRE_Pattern objects created by re.compile().Comparison between patterns is used in the warnings module to not add duplicated filters, see issue#18383:New changesetf57f4e33ba5e by Martin Panter in branch '3.5':Issue#18383: Avoid adding duplicate filters when warnings is reloadedhttps://hg.python.org/cpython/rev/f57f4e33ba5eFor the warnings module, it became a problem in test_warnings since the Python test runner started to clear all caches. When re.purge() is called, re.compile() creates a new object, whereas with the cache it returns the same object and so the two patterns are equal since it's the same object. => see issue#28688
msg281052 -(view)Author: SilentGhost (SilentGhost)*(Python triager)Date: 2016-11-17 16:37
Ten subtest in test_re fail withTypeError: unhashable type: '_sre.SRE_Pattern'
msg281057 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-17 17:20
> Ten subtest in test_re fail with: TypeError: unhashable type: '_sre.SRE_Pattern'Oops, right. Updated patch implements also hash() on patterns.
msg281062 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-17 19:00
Added comments on Rietveld.
msg281063 -(view)Author: Matthew Barnett (mrabarnett)*(Python triager)Date: 2016-11-17 20:37
I hope you make it clear what you mean by 'equal', i.e. that it's comparing the pattern and the flags (AFAICT), so re.compile('(?x)a') != re.compile('(?x) a ').
msg281083 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-18 07:57
Patch version 3:* pattern_hash() includes isbytes, I also shifted flags by 1 bit to not erase the isbytes bit (FYI maximum value of flags is 256)* pattern_richcompare() avoids calling PyObject_RichCompareBool() if flags or isbytes is different* unit test ensures that no BytesWarning warning is raised* checks hash() in unit tests* fix also the unit test with a different flag (use the same pattern)* document also in the unit test that the comparison is case sensitive
msg281084 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-18 08:47
Ok, I hope that it's the last attempt: patch 5* Remove hash(b) != hash(a): only keep tests on hash(b)==hash(a) when b==a* Replace re.ASCII flag with no flag to test two patterns with different flags
msg281086 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-18 09:07
There is a problem with locale-depending flags. The same pattern may be compiled with the LOCALE flag to different pattern objects in different locales. Perhaps we should compare the compiled code instead of pattern strings. Or both.
msg281087 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-18 09:15
Serhiy: "There is a problem with locale-depending flags. The same pattern may be compiled with the LOCALE flag to different pattern objects in different locales."Oh, I didn't know and you are right."Perhaps we should compare the compiled code instead of pattern strings. Or both."PatternObject contains many fields. I used the following two fields which come from re.compile():* pattern* flagsI considered that they were enough because pattern_repr() only displays these ones. Other fields:* groups* groupindex* indexgroup* weakreflist* isbytes* codesize, codeweakreflist can be skipped, isbytes is already tested in my patch.Would it be possible to only compare code instead of pattern? What are groups, groupindex and indexgroup: should we also compare them?Maybe I can start from  pattern_compare-4.patch and only add a test comparing code?
msg281088 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-18 09:35
'[abc]' and '[cba]' are compiled to the same code. Do we want to handle them as equal? This is implementation defined.
msg281094 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-18 11:13
> '[abc]' and '[cba]' are compiled to the same code. Do we want to handle them as equal?Comparison must be fast. If the comparison is just memcmp(code1,code2, codesize), it's ok.I agree that we must put a similar somewhere to say that some partsare implementation details. Maybe split the test in two parts, andmark the second part with @cpython_only.
msg281101 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-18 11:45
I see two options:* Compare flags, isbytes and code. This makes some different patterns be compiled to equal objects. For example spaces in verbose mode are ignored. But this don't make equal all equivalent patterns. '[aA]' would be equal to '(?:a|A)' but still would be not equal to '(i?a)' with current implementation.* Compare flags, isbytes, code and pattern. This makes literally different patterns be compiled to not equal objects even if the difference is not significant. '[abc]' would be different from '[cba]' despites the fact that matching both always returns the same result.Since this issue becomes a little ambiguous, I would target the patch to 3.7 only. Maybe we will find other subtle details or will decide to change the meaning of equality of pattern objects before releasing 3.7.
msg281178 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-18 23:04
New approach: patch 5 now compares indexgroup, groupindex and code instead of comparing pattern, to handle correctly two equal pattern strings compiled with the re.LOCALE flag and two different locales.The patch also converts indexgroup list to a tuple to be able to hash it. (It also prevents modification, but this is just a side effect, and groupindex remains a mutable dictionary.)_sre.compile() checks types which helps to identify a bug in unit tests.
msg281306 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-20 22:15
This looks too complicated. groups, indexgroup and groupindex are unambiguously derived from pattern string. If caching works different pattern strings are compiled to different pattern objects. Currently they are not equal, even if their codes are equal. And I don't see large need to consider them equal.
msg281362 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-21 14:43
Back to basis, patch 6:* revert changes on indexgroup and groupindex types: I will fix this later, once this issue is fixed* pattern_richcompare() and pattern_hash() also uses pattern, but don't use groups, indexgroup nor groupindex anymoreI removed the @cpython_only unit test and rewrote test_pattern_compare_bytes() to make it easier to follow.re.compile('abc', re.IGNORECASE) is different than re.compile('ABC', re.IGNORECASE), but it's a deliberate choice to not test it. I consider that the behaviour can change depending on the Python implementation and in a future version.
msg281363 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-21 14:59
pattern_compare-6.patch LGTM.
msg281365 -(view)Author: Roundup Robot (python-dev)(Python triager)Date: 2016-11-21 15:39
New changeset5e8ef1493843 by Victor Stinner in branch '3.6':Implement rich comparison for _sre.SRE_Patternhttps://hg.python.org/cpython/rev/5e8ef1493843
msg281369 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-21 15:48
Serhiy Storchaka: "pattern_compare-6.patch LGTM."Thank you very much for your very useful reviews! I pushed the change.
msg281374 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-21 15:58
For stricter checks on _sre.compile() arguments, I created the issue#28765: "_sre.compile(): be more strict on types of indexgroup and groupindex".
msg281479 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-22 14:20
+           && left->codesize && right->codesize);There is a typo. Should be:+           && left->codesize == right->codesize);
msg281480 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-22 14:24
While we are here, it perhaps worth to add a fast path for self == other.
msg281483 -(view)Author: Roundup Robot (python-dev)(Python triager)Date: 2016-11-22 14:31
New changeset6b43d15fd2d7 by Victor Stinner in branch '3.6':Issue#28727: Fix typo in pattern_richcompare()https://hg.python.org/cpython/rev/6b43d15fd2d7New changesetc2cb70c97163 by Victor Stinner in branch '3.6':Issue#28727: Optimize pattern_richcompare() for a==ahttps://hg.python.org/cpython/rev/c2cb70c97163
msg281484 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-22 14:32
Serhiy Storchaka: "&& left->codesize && right->codesize)"Ooops! Fixed!"While we are here, it perhaps worth to add a fast path for self == other."Done.
History
DateUserActionArgs
2022-04-11 14:58:39adminsetgithub: 72913
2017-03-31 16:36:24dstufftsetpull_requests: +pull_request981
2016-11-22 14:32:35vstinnersetstatus: open -> closed

messages: +msg281484
2016-11-22 14:31:49python-devsetmessages: +msg281483
2016-11-22 14:24:33serhiy.storchakasetmessages: +msg281480
2016-11-22 14:20:59serhiy.storchakasetstatus: closed -> open

messages: +msg281479
2016-11-22 01:00:50martin.panterlinkissue28688 superseder
2016-11-21 15:58:18vstinnersetmessages: +msg281374
2016-11-21 15:48:09vstinnersetstatus: open -> closed
resolution: fixed
messages: +msg281369
2016-11-21 15:39:26python-devsetnosy: +python-dev
messages: +msg281365
2016-11-21 14:59:11serhiy.storchakasetassignee:vstinner
messages: +msg281363
stage: patch review -> commit review
2016-11-21 14:43:15vstinnersetfiles: +pattern_compare-6.patch

messages: +msg281362
2016-11-20 22:15:53serhiy.storchakasetmessages: +msg281306
2016-11-18 23:04:15vstinnersetfiles: +pattern_compare-5.patch

messages: +msg281178
2016-11-18 11:45:19serhiy.storchakasetmessages: +msg281101
2016-11-18 11:13:45vstinnersetmessages: +msg281094
2016-11-18 09:35:26serhiy.storchakasetmessages: +msg281088
2016-11-18 09:15:16vstinnersetmessages: +msg281087
2016-11-18 09:07:54serhiy.storchakasetmessages: +msg281086
2016-11-18 08:47:28vstinnersetfiles: +pattern_compare-4.patch

messages: +msg281084
2016-11-18 07:57:02vstinnersetfiles: +pattern_compare-3.patch

messages: +msg281083
2016-11-17 20:37:57mrabarnettsetmessages: +msg281063
2016-11-17 19:00:19serhiy.storchakasetnosy: +ezio.melotti,mrabarnett
components: + Regular Expressions
2016-11-17 19:00:02serhiy.storchakasetnosy: +serhiy.storchaka

messages: +msg281062
stage: patch review
2016-11-17 17:20:44vstinnersetfiles: +pattern_compare-2.patch

messages: +msg281057
2016-11-17 16:37:22SilentGhostsetnosy: +SilentGhost
messages: +msg281052
2016-11-17 16:02:48vstinnercreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp