Issue22818

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Deprecate splitting on possible zero-width re patterns
Type: behavior
Stage: resolved
Components: Extension Modules, Regular Expressions
Versions: Python 3.5

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: ezio.melotti, mrabarnett, pitrou, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-11-08 11:01 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                                 Uploaded
re_deprecate_split_zero_width.patch       serhiy.storchaka, 2014-11-08 11:01
re_deprecate_split_zero_width_2.patch     serhiy.storchaka, 2015-01-18 16:55
re_deprecate_split_zero_width_3.patch     serhiy.storchaka, 2015-01-26 13:56
re_deprecate_split_zero_width_4.patch     serhiy.storchaka, 2015-01-26 17:19
Messages (13)
msg230843 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-08 11:01
Currently re.split does not split on zero-width regex matches. There are a number of issues about this (issue852532, issue988761, issue3262, issue22817). This is definitely a bug, but fixing it will likely break existing code that uses regular expressions which can match an empty string (e.g. re.split('(:*)', 'ab')).

I propose to deprecate splitting on regular expressions that could match an empty string. Such expressions either do not work as expected at all (r'\b' never splits) or can be rewritten so that they do not match an empty string ('(:*)' to '(:+)').

In the next release (3.6) we can convert the deprecation warning to an exception, and then, after a transitional period, change the behavior to handle zero-width matches more correctly without breaking backward compatibility.
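Illustration (not part of the original message): a minimal sketch of the rewrite suggested above, from '(:*)' to '(:+)'. The sample strings and variable names are made up for the example. Because '(:+)' requires at least one colon, it can never produce an empty match, so its result does not depend on how empty matches are treated.

```python
import re

# '(:+)' needs at least one colon, so it never matches an empty string.
# The capturing group keeps the separators in the result list.
with_separators = re.split('(:+)', 'a::b:c')
print(with_separators)   # ['a', '::', 'b', ':', 'c']

# With no colon in the input there is nothing to split on.
no_separator = re.split('(:+)', 'ab')
print(no_separator)      # ['ab']
```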
msg232657 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-15 12:19
If there are no objections I'm going to commit the patch soon.
msg234259 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-18 16:55
Now patterns which could match only an empty string (e.g. '(?m)^$' or '(?<=\w-)(?=\w)') are rejected altogether. They never worked with the current regex engine. Updated the documentation.

Could anyone please review the patch and correct my wording? It is desirable to get this into alpha 1 and receive early feedback.
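Illustration (not part of the original message): a small sketch showing why the two example patterns can only ever match an empty string. They consist solely of anchors and lookaround assertions, so every match has equal start and end offsets. The sample texts and variable names are assumptions chosen for the example.

```python
import re

# '(?m)^$' matches only at an empty line: start-of-line immediately
# followed by end-of-line, consuming no characters.
blank_line_spans = [m.span() for m in re.finditer(r'(?m)^$', 'first\n\nsecond')]
print(blank_line_spans)   # [(6, 6)]

# '(?<=\w-)(?=\w)' matches the position right after a hyphen that sits
# between word characters; again nothing is consumed.
hyphen_spans = [m.span() for m in re.finditer(r'(?<=\w-)(?=\w)', 'well-known')]
print(hyphen_spans)       # [(5, 5)]

# Every match is zero-width: start == end.
all_zero_width = all(s == e for s, e in blank_line_spans + hyphen_spans)
```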
msg234262 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-18 17:45
I don't really understand why "this is definitely a bug".
msg234263 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-18 18:20
Because users expect split() to support zero-width patterns (as sub() supports them), and regexps in other languages support splitting on zero-width patterns. This looks like an accidental implementation detail (see my patch in issue22817 -- the difference is pretty small) frozen through the ages for backward compatibility. We can't change this behavior in maintained releases because this would break much code which accidentally uses zero-width patterns. But we can change it in the future as a new feature, after deprecating the current behavior. This would be a very useful feature. For example, it would allow us to simplify and speed up the regex used for splitting on hyphens in textwrap (something like r'(?<=\w-)(?=\w)').
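Illustration (not part of the original message): a sketch of the point that sub() already acts on zero-width matches, together with one possible way to split at such positions without relying on re.split. The helper name split_at_empty_matches and the sample strings are hypothetical, and this is not how textwrap or the re module implement anything.

```python
import re

HYPHEN_BREAK = r'(?<=\w-)(?=\w)'   # position after a hyphen between word chars

# sub() inserts the replacement at every zero-width match.
marked = re.sub(HYPHEN_BREAK, '|', 'well-known fact')
print(marked)   # well-|known fact

def split_at_empty_matches(pattern, text):
    """Hypothetical helper: cut text at each interior zero-width match."""
    pieces = []
    start = 0
    for m in re.finditer(pattern, text):
        # Only empty matches strictly inside the string produce a cut.
        if m.start() == m.end() and 0 < m.start() < len(text):
            pieces.append(text[start:m.start()])
            start = m.start()
    pieces.append(text[start:])
    return pieces

chunks = split_at_empty_matches(HYPHEN_BREAK, 'well-known state-of-the-art')
print(chunks)   # ['well-', 'known state-', 'of-', 'the-', 'art']
```

The helper uses finditer() because it reports empty matches with their offsets, so the result does not depend on how a given Python version's re.split treats them. Since Python 3.7, re.split itself splits on empty matches.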
msg234737 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-26 12:07
Could anyone please review this (mainly the documentation)? It would be good to get this change into the first alpha.
msg234742 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-26 13:56
Thank you Ezio for your review. The updated patch includes most of your suggestions. But I think some places may still be unclear.
msg234762 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-26 17:19
The updated patch includes Ezio's suggestions. Thank you Ezio, they look great to me.
msg235158 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-01 10:49
I hesitate about the warning type. Originally I was going to emit a DeprecationWarning in 3.5, maybe change it to a UserWarning in 3.6, and raise a ValueError or change the behavior in 3.7. What would be better?
msg235162 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-01 12:09
Maybe RuntimeWarning or FutureWarning would be more appropriate?
msg235209 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2015-02-01 21:43
DeprecationWarning: Base class for warnings about deprecated features.
UserWarning: Base class for warnings generated by user code.
RuntimeWarning: Base class for warnings about dubious runtime behavior.
FutureWarning: Base class for warnings about constructs that will change semantically in the future.

I think FutureWarning would be a good choice here, since we are going to change the semantics of a construct (before, zero-width patterns didn't split -> in the future they will).
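Illustration (not part of the original message): a rough sketch of what emitting a FutureWarning for such patterns could look like from the caller's side. The wrapper split_with_future_warning, its message text, and its check are all hypothetical and are not the stdlib implementation. In particular, testing the pattern against '' is only a crude heuristic that misses lookaround-only patterns.

```python
import re
import warnings

def split_with_future_warning(pattern, text):
    """Hypothetical wrapper, NOT the stdlib code path."""
    # Crude heuristic (an assumption): the pattern matches the empty string.
    if re.match(pattern, '') is not None:
        warnings.warn(
            "pattern can match an empty string; split() semantics "
            "for such patterns may change in the future",
            FutureWarning,
            stacklevel=2,
        )
    return re.split(pattern, text)

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    split_with_future_warning(':*', 'a:b')          # may match empty: warns
    safe = split_with_future_warning(':+', 'a:b')   # never empty: no warning

categories = [w.category for w in caught]
print(FutureWarning in categories)   # True
print(safe)                          # ['a', 'b']
```

Callers who want to catch such code early can turn the warning into an exception with warnings.simplefilter('error', FutureWarning).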
msg235321 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-02-03 09:05
New changeset 7c667d8ae10d by Serhiy Storchaka in branch 'default':
Issue #22818: Splitting on a pattern that could match an empty string now
https://hg.python.org/cpython/rev/7c667d8ae10d
msg235323 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-03 09:12
Thank you Ezio and Berker for your reviews.
History
Date                 User              Action  Args
2022-04-11 14:58:09  admin             set     github: 67007
2015-02-03 09:12:15  serhiy.storchaka  set     status: open -> closed, resolution: fixed, messages: +msg235323, stage: patch review -> resolved
2015-02-03 09:05:34  python-dev        set     nosy: +python-dev, messages: +msg235321
2015-02-01 21:43:51  ezio.melotti      set     messages: +msg235209
2015-02-01 12:09:04  serhiy.storchaka  set     messages: +msg235162
2015-02-01 10:49:38  serhiy.storchaka  set     messages: +msg235158
2015-01-26 17:19:43  serhiy.storchaka  set     files: +re_deprecate_split_zero_width_4.patch, messages: +msg234762
2015-01-26 13:56:56  serhiy.storchaka  set     files: +re_deprecate_split_zero_width_3.patch, messages: +msg234742
2015-01-26 12:07:46  serhiy.storchaka  set     messages: +msg234737
2015-01-18 18:20:38  serhiy.storchaka  set     messages: +msg234263
2015-01-18 17:45:16  pitrou            set     messages: +msg234262
2015-01-18 16:55:32  serhiy.storchaka  set     files: +re_deprecate_split_zero_width_2.patch, messages: +msg234259
2014-12-15 12:19:01  serhiy.storchaka  set     assignee: serhiy.storchaka, messages: +msg232657
2014-11-08 11:01:20  serhiy.storchaka  create