Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue2690

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:Precompute range length and enhance range subscript support
Type:performanceStage:resolved
Components:Interpreter CoreVersions:Python 3.2
process
Status:closedResolution:accepted
Dependencies:Superseder:
Assigned To: ncoghlanNosy List: BreamoreBoy, SilentGhost, amaury.forgeotdarc, belopolsky, facundobatista, georg.brandl, mark.dickinson, ncoghlan, pitrou, rhettinger, stutzbach
Priority:normalKeywords:patch

Created on2008-04-25 15:32 bybelopolsky, last changed2022-04-11 14:56 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
range-length.diffbelopolsky,2008-04-25 15:31
anyrange.patchamaury.forgeotdarc,2008-04-25 19:56
range-sequence.diffbelopolsky,2008-04-28 19:04patch against revision 62564
issue2690-range-sequence-ncoghlan.diffncoghlan,2008-09-02 15:48Modified patch against rev 66147
issue2690-range-sequence-ncoghlan-v2.diffncoghlan,2008-09-02 16:54Fix issue noted by Antoine (also adds new unit tests)
issue2690_lazy_overflow_check.diffncoghlan,2010-12-03 14:26Complete with docs and lazy overflow semantics
stdtypes.rst.diffSilentGhost,2010-12-10 22:31
stdtypes.rst.diffSilentGhost,2010-12-11 23:44
Messages (44)
msg65786 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-25 15:31
Attached patch makes range objects precompute their length on creation.  This speeds up indexing and len at the expense of a small increase in range object size.  The main benefit, however is that unsupported length > sys.maxsize is detected early and confusing OverflowError from len(r) or r[i] is avoided.See discussion starting athttp://mail.python.org/pipermail/python-3000/2008-April/013225.html .
msg65793 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-04-25 16:55
So with this patch, range(10**100) produces an OverflowError:  is that right?I don't much like this aspect of the change:   there are uses forfor i in range(ridiculously_large_number):   ...   if condition_that_occurs_early_in_practice:       break
msg65798 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-25 17:32
On Fri, Apr 25, 2008 at 12:55 PM, Mark Dickinson <report@bugs.python.org> wrote:..>  I don't much like this aspect of the change:   there are uses for>>  for i in range(ridiculously_large_number):For this application, I would use "for i in itertools.count():"instead.  The only caveat is that while count() lets you specify thestart, it does not provide for a step.   If that is a problem, I wouldrather add step to count().
msg65802 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-04-25 18:47
I guess there needs to be a decision on whether to make range objects of length >= PY_SSIZE_T_MAX illegal; perhaps more discussion on python-dev would be worthwhile?I can see three options, besides leaving things as they are:(1) make large ranges illegal, as with this patch(2) make large ranges legal, but don't allow indexing with indiceslarger than PY_SSIZE_T_MAX.(3) allow large ranges *and* large indices.Option 3 seems to me like the ideal from the users' point of view, but I'm not sure whether it's easy/possible to implement it given that sq_item receives a Py_ssize_t for the index.Option 2 seems messy:  half of one thing and half of the other, but I think it would be easy to implement.  This is what I'd personally prefer if Option 3 isn't feasible.If Option 1 is indeed the preferred option, then the patch looks good to me, and works for me on OS X 10.5.  (Minor nitpick: it introduces some extra tab characters.)Whatever happens, we probably also need a documentation update explaining the limitations on range.
msg65803 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-25 19:23
Option (3) would require changing both sq_item and sq_length signatures,which is likely to have a large negative impact on performance.Option (2) would either require a change for the sq_length signature, orleave the problem of having valid range objects for which applying len()would produce an OverflowError.What are the use cases for ranges with length greater than maxsize? Notethat in 2.x all arguments to length are limited to 32 bit integers (evenon 64-bit platforms) and the main reason to support long start/stop/stepin 3.0  is because 2.x range() supports them.  On the other hand, since2.x range() produces lists, it is limited in length to a fraction ofsys.maxsize.  Therefore none of the current uses of either range orxrange require support of long length.
msg65807 -(view)Author: Amaury Forgeot d'Arc (amaury.forgeotdarc)*(Python committer)Date: 2008-04-25 19:56
I am currently working on a patch that allows large ranges and largeindices. The trick is to define tp_as_mapping->mp_subscript.Then range_item() is rarely used, only by functions calling directly thePySequence_* functions, instead of the abstract PyObject_*.There is still a limit with len(), which seems bound by the size_t limit.Most of the tests in test_builtin were re-enabled.I join the current version of the patch.I'm still working on further simplifications, and maybe supportingslices on ranges...Note: I found more useful to store a "range->end" member, which is themultiple of "step" just beyond the "stop" limit.
msg65810 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-04-25 20:37
> What are the use cases for ranges with length greater than maxsize?Yeah---I'm a bit short on use cases.  The one that originally bit me with Python 2.x was when I was doing a search for a quadratic non-residue modulo a largeish prime;for i in range(1, p):    if (i_is_a_nonresidue_modulo_p):        breakHere p might be a 200-digit prime number, and the situation is that half the integers between 1 and p-1 are 'quadratic residues', while the other half are 'quadratic nonresidues';  in practice the residues and nonresidues are mixed up fairly well, so the first nonresidue shows up pretty quickly, but there's no known small upper bound on when the first nonresidue appears.Of course, it's not hard to rewrite this with a while loop instead;  it would just be a bit annoying if that were necessary, when the code above is so clear and direct, and the one obvious way to do it (TM).I'd also note that it's not completely out of the question that something like range(10**10) would be useful on a 32-bit machine:  a long-running process might easily go through 10**10 iterations of something.I agree it's a bit strange to have a semi-functional range object, that you can iterate over but not take the length of.
msg65812 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-25 20:57
On Fri, Apr 25, 2008 at 4:37 PM, Mark Dickinson <report@bugs.python.org> wrote:..>  for i in range(1, p):>     if (i_is_a_nonresidue_modulo_p):>         break>>  Here p might be a 200-digit prime number, and the situation is that half>  the integers between 1 and p-1 are 'quadratic residues', while the other>  half are 'quadratic nonresidues';  in practice the residues and>  nonresidues are mixed up fairly well, so the first nonresidue shows up>  pretty quickly, but there's no known small upper bound on when the first>  nonresidue appears.Hmm, AFAIKT there is always at least one non-residue between 1 and pand therefore you can just writefor i in itertools.count(1):    if (i_is_a_nonresidue_modulo_p):         breakmaybe with an additional check for p > 1.
msg65825 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2008-04-25 23:16
> Hmm, AFAIKT there is always at least one non-residue between 1 and p> and therefore you can just write>> for i in itertools.count(1):>     if (i_is_a_nonresidue_modulo_p):>          break>> maybe with an additional check for p > 1.Sure.  It's just uglier that way. :-)  And I feel it would be mildly annoying not to be able to use the obvious tool for the job, for subtle reasons.  It's also a potential source of bugs: one might write such code using range and only discover later that it fails unexpectedly for large inputs.These really aren't serious objections---just mild preferences.  I'll stop being disruptive now :)
msg65835 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-04-26 08:16
Given that range() produced a list in the 2.x series (hence limited toavailable memory), and xrange() uses int internally for its values (andhence couldn't even cope with short ranges with values greater thansys.maxint).So my preference is to mimic the 2.x range's behaviour in this case byraising an overflow error if the sequence is too long.(From Python 2.5.1)>>> range(2**99, 2**100)Traceback (most recent call last):  File "<stdin>", line 1, in <module>OverflowError: range() result has too many items>>> range(2**99, 2**99+5)[633825300114114700748351602688L, 633825300114114700748351602689L,633825300114114700748351602690L, 633825300114114700748351602691L,633825300114114700748351602692L]>>> xrange(2**99, 2**99+5)Traceback (most recent call last):  File "<stdin>", line 1, in <module>OverflowError: long int too large to convert to int
msg65926 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-28 19:03
I've implemented range slicing and x in range(..) in range-sequence.diff and registered range with the Sequence ABC.
msg65937 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-28 21:47
Reviewing my own patch (range-sequence.diff), I've realized that it is being a bit too clever in handling x in range(..) where x is not an integer.  It seems that upon a failed PyLong_Check, range_contains should just do a linear search.  This is easy to implement, but I will wait for more feedback before posting further changes.
msg65942 -(view)Author: Facundo Batista (facundobatista)*(Python committer)Date: 2008-04-28 23:21
My 2 cents: for me is more useful to have a unbound range() than to beable to do a len() on it.For me, range() should mimic a number generator: no limit, no length.
msg65956 -(view)Author: Alexander Belopolsky (belopolsky)*(Python committer)Date: 2008-04-29 04:38
> For me, range() should mimic a number generator: no limit, no length.That's itertools.count()
msg65963 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-04-29 11:40
It also isn't what range() and xrange() are used for now in 2.x. range()returns an actual list, hence is limited to sequences that fit in areasonable amount of memory, and xrange() doesn't support values greaterthan sys.maxint at all (as it uses C ints for its internal storage ofthe start, stop and step values).With itertools.count() available for the unbounded iterator case, Ithink making range() mimic its 2.x counterpart as closely as possible(without the memory inefficiency) will be quite valuable.
msg65981 -(view)Author: Facundo Batista (facundobatista)*(Python committer)Date: 2008-04-29 21:05
Fair enough, specially if in the documentation of range() we put "if youwant a unbound, no limit, number generator, use itertools.count()" (orsomething well written in english ;) ).Thanks!
msg69884 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2008-07-17 16:00
Has a resolution been made on this?
msg70525 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2008-07-31 17:39
On the issue of whether len(range()) should be allowed to be >sys.maxsize, I think this should be allowed.  I think in the future weshould change the __len__ protocol to allow unbounded lengths.  Eventoday, I think range(10**100).__len__() should return 10**100 ratherthan raising an OverflowError, even if len(range(10**100)) raisesOverflowError.I also think ranges should be introspectable, exposing their start, stopand step values just like slice objects.Probably all those changes are for post 3.0.
msg70548 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-08-01 10:15
Guido, does that mean we can apply this patch to get the cleanerlist-like behaviour for 3.0, and then work on the sq_item/sq_len fixesfor 3.1 as a separate issue?
msg72344 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-09-02 15:27
This issue was missing a priority setting.Alexander's range-sequence patch still applies cleanly to the Py3kbranch, and "make test" still runs correctly after applying it.As Alexander notes above, range_contains does still need slightly betterhandling of non-integer numbers - I suggest doing a numeric conversionvia PyNumber_Index(el) at the beginning of range_contains, and if thatconversion fails, do a conversion via PyNumber_Long(el) and immediatelyreturn False if the result is not equal to el itself (i.e. only integervalues of non-integer types will be found in the range.Since that explanation got kind of complicated, I've added a modifiedpatch that includes the above change, and adds a couple of additionaltests to ensure a non-integer floating point value won't be found in thesequence.
msg72345 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2008-09-02 15:38
It looks like your range_subscript() forgets to compute the length field...
msg72346 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-09-02 15:48
My initial patch had a few problems - I removed it and uploaded acorrected version.
msg72348 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-09-02 16:03
Good catch Antoine (I missed that in Alexander's patch) - working onthat now.
msg72352 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-09-02 16:54
v2 of my updated patch attached to fix the issue Antoine noted.Also gets rid of some tab-instead-of-spaces indenting issues in thefile, and avoids hardcoding PyRange_Type when creating new instances.However, the patch still has issues, as can be seen with the new tests Iadded to test_range to actually exercise some of the functionalitybeyond the sys.maxsize limit.
msg72354 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2008-09-02 17:27
By the way, why is this release blocker? do we have performance numbers?
msg72355 -(view)Author: Guido van Rossum (gvanrossum)*(Python committer)Date: 2008-09-02 17:32
I wonder if we shouldn't hold off on this. It's a substantial amount ofnew code. I'd be fine with it going into 3.0.1 since it's pureoptimization (IIUC!). But right now we want burn-in, not new features.
msg72357 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2008-09-02 17:57
Given the problems with it, I'm dropping this to normal priority andpunting to 3.1.(the release blocker status was just temporary to ensure we got adecision on it before rc1 - it previously didn't have a priority set)
msg110812 -(view)Author: Mark Lawrence (BreamoreBoy)*Date: 2010-07-19 21:21
@Nick, would you have time to work on this for 3.2, or should we target 3.3, or could somebody else take this over?
msg119515 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-10-24 13:57
I'd still like to have another look at this before beta 1, but can't promise I'll get to it. Unassigning in case someone else has some roundtuits to spare over the next few weeks.
msg123250 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-12-03 14:26
I brought the patch up to date for the Py3k branch, but realised just before checking it in that it may run afoul of the language moratorium (since it alters the behaviour of builtin range objects).However, the .count() and .index() methods (along with the Sequence ABC registration) as well as the O(1) containment testing for integers were already put in place. (I also noticed that the new methods from issue#9213 are not mentioned in the range() docs, unlike the O(1) optimisation)I've gone ahead and checked it in asr86970, as I see this patch as filling out the promise of the Sequence ABC registration by adding support for slicing and negative indices (with the length caching as more of an implementation detail).However, I'm also leaving the tracker issue open and assigning to Georg in case he wants to revert it before the beta goes out.Note that I also fixed the patch so that OverflowError occurs only when encountering an affected operation (primarily indexing and retrieval of the length). If you don't do any of those things, you can make your ranges as large as you like. (The indexing could fairly easily be fixed to eliminate the overflow errors - I just didn't do it in this patch, since it is a separate problem).
msg123257 -(view)Author: Daniel Stutzbach (stutzbach)(Python committer)Date: 2010-12-03 16:26
> (I also noticed that the new methods from issue#9213 are not mentioned> in the range() docsWasn't that fixed inIssue9746?
msg123259 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-12-03 16:35
On Sat, Dec 4, 2010 at 2:26 AM, Daniel Stutzbach <report@bugs.python.org> wrote:>> Daniel Stutzbach <stutzbach@google.com> added the comment:>>> (I also noticed that the new methods from issue#9213 are not mentioned>> in the range() docs>> Wasn't that fixed inIssue9746?Ah, I see what you mean. I was more referring to the lack of aversionchanged note on the range() documentation itself, rather thanthe mentioning of range() under the sequence types documentation.Some of my new additions to the range documentation could probably bedeleted and replaced with a reference to that part of the docs.Cheers,Nick.
msg123261 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-12-03 16:42
Daniel Stutzbach pointed out that range() is also mentioned under:http://docs.python.org/dev/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-rangeThe descriptions of range's limitations there is no longer accurate (slicing is supported following this patch and containment testing is now efficient)
msg123267 -(view)Author: Daniel Stutzbach (stutzbach)(Python committer)Date: 2010-12-03 16:53
> The descriptions of range's limitations there is no longer accurate> (slicing is supported following this patch and containment testing is> now efficient)Want to open a new issue for that?  (or is there one already?)
msg123268 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-12-03 16:55
(Oops, didn't mean to reclose this yet)I want to wait for Georg's verdict on including the functionality in 3.2 before stressing too much about correct documentation of it :)
msg123273 -(view)Author: Georg Brandl (georg.brandl)*(Python committer)Date: 2010-12-03 17:21
I'm fine with it: as with the other changes for .count and .index, consistency with the protocols/ABCs the types are members of is not exclusively a new feature.
msg123278 -(view)Author: Daniel Stutzbach (stutzbach)(Python committer)Date: 2010-12-03 18:12
The descriptions of range's limitations in the docs still needs an update.
msg123755 -(view)Author: SilentGhost (SilentGhost)*(Python triager)Date: 2010-12-10 22:31
Not sure this worth a patch, to me it looks like a removal of a single word. But here it goes anyway.
msg123760 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2010-12-11 00:43
Applied inr87162
msg123762 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-12-11 01:19
The other major change for ranges is that "in" and "not in" are no longer inefficient for actual instances of int (it does an arithmetic calculation instead of a linear search).
msg123765 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2010-12-11 02:20
Is the in/not-in fast path in 2.7?
msg123779 -(view)Author: Alyssa Coghlan (ncoghlan)*(Python committer)Date: 2010-12-11 05:34
No, I believe it was added as part of the .index() and .count() implementation.Checking the source, there's definitely no sq_contains implementation in 3.1 or 2.7.
msg123819 -(view)Author: SilentGhost (SilentGhost)*(Python triager)Date: 2010-12-11 23:44
here is the patch for the py3k docs.
msg124258 -(view)Author: Daniel Stutzbach (stutzbach)(Python committer)Date: 2010-12-17 21:29
Doc change committed to py3k inr87346.  Thanks, SilentGhost!I also committedr87349 to reverser87162 (which was in the wrong branch).
History
DateUserActionArgs
2022-04-11 14:56:33adminsetgithub: 46942
2010-12-17 21:29:31stutzbachsetnosy:georg.brandl,rhettinger,facundobatista,amaury.forgeotdarc,mark.dickinson,ncoghlan,belopolsky,pitrou,stutzbach,SilentGhost,BreamoreBoy
messages: +msg124258
2010-12-11 23:44:45SilentGhostsetfiles: +stdtypes.rst.diff

messages: +msg123819
2010-12-11 05:34:46ncoghlansetmessages: +msg123779
2010-12-11 02:20:41rhettingersetmessages: +msg123765
2010-12-11 01:19:20ncoghlansetmessages: +msg123762
2010-12-11 00:43:24rhettingersetstatus: open -> closed

messages: +msg123760
2010-12-10 22:31:06SilentGhostsetfiles: +stdtypes.rst.diff
nosy: +SilentGhost
messages: +msg123755

2010-12-03 18:12:05stutzbachsetstatus: closed -> open
assignee:georg.brandl ->ncoghlan
messages: +msg123278
2010-12-03 17:21:54georg.brandlsetstatus: open -> closed

messages: +msg123273
2010-12-03 16:55:52ncoghlansetstatus: closed -> open
assignee:georg.brandl
messages: +msg123268
2010-12-03 16:53:00stutzbachsetmessages: +msg123267
2010-12-03 16:42:13ncoghlansetstatus: open -> closed
assignee:georg.brandl -> (no value)
messages: +msg123261
2010-12-03 16:35:38ncoghlansetmessages: +msg123259
2010-12-03 16:26:15stutzbachsetmessages: +msg123257
2010-12-03 14:26:42ncoghlansetfiles: +issue2690_lazy_overflow_check.diff

assignee:georg.brandl
title: Precompute range length -> Precompute range length and enhance range subscript support
nosy: +georg.brandl

messages: +msg123250
resolution: accepted
stage: patch review -> resolved
2010-10-24 13:57:46ncoghlansetassignee:ncoghlan -> (no value)
messages: +msg119515
2010-09-01 21:59:30gvanrossumsetnosy: -gvanrossum
2010-09-01 21:46:04stutzbachsetnosy: +stutzbach
2010-07-20 11:28:02ncoghlansetassignee:ncoghlan
2010-07-19 21:21:02BreamoreBoysetnosy: +BreamoreBoy
messages: +msg110812
2009-05-20 01:28:12rhettingersetassignee:rhettinger -> (no value)
2009-05-17 03:16:16rhettingersetassignee:rhettinger

nosy: +rhettinger
2009-05-16 19:39:07ajaksu2setstage: patch review
versions: + Python 3.2, - Python 3.1
2008-09-02 17:57:42ncoghlansetpriority: release blocker -> normal
keywords: -needs review
messages: +msg72357
versions: + Python 3.1, - Python 3.0
2008-09-02 17:32:55gvanrossumsetmessages: +msg72355
2008-09-02 17:27:56pitrousetmessages: +msg72354
2008-09-02 16:54:55ncoghlansetfiles: +issue2690-range-sequence-ncoghlan-v2.diff
messages: +msg72352
2008-09-02 16:03:12ncoghlansetmessages: +msg72348
2008-09-02 15:48:32ncoghlansetfiles: +issue2690-range-sequence-ncoghlan.diff
messages: +msg72346
2008-09-02 15:47:30ncoghlansetfiles: -issue2690-range-sequence-ncoghlan.diff
2008-09-02 15:38:23pitrousetmessages: +msg72345
2008-09-02 15:27:49ncoghlansetpriority: release blocker
keywords: +needs review
messages: +msg72344
files: +issue2690-range-sequence-ncoghlan.diff
2008-08-01 10:15:33ncoghlansetmessages: +msg70548
2008-07-31 17:39:31gvanrossumsetnosy: +gvanrossum
messages: +msg70525
2008-07-17 16:00:29pitrousetnosy: +pitrou
messages: +msg69884
2008-04-29 21:06:03facundobatistasetmessages: +msg65981
2008-04-29 11:40:19ncoghlansetmessages: +msg65963
2008-04-29 04:38:26belopolskysetmessages: +msg65956
2008-04-28 23:21:54facundobatistasetnosy: +facundobatista
messages: +msg65942
2008-04-28 21:47:57belopolskysetmessages: +msg65937
2008-04-28 19:04:43belopolskysetfiles: +range-sequence.diff
messages: +msg65926
2008-04-26 08:16:41ncoghlansetnosy: +ncoghlan
messages: +msg65835
2008-04-25 23:16:43mark.dickinsonsetmessages: +msg65825
2008-04-25 20:57:44belopolskysetmessages: +msg65812
2008-04-25 20:37:35mark.dickinsonsetmessages: +msg65810
2008-04-25 19:57:02amaury.forgeotdarcsetfiles: +anyrange.patch
nosy: +amaury.forgeotdarc
messages: +msg65807
2008-04-25 19:24:01belopolskysetmessages: +msg65803
2008-04-25 18:48:01mark.dickinsonsetmessages: +msg65802
2008-04-25 17:32:07belopolskysetmessages: +msg65798
2008-04-25 16:55:19mark.dickinsonsetnosy: +mark.dickinson
messages: +msg65793
2008-04-25 15:32:00belopolskycreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp