[Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306)

Trent Mick trentm@ActiveState.com
Fri, 28 Apr 2000 15:08:57 -0700


> > Guido van Rossum wrote:
> > >
> > > The email below is a serious bug report.  A quick analysis shows that
> > > UserString.count() calls the count() method on a string object, which
> > > calls PyArg_ParseTuple() with the format string "O|ii".  The 'i'
> > > format code truncates integers.  It probably should raise an overflow
> > > exception instead.  But that would still cause the test to fail --
> > > just in a different way (more explicit).  Then the string methods
> > > should be fixed to use long ints instead -- and then something else
> > > would probably break...
> >

MAL wrote:
> > All uses in stringobject.c and unicodeobject.c use INT_MAX
> > together with integers, so there's no problem on that side
> > of the fence ;-)
> >
> > Since strings and Unicode objects use integers to describe the
> > length of the object (as well as most if not all other
> > builtin sequence types), the correct default value should
> > thus be something like sys.maxlen which then gets set to
> > INT_MAX.
> >
> > I'd suggest adding sys.maxlen and the modifying UserString.py,
> > re.py and sre_parse.py accordingly.
>

Guido wrote:
> Hm, I'm not so sure.  It would be much better if passing sys.maxint
> would just WORK...  Since that's what people have been doing so far.
>

Possible solutions (I give 4 of them):

1. The 'i' format code could raise an overflow exception, and the
PyArg_ParseTuple() call in string_count() could catch it and truncate to
INT_MAX (reasoning that any overflow of the end position of a string can be
bound to INT_MAX because that is the limit for any string in Python).

Pros:
- This "would just WORK" for usage of sys.maxint.

Cons:
- This overflow exception catching should then reasonably be propagated to
  other similar functions (like string.endswith(), etc).
- We have to assume that the exception raised in the
  PyArg_ParseTuple(args, "O|ii:count", &subobj, &i, &last) call is for the
  second integer (i.e. 'last'). This is subtle and ugly.

Pro or Con:
- Do we want to start raising overflow exceptions for other conversion
  formats (i.e. 'b' and 'h' and 'l', the latter *can* overflow on Win64
  where sizeof(long) < sizeof(void*))? I think this is a good idea in
  principle but it may break code (even if it *does* identify bugs in that
  code).

2. Just change the definitions of the UserString methods to pass a variable
length argument list instead of default value parameters. For example,
change UserString.count() from:

    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)

to:

    def count(self, *args):
        return self.data.count(*args)

The result is that the default value for 'end' is now set by string_count()
rather than by the UserString implementation:

    >>> from UserString import UserString
    >>> s = 'abcabcabc'
    >>> u = UserString('abcabcabc')
    >>> s.count('abc')
    3
    >>> u.count('abc')
    3

Pros:
- Easy change.
- Fixes the immediate bug.
- This is a safer way to copy the string behaviour in UserString anyway (is
  it not?).

Cons:
- Does not fix the general problem of the (common?) usage of sys.maxint to
  mean INT_MAX rather than the actual LONG_MAX (this matters on 64-bit
  Unices).
- The UserString code is no longer really self-documenting.

3. As MAL suggested: add something like sys.maxlen (set to INT_MAX), which
makes the logical difference from sys.maxint (set to LONG_MAX) explicit:
 - sys.maxint == "the largest value a Python integer can hold"
 - sys.maxlen == "the largest value for the length of an object in Python
   (e.g. length of a string, length of an array)"

Pros:
- More explicit in that it separates two distinct meanings for sys.maxint
  (which now makes a difference on 64-bit Unices).
- The code changes should be fairly straightforward.

Cons:
- Places in the code that still use sys.maxint where they should use
  sys.maxlen will unknowingly be overflowing ints and bringing about this
  bug.
- Something else for coders to know about.

4. Add something like sys.maxlen, but set it to SIZET_MAX (cf. the ANSI
size_t type). It is probably not a biggie, but Python currently makes the
assumption that strings never exceed INT_MAX in length. While this
assumption is not likely to be proven false, it technically could be on
64-bit systems. As well, when you start compiling on Win64 (where
sizeof(int) == sizeof(long) < sizeof(size_t)) you are going to be annoyed
by hundreds of warnings about implicit casts from size_t (64 bits) to int
(32 bits) for every strlen, str*, fwrite, and sizeof call that you make.

Pros:
- IMHO logically more correct.
- Might clean up some subtle bugs.
- Cleans up annoying and disconcerting warnings.
- Will probably mean less pain down the road as 64-bit systems (esp. Win64)
  become more prevalent.

Cons:
- Lot of coding changes.
- As Guido said: "and then something else would probably break". (Though,
  on current 32-bit systems, there should be no effective change.) Only
  64-bit systems should be affected and, I would hope, the effect would be
  a cleanup.

I apologize for not being succinct. Note that I am volunteering here.
Opinions and guidance please.

Trent
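
For illustration, the delegation idea in option 2 and the "bound the end
position to INT_MAX" idea behind options 1 and 3 could be sketched roughly
as below. This is only a sketch in present-day Python 3, not the actual
UserString or stringobject.c code: StringWrapper, clamp_end and the INT_MAX
constant are hypothetical names, and the INT_MAX value assumes a 32-bit C
int.

```python
# Assumed value of C's INT_MAX for a 32-bit int (an assumption, not read
# from the platform).
INT_MAX = 2**31 - 1


class StringWrapper:
    """Hypothetical stand-in for UserString: wraps a plain str in self.data."""

    def __init__(self, data):
        self.data = str(data)

    def count(self, *args):
        # Option 2: forward whatever the caller passed. The wrapped string
        # method then supplies its own defaults for 'start' and 'end', so
        # the wrapper never has to name a "largest index" itself.
        return self.data.count(*args)


def clamp_end(end, limit=INT_MAX):
    """Options 1/3: bound an oversized end index to the maximum length."""
    return min(end, limit)


u = StringWrapper('abcabcabc')
print(u.count('abc'))        # default start/end come from str.count
print(u.count('abc', 3))     # explicit start is passed through unchanged
print(clamp_end(2**63 - 1))  # a LONG_MAX-sized value is bound to INT_MAX
```

The wrapper gives up the self-documenting signature (the cost noted under
option 2), while the clamp keeps the signature but needs an agreed limit
such as the proposed sys.maxlen.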

