What's New in Python 2.2¶
- Author:
A.M. Kuchling
Introduction¶
This article explains the new features in Python 2.2.2, released on October 14, 2002. Python 2.2.2 is a bugfix release of Python 2.2, originally released on December 21, 2001.
Python 2.2 can be thought of as the "cleanup release". There are some features such as generators and iterators that are completely new, but most of the changes, significant and far-reaching though they may be, are aimed at cleaning up irregularities and dark corners of the language design.
This article doesn't attempt to provide a complete specification of the new features, but instead provides a convenient overview. For full details, you should refer to the documentation for Python 2.2, such as the Python Library Reference and the Python Reference Manual. If you want to understand the complete implementation and design rationale for a change, refer to the PEP for a particular new feature.
PEPs 252 and 253: Type and Class Changes¶
The largest and most far-reaching changes in Python 2.2 are to Python's model of objects and classes. The changes should be backward compatible, so it's likely that your code will continue to run unchanged, but the changes provide some amazing new capabilities. Before beginning this, the longest and most complicated section of this article, I'll provide an overview of the changes and offer some comments.
A long time ago I wrote a web page listing flaws in Python's design. One of the most significant flaws was that it's impossible to subclass Python types implemented in C. In particular, it's not possible to subclass built-in types, so you can't just subclass, say, lists in order to add a single useful method to them. The UserList module provides a class that supports all of the methods of lists and that can be subclassed further, but there's lots of C code that expects a regular Python list and won't accept a UserList instance.
Python 2.2 fixes this, and in the process adds some exciting new capabilities. A brief summary:
- You can subclass built-in types such as lists and even integers, and your subclasses should work in every place that requires the original type.
- It's now possible to define static and class methods, in addition to the instance methods available in previous versions of Python.
- It's also possible to automatically call methods on accessing or setting an instance attribute by using a new mechanism called properties. Many uses of __getattr__() can be rewritten to use properties instead, making the resulting code simpler and faster. As a small side benefit, attributes can now have docstrings, too.
- The list of legal attributes for an instance can be limited to a particular set using slots, making it possible to safeguard against typos and perhaps make more optimizations possible in future versions of Python.
Some users have voiced concern about all these changes. Sure, they say, the new features are neat and lend themselves to all sorts of tricks that weren't possible in previous versions of Python, but they also make the language more complicated. Some people have said that they've always recommended Python for its simplicity, and feel that its simplicity is being lost.
Personally, I think there's no need to worry. Many of the new features are quite esoteric, and you can write a lot of Python code without ever needing to be aware of them. Writing a simple class is no more difficult than it ever was, so you don't need to bother learning or teaching them unless they're actually needed. Some very complicated tasks that were previously only possible from C will now be possible in pure Python, and to my mind that's all for the better.
I'm not going to attempt to cover every single corner case and small change that was required to make the new features work. Instead this section will paint only the broad strokes. See the "Related Links" section for further sources of information about Python 2.2's new object model.
Old and New Classes¶
First, you should know that Python 2.2 really has two kinds of classes: classic or old-style classes, and new-style classes. The old-style class model is exactly the same as the class model in earlier versions of Python. All the new features described in this section apply only to new-style classes. This divergence isn't intended to last forever; eventually old-style classes will be dropped, possibly in Python 3.0.
So how do you define a new-style class? You do it by subclassing an existing new-style class. Most of Python's built-in types, such as integers, lists, dictionaries, and even files, are new-style classes now. A new-style class named object, the base class for all built-in types, has also been added, so if no built-in type is suitable, you can just subclass object:
class C(object):
    def __init__(self):
        ...
    ...
This means that class statements that don't have any base classes are always classic classes in Python 2.2. (Actually you can also change this by setting a module-level variable named __metaclass__ --- see PEP 253 for the details --- but it's easier to just subclass object.)
The type objects for the built-in types are available as built-ins, named using a clever trick. Python has always had built-in functions named int(), float(), and str(). In 2.2, they aren't functions any more, but type objects that behave as factories when called.
>>> int
<type 'int'>
>>> int('123')
123
To make the set of types complete, new type objects such as dict() and file() have been added. Here's a more interesting example, adding a lock() method to file objects:
class LockableFile(file):
    def lock(self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation, length, start, whence)
The now-obsolete posixfile module contained a class that emulated all of a file object's methods and also added a lock() method, but this class couldn't be passed to internal functions that expected a built-in file, something which is possible with our new LockableFile.
Descriptors¶
In previous versions of Python, there was no consistent way to discover what attributes and methods were supported by an object. There were some informal conventions, such as defining __members__ and __methods__ attributes that were lists of names, but often the author of an extension type or a class wouldn't bother to define them. You could fall back on inspecting the __dict__ of an object, but when class inheritance or an arbitrary __getattr__() hook were in use this could still be inaccurate.
The one big idea underlying the new class model is that an API for describing the attributes of an object using descriptors has been formalized. Descriptors specify the value of an attribute, stating whether it's a method or a field. With the descriptor API, static methods and class methods become possible, as well as more exotic constructs.
Attribute descriptors are objects that live inside class objects, and have a few attributes of their own:
- __name__ is the attribute's name.
- __doc__ is the attribute's docstring.
- __get__(object) is a method that retrieves the attribute value from object.
- __set__(object, value) sets the attribute on object to value.
- __delete__(object, value) deletes the value attribute of object.
For example, when you write obj.x, the steps that Python actually performs are:
descriptor = obj.__class__.x
descriptor.__get__(obj)
For methods, descriptor.__get__ returns a temporary object that's callable, and wraps up the instance and the method to be called on it. This is also why static methods and class methods are now possible; they have descriptors that wrap up just the method, or the method and the class. As a brief explanation of these new kinds of methods, static methods aren't passed the instance, and therefore resemble regular functions. Class methods are passed the class of the object, but not the object itself. Static and class methods are defined like this:
class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)
The staticmethod() function takes the function f(), and returns it wrapped up in a descriptor so it can be stored in the class object. You might expect there to be special syntax for creating such methods (def static f(), defstatic f(), or something like that) but no such syntax has been defined yet; that's been left for future versions of Python.
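As a quick illustration (this snippet is not from the original article; the class and method names are invented), here's how the two kinds of methods behave when called:
class Counter(object):
    total = 0

    def describe():
        # A static method: no instance or class is passed in.
        return 'a Counter'
    describe = staticmethod(describe)

    def reset(cls, n):
        # A class method: the class itself is passed as 'cls'.
        cls.total = n
    reset = classmethod(reset)

print Counter.describe()     # callable without an instance
Counter.reset(5)             # 'cls' is Counter here
print Counter.total          # prints 5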
More new features, such as slots and properties, are also implemented as new kinds of descriptors, and it's not difficult to write a descriptor class that does something novel. For example, it would be possible to write a descriptor class that made it possible to write Eiffel-style preconditions and postconditions for a method. A class that used this feature might be defined like this:
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...
    f = eiffelmethod(f, pre_f, post_f)
Note that a person using the new eiffelmethod() doesn't have to understand anything about descriptors. This is why I think the new features don't increase the basic complexity of the language. There will be a few wizards who need to know about it in order to write eiffelmethod() or the ZODB or whatever, but most users will just write code on top of the resulting libraries and ignore the implementation details.
Multiple Inheritance: The Diamond Rule¶
Multiple inheritance has also been made more useful through changing the rules under which names are resolved. Consider this set of classes (diagram taken from PEP 253 by Guido van Rossum):
        class A:
          ^ ^  def save(self): ...
         /   \
        /     \
       /       \
      /         \
  class B     class C:
      ^         ^  def save(self): ...
       \       /
        \     /
         \   /
          \ /
        class D
The lookup rule for classic classes is simple but not very smart; the base classes are searched depth-first, going from left to right. A reference to D.save() will search the classes D, B, and then A, where save() would be found and returned. C.save() would never be found at all. This is bad, because if C's save() method is saving some internal state specific to C, not calling it will result in that state never getting saved.
New-style classes follow a different algorithm that's a bit more complicated to explain, but does the right thing in this situation. (Note that Python 2.3 changes this algorithm to one that produces the same results in most cases, but produces more useful results for really complicated inheritance graphs.)
1. List all the base classes, following the classic lookup rule, and include a class multiple times if it's visited repeatedly. In the above example, the list of visited classes is [D, B, A, C, A].
2. Scan the list for duplicated classes. If any are found, remove all but one occurrence, leaving the last one in the list. In the above example, the list becomes [D, B, C, A] after dropping duplicates.
Following this rule, referring to D.save() will return C.save(), which is the behaviour we're after. This lookup rule is the same as the one followed by Common Lisp. A new built-in function, super(), provides a way to get at a class's superclasses without having to reimplement Python's algorithm. The most commonly used form will be super(class, obj), which returns a bound superclass object (not the actual class object). This form will be used in methods to call a method in the superclass; for example, D's save() method would look like this:
class D(B, C):
    def save(self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
super() can also return unbound superclass objects when called as super(class) or super(class1, class2), but this probably won't often be useful.
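To see the new lookup order and super() working together, here's a small self-contained sketch (the print statements are only for illustration):
class A(object):
    def save(self):
        print 'A.save'

class B(A):
    pass

class C(A):
    def save(self):
        print 'C.save'
        super(C, self).save()

class D(B, C):
    def save(self):
        print 'D.save'
        super(D, self).save()

D().save()      # prints D.save, C.save, A.save -- C.save() is no longer skipped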
Attribute Access¶
A fair number of sophisticated Python classes define hooks for attribute access using __getattr__(); most commonly this is done for convenience, to make code more readable by automatically mapping an attribute access such as obj.parent into a method call such as obj.get_parent(). Python 2.2 adds some new ways of controlling attribute access.
First, __getattr__(attr_name) is still supported by new-style classes, and nothing about it has changed. As before, it will be called when an attempt is made to access obj.foo and no attribute named foo is found in the instance's dictionary.
New-style classes also support a new method, __getattribute__(attr_name). The difference between the two methods is that __getattribute__() is always called whenever any attribute is accessed, while the old __getattr__() is only called if foo isn't found in the instance's dictionary.
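A tiny sketch (the class is invented for illustration) showing the difference between the two hooks:
class Traced(object):
    def __init__(self):
        self.x = 1

    def __getattribute__(self, name):
        # Called for every attribute access, even for attributes that exist.
        print 'looking up', name
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Called only when the normal lookup fails.
        return 'default value for %s' % name

t = Traced()
t.x          # prints 'looking up x' and finds the real attribute
t.missing    # prints 'looking up missing', then falls back to __getattr__()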
However, Python 2.2's support for properties will often be a simpler way to trap attribute references. Writing a __getattr__() method is complicated because to avoid recursion you can't use regular attribute accesses inside them, and instead have to mess around with the contents of __dict__. __getattr__() methods also end up being called by Python when it checks for other methods such as __repr__() or __coerce__(), and so have to be written with this in mind. Finally, calling a function on every attribute access results in a sizable performance loss.
property is a new built-in type that packages up three functions that get, set, or delete an attribute, and a docstring. For example, if you want to define a size attribute that's computed, but also settable, you could write:
class C(object):
    def get_size(self):
        result = ... computation ...
        return result
    def set_size(self, size):
        ... compute something based on the size and
            set internal state appropriately ...
    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
That is certainly clearer and easier to write than a pair of __getattr__()/__setattr__() methods that check for the size attribute and handle it specially while retrieving all other attributes from the instance's __dict__. Accesses to size are also the only ones which have to perform the work of calling a function, so references to other attributes run at their usual speed.
Finally, it's possible to constrain the list of attributes that can be referenced on an object using the new __slots__ class attribute. Python objects are usually very dynamic; at any time it's possible to define a new attribute on an instance by just doing obj.new_attr = 1. A new-style class can define a class attribute named __slots__ to limit the legal attributes to a particular set of names. An example will make this clear:
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
Note how you get an AttributeError on the attempt to assign to an attribute not listed in __slots__.
Related Links¶
This section has just been a quick overview of the new features, giving enough of an explanation to start you programming, but many details have been simplified or ignored. Where should you go to get a more complete picture?
The Descriptor HowTo Guide is a lengthy tutorial introduction to the descriptor features, written by Guido van Rossum. If my description has whetted your appetite, go read this tutorial next, because it goes into much more detail about the new features while still remaining quite easy to read.
Next, there are two relevant PEPs, PEP 252 and PEP 253. PEP 252 is titled "Making Types Look More Like Classes", and covers the descriptor API. PEP 253 is titled "Subtyping Built-in Types", and describes the changes to type objects that make it possible to subtype built-in objects. PEP 253 is the more complicated PEP of the two, and at a few points the necessary explanations of types and meta-types may cause your head to explode. Both PEPs were written and implemented by Guido van Rossum, with substantial assistance from the rest of the Zope Corp. team.
Finally, there's the ultimate authority: the source code. Most of the machinery for the type handling is in Objects/typeobject.c, but you should only resort to it after all other avenues have been exhausted, including posting a question to python-list or python-dev.
PEP 234: Iterators¶
Another significant addition to 2.2 is an iteration interface at both the C and Python levels. Objects can define how they can be looped over by callers.
In Python versions up to 2.1, the usual way to make for item in obj work is to define a __getitem__() method that looks something like this:
def __getitem__(self, index):
    return <next item>
__getitem__() is more properly used to define an indexing operation on an object so that you can write obj[5] to retrieve the sixth element. It's a bit misleading when you're using this only to support for loops. Consider some file-like object that wants to be looped over; the index parameter is essentially meaningless, as the class probably assumes that a series of __getitem__() calls will be made with index incrementing by one each time. In other words, the presence of the __getitem__() method doesn't mean that using file[5] to randomly access the sixth element will work, though it really should.
In Python 2.2, iteration can be implemented separately, and __getitem__() methods can be limited to classes that really do support random access. The basic idea of iterators is simple. A new built-in function, iter(obj) or iter(C, sentinel), is used to get an iterator. iter(obj) returns an iterator for the object obj, while iter(C, sentinel) returns an iterator that will invoke the callable object C until it returns sentinel to signal that the iterator is done.
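The two-argument form turns any callable into something a loop can consume; for example (the file name here is made up), you could read a file line by line until readline() returns an empty string:
f = open('data.txt')
# Call f.readline repeatedly until it returns '' (end of file).
for line in iter(f.readline, ''):
    print line.rstrip()
f.close()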
Python classes can define an __iter__() method, which should create and return a new iterator for the object; if the object is its own iterator, this method can just return self. In particular, iterators will usually be their own iterators. Extension types implemented in C can implement a tp_iter function in order to return an iterator, and extension types that want to behave as iterators can define a tp_iternext function.
So, after all this, what do iterators actually do? They have one required method, next(), which takes no arguments and returns the next value. When there are no more values to be returned, calling next() should raise the StopIteration exception.
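Putting __iter__() and next() together, here's a minimal sketch of a class that acts as its own iterator (the class name is invented):
class Countdown(object):
    "Iterate from n down to 1."
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        # The object is its own iterator, so just return self.
        return self
    def next(self):
        if self.n <= 0:
            raise StopIteration
        self.n = self.n - 1
        return self.n + 1

for i in Countdown(3):
    print i          # prints 3, then 2, then 1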
>>> L = [1, 2, 3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
In 2.2, Python's for statement no longer expects a sequence; it expects something for which iter() will return an iterator. For backward compatibility and convenience, an iterator is automatically constructed for sequences that don't implement __iter__() or a tp_iter slot, so for i in [1, 2, 3] will still work. Wherever the Python interpreter loops over a sequence, it's been changed to use the iterator protocol. This means you can do things like this:
>>> L = [1, 2, 3]
>>> i = iter(L)
>>> a, b, c = i
>>> a, b, c
(1, 2, 3)
Iterator support has been added to some of Python's basic types. Calling iter() on a dictionary will return an iterator which loops over its keys:
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
That's just the default behaviour. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator. In a minor related change, the in operator now works on dictionaries, so key in dict is now equivalent to dict.has_key(key).
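For instance, with the dictionary m from the example above, you might write (a small sketch):
for month, number in m.iteritems():     # (key, value) pairs
    print month, number
print 'Jan' in m                        # prints 1; same as m.has_key('Jan')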
Files also provide an iterator, which calls the readline() method until there are no more lines in the file. This means you can now read each line of a file using code like this:
for line in file:
    # do something for each line
    ...
Note that you can only go forward in an iterator; there's no way to get the previous element, reset the iterator, or make a copy of it. An iterator object could provide such additional capabilities, but the iterator protocol only requires a next() method.
See also
- PEP 234 - Iterators
Written by Ka-Ping Yee and GvR; implemented by the Python Labs crew, mostly by GvR and Tim Peters.
PEP 255: Simple Generators¶
Generators are another new feature, one that interacts with the introduction of iterators.
You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But, what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; this is detected by Python's bytecode compiler which compiles the function specially as a result. Because a new keyword was introduced, generators must be explicitly enabled in a module by including a from __future__ import generators statement near the top of the module's source code. In Python 2.3 this statement will become unnecessary.
When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)
Here's a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.
You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
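For comparison, here's a rough sketch (the class name is invented) of the kind of hand-written iterator class described a few paragraphs back, doing the same job as generate_ints():
class GeneratedInts(object):
    "Hand-written equivalent of the generate_ints() generator."
    def __init__(self, N):
        self.count = 0
        self.N = N
    def __iter__(self):
        return self
    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count = self.count + 1
        return value
Even for this trivial case the bookkeeping is more verbose than the three-line generator.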
Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing N queens on an NxN chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an NxN chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon (https://www2.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www2.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.
Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered a new part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.
See also
- PEP 255 - Simple Generators
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
PEP 237: Unifying Long Integers and Integers¶
In recent versions, the distinction between regular integers, which are 32-bit values on most machines, and long integers, which can be of arbitrary size, was becoming an annoyance. For example, on platforms that support files larger than 2**32 bytes, the tell() method of file objects has to return a long integer. However, there were various bits of Python that expected plain integers and would raise an error if a long integer was provided instead. For example, in Python 1.5, only regular integers could be used as a slice index, and 'abc'[1L:] would raise a TypeError exception with the message 'slice index must be int'.
Python 2.2 will shift values from short to long integers as required. The 'L' suffix is no longer needed to indicate a long integer literal, as now the compiler will choose the appropriate type. (Using the 'L' suffix will be discouraged in future 2.x versions of Python, triggering a warning in Python 2.4, and probably dropped in Python 3.0.) Many operations that used to raise an OverflowError will now return a long integer as their result. For example:
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
In most cases, integers and long integers will now be treated identically. You can still distinguish them with the type() built-in function, but that's rarely needed.
See also
- PEP 237 - Unifying Long Integers and Integers
Written by Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van Rossum.
PEP 238: Changing the Division Operator¶
The most controversial change in Python 2.2 heralds the start of an effort to fix an old design flaw that's been in Python from the beginning. Currently Python's division operator, /, behaves like C's division operator when presented with two integer arguments: it returns an integer result that's truncated down when there would be a fractional part. For example, 3/2 is 1, not 1.5, and (-1)/2 is -1, not -0.5. This means that the results of division can vary unexpectedly depending on the type of the two operands, and because Python is dynamically typed, it can be difficult to determine the possible types of the operands.
(The controversy is over whether this is really a design flaw, and whether it's worth breaking existing code to fix this. It's caused endless discussions on python-dev, and in July 2001 erupted into a storm of acidly sarcastic postings on comp.lang.python. I won't argue for either side here and will stick to describing what's implemented in 2.2. Read PEP 238 for a summary of arguments and counter-arguments.)
Because this change might break code, it's being introduced very gradually. Python 2.2 begins the transition, but the switch won't be complete until Python 3.0.
First, I'll borrow some terminology from PEP 238. "True division" is the division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25, and so forth. "Floor division" is what Python's / operator currently does when given integer operands; the result is the floor of the value returned by true division. "Classic division" is the current mixed behaviour of /; it returns the result of floor division when the operands are integers, and returns the result of true division when one of the operands is a floating-point number.
Here are the changes 2.2 introduces:
- A new operator, //, is the floor division operator. (Yes, we know it looks like C++'s comment symbol.) // always performs floor division no matter what the types of its operands are, so 1 // 2 is 0 and 1.0 // 2.0 is also 0.0. // is always available in Python 2.2; you don't need to enable it using a __future__ statement.
- By including a from __future__ import division in a module, the / operator will be changed to return the result of true division, so 1/2 is 0.5. Without the __future__ statement, / still means classic division. The default meaning of / will not change until Python 3.0. (A short interactive example follows this list.)
- Classes can define methods called __truediv__() and __floordiv__() to overload the two division operators. At the C level, there are also slots in the PyNumberMethods structure so extension types can define the two operators.
- Python 2.2 supports some command-line arguments for testing whether code will work with the changed division semantics. Running python with -Q warn will cause a warning to be issued whenever division is applied to two integers. You can use this to find code that's affected by the change and fix it. By default, Python 2.2 will simply perform classic division without a warning; the warning will be turned on by default in Python 2.3.
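A quick interactive sketch of the new behaviour (the from __future__ import only affects code compiled after it):
>>> 1 / 2                    # classic division: still truncates in 2.2
0
>>> 1 // 2                   # floor division, always available
0
>>> 1.0 // 2.0
0.0
>>> from __future__ import division
>>> 1 / 2                    # true division once the import is in effect
0.5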
See also
- PEP 238 - Changing the Division Operator
Written by Moshe Zadka and Guido van Rossum. Implemented by Guido van Rossum.
Unicode Changes¶
Python's Unicode support has been enhanced a bit in 2.2. Unicode strings are usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by supplying --enable-unicode=ucs4 to the configure script. (It's also possible to specify --disable-unicode to completely disable Unicode support.)
When built to use UCS-4 (a "wide Python"), the interpreter can natively handle Unicode characters from U+000000 to U+110000, so the range of legal values for the unichr() function is expanded accordingly. Using an interpreter compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still cause unichr() to raise a ValueError exception. This is all described in PEP 261, "Support for 'wide' Unicode characters"; consult it for further details.
Another change is simpler to explain. Since their introduction, Unicode strings have supported an encode() method to convert the string to a selected encoding such as UTF-8 or Latin-1. A symmetric decode([encoding]) method has been added to 8-bit strings (though not to Unicode strings) in 2.2. decode() assumes that the string is in the specified encoding and decodes it, returning whatever is returned by the codec.
Using this new feature, codecs have been added for tasks not directly related to Unicode. For example, codecs have been added for uu-encoding, MIME's base64 encoding, and compression with the zlib module:
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*
end
>>> "sheesh".encode('rot-13')
'furrfu'
To convert a class instance to Unicode, a __unicode__() method can be defined by a class, analogous to __str__().
encode(), decode(), and __unicode__() were implemented by Marc-André Lemburg. The changes to support using UCS-4 internally were implemented by Fredrik Lundh and Martin von Löwis.
See also
- PEP 261 - Support for 'wide' Unicode characters
Written by Paul Prescod.
PEP 227: Nested Scopes¶
In Python 2.1, statically nested scopes were added as an optional feature, to be enabled by a from __future__ import nested_scopes directive. In 2.2 nested scopes no longer need to be specially enabled, and are now always present. The rest of this section is a copy of the description of nested scopes from my "What's New in Python 2.1" document; if you read it when 2.1 came out, you can skip the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2, is to Python's scoping rules. In Python 2.0, at any given time there are at most three namespaces used to look up variable names: local, module-level, and the built-in namespace. This often surprised people because it didn't match their intuitive expectations. For example, a nested recursive function definition doesn't work:
def f():
    ...
    def g(value):
        ...
        return g(value - 1) + 1
    ...
The function g() will always raise a NameError exception, because the binding of the name g isn't in either its local namespace or in the module-level namespace. This isn't much of a problem in practice (how often do you recursively define interior functions like this?), but this also made using the lambda expression clumsier, and this was a problem in practice. In code which uses lambda you can often find local variables being copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers greatly as a result.
The most significant change to Python 2.2 is that static scoping has been added to the language to fix this problem. As a first effect, the name=name default argument is now unnecessary in the above example. Put simply, when a given variable name is not assigned a value within a function (by an assignment, or the def, class, or import statements), references to the variable will be looked up in the local namespace of the enclosing scope. A more detailed explanation of the rules, and a dissection of the implementation, can be found in the PEP.
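Under the new rules, the find() example above can drop the default-argument trick; a sketch:
def find(self, name):
    "Return list of any entries equal to 'name'"
    # 'name' is visible inside the lambda thanks to nested scopes.
    return filter(lambda x: x == name, self.list_attribute)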
This change may cause some compatibility problems for code where the same variable name is used both at the module level and as a local variable within a function that contains further function definitions. This seems rather unlikely though, since such code would have been pretty confusing to read in the first place.
One side effect of the change is that the from module import * and exec statements have been made illegal inside a function scope under certain conditions. The Python reference manual has said all along that from module import * is only legal at the top level of a module, but the CPython interpreter has never enforced this before. As part of the implementation of nested scopes, the compiler which turns Python source into bytecodes has to generate different code to access variables in a containing scope. from module import * and exec make it impossible for the compiler to figure this out, because they add names to the local namespace that are unknowable at compile time. Therefore, if a function contains function definitions or lambda expressions with free variables, the compiler will flag this by raising a SyntaxError exception.
To make the preceding explanation a bit clearer, here's an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4 containing the exec statement is a syntax error, since exec would define a new local variable named x whose value should be accessed by g().
This shouldn't be much of a limitation, since exec is rarely used in most Python code (and when it is used, it's often a sign of a poor design anyway).
See also
- PEP 227 - Statically Nested Scopes
Written and implemented by Jeremy Hylton.
New and Improved Modules¶
- The xmlrpclib module was contributed to the standard library by Fredrik Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple remote procedure call protocol built on top of HTTP and XML. For example, the following snippet retrieves a list of RSS channels from the O'Reilly Network, and then lists the recent headlines for one channel:

  import xmlrpclib
  s = xmlrpclib.Server('http://www.oreillynet.com/meerkat/xml-rpc/server.php')
  channels = s.meerkat.getChannels()
  # channels is a list of dictionaries, like this:
  # [{'id': 4, 'title': 'Freshmeat Daily News'}
  #  {'id': 190, 'title': '32Bits Online'},
  #  {'id': 4549, 'title': '3DGamers'}, ... ]

  # Get the items for one channel
  items = s.meerkat.getItems({'channel': 4})

  # 'items' is another list of dictionaries, like this:
  # [{'link': 'http://freshmeat.net/releases/52719/',
  #   'description': 'A utility which converts HTML to XSL FO.',
  #   'title': 'html2fo 0.3 (Default)'}, ... ]

- The SimpleXMLRPCServer module makes it easy to create straightforward XML-RPC servers. See http://xmlrpc.scripting.com/ for more information about XML-RPC.
- The new hmac module implements the HMAC algorithm described by RFC 2104. (Contributed by Gerhard Häring.)
- Several functions that originally returned lengthy tuples now return pseudo-sequences that still behave like tuples but also have mnemonic attributes such as st_mtime or tm_year. The enhanced functions include stat(), fstat(), statvfs(), and fstatvfs() in the os module, and localtime(), gmtime(), and strptime() in the time module.
  For example, to obtain a file's size using the old tuples, you'd end up writing something like file_size = os.stat(filename)[stat.ST_SIZE], but now this can be written more clearly as file_size = os.stat(filename).st_size.
  The original patch for this feature was contributed by Nick Mathewson.
- The Python profiler has been extensively reworked and various errors in its output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)
- The socket module can be compiled to support IPv6; specify the --enable-ipv6 option to Python's configure script. (Contributed by Jun-ichiro "itojun" Hagino.)
- Two new format characters were added to the struct module for 64-bit integers on platforms that support the C long long type. q is for a signed 64-bit integer, and Q is for an unsigned one. The value is returned in Python's long integer type. (Contributed by Tim Peters.) A short example appears after this list.
- In the interpreter's interactive mode, there's a new built-in function help() that uses the pydoc module introduced in Python 2.1 to provide interactive help. help(object) displays any available help text about object. help() with no argument puts you in an online help utility, where you can enter the names of functions, classes, or modules to read their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee's pydoc module.)
- Various bugfixes and performance improvements have been made to the SRE engine underlying the re module. For example, the re.sub() and re.split() functions have been rewritten in C. Another contributed patch speeds up certain Unicode character ranges by a factor of two, and a new finditer() method was added that returns an iterator over all the non-overlapping matches in a given string. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von Löwis.)
- The smtplib module now supports RFC 2487, "Secure SMTP over TLS", so it's now possible to encrypt the SMTP traffic between a Python program and the mail transport agent being handed a message. smtplib also supports SMTP authentication. (Contributed by Gerhard Häring.)
- The imaplib module, maintained by Piers Lauder, has support for several new extensions: the NAMESPACE extension defined in RFC 2342, SORT, GETACL and SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)
- The rfc822 module's parsing of email addresses is now compliant with RFC 2822, an update to RFC 822. (The module's name is not going to be changed to rfc2822.) A new package, email, has also been added for parsing and generating e-mail messages. (Contributed by Barry Warsaw, and arising out of his work on Mailman.)
- The difflib module now contains a new Differ class for producing human-readable lists of changes (a "delta") between two sequences of lines of text. There are also two generator functions, ndiff() and restore(), which respectively return a delta from two sequences, or one of the original sequences from a delta. (Grunt work contributed by David Goodger, from ndiff.py code by Tim Peters who then did the generatorization.)
- New constants ascii_letters, ascii_lowercase, and ascii_uppercase were added to the string module. There were several modules in the standard library that used string.letters to mean the ranges A-Za-z, but that assumption is incorrect when locales are in use, because string.letters varies depending on the set of legal characters defined by the current locale. The buggy modules have all been fixed to use ascii_letters instead. (Reported by an unknown person; fixed by Fred L. Drake, Jr.)
- The mimetypes module now makes it easier to use alternative MIME-type databases by the addition of a MimeTypes class, which takes a list of filenames to be parsed. (Contributed by Fred L. Drake, Jr.)
- A Timer class was added to the threading module that allows scheduling an activity to happen at some future time. (Contributed by Itamar Shtull-Trauring.)
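As a quick, hedged illustration of the new struct format characters mentioned in the list above (the values are arbitrary; this assumes a platform with C long long support):
import struct

packed = struct.pack('q', -(2 ** 40))                  # signed 64-bit integer
print struct.unpack('q', packed)                       # (-1099511627776L,)
print struct.unpack('Q', struct.pack('Q', 2 ** 63))    # (9223372036854775808L,)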
Interpreter Changes and Fixes¶
Some of the changes only affect people who deal with the Python interpreter at the C level because they're writing Python extension modules, embedding the interpreter, or just hacking on the interpreter itself. If you only write Python code, none of the changes described here will affect you very much.
- Profiling and tracing functions can now be implemented in C, which can operate at much higher speeds than Python-based functions and should reduce the overhead of profiling and tracing. This will be of interest to authors of development environments for Python. Two new C functions were added to Python's API, PyEval_SetProfile() and PyEval_SetTrace(). The existing sys.setprofile() and sys.settrace() functions still exist, and have simply been changed to use the new C-level interface. (Contributed by Fred L. Drake, Jr.)
- Another low-level API, primarily of interest to implementers of Python debuggers and development tools, was added. PyInterpreterState_Head() and PyInterpreterState_Next() let a caller walk through all the existing interpreter objects; PyInterpreterState_ThreadHead() and PyThreadState_Next() allow looping over all the thread states for a given interpreter. (Contributed by David Beazley.)
- The C-level interface to the garbage collector has been changed to make it easier to write extension types that support garbage collection and to debug misuses of the functions. Various functions have slightly different semantics, so a bunch of functions had to be renamed. Extensions that use the old API will still compile but will not participate in garbage collection, so updating them for 2.2 should be considered fairly high priority.
To upgrade an extension module to the new API, perform the following steps:
- Rename Py_TPFLAGS_GC to Py_TPFLAGS_HAVE_GC.
- Use PyObject_GC_New() or PyObject_GC_NewVar() to allocate objects, and PyObject_GC_Del() to deallocate them.
- Rename PyObject_GC_Init() to PyObject_GC_Track() and PyObject_GC_Fini() to PyObject_GC_UnTrack().
- Remove PyGC_HEAD_SIZE from object size calculations.
- Remove calls to PyObject_AS_GC() and PyObject_FROM_GC().
- A new et format sequence was added to PyArg_ParseTuple(); et takes both a parameter and an encoding name, and converts the parameter to the given encoding if the parameter turns out to be a Unicode string, or leaves it alone if it's an 8-bit string, assuming it to already be in the desired encoding. This differs from the es format character, which assumes that 8-bit strings are in Python's default ASCII encoding and converts them to the specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS support on Windows described in the following section.)
- A different argument parsing function, PyArg_UnpackTuple(), has been added that's simpler and presumably faster. Instead of specifying a format string, the caller simply gives the minimum and maximum number of arguments expected, and a set of pointers to PyObject* variables that will be filled in with argument values.
- Two new flags, METH_NOARGS and METH_O, are available in method definition tables to simplify implementation of methods with no arguments or a single untyped argument. Calling such methods is more efficient than calling a corresponding method that uses METH_VARARGS. Also, the old METH_OLDARGS style of writing C methods is now officially deprecated.
- Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf(), were added to provide cross-platform implementations for the relatively new snprintf() and vsnprintf() C lib APIs. In contrast to the standard sprintf() and vsprintf() functions, the Python versions check the bounds of the buffer used to protect against buffer overruns. (Contributed by M.-A. Lemburg.)
- The _PyTuple_Resize() function has lost an unused parameter, so now it takes 2 parameters instead of 3. The third argument was never used, and can simply be discarded when porting code from earlier versions to Python 2.2.
Other Changes and Fixes¶
As usual there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1 applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82 bugs. These figures are likely to be underestimates.
Some of the more notable changes are:
- The code for the MacOS port for Python, maintained by Jack Jansen, is now kept in the main Python CVS tree, and many changes have been made to support MacOS X.
  The most significant change is the ability to build Python as a framework, enabled by supplying the --enable-framework option to the configure script when compiling Python. According to Jack Jansen, "This installs a self-contained Python installation plus the OS X framework "glue" into /Library/Frameworks/Python.framework (or another location of choice). For now there is little immediate added benefit to this (actually, there is the disadvantage that you have to change your PATH to be able to find Python), but it is the basis for creating a full-blown Python application, porting the MacPython IDE, possibly using Python as a standard OSA scripting language and much more."
  Most of the MacPython toolbox modules, which interface to MacOS APIs such as windowing, QuickTime, scripting, etc. have been ported to OS X, but they've been left commented out in setup.py. People who want to experiment with these modules can uncomment them manually.
- Keyword arguments passed to built-in functions that don't take them now cause a TypeError exception to be raised, with the message "function takes no keyword arguments".
- Weak references, added in Python 2.1 as an extension module, are now part of the core because they're used in the implementation of new-style classes. The ReferenceError exception has therefore moved from the weakref module to become a built-in exception.
- A new script, Tools/scripts/cleanfuture.py by Tim Peters, automatically removes obsolete __future__ statements from Python source code.
- An additional flags argument has been added to the built-in function compile(), so the behaviour of __future__ statements can now be correctly observed in simulated shells, such as those presented by IDLE and other development environments. This is described in PEP 264. (Contributed by Michael Hudson.)
- The new license introduced with Python 1.6 wasn't GPL-compatible. This is fixed by some minor textual changes to the 2.2 license, so it's now legal to embed Python inside a GPLed program again. Note that Python itself is not GPLed, but instead is under a license that's essentially equivalent to the BSD license, same as it always was. The license changes were also applied to the Python 2.0.1 and 2.1.1 releases.
- When presented with a Unicode filename on Windows, Python will now convert it to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is explicitly used by the file APIs, Python's choice of ASCII as the default encoding turns out to be an annoyance. On Unix, the locale's character set is used if locale.nl_langinfo(CODESET) is available. (Windows support was contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix support was added by Martin von Löwis.)
- Large file support is now enabled on Windows. (Contributed by Tim Peters.)
- The Tools/scripts/ftpmirror.py script now parses a .netrc file, if you have one. (Contributed by Mike Romberg.)
- Some features of the object returned by the xrange() function are now deprecated, and trigger warnings when they're accessed; they'll disappear in Python 2.3. xrange objects tried to pretend they were full sequence types by supporting slicing, sequence multiplication, and the in operator, but these features were rarely used and therefore buggy. The tolist() method and the start, stop, and step attributes are also being deprecated. At the C level, the fourth argument to the PyRange_New() function, repeat, has also been deprecated.
- There were a bunch of patches to the dictionary implementation, mostly to fix potential core dumps if a dictionary contains objects that sneakily changed their hash value, or mutated the dictionary they were contained in. For a while python-dev fell into a gentle rhythm of Michael Hudson finding a case that dumped core, Tim Peters fixing the bug, Michael finding another case, and round and round it went.
- On Windows, Python can now be compiled with Borland C thanks to a number of patches contributed by Stephen Hansen, though the result isn't fully functional yet. (But this is progress...)
- Another Windows enhancement: Wise Solutions generously offered PythonLabs use of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)
- Files ending in .pyw can now be imported on Windows. .pyw is a Windows-only thing, used to indicate that a script needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping up to display the output. This patch makes it possible to import such scripts, in case they're also usable as modules. (Implemented by David Bolen.)
- On platforms where Python uses the C dlopen() function to load extension modules, it's now possible to set the flags used by dlopen() using the sys.getdlopenflags() and sys.setdlopenflags() functions. (Contributed by Bram Stolk.)
- The pow() built-in function no longer supports 3 arguments when floating-point numbers are supplied. pow(x, y, z) returns (x**y) % z, but this is never useful for floating-point numbers, and the final result varies unpredictably depending on the platform. A call such as pow(2.0, 8.0, 7.0) will now raise a TypeError exception.
Acknowledgements¶
The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg, Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.