What's New in Python 2.0¶
- Author:
A.M. Kuchling and Moshe Zadka
Introduction¶
A new release of Python, version 2.0, was released on October 16, 2000. This article covers the exciting new features in 2.0, highlights some other useful changes, and points out a few incompatible changes that may require rewriting code.
Python's development never completely stops between releases, and a steady flow of bug fixes and improvements is always being submitted. A host of minor fixes, a few optimizations, additional docstrings, and better error messages went into 2.0; to list them all would be impossible, but they're certainly significant. Consult the publicly available CVS logs if you want to see the full list. This progress is due to the five developers working for PythonLabs now being paid to spend their days fixing bugs, and also due to the improved communication resulting from moving to SourceForge.
What About Python 1.6?¶
Python 1.6 can be thought of as the Contractual Obligations Python release. After the core development team left CNRI in May 2000, CNRI requested that a 1.6 release be created, containing all the work on Python that had been performed at CNRI. Python 1.6 therefore represents the state of the CVS tree as of May 2000, with the most significant new feature being Unicode support. Development continued after May, of course, so the 1.6 tree received a few fixes to ensure that it's forward-compatible with Python 2.0. 1.6 is therefore part of Python's evolution, and not a side branch.
So, should you take much interest in Python 1.6? Probably not. The 1.6final and 2.0beta1 releases were made on the same day (September 5, 2000), the plan being to finalize Python 2.0 within a month or so. If you have applications to maintain, there seems little point in breaking things by moving to 1.6, fixing them, and then having another round of breakage within a month by moving to 2.0; you're better off just going straight to 2.0. Most of the really interesting features described in this document are only in 2.0, because a lot of work was done between May and September.
New Development Process¶
The most important change in Python 2.0 may not be to the code at all, but to how Python is developed: in May 2000 the Python developers began using the tools made available by SourceForge for storing source code, tracking bug reports, and managing the queue of patch submissions. To report bugs or submit patches for Python 2.0, use the bug tracking and patch manager tools available from Python's project page, located at https://sourceforge.net/projects/python/.
The most important of the services now hosted at SourceForge is the Python CVS tree, the version-controlled repository containing the source code for Python. Previously, there were roughly 7 people who had write access to the CVS tree, and all patches had to be inspected and checked in by one of the people on this short list. Obviously, this wasn't very scalable. By moving the CVS tree to SourceForge, it became possible to grant write access to more people; as of September 2000 there were 27 people able to check in changes, a fourfold increase. This makes possible large-scale changes that wouldn't be attempted if they had to be filtered through the small group of core developers. For example, one day Peter Schneider-Kamp took it into his head to drop K&R C compatibility and convert the C source for Python to ANSI C. After getting approval on the python-dev mailing list, he launched into a flurry of checkins that lasted about a week, other developers joined in to help, and the job was done. If there were only 5 people with write access, probably that task would have been viewed as "nice, but not worth the time and effort needed" and it would never have gotten done.
The shift to using SourceForge's services has resulted in a remarkable increase in the speed of development. Patches now get submitted, commented on, revised by people other than the original submitter, and bounced back and forth between people until the patch is deemed worth checking in. Bugs are tracked in one central location and can be assigned to a specific person for fixing, and we can count the number of open bugs to measure progress. This didn't come without a cost: developers now have more e-mail to deal with, more mailing lists to follow, and special tools had to be written for the new environment. For example, SourceForge sends default patch and bug notification e-mail messages that are completely unhelpful, so Ka-Ping Yee wrote an HTML screen-scraper that sends more useful messages.
The ease of adding code caused a few initial growing pains, such as code being checked in before it was ready or without getting clear agreement from the developer group. The approval process that has emerged is somewhat similar to that used by the Apache group. Developers can vote +1, +0, -0, or -1 on a patch; +1 and -1 denote acceptance or rejection, while +0 and -0 mean the developer is mostly indifferent to the change, though with a slight positive or negative slant. The most significant change from the Apache model is that the voting is essentially advisory, letting Guido van Rossum, who has Benevolent Dictator For Life status, know what the general opinion is. He can still ignore the result of a vote, and approve or reject a change even if the community disagrees with him.
Producing an actual patch is the last step in adding a new feature, and is usually easy compared to the earlier task of coming up with a good design. Discussions of new features can often explode into lengthy mailing list threads, making the discussion hard to follow, and no one can read every posting to python-dev. Therefore, a relatively formal process has been set up to write Python Enhancement Proposals (PEPs), modelled on the Internet RFC process. PEPs are draft documents that describe a proposed new feature, and are continually revised until the community reaches a consensus, either accepting or rejecting the proposal. Quoting from the introduction to PEP 1, "PEP Purpose and Guidelines":
PEP stands for Python Enhancement Proposal. A PEP is a design document providing information to the Python community, or describing a new feature for Python. The PEP should provide a concise technical specification of the feature and a rationale for the feature.
We intend PEPs to be the primary mechanisms for proposing new features, for collecting community input on an issue, and for documenting the design decisions that have gone into Python. The PEP author is responsible for building consensus within the community and documenting dissenting opinions.
Read the rest of PEP 1 for the details of the PEP editorial process, style, and format. PEPs are kept in the Python CVS tree on SourceForge, though they're not part of the Python 2.0 distribution, and are also available in HTML form from https://peps.python.org/. As of September 2000, there are 25 PEPs, ranging from PEP 201, "Lockstep Iteration", to PEP 225, "Elementwise/Objectwise Operators".
Unicode¶
The largest new feature in Python 2.0 is a new fundamental data type: Unicode strings. Unicode uses 16-bit numbers to represent characters instead of the 8-bit number used by ASCII, meaning that 65,536 distinct characters can be supported.
The final interface for Unicode support was arrived at through countless often-stormy discussions on the python-dev mailing list, and mostly implemented by Marc-André Lemburg, based on a Unicode string type implementation by Fredrik Lundh. A detailed explanation of the interface was written up as PEP 100, "Python Unicode Integration". This article will simply cover the most significant points about the Unicode interfaces.
In Python source code, Unicode strings are written as u"string". Arbitrary Unicode characters can be written using a new escape sequence, \uHHHH, where HHHH is a 4-digit hexadecimal number from 0000 to FFFF. The existing \xHH escape sequence can also be used, and octal escapes can be used for characters up to U+01FF, which is represented by \777.
Unicode strings, just like regular strings, are an immutable sequence type. They can be indexed and sliced, but not modified in place. Unicode strings have an encode( [encoding] ) method that returns an 8-bit string in the desired encoding. Encodings are named by strings, such as 'ascii', 'utf-8', 'iso-8859-1', and so on. A codec API is defined for implementing and registering new encodings that are then available throughout a Python program. If an encoding isn't specified, the default encoding is usually 7-bit ASCII, though it can be changed for your Python installation by calling the sys.setdefaultencoding(encoding) function in a customized version of site.py.
Combining 8-bit and Unicode strings always coerces to Unicode, using the default ASCII encoding; the result of 'a' + u'bc' is u'abc'.
New built-in functions have been added, and existing built-ins modified to support Unicode:
- unichr(ch) returns a Unicode string 1 character long, containing the character ch.
- ord(u), where u is a 1-character regular or Unicode string, returns the number of the character as an integer.
- unicode(string [, encoding] [, errors]) creates a Unicode string from an 8-bit string. encoding is a string naming the encoding to use. The errors parameter specifies the treatment of characters that are invalid for the current encoding; passing 'strict' as the value causes an exception to be raised on any encoding error, while 'ignore' causes errors to be silently ignored and 'replace' uses U+FFFD, the official replacement character, in case of any problems.
- The exec statement, and various built-ins such as eval(), getattr(), and setattr() will also accept Unicode strings as well as regular strings. (It's possible that the process of fixing this missed some built-ins; if you find a built-in function that accepts strings but doesn't accept Unicode strings at all, please report it as a bug.)
A new module, unicodedata, provides an interface to Unicode character properties. For example, unicodedata.category(u'A') returns the 2-character string 'Lu', the 'L' denoting it's a letter, and 'u' meaning that it's uppercase. unicodedata.bidirectional(u'\u0660') returns 'AN', meaning that U+0660 is an Arabic number.
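The two lookups mentioned above can be tried directly. This is a minimal sketch that assumes a current Python 3 interpreter, where ordinary string literals are already Unicode and the u prefix is optional; the variable names are illustrative only.

```python
import unicodedata

# Python 3 string literals are Unicode, so no u'' prefix is required here.
letter_category = unicodedata.category('A')        # general category of 'A'
digit_bidi = unicodedata.bidirectional('\u0660')   # bidirectional class of U+0660

print(letter_category, digit_bidi)
```

The first character of the category is the major class (here a letter) and the second refines it (here uppercase), as the text describes.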
The codecs module contains functions to look up existing encodings and register new ones. Unless you want to implement a new encoding, you'll most often use the codecs.lookup(encoding) function, which returns a 4-element tuple: (encode_func, decode_func, stream_reader, stream_writer).
- encode_func is a function that takes a Unicode string, and returns a 2-tuple (string, length). string is an 8-bit string containing a portion (perhaps all) of the Unicode string converted into the given encoding, and length tells you how much of the Unicode string was converted.
- decode_func is the opposite of encode_func, taking an 8-bit string and returning a 2-tuple (ustring, length), consisting of the resulting Unicode string ustring and the integer length telling how much of the 8-bit string was consumed.
- stream_reader is a class that supports decoding input from a stream. stream_reader(file_obj) returns an object that supports the read(), readline(), and readlines() methods. These methods will all translate from the given encoding and return Unicode strings.
- stream_writer, similarly, is a class that supports encoding output to a stream. stream_writer(file_obj) returns an object that supports the write() and writelines() methods. These methods expect Unicode strings, translating them to the given encoding on output.
For example, the following code writes a Unicode string into a file, encoding it as UTF-8:
    import codecs

    unistr = u'\u0660\u2000ab ...'

    (UTF8_encode, UTF8_decode,
     UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

    output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
    output.write( unistr )
    output.close()
The following code would then read UTF-8 input from the file:
    input = UTF8_streamreader( open( '/tmp/output', 'rb') )
    print repr(input.read())
    input.close()
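The same round trip can be sketched without touching the filesystem. The version below assumes a current Python 3 interpreter, where codecs.lookup() returns a CodecInfo object that still unpacks into the four elements described above; io.BytesIO is used here as a stand-in for a real file and is not part of the original example.

```python
import codecs
import io

text = '\u0660\u2000ab'

# CodecInfo unpacks as (encode, decode, stream_reader, stream_writer).
encode, decode, stream_reader, stream_writer = codecs.lookup('utf-8')

encoded, consumed = encode(text)   # (bytes, number of characters converted)
decoded, used = decode(encoded)    # (str, number of bytes consumed)

# The stream classes wrap a binary file-like object.
buffer = io.BytesIO()
writer = stream_writer(buffer)
writer.write(text)

reader = stream_reader(io.BytesIO(buffer.getvalue()))
round_tripped = reader.read()
```

Using the function pair is convenient for whole strings, while the stream classes suit data that arrives or leaves in pieces.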
Unicode-aware regular expressions are available through the re module, which has a new underlying implementation called SRE written by Fredrik Lundh of Secret Labs AB.
A -U command line option was added which causes the Python compiler to interpret all string literals as Unicode string literals. This is intended to be used in testing and future-proofing your Python code, since some future version of Python may drop support for 8-bit strings and provide only Unicode strings.
List Comprehensions¶
Lists are a workhorse data type in Python, and many programs manipulate a list at some point. Two common operations on lists are to loop over them, and either pick out the elements that meet a certain criterion, or apply some function to each element. For example, given a list of strings, you might want to pull out all the strings containing a given substring, or strip off trailing whitespace from each line.
The existing map() and filter() functions can be used for this purpose, but they require a function as one of their arguments. This is fine if there's an existing built-in function that can be passed directly, but if there isn't, you have to create a little function to do the required work, and Python's scoping rules make the result ugly if the little function needs additional information. Take the first example in the previous paragraph, finding all the strings in the list containing a given substring. You could write the following to do it:
    # Given the list L, make a list of all strings
    # containing the substring S.
    sublist = filter( lambda s, substring=S:
                         string.find(s, substring) != -1,
                      L)
Because of Python's scoping rules, a default argument is used so that the anonymous function created by the lambda expression knows what substring is being searched for. List comprehensions make this cleaner:
    sublist = [ s for s in L if string.find(s, S) != -1 ]
List comprehensions have the form:
    [ expression for expr in sequence1
                 for expr2 in sequence2 ...
                 for exprN in sequenceN
                 if condition ]
The for...in clauses contain the sequences to be iterated over. The sequences do not have to be the same length, because they are not iterated over in parallel, but from left to right; this is explained more clearly in the following paragraphs. The elements of the generated list will be the successive values of expression. The final if clause is optional; if present, expression is only evaluated and added to the result if condition is true.
To make the semantics very clear, a list comprehension is equivalent to the following Python code:
    for expr1 in sequence1:
        for expr2 in sequence2:
        ...
            for exprN in sequenceN:
                 if (condition):
                      # Append the value of
                      # the expression to the
                      # resulting list.
This means that when there are multiple for...in clauses, the length of the resulting list will be equal to the product of the lengths of all the sequences. If you have two lists of length 3, the output list is 9 elements long:
    >>> seq1 = 'abc'
    >>> seq2 = (1, 2, 3)
    >>> [(x, y) for x in seq1 for y in seq2]
    [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
    ('c', 2), ('c', 3)]
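The equivalence between a comprehension and nested loops can be checked directly. The sketch below assumes a current Python 3 interpreter; the filter condition (skipping y == 2) and the variable names are made up for illustration.

```python
seq1 = 'abc'
seq2 = (1, 2, 3)

# A comprehension with two for clauses and an optional if clause.
pairs = [(x, y) for x in seq1 for y in seq2 if y != 2]

# The same list built with explicit nested loops, leftmost clause outermost.
expanded = []
for x in seq1:
    for y in seq2:
        if y != 2:
            expanded.append((x, y))

print(pairs == expanded)
```

Because the leftmost for clause is the outermost loop, the first element of each pair changes most slowly in the result.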
To avoid introducing an ambiguity into Python's grammar, if expression is creating a tuple, it must be surrounded with parentheses. The first list comprehension below is a syntax error, while the second one is correct:
    # Syntax error
    [ x,y for x in seq1 for y in seq2]
    # Correct
    [ (x,y) for x in seq1 for y in seq2]
The idea of list comprehensions originally comes from the functional programming language Haskell (https://www.haskell.org). Greg Ewing argued most effectively for adding them to Python and wrote the initial list comprehension patch, which was then discussed for a seemingly endless time on the python-dev mailing list and kept up-to-date by Skip Montanaro.
Augmented Assignment¶
Augmented assignment operators, another long-requested feature, have been added to Python 2.0. Augmented assignment operators include +=, -=, *=, and so forth. For example, the statement a += 2 increments the value of the variable a by 2, equivalent to the slightly lengthier a = a + 2.
The full list of supported assignment operators is +=, -=, *=, /=, %=, **=, &=, |=, ^=, >>=, and <<=. Python classes can override the augmented assignment operators by defining methods named __iadd__(), __isub__(), etc. For example, the following Number class stores a number and supports using += to create a new instance with an incremented value.
    class Number:
        def __init__(self, value):
            self.value = value
        def __iadd__(self, increment):
            return Number( self.value + increment)

    n = Number(5)
    n += 3
    print n.value
The __iadd__() special method is called with the value of the increment, and should return a new instance with an appropriately modified value; this return value is bound as the new value of the variable on the left-hand side.
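Because the return value of __iadd__() is what gets rebound to the left-hand name, the choice of what to return is visible through other references to the same object. The sketch below assumes a current Python 3 interpreter. Number follows the class shown above, while Tally is a hypothetical variant, not taken from the original text, that mutates itself and returns self.

```python
class Number:
    def __init__(self, value):
        self.value = value

    def __iadd__(self, increment):
        # Returns a fresh instance; the original object is untouched.
        return Number(self.value + increment)


class Tally:
    """Hypothetical variant that updates itself in place."""

    def __init__(self, value):
        self.value = value

    def __iadd__(self, increment):
        self.value += increment
        return self   # the name is rebound to the same object


n = Number(5)
alias_n = n
n += 3        # n now names a new Number; alias_n still names the old one

t = Tally(5)
alias_t = t
t += 3        # t and alias_t still name the same, updated object
```

Returning a new instance keeps the object behaving like an immutable value, while returning self gives in-place semantics similar to a list.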
Augmented assignment operators were first introduced in the C programming language, and most C-derived languages, such as awk, C++, Java, Perl, and PHP also support them. The augmented assignment patch was implemented by Thomas Wouters.
String Methods¶
Until now string-manipulation functionality was in the string module, which was usually a front-end for the strop module written in C. The addition of Unicode posed a difficulty for the strop module, because the functions would all need to be rewritten in order to accept either 8-bit or Unicode strings. For functions such as string.replace(), which takes 3 string arguments, that means eight possible permutations, and correspondingly complicated code.
Instead, Python 2.0 pushes the problem onto the string type, making string manipulation functionality available through methods on both 8-bit strings and Unicode strings.
    >>> 'andrew'.capitalize()
    'Andrew'
    >>> 'hostname'.replace('os', 'linux')
    'hlinuxtname'
    >>> 'moshe'.find('sh')
    2
One thing that hasn't changed, a noteworthy April Fools' joke notwithstanding, is that Python strings are immutable. Thus, the string methods return new strings, and do not modify the string on which they operate.
The old string module is still around for backwards compatibility, but it mostly acts as a front-end to the new string methods.
Two methods which have no parallel in pre-2.0 versions, although they did exist in JPython for quite some time, are startswith() and endswith(). s.startswith(t) is equivalent to s[:len(t)] == t, while s.endswith(t) is equivalent to s[-len(t):] == t.
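The slice equivalences can be compared side by side. This sketch assumes a current Python 3 interpreter and uses made-up, non-empty strings; with an empty suffix the slice s[-0:] is the whole string, so the slice form is only a faithful stand-in for non-empty arguments.

```python
s = 'hostname.example'
prefix = 'host'
suffix = '.example'

# Method form and the slice form it is described as equivalent to.
starts_by_method = s.startswith(prefix)
starts_by_slice = s[:len(prefix)] == prefix

ends_by_method = s.endswith(suffix)
ends_by_slice = s[-len(suffix):] == suffix

# A non-matching prefix gives False in both forms.
miss_by_method = s.startswith('name')
miss_by_slice = s[:len('name')] == 'name'
```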
One other method which deserves special mention is join(). The join() method of a string receives one parameter, a sequence of strings, and is equivalent to the string.join() function from the old string module, with the arguments reversed. In other words, s.join(seq) is equivalent to the old string.join(seq, s).
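A short illustration of the method form, assuming a current Python 3 interpreter (where the old string.join() function is no longer available, so only the method is shown); the sample data is made up.

```python
words = ['spam', 'eggs', 'ham']
separator = ', '

# The separator is the object; the sequence is the argument.
joined = separator.join(words)

# Any sequence of strings works, for example a tuple.
path = '/'.join(('usr', 'local', 'bin'))

print(joined)
print(path)
```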
Garbage Collection of Cycles¶
The C implementation of Python uses reference counting to implement garbage collection. Every Python object maintains a count of the number of references pointing to itself, and adjusts the count as references are created or destroyed. Once the reference count reaches zero, the object is no longer accessible, since you need to have a reference to an object to access it, and if the count is zero, no references exist any longer.
Reference counting has some pleasant properties: it's easy to understand and implement, and the resulting implementation is portable, fairly fast, and reacts well with other libraries that implement their own memory handling schemes. The major problem with reference counting is that it sometimes doesn't realise that objects are no longer accessible, resulting in a memory leak. This happens when there are cycles of references.
Consider the simplest possible cycle, a class instance which has a reference to itself:
    instance = SomeClass()
    instance.myself = instance
After the above two lines of code have been executed, the reference count of instance is 2; one reference is from the variable named 'instance', and the other is from the myself attribute of the instance.
If the next line of code is del instance, what happens? The reference count of instance is decreased by 1, so it has a reference count of 1; the reference in the myself attribute still exists. Yet the instance is no longer accessible through Python code, and it could be deleted. Several objects can participate in a cycle if they have references to each other, causing all of the objects to be leaked.
Python 2.0 fixes this problem by periodically executing a cycle detection algorithm which looks for inaccessible cycles and deletes the objects involved. A new gc module provides functions to perform a garbage collection, obtain debugging statistics, and tune the collector's parameters.
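The self-referencing instance described above can be observed being reclaimed. This is a sketch that assumes CPython 3; the weakref module used to watch the object is a later addition and was not part of Python 2.0, and SomeClass is the placeholder class name from the text.

```python
import gc
import weakref


class SomeClass:
    pass


gc.disable()                      # keep automatic collection out of the way

instance = SomeClass()
instance.myself = instance        # the simplest possible cycle
probe = weakref.ref(instance)     # observes the object without keeping it alive

del instance
alive_before = probe() is not None   # the cycle keeps the refcount above zero

found = gc.collect()                 # run a full collection by hand
alive_after = probe() is not None

gc.enable()
```

gc.collect() returns the number of unreachable objects it found, which is a quick way to check whether a piece of code is creating cycles.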
Running the cycle detection algorithm takes some time, and therefore will result in some additional overhead. It is hoped that after we've gotten experience with the cycle collection from using 2.0, Python 2.1 will be able to minimize the overhead with careful tuning. It's not yet obvious how much performance is lost, because benchmarking this is tricky and depends crucially on how often the program creates and destroys objects. The detection of cycles can be disabled when Python is compiled, if you can't afford even a tiny speed penalty or suspect that the cycle collection is buggy, by specifying the --without-cycle-gc switch when running the configure script.
Several people tackled this problem and contributed to a solution. An early implementation of the cycle detection approach was written by Toby Kelsey. The current algorithm was suggested by Eric Tiedemann during a visit to CNRI, and Guido van Rossum and Neil Schemenauer wrote two different implementations, which were later integrated by Neil. Lots of other people offered suggestions along the way; the March 2000 archives of the python-dev mailing list contain most of the relevant discussion, especially in the threads titled "Reference cycle collection for Python" and "Finalization again".
Other Core Changes¶
Various minor changes have been made to Python's syntax and built-in functions. None of the changes are very far-reaching, but they're handy conveniences.
Minor Language Changes¶
A new syntax makes it more convenient to call a given function with a tuple of arguments and/or a dictionary of keyword arguments. In Python 1.5 and earlier, you'd use the apply() built-in function: apply(f, args, kw) calls the function f() with the argument tuple args and the keyword arguments in the dictionary kw. apply() is the same in 2.0, but thanks to a patch from Greg Ewing, f(*args, **kw) is a shorter and clearer way to achieve the same effect. This syntax is symmetrical with the syntax for defining functions:
    def f(*args, **kw):
        # args is a tuple of positional args,
        # kw is a dictionary of keyword args
        ...
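The call-side and definition-side syntax can be seen working together in a small forwarding helper. The sketch assumes a current Python 3 interpreter, where apply() is no longer a built-in, so only the f(*args, **kw) form is shown; describe and forward are hypothetical names.

```python
def describe(*args, **kw):
    # args arrives as a tuple, kw as a dictionary.
    return args, kw


def forward(func, args, kw):
    # Unpack a tuple and a dictionary back into a normal call.
    return func(*args, **kw)


positional = (1, 2)
keywords = {'sep': '-', 'end': '!'}

result = forward(describe, positional, keywords)
print(result)
```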
The print statement can now have its output directed to a file-like object by following the print with >> file, similar to the redirection operator in Unix shells. Previously you'd either have to use the write() method of the file-like object, which lacks the convenience and simplicity of print, or you could assign a new value to sys.stdout and then restore the old value. For sending output to standard error, it's much easier to write this:
    print >> sys.stderr, "Warning: action field not supplied"
Modules can now be renamed on importing them, using the syntax import module as name or from module import name as othername. The patch was submitted by Thomas Wouters.
A new format style is available when using the % operator; '%r' will insert the repr() of its argument. This was also added from symmetry considerations, this time for symmetry with the existing '%s' format style, which inserts the str() of its argument. For example, '%r %s' % ('abc', 'abc') returns a string containing 'abc' abc.
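The difference between the two format styles is easy to confirm. This sketch assumes a current Python 3 interpreter, where the % operator still supports both styles; the variable names are illustrative.

```python
# %r inserts repr() of the argument, %s inserts str() of it.
formatted = '%r %s' % ('abc', 'abc')

# The same string assembled by calling repr() and str() explicitly.
assembled = repr('abc') + ' ' + str('abc')

print(formatted)
```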
Previously there was no way to implement a class that overrode Python's built-in in operator and implemented a custom version. obj in seq returns true if obj is present in the sequence seq; Python computes this by simply trying every index of the sequence until either obj is found or an IndexError is encountered. Moshe Zadka contributed a patch which adds a __contains__() magic method for providing a custom implementation for in. Additionally, new built-in objects written in C can define what in means for them via a new slot in the sequence protocol.
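Both behaviours, the index-by-index fallback and a custom __contains__(), can be sketched as follows. The example assumes a current Python 3 interpreter, which still falls back to __getitem__ when no other protocol is defined; Squares and EvenNumbers are hypothetical classes.

```python
class Squares:
    """No __contains__: 'in' tries index 0, 1, 2, ... until IndexError."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index * index


class EvenNumbers:
    """Membership is computed by __contains__ rather than by scanning."""

    def __contains__(self, obj):
        return isinstance(obj, int) and obj % 2 == 0


squares = Squares()
evens = EvenNumbers()

found_by_scan = 9 in squares        # reached at index 3
missed_by_scan = 10 in squares      # scan stops at the IndexError
found_by_method = 1000000 in evens  # no scan needed
```

A custom __contains__() is the only option when the container is conceptually unbounded or cannot be indexed at all.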
Earlier versions of Python used a recursive algorithm for deleting objects. Deeply nested data structures could cause the interpreter to fill up the C stack and crash; Christian Tismer rewrote the deletion logic to fix this problem. On a related note, comparing recursive objects recursed infinitely and crashed; Jeremy Hylton rewrote the code to no longer crash, producing a useful result instead. For example, after this code:
    a = []
    b = []
    a.append(a)
    b.append(b)
The comparison a==b returns true, because the two recursive data structures are isomorphic. See the thread "trashcan and PR#7" in the April 2000 archives of the python-dev mailing list for the discussion leading up to this implementation, and some useful relevant links. Note that comparisons can now also raise exceptions. In earlier versions of Python, a comparison operation such as cmp(a,b) would always produce an answer, even if a user-defined __cmp__() method encountered an error, since the resulting exception would simply be silently swallowed.
Work has been done on porting Python to 64-bit Windows on the Itanium processor, mostly by Trent Mick of ActiveState. (Confusingly, sys.platform is still 'win32' on Win64 because it seems that for ease of porting, MS Visual C++ treats code as 32 bit on Itanium.) PythonWin also supports Windows CE; see the Python CE page at https://pythonce.sourceforge.net/ for more information.
Another new platform is Darwin/MacOS X; initial support for it is in Python 2.0. Dynamic loading works, if you specify "configure --with-dyld --with-suffix=.x". Consult the README in the Python source distribution for more instructions.
An attempt has been made to alleviate one of Python's warts, the often-confusing NameError exception when code refers to a local variable before the variable has been assigned a value. For example, the following code raises an exception on the print statement in both 1.5.2 and 2.0; in 1.5.2 a NameError exception is raised, while 2.0 raises a new UnboundLocalError exception. UnboundLocalError is a subclass of NameError, so any existing code that expects NameError to be raised should still work.
    def f():
        print "i=",i
        i = i + 1
    f()
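The compatibility claim, that code catching NameError keeps working, can be checked with a small sketch. It assumes a current Python 3 interpreter, so print is written as a function call; the caught variable is illustrative.

```python
def f():
    # i is a local name because of the assignment on the next line,
    # so reading it here fails before any value has been bound.
    print("i=", i)
    i = i + 1


caught = None
try:
    f()
except NameError as exc:      # written for the older, broader exception
    caught = type(exc)

print(caught.__name__)
```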
Two new exceptions, TabError and IndentationError, have been introduced. They're both subclasses of SyntaxError, and are raised when Python code is found to be improperly indented.
Changes to Built-in Functions¶
A new built-in, zip(seq1, seq2, ...), has been added. zip() returns a list of tuples where each tuple contains the i-th element from each of the argument sequences. The difference between zip() and map(None, seq1, seq2) is that map() pads the sequences with None if the sequences aren't all of the same length, while zip() truncates the returned list to the length of the shortest argument sequence.
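The truncating and padding behaviours can be contrasted as follows. This sketch assumes a current Python 3 interpreter, where zip() returns an iterator (hence the list() calls) and map(None, ...) is no longer accepted; itertools.zip_longest is used here as a stand-in for the padding behaviour described above.

```python
from itertools import zip_longest

letters = 'abc'
numbers = (1, 2)

# zip() stops at the shortest argument.
truncated = list(zip(letters, numbers))

# zip_longest() pads the shorter argument with None, like the old map(None, ...).
padded = list(zip_longest(letters, numbers))

print(truncated)
print(padded)
```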
The int() and long() functions now accept an optional "base" parameter when the first argument is a string. int('123', 10) returns 123, while int('123', 16) returns 291. int(123, 16) raises a TypeError exception with the message "can't convert non-string with explicit base".
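A quick check of the base parameter, assuming a current Python 3 interpreter (where long() no longer exists, so only int() is shown); the exact wording of the error message may differ between versions, so the sketch only records the exception.

```python
decimal_value = int('123', 10)
hex_value = int('123', 16)      # 1*256 + 2*16 + 3

error = None
try:
    int(123, 16)                # a non-string with an explicit base
except TypeError as exc:
    error = exc

print(decimal_value, hex_value, type(error).__name__)
```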
A new variable holding more detailed version information has been added to the sys module. sys.version_info is a tuple (major, minor, micro, level, serial). For example, in a hypothetical 2.0.1beta1, sys.version_info would be (2, 0, 1, 'beta', 1). level is a string such as "alpha", "beta", or "final" for a final release.
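Because sys.version_info is a tuple, it can be unpacked and compared element by element. The sketch assumes a current Python 3 interpreter; the variable names are illustrative.

```python
import sys

# The five fields described above, in order.
major, minor, micro, level, serial = sys.version_info

# Tuple comparison makes "at least version X.Y" checks straightforward.
at_least_2_0 = sys.version_info >= (2, 0)
is_final_release = level == 'final'

print(major, minor, micro, level, serial)
```

Comparing against a tuple avoids the pitfalls of parsing the free-form sys.version string.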
Dictionaries have an odd new method, setdefault(key, default), which behaves similarly to the existing get() method. However, if the key is missing, setdefault() both returns the value of default as get() would do, and also inserts it into the dictionary as the value for key. Thus, the following lines of code:
    if dict.has_key( key ): return dict[key]
    else:
        dict[key] = []
        return dict[key]
can be reduced to a single return dict.setdefault(key, []) statement.
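A common use of this pattern is grouping values under a key. The sketch below assumes a current Python 3 interpreter; group_by_length and its sample data are hypothetical.

```python
def group_by_length(words):
    """Collect words into lists keyed by their length."""
    groups = {}
    for word in words:
        # Returns the existing list, or inserts and returns a new empty one.
        groups.setdefault(len(word), []).append(word)
    return groups


groups = group_by_length(['a', 'bb', 'cc', 'd'])
print(groups)
```

Note that the default value is created on every call even when the key already exists, which matters if constructing it is expensive.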
The interpreter sets a maximum recursion depth in order to catch runaway recursion before filling the C stack and causing a core dump or GPF. Previously this limit was fixed when you compiled Python, but in 2.0 the maximum recursion depth can be read and modified using sys.getrecursionlimit() and sys.setrecursionlimit(). The default value is 1000, and a rough maximum value for a given platform can be found by running a new script, Misc/find_recursionlimit.py.
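Reading, raising, and restoring the limit might look like the following. The sketch assumes a current Python 3 interpreter, where exceeding the limit raises RecursionError, a subclass of RuntimeError; the increase of 500 is an arbitrary illustrative value, not a recommendation.

```python
import sys


def runaway():
    # Recurses forever; the interpreter's limit stops it.
    return runaway()


original_limit = sys.getrecursionlimit()

hit_limit = False
try:
    runaway()
except RuntimeError:          # RecursionError is a RuntimeError subclass
    hit_limit = True

try:
    sys.setrecursionlimit(original_limit + 500)
    raised_limit = sys.getrecursionlimit()
finally:
    # Always restore the previous value so other code is unaffected.
    sys.setrecursionlimit(original_limit)
```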
Porting to 2.0¶
New Python releases try hard to be compatible with previous releases, and the record has been pretty good. However, some changes are considered useful enough, usually because they fix initial design decisions that turned out to be actively mistaken, that breaking backward compatibility can't always be avoided. This section lists the changes in Python 2.0 that may cause old Python code to break.
The change which will probably break the most code is tightening up the arguments accepted by some methods. Some methods would take multiple arguments and treat them as a tuple, particularly various list methods such as append() and insert(). In earlier versions of Python, if L is a list, L.append( 1,2 ) appends the tuple (1,2) to the list. In Python 2.0 this causes a TypeError exception to be raised, with the message: 'append requires exactly 1 argument; 2 given'. The fix is to simply add an extra set of parentheses to pass both values as a tuple: L.append( (1,2) ).
The earlier versions of these methods were more forgiving because they used an old function in Python's C interface to parse their arguments; 2.0 modernizes them to use PyArg_ParseTuple(), the current argument parsing function, which provides more helpful error messages and treats multi-argument calls as errors. If you absolutely must use 2.0 but can't fix your code, you can edit Objects/listobject.c and define the preprocessor symbol NO_STRICT_LIST_APPEND to preserve the old behaviour; this isn't recommended.
Some of the functions in the socket module are still forgiving in this way. For example, socket.connect( ('hostname', 25) ) is the correct form, passing a tuple representing an IP address, but socket.connect('hostname', 25) also works. socket.connect_ex and socket.bind are similarly easy-going. 2.0alpha1 tightened these functions up, but because the documentation actually used the erroneous multiple argument form, many people wrote code which would break with the stricter checking. GvR backed out the changes in the face of public reaction, so for the socket module, the documentation was fixed and the multiple argument form is simply marked as deprecated; it will be tightened up again in a future Python version.
The \x escape in string literals now takes exactly 2 hex digits. Previously it would consume all the hex digits following the 'x' and take the lowest 8 bits of the result, so \x123456 was equivalent to \x56.
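The two-digit rule is still in force and is easy to confirm. This sketch assumes a current Python 3 interpreter; the variable name is illustrative.

```python
# Only the two hex digits after \x form the escape; '3456' is ordinary text.
literal = '\x123456'

first_char = literal[0]
remainder = literal[1:]

print(len(literal), repr(first_char), remainder)
```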
The AttributeError and NameError exceptions have a more friendly error message, whose text will be something like 'Spam' instance has no attribute 'eggs' or name 'eggs' is not defined. Previously the error message was just the missing attribute name eggs, and code written to take advantage of this fact will break in 2.0.
Some work has been done to make integers and long integers a bit more interchangeable. In 1.5.2, large-file support was added for Solaris, to allow reading files larger than 2 GiB; this made the tell() method of file objects return a long integer instead of a regular integer. Some code would subtract two file offsets and attempt to use the result to multiply a sequence or slice a string, but this raised a TypeError. In 2.0, long integers can be used to multiply or slice a sequence, and it'll behave as you'd intuitively expect it to; 3L * 'abc' produces 'abcabcabc', and (0,1,2,3)[2L:4L] produces (2,3). Long integers can also be used in various contexts where previously only integers were accepted, such as in the seek() method of file objects, and in the formats supported by the % operator (%d, %i, %x, etc.). For example, "%d" % 2L**64 will produce the string 18446744073709551616.
The subtlest long integer change of all is that the str() of a long integer no longer has a trailing 'L' character, though repr() still includes it. The 'L' annoyed many people who wanted to print long integers that looked just like regular integers, since they had to go out of their way to chop off the character. This is no longer a problem in 2.0, but code which does str(longval)[:-1] and assumes the 'L' is there, will now lose the final digit.
Taking the repr() of a float now uses a different formatting precision than str(). repr() uses the %.17g format string for C's sprintf(), while str() uses %.12g as before. The effect is that repr() may occasionally show more decimal places than str(), for certain numbers. For example, the number 8.1 can't be represented exactly in binary, so repr(8.1) is '8.0999999999999996', while str(8.1) is '8.1'.
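The two format strings named above can be applied directly with the % operator, which reproduces the difference without depending on what repr() and str() do in any particular release (later Python versions use a different algorithm for repr()). A small sketch:

```python
value = 8.1

full = "%.17g" % value    # the precision repr() uses in 2.0
short = "%.12g" % value   # the precision str() uses

# Seventeen significant digits are enough to get the same float back.
round_trip = float(full)
```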
The -X command-line option, which turned all standard exceptions into strings instead of classes, has been removed; the standard exceptions will now always be classes. The exceptions module containing the standard exceptions was translated from Python to a built-in C module, written by Barry Warsaw and Fredrik Lundh.
Extending/Embedding Changes¶
Some of the changes are under the covers, and will only be apparent to people writing C extension modules or embedding a Python interpreter in a larger application. If you aren't dealing with Python's C API, you can safely skip this section.
The version number of the Python C API was incremented, so C extensions compiled for 1.5.2 must be recompiled in order to work with 2.0. On Windows, it's not possible for Python 2.0 to import a third party extension built for Python 1.5.x due to how Windows DLLs work, so Python will raise an exception and the import will fail.
Users of Jim Fulton's ExtensionClass module will be pleased to find out that hooks have been added so that ExtensionClasses are now supported by isinstance() and issubclass(). This means you no longer have to remember to write code such as if type(obj) == myExtensionClass, but can use the more natural if isinstance(obj, myExtensionClass).
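The difference between the two checks is easiest to see with a subclass. The sketch below uses ordinary Python classes as stand-ins, since ExtensionClass itself is a separate package; the class names are invented for the example:

```python
class Base:
    pass


class Derived(Base):
    pass


obj = Derived()

exact_match = (type(obj) == Base)     # compares against one exact type
is_a_base = isinstance(obj, Base)     # also accepts subclasses
is_subclass = issubclass(Derived, Base)
```

The type() comparison is false for the Derived instance, while isinstance() and issubclass() both report the relationship.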
The Python/importdl.c file, which was a mass of #ifdefs to support dynamic loading on many different platforms, was cleaned up and reorganised by Greg Stein. importdl.c is now quite small, and platform-specific code has been moved into a bunch of Python/dynload_*.c files. Another cleanup: there were also a number of my*.h files in the Include/ directory that held various portability hacks; they've been merged into a single file, Include/pyport.h.
Vladimir Marangozov's long-awaited malloc restructuring was completed, to make it easy to have the Python interpreter use a custom allocator instead of C's standard malloc(). For documentation, read the comments in Include/pymem.h and Include/objimpl.h. For the lengthy discussions during which the interface was hammered out, see the web archives of the 'patches' and 'python-dev' lists at python.org.
Recent versions of the GUSI development environment for MacOS support POSIX threads. Therefore, Python's POSIX threading support now works on the Macintosh. Threading support using the user-space GNU pth library was also contributed.
Threading support on Windows was enhanced, too. Windows supports thread locks that use kernel objects only in case of contention; in the common case when there's no contention, they use simpler functions which are an order of magnitude faster. A threaded version of Python 1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0 changes, the difference is only 10%. These improvements were contributed by Yakov Markovitch.
Python 2.0's source now uses only ANSI C prototypes, so compiling Python now requires an ANSI C compiler, and can no longer be done using a compiler that only supports K&R C.
Previously the Python virtual machine used 16-bit numbers in its bytecode, limiting the size of source files. In particular, this affected the maximum size of literal lists and dictionaries in Python source; occasionally people who are generating Python code would run into this limit. A patch by Charles G. Waldman raises the limit from 2**16 to 2**32.
Three new convenience functions intended for adding constants to a module's dictionary at module initialization time were added: PyModule_AddObject(), PyModule_AddIntConstant(), and PyModule_AddStringConstant(). Each of these functions takes a module object, a null-terminated C string containing the name to be added, and a third argument for the value to be assigned to the name. This third argument is, respectively, a Python object, a C long, or a C string.
A wrapper API was added for Unix-style signal handlers. PyOS_getsig() gets a signal handler and PyOS_setsig() will set a new handler.
Distutils: Making Modules Easy to Install¶
Before Python 2.0, installing modules was a tedious affair -- there was no way to figure out automatically where Python is installed, or what compiler options to use for extension modules. Software authors had to go through an arduous ritual of editing Makefiles and configuration files, which only really work on Unix and leave Windows and MacOS unsupported. Python users faced wildly differing installation instructions which varied between different extension packages, which made administering a Python installation something of a chore.
The SIG for distribution utilities, shepherded by Greg Ward, has created the Distutils, a system to make package installation much easier. They form the distutils package, a new part of Python's standard library. In the best case, installing a Python module from source will require the same steps: first you simply unpack the tarball or zip archive, and then run "python setup.py install". The platform will be automatically detected, the compiler will be recognized, C extension modules will be compiled, and the distribution installed into the proper directory. Optional command-line arguments provide more control over the installation process, and the distutils package offers many places to override defaults -- separating the build from the install, building or installing in non-default directories, and more.
In order to use the Distutils, you need to write a setup.py script. For the simple case, when the software contains only .py files, a minimal setup.py can be just a few lines long:

    from distutils.core import setup
    setup(name="foo", version="1.0",
          py_modules=["module1", "module2"])
The setup.py file isn't much more complicated if the software consists of a few packages:

    from distutils.core import setup
    setup(name="foo", version="1.0",
          packages=["package", "package.subpackage"])
A C extension can be the most complicated case; here's an example taken from the PyXML package:

    from distutils.core import setup, Extension

    expat_extension = Extension('xml.parsers.pyexpat',
        define_macros=[('XML_NS', None)],
        include_dirs=['extensions/expat/xmltok',
                      'extensions/expat/xmlparse'],
        sources=['extensions/pyexpat.c',
                 'extensions/expat/xmltok/xmltok.c',
                 'extensions/expat/xmltok/xmlrole.c',
                 ])
    setup(name="PyXML", version="0.5.4",
          ext_modules=[expat_extension])
The Distutils can also take care of creating source and binary distributions. The "sdist" command, run by "python setup.py sdist", builds a source distribution such as foo-1.0.tar.gz. Adding new commands isn't difficult; "bdist_rpm" and "bdist_wininst" commands have already been contributed to create an RPM distribution and a Windows installer for the software, respectively. Commands to create other distribution formats such as Debian packages and Solaris .pkg files are in various stages of development.
All this is documented in a new manual, Distributing Python Modules, that joins the basic set of Python documentation.
XML Modules¶
Python 1.5.2 included a simple XML parser in the form of the xmllib module, contributed by Sjoerd Mullender. Since 1.5.2's release, two different interfaces for processing XML have become common: SAX2 (version 2 of the Simple API for XML) provides an event-driven interface with some similarities to xmllib, and the DOM (Document Object Model) provides a tree-based interface, transforming an XML document into a tree of nodes that can be traversed and modified. Python 2.0 includes a SAX2 interface and a stripped-down DOM interface as part of the xml package. Here we will give a brief overview of these new interfaces; consult the Python documentation or the source code for complete details. The Python XML SIG is also working on improved documentation.
SAX2 Support¶
SAX defines an event-driven interface for parsing XML. To use SAX, you must write a SAX handler class. Handler classes inherit from various classes provided by SAX, and override various methods that will then be called by the XML parser. For example, the startElement() and endElement() methods are called for every starting and end tag encountered by the parser, the characters() method is called for every chunk of character data, and so forth.
The advantage of the event-driven approach is that the whole document doesn't have to be resident in memory at any one time, which matters if you are processing really huge documents. However, writing the SAX handler class can get very complicated if you're trying to modify the document structure in some elaborate way.
For example, this little program defines a handler that prints a message for every starting and ending tag, and then parses the file hamlet.xml using it:

    from xml import sax

    class SimpleHandler(sax.ContentHandler):
        def startElement(self, name, attrs):
            print 'Start of element:', name, attrs.keys()

        def endElement(self, name):
            print 'End of element:', name

    # Create a parser object
    parser = sax.make_parser()

    # Tell it what handler to use
    handler = SimpleHandler()
    parser.setContentHandler(handler)

    # Parse a file!
    parser.parse('hamlet.xml')
For more information, consult the Python documentation, or the XML HOWTO at https://pyxml.sourceforge.net/topics/howto/xml-howto.html.
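For experimenting without a copy of hamlet.xml, a handler can also be fed a document held in memory through xml.sax.parseString(). The sketch below is a variation on the idea rather than code from the article: the TagRecorder class and the sample document are invented, and the handler collects events in lists instead of printing them.

```python
import xml.sax


class TagRecorder(xml.sax.ContentHandler):
    """Record start/end events and character data as they arrive."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may be delivered in several chunks.
        self.chunks.append(content)


SAMPLE = b"<PLAY><TITLE>Hamlet</TITLE></PLAY>"

recorder = TagRecorder()
xml.sax.parseString(SAMPLE, recorder)
title_text = "".join(recorder.chunks)
```

Because the parser drives the handler one event at a time, the recorder only ever holds what it chooses to keep, which is the memory advantage described earlier.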
DOM Support¶
The Document Object Model is a tree-based representation for an XML document. A top-level Document instance is the root of the tree, and has a single child which is the top-level Element instance. This Element has children nodes representing character data and any sub-elements, which may have further children of their own, and so forth. Using the DOM you can traverse the resulting tree any way you like, access element and attribute values, insert and delete nodes, and convert the tree back into XML.
The DOM is useful for modifying XML documents, because you can create a DOM tree, modify it by adding new nodes or rearranging subtrees, and then produce a new XML document as output. You can also construct a DOM tree manually and convert it to XML, which can be a more flexible way of producing XML output than simply writing <tag1>...</tag1> to a file.
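Building a tree by hand might look like the following sketch, written against the xml.dom.minidom module. The element name, attribute and text are arbitrary choices for the example:

```python
from xml.dom import minidom

# Start from an empty document and attach a root element to it.
doc = minidom.Document()
root = doc.createElement("tag1")
doc.appendChild(root)

# Attributes and text are added as nodes, not as pre-formatted strings.
root.setAttribute("lang", "en")
root.appendChild(doc.createTextNode("a < b & c"))

xml_text = root.toxml()
```

One practical benefit over writing tags to a file by hand is that special characters in the text are escaped during serialization.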
The DOM implementation included with Python lives in the xml.dom.minidom module. It's a lightweight implementation of the Level 1 DOM with support for XML namespaces. The parse() and parseString() convenience functions are provided for generating a DOM tree:

    from xml.dom import minidom
    doc = minidom.parse('hamlet.xml')
doc is a Document instance. Document, like all the other DOM classes such as Element and Text, is a subclass of the Node base class. All the nodes in a DOM tree therefore support certain common methods, such as toxml() which returns a string containing the XML representation of the node and its children. Each class also has special methods of its own; for example, Element and Document instances have a method to find all child elements with a given tag name. Continuing from the previous 2-line example:

    perslist = doc.getElementsByTagName('PERSONA')
    print perslist[0].toxml()
    print perslist[1].toxml()
For the Hamlet XML file, the above few lines output:

    <PERSONA>CLAUDIUS, king of Denmark.</PERSONA>
    <PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
The root element of the document is available as doc.documentElement, and its children can be easily modified by deleting, adding, or moving nodes:

    root = doc.documentElement

    # Remove the first child
    root.removeChild(root.childNodes[0])

    # Move the new first child to the end
    root.appendChild(root.childNodes[0])

    # Insert the new first child (originally,
    # the third child) before the 20th child.
    root.insertBefore(root.childNodes[0], root.childNodes[20])
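The effect of these three calls is easier to follow on a tiny document. The sketch below uses parseString() on a made-up four-element document and records the order of the children after each step; the element names and the child_names() helper are assumptions for the example:

```python
from xml.dom import minidom

doc = minidom.parseString("<r><a/><b/><c/><d/></r>")
root = doc.documentElement


def child_names(node):
    # List the tag names of the node's children, in order.
    return [child.tagName for child in node.childNodes]


# Remove the first child: a is dropped.
root.removeChild(root.childNodes[0])
after_remove = child_names(root)

# Appending a node that is already in the tree moves it to the end.
root.appendChild(root.childNodes[0])
after_append = child_names(root)

# Insert the new first child before the last child.
root.insertBefore(root.childNodes[0], root.childNodes[2])
after_insert = child_names(root)
```

Note that appendChild() and insertBefore() move an existing node rather than copying it, so the number of children stays the same after the last two calls.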
Again, I will refer you to the Python documentation for a complete listing of the different Node classes and their various methods.
Relationship to PyXML¶
The XML Special Interest Group has been working on XML-related Python code for a while. Its code distribution, called PyXML, is available from the SIG's web pages at https://www.python.org/community/sigs/current/xml-sig. The PyXML distribution also used the package name xml. If you've written programs that used PyXML, you're probably wondering about its compatibility with the 2.0 xml package.
The answer is that Python 2.0's xml package isn't compatible with PyXML, but can be made compatible by installing a recent version of PyXML. Many applications can get by with the XML support that is included with Python 2.0, but more complicated applications will require that the full PyXML package be installed. When installed, PyXML versions 0.6.0 or greater will replace the xml package shipped with Python, and will be a strict superset of the standard package, adding a bunch of additional features. Some of the additional features in PyXML include:
4DOM, a full DOM implementation from FourThought, Inc.
The xmlproc validating parser, written by Lars Marius Garshol.
The sgmlop parser accelerator module, written by Fredrik Lundh.
Module Changes¶
Lots of improvements and bugfixes were made to Python's extensive standard library; some of the affected modules include readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, and nntplib. Consult the CVS logs for the exact patch-by-patch details.
Brian Gallew contributed OpenSSL support for the socket module. OpenSSL is an implementation of the Secure Sockets Layer, which encrypts the data being sent over a socket. When compiling Python, you can edit Modules/Setup to include SSL support, which adds an additional function to the socket module: socket.ssl(socket, keyfile, certfile), which takes a socket object and returns an SSL socket. The httplib and urllib modules were also changed to support https:// URLs, though no one has implemented FTP or SMTP over SSL.
The httplib module has been rewritten by Greg Stein to support HTTP/1.1. Backward compatibility with the 1.5 version of httplib is provided, though using HTTP/1.1 features such as pipelining will require rewriting code to use a different set of interfaces.
The Tkinter module now supports Tcl/Tk version 8.1, 8.2, or 8.3, and support for the older 7.x versions has been dropped. The Tkinter module now supports displaying Unicode strings in Tk widgets. Also, Fredrik Lundh contributed an optimization which makes operations like create_line and create_polygon much faster, especially when using lots of coordinates.
The curses module has been greatly extended, starting from Oliver Andrich's enhanced version, to provide many additional functions from ncurses and SYSV curses, such as colour, alternative character set support, pads, and mouse support. This means the module is no longer compatible with operating systems that only have BSD curses, but there don't seem to be any currently maintained OSes that fall into this category.
As mentioned in the earlier discussion of 2.0's Unicode support, the underlying implementation of the regular expressions provided by the re module has been changed. SRE, a new regular expression engine written by Fredrik Lundh and partially funded by Hewlett Packard, supports matching against both 8-bit strings and Unicode strings.
New Modules¶
A number of new modules were added. We'll simply list them with brief descriptions; consult the 2.0 documentation for the details of a particular module.
atexit: For registering functions to be called before the Python interpreter exits. Code that currently sets sys.exitfunc directly should be changed to use the atexit module instead, importing atexit and calling atexit.register() with the function to be called on exit. (Contributed by Skip Montanaro.)
codecs, encodings, unicodedata: Added as part of the new Unicode support.
filecmp: Supersedes the old cmp, cmpcache and dircmp modules, which have now become deprecated. (Contributed by Gordon MacMillan and Moshe Zadka.)
gettext: This module provides internationalization (I18N) and localization (L10N) support for Python programs by providing an interface to the GNU gettext message catalog library. (Integrated by Barry Warsaw, from separate contributions by Martin von Löwis, Peter Funk, and James Henstridge.)
linuxaudiodev: Support for the /dev/audio device on Linux, a twin to the existing sunaudiodev module. (Contributed by Peter Bosch, with fixes by Jeremy Hylton.)
mmap: An interface to memory-mapped files on both Windows and Unix. A file's contents can be mapped directly into memory, at which point it behaves like a mutable string, so its contents can be read and modified. They can even be passed to functions that expect ordinary strings, such as the re module. (Contributed by Sam Rushing, with some extensions by A.M. Kuchling.)
pyexpat: An interface to the Expat XML parser. (Contributed by Paul Prescod.)
robotparser: Parse a robots.txt file, which is used for writing web spiders that politely avoid certain areas of a web site. The parser accepts the contents of a robots.txt file, builds a set of rules from it, and can then answer questions about the fetchability of a given URL. (Contributed by Skip Montanaro.)
tabnanny: A module/script to check Python source code for ambiguous indentation. (Contributed by Tim Peters.)
UserString: A base class useful for deriving objects that behave like strings.
webbrowser: A module that provides a platform independent way to launch a web browser on a specific URL. For each platform, various browsers are tried in a specific order. The user can alter which browser is launched by setting the BROWSER environment variable. (Originally inspired by Eric S. Raymond's patch to urllib which added similar functionality, but the final module comes from code originally implemented by Fred Drake as Tools/idle/BrowserControl.py, and adapted for the standard library by Fred.)
_winreg: An interface to the Windows registry. _winreg is an adaptation of functions that have been part of PythonWin since 1995, but has now been added to the core distribution, and enhanced to support Unicode. _winreg was written by Bill Tutt and Mark Hammond.
zipfile: A module for reading and writing ZIP-format archives. These are archives produced by PKZIP on DOS/Windows or zip on Unix, not to be confused with gzip-format files (which are supported by the gzip module). (Contributed by James C. Ahlstrom.)
imputil: A module that provides a simpler way for writing customized import hooks, in comparison to the existing ihooks module. (Implemented by Greg Stein, with much discussion on python-dev along the way.)
IDLE Improvements¶
IDLE is the official Python cross-platform IDE, written using Tkinter. Python 2.0 includes IDLE 0.6, which adds a number of new features and improvements. A partial list:
UI improvements and optimizations, especially in the area of syntax highlighting and auto-indentation.
The class browser now shows more information, such as the top level functions in a module.
Tab width is now a user settable option. When opening an existing Python file, IDLE automatically detects the indentation conventions, and adapts.
There is now support for calling browsers on various platforms, used to open the Python documentation in a browser.
IDLE now has a command line, which is largely similar to the vanilla Python interpreter.
Call tips were added in many places.
IDLE can now be installed as a package.
In the editor window, there is now a line/column bar at the bottom.
Three new keystroke commands: Check module (Alt-F5), Import module (F5) and Run script (Ctrl-F5).
Deleted and Deprecated Modules¶
A few modules have been dropped because they're obsolete, or because there are now better ways to do the same thing. The stdwin module is gone; it was for a platform-independent windowing toolkit that's no longer developed.
A number of modules have been moved to the lib-old subdirectory: cmp, cmpcache, dircmp, dump, find, grep, packmail, poly, util, whatsound, zmod. If you have code which relies on a module that's been moved to lib-old, you can simply add that directory to sys.path to get them back, but you're encouraged to update any code that uses these modules.
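Adding the directory to the search path could be done along the following lines. This is only a sketch: the enable_lib_old() helper is hypothetical, and the directory shown is a placeholder, since the actual location of lib-old depends on where Python is installed.

```python
import sys


def enable_lib_old(lib_old_dir, search_path=None):
    # Append the directory only once, so repeated calls are harmless.
    if search_path is None:
        search_path = sys.path
    if lib_old_dir not in search_path:
        search_path.append(lib_old_dir)
    return search_path


# Demonstrated on a private list; pass no second argument to modify sys.path.
demo_path = ["/usr/lib/python2.0"]               # placeholder location
enable_lib_old("/usr/lib/python2.0/lib-old", demo_path)
enable_lib_old("/usr/lib/python2.0/lib-old", demo_path)
```

Appending rather than inserting at the front keeps the current standard library modules ahead of the old ones in the search order.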
Acknowledgments¶
The authors would like to thank the following people for offering suggestions on various drafts of this article: David Bolen, Mark Hammond, Gregg Hauser, Jeremy Hylton, Fredrik Lundh, Detlef Lannert, Aahz Maruch, Skip Montanaro, Vladimir Marangozov, Tobias Polzin, Guido van Rossum, Neil Schemenauer, and Russ Schmidt.