29th June, 2012
21 April, 2020
Throughout the long transition to “Python 3 by default” in the Python ecosystem, the question was occasionally raised as to whether or not the core Python developers were acting as reasonable stewards of the Python language.
While it largely stopped being a concern after the release of Python 3.5 in September 2015, it was an entirely appropriate question prior to that, as Python 3 introduced backwards incompatible changes that more obviously helped future users of the language than they did current users, so existing users (especially library and framework developers) were being asked to devote time and effort to a transition that would cost them more in time and energy in the near term than it would save them for years to come.
Since I had seen variants of these questions several times over the years, I started this FAQ as an intermittently updated record of my thoughts on the topic, with updates generally being prompted by new iterations of the questions. I gave Sumana Harihareswara co-maintainer access in September 2019 so she could aid in updating it, but for simplicity’s sake we’ve mostly retained the first-person singular (“I”) throughout. You can see the full history of changes in the source repo.
The views expressed below are my own. While many of them are shared by other core developers, and I use “we” in several places where I believe that to be the case, I don’t claim to be writing on behalf of every core developer on every point. Several core developers (including Guido) have reviewed and offered comments on this document at various points in time, and aside from Guido noting that I was incorrect about his initial motivation in creating Python 3, none of them has raised any objections to specific points or the document in general.
I am also not writing on behalf of the Python Software Foundation (of which I am a nominated Fellow), nor on behalf of Python’s Steering Council (of which I am a former member), nor on behalf of Red Hat (my previous employer, for whom I worked for much of the time I maintained this). However, I do use several Red Hat specific examples when discussing enterprise perception and adoption of the Python platform - effectively bridging that gap between early adopters and the vast majority of prospective platform users is kinda what Red Hat specialises in, so I consider them an important measure of the inroads Python 3 is making into more conservative development communities.
There were several extensive discussions of the state of the Python 3 transition at PyCon US 2014 in Montreal, starting at the language summit, and continuing throughout the conference. These helped clarify many of the remaining points of contention, and resulted in a range of changes to Python 3.5, Python 2.7, and the available tools to support forward migration from Python 2 to Python 3. These discussions didn’t stop, but have rather continued over the course of Python development, and can be expected to continue for as long as folks are either developing software that fits into the common subset of Python 2 & 3, or maintaining software that continues to run solely under Python 2.
Note
If anyone is interested in writing about these issues in more formal media, please get in touch to check if particular answers are still accurate. Not only have the updates over the years been intermittent, they’ve also been less than completely comprehensive, so some answers may refer to experiments that ultimately proved uninteresting or unsuccessful, or otherwise be out of date.
As with all essays on these pages, feedback is welcome via the issue tracker or Twitter.
Yes, we know this migration was/is disruptive.
Yes, we know that some sections of the community had never personally experienced the problems with the Python 2 Unicode model that this migration was designed to eliminate, or otherwise preferred the closer alignment between the Python 2 text model and the POSIX text model.
Yes, we know that many of those problems had already been solved by some sections of the community to their own satisfaction.
Yes, we know that by attempting to fix these problems in the core Unicode model we broke many of the workarounds that had been put in place to deal with the limitations of the old model.
Yes, we are trying to ensure there is a smooth migration path from Python 2 to Python 3 to minimise the inevitable disruption.
Yes, we know some members of the community would have liked the migration to move faster and found the “gently, gently, there’s no rush” approach of the core development team frustrating.
No, we did not do this lightly.
No, we did not see any other way to ensure Python remained a viable development platform as developer communities grow in locations where English is not the primary spoken language. It should be at least possible for users to start learning the basics of Python without having to first learn English as a prerequisite (even if English remains a requirement for full participation in the global Python and open source ecosystems).
It is my perspective that the web and GUI developers have the right idea: dealing with Unicode text correctly is not optional in the modern world. In large part, the Python 3 redesign involved taking Unicode handling principles elaborated in those parts of the community and building them into the core design of the language.
According to Guido, he initiated the Python 3 project to clean up a variety of issues with Python 2 that he didn’t feel comfortable fixing through the normal deprecation process. This included the removal of classic classes, changing integer division to automatically promote to a floating point result (retaining the separate floor division operation) and changing the core string type to be based on Unicode by default. With a compatibility break taking place anyway, the case was made to just include some other changes in that process (like converting print to a function), rather than going through the full deprecation process within the Python 2 series.
If it had just been about minor cleanups, the transition would likely have been more straightforward, but also less beneficial. However, the changes to the text model in Python 3 are one of those ideas that has profoundly changed the way I think about software, and we receive similar feedback from many other users who never really understood how Unicode worked in Python 2, but were able to grasp it far more easily in Python 3. Redesigning the way the Python builtin types model binary and text data has the ultimate aim of helping all Python applications (including the standard library itself) to handle Unicode text in a more consistent and reliable fashion (I originally had “without needing to rely on third party libraries and frameworks” here, but those are still generally needed to handle system boundaries correctly, even in Python 3).
Note
For a more complete version of this answer that places it in the wider industry context of Unicode adoption, see this article of mine on the Red Hat Developer Blog: The Transition to Multilingual Programming with Python
I also gave a presentation on the topic at PyCon Australia 2015, which is available online here.
The core Unicode support in the Python 2 series has the honour of being documented in PEP 100. It was created as Misc/unicode.txt in March 2000 (before the PEP process even existed) to integrate Unicode 3.0 support into Python 2.0. Once the PEP process was defined, it was deemed more appropriate to capture these details as an informational PEP.
Guido, along with the wider Python and software development communities, learned a lot about the best techniques for handling Unicode in the six years between the introduction of Unicode support in Python 2.0 and the inauguration of the python-3000 mailing list in March 2006.
One of the most important guidelines for good Unicode handling is to ensure that all encoding and decoding occurs at system boundaries, with all internal text processing operating solely on Unicode data. The Python 2 Unicode model is essentially the POSIX text model with Unicode support bolted on to the side, so it doesn’t follow that guideline: it allows implicit decoding at almost any point where an 8-bit string encounters a Unicode string, along with implicit encoding at almost any location where an 8-bit string is needed but a Unicode string is provided.
One reason this approach is problematic is that it means the traceback for an unexpected UnicodeDecodeError or UnicodeEncodeError in a large Python 2.x code base almost never points you to the code that is broken. Instead, you have to trace the origins of the data in the failing operation, and try to figure out where the unexpected 8-bit or Unicode string was introduced. By contrast, Python 3 is designed to fail fast in most situations: when a UnicodeError of any kind occurs, it is more likely that the problem actually does lie somewhere close to the operation that failed. In those cases where Python 3 doesn’t fail fast, it’s because it is designed to “round trip” - so long as the output encoding matches the input encoding (even if it turns out the data isn’t properly encoded according to that encoding), Python 3 will aim to faithfully reproduce the input byte sequence as the output byte sequence.
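As a rough illustration of the fail fast behaviour (a minimal Python 3 sketch; the variable names are made up for the example), mixing bytes and str raises at the operation that mixes them, and a bad decode raises at the decode call itself:

```python
raw = b"caf\xc3\xa9"  # UTF-8 bytes, e.g. as received from a socket
prefix = "menu: "

try:
    prefix + raw  # str + bytes: no implicit decoding in Python 3
except TypeError as exc:
    print("TypeError:", exc)

try:
    raw.decode("ascii")  # wrong codec: fails right here, not later
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError at index", exc.start)  # index 3

# Decoding explicitly at the boundary gives text that combines cleanly.
combined = prefix + raw.decode("utf-8")
print(combined)  # menu: café
```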
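The implicit nature of the conversions in Python 2 also means that encoding operations may raise decoding errors and vice versa, depending on the input types and the codecs involved (for example, calling .encode("utf-8") on an 8-bit str first implicitly decodes it as ASCII, so non-ASCII input raises UnicodeDecodeError).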
A more pernicious problem arises when Python 2 doesn’t throw an exception at all - this problem occurs when two 8-bit strings with data in different text encodings are concatenated or otherwise combined. The result is invalid data, but Python will happily pass it on to other applications in its corrupted form. Python 3 isn’t completely immune to this problem, but it should arise in substantially fewer cases.
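Python 3 bytes objects can still be combined without any check on their encoding, which is the residual exposure mentioned above. A small sketch, assuming two inputs that carry the same word in UTF-8 and latin-1 respectively:

```python
utf8_part = "café".encode("utf-8")      # b'caf\xc3\xa9'
latin1_part = "café".encode("latin-1")  # b'caf\xe9'

# Concatenating bytes never raises: the result silently mixes encodings.
mixed = utf8_part + b" / " + latin1_part

# The corruption only surfaces when something finally decodes the data.
try:
    mixed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode as UTF-8:", exc.reason)

# Decoding as latin-1 "succeeds", but turns the UTF-8 half into mojibake.
print(mixed.decode("latin-1"))  # cafÃ© / café
```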
The general guiding philosophy of the text model in Python 3 is essentially:
try to do the right thing by default
if we can’t figure out the right thing to do, throw an exception
as far as is practical, always require users to opt in to behaviours that pose a significant risk of silently corrupting data in non-ASCII compatible encodings
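The second and third rules can be seen in the codec error handlers. A minimal sketch with an arbitrary sample byte string: the default strict handler raises, while lossy handlers such as replace and ignore have to be requested by name:

```python
data = b"price: 10\xa3"  # a latin-1 pound sign, which is not valid UTF-8

# Default: strict error handling raises instead of guessing.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# Lossy behaviours must be requested explicitly.
replaced = data.decode("utf-8", errors="replace")  # ends with U+FFFD
ignored = data.decode("utf-8", errors="ignore")    # drops the bad byte
print(ascii(replaced))  # 'price: 10\ufffd'
print(ignored)          # price: 10
```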
Ned Batchelder’s wonderful Pragmatic Unicode talk/essay could just as well be titled “This is why Python 3 exists”. There are a large number of Unicode handling bugs in the Python 2 standard library that have not been, and will not be, fixed, as fixing them within the constraints of the Python 2 text model is considered too hard to be worth the effort (to put that effort into context: if you judge the core development team by our actions it is clear that we consider that creating and promoting Python 3 was an easier and more pleasant alternative to attempting to fix those issues while abiding by Python 2’s backwards compatibility requirements).
The revised text model in Python 3 also means that the primary string type is now fully Unicode capable. This brings Python closer to the model used in the JVM, Android, .NET CLR, and Unicode capable Windows APIs. One key consequence of this is that the interpreter core in Python 3 is far more tolerant of paths that contain Unicode characters on Windows (so, for example, having a non-ASCII character in your username should no longer cause any problems with running Python scripts from your home directory on Windows). The surrogateescape error handler added in PEP 383 is designed to bridge the gap between the new text model in Python 3 and the possibility of receiving data through bytes oriented APIs on POSIX systems where the declared system encoding doesn’t match the encoding of the data itself. That error handler is also useful in other cases where applications need to tolerate mismatches between declared encodings and actual data - while it does share some of the problems of the Python 2 Unicode model, it at least has the virtue of only causing problems in the case of errors either in the input data or the declared encoding, whereas Python 2 could get into trouble in the presence of multiple data sources with different encodings, even if all the input was correctly encoded in its declared encoding.
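The following sketch shows the round trip that surrogateescape provides. The byte string is a made-up file name containing stray latin-1 bytes; os.fsdecode and os.fsencode apply the same handler on POSIX systems, but the codec call is used directly here so the example doesn’t depend on the local filesystem encoding:

```python
# Bytes declared as UTF-8 but containing stray latin-1 bytes (0xE9).
raw_name = b"report-\xe9t\xe9.txt"

# Each undecodable byte becomes a lone surrogate in U+DC80..U+DCFF.
name = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(name))  # 'report-\udce9t\udce9.txt'

# Encoding with the same codec and handler restores the exact input bytes.
assert name.encode("utf-8", errors="surrogateescape") == raw_name

# A strict encode still refuses to emit the smuggled bytes.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)
```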
Python 3 also embeds Unicode support more deeply into the language itself. With the primary string type handling the full Unicode range, it became practical to make UTF-8 the default source encoding (instead of ASCII) and adjust many parts of the language that were previously restricted to ASCII text (such as identifiers) to now permit a much wider range of Unicode characters. This permits developers with a native language other than English to use names in their own language rather than being forced to use names that fit within the ASCII character set. Some areas of the interpreter that were previously fragile in the face of Unicode text (such as displaying exception tracebacks) are also far more robust in Python 3.
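A short sketch of what this permits (the German and Japanese names are arbitrary examples; PEP 3131 defines the exact rules for which characters are allowed in identifiers):

```python
# Source files are UTF-8 by default, so non-ASCII names need no declaration.
größe = 3
größe_verdoppelt = größe * 2
print(größe_verdoppelt)  # 6

print("名前".isidentifier())     # True
print("2nd_try".isidentifier())  # False: identifiers can't start with a digit
```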
Removing the implicit type conversions entirely also made it more practical to implement the new internal Unicode data model for Python 3.3, where the internal representation of Unicode strings is automatically adjusted based on the highest value code point that needs to be stored (see PEP 393 for details).
The Python 2 core text model looks like this:
str
: 8-bit type containing binary data, or encoded text data in an unknown (hopefully ASCII compatible) encoding, represented as length 1 8-bit strings
unicode
: 16-bit or 32-bit type (depending on build options) containing Unicode code points, represented as length 1 Unicode strings
That first type is essentially the way POSIX systems model text data, so it is incredibly convenient for interfacing with POSIX environments, since it lets you just copy bits around without worrying about their encoding. It is also useful for dealing with the ASCII compatible segments that are part of many binary protocols.
The conceptual problem with this model is that it is only an appropriate model for boundary code - the kind of code that handles the transformation between wire protocols and file formats (which are always a series of bytes), and the more structured data types actually manipulated by applications (which may include opaque binary blobs, but are more typically things like text, numbers and containers).
Actual applications shouldn’t be manipulating values that “might be text, might be arbitrary binary data”. In particular, manipulating text values as binary data in multiple different text encodings can easily cause a problem the Japanese named “mojibake”: binary data that includes text in multiple encodings, but with no clear structure that defines which parts are in which encoding.
Unfortunately, Python 2 uses a type with exactly those semantics as its core string type, permits silent promotion from the “might be binary data” type to the “is definitely text” type and provides little support for accounting for encoding differences.
So Python 3 changes the core text model to be one that is more appropriate for application code rather than boundary code:
str
: a sequence of Unicode code points, represented as length 1 strings (always contains text data)
bytes
: a sequence of integers between 0 and 255 inclusive (always contains arbitrary binary data). While it still has many operations that are designed to make it convenient to work on ASCII compatible segments in binary data formats, it is not implicitly interoperable with the str type.
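A small Python 3 sketch of the two types side by side, using an arbitrary sample word:

```python
text = "naïve"
data = text.encode("utf-8")

# str is a sequence of code points; bytes is a sequence of ints 0..255.
print(len(text), len(data))  # 5 6
print(text[2], data[2])      # ï 195

# bytes keeps helpers for ASCII compatible segments...
print(b"content-length".upper())  # b'CONTENT-LENGTH'

# ...but never interoperates implicitly with str.
print("abc" == b"abc")  # False
try:
    "abc" + b"abc"
except TypeError:
    print("TypeError: str and bytes don't mix")
```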
The hybrid “might be encoded text, might be arbitrary binary data, can interoperate with both other instances of str and also with instances of unicode” type was deliberately removed from the core text model because using the same type for multiple distinct purposes makes it incredibly difficult to reason about correctly. The core model in Python 3 opts to handle the “arbitrary binary data” case and the “ASCII compatible segments in binary data formats” case, leaving the direct manipulation of encoded text to a (currently still hypothetical) third party type (due to the many issues that approach poses when dealing with multibyte and variable width text encodings).
The purpose of boundary code is then to hammer whatever comes in over the wire or is available on disk into a format suitable for passing on to application code.
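As a sketch of that division of labour, the functions below are hypothetical (read_names, shout and write_names are not part of any library): decoding happens once on the way in, encoding once on the way out, and the application logic in between only ever handles str. io.BytesIO stands in for a socket or a file opened in binary mode.

```python
import io
from typing import BinaryIO, List


def read_names(wire: BinaryIO, encoding: str = "utf-8") -> List[str]:
    """Boundary code: bytes in, text out (decode once, on the way in)."""
    return wire.read().decode(encoding).splitlines()


def shout(names: List[str]) -> List[str]:
    """Application code: only ever sees str."""
    return [name.upper() + "!" for name in names]


def write_names(wire: BinaryIO, names: List[str], encoding: str = "utf-8") -> None:
    """Boundary code: text in, bytes out (encode once, on the way out)."""
    wire.write("\n".join(names).encode(encoding))


incoming = io.BytesIO("José\nZoë\n".encode("utf-8"))
outgoing = io.BytesIO()
write_names(outgoing, shout(read_names(incoming)))
print(outgoing.getvalue())  # b'JOS\xc3\x89!\nZO\xc3\x8b!'
```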
Unfortunately, there have turned out to be some key challenges in making this model pervasive in Python 3:
the same design changes that improve Python 3’s Windows integration by changing several OS interfaces to operate on text rather than binary data also make it more sensitive to locale misconfiguration issues on POSIX operating systems other than Mac OS X. In Python 2, text is always sent and received from POSIX operating system interfaces as binary data, and the associated decoding and encoding operations are fully under the control of the application. In Python 3, the interpreter aims to handle these operations automatically, but in releases up to and including Python 3.6 it needs to rely on the OS locale settings (as reported through the locale module) to handle the conversion, making it potentially sensitive to configuration issues that many Python 2 applications could ignore. Most notably, if the OS erroneously claims that “ascii” is a suitable encoding to use for operating system interfaces (as happens by default in a number of cases, due to the formal definition of the ANSI C locale predating the invention of UTF-8 by a few years), the Python 3 interpreter will believe it, and will complain if asked to handle non-ASCII data. PEP 538 and PEP 540 offer some possible improvements in this area (by assuming UTF-8 as the preferred text encoding when running in the default C locale), but it isn’t a trivial fix due to the phase of the interpreter startup sequence where the problem occurs. (Thanks go to Armin Ronacher for clearly articulating many of these details - see his write-up in the click documentation.)
when migrating libraries and frameworks from Python 2 to Python 3 that handle boundary API problems, the lack of the hybrid “might be text, might be arbitrary bytes” type can be keenly felt, as the implicitly interoperable type was essential to being able to cleanly share code between the two modes of operation. This usually isn’t a major problem for new Python 3 code - such code is typically designed to operate in the binary domain (perhaps relying on the methods for working with ASCII compatible segments), the text domain, or to handle a transition between them. However, code being ported from Python 2 may need to continue to implement hybrid APIs in order to accommodate users that make different decisions regarding whether to operate in the binary domain or the text domain in Python 3 - because Python 2 blurred the distinction, different users will make different choices, and third party libraries and frameworks may need to account for that rather than forcing a particular answer for all users.
in the initial Python 3 design, interpolation of variables into a format string was treated solely as a text domain operation. While this proved to be a reasonable design decision for the flexible Python-specific str.format operation, PEP 461 restored printf-style interpolation for ASCII compatible segments in binary data in Python 3.5. Prior to that change, the lack of this feature could sometimes be an irritation when working extensively in Python 3 with wire protocols and file formats that include ASCII compatible segments.
while the API design of the str type in Python 3 was based directly on the unicode type in Python 2, the bytes type doesn’t have such a clean heritage. Instead, it evolved over the course of the initial Python 3 pre-release design period, starting from a model where the only type for binary data handling was the type now called bytearray. That type was modelled directly on the array.array('B') type, and hence produced integers when iterating over it or indexing into it. During the pre-release design period, the lack of an immutable binary data type was identified as a problem, and the (then mutable) bytes type was renamed to bytearray and a new immutable bytes type added. The now familiar “bytes literal” syntax was introduced (prepending a “b” prefix to the string literal syntax) and the representations of the two types were also adjusted to be based on the new bytes literal syntax. With the benefit of hindsight, it has become clear another change should have been made at the same time: with so many affordances switched back to matching those of the Python 2 str type (including the use of the new bytes literal syntax to refer to that type in Python 2.6 and 2.7), bytes and bytearray should have been switched away from behaving like a tuple of integers and list of integers (respectively) and instead modified to be containers of length 1 bytes objects, just as the str type is a container of length 1 str objects. Unfortunately, that change was not made at the time, and now backwards compatibility constraints within the Python 3 series itself make it highly unlikely the behaviour will be changed in the future either. PEP 467 covers a number of other still visible remnants of this convoluted design history that are more amenable to being addressed within the constraints of Python’s normal deprecation processes.
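A few of the points above in concrete form (a sketch requiring Python 3.5+ for PEP 461; the protocol lines are illustrative, not a real client):

```python
# PEP 461: printf-style interpolation works on bytes again.
body = "héllo".encode("utf-8")
header = b"Content-Length: %d\r\n" % len(body)
print(header)  # b'Content-Length: 6\r\n'

# %s inserts bytes-like arguments directly...
line = b"%s %s HTTP/1.1\r\n" % (b"GET", b"/index.html")
print(line)  # b'GET /index.html HTTP/1.1\r\n'

# ...but str arguments are rejected rather than implicitly encoded.
try:
    b"%s" % "text"
except TypeError:
    print("str arguments are rejected")

# Remnants of the tuple-of-ints design: indexing yields an int,
# while slicing yields length 1 bytes.
data = b"GET"
print(data[0])     # 71
print(data[0:1])   # b'G'
print(list(data))  # [71, 69, 84]
```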
These changes are a key source of friction when it comes to Python 3 between the Python core developers and other experts who had fully mastered the Python 2 text model, especially those that focus on targeting POSIX platforms rather than Windows or the JVM, as well as those that focus on writing boundary code, such as networking libraries, web frameworks and file format parsers and generators. These developers bore a lot of the burden of adjusting to these changes on behalf of their users, often while gaining few or none of the benefits.
That said, while these issues certainly aren’t ideal, they also won’t impact many users that are relying on libraries and frameworks to deal with boundary issues, and can afford to ignore possible misbehaviour in misconfigured POSIX environments. As Python 3 has matured as a platform, most of those areas where it has regressed in suitability relative to Python 2 have been addressed. In particular, the ongoing migrations of Linux distribution utilities from Python 2 to Python 3 have seen many of the platform integration issues on POSIX systems dealt with in a cleaner fashion. The tuple-of-ints and list-of-ints behaviour of bytes and bytearray is unlikely to change, but proposals like PEP 467 may bring better tools for dealing with them.
The design decision to go with a fixed width Unicode representation both externally and internally has a long history in Python, going all the way back to the addition of Python’s original Unicode support in Python 2.0. Using a fixed width type at that point meant that many of the algorithms could be shared between the original 8-bit str type and the new 16-or-32-bit unicode type. (Note that adoption of this particular approach predates my own involvement in CPython core development - as with many other aspects of CPython’s text handling support, it’s something I’ve learned about while helping with the transition to pervasive Unicode support in the standard library and elsewhere for Python 3.)
That design meant that, historically, CPython builds had to choose what size to use for the internal representation of Unicode text. We always chose to use “narrow” builds for the Windows binary installers published on python.org, as the UTF-16 internal representation was the best fit for the Windows text handling APIs.
Linux distributions, by contrast, almost all chose the memory hungry “wide” builds that allocated 32 bits per Unicode code point in Python 2 unicode objects and Python 3 str objects (up to & including Python 3.2), even for pure ASCII text. There’s a reason they went for that option, though: it was better at handling Unicode code points outside the basic multilingual plane (BMP). In narrow builds the UTF-16 code units were exposed directly in both the C API and the Python API of the unicode type, and hence were prone to bugs related to incorrect handling of code points greater than 65,535 (which are stored as two-unit surrogate pairs) in code that assumed a one-to-one correspondence between Python string elements and Unicode code points. This wasn’t generally a big deal when code points in common use all tended to fit in the BMP, but started to become more problematic as things like mathematical and musical notation, ancient languages, emoticons and additional CJK ideographs were added. Given the choice between greater memory efficiency and correctness, the Linux distributions chose correctness, imposing a non-trivial memory usage penalty on Unicode heavy applications that couldn’t rely entirely on str objects in Python 2 or bytes and bytearray objects in Python 3. Those larger strings also came at a cost in speed, since they not only meant having more data to move around relative to narrow builds (or applications that only allowed 8-bit text), but the larger memory footprint also made CPU caches less effective.
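To see the distinction that narrow builds blurred, this Python 3.3+ sketch compares the single code point that str now exposes with the two UTF-16 code units (a surrogate pair) that a narrow build’s unicode type would have exposed for the same character:

```python
emoji = "\U0001F600"  # GRINNING FACE, outside the BMP (> U+FFFF)

# Since Python 3.3 every str element is a full code point.
print(len(emoji))       # 1
print(hex(ord(emoji)))  # 0x1f600

# The UTF-16 code units a narrow build would have exposed as elements.
utf16 = emoji.encode("utf-16-le")
code_units = [
    int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)
]
print([hex(unit) for unit in code_units])  # ['0xd83d', '0xde00']
```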
When it came to the design of the C level text representation for Python 3, the existing Python 2 Unicode design wasn’t up for reconsideration - the Python 2 unicode type was mapped directly to the Python 3 str type. This is most obvious in the Python 3 C API, which still uses the same PyUnicode_* prefix for text manipulation APIs, as that was the easiest way to preserve compatibility with C extensions that were originally written against Python 2.
However, removing the intertwining of the 8-bit str type and the unicode type that existed in Python 2 paved the way for eliminating the narrow vs wide build distinction in Python 3.3, and eliminating a significant portion of the memory cost associated with getting correct Unicode handling in earlier versions of Python. As a result of PEP 393, strings that consist solely of latin-1 or UCS-2 code points in Python 3.3+ are able to use 8 or 16 bits per code point (as appropriate), while still being able to use string manipulation algorithms that rely on the assumption of consistent code point sizes within a given string. As with the original Python 3 implementation, there were also a large number of constraints imposed on this redesign of the internal representation based on the public C API, and that is reflected in some of the more complicated aspects of the PEP.
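The PEP 393 behaviour can be observed from Python code, with the caveat that sys.getsizeof reports CPython implementation details that other interpreters are free to handle differently. The bytes_per_char helper below is hypothetical; it measures how much a string grows when one more character of a given kind is appended:

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Growth in object size when one more copy of ch is added (CPython)."""
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


print(bytes_per_char("a"))           # 1 (ASCII)
print(bytes_per_char("é"))           # 1 (latin-1)
print(bytes_per_char("\u0100"))      # 2 (UCS-2 range)
print(bytes_per_char("\U0001F600"))  # 4 (needs full UCS-4)
```

The width is chosen per string, so a single astral character in an otherwise ASCII string raises the cost of every code point in that string to 4 bytes.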
While it’s theoretically possible to write string manipulation algorithms that work correctly with variable width encodings (potentially saving even more memory), it isn’t easy to do so, and for cross-platform runtimes that interoperate closely with the underlying operating system the way CPython does, there isn’t an obvious universally correct choice even today, let alone back in 2006 when Guido first started the Python 3 project. UTF-8 comes closest (hence the wording of this question), but it still poses risks of silent data corruption on Linux if you don’t explicitly transcode data at system boundaries (particularly if the actual encoding of metadata provided by the system is ASCII incompatible, as can happen in East Asian countries using encodings like Shift-JIS and GB-18030) and still requires transcoding between UTF-16-LE and UTF-8 on Windows (the bytes-oriented APIs on Windows are generally restricted to the mbcs encoding, making them effectively useless for proper Unicode handling - it’s necessary to switch to the Windows specific UTF-16 based APIs to make things work properly).
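The ASCII incompatibility risk is easy to demonstrate with Shift-JIS, where the second byte of some double-byte characters has the same value as the ASCII backslash (0x5C). A minimal sketch using the katakana character ソ as an example:

```python
so = "ソ"  # KATAKANA LETTER SO
sjis = so.encode("shift_jis")
print(sjis)           # b'\x83\\'
print(b"\\" in sjis)  # True

# Byte-oriented code that treats 0x5C as a separator or escape character
# silently splits the character in half.
print(sjis.split(b"\\"))  # [b'\x83', b'']

# UTF-8 never reuses ASCII byte values inside multibyte sequences.
utf8 = so.encode("utf-8")
print(all(byte >= 0x80 for byte in utf8))  # True
```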
The Python 3 text model also trades additional memory usage for encoding and decoding speed in some cases, including caching the UTF-8 representation of a string when appropriate. In addition to UTF-8, other key codecs like ASCII, latin-1, UTF-16 and UTF-32 are closely integrated with the core text implementation in order to make them as efficient as is practical.
The current Python 3 text model certainly has its challenges, especially around Linux compatibility (see PEP 383 for an example of the complexity associated with that problem), but those are considered the lesser evil when compared to the alternative of breaking C extension compatibility and having to rewrite all the string manipulation algorithms to handle a variable width internal encoding, while still facing significant integration challenges on both Windows and Linux. Instead of anyone pursuing such a drastic change, I expect the remaining Linux integration issues for the existing model to be resolved as we help Linux distributions like Ubuntu and Fedora migrate their system services to Python 3 (in the specific case of Fedora, that migration encompasses both the operating system installer and the package manager).
Still, for new runtimes invented today, particularly those aimed primarily at new server applications running on Linux that can afford to ignore the integration challenges that arise on Windows and older Linux systems using encodings other than UTF-8, using UTF-8 for their internal string representation makes a lot of sense. It’s just best to avoid exposing the raw binary representation of text data for direct manipulation in user code: experience has shown that a Unicode code point based abstraction is much easier to work with, even if it means opting out of providing O(1) indexing for arbitrary code points in a string to avoid allocating additional memory per code point based on the largest code point in the string. For new languages that are specifically designed to accommodate a variable width internal encoding for text, a file-like opaque token based seek/tell style API is likely to be more appropriate for random access to strings than a Python style integer based indexing API. The kind of internal flexibility offered by the seek/tell approach can be seen in Python’s own io.StringIO implementation - in Python 3.4+, that aims to delay creation of a full string object for as long as possible, an optimisation that could be implemented transparently due to the file-like API that type exports.
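To make the seek/tell idea concrete, here is a deliberately small sketch of a hypothetical Utf8Cursor type (not a real library API) that stores text as UTF-8 and hands out opaque position tokens instead of supporting integer indexing. The token happens to be a byte offset, but callers are only meant to pass back values obtained from tell(), much as io.TextIOBase documents its tell() result as an opaque number:

```python
class Utf8Cursor:
    """Hypothetical read-only text buffer stored as UTF-8."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")
        self._pos = 0  # byte offset; treated as an opaque token by callers

    def tell(self) -> int:
        return self._pos

    def seek(self, token: int) -> None:
        # Only tokens previously returned by tell() are valid here.
        self._pos = token

    def read(self, n: int) -> str:
        """Read up to n code points from the current position."""
        data = self._data
        start = end = self._pos
        for _ in range(n):
            if end >= len(data):
                break
            end += 1
            # Skip UTF-8 continuation bytes (0b10xxxxxx) of this code point.
            while end < len(data) and (data[end] & 0xC0) == 0x80:
                end += 1
        self._pos = end
        return data[start:end].decode("utf-8")


cursor = Utf8Cursor("naïve café")
first = cursor.read(3)   # 'naï'
token = cursor.tell()    # opaque to callers (here: byte offset 4)
second = cursor.read(2)  # 've'
cursor.seek(token)
again = cursor.read(2)   # 've' again
rest = cursor.read(100)  # ' café'
```

Random access costs nothing extra here because a token always lands on a code point boundary; what the design gives up is O(1) access to "the nth code point".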
Note
Python 3 does assume UTF-8 at system boundaries on Mac OS X, since that OS ensures that the assumption will almost always be correct. Starting with Python 3.6, CPython on Windows also assumes that binary data passed to operating system interfaces is in UTF-8 and transcodes it to UTF-16-LE before passing it to the relevant Windows APIs.
In Python 3.7, PEP 538 and PEP 540 extended the UTF-8 assumption to the default C locale more generally (so other system encodings are still supported through the locale system, but the problematic ASCII default is largely ignored).
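To check which of these assumptions apply to a particular interpreter and environment, the relevant settings can be inspected directly. A minimal sketch (the output depends entirely on the platform, locale and command line options, so no expected values are shown); sys.flags.utf8_mode requires Python 3.7+:

```python
import locale
import sys

# UTF-8 mode (PEP 540) is enabled by -X utf8 or PYTHONUTF8=1, and
# automatically when the startup locale is C or POSIX.
print("UTF-8 mode:", bool(sys.flags.utf8_mode))
print("filesystem encoding:", sys.getfilesystemencoding())
print("locale preferred encoding:", locale.getpreferredencoding(False))
```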
The other backwards incompatible changes in Python 3 largely fell into thefollowing categories:
dropping deprecated features that were frequent sources of bugs in Python 2, or had been replaced by superior alternatives and retained solely for backwards compatibility
reducing the number of statements in the language
replacing concrete list and dict objects with more memory efficient alternatives
renaming modules to be more PEP 8 compliant and to automatically use C accelerators when available
The first of those was aimed at making the language easier to learn, and easier to maintain. Keeping deprecated features around isn’t free: in order to maintain code that uses those features, everyone needs to remember them and new developers need to be taught them. Python 2 had acquired a lot of quirks over the years, and the 3.x series allowed such design mistakes to be corrected.
While there were advantages to having print and exec as statements, they introduced a sharp discontinuity when switching from the statement forms to any other alternative approach (such as changing print to logging.debug or exec to execfile), and also required the use of awkward hacks to cope with the fact that they couldn’t accept keyword arguments. For Python 3, they were demoted to builtin functions in order to remove that discontinuity and to exploit the benefits of keyword only parameters.
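A small sketch of what the function form buys (the report helper is hypothetical): keyword-only parameters replace Python 2’s trailing comma and >> redirection syntax, and print can be swapped for another callable such as logging.debug without rewriting any syntax:

```python
import io
import logging

buffer = io.StringIO()

# sep, end and file are keyword-only parameters of the print() function.
print("a", "b", sep="-", end="!\n", file=buffer)
print(repr(buffer.getvalue()))  # 'a-b!\n'


def report(message, emit=print):
    """Hypothetical helper: any callable taking one string will do."""
    emit("status: " + message)


report("ok")                      # written to stdout
report("ok", emit=logging.debug)  # routed through the logging module
```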
The increased use of iterators and views was motivated by the fact that many of Python’s core APIs were designed before the introduction of the iterator protocol. That meant a lot of unnecessary lists were being created when more memory efficient alternatives were now possible. We didn’t get them all (you’ll still find APIs that unnecessarily return concrete lists and dictionaries in various parts of the standard library), but the core APIs are all now significantly more memory efficient by default.
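A brief sketch of the lazier defaults in Python 3, with arbitrary sample data: dictionary methods return live views, and builtins such as range and map produce values on demand instead of building lists:

```python
import sys

prices = {"apple": 3, "pear": 4}
keys = prices.keys()  # a live view, not a copied list
prices["plum"] = 5
print(sorted(keys))   # ['apple', 'pear', 'plum']

big = range(10**12)   # no trillion-element list is created
print(big[-1])        # 999999999999
squares = map(lambda n: n * n, big)
print(next(squares), next(squares), next(squares))  # 0 1 4
print(sys.getsizeof(big) < 1000)  # True: size doesn't depend on length
```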
As with the removal of deprecated features, the various renaming operations were designed to make the language smaller and easier to learn. Names that don’t follow standard conventions need to be remembered as special cases, while those that follow a pattern can be derived just by remembering the pattern. Using the API compatible C accelerators automatically also means that end users no longer need to know about and explicitly request the accelerated variant, and alternative implementations don’t need to provide the modules under two different names.
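In Python 2, users had to import cPickle explicitly to get the fast implementation. The sketch below, which assumes CPython, shows how the single Python 3 pickle module picks the accelerator up on its own. The private _Pickler name is an implementation detail, inspected here only for illustration:

```python
import pickle

# On CPython, pickle re-exports the C implementation from _pickle when it
# can be imported, and keeps the pure Python class as pickle._Pickler.
accelerated_module = pickle.Pickler.__module__   # typically "_pickle"
fallback_module = pickle._Pickler.__module__     # the pure Python class

data = {"spam": [1, 2, 3]}
round_tripped = pickle.loads(pickle.dumps(data))
print(accelerated_module, fallback_module, round_tripped)
```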
No backwards incompatible changes were made just for the sake of making them. Each one was justified (at least at the time) on the basis of making the language either easier to learn or easier to use.
With the benefit of hindsight, a number of these other changes would probably have been better avoided (especially some of the renaming ones), but even those cases at least seemed like a good idea at the time. At this point, internal backwards compatibility requirements within the Python 3.x series mean it isn’t worth the hassle of changing them back, especially given the existence of the six compatibility project and other third party modules that support both Python 2 and Python 3 (for example, the requests package is an excellent alternative to using the low level urllib interfaces directly, even though six does provide appropriate cross-version compatible access through the six.moves.urllib namespace).
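Compatibility layers like six.moves.urllib build on a simpler idea: look up a renamed module under its new name first, and fall back to the old one. The hand-written sketch below illustrates only that idea. It is not how six is actually implemented, and build_query_url is a hypothetical helper:

```python
# Import the URL helpers under one set of names on either major version.
try:
    from urllib.parse import urlencode, urlsplit  # Python 3 location
except ImportError:  # Python 2 fallback locations
    from urllib import urlencode
    from urlparse import urlsplit

def build_query_url(base, params):
    # Hypothetical helper: sort the items so the query string is deterministic.
    return base + "?" + urlencode(sorted(params.items()))

parts = urlsplit(build_query_url("https://example.org/search", {"q": "py3", "page": 2}))
print(parts.netloc, parts.query)
```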
One of the consequences of the intertwined implementations of the str and unicode types in Python 2 is that it made it difficult to update them to correctly interoperate with anything else. The dual type text model also made it quite difficult to add Unicode support to various APIs that previously didn’t support it.
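For contrast, here is a small sketch of the separated Python 3 model. Text (str) and binary data (bytes) are distinct types, so conversions happen explicitly at the boundary. Python 2, by comparison, would silently try an ASCII decode when str and unicode values were mixed:

```python
text = "café"                # str: a sequence of Unicode code points
data = text.encode("utf-8")  # bytes: produced by an explicit encoding step

# 4 code points, but 5 bytes, because "é" needs two bytes in UTF-8.
print(len(text), len(data))

# Python 3 refuses to guess an encoding when text and bytes are mixed.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Decoding explicitly restores a str that combines cleanly.
combined = text + " / " + data.decode("utf-8")
```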
This isn’t an exhaustive list, but here are several of the enhancements in Python 3 that would likely be prohibitively difficult to backport to Python 2 (even when they’re technically backwards compatible):
PEP 393 (more efficient text storage in memory)
Unicode identifier support
full Unicode module name support
improvements in Unicode path handling on Windows
multiple other improvements in Unicode handling when interfacing with Windows APIs
more robust and user friendly handling of Unicode characters in object representations and when displaying exceptions
increased consistency in Unicode handling in files and at the interactive prompt (although the C locale on POSIX systems still triggers undesirable behaviour in Python 3)
greater functional separation between text encodings and other codecs, including tailored exceptions nudging users towards the more generic APIs when needed (this change in Python 3.4 also eliminates certain classes of remote DoS attack targeted at the compression codecs in the codec machinery when using the convenience methods on the core types rather than the unrestricted interfaces in the codecs module)
using the new IO model (with automatic encoding and decoding support) by default
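Three items from that list can be observed directly from Python code. This is a sketch assuming CPython 3.4 or later. The absolute sizes reported by sys.getsizeof vary between builds, so the code only compares them:

```python
import codecs
import sys

# PEP 393: CPython stores 1, 2 or 4 bytes per code point, depending on the
# widest character in the string, so pure ASCII text stays compact.
ascii_size = sys.getsizeof("a" * 100)
bmp_size = sys.getsizeof("\u20ac" * 100)         # euro sign
astral_size = sys.getsizeof("\U0001F600" * 100)  # emoji outside the BMP
print(ascii_size < bmp_size < astral_size)

# Unicode identifiers (PEP 3131) are valid names in Python 3 source.
π = 3.14159
print(round(2 * π, 2))

# str.encode only accepts text encodings; other codecs raise a LookupError
# whose message points users at the generic codecs.encode() API.
try:
    "hello".encode("rot13")
except LookupError as exc:
    print(exc)
print(codecs.encode("hello", "rot13"))
```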
Python 2.7.18 was released on 20 April 2020, per PEP 373. More details:
January 1, 2020: Code freeze for Python 2.7.18
The “End of Life/sunset” of Python 2.7 was January 1, 2020. Until January 1, 2020, the release manager for 2.7.x was working on release 2.7.18, the final release of Python 2 (developers added a few improvements to Python 2.7 between the 2.7.17 release on October 19, 2019 and the sunset date of January 1, 2020). On January 1, 2020, the release manager stopped development and froze the codebase (see python-dev discussion): from that date, there were no backports to 2.7.18 from Python 3.
April 20, 2020: Final Production Release of Python 2.7.18
There were a few small patches after the code freeze date for 2.7.18. (No regressions were introduced between the Python 2.7.17 release in October 2019 and the code freeze date of January 1, 2020.) Between January 1, 2020 and April 20, 2020, the release manager shepherded the release through the beta and Release Candidate process.
Note
This list is rather incomplete and I’m unlikely to find the time to complete it - if anyone is curious enough to put together a more comprehensive timeline, feel free to use this answer as a starting point, or else just send a PR to add more entries to this list.
At least the following events should be included in a more complete list:
IPython Python 3 support
Cython Python 3 support
SWIG Python 3 support
links for the Ubuntu, Fedora and openSUSE “Python 3 as default” migration plans
SQL Alchemy Python 3 support
pytz Python 3 support
PyOpenSSL support
mod_wsgi Python 3 support (first 3.x WSGI implementation)
Tornado Python 3 support (first 3.x async web server)
Twisted Python 3 support (most comprehensive network protocol support)
Pyramid Python 3 support (first major 3.x compatible web framework)
Django 1.5 and 1.6 (experimental and stable Python 3 support)
Werkzeug and Flask Python 3 support
requests Python 3 support
pyside Python 3 support (first Python 3.x Qt bindings)
pygtk and/or pygobject Python 3 support
wxPython phoenix project
VTK Python 3 support in August 2015 (blocked Mayavi, which blocked Canopy)
cx-Freeze Python 3 support
greenlet Python 3 support
pylint Python 3 support
nose2 Python 3 support
pytest Python 3 support
Editor/IDE support for Python 3 in: PyDev, Spyder, Python Tools for Visual Studio, PyCharm, WingIDE, Komodo (others?)
Embedded Python 3 support in: Blender, Kate, vim, gdb, gcc, LibreOffice (others?)
Python 3 availability in services like Google DataLab and Azure Notebooks
Python 3 availability in Heroku
availability in the major Chinese public cloud platforms (Alibaba/Aliyun, Tencent Qcloud, Huawei Enterprise Cloud, etc.)
the day any bar on https://python3wos.appspot.com/ or wedge on http://py3readiness.org/ turned green was potentially a significant step for some subsection of the community :)
March 2006: Guido van Rossum (the original creator of Python and hence Python’s Benevolent Dictator for Life), with financial support from Google, took the previously hypothetical “Python 3000” project and turned it into an active development project, aiming to create an updated Python language definition and reference interpreter implementation that addressed some fundamental limitations in the ability of the Python 2 reference interpreter to correctly handle non-ASCII text. (The project actually started earlier than this - March 2006 was when the python-3000 list was created to separate out the longer term Python 3 discussions from the active preparation for the Python 2.5 final release)
April 2006: Guido published PEP 3000, laying the ground rules for Python 3 development, and detailing the proposed migration strategy for Python 2 projects (the recommended porting approach has changed substantially since then, see “What other changes have occurred that simplify migration?” for more details). PEP 3100 describes several of the overall goals of the project, and lists many smaller changes that weren’t covered by their own PEPs. PEP 3099 covers a number of proposed changes that were explicitly declared out of scope of the Python 3000 project.
At this point in time, Python 2 and Python 3 started being developed in parallel by the core development team for the reference interpreter.
August 2007: The first alpha release of Python 3.0 was published.
February 2008: The first alpha release of Python 2.6 was published alongside the third alpha of Python 3.0. The release schedules for both Python 2.6 and 3.0 are covered in PEP 361.
October 2008: Python 2.6 was published, including the backwards compatible features defined for Python 3.0, along with a number of __future__ imports and the -3 switch to help make it practical to add Python 3 support to existing Python 2 software (or to migrate entirely from Python 2 to Python 3). While Python 2.6 received its final upstream security update in October 2013, maintenance & support remains available through some commercial redistributors.
December 2008: In a fit of misguided optimism, Python 3.0 was published with an unusably slow pure Python IO implementation - it worked tolerably well for small data sets, but was entirely impractical for handling realistic workloads on the CPython reference interpreter. (Python 3.0 received a single maintenance release, but was otherwise entirely superseded by the release of Python 3.1)
ActiveState became the first company I am aware of to start offering commercial Python 3 support by shipping ActivePython 3.0 almost immediately after the upstream release was published. They have subsequently continued this trend of closely following upstream Python 3 releases.
March 2009: The first alpha release of Python 3.1, with an updated C accelerated IO stack, was published. PEP 375 covers the details of the Python 3.1 release cycle.
June 2009: Python 3.1 final was published, providing the first version of the Python 3 runtime that was genuinely usable for realistic workloads. Python 3.1 received its final security update in April 2012, and even commercial support for this version is no longer available.
September 2009: setuptools 0.6.2 was released, the first version to support Python 3.
October 2009: PEP 3003 was published, declaring a moratorium on language level changes in Python 2.7 and Python 3.2. This was done to deliberately slow down the pace of core development for a couple of years, with additional effort focused on standard library improvements (as well as some improvements to the builtin types).
December 2009: The first alpha of Python 2.7 was published. PEP 373 covers the details of the Python 2.7 release cycle.
July 2010: Python 2.7 final was published, providing many of the backwards compatible features added in the Python 3.1 and 3.2 releases. Python 2.7 went on to receive maintenance & security updates from the core development team until January 2020.
Once the Python 2.7 maintenance branch was created, the py3k development branch was retired: for the first time, the default branch in the main CPython repo was the upcoming version of Python 3.
August 2010: The first alpha of Python 3.2 was published. PEP 392 covers the details of the Python 3.2 release cycle. Python 3.2 restored preliminary support for the binary and text transform codecs that had been removed in Python 3.0.
NumPy 1.5.0 was released, the first version to support Python 3.
October 2010: PEP 3333 was published to define WSGI 1.0.1, a Python 3 compatible version of the Python Web Server Gateway Interface.
February 2011: Python 3.2 final was published, providing the first version of Python 3 with support for the Web Server Gateway Interface. Python 3.2 received its final security update in February 2016, and even commercial support for this version is no longer available.
SciPy 0.9.0 was released, the first version to support Python 3.
March 2011: After Arch Linux updated their Python symlink to refer to Python 3 (breaking many scripts that expected it to refer to Python 2), PEP 394 was published to provide guidance to Linux distributions on more gracefully handling the transition from Python 2 to Python 3.
Also in March, CPython migrated from Subversion to Mercurial (see PEP 385), with the first message from Mercurial to the python-checkins list being this commit from Senthil Kumaran. This ended more than two years of managing parallel updates of four active branches using svnmerge rather than a modern DVCS.
April 2011: pip 1.0 was released, the first version to support Python 3.
virtualenv 1.6 was released, the first version to support Python 3.
November 2011: PEP 404 (the Python 2.8 Un-release Schedule) was published to make it crystal clear that the core development team had no plans to make a third parallel release in the Python 2.x series.
March 2012: The first alpha of Python 3.3 was published. PEP 398 covers the details of the Python 3.3 release cycle. Notably, Python 3.3 restored support for Python 2 style Unicode literals after Armin Ronacher and other web framework developers pointed out that this was one change that the web frameworks couldn’t handle on behalf of their users. PEP 414 covers the detailed rationale for that change.
April 2012: Canonical published Ubuntu 12.04 LTS, including commercial support for both Python 2.7 and Python 3.2.
September 2012: Six and a half years after the inauguration of the Python 3000 project, Python 3.3 final was published as the first Python 3 release without a corresponding Python 2 feature release. This release introduced the PEP 380 yield from syntax that was used heavily in the asyncio coroutine framework provisionally introduced to the standard library in Python 3.4, and subsequently declared stable in Python 3.6.
October 2012: PEP 430 was published, and the online Python documentation updated to present the Python 3 documentation by default. In order to preserve existing links, deep links continue to be interpreted as referring to the Python 2.7 documentation.
March 2013: PEP 434 redefined IDLE as an application shipped with Python rather than part of the standard library, allowing the addition of new features in maintenance releases. Significantly, this allowed the Python 2.7 IDLE to be brought more into line with the features of the Python 3.x version.
Continuum Analytics started offering commercial support for cross-platform Python 3.3+ environments through their “Anaconda” Python distributions.
Pillow 2.0.0 was released, the first version to support Python 3.
August 2013: The first alpha of Python 3.4 was published. PEP 429 covers the details of the Python 3.4 release cycle. Amongst other changes, Python 3.4 restored full support for the binary and text transform codecs that were reinstated in Python 3.2, while maintaining the “text encodings only” restriction for the convenience methods on the builtin types.
September 2013: Red Hat published “Red Hat Software Collections 1.0”, providing commercial support for both Python 2.7 and Python 3.3 on Red Hat Enterprise Linux systems, with later editions adding support for additional 3.x releases.
December 2013: The initial development of MicroPython, a variant of Python 3 specifically for microcontrollers, was successfully crowdfunded on Kickstarter.
March 2014: Python 3.4 final was published as the second Python 3 release without a corresponding Python 2 release. It included several features designed to provide a better starting experience for newcomers to Python, such as bundling the “pip” installer by default, and including a rich asynchronous IO library.
April 2014: Ubuntu 14.04 LTS was released, the initial target release for the “Only Python 3 on the install media” Ubuntu migration plan. (They didn’t quite make it - a few test packages short on Ubuntu Touch, further away on the server and desktop images)
Red Hat also announced the creation of softwarecollections.org as the upstream project powering the Red Hat Software Collections product. The whole idea of both the project and the product is to make it easy to run applications using newer (or older!) language, database and web server runtimes, without interfering with the versions of those runtimes integrated directly into the operating system.
Note
With the original “5 years for migration to Python 3” target date approaching, April 2014 is also when Guido van Rossum amended the Python 2.7 release PEP to move the expected end-of-life date for Python 2.7 out to 2020.
May 2014: Python 2.7.7 was published, the first Python 2.7 maintenance release to incorporate additional security enhancement features as described in PEP 466. It was also the first release where Microsoft contributed developer time to the creation of the Windows installers.
June 2014: The first stable release of PyPy3 was published, providing a version of the PyPy runtime that is compatible with Python 3.2.5 (together with PEP 414’s restoration of the u'' string literal prefix that first appeared in Python 3.3 for CPython).
Red Hat published Red Hat Enterprise Linux 7, with Python 2.7 as the system Python. This release ensures that Python 2.7 will remain a commercially supported platform until at least 2024 (based on Red Hat’s 10 year support lifecycle).
Note
June 2014 also marked 5 years after the first production capable Python 3.x release (Python 3.1), and the original target date for completion of the Python 3 migration.
July 2014: CentOS 7 was released, providing a community distro based on Red Hat Enterprise Linux 7, and marking the beginning of the end of the Python 2.7 rollout (the CentOS system Python is a key dependency for many Python users).
boto v2.32.0 released with Python 3 support for most modules.
nltk 3.0b1 released with Python 3 support and the NLTK book switched over to covering Python 3 by default.
October 2014: SUSE Linux Enterprise Server 12 was released, containing supported Python 3.4 RPMs, adding SUSE to the list of commercial Python 3 redistributors.
February 2015: The first alpha of Python 3.5 was published. PEP 478 covers the details of the Python 3.5 release cycle. Amongst other changes, PEP 461 restored support for printf-style interpolation of binary data, addressing a significant usability regression in Python 3 relative to Python 2.
March 2015: Microsoft Azure App Service launched with both Python 2.7 and Python 3.4 support, adding Microsoft to the list of commercial Python redistributors for the first time.
August 2015: At the Fedora community’s annual Flock conference, Denise Dumas (Red Hat’s VP of Platform Engineering) explicitly stated that it would be an engineering goal to include only Python 3 in the base operating system for the next major version of Red Hat Enterprise Linux (previously this had been implied by Red Hat’s work on migrating Fedora and its infrastructure to Python 3, but not explicitly stated in a public venue).
September 2015: Python 3.5 final was released, bringing native syntactic support for asynchronous coroutines and a matrix multiplication operator, as well as the typing module for static type hints. Applications, libraries and frameworks wishing to take advantage of the new syntactic features need to reconsider whether or not to continue supporting Python 2.7.
Twisted 15.4 was released, the first version to include a Python 3 compatible version of the “Twisted Trial” test runner. This allowed the Twisted project to start running its test suite under Python 3, leading to steadily increasing Python 3 compatibility in subsequent Twisted releases.
October 2015: Fedora 23 shipped with only Python 3 in the LiveCD and all default images other than the Server edition.
MicroPython support for the BBC micro:bit project was publicly announced, ensuring first class Python 3 support in a significant educational initiative.
PyInstaller 3.0 was released, supporting Python 2.7 and 3.3+.
March 2016: gevent 1.1 was released, supporting Python 2.6, 2.7, and 3.3+.
May 2016: Several key projects in the Scientific Python community published the Python 3 Statement, explicitly declaring their intent to end Python 2 support in line with the reference interpreter’s anticipated 2020 date for the end of free community support.
August 2016: Google App Engine added official Python 3.4(!) support to their Flexible Environments (Python 3.5 support followed not long after, but the original announcement was for Python 3.4).
As part of rolling out Python 3.5 support, Microsoft Azure published instructions on how to select a particular Python version using App Service Site Extensions.
Initial release of Enthought Deployment Manager, with support for Python 2.7 and 3.5.
Mozilla provided the PyPy project with a development grant to bring their PyPy3 variant up to full compatibility with Python 3.5.
December 2016: Python 3.6 final was released, bringing further syntactic enhancements for asynchronous coroutines and static type hints, as well as a new compiler assisted string formatting syntax that manages to be both more readable (due to the use of inline interpolation expressions) and faster (due to the compiler assisted format parsing) than previous string formatting options. Through PEP 528 and PEP 529, this release also featured significant improvements to the Windows compatibility of bytes-centric POSIX applications, and the Windows-specific py launcher started using Python 3 by default when both Python 2.x and 3.x are available on the system.
March 2017: The first beta release of PyPy3 largely compatible with Python 3.5 was published (including support for the Python 3.6 f-string syntax).
Enthought Canopy 2.0.0 available, supporting Python 2.7 and 3.5 (official binary release date TBD - as of April 2017, the download page still offers Canopy 1.7.4)
April 2017: AWS Lambda added official Python 3.6 support, making Python 3 available by default through the 3 largest public cloud providers (Amazon, Microsoft, Google).
IPython 6.0 was released, the first feature release to require Python 3. The IPython 5.x series remains in maintenance mode as the last version supporting Python 2.7 (and Python 3 based variants of IPython retain full support for running and interacting with Python 2 language kernels using Project Jupyter’s language independent notebook protocol).
December 2017: Django released Django 2.0, the first version of Django to drop support for Python 2.7.
March 2018: Guido van Rossum clarified that “The way I see the situation for 2.7 is that EOL is January 1st, 2020, and there will be no updates, not even source-only security patches, after that date. Support (from the core devs, the PSF, and python.org) stops completely on that date. If you want support for 2.7 beyond that day you will have to pay a commercial vendor.”
June 2018: Python 3.7.0 final was released, bringing improvements such as the new built-in breakpoint() function defined by PEP 553, time functions with nanosecond resolution per PEP 564, and more streamlined Python documentation translations.
September 2018: matplotlib released 3.0.0, the first release to drop support for Python 2.x.
May 2019: The release of Red Hat Enterprise Linux 8. RHEL 8 does not come with Python 2 or Python 3 already installed and usable by default. Red Hat recommended users choose Python 3, and the platform Python for use by system tools in RHEL 8 is Python 3.6.
August 2019: The entirety of http://py3readiness.org/ turned green, indicating Python 3 support for the 360 most downloaded packages on PyPI.
September 2019: The release of CentOS 7.7 (in which Python 3 is available) and CentOS 8 (which follows RHEL 8 in its approach to Python).
January 2020: Python 2.7 reached end of life and its codebase was frozen, ending roughly thirteen years of parallel maintenance of Python 2 and 3 by the core development team for the reference interpreter.
20 April 2020: Python 2.7.18 released.
Note
At time of writing, the events below are in the future, and hence speculative as to their exact nature and timing. However, they reflect currently available information based on the stated intentions of developers and distributors.
April 2018: Revised anticipated date for Ubuntu and Fedora to have finished migrating default components of their respective server editions to Python 3 (some common Linux components, most notably the Samba protocol server, proved challenging to migrate, so the stateful server variants of these distributions ended up taking longer to migrate to Python 3 than other variants that omitted those components from their default package set)
April 2021: Anticipated date for Ubuntu LTS 16.04 to go end of life, the first potential end date for commercial Python 2 support from Canonical (if Python 2.7 is successfully migrated to the community supported repositories for the Ubuntu 18.04 LTS release)
April 2023: Anticipated date for Ubuntu LTS 18.04 to go end of life, the second potential end date for commercial Python 2 support from Canonical (if it proves necessary to keep Python 2.7 in the commercially supported repositories as a dependency for the Ubuntu 18.04 LTS release)
June 2024: Anticipated date for Red Hat Enterprise Linux 7 to go end of life, also anticipated to be the last commercially supported redistribution of the Python 2 series.
I put the date for this as the release of Python 3.5, in September 2015. This release brought with it two major syntactic enhancements (one giving Python’s coroutine support its own dedicated syntax, distinct from generators, and another providing a binary operator for matrix multiplication), and restored a key feature that had been missing relative to Python 2 (printf-style binary interpolation support). It also incorporated a couple of key reliability and maintainability enhancements, in the form of automatic retrying of system calls interrupted by signals (EINTR), and the inclusion of a gradual typing framework in the standard library.
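The syntactic changes mentioned above can be seen together in a short sketch. The Matrix class is a hypothetical toy used only to show how @ dispatches to __matmul__. asyncio.run is used to drive the coroutine for brevity, but that helper needs Python 3.7+, even though the async def syntax itself dates from 3.5:

```python
import asyncio

# PEP 461 (3.5): printf-style interpolation works on bytes again.
header = b"%s: %d\r\n" % (b"Content-Length", 42)

# PEP 465 (3.5): the @ operator calls __matmul__ (toy example class).
class Matrix:
    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])

product = (Matrix([[1, 2], [3, 4]]) @ Matrix([[5, 6], [7, 8]])).rows

# PEP 492 (3.5): native coroutines, distinct from generator-based ones.
async def double_later(n):
    await asyncio.sleep(0)  # hand control back to the event loop once
    return n * 2

doubled = asyncio.run(double_later(21))  # asyncio.run needs Python 3.7+
print(header, product, doubled)
```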
Others may place the boundary at the release of Python 3.6, in December 2016, as the new “f-string” syntax provides a form of compiler-assisted string interpolation that is both faster and more readable than its predecessors:
name = "World"
print("Hello %s!" % name)  # All versions
print("Hello {0}!".format(name))  # Since Python 2.6 & 3.0
print("Hello {}!".format(name))  # Since Python 2.7 & 3.1
print(f"Hello {name}!")  # Since Python 3.6
Python 3.6 also provides further enhancements to the native coroutine syntax, as well as full syntactic support for annotating variables with static type hints.
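Both 3.6 additions are illustrated in the sketch below: variable annotations (PEP 526), and asynchronous generators and comprehensions (PEPs 525 and 530). The Config class and the countdown and collect functions are hypothetical examples. As before, asyncio.run needs Python 3.7+:

```python
import asyncio
from typing import List

# PEP 526: annotate variables and class attributes directly.
retries: int = 3

class Config:
    hosts: List[str]       # annotated only: no class attribute is created
    timeout: float = 2.5   # annotated and assigned

# PEP 525: yield inside async def produces an asynchronous generator.
async def countdown(n):
    while n > 0:
        yield n
        n -= 1

# PEP 530: asynchronous comprehension consuming that generator.
async def collect(n):
    return [value async for value in countdown(n)]

values = asyncio.run(collect(3))
# Annotations are recorded for tools and introspection, not enforced.
print(Config.__annotations__, values)
```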
Going into this transition process, my personal estimate was that it would take roughly 5 years to get from the first production ready release of Python 3 to the point where its ecosystem would be sufficiently mature for it to be recommended unreservedly for all new Python projects.
Since 3.0 turned out to be a false start due to its IO stack being unusably slow, I start that counter from the release of 3.1: June 27, 2009. With Python 3.5 being released a little over 6 years after 3.1 and 3.6 a little more than a year after that, that means we clearly missed that original goal - the text model changes in particular proved to be a larger barrier to migration than expected, which slowed adoption by existing library and framework developers.
However, despite those challenges, key parts of the ecosystem were able to successfully add Python 3 support well before the 3.5 release. NumPy and the rest of the scientific Python stack supported both versions by 2015, as did several GUI frameworks (including PyGame).
The Pyramid, Django and Flask web frameworks supported both versions, as did the mod_wsgi Python application server, and the py2exe, py2app and cx-Freeze binary creators. The upgrade of Pillow from a repackaging project to a full development fork also brought PIL support to Python 3.
nltk supported Python 3 as of nltk 3.0, and the NLTK book switched to be based on Python 3 at the same time.
For AWS users, most boto modules became available on Python 3 as of boto v2.32.0 (see http://boto.readthedocs.org/en/latest/releasenotes/v2.32.0.html).
PyInstaller is a popular option for creating native system installers for Python applications, and it has supported Python 3 since the 3.0 release in October 2015.
gevent is a popular alternative to writing natively asynchronous code, and it became generally available for Python 3 with the 1.1 release in March 2016.
As of April 2017, porting the full Twisted networking framework to Python 3 is still a work in progress, but many parts of it are already fully operational, and for new projects, native asyncio-based alternatives are often going to be available in Python 3 (especially for common protocols like HTTPS).
I think Python 3.5 is a superior language to 2.7 in almost every way (with the error reporting improvements being the ones I missed most when my day job involved working on a Python 2.6 application).
For educational purposes, there are a few concepts like functions, iterables and Unicode that need to be introduced earlier than was needed in Python 2, and there are still a few rough edges in adapting between the POSIX text model and the Python 3 one, but these are more than compensated for through improved default behaviours and more helpful error messages.
While students in enterprise environments may still need to learn Python 2 for a few more years, there are some significant benefits in learning Python 3 first, as that means students will already know which concepts survived the transition, and be more naturally inclined to write code that fits into the common subset of Python 2 and Python 3. This approach will also encourage new Python users that need to use Python 2 for professional reasons to take advantage of the backports and other support modules on PyPI to bring their Python 2.x usage as close to writing Python 3 code as is practical.
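Code in that common subset typically starts with __future__ imports. These opt a Python 2.6/2.7 module in to Python 3 behaviour, and are accepted but have no further effect on Python 3. The following is a minimal sketch with hypothetical helper functions. Note that the __future__ line must come first in the module, ahead of any other statement:

```python
from __future__ import absolute_import, division, print_function, unicode_literals

def average(values):
    # True division on both versions, thanks to the division import.
    return sum(values) / len(values)

def greet(name):
    # unicode_literals makes this a text literal on Python 2 as well;
    # the explicit {0} index keeps str.format compatible with Python 2.6.
    return "Hello, {0}!".format(name)

# print_function makes the sep keyword available on Python 2 too.
print(average([1, 2]), greet("world"), sep=" | ")
```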
Support in enterprise Linux distributions is also a key point for uptake of Python 3. Canonical have already shipped long term support for three versions of Python 3 (Python 3.2 in Ubuntu 12.04 LTS, 3.4 in 14.04 LTS, and 3.5 in 16.04 LTS) and are continuing with the process of eliminating Python 2 from the installation images.
A Python 3 stack has existed in Fedora since Fedora 13 and has been growing over time, with Python 2 successfully removed from the live install CDs in late 2015 (Fedora 23). Red Hat also now ship fully supported Python 3.x runtimes as part of the Red Hat Software Collections product and the OpenShift Enterprise self-hosted Platform-as-a-Service offering (with new 3.x versions typically becoming commercially available within 6-12 months of the upstream release, and then remaining supported for 3 years from that point).
At Fedora’s annual Flock conference in August 2015, Denise Dumas (VP of Platform Engineering) also indicated that Red Hat aimed to have the next major version of Red Hat Enterprise Linux ship only Python 3 in the base operating system, with Python 2 available solely through the Software Collections model (inverting the current situation, where Python 2 is available in both Software Collections and the base operating system, while Python 3 is only commercially available through Software Collections and the Software Collections based OpenShift environments).
The Arch Linux team have gone even further, making Python 3 the default Python on Arch installations. I am dubious as to the wisdom of their specific migration strategy, but I certainly can’t complain about the vote of confidence!
The OpenStack project, likely the largest open source Python project short of the Linux distro aggregations, is also in the process of migrating from Python 2 to Python 3, and maintains a detailed status tracking page for the migration.
Outside the Linux ecosystem, other Python redistributors like ActiveState, Enthought, and Continuum Analytics provide both Python 2 and Python 3 releases, and Python 3 environments are also available through the major public cloud platforms.
The short answer is: 2024, four years after CPython support ends in 2020.
Python 2 is still a good language. While I think Python 3 is a better language (especially when it comes to the text model, error reporting, the native coroutine syntax in Python 3.5, and the string formatting syntax in Python 3.6), we’ve deliberately designed the migration plan so users could update on their timetable rather than ours (at least within a window of several years), and we expect commercial redistributors to extend that timeline even further.
The PyPy project have also stated their intention to continue providing a Python 2.7 compatible runtime indefinitely, since the RPython language used to implement PyPy is a subset of Python 2 rather than of Python 3.
I personally expect CPython 2.7 to remain a reasonably common deployment platform until mid 2024. Red Hat Enterprise Linux 7 (released in June 2014) uses CPython 2.7 as the system Python, and many library, framework and application developers base their minimum supported version of Python on the system Python in RHEL (especially since that also becomes the system Python in downstream rebuilds like CentOS and Scientific Linux). While Red Hat’s actively trying to change that slow update cycle by encouraging application developers to target the Software Collections runtimes rather than the system Python, that change in itself is a significant cultural shift for the RHEL/CentOS user base.
Aside from Blender, it appears many publishing and animation toolswith Python support are happy enough with Python 2.7 that they aren’tquickly moving to Python 3.Scribus, and some AutoDesktools like3ds Max,Mayaand MotionBuilder, support Python 2.7 and are only slowly moving tosupport Python 3. But some have made stronger commitments.Inkscape’sLTS 0.92.x line aims to continue supporting Python 2.7but0.92.5 will also support Python 3, andthe 1.0 line will drop support for Python 2. AndtheVFX Reference Platform (tracked by AutoDesketc.) is moving to Python 3.7 in calendar year 2020: “Python 3 inCY2020 is a firm commitment, it will be a required upgrade as Python 2will no longer be supported beyond 2020.”
Many GIS tools similarly still use Python 2.7. This actually makes a fair bit of sense, especially for the commercial tools, since the Python support in these tools is there primarily to manipulate the application data model, and there arguably aren’t any major improvements in Python 3 for that kind of use case as yet, but there is still some risk of breaking existing scripts if the application updates to Python 3. However, Esri’s ArcGIS has handled the migration problem by switching to Python 3 in the new ArcGIS product line, sticking with Python 2 in the ArcGIS Desktop/Server/Engine product lines, and providing tools to assist with migration between them.
From a web security perspective, Python 2’s standard library is already a relic. Anyone doing web programming in Python 2 that touches the public internet should not be relying solely on the standard library, since it’s too old, and should instead be relying more on third party modules from PyPI. For example, rather than working directly with the ssl module, use Requests.
For the open source applications where Python 2 is currently seen as a “good enough” scripting engine, the main driver for Python 3 scripting support is likely to be commercial distribution vendors looking to drop commercial Python 2 runtime support - the up front investment in application level Python 3 support would be intended to pay off in the form of reduced long term sustaining engineering costs at the language runtime level.
That said, the Python 3 reference interpreter also offers quite a few new low level configuration options that let embedding applications control the memory allocators used and monitor and control all bytecode execution, along with various other improvements to the runtime embedding functionality, so the natural incentives for application developers to migrate are starting to accumulate, which means we may see more activity on that front as the 2020 date for the end of community support of the Python 2 series gets closer.
With RHEL 8 and Ubuntu LTS 18.04 now using Python 3.6 for their primary system Python installation, and Debian 10 and SLES 15 offering Python 3 support alongside Python 2, it’s reasonable to wonder why it took more than a decade for Linux distributions to reach a point where their migration away from the Python 2.x series is nearing completion.
While part of the problem was simply the sheer amount of code to be reviewed and potentially updated, the core of the delay was the issues discussed in the answer to “What’s up with POSIX systems in Python 3?”: with Python 3’s internal text model now being different from the one in POSIX, the historical mechanisms for interacting with POSIX systems from Python 2.x didn’t quite work right in earlier Python 3.x releases, and that situation needed to be improved before the interpreter would once again be fully suitable for use in core operating system components.
That situation was largely resolved with the implementation of both PEP 538 (locale coercion for the legacy C locale) and PEP 540 (UTF-8 mode) in CPython 3.7. The system Python installation in RHEL 8 actually includes a backport of the PEP 538 locale coercion behaviour, as per the relevant section in the PEP.
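For anyone wanting to check which of these behaviours applies to a given interpreter, here is a minimal sketch that only reads standard attributes (sys.flags.utf8_mode, which needs Python 3.7+, plus sys.getfilesystemencoding() and locale.getpreferredencoding()). The helper name describe_text_boundary is hypothetical, and the reported values depend entirely on how and where the interpreter was started (for example, python3 -X utf8 or PYTHONUTF8=1 enables UTF-8 mode).

```python
import locale
import sys


def describe_text_boundary():
    """Report how this interpreter decodes data crossing the OS boundary.

    Hypothetical helper for illustration; sys.flags.utf8_mode (PEP 540)
    requires Python 3.7+.
    """
    return {
        # True when UTF-8 mode is enabled (e.g. via -X utf8 or PYTHONUTF8=1)
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used for file names, command line arguments, etc.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding open() falls back to when no encoding argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_text_boundary().items():
        print(key, "=", value)
```

Running it under the legacy C locale, with and without -X utf8, is a quick way to see the PEP 538 and PEP 540 changes in action.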
(Note: Red Hat and Canonical have both contributed significantly to the broad adoption of Python 3 as a platform, migrating not only their own projects and applications, but also often investing time in adding Python 3 support to the open source libraries that they depend on.)
The short answer is: Apple decided not to ship Python (or several other scripting languages) with the OS for end user use at all, and we believe that decision had nothing to do with the 2-to-3 transition.
Unlike the open source Linux distributors, Apple doesn’t generally make the rationale for their engineering decisions public. The one thing we do know in this case is that in the macOS 10.15 release notes, Apple have declared all of the open source language runtimes that they currently ship (including Python, Perl, and Ruby) to be deprecated, and have advised application developers that require those runtimes to bundle their own interpreter with their application. The macOS 10.15 release notes also explicitly advise against using the macOS system installation of Python 2.7 for any purpose.
So while it’s possible that the creation of Python 3 was one of the factors that contributed to this eventual outcome, the product management decision within Apple appears to have been “We will not actively promote or encourage any developer experience for our platforms that we don’t largely control” (specifically, Objective-C and Swift). They’re hardly unique amongst platform developers in that regard - there were major battles for control between Sun and Microsoft over Java that contributed to Microsoft’s eventual creation of the C# programming language, and the later fights between Oracle and Google (also over Java) presumably had some impact on the latter’s decision to embrace Kotlin as their preferred language for Android app development.
(Note: Linux distribution vendors also advise against using the system Python runtimes to run your own custom applications, and RHEL 8 installs the system Python in a way that means it isn’t available to users by default.)
While the frequency with which this question is asked has declined markedly since 2015 or so, a common thread I saw running through such declarations of “failure” was people not quite understanding the key questions where the transition plan was aiming to change the answers. These are the three key questions:
“I am interested in learning Python. Should I learn Python 2 or Python 3?”
“I am teaching a Python class. Should I teach Python 2 or Python 3?”
“I am an experienced Python developer starting a new project. Should I use Python 2 or Python 3?”
At the start of the migration, the answer to all of those questions was obviously “Python 2”. By August 2015, I considered the answer to be “Python 3.4, unless you have a compelling reason to choose Python 2 instead”. Possible compelling reasons included “I am using existing course material that was written for Python 2”, “I am teaching the course to maintainers of an existing Python 2 code base”, “We have a large in-house collection of existing Python 2 only support libraries we want to reuse” and “I only use the version of Python provided by my Linux distro vendor and they currently only support Python 2” (in regards to that last point, we realised early that the correct place to tackle it was on the vendor side, and by late 2014, all of Canonical, Red Hat, and SUSE had commercial Python 3 offerings available).
Note the question that isn’t on the list: “I have a large Python 2 application which is working well for me. Should I migrate it to Python 3?”.
While OpenStack and some key Linux distributions have answered “Yes”, for most organisations the answer to that question remained “No” for several years while companies like Canonical, Red Hat, Facebook, Google, Dropbox, and others worked to migrate their own systems, and published the related migration tools (such as the pylint --py3k option, and the work that has gone into the mypy and typeshed projects to allow Python 3 static type analysis to be applied to Python 2 programs prior to attempting to migrate them).
While platform effects are starting to shift even the answer to that question towards “Maybe” for the majority of users (and Python 3 gives Python 2 a much nicer exit strategy to a newer language than COBOL ever did), the time frame for that change is a lot longer than the five years that was projected for changing the default choice of Python version for green field projects.
That said, reducing or eliminating any major remaining barriers to migration is an ongoing design goal for Python 3.x releases, at least in those cases where the change is also judged to be an internal improvement within Python 3 (for example, the restoration of binary interpolation support in Python 3.5 was motivated not just by making it easier to migrate from Python 2, but also by making certain kinds of network programming and other stream processing code easier to write in Python 3).
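As a rough illustration of the kind of protocol code that binary interpolation (PEP 461, Python 3.5+) simplifies, the sketch below formats wire data entirely as bytes. The function names and the request line and length-prefix formats are illustrative assumptions, not a complete protocol implementation.

```python
def build_request_line(method: bytes, path: bytes) -> bytes:
    """Build an HTTP/1.1-style request line entirely in bytes.

    Uses printf-style % formatting on bytes (restored in Python 3.5 by
    PEP 461), so no decoding to text and re-encoding is needed.
    """
    return b"%s %s HTTP/1.1\r\n" % (method, path)


def frame_payload(payload: bytes) -> bytes:
    """Prefix a payload with its length in ASCII hex (illustrative framing).

    %x and %s both work with bytes templates.
    """
    return b"%x\r\n%s\r\n" % (len(payload), payload)


if __name__ == "__main__":
    print(build_request_line(b"GET", b"/index.html"))
    # b'GET /index.html HTTP/1.1\r\n'
```

Before Python 3.5, code like this had to either concatenate bytes piecemeal or format text and then encode it, which is exactly the round trip the change avoids.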
In the earlier days of the Python 3 series, several of the actions taken by the core development team were actually deliberately designed to keep conservative users away from Python 3 as a way of providing time for the ecosystem to mature.
Now, if Python 3 had failed to offer a desirable platform, nobody would have cared about this in the slightest. Instead, what we saw was the following:
people coming up with great migration guides and utilities independently of the core development team. While six was created by a core developer (Benjamin Peterson), and lib2to3 and the main porting guides are published by the core development team, python-modernize was created by Armin Ronacher (creator of Jinja2 and Flask), while python-future was created by Ed Schofield based on that earlier work. Lennart Regebro has also done stellar work in creating an in-depth guide to porting to Python 3.
Linux distributions aiming to make Python 2 an optional download and have only Python 3 installed by default
commercial Python redistributors and public cloud providers ensuring that Python 3 was included as one of their supported offerings
customers approaching operating system vendors and asking for assistance in migrating large proprietary code bases from Python 2 to Python 3
more constrained plugin ecosystems that use an embedded Python interpreter (like Blender, gcc, and gdb) either adding Python 3 support, or else migrating entirely from Python 2 to 3
developers lamenting the fact that they wanted to use Python 3, but were being blocked by various dependencies being missing, or because they previously used Python 2, and needed to justify the cost of migration to their employer
library and framework developers that hadn’t already added Python 3 support for their own reasons being strongly encouraged by their users to offer it (sometimes in the form of code contributions, other times in the form of tracker issues, mailing list posts and blog entries)
interesting new implementations/variants like MyPy and MicroPython taking advantage of the removal of legacy behaviour to target the leaner Python 3 language design rather than trying to handle the full backwards compatibility implications of implementing Python 2
developers complaining that the core development team wasn’t being aggressive enough in forcing the community to migrate promptly rather than allowing the migration to proceed at its own pace (!)
That last case only appeared around 2014 (~5 years into the migration), and the difference in perspective appears to be an instance of the classic early adopter/early majority divide in platform adoption. The deliberately gentle migration plan was (and is) for the benefit of the late adopters that drive Python’s overall popularity, not the early adopters that make up both the open source development community and the (slightly) broader software development blogging community.
It’s important to keep in mind that Python 2.6 (released October 2008) has long stood as one of the most widely deployed versions of Python, purely through being the system Python in Red Hat Enterprise Linux 6 and its derivatives, and usage of Python 2.4 (released November 2004) remained non-trivial through to at least March 2017 for the same reason with respect to Red Hat Enterprise Linux 5.
I expect there is a similar effect from stable versions of Debian, Ubuntu LTS releases and SUSE Linux Enterprise releases, but (by some strange coincidence) I’m not as familiar with the Python versions and end-of-support dates for those as I am with those for the products sold by my employer ;)
If we weren’t getting complaints from the early adopter crowd about the pace of the migration, then I would have been worried (as it would have indicated they had abandoned Python entirely and moved on to something else).
The final key point to keep in mind is that the available metrics on Python 3 adoption are quite limited, and that remains true regardless of whether we think the migration is going well or going poorly. The three main quantitative options are to analyse user agents on the Python Package Index, declarations of Python 3 support on PyPI, and binary installer downloads for Mac OS X and Windows from python.org.
The first of those remains heavily dominated by existing Python 2 users, but the trend in Python 3 usage is still upwards. These metrics are stored as a public data set in Google BigQuery, and this post goes over some of the queries that are possible with the available data. The records are incomplete prior to June 2016, but running the query in April 2017 shows downloads from Python 3 clients increasing from around 7% of approximately 430 million downloads in June 2016 to around 12% of approximately 720 million downloads in March 2017.
The second is based on publisher provided package metadata rather than automated version compatibility checking.
Of the top 360 most downloaded packages, 100% offer Python 3 support. Again, the trend is upwards (the number in 2014 was closer to 70%), and I’m not aware of anyone adding Python 3 support, and then removing it as imposing too much maintenance overhead.
The last metric reached the point where Python 3 downloads outnumbered Python 2 downloads (54% vs 46%) back in 2013. Those stats need to be collected manually from the www.python.org server access logs, so I don’t have anything more recent than that.
The Python 3 ecosystem is definitely still the smaller of the two as of April 2017 (by a non-trivial margin), but users that start with Python 3 are able to move parts of their applications and services to Python 2 readily enough if the need arises, and hopefully with a clear idea of which parts of Python 2 are the modern recommended parts that survived the transition to Python 3, and which parts are the legacy cruft that only survives in the latest Python 2.x releases due to backwards compatibility concerns.
For the inverse question relating to the concern that the existing migration plan is too aggressive, see “Aren’t you abandoning Python 2 users?”.
Yes, its place as the natural successor to the already dominant Python 2 platform is now assured. Commercial support has long been available from multiple independent vendors, the vast majority of the core components from the Python 2 ecosystem are already available, and the combination of the Python 3.5+ releases and Python’s uptake in the education and data analysis sectors provides assurance of a steady supply of both Python developers, and work for those developers (in the 2016 edition of IEEE’s survey of programming languages, Python was 3rd, trailing only Java and C, overtaking C++ relative to its 2015 position, and both C++ and C# relative to the initial 2014 survey).
For me, with my Linux-and-infrastructure-software bias, the tipping point has been Ubuntu and Fedora successfully making the transition to only having Python 3 in their default install. That change means that a lot of key Linux infrastructure software is now Python 3 compatible, as well as representing not only a significant statement of trust in the Python 3 platform by a couple of well respected organisations (Canonical and Red Hat), but also a non-trivial investment of developer time and energy in performing the migration. This change will also mean that Python 3 will be more readily available than Python 2 on those platforms in the future, and hence more likely to be used as the chosen language variant for Python utility scripts, which in turn increases the attractiveness of supporting Python 3 for library and framework developers.
A significant milestone only attained over 2016 and 2017 has been the three largest public cloud providers (Amazon Web Services, Microsoft Azure, and Google Cloud Platform) ensuring that Python 3 is a fully supported development option on their respective platforms, adding to the support already previously available in platforms like Heroku and OpenShift Online.
Specifically in the context of infrastructure, I also see the ongoing migration of OpenStack components from being Python 2 only applications to being Python 3 compatible as highly significant, as OpenStack is arguably one of the most notable Python projects currently in existence in terms of spreading awareness outside the traditional open source and academic environs. In particular, as OpenStack becomes a Python 3 application, the plethora of regional cloud provider developers and hardware vendor plugin developers employed to work on it will all be learning Python 3 rather than Python 2.
A notable early contribution to adoption has been the education community’s staunch advocacy for the wider Python community to catch up with them in embracing Python 3, rather than confusing their students with occasional recommendations to learn Python 2 directly instead of learning Python 3 first.
As far as the scientific community goes, they were amongst the earliest adopters of Python 3 - I assume the reduced barriers to learnability were something they appreciated, and the Unicode changes were not a problem that caused them significant trouble.
I think the web development community has certainly had the roughest time of it. Not only were the WSGI update discussions long and drawn out (and as draining as any standards setting exercise), resulting in a compromise solution that at least works but isn’t simple to deal with, but they’re also the most directly affected by the additional challenges faced when working directly with binary data in Python 3. However, even in the face of these issues, the major modern Python web frameworks, libraries and database interfaces do support Python 3, and the return of binary interpolation support in Python 3.5 addressed some of the key concerns raised by the developers of the Twisted networking library.
The adoption of asyncio as the standard framework for asynchronous IO and the subsequent incorporation of first class syntactic support for coroutines have also helped the web development community resolve a long standing issue: the lack of a standard way for web servers and web frameworks to communicate regarding long lived client connections (such as those needed for WebSockets support), providing a clear incentive for migration to Python 3.3+ that didn’t exist with earlier Python 3 versions.
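A minimal sketch of the native coroutine syntax, assuming Python 3.7+ for asyncio.run(): several simulated long lived connections are handled concurrently on a single thread. The connections are faked with asyncio.sleep(), and handle_client and serve_all are hypothetical names; a real server would use asyncio streams or a framework built on them.

```python
import asyncio


async def handle_client(client_id: int, delay: float) -> str:
    """Simulate a long lived client connection (e.g. a WebSocket session)."""
    # await hands control back to the event loop while this client is idle
    await asyncio.sleep(delay)
    return "client %d done" % client_id


async def serve_all() -> list:
    # Run the simulated connections concurrently; gather() returns the
    # results in argument order, not completion order
    return await asyncio.gather(
        handle_client(1, 0.03),
        handle_client(2, 0.01),
        handle_client(3, 0.02),
    )


if __name__ == "__main__":
    print(asyncio.run(serve_all()))
    # ['client 1 done', 'client 2 done', 'client 3 done']
```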
As of 2015, the Python community as a whole had had more than 15 years to get used to the Python 2 way of handling Unicode. By contrast, for Python 3, we’d only had a production ready release available for just over 5 years, and since some of the heaviest users of Unicode are the web framework developers, and they’d only had a stable WSGI target since the release of 3.2, you could drop that down to just under 5 years of intensive use by a wide range of developers with extensive practical experience in handling Unicode (we have some excellent Unicode developers in the core team, but feedback from a variety of sources is invaluable for a change of this magnitude).
That feedback has already resulted in major improvements in the Unicode support for the Python 3.2, 3.3, 3.4, 3.5, 3.6, and 3.7 releases. With the codecs and email modules being brought into line, the Python 3.4 release was the first one where the transition felt close to being “done” to me in terms of coping with the full implications of a strictly enforced distinction between binary and text data in the standard library, while Python 3.5 revisited some of the earlier design decisions of the Python 3 series and changed some of them based on several years of additional experience. Python 3.6 brought some major changes to the way binary system APIs are handled on Windows, and changes of similar scope in 3.7 improved support on non-Windows systems.
While I’m optimistic that the system boundary handling changes made in Python 3.7 have resolved the last of the major issues, I nevertheless expect that feedback process will continue throughout the 3.x series, since “mostly done” and “done” aren’t quite the same thing, and attempting to closely integrate with POSIX systems that may be using ASCII incompatible encodings while using a text model with strict binary/text separation hasn’t really been done before at Python’s scale (the JVM is UTF-16 based, but bypasses most OS provided services, while other tools often choose the approach of just assuming that all bytes are UTF-8 encoded, regardless of what the underlying OS claims).
In addition to the cases where blurring the binary/text distinction really did make things simpler in Python 2, we’re also forcing even developers in strict ASCII-only environments to care about Unicode correctness, or else explicitly tell the interpreter not to worry about it. This means that Python 2 users that may have previously been able to ignore Unicode issues may need to account for them properly when migrating to Python 3.
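Telling the interpreter how to handle undecodable data happens through the errors argument at the point where bytes become text. The sketch below compares three of the standard error handlers (strict, replace, and surrogateescape); the helper names are hypothetical, and which handler is appropriate depends on whether the data needs to be passed back out unchanged.

```python
def decode_strict(raw: bytes) -> str:
    # The default "strict" handler raises UnicodeDecodeError on invalid input
    return raw.decode("utf-8")


def decode_lossy(raw: bytes) -> str:
    # "replace" substitutes U+FFFD for undecodable input (the data is lost)
    return raw.decode("utf-8", errors="replace")


def roundtrip(raw: bytes) -> bytes:
    # "surrogateescape" maps each undecodable byte to a lone surrogate code
    # point, so encoding with the same handler restores the original bytes
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    sample = b"caf\xe9"  # Latin-1 bytes for "café", not valid UTF-8
    try:
        decode_strict(sample)
    except UnicodeDecodeError as exc:
        print("strict:", exc.reason)
    # ascii() keeps the output printable even on non-UTF-8 consoles
    print("replace:", ascii(decode_lossy(sample)))
    print("surrogateescape round trip:", roundtrip(sample) == sample)
```

The surrogateescape handler is the same mechanism Python 3 uses for undecodable file names and environment variables on POSIX systems.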
I’ve written more extensively on both of these topics in “Python 3 and ASCII Compatible Binary Protocols” and “Processing Text Files in Python 3”, while PEP 538 and PEP 540 go into detail on the system boundary changes made in Python 3.7.
The long march from the early assumptions of Anglocentric ASCII based computing to a more global Unicode based future is still ongoing, both for the Python community, and the computing world at large. Computers are still generally much better at dealing with English and other languages with similarly limited character sets than they are with the full flexibility of human languages, even the subset that has been pinned down to a particular binary representation thanks to the efforts of the Unicode Consortium.
While the changes to the core text model in Python 3 did implicitly address many of the Unicode issues affecting Python 2, there are still plenty of Unicode handling issues that require their own independent updates. One recurring problem is that many of these are relatively easy to work around (such as by using a graphical environment rather than the default interactive interpreter to avoid the command line limitations on Windows), but comparatively hard to fix properly (and then get agreement that the proposed fix is a suitable one).
There are also more specific questions covering the state of the WSGI middleware interface for web services, and the issues that can arise when dealing with POSIX systems (see “What’s up with POSIX systems in Python 3?”).
I believe so, yes, especially if teaching folks that aren’t native English speakers. However, I also expect a lot of folks will still want to continue on and learn Python 2 even if they learn Python 3 first - I just think that for people that don’t already know C, it will be easier to start with Python 3, and then learn Python 2 (and the relevant parts of C) in terms of the differences from Python 3 rather than learning Python 2 directly and having to learn all those legacy details at the same time as learning to program in the first place.
Note: This answer was written for Python 3.5. For Python 3.6, other potential benefits in teaching beginners include the new f-string formatting syntax, the secrets module, the ability to include underscores to improve the readability of long numeric literals, and the ordering of arbitrary function keyword arguments reliably matching the order in which they’re supplied to the function call.
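A short sketch of those Python 3.6 additions as they might show up in beginner exercises (the function names are hypothetical; the PEP numbers are the ones that introduced each feature):

```python
import secrets


def greet(name: str, visits: int) -> str:
    # f-strings (PEP 498) evaluate expressions inline in the literal
    return f"Hello {name}, this is visit number {visits + 1}"


def make_token() -> str:
    # secrets (PEP 506) gives cryptographically strong randomness,
    # unlike the random module
    return secrets.token_hex(16)


def keyword_order(**kwargs) -> list:
    # Since Python 3.6 (PEP 468), **kwargs preserves the caller's order
    return list(kwargs)


ONE_BILLION = 1_000_000_000  # underscores in numeric literals (PEP 515)


if __name__ == "__main__":
    print(greet("Ada", 2))  # Hello Ada, this is visit number 3
    print(keyword_order(zebra=1, apple=2))  # ['zebra', 'apple']
```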
As noted above, Python 2 has some interesting quirks due to its C heritage and the way the language has evolved since Guido first created Python in 1991. These quirks then have to be taught to every new Python user so that they can avoid them. The following are examples of such quirks that are easy to demonstrate in an interactive session (and please resist the temptation to point out that these can all be worked around - for teaching beginners, it’s the default behaviour that matters, not what experts can instruct the interpreter to do with the right incantations elsewhere in the program).
You can get unexpected encoding errors when attempting to decode values and unexpected decoding errors when attempting to encode them, due to the presence of decode and encode methods on both str and unicode objects, but more restrictive input type expectations for the underlying codecs that then trigger the implicit ASCII based encoding or decoding:

```pycon
>>> u"\xe9".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>>> b"\xe9".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
```
Python 2 has a limited and inconsistent understanding of character sets beyond those needed to record English text:

```pycon
>>> è = 1
  File "<stdin>", line 1
    è = 1
    ^
SyntaxError: invalid syntax
>>> print("è")
è
```
That second line usually works in the interactive interpreter, but won’t work by default in a script:

```console
$ echo 'print("è")' > foo.py
$ python foo.py
  File "foo.py", line 1
SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
```
The handling of Unicode module names is also inconsistent:
```console
$ echo "print(__name__)" > è.py
$ python -m è
__main__
$ python -c "import è"
  File "<string>", line 1
    import è
           ^
SyntaxError: invalid syntax
```
Beginners are often surprised to find that Python 2 can’t do basic arithmetic correctly:

```pycon
>>> 3/4
0
```
They can also be bemused by the fact that Python 2 interprets numbers strangely if they have a leading zero:

```pycon
>>> 0777
511
```
And they may also eventually notice that Python 2 has two different kinds of integer:

```pycon
>>> type(10) is type(10**100)
False
>>> type(10) is type(10L)
False
>>> 10
10
>>> 10L
10L
```
The print statement is weirdly different from normal function calls:

```pycon
>>> print 1, 2, 3
1 2 3
>>> print(1, 2, 3)
(1, 2, 3)
>>> print 1; print 2; print 3
1
2
3
>>> print 1,; print 2,; print 3
1 2 3
>>> import sys
>>> print >> sys.stderr, 1, 2, 3
1 2 3
```
And the exec statement also differs from normal function calls like eval and execfile:

```pycon
>>> d = {}
>>> exec "x = 1" in d
>>> d["x"]
1
>>> d2 = {"x": []}
>>> eval("x.append(1)", d2)
>>> d2["x"]
[1]
>>> with open("example.py", "w") as f:
...     f.write("x = 1\n")
...
>>> d3 = {}
>>> execfile("example.py", d3)
>>> d3["x"]
1
```
The input builtin has some seriously problematic default behaviour:

```pycon
>>> input("This is dangerous: ")
This is dangerous: __import__("os").system("echo you are in trouble now")
you are in trouble now
0
```
The open builtin doesn’t handle non-ASCII files correctly (you have to use codecs.open instead), although this often isn’t obvious on POSIX systems (where passing the raw bytes through the way Python 2 does often works correctly).
You need parentheses to catch multiple exceptions, but forgetting them is an error that passes silently:

```pycon
>>> try:
...     1/0
... except TypeError, ZeroDivisionError:
...     print("Exception suppressed")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> try:
...     1/0
... except (TypeError, ZeroDivisionError):
...     print("Exception suppressed")
...
Exception suppressed
```
And if you make a mistake in an error handler, you’ll lose the original error:

```pycon
>>> try:
...     1/0
... except Exception:
...     logging.exception("Something went wrong")
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
NameError: name 'logging' is not defined
```
Python 2 also presents users with a choice between two relatively unattractive alternatives for calling up to a parent class implementation from a subclass method:

```python
class MySubclass(Example):
    def explicit_non_cooperative(self):
        Example.explicit_non_cooperative(self)

    def explicit_cooperative(self):
        super(MySubclass, self).explicit_cooperative()
```
List comprehensions are one of Python’s most popular features, yet they can have surprising side effects on the local namespace:

```pycon
>>> i = 10
>>> squares = [i*i for i in range(5)]
>>> i
4
```
Python 2 is still a good language despite these flaws, but users that are happy with Python 2 shouldn’t labour under the misapprehension that the language is perfect. We have made mistakes, and Python 3 came about because Guido and the rest of the core development team finally became tired of making excuses for those limitations, and decided to start down the long road towards fixing them instead.
All of the above issues have been addressed by backwards incompatible changes in Python 3. Once we had made that decision, adding other new features twice (once to Python 3 and again to Python 2) imposed significant additional development effort, although we did do so for a number of years (the Python 2.6 and 2.7 releases were both developed in parallel with Python 3 releases, and include many changes originally created for Python 3 that were backported to Python 2 since they were backwards compatible and didn’t rely on other Python 3 only changes like the new, more Unicode friendly, IO stack).
I’ll give several examples below of how the above behaviours have changed in Python 3 releases, up to and including Python 3.6 (the latest release when these examples were written).
In Python 3, the codec related builtin convenience methods are strictly reserved for use with text encodings. Accordingly, text objects no longer even have a decode method, and binary types no longer have an encode method:

```pycon
>>> u"\xe9".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> b"\xe9".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
```
In addition to the above changes, Python 3.4 included additional changes to the codec system to help with more gently easing users into the idea that there are different kinds of codecs, and only some of them are text encodings. It also updated many of the networking modules to make secure connections much simpler.
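To make the codec distinction concrete, here is a minimal sketch using two of the standard non-text codecs, hex (bytes to bytes) and rot13 (str to str), which in Python 3.4+ are reached through codecs.encode() rather than the str/bytes convenience methods. The function names are illustrative.

```python
import codecs


def to_hex(data: bytes) -> bytes:
    # "hex" is a bytes-to-bytes codec, so it goes through codecs.encode()
    return codecs.encode(data, "hex")


def rot13(text: str) -> str:
    # "rot13" is a str-to-str codec, also reached through codecs.encode()
    return codecs.encode(text, "rot13")


def try_method_form(text: str) -> str:
    # str.encode() only accepts text encodings (str -> bytes); asking it
    # for rot13 raises LookupError explaining it is not a text encoding
    try:
        text.encode("rot13")
    except LookupError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(to_hex(b"hello"))  # b'68656c6c6f'
    print(rot13("hello"))  # uryyb
    print(try_method_form("hello"))
```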
Python 3 also has a much improved understanding of character sets beyond English:

```pycon
>>> è = 1
>>> è
1
```
And this improved understanding extends to the import system:
```console
$ echo "print(__name__)" > è.py
$ python3 -m è
__main__
$ python3 -c "import è"
è
```
Python 3 has learned how to do basic arithmetic, replaces the surprising C notation for octal numbers with the more explicit alternative supported since Python 2.6, and only has one kind of integer:

```pycon
>>> 3/4
0.75
>>> 0777
  File "<stdin>", line 1
    0777
       ^
SyntaxError: invalid token
>>> 0o777
511
>>> type(10) is type(10**100)
True
>>> 10
10
>>> 10L
  File "<stdin>", line 1
    10L
      ^
SyntaxError: invalid syntax
```
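For code being ported from Python 2, the explicit spellings are floor division (//) when an integer result is wanted, and the 0o prefix or int(text, 8) for octal. A small sketch with hypothetical helper names:

```python
def split_evenly(total: int, count: int) -> int:
    # // is explicit floor division; / always performs true division
    return total // count


def parse_octal_mode(text: str) -> int:
    # int() with base 8 parses octal strings such as Unix permission modes
    return int(text, 8)


if __name__ == "__main__":
    print(3 / 4, 3 // 4, -7 // 2)  # 0.75 0 -4 (// floors towards -infinity)
    print(parse_octal_mode("755") == 0o755, oct(511))  # True 0o777
```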
print is now just an ordinary function that accepts keyword arguments, rather than having its own custom (and arcane) syntax variations (note that controlling the separator between elements is a feature that requires preformatting of the string to be printed in Python 2 but was trivial to add direct support for when print was converted to an ordinary builtin function rather than being a separate statement):

```pycon
>>> print 1, 2, 3
  File "<stdin>", line 1
    print 1, 2, 3
          ^
SyntaxError: invalid syntax
>>> print(1, 2, 3)
1 2 3
>>> print((1, 2, 3))
(1, 2, 3)
>>> print(1); print(2); print(3)
1
2
3
>>> print(1, 2, 3, sep="\n")
1
2
3
>>> print(1, end=" "); print(2, end=" "); print(3)
1 2 3
>>> import sys
>>> print(1, 2, 3, file=sys.stderr)
1 2 3
```
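exec is now an ordinary function too, consistent with eval (and with Python 2’s execfile, which was itself removed in Python 3):

```pycon
>>> d = {}
>>> exec("x = 1", d)
>>> d["x"]
1
```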
Converting print and exec to builtins rather than statements means they now also work natively with utilities that require real function objects (like map and functools.partial), they can be replaced with mock objects when testing, and they can be more readily substituted with alternative interfaces (such as replacing raw print statements with a pretty printer or a logging system). It also means they can be passed to the builtin help function without quoting, the same as other builtins.
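As a sketch of what that buys in practice, the code below builds a stderr-bound print with functools.partial, and swaps print out for a mock with unittest.mock.patch while testing. Both are standard library APIs; report() is a hypothetical stand-in for application code.

```python
import functools
import sys
from unittest import mock

# A print variant that always writes to stderr, built with functools.partial
eprint = functools.partial(print, file=sys.stderr)


def report(values):
    """Hypothetical application code that prints its results."""
    for value in values:
        print("value:", value)
    eprint("reported", len(values), "values")


def captured_lines(values):
    # Replace the builtin print with a mock for the duration of the call,
    # then return the positional arguments of each recorded call
    with mock.patch("builtins.print") as fake_print:
        report(values)
    return [args for args, _kwargs in fake_print.call_args_list]


if __name__ == "__main__":
    print(captured_lines([1, 2]))  # [('value:', 1), ('value:', 2)]
```

Because eprint captured the real print object when it was created, patching builtins.print does not affect it, which is worth keeping in mind when mixing the two techniques.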
The input builtin now has the much safer behaviour that is provided as raw_input in Python 2:

```pycon
>>> input("This is no longer dangerous: ")
This is no longer dangerous: __import__("os").system("echo you have foiled my cunning plan")
'__import__("os").system("echo you have foiled my cunning plan")'
```
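If a program genuinely wants Python 2’s old behaviour of turning typed input into a Python value, one safer alternative (a separate technique, not something input() does itself) is to pass the string returned by input() to ast.literal_eval(), which only accepts literal syntax. A minimal sketch with a hypothetical helper name:

```python
import ast


def parse_literal(text: str):
    """Convert user-typed text to a Python literal, rejecting anything else.

    Hypothetical helper: ast.literal_eval only evaluates literals (numbers,
    strings, tuples, lists, dicts, sets, booleans, None), so expressions
    such as __import__("os").system(...) raise ValueError instead of running.
    """
    return ast.literal_eval(text)
```

In use, that would look like parse_literal(input("Numbers: ")), with the ValueError handled by prompting the user again.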
The entire IO stack has been rewritten in Python 3 to natively handle Unicode and (in the absence of system configuration errors) to favour UTF-8 by default rather than ASCII. Unlike Python 2, open() in Python 3 natively supports encoding and errors arguments, and the tokenize.open() function automatically handles Python source file encoding cookies.
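A short sketch of the encoding-aware file APIs, assuming only a writable temporary directory (the file names are hypothetical): open() takes explicit encoding and errors arguments, and tokenize.open() picks up a PEP 263 coding cookie instead of assuming a default.

```python
import os
import tempfile
import tokenize

with tempfile.TemporaryDirectory() as workdir:
    # Explicit encoding and errors arguments on the builtin open()
    text_path = os.path.join(workdir, "greeting.txt")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("naïve café\n")
    with open(text_path, encoding="ascii", errors="replace") as f:
        ascii_view = f.read()   # each undecodable byte becomes U+FFFD

    # tokenize.open() honours the encoding cookie at the top of a source file
    source_path = os.path.join(workdir, "legacy_module.py")
    with open(source_path, "wb") as f:
        f.write(b"# -*- coding: latin-1 -*-\nNAME = 'caf\xe9'\n")
    with tokenize.open(source_path) as f:
        detected = f.encoding   # normalised cookie name
        source_text = f.read()
```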
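Accidentally failing to trap an exception is no longer silently ignored. In Python 2, except TypeError, ZeroDivisionError: catches only TypeError (binding the caught instance to the name ZeroDivisionError), so the division error escapes; Python 3 rejects that syntax outright: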
```pycon
>>> try:
...     1/0
... except TypeError, ZeroDivisionError:
  File "<stdin>", line 3
    except TypeError, ZeroDivisionError:
                    ^
SyntaxError: invalid syntax
```
And most errors in exception handlers will now still report the original error that triggered the exception handler:
```pycon
>>> try:
...     1/0
... except Exception:
...     logging.exception("Something went wrong")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
NameError: name 'logging' is not defined
```
Note that implicit exception chaining is the thing I miss most frequently when working in Python 2, and the point I consider the single biggest gain over Python 2 when migrating existing applications - there are few things more irritating when debugging a rare production failure than losing the real problem details due to a secondary failure in a rarely invoked error path.
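To see what chaining records, the sketch below triggers a bug inside an exception handler and then inspects the __context__ attribute of the secondary exception. It also shows the explicit form, raise ... from ..., which sets __cause__ instead, and the parenthesised tuple form of except that replaces the ambiguous comma syntax. The load_config, parse_port and ConfigError names are hypothetical.

```python
def load_config():
    try:
        return {}["missing-key"]
    except KeyError:
        # A bug in the handler itself: the name below is deliberately undefined,
        # but the original KeyError is still recorded on the new exception
        return undefined_fallback

try:
    load_config()
except NameError as exc:
    secondary = exc
    original = exc.__context__          # implicit chaining: the KeyError

class ConfigError(Exception):
    pass

def parse_port(text):
    try:
        return int(text)
    except (TypeError, ValueError) as exc:   # a tuple, not "except A, B:"
        # Explicit chaining sets __cause__ on the new exception
        raise ConfigError(f"bad port: {text!r}") from exc

try:
    parse_port("eighty")
except ConfigError as exc:
    explicit_cause = exc.__cause__
```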
While you probably don’t want to know how it works internally, Python 3 also provides a much cleaner API for calling up to the parent implementation of a method:
```python
class MySubclass(Example):
    def implicit_cooperative(self):
        super().implicit_cooperative()
```
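Here is a runnable expansion of that snippet, assuming a small hypothetical hierarchy (Base and Mixin are invented for illustration). Zero-argument super() works because the compiler gives methods that use it an implicit __class__ reference, and each call follows the method resolution order, so cooperative multiple inheritance visits every class once:

```python
calls = []

class Base:
    def implicit_cooperative(self):
        calls.append("Base")

class Example(Base):
    def implicit_cooperative(self):
        calls.append("Example")
        super().implicit_cooperative()

class Mixin(Base):
    def implicit_cooperative(self):
        calls.append("Mixin")
        super().implicit_cooperative()

class MySubclass(Example, Mixin):
    def implicit_cooperative(self):
        calls.append("MySubclass")
        # Python 2 spelling: super(MySubclass, self).implicit_cooperative()
        super().implicit_cooperative()

MySubclass().implicit_cooperative()
```

The method resolution order here is MySubclass, Example, Mixin, Base, so Example's super() call reaches Mixin rather than jumping straight to Base.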
And, like generator expressions in both Python 2 and Python 3, list comprehensions in Python 3 no longer have any side effects on the local namespace:
```pycon
>>> i = 10
>>> squares = [i*i for i in range(5)]
>>> i
10
```
The above improvements are all changes that couldn’t be backported to a hypothetical Python 2.8 release, since they’re backwards incompatible with some (but far from all) existing Python 2 code, mostly for obvious reasons. The exception chaining isn’t obviously backwards incompatible, but still can’t be backported: handling the reference cycle it creates between caught exceptions and the execution frames referenced from their tracebacks involved changing the lifecycle of the variable named in the “as” clause of an exception handler. To break the cycle, that name is automatically deleted at the end of the relevant exception handler in Python 3, so you now need to bind the exception to a different local variable name in order to keep a valid reference after the handler has finished running. The list comprehension changes are also backwards incompatible in non-obvious ways: not only do they no longer leak the iteration variable, but the way the expressions access the containing scope has changed, as they’re now full closures rather than running directly in the containing scope.
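Both of the non-obvious incompatibilities can be observed directly. This sketch (function and class names are hypothetical) shows the “as” target disappearing once the handler finishes, and a class-level comprehension failing to see a class attribute because the comprehension body runs in its own scope:

```python
def handler_scope_demo():
    try:
        1 / 0
    except ZeroDivisionError as exc:
        kept = exc              # bind to a different name to keep a reference
    try:
        exc                     # the "as" target was deleted when the handler ended
    except NameError:           # UnboundLocalError is a subclass of NameError
        exc_was_deleted = True
    else:
        exc_was_deleted = False
    return kept, exc_was_deleted

kept, exc_was_deleted = handler_scope_demo()

class Settings:
    class_level_offset = 10
    scaled = [n * 2 for n in range(3)]      # fine: uses only the loop variable
    try:
        # The element expression is looked up from the comprehension's own
        # scope, which skips the class namespace entirely
        shifted = [n + class_level_offset for n in range(3)]
    except NameError:
        shifted = None
```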
As documented in PEP 466, the networking security changes were deemed worthy of backporting. In contrast, while it’s perhaps possible to backport the implicit super change, it would need to be separated from the other backwards incompatible changes to the type system machinery (and in that case, there’s no “help improve the overall security of the internet” argument to be made in favour of doing the work).
There are some other notable changes in Python 3 that are of substantial benefit when teaching new users (as well as for old hands), and that technically could be included in a Python 2.8 release if the core development team chose to create one, but in practice such a release isn’t going to happen. However, folks interested in that idea may want to check out the Tauthon project, which is a Python 2/3 hybrid language that maintains full Python 2.7 compatibility while backporting backwards compatible enhancements from the Python 3 series.
PEP 3151 means that Python 3.3+ has a significantly more sensible system for catching particular kinds of operating system errors. Here’s the race condition free way to detect a missing file in Python 2.7:
```pycon
>>> import errno
>>> try:
...     f = open("This does not exist")
... except IOError as err:
...     if err.errno != errno.ENOENT:
...         raise
...     print("File not found")
...
File not found
```
And here’s the same operation in Python 3.3+:
```pycon
>>> try:
...     f = open("This does not exist")
... except FileNotFoundError:
...     print("File not found")
...
File not found
```
(If you’re opening the file for writing, then you can use exclusive mode to prevent race conditions without using a subdirectory - Python 2 has no equivalent. There are many other cases where Python 3 exposes operating system level functionality that wasn’t broadly available when the feature set for Python 2.7 was frozen in April 2010.)
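A minimal sketch covering both points, using a throwaway temporary directory and a hypothetical file name: the specific FileNotFoundError subclass still carries the errno value, and mode "x" makes a second attempt to create the same file fail with FileExistsError rather than silently truncating it.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "lockfile")   # hypothetical file name

    # PEP 3151: the subclass says what went wrong; errno is still there
    try:
        open(target)
    except FileNotFoundError as exc:
        missing_errno = exc.errno

    # Exclusive creation: the existence check and the creation are a single
    # operation, so there is no window for another process to sneak in
    with open(target, "x") as f:
        f.write("first writer wins\n")
    try:
        open(target, "x")
    except FileExistsError:
        second_writer_blocked = True
```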
Another common complaint with Python 2 is the requirement to use empty __init__.py files to indicate a directory is a Python package, and the complexity of splitting a package definition across multiple directories. By contrast, here’s an example of how to split a package across multiple directories in Python 3.3+ (note the lack of __init__.py files). While technically this can be backported, the implementation depends on the new pure Python implementation of the import system, which in turn depends on the Unicode friendly IO stack in Python 3, so backporting it is far from trivial:
```console
$ mkdir -p dir1/nspkg
$ mkdir -p dir2/nspkg
$ echo 'print("Imported submodule A")' > dir1/nspkg/a.py
$ echo 'print("Imported submodule B")' > dir2/nspkg/b.py
$ PYTHONPATH=dir1:dir2 python3 -c "import nspkg.a, nspkg.b"
Imported submodule A
Imported submodule B
```
That layout doesn’t work at all in Python 2 due to the missing __init__.py files, and even if you add them, it still won’t find the second directory:
```console
$ PYTHONPATH=dir1:dir2 python -c "import nspkg.a, nspkg.b"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named nspkg.a
$ touch dir1/nspkg/__init__.py
$ touch dir2/nspkg/__init__.py
$ PYTHONPATH=dir1:dir2 python -c "import nspkg.a, nspkg.b"
Imported submodule A
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named b
```
That last example also shows another limitation in Python 2’s error handling, since import failures don’t always show the full name of the missing module. That is fixed in Python 3:
```console
$ PYTHONPATH=dir1 python3 -c "import nspkg.a, nspkg.b"
Imported submodule A
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nspkg.b'
```
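The same layout can be reproduced from within Python as a sketch; the nspkg_demo package name and the temporary directories are hypothetical, and each portion sets a module attribute rather than printing. The resulting package’s __path__ lists both portions:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
# Two separate portions of one namespace package, neither with __init__.py
for portion, module, value in (("dir1", "a", "A"), ("dir2", "b", "B")):
    pkg_dir = os.path.join(root, portion, "nspkg_demo")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(f"VALUE = {value!r}\n")
    sys.path.append(os.path.join(root, portion))

importlib.invalidate_caches()   # path finders cache directory listings
import nspkg_demo.a
import nspkg_demo.b

portions = list(nspkg_demo.__path__)
```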
That said: Eric Snow has now backported the Python 3.4 import system to Python 2.7 as importlib2. I’m aware of at least one large organisation using that in production and being quite happy with the results :)
Python 3.3 also included some minor improvements to the error messages produced when functions and methods are called with incorrect arguments.
The feature set for Python 2.7 was essentially locked in April 2010 with the first beta release. Since then, with a very limited number of exceptions related to network security, the Python core development team have only been adding new features directly to the Python 3 series. These new features are informed both by our experience with Python 3 itself and by our ongoing experience working with Python 2 (as they’re still very similar languages).
As Python 2 is a mature, capable language, with a rich library of support modules available from the Python Package Index (including many backports from the Python 3 standard library), there’s no one universally important feature that will provide a compelling argument to switch for existing Python 2 users. Of necessity, existing Python 2 users are those who didn’t find the limitations of Python 2 that led to the creation of Python 3 particularly problematic. It is for the benefit of these users that Python 2 continues to be maintained.
For new users of Python, however, Python 3 represents years of additional work above and beyond what was included in the Python 2.7 release. Features that may require third party modules, or simply not be possible at all in Python 2, are provided by default in Python 3. This answer doesn’t attempt to provide an exhaustive list of such features, but does aim to provide an illustrative overview of the kinds of improvements that have been made. The What’s New guides for the Python 3 series (especially the 3.3+ releases that occurred after the Python 2 series was placed in long term maintenance) provide more comprehensive coverage.
While I’ve tried to just hit some highlights in this list, it’s still rather long. The full What’s New documents are substantially longer.
Note
This answer was written for Python 3.5. For Python 3.6, some other notable enhancements include the new f-string formatting syntax, the secrets module, the ability to include underscores to improve the readability of long numeric literals, changes to preserve the order of class namespaces and function keyword arguments, type hints for named variables, and more.
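A quick tour of several of those 3.6 additions, as a sketch (all names are illustrative). Note that secrets.token_hex(16) draws 16 random bytes and renders them as 32 hexadecimal digits:

```python
import secrets

big_number = 1_000_000_000          # PEP 515: underscores in numeric literals
name = "world"
greeting = f"Hello, {name}! Count: {big_number:,}"   # PEP 498: f-strings

token = secrets.token_hex(16)       # suitable for security-sensitive tokens

def keyword_order(**kwargs):
    # PEP 468: **kwargs preserves the order used at the call site
    return list(kwargs)

order = keyword_order(zebra=1, apple=2, mango=3)

class Ordered:
    zebra = 1
    apple = 2

# PEP 520: the class namespace preserves definition order
class_order = [key for key in vars(Ordered) if not key.startswith("__")]

retries: int = 3                    # PEP 526: variable annotations
```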
Some changes that are likely to affect most projects are error handling related:
the exception hierarchy for operating system errors is now based on what went wrong, rather than which module detected the failure (see PEP 3151 for details)
bugs in error handling code no longer hide the original exception (which can be a huge time saver when it happens to hard to reproduce bugs)
by default, if the logging system is left unconfigured, warnings and above are written to sys.stderr, while other events are ignored
the codec system endeavours to ensure the codec name always appears in the reported error message when the underlying call fails
the error messages from failed argument binding now do a much better job of describing the expected signature of the function
the socket module takes advantage of the new enum support to include constant names (rather than just numeric values) in the error message output
starting in Python 3.5, all standard library modules making system calls should handle EINTR automatically
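A few of these behaviours can be checked from code. In the sketch below, the logger name is hypothetical, and propagation is switched off so that any handlers an application or test runner has installed on the root logger don’t interfere with the “last resort” handler:

```python
import contextlib
import io
import logging
import socket

# Unconfigured logging: with no handlers found, records go to the module's
# last resort handler, which writes WARNING and above to sys.stderr
logger = logging.getLogger("hypothetical.app")
logger.setLevel(logging.DEBUG)
logger.propagate = False
stderr_capture = io.StringIO()
with contextlib.redirect_stderr(stderr_capture):
    logger.info("routine detail")       # below WARNING: dropped
    logger.warning("disk nearly full")  # WARNING: written to stderr
last_resort_output = stderr_capture.getvalue()

# Socket constants are enum members, so they carry readable names
family_name = socket.AF_INET.name

# Codec failures report which codec was involved
try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    codec_message = str(exc)
```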
Unicode is more deeply integrated into the language design, along with a clearer separation between binary and text data:
the open() builtin natively supports decoding of text files (rather than having to use codecs.open() instead)
the bytes type provides locale independent manipulation of binary data that may contain ASCII segments (the Python 2 str type has locale dependent behaviour for some operations)
the codec system has been separated into two tiers. The str.encode(), bytes.decode() and bytearray.decode() methods provide direct access to Unicode text encodings, while the codecs module provides general access to all available codecs, including binary->binary and text->text transforms (in Python 2, all three kinds can be accessed through the convenience methods on the builtin types, creating ambiguity as to the expected return types of the affected methods)
data received from the operating system is automatically decoded to text whenever possible (this does cause integration issues in some cases when the OS provides incorrect configuration data, but otherwise allows applications to ignore more cross-platform differences in whether OS APIs natively use bytes or UTF-16)
identifiers and the import system are no longer limited to ASCII text (allowing non-English speakers to use names in their native languages when appropriate)
Python 3 deliberately has no equivalent to the implicit ASCII based decoding that takes place in Python 2 when an 8-bit str object encounters a unicode object (note that disabling this implicit conversion in Python 2, while technically possible, is not typically feasible, as turning it off breaks various parts of the standard library)
Python 3.3+ now correctly handles code points outside the basic multilingual plane without needing to use 4 bytes per code point for all Unicode data (as Python 2 does)
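The two-tier codec design and the stricter text/binary separation can be seen in a few lines. This is a sketch using only codecs that ship with the standard library (zlib and rot13):

```python
import codecs
import sys

# Text encodings: str.encode() and bytes.decode() always map str <-> bytes
encoded = "café".encode("utf-8")
decoded = encoded.decode("utf-8")

# Other transforms go through the type-neutral codecs module functions
compressed = codecs.encode(b"binary payload" * 10, "zlib")   # bytes -> bytes
restored = codecs.decode(compressed, "zlib")
rotated = codecs.encode("hello", "rot13")                    # str -> str

# ...and are refused by the text-encoding convenience methods
try:
    "hello".encode("rot13")
except LookupError:
    rot13_rejected = True

# No implicit ASCII decoding when bytes meet str
try:
    b"abc" + "def"
except TypeError:
    mixing_rejected = True

# PEP 393: a non-BMP code point is a single character on every build
emoji_length = len("\U0001F600")
```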
A few new debugging tools are also provided out of the box:
faulthandler allows the generation of Python tracebacks for segmentation faults and threading deadlocks (including a -X faulthandler command line option to debug arbitrary scripts)
tracemalloc makes it possible to track where objects were allocated and obtain a traceback summary for those locations (this relies on the dynamic memory allocator switching feature added in Python 3.4 and hence cannot be backported to Python 2 without patching the interpreter and building from source)
the gc module now provides additional introspection and hook APIs
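A minimal sketch of two of these tools in action, assuming nothing beyond the standard library (faulthandler is left out here, as it is usually enabled with -X faulthandler or the PYTHONFAULTHANDLER environment variable rather than from code):

```python
import gc
import tracemalloc

# tracemalloc: record where allocations happen, then summarise by source line
tracemalloc.start()
payload = [str(i) * 10 for i in range(10_000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()
top_stats = snapshot.statistics("lineno")   # largest allocation sites first

# gc.callbacks: hooks invoked before ("start") and after ("stop") a collection
gc_phases = []

def record_phase(phase, info):
    gc_phases.append((phase, info["generation"]))

gc.callbacks.append(record_phase)
gc.collect()
gc.callbacks.remove(record_phase)
```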
The concurrency support has been improved in a number of ways:
the native coroutine syntax added in Python 3.5 is substantially more approachable than the previous “generators-as-coroutines” syntax (as it avoids triggering iterator based intuitions that aren’t actually helpful in the coroutine case)
asyncio (and the supporting selectors module) provides greatly enhanced native support for asynchronous IO
concurrent.futures provides straightforward support for dispatching work to separate worker processes or threads
multiprocessing is far more configurable (including the option to avoid relying on os.fork on POSIX systems, making it possible to avoid the poor interactions between threads and os.fork while still using both multiple processes and threads)
the CPython Global Interpreter Lock has been updated to switch contexts based on absolute time intervals, rather than by counting bytecode execution steps (context switches will still occur between bytecode boundaries)
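As a sketch of the newer concurrency APIs (the fetch, main and square functions are hypothetical; asyncio.run() needs Python 3.7+):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def fetch(name, delay):
    # Native coroutine: awaiting sleep hands control back to the event loop
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Run both coroutines concurrently; results come back in argument order
    return await asyncio.gather(fetch("first", 0.02), fetch("second", 0.01))

coroutine_results = asyncio.run(main())

def square(n):
    return n * n

# concurrent.futures: the same map/submit API covers threads and processes
with ThreadPoolExecutor(max_workers=4) as pool:
    pool_results = list(pool.map(square, range(5)))
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same interface, which is the main design point of concurrent.futures.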
For data analysis use cases, there’s one major syntactic addition:
Python 3.5 added a new binary operator symbol (@) specifically for use in matrix multiplication
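The @ operator dispatches to __matmul__, so any type can opt in. NumPy arrays support it, but to stay dependency-free this sketch uses a hypothetical 2x2 matrix class:

```python
class Matrix2x2:
    """Hypothetical minimal 2x2 matrix supporting the @ operator."""

    def __init__(self, a, b, c, d):
        self.rows = ((a, b), (c, d))

    def __matmul__(self, other):
        # Standard row-by-column matrix product
        (a, b), (c, d) = self.rows
        (e, f), (g, h) = other.rows
        return Matrix2x2(a*e + b*g, a*f + b*h,
                         c*e + d*g, c*f + d*h)

    def __eq__(self, other):
        return isinstance(other, Matrix2x2) and self.rows == other.rows

rotate = Matrix2x2(0, -1, 1, 0)      # 90 degree rotation
identity = Matrix2x2(1, 0, 0, 1)
```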
Notable additions to the standard library’s native testing capabilities include:
the unittest.mock module, previously only available as a third party library
a “subtest” feature that allows arbitrary sections of a test to be reported as independent results (including details on what specific values were tested), without having to completely rewrite the test to fit into a parameterised testing framework
a new FAIL_FAST option for doctest that requests stopping the doctest at the first failing test, rather than continuing on to run the remaining tests
Performance improvements include:
significant optimisation work on various text encodings, especially UTF-8, UTF-16 and UTF-32
a significantly more memory efficient Unicode representation, especially compared to the unconditional 4 bytes per code point used in Linux distro builds of Python 2
a C accelerator module for the decimal module
transparent use of other C accelerator modules where feasible (including for pickle and io)
the range builtin is now a memory efficient calculated sequence
the use of iterators or other memory efficient representations for various other builtin APIs that previously returned lists
dictionary instances share their key storage when possible, reducing the amount of memory consumed by large numbers of class instances
the rewritten implementation of the import system now caches directory listings for a brief time rather than blindly performing stat operations for all possible file names, drastically improving startup performance when network filesystems are present on sys.path
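The calculated-sequence behaviour of range is easy to demonstrate; this sketch never materialises the trillion integers involved:

```python
import sys

huge = range(10**12)                 # nothing is allocated per element
huge_len = len(huge)
last_item = huge[-1]
membership = (10**12 - 1) in huge    # answered arithmetically, not by scanning
evens = huge[::2]                    # slicing yields another lazy range
range_size = sys.getsizeof(huge)     # a small, fixed-size object
```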
Security improvements include:
support for “exclusive mode” when opening files
support for the directory file descriptor APIs that avoid various symlink based attacks
switching the default hashing algorithm for key data types to SipHash
providing an “isolated mode” command line switch to help ensure user settings don’t impact execution of particular commands
disabling inheritance of file descriptors and Windows handles by child processes by default
new multiprocessing options that avoid sharing memory with child processes by avoiding the os.fork system call
significant improvements to the SSL module, such as TLS v1.1 and v1.2 support, Server Name Indication support, access to platform certificate stores, and improved support for certificate verification (while these are in the process of being backported to Python 2.7 as part of PEP 466, it is not yet clear when that process will be completed, and those enhancements are already available in Python 3 today)
other networking modules now take advantage of many of the SSL module improvements, including making it easier to use the new ssl.create_default_context() to choose settings that default to providing reasonable security for use over the public internet, rather than maximising interoperability (but potentially allowing operation in no longer secure modes)
the secrets module added in 3.6
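A few of these security defaults can be inspected directly. This is a sketch: it only creates a TLS context and a temporary file, and makes no network connections.

```python
import os
import secrets
import ssl
import tempfile

# Defaults aimed at the public internet: certificates and hostnames checked
context = ssl.create_default_context()
verifies_certs = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname

# PEP 446: newly created file descriptors are not inherited by child processes
fd, path = tempfile.mkstemp()
inheritable = os.get_inheritable(fd)
os.close(fd)
os.remove(path)

# secrets: randomness intended for tokens (unlike the random module)
reset_token = secrets.token_urlsafe(32)   # 32 bytes, base64url without padding
```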
Object lifecycle and resource management has also improved significantly:
the cyclic garbage collector is now more aggressive in attempting to collect cycles, even those containing __del__ methods. This eliminated some cases where generators could be flagged as uncollectable (and hence effectively leak memory)
this means most objects will now have already been cleaned up before the last resort “set module globals to None” step triggers during shutdown, reducing spurious tracebacks when cleanup code runs
the new weakref.finalize() API makes it easier to register weakref callbacks without having to worry about managing the lifecycle of the reference itself
many more objects in the standard library now support the context management protocol for explicit lifecycle and resource management
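A sketch of both behaviours (Resource and Node are hypothetical classes). The cycle between the two Node objects includes __del__ methods, which Python 2 would have left uncollected in gc.garbage:

```python
import gc
import weakref

events = []

class Resource:
    """Hypothetical object that owns some external resource."""

res = Resource()
# finalize() keeps itself registered; no need to hold on to the returned object
weakref.finalize(res, events.append, "resource finalized")
del res
gc.collect()    # CPython usually runs the finalizer at "del"; this is a backstop

class Node:
    def __init__(self):
        self.partner = None

    def __del__(self):
        events.append("node __del__")

# PEP 442 (Python 3.4): cycles whose members define __del__ are collectable
first, second = Node(), Node()
first.partner, second.partner = second, first
del first, second
gc.collect()
```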
Other quality of life improvements include:
__init__.py files are no longer needed to declare packages - if no foo/__init__.py file is present, then all directories named foo on sys.path will be automatically scanned for foo submodules
the new zero-argument form of the super builtin makes calling up to base class method implementations in a way that supports multiple inheritance relatively straightforward
keyword only arguments make it much easier to add optional parameters to functions in a way that isn’t error prone or hard to read
the yield from syntax for delegating to subgenerators and iterators (this is a key part of the asyncio coroutine support)
iterable unpacking syntax is now more flexible
zipapp for bundling pure Python applications into runnable archives
enum for creating enumeration types
ipaddress for working with both IPv4 and IPv6 addresses
pathlib for a higher level filesystem abstraction than the low level interface provided by os.path
statistics for a simple high school level statistics library (mean, median, mode, variance, standard deviation, etc)
datetime.timestamp() makes it easy to convert a datetime object to a UNIX timestamp
time.get_clock_info() and related APIs provide access to a rich collection of cross platform time measurement options
venv provides virtual environment support out of the box, in a way that is better integrated with the core interpreter than is possible in Python 2 with only virtualenv available
ensurepip ensures pip is available by default in Python 3.4+ installations
memoryview is significantly more capable and reliable
the caching mechanism for pyc files has been redesigned to better accommodate sharing of Python files between multiple Python interpreters (whether different versions of CPython, or other implementations like PyPy and Jython)
as part of that change, implicitly compiled bytecode cache files are written to __pycache__ directories (reducing directory clutter) and are ignored if the corresponding source file has been removed (avoiding obscure errors due to stale cached bytecode files)
types.SimpleNamespace and types.MappingProxyType are made available at the Python layer
improved introspection support, based on the inspect.signature() API and its integration into pydoc, allowing accurate signature information to be reported for a much wider array of callables than just actual Python function objects
defining __eq__ without also defining __hash__ implicitly disables hashing of instances, avoiding obscure errors when such types were added to dictionaries (you now get an error about an unhashable type when first adding an instance, rather than obscure data driven lookup bugs later)
ordered comparisons between objects of different types are now disallowed by default (again replacing obscure data driven errors with explicit exceptions)
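A compact sketch touching several items from that list (connect, chained, Colour and Point are hypothetical names; the addresses come from the 2001:db8::/32 documentation range):

```python
import enum
import inspect
import ipaddress
import statistics
import types

def connect(host, *, timeout=10, retries=3):
    # Hypothetical function: timeout and retries can only be passed by keyword
    return (host, timeout, retries)

try:
    connect("db.example", 5)              # a positional timeout is rejected
except TypeError:
    positional_rejected = True

first, *middle, last = [1, 2, 3, 4, 5]    # extended iterable unpacking

def chained(*iterables):
    for iterable in iterables:
        yield from iterable               # delegate to each sub-iterable in turn

class Colour(enum.Enum):
    RED = 1
    GREEN = 2

in_network = ipaddress.ip_address("2001:db8::1") in ipaddress.ip_network("2001:db8::/32")
average = statistics.mean([1, 2, 3, 4])
settings = types.SimpleNamespace(debug=True, level=3)
signature_text = str(inspect.signature(connect))

class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

# Defining __eq__ alone sets __hash__ to None, so instances are unhashable
try:
    {Point(1): "origin"}
except TypeError:
    unhashable = True

try:
    1 < "2"                               # mixed-type ordering raises
except TypeError:
    ordering_rejected = True
```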
Some more advanced higher order function manipulation and metaprogramming capabilities are also readily available in Python 3:
the functools.partialmethod() helper makes it straightforward to do partial function application in a way that still allows the instance object to be supplied later as a positional argument
the functools.singledispatch() decorator makes it easy to create generic functions that interoperate cleanly with Python’s type system, including abstract base classes
the contextlib.ExitStack class makes it easy to manipulate context managers dynamically, rather than having to rely on explicit use of with statements
the new __prepare__ method, and associated functions in the types module, make it possible for metaclasses to better monitor what happens during class body execution (for example, by using an ordered dictionary to record the order of assignments)
the updated import system permits easier creation of custom import hooks. In particular, the “source to code” translation step can be overridden, while reusing the rest of the import machinery (including bytecode caching) in a custom import hook
the dis.Bytecode API and related functionality makes it easier to work with CPython bytecode
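A sketch exercising several of these tools together (every class and function name here is hypothetical). The metaclass example uses __prepare__ to supply a recording namespace, which is one way to capture assignment order:

```python
import contextlib
import dis
import functools
import numbers

@functools.singledispatch
def describe(value):
    return "something else"

@describe.register(numbers.Integral)       # an ABC as the dispatch target
def _(value):
    return "an integer"

@describe.register(str)
def _(value):
    return "some text"

class Switch:
    def __init__(self):
        self.on = False

    def set_state(self, state):
        self.on = state

    # Pre-binds state=True; the instance is still supplied at call time
    turn_on = functools.partialmethod(set_state, True)

switch = Switch()
switch.turn_on()

closed = []
with contextlib.ExitStack() as stack:
    for name in ("a", "b", "c"):           # number of resources known only at runtime
        stack.callback(closed.append, name)   # callbacks run in LIFO order on exit

class OrderRecordingDict(dict):
    def __init__(self):
        super().__init__()
        self.assigned = []

    def __setitem__(self, key, value):
        self.assigned.append(key)
        super().__setitem__(key, value)

class RecordingMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The returned mapping is the namespace the class body executes in
        return OrderRecordingDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls.assignment_order = [k for k in namespace.assigned if not k.startswith("__")]
        return cls

class Recorded(metaclass=RecordingMeta):
    zebra = 1
    apple = 2

opnames = [instruction.opname for instruction in dis.Bytecode(Switch.set_state)]
```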
Various improvements in Python 3 also permitted some significant documentation improvements relative to Python 2:
as the Python 3 builtin sequences are more compliant with their corresponding abstract base classes, it has proved easier to flesh out their documentation to cover all the additional details that have been introduced since those docs were originally written
the final removal of the remnants of the legacy import system in Python 3.3 made it feasible to finally document the import system mechanics in the language reference
While many of these features are available in Python 2 with appropriate downloads from the Python Package Index, not all of them are, especially the various changes to the core interpreter and related systems.
While Python 2 does still have a longer tail of esoteric modules available on PyPI, most popular third party modules and frameworks either support both, have alternatives that support Python 3, or can be relatively easily ported using tools like futurize (part of python-future). The 3to2 project and the pasteurize tool (also part of python-future) offer options for migrating a pure Python 3 application to the large common subset of Python 2 and Python 3 if a critical Python 2 only dependency is identified that can’t be invoked in a separate Python 2 process or cost effectively ported to also run on Python 3.
With Python 3 software collections available for both Red Hat Enterprise Linux and CentOS, Ubuntu including a fully supported Python 3 stack in its latest LTS release, and Continuum Analytics releasing Anaconda3 (a Python 3 based version of their scientific software distribution), the number of cases where using Python 2 is preferable to using Python 3 is dwindling to those where:
for some reason, an application absolutely needs to run in the system Python on Red Hat Enterprise Linux or CentOS (for example, depending on an OS level package that isn’t available from PyPI, or needing a complex binary dependency that isn’t available for the Python 3 software collection and not being permitted to add additional dependencies from outside the distro)
the particular application can’t tolerate the current integration issues with the POSIX C locale or the Windows command line in environments that actually need full Unicode support
there’s a critical Python 2 only dependency that is known before the project even starts, and separating that specific component out to its own Python 2 process while writing the bulk of the application in Python 3 isn’t considered an acceptable architecture
Note
This answer was written for Python 3.5, and has only partially been updated for 3.7 and later. For instance, PEP 461, an accepted proposal to restore support for binary interpolation in a way that is source and semantically compatible with Python 2 for the use cases we actually want to support in Python 3, was implemented in Python 3.5.
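A minimal sketch of PEP 461 interpolation on bytes (the request values are hypothetical). Text still has to be encoded explicitly before it can be interpolated into binary data:

```python
method, status = b"GET", 200

# PEP 461 (Python 3.5): printf-style formatting works on bytes again
request_line = b"%s /index.html HTTP/1.1\r\n" % method
status_line = b"HTTP/1.1 %d OK\r\n" % status

# A str argument is refused rather than implicitly encoded
try:
    b"%s" % "text"
except TypeError:
    implicit_text_rejected = True
```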
At this point in time, not quite. Python 3.5 comes much closer to this than Python 3.4 (which in turn was closer than 3.3, etc), but there are still some use cases that are more convenient in Python 2 because it handles them by default, where Python 3 needs some additional configuration, or even separate code paths for things that could be handled by a common algorithm in Python 2.
In particular, many binary protocols include ASCII compatible segments, so it is sometimes convenient to treat them as text strings. Python 2 makes this easier in many cases, since the 8-bit str type blurs the boundary between binary and text data. By contrast, if you want to treat binary data like text in Python 3 in a way that isn’t directly supported by the bytes type, you actually need to convert it to text first, and make conscious decisions about encoding issues that Python 2 largely lets you ignore. I’ve written a separate essay specifically about this point: Python 3 and ASCII Compatible Binary Protocols.
Python 3 also requires a bit of additional up front design work when aiming to handle improperly encoded data. This also has its own essay: Processing Text Files in Python 3.
The Python 3 model also required more complex impedance matching on POSIX platforms, which is covered by a separate question: What’s up with POSIX systems in Python 3?
Until Python 3.4, the Python 3 codec system also didn’t cleanly handle the transform codecs provided as part of the standard library. Python 3.4 includes several changes to the way these codecs are handled that nudge users towards the type neutral APIs in the codecs module when they attempt to use them with the text encoding specific convenience methods on the builtin types.
Another change that has yet to be fully integrated is the switch to producing dynamic views from the keys, values and items methods of dict objects. It currently isn’t easy to implement fully conformant versions of those in pure Python code, so many alternate mapping implementations in Python 3 don’t worry about doing so - they just produce much simpler iterators, equivalent to the iterkeys, itervalues and iteritems methods from Python 2.
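A sketch of what the dynamic views provide, compared with the one-shot iterator style that many alternate mappings fall back to (inventory is hypothetical data):

```python
from collections.abc import KeysView

inventory = {"apples": 3, "pears": 0}
keys = inventory.keys()
items = inventory.items()

inventory["plums"] = 7               # views reflect later changes to the dict
has_plums = "plums" in keys
is_view = isinstance(keys, KeysView)

# Key views support set operations directly
common = keys & {"pears", "grapes"}

# An iterkeys()-style iterator is exhausted after a single pass
one_shot = iter(inventory)
first_pass = list(one_shot)
second_pass = list(one_shot)
```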
Some of the changes in Python 3 designed for the benefit of larger applications (like the increased use of iterators), or for improved language consistency (like changing print to be a builtin function rather than a statement) are also less convenient at the interactive prompt. map, for example, needs to be wrapped in a list call to produce useful output in the Python 3 REPL, since by default it now just creates an iterator, without actually doing any iteration. In Python 2, the fact it combined both defining the iteration and actually doing the iteration was convenient at the REPL, even though it often resulted in redundant data copying and increased memory usage in actual application code.
Having to type the parentheses when using print is mostly an irritation for Python 2 users that need to retrain their fingers. I’ve personally just trained myself to only use the single argument form (with parentheses) that behaves the same way in both Python 2 and 3, and use string formatting for anything more complex (or else just print the tuple when using the Python 2 interactive prompt). However, I also created a patch that proves it is possible to implement a general implicit call syntax within the constraints of CPython’s parsing rules. Anyone that wishes to do so is free to take that patch and turn it into a full PEP that proposes the addition of a general implicit call syntax to Python 3.5 (or later). While such a PEP would need to address the ambiguity problems noted on the tracker issues (likely by restricting the form of the expression used in an implicit call to only permit unqualified names), it’s notable that the popular IPython interactive interpreter already provides this kind of implicit “autocall” behaviour by default, and many other languages provide a similar “no parentheses, parameters as suffix” syntax for statements that consist of a single function call.
Thanks are due especially to Armin Ronacher for describing several of these issues in fine detail when it comes to the difficulties they pose specifically when writing wire protocol handling code in Python 3. His feedback has been invaluable to me (and others) in attempting to make Python 3 more convenient for wire protocol development without reverting to the Python 2 model that favoured wire protocol development over normal application development (where binary data should exist only at application boundaries and be converted to text or other structured data for internal processing). There are still plenty of additional improvements that could be made for Python 3.8 and later, though. Possible avenues for improvement previously discussed on python-dev, python-ideas or the CPython issue tracker include:
PEP 467 is a draft proposal to clean up some of the legacy of the original Python 3 mutable bytes design. A related change is to better document the tuple-of-ints and list-of-ints behaviour of bytes and bytearray.
taking the internal “text encoding” marking system added in Python 3.4 and giving either it or a more general codec type description system a public API for use when developing custom codecs.
making it easier to register custom codecs (preferably making use of the native namespace package support added in Python 3.3).
introducing a string tainting mechanism that allows strings containing surrogate escaped bytes to be tagged with their encoding assumption and information about where the assumption was introduced. Attempting to process strings with incompatible encoding assumptions would then report both the incompatible assumptions and where they were introduced.
creating a “strview” type that uses memoryview to provide a str-like interface to arbitrary binary buffers containing ASCII compatible protocol data.
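For reference, a sketch of the current bytes and bytearray behaviours those proposals are concerned with. It does not implement any of the proposals, and the values are illustrative:

```python
data = b"GET"
first_item = data[0]             # indexing yields an int, not a length-1 bytes
first_slice = data[:1]           # slicing yields bytes
as_ints = list(data)             # iteration yields ints as well

zero_filled = bytes(3)           # an int argument means "this many zero bytes"

mutable = bytearray(b"abc")
mutable[0] = ord("A")            # items are assigned as ints

# memoryview slices share the underlying buffer without copying
view = memoryview(b"Host: example.org")
header_name = bytes(view[:4]).decode("ascii")
```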
The process of developing and updating standards can be slow, frustrating and often acrimonious. One of the key milestones in enabling Python 3 adoption was when the web framework developers and web server developers were able to agree on an updated WSGI 1.1 specification that at least makes it possible to write WSGI applications, frameworks and middleware that support Python 2 and Python 3 from a single source code base, even though it isn’t necessarily easy to do so correctly.
In particular, the Python 2 str type was particularly well suited to handling the “data in unknown ASCII compatible encoding” that is common in web protocols, and included in the data passed through from the web server to the application (and vice versa). At this point in time (March 2014), nobody has created a type for Python 3 that is similarly well suited to manipulating ASCII compatible binary protocol data. There certainly wasn’t any such type available for consideration when WSGI 1.1 was standardised in October 2010.
As a result, the “least bad” option chosen for those fields in the Python 3 version of the WSGI protocol was to publish them to the web application as latin-1 decoded strings. This means that applications need to treat these fields as wire protocol data (even though they claim to be text based on their type), encode them back to bytes as latin-1 and then decode them again using the correct encoding (as indicated by other metadata).
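The round trip described above looks like this in practice. A sketch, assuming a hypothetical UTF-8 request path and an environ dictionary built the way PEP 3333 specifies for native strings (real_path is an illustrative helper, not part of any framework):

```python
# Hypothetical raw request path as received on the wire (UTF-8 encoded)
raw_path = "/café".encode("utf-8")

# What a Python 3 WSGI server places in environ: the bytes decoded as latin-1
environ = {"PATH_INFO": raw_path.decode("latin-1")}

def real_path(environ, encoding="utf-8"):
    # latin-1 maps each byte 0-255 to exactly one code point, so this is lossless
    wire_bytes = environ["PATH_INFO"].encode("latin-1")
    return wire_bytes.decode(encoding)

mojibake = environ["PATH_INFO"]    # looks like text, but isn't the real path
path = real_path(environ)
```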
The WSGI 1.1 spec is definitely a case of a “good enough” solution winning a battle of attrition. I’m actually hugely appreciative of the web development folks that put their time and energy both into creating the WSGI 1.1 specification and into updating their tools to support it. Like the Python core developers, most of the web development folks weren’t in a position to use Python 3 professionally during the early years of its development, but unlike most of the core developers, the kind of code they write falls squarely into the ASCII compatible binary protocol space where Python 3 still had some significant ground to make up relative to Python 2 in terms of usability (although we’ve also converted our share of such code, just in bringing the standard library up to scratch).
Note
This answer was written for Python 3.5. See PEP 538 and PEP 540 for discussion of some key changes now being considered for Python 3.7.
The fact that the Python 2 text model was essentially the POSIX text modelwith Unicode support bolted on to the side meant that interoperabilitybetween Python 2 and even misconfigured POSIX systems was generally quitestraightforward - if the implicit decoding as ASCII never triggered (whichwas likely for code that only included 8-bit strings and never explicitlydecoded anything as Unicode), non-ASCII data would silently pass throughunmodified.
One option we considered was to just assume everything was UTF-8 by default, similar to the choice made by the Windows .NET platform, the GNOME GUI toolkit and other systems. However, we decided that posed an unacceptable risk of silently corrupting users’ data on systems that were properly configured to use an encoding other than UTF-8 (this concern was raised primarily by contributors based in Europe and Asia).
This was a deliberate choice of attempting to be compatible with other software on the end user’s system at the cost of increased sensitivity to configuration errors in the environment and differences in default behaviour between environments with different configurations. There are also current technical limitations in the reference interpreter’s startup code that force us to rely on the locale encoding claimed by the operating system on POSIX systems.
PEP 383 added the surrogateescape error handler to cope with the fact that the configuration settings on POSIX systems aren’t always a reliable guide to the actual encoding of the data you encounter. One of the most common causes of problems is the seriously broken default encoding for the default locale in POSIX (due to the age of the ANSI C spec where that default is defined, that default is ASCII rather than UTF-8). Bad default environments and environment forwarding in ssh sessions are another source of problems, since an environment forwarded from a client is not a reliable guide to the server configuration, and if the ssh environment defaults to the C/POSIX locale, it will tell Python 3 to use ASCII as the default encoding rather than something more appropriate.
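A minimal illustration of what surrogateescape does, assuming the data is really UTF-8 but the environment has (wrongly) claimed ASCII:

```python
# Bytes that are really UTF-8, arriving in an environment that claims ASCII.
raw = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Strict ASCII decoding fails outright on the non-ASCII bytes.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# code point U+DCNN instead of raising, so the data can travel as str...
smuggled = raw.decode("ascii", errors="surrogateescape")

# ...and be turned back into the identical bytes on the way out.
restored = smuggled.encode("ascii", errors="surrogateescape")

# Once the real encoding is known, the original text can still be recovered.
recovered = restored.decode("utf-8")
```

The key property is the lossless round trip: nothing is guessed or replaced, the undecodable bytes are just carried along until something with better information can interpret them.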
When surrogateescape was added, we considered enabling it for every operating system interface by default (including file I/O), but the point was once again made that this idea posed serious risks for silent data corruption on Asian systems configured to use Shift-JIS, ISO-2022, or other ASCII-incompatible encodings (European users were generally in a safer position on this one, since Europe has substantially lower usage of ASCII incompatible codecs than Asia does).
This means we’ve been judiciously adding surrogateescape to interfaces as we decide the increase in convenience justifies any increased risk of data corruption. For Python 3.5, this is also being applied to sys.stdin and sys.stdout on POSIX systems that claim that we should be using ascii as the default encoding. Such a result almost certainly indicates a configuration error in the environment, but using ascii+surrogateescape in such cases should make for a more usable result than the current approach of ascii+strict. There’s still some risk of silent data corruption in the face of ASCII incompatible encodings, but the assumption is that systems that are configured with an ASCII incompatible encoding should already have relatively robust configurations that avoid ever relying on the default POSIX locale.
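A rough sketch of the difference between ascii+strict and ascii+surrogateescape on a text stream. It uses io.TextIOWrapper over an in-memory buffer as a stand-in for sys.stdin and sys.stdout (the real streams are configured by the interpreter at startup, not by code like this), and read_all() is a hypothetical helper:

```python
import io

# Stand-in for stream data whose locale claims ASCII while the bytes
# are actually UTF-8 encoded.
raw = "caf\u00e9".encode("utf-8")

def read_all(data, errors):
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", errors=errors)
    return stream.read()

# ascii + strict: reading the stream fails with UnicodeDecodeError.
try:
    read_all(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ascii + surrogateescape: the read succeeds...
text = read_all(raw, "surrogateescape")

# ...and writing the result back out reproduces the original bytes.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="ascii", errors="surrogateescape")
out.write(text)
out.flush()
passed_through = sink.getvalue()
```

This is the “more usable result” in practice: a filter that reads stdin and writes stdout passes the bytes through untouched instead of crashing on the first non-ASCII byte.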
This is an area where we’re genuinely open to the case being made for different defaults, or additional command line or environment variable configuration options. POSIX is just seriously broken in this space, and we’re having to trade off user convenience against the risk of silent data corruption - that means the “right answer” is not obvious, and any PEP proposing a change needs to properly account for the rationale behind the current decision (in particular, it has to account for the technical limitations in the startup code that create the coupling to the default locale encoding reported by the operating system, which may require a change on the scale of PEP 432 to actually fix properly).
The biggest change made specifically to ease migration from Python 2 was the reintroduction of Unicode literals in Python 3.3 (in PEP 414). This allows developers supporting both Python 2 and 3 in a single code base to easily distinguish binary literals, text literals and native strings, as b"binary" means bytes in Python 3 and str in Python 2, u"text" means str in Python 3.3+ and unicode in Python 2, while "native" means str in both Python 2 and 3.
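The three literal forms side by side. Running this under Python 3.3+ gives the Python 3 types, while the comments note what the same source means under Python 2:

```python
# All three literal forms are legal in Python 3.3+ (PEP 414) and in Python 2.
binary = b"binary"   # bytes in Python 3, str in Python 2
text = u"text"       # str in Python 3.3+, unicode in Python 2
native = "native"    # str in both: the "native" string type of each version
```

Because the u prefix is accepted again, single source code can mark its text literals explicitly without a runtime helper function.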
The restoration of binary interpolation support in Python 3.5 was designed in such a way as to also make a lot of 8-bit string interpolation operations in Python 2 code “just work” in Python 3.5+.
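A small example of the restored printf-style formatting on bytes (specified in PEP 461), using an HTTP style header line as an example of ASCII compatible protocol data:

```python
# printf-style % formatting on bytes, available again from Python 3.5.
header = b"%s: %d\r\n" % (b"Content-Length", 1024)

# %s on bytes accepts bytes-like objects (or objects defining __bytes__),
# but not str, so text still can't leak into binary data implicitly.
try:
    b"%s" % "text"
    rejects_text = False
except TypeError:
    rejects_text = True
```

The same expression also runs unchanged on Python 2, which is what makes this feature useful for code written in the common subset.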
A smaller change to simplify migration was the reintroduction of the non-text encoding codecs (like hex_codec) in Python 3.2, and the restoration of their convenience aliases (like hex) in Python 3.4. The codecs.encode and codecs.decode convenience functions allow them to be used in a single source code base (since those functions have been present and covered by the test suite since Python 2.4, even though they were only added to the documentation recently).
The WSGI update in PEP 3333 also standardised the Python 3 interface between web servers and frameworks, which is what allowed the web frameworks to start adding Python 3 support with the release of Python 3.2.
A number of standard library APIs that were originally either binary only or text only in Python 3 have also been updated to accept either type. In these cases, there is typically a requirement that the “alternative” type be strict 7-bit ASCII data - use cases that need anything more than that are expected to do their encoding or decoding at the application boundary rather than relying on the implicit encoding and decoding provided by the affected APIs. This is a concession in the Python 3 text model specifically designed to ease migration in “pure ASCII” environments - while relying on it can reintroduce the same kind of obscure data driven failures that are seen with the implicit encoding and decoding operations in Python 2, these APIs are at least unlikely to silently corrupt data streams (even in the presence of data encoded using a non-ASCII compatible encoding).
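urllib.parse is one module that behaves this way, chosen here purely as an illustration (the text above doesn’t name specific APIs): it accepts either str or bytes and returns results of the matching type, but implicitly decodes bytes as strict ASCII:

```python
from urllib.parse import urlsplit

# str in, str out...
parts = urlsplit("http://example.com/index.html")

# ...and bytes in, bytes out, as long as the bytes are pure ASCII.
raw_parts = urlsplit(b"http://example.com/index.html")

# Non-ASCII bytes fail loudly instead of being silently misinterpreted.
try:
    urlsplit(b"http://example.com/caf\xc3\xa9")
    non_ascii_rejected = False
except UnicodeDecodeError:
    non_ascii_rejected = True
```

The failure on non-ASCII input is the intended trade-off: an obscure data driven error is still possible, but silent corruption is not.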
The original migration guides unconditionally recommended running an application’s test suite using the -3 flag in Python 2.6 or 2.7 (to ensure no warnings were generated), and then using the 2to3 utility to perform a one-time conversion to Python 3.
That approach is still a reasonable choice for migrating a fully integrated application that can completely abandon Python 2 support at the time of the conversion, but is no longer considered a good option for migration of libraries, frameworks and applications that want to add Python 3 support without losing Python 2 support. The approach of running 2to3 automatically at install time is also no longer recommended, as it creates an undesirable discrepancy between the deployed code and the code in source control that makes it difficult to correctly interpret any reported tracebacks.
Instead, the preferred alternative in the latter case is now to create a single code base that can run under both Python 2 and 3. The six compatibility library can help with several aspects of that, and the python-modernize utility is designed to take existing code that supports older Python versions and update it to run in the large common subset of Python 2.6+ and Python 3.3+ (or 3.2+ if the unicode literal support in Python 3.3 isn’t needed).
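A minimal sketch of the kind of shim single source code relies on. This is a hand-rolled version of a common idiom (six packages up many shims like it, such as six.text_type); ensure_text() is a hypothetical helper name, not six’s API:

```python
# Hand-rolled compatibility shim for code that runs on Python 2 and 3.
try:
    text_type = unicode  # Python 2: the builtin text type
except NameError:
    text_type = str      # Python 3: str is already text

def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes if necessary (hypothetical helper)."""
    # On Python 2, bytes is an alias for str, so this check works on both.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)
```

The NameError check is used instead of inspecting sys.version_info so the shim keys off the feature actually needed rather than a version number.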
The “code modernisation” approach also has the advantage of being able to be done incrementally over several releases, as failures under Python 3 can be addressed progressively by modernising the relevant code, until eventually the code runs correctly under both versions. Another benefit of this incremental approach is that this modernisation activity can be undertaken even while waiting for other dependencies to add Python 3 support.
More recently, the python-future project was created to assist those developers that would like to primarily write Python 3 code, but would also like to support their software on Python 2 for the benefit of potential (or existing) users that are not themselves able to upgrade to Python 3.
The addition of the pylint --py3k flag was designed to make it easier for folks to ensure that code migrated to the common subset of Python 2 and Python 3 remained there rather than reintroducing Python 2 only constructs.
The landing page for the Python documentation was also switched some time ago to display the Python 3 documentation by default, although deep links still refer to the Python 2 documentation in order to preserve the accuracy of third party references (see PEP 430 for details).
Most of the changes designed to further simplify migration landed in Python 3.5.
One less obviously migration related aspect of those changes is that the new gradual typing system is designed to allow Python 2 applications to be type checked as if they were Python 3 applications, so many potential porting problems can be detected even if they’re not covered by tests, or the test suite can’t yet be run on Python 3.
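A sketch of the comment-based signature syntax PEP 484 describes for code that must still run on Python 2. A type checker such as mypy reads the # type: comment, while the interpreter ignores it; join_names() is just an illustrative function:

```python
from typing import List  # on Python 2 this came from the typing backport on PyPI

def join_names(names, separator=", "):
    # type: (List[str], str) -> str
    """Join names with a separator.

    The comment line above the docstring is the Python 2 compatible form
    of an annotated signature; it has no effect at runtime.
    """
    return separator.join(names)
```

Since the annotation lives in a comment, the same file stays valid Python 2 syntax while still giving the checker the information it needs.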
Cooperation between the major implementations (primarily CPython, PyPy, Jython, IronPython, but also a few others) has never been greater than it has been in recent years. The core development community that handles both the language definition and the CPython implementation includes representatives from all of those groups.
The language moratorium that severely limited the kinds of changes permitted in Python 3.2 was a direct result of that collaboration - it gave the other implementations breathing room to catch up to Python 2.7. That moratorium was only lifted for 3.3 with the agreement of the development leads for those other implementations. Significantly, one of the most disruptive aspects of the 3.x transition for CPython and PyPy (handling all text as Unicode data) was already the case for Jython and IronPython, as they use the string model of the underlying JVM and CLR platforms.
We have also instituted new guidelines for CPython development which require that new standard library additions be granted special dispensation if they are to be included as C extensions without an API compatible Python implementation.
Python 3 specifically introduced ResourceWarning, which alerts developers when they are relying on the garbage collector to clean up external resources like sockets. This warning is off by default, but switched on automatically by many test frameworks. The goal of this warning is to detect any cases where __del__ is being used to clean up a resource, such as a file or socket or database connection. Such cases are then updated to use either explicit resource management (via a with or try statement) or else switched over to weakref if non-deterministic clean-up is considered appropriate (the latter is quite rare in the standard library). The aim of this effort is specifically to ensure that the entire standard library will run correctly on Python implementations that don’t use refcounting for object lifecycle management.
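A sketch of the two replacement styles described above: a with statement for deterministic clean-up, and weakref.finalize (one weakref based option, available since Python 3.4) for the rarer non-deterministic case. The Handle class and first_line() function are hypothetical examples:

```python
import os
import tempfile
import weakref

def first_line(path):
    # Deterministic clean-up: the file is closed when the with block exits,
    # whether or not the implementation uses reference counting.
    with open(path) as f:
        return f.readline()

class Handle:
    """Hypothetical resource wrapper used only for illustration."""

    def __init__(self, log):
        # weakref.finalize registers a clean-up callback that runs at most
        # once: on an explicit close(), or when the object is collected.
        # The callback must not refer to self, or it would keep self alive.
        self._finalizer = weakref.finalize(self, log.append, "released")

    def close(self):
        self._finalizer()

# Usage: write a scratch file, read it back, then remove it.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("first\nsecond\n")
line = first_line(path)
os.remove(path)

log = []
handle = Handle(log)
handle.close()
handle.close()  # a second call is a no-op: the finalizer has already run
```

Running a test suite with python -W error::ResourceWarning is one way to turn any remaining reliance on the garbage collector into a hard failure.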
Finally, Python 3.3 converted the bulk of the import system over to pure Python code so that all implementations can finally start sharing a common import implementation. Some work will be needed from each implementation to work out how to bootstrap that code into the running interpreter (this was one of the trickiest aspects for CPython), but once that hurdle is passed all future import changes should be supported with minimal additional effort.
All that said, there’s often a stark difference in the near term goals of the core development team and the developers for other implementations. Criticism of the Python 3 project has been somewhat vocal from a number of PyPy core developers, and that makes sense when you consider that one of the core aims of PyPy is to provide a better runtime for existing Python applications. However, despite those reservations, PyPy was still the first of the major alternative implementations to support Python 3 (with the initial release of their PyPy3 runtime in June 2014). The initial PyPy3 release targeted Python 3.2 compatibility, but the changes needed to catch up on subsequent Python 3 releases are relatively minor compared to the changes between Python 2 and Python 3, and the PyPy team received a funded development grant from Mozilla to bring PyPy3 at least up to Python 3.5 compatibility. Work also continues on another major compatibility project for PyPy, numpypy, which aims to integrate PyPy with the various components of the scientific Python stack.
Note
The info below on Jython and IronPython is currently quite dated. This section should also be updated to mention the new Python 3 only bytecode-focused implementations targeting the JVM (BeeWare’s VOC) and JavaScript runtimes (BeeWare’s Batavia).
Jython’s development efforts are currently still focused on getting their currently-in-beta Python 2.7 support to a full release, and there is also some significant work happening on JyNI (which, along the same lines as PyPy’s numpypy project, aims to allow the use of the scientific Python stack from the JVM).
The IronPython folks have started working on a Python 3 compatible version, but there currently isn’t a target date for a release. IronClad already supports the use of scientific libraries from IronPython.
One interesting point to note for Jython and IronPython is that the changes to the Python 3 text model bring it more into line with the text models of the JVM and the CLR. This may mean that projects updated to run in the common subset of Python 2 and 3 will be more likely to run correctly on Jython and IronPython, and once they implement Python 3 support, the compatibility of Python 3 only modules should be even better.
We’re well aware of this concern, and have taken what steps we can to mitigate it.
First and foremost is the extended maintenance period for the Python 2.7 release. We knew it would take some time before the Python 3 ecosystem caught up to the Python 2 ecosystem in terms of real world usability, hence the extended maintenance period on 2.7 to ensure it continues to build and run on new platforms. While python-dev maintenance of 2.7 was originally slated to revert to security-fix only mode in July 2015, Guido extended that out to 2020 at PyCon 2014. We’re now working with commercial redistributors to help ensure the appropriate resources are put in place to actually meet that commitment. In addition to the ongoing support from the core development team, 2.6 will still be supported by enterprise Linux vendors until at least 2020, while Python 2.7 will be supported until at least 2024.
We have also implemented various mechanisms which are designed to ease the transition from Python 2 to Python 3. The -3 command line switch in Python 2.6 and 2.7 makes it possible to check for cases where code is going to change behaviour in Python 3 and update it accordingly.
The automated 2to3 code translator can handle many of the mechanical changes in updating a code base, and the python-modernize variant performs a similar translation that targets the (large) common subset of Python 2.6+ and Python 3 with the aid of the six compatibility module, while python-future does something similar with its futurize utility.
PEP 414 was implemented in Python 3.3 to restore support for explicit Unicode literals, primarily to reduce the number of purely mechanical code changes being imposed on users that are doing the right thing in Python 2 and using Unicode for their text handling.
One outcome of some of the discussions at PyCon 2014 was the pylint --py3k option to help make it easier for folks to migrate software incrementally and opportunistically, first switching to the common subset running on Python 2.7, before migrating to the common subset on Python 3.
So far we’ve managed to walk the line by persuading our Python 2 users that we aren’t going to leave them in the lurch when it comes to appropriate platform support for the Python 2.7 series, thus allowing them to perform the migration on their own schedule as their dependencies become available, while doing what we can to ease the migration process so that following our lead remains the path of least resistance for the future evolution of the Python ecosystem.
PEP 404 (yes, the choice of PEP number is deliberate - it was too good an opportunity to pass up) was created to make it crystal clear that python-dev has no intention of creating a 2.8 release that backports 2.x compatible features from the 3.x series. After you make it through the opening Monty Python references, you’ll find the explanation that makes it unlikely that anyone else will take advantage of the “right to fork” implied by Python’s liberal licensing model: we had very good reasons for going ahead with the creation of Python 3, and very good reasons for discontinuing the Python 2 series. We didn’t decide to disrupt an entire community of developers just for the hell of it - we did it because there was a core problem in the language design, and a backwards compatibility break was the only way we could find to solve it once and for all.
For the inverse question relating to the concern that the existing migration plan is too conservative, see “But uptake is so slow, doesn’t this mean Python 3 is failing as a platform?”
With the Python 2.7 sunset date in the past, both the Debian/Ubuntu and Fedora/RHEL/CentOS ecosystems well advanced in their migration plans, public cloud providers offering Python 3 in addition to Python 2, major commercial end users like Facebook, Google and Dropbox migrating, and the PSF’s own major services like python.org and the Python Package Index switching to Python 3, the short answer here is “That’s not going to happen”.
While a crash in general Python adoption might have made us change our minds, Python ended up working its way into more and more niches despite the Python 3 transition, so the only case that could be made is “adoption would be growing even faster without Python 3 in the picture”, which is a hard statement to prove (particularly when we suspect that at least some of the growth in countries where English is not the primary spoken language is likely to be because of Python 3 rather than in spite of it, and that the Python 3 text model is in a much better position to serve as a bridge between the POSIX text model and the JVM and CLR text models than the Python 2 model ever was).
Another scenario that would have made us seriously question our current strategy is if professional educators had told us that Python 2 was a better teaching language, but that didn’t happen - they’re amongst Python 3’s more vocal advocates, encouraging the rest of the community to “just upgrade already”.
In a word: no. In several words: maybe, but at such a high cost that the core development team consider it a much better idea to invest that effort in improving Python 3, migration tools and helping to port libraries and applications (hence why credible contributors can apply to the PSF for a grant to help port key libraries to Python 3, but PSF funding isn’t available for a Python 2.8 release).
The rationale for this proposal appears to be that if backporting Python 3 changes to Python 2.6 and 2.7 was a good idea to help Python 3 adoption, then continuing to do so with a new Python 2.8 release would also be a good idea.
What this misses is that those releases were made during a period when the core development team was still in the process of ensuring that Python 3 was in a position to stand on its own as a viable development platform. We didn’t want conservative users that were currently happy with Python 2 to migrate at that point, as we were still working out various details to get it back to feature parity with Python 2. One of the most notable of those was getting a usable WSGI specification back in 3.2, and another was the restoration of Unicode literals in 3.3 to help with migration from Python 2.
If we hadn’t considered Python 3.2 to be at least back to parity with Python 2.7, that is when we would have decided to continue on to do a Python 2.8 release. We’re even less inclined to do so now that Python 3 has several additional years of feature development under its belt relative to the Python 2 series.
There are parts of the Python 3 standard library that are also useful in Python 2. In those cases, they’re frequently available as backports on the Python Package Index (including even a backport of the new asynchronous IO infrastructure).
There are also various language level changes that are backwards compatible with Python 2.7, and the Tauthon project was started specifically to create a hybrid runtime implementation that expanded the “common subset” of Python 2 & 3 to include those additional features.
However, I think a key point that is often missed in these discussions is that the adoption cycles for new versions of the core Python runtime have always been measured in years due to the impact of stable platforms like Red Hat Enterprise Linux.
Consider the following map of RHEL/CentOS versions to Python versions (release date given is the Python release date, and Python 2.5 was skipped because it was released in September 2006, not long before RHEL5 was published):
4 = 2.3 (first released July 2003)
5 = 2.4 (first released November 2004)
6 = 2.6 (first released October 2008)
7 = 2.7 (first released July 2010)
Now consider these Twisted compatibility requirements (going by the modification dates on the tagged INSTALL file):
10.0 dropped Python 2.3 in March 2010
10.2 dropped Python 2.4 (Windows) in November 2010
12.0 dropped Python 2.4 (non-Windows) in February 2012
12.2 dropped Python 2.5 in August 2012
15.4 dropped Python 2.6 in September 2015
Python 2.6 compatibility was still required almost 7 years after its original release, and didn’t get dropped until well after the first CentOS 7 release was available (not to mention the earlier release of a Python 2.7 SCL).
I believe Twisted has one of the most conservative user bases in the Python community, and I consider this one of the main reasons we see this general pattern of only dropping support for an older release 6-7 years after it was first made available. That’s also why I considered the Twisted developers a key audience for any increases in the scope of single source support in Python 3.5 (and their support for the idea was certainly one of the factors behind the planned return of binary interpolation support).
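A small worked check of the support windows, using only the month-level dates from the two lists above (the day of the month is set to the 1st as a placeholder, and the non-Windows drop date is used for Python 2.4). The September 2006 release month for Python 2.5 comes from the note on the RHEL list:

```python
from datetime import date

# Month-level dates from the lists above; the day of month is a placeholder.
python_released = {
    "2.3": date(2003, 7, 1),
    "2.4": date(2004, 11, 1),
    "2.5": date(2006, 9, 1),
    "2.6": date(2008, 10, 1),
}
twisted_dropped = {
    "2.3": date(2010, 3, 1),   # Twisted 10.0
    "2.4": date(2012, 2, 1),   # Twisted 12.0 (non-Windows)
    "2.5": date(2012, 8, 1),   # Twisted 12.2
    "2.6": date(2015, 9, 1),   # Twisted 15.4
}

def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# Map each Python version to its support window as (years, months).
support_window = {
    version: divmod(months_between(released, twisted_dropped[version]), 12)
    for version, released in python_released.items()
}
```

This gives 6 years 8 months for 2.3, 7 years 3 months for 2.4, 5 years 11 months for 2.5 and 6 years 11 months for 2.6. The one noticeably shorter window is Python 2.5, the release that RHEL skipped.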
That’s the way the path to Python 3 will be smoothed at this point: by identifying blockers to migration and knocking them down, one by one. The PSF has helped fund the migration of some key libraries. Barry Warsaw drove a fair amount of Python 3 migration work for Ubuntu at Canonical. Victor Stinner is working hard to encourage and support the OpenStack migration. I have been offering advice and encouragement to Bohuslav Kabrda (the main instigator of Fedora’s migration to Python 3), Petr Viktorin, and other members of Red Hat’s Python maintenance team, as well as helping out with Fedora policy recommendations on supporting parallel Python 2 and 3 stacks (I have actually had very little to do with Red Hat’s efforts to support Python 3 overall, as I haven’t needed to; things like Python 3 support in Red Hat Software Collections and OpenShift Online happened because other folks at Red Hat made sure they happened). Guido approved the restoration of Unicode literal support after web framework developers realised they couldn’t mask that particular change for their users, and he has also approved the restoration of binary interpolation support. I went through and made the binary transform codecs that had been restored in Python 3.2 easier to discover and use effectively in Python 3.4. R. David Murray put in a lot of time and effort to actually handle Unicode sensibly in the email module, Brett Cannon has been updating the official migration guide based on community feedback, etc., etc. (I’m sure I’m missing a bunch of other relevant changes).
Outside of CPython and its documentation, Benjamin Peterson published six, Lennart Regebro put together his excellent guide for porting, Armin Ronacher created python-modernize and Ed Schofield created python-future. Multiple folks have contributed patches to a wide variety of projects to allow them to add Python 3 support.
Certainly - a change of this magnitude is sufficiently disruptive that many members of the Python community are legitimately upset at the impact it has had on them.
This is particularly the case for users that had never personally been bitten by the broken Python 2 Unicode model, either because they work in an environment where almost all data is encoded as ASCII text (increasingly uncommon, but still not all that unusual in English speaking countries) or else in an environment where the appropriate infrastructure is in place to deal with the problem even in Python 2 (for example, web frameworks hide most of the problems with the Python 2 approach from their users).
Another category of users are upset that we chose to stop adding new features to the Python 2 series, and have been quite emphatic that attempts to backport features (other than via PyPI modules like unittest2, contextlib2 and configparser) are unlikely to receive significant support from python-dev. As long as they don’t attempt to present themselves as providing official Python releases, we’re not opposed to such efforts - it’s merely the case that (outside a few specific exceptions like PEP 466) we aren’t interested in doing them ourselves, and are unlikely to devote significant amounts of time to assisting those that are interested.
A third category of user negatively affected by the change are those users that deal regularly with binary data formats and had mastered the idiosyncrasies of the Python 2 text model to the point where writing correct code using that model was effortless. The kinds of hybrid binary-or-text APIs that the str type made easy in Python 2 can be relatively awkward to write and maintain in Python 3 (or in the common subset of the two languages). While native Python 3 code can generally simply avoid defining such APIs in the first place, developers porting libraries and frameworks from Python 2 generally have little choice, as they have to continue to support both styles of usage in order to allow their users to effectively port to Python 3.
However, we have done everything we can to make migrating to Python 3 the easiest exit strategy for Python 2, and provided a fairly leisurely time frame for the user community to make the transition. Full maintenance of Python 2.7 has now been extended to 2020, source only security releases may continue for some time after that, and, as noted above, I expect enterprise Linux vendors and other commercial Python redistributors to continue to provide paid support for some time after community support ends.
Essentially, the choices we have set up for Python 2 users that find Python 3 features that are technically backwards compatible with Python 2 attractive are:
Live without the features for the moment and continue to use Python 2.7
For standard library modules/features, use a backported version from PyPI(or create a backport if one doesn’t already exist and the module doesn’trely specifically on Python 3 only language features)
Migrate to Python 3 themselves
Fork Python 2 to add the missing features for their own benefit
Migrate to a language other than Python
The first three of those approaches are all fully supported by python-dev. Many standard library additions in Python 3 started as modules on PyPI and thus remain available to Python 2 users. For other cases, such as unittest or configparser, the respective standard library maintainer also maintains a PyPI backport.
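The usual way to consume such a backport from single source code is an import fallback. A sketch using contextlib2 (the PyPI backport of contextlib) for ExitStack, which joined the standard library in Python 3.3:

```python
# Prefer the standard library version, and fall back to the PyPI backport
# only on interpreters whose standard library lacks it.
try:
    from contextlib import ExitStack
except ImportError:  # e.g. Python 2.7 with contextlib2 installed
    from contextlib2 import ExitStack
```

Trying the standard library first means the backport becomes a no-op dependency once a project drops support for older versions.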
The fourth choice exists as the Tauthon project, so it will be interesting to see if that gains significant traction with developers and platform providers.
The final choice would be unfortunate, but we’ve done what we can to makethe other alternatives (especially the first three) more attractive.
Again, many of us in core development are aware of this concern, and have been taking active steps to ensure that even the most risk averse enterprise users can feel comfortable in adopting Python for their development stack, despite the current transition.
Obviously, much of the content in the answers above regarding the viability of Python 2 as a development platform, with a clear future migration path to Python 3, is aimed at enterprise users. Government agencies and large companies are the environments where risk management tends to come to the fore, as the organisation has something to lose. The start-up and open source folks are far more likely to complain that the pace of Python core development is too slow.
The main change to improve the perceived stability of Python 3 is that we’ve started making greater use of the idea of “documented deprecation”. This is exactly what it says: a pointer in the documentation to say that a particular interface has been replaced by an alternative we consider superior that should be used in preference for new code. We have no plans to remove any of these APIs from Python - they work, there’s nothing fundamentally wrong with them, there is just an updated alternative that was deemed appropriate for inclusion in the standard library.
Programmatic deprecation is now reserved for cases where an API or feature is considered so fundamentally flawed that using it is very likely to cause bugs in user code. An example of this is the deeply flawed contextlib.nested API which encouraged a programming style that would fail to correctly close resources on failure. For Python 3.3, it was finally replaced with a superior incremental contextlib.ExitStack API which supports similar functionality without being anywhere near as error prone.
Secondly, code level deprecation warnings are now silenced by default. The expectation is that test frameworks and test suites will enable them (so developers can fix them), while they won’t be readily visible to end users of applications that happen to be written in Python. (This change can actually cause problems with ad hoc user scripts breaking when upgrading to a newer version of Python, but the longevity of Python 2.7 actually works in our favour on that front.)
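A sketch of the split described above: the library emits a DeprecationWarning, and the test harness, not the end user, turns it into something visible. old_api() is a hypothetical deprecated function:

```python
import warnings

def old_api():
    """Hypothetical programmatically deprecated function."""
    # stacklevel=2 attributes the warning to the caller's line.
    warnings.warn("old_api() is deprecated; use new_api()",
                  DeprecationWarning, stacklevel=2)
    return 42

# Roughly what a test framework does: make deprecation warnings visible
# (here, by recording them) for the duration of the tests.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = old_api()
```

Outside a test runner, python -W default::DeprecationWarning (or -W error::DeprecationWarning to turn them into exceptions) makes the same warnings visible for a whole process.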
Finally, and somewhat paradoxically, the introduction of provisional APIs in Python 3 is a feature largely for the benefit of enterprise users. This is a documentation marker that allows us to flag particular APIs as potentially unstable. It grants us a full release cycle (or more) to ensure that an API design doesn’t contain any nasty usability traps before declaring it ready for use in environments that require rock solid backwards compatibility guarantees.
Technically, even the core developers weren’t consulted: Python 3 happened because the creator of the language, Guido van Rossum, wanted it to happen, and Google paid for him to devote half of his working hours to leading the development effort.
In practice, Guido consults extensively with the other core developers, and if he can’t persuade even us that something is a good idea, he’s likely to back down. In the case of Python 3, though, it is our collective opinion that the problems with Unicode in Python 2 are substantial enough to justify a backwards compatibility break in order to address them, and that continuing to maintain both versions in parallel indefinitely would not be a good use of limited development resources.
We as a group also continue to consult extensively with the authors of other Python implementations, authors of key third party frameworks, libraries and applications, our own colleagues and other associates, employees of key vendors, Python trainers, attendees at Python conferences, and, well, just about anyone that cares enough to sign up to the python-dev or python-ideas mailing lists or add their Python-related blog to the Planet Python feed, or simply discuss Python on the internet such that the feedback eventually makes its way back to a place where we see it.
Some notable changes within the Python 3 series, specificallyPEP 3333(which updated the Web Server Gateway Interface to cope with the Python 3text model) andPEP 414 (which restored support for explicit Unicodeliterals) have been driven primarily by the expressed needs of the webdevelopment community in order to make Python 3 better meet their needs.
The restoration of binary interpolation support in Python 3.5 is similarlyintended to increase the size of the common subset of Python 2 and Python 3in a way that makes it easier for developers to migrate to the new versionof the language (as well as being a useful new feature for Python 3 in itsown right).
If you want to keep track of Python’s development and get some idea of what’s coming down the pipe in the future, it’s all available on the internet.
One previously popular approach to saying why Python 2 should be used over Python 3 even for new projects was to appeal to the authority of someone like Armin Ronacher (creator of Jinja2, Flask, Click, etc.) or Greg Wilson (creator of Software Carpentry).
The piece missing from that puzzle is the fact that Guido van Rossum, the creator of Python, and every core developer of CPython, have not only been persuaded that the disruption posed by the Python 3 transition is worth the effort, but have been busily adding the features we notice missing from both Python 2 and 3 solely to the Python 3 series since the feature freeze for Python 2.7 back in 2010.
Where’s the disconnect? Well, it arises in a couple of ways. Firstly, when creating Python 3, we deliberately made it worse than Python 2 in particular areas. That sounds like a ridiculous thing for a language design team to do, but programming language design is a matter of making trade-offs, and if you try to optimise for everything at once, you’ll end up with an unreadable mess that isn’t optimised for anything. In many of those cases, we were trading problems we considered unfixable for ones that could at least be solved in theory, even if they haven’t been solved yet.
In Armin’s case, the disconnect was that his primary interest is in writing server components for POSIX systems, and cross-platform command-line clients for those applications. This runs into issues, because Python 3’s operating system integration could get confused in a few situations:
on POSIX systems (other than Mac OS X), in the default C locale
on POSIX systems (other than Mac OS X), when ssh environment forwarding configures a server session with the client locale and the client and server have differing locale settings
at the Windows command line
These problems arise because, where Python 2 decodes from 8-bit data to Unicode text lazily at operating system boundaries, Python 3 does so eagerly. This change was made to better accommodate Windows systems (where the 8-bit APIs use the mbcs codec, rendering them effectively useless), but came at the cost of being more reliant on receiving correct encoding and decoding advice from the operating system. Operating systems are normally pretty good about providing that info, but they fail hard in the above scenarios.
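To make the eager decoding concrete, here is a minimal sketch using only the standard library. `report_os_text_settings` shows what encoding advice the running interpreter received, while the hypothetical `decode_os_bytes` and `encode_os_text` helpers mimic, under the assumption of a UTF-8 locale, how CPython on POSIX copes with filenames it cannot decode: the `surrogateescape` error handler (PEP 383) smuggles stray bytes through as lone surrogates so they can be restored on the way back out.

```python
import locale
import sys


def report_os_text_settings() -> dict:
    """Show the encoding advice this interpreter received from the OS."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def decode_os_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    # Eager decoding: bytes become str immediately. Undecodable bytes are
    # carried through as lone surrogates instead of raising an error.
    return raw.decode(encoding, "surrogateescape")


def encode_os_text(text: str, encoding: str = "utf-8") -> bytes:
    # The reverse trip restores the original bytes exactly.
    return text.encode(encoding, "surrogateescape")


print(report_os_text_settings())
name = decode_os_bytes(b"caf\xe9.txt")  # latin-1 bytes read as if UTF-8
print(ascii(name))
print(encode_os_text(name))
```

When the advertised encoding is wrong, the round trip still works, but the decoded `str` contains surrogate escapes rather than readable text, which is the kind of confusion described above.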
In almost purely English environments, none of this causes any problems, just as the Unicode handling defects in Python 2 tend not to cause problems in such environments. In the presence of non-English text, however, we had to decide between cross-platform consistency (i.e. assuming UTF-8 everywhere) and attempting to integrate correctly with the encoding assumptions of other applications on the same system. We opted for the latter approach, primarily due to the dominance of ASCII incompatible encodings in East Asian countries (ShiftJIS, ISO-2022, GB-18030, various CJK codecs, etc). For ordinary user space applications, including the IPython Notebook, this already works fine. For other code, we’re now working through the process of assuming UTF-8 as the default binary encoding when the operating system presents us with dubious encoding recommendations (that will be a far more viable assumption in 2016 than it was in 2008).
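The “assume UTF-8 when the advice looks dubious” policy can be sketched as a small decision function. This is a hypothetical illustration of the idea, not CPython’s actual logic (CPython 3.7 later tackled the C locale case through PEP 538 and PEP 540): it treats a locale that advertises plain ASCII, as the default C locale typically does, as suspect, while trusting genuine locale encodings such as Shift_JIS or GB18030.

```python
import codecs
from typing import Optional

# Hypothetical policy: codec names treated as "dubious" advice from the OS.
# Plain ASCII is what the default C locale usually reports on POSIX.
DUBIOUS_ENCODINGS = {"ascii"}


def choose_text_encoding(advertised: Optional[str]) -> str:
    """Pick the codec to use for OS-level text, preferring UTF-8 over
    advice that is missing, unknown or suspiciously minimal."""
    if not advertised:
        return "utf-8"
    try:
        # Normalise aliases such as "ANSI_X3.4-1968" or "Shift_JIS"
        canonical = codecs.lookup(advertised).name
    except LookupError:
        return "utf-8"  # unknown codec name: fall back rather than fail
    if canonical in DUBIOUS_ENCODINGS:
        return "utf-8"
    # Trust genuine locale encodings, e.g. shift_jis or gb18030
    return canonical


for advice in ["ANSI_X3.4-1968", "Shift_JIS", "GB18030", None]:
    print(advice, "->", choose_text_encoding(advice))
```

Keeping the locale’s encoding whenever it is a real, non-ASCII codec is what preserves integration with other applications on East Asian systems; only the clearly unhelpful cases are overridden.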
For anyone that would like to use Python 3, but is concerned by Armin Ronacher’s comments, the best advice I can offer is to use his libraries to avoid those problems. Seriously, the guy’s brilliant - you’re unlikely to go seriously wrong in deciding to use his stuff when it applies to your problems. It offers a fine developer experience, regardless of which version of Python you’re using. His complaints are about the fact that writing those libraries became more difficult in Python 3 in some respects, but he gained the insight needed to comprehensively document those concerns the hard way: by porting his code. His feedback on the topic was cogent and constructive enough that it was cited as one of the reasons he received a Python Software Foundation Community Service Award in October 2014.
The complaints from the Software Carpentry folks (specifically Greg Wilson) were different. Those were more about the fact that we hadn’t done a very good job of explaining the problems that the Python 3 transition was designed to fix. This is an example of something Greg himself calls “the curse of knowledge”: experts don’t necessarily know what other people don’t know. In our case, we thought we were fixing bugs that tripped up everyone. In reality, what we were doing was fixing things that we thought were still too hard, even with years (or decades in some cases) of Python experience. We’d waste memory creating lists that we then just iterated over and threw away, we’d get our Unicode handling wrong so our applications broke on Windows narrow builds (or just plain broke the first time they encountered a non-ASCII character or text in multiple encodings), we’d lose rare exception details because we had a latent defect in an error handler. We baked fixes for all of those problems (and more) directly into the design of Python 3, and then became confused when other Python users tried to tell us Python 2 wasn’t broken and they didn’t see what Python 3 had to offer them. So we’re now in a position where we’re having to unpack years (or decades) of experience with Python 2 to explain why we decided to put that into long term maintenance mode and switch our feature development efforts to Python 3 instead.
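Two of those Python 2 pitfalls can be seen directly in a Python 3 interpreter. In this sketch, `parse_port` is a hypothetical function whose error handler contains a latent bug: Python 3’s implicit exception chaining keeps the original `ValueError` on the new exception’s `__context__`, where Python 2 would have discarded it. The last lines show that `range()` no longer builds a list just to be iterated over.

```python
import sys


def parse_port(text: str) -> int:
    try:
        return int(text)
    except ValueError:
        # Hypothetical latent defect: the error handler itself is broken
        # and raises KeyError while handling the ValueError.
        defaults = {}
        return defaults["port"]


def capture_failure(text: str) -> BaseException:
    try:
        parse_port(text)
    except KeyError as exc:
        return exc
    raise AssertionError("parse_port was expected to fail")


err = capture_failure("not-a-number")
# Implicit chaining: the original ValueError survives as __context__
print(type(err).__name__, "while handling", type(err.__context__).__name__)

# range() is lazy: no billion-element list is built just to be iterated
numbers = range(10**9)
print(sys.getsizeof(numbers) < 1000)
```

An uncaught chained exception is also shown in the traceback under “During handling of the above exception, another exception occurred”, so the rare original failure is no longer lost.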
Having heard Greg speak on this, I’m actually really excited when he says that Python 3 is no harder to learn than Python 2 for English speakers, as we took some of the more advanced concepts from Python 2 and made them no longer optional when designing Python 3. The Python 3 “Hello World!” now introduces users to string literals, builtins, function calls and expression statements, rather than just to string literals and a single dedicated print statement. Iterators arrive much earlier in the curriculum than they used to, as does Unicode. The chained exceptions that are essential for improving the experience of debugging obscure production failures can present some readability challenges for new users. If we’ve managed to front load all of that hard earned experience into the base design of the language and the end result is “just as easy to learn as Python 2”, then I’m happy with that. It means we were wrong when we thought we were making those changes for the benefit of beginners - it turns out English speaking beginners aren’t at a point where the issues we addressed are even on their radar as possible problems. But Greg’s feedback now suggests to me that we have actually succeeded in removing some of the barriers between competence and mastery, without harming the beginner experience. There are also other changes in Python 3, like the removal of the “__init__.py” requirement for package directories, the improvements to error messages when functions are called incorrectly, the inclusion of additional standard library modules like statistics, asyncio and ipaddress, the bundling of pip, and more automated configuration of Windows systems in the installer, that should genuinely improve the learning experience for new users.
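The removal of the `__init__.py` requirement (implicit namespace packages, PEP 420, Python 3.3+) can be checked with a throwaway directory. The package name `py3faq_demo` and its module contents are made up for the example; under Python 2 the same import would fail with `ImportError`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    package_dir = Path(root) / "py3faq_demo"  # hypothetical package name
    package_dir.mkdir()
    # Note: no __init__.py is created inside package_dir
    (package_dir / "hello.py").write_text('MESSAGE = "Hello World!"\n')

    sys.path.insert(0, root)
    importlib.invalidate_caches()  # make sure the new files are noticed
    try:
        hello = importlib.import_module("py3faq_demo.hello")
        message = hello.MESSAGE
    finally:
        sys.path.remove(root)

print(message)
```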
Greg’s also correct that any renaming of existing standard library functionality should be driven by objective user studies - we learned that the hard way by discovering that the name changes and rearrangements we did in the Python 3 transition based on our own intuition were largely an annoying waste of time, which modules like six and future now have to help folks moving from Python 2 to Python 3 cope with. However, we’re not exactly drowning in offers to do that research, so unless someone can figure out how to get it funded and executed, it isn’t going to happen any time soon. As soon as someone does figure that out, though, I look forward to seeing Python Enhancement Proposals backed specifically by research done to make the case for particular name changes, including assessments of the additional cognitive load imposed by students having to learn both the new names suggested by the usability research and the old names that will still have to be kept around for backwards compatibility reasons. In the meantime, we’ll continue with the much lower cost “use expert intuition and arguing on the internet to name new things, leave the names of existing things alone” approach. That low cost option almost certainly doesn’t find optimal names for features, but it does tend to find names that are good enough.
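The kind of shim that six and future package up looks roughly like this when written by hand: the same functionality lives under different names in the two versions, so code supporting both has to try one spelling and fall back to the other. The two modules shown are just examples of the Python 3 renames, and the section and option names are made up.

```python
try:  # Python 3 spellings
    import configparser
    from urllib.parse import urlencode
except ImportError:  # Python 2 spellings of the same functionality
    import ConfigParser as configparser
    from urllib import urlencode

parser = configparser.ConfigParser()
parser.add_section("search")            # hypothetical section name
parser.set("search", "query", "python 3")
print(urlencode({"q": parser.get("search", "query")}))
```

Every such rename multiplies into blocks like this across the ecosystem, which is why leaving existing names alone is usually the cheaper choice.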
The other piece that we’re really missing is feedback from folks teaching Python to users in languages other than English. Much of the design of Python 3 is aimed at working better with East Asian and African languages where there are no suitable 8-bit encodings - you really need the full power of Unicode to handle them correctly. With suitable library support, Python 2 can be made to handle those languages at the application level, but Python 3 aims to handle them at the language and interpreter level - Python shouldn’t fail just because a user is attempting to run it from their home directory and their name can’t be represented in latin-1 (or koi8-r, or some other 8-bit encoding). Similarly, naming a module in your native language shouldn’t mean that Python can’t import it, but in Python 2, module names (like all identifiers) are limited to the ASCII character set. Python 3 lifts the limitations on non-ASCII module names and identifiers in general, meaning that imposing such restrictions enters the domain of project-specific conventions that can be enforced with tools like pylint, rather than being an inherent limitation of the language itself.
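As a small illustration of PEP 3131 (non-ASCII identifiers, Python 3.0+), the names below are valid Python 3 but a syntax error in Python 2. The German function and the Japanese variable are made-up examples; whether a project allows such names is a matter for its own conventions and linters.

```python
def fläche(länge: float, breite: float) -> float:
    """Area of a rectangle, with German identifiers (hypothetical)."""
    return länge * breite


名前 = "Guido"  # a Japanese variable name meaning "name"

print(fläche(2.0, 3.5))
print("données".isidentifier(), "名前".isidentifier())
```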
With Eric Snow’s publication of his intent to investigate enhancing CPython’s existing subinterpreter model to provide native support for Communicating Sequential Processes based parallel execution, the discussion of Python’s multicore processing support that previously appeared here has been moved out to its own article.
Note
This answer was written for Python 3.5. While CPython 3.6 still doesn’t ship with a JIT compiler by default, it does ship with a dynamic frame evaluation hook that allows third party method JITs like Pyjion to be enabled at runtime.
This is another one of those changes that is significantly easier said than done - the problem is with the “just”, not the “add JIT compilation”. Armin Rigo (one of the smartest people I’ve had the pleasure of meeting) tried to provide one as an extension module (the psyco project) but eventually grew frustrated with working within CPython’s limitations and even the limitations of existing compiler technology, so he went off and invented an entirely new way of building language interpreters instead - that’s what the PyPy project is, a way of writing language interpreters that also gives you a tracing JIT compiler, almost for free.
However, while PyPy is an amazing platform for running Python applications, the extension module compatibility problems introduced by using a different memory management model (a tracing garbage collector rather than CPython’s reference counting) mean it isn’t yet quite as good as CPython as an orchestration system, so users in situations where their Python code isn’t the performance bottleneck stick with the simpler platform. That currently includes scientists, Linux vendors, Apple, cloud providers and so on and so forth. As discussed in the separate article on concurrency in Python, it seems entirely plausible to me that PyPy will eventually become the default application runtime for Python software, with CPython being used primarily as a tool for handling orchestration tasks and embedding in other applications, and only being used to run full applications if PyPy isn’t available for some reason. That’s going to take a while, though, as vendors are currently still wary of offering commercial support for PyPy, not through lack of technical merit, but simply because it represents an entirely new way of creating software and they’re not sure if they trust it yet. They’ll likely get over those reservations eventually, but it’s going to take time: as the CPython core development team have good reason to know, adoption of new platforms is a slow, complex business, especially when many users of the existing platform don’t experience the problem that the alternative version is aiming to solve.
While PyPy is a successful example of creating a new Python implementation with JIT compilation support (Jython and IronPython benefit from the JIT compilation support in the JVM and CLR respectively), the Unladen Swallow project came about when some engineers at Google made a second attempt at adding a JIT compiler directly to the CPython code base.
The Unladen Swallow team did have a couple of successes: they made several improvements to LLVM to make it more usable as a JIT compiler, and they put together an excellent set of Python macro benchmarks that are used by both PyPy and CPython for relative performance comparisons to this day. However, even though Guido gave in-principle approval for the idea, one thing they didn’t succeed at doing is adding implicit JIT compilation support directly to CPython.
The most recent attempt at adding JIT compilation to CPython is a project called Numba, and, similar to psyco, Numba doesn’t attempt to provide implicit JIT compilation of arbitrary Python code. Instead, you have to decorate the functions you would like accelerated. The advantage of this is that Numba doesn’t need to cope with the full dynamism of Python the way PyPy does - instead, it can tweak the semantics within the decorated functions to reduce the dynamic nature of the language a bit, allowing for simpler optimisation.
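The opt-in style can be sketched as follows. `numba.njit` is Numba’s decorator for compiling a function in its no-Python mode; since Numba is a third-party package, this sketch assumes it may be absent and falls back to a hypothetical no-op decorator, so the code still runs (without acceleration) under plain CPython.

```python
try:
    from numba import njit  # optional third-party dependency
except ImportError:
    def njit(func):
        # Hypothetical fallback: leave the function as plain Python
        return func


@njit
def sum_of_squares(n):
    # A simple numeric loop: the kind of code Numba can specialise,
    # because the types inside the decorated function stay predictable
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```

Only the decorated function is compiled; the rest of the program remains ordinary, fully dynamic Python, which is the trade-off that distinguishes this approach from PyPy’s implicit tracing JIT.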
Anyone that is genuinely interested in getting implicit JIT support into the default CPython implementation would do well to look into resurrecting the speed.python.org project. Modelled after the speed.pypy.org project (and using the same software), this project has foundered for lack of interested volunteers and leadership. It comes back to the problem noted above - if you’re using Python for orchestration, the Python code becoming a bottleneck is usually taken as indicating an architectural issue rather than the Python runtime being too slow.
The availability of PyPy limits the appeal of working on adding JIT compilation to CPython as a volunteer or sponsoring it as a commercial user even further - if all of the extensions an application needs are also available on PyPy, then it’s possible to just use that instead, and if they aren’t available, then porting them or creating alternatives with cffi or a pure Python implementation is likely to be seen as a more interesting and cost effective solution than attempting to add JIT compilation support to CPython.
I actually find it quite interesting - the same psychological and commercial factors that work against creating Python 2.8 and towards increasing adoption of Python 3 also work against adding JIT compilation support to CPython and towards increasing adoption of PyPy for application style workloads.
The suggestions that adding a new carrot like free threading or a JIT compiler to Python 3 would suddenly encourage users that are happy with Python 2 to migrate generally misunderstand the perspective of conservative users.
Early adopters are readily attracted by shiny new features - that’s what makes them early adopters. And we’re very grateful to the early adopters of Python 3 - without their interest and feedback, there’s no way the new version of the language would have matured as it has over the last several years.
However, the kinds of things that attract conservative users are very different - they’re not as attracted by shiny new features as they are by reliability and support. For these users, the question isn’t necessarily “Why would I start using Python 3?”, it is more likely to be “Why would I stop using Python 2?”.
The efforts of the first several years of Python 3 deployment were about positioning it to start crossing that gap between early adopters and more conservative users. Around 2014, those pieces started falling into place, especially as more enterprise Linux vendors brought supported Python 3 offerings to market.
This means that while conservative users that are already using Python are likely to stick with Python 2 for the time being (“if it isn’t broken for us, why change it?”), new conservative users will see a fully supported environment, and 3 is a higher number than 2, even if the ecosystem still has quite a bit of catching up to do (conservative users aren’t going to be downloading much directly from PyPI either - they often prefer to outsource that kind of filtering to software vendors rather than doing it themselves).