Porting Python 2 Code to Python 3¶
- author
Brett Cannon
Abstract
With Python 3 being the future of Python while Python 2 is still in active use, it is good to have your project available for both major releases of Python. This guide is meant to help you figure out how best to support both Python 2 & 3 simultaneously.
If you are looking to port an extension module instead of pure Python code, please see Porting Extension Modules to Python 3.
If you would like to read one core Python developer’s take on why Python 3 came into existence, you can read Nick Coghlan’s Python 3 Q & A or Brett Cannon’s Why Python 3 exists.
For help with porting, you can view the archived python-porting mailing list.
The Short Explanation¶
To make your project be single-source Python 2/3 compatible, the basic steps are:
1. Only worry about supporting Python 2.7
2. Make sure you have good test coverage (coverage.py can help; python -m pip install coverage)
3. Learn the differences between Python 2 & 3
4. Use Futurize (or Modernize) to update your code (e.g. python -m pip install future)
5. Use Pylint to help make sure you don’t regress on your Python 3 support (python -m pip install pylint)
6. Use caniusepython3 to find out which of your dependencies are blocking your use of Python 3 (python -m pip install caniusepython3)
7. Once your dependencies are no longer blocking you, use continuous integration to make sure you stay compatible with Python 2 & 3 (tox can help test against multiple versions of Python; python -m pip install tox)
8. Consider using optional static type checking to make sure your type usage works in both Python 2 & 3 (e.g. use mypy to check your typing under both Python 2 & Python 3; python -m pip install mypy).
Note: Using python -m pip install guarantees that the pip you invoke is the one installed for the Python currently in use, whether it be a system-wide pip or one installed within a virtual environment.
Details¶
A key point about supporting Python 2 & 3 simultaneously is that you can start today! Even if your dependencies are not supporting Python 3 yet, that does not mean you can’t modernize your code now to support Python 3. Most changes required to support Python 3 lead to cleaner code using newer practices even in Python 2 code.
Another key point is that modernizing your Python 2 code to also support Python 3 is largely automated for you. While you might have to make some API decisions thanks to Python 3 clarifying text data versus binary data, the lower-level work is now mostly done for you, and thus you can at least benefit from the automated changes immediately.
Keep those key points in mind while you read on about the details of porting your code to support Python 2 & 3 simultaneously.
Drop support for Python 2.6 and older¶
While you can make Python 2.5 work with Python 3, it is much easier if you only have to work with Python 2.7. If dropping Python 2.5 is not an option then the six project can help you support Python 2.5 & 3 simultaneously (python -m pip install six). Do realize, though, that nearly all the projects listed in this HOWTO will not be available to you.
If you are able to skip Python 2.5 and older, then the required changes to your code should continue to look and feel like idiomatic Python code. At worst you will have to use a function instead of a method in some instances or have to import a function instead of using a built-in one, but otherwise the overall transformation should not feel foreign to you.
But you should aim for only supporting Python 2.7. Python 2.6 is no longer freely supported and thus is not receiving bugfixes. This means you will have to work around any issues you come across with Python 2.6. There are also some tools mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint), and this will become more commonplace as time goes on. It will simply be easier for you if you only support the versions of Python that you have to support.
Make sure you specify the proper version support in your setup.py file¶
In your setup.py file you should have the proper trove classifier specifying what versions of Python you support. As your project does not support Python 3 yet you should at least have Programming Language :: Python :: 2 :: Only specified. Ideally you should also specify each major/minor version of Python that you do support, e.g. Programming Language :: Python :: 2.7.
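As a rough illustration, the classifiers for a project that still supports only Python 2.7 might be collected as shown below. The name CLASSIFIERS and the project name in the comment are hypothetical choices made for this sketch; the classifier strings are the ones named above plus the generic Python 2 classifier.

```python
# Sketch of the classifier list for a Python 2.7-only project.
# CLASSIFIERS is a hypothetical name chosen for this example.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2.7",
    "Programming Language :: Python :: 2 :: Only",
]

# In setup.py the list would then be handed to setup(), for example:
#     setup(name="myproject", classifiers=CLASSIFIERS)
```

Keeping the list in one place makes the later switch to Python 3 classifiers a small, reviewable change.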
Have good test coverage¶
Once you have your code supporting the oldest version of Python 2 you want it to, you will want to make sure your test suite has good coverage. A good rule of thumb is that you want to be confident enough in your test suite that any failures that appear after having tools rewrite your code are actual bugs in the tools and not in your code. If you want a number to aim for, try to get over 80% coverage (and don’t feel bad if you find it hard to get better than 90% coverage). If you don’t already have a tool to measure test coverage then coverage.py is recommended.
Learn the differences between Python 2 & 3¶
Once you have your code well-tested you are ready to begin porting your code to Python 3! But to fully understand how your code is going to change and what you want to look out for while you code, you will want to learn what changes Python 3 makes relative to Python 2. Typically the two best ways of doing that are reading the “What’s New” doc for each release of Python 3 and the Porting to Python 3 book (which is free online). There is also a handy cheat sheet from the Python-Future project.
Update your code¶
Once you feel like you know what is different in Python 3 compared to Python 2, it’s time to update your code! You have a choice between two tools in porting your code automatically: Futurize and Modernize. Which tool you choose will depend on how much like Python 3 you want your code to be. Futurize does its best to make Python 3 idioms and practices exist in Python 2, e.g. backporting the bytes type from Python 3 so that you have semantic parity between the major versions of Python. Modernize, on the other hand, is more conservative and targets a Python 2/3 subset of Python, directly relying on six to help provide compatibility. As Python 3 is the future, it might be best to consider Futurize to begin adjusting to any new practices that Python 3 introduces which you are not accustomed to yet.
Regardless of which tool you choose, they will update your code to run under Python 3 while staying compatible with the version of Python 2 you started with. Depending on how conservative you want to be, you may want to run the tool over your test suite first and visually inspect the diff to make sure the transformation is accurate. After you have transformed your test suite and verified that all the tests still pass as expected, then you can transform your application code knowing that any test which fails is a translation failure.
Unfortunately the tools can’t automate everything to make your code work under Python 3 and so there are a handful of things you will need to update manually to get full Python 3 support (which of these steps are necessary varies between the tools). Read the documentation for the tool you choose to use to see what it fixes by default and what it can do optionally to know what will (not) be fixed for you and what you may have to fix on your own (e.g. using io.open() over the built-in open() function is off by default in Modernize). Luckily, though, there are only a couple of things to watch out for which can be considered large issues that may be hard to debug if not watched for.
Division¶
In Python 3, 5 / 2 == 2.5 and not 2; all division between int values results in a float. This change has actually been planned since Python 2.2, which was released in 2001. Since then users have been encouraged to add from __future__ import division to any and all files which use the / and // operators or to be running the interpreter with the -Q flag. If you have not been doing this then you will need to go through your code and do two things:
1. Add from __future__ import division to your files
2. Update any division operator as necessary to either use // for floor division or continue using / and expect a float
The reason that / isn’t simply translated to // automatically is that if an object defines a __truediv__ method but not __floordiv__ then your code would begin to fail (e.g. a user-defined class that uses / to signify some operation but not // for the same thing or at all).
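The following is a small sketch of both points: the two operators applied to integers, and a class for which a blind rewrite of / to // would break. The class Ratio is hypothetical and exists only for this illustration. The values shown assume Python 3, or Python 2 with from __future__ import division added as the first statement of the file.

```python
# Sketch only. On Python 2 the line
#     from __future__ import division
# would have to be the first statement of the file; on Python 3 the
# behaviour shown here is already the default.


class Ratio(object):
    """Hypothetical class that gives / a meaning but does not define //."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        return Ratio(self.value / other)


true_result = 5 / 2     # true division, yields a float
floor_result = 5 // 2   # floor division, yields an int

halved = Ratio(10) / 4  # works through __truediv__

try:
    Ratio(10) // 4      # no __floordiv__ is defined, so this fails
    floor_supported = True
except TypeError:
    floor_supported = False
```

Because Ratio has no __floordiv__, the // expression raises TypeError, which is exactly the failure an automatic rewrite would introduce.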
Text versus binary data¶
In Python 2 you could use the str type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. It also could lead to confusing APIs if people didn’t explicitly state that something that accepted str accepted either text or binary data instead of one specific type. This complicated the situation especially for anyone supporting multiple languages as APIs wouldn’t bother explicitly supporting unicode when they claimed text data support.
To make the distinction between text and binary data clearer and more pronounced, Python 3 did what most languages created in the age of the internet have done and made text and binary data distinct types that cannot blindly be mixed together (Python predates widespread access to the internet). For any code that deals only with text or only binary data, this separation doesn’t pose an issue. But for code that has to deal with both, it does mean you might have to now care about when you are using text compared to binary data, which is why this cannot be entirely automated.
To start, you will need to decide which APIs take text and which take binary (it is highly recommended you don’t design APIs that can take both due to the difficulty of keeping the code working; as stated earlier it is difficult to do well). In Python 2 this means making sure the APIs that take text can work with unicode and those that work with binary data work with the bytes type from Python 3 (in Python 2, bytes is an alias for str, so only the subset of str behaviour that Python 3’s bytes also provides is safe to rely on). Usually the biggest issue is realizing which methods exist on which types in Python 2 & 3 simultaneously (for text that’s unicode in Python 2 and str in Python 3, for binary that’s str/bytes in Python 2 and bytes in Python 3). The following table lists the unique methods of each data type across Python 2 & 3 (e.g., the decode() method is usable on the equivalent binary data type in either Python 2 or 3, but it can’t be used by the textual data type consistently between Python 2 and 3 because str in Python 3 doesn’t have the method). Do note that as of Python 3.5 the __mod__ method was added to the bytes type.
Text data | Binary data |
 | decode |
encode | |
format | |
isdecimal | |
isnumeric | |
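If you want to check the table against the interpreter you are running, a short sketch such as the following can help. The names text_type, binary_type, METHODS and availability are chosen for this example only. Under Python 3 the output matches the table; under Python 2 more methods show up on both types, which is why the table describes what can be relied on across both versions.

```python
# Ask the running interpreter which of the table's methods exist on
# its text type and its binary type.
text_type = type(u"")    # unicode on Python 2, str on Python 3
binary_type = type(b"")  # str on Python 2, bytes on Python 3

METHODS = ["decode", "encode", "format", "isdecimal", "isnumeric"]

# Maps method name -> (available on text, available on binary).
availability = {
    name: (hasattr(text_type, name), hasattr(binary_type, name))
    for name in METHODS
}

for name in METHODS:
    on_text, on_binary = availability[name]
    print("%-10s text=%-5s binary=%s" % (name, on_text, on_binary))
```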
Making the distinction easier to handle can be accomplished by encoding and decoding between binary data and text at the edge of your code. This means that when you receive text in binary data, you should immediately decode it. And if your code needs to send text as binary data then encode it as late as possible. This allows your code to work with only text internally and thus eliminates having to keep track of what type of data you are working with.
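A minimal sketch of this "decode early, encode late" pattern might look as follows. The functions shout() and handle_request() are hypothetical, and UTF-8 is assumed as the wire encoding purely for the sake of the example.

```python
def shout(text):
    """Internal logic works purely on text."""
    return text.upper() + u"!"


def handle_request(raw_bytes, encoding="utf-8"):
    """Hypothetical entry point: binary data in, binary data out."""
    text = raw_bytes.decode(encoding)  # decode as soon as data arrives
    result = shout(text)               # text-only processing
    return result.encode(encoding)     # encode just before sending


reply = handle_request(u"caf\u00e9".encode("utf-8"))
```

Only handle_request() knows about the encoding; everything it calls can assume text.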
The next issue is making sure you know whether the string literals in your code represent text or binary data. You should add a b prefix to any literal that represents binary data. For text you should add a u prefix to the text literal. (There is a __future__ import to force all unspecified literals to be Unicode, but usage has shown it isn’t as effective as adding a b or u prefix to all literals explicitly.)
As part of this dichotomy you also need to be careful about opening files. Unless you have been working on Windows, there is a chance you have not always bothered to add the b mode when opening a binary file (e.g., rb for binary reading). Under Python 3, binary files and text files are clearly distinct and mutually incompatible; see the io module for details. Therefore, you must make a decision of whether a file will be used for binary access (allowing binary data to be read and/or written) or textual access (allowing text data to be read and/or written). You should also use io.open() for opening files instead of the built-in open() function as the io module is consistent from Python 2 to 3 while the built-in open() function is not (in Python 3 it’s actually io.open()). Do not bother with the outdated practice of using codecs.open() as that’s only necessary for keeping compatibility with Python 2.5.
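As an illustration, the sketch below writes and reads the same temporary file once in binary mode and once in text mode using io.open(). The file name and the UTF-8 encoding are assumptions made for the example.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.dat")

# Binary access: with the "b" mode, bytes go in and bytes come out.
with io.open(path, "wb") as f:
    f.write(u"caf\u00e9\n".encode("utf-8"))

with io.open(path, "rb") as f:
    raw = f.read()

# Text access: state the encoding explicitly and get text back.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()
```

The same file yields bytes or text depending only on the mode it was opened with, and the result types are the same on Python 2.7 and Python 3.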
The constructors of both str and bytes have different semantics for the same arguments between Python 2 & 3. Passing an integer to bytes in Python 2 will give you the string representation of the integer: bytes(3) == '3'. But in Python 3, an integer argument to bytes will give you a bytes object as long as the integer specified, filled with null bytes: bytes(3) == b'\x00\x00\x00'. A similar worry is necessary when passing a bytes object to str. In Python 2 you just get the bytes object back: str(b'3') == b'3'. But in Python 3 you get the string representation of the bytes object: str(b'3') == "b'3'".
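One way to sidestep these constructor differences is to spell out the intent explicitly. The sketch below shows one possible set of replacements, not the only one; the variable names are chosen for the example and ASCII is assumed as the encoding.

```python
number = 3

# Digits of an integer as binary data, instead of bytes(number).
digits = str(number).encode("ascii")

# A run of null bytes of a given length, instead of bytes(number).
padding = b"\x00" * number

# Binary data to text via an explicit decode, instead of str(data).
label = b"3".decode("ascii")
```

Each expression produces the same value on Python 2.7 and Python 3, so the reader does not need to remember which constructor behaviour applies.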
Finally, the indexing of binary data requires careful handling (slicing does not require any special handling). In Python 2, b'123'[1] == b'2' while in Python 3 b'123'[1] == 50. Because binary data is simply a collection of binary numbers, Python 3 returns the integer value for the byte you index on. But in Python 2 because bytes == str, indexing returns a one-item slice of bytes. The six project has a function named six.indexbytes() which will return an integer like in Python 3: six.indexbytes(b'123', 1).
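If you would rather not depend on six for this, the standard library offers version-neutral options. The sketch below relies on slicing to get binary data back and on bytearray indexing to get an integer; byte_at() is a hypothetical helper written for this example, not part of any library.

```python
data = b"123"

# Slicing returns binary data on both Python 2 and Python 3.
one_byte = data[1:2]


def byte_at(buf, index):
    """Hypothetical helper: integer value of the byte at index.

    Indexing a bytearray yields an int on both Python 2 and 3.
    Note that this copies buf, so it is meant for small inputs.
    """
    return bytearray(buf)[index]


value = byte_at(data, 1)
```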
To summarize:
1. Decide which of your APIs take text and which take binary data
2. Make sure that your code that works with text also works with unicode and code for binary data works with bytes in Python 2 (see the table above for what methods you cannot use for each type)
3. Mark all binary literals with a b prefix, textual literals with a u prefix
4. Decode binary data to text as soon as possible, encode text as binary data as late as possible
5. Open files using io.open() and make sure to specify the b mode when appropriate
6. Be careful when indexing into binary data
Use feature detection instead of version detection¶
Inevitably you will have code that has to choose what to do based on what version of Python is running. The best way to do this is with feature detection of whether the version of Python you’re running under supports what you need. If for some reason that doesn’t work then you should make the version check be against Python 2 and not Python 3. To help explain this, let’s look at an example.
Let’s pretend that you need access to a feature of importlib that is available in Python’s standard library since Python 3.3 and available for Python 2 through importlib2 on PyPI. You might be tempted to write code to access e.g. the importlib.abc module by doing the following:

    import sys

    if sys.version_info[0] == 3:
        from importlib import abc
    else:
        from importlib2 import abc

The problem with this code is what happens when Python 4 comes out? It would be better to treat Python 2 as the exceptional case instead of Python 3 and assume that future Python versions will be more compatible with Python 3 than Python 2:

    import sys

    if sys.version_info[0] > 2:
        from importlib import abc
    else:
        from importlib2 import abc

The best solution, though, is to do no version detection at all and instead rely on feature detection. That avoids any potential issues of getting the version detection wrong and helps keep you future-compatible:

    try:
        from importlib import abc
    except ImportError:
        from importlib2 import abc
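The same idea carries over from modules to built-in names. As a sketch, the text type can be detected by simply trying the Python 2 name and falling back when it does not exist; text_type and is_text() are hypothetical names used only for this illustration.

```python
# Feature detection for a built-in name rather than a module.
try:
    text_type = unicode  # exists only on Python 2
except NameError:
    text_type = str      # on Python 3, str is the text type


def is_text(value):
    """Hypothetical helper built on the detected type."""
    return isinstance(value, text_type)
```

No version number appears anywhere, so the code keeps working for as long as str remains the text type.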
Prevent compatibility regressions¶
Once you have fully translated your code to be compatible with Python 3, you will want to make sure your code doesn’t regress and stop working under Python 3. This is especially true if you have a dependency which is blocking you from actually running under Python 3 at the moment.
To help with staying compatible, any new modules you create should have at least the following block of code at the top of it:

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

You can also run Python 2 with the -3 flag to be warned about various compatibility issues your code triggers during execution. If you turn warnings into errors with -Werror then you can make sure that you don’t accidentally miss a warning.
You can also use the Pylint project and its --py3k flag to lint your code to receive warnings when your code begins to deviate from Python 3 compatibility. This also prevents you from having to run Modernize or Futurize over your code regularly to catch compatibility regressions. This does require you only support Python 2.7 and Python 3.4 or newer as that is Pylint’s minimum Python version support.
Check which dependencies block your transition¶
After you have made your code compatible with Python 3 you should begin to care about whether your dependencies have also been ported. The caniusepython3 project was created to help you determine which projects, directly or indirectly, are blocking you from supporting Python 3. There is both a command-line tool as well as a web interface at https://caniusepython3.com.
The project also provides code which you can integrate into your test suite so that you will have a failing test when you no longer have dependencies blocking you from using Python 3. This allows you to avoid having to manually check your dependencies and to be notified quickly when you can start running on Python 3.
Update your setup.py file to denote Python 3 compatibility¶
Once your code works under Python 3, you should update the classifiers in your setup.py to contain Programming Language :: Python :: 3 and to not specify sole Python 2 support. This will tell anyone using your code that you support Python 2 and 3. Ideally you will also want to add classifiers for each major/minor version of Python you now support.
Use continuous integration to stay compatible¶
Once you are able to fully run under Python 3 you will want to make sure your code always works under both Python 2 & 3. Probably the best tool for running your tests under multiple Python interpreters is tox. You can then integrate tox with your continuous integration system so that you never accidentally break Python 2 or 3 support.
You may also want to use the -bb flag with the Python 3 interpreter to trigger an exception when you are comparing bytes to strings or bytes to an int (the latter is available starting in Python 3.5). By default type-differing comparisons simply return False, but if you made a mistake in your separation of text/binary data handling or indexing on bytes you wouldn’t easily find the mistake. This flag will raise an exception when these kinds of comparisons occur, making the mistake much easier to track down.
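To make the silent failure concrete, here is a small sketch of the two kinds of comparison. It assumes Python 3 started without -b or -bb; the variable names are made up for the example. Under Python 2 both comparisons would evaluate to True, which is exactly how such mistakes survive a port unnoticed.

```python
# Assumes Python 3 without -b/-bb: both comparisons are simply False.
# With -bb they raise BytesWarning instead (the bytes/int case only
# on Python 3.5 and newer).
payload = b"abc"

# Mistake 1: comparing binary data against a text literal.
text_match = (payload == u"abc")

# Mistake 2: indexing yields an int on Python 3, not a one-byte slice.
index_match = (payload[0] == b"a")
```

Neither line raises by default, so the only symptom is a branch that is never taken.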
And that’s mostly it! At this point your code base is compatible with both Python 2 and 3 simultaneously. Your testing will also be set up so that you don’t accidentally break Python 2 or 3 compatibility regardless of which version you typically run your tests under while developing.
Consider using optional static type checking¶
Another way to help port your code is to use a static type checker like mypy or pytype on your code. These tools can be used to analyze your code as if it’s being run under Python 2, then you can run the tool a second time as if your code is running under Python 3. By running a static type checker twice like this you can discover if you’re e.g. misusing a binary data type in one version of Python compared to another. If you add optional type hints to your code you can also explicitly state whether your APIs use textual or binary data, helping to make sure everything functions as expected in both versions of Python.
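As a sketch of what such hints can look like in single-source code, the example below uses comment-style annotations, which are valid syntax on Python 2 as well. The two functions are hypothetical, and on Python 2 the typing module is assumed to be installed from PyPI since it only joined the standard library in Python 3.5.

```python
from typing import Text


def decode_header(raw, encoding="ascii"):
    # type: (bytes, str) -> Text
    """Hypothetical API that takes binary data and returns text."""
    return raw.decode(encoding)


def encode_header(value, encoding="ascii"):
    # type: (Text, str) -> bytes
    """Hypothetical API that takes text and returns binary data."""
    return value.encode(encoding)
```

Text stands for unicode on Python 2 and str on Python 3, so a checker can flag a call that passes binary data where text is expected under either version.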