A cross-version Python bytecode decompiler
rocky/python-uncompyle6
A native Python cross-version decompiler and fragment decompiler. The successor to decompyle, uncompyle, and uncompyle2.
I gave a talk on this at BlackHat Asia 2024.
uncompyle6 translates Python bytecode back into equivalent Python source code. It accepts bytecodes from Python version 1.0 to version 3.8, spanning over 24 years of Python releases. We include Dropbox's Python 2.5 bytecode and some PyPy bytecodes.
Ok, I'll say it: this software is amazing. It is more than your normal hacky decompiler. Using compiler technology, the program creates a parse tree of the program from the instructions; nodes at the upper levels look a little like what might come from a Python AST. So we can really classify and understand what's going on in sections of Python bytecode.
Building on this, another thing that makes this different from other CPython bytecode decompilers is the ability to deparse just fragments of source code and give source-code information around a given bytecode offset.
I use the tree fragments to deparse fragments of code at run time inside my trepan debuggers. For that, bytecode offsets are recorded and associated with fragments of the source code. This purpose, although compatible with the original intention, is yet a little bit different. See this for more information.
Python fragment deparsing given an instruction offset is useful in showing stack traces and can be incorporated into any program that wants to show a location in more detail than just a line number at runtime. This code can also be used when source-code information does not exist and there is just bytecode. Again, my debuggers make use of this.
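To illustrate why an instruction offset pins down a location more precisely than a line number, here is a minimal sketch that uses only the standard library. It does not use uncompyle6's fragment API; `where` and `demo` are hypothetical names chosen for illustration. CPython frames record the offset of the most recently executed instruction in `f_lasti`:

```python
import sys


def where():
    """Return (line number, bytecode offset) for the caller's current position."""
    caller = sys._getframe(1)  # the frame that called where()
    return caller.f_lineno, caller.f_lasti


def demo():
    # Both calls sit on the same source line, so their line numbers match,
    # but each call instruction has its own bytecode offset.
    first, second = where(), where()
    return first, second
```

A line number alone cannot tell the two calls apart; the offset can, and that is the information a fragment deparser maps back to a piece of source text.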
There were (and still are) a number of decompyle, uncompyle, uncompyle2, uncompyle3 forks around. Many of them come basically from the same code base, and (almost?) all of them are no longer actively maintained. One was really good at decompiling Python 1.5-2.3, another really good at Python 2.7, but only that. Another handles Python 3.2 only; another patched that and handled only 3.3. You get the idea. This code pulls all of these forks together and moves forward. There is some serious refactoring and cleanup in this code base over those old forks. Even more experimental refactoring is going on in decompyle3.
This demonstrably does the best in decompiling Python across all Python versions. And even when there is another project that only provides decompilation for a subset of Python versions, we generally do demonstrably better for those as well.
How can we tell? By taking Python bytecode that comes distributed with that version of Python and decompiling it. Among those that successfully decompile, we can then make sure the resulting programs are syntactically correct by running the Python interpreter for that bytecode version. Finally, in cases where the program has a test for itself, we can run the check on the decompiled code.
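The syntax-check step can be sketched with the built-in `compile()`. This is an illustration only, not the project's actual test harness, and `is_syntactically_valid` is a hypothetical helper name. It is only meaningful when run under the interpreter version that matches the bytecode that was decompiled:

```python
def is_syntactically_valid(source, filename="<decompiled>"):
    """Return True if `source` parses and compiles under the running interpreter."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True
```

Passing this check says nothing about whether the decompiled program behaves like the original; that is what the self-testing programs mentioned above are for.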
We use automated processes to find bugs. In the issue trackers for other decompilers, you will find a number of bugs we've found along the way. Very few to none of them are fixed in the other decompilers.
The code in the git repository can be run from Python 2.4 to the latest Python version, with the exception of Python 3.0 through 3.2. Volunteers are welcome to address these deficiencies if there is a desire to do so.
The way it does this, though, is by segregating consecutive Python versions into git branches:
- master: Python 3.6 and up (uses type annotations)
- python-3.3-to-3.5: Python 3.3 through 3.5 (Generic Python 3)
- python-2.4: Python 2.4 through 2.7 (Generic Python 2)
PyPy 3-2.4 and later works as well.
The bytecode files it can read have been tested on Python bytecodes from versions 1.4, 2.1-2.7, and 3.0-3.8 and later PyPy versions.
You can install from PyPI using the name `uncompyle6`:
pip install uncompyle6
To install from source code, this project uses setup.py, so it follows the standard Python routine:
$ pip install -e . # set up to run from source tree
or:
$ python setup.py install # may need sudo
A GNU Makefile is also provided so `make install` (possibly as root or sudo) will do the steps above.
make check
A GNU makefile has been added to smooth over setting up and running the right command, and running tests from fastest to slowest.
If you have remake installed, you can see the list of all tasks including tests via `remake --tasks`.
Run
$ uncompyle6 *compiled-python-file-pyc-or-pyo*
For usage help:
$ uncompyle6 -h
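If you want to drive the command from a script, a thin wrapper around the invocation shown above might look like the following. This is a sketch under assumptions: it assumes the `uncompyle6` executable is on `PATH` and that the decompiled source is written to standard output; `decompile_command` and `decompile` are hypothetical helper names, not part of the package.

```python
import shutil
import subprocess


def decompile_command(pyc_path):
    """Build the argument list for the documented form: uncompyle6 FILE."""
    return ["uncompyle6", str(pyc_path)]


def decompile(pyc_path):
    """Run uncompyle6 on one file and return whatever it writes to stdout."""
    if shutil.which("uncompyle6") is None:
        raise RuntimeError("uncompyle6 is not on PATH; try: pip install uncompyle6")
    result = subprocess.run(
        decompile_command(pyc_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as text
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```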
In older versions of Python it was possible to verify bytecode by decompiling bytecode, and then compiling using the Python interpreter for that bytecode version. Having done this, the bytecode produced could be compared with the original bytecode. However, as Python's code generation got better, this was no longer feasible.
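The round-trip idea can be sketched as follows. This is a simplified illustration, not the verification code the project once used: `same_bytecode` is a hypothetical name, it compares only the top-level code object's raw instruction bytes, and it ignores constants, names, and nested code objects. In a real check the original side would be the code object loaded from the bytecode file rather than recompiled source.

```python
def same_bytecode(original_source, decompiled_source):
    """Compile both sources with the running interpreter and compare raw bytecode."""
    original = compile(original_source, "<original>", "exec")
    candidate = compile(decompiled_source, "<decompiled>", "exec")
    # co_code holds only the instruction bytes of the top-level code object.
    return original.co_code == candidate.co_code
```

An exact match like this is a stricter test than semantic equivalence: once the compiler optimizes, two differently written sources can be equivalent yet compile differently, or the same source can compile differently across interpreter versions.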
If you want Python syntax verification of the correctness of the decompilation process, add the `--syntax-verify` option. However, since Python syntax changes, you should use this option only if the bytecode is the right bytecode for the Python interpreter that will be checking the syntax.
You can also cross compare the results with another version of uncompyle6 since there are sometimes regressions in decompiling specific bytecode as the overall quality improves.
For Python 3.7 and 3.8, the code in decompyle3 is generally better.
Or try another specific Python decompiler like uncompyle2, unpyc37, or pycdc. Since the latter two work differently, bugs here often aren't in those, and vice versa.
There is an interesting class of these programs that is readily available and gives stronger verification: those programs that, when run, test themselves. Our test suite includes these.
And Python comes with another set of programs like this: its test suite for the standard library. We have some code in `test/stdlib` to facilitate this kind of checking too.
The biggest known and possibly fixable (but hard) problem has to do with handling control flow. (Python has probably the most diverse and screwy set of compound statements I've ever seen; there are "else" clauses on loops and try blocks that I suspect many programmers don't know about.)
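For readers who have not met these "else" clauses, here is a small illustration of the source-level constructs a decompiler has to recover from jump instructions. The function names are made up for the example:

```python
def first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break
    else:
        # Runs only when the loop finishes without executing `break`.
        found = None
    return found


def doubled_int(text):
    """Parse `text` as an integer and double it, or report that it is invalid."""
    try:
        value = int(text)
    except ValueError:
        return "invalid"
    else:
        # Runs only when the try body raised no exception.
        return value * 2
```

In bytecode, both `else` bodies are just more instructions reached by particular jumps, which is part of why they are easy to confuse with code that merely follows the loop or the try statement.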
All of the Python decompilers that I have looked at have problems decompiling Python's control flow. In some cases we can detect an erroneous decompilation and report that.
Python support is pretty good for Python 2.
On the lower end of Python versions, decompilation seems pretty good although we don't have any automated testing in place for Python's distributed tests. Also, we don't have a Python interpreter for versions 1.6 and 2.0.
In the Python 3 series, Python support is strongest around 3.4 or 3.3 and drops off as you move further away from those versions. Python 3.0 is weird in that it in some ways resembles 2.6 more than it does 3.1 or 2.7. Python 3.6 changes things drastically by using word codes rather than byte codes. As a result, the jump offset field in a jump instruction argument has been reduced. This makes `EXTENDED_ARG` instructions more prevalent in jump instructions; previously they had been rare. Perhaps to compensate for the additional `EXTENDED_ARG` instructions, additional jump optimization has been added. So in sum, handling control flow by ad hoc means, as is currently done, works worse.
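The word-code layout can be observed with the standard `dis` module. This sketch assumes it is run on CPython 3.6 or later; `instruction_offsets` and `sample` are hypothetical names for the example:

```python
import dis


def instruction_offsets(func):
    """Return the bytecode offset of every instruction in `func`."""
    return [ins.offset for ins in dis.get_instructions(func)]


def sample(x):
    if x:
        return 1
    return 2


offsets = instruction_offsets(sample)
# With word codes, each instruction is a 2-byte unit (opcode plus a one-byte
# argument), so every instruction starts at an even offset.
all_even = all(offset % 2 == 0 for offset in offsets)
```

Because the argument is a single byte, any jump target or operand that does not fit needs one or more `EXTENDED_ARG` prefixes, which is why they show up more often from 3.6 on.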
Between Python 3.5, 3.6, 3.7 there have been major changes to the `MAKE_FUNCTION` and `CALL_FUNCTION` instructions.
Python 3.8 removes the `SETUP_LOOP`, `SETUP_EXCEPT`, `BREAK_LOOP`, and `CONTINUE_LOOP` instructions, which may make control-flow detection harder, lacking the more sophisticated control-flow analysis that is planned. We'll see.
Currently not all Python magic numbers are supported. Specifically, in some versions of Python, notably Python 3.6, the magic number has changed several times within a version.
We support only released versions, not candidate versions. Note however that the magic of a released version is usually the same as the last candidate version prior to release.
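To see what a magic number is in practice, the following sketch reads the first four bytes of a bytecode file and compares them with the running interpreter's own value. It uses only the standard library and assumes Python 3.4 or later for `importlib.util.MAGIC_NUMBER`; the helper names are hypothetical and this is not how uncompyle6 itself looks up magic numbers.

```python
import importlib.util
import os
import py_compile
import tempfile


def read_magic(pyc_path):
    """Return the first four bytes of a bytecode file: its magic number."""
    with open(pyc_path, "rb") as f:
        return f.read(4)


def matches_running_interpreter(pyc_path):
    """True if the file was compiled for the interpreter running this code."""
    return read_magic(pyc_path) == importlib.util.MAGIC_NUMBER


def demo():
    """Compile a tiny module in a temporary directory and check its magic."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
        return matches_running_interpreter(pyc)
```

A decompiler has to map this value to a Python version before it can interpret any of the instructions that follow, which is why an unrecognized magic number stops it early.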
There are also customized Python interpreters, notably Dropbox's, which use their own magic and encrypt bytecode. With the exception of Dropbox's old Python 2.5 interpreter, this kind of thing is not handled.
We also don't handle PJOrion or otherwise obfuscated code. For PJOrion try: PJOrionDeobfuscator to unscramble the bytecode to get valid bytecode before trying this tool; pydecipher might help with that.
This program can't decompile Microsoft Windows EXE files created by Py2EXE, although we can probably decompile the code after you extract the bytecode properly. Pydeinstaller may help with unpacking PyInstaller bundles.
Handling pathologically long lists of expressions or statements is slow. We don't handle Cython or MicroPython, which don't use CPython bytecode.
There are numerous bugs in decompilation. And that's true for every other CPython decompiler I have encountered, even the ones that claimed to be "perfect" on some particular version like 2.4.
As Python progresses, decompilation also gets harder because the compilation is more sophisticated and the language itself is more sophisticated. I suspect that there will be fewer ad-hoc attempts like unpyc37 (which is based on a 3.3 decompiler) simply because it is harder to do so. The good news, at least from my standpoint, is that I think I understand what's needed to address the problems in a more robust way. But right now, until such time as the project is better funded, I do not intend to make any serious effort to support Python versions 3.8 or 3.9, including bugs that might come in. I imagine at some point I may be interested in it.
You can easily find bugs by running the tests against the standard test suite that Python uses to check itself. At any given time, there are dozens of known problems that are pretty well isolated and that could be solved if one were to put in the time to do so. The problem is that there aren't that many people who have been working on bug fixing.
Some of the bugs in 3.7 and 3.8 are simply a matter of back-porting the fixes in decompyle3. Any volunteers?
You may run across a bug that you want to report. Please do so after reading How to report a bug, and follow the instructions when opening an issue.
Be aware that it might not get my attention for a while. If you sponsor or support the project in some way, I'll prioritize your issues above the queue of other things I might be doing instead. In rare situations, I can do a hand decompilation of bytecode for a fee. However, this is expensive, usually beyond what most people are willing to spend.
- https://rocky.github.io/blackhat-asia-2024-additional/all-notes-print.html : How to Read and Write a High-Level Bytecode Decompiler: uncompyle6 decompyle3 -- BlackHat 2024 Asia (video). A big thanks to the Organizers and Reviewers for letting me speak. This kind of thing encourages me to work on projects like this.
- https://github.com/rocky/python-decompile3 : Much smaller and more modern code, focusing on 3.7 and 3.8. Changes in that will get migrated back here.
- https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here. Currently unmaintained.
- https://github.com/figment/unpyc3/ : fork of above, but supports Python 3.3 only. Includes some fixes like supporting function annotations. Currently unmaintained.
- https://github.com/wibiti/uncompyle2 : supports Python 2.7 only, but does that fairly well. There are situations where uncompyle6 results are incorrect while uncompyle2 results are not, but more often uncompyle6 is correct when uncompyle2 is not. Because uncompyle6 adheres to accuracy over idiomatic Python, uncompyle2 can produce more natural-looking code when it is correct. Currently uncompyle2 is lightly maintained. See its issue tracker for more details.
- How to report a bug
- The HISTORY file.
- https://github.com/rocky/python-xdis : Cross Python version disassembler
- https://github.com/rocky/python-xasm : Cross Python version assembler
- https://github.com/rocky/python-uncompyle6/wiki : Wiki Documents which describe the code and aspects of it in more detail
- https://github.com/zrax/pycdc : The README for this C++ code says it aims to support all versions of Python. You can aim your slingshot for the moon too, but I doubt you are going to hit it. This code is best for Python versions around 2.7 and 3.3 when the code was initially developed. Accuracy for current versions of Python 3 and early versions of Python is lacking. Without major effort, it is unlikely it can be made to support current Python 3. See its issue tracker for details. Currently lightly maintained.