Movatterモバイル変換


[0]ホーム

URL:


— FREE Email Series —

🐍 Python Tricks 💌

Python Tricks Dictionary Merge

🔒 No spam. Unsubscribe any time.

Browse TopicsGuided Learning Paths
Basics Intermediate Advanced
aialgorithmsapibest-practicescareercommunitydatabasesdata-sciencedata-structuresdata-vizdevopsdjangodockereditorsflaskfront-endgamedevguimachine-learningnewsnumpyprojectspythonstdlibtestingtoolsweb-devweb-scraping

Table of Contents

What Is the __pycache__ Folder in Python?

What Is the __pycache__ Folder in Python?

byBartosz ZaczyńskiReading time estimate 31mintermediatepython

Table of Contents

Remove ads

When you develop a self-contained Python script, you might not notice anything unusual about your directory structure. However, as soon as your project becomes more complex, you’ll often decide to extract parts of the functionality into additionalmodules or packages. That’s when you may start to see a__pycache__ folder appearing out of nowhere next to your source files in seemingly random places:

project/│├── mathematics/│   ││   ├── __pycache__/│   ││   ├── arithmetic/│   │   ├── __init__.py│   │   ├── add.py│   │   └── sub.py│   ││   ├── geometry/│   │   ││   │   ├── __pycache__/│   │   ││   │   ├── __init__.py│   │   └── shapes.py│   ││   └── __init__.py│└── calculator.py

Notice that the__pycache__ folder can be present at different levels in your project’s directory tree when you have multiple subpackages nested in one another. At the same time, other packages or folders with your Python source files may not contain this mysterious cache directory.

Note: To maintain a cleaner workspace, manyPython IDEs and code editors are configured out-of-the-box to hide the__pycache__ folders from you, even if those folders exist on your file system.

You may encounter a similar situation after you clone a remoteGit repository with a Python project and run the underlying code.So, what causes the__pycache__ folder to appear, and for what purpose?

Get Your Code:Click here to download the free sample code that shows you how to work with thepycache folder in Python.

Take the Quiz: Test your knowledge with our interactive “What Is the __pycache__ Folder in Python?” quiz. You’ll receive a score upon completion to help you track your learning progress:


What Is the __pycache__ Folder in Python?

Interactive Quiz

What Is the __pycache__ Folder in Python?

In this quiz, you'll have the opportunity to test your knowledge of the __pycache__ folder, including when, where, and why Python creates these folders.

In Short: It Makes Importing Python Modules Faster

Even though Python is aninterpreted programming language, its interpreter doesn’t operate directly on your Python code, which would be very slow. Instead, when yourun a Python script orimport a Python module, the interpreter compiles your high-level Python source code intobytecode, which is an intermediate binary representation of the code.

This bytecode enables the interpreter to skip recurring steps, such aslexing and parsing the code into anabstract syntax tree and validating its correctness every time you run the same program. As long as the underlying source code hasn’t changed, Python can reuse theintermediate representation, which is immediately ready for execution. This saves time, speeding up your script’s startup time.

Remember that while loading the compiled bytecode from__pycache__ makes Python modules import faster, it doesn’t affect their execution speed!

Why bother with bytecode at all instead of compiling the code straight to the low-levelmachine code? While machine code is what executes on the hardware, providing the ultimate performance, it’s not asportable or quick to produce as bytecode.

Machine code is a set of binary instructions understood by your specific CPU architecture, wrapped in a container format likeEXE,ELF, orMach-O, depending on the operating system. In contrast, bytecode provides a platform-independent abstraction layer and is typically quicker to compile.

Python uses local__pycache__ folders to store the compiled bytecode ofimported modules in your project. On subsequent runs, the interpreter will try to load precompiled versions of modules from these folders, provided they’re up-to-date with the corresponding source files. Note that thiscaching mechanism only gets triggered for modules youimport in your code rather than executing as scripts in the terminal.

In addition to thison-disk bytecode caching, Python keeps anin-memory cache of modules, which you can access through thesys.modules dictionary. It ensures that when you import the same module multiple times from different places within your program, Python will use the already imported module without needing to reload or recompile it. Both mechanisms work together to reduce the overhead of importing Python modules.

Next, you’re going to find out exactly how much faster Python loads the cached bytecode as opposed to compiling the source code on the fly when you import a module.

How Much Faster Is Loading Modules From Cache?

The caching happens behind the scenes and usually goes unnoticed since Python is quite rapid at compiling the bytecode. Besides, unless you often run short-lived Python scripts, the compilation step remains insignificant when compared to the total execution time. That said, without caching, the overhead associated with bytecode compilation could add up if you had lots of modules and imported them many times over.

To measure the difference inimport time between a cached and uncached module, you can pass the-X importtime option to thepython command or set the equivalentPYTHONPROFILEIMPORTTIME environment variable. When this option is enabled, Python will display a table summarizing how long it took to import each module, including the cumulative time in case a module depends on other modules.

Suppose you had acalculator.py script that imports and calls a utilityfunction from a localarithmetic.py module:

Pythoncalculator.py
fromarithmeticimportaddadd(3,4)

The imported module defines a single function:

Pythonarithmetic.py
defadd(a,b):returna+b

As you can see, the main script delegates the addition of two numbers, three and four, to theadd() function imported from thearithmetic module.

Note: Even though you use thefrom ... import syntax, which only brings the specified symbol into your currentnamespace, Python reads and compiles the entire module anyway. Moreover,unused imports would also trigger the compilation.

The first time you run your script, Python compiles and saves the bytecode of the module you imported into a local__pycache__ folder. If such a folder doesn’t already exist, then Python automatically creates one before moving on. Now, when you execute your script again, Python should find and load the cached bytecode as long as you didn’t alter the associated source code.

Once the cachewarms up, it’ll contribute to a fasterstartup time of your Python script:

Shell
$python-Ximporttimecalculator.py(...)import time:     20092 |      20092 | arithmetic$python-Ximporttimecalculator.py(...)import time:       232 |        232 | arithmetic$python-Ximporttimecalculator.py(...)import time:       203 |        203 | arithmetic

Although the exact measurements may vary across runs, the improvement is clearly visible. Without the__pycache__ folder, the initial attempt to importarithmetic was two orders of magnitude slower than during subsequent runs. This change may look staggering at first glance, but these values are expressed inmicroseconds, so you most likely won’t even notice the difference despite such a dramatic drop in numbers.

The performance gain is usually barely reflected by the startup time of most Python scripts, which you can assess using thetime command on Unix-like systems:

Shell
$rm-rf__pycache__$timepythoncalculator.pyreal    0m0.088suser    0m0.064ssys     0m0.028s$timepythoncalculator.pyreal    0m0.086suser    0m0.060ssys     0m0.030s

Here, thetotal execution time remains virtually the same regardless of whether the cache exists or not. Removing the__pycache__ folder delays the execution by about two milliseconds, which is negligible for most applications.

Python’s bytecode compiler is pretty snappy when you compare it to a more sophisticated counterpart inJava, which can take advantage ofstatic typing.

For example, if you have a sampleCalculator.java file, then you can either compile it to a.class file upfront, which is the usual way of working with Java code, or run the.java file directly. In the latter case, the Java runtime will compile the code in the background into a temporary location before running it:

Shell
$javacCalculator.java$timejavaCalculatorreal    0m0.039suser    0m0.026ssys     0m0.019s$timejavaCalculator.javareal    0m0.574suser    0m1.182ssys     0m0.069s

When you manually compile the source code using thejavac command and run the resulting bytecode, the execution takes about forty milliseconds. On the other hand, when you let thejava command handle the compilation, the total execution time rises to just over half a second. Therefore, unlike in Python, the Java compiler’s overhead is very noticeable, even with a parallel compilation on multiple CPU cores!

Now that you know the purpose of the__pycache__ folder, you might be curious about its content.

What’s Inside a__pycache__ Folder?

When you take a peek inside the__pycache__ folder, you’ll see one or more files that end with the.pyc extension. It stands for acompiled Python module:

Shell
$ls-1__pycache__arithmetic.cpython-311.pycarithmetic.cpython-312.pycarithmetic.pypy310.opt-1.pycarithmetic.pypy310.opt-2.pycarithmetic.pypy310.pycsolver.cpython-312.pycunits.cpython-312.pyc

Each of these files contains the bytecode of the corresponding Python module, defined in the current package, that you imported atruntime. The compiled bytecode targets a specific Python implementation, version, and an optionaloptimization level. All of this information is encoded in the file name.

For example, a file namedarithmetic.pypy310.opt-2.pyc is the bytecode of yourarithmetic.py module compiled byPyPy 3.10 with an optimization level of two. This optimization removes theassert statements and discards anydocstrings. Conversely,arithmetic.cpython-312.pyc represents the same module but compiled for CPython 3.12 without any optimizations.

In total, there are five bytecode variants compiled from a singlearithmetic.py module in the__pycache__ folder above, which are highlighted.

Note: Before Python 3.5, enabling one of the optimization levels with the-O or-OO flag would compile the bytecode into a separate.pyo file in the cache folder. This changed afterPEP 488 went into effect, and the optional optimization level is now encoded in the.pyc file name.

Such a file naming scheme ensures compatibility across different Python versions and flavors. When you run the same script using PyPy or an earlierCPython version, the interpreter will compile all imported modules against its own runtime environment so that it can reuse them later. The Python version, along with other metadata, is also stored in the.pyc file itself.

You’ll take a closer look at the.pyc files stored in the cache folder at the end of this tutorial. Now, it’s time to learn about the circumstances that trigger Python to create the cache folder.

When Does Python Create Cache Folders?

The interpreter will only store the compiled bytecode of Python modules in the__pycache__ folder when youimport those modules or sometimes their parent package. It won’t create the cache folder when you run a regular Python script that doesn’t import any modules or packages. This is based on the assumption that modules are less likely to change and that you may import them multiple times during a single execution.

Note: Although you only get exposed to bytecode compilation through the__pycache__ folder when you importlocal modules or packages, the underlying machinery works equally well for third-party and standard-library packages.

When you install a third-party package into avirtual environment, by default,pip willpre-compile the package’s modules into.pyc files using the Python interpreter that it’s currently running on. Similarly, the pure-Python modules in Python’s standard library come pre-compiled with all optimization levels possible.

If you were to remove one of those.pyc files, either from the virtual environment’ssite-packages/ folder or the Python’slib/ folder, then the interpreter would recreate it the next time you imported the corresponding module.

When you import anindividual module from a package, Python will produce the corresponding.pyc file and cache it in the__pycache__ folder located inside that package. It’ll also compile the package’s__init__.py file but won’t touch any other modules or nested subpackages. However, if the imported module itself imports other modules, then those modules will also be compiled, and so on.

Here’s an example demonstrating the most basic case, assuming that you placed a suitableimport statement in thecalculator.py script below:

project/│├── arithmetic/│   ││   ├── __pycache__/│   │   ├── __init__.cpython-312.pyc│   │   └── add.cpython-312.pyc│   ││   ├── __init__.py│   ├── add.py│   └── sub.py│└── calculator.py

After running the script for the first time, Python will ensure that a__pycache__ folder exists in thearithmetic package. Then, it’ll compile the package’s__init__.py along with any modules imported from that package. In this case, you only requested to import thearithmetic.add module, so you can see a.pyc file associated withadd.py but not withsub.py.

All of the followingimport statements would yield the same result depicted above:

Python
importarithmetic.addfromarithmeticimportaddfromarithmetic.addimportadd_function

Regardless of how you import a Python module and whether you import it in its entirety or just a specific symbol, such as aclass or aconstant, the interpreter compiles the whole module, as it can’t read modules partially.

Conversely, importing a wholepackage would normally cause Python to compile only__init__.py in that package. However, it’s fairly common for packages to expose their internal modules or subpackages from within__init__.py for more convenient access. For instance, consider this:

Pythonarithmetic/__init__.py
fromarithmeticimportaddfromarithmetic.subimportsub_function

Imports within__init__.py like these create additional.pyc files, even when you use the plainimport arithmetic statement in your script.

If you imported asubpackage or a deeply nested module or symbol, then all intermediate packages leading up to the top-level package would also have their__init__.py files compiled and placed in their respective cache folders. However, Python won’t go in the other direction byrecursively scanning the nested subpackages, as that would be unnecessary. It’ll only compile the modules you really need by importing them explicitly or indirectly.

Note: The mere act of importing a package or module triggers the compilation. You can have unused import statements in your code, and Python will still compile them!

What if Python has already compiled your module into a.pyc file, but you decide to modify its source code in the original.py file? You’ll find out next!

What Actions Invalidate the Cache?

Running an obsolete bytecode could cause an error or, worse, it could lead to completely unpredictable behavior. Fortunately, Python is clever enough to detect when you modify the source code of a compiled module and will recompile it as necessary.

To determine whether a module needs recompiling, Python uses one of twocache invalidation strategies:

  1. Timestamp-based
  2. Hash-based

The first one compares thesource file’s size and itslast-modified timestamp to the metadata stored in the corresponding.pyc file. Later, you’ll learn how these values are persisted in the.pyc file, along with other metadata.

In contrast, the second strategy calculates thehash value of the source file and checks it against a special field in the file’s header (PEP 552), which was introduced inPython 3.7. This strategy is more secure and deterministic but also a bit slower. That’s why the timestamp-based strategy remains the default for now.

When you artificially update the modification time (mtime) of the source file, for example, by using thetouch command on macOS or Linux, you’ll force Python to compile the module again:

Shell
$tree-D--dirsfirst[Apr 26 09:48]  .├── [Apr 26 09:48]  __pycache__│     └── [Apr 26 09:48]  arithmetic.cpython-312.pyc├── [Apr 26 09:48]  arithmetic.py└── [Apr 26 09:48]  calculator.py2 directories, 3 files$toucharithmetic.py$pythoncalculator.py$tree-D--dirsfirst[Apr 26 09:48]  .├── [Apr 26 09:52]  __pycache__│     └── [Apr 26 09:52]  arithmetic.cpython-312.pyc├── [Apr 26 09:52]  arithmetic.py└── [Apr 26 09:48]  calculator.py2 directories, 3 files

Initially, the cachedarithmetic.cpython-312.pyc file was last modified at 09:48 am. After touching the source file,arithmetic.py, Python deems the compiled bytecode outdated and recompiles the module when you run the script that importsarithmetic. This results in a new.pyc file with an updated timestamp of 09:52 am.

To producehash-based.pyc files, you must use Python’scompileall module with the--invalidation-mode option set accordingly. For instance, this command will compile all modules in the current folder and subfolders into the so-calledchecked hash-based variant:

Shell
$python-mcompileall--invalidation-modechecked-hash

The official documentation explains the difference betweenchecked andunchecked variants of the hash-based.pyc files as follows:

For checked hash-based.pyc files, Python validates the cache file by hashing the source file and comparing the resulting hash with the hash in the cache file. If a checked hash-based cache file is found to be invalid, Python regenerates it and writes a new checked hash-based cache file. For unchecked hash-based.pyc files, Python simply assumes the cache file is valid if it exists. (Source)

Nevertheless, you can always override the default validation behavior of hash-based.pyc files with the--check-hash-based-pycs option when you run the Python interpreter.

Knowing when and where Python creates the__pycache__ folders, as well as when it updates their content, will give you an idea about whether it’s safe to remove them.

Is It Safe to Remove a Cache Folder?

Yes, although you should ask yourself whether you really should! At this point, you understand that removing a__pycache__ folder is harmless because Python regenerates the cache on each invocation. Anyway, removing the individual cache folders by hand is a tedious job. Besides, it’ll only last until the next time you run your code.

Note: While you don’t need to remove the cache folders in most cases, it’s good practice to exclude them from aversion control system like Git, for example, by adding an appropriate file pattern to your project’s.gitignore. Moreover, if your code editor hasn’t done it already, then you might want to configure it to hide such folders from the file explorer to avoid distraction.

The good news is that you can automate the removal of those cache folders from your project if you really insist, which you’ll do now.

How to Recursively Remove All Cache Folders?

Okay, you’ve already established that removing the cached bytecode that Python compiles is no big deal. However, the trouble with these__pycache__ folders is that they can appear in multiple subdirectories if you have a complex project structure. Finding and deleting them by hand would be a chore, especially since they rise like a phoenix from the ashes every time you run Python.

Below, you’ll find the platform-specific commands that recursively remove all__pycache__ folders from the current directory and all of its nested subdirectories in one go:

Windows PowerShell
PS>$dirs=Get-ChildItem-Path.-Filter__pycache__-Recurse-DirectoryPS>$dirs|Remove-Item-Recurse-Force
Shell
$find.-typed-name__pycache__-execrm-rf{}+

You should exercise caution when running bulk delete commands, as they can remove more than you intended if you’re not careful. Always double-check the paths and filter criteria before you execute such commands!

Deleting the__pycache__ folders can declutter your project’s workspace, but only provisionally. If you’re still annoyed by having to repeatedly run the recursive delete command, then you may prefer to take control of the cache folder handling in the first place. Next, you’ll explore two approaches to tackling this issue.

How to Prevent Python From Creating Cache Folders?

If you don’t want Python to cache the compiled bytecode, then you can pass the-B option to thepython command when running a script. This will prevent the__pycache__ folders from appearing unless they already exist. That said, Python will continue to leverage any.pyc files that it can find in existing cache folders. It just won’t write new files to disk.

For a more permanent and global effect that spans across multiple Python interpreters, you can set thePYTHONDONTWRITEBYTECODE environment variable in yourshell or its configuration file:

Shell
exportPYTHONDONTWRITEBYTECODE=1

It’ll affect any Python interpreter, including one within a virtual environment that you’ve activated.

Still, you should think carefully about whether suppressing bytecode compilation is the right approach for your use case. The alternative is to tell Python to create the individual__pycache__ folders in a single shared location on your file system.

How to Store the Cache in a Centralized Folder?

When you disable the bytecode compilation altogether, you gain a cleaner workspace but lose the benefits of caching for faster load times. If you’d like to combine the best of both worlds, then you can instruct Python to write the.pyc files into a parallel tree rooted at the specified directory using the-X pycache_prefix option:

Shell
$python-Xpycache_prefix=/tmp/pycachecalculator.py

In this case, you tell Python to cache the compiled bytecode in atemporary folder located at/tmp/pycache on your file system. When you run this command, Python won’t try to create local__pycache__ folders in your project anymore. Instead, it’ll mirror the directory structure of your project under the indicated root folder and store all.pyc files there:

tmp/└── pycache/    └── home/        └── user/            │            ├── other_project/            │   └── solver.cpython-312.pyc            │            └── project/                │                └── mathematics/                    │                    ├── arithmetic/                    │   ├── __init__.cpython-312.pyc                    │   ├── add.cpython-312.pyc                    │   └── sub.cpython-312.pyc                    │                    ├── geometry/                    │   ├── __init__.cpython-312.pyc                    │   └── shapes.cpython-312.pyc                    │                    └── __init__.cpython-312.pyc

Notice two things here. First, because the cache directory is kept separate from your source code, there’s no need to nest the compiled.pyc files inside the__pycache__ folders. Secondly, since the hierarchy within such a centralized cache matches your project’s structure, you can share this cache folder between multiple projects.

Other advantages of this setup include aneasier cleanup, as you can remove all.pyc files belonging to the same project with a single keystroke without having to manually traverse all the directories. Additionally, you can store the cache folder on a separatephysical disk to take advantage of parallel reads, or keep the cache in apersistent volume when working withDocker containers.

Remember that you must use the-X pycache_prefix option every time you run thepython command for this to work consistently. As an alternative, you can set the path to a shared cache folder through thePYTHONPYCACHEPREFIX environment variable:

Shell
exportPYTHONPYCACHEPREFIX=/tmp/pycache

Either way, you can programmatically verify whether Python will use the specified cache directory or fall back to the default behavior and create local__pycache__ folders:

Python
>>>importsys>>>sys.pycache_prefix'/tmp/pycache'

Thesys.pycache_prefix variable can be either astring orNone.

You’ve come a long way through this tutorial, and now you know a thing or two about dealing with__pycache__ folders in your Python projects. It’s finally time to see how you can work directly with the.pyc files stored in those folders.

What’s Inside a Cached.pyc File?

A.pyc file consists of aheader with metadata followed by theserializedcode object to be executed at runtime. The file’s header begins with amagic number that uniquely identifies the specific Python version the bytecode was compiled for. Next, there’s abit field defined inPEP 552, which determines one of threecache invalidation strategies explained earlier.

Intimestamp-based.pyc files, the bit field is filled with zeros and followed by two four-byte fields. These fields correspond to theUnix time of the last modification and the size of the source.py file, respectively:

OffsetField SizeFieldDescription
04Magic numberIdentifies the Python version
44Bit fieldFilled with zeros
84TimestampThe time of.py file’s modification
124File sizeConcerns the source.py file

Conversely, forhash-based.pyc files, the bit field can be equal to either one, indicating anunchecked variant, or three, meaning thechecked variant. Then, instead of the timestamp and file size, there’s only one eight-byte field with thehash value of the Python source code:

OffsetField SizeFieldDescription
04Magic numberIdentifies the Python version
44Bit fieldEquals 1 (unchecked) or 3 (checked)
88Hash valueSource code’s hash value

In both cases, the header is sixteen bytes long, which you can skip if you’re not interested in reading the encoded metadata. By doing so, you’ll jump straight to thecode object serialized with themarshal module, which occupies the remaining part of the.pyc file.

With this information, you can X-ray one of your compiled.pyc files and directly execute the underlying bytecode, even if you no longer have the original.py file with the associated source code.

How to Read and Execute the Cached Bytecode?

By itself, Python will only run a companion.pyc file if the original.py file still exists. If you remove the source module after it’s already been compiled, then Python will refuse to run the.pyc file. That’s by design. However, you can always run the bytecode by hand if you want to.

Note: If you’re reading this in the distant future, then there’s a chance that the underlying.pyc file format or the standard-library API has changed. So, the script you’re about to see may need some tweaks before it works as intended with the current Python release. Alternatively, you can run it with Python 3.12 to ensure compatibility.

The following Python script shows how you can read the.pyc file’s header, as well as how to deserialize and execute the bytecode that comes with it:

Pythonxray.py
 1importmarshal 2fromdatetimeimportdatetime,timezone 3fromimportlib.utilimportMAGIC_NUMBER 4frompathlibimportPath 5frompprintimportpp 6frompy_compileimportPycInvalidationMode 7fromsysimportargv 8fromtypesimportSimpleNamespace 910defmain(path):11metadata,code=load_pyc(path)12pp(vars(metadata))13ifmetadata.magic_number==MAGIC_NUMBER:14exec(code,globals())15else:16print("Bytecode incompatible with this interpreter")1718defload_pyc(path):19withPath(path).open(mode="rb")asfile:20return(21parse_header(file.read(16)),22marshal.loads(file.read()),23)2425defparse_header(header):26metadata=SimpleNamespace()27metadata.magic_number=header[0:4]28metadata.magic_int=int.from_bytes(header[0:4][:2],"little")29metadata.python_version=f"3.{(metadata.magic_int-2900)//50}"30metadata.bit_field=int.from_bytes(header[4:8],"little")31metadata.pyc_type={320:PycInvalidationMode.TIMESTAMP,331:PycInvalidationMode.UNCHECKED_HASH,343:PycInvalidationMode.CHECKED_HASH,35}.get(metadata.bit_field)36ifmetadata.pyc_typeisPycInvalidationMode.TIMESTAMP:37metadata.timestamp=datetime.fromtimestamp(38int.from_bytes(header[8:12],"little"),39timezone.utc,40)41metadata.file_size=int.from_bytes(header[12:16],"little")42else:43metadata.hash_value=header[8:16]44returnmetadata4546if__name__=="__main__":47main(argv[1])

There are several things going on here, so you can break them down line by line:

  • Line 11 receives atuple comprising the metadata parsed from the file’s header and a deserialized code object ready to execute. Both are loaded from a.pyc file specified as the only requiredcommand-line argument.
  • Line 12pretty prints the decoded metadata onto the screen.
  • Lines 13 to 16 conditionally execute the bytecode from the.pyc file usingexec() orprint an error message. To determine whether the file was compiled for the current interpreter version, this code fragment compares the magic number obtained from the header to the interpreter’s magic number. If everything succeeds, the loaded module’s symbols get imported intoglobals().
  • Lines 18 to 23 open the.pyc file in binary mode using thepathlib module, parse the header, and unmarshal the code object.
  • Lines 25 to 44 parse the header fields using their corresponding offsets and byte sizes, interpreting multi-byte values with thelittle-endian byte order.
  • Lines 28 and 29 extract the Python version from the magic number, which increments with eachminor release of Python according to the formula2900 + 50n, where n is the minor release of Python 3.11 or later.
  • Lines 31 to 35 determine the type of.pyc file based on the preceding bit field (PEP 552).
  • Lines 37 to 40 convert the source file’s modification time into adatetime object in theUTC timezone.

You can run the X-ray script outlined above against a.pyc file of your choice. When you enable Python’sinteractive mode (-i), you’ll be able to inspect the variables and the state of the program after it terminates:

Shell
$python-ixray.py__pycache__/arithmetic.cpython-312.pyc{'magic_number': b'\xcb\r\r\n', 'magic_int': 3531, 'python_version': '3.12', 'bit_field': 0, 'pyc_type': <PycInvalidationMode.TIMESTAMP: 1>, 'timestamp': datetime.datetime(2024, 4, 26, 17, 34, 57, tzinfo=….utc), 'file_size': 32}>>> add(3, 4)7

The script prints out the decoded header fields, including the magic number and the source file’s modification time. Right after that, due to thepython -i option, you’re dropped into the interactivePython REPL where you calladd(), which was imported into the global namespace by executing the module’s bytecode.

This works as expected because the Python interpreter that you’re currently running happens to match the module’s bytecode version. Here’s what would happen if you tried to run another.pyc file targeting a different Python version or one of itsalternative implementations:

Shell
$python-ixray.py__pycache__/arithmetic.cpython-311.pyc{'magic_number': b'\xa7\r\r\n', 'magic_int': 3495, 'python_version': '3.11', 'bit_field': 0, 'pyc_type': <PycInvalidationMode.TIMESTAMP: 1>, 'timestamp': datetime.datetime(2024, 4, 25, 14, 40, 26, tzinfo=….utc), 'file_size': 32}Bytecode incompatible with this interpreter>>> add(3, 4)Traceback (most recent call last):  ...NameError: name 'add' is not defined

This time, the output shows a message indicating a mismatch between the bytecode version in the.pyc file and the interpreter being used. As a result, the bytecode wasn’t executed and theadd() function wasn’t defined, so you can’t call it.

Now, if you X-ray a hash-based.pyc file (checked or unchecked), then this is what you might get:

Shell
$pythonxray.py__pycache__/arithmetic.cpython-312.pyc{'magic_number': b'\xcb\r\r\n', 'magic_int': 3531, 'python_version': '3.12', 'bit_field': 3, 'pyc_type': <PycInvalidationMode.CHECKED_HASH: 2>, 'hash_value': b'\xf3\xdd\x87j\x8d>\x0e)'}

Python can compare the hash value embedded in the.pyc file with one that it calculates from the associated.py file by callingsource_hash() on the source code:

Python
>>>fromimportlib.utilimportsource_hash>>>frompathlibimportPath>>>source_hash(Path("arithmetic.py").read_bytes())b'\xf3\xdd\x87j\x8d>\x0e)'

This is a more reliable method of cache invalidation and verifying code integrity than comparing a volatile last-modified attribute of the source file. Notice how the computed hash value agrees with the one read from the.pyc file.

Now that you know how to import Python modules from a compiled binary form, it might feel tempting to distribute your commercial Python programs without sharing the source code.

Can Bytecode Obfuscate Python Programs?

Earlier, you learned that Python won’t import a module from a.pyc file if the associated.py file can’t be found. However, there’s one notable exception since that’s exactly what happens when youimport Python code from a ZIP file specified on thePYTHONPATH. Such archives typically contain only the compiled.pyc files without the accompanying source code.

The ability to import compiled modules, either by hand or through these ZIP files, lets you implement a rudimentarycode obfuscation mechanism. Unfortunately, it wouldn’t be particularly bulletproof since more tech-savvy users could try todecompile your.pyc files back into high-level Python code using specialized tools likeuncompyle6 orpycdc.

But even without those external tools, you candisassemble Python’s bytecode into human-readableopcodes, making the analysis andreverse-engineering of your programs fairly accessible. The proper way to conceal Python source code is by compiling it to machine code. For example, you can help yourself with tools likeCython or rewrite the core parts of your code in a lower-level programming language likeC,C++, orRust.

How to Disassemble the Cached Bytecode?

Once you get a handle on a code object in Python, you can use thedis module from the standard library to disassemble the compiled bytecode. For the sake of example, you’ll quickly generate a code object yourself without having to rely on the.pyc files cached by Python:

Python
>>>frompathlibimportPath>>>source_code=Path("arithmetic.py").read_text(encoding="utf-8")>>>module=compile(source_code,"arithmetic.py",mode="exec")>>>module<code object <module> at 0x7d09a9c92f50, file "arithmetic.py", line 1>

You call the built-incompile() function with the"exec" mode as a parameter to compile a Python module. Now, you can display the human-readable opcode names of the resulting code object usingdis:

Python
>>>fromdisimportdis>>>dis(module)  0           0 RESUME                   0  1           2 LOAD_CONST               0 (<code object add at 0x7d...>)              4 MAKE_FUNCTION            0              6 STORE_NAME               0 (add)              8 RETURN_CONST             1 (None)Disassembly of <code object add at 0x7d...>:  1           0 RESUME                   0  2           2 LOAD_FAST                0 (a)              4 LOAD_FAST                1 (b)              6 BINARY_OP                0 (+)             10 RETURN_VALUE

The opcodesMAKE_FUNCTION andSTORE_NAME tell you that there’s a function namedadd() in this bytecode. When you look closely at the disassembled code object of that function, you’ll see that it takes two arguments calleda andb, adds them using the binary plus operator (+), and returns the calculated value.

Alternatively, you could walk through the tree of opcodes and iterate over the individual instructions, to try to reconstruct the high-level Python source code:

Python
>>>fromdisimportBytecode>>>fromtypesimportCodeType>>>deftraverse(code):...print(code.co_name,code.co_varnames)...forinstructioninBytecode(code):...ifisinstance(instruction.argval,CodeType):...traverse(instruction.argval)...>>>traverse(module)<module> ()add ('a', 'b')

This code snippet is far from being complete. Plus, decompiling is a complicated process in general. It often leads to imperfect results, as some information gets irreversibly lost when certain optimizations are applied during the bytecode compilation. Anyway, it should give you a rough idea of how decompilers work.

Conclusion

In this tutorial, you’ve dived into the inner workings of Python’s bytecode caching mechanism. You now understand that caching is all about the modules you import. By storing compiled bytecode in__pycache__ folders, Python avoids the overhead of recompiling modules on every program run, leading to faster startup times.

Now you understand what triggers the creation of cache folders, how to suppress them, and how to move them to a centralized folder on your file system. Along the way, you built a utility tool to let you read and execute the individual.pyc files from a__pycache__ folder.

Get Your Code:Click here to download the free sample code that shows you how to work with thepycache folder in Python.

Take the Quiz: Test your knowledge with our interactive “What Is the __pycache__ Folder in Python?” quiz. You’ll receive a score upon completion to help you track your learning progress:


What Is the __pycache__ Folder in Python?

Interactive Quiz

What Is the __pycache__ Folder in Python?

In this quiz, you'll have the opportunity to test your knowledge of the __pycache__ folder, including when, where, and why Python creates these folders.

🐍 Python Tricks 💌

Get a short & sweetPython Trick delivered to your inbox every couple of days. No spam ever. Unsubscribe any time. Curated by the Real Python team.

Python Tricks Dictionary Merge

AboutBartosz Zaczyński

Bartosz is an experienced software engineer and Python educator with an M.Sc. in Applied Computer Science.

» More about Bartosz

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

MasterReal-World Python Skills With Unlimited Access to Real Python

Locked learning resources

Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

MasterReal-World Python Skills
With Unlimited Access to Real Python

Locked learning resources

Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

What Do You Think?

Rate this article:

What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.

Commenting Tips: The most useful comments are those written with the goal of learning from or helping out other students.Get tips for asking good questions andget answers to common questions in our support portal.


Looking for a real-time conversation? Visit theReal Python Community Chat or join the next“Office Hours” Live Q&A Session. Happy Pythoning!

Keep Learning

Related Topics:intermediatepython

Related Tutorials:

Keep reading Real Python by creating a free account or signing in:

Already have an account?Sign-In

Almost there! Complete this form and click the button below to gain instant access:

What Is the __pycache__ Folder in Python?

What Is the __pycache__ Folder in Python? (Sample Code)

🔒 No spam. We take your privacy seriously.


[8]ページ先頭

©2009-2026 Movatter.jp