In Python, you use the import keyword to make code in one module available in another. Imports in Python are important for structuring your code effectively. Using imports properly will make you more productive, allowing you to reuse code while keeping your projects maintainable.
This tutorial will provide a thorough overview of Python’s import statement and how it works. The import system is powerful, and you’ll learn how to harness this power. While you’ll cover many of the concepts behind Python’s import system, this tutorial is mostly example driven. You’ll learn from several code examples throughout.
In this tutorial, you’ll learn how to:
Throughout the tutorial, you’ll see examples of how to play with the Python import machinery in order to work most efficiently.
Python code is organized into both modules and packages. This section will explain how they differ and how you can work with them.
Later in the tutorial, you’ll see some advanced and lesser-known uses of Python’s import system. However, let’s get started with the basics: importing modules and packages.
The Python.org glossary defines module as follows:
An object that serves as an organizational unit of Python code. Modules have a namespace containing arbitrary Python objects. Modules are loaded into Python by the process of importing. (Source)
In practice, a module usually corresponds to one .py file containing Python code.
The true power of modules is that they can be imported and reused in other code. Consider the following example:
    >>> import math
    >>> math.pi
    3.141592653589793
In the first line, import math, you import the code in the math module and make it available to use. In the second line, you access the pi variable within the math module. math is part of Python’s standard library, which means that it’s always available to import when you’re running Python.
Note that you write math.pi and not just simply pi. In addition to being a module, math acts as a namespace that keeps all the attributes of the module together. Namespaces are useful for keeping your code readable and organized. In the words of Tim Peters:
Namespaces are one honking great idea—let’s do more of those! (Source)
You can list the contents of a namespace with dir():
    >>> import math
    >>> dir()
    ['__annotations__', '__builtins__', ..., 'math']
    >>> dir(math)
    ['__doc__', ..., 'nan', 'pi', 'pow', ...]
Using dir() without any argument shows what’s in the global namespace. To see the contents of the math namespace, you use dir(math).
You’ve already seen the most straightforward use of import. However, there are other ways to use it that allow you to import specific parts of a module and to rename the module as you import it.
The following code imports only the pi variable from the math module:
    >>> from math import pi
    >>> pi
    3.141592653589793
    >>> math.pi
    NameError: name 'math' is not defined
Note that this places pi in the global namespace and not within a math namespace.
You can also rename modules and attributes as they’re imported:
    >>> import math as m
    >>> m.pi
    3.141592653589793
    >>> from math import pi as PI
    >>> PI
    3.141592653589793
For more details about the syntax for importing modules, check out Python Modules and Packages – An Introduction.
You can use a package to further organize your modules. The Python.org glossary defines package as follows:
A Python module which can contain submodules or recursively, subpackages. Technically, a package is a Python module with an __path__ attribute. (Source)
Note that a package is still a module. As a user, you usually don’t need to worry about whether you’re importing a module or a package.
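As a minimal sketch of that distinction, you can check for the __path__ attribute yourself. The helper name is_package() is an assumption made for illustration, and json and math are just convenient standard library stand-ins for a package and a plain module:

```python
import json   # a package in the standard library (a directory with __init__.py)
import math   # a plain module
import types


def is_package(module: types.ModuleType) -> bool:
    """Return True if the module object has a __path__ attribute."""
    return hasattr(module, "__path__")


print(is_package(json))  # True
print(is_package(math))  # False
```

Both objects are ordinary module objects. Only the __path__ attribute sets the package apart.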
In practice, a package typically corresponds to a file directory containing Python files and other directories. To create a Python package yourself, you create a directory and a file named __init__.py inside it. The __init__.py file contains the contents of the package when it’s treated as a module. It can be left empty.
Note: Directories without an __init__.py file are still treated as packages by Python. However, these won’t be regular packages, but something called namespace packages. You’ll learn more about them later.
In general, submodules and subpackages aren’t imported when you import a package. However, you can use __init__.py to include any or all submodules and subpackages if you want. To show a few examples of this behavior, you’ll create a package for saying Hello world in a few different languages. The package will consist of the following directories and files:
    world/
    │
    ├── africa/
    │   ├── __init__.py
    │   └── zimbabwe.py
    │
    ├── europe/
    │   ├── __init__.py
    │   ├── greece.py
    │   ├── norway.py
    │   └── spain.py
    │
    └── __init__.py
Each country file prints out a greeting, while the __init__.py files selectively import some of the subpackages and submodules. The exact contents of the files are as follows:
    # world/africa/__init__.py  (Empty file)

    # world/africa/zimbabwe.py
    print("Shona: Mhoroyi vhanu vese")
    print("Ndebele: Sabona mhlaba")

    # world/europe/__init__.py
    from . import greece
    from . import norway

    # world/europe/greece.py
    print("Greek: Γειά σας Κόσμε")

    # world/europe/norway.py
    print("Norwegian: Hei verden")

    # world/europe/spain.py
    print("Castellano: Hola mundo")

    # world/__init__.py
    from . import africa
Note that world/__init__.py imports only africa and not europe. Similarly, world/africa/__init__.py doesn’t import anything, while world/europe/__init__.py imports greece and norway but not spain. Each country module will print a greeting when it’s imported.
Let’s play with the world package at the interactive prompt to get a better understanding of how the subpackages and submodules behave:
    >>> import world
    >>> world
    <module 'world' from 'world/__init__.py'>

    >>> # The africa subpackage has been automatically imported
    >>> world.africa
    <module 'world.africa' from 'world/africa/__init__.py'>

    >>> # The europe subpackage has not been imported
    >>> world.europe
    AttributeError: module 'world' has no attribute 'europe'
When europe is imported, the europe.greece and europe.norway modules are imported as well. You can see this because the country modules print a greeting when they’re imported:
    >>> # Import europe explicitly
    >>> from world import europe
    Greek: Γειά σας Κόσμε
    Norwegian: Hei verden

    >>> # The greece submodule has been automatically imported
    >>> europe.greece
    <module 'world.europe.greece' from 'world/europe/greece.py'>

    >>> # Because world is imported, europe is also found in the world namespace
    >>> world.europe.norway
    <module 'world.europe.norway' from 'world/europe/norway.py'>

    >>> # The spain submodule has not been imported
    >>> europe.spain
    AttributeError: module 'world.europe' has no attribute 'spain'

    >>> # Import spain explicitly inside the world namespace
    >>> import world.europe.spain
    Castellano: Hola mundo

    >>> # Note that spain is also available directly inside the europe namespace
    >>> europe.spain
    <module 'world.europe.spain' from 'world/europe/spain.py'>

    >>> # Importing norway doesn't do the import again (no output), but adds
    >>> # norway to the global namespace
    >>> from world.europe import norway
    >>> norway
    <module 'world.europe.norway' from 'world/europe/norway.py'>
The world/africa/__init__.py file is empty. This means that importing the world.africa package creates the namespace but has no other effect:
    >>> # Even though africa has been imported, zimbabwe has not
    >>> world.africa.zimbabwe
    AttributeError: module 'world.africa' has no attribute 'zimbabwe'

    >>> # Import zimbabwe explicitly into the global namespace
    >>> from world.africa import zimbabwe
    Shona: Mhoroyi vhanu vese
    Ndebele: Sabona mhlaba

    >>> # The zimbabwe submodule is now available
    >>> zimbabwe
    <module 'world.africa.zimbabwe' from 'world/africa/zimbabwe.py'>

    >>> # Note that zimbabwe can also be reached through the africa subpackage
    >>> world.africa.zimbabwe
    <module 'world.africa.zimbabwe' from 'world/africa/zimbabwe.py'>
Remember, importing a module both loads the contents and creates a namespace containing the contents. The last few examples show that it’s possible for the same module to be part of different namespaces.
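The same effect is easy to observe with the standard library. This short sketch, which uses json.decoder purely as an illustrative stand-in for the country modules above, checks that two differently spelled imports give you the same module object:

```python
import json.decoder        # binds the name json; the submodule is json.decoder
from json import decoder   # binds the name decoder in the global namespace

# Both names refer to one and the same module object
same_object = decoder is json.decoder
print(same_object)  # True
```

The module is loaded only once. The two import statements differ only in which names they place in which namespace.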
Technical Detail: The module namespace is implemented as a Python dictionary and is available at the .__dict__ attribute:
    >>> import math
    >>> math.__dict__["pi"]
    3.141592653589793
You rarely need to interact with .__dict__ directly. To learn more about .__dict__, you can check out the Using Python’s .__dict__ to Work With Attributes tutorial.
Similarly, Python’s global namespace is also a dictionary. You can access it through globals().
It’s fairly common to import subpackages and submodules in an __init__.py file to make them more readily available to your users. You can see one example of this in the popular requests package.
Recall the source code of world/__init__.py in the earlier example:
    from . import africa
You’ve already seen from...import statements such as from math import pi, but what does the dot (.) in from . import africa mean?
The dot refers to the current package, and the statement is an example of a relative import. You can read it as “From the current package, import the subpackage africa.”
There’s an equivalent absolute import statement in which you explicitly name the current package:
    from world import africa
In fact, all imports in world could have been done explicitly with similar absolute imports.
Relative imports must be in the form from...import, and the location you’re importing from must start with a dot.
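If you want to see how a relative name maps to an absolute one, importlib.util.resolve_name() in the standard library performs just the name resolution without importing anything. The following is a small sketch that reuses the world package names from above as plain strings:

```python
from importlib.util import resolve_name

# A lone dot resolves to the current package itself
current_package = resolve_name(".", package="world")

# A leading dot followed by a name resolves inside the current package
relative = resolve_name(".greece", package="world.europe")

# Names without a leading dot are already absolute and come back unchanged
absolute = resolve_name("world.africa", package=None)

print(current_package)  # world
print(relative)         # world.europe.greece
print(absolute)         # world.africa
```

Because only strings are involved, the world package doesn’t need to exist on your system for this to run.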
The PEP 8 style guide recommends using absolute imports in general. However, relative imports are an alternative for organizing package hierarchies. For more information, see Absolute vs Relative Imports in Python.
How does Python find the modules and packages it imports? You’ll see more details about the mechanics of the Python import system later. For now, just know that Python looks for modules and packages in its import path. This is a list of locations that are searched for modules to import.
Note: When you type import something, Python will look for something in a few different places before searching the import path.
In particular, it’ll look in a module cache to see if something has already been imported, and it’ll search among the built-in modules.
You’ll learn more about the full Python import machinery in a later section.
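Both of these early lookups can be inspected from the sys module. In this sketch, sys.modules is the module cache and sys.builtin_module_names lists the modules compiled into the interpreter. math is used only as an example module:

```python
import sys
import math

# After the import, math sits in the module cache under its name
cached = "math" in sys.modules
same_object = sys.modules["math"] is math

# sys itself is always compiled into the interpreter
sys_is_builtin = "sys" in sys.builtin_module_names

print(cached, same_object, sys_is_builtin)  # True True True
```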
You can inspect Python’s import path by printing sys.path. Broadly speaking, this list will contain three different kinds of locations:
- The directory of the current script (or the current directory if there’s no script, such as when Python is running interactively)
- The contents of the PYTHONPATH environment variable
- Other, installation-dependent directories
Typically, Python will start at the beginning of the list of locations and look for a given module in each location until the first match. Since the script directory or the current directory is always first in this list, you can make sure that your scripts find your self-made modules and packages by organizing your directories and being careful about which directory you run Python from.
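To look at the import path on your own system, you can loop over sys.path. The exact entries are installation dependent, so this sketch shows no expected output:

```python
import sys

# Locations are searched in this order, starting at index 0
for index, location in enumerate(sys.path):
    print(index, repr(location))
```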
However, you should also be careful that you don’t create modules that shadow, or hide, other important modules. As an example, say that you define the following math module:
    # math.py

    def double(number):
        return 2 * number
Using this module works as expected:
    >>> import math
    >>> math.double(3.14)
    6.28
But this module also shadows the math module that’s included in the standard library. Unfortunately, that means our earlier example of looking up the value of π no longer works:
    >>> import math
    >>> math.pi
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'math' has no attribute 'pi'

    >>> math
    <module 'math' from 'math.py'>
The problem is that Python now searches your new math module for pi instead of searching the math module in the standard library.
To avoid these kinds of issues, you should be careful with the names of your modules and packages. In particular, your top-level module and package names should be unique. If math is defined as a submodule within a package, then it won’t shadow the built-in module.
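One way to check whether a name is already taken before you reuse it is to ask the import system where it would load that name from. This is a minimal sketch using importlib.util.find_spec(). The helper name module_origin() is hypothetical, and the printed locations depend on your installation:

```python
import importlib.util


def module_origin(name):
    """Return where Python would load a top-level module from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# An existing name reports a file path or a marker such as "built-in"
print(module_origin("json"))

# A name that nothing provides yet reports None, so it's free to use
print(module_origin("no_such_module_for_this_demo"))
```

If the reported location is a file in your own project when you expected the standard library, then you’ve found a shadowing module.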
While it’s possible to organize your imports by using the current directory as well as by manipulating PYTHONPATH and even sys.path, the process is often unruly and prone to errors. To see a typical example, consider the following application:
    structure/
    │
    ├── files.py
    └── structure.py
The app will re-create a given file structure by creating directories and empty files. The structure.py file contains the main script, and files.py is a library module with a few functions for dealing with files. The following is an example of output from the app, in this case by running it in the structure directory:
    $ python structure.py .
    Create file: /home/gahjelle/structure/001/structure.py
    Create file: /home/gahjelle/structure/001/files.py
    Create file: /home/gahjelle/structure/001/__pycache__/files.cpython-38.pyc
The two source code files as well as the automatically created .pyc file are re-created inside a new directory named 001.
Now take a look at the source code. The main functionality of the app is defined in structure.py:
     1  # structure/structure.py
     2
     3  # Standard library imports
     4  import pathlib
     5  import sys
     6
     7  # Local imports
     8  import files
     9
    10  def main():
    11      # Read path from command line
    12      try:
    13          root = pathlib.Path(sys.argv[1]).resolve()
    14      except IndexError:
    15          print("Need one argument: the root of the original file tree")
    16          raise SystemExit()
    17
    18      # Re-create the file structure
    19      new_root = files.unique_path(pathlib.Path.cwd(), "{:03d}")
    20      for path in root.rglob("*"):
    21          if path.is_file() and new_root not in path.parents:
    22              rel_path = path.relative_to(root)
    23              files.add_empty_file(new_root / rel_path)
    24
    25  if __name__ == "__main__":
    26      main()
In lines 12 to 16, you read a root path from the command line. In the above example you use a dot, which means the current directory. This path will be used as the root of the file hierarchy that you’ll re-create.
The actual work happens in lines 19 to 23. First, you create a unique path, new_root, that will be the root of your new file hierarchy. Then you loop through all paths below the original root and re-create them as empty files inside the new file hierarchy.
For manipulating paths like this, pathlib in the standard library is quite useful. For more details on how it’s used, check out Python’s pathlib Module: Taming the File System.
On line 26, you call main(). You’ll learn more about the if test on line 25 later. For now, you should know that the special variable __name__ has the value __main__ inside scripts, but it gets the name of the module inside imported modules. For more information on __name__, check out Defining Main Functions in Python and What Does if __name__ == “__main__” Do in Python?
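To see both values of __name__ side by side, you can load one file in two ways. The sketch below is only an illustration: the file name demo_mod.py and the variable seen_name are made up, runpy.run_path() stands in for running the file as a script, and the importlib.util calls stand in for a regular import:

```python
import importlib.util
import pathlib
import runpy
import tempfile

# The demo module just records the value of __name__ that it sees
source = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "demo_mod.py"
    path.write_text(source)

    # Execute the file the way "python demo_mod.py" would
    as_script = runpy.run_path(str(path), run_name="__main__")["seen_name"]

    # Load the same file as an imported module named demo_mod
    spec = importlib.util.spec_from_file_location("demo_mod", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    as_module = module.seen_name

print(as_script)  # __main__
print(as_module)  # demo_mod
```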
Note that you import files on line 8. This library module contains two utility functions:
    # structure/files.py

    def unique_path(directory, name_pattern):
        """Find a path name that does not already exist"""
        counter = 0
        while True:
            counter += 1
            path = directory / name_pattern.format(counter)
            if not path.exists():
                return path

    def add_empty_file(path):
        """Create an empty file at the given path"""
        print(f"Create file: {path}")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
unique_path() uses a counter to find a path that doesn’t already exist. In the app, you use it to find a unique subdirectory to use as the new_root of the re-created file hierarchy. Next, add_empty_file() makes sure all necessary directories are created before creating an empty file using .touch().
Have a look at the import of files again:
     7  # Local imports
     8  import files
It looks quite innocent. However, as the project grows, this line will cause you some headaches. Even though you import files from the structure project, the import is absolute: it doesn’t start with a dot. This means that files must be found in the import path for the import to work.
Luckily, the directory containing the current script is always in Python’s import path, so this works fine for now. However, if your project gains some traction, then it may be used in other ways.
For example, someone might want to import the script into a Jupyter Notebook and run it from there. Or they may want to reuse the files library in another project. They may even create an executable with PyInstaller to more easily distribute it. Unfortunately, any of these scenarios can create issues with the import of files.
To see an example, you can follow the PyInstaller guide and create an entry point to your application. Add an extra directory outside your application directory:
    structure/
    │
    ├── structure/
    │   ├── files.py
    │   └── structure.py
    │
    └── cli.py
In the outer directory, create the entry point script, cli.py:
    # cli.py

    from structure.structure import main

    if __name__ == "__main__":
        main()
This script will import main() from your original script and run it. Note that main() isn’t run when structure is imported because of the if test on line 25 in structure.py. That means you need to run main() explicitly.
In theory, this should work similarly to running the app directly:
    $ python cli.py structure
    Traceback (most recent call last):
      File "cli.py", line 1, in <module>
        from structure.structure import main
      File "/home/gahjelle/structure/structure/structure.py", line 8, in <module>
        import files
    ModuleNotFoundError: No module named 'files'
Why didn’t that work? Suddenly, the import of files raises an error.
The problem is that by starting the app with cli.py, you’ve changed the location of the current script, which in turn changes the import path. files is no longer on the import path, so it can’t be imported absolutely.
One possible solution is to change Python’s import path:
     7  # Local imports
     8  sys.path.insert(0, str(pathlib.Path(__file__).parent))
     9  import files
This works because the import path includes the folder containing structure.py and files.py. The issue with this approach is that your import path can get very messy and hard to understand.
In practice, you’re re-creating a feature of early Python versions called implicit relative imports. These were removed from the language by PEP 328 with the following rationale:
In Python 2.4 and earlier, if you’re reading a module located inside a package, it is not clear whether import foo refers to a top-level module or to another module inside the package. As Python’s library expands, more and more existing package internal modules suddenly shadow standard library modules by accident. It’s a particularly difficult problem inside packages because there’s no way to specify which module is meant. (Source)
Another solution is to use a relative import instead. Change the import in structure.py as follows:
     7  # Local imports
     8  from . import files
You can now start your app through the entry point script:
    $ python cli.py structure
    Create file: /home/gahjelle/structure/001/structure.py
    Create file: /home/gahjelle/structure/001/files.py
    Create file: /home/gahjelle/structure/001/__pycache__/structure.cpython-38.pyc
    Create file: /home/gahjelle/structure/001/__pycache__/files.cpython-38.pyc
Unfortunately, you can no longer call the app directly:
    $ python structure.py .
    Traceback (most recent call last):
      File "structure.py", line 8, in <module>
        from . import files
    ImportError: cannot import name 'files' from '__main__' (structure.py)
The problem is that relative imports are resolved differently in scripts than in imported modules. Of course, you could go back and restore the absolute import before running the script directly, or you could even do some try...except acrobatics to import files absolutely or relatively depending on what works.
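Such a try...except fallback could look roughly like the sketch below. It’s shown for illustration rather than as a recommendation, and the names demo_pkg, main_mod, and helper are made up for this example. The script builds a tiny package in a temporary directory, then loads the same file once as a script and once as an imported module:

```python
import pathlib
import subprocess
import sys
import tempfile

MAIN_SOURCE = '''\
try:
    from . import helper  # Relative import: works inside a package
except ImportError:
    import helper  # Absolute import: works when run as a script
print(helper.__name__)
'''


def run_python(args, cwd):
    """Run a fresh interpreter in cwd and return what it printed."""
    result = subprocess.run(
        [sys.executable, *args],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    package_dir = pathlib.Path(tmp) / "demo_pkg"
    package_dir.mkdir()
    (package_dir / "helper.py").write_text("")
    (package_dir / "main_mod.py").write_text(MAIN_SOURCE)

    # Run the file directly from inside its own directory
    as_script = run_python(["main_mod.py"], cwd=package_dir)

    # Import the file as part of the package from the parent directory
    as_module = run_python(["-c", "import demo_pkg.main_mod"], cwd=tmp)

print(as_script)  # helper
print(as_module)  # demo_pkg.helper
```

Note that the helper module ends up with a different name in each case, which hints at why this kind of fallback can become confusing as a project grows.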
There’s even an officially sanctioned hack to make relative imports work in scripts. Unfortunately, this also forces you to change sys.path in most cases. To quote Raymond Hettinger:
There must be a better way! (Source)
Indeed, a better, and more stable, solution is to play along with Python’s import and packaging system and install your project as a local package using pip.
When you install a package from PyPI, that package is available to all scripts in your environment. However, you can also install packages from your local computer, and they’ll also be made available in the same way.
Creating a local package doesn’t involve much overhead. First, create minimal setup.cfg and setup.py files in the outer structure directory:
    # setup.cfg

    [metadata]
    name = local_structure
    version = 0.1.0

    [options]
    packages = structure

    # setup.py

    import setuptools

    setuptools.setup()
In theory, the name and version can be whatever you like. However, they’ll be used by pip when referring to your package, so you should choose values that are recognizable and don’t collide with other packages you use.
One tip is to give all such local packages a common prefix like local_ or your username. packages should list the directory or directories containing your source code. You can then install the package locally using pip:
    $ python -m pip install -e .
This command will install the package to your system. structure will then be found on Python’s import path, meaning you can use it anywhere without having to worry about the script directory, relative imports, or other complications. The -e option stands for editable, which is important because it allows you to change the source code of your package without reinstalling it.
Note: This kind of setup file works great when you’re working with projects on your own. However, if you plan to share the code with others, then you should add some more information to your setup file.
For more details on setup files, check out How to Publish an Open-Source Python Package to PyPI.
Now that structure is installed on your system, you can use the following import statement:
     7  # Local imports
     8  from structure import files
This will work no matter how you end up calling your application.
Tip: In your own code, you should consciously separate scripts and libraries. Here’s a good rule of thumb:
You might have code that you want to both run on its own and import from other scripts. In that case, it’s usually worthwhile to refactor your code so that you split the common part into a library module.
While it’s a good idea to separate scripts and libraries, all Python files can be both executed and imported. In a later section, you’ll learn more about how to create modules that handle both well.
Python modules and packages are very closely related to files and directories. This sets Python apart from many other programming languages in which packages merely act as namespaces without enforcing how the source code is organized. See the discussion in PEP 402 for examples.
Namespace packages have been available in Python since version 3.3. These are less dependent on the underlying file hierarchy. In particular, namespace packages can be split across multiple directories. A namespace package is created automatically if you have a directory containing a .py file but no __init__.py. See PEP 420 for a detailed explanation.
Note: To be precise, implicit namespace packages were introduced in Python 3.3. In earlier versions of Python, you could create namespace packages manually in several different incompatible ways. PEP 420 unifies and simplifies these earlier approaches.
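As a small self-contained sketch of a namespace package that is split across directories, the code below creates two temporary directories that each hold one portion of the same package. The names demo_ns, alpha, and beta are invented for this illustration:

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Two separate directories each hold one portion of demo_ns,
    # and neither portion contains an __init__.py file
    for parent, module_name in [("first", "alpha"), ("second", "beta")]:
        portion = root / parent / "demo_ns"
        portion.mkdir(parents=True)
        (portion / f"{module_name}.py").write_text(f"NAME = {module_name!r}\n")

    added = [str(root / "first"), str(root / "second")]
    sys.path.extend(added)
    importlib.invalidate_caches()
    try:
        import demo_ns
        from demo_ns import alpha, beta

        portions = len(demo_ns.__path__)
        names = (alpha.NAME, beta.NAME)
        init_file = getattr(demo_ns, "__file__", None)
    finally:
        # Leave the import path the way it was found
        for entry in added:
            sys.path.remove(entry)

print(portions)   # 2
print(names)      # ('alpha', 'beta')
print(init_file)  # None
```

The package’s __path__ lists both directories, and since there’s no __init__.py, the package itself isn’t backed by any single file.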
To get a better understanding of why namespace packages can be useful, let’s try to implement one. As a motivating example, you’ll have another go at the problem solved in The Factory Method Pattern and Its Implementation in Python: given a Song object, you want to convert it to one of several string representations. In other words, you want to serialize Song objects.
To be more concrete, you want to implement code that works something like this:
    >>> song = Song(song_id="1", title="The Same River", artist="Riverside")
    >>> song.serialize()
    '{"id": "1", "title": "The Same River", "artist": "Riverside"}'
Let’s assume that you’re lucky and come across a third-party implementation of several of the formats that you need to serialize to, and it’s organized as a namespace package:
    third_party/
    │
    └── serializers/
        ├── json.py
        └── xml.py
The file json.py contains code that can serialize an object to the JSON format:
    # third_party/serializers/json.py

    import json

    class JsonSerializer:
        def __init__(self):
            self._current_object = None

        def start_object(self, object_name, object_id):
            self._current_object = dict(id=object_id)

        def add_property(self, name, value):
            self._current_object[name] = value

        def __str__(self):
            return json.dumps(self._current_object)
This serializer interface is a bit limited, but it’ll be enough to demonstrate how namespace packages work.
The file xml.py contains a similar XmlSerializer that can convert an object to XML:
    # third_party/serializers/xml.py

    import xml.etree.ElementTree as et

    class XmlSerializer:
        def __init__(self):
            self._element = None

        def start_object(self, object_name, object_id):
            self._element = et.Element(object_name, attrib={"id": object_id})

        def add_property(self, name, value):
            prop = et.SubElement(self._element, name)
            prop.text = value

        def __str__(self):
            return et.tostring(self._element, encoding="unicode")
Note that both of these classes implement the same interface with .start_object(), .add_property(), and .__str__() methods.
You then create a Song class that can use these serializers:
    # song.py

    class Song:
        def __init__(self, song_id, title, artist):
            self.song_id = song_id
            self.title = title
            self.artist = artist

        def serialize(self, serializer):
            serializer.start_object("song", self.song_id)
            serializer.add_property("title", self.title)
            serializer.add_property("artist", self.artist)

            return str(serializer)
A Song is defined by its ID, title, and artist. Note that .serialize() doesn’t need to know which format it converts to because it uses the common interface defined earlier.
Assuming that you’ve installed the third-party serializers package, you can use it as follows:
    >>> from serializers.json import JsonSerializer
    >>> from serializers.xml import XmlSerializer
    >>> from song import Song
    >>> song = Song(song_id="1", title="The Same River", artist="Riverside")

    >>> song.serialize(JsonSerializer())
    '{"id": "1", "title": "The Same River", "artist": "Riverside"}'

    >>> song.serialize(XmlSerializer())
    '<song id="1"><title>The Same River</title><artist>Riverside</artist></song>'
By providing different serializer objects to .serialize(), you get different representations of your song.
Note: You might get a ModuleNotFoundError or an ImportError when running the code yourself. This is because serializers isn’t in your Python import path. You’ll soon see how to solve that.
So far, so good. However, now you realize that you also need to convert your songs to a YAML representation, which is not supported in the third-party library. Enter the magic of namespace packages: you can add your own YamlSerializer to the serializers package without touching the third-party library.
First, create a directory on your local file system called serializers. It’s important that the name of the directory matches the name of the namespace package that you’re customizing:
    local/
    │
    └── serializers/
        └── yaml.py
In the yaml.py file, you define your own YamlSerializer. You base this on the PyYAML package, which must be installed from PyPI:
    $ python -m pip install PyYAML
Since YAML and JSON are quite similar formats, you can reuse most of the implementation of JsonSerializer:
    # local/serializers/yaml.py

    import yaml
    from serializers.json import JsonSerializer

    class YamlSerializer(JsonSerializer):
        def __str__(self):
            return yaml.dump(self._current_object)
Note that the YamlSerializer is based on the JsonSerializer, which is imported from serializers itself. Since both json and yaml are part of the same namespace package, you could even use a relative import: from .json import JsonSerializer.
Continuing the above example, you can now convert the song to YAML as well:
    >>> from serializers.yaml import YamlSerializer
    >>> song.serialize(YamlSerializer())
    "artist: Riverside\nid: '1'\ntitle: The Same River\n"
Just like regular modules and packages, namespace packages must be found on the Python import path. If you were following along with the previous examples, then you might have had issues with Python not finding serializers. In actual code, you would have used pip to install the third-party library, so it would be in your path automatically.
Note: In the original example, the choice of serializer was made more dynamically. You’ll see how to use namespace packages in a proper factory method pattern later.
You should also make sure that your local library is available like a normal package. As explained above, you can do this either by running Python from the proper directory or by using pip to install the local library as well.
In this example, you’re testing how to integrate a fake third-party package with your local package. If third_party were a real package, then you would download it from PyPI using pip. Since this isn’t possible, you can simulate it by installing third_party locally like you did in the structure example earlier.
Alternatively, you can mess with your import path. Put the third_party and local directories inside the same folder, then customize your Python path as follows:
    >>> import sys
    >>> sys.path.extend(["third_party", "local"])

    >>> from serializers import json, xml, yaml
    >>> json
    <module 'serializers.json' from 'third_party/serializers/json.py'>

    >>> yaml
    <module 'serializers.yaml' from 'local/serializers/yaml.py'>
You can now use all serializers without worrying about whether they’re defined in the third-party package or locally.
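PEP 8, the Python style guide, has a couple of recommendations about imports. As always with Python, keeping your code both readable and maintainable is an important consideration. Here are a few general rules of thumb for how to style your imports:
- Keep imports at the top of the file.
- Write imports on separate lines.
- Organize imports into groups: first standard library imports, then third-party imports, and finally local application or library imports.
- Prefer absolute imports over relative imports.
- Avoid wildcard imports like from module import *.
isort and reorder-python-imports are great tools for enforcing a consistent style on your imports.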
Here’s an example of an import section inside the Real Python feed reader package:
    # Standard library imports
    import sys
    from typing import Dict, List

    # Third party imports
    import feedparser
    import html2text

    # Reader imports
    from reader import URL
Note how this grouping makes the dependencies of this module clear: feedparser and html2text need to be installed on the system. You can generally assume that the standard library is available. Separating imports from within your package gives you some overview over the internal dependencies of your code.
There are cases in which it makes sense to bend these rules a little. You’ve already seen that relative imports can be an alternative to organizing package hierarchies. Later, you’ll see how in some cases you can move imports into a function definition to break import cycles.
Sometimes you’ll have code that depends on data files or other resources. In small scripts, this isn’t a problem—you can specify the path to your data file and carry on!
However, if the resource file is important for your package and you want to distribute your package to other users, then a few challenges will arise:
You won’t have control over the path to the resource since that will depend on your user’s setup as well as on how the package is distributed and installed. You can try to figure out the resource path based on your package’s__file__
or__path__
attributes, but this may not always work as expected.
Your package mayreside inside a ZIP file or an old.egg
file, in which case the resource won’t even be a physical file on the user’s system.
There have been several attempts at solving these challenges, includingsetuptools.pkg_resources
. However, with the introduction ofimportlib.resources
into the standard library inPython 3.7, there’s now one standard way of dealing with resource files.
importlib.resources
importlib.resources
gives access to resources within packages. In this context, aresource is any file located within an importable package. The file may or may not correspond to a physical file on the file system.
This has a couple of advantages. By reusing the import system, you get a more consistent way of dealing with the files inside your packages. It also gives you easier access to resource files in other packages. The documentation sums it up nicely:
If you can import a package, you can access resources within that package. (Source)
importlib.resources became part of the standard library in Python 3.7. However, on older versions of Python, a backport is available as importlib_resources. To use the backport, install it from PyPI:
$ python -m pip install importlib_resources
The backport is compatible with Python 2.7 as well as Python 3.4 and later versions.
There’s one requirement when using importlib.resources: your resource files must be available inside a regular package. Namespace packages aren’t supported. In practice, this means that the file must be in a directory containing an __init__.py file.
As a first example, assume you have resources inside a package like this:
books/
│
├── __init__.py
├── alice_in_wonderland.png
└── alice_in_wonderland.txt
__init__.py is just an empty file necessary to designate books as a regular package.
You can then use open_text() and open_binary() to open text and binary files, respectively:
>>> from importlib import resources
>>> with resources.open_text("books", "alice_in_wonderland.txt") as fid:
...     alice = fid.readlines()
...
>>> print("".join(alice[:7]))
CHAPTER I. Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, 'and what is the use of a book,' thought Alice 'without pictures or
conversations?'

>>> with resources.open_binary("books", "alice_in_wonderland.png") as fid:
...     cover = fid.read()
...
>>> cover[:8]  # PNG file signature
b'\x89PNG\r\n\x1a\n'
open_text() and open_binary() are equivalent to the built-in open() with the mode parameter set to rt and rb, respectively. Convenient functions for reading text or binary files directly are also available as read_text() and read_binary(). See the official documentation for more information.
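On Python 3.9 and later, importlib.resources also offers files(), which returns an object you can navigate much like a path. The following is a minimal sketch rather than part of the original example: it builds a throwaway package in a temporary directory so that it can run anywhere, and the names demo_books and title.txt are made up for illustration.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway regular package (hypothetical name) with one text file
    package_dir = Path(tmp) / "demo_books"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("", encoding="utf-8")
    (package_dir / "title.txt").write_text(
        "Alice in Wonderland\n", encoding="utf-8"
    )

    # Make the package importable for the duration of the demo
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # files() imports the package and points at its root
        resource = resources.files("demo_books").joinpath("title.txt")
        title = resource.read_text(encoding="utf-8")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_books", None)

print(title.strip())
```

Because the resource is looked up through the package name, the code never needs to know where the package lives on disk.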
Note: To seamlessly fall back to using the backport on older Python versions, you can import importlib.resources as follows:
try:
    from importlib import resources
except ModuleNotFoundError:
    import importlib_resources as resources
See the tips and tricks section of this tutorial for more information.
The rest of this section will show a few elaborate examples of using resource files in practice.
As a more complete example of using data files, you’ll see how to implement a quiz program based on United Nations population data. First, create a data package and download WPP2019_TotalPopulationBySex.csv from the UN web page:
data/
│
├── __init__.py
└── WPP2019_TotalPopulationBySex.csv
Open the CSV file and have a look at the data:
LocID,Location,VarID,Variant,Time,PopMale,PopFemale,PopTotal,PopDensity
4,Afghanistan,2,Medium,1950,4099.243,3652.874,7752.117,11.874
4,Afghanistan,2,Medium,1951,4134.756,3705.395,7840.151,12.009
4,Afghanistan,2,Medium,1952,4174.45,3761.546,7935.996,12.156
4,Afghanistan,2,Medium,1953,4218.336,3821.348,8039.684,12.315
...
Each line contains the population of a country for a given year and a given variant, which indicates what kind of scenario is used for the projection. The file contains population projections until the year 2100.
The following function reads this file and picks out the total population of each country for a given year and variant:
import csv
from importlib import resources

def read_population_file(year, variant="Medium"):
    population = {}

    print(f"Reading population data for {year}, {variant} scenario")
    with resources.open_text(
        "data", "WPP2019_TotalPopulationBySex.csv"
    ) as fid:
        rows = csv.DictReader(fid)

        # Read data, filter the correct year
        for row in rows:
            if row["Time"] == year and row["Variant"] == variant:
                pop = round(float(row["PopTotal"]) * 1000)
                population[row["Location"]] = pop

    return population
The with statement shows how importlib.resources is used to open the data file. For more information about working with CSV files, check out Reading and Writing CSV Files in Python.
The above function returns a dictionary with population numbers:
>>> population = read_population_file("2020")
Reading population data for 2020, Medium scenario

>>> population["Norway"]
5421242
You can do any number of interesting things with this population dictionary, including analysis and visualizations. Here, you’ll create a quiz game that asks users to identify which country in a set is most populous. Playing the game will look something like this:
$ python population_quiz.py

Question 1:
1. Tunisia
2. Djibouti
3. Belize

Which country has the largest population? 1
Yes, Tunisia is most populous (11,818,618)

Question 2:
1. Mozambique
2. Ghana
3. Hungary

Which country has the largest population? 2
No, Mozambique (31,255,435) is more populous than Ghana (31,072,945)

...
The details of the implementation are too far outside the topic of this tutorial, so they won’t be discussed here. However, the complete source code is shown below.
The population quiz consists of two functions, one that reads the population data like you did above and one that runs the actual quiz:
 1 # population_quiz.py
 2
 3 import csv
 4 import random
 5
 6 try:
 7     from importlib import resources
 8 except ModuleNotFoundError:
 9     import importlib_resources as resources
10
11 def read_population_file(year, variant="Medium"):
12     """Read population data for the given year and variant"""
13     population = {}
14
15     print(f"Reading population data for {year}, {variant} scenario")
16     with resources.open_text(
17         "data", "WPP2019_TotalPopulationBySex.csv"
18     ) as fid:
19         rows = csv.DictReader(fid)
20
21         # Read data, filter the correct year
22         for row in rows:
23             if (
24                 int(row["LocID"]) < 900
25                 and row["Time"] == year
26                 and row["Variant"] == variant
27             ):
28                 pop = round(float(row["PopTotal"]) * 1000)
29                 population[row["Location"]] = pop
30
31     return population
32
33 def run_quiz(population, num_questions, num_countries):
34     """Run a quiz about the population of countries"""
35     num_correct = 0
36     for q_num in range(num_questions):
37         print(f"\n\nQuestion {q_num + 1}:")
38         countries = random.sample(list(population), num_countries)  # sample() needs a sequence
39         print("\n".join(f"{i}. {a}" for i, a in enumerate(countries, start=1)))
40
41         # Get user input
42         while True:
43             guess_str = input("\nWhich country has the largest population? ")
44             try:
45                 guess_idx = int(guess_str) - 1
46                 guess = countries[guess_idx]
47             except (ValueError, IndexError):
48                 print(f"Please answer between 1 and {num_countries}")
49             else:
50                 break
51
52         # Check the answer
53         correct = max(countries, key=lambda k: population[k])
54         if guess == correct:
55             num_correct += 1
56             print(f"Yes, {guess} is most populous ({population[guess]:,})")
57         else:
58             print(
59                 f"No, {correct} ({population[correct]:,}) is more populous "
60                 f"than {guess} ({population[guess]:,})"
61             )
62
63     return num_correct
64
65 def main():
66     """Read population data and run quiz"""
67     population = read_population_file("2020")
68     num_correct = run_quiz(population, num_questions=10, num_countries=3)
69     print(f"\nYou answered {num_correct} questions correctly")
70
71 if __name__ == "__main__":
72     main()
Note that on line 24, you also check that the LocID is less than 900. Locations with a LocID of 900 and above are not proper countries, but aggregates like World, Asia, and so on.
When building graphical user interfaces (GUIs), you often need to include resource files like icons. The following example shows how you can do that using importlib.resources. The final app will look quite basic, but it’ll have a custom icon as well as an illustration on the Goodbye button:
The example uses Tkinter, which is a GUI package available in the standard library. It’s based on the Tk windowing system, originally developed for the Tcl programming language. There are many other GUI packages available for Python. If you’re using a different one, then you should be able to add icons to your app using ideas similar to the ones presented here.
In Tkinter, images are handled by the PhotoImage class. To create a PhotoImage, you pass in a path to an image file.
Remember, when distributing your package, you’re not even guaranteed that resource files will exist as physical files on the file system. importlib.resources solves this by providing path(). This function will return a path to the resource file, creating a temporary file if necessary.
To make sure any temporary files are cleaned up properly, you should use path() as a context manager using the keyword with:
>>> from importlib import resources
>>> with resources.path("hello_gui.gui_resources", "logo.png") as path:
...     print(path)
...
/home/gahjelle/hello_gui/gui_resources/logo.png
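On Python 3.9 and later, a similar result can be had by combining files() with as_file(), which is also used as a context manager. The sketch below is an illustration under assumptions: the package name demo_gui_resources is invented, and the "image" is just the eight-byte PNG signature written to a temporary directory so that the code is self-contained.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway package (hypothetical name) holding a stand-in image file
    package_dir = Path(tmp) / "demo_gui_resources"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("", encoding="utf-8")
    (package_dir / "logo.png").write_bytes(b"\x89PNG\r\n\x1a\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        logo = resources.files("demo_gui_resources") / "logo.png"
        # Inside the with block, path is a real file system path that you
        # could hand to an API that insists on file names
        with resources.as_file(logo) as path:
            path_existed = path.exists()
            signature = path.read_bytes()[:8]
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_gui_resources", None)

print(path_existed, signature)
```

As with path(), anything you need from the file should be read or loaded inside the with block, since a temporary copy may be removed when the block exits.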
For the full example, assume you have the following file hierarchy:
hello_gui/
│
├── gui_resources/
│   ├── __init__.py
│   ├── hand.png
│   └── logo.png
│
└── __main__.py
The code is stored in a file with the special name __main__.py. This name indicates that the file is the entry point for the package. Having a __main__.py file allows your package to be executed with python -m:
$ python -m hello_gui
For more information on calling a package with -m, see How to Publish an Open-Source Python Package to PyPI.
The GUI is defined in a class called Hello. Note that you use importlib.resources to obtain the path of the image files:
 1 # hello_gui/__main__.py
 2
 3 import tkinter as tk
 4 from tkinter import ttk
 5
 6 try:
 7     from importlib import resources
 8 except ModuleNotFoundError:
 9     import importlib_resources as resources
10
11 class Hello(tk.Tk):
12     def __init__(self, *args, **kwargs):
13         super().__init__(*args, **kwargs)
14         self.wm_title("Hello")
15
16         # Read image, store a reference to it, and set it as an icon
17         with resources.path("hello_gui.gui_resources", "logo.png") as path:
18             self._icon = tk.PhotoImage(file=path)
19         self.iconphoto(True, self._icon)
20
21         # Read image, create a button, and store a reference to the image
22         with resources.path("hello_gui.gui_resources", "hand.png") as path:
23             hand = tk.PhotoImage(file=path)
24         button = ttk.Button(
25             self,
26             image=hand,
27             text="Goodbye",
28             command=self.quit,
29             compound=tk.LEFT,  # Add the image to the left of the text
30         )
31         button._image = hand
32         button.pack(side=tk.TOP, padx=10, pady=10)
33
34 if __name__ == "__main__":
35     hello = Hello()
36     hello.mainloop()
If you want to learn more about building GUIs with Tkinter, then check out Python GUI Programming With Tkinter. The official documentation also has a nice list of resources to start with, and the tutorial at TkDocs is another great resource that shows how to use Tk in other languages.
Note: One source of confusion and frustration when working with images in Tkinter is that you must make sure the images aren’t garbage collected. Due to the way Python and Tk interact, the garbage collector in Python (at least in CPython) doesn’t register that images are used by .iconphoto() and Button.
To make sure that the images are kept around, you should manually add a reference to them. You can see examples of this in the code above on lines 18 and 31.
One of Python’s defining features is that it’s a very dynamic language. Although it’s sometimes a bad idea, you can do many things to a Python program when it’s running, including adding attributes to a class, redefining methods, or changing the docstring of a module. For instance, you can change print() so that it doesn’t do anything:
>>> print("Hello dynamic world!")
Hello dynamic world!

>>> # Redefine the built-in print()
>>> print = lambda *args, **kwargs: None

>>> print("Hush, everybody!")
>>> # Nothing is printed
Technically, you’re not redefining print(). Instead, you’re defining another print() that shadows the built-in one. To return to using the original print(), you can delete your custom one with del print. If you’re so inclined, you can shadow any Python object that is built into the interpreter.
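The shadowing described above can be checked directly by comparing the name print with the original function, which stays reachable through the standard builtins module. This is a small sketch meant to run as a script at module level; the helper names is_shadowed and is_restored are only there for illustration.

```python
import builtins

print("Hello dynamic world!")

# Bind a new module-level name that hides the built-in print()
print = lambda *args, **kwargs: None
is_shadowed = print is not builtins.print
print("Hush, everybody!")  # Nothing is printed

# The original function was never changed and is still reachable
builtins.print("The built-in is still here")

# Deleting the module-level name uncovers the built-in again
del print
is_restored = print is builtins.print
print("Back to normal")
```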
Note: In the above example, you redefine print() using a lambda function. You also could have used a normal function definition:
>>> def print(*args, **kwargs):
...     pass
To learn more about lambda functions, see How to Use Python Lambda Functions.
In this section, you’ll learn how to do dynamic imports in Python. With them, you won’t have to decide what to import until your program is running.
importlib
So far, you’ve used Python’s import keyword to import modules and packages explicitly. However, the whole import machinery is available in the importlib package, and this allows you to do your imports more dynamically. The following script asks the user for the name of a module, imports that module, and prints its docstring:
# docreader.py

import importlib

module_name = input("Name of module? ")
module = importlib.import_module(module_name)
print(module.__doc__)
import_module() returns a module object that you can bind to any variable. Then you can treat that variable as a regularly imported module. You can use the script like this:
$ python docreader.py
Name of module? math
This module is always available.  It provides access to the
mathematical functions defined by the C standard.

$ python docreader.py
Name of module? csv
CSV parsing and writing.

This module provides classes that assist in the reading and writing
of Comma Separated Value (CSV) files, and implements the interface
described by PEP 305.  Although many CSV files are simple to parse,
the format is not formally defined by a stable specification and
is subtle enough that parsing lines of a CSV file with something
like line.split(",") is bound to fail.  The module supports three
basic APIs: reading, writing, and registration of dialects.
[...]
In each case, the module is imported dynamically by import_module().
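Since the module name now comes from a string, a misspelled or missing module only shows up at runtime. One possible way to handle that, sketched here with a hypothetical helper named first_doc_line(), is to catch ImportError around the call to import_module():

```python
import importlib

def first_doc_line(module_name):
    """Import a module by name and return the first line of its docstring.

    Returns None if the module can't be imported and an empty string if
    the module has no docstring.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # Also covers ModuleNotFoundError
        return None
    doc = (module.__doc__ or "").strip()
    return doc.splitlines()[0] if doc else ""

print(first_doc_line("csv"))
print(first_doc_line("no_such_module_for_demo"))
```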
Think back to the serializers example from earlier. With serializers implemented as a namespace package, you had the ability to add custom serializers. In the original example from a previous tutorial, the serializers were made available through a serializer factory. Using importlib, you can do something similar.
Add the following code to your local serializers namespace package:
# local/serializers/factory.py

import importlib

def get_serializer(format):
    try:
        module = importlib.import_module(f"serializers.{format}")
        serializer = getattr(module, f"{format.title()}Serializer")
    except (ImportError, AttributeError):
        raise ValueError(f"Unknown format {format!r}") from None

    return serializer()

def serialize(serializable, format):
    serializer = get_serializer(format)
    serializable.serialize(serializer)
    return str(serializer)
The get_serializer() factory can create serializers dynamically based on the format parameter, and serialize() can then apply the serializer to any object that implements a .serialize() method.
The factory makes some strong assumptions about the naming of both the module and the class containing the individual serializers. In the next section, you’ll learn about a plugin architecture that allows more flexibility.
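To see the naming assumption in isolation, here is a self-contained sketch of the same lookup. Everything prefixed with demo_ is hypothetical: instead of real files in a serializers package, a stand-in module is built in memory and placed in sys.modules so that import_module() can find it.

```python
import importlib
import sys
import types

class JsonSerializer:
    def __str__(self):
        return "json-serializer"

# In-memory stand-in for a demo_json.py file (hypothetical module name)
demo_json = types.ModuleType("demo_json")
demo_json.JsonSerializer = JsonSerializer
sys.modules["demo_json"] = demo_json

def get_serializer(format):
    """Look up demo_<format>.<Format>Serializer purely by naming convention"""
    try:
        module = importlib.import_module(f"demo_{format}")
        serializer = getattr(module, f"{format.title()}Serializer")
    except (ImportError, AttributeError):
        raise ValueError(f"Unknown format {format!r}") from None
    return serializer()

print(get_serializer("json"))

try:
    get_serializer("toml")
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The convention is visible in the two f-strings: "json" maps to the module demo_json and, through str.title(), to the class JsonSerializer. A class with any other name would not be found.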
You can now re-create the earlier example as follows:
>>> from serializers import factory
>>> from song import Song
>>> song = Song(song_id="1", title="The Same River", artist="Riverside")

>>> factory.serialize(song, "json")
'{"id": "1", "title": "The Same River", "artist": "Riverside"}'

>>> factory.serialize(song, "yaml")
"artist: Riverside, id: '1', title: The Same River\n"

>>> factory.serialize(song, "toml")
ValueError: Unknown format 'toml'
In this case, you no longer need to explicitly import each serializer. Instead, you specify the name of a serializer with a string. The string could even be chosen by your user at runtime.
Note: In a regular package, you probably would have implemented get_serializer() and serialize() in an __init__.py file. That would have allowed you to simply import serializers and then call serializers.serialize().
However, namespace packages aren’t allowed to use __init__.py, so you need to implement these functions in a separate module instead.
The final example shows that you also get a decent error message if you try to serialize to a format that hasn’t been implemented.
Let’s look at another example of using dynamic imports. You can use the following module to set up a flexible plugin architecture in your code. This is similar to the previous example, in which you could plug in serializers for different formats by adding new modules.
One application that uses plugins effectively is the Glue exploratory visualization tool. Glue can read many different data formats out of the box. However, if your data format isn’t supported, then you can write your own custom data loader.
You do this by adding a function that you decorate and place in a special location to make it easy for Glue to find. You don’t need to alter any part of the Glue source code. See the documentation for all the details.
You can set up a similar plugin architecture that you can use in your own projects. Within the architecture, there are two levels:
The plugins module that exposes the plugin architecture has the following functions:
# plugins.py

def register(func):
    """Decorator for registering a new plugin"""

def names(package):
    """List all plugins in one package"""

def get(package, plugin):
    """Get a given plugin"""

def call(package, plugin, *args, **kwargs):
    """Call the given plugin"""

def _import(package, plugin):
    """Import the given plugin file from a package"""

def _import_all(package):
    """Import all plugins in a package"""

def names_factory(package):
    """Create a names() function for one package"""

def get_factory(package):
    """Create a get() function for one package"""

def call_factory(package):
    """Create a call() function for one package"""
The factory functions are used to conveniently add functionality to plugin packages. You’ll see some examples of how they’re used shortly.
Looking at all the details of this code is outside the scope of this tutorial. If you’re interested, then you can see an implementation below.
The following code shows the implementation of plugins.py described above:
# plugins.py

import functools
import importlib
from collections import namedtuple
from importlib import resources

# Basic structure for storing information about one plugin
Plugin = namedtuple("Plugin", ("name", "func"))

# Dictionary with information about all registered plugins
_PLUGINS = {}

def register(func):
    """Decorator for registering a new plugin"""
    package, _, plugin = func.__module__.rpartition(".")
    pkg_info = _PLUGINS.setdefault(package, {})
    pkg_info[plugin] = Plugin(name=plugin, func=func)
    return func

def names(package):
    """List all plugins in one package"""
    _import_all(package)
    return sorted(_PLUGINS[package])

def get(package, plugin):
    """Get a given plugin"""
    _import(package, plugin)
    return _PLUGINS[package][plugin].func

def call(package, plugin, *args, **kwargs):
    """Call the given plugin"""
    plugin_func = get(package, plugin)
    return plugin_func(*args, **kwargs)

def _import(package, plugin):
    """Import the given plugin file from a package"""
    importlib.import_module(f"{package}.{plugin}")

def _import_all(package):
    """Import all plugins in a package"""
    files = resources.contents(package)
    plugins = [f[:-3] for f in files if f.endswith(".py") and f[0] != "_"]
    for plugin in plugins:
        _import(package, plugin)

def names_factory(package):
    """Create a names() function for one package"""
    return functools.partial(names, package)

def get_factory(package):
    """Create a get() function for one package"""
    return functools.partial(get, package)

def call_factory(package):
    """Create a call() function for one package"""
    return functools.partial(call, package)
This implementation is a bit simplified. In particular, it doesn’t do any explicit error handling. Check out the PyPlugs project for a more complete implementation.
You can see that _import() uses importlib.import_module() to dynamically load plugins. Additionally, _import_all() uses importlib.resources.contents() to list all available plugins in a given package.
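The core of the registry can be tried out in a single file. The sketch below is a deliberately reduced variant, not the plugins.py module itself: it skips the dynamic imports, and its register() takes the package and plugin names as explicit arguments (an assumption made so that the example doesn’t depend on func.__module__).

```python
import functools

# {package: {plugin_name: function}}, similar in spirit to _PLUGINS
_PLUGINS = {}

def register(package, plugin):
    """Decorator factory that stores a function under explicit names"""
    def decorator(func):
        _PLUGINS.setdefault(package, {})[plugin] = func
        return func
    return decorator

def names(package):
    """List all plugins registered for one package"""
    return sorted(_PLUGINS[package])

def call(package, plugin, *args, **kwargs):
    """Call the given plugin"""
    return _PLUGINS[package][plugin](*args, **kwargs)

def call_factory(package):
    """Create a call() function with the package already filled in"""
    return functools.partial(call, package)

@register("greeter", "hello")
def hello(name):
    return f"Hello {name}, how are you today?"

@register("greeter", "howdy")
def howdy(name):
    return f"Howdy good {name}, honored to meet you!"

greet = call_factory("greeter")
print(names("greeter"))              # ['hello', 'howdy']
print(greet("howdy", name="Guido"))  # Howdy good Guido, honored to meet you!
```

The factory functions are thin wrappers around functools.partial(): greet("howdy", name="Guido") is the same call as call("greeter", "howdy", name="Guido").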
Let’s look at some examples of how to use plugins. The first example is a greeter package that you can use to add many different greetings to your app. A full plugin architecture is definitely overkill for this example, but it shows how the plugins work.
Assume you have the following greeter package:
greeter/
│
├── __init__.py
├── hello.py
├── howdy.py
└── yo.py
Each greeter module defines a function that takes one name argument. Note how they’re all registered as plugins using the @register decorator:
# greeter/hello.py
import plugins

@plugins.register
def greet(name):
    print(f"Hello {name}, how are you today?")

# greeter/howdy.py
import plugins

@plugins.register
def greet(name):
    print(f"Howdy good {name}, honored to meet you!")

# greeter/yo.py
import plugins

@plugins.register
def greet(name):
    print(f"Yo {name}, good times!")
To learn more about decorators and how they’re used, check out Primer on Python Decorators.
Note: To simplify the discovery and import of plugins, each plugin’s name is based on the name of the module that contains it instead of the function name. This restricts you to having only one plugin per file.
To finish setting up greeter as a plugin package, you can use the factory functions in plugins to add functionality to the greeter package itself:
# greeter/__init__.py

import plugins

greetings = plugins.names_factory(__package__)
greet = plugins.call_factory(__package__)
You can now use greetings() and greet() as follows:
>>> import greeter
>>> greeter.greetings()
['hello', 'howdy', 'yo']

>>> greeter.greet(plugin="howdy", name="Guido")
Howdy good Guido, honored to meet you!
Note that greetings() automatically discovers all the plugins that are available in the package.
You can also more dynamically choose which plugin to call. In the following example, you choose a plugin at random. However, you could also select a plugin based on a configuration file or user input:
>>> import greeter
>>> import random

>>> greeting = random.choice(greeter.greetings())
>>> greeter.greet(greeting, name="Frida")
Hello Frida, how are you today?

>>> greeting = random.choice(greeter.greetings())
>>> greeter.greet(greeting, name="Frida")
Yo Frida, good times!
To discover and call the different plugins, you need to import them. Let’s have a quick look at how plugins handles imports. The main work is done in the following two functions inside plugins.py:
import importlib
import pathlib
from importlib import resources

def _import(package, plugin):
    """Import the given plugin file from a package"""
    importlib.import_module(f"{package}.{plugin}")

def _import_all(package):
    """Import all plugins in a package"""
    files = resources.contents(package)
    plugins = [f[:-3] for f in files if f.endswith(".py") and f[0] != "_"]
    for plugin in plugins:
        _import(package, plugin)
_import() looks deceptively straightforward. It uses importlib to import a module. But there are a couple of things also happening in the background:
@register decorators defined inside each plugin module register each imported plugin.
_import_all() discovers all the plugins within a package. Here’s how it works:
contents() from importlib.resources lists all the files inside a package.
The list is filtered down to .py files whose names don’t start with an underscore, and each of those is imported with _import().
Let’s end this section with a final version of the serializers namespace package. One outstanding issue was that the get_serializer() factory made strong assumptions about the naming of the serializer classes. You can make this more flexible using plugins.
First, add a line registering each of the serializers. Here is an example of how it’s done in the yaml serializer:
# local/serializers/yaml.py

import plugins
import yaml

from serializers.json import JsonSerializer

@plugins.register
class YamlSerializer(JsonSerializer):
    def __str__(self):
        return yaml.dump(self._current_object)
Next, update get_serializer() to use plugins:
# local/serializers/factory.py

import plugins

get_serializer = plugins.call_factory(__package__)

def serialize(serializable, format):
    serializer = get_serializer(format)
    serializable.serialize(serializer)
    return str(serializer)
You implement get_serializer() using call_factory() since that will automatically instantiate each serializer. With this refactoring, the serializers work just the same as earlier. However, you have more flexibility in naming your serializer classes.
For more information about using plugins, check out PyPlugs on PyPI and the Plug-ins: Adding Flexibility to Your Apps presentation from PyCon 2019.
You’ve seen many ways to take advantage of Python’s import system. In this section, you’ll learn a bit more about what happens behind the scenes as modules and packages are imported.
As with most parts of Python, the import system can be customized. You’ll see several ways that you can change the import system, including automatically downloading missing packages from PyPI and importing data files as if they were modules.
The details of the Python import system are described in the official documentation. At a high level, three things happen when you import a module (or package). The module is:
Searched for
Loaded
Bound to a namespace
For the usual imports (those done with the import statement), all three steps happen automatically. When you use importlib, however, only the first two steps are automatic. You need to bind the module to a variable or namespace yourself.
For instance, the following methods of importing and renaming math.pi are roughly equivalent:
>>> from math import pi as PI
>>> PI
3.141592653589793

>>> import importlib
>>> _tmp = importlib.import_module("math")
>>> PI = _tmp.pi
>>> del _tmp
>>> PI
3.141592653589793
Of course, in normal code you should prefer the former.
One thing to note is that, even when you import only one attribute from a module, the whole module is loaded and executed. The rest of the contents of the module just aren’t bound to the current namespace. One way to prove this is to have a look at what’s known as the module cache:
>>> from math import pi
>>> pi
3.141592653589793

>>> import sys
>>> sys.modules["math"].cos(pi)
-1.0
sys.modules acts as a module cache. It contains references to all modules that have been imported.
The module cache plays a very important role in the Python import system. The first place Python looks for modules when doing an import is in sys.modules. If a module is already available, then it isn’t loaded again.
This is a great optimization, but it’s also a necessity. If modules were reloaded each time they were imported, then you could end up with inconsistencies in certain situations, such as when the underlying source code changes while a script is running.
Recall the import path you saw earlier. It essentially tells Python where to search for modules. However, if Python finds a module in the module cache, then it won’t bother searching the import path for the module.
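You can observe the cache at work by comparing object identities. In this short sketch, the variable names same_object and served_from_cache are only illustrative:

```python
import importlib
import sys

import math

# The name math and the cache entry refer to the very same module object
same_object = sys.modules["math"] is math

# A second import, whether static or dynamic, is answered from the cache
again = importlib.import_module("math")
served_from_cache = again is math

print(same_object, served_from_cache)
```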
In object-oriented programming, a singleton is a class with at most one instance. While it’s possible to implement singletons in Python, most good uses of singletons can be handled by modules instead. You can trust the module cache to instantiate a class only once.
As an example, let’s return to the United Nations population data you saw earlier. The following module defines a class wrapping the population data:
# population.py

import csv
from importlib import resources

import matplotlib.pyplot as plt

class _Population:
    def __init__(self):
        """Read the population file"""
        self.data = {}
        self.variant = "Medium"

        print(f"Reading population data for {self.variant} scenario")
        with resources.open_text(
            "data", "WPP2019_TotalPopulationBySex.csv"
        ) as fid:
            rows = csv.DictReader(fid)

            # Read data, filter the correct variant
            for row in rows:
                if int(row["LocID"]) >= 900 or row["Variant"] != self.variant:
                    continue

                country = self.data.setdefault(row["Location"], {})
                population = float(row["PopTotal"]) * 1000
                country[int(row["Time"])] = round(population)

    def get_country(self, country):
        """Get population data for one country"""
        data = self.data[country]
        years, population = zip(*data.items())
        return years, population

    def plot_country(self, country):
        """Plot data for one country, population in millions"""
        years, population = self.get_country(country)
        plt.plot(years, [p / 1e6 for p in population], label=country)

    def order_countries(self, year):
        """Sort countries by population in decreasing order"""
        countries = {c: self.data[c][year] for c in self.data}
        return sorted(countries, key=lambda c: countries[c], reverse=True)

# Instantiate the Singleton
data = _Population()
Reading the data from disk takes some time. Since you don’t expect the data file to change, you instantiate the class when you load the module. The name of the class starts with an underscore to indicate to users that they shouldn’t use it.
You can use the population.data singleton to create a Matplotlib graph showing the population projection for the most populous countries:
>>> import matplotlib.pyplot as plt
>>> import population
Reading population data for Medium scenario

>>> # Pick out five most populous countries in 2050
>>> for country in population.data.order_countries(2050)[:5]:
...     population.data.plot_country(country)
...
>>> plt.legend()
>>> plt.xlabel("Year")
>>> plt.ylabel("Population [Millions]")
>>> plt.title("UN Population Projections")
>>> plt.show()
This creates a chart like the following:
Note that loading the data at import time is a kind of antipattern. Ideally, you want your imports to be as free of side effects as possible. A better approach would be to load the data lazily when you need it. You can do this quite elegantly using properties, as the following example shows.
The lazy implementation of population stores the population data in ._data the first time it’s read. The .data property handles this caching of data:
# population.py

import csv
from importlib import resources

import matplotlib.pyplot as plt

class _Population:
    def __init__(self):
        """Prepare to read the population file"""
        self._data = {}
        self.variant = "Medium"

    @property
    def data(self):
        """Read data from disk"""
        if self._data:  # Data has already been read, return it directly
            return self._data

        # Read data and store it in self._data
        print(f"Reading population data for {self.variant} scenario")
        with resources.open_text(
            "data", "WPP2019_TotalPopulationBySex.csv"
        ) as fid:
            rows = csv.DictReader(fid)

            # Read data, filter the correct variant
            for row in rows:
                if int(row["LocID"]) >= 900 or row["Variant"] != self.variant:
                    continue

                country = self._data.setdefault(row["Location"], {})
                population = float(row["PopTotal"]) * 1000
                country[int(row["Time"])] = round(population)
        return self._data

    def get_country(self, country):
        """Get population data for one country"""
        country = self.data[country]
        years, population = zip(*country.items())
        return years, population

    def plot_country(self, country):
        """Plot data for one country, population in millions"""
        years, population = self.get_country(country)
        plt.plot(years, [p / 1e6 for p in population], label=country)

    def order_countries(self, year):
        """Sort countries by population in decreasing order"""
        countries = {c: self.data[c][year] for c in self.data}
        return sorted(countries, key=lambda c: countries[c], reverse=True)

# Instantiate the singleton
data = _Population()
Now the data won’t be loaded at import time. Instead, it’ll be read the first time you access the _Population.data dictionary. For more information about properties and the more general concept of descriptors, see Python Descriptors: An Introduction.
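The caching pattern is easier to see without the CSV handling. The following stripped-down sketch uses a hypothetical _Squares class in which a cheap dictionary comprehension stands in for the slow read from disk, and a counter records how often that "read" actually happens:

```python
class _Squares:
    def __init__(self):
        """Prepare an empty cache, but don't load anything yet"""
        self._data = {}
        self.load_count = 0

    @property
    def data(self):
        """Load the data on first access and reuse it afterward"""
        if not self._data:
            self.load_count += 1  # Stands in for the expensive read
            self._data = {n: n * n for n in range(1, 6)}
        return self._data

# Instantiating the class is cheap, since nothing is loaded here
squares = _Squares()
loads_before_access = squares.load_count

first = squares.data[3]   # Triggers the one and only load
second = squares.data[4]  # Served from the cache

print(loads_before_access, first, second, squares.load_count)
```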
The module cache can be a little frustrating when you’re working in the interactive interpreter. It’s not trivial to reload a module after you change it. For example, take a look at the following module:
# number.py

answer = 24
As part of testing and debugging this module, you import it in a Python console:
>>> import number
>>> number.answer
24
Let’s say you realize that you have a bug in your code, so you update the number.py file in your editor:
# number.py

answer = 42
Returning to your console, you import the updated module to see the effect of your fix:
>>> import number
>>> number.answer
24
Why is the answer still 24? The module cache is doing its (now frustrating) magic: since Python imported number earlier, it sees no reason to load the module again even though you just changed it.
The most straightforward solution to this is to exit the Python console and restart it. This forces Python to clear its module cache as well:
>>> import number
>>> number.answer
42
However, restarting the interpreter isn’t always feasible. You might be in a more complicated session that has taken you a long time to set up. If that’s the case, then you can use importlib.reload() to reload a module instead:
>>> import number
>>> number.answer
24

>>> # Update number.py in your editor

>>> import importlib
>>> importlib.reload(number)
<module 'number' from 'number.py'>

>>> number.answer
42
Note thatreload()
requires a module object, not a string likeimport_module()
does. Also, be aware thatreload()
has some caveats. In particular, variables referring to objects within a module are not re-bound to new objects when that module is reloaded. Seethe documentation for more details.
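The re-binding caveat is easy to trip over, so here is a minimal sketch of it. The module name reload_demo, the temporary directory, and the sys.dont_write_bytecode setting are assumptions made only to keep the example self-contained; they aren’t part of the number.py example. A value copied out of the module before the reload keeps pointing at the old object, while attribute access through the module sees the reloaded value:

```python
import importlib
import pathlib
import sys
import tempfile

# Assumption: skip bytecode caching so the quick rewrite below is always picked up
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    module_path = pathlib.Path(tmp) / "reload_demo.py"  # Hypothetical module
    module_path.write_text("answer = 24\n")
    sys.path.insert(0, tmp)
    try:
        reload_demo = importlib.import_module("reload_demo")
        # Same effect as: from reload_demo import answer as old_answer
        old_answer = reload_demo.answer

        # "Fix the bug" on disk, then reload the module object in place
        module_path.write_text("answer = 42  # fixed\n")
        importlib.reload(reload_demo)
        new_answer = reload_demo.answer
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("reload_demo", None)

print(old_answer)  # The name bound before the reload is not updated
print(new_answer)  # Attribute access on the module sees the new value
```

If other modules did from reload_demo import answer before the reload, they would be in the same position as old_answer here and would need to be reloaded or re-imported themselves.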
You saw earlier that creating modules with the same name as standard libraries can create problems. For example, if you have a file named math.py in Python’s import path, then you won’t be able to import math from the standard library.
This isn’t always the case, though. Create a file named time.py with the following content:

    # time.py

    print("Now's the time!")

Next, open a Python interpreter and import this new module:

    >>> import time
    >>> # Nothing is printed

    >>> time.ctime()
    'Mon Jun 15 14:26:12 2020'

    >>> time.tzname
    ('CET', 'CEST')

Something weird happened. It doesn’t seem like Python imported your new time module. Instead, it imported the time module from the standard library. Why are the standard library modules behaving inconsistently? You can get a hint by inspecting the modules:

    >>> import math
    >>> math
    <module 'math' from '.../python/lib/python3.8/lib-dynload/math.cpython.so'>

    >>> import time
    >>> time
    <module 'time' (built-in)>

You can see that math is imported from a file, whereas time is some kind of built-in module. It seems that built-in modules aren’t shadowed by local ones.
Note: The built-in modules are compiled into the Python interpreter. Typically, they’re foundational modules like builtins, sys, and time. Which modules are built in depends on your Python interpreter, but you can find their names in sys.builtin_module_names.
Let’s dig even deeper into Python’s import system. This will also show why built-in modules aren’t shadowed by local ones. There are several steps involved when importing a module:
Python checks if the module is available in the module cache. If sys.modules contains the name of the module, then the module is already available, and the import process ends.
Python starts looking for the module using several finders. A finder will search for the module using a given strategy. The default finders can import built-in modules, frozen modules, and modules on the import path.
Python loads the module using a loader. Which loader Python uses is determined by the finder that located the module and is specified in something called a module spec.
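The steps above can be observed from Python itself. The following is a rough sketch, not the import system’s actual implementation: describe_import() is a hypothetical helper that reports whether a top-level module is already cached and, if not, which loader the finders would pick. It relies only on sys.modules and importlib.util.find_spec():

```python
import importlib.util
import sys

def describe_import(name):
    """Report how Python would resolve a top-level module name."""
    if name in sys.modules:
        # Step 1: the module cache short-circuits everything else
        return f"{name}: already in the module cache"

    # Step 2: ask the finders on sys.meta_path for a module spec
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: no finder could locate it"

    # Step 3: the spec names the loader that would execute the module
    loader = getattr(spec.loader, "__name__", type(spec.loader).__name__)
    return f"{name}: found, would be loaded by {loader}"

print(describe_import("sys"))
print(describe_import("no_such_module_hopefully"))
```

Note that find_spec() only locates the module. Nothing is executed and nothing is added to the module cache until you actually import it.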
You can extend the Python import system by implementing your own finder and, if necessary, your own loader. You’ll see a more useful example of a finder later. For now, you’ll learn how to do basic (and possibly silly) customizations of the import system.
sys.meta_path controls which finders are called during the import process:

    >>> import sys
    >>> sys.meta_path
    [<class '_frozen_importlib.BuiltinImporter'>,
     <class '_frozen_importlib.FrozenImporter'>,
     <class '_frozen_importlib_external.PathFinder'>]

First, note that this answers the question from earlier: built-in modules aren’t shadowed by local modules because the built-in finder is called before the import path finder, which finds local modules. Second, note that you can customize sys.meta_path to your liking.
To quickly mess up your Python session, you can remove all finders:

    >>> import sys
    >>> sys.meta_path.clear()
    >>> sys.meta_path
    []

    >>> import math
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'math'

    >>> import importlib  # Autoimported at start-up, still in the module cache
    >>> importlib
    <module 'importlib' from '.../python/lib/python3.8/importlib/__init__.py'>

Since there are no finders, Python can’t find or import new modules. However, Python can still import modules that are already in the module cache since it looks there before calling any finders.
In the example above, importlib was already loaded under the hood before you cleared the list of finders. If you really want to make your Python session completely unusable, then you can also clear the module cache, sys.modules.
The following is a slightly more useful example. You’ll write a finder that prints a message to the console identifying the module being imported. The example shows how to add your own finder, although it doesn’t actually attempt to find a module:

     1 # debug_importer.py
     2
     3 import sys
     4
     5 class DebugFinder:
     6     @classmethod
     7     def find_spec(cls, name, path, target=None):
     8         print(f"Importing {name!r}")
     9         return None
    10
    11 sys.meta_path.insert(0, DebugFinder)

All finders must implement a .find_spec() class method, which should try to find a given module. There are three ways that .find_spec() can terminate:
By returning None if it doesn’t know how to find and load the module
By returning a module spec if it knows how to load the module
By raising a ModuleNotFoundError to indicate that the module can’t be imported
The DebugFinder prints a message to the console and then explicitly returns None to indicate that other finders should figure out how to actually import the module.
Note: Since Python implicitly returns None from any function or method without an explicit return, you can leave out line 9. However, in this case it’s good to include return None to make it clear that DebugFinder doesn’t find a module.
By inserting DebugFinder first in the list of finders, you get a running list of all modules being imported:

    >>> import debug_importer
    >>> import csv
    Importing 'csv'
    Importing 're'
    Importing 'enum'
    Importing 'sre_compile'
    Importing '_sre'
    Importing 'sre_parse'
    Importing 'sre_constants'
    Importing 'copyreg'
    Importing '_csv'

You can, for instance, see that importing csv triggers the import of several other modules that csv depends on. Note that the verbose option to the Python interpreter, python -v, gives the same information and much, much more.
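If you’d rather not leave a debugging finder installed for the rest of the session, one option is to wrap it in a context manager. This is a sketch under a couple of assumptions: RecordingFinder and record_imports() are hypothetical names, and the finder is an instance instead of a class so that each with block gets its own list of names:

```python
import contextlib
import importlib
import sys

class RecordingFinder:
    """Finder that records requested module names and never finds anything."""

    def __init__(self):
        self.names = []

    def find_spec(self, name, path, target=None):
        self.names.append(name)
        return None  # Let the regular finders do the actual work

@contextlib.contextmanager
def record_imports():
    """Temporarily put a RecordingFinder first on sys.meta_path."""
    finder = RecordingFinder()
    sys.meta_path.insert(0, finder)
    try:
        yield finder.names
    finally:
        sys.meta_path.remove(finder)

# Drop colorsys from the module cache so that the finders are consulted again
sys.modules.pop("colorsys", None)
with record_imports() as names:
    importlib.import_module("colorsys")

print(names)
```

Only modules that aren’t already in the module cache show up in the list, since Python consults the cache before it calls any finder.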
For another example, say that you’re on a quest to rid the world of regular expressions. (Now, why would you want such a thing? Regular expressions are great!) You could implement the following finder that bans the re regular expressions module:

    # ban_importer.py

    import sys

    BANNED_MODULES = {"re"}

    class BanFinder:
        @classmethod
        def find_spec(cls, name, path, target=None):
            if name in BANNED_MODULES:
                raise ModuleNotFoundError(f"{name!r} is banned")

    sys.meta_path.insert(0, BanFinder)

Raising a ModuleNotFoundError ensures that no finder later in the list of finders will be executed. This effectively stops you from using regular expressions in Python:

    >>> import ban_importer
    >>> import csv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../python/lib/python3.8/csv.py", line 6, in <module>
        import re
      File "ban_importer.py", line 11, in find_spec
        raise ModuleNotFoundError(f"{name!r} is banned")
    ModuleNotFoundError: 're' is banned

Even though you’re importing only csv, that module is importing re behind the scenes, so an error is raised.
Because the Python import system is already quite powerful and useful, there are many more ways to mess it up than there are to extend it in a useful way. However, the following example can be useful in certain situations.
The Python Package Index (PyPI) is your one-stop cheese shop for finding third-party modules and packages. It’s also the place from which pip downloads packages.
In other Real Python tutorials, you may have seen instructions to use python -m pip install to install the third-party modules and packages you need for following along with examples. Wouldn’t it be great to have Python automatically install missing modules for you?
Warning: In most cases, it really wouldn’t be great to have Python install modules automatically. For instance, in most production settings you want to stay in control of your environment. Furthermore, the documentation cautions against using pip this way.
To avoid messing up your Python installation, you should play with this code only in environments that you wouldn’t mind deleting or reinstalling.
The following finder attempts to install modules using pip:
    # pip_importer.py

    from importlib import util
    import subprocess
    import sys

    class PipFinder:
        @classmethod
        def find_spec(cls, name, path, target=None):
            print(f"Module {name!r} not installed. Attempting to pip install")
            # Build the command as a list so that an interpreter path
            # containing spaces isn't split into several arguments
            cmd = [sys.executable, "-m", "pip", "install", name]
            try:
                subprocess.run(cmd, check=True)
            except subprocess.CalledProcessError:
                return None

            return util.find_spec(name)

    sys.meta_path.append(PipFinder)
Compared to the finders you saw earlier, this one is slightly more complicated. By putting this finder last in the list of finders, you know that if you call PipFinder, then the module won’t be found on your system. The job of .find_spec() is therefore just to do the pip install. If the installation works, then the module spec will be created and returned.
Try to use the parse library without installing it yourself:

    >>> import pip_importer
    >>> import parse
    Module 'parse' not installed. Attempting to pip install
    Collecting parse
      Downloading parse-1.15.0.tar.gz (29 kB)
    Building wheels for collected packages: parse
      Building wheel for parse (setup.py) ... done
    Successfully built parse
    Installing collected packages: parse
    Successfully installed parse-1.15.0

    >>> pattern = "my name is {name}"
    >>> parse.parse(pattern, "My name is Geir Arne")
    <Result () {'name': 'Geir Arne'}>

Normally, import parse would’ve raised a ModuleNotFoundError, but in this case parse is installed and imported.
While the PipFinder seemingly works, there are some challenges with this approach. One major problem is that the import name of a module doesn’t always correspond to its name on PyPI. For example, the Real Python feed reader is called realpython-reader on PyPI, but the import name is simply reader.
Using PipFinder to import and install reader ends up installing the wrong package:

    >>> import pip_importer
    >>> import reader
    Module 'reader' not installed. Attempting to pip install
    Collecting reader
      Downloading reader-1.2-py3-none-any.whl (68 kB)
    ...

This could have disastrous consequences for your project.
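One way to reduce this risk is to install only modules that you’ve explicitly listed, together with their PyPI names. The sketch below is an assumption about how you might do that, not part of pip_importer.py: PYPI_NAMES and pip_install_command() are hypothetical, and the function only builds the command without running it:

```python
import sys

# Assumption: a hand-maintained allow-list mapping import names to PyPI names
PYPI_NAMES = {
    "parse": "parse",
    "reader": "realpython-reader",
    "yaml": "PyYAML",
}

def pip_install_command(import_name):
    """Return the pip command for an allow-listed module, or None."""
    pypi_name = PYPI_NAMES.get(import_name)
    if pypi_name is None:
        # Unknown import names are never installed automatically
        return None
    return [sys.executable, "-m", "pip", "install", pypi_name]

print(pip_install_command("reader"))
print(pip_install_command("some_unknown_module"))
```

A finder built on this helper could return None whenever the command is None, so that unknown names fall through to the usual ModuleNotFoundError.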
One situation in which automatic installations can be quite helpful is when you’re running Python in the cloud with more limited control over your environment, such as when you’re running Jupyter-style notebooks at Google Colaboratory. The Colab notebook environment is great for doing cooperative data exploration.
A typical notebook comes with many data science packages installed, including NumPy, Pandas, and Matplotlib, and you can add new packages with pip. But you can also activate automatic installation:
Since pip_importer isn’t available locally on the Colab server, the code is copied into the first cell of the notebook.
The final example in this section is inspired by Aleksey Bilogur’s great blog post Import Almost Anything in Python: An Intro to Module Loaders and Finders. You’ve already seen how to use importlib.resources to import data files. Here, you’ll instead implement a custom loader that can import a CSV file directly.
Earlier, you worked with a huge CSV file with population data. To make the custom loader example more manageable, consider the following smaller employees.csv file:

    name,department,birthday month
    John Smith,Accounting,November
    Erica Meyers,IT,March

The first line is a header naming three fields, and the following two rows of data each contain information about an employee. For more information about working with CSV files, check out Reading and Writing CSV Files in Python.
Your goal in this section is to write a finder and a loader that allow you to import the CSV file directly so that you can write code like the following:

    >>> import csv_importer
    >>> import employees

    >>> employees.name
    ('John Smith', 'Erica Meyers')

    >>> for row in employees.data:
    ...     print(row["department"])
    ...
    Accounting
    IT

    >>> for name, month in zip(employees.name, employees.birthday_month):
    ...     print(f"{name} is born in {month}")
    ...
    John Smith is born in November
    Erica Meyers is born in March

    >>> employees.__file__
    'employees.csv'

The job of the finder will be to search for and recognize CSV files. The loader’s job will be to import the CSV data. Often, you can implement finders and corresponding loaders in one common class. That’s the approach you’ll take here:

     1 # csv_importer.py
     2
     3 import csv
     4 import pathlib
     5 import re
     6 import sys
     7 from importlib.machinery import ModuleSpec
     8
     9 class CsvImporter():
    10     def __init__(self, csv_path):
    11         """Store path to CSV file"""
    12         self.csv_path = csv_path
    13
    14     @classmethod
    15     def find_spec(cls, name, path, target=None):
    16         """Look for CSV file"""
    17         package, _, module_name = name.rpartition(".")
    18         csv_file_name = f"{module_name}.csv"
    19         directories = sys.path if path is None else path
    20         for directory in directories:
    21             csv_path = pathlib.Path(directory) / csv_file_name
    22             if csv_path.exists():
    23                 return ModuleSpec(name, cls(csv_path))
    24
    25     def create_module(self, spec):
    26         """Returning None uses the standard machinery for creating modules"""
    27         return None
    28
    29     def exec_module(self, module):
    30         """Executing the module means reading the CSV file"""
    31         # Read CSV data and store as a list of rows
    32         with self.csv_path.open() as fid:
    33             rows = csv.DictReader(fid)
    34             data = list(rows)
    35             fieldnames = tuple(_identifier(f) for f in rows.fieldnames)
    36
    37         # Create a dict with each field
    38         values = zip(*(row.values() for row in data))
    39         fields = dict(zip(fieldnames, values))
    40
    41         # Add the data to the module
    42         module.__dict__.update(fields)
    43         module.__dict__["data"] = data
    44         module.__dict__["fieldnames"] = fieldnames
    45         module.__file__ = str(self.csv_path)
    46
    47     def __repr__(self):
    48         """Nice representation of the class"""
    49         return f"{self.__class__.__name__}({str(self.csv_path)!r})"
    50
    51 def _identifier(var_str):
    52     """Create a valid identifier from a string
    53
    54     See https://stackoverflow.com/a/3305731
    55     """
    56     return re.sub(r"\W|^(?=\d)", "_", var_str)
    57
    58 # Add the CSV importer at the end of the list of finders
    59 sys.meta_path.append(CsvImporter)
There’s quite a bit of code in this example! Luckily, most of the work is done in .find_spec() and .exec_module(). Let’s look at them in more detail.
As you saw earlier, .find_spec() is responsible for finding the module. In this case, you’re looking for CSV files, so you create a filename with a .csv suffix. name contains the full name of the module that is imported. For example, if you use from data import employees, then name will be data.employees. In this case, the filename will be employees.csv.
For top-level imports, path will be None. In that case, you look for the CSV file in the full import path, which will include the current working directory. If you’re importing a CSV file within a package, then path will be set to the path or paths of the package. If you find a matching CSV file, then a module spec is returned. This module spec tells Python to load the module using CsvImporter.
The CSV data is loaded by .exec_module(). You can use csv.DictReader from the standard library to do the actual parsing of the file. Like most things in Python, modules are backed by dictionaries. By adding the CSV data to module.__dict__, you make it available as attributes of the module.
For instance, adding fieldnames to the module dictionary on line 44 allows you to list the field names in the CSV file as follows:

    >>> employees.fieldnames
    ('name', 'department', 'birthday_month')

In general, CSV field names can contain spaces and other characters that aren’t allowed in Python attribute names. Before adding the fields as attributes on the module, you sanitize the field names using a regular expression. This is done in _identifier() starting on line 51.
You can see an example of this effect in the birthday_month field name above. If you look at the original CSV file, then you’ll see that the header says birthday month with a space instead of an underscore.
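To get a feel for what the sanitizing step does, you can try the same regular expression on a few headers in isolation. The function below repeats the regular expression from csv_importer.py, while the headers other than birthday month are made up for illustration:

```python
import re

def _identifier(var_str):
    """Create a valid identifier from a string (same regex as csv_importer.py)"""
    return re.sub(r"\W|^(?=\d)", "_", var_str)

headers = ["birthday month", "e-mail", "2nd phone"]  # Made-up example headers
cleaned = [_identifier(header) for header in headers]

print(cleaned)  # ['birthday_month', 'e_mail', '_2nd_phone']
print(all(name.isidentifier() for name in cleaned))  # True
```

The first alternative in the pattern replaces every non-word character with an underscore, and the second one adds a leading underscore if the name starts with a digit.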
By hooking this CsvImporter into the Python import system, you get a fair bit of functionality for free. For example, the module cache will make sure that the data file is loaded only once.
To round out this tutorial, you’ll see a few tips about how to handle certain situations that come up from time to time. You’ll see how to deal with missing packages, cyclical imports, and even packages stored inside ZIP files.
Sometimes you need to deal with packages that have different names depending on the Python version. You’ve already seen one example of this: importlib.resources has only been available since Python 3.7. In earlier versions of Python, you need to install and use importlib_resources instead.
As long as the different versions of the package are compatible, you can handle this by renaming the package with as:

    try:
        from importlib import resources
    except ModuleNotFoundError:
        import importlib_resources as resources

In the rest of the code, you can refer to resources and not worry about whether you’re using importlib.resources or importlib_resources.
Normally, it’s easiest to use a try...except statement to figure out which version to use. Another option is to inspect the version of the Python interpreter. However, this may add some maintenance cost if you need to update the version numbers.
You could rewrite the previous example as follows:

    import sys
    if sys.version_info >= (3, 7):
        from importlib import resources
    else:
        import importlib_resources as resources

This would use importlib.resources on Python 3.7 and newer while falling back to importlib_resources on older versions of Python. See the flake8-2020 project for good and future-proof advice on how to check which Python version is running.
The following use case is closely related to the previous example. Assume there’s a compatible reimplementation of a package. The reimplementation is better optimized, so you want to use it if it’s available. However, the original package is more easily available and also delivers acceptable performance.
One such example is quicktions, which is an optimized version of fractions from the standard library. You can handle these preferences the same way you handled different package names earlier:

    try:
        from quicktions import Fraction
    except ModuleNotFoundError:
        from fractions import Fraction

This will use quicktions if it’s available and fall back to fractions if not.
Another similar example is the UltraJSON package, an ultrafast JSON encoder and decoder that can be used as a replacement for json in the standard library:

    try:
        import ujson as json
    except ModuleNotFoundError:
        import json

By renaming ujson to json, you don’t have to worry about which package was actually imported.
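If you find yourself repeating this pattern for several packages, then you could wrap it in a small helper. The following is only a sketch: import_first_available() is a hypothetical function built on importlib.import_module(), and it assumes that the alternatives really are compatible with each other:

```python
import importlib

def import_first_available(*names):
    """Import and return the first module in names that is installed."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue  # Try the next alternative
    raise ModuleNotFoundError(f"None of these modules are available: {names}")

# Prefer ujson when it's installed, otherwise use the standard library
json = import_first_available("ujson", "json")
print(json.dumps({"answer": 42}))
```

The explicit try...except version is usually easier for readers and for static analysis tools to follow, so a helper like this makes the most sense when the list of alternatives is long or configurable.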
A third, related example is adding a package that provides a nice-to-have feature that’s not strictly necessary for your app. Again, this can be solved by adding try...except to your imports. The extra challenge is how you will replace the optional package if it’s not available.
For a concrete example, say that you’re using Colorama to add colored text in the console. Colorama mainly consists of special string constants that add color when printed:

    >>> import colorama
    >>> colorama.init(autoreset=True)

    >>> from colorama import Back, Fore
    >>> Fore.RED
    '\x1b[31m'

    >>> print(f"{Fore.RED}Hello Color!")
    Hello Color!

    >>> print(f"{Back.RED}Hello Color!")
    Hello Color!

Unfortunately, the color doesn’t render in the example above. In your terminal it’ll look something like this:
Before you start using Colorama colors, you should call colorama.init(). Setting autoreset to True means that the color directives will be automatically reset at the end of the string. It’s a useful setting if you want to color just one line at a time.
If you’d rather have all your output be (for example) blue, then you can let autoreset be False and add Fore.BLUE to the beginning of your script. The following colors are available:

    >>> from colorama import Fore
    >>> sorted(c for c in dir(Fore) if not c.startswith("_"))
    ['BLACK', 'BLUE', 'CYAN', 'GREEN', 'LIGHTBLACK_EX', 'LIGHTBLUE_EX',
     'LIGHTCYAN_EX', 'LIGHTGREEN_EX', 'LIGHTMAGENTA_EX', 'LIGHTRED_EX',
     'LIGHTWHITE_EX', 'LIGHTYELLOW_EX', 'MAGENTA', 'RED', 'RESET',
     'WHITE', 'YELLOW']

You can also use colorama.Style to control the style of your text. You can choose between DIM, NORMAL, and BRIGHT.
Finally, colorama.Cursor provides codes for controlling the position of the cursor. You can use it to display the progress or status of a running script. The following example displays a countdown from 10:

    # countdown.py

    import colorama
    from colorama import Cursor, Fore
    import time

    colorama.init(autoreset=True)
    countdown = [f"{Fore.BLUE}{n}" for n in range(10, 0, -1)]
    countdown.append(f"{Fore.RED}Lift off!")

    print(f"{Fore.GREEN}Countdown starting:\n")
    for count in countdown:
        time.sleep(1)
        print(f"{Cursor.UP(1)}{count} ")

Note how the counter stays in place instead of printing on separate lines as it normally would:
Let’s get back to the task at hand. For many applications, adding color to your console output is cool but not critical. To avoid adding yet another dependency to your app, you want to use Colorama only if it’s available on the system and not break the app if it isn’t.
To do this, you can take inspiration from testing and its use of mocks. A mock can substitute for another object while allowing you to control its behavior. Here’s a naïve attempt at mocking Colorama:

    >>> from unittest.mock import Mock
    >>> colorama = Mock()
    >>> colorama.init(autoreset=True)
    <Mock name='mock.init()' id='139887544431728'>

    >>> Fore = Mock()
    >>> Fore.RED
    <Mock name='mock.RED' id='139887542331320'>

    >>> print(f"{Fore.RED}Hello Color!")
    <Mock name='mock.RED' id='139887542331320'>Hello Color!

This doesn’t quite work, because Fore.RED is represented by a string that messes up your output. Instead, you want to create an object that always renders as the empty string.
It’s possible to change the return value of .__str__() on Mock objects. However, in this case, it’s more convenient to write your own mock:

    # optional_color.py

    try:
        from colorama import init, Back, Cursor, Fore, Style
    except ModuleNotFoundError:
        from collections import UserString

        class ColoramaMock(UserString):
            def __call__(self, *args, **kwargs):
                return self
            def __getattr__(self, key):
                return self

        init = ColoramaMock("")
        Back = Cursor = Fore = Style = ColoramaMock("")

ColoramaMock("") is an empty string that will also return the empty string when it’s called. This effectively gives us a reimplementation of Colorama, just without the colors.
The final trick is that .__getattr__() returns itself, so that all colors, styles, and cursor movements that are attributes on Back, Fore, Style, and Cursor are mocked as well.
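You can convince yourself that the mock behaves as described without uninstalling Colorama. The following sketch repeats the ColoramaMock class and deliberately uses only the fallback branch, which is an assumption made to keep the check independent of whether Colorama is installed:

```python
from collections import UserString

class ColoramaMock(UserString):
    def __call__(self, *args, **kwargs):
        return self

    def __getattr__(self, key):
        return self

# Force the fallback path, as if Colorama weren't installed
init = ColoramaMock("")
Back = Cursor = Fore = Style = ColoramaMock("")

init(autoreset=True)  # Calling the mock is a harmless no-op
plain = f"{Fore.RED}{Style.BRIGHT}Hello Color!"
moved = f"{Cursor.UP(1)}3 "

print(repr(plain))  # 'Hello Color!'
print(repr(moved))  # '3 '
```

Every attribute access and every call returns the same empty string object, so the color and cursor codes simply disappear from the formatted text.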
The optional_color module is designed to be a drop-in replacement for Colorama, so you can update the countdown example using search and replace:

    # countdown.py

    import optional_color
    from optional_color import Cursor, Fore
    import time

    optional_color.init(autoreset=True)
    countdown = [f"{Fore.BLUE}{n}" for n in range(10, 0, -1)]
    countdown.append(f"{Fore.RED}Lift off!")

    print(f"{Fore.GREEN}Countdown starting:\n")
    for count in countdown:
        time.sleep(1)
        print(f"{Cursor.UP(1)}{count} ")

If you run this script on a system in which Colorama isn’t available, then it’ll still work, but it may not look as nice:
With Colorama installed, you should see the same results as earlier.
One difference between scripts and library modules is that scripts typically do something, whereas libraries provide functionality. Both scripts and libraries live inside regular Python files, and as far as Python is concerned, there’s no difference between them.
Instead, the difference is in how the file is meant to be used: should it be executed with python file.py or imported with import file inside another script?
Sometimes you’ll have a module that works as both a script and a library. You could try to refactor your module into two different files.
One example of this in the standard library is the json package. You usually use it as a library, but it also comes bundled with a script that can prettify JSON files. Assume you have the following colors.json file:

    {"colors":[{"color":"blue","category":"hue","type":"primary","code":{"rgba":[0,0,255,1],"hex":"#00F"}},{"color":"yellow","category":"hue","type":"primary","code":{"rgba":[255,255,0,1],"hex":"#FF0"}}]}

As JSON is often read only by machines, many JSON files aren’t formatted in a readable fashion. In fact, it’s quite common for JSON files to consist of one very long line of text.
json.tool is a script that uses the json library to format JSON in a more readable fashion:

    $ python -m json.tool colors.json --sort-keys
    {
        "colors": [
            {
                "category": "hue",
                "code": {
                    "hex": "#00F",
                    "rgba": [
                        0,
                        0,
                        255,
                        1
                    ]
                },
                "color": "blue",
                "type": "primary"
            },
            {
                "category": "hue",
                "code": {
                    "hex": "#FF0",
                    "rgba": [
                        255,
                        255,
                        0,
                        1
                    ]
                },
                "color": "yellow",
                "type": "primary"
            }
        ]
    }

Now the structure of the JSON file becomes much less complicated to grasp. You can use the --sort-keys option to sort keys alphabetically.
While it’s good practice to split scripts and libraries, Python has an idiom that makes it possible to treat a module as both a script and a library at the same time. As noted earlier, the value of the special __name__ module variable is set at runtime based on whether the module is imported or run as a script.
Let’s test it out! Create the following file:

    # name.py

    print(__name__)

If you run this file, then you’ll see that __name__ is set to the special value __main__:

    $ python name.py
    __main__

However, if you import the module, then __name__ is set to the name of the module:

    >>> import name
    name

This behavior is leveraged in the following pattern:

    def main():
        ...

    if __name__ == "__main__":
        main()
Let’s use this in a bigger example. In an attempt to keep you young, the following script will replace any “old” age (25 or above) with 24:

     1 # feel_young.py
     2
     3 def make_young(text):
     4     words = [replace_by_age(w) for w in text.split()]
     5     return " ".join(words)
     6
     7 def replace_by_age(word, new_age=24, age_range=(25, 120)):
     8     if word.isdigit() and int(word) in range(*age_range):
     9         return str(new_age)
    10     return word
    11
    12 if __name__ == "__main__":
    13     text = input("Tell me something: ")
    14     print(make_young(text))

You can run this as a script, and it will interactively make the age you type younger:

    $ python feel_young.py
    Tell me something: Forever young - Bob is 79 years old
    Forever young - Bob is 24 years old

You can also use the module as an importable library. The if test on line 12 makes sure that there are no side effects when you import the library. Only the functions make_young() and replace_by_age() are defined. You can, for instance, use this library as follows:

    >>> from feel_young import make_young

    >>> headline = "Twice As Many 100 Year Olds"
    >>> make_young(headline)
    'Twice As Many 24 Year Olds'

Without the protection of the if test, the import would have triggered the interactive input() and made feel_young very hard to use as a library.
A slightly obscure feature of Python is that it can run scripts packaged into ZIP files. The main advantage of this is that you can distribute a full package as a single file.
Note, however, that this still requires Python to be installed on the system. If you want to distribute your Python application as a stand-alone executable file, then see Using PyInstaller to Easily Distribute Python Applications.
If you give the Python interpreter a ZIP file, then it’ll look for a file named __main__.py inside the ZIP archive, extract it, and run it. As a basic example, create the following __main__.py file:

    # __main__.py

    print(f"Hello from {__file__}")

This will print a message when you run it:

    $ python __main__.py
    Hello from __main__.py

Now add it to a ZIP archive. You may be able to do this on the command line:

    $ zip hello.zip __main__.py
      adding: __main__.py (stored 0%)

On Windows, you can instead use point and click. Select the file in the File Explorer, then right-click and select Send to → Compressed (zipped) folder.
Since __main__ isn’t a very descriptive name, you named the ZIP file hello.zip. You can now call it directly with Python:

    $ python hello.zip
    Hello from hello.zip/__main__.py

Note that your script is aware that it lives inside hello.zip. Furthermore, the root of your ZIP file is added to Python’s import path so that your scripts can import other modules inside the same ZIP file.
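The same mechanism works whenever a ZIP archive is on the import path, not only when you run the archive as a script. The following sketch builds a small archive in a temporary directory and imports a module straight from it. The names bundle.zip and zipped_greeting are made up for the example:

```python
import importlib
import pathlib
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = pathlib.Path(tmp) / "bundle.zip"  # Hypothetical archive name
    with zipfile.ZipFile(archive, "w") as zip_file:
        # Store a tiny module at the root of the archive
        zip_file.writestr(
            "zipped_greeting.py", "message = 'Hello from inside the ZIP'\n"
        )

    # Put the archive itself, not its directory, on the import path
    sys.path.insert(0, str(archive))
    try:
        zipped_greeting = importlib.import_module("zipped_greeting")
        message = zipped_greeting.message
        origin = zipped_greeting.__file__
    finally:
        sys.path.remove(str(archive))
        sys.modules.pop("zipped_greeting", None)

print(message)
print(origin)  # A path that points inside bundle.zip
```

As with hello.zip, the module’s __file__ shows that it lives inside the archive and not in a regular directory.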
Think back to the earlier example in which you created a quiz based on population data. It’s possible to distribute this whole application as a single ZIP file. importlib.resources will make sure the data file is extracted from the ZIP archive when it’s needed.
The app consists of the following files:

    population_quiz/
    │
    ├── data/
    │   ├── __init__.py
    │   └── WPP2019_TotalPopulationBySex.csv
    │
    └── population_quiz.py

You could add these to a ZIP file in the same way you did above. However, Python comes with a tool called zipapp that streamlines the process of packing applications into ZIP archives. You use it as follows:

    $ python -m zipapp population_quiz -m population_quiz:main

This command essentially does two things: it creates an entry point and packages your application.
Remember that you needed a __main__.py file as an entry point inside your ZIP archive. If you supply the -m option with information about how your app should be started, then zipapp creates this file for you. In this example, the generated __main__.py looks like this:

    # -*- coding: utf-8 -*-
    import population_quiz
    population_quiz.main()

This __main__.py is packaged, along with the contents of the population_quiz directory, into a ZIP archive named population_quiz.pyz. The .pyz suffix signals that this is a Python file wrapped into a ZIP archive.
Note: By default, zipapp doesn’t compress any files. It only packages them into a single file. You can tell zipapp to compress the files as well by adding the -c option.
However, this feature is available only in Python 3.7 and later. See the zipapp documentation for more information.
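zipapp can also be driven from Python through zipapp.create_archive(), which can be handy in a build script. The sketch below packages a throwaway app and runs it. The hello_app directory and its hello.py module are assumptions made for illustration and stand in for the population_quiz directory:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical one-module application
    source = pathlib.Path(tmp) / "hello_app"
    source.mkdir()
    (source / "hello.py").write_text(
        "def main():\n    print('Hello from a zipapp')\n"
    )

    # Equivalent to: python -m zipapp hello_app -m hello:main
    target = pathlib.Path(tmp) / "hello_app.pyz"
    zipapp.create_archive(source, target=target, main="hello:main")

    # Run the archive with the current interpreter and capture its output
    result = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True,
        text=True,
        check=True,
    )
    output = result.stdout.strip()

print(output)
```

As on the command line, the main argument makes zipapp generate the __main__.py entry point, so the source directory must not already contain one.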
On Windows, .pyz files should already be registered as Python files. On Mac and Linux, you can have zipapp create executable files by using the -p interpreter option and specifying which interpreter to use:

    $ python -m zipapp population_quiz -m population_quiz:main \
    >   -p "/usr/bin/env python"

The -p option adds a shebang (#!) that tells the operating system how to run the file. Additionally, it makes the .pyz file executable so that you can run the file just by typing its name:

    $ ./population_quiz.pyz
    Reading population data for 2020, Medium scenario

    Question 1:
    1. Timor-Leste
    2. Viet Nam
    3. Bermuda

    Which country has the largest population?

Notice the ./ in front of the filename. This is a typical trick on Mac and Linux to run executable files in the current directory. If you move the file to a directory on your PATH, or if you’re using Windows, then you should be able to use only the filename: population_quiz.pyz.
Note: On Python 3.6 and older, the previous command will fail with a message saying that it couldn’t find the population data resource in the data directory. This is due to a limitation in zipimport.
A workaround is to supply the absolute path to population_quiz.pyz. On Mac and Linux, you can do this with the following trick:

    $ `pwd`/population_quiz.pyz

The pwd command expands to the path of the current directory.
Let’s close this section by looking at a nice effect of using importlib.resources. Remember that you used the following code to open the data file:

    from importlib import resources

    with resources.open_text("data", "WPP2019_TotalPopulationBySex.csv") as fid:
        ...

A more common way to open data files is to locate them based on your module’s __file__ attribute:

    import pathlib

    DATA_DIR = pathlib.Path(__file__).parent / "data"
    with open(DATA_DIR / "WPP2019_TotalPopulationBySex.csv") as fid:
        ...

This approach usually works well. However, it falls apart when your application is packed into a ZIP file:

    $ python population_quiz.pyz
    Reading population data for 2020, Medium scenario
    Traceback (most recent call last):
      ...
    NotADirectoryError: 'population_quiz.pyz/data/WPP2019_TotalPopulationBySex.csv'

Your data file is inside the ZIP archive, so open() isn’t able to open it. importlib.resources, on the other hand, will extract your data to a temporary file before opening it.
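On Python 3.9 and later, importlib.resources also offers files(), which returns an object that you can navigate much like a pathlib.Path. The following sketch is not the approach used for the quiz. It assumes Python 3.9 or newer and uses a made-up demo_data package created in a temporary directory:

```python
import pathlib
import sys
import tempfile
from importlib import resources

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package that holds one small data file
    package_dir = pathlib.Path(tmp) / "demo_data"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "sample.csv").write_text("name,value\nanswer,42\n")

    sys.path.insert(0, tmp)
    try:
        # Locate the file through the package instead of through __file__
        data_file = resources.files("demo_data").joinpath("sample.csv")
        with data_file.open(encoding="utf-8") as fid:
            header = fid.readline().strip()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_data", None)

print(header)
```

Because the file is addressed relative to the package and not to a location on disk, the same code keeps working when the package is imported from a ZIP archive.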
A cyclical import happens when you have two or more modules importing each other. More concretely, imagine that the moduleyin
usesimport yang
and the moduleyang
similarly importsyin
.
Python’s import system is to some extent designed to handle import cycles. For instance, the following code—while not very useful—runs fine:
# yin.pyprint(f"Hello from yin")importyangprint(f"Goodbye from yin")# yang.pyprint(f"Hello from yang")importyinprint(f"Goodbye from yang")
Trying to importyin
in the interactive interpreter importsyang
as well:
>>>importyinHello from yinHello from yangGoodbye from yangGoodbye from yin
Note that yang is imported in the middle of the import of yin, precisely at the import yang statement in the source code of yin. The reason this doesn’t end up in endless recursion is our old friend the module cache.
When you type import yin, a reference to yin is added to the module cache even before yin is loaded. When yang tries to import yin later, it simply uses the reference in the module cache.
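You can observe this behavior directly. The following sketch writes two small modules to a temporary directory and lets the second one record what it sees while the first is still being imported. The module names demo_yin and demo_yang and the two recorded flags are hypothetical and chosen only for this demonstration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# demo_yin imports demo_yang *before* it defines number
YIN_SOURCE = "import demo_yang\nnumber = 42\n"

# demo_yang records what it can see of demo_yin in the middle of the cycle
YANG_SOURCE = (
    "import sys\n"
    "import demo_yin\n"
    "yin_was_cached = 'demo_yin' in sys.modules\n"
    "yin_had_number = hasattr(demo_yin, 'number')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_yin.py").write_text(YIN_SOURCE)
    Path(tmp, "demo_yang.py").write_text(YANG_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo_yin = importlib.import_module("demo_yin")
        demo_yang = sys.modules["demo_yang"]
    finally:
        sys.path.remove(tmp)

# demo_yin was already in the module cache while it was still loading ...
print(demo_yang.yin_was_cached)  # True
# ... but it was only partially initialized at that point
print(demo_yang.yin_had_number)  # False
# Once the import has finished, the attribute is there
print(demo_yin.number)  # 42
```

The cached module is a real module object, just an incomplete one. That distinction is harmless here, but it matters as soon as a module uses its partner’s attributes at import time.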
You can also have modules that do something slightly more useful. If you define attributes and functions in your modules, then it all still works:
    # yin.py

    print(f"Hello from yin")
    import yang
    number = 42

    def combine():
        return number + yang.number

    print(f"Goodbye from yin")

    # yang.py

    print(f"Hello from yang")
    import yin
    number = 24

    def combine():
        return number + yin.number

    print(f"Goodbye from yang")
Importing yin works the same as before:
    >>> import yin
    Hello from yin
    Hello from yang
    Goodbye from yang
    Goodbye from yin
The issues associated with recursive imports start popping up when you actually use the other module at import time instead of just defining functions that will use the other module later. Add one line to yang.py:
    # yin.py

    print(f"Hello from yin")
    import yang
    number = 42

    def combine():
        return number + yang.number

    print(f"Goodbye from yin")

    # yang.py

    print(f"Hello from yang")
    import yin
    number = 24

    def combine():
        return number + yin.number

    print(f"yin and yang combined is {combine()}")
    print(f"Goodbye from yang")
Now Python gets confused by the import:
    >>> import yin
    Hello from yin
    Hello from yang
    Traceback (most recent call last):
      ...
      File ".../yang.py", line 8, in combine
        return number + yin.number
    AttributeError: module 'yin' has no attribute 'number'
The error message may seem a bit puzzling at first. Looking back at the source code, you can confirm that number is defined in the yin module.
The problem is that number isn’t defined in yin at the time yang gets imported. Consequently, the call to combine() tries to use yin.number before it exists.
To add to the confusion, you’ll have no issues importing yang:
    >>> import yang
    Hello from yang
    Hello from yin
    Goodbye from yin
    yin and yang combined is 66
    Goodbye from yang
By the time yang calls combine(), yin is fully imported and yin.number is well defined. As a final twist, because of the module cache you saw earlier, import yin might work if you do some other imports first:
    >>> import yang
    Hello from yang
    Hello from yin
    Goodbye from yin
    yin and yang combined is 66
    Goodbye from yang

    >>> yin
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'yin' is not defined

    >>> import yin
    >>> yin.combine()
    66
So how can you avoid being bogged down and confused by cyclical imports? Having two or more modules importing each other is often a sign that you can improve the design of your modules.
Often, the easiest time to fix cyclical imports is before you implement them. If you see cycles in your architecture sketches, have a closer look and try to break the cycles.
Still, there are times when it’s reasonable to introduce an import cycle. As you saw above, this isn’t a problem so long as your modules define only attributes, functions, classes, and so on. The second tip, which is also good design practice, is to keep your modules free of side effects at import time.
If you really need modules with import cycles and side effects, there’s still another way out: do your imports locally inside functions.
Note that in the following code, import yang is done inside combine(). This has two consequences. First, yang is available only inside the combine() function. More importantly, the import doesn’t happen until you call combine() after yin has been fully imported:
    # yin.py

    print(f"Hello from yin")
    number = 42

    def combine():
        import yang
        return number + yang.number

    print(f"Goodbye from yin")

    # yang.py

    print(f"Hello from yang")
    import yin
    number = 24

    def combine():
        return number + yin.number

    print(f"yin and yang combined is {combine()}")
    print(f"Goodbye from yang")
Now there are no issues importing and using yin:
    >>> import yin
    Hello from yin
    Goodbye from yin

    >>> yin.combine()
    Hello from yang
    yin and yang combined is 66
    Goodbye from yang
    66
Notice that yang is, in fact, not imported until you call combine(). For another perspective on cyclical imports, see Fredrik Lundh’s classic note.
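If you’d like to verify the deferred import without creating files by hand, the following sketch generates a similar pair of modules in a temporary directory and checks when the second one actually gets loaded. The module names lazy_yin and lazy_yang and the combined_at_import attribute are hypothetical stand-ins for this example only:

```python
import importlib
import sys
import tempfile
from pathlib import Path

YIN_SOURCE = """\
number = 42

def combine():
    import lazy_yang  # Deferred: runs on the first call, not at import time
    return number + lazy_yang.number
"""

YANG_SOURCE = """\
import lazy_yin

number = 24

def combine():
    return number + lazy_yin.number

# Side effect at import time. It's safe because lazy_yin is fully
# imported by the time this module gets loaded.
combined_at_import = combine()
"""

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "lazy_yin.py").write_text(YIN_SOURCE)
    Path(tmp, "lazy_yang.py").write_text(YANG_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        lazy_yin = importlib.import_module("lazy_yin")

        # Importing lazy_yin alone doesn't pull in lazy_yang
        yang_loaded_early = "lazy_yang" in sys.modules

        # The first call triggers the import of lazy_yang
        result = lazy_yin.combine()
        yang_loaded_late = "lazy_yang" in sys.modules
    finally:
        sys.path.remove(tmp)

print(yang_loaded_early)  # False
print(yang_loaded_late)  # True
print(result)  # 66
```

One trade-off of this pattern is that the cost of the import moves to the first call of the function. Later calls find the module in the cache, so the import statement is cheap from then on.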
One concern when importing several modules and packages is that it will add to the startup time of your script. Depending on your application, this may or may not be critical.
Since the release of Python 3.7, you’ve had a quick way of knowing how much time it takes to import packages and modules. Python 3.7 supports the -X importtime command-line option, which measures and prints how much time each module takes to import:
    $ python -X importtime -c "import datetime"
    import time: self [us] | cumulative | imported package
    ...
    import time:        87 |         87 |   time
    import time:       180 |        180 |   math
    import time:       234 |        234 |   _datetime
    import time:       820 |       1320 | datetime
The cumulative column shows the cumulative time of import (in microseconds) on a per-package basis. You can read the listing as follows: Python spent 1320 microseconds to fully import datetime, which involved importing time, math, and the C implementation _datetime as well.
The self column shows the time it took to import only the given module, excluding any recursive imports. You can see that time took 87 microseconds to import, math took 180, _datetime took 234, and the import of datetime itself took 820 microseconds. All in all, this adds up to a cumulative time of 1320 microseconds (within rounding errors).
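To check that arithmetic, you can parse the listing and add up the self column. The following is a minimal sketch that works on the sample output above, copied into a string. The helper parse_importtime() is a made-up name and not part of the standard library:

```python
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:        87 |         87 |   time
import time:       180 |        180 |   math
import time:       234 |        234 |   _datetime
import time:       820 |       1320 | datetime
"""


def parse_importtime(text):
    """Return (name, self_us, cumulative_us) tuples from importtime output."""
    prefix = "import time:"
    rows = []
    for line in text.splitlines():
        if not line.startswith(prefix):
            continue
        fields = [field.strip() for field in line[len(prefix):].split("|")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # Skip the header line and anything unexpected
        self_us, cumulative_us, name = fields
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


rows = parse_importtime(SAMPLE)
total_self = sum(self_us for _, self_us, _ in rows)
datetime_cumulative = rows[-1][2]

print(total_self)  # 1321
print(datetime_cumulative)  # 1320
```

The one-microsecond difference between the two numbers is the rounding error mentioned above.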
Have a look at the countdown.py example from the Colorama section:
    $ python3.7 -X importtime countdown.py
    import time: self [us] | cumulative | imported package
    ...
    import time:       644 |       7368 |       colorama.ansitowin32
    import time:       310 |      11969 |     colorama.initialise
    import time:       333 |      12301 |   colorama
    import time:       297 |      12598 | optional_color
    import time:       119 |        119 | time
In this example, importing optional_color took almost 0.013 seconds. Most of that time was spent importing Colorama and its dependencies. The self column shows the import time excluding nested imports.
For an extreme example, consider the population singleton from earlier. Because it’s loading a big data file, it’s extremely slow to import. To test this, you can run import population as a script with the -c option:
    $ python3.7 -X importtime -c "import population"
    import time: self [us] | cumulative | imported package
    ...
    import time:      4933 |     322111 |   matplotlib.pyplot
    import time:      1474 |       1474 |     typing
    import time:       420 |       1894 |   importlib.resources
    Reading population data for Medium scenario
    import time:   1593774 |    1921024 | population
In this case, it takes almost 2 seconds to import population, of which about 1.6 seconds are spent in the module itself, mainly for loading the data file.
-X importtime is a great tool for optimizing your imports. If you need to do more general monitoring and optimization of your code, then check out Python Timer Functions: Three Ways to Monitor Your Code.
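When a listing gets long, it can help to collect the numbers programmatically and sort them. The sketch below is one possible approach, assuming Python 3.7 or later: it runs a child interpreter with -X importtime, reads the report from standard error, and prints the modules with the largest self times. The function name measure_imports() and the choice of import json as the statement to profile are arbitrary picks for this illustration:

```python
import subprocess
import sys


def measure_imports(statement):
    """Run statement in a fresh interpreter and collect import timings.

    Returns a list of (name, self_us, cumulative_us) tuples.
    """
    completed = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    rows = []
    # The import time report is written to standard error
    for line in completed.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = [field.strip() for field in line[len(prefix):].split("|")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # Skip the header line
        self_us, cumulative_us, name = fields
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


rows = measure_imports("import json")

# Show the five modules that spent the most time in their own code
slowest = sorted(rows, key=lambda row: row[1], reverse=True)[:5]
for name, self_us, cumulative_us in slowest:
    print(f"{self_us:>8} us  {name}")
```

The timings vary between runs and machines, and the listing also includes the modules that Python imports during startup, so treat the numbers as rough guidance rather than exact measurements.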
In this tutorial, you’ve gotten to know the Python import system. Like many things in Python, it’s fairly straightforward to use for basic tasks like importing modules and packages. At the same time, the import system is quite complex, flexible, and extendable. You’ve learned several import-related tricks that you can take advantage of in your own code.
In this tutorial, you’ve learned how to:
Throughout the tutorial, you’ve seen many links to further info. The most authoritative source on the Python import system is the official documentation:
About Geir Arne Hjelle
Geir Arne is an avid Pythonista and a member of the Real Python tutorial team.