importlib_resources is a library that leverages Python’s import system to provide access to resources within packages and alongside modules. Given that this library is built on top of the import system, it is highly efficient and easy to use. This library’s philosophy is that, if one can import a module, one can access resources associated with that module. Resources can be opened or read, in either binary or text mode.
What exactly do we mean by “a resource”? It’s easiest to think about the metaphor of files and directories on the file system, though it’s important to keep in mind that this is just a metaphor. Resources and packages do not have to exist as physical files and directories on the file system.
If you have a file system layout such as:
    data/
        __init__.py
        one/
            __init__.py
            resource1.txt
            module1.py
            resources1/
                resource1.1.txt
        two/
            __init__.py
            resource2.txt
    standalone.py
    resource3.txt
then the directories are data, data/one, and data/two. Each of these is also a Python package by virtue of the fact that it contains an __init__.py file. That means that in Python, all of these import statements work:
    import data
    import data.one
    from data import two
Each import statement gives you a Python module corresponding to the __init__.py file in each of the respective directories. These modules are packages since packages are just special module instances that have an additional attribute, namely a __path__ [1].
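The distinction between a package and a plain module can be checked directly. The following is a small sketch that uses the standard library's json package purely as a convenient stand-in (it is not part of the layout above); it inspects both the __path__ attribute and the spec attribute mentioned in the footnote.

```python
import json
import json.decoder

# A package is a module that carries a __path__ attribute.
json_is_package = hasattr(json, "__path__")
decoder_is_package = hasattr(json.decoder, "__path__")

# The same information is exposed on the module spec (see footnote [1]).
json_locations = json.__spec__.submodule_search_locations
decoder_locations = json.decoder.__spec__.submodule_search_locations

print(json_is_package, decoder_is_package)
print(json_locations is not None, decoder_locations is None)
```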
In this analogy then, resources are just files or directories contained in a package directory, so data/one/resource1.txt and data/two/resource2.txt are both resources, as are the __init__.py files in all the directories.
Resources in packages are always accessed relative to the package that they live in. resource1.txt and resources1/resource1.1.txt are resources within the data.one package, and two/resource2.txt is a resource within the data package.
Resources may also be referenced relative to another anchor, a module in a package (data.one.module1) or a standalone module (standalone). In this case, resources are loaded from the same loader that loaded that module.
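To make "relative to the package" concrete, here is a rough sketch that recreates part of the layout above in a temporary directory and reads each resource through the files() API described later in this document. The top-level package is named demo_data rather than data (an assumption made only to avoid clashing with any real module), and the import falls back to the standard library's importlib.resources if the backport is not installed.

```python
import importlib
import pathlib
import sys
import tempfile

try:
    from importlib_resources import files
except ImportError:  # assumption: use the stdlib equivalent when the backport is absent
    from importlib.resources import files

# Build demo_data/, demo_data/one/ and demo_data/two/ in a scratch directory.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_data"  # hypothetical name standing in for "data"
for directory in (pkg, pkg / "one", pkg / "one" / "resources1", pkg / "two"):
    directory.mkdir()
for package_dir in (pkg, pkg / "one", pkg / "two"):
    (package_dir / "__init__.py").write_text("", encoding="utf-8")
(pkg / "one" / "resource1.txt").write_text("resource one", encoding="utf-8")
(pkg / "one" / "resources1" / "resource1.1.txt").write_text(
    "resource one point one", encoding="utf-8"
)
(pkg / "two" / "resource2.txt").write_text("resource two", encoding="utf-8")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Resources are addressed relative to the package that contains them.
one = files("demo_data.one")
r1 = one.joinpath("resource1.txt").read_text(encoding="utf-8")
r11 = one.joinpath("resources1").joinpath("resource1.1.txt").read_text(encoding="utf-8")

# two/resource2.txt can also be reached as a resource of the parent package.
r2 = files("demo_data").joinpath("two").joinpath("resource2.txt").read_text(encoding="utf-8")
```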
Let’s say you are writing an email parsing library and in your test suite you have a sample email message in a file called message.eml. You would like to access the contents of this file for your tests, so you put this in your project under the email/tests/data/message.eml path. Let’s say your unit tests live in email/tests/test_email.py.
Your test could read the data file by doing something like:
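    import os
    # test_email.py lives in email/tests, so the data directory is its sibling "data".
    data_dir = os.path.join(os.path.dirname(__file__), 'data')
    data_path = os.path.join(data_dir, 'message.eml')
    with open(data_path, encoding='utf-8') as fp:
        eml = fp.read()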
But there’s a problem with this! The use of __file__ doesn’t work if your package lives inside a zip file, since in that case this code does not live on the file system.
You could use the pkg_resources API like so:
    # In Python 3, resource_string() actually returns bytes!
    from pkg_resources import resource_string as resource_bytes
    eml = resource_bytes('email.tests.data', 'message.eml').decode('utf-8')
This requires you to make Python packages of both email/tests and email/tests/data, by placing an empty __init__.py file in each of those directories.
The problem with the pkg_resources approach is that, depending on the packages in your environment, pkg_resources can be expensive just to import. This behavior can have a serious negative impact on things like command line startup time for commands implemented in Python.
importlib_resources solves this performance challenge by being built entirely on the back of the stdlib importlib. By taking advantage of all the efficiencies in Python’s import system, and the fact that it’s built into Python, using importlib_resources can be much more performant. The equivalent code using importlib_resources would look like:
    from importlib_resources import files
    # Reads contents with UTF-8 encoding and returns str.
    eml = files('email.tests.data').joinpath('message.eml').read_text()
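The zip file problem described earlier can be reproduced in a few lines. This is a sketch under stated assumptions: the archive and the zipped_demo package are invented for the demonstration, and the import falls back to the standard library's importlib.resources when the backport is unavailable. It shows that a path derived from __file__ does not exist on disk, while the files() call still reads the resource.

```python
import importlib
import os
import sys
import tempfile
import zipfile

try:
    from importlib_resources import files
except ImportError:  # assumption: use the stdlib equivalent when the backport is absent
    from importlib.resources import files

# Build a zip archive holding a tiny package with one data file.
archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo/__init__.py", "")  # hypothetical package name
    zf.writestr("zipped_demo/message.eml", "Subject: hello\n")

# Putting the archive on sys.path lets the import system load from it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
zipped_demo = importlib.import_module("zipped_demo")

# The __file__-based approach points inside the archive, not at a real file.
naive_path = os.path.join(os.path.dirname(zipped_demo.__file__), "message.eml")
found_on_disk = os.path.exists(naive_path)

# The files() API reads the resource regardless of where the package lives.
eml = files("zipped_demo").joinpath("message.eml").read_text(encoding="utf-8")
```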
The importlib_resources files API takes an anchor as its first parameter, which can either be a package name (as a str) or an actual module object. If a string is passed in, it must name an importable Python module, which is imported prior to loading any resources. Thus the above example could also be written as:
    import email.tests.data
    eml = files(email.tests.data).joinpath('message.eml').read_text()
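Since email.tests.data is only an illustrative project layout, the equivalence of the two anchor forms can be tried against a package that is certain to exist. The sketch below uses the standard library's json package as a stand-in and relies on the earlier point that __init__.py is itself a resource; the fallback import is an assumption for environments without the backport.

```python
import json

try:
    from importlib_resources import files
except ImportError:  # assumption: use the stdlib equivalent when the backport is absent
    from importlib.resources import files

# Anchor given as a package name; the package is imported on demand.
by_name = files("json").joinpath("__init__.py").read_bytes()

# Anchor given as an already imported module object.
by_module = files(json).joinpath("__init__.py").read_bytes()

same = by_name == by_module
```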
importlib_resources supports namespace packages as anchors just like any other package. Similar to modules in a namespace package, resources in a namespace package are not allowed to collide by name. For example, if two packages both expose nspkg/data/foo.txt, those resources are unsupported by this library. The package will also likely experience problems due to the collision with installers.
It’s perfectly valid, however, for two packages to present different resources in the same namespace package, regular package, or subdirectory. For example, one package could expose nspkg/data/foo.txt and another expose nspkg/data/bar.txt and those two packages could be installed into separate paths, and the resources should be queryable:
    import importlib_resources
    data = importlib_resources.files('nspkg').joinpath('data')
    data.joinpath('foo.txt').read_text()
    data.joinpath('bar.txt').read_text()
A consumer need not worry whether any given package is on the file system or in a zip file, as the importlib_resources APIs abstract those details. Sometimes though, the user needs a path to an actual file on the file system. For example, some SSL APIs require a certificate file to be specified by a real file system path, and C’s dlopen() function also requires a real file system path.
To support this need, importlib_resources provides an API to extract the resource from a zip file to a temporary file or folder and return the file system path to this materialized resource as a pathlib.Path object. In order to properly clean up this temporary file, what’s actually returned is a context manager for use in a with-statement:
    import email.tests.data
    from importlib_resources import files, as_file
    source = files(email.tests.data).joinpath('message.eml')
    with as_file(source) as eml:
        third_party_api_requiring_file_system_path(eml)
Use all the standard contextlib APIs to manage this context manager.
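As one illustration of managing this context manager with contextlib, the sketch below enters as_file() through an ExitStack, which is handy when the path must outlive a single with-block. The zipped_certs package and its ca.pem file are invented for the demonstration (the file is not a real certificate), and the import falls back to the standard library when the backport is missing.

```python
import contextlib
import importlib
import os
import sys
import tempfile
import zipfile

try:
    from importlib_resources import as_file, files
except ImportError:  # assumption: use the stdlib equivalent when the backport is absent
    from importlib.resources import as_file, files

# A package inside a zip archive, so the resource has no file system path of its own.
archive = os.path.join(tempfile.mkdtemp(), "certs.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_certs/__init__.py", "")  # hypothetical package name
    zf.writestr("zipped_certs/ca.pem", "not a real certificate\n")
sys.path.insert(0, archive)
importlib.invalidate_caches()

source = files("zipped_certs").joinpath("ca.pem")

with contextlib.ExitStack() as stack:
    # The resource is materialized as a temporary file for as long as the stack is open.
    path = stack.enter_context(as_file(source))
    existed_inside = os.path.isfile(path)
    with open(path, encoding="utf-8") as fp:
        content = fp.read()

# Closing the stack removes the temporary copy.
exists_after = os.path.exists(path)
```

In a long-running program the ExitStack could instead be kept open and closed at shutdown, so the materialized path stays valid for the whole run.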
Starting with Python 3.9 and importlib_resources 1.4, this package introduced the files() API, to be preferred over the legacy API, i.e. the functions open_binary, open_text, path, contents, read_text, read_binary, and is_resource.
To port to the files() API, refer to the _legacy module to see simple wrappers that enable drop-in replacement based on the preferred API, and either copy those or adapt the usage to utilize the files and Traversable interfaces directly.
Starting with Python 3.9 and importlib_resources 2.0, this package provides an interface for non-standard loaders, such as those used by executable bundlers, to supply resources. These loaders should supply a get_resource_reader method, which is passed a module name and should return a TraversableResources instance.
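To give a feel for what such wrappers involve, here is a simplified sketch of a few legacy-style helpers written on top of files(). These are not the actual contents of the _legacy module; the *_compat names are hypothetical, the signatures are trimmed down, and the standard library's json package is used only to exercise them.

```python
try:
    from importlib_resources import files
except ImportError:  # assumption: use the stdlib equivalent when the backport is absent
    from importlib.resources import files


def read_binary_compat(package, resource):
    """Rough equivalent of the legacy read_binary()."""
    return files(package).joinpath(resource).read_bytes()


def read_text_compat(package, resource, encoding="utf-8"):
    """Rough equivalent of the legacy read_text()."""
    return files(package).joinpath(resource).read_text(encoding=encoding)


def contents_compat(package):
    """Rough equivalent of the legacy contents(): names of direct children."""
    return [entry.name for entry in files(package).iterdir()]


def is_resource_compat(package, name):
    """Rough equivalent of the legacy is_resource(): True only for files."""
    return files(package).joinpath(name).is_file()


# Exercise the helpers against a package that ships with Python.
init_source = read_text_compat("json", "__init__.py")
init_bytes = read_binary_compat("json", "__init__.py")
names = contents_compat("json")
has_init = is_resource_compat("json", "__init__.py")
has_missing = is_resource_compat("json", "no_such_resource.txt")
```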
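A minimal sketch of such a loader follows, assuming Python 3.10 or newer or a recent importlib_resources. Everything named here (BundleLoader, DirectoryReader, the bundled_demo module, greeting.txt) is hypothetical; a real bundler would serve resources from its own storage, whereas this example simply hands back a directory, relying on pathlib.Path satisfying the Traversable interface.

```python
import importlib.abc
import importlib.util
import pathlib
import sys
import tempfile

try:
    from importlib_resources import files
    from importlib_resources.abc import TraversableResources
except ImportError:  # assumption: use the stdlib equivalents when the backport is absent
    from importlib.resources import files
    try:
        from importlib.resources.abc import TraversableResources
    except ImportError:  # older Python versions keep the ABC in importlib.abc
        from importlib.abc import TraversableResources


class DirectoryReader(TraversableResources):
    """Serves resources from a directory; files() is the only abstract method."""

    def __init__(self, directory):
        self._directory = pathlib.Path(directory)

    def files(self):
        return self._directory


class BundleLoader(importlib.abc.Loader):
    """Stand-in for a bundler's loader that knows where its resources live."""

    def __init__(self, directory):
        self._directory = directory

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        pass  # nothing to execute for this empty demo module

    def get_resource_reader(self, name):
        # Called with the module name; returns a TraversableResources instance.
        return DirectoryReader(self._directory)


resource_dir = pathlib.Path(tempfile.mkdtemp())
(resource_dir / "greeting.txt").write_text("hello from a custom loader", encoding="utf-8")

# Register a module backed by the custom loader.
spec = importlib.util.spec_from_loader(
    "bundled_demo", BundleLoader(resource_dir), is_package=True
)
module = importlib.util.module_from_spec(spec)
sys.modules["bundled_demo"] = module
spec.loader.exec_module(module)

greeting = files("bundled_demo").joinpath("greeting.txt").read_text(encoding="utf-8")
```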
Footnotes
[1] As of PEP 451 this information is also available on the module’s __spec__.submodule_search_locations attribute, which will not be None for packages.