In the Python ecosystem, the term “data files” is used in various complex scenarios and can have nuanced meanings. For the purposes of this documentation, we define “data files” as non-Python files that are installed alongside Python modules and packages on the user’s machine when they install a distribution via wheel.
These files are typically intended for use at runtime by the package itself or to influence the behavior of other packages or systems.
Old packaging installation methods in the Python ecosystem have traditionally allowed installation of “data files”, which are placed in a platform-specific location. However, the most common use case for data files distributed with a package is for use by the package, usually by including the data files inside the package directory.
Setuptools focuses on this most common type of data files and offers three ways of specifying which files should be included in your packages, as described in the following sections.
include_package_data
First, you can use the include_package_data keyword.
For example, if the package tree looks like this:
    project_root_directory
    ├── setup.py        # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── __init__.py
            ├── data1.rst
            ├── data2.rst
            ├── data1.txt
            └── data2.txt
When at least one of the following conditions is met:
1. These files are included via the MANIFEST.in file, like so:
       include src/mypkg/*.txt
       include src/mypkg/*.rst
2. They are being tracked by a revision control system such as Git, Mercurial or SVN, AND you have configured an appropriate plugin such as setuptools-scm or setuptools-svn. (See the section below on Adding Support for Revision Control Systems for information on how to configure such plugins.)
then all the .txt and .rst files will be included into the source distribution.
To further include them into the wheels, you can use the include_package_data keyword:
pyproject.toml
    [tool.setuptools]
    # ...
    # By default, include-package-data is true in pyproject.toml,
    # so you do NOT have to specify this line.
    include-package-data = true

    [tool.setuptools.packages.find]
    where = ["src"]
setup.cfg
    [options]
    # ...
    packages = find:
    package_dir =
        = src
    include_package_data = True

    [options.packages.find]
    where = src
setup.py
    from setuptools import setup, find_packages
    setup(
        # ...,
        packages=find_packages(where="src"),
        package_dir={"": "src"},
        include_package_data=True
    )
Note
Added in version v61.0.0: The default value for tool.setuptools.include-package-data is true when projects are configured via pyproject.toml. This behaviour differs from setup.cfg and setup.py (where include_package_data is False by default), which was not changed to ensure backwards compatibility with existing projects.
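One way to confirm that data files actually reached a wheel is to inspect the archive, since a wheel is a zip file. The following is a minimal sketch using only the standard library; the helper name list_wheel_data_files and the wheel file name in the usage comment are hypothetical, and the filter assumes that anything outside the *.dist-info directory that is not a .py file counts as a data file.

```python
import zipfile


def list_wheel_data_files(wheel_path):
    """Return the non-Python files in a wheel, ignoring its *.dist-info metadata."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    return sorted(
        name
        for name in names
        if not name.endswith("/")          # skip directory entries
        and not name.endswith(".py")       # skip Python modules
        and ".dist-info/" not in name      # skip wheel metadata
    )


# Hypothetical usage after building the project:
# print(list_wheel_data_files("dist/mypkg-0.1-py3-none-any.whl"))
```

If the expected .txt or .rst files are missing from the result, the conditions listed above (MANIFEST.in or a revision control plugin, plus include_package_data) are a reasonable first place to look.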
package_data
By default, include_package_data considers all non .py files found inside the package directory (src/mypkg in this case) as data files, and includes those that satisfy (at least) one of the above two conditions into the source distribution, and consequently in the installation of your package. If you want finer-grained control over what files are included, then you can also use the package_data keyword. For example, if the package tree looks like this:
    project_root_directory
    ├── setup.py        # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── __init__.py
            ├── data1.rst
            ├── data2.rst
            ├── data1.txt
            └── data2.txt
then you can use the following configuration to capture the .txt and .rst files as data files:
pyproject.toml
    [tool.setuptools.packages.find]
    where = ["src"]

    [tool.setuptools.package-data]
    mypkg = ["*.txt", "*.rst"]
setup.cfg
    [options]
    # ...
    packages = find:
    package_dir =
        = src

    [options.packages.find]
    where = src

    [options.package_data]
    mypkg =
        *.txt
        *.rst
setup.py
    from setuptools import setup, find_packages
    setup(
        # ...,
        packages=find_packages(where="src"),
        package_dir={"": "src"},
        package_data={"mypkg": ["*.txt", "*.rst"]}
    )
The package_data argument is a dictionary that maps from package names to lists of glob patterns. Note that the data files specified using the package_data option need neither be listed in a MANIFEST.in file nor be added by a revision control system plugin.
Note
If your glob patterns use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. setuptools automatically converts slashes to appropriate platform-specific separators at build time.
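To illustrate what such a conversion amounts to, here is a minimal sketch. It is not setuptools' own code, and the helper name to_native_pattern is hypothetical; it only shows that a pattern written with forward slashes can be mapped onto the local separator mechanically, which is why configuration files can stay portable.

```python
import os


def to_native_pattern(pattern):
    """Turn a '/'-separated glob pattern into one using the local separator."""
    return os.path.join(*pattern.split("/"))


# The same configuration value works on every platform; only the
# converted form differs (backslashes on Windows, slashes elsewhere).
print(to_native_pattern("data/*.rst"))
```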
Important
Glob patterns do not automatically match dotfiles, i.e., directory or file names starting with a dot (.). To include such files, you must explicitly start the pattern with a dot, e.g. .* to match .gitignore.
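The same rule can be observed with the standard library's glob module, which is used here purely as an illustration; setuptools' own matching may differ in other details. The sketch creates two throwaway files in a temporary directory, and the helper name match_names is hypothetical.

```python
import glob
import os
import tempfile


def match_names(directory, pattern):
    """Return the sorted file names in `directory` matched by a glob pattern."""
    full_pattern = os.path.join(glob.escape(directory), pattern)
    return sorted(os.path.basename(path) for path in glob.glob(full_pattern))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("data1.txt", ".gitignore"):
        open(os.path.join(tmp, name), "w").close()
    plain = match_names(tmp, "*")     # a bare wildcard skips dotfiles
    dotted = match_names(tmp, ".*")   # a leading dot matches them explicitly

print(plain)   # ['data1.txt']
print(dotted)  # ['.gitignore']
```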
If you have multiple top-level packages and a common pattern of data files for all these packages, for example:
    project_root_directory
    ├── setup.py        # and/or setup.cfg, pyproject.toml
    └── src
        ├── mypkg1
        │   ├── data1.rst
        │   ├── data1.txt
        │   └── __init__.py
        └── mypkg2
            ├── data2.txt
            └── __init__.py
Here, both packages mypkg1 and mypkg2 share a common pattern of having .txt data files. However, only mypkg1 has .rst data files. In such a case, if you want to use the package_data option, the following configuration will work:
pyproject.toml
    [tool.setuptools.packages.find]
    where = ["src"]

    [tool.setuptools.package-data]
    "*" = ["*.txt"]
    mypkg1 = ["data1.rst"]
setup.cfg
    [options]
    packages = find:
    package_dir =
        = src

    [options.packages.find]
    where = src

    [options.package_data]
    * =
        *.txt
    mypkg1 =
        data1.rst
setup.py
    from setuptools import setup, find_packages
    setup(
        # ...,
        packages=find_packages(where="src"),
        package_dir={"": "src"},
        package_data={"": ["*.txt"], "mypkg1": ["data1.rst"]},
    )
Notice that if you list patterns in package_data under the empty string "" in setup.py, and the asterisk * in setup.cfg and pyproject.toml, these patterns are used to find files in every package. For example, we use "" or * to indicate that the .txt files from all packages should be captured as data files. These placeholders are treated as a special case: setuptools does not support glob patterns on package names for this configuration (patterns are only supported on the file paths). Also note how we can continue to specify patterns for individual packages, i.e. we specify that data1.rst from mypkg1 alone should be captured as well.
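The way the catch-all entry combines with per-package entries can be modelled in a few lines. This is a sketch of the behaviour described above rather than setuptools' implementation, and the function name patterns_for is hypothetical.

```python
def patterns_for(package_data, package):
    """Combine the catch-all "" patterns with those listed for one package."""
    return list(package_data.get("", [])) + list(package_data.get(package, []))


package_data = {"": ["*.txt"], "mypkg1": ["data1.rst"]}

print(patterns_for(package_data, "mypkg1"))  # ['*.txt', 'data1.rst']
print(patterns_for(package_data, "mypkg2"))  # ['*.txt']
```

Note that the lookup uses the exact package name as a dictionary key, which mirrors the point that wildcards apply to file paths and not to package names.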
Note
When building an sdist, the data files are also drawn from the package_name.egg-info/SOURCES.txt file which works as a form of cache. So make sure that this file is removed if package_data is updated, before re-building the package.
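Removing the cached file can be scripted before a rebuild. The sketch below assumes the .egg-info directory lives somewhere under the project root (for example next to setup.py or inside src); the function name remove_sources_cache is hypothetical, and the search is recursive, so it should be pointed at the project itself rather than at a directory containing unrelated environments.

```python
from pathlib import Path


def remove_sources_cache(project_root):
    """Delete every *.egg-info/SOURCES.txt below project_root; return the removed paths."""
    # Collect the matches first so that nothing is deleted while still scanning.
    matches = sorted(Path(project_root).glob("**/*.egg-info/SOURCES.txt"))
    for sources in matches:
        sources.unlink()
    return matches


# Hypothetical usage from the project root, before re-building:
# remove_sources_cache(".")
```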
Attention
In Python any directory is considered a package (even if it does not contain __init__.py, see native namespace packages on Packaging namespace packages). Therefore, if you are not relying on automatic discovery, you SHOULD ensure that all packages (including the ones that don’t contain any Python files) are included in the packages configuration (see Package Discovery and Namespace Packages for more information).
Moreover, it is advisable to use full package names in dot notation instead of a nested path, to avoid error-prone configurations. Please check the section on subdirectories below.
exclude_package_data
Sometimes, the include_package_data or package_data options alone aren’t sufficient to precisely define what files you want included. For example, consider a scenario where you have include_package_data=True, and you are using a revision control system with an appropriate plugin. Sometimes developers add directory-specific marker files (such as .gitignore, .gitkeep, .gitattributes, or .hgignore). These files are probably being tracked by the revision control system, and therefore by default they will be included when the package is installed.
Supposing you want to prevent these files from being included in the installation (they are not relevant to Python or the package), then you could use the exclude_package_data option:
pyproject.toml
    [tool.setuptools.packages.find]
    where = ["src"]

    [tool.setuptools.exclude-package-data]
    mypkg = [".gitattributes"]
setup.cfg
    [options]
    # ...
    packages = find:
    package_dir =
        = src
    include_package_data = True

    [options.packages.find]
    where = src

    [options.exclude_package_data]
    mypkg =
        .gitattributes
setup.py
    from setuptools import setup, find_packages
    setup(
        # ...,
        packages=find_packages(where="src"),
        package_dir={"": "src"},
        include_package_data=True,
        exclude_package_data={"mypkg": [".gitattributes"]},
    )
The exclude_package_data option is a dictionary mapping package names to lists of wildcard patterns, just like the package_data option. And, just as with that option, you can use the empty string key "" in setup.py and the asterisk * in setup.cfg and pyproject.toml to match all top-level packages.
Any files that match these patterns will be excluded from installation, even if they were listed in package_data or were included as a result of using include_package_data.
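The effect of an exclusion list on an already selected set of files can be sketched with the standard library's fnmatch module. This is an illustrative model only: it works on bare file names relative to the package, the function name apply_exclusions is hypothetical, and the exact matching setuptools performs on full paths may differ.

```python
from fnmatch import fnmatchcase


def apply_exclusions(filenames, exclude_patterns):
    """Drop every file name that matches at least one exclusion pattern."""
    return [
        name
        for name in filenames
        if not any(fnmatchcase(name, pattern) for pattern in exclude_patterns)
    ]


selected = ["data1.txt", "data1.rst", ".gitattributes"]
print(apply_exclusions(selected, [".gitattributes"]))  # ['data1.txt', 'data1.rst']
```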
Meanwhile, to further clarify the interplay between these three keywords: for a data file to be included in the source distribution, the following logical condition has to be met:
    MANIFEST.in or (package-data and not exclude-package-data)
In plain language, the file should be either:
1. included in MANIFEST.in; or
2. selected by package-data AND not excluded by exclude-package-data.
To include some data file into the .whl:
    (not exclude-package-data) and ((include-package-data and MANIFEST.in) or package-data)
In other words, the file should not be excluded by exclude-package-data (highest priority), AND should be either:
1. selected by package-data; or
2. selected by MANIFEST.in AND use include-package-data=true.
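The two conditions above translate directly into boolean functions, which can help when reasoning about a particular file. In this sketch each argument is a plain boolean answering one question about the file (for instance, in_manifest means the file is matched by MANIFEST.in); the function and parameter names are hypothetical.

```python
def in_sdist(in_manifest, in_package_data, excluded):
    """MANIFEST.in or (package-data and not exclude-package-data)."""
    return in_manifest or (in_package_data and not excluded)


def in_wheel(in_manifest, in_package_data, excluded, include_package_data):
    """(not exclude-package-data) and ((include-package-data and MANIFEST.in) or package-data)."""
    return (not excluded) and (
        (include_package_data and in_manifest) or in_package_data
    )


# A file listed only in MANIFEST.in, with include-package-data disabled,
# reaches the source distribution but not the wheel.
print(in_sdist(True, False, False))         # True
print(in_wheel(True, False, False, False))  # False
```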
In summary, the three options allow you to:
include_package_data
    Accept all data files and directories matched by MANIFEST.in or added by a plugin.
package_data
    Specify additional patterns to match files that may or may not be matched by MANIFEST.in or added by a plugin.
exclude_package_data
    Specify patterns for data files and directories that should not be included when a package is installed, even if they would otherwise have been included due to the use of the preceding options.
Note
Due to the way the build process works, a data file that you include in your project and then stop including may be “orphaned” in your project’s build directories, requiring you to manually delete it. This may also be important for your users and contributors if they track intermediate revisions of your project using Subversion; be sure to let them know when you make changes that remove files from inclusion so they can also manually delete them.
See also troubleshooting information in Caching and Troubleshooting.
A common pattern is where some (or all) of the data files are placed under a separate subdirectory. For example:
    project_root_directory
    ├── setup.py        # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── data
            │   ├── data1.rst
            │   └── data2.rst
            ├── __init__.py
            ├── data1.txt
            └── data2.txt
Here, the .rst files are placed under a data subdirectory inside mypkg, while the .txt files are directly under mypkg.
In this case, the recommended approach is to treat data as a namespace package (refer to PEP 420). This way, you can rely on the same methods described above, using either package_data or include_package_data. For the sake of completeness, we include below configuration examples for the subdirectory structure, but please refer to the detailed information in the previous sections of this document.
With package_data, the configuration might look like this:
pyproject.toml
    # Scanning for namespace packages in the ``src`` directory is true by
    # default in pyproject.toml, so you do NOT need to include the
    # `tool.setuptools.packages.find` if it looks like the following:
    # [tool.setuptools.packages.find]
    # namespaces = true
    # where = ["src"]

    [tool.setuptools.package-data]
    mypkg = ["*.txt"]
    "mypkg.data" = ["*.rst"]
setup.cfg
    [options]
    # ...
    packages = find_namespace:
    package_dir =
        = src

    [options.packages.find]
    where = src

    [options.package_data]
    mypkg =
        *.txt
    mypkg.data =
        *.rst
setup.py
    from setuptools import setup, find_namespace_packages
    setup(
        # ...,
        packages=find_namespace_packages(where="src"),
        package_dir={"": "src"},
        package_data={
            "mypkg": ["*.txt"],
            "mypkg.data": ["*.rst"],
        }
    )
In other words, we allow setuptools to scan for namespace packages in the src directory, which enables the data directory to be identified, and then we separately specify data files for the root package mypkg and for the namespace package data under the package mypkg.
Alternatively, you can also rely on include_package_data. Note that this is the default behaviour in pyproject.toml, but you need to manually enable scanning of namespace packages in setup.cfg or setup.py:
pyproject.toml
    [tool.setuptools]
    # ...
    # By default, include-package-data is true in pyproject.toml, so you do
    # NOT have to specify this line.
    include-package-data = true

    [tool.setuptools.packages.find]
    # scanning for namespace packages is true by default in pyproject.toml, so
    # you need NOT include this configuration.
    namespaces = true
    where = ["src"]
setup.cfg
    [options]
    packages = find_namespace:
    package_dir =
        = src
    include_package_data = True

    [options.packages.find]
    where = src
setup.py
    from setuptools import setup, find_namespace_packages
    setup(
        # ... ,
        packages=find_namespace_packages(where="src"),
        package_dir={"": "src"},
        include_package_data=True,
    )
To avoid common mistakes with include_package_data, please ensure MANIFEST.in is properly set or use a revision control system plugin (see Controlling files in the distribution).
Typically, existing programs manipulate a package’s __file__ attribute in order to find the location of data files. For example, if you have a structure like this:
    project_root_directory
    ├── setup.py        # and/or setup.cfg, pyproject.toml
    └── src
        └── mypkg
            ├── data
            │   └── data1.txt
            ├── __init__.py
            └── foo.py
Then, in mypkg/foo.py, you may try something like this in order to access mypkg/data/data1.txt:
    import os
    data_path = os.path.join(os.path.dirname(__file__), 'data', 'data1.txt')
    with open(data_path, 'r') as data_file:
        ...
However, this manipulation isn’t compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs. It is strongly recommended that, if you are using data files, you should use importlib.resources to access them. In this case, you would do something like this:
    from importlib.resources import files
    data_text = files('mypkg.data').joinpath('data1.txt').read_text()
importlib.resources was added to Python 3.7. However, the API illustrated in this code (using files()) was added only in Python 3.9, [2] and support for accessing data files via namespace packages was added only in Python 3.10 [3] (the data subdirectory is a namespace package under the root package mypkg). Therefore, you may find this code to work only in Python 3.10 (and above). For other versions of Python, you are recommended to use the importlib-resources backport which provides the latest version of this library. In this case, the only change that has to be made to the above code is to replace importlib.resources with importlib_resources, i.e.
    from importlib_resources import files
    ...
See Using importlib_resources for detailed instructions.
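As a self-contained way to try the files() API, the sketch below builds a throwaway regular package in a temporary directory and reads a data file from it. The package name demo_datafiles_pkg is hypothetical, and a regular package (with __init__.py) is used so that the example does not depend on namespace package support. It also uses importlib.resources.as_file(), which goes slightly beyond the text above: it is the standard-library helper (Python 3.9+) for cases where third-party code needs a real filesystem path rather than the file contents.

```python
import importlib
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway regular package (hypothetical name) with one data file.
    pkg_dir = Path(tmp) / "demo_datafiles_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "data1.txt").write_text("hello from data1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        resource = files("demo_datafiles_pkg").joinpath("data1.txt")
        data_text = resource.read_text()
        # as_file() yields a real filesystem path, extracting to a temporary
        # file first if the package is not stored as plain files on disk.
        with as_file(resource) as path:
            path_existed = path.is_file()
    finally:
        sys.path.remove(tmp)

print(repr(data_text), path_existed)
```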
Tip
Files inside the package directory should be read-only to avoid a series of common problems (e.g. when multiple users share a common Python installation, when the package is loaded from a zip file, or when multiple instances of a Python application run in parallel).
If your Python package needs to write to a file for shared data or configuration, you can use standard platform/OS-specific system directories, such as ~/.local/config/$appname or /usr/share/$appname/$version (Linux specific) [1]. A common approach is to add a read-only template file to the package directory that is then copied to the correct system directory if no pre-existing file is found.
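The template approach just described could look roughly like the following. This is a minimal sketch: the function name ensure_user_file, the resource name default.cfg, and the target location in the usage comment are all hypothetical, and choosing the right directory for a given platform is left to the caller.

```python
from pathlib import Path


def ensure_user_file(template, target):
    """Copy `template` to `target` unless `target` already exists.

    `template` is any object with a read_bytes() method, such as the result
    of importlib.resources.files(...).joinpath(...). Returns True if a copy
    was made and False if an existing file was left untouched.
    """
    target = Path(target)
    if target.exists():
        return False  # never overwrite the user's own file
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(template.read_bytes())
    return True


# Hypothetical usage inside mypkg:
# from importlib.resources import files
# ensure_user_file(files("mypkg").joinpath("default.cfg"),
#                  Path.home() / ".config" / "mypkg" / "settings.cfg")
```

Keeping the packaged template read-only and writing only to the copy avoids the problems listed above for shared or zipped installations.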
You can resort to a native/implicit namespace package (as a container for files) if you want plugins and extensions to your package to contribute with package data files. This way, all files will be listed during runtime when using importlib.resources. Note that, although not strictly guaranteed, mainstream Python package managers, like pip and derived tools, will install files belonging to multiple distributions that share the same namespace into the same directory in the file system. This means that the overhead for importlib.resources will be minimal.
Historically, setuptools by way of easy_install would encapsulate data files from the distribution into the egg (see the old docs). As eggs are deprecated and pip-based installs fall back to the platform-specific location for installing data files, there is no supported facility to reliably retrieve these resources.
Instead, the PyPA recommends that any data files you wish to be accessible at run time be included inside the package.
[1] These locations can be discovered with the help of third-party libraries such as platformdirs.
[2] Reference: https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy
[3] Reference: https://github.com/python/importlib_resources/pull/196#issuecomment-734520374