Datasets (scipy.datasets)

Dataset Methods

ascent()

Get an 8-bit grayscale, 512 x 512 derived image for easy use in demos.

face([gray])

Get a 1024 x 768, color image of a raccoon face.

electrocardiogram()

Load an electrocardiogram as an example for a 1-D signal.

Utility Methods

download_all([path])

Utility method to download all the dataset files for the scipy.datasets module.

clear_cache([datasets])

Cleans the scipy datasets cache directory.
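As a minimal sketch of how the two utility methods might be used together, the helpers below pre-fetch every dataset and later remove a single one from the cache. The names `warm_cache` and `drop_ascent` are hypothetical wrappers introduced only for illustration. The sketch assumes SciPy 1.10 or later with the optional Pooch dependency installed, and that `clear_cache` accepts a list of dataset methods.

```python
def warm_cache(path=None):
    """Download every scipy.datasets file in one go.

    `path` is forwarded to download_all; when it is None, the default
    cache directory is used.
    """
    # Imported lazily: scipy.datasets needs the optional Pooch package.
    from scipy import datasets

    datasets.download_all(path)


def drop_ascent():
    """Remove only the cached file that backs datasets.ascent."""
    from scipy import datasets

    # Passing a list of dataset methods limits the cleanup to those datasets;
    # calling clear_cache() with no argument clears the whole cache directory.
    datasets.clear_cache([datasets.ascent])
```

Calling `warm_cache()` while online means later calls such as `datasets.face()` can be served from disk.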

Usage of Datasets

SciPy dataset methods can be called simply as '<dataset-name>()'. This downloads the dataset files over the network once and saves them in the cache, before returning a numpy.ndarray object representing the dataset.

Note that the return data structure and data type might be different for different dataset methods. For a more detailed example on usage, please look into the particular dataset method documentation above.
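The calling pattern described above can be sketched as follows. The helper names `describe` and `describe_all` are hypothetical and exist only for this illustration. `describe_all` needs network access on first use, plus SciPy 1.10 or later with Pooch installed.

```python
import numpy as np


def describe(name, data):
    """Summarize a dataset array as 'name: shape=..., dtype=...'."""
    arr = np.asarray(data)
    return f"{name}: shape={arr.shape}, dtype={arr.dtype}"


def describe_all():
    """Load each dataset (downloading it on first use) and summarize it."""
    # Imported lazily: scipy.datasets needs the optional Pooch package.
    from scipy import datasets

    loaders = {
        "ascent": datasets.ascent,
        "face": datasets.face,
        "electrocardiogram": datasets.electrocardiogram,
    }
    # Each loader returns an array; shapes and dtypes differ between datasets.
    return [describe(name, load()) for name, load in loaders.items()]
```

Printing the result of `describe_all()` is a quick way to see how the shape and dtype vary from one dataset method to the next.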

How dataset retrieval and storage works

SciPy dataset files are stored within individual GitHub repositories under the SciPy GitHub organization, following the naming convention 'dataset-<name>'; for example, the scipy.datasets.face files live at scipy/dataset-face. The scipy.datasets submodule depends on Pooch, a Python package built to simplify fetching data files. Pooch uses these repos to retrieve the respective dataset files when the dataset function is called.

A registry of all the datasets, essentially a mapping of filenames to their SHA256 hashes and repo URLs, is maintained. Pooch uses it to handle and verify the downloads when a dataset function is called. After a dataset has been downloaded once, its files are saved in the system cache directory under 'scipy-data'.
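The verification idea can be illustrated with the standard library alone. This is a simplified sketch of hash checking against a registry, not the code that Pooch or SciPy actually run. The function names and the registry contents are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, registry):
    """Check a file against a {filename: expected_sha256} registry.

    Returns False when the filename is not registered or the hash differs.
    """
    expected = registry.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

A file that has been corrupted or altered after download yields a different digest, so `verify` returns False and the file can be fetched again.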

Dataset cache locations may vary on different platforms.

For macOS:

'~/Library/Caches/scipy-data'

For Linux and other Unix-like platforms:

'~/.cache/scipy-data'  # or the value of the XDG_CACHE_HOME env var, if defined

For Windows:

'C:\Users\<user>\AppData\Local\<AppAuthor>\scipy-data\Cache'
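Rather than hard-coding one of the paths above, the cache directory for the current platform can be looked up through Pooch. The sketch below assumes Pooch is installed and that the cache lives under the 'scipy-data' project name mentioned earlier. `default_cache_dir` and `list_cached_files` are hypothetical helper names.

```python
from pathlib import Path


def default_cache_dir():
    """Return the platform-specific cache directory for 'scipy-data'."""
    # Imported lazily so that list_cached_files works without Pooch.
    import pooch

    return Path(pooch.os_cache("scipy-data"))


def list_cached_files(cache_dir):
    """Return the sorted names of regular files in cache_dir.

    An empty list is returned when the directory does not exist yet.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())
```

For example, `list_cached_files(default_cache_dir())` shows which dataset files have already been downloaded on the current machine.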

In environments with constrained network connectivity (for example, for security reasons) or on systems without a continuous internet connection, one may manually load the cache by placing the contents of the dataset repo in the cache directory listed above. This avoids dataset fetching errors when there is no internet connectivity.
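The manual step might be scripted along these lines. This is a sketch under several assumptions: the dataset repo has already been copied to the offline machine by some other means, only its top-level regular files are needed, and hidden entries such as '.git' can be skipped. `seed_cache` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def seed_cache(repo_dir, cache_dir):
    """Copy top-level files from a local dataset repo into the cache.

    Hidden entries (names starting with '.') and subdirectories are
    skipped. Returns the sorted names of the files that were copied.
    """
    repo_dir = Path(repo_dir)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(repo_dir.iterdir()):
        if item.is_file() and not item.name.startswith("."):
            shutil.copy2(item, cache_dir / item.name)
            copied.append(item.name)
    return copied
```

Here `cache_dir` would be the platform-specific 'scipy-data' directory. Files that are not datasets, such as a README, are copied too.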