fsspec: Filesystem interfaces for Python

Filesystem Spec (fsspec) is a project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage.

Brief Overview

There are many places to store bytes, from in memory, to the local disk, cluster distributed storage, to the cloud. Many files also contain internal mappings of names to bytes, maybe in a hierarchical directory-oriented tree. Working with all these different storage media, and their associated libraries, is a pain. fsspec exists to provide a familiar API that will work the same whatever the storage backend. As much as possible, we iron out the quirks specific to each implementation, so you need do no more than provide credentials for each service you access (if needed) and thereafter not have to worry about the implementation again.

Why

fsspec provides two main concepts: a set of filesystem classes with uniform APIs (i.e., functions such as cp, rm, cat, mkdir, …) supplying operations on a range of storage systems; and top-level convenience functions like fsspec.open(), to allow you to quickly get from a URL to a file-like object that you can use with a third-party library or your own code.
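As a minimal sketch of both concepts, the snippet below uses fsspec's built-in in-memory filesystem, chosen here only because it needs no credentials or extra packages. The paths under /demo are made up for illustration, and pipe() and makedirs() are used alongside the operations named above to set up some data.

```python
import fsspec

# Concept 1: a filesystem class with the uniform API.
# "memory" is the in-process backend; other protocols expose the same methods.
fs = fsspec.filesystem("memory")

fs.makedirs("/demo", exist_ok=True)      # like mkdir, but safe to re-run
fs.pipe("/demo/a.txt", b"hello")         # write bytes to a path
fs.cp("/demo/a.txt", "/demo/b.txt")      # copy within the filesystem
content = fs.cat("/demo/b.txt")          # read the whole file as bytes
fs.rm("/demo/a.txt")                     # delete the original

# Concept 2: go straight from a URL to a file-like object.
with fsspec.open("memory://demo/b.txt", "rb") as f:
    data = f.read()

print(content, data)
```

Swapping "memory" for another protocol should leave the method calls unchanged, provided the backend for that protocol is installed and any required credentials are supplied.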

The section Background gives motivation and history of this project, but most users will want to skip straight to Usage to find out how to use the package and Features of fsspec to see the long list of added functionality included along with the basic file-system interface.

Who uses fsspec?

You can use fsspec’s file objects with any Python function that accepts file objects, because of duck typing.
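For instance, the standard-library json module only needs an object with write() or read(), so it can work directly on a file opened through fsspec. This is a small sketch using the in-memory filesystem; the path memory://duck/config.json is a made-up example.

```python
import json

import fsspec

# Write JSON through an fsspec file object opened in text mode.
with fsspec.open("memory://duck/config.json", "w") as f:
    json.dump({"answer": 42}, f)

# Read it back; json.load only relies on the file-like interface.
with fsspec.open("memory://duck/config.json", "r") as f:
    loaded = json.load(f)

print(loaded)
```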

You may well be using fsspec already without knowing it. The following libraries use fsspec internally for path and file handling:

  1. Dask, the parallel, out-of-core and distributed programming platform

  2. Intake, the data source cataloguing and loading library and its plugins

  3. pandas, the tabular data analysis package

  4. xarray and zarr, multidimensional array storage and labelled operations

  5. DVC, version control system for machine learning projects

  6. Kedro, a Python framework for reproducible, maintainable and modular data science code

  7. pyxet, a Python library for mounting and accessing very large datasets from XetHub

  8. Huggingface 🤗 Datasets, a popular library to load & manipulate data for Deep Learning models

fsspec filesystems are also supported by:

  1. pyarrow, the in-memory data layout engine

  2. petl, a general purpose package for extracting, transforming and loading tables of data.

… plus many more that we don’t know about.

Installation

fsspec can be installed from PyPI or conda and has no dependencies of its own:

    pip install fsspec
    conda install -c conda-forge fsspec

Not all filesystem implementations are available without installing extra dependencies. For example, to be able to access data in GCS, you can use the optional pip install syntax below, or install the specific package required:

    pip install fsspec[gcs]
    conda install -c conda-forge gcsfs

fsspec attempts to provide the right message when you attempt to use a filesystem for which you need additional dependencies. The current list of known implementations can be found as follows:

    from fsspec.registry import known_implementations
    known_implementations
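As a rough sketch of how this registry can be inspected, the snippet below lists the registered protocol names and looks at the entry for "gcs". It assumes each entry is a dictionary with a "class" key naming the implementation and, optionally, an "err" key holding the install hint; treat those key names as an assumption about the current registry layout rather than a documented contract.

```python
from fsspec.registry import known_implementations

# The registry maps protocol names to a description of the backend.
protocols = sorted(known_implementations)
print(protocols[:5])

# Look at one entry that needs an extra package to be installed.
gcs_entry = known_implementations["gcs"]
print(gcs_entry["class"])      # dotted path of the implementing class
print(gcs_entry.get("err"))    # install hint, if the entry provides one
```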