API Reference
User Functions
- fsspec.available_compressions: Return a list of the implemented compressions.
- fsspec.available_protocols: Return a list of the implemented protocols.
- fsspec.filesystem: Instantiate filesystems for given protocol and arguments
- fsspec.fuse.run: Mount stuff in a local directory
- fsspec.generic.rsync: Sync files between two directory trees
- fsspec.get_filesystem_class: Fetch named protocol implementation from the registry
- fsspec.get_mapper: Create key-value interface for given URL and options
- fsspec.gui.FileSelector: Panel-based graphical file selector widget
- fsspec.open: Given a path or paths, return one OpenFile object
- fsspec.open_files: Given a path or paths, return a list of OpenFile objects
- fsspec.open_local: Open file(s) which can be resolved to local
- fsspec.available_protocols()
Return a list of the implemented protocols.
Note that any given protocol may require extra packages to be importable.
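As a quick check, the registry can be queried directly. A minimal sketch, assuming only that fsspec itself is installed (the "memory" and "file" protocols ship with it):

```python
import fsspec

# Protocols are registered lazily: a listed name may still need an
# extra package (e.g. s3fs for "s3") before it can be used.
protos = fsspec.available_protocols()
print("memory" in protos, "file" in protos)  # True True
```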
- fsspec.filesystem(protocol, **storage_options)
Instantiate filesystems for given protocol and arguments
The storage_options are specific to the protocol being chosen, and are passed directly to the class.
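A minimal sketch using the built-in "memory" protocol; the path names are illustrative and no storage_options are needed here:

```python
import fsspec

# Instantiate a filesystem for the "memory" protocol; any
# **storage_options would be passed straight through to the class.
fs = fsspec.filesystem("memory")
fs.pipe("/demo/hello.txt", b"hi")  # write bytes at a path
print(fs.cat("/demo/hello.txt"))   # b'hi'
```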
- fsspec.fuse.run(fs, path, mount_point, foreground=True, threads=False, ready_file=False, ops_class=<class 'fsspec.fuse.FUSEr'>)
Mount stuff in a local directory
This uses fusepy to make it appear as if a given path on an fsspec instance is in fact resident within the local file-system.
This requires that fusepy be installed, and that FUSE be available on the system (typically requiring a package to be installed with apt, yum, brew, etc.).
- Parameters:
- fs: file-system instance
From one of the compatible implementations
- path: str
Location on that file-system to regard as the root directory to mount. Note that you typically should include the terminating “/” character.
- mount_point: str
An empty directory on the local file-system where the contents of the remote path will appear.
- foreground: bool
Whether or not calling this function will block. Operation will typically be more stable if True.
- threads: bool
Whether or not to create threads when responding to file operations within the mounted directory. Operation will typically be more stable if False.
- ready_file: bool
Whether the FUSE process is ready. The .fuse_ready file will exist in the mount_point directory if True. For debugging purposes.
- ops_class: FUSEr or subclass of FUSEr
To override the default behavior of FUSEr. For example, logging to file.
- fsspec.generic.rsync(source, destination, delete_missing=False, source_field='size', dest_field='size', update_cond='different', inst_kwargs=None, fs=None, **kwargs)
Sync files between two directory trees
(experimental)
- Parameters:
- source: str
Root of the directory tree to take files from. This must be a directory, but do not include any terminating “/” character
- destination: str
Root path to copy into. The contents of this location should be identical to the contents of source when done. This will be made a directory, and the terminal “/” should not be included.
- delete_missing: bool
If there are paths in the destination that don’t exist in the source and this is True, delete them. Otherwise, leave them alone.
- source_field: str | callable
If update_cond is “different”, this is the key in the info of source files to consider for difference. May be a function of the info dict.
- dest_field: str | callable
If update_cond is “different”, this is the key in the info of destination files to consider for difference. May be a function of the info dict.
- update_cond: “different” | “always” | “never”
If “always”, every file is copied, regardless of whether it exists in the destination. If “never”, files that exist in the destination are not copied again. If “different” (default), only copy if the info fields given by source_field and dest_field (usually “size”) are different. Other comparisons may be added in the future.
- inst_kwargs: dict | None
If fs is None, use this set of keyword arguments to make a GenericFileSystem instance
- fs: GenericFileSystem | None
Instance to use if explicitly given. The instance defines how to make downstream file system instances from paths.
- Returns:
- dict of the copy operations that were performed, {source: destination}
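A sketch syncing two in-memory trees; the src/dst names are hypothetical, and since the function is experimental the exact behaviour may vary between fsspec versions:

```python
import fsspec
from fsspec.generic import rsync

# Build a small source tree on the built-in memory filesystem
mem = fsspec.filesystem("memory")
mem.pipe("/src/a.txt", b"hello")
mem.pipe("/src/sub/b.txt", b"world")

# Mirror it into /dst; no terminating "/" on either URL
rsync("memory://src", "memory://dst")
print(mem.cat("/dst/a.txt"))
```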
- fsspec.get_filesystem_class(protocol)
Fetch named protocol implementation from the registry
The dict known_implementations maps protocol names to the locations of classes implementing the corresponding file-system. When used for the first time, appropriate imports will happen and the class will be placed in the registry. All subsequent calls will fetch directly from the registry. Some protocol implementations require additional dependencies, and so the import may fail. In this case, the string in the “err” field of the known_implementations will be given as the error message.
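For example, with the built-in "memory" protocol:

```python
import fsspec

# First use triggers the import; later calls hit the registry cache
cls = fsspec.get_filesystem_class("memory")
print(cls.__name__)  # MemoryFileSystem
```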
- fsspec.get_mapper(url='', check=False, create=False, missing_exceptions=None, alternate_root=None, **kwargs)
Create key-value interface for given URL and options
The URL will be of the form “protocol://location” and point to the root of the mapper required. All keys will be file-names below this location, and their values the contents of each key.
Also accepts compound URLs like zip::s3://bucket/file.zip, see fsspec.open.
- Parameters:
- url: str
Root URL of mapping
- check: bool
Whether to attempt to read from the location before instantiation, to check that the mapping does exist
- create: bool
Whether to make the directory corresponding to the root before instantiating
- missing_exceptions: None or tuple
If given, these exception types will be regarded as missing keys and return KeyError when trying to read data. By default, you get (FileNotFoundError, IsADirectoryError, NotADirectoryError)
- alternate_root: None or str
In cases of complex URLs, the parser may fail to pick the correct part for the mapper root, so this arg can override
- Returns:
FSMap instance, the dict-like key-value store.
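A minimal sketch with the "memory" protocol; the root and key names are illustrative:

```python
import fsspec

# Keys are file names below the root; values are the file contents
m = fsspec.get_mapper("memory://mapper-demo")
m["a/b.bin"] = b"\x00\x01"
print(m["a/b.bin"])           # b'\x00\x01'
print("a/b.bin" in list(m))   # True
```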
- class fsspec.gui.FileSelector(url=None, filters=None, ignore=None, kwargs=None)
Panel-based graphical file selector widget
Instances of this widget are interactive and can be displayed in jupyter by having them as the output of a cell, or in a separate browser tab using .show().
- property fs
Current filesystem instance
- open_file(mode='rb', compression=None, encoding=None)
Create OpenFile instance for the currently selected item
For example, in a notebook you might do something like
[]: sel = FileSelector(); sel  # user selects their file
[]: with sel.open_file('rb') as f:
...     out = f.read()
- Parameters:
- mode: str (optional)
Open mode for the file.
- compression: str (optional)
To interact with the file as compressed. Set to ‘infer’ to guess compression from the file ending
- encoding: str (optional)
If using text mode, use this encoding; defaults to UTF8.
- property storage_options
Value of the kwargs box as a dictionary
- property urlpath
URL of currently selected item
- fsspec.open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, expand=None, **kwargs)
Given a path or paths, return one OpenFile object.
- Parameters:
- urlpath: string or list
Absolute or relative filepath. Prefix with a protocol like s3:// to read from alternative filesystems. Should not include glob character(s).
- mode: ‘rb’, ‘wt’, etc.
- compression: string or None
If given, open file using compression codec. Can either be a compression name (a key in fsspec.compression.compr) or “infer” to guess the compression from the filename suffix.
- encoding: str
For text mode only
- errors: None or str
Passed to TextIOWrapper in text mode
- protocol: str or None
If given, overrides the protocol found in the URL.
- newline: bytes or None
Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.
- expand: bool or None
Whether to regard file paths containing special glob characters as needing expansion (finding the first match) or absolute. Setting False allows using paths which do embed such characters. If None (default), this argument takes its value from the DEFAULT_EXPAND module variable, which takes its initial value from the “open_expand” config value at startup, which will be False if not set.
- **kwargs: dict
Extra options that make sense to a particular storage connection, e.g.host, port, username, password, etc.
- Returns:
OpenFile object.
Notes
For a full list of the available protocols and the implementations that they map across to, see the latest online documentation:
For implementations built into fsspec see https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
For implementations in separate packages see https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
Examples
>>> openfile = open('2015-01-01.csv')
>>> openfile = open(
...     's3://bucket/2015-01-01.csv.gz', compression='gzip'
... )
>>> with openfile as f:
...     df = pd.read_csv(f)
- fsspec.open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, auto_mkdir=True, expand=True, **kwargs)
Given a path or paths, return a list of OpenFile objects.
For writing, a str path must contain the “*” character, which will be filled in by increasing numbers, e.g., “part*” -> “part1”, “part2” if num=2.
For either reading or writing, can instead provide explicit list of paths.
- Parameters:
- urlpath: string or list
Absolute or relative filepath(s). Prefix with a protocol like s3:// to read from alternative filesystems. To read from multiple files you can pass a globstring or a list of paths, with the caveat that they must all have the same protocol.
- mode: ‘rb’, ‘wt’, etc.
- compression: string or None
If given, open file using compression codec. Can either be a compression name (a key in fsspec.compression.compr) or “infer” to guess the compression from the filename suffix.
- encoding: str
For text mode only
- errors: None or str
Passed to TextIOWrapper in text mode
- name_function: function or None
If opening a set of files for writing, those files do not yet exist, so we need to generate their names by formatting the urlpath for each sequence number
- num: int [1]
If writing mode, number of files we expect to create (passed to name_function)
- protocol: str or None
If given, overrides the protocol found in the URL.
- newline: bytes or None
Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.
- auto_mkdir: bool (True)
If in write mode, this will ensure the target directory exists before writing, by calling fs.mkdirs(exist_ok=True).
- expand: bool
- **kwargs: dict
Extra options that make sense to a particular storage connection, e.g.host, port, username, password, etc.
- Returns:
- An OpenFiles instance, which is a list of OpenFile objects that can be used as a single context
Notes
For a full list of the available protocols and the implementations that they map across to, see the latest online documentation:
For implementations built into fsspec see https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
For implementations in separate packages see https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
Examples
>>> files = open_files('2015-*-*.csv')
>>> files = open_files(
...     's3://bucket/2015-*-*.csv.gz', compression='gzip'
... )
- fsspec.open_local(url: str | list[str] | Path | list[Path], mode: str = 'rb', **storage_options: dict) → str | list[str]
Open file(s) which can be resolved to local
For files which either are local, or get downloaded upon open (e.g., by file caching)
- Parameters:
- url: str or list(str)
- mode: str
Must be read mode
- storage_options:
Passed on to the filesystem instance, or used by open_files (e.g., compression)
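A sketch with a plain "file://" URL, which is already local and so resolves directly to a path (cached remote URLs would be downloaded first); the temp-file name is illustrative:

```python
import os
import tempfile
import fsspec

# Create a real local file to point the URL at
d = tempfile.mkdtemp()
p = os.path.join(d, "x.txt")
with open(p, "w") as f:
    f.write("hello")

# Resolve the URL to a concrete local path (mode must be a read mode)
local = fsspec.open_local("file://" + p, mode="rb")
print(open(local).read())  # hello
```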
Base Classes
- fsspec.archive.AbstractArchiveFileSystem: A generic superclass for implementing Archive-based filesystems.
- fsspec.asyn.AsyncFileSystem: Async file operations, default implementations
- fsspec.callbacks.Callback: Base class and interface for callback mechanism
- fsspec.callbacks.DotPrinterCallback: Simple example Callback implementation
- fsspec.callbacks.NoOpCallback: This implementation of Callback does exactly nothing
- fsspec.callbacks.TqdmCallback: A callback to display a progress bar using tqdm
- fsspec.core.BaseCache: Pass-through cache: doesn't keep anything, calls every time
- fsspec.core.OpenFile: File-like object to be used in a context
- fsspec.core.OpenFiles: List of OpenFile instances
- fsspec.core.get_fs_token_paths: Filesystem, deterministic token, and paths from a urlpath and options.
- fsspec.core.url_to_fs: Turn fully-qualified and potentially chained URL into filesystem instance
- fsspec.dircache.DirCache: Caching of directory listings, in a structure like a dict of path to listing.
- fsspec.FSMap: Wrap a FileSystem instance as a mutable wrapping.
- fsspec.generic.GenericFileSystem: Wrapper over all other FS types
- fsspec.registry.register_implementation: Add implementation class to the registry
- fsspec.spec.AbstractBufferedFile: Convenient class to derive from to provide buffering
- fsspec.spec.AbstractFileSystem: An abstract super-class for pythonic file-systems
- fsspec.transaction.Transaction: Filesystem transaction write context
- class fsspec.archive.AbstractArchiveFileSystem(*args, **kwargs)
A generic superclass for implementing Archive-based filesystems.
Currently, it is shared amongst ZipFileSystem, LibArchiveFileSystem and TarFileSystem.
- info(path, **kwargs)
Give details of entry at path
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file’s size, in which case, the returned dict will include 'size': None.
- Returns:
- dict with keys: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
- ls(path, detail=True, **kwargs)
List objects at path.
This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.
The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
- full path to the entry (without protocol)
- size of the entry, in bytes. If the value cannot be determined, will be None.
- type of entry, “file”, “directory” or other
Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.
May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.
- Parameters:
- path: str
- detail: bool
if True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).
- kwargs: may have additional backend-specific options, such as version information
- Returns:
- List of strings if detail is False, or list of directory information dicts if detail is True.
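A sketch of info and ls behaviour via ZipFileSystem, one of the archive subclasses; the archive is built in memory and its member name is illustrative:

```python
import io
import zipfile
import fsspec

# Build a zip archive in memory, then browse it through fsspec
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data/a.txt", "hello")

fs = fsspec.filesystem("zip", fo=buf)
print(fs.find("/"))                 # all files in the archive
info = fs.info("data/a.txt")
print(info["type"], info["size"])   # file 5
```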
- class fsspec.callbacks.Callback(size=None, value=0, hooks=None, **kwargs)
Base class and interface for callback mechanism
This class can be used directly for monitoring file transfers by providing callback=Callback(hooks=...) (see the hooks argument, below), or subclassed for more specialised behaviour.
- Parameters:
- size: int (optional)
Nominal quantity for the value that corresponds to a complete transfer, e.g., total number of tiles or total number of bytes
- value: int (0)
Starting internal counter value
- hooks: dict or None
A dict of named functions to be called on each update. The signature of these must be f(size, value, **kwargs)
- classmethod as_callback(maybe_callback=None)
Transform callback=… into Callback instance
For the special value of None, return the global instance of NoOpCallback. This is an alternative to including callback=DEFAULT_CALLBACK directly in a method signature.
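A minimal sketch of the hooks mechanism, driving the callback by hand rather than through a file transfer:

```python
from fsspec.callbacks import Callback

# Hooks receive (size, value, **kwargs) on every update
log = []
cb = Callback(size=3, hooks={"track": lambda size, value, **kw: log.append((size, value))})
cb.relative_update()     # value -> 1, triggers call()
cb.relative_update(2)    # value -> 3
print(log)  # [(3, 1), (3, 3)]
```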
- branch(path_1, path_2, kwargs)
Set callbacks for child transfers
If this callback is operating at a higher level, e.g., put, which may trigger transfers that can also be monitored. The passed kwargs are to be mutated to add callback=, if this class supports branching to children.
- Parameters:
- path_1: str
Child’s source path
- path_2: str
Child’s destination path
- kwargs: dict
arguments passed to child method, e.g., put_file.
- Returns:
- branched(path_1, path_2, **kwargs)
Return callback for child transfers
If this callback is operating at a higher level, e.g., put, which may trigger transfers that can also be monitored. The function returns a callback that has to be passed to the child method, e.g., put_file, as the callback= argument.
The implementation uses callback.branch for compatibility. When implementing callbacks, it is recommended to override this function instead of branch and avoid calling super().branched(...).
Prefer using this function over branch.
- Parameters:
- path_1: str
Child’s source path
- path_2: str
Child’s destination path
- **kwargs:
Arbitrary keyword arguments
- Returns:
- callback: Callback
A callback instance to be passed to the child method
- call(hook_name=None, **kwargs)
Execute hook(s) with current state
Each function is passed the internal size and current value
- Parameters:
- hook_name: str or None
If given, execute on this hook
- kwargs: passed on to (all) hook(s)
- relative_update(inc=1)
Delta increment the internal counter
Triggers call()
- Parameters:
- inc: int
- class fsspec.callbacks.DotPrinterCallback(chr_to_print='#', **kwargs)
Simple example Callback implementation
Almost identical to Callback with a hook that prints a char; here we demonstrate how the outer layer may print “#” and the inner layer “.”
- class fsspec.callbacks.NoOpCallback(size=None, value=0, hooks=None, **kwargs)
This implementation of Callback does exactly nothing
- class fsspec.callbacks.TqdmCallback(tqdm_kwargs=None, *args, **kwargs)
A callback to display a progress bar using tqdm
- Parameters:
- tqdm_kwargs: dict (optional)
Any argument accepted by the tqdm constructor. See the tqdm doc. Will be forwarded to tqdm_cls.
- tqdm_cls: (optional)
subclass of tqdm.tqdm. If not passed, it will default to tqdm.tqdm.
Examples
>>> import fsspec
>>> from fsspec.callbacks import TqdmCallback
>>> fs = fsspec.filesystem("memory")
>>> path2distant_data = "/your-path"
>>> fs.upload(
...     ".",
...     path2distant_data,
...     recursive=True,
...     callback=TqdmCallback(),
... )
You can forward args to tqdm using the tqdm_kwargs parameter.
>>> fs.upload(
...     ".",
...     path2distant_data,
...     recursive=True,
...     callback=TqdmCallback(tqdm_kwargs={"desc": "Your tqdm description"}),
... )
You can also customize the progress bar by passing a subclass of tqdm.
class TqdmFormat(tqdm):
    '''Provides a `total_time` format parameter'''
    @property
    def format_dict(self):
        d = super().format_dict
        total_time = d["elapsed"] * (d["total"] or 0) / max(d["n"], 1)
        d.update(total_time=self.format_interval(total_time) + " in total")
        return d
>>> with TqdmCallback(
...     tqdm_kwargs={
...         "desc": "desc",
...         "bar_format": "{total_time}: {percentage:.0f}%|{bar}{r_bar}",
...     },
...     tqdm_cls=TqdmFormat,
... ) as callback:
...     fs.upload(".", path2distant_data, recursive=True, callback=callback)
- class fsspec.core.BaseCache(blocksize: int, fetcher: Callable[[int, int], bytes], size: int)
Pass-through cache: doesn’t keep anything, calls every time
Acts as base class for other cachers
- Parameters:
- blocksize: int
How far to read ahead in numbers of bytes
- fetcher: func
Function of the form f(start, end) which gets bytes from remote as specified
- size: int
How big this file is
- class fsspec.core.OpenFile(fs, path, mode='rb', compression=None, encoding=None, errors=None, newline=None)
File-like object to be used in a context
Can layer (buffered) text-mode and compression over any file-system, which are typically binary-only.
These instances are safe to serialize, as the low-level file object is not created until invoked using with.
- Parameters:
- fs: FileSystem
The file system to use for opening the file. Should be a subclass or duck-type with fsspec.spec.AbstractFileSystem
- path: str
Location to open
- mode: str like ‘rb’, optional
Mode of the opened file
- compression: str or None, optional
Compression to apply
- encoding: str or None, optional
The encoding to use if opened in text mode.
- errors: str or None, optional
How to handle encoding errors if opened in text mode.
- newline: None or str
Passed to TextIOWrapper in text mode, how to handle line endings.
- autoopen: bool
If True, calls open() immediately. Mostly used by pickle
- pos: int
If given and autoopen is True, seek to this location immediately
- class fsspec.core.OpenFiles(*args, mode='rb', fs=None)
List of OpenFile instances
Can be used in a single context, which opens and closes all of the contained files. Normal list access to get the elements works as normal.
A special case is made for caching filesystems - the files will be down/uploaded together at the start or end of the context, and this may happen concurrently, if the target filesystem supports it.
- fsspec.core.get_fs_token_paths(urlpath, mode='rb', num=1, name_function=None, storage_options=None, protocol=None, expand=True)
Filesystem, deterministic token, and paths from a urlpath and options.
- Parameters:
- urlpath: string or iterable
Absolute or relative filepath, URL (may include protocols like s3://), or globstring pointing to data.
- mode: str, optional
Mode in which to open files.
- num: int, optional
If opening in writing mode, number of files we expect to create.
- name_function: callable, optional
If opening in writing mode, this callable is used to generate path names. Names are generated for each partition by urlpath.replace('*', name_function(partition_index)).
- storage_options: dict, optional
Additional keywords to pass to the filesystem class.
Additional keywords to pass to the filesystem class.
- protocol: str or None
To override the protocol specifier in the URL
- expand: bool
Expand string paths for writing, assuming the path is a directory
- fsspec.core.url_to_fs(url, **kwargs)
Turn fully-qualified and potentially chained URL into filesystem instance
- Parameters:
- url: str
The fsspec-compatible URL
- **kwargs: dict
Extra options that make sense to a particular storage connection, e.g.host, port, username, password, etc.
- Returns:
- filesystem: FileSystem
The new filesystem discovered from url and created with **kwargs.
- urlpath: str
The filesystem-specific URL for url.
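A minimal sketch with the built-in "memory" protocol; the bucket/key names are illustrative:

```python
import fsspec
from fsspec.core import url_to_fs

# Split a URL into a filesystem instance plus the path within it
fs, path = url_to_fs("memory://bucket/key.txt")
print(type(fs).__name__)  # MemoryFileSystem
print(path)               # /bucket/key.txt
```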
- class fsspec.dircache.DirCache(use_listings_cache=True, listings_expiry_time=None, max_paths=None, **kwargs)
Caching of directory listings, in a structure like:
{"path0": [
    {"name": "path0/file0", "size": 123, "type": "file", ...},
    {"name": "path0/file1", ...},
    ...],
 "path1": [...]}
Parameters to this class control listing expiry or indeed turn caching off
- __init__(use_listings_cache=True, listings_expiry_time=None, max_paths=None, **kwargs)
- Parameters:
- use_listings_cache: bool
If False, this cache never returns items, but always reports KeyError, and setting items has no effect
- listings_expiry_time: int or float (optional)
Time in seconds that a listing is considered valid. If None, listings do not expire.
- max_paths: int (optional)
The number of most recent listings that are considered valid; ‘recent’ refers to when the entry was set.
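A minimal sketch of the dict-like behaviour; the listing entries are illustrative:

```python
from fsspec.dircache import DirCache

# Acts like a dict of path -> listing
cache = DirCache()
cache["/data"] = [{"name": "/data/file0", "size": 123, "type": "file"}]
print("/data" in cache)  # True

# With the cache disabled, stored listings are never returned
off = DirCache(use_listings_cache=False)
off["/data"] = [{"name": "/data/file0", "size": 123, "type": "file"}]
print("/data" in off)    # False
```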
- class fsspec.FSMap(root, fs, check=False, create=False, missing_exceptions=None)
Wrap a FileSystem instance as a mutable wrapping.
The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.
- Parameters:
- root: string
prefix for all the files
- fs: FileSystem instance
- check: bool (=True)
performs a touch at the location, to check for write access.
Examples
>>> fs = FileSystem(**parameters)
>>> d = FSMap('my-data/path/', fs)
or, more likely:
>>> d = fs.get_mapper('my-data/path/')
>>> d['loc1'] = b'Hello World'
>>> list(d.keys())
['loc1']
>>> d['loc1']
b'Hello World'
- property dirfs
dirfs instance that can be used with the same keys as the mapper
- getitems(keys, on_error='raise')
Fetch multiple items from the store
If the backend is async-able, this might proceed concurrently
- Parameters:
- keys: list(str)
The keys to be fetched
- on_error: “raise”, “omit”, “return”
If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
- Returns:
- dict(key, bytes|exception)
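A minimal sketch of batch fetching through a memory-backed mapper; the root and key names are illustrative:

```python
import fsspec

m = fsspec.get_mapper("memory://kv-demo")
m["a"] = b"1"
m["b"] = b"2"

# Fetch several keys in one call; async backends may do this concurrently
out = m.getitems(["a", "b"])
print(out)  # {'a': b'1', 'b': b'2'}
```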
- class fsspec.generic.GenericFileSystem(*args, **kwargs)
Wrapper over all other FS types
<experimental!>
This implementation is a single unified interface to be able to run FS operations over generic URLs, and dispatch to the specific implementations using the URL protocol prefix.
Note: instances of this FS are always async, even if you never use it with any async backend.
- fsspec.registry.register_implementation(name, cls, clobber=False, errtxt=None)
Add implementation class to the registry
- Parameters:
- name: str
Protocol name to associate with the class
- cls: class or str
if a class: fsspec-compliant implementation class (normally inherits from fsspec.AbstractFileSystem), gets added straight to the registry. If a str, the full path to an implementation class like package.module.class, which gets added to known_implementations, so the import is deferred until the filesystem is actually used.
- clobber: bool (optional)
Whether to overwrite a protocol with the same name; if False, will raise instead.
- errtxt: str (optional)
If given, then a failure to import the given class will result in this text being given.
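A sketch registering a hypothetical "demo" protocol; to stay self-contained it reuses the in-memory implementation as the backing class:

```python
import fsspec
from fsspec.registry import register_implementation
from fsspec.implementations.memory import MemoryFileSystem

# Hypothetical protocol name, backed by the built-in memory filesystem
class DemoFileSystem(MemoryFileSystem):
    protocol = "demo"

register_implementation("demo", DemoFileSystem, clobber=True)
fs = fsspec.filesystem("demo")
print(isinstance(fs, DemoFileSystem))  # True
```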
- class fsspec.spec.AbstractBufferedFile(fs, path, mode='rb', block_size='default', autocommit=True, cache_type='readahead', cache_options=None, size=None, **kwargs)
Convenient class to derive from to provide buffering
In the case that the backend does not provide a pythonic file-like object already, this class contains much of the logic to build one. The only methods that need to be overridden are _upload_chunk, _initiate_upload and _fetch_range.
- flush(force=False)
Write buffered data to backend store.
Writes the current buffer, if it is larger than the block-size, or if the file is being closed.
- Parameters:
- force: bool
When closing, write the last block even if it is smaller than blocks are allowed to be. Disallows further writing to this file.
- read(length=-1)
Return data from cache, or fetch pieces as necessary
- Parameters:
- length: int (-1)
Number of bytes to read; if <0, all remaining bytes.
- readinto(b)
Mirrors the builtin file’s readinto method:
https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
- readline()
Read until and including the first occurrence of newline character
Note that, because of character encoding, this is not necessarily a true line ending.
- readlines()
Return all data, split by the newline character, including the newline character
- readuntil(char=b'\n', blocks=None)
Return data between current position and first occurrence of char
char is included in the output, except if the end of the file is encountered first.
- Parameters:
- char: bytes
Thing to find
- blocks: None or int
How much to read in each go. Defaults to file blocksize, which may mean a new read on every call.
- class fsspec.spec.AbstractFileSystem(*args, **kwargs)
An abstract super-class for pythonic file-systems
Implementations are expected to be compatible with or, better, subclass from here.
- cat(path, recursive=False, on_error='raise', **kwargs)
Fetch (potentially multiple) paths’ contents
- Parameters:
- recursive: bool
If True, assume the path(s) are directories, and get all the contained files
- on_error: “raise”, “omit”, “return”
If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
- kwargs: passed to cat_file
- Returns:
- dict of {path: contents} if there are multiple paths or the path has been otherwise expanded
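A sketch of the single-path versus multi-path return types, using the "memory" filesystem with illustrative paths:

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe({"/cat-demo/a": b"A", "/cat-demo/b": b"B"})  # write two files

single = fs.cat("/cat-demo/a")                  # one path -> bytes
many = fs.cat(["/cat-demo/a", "/cat-demo/b"])   # list -> dict of {path: contents}
print(single)                 # b'A'
print(sorted(many.values()))  # [b'A', b'B']
```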
- cat_file(path, start=None, end=None, **kwargs)
Get the content of a file
- Parameters:
- path: URL of file on this filesystem
- start, end: int
Bytes limits of the read. If negative, backwards from end, like usual python slices. Either can be None for start or end of file, respectively
- kwargs: passed to open().
- cat_ranges(paths, starts, ends, max_gap=None, on_error='return', **kwargs)
Get the contents of byte ranges from one or more files
- Parameters:
- paths: list
A list of filepaths on this filesystem
- starts, ends: int or list
Bytes limits of the read. If using a single int, the same value will be used to read all the specified files.
- checksum(path)
Unique value for current version of file
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
This should normally be overridden; default will probably capture creation/modification timestamp (which would be good) or maybe access timestamp (which would be bad)
- classmethod clear_instance_cache()
Clear the cache of filesystem instances.
Notes
Unless overridden by setting the cachable class attribute to False, the filesystem class stores a reference to newly created instances. This prevents Python’s normal rules around garbage collection from working, since the instance’s refcount will not drop to zero until clear_instance_cache is called.
- copy(path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs)
Copy within two locations in the filesystem
- on_error: “raise”, “ignore”
If raise, any not-found exceptions will be raised; if ignore, any not-found exceptions will cause the path to be skipped; defaults to raise unless recursive is true, where the default is ignore
- cp(path1, path2, **kwargs)
Alias of AbstractFileSystem.copy.
- classmethod current()
Return the most recently instantiated FileSystem
If no instance has been created, then create one with defaults
- delete(path, recursive=False, maxdepth=None)
Alias of AbstractFileSystem.rm.
- disk_usage(path, total=True, maxdepth=None, **kwargs)
Alias of AbstractFileSystem.du.
- download(rpath, lpath, recursive=False, **kwargs)
Alias of AbstractFileSystem.get.
- du(path, total=True, maxdepth=None, withdirs=False, **kwargs)
Space used by files and optionally directories within a path
Directory size does not include the size of its contents.
- Parameters:
- path: str
- total: bool
Whether to sum all the file sizes
- maxdepth: int or None
Maximum number of directory levels to descend, None for unlimited.
- withdirs: bool
Whether to include directory paths in the output.
- kwargs: passed to find
- Returns:
- Dict of {path: size} if total=False, or int otherwise, where numbers refer to bytes used.
- expand_path(path, recursive=False, maxdepth=None, **kwargs)
Turn one or more globs or directories into a list of all matching paths to files or directories.
kwargs are passed to glob or find, which may in turn call ls
- find(path, maxdepth=None, withdirs=False, detail=False, **kwargs)
List all files below path.
Like the posix find command without conditions
- Parameters:
- path: str
- maxdepth: int or None
If not None, the maximum number of levels to descend
- withdirs: bool
Whether to include directory paths in the output. This is True when used by glob, but users usually only want files.
- kwargs are passed to ls.
- static from_dict(dct: dict[str, Any]) → AbstractFileSystem
Recreate a filesystem instance from dictionary representation.
See .to_dict() for the expected structure of the input.
- Parameters:
- dct: Dict[str, Any]
- Returns:
- file system instance, not necessarily of this particular class.
Warning
This can import arbitrary modules (as determined by the cls key). Make sure you haven’t installed any modules that may execute malicious code at import time.
- static from_json(blob: str) → AbstractFileSystem
Recreate a filesystem instance from JSON representation.
See .to_json() for the expected structure of the input.
- Parameters:
- blob: str
- Returns:
- file system instance, not necessarily of this particular class.
Warning
This can import arbitrary modules (as determined by the cls key). Make sure you haven’t installed any modules that may execute malicious code at import time.
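A round-trip sketch, assuming a reasonably recent fsspec where to_json is available on instances:

```python
import fsspec
from fsspec import AbstractFileSystem

fs = fsspec.filesystem("memory")
blob = fs.to_json()                       # serializes class + constructor arguments
fs2 = AbstractFileSystem.from_json(blob)  # re-imports and re-instantiates
print(type(fs2).__name__)  # MemoryFileSystem
```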
- property fsid
Persistent filesystem id that can be used to compare filesystems across sessions.
- get(rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)[source]
Copy file(s) to local.
Copies a specific file or tree of files (if recursive=True). If lpath ends with a "/", it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.
Calls get_file for each source.
- get_file(rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object>, outfile=None, **kwargs)[source]
Copy single remote file to local
- get_mapper(root='',check=False,create=False,missing_exceptions=None)[source]
Create key/value store based on this file-system
Makes a MutableMapping interface to the FS at the given root path. See fsspec.mapping.FSMap for further details.
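A minimal sketch of the resulting mapping, using the in-memory filesystem (the root and keys are illustrative):

```python
import fsspec

fs = fsspec.filesystem("memory")

# FSMap: a MutableMapping whose keys are paths below the chosen root
m = fs.get_mapper("/kv_demo")
m["alpha"] = b"1"
m["nested/beta"] = b"2"

assert m["alpha"] == b"1"
assert set(m) == {"alpha", "nested/beta"}

# The mapping writes through to real files on the filesystem
assert fs.cat_file("/kv_demo/alpha") == b"1"
```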
- glob(path,maxdepth=None,**kwargs)[source]
Find files by glob-matching.
Pattern matching capabilities for finding files that match the given pattern.
- Parameters:
- path: str
The glob pattern to match against
- maxdepth: int or None
Maximum depth for '**' patterns. Applied on the first '**' found. Must be at least 1 if provided.
- kwargs:
Additional arguments passed to find (e.g., detail=True)
- Returns:
- List of matched paths, or dict of paths and their info if detail=True
Notes
Supported patterns:
- '*': Matches any sequence of characters within a single directory level
- '**': Matches any number of directory levels (must be an entire path component)
- '?': Matches exactly one character
- '[abc]': Matches any character in the set
- '[a-z]': Matches any character in the range
- '[!abc]': Matches any character NOT in the set
Special behaviors:
- If the path ends with '/', only folders are returned
- Consecutive '*' characters are compressed into a single '*'
- Empty brackets '[]' never match anything
- Negated empty brackets '[!]' match any single character
- Special characters in character classes are escaped properly
Limitations:
- '**' must be a complete path component (e.g., 'a/**/b', not 'a**b')
- No brace expansion ('{a,b}.txt')
- No extended glob patterns ('+(pattern)', '!(pattern)')
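The level-crossing behaviour of '*' versus '**' can be sketched against the in-memory filesystem (paths illustrative):

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe("/glob_demo/a.csv", b"")
fs.pipe("/glob_demo/b.txt", b"")
fs.pipe("/glob_demo/sub/c.csv", b"")

# '*' does not cross directory levels
one_level = {p.rsplit("/", 1)[-1] for p in fs.glob("/glob_demo/*.csv")}

# '**' (a whole path component) matches any number of levels, including zero
all_levels = {p.rsplit("/", 1)[-1] for p in fs.glob("/glob_demo/**/*.csv")}

assert one_level == {"a.csv"}
assert all_levels == {"a.csv", "c.csv"}
```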
- info(path,**kwargs)[source]
Give details of entry at path
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file's size, in which case the returned dict will include 'size': None.
- Returns:
- dict with keys: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
- invalidate_cache(path=None)[source]
Discard any cached directory information
- Parameters:
- path: string or None
If None, clear all cached listings; otherwise clear listings at or under the given path.
- listdir(path,detail=True,**kwargs)[source]
Alias of AbstractFileSystem.ls.
- ls(path,detail=True,**kwargs)[source]
List objects at path.
This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.
The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
full path to the entry (without protocol)
size of the entry, in bytes. If the value cannot be determined, will be None.
type of entry, "file", "directory" or other
Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.
May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.
- Parameters:
- path: str
- detail: bool
if True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).
- kwargs: may have additional backend-specific options, such as version information
- Returns:
- List of strings if detail is False, or list of directory information dicts if detail is True.
- makedir(path,create_parents=True,**kwargs)[source]
Alias of AbstractFileSystem.mkdir.
- makedirs(path,exist_ok=False)[source]
Recursively make directories
Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.
- Parameters:
- path: str
leaf directory name
- exist_ok: bool (False)
If False, will error if the target already exists
- mkdir(path,create_parents=True,**kwargs)[source]
Create directory entry at path
For systems that don't have true directories, may create an entry for this instance only and not touch the real filesystem
- Parameters:
- path: str
location
- create_parents: bool
if True, this is equivalent to makedirs
- kwargs:
may be permissions, etc.
- mkdirs(path,exist_ok=False)[source]
Alias of AbstractFileSystem.makedirs.
- move(path1,path2,**kwargs)[source]
Alias of AbstractFileSystem.mv.
- mv(path1,path2,recursive=False,maxdepth=None,**kwargs)[source]
Move file(s) from one location to another
- open(path,mode='rb',block_size=None,cache_options=None,compression=None,**kwargs)[source]
Return a file-like object from the filesystem
The resultant instance must function correctly in a "with" block.
- Parameters:
- path: str
Target file
- mode: str like ‘rb’, ‘w’
See builtin open(). Mode "x" (exclusive write) may be implemented by the backend. Even if it is, whether it is checked up front or on commit, and whether it is atomic, is implementation-dependent.
- block_size: int
Some indication of buffering - this is a value in bytes
- cache_options: dict, optional
Extra arguments to pass through to the cache.
- compression: string or None
If given, open file using compression codec. Can either be a compression name (a key in fsspec.compression.compr) or "infer" to guess the compression from the filename suffix.
- encoding, errors, newline: passed on to TextIOWrapper for text mode
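A sketch of the compression keyword on read, using the in-memory filesystem and a gzip payload (paths illustrative):

```python
import gzip
import fsspec

fs = fsspec.filesystem("memory")

# Store gzip-compressed bytes; compression="infer" picks the codec
# from the ".gz" suffix when reading back
fs.pipe("/open_demo/data.txt.gz", gzip.compress(b"hello world"))

with fs.open("/open_demo/data.txt.gz", "rb", compression="infer") as f:
    data = f.read()

# Text mode layers a TextIOWrapper on top of the decompressed stream
with fs.open("/open_demo/data.txt.gz", "r", compression="infer") as f:
    text = f.read()

assert data == b"hello world"
assert text == "hello world"
```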
- pipe(path,value=None,**kwargs)[source]
Put value into path
(counterpart to cat)
- Parameters:
- path: string or dict(str, bytes)
If a string, a single remote location to put value bytes; if a dict, a mapping of {path: bytes value}.
- value: bytes, optional
If using a single path, these are the bytes to put there. Ignored if path is a dict
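Both calling forms can be sketched against the in-memory filesystem (paths illustrative):

```python
import fsspec

fs = fsspec.filesystem("memory")

# Single path plus a bytes value
fs.pipe("/pipe_demo/one", b"abc")

# Dict form: {path: bytes}; the value argument is not used
fs.pipe({"/pipe_demo/two": b"def", "/pipe_demo/three": b"ghi"})

assert fs.cat("/pipe_demo/one") == b"abc"

# cat with a list of paths returns a {path: bytes} dict
out = fs.cat(["/pipe_demo/two", "/pipe_demo/three"])
assert sorted(out.values()) == [b"def", b"ghi"]
```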
- put(lpath, rpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)[source]
Copy file(s) from local.
Copies a specific file or tree of files (if recursive=True). If rpath ends with a "/", it will be assumed to be a directory, and target files will go within.
Calls put_file for each source.
- put_file(lpath, rpath, callback=<fsspec.callbacks.NoOpCallback object>, mode='overwrite', **kwargs)[source]
Copy single file to remote
- read_block(fn, offset, length, delimiter=None)[source]
Read a block of bytes from a file.
Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.
If offset+length is beyond the eof, reads to eof.
- Parameters:
- fn: string
Path to filename
- offset: int
Byte offset to start read
- length: int
Number of bytes to read. If None, read to end.
- delimiter: bytes (optional)
Ensure reading starts and stops at delimiter bytestring
See also
Examples
>>> fs.read_block('data/file.csv', 0, 13)
b'Alice, 100\nBo'
>>> fs.read_block('data/file.csv', 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'
Use length=None to read to the end of the file.
>>> fs.read_block('data/file.csv', 0, None, delimiter=b'\n')  # doctest: +SKIP
b'Alice, 100\nBob, 200\nCharlie, 300'
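The doctest above can be reproduced as a runnable sketch against the in-memory filesystem:

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe("/block_demo/file.csv", b"Alice, 100\nBob, 200\nCharlie, 300")

# Plain byte range: exactly `length` bytes from `offset`
block = fs.read_block("/block_demo/file.csv", 0, 13)

# With a delimiter, the read runs on to the next b"\n" after offset+length
line_block = fs.read_block("/block_demo/file.csv", 0, 13, delimiter=b"\n")

# length=None reads to the end of the file
rest = fs.read_block("/block_demo/file.csv", 0, None, delimiter=b"\n")

assert block == b"Alice, 100\nBo"
assert line_block == b"Alice, 100\nBob, 200\n"
assert rest == b"Alice, 100\nBob, 200\nCharlie, 300"
```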
- read_bytes(path,start=None,end=None,**kwargs)[source]
Alias of AbstractFileSystem.cat_file.
- read_text(path,encoding=None,errors=None,newline=None,**kwargs)[source]
Get the contents of the file as a string.
- Parameters:
- path: str
URL of file on this filesystem
- encoding, errors, newline: same as `open`.
- rename(path1,path2,**kwargs)[source]
Alias of AbstractFileSystem.mv.
- rm(path,recursive=False,maxdepth=None)[source]
Delete files.
- Parameters:
- path: str or list of str
File(s) to delete.
- recursive: bool
If file(s) are directories, recursively delete contents and then also remove the directory
- maxdepth: int or None
Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible.
- sign(path,expiration=100,**kwargs)[source]
Create a signed URL representing the given path
Some implementations allow temporary URLs to be generated, as a way of delegating credentials.
- Parameters:
- path: str
The path on the filesystem
- expiration: int
Number of seconds to enable the URL for (if supported)
- Returns:
- URL: str
The signed URL
- Raises:
- NotImplementedError: if method is not implemented for a filesystem
- stat(path,**kwargs)[source]
Alias of AbstractFileSystem.info.
- to_dict(*,include_password:bool=True)→dict[str,Any][source]
JSON-serializable dictionary representation of this filesystem instance.
- Parameters:
- include_password: bool, default True
Whether to include the password (if any) in the output.
- Returns:
- Dictionary with keys cls (the python location of this class), protocol (text name of this class's protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
Warning
Serialized filesystems may contain sensitive information which has been passed to the constructor, such as passwords and tokens. Make sure you store and send it in a secure environment!
- to_json(*,include_password:bool=True)→str[source]
JSON representation of this filesystem instance.
- Parameters:
- include_password: bool, default True
Whether to include the password (if any) in the output.
- Returns:
- JSON string with keys cls (the python location of this class), protocol (text name of this class's protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
Warning
Serialized filesystems may contain sensitive information which has been passed to the constructor, such as passwords and tokens. Make sure you store and send it in a secure environment!
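A round-trip sketch with the in-memory filesystem, which takes no constructor arguments:

```python
import fsspec
from fsspec import AbstractFileSystem

fs = fsspec.filesystem("memory")
blob = fs.to_json()

# The blob records the importable class location under "cls"
assert "MemoryFileSystem" in blob

# from_json imports that class and rebuilds an equivalent instance
fs2 = AbstractFileSystem.from_json(blob)
assert isinstance(fs2, type(fs))
```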
- touch(path,truncate=True,**kwargs)[source]
Create empty file, or update timestamp
- Parameters:
- path: str
file location
- truncate: bool
If True, always set file size to 0; if False, update timestamp and leave file unchanged, if backend allows this
- propertytransaction
A context within which files are committed together upon exit
Requires the file class to implement commit() and discard() for the normal and exception cases.
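A sketch with the local filesystem, whose write files support commit/discard via a temporary file (the target path is illustrative):

```python
import os
import tempfile
import fsspec

fs = fsspec.filesystem("local")
target = os.path.join(tempfile.mkdtemp(), "out.txt")

with fs.transaction:
    with fs.open(target, "wb") as f:
        f.write(b"all or nothing")
    # Not committed yet: the data lives in a temporary file
    assert not os.path.exists(target)

# On clean exit, all files in the transaction are committed together
with open(target, "rb") as f:
    assert f.read() == b"all or nothing"
```

If an exception escapes the block instead, discard() is called and the target is never created.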
- transaction_type
alias of Transaction
- tree(path:str='/',recursion_limit:int=2,max_display:int=25,display_size:bool=False,prefix:str='',is_last:bool=True,first:bool=True,indent_size:int=4)→str[source]
Return a tree-like structure of the filesystem starting from the given path as a string.
- Parameters:
- path: Root path to start traversal from
- recursion_limit: Maximum depth of directory traversal
- max_display: Maximum number of items to display per directory
- display_size: Whether to display file sizes
- prefix: Current line prefix for visual tree structure
- is_last: Whether current item is last in its level
- first: Whether this is the first call (displays root path)
- indent_size: Number of spaces per indent level
- Returns:
- str: A string representing the tree structure.
- upload(lpath,rpath,recursive=False,**kwargs)[source]
Alias of AbstractFileSystem.put.
- walk(path,maxdepth=None,topdown=True,on_error='omit',**kwargs)[source]
Return all files under the given path.
List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect. (see os.walk)
Note that the "files" output will include anything that is not a directory, such as links.
- Parameters:
- path: str
Root to recurse into
- maxdepth: int
Maximum recursion depth. None means limitless, but not recommended on link-based file-systems.
- topdown: bool (True)
Whether to walk the directory tree from the top downwards or from the bottom upwards.
- on_error: “omit”, “raise”, a callable
if omit (default), path with exception will simply be empty; if raise, an underlying exception will be raised; if callable, it will be called with a single OSError instance as argument
- kwargs: passed to ``ls``
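The topdown pruning described above can be sketched against the in-memory filesystem (paths illustrative):

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe("/walk_demo/top.txt", b"")
fs.pipe("/walk_demo/keep/a.txt", b"")
fs.pipe("/walk_demo/skip/b.txt", b"")

seen = []
for root, dirs, files in fs.walk("/walk_demo", topdown=True):
    # Prune: dropping a name from dirs stops walk() descending into it
    dirs[:] = [d for d in dirs if d != "skip"]
    seen.extend(files)

assert sorted(seen) == ["a.txt", "top.txt"]
```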
- write_bytes(path,value,**kwargs)[source]
Alias of AbstractFileSystem.pipe_file.
Built-in Implementations
FSSpec-compatible wrapper of pyarrow.fs.FileSystem. | |
A wrapper on top of the pyarrow.fs.HadoopFileSystem to connect its interface with fsspec | |
Locally caching filesystem, layer over any other FS | |
Caches whole remote files on first access | |
Caches whole remote files on first access | |
View files accessible to a worker as any other remote file-system | |
A handy decoder for data-URLs | |
Get access to the Databricks filesystem implementation over HTTP. | |
Directory prefix filesystem | |
A filesystem over classic FTP | |
Browse the files of a local git repo at any hash/tag/branch | |
Interface to files in github | |
Simple File-System for fetching data via HTTP(S) | |
View of the files as seen by a Jupyter server (notebook or lab) | |
Compressed archives as a file-system (read-only) | |
Interface to files on local storage | |
A filesystem based on a dict of BytesIO objects | |
View byte ranges of some other file as a file system. Initial version: single file system target, which must support async, and must allow start and end args in _cat_file. | |
This interface can be used to read/write references from Parquet stores. | |
Files over SFTP/SSH | |
Allow reading and writing to Windows and Samba network shares. | |
Compressed Tar archives as a file-system (read-only) | |
Interface to HDFS over HTTP using the WebHDFS API. | |
Read/Write contents of ZIP archive as a file-system |
- classfsspec.implementations.arrow.ArrowFSWrapper(*args,**kwargs)[source]
FSSpec-compatible wrapper of pyarrow.fs.FileSystem.
- Parameters:
- fs: pyarrow.fs.FileSystem
- __init__(fs,**kwargs)[source]
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters:
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
- classfsspec.implementations.arrow.HadoopFileSystem(*args,**kwargs)[source]
A wrapper on top of the pyarrow.fs.HadoopFileSystem to connect its interface with fsspec
- __init__(host='default',port=0,user=None,kerb_ticket=None,replication=3,extra_conf=None,**kwargs)[source]
- Parameters:
- host: str
Hostname, IP or “default” to try to read from Hadoop config
- port: int
Port to connect on, or default from Hadoop config if 0
- user: str or None
If given, connect as this username
- kerb_ticket: str or None
If given, use this ticket for authentication
- replication: int
Set the replication factor of a file for write operations. Default value is 3.
- extra_conf: None or dict
Passed on to HadoopFileSystem
- classfsspec.implementations.cached.CachingFileSystem(*args,**kwargs)[source]
Locally caching filesystem, layer over any other FS
This class implements chunk-wise local storage of remote files, for quick access after the initial download. The files are stored in a given directory with hashes of URLs for the filenames. If no directory is given, a temporary one is used, which should be cleaned up by the OS after the process ends. The files themselves are sparse (as implemented in MMapCache), so only the data which is accessed takes up space.
Restrictions:
the block-size must be the same for each access of a given file, unless all blocks of the file have already been read
caching can only be applied to file-systems which produce files derived from fsspec.spec.AbstractBufferedFile; LocalFileSystem is also allowed, for testing
- __init__(target_protocol=None,cache_storage='TMP',cache_check=10,check_files=False,expiry_time=604800,target_options=None,fs=None,same_names:bool|None=None,compression=None,cache_mapper:AbstractCacheMapper|None=None,**kwargs)[source]
- Parameters:
- target_protocol: str (optional)
Target filesystem protocol. Provide either this or fs.
- cache_storage: str or list(str)
Location to store files. If "TMP", this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
- cache_check: int
Number of seconds between reload of cache metadata
- check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
- expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
- target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
- fs: filesystem instance
The target filesystem to run against. Provide this or protocol.
- same_names: bool (optional)
By default, target URLs are hashed using a HashCacheMapper so that files from different backends with the same basename do not conflict. If this argument is true, a BasenameCacheMapper is used instead. Other cache mapper options are available by using the cache_mapper keyword argument. Only one of this and cache_mapper should be specified.
- compression: str (optional)
To decompress on download. Can be 'infer' (guess from the URL name), one of the entries in fsspec.compression.compr, or None for no decompression.
- cache_mapper: AbstractCacheMapper (optional)
The object used to map from original filenames to cached filenames. Only one of this and same_names should be specified.
- classfsspec.implementations.cached.SimpleCacheFileSystem(*args,**kwargs)[source]
Caches whole remote files on first access
This class is intended as a layer over any other file system, andwill make a local copy of each file accessed, so that all subsequentreads are local. This implementation only copies whole files, anddoes not keep any metadata about the download time or file details.It is therefore safer to use in multi-threaded/concurrent situations.
This is the only one of the caching filesystems that supports write: you will be given a real local open file, and upon close and commit, it will be uploaded to the target filesystem; the writability of the target URL is not checked until that time.
- __init__(**kwargs)[source]
- Parameters:
- target_protocol: str (optional)
Target filesystem protocol. Provide either this or fs.
- cache_storage: str or list(str)
Location to store files. If "TMP", this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
- cache_check: int
Number of seconds between reload of cache metadata
- check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
- expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
- target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
- fs: filesystem instance
The target filesystem to run against. Provide this or protocol.
- same_names: bool (optional)
By default, target URLs are hashed using a HashCacheMapper so that files from different backends with the same basename do not conflict. If this argument is true, a BasenameCacheMapper is used instead. Other cache mapper options are available by using the cache_mapper keyword argument. Only one of this and cache_mapper should be specified.
- compression: str (optional)
To decompress on download. Can be 'infer' (guess from the URL name), one of the entries in fsspec.compression.compr, or None for no decompression.
- cache_mapper: AbstractCacheMapper (optional)
The object used to map from original filenames to cached filenames. Only one of this and same_names should be specified.
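A sketch in which the in-memory filesystem stands in for a remote store (paths and cache location illustrative):

```python
import os
import tempfile
import fsspec

# "Remote" side: the in-memory filesystem stands in for a real remote store
remote = fsspec.filesystem("memory")
remote.pipe("/cache_demo/data.bin", b"payload")

# simplecache copies each whole file to local storage on first access
storage = tempfile.mkdtemp()
fs = fsspec.filesystem("simplecache", target_protocol="memory", cache_storage=storage)

with fs.open("/cache_demo/data.bin", "rb") as f:
    data = f.read()

assert data == b"payload"
# A local copy was made in the cache directory
assert len(os.listdir(storage)) >= 1
```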
- classfsspec.implementations.cached.WholeFileCacheFileSystem(*args,**kwargs)[source]
Caches whole remote files on first access
This class is intended as a layer over any other file system, and will make a local copy of each file accessed, so that all subsequent reads are local. This is similar to CachingFileSystem, but without the block-wise functionality and so can work even when sparse files are not allowed. See its docstring for definition of the init arguments.
The class still needs access to the remote store for listing files, and may refresh cached files.
- __init__(target_protocol=None,cache_storage='TMP',cache_check=10,check_files=False,expiry_time=604800,target_options=None,fs=None,same_names:bool|None=None,compression=None,cache_mapper:AbstractCacheMapper|None=None,**kwargs)
- Parameters:
- target_protocol: str (optional)
Target filesystem protocol. Provide either this or fs.
- cache_storage: str or list(str)
Location to store files. If "TMP", this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
- cache_check: int
Number of seconds between reload of cache metadata
- check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
- expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
- target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
- fs: filesystem instance
The target filesystem to run against. Provide this or protocol.
- same_names: bool (optional)
By default, target URLs are hashed using a HashCacheMapper so that files from different backends with the same basename do not conflict. If this argument is true, a BasenameCacheMapper is used instead. Other cache mapper options are available by using the cache_mapper keyword argument. Only one of this and cache_mapper should be specified.
- compression: str (optional)
To decompress on download. Can be 'infer' (guess from the URL name), one of the entries in fsspec.compression.compr, or None for no decompression.
- cache_mapper: AbstractCacheMapper (optional)
The object used to map from original filenames to cached filenames. Only one of this and same_names should be specified.
- classfsspec.implementations.dask.DaskWorkerFileSystem(*args,**kwargs)[source]
View files accessible to a worker as any other remote file-system
When instances are run on the worker, they use the real filesystem. When run on the client, they call the worker to provide information or data.
Warning this implementation is experimental, and read-only for now.
- __init__(target_protocol=None,target_options=None,fs=None,client=None,**kwargs)[source]
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters:
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
- classfsspec.implementations.data.DataFileSystem(*args,**kwargs)[source]
A handy decoder for data-URLs
- classfsspec.implementations.dbfs.DatabricksFileSystem(*args,**kwargs)[source]
Get access to the Databricks filesystem implementation over HTTP. Can be used inside and outside of a databricks cluster.
- __init__(instance,token,**kwargs)[source]
Create a new DatabricksFileSystem.
- Parameters:
- instance: str
The instance URL of the databricks cluster. For example, for an Azure databricks cluster, this has the form adb-<some-number>.<two digits>.azuredatabricks.net.
- token: str
Your personal token. Find out more here: https://docs.databricks.com/dev-tools/api/latest/authentication.html
- classfsspec.implementations.dirfs.DirFileSystem(*args,**kwargs)[source]
Directory prefix filesystem
The DirFileSystem is a filesystem wrapper. It assumes every path it is dealing with is relative to the path. After performing the necessary path operations it delegates everything to the wrapped filesystem.
- __init__(path=None, fs=None, fo=None, target_protocol=None, target_options=None, **storage_options)[source]
- Parameters:
- path: str
Path to the directory.
- fs: AbstractFileSystem
An instantiated filesystem to wrap.
- target_protocol, target_options:
if fs is None, construct it from these
- fo: str
Alternate for path; do not provide both
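A minimal sketch using the in-memory filesystem as the wrapped target (the prefix and paths are illustrative):

```python
import fsspec
from fsspec.implementations.dirfs import DirFileSystem

mem = fsspec.filesystem("memory")
mem.pipe("/prefix_demo/inner/file.txt", b"hi")

# Every path on the wrapper is taken relative to `path` on the wrapped fs
dfs = DirFileSystem(path="/prefix_demo", fs=mem)

assert dfs.cat_file("inner/file.txt") == b"hi"
assert dfs.isdir("inner")
```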
- classfsspec.implementations.ftp.FTPFileSystem(*args,**kwargs)[source]
A filesystem over classic FTP
- __init__(host,port=21,username=None,password=None,acct=None,block_size=None,tempdir=None,timeout=30,encoding='utf-8',tls=False,**kwargs)[source]
You can use _get_kwargs_from_urls to get some kwargs from a reasonable FTP url.
Authentication will be anonymous if username/password are not given.
- Parameters:
- host: str
The remote server name/ip to connect to
- port: int
Port to connect with
- username: str or None
If authenticating, the user’s identifier
- password: str or None
User’s password on the server, if using
- acct: str or None
Some servers also need an “account” string for auth
- block_size: int or None
If given, the read-ahead or write buffer size.
- tempdir: str
Directory on remote to put temporary files when in a transaction
- timeout: int
Timeout of the ftp connection in seconds
- encoding: str
Encoding to use for directories and filenames in FTP connection
- tls: bool
Use FTP-TLS, by default False
- classfsspec.implementations.git.GitFileSystem(*args,**kwargs)[source]
Browse the files of a local git repo at any hash/tag/branch
(experimental backend)
- __init__(path=None,fo=None,ref=None,**kwargs)[source]
- Parameters:
- path: str (optional)
Local location of the repo (uses current directory if not given). May be deprecated in favour of fo. When used with a higher level function such as fsspec.open(), may be of the form "git://[path-to-repo[:]][ref@]path/to/file" (but the actual file path should not contain "@" or ":").
- fo: str (optional)
Same as path, but passed as part of a chained URL. This one takes precedence if both are given.
- ref: str (optional)
Reference to work with, could be a hash, tag or branch name. Defaults to current working tree. Note that ls and open also take hash, so this becomes the default for those operations
- kwargs
- classfsspec.implementations.github.GithubFileSystem(*args,**kwargs)[source]
Interface to files in github
An instance of this class provides the files residing within a remote github repository. You may specify a point in the repo's history, by SHA, branch or tag (default is current master).
For files less than 1 MB in size, file content is returned directly in a MemoryFile. For larger files, or for files tracked by git-lfs, file content is returned as an HTTPFile wrapping the download_url provided by the GitHub API.
When using fsspec.open, allows URIs of the form:
"github://path/file", in which case you must specify org, repo and may specify sha in the extra args
"github://org:repo@/precip/catalog.yml", where the org and repo are part of the URI
"github://org:repo@sha/precip/catalog.yml", where the sha is also included
sha can be the full or abbreviated hex of the commit you want to fetch from, or a branch or tag name (so long as it doesn't contain special characters like "/", "?", which would have to be HTTP-encoded).
For authorised access, you must provide username and token, which can be made at https://github.com/settings/tokens
- __init__(org,repo,sha=None,username=None,token=None,timeout=None,**kwargs)[source]
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters:
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
- classfsspec.implementations.http.HTTPFileSystem(*args,**kwargs)[source]
Simple File-System for fetching data via HTTP(S)
ls() is implemented by loading the parent page and doing a regex match on the result. If simple_links=True, anything of the form "http(s)://server.com/stuff?thing=other" will match; otherwise only links within HTML href tags will be used.
- __init__(simple_links=True, block_size=None, same_scheme=True, size_policy=None, cache_type='bytes', cache_options=None, asynchronous=False, loop=None, client_kwargs=None, get_client=<function get_client>, encoded=False, **storage_options)[source]
NB: if this is called async, you must await set_client
- Parameters:
- block_size: int
Blocks to read bytes; if 0, will default to raw requests file-like objects instead of HTTPFile instances
- simple_links: bool
If True, will consider both HTML <a> tags and anything that looks like a URL; if False, will consider only the former.
- same_scheme: True
When doing ls/glob, if this is True, only consider paths that have http/https matching the input URLs.
- size_policy: this argument is deprecated
- client_kwargs: dict
Passed to aiohttp.ClientSession, see https://docs.aiohttp.org/en/stable/client_reference.html. For example, {'auth': aiohttp.BasicAuth('user', 'pass')}
- get_client: Callable[…, aiohttp.ClientSession]
A callable which takes keyword arguments and constructs an aiohttp.ClientSession. Its state will be managed by the HTTPFileSystem class.
- storage_options: key-value
Any other parameters passed on to requests
- cache_type, cache_options: defaults used in open
- class fsspec.implementations.jupyter.JupyterFileSystem(*args, **kwargs)[source]
View of the files as seen by a Jupyter server (notebook or lab)
- __init__(url, tok=None, **kwargs)[source]
- Parameters:
- url: str
Base URL of the server, like "http://127.0.0.1:8888". May include the token in the string, which is given by the process when starting up
- tok: str
If the token is obtained separately, it can be given here
- kwargs
- class fsspec.implementations.libarchive.LibArchiveFileSystem(*args, **kwargs)[source]
Compressed archives as a file-system (read-only)
Supports the following formats: tar, pax, cpio, ISO9660, zip, mtree, shar, ar, raw, xar, lha/lzh, rar, Microsoft CAB, 7-Zip, WARC
See the libarchive documentation for further restrictions. https://www.libarchive.org/
Keeps the file object open while the instance lives. It only works with seekable file-like objects. If the filesystem does not support this kind of file object, it is recommended to cache locally.
This class is pickleable, but not necessarily thread-safe (depends on the platform). See libarchive documentation for details.
- __init__(fo='', mode='r', target_protocol=None, target_options=None, block_size=5242880, **kwargs)[source]
- Parameters:
- fo: str or file-like
Contains the archive, and must exist. If a str, will fetch the file using open_files(), which must return exactly one file.
- mode: str
Currently, only 'r' is accepted
- target_protocol: str (optional)
If fo is a string, this value can be used to override the FS protocol inferred from a URL
- target_options: dict (optional)
Kwargs passed when instantiating the target FS, if fo is a string.
- class fsspec.implementations.local.LocalFileSystem(*args, **kwargs)[source]
Interface to files on local storage
- Parameters:
- auto_mkdir: bool
Whether, when opening a file, the directory containing it should be created (if it doesn't already exist). This is assumed by pyarrow code.
- __init__(auto_mkdir=False, **kwargs)[source]
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen, a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters:
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
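As a quick sketch of the auto_mkdir behaviour described above (the directory and file names here are invented for illustration):

```python
import os
import tempfile

import fsspec

# auto_mkdir=True makes open() create missing parent directories
fs = fsspec.filesystem("file", auto_mkdir=True)

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "nested", "dir", "data.txt")

# "nested/dir" does not exist yet; it is created on open
with fs.open(path, "w") as f:
    f.write("hello")

print(fs.cat(path))  # b'hello'
```

With auto_mkdir=False (the default), the same open() would fail with FileNotFoundError because the parent directory is missing.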
- class fsspec.implementations.memory.MemoryFileSystem(*args, **kwargs)[source]
A filesystem based on a dict of BytesIO objects
This is a global filesystem, so instances of this class all point to the same in-memory filesystem.
- __init__(*args, **storage_options)
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen, a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters:
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
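A minimal sketch of the global in-memory filesystem described above (file names invented):

```python
import fsspec

# "memory" is a global, in-process filesystem: all instances share state
fs = fsspec.filesystem("memory")

with fs.open("/project/notes.txt", "wb") as f:
    f.write(b"remember the milk")

print(fs.ls("/project", detail=False))   # the file appears in the listing
print(fs.cat("/project/notes.txt"))      # b'remember the milk'
```

Because the store is global, a second `fsspec.filesystem("memory")` call elsewhere in the same process sees the same files, which makes this implementation handy for tests.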
- class fsspec.implementations.reference.ReferenceFileSystem(*args, **kwargs)[source]
View byte ranges of some other file as a file system.
Initial version: single file system target, which must support async, and must allow start and end args in _cat_file. Later versions may allow multiple arbitrary URLs for the targets.
This FileSystem is read-only. It is designed to be used with async targets (for now). We do not get original file details from the target FS.
Configuration is by passing a dict of references at init, or a URL to a JSON file containing the same; this dict can also contain concrete data for some set of paths.
Reference dict format: {path0: bytes_data, path1: (target_url, offset, size)}
https://github.com/fsspec/kerchunk/blob/main/README.md
- __init__(fo, target=None, ref_storage_args=None, target_protocol=None, target_options=None, remote_protocol=None, remote_options=None, fs=None, template_overrides=None, simple_templates=True, max_gap=64000, max_block=256000000, cache_size=128, **kwargs)[source]
- Parameters:
- fo: dict or str
The set of references to use for this instance, with a structure as above. If a str referencing a JSON file, will use fsspec.open, in conjunction with target_options and target_protocol, to open and parse JSON at this location. If a directory, then assume references are a set of parquet files to be loaded lazily.
- target: str
For any references having target_url as None, this is the default file target to use
- ref_storage_args: dict
If references is a str, use these kwargs for loading the JSON file. Deprecated: use target_options instead.
- target_protocol: str
Used for loading the reference file, if it is a path. If None, protocol will be derived from the given path
- target_options: dict
Extra FS options for loading the reference file fo, if given as a path
- remote_protocol: str
The protocol of the filesystem on which the references will be evaluated (unless fs is provided). If not given, will be derived from the first URL that has a protocol in the templates or in the references, in that order.
- remote_options: dict
kwargs to go with remote_protocol
- fs: AbstractFileSystem | dict(str, (AbstractFileSystem | dict))
Directly provide a file system (or systems):
a single filesystem instance
a dict of protocol:filesystem, where each value is either a filesystem instance, or a dict of kwargs that can be used to create an instance for the given protocol
If this is given, remote_options and remote_protocol are ignored.
- template_overrides: dict
Swap out any templates in the references file with these - useful for testing.
- simple_templates: bool
Whether templates can be processed with simple replace (True) or if jinja is needed (False, much slower). All reference sets produced by kerchunk are simple in this sense, but the spec allows for complex.
- max_gap, max_block: int
For merging multiple concurrent requests to the same remote file. Neighboring byte ranges will only be merged when their inter-range gap is <= max_gap. Default is 64KB. Set to 0 to only merge when it requires no extra bytes. Pass a negative number to disable merging, appropriate for local target files. Neighboring byte ranges will only be merged when the size of the aggregated range is <= max_block. Default is 256MB.
- cache_size: int
Maximum size of LRU cache, where cache_size*record_size denotes the total number of references that can be loaded in memory at once. Only used for lazily loaded references.
- kwargs: passed to parent class
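The reference dict format described above can be sketched end-to-end against the in-memory filesystem. The paths and data here are invented for illustration, and this assumes the in-memory filesystem is acceptable as a (synchronous) target:

```python
import fsspec

# Stage some bytes in the global in-memory filesystem to act as the target
mem = fsspec.filesystem("memory")
mem.pipe("/data.bin", b"0123456789")

refs = {
    "whole": b"inline bytes",             # concrete data held in the refs dict
    "part": ("memory://data.bin", 2, 4),  # (target_url, offset, size)
}
fs = fsspec.filesystem("reference", fo=refs, remote_protocol="memory")

print(fs.cat("whole"))  # the inline bytes, returned without touching the target
print(fs.cat("part"))   # 4 bytes starting at offset 2 of the target
```

Inline byte values are served directly from the reference dict; tuple references are resolved by fetching the given byte range from the target filesystem.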
- class fsspec.implementations.reference.LazyReferenceMapper(root, fs=None, out_root=None, cache_size=128, categorical_threshold=10, engine: Literal['fastparquet', 'pyarrow'] = 'fastparquet')[source]
This interface can be used to read/write references from Parquet stores. It is not intended for other types of references. It can be used with Kerchunk's MultiZarrToZarr method to combine references into a parquet store. Examples of this use-case can be found here: https://fsspec.github.io/kerchunk/advanced.html?highlight=parquet#parquet-storage
- __init__(root, fs=None, out_root=None, cache_size=128, categorical_threshold=10, engine: Literal['fastparquet', 'pyarrow'] = 'fastparquet')[source]
This instance will be writable, storing changes in memory until full partitions are accumulated or .flush() is called.
To create an empty lazy store, use .create()
- Parameters:
- root: str
Root of parquet store
- fs: fsspec.AbstractFileSystem
fsspec filesystem object, default is local filesystem.
- cache_size: int, default=128
Maximum size of LRU cache, where cache_size*record_size denotes the total number of references that can be loaded in memory at once.
- categorical_threshold: int
Encode urls as pandas.Categorical to reduce memory footprint if the ratio of the number of unique urls to the total number of refs for each variable is greater than or equal to this number. (default 10)
- engine: Literal["fastparquet", "pyarrow"]
Engine choice for reading parquet files. (default is "fastparquet")
- class fsspec.implementations.sftp.SFTPFileSystem(*args, **kwargs)[source]
Files over SFTP/SSH
Peer-to-peer filesystem over SSH using paramiko.
Note: if using this with open or open_files with full URLs, there is no way to tell if a path is relative, so all paths are assumed to be absolute.
- __init__(host, **ssh_kwargs)[source]
- Parameters:
- host: str
Hostname or IP as a string
- temppath: str
Location on the server to put files, when within a transaction
- ssh_kwargs: dict
Parameters passed on to connection. See details in https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect May include port, username, password…
- class fsspec.implementations.smb.SMBFileSystem(*args, **kwargs)[source]
Allow reading and writing to Windows and Samba network shares.
When using fsspec.open() to get a file-like object, the URI should be specified in this format: smb://workgroup;user:password@server:port/share/folder/file.csv
Example:
>>> import fsspec
>>> with fsspec.open(
...     'smb://myuser:mypassword@myserver.com/'
...     'share/folder/file.csv'
... ) as smbfile:
...     df = pd.read_csv(smbfile, sep='|', header=None)
Note that you need to pass in a valid hostname or IP address for the host component of the URL. Do not use the Windows/NetBIOS machine name for the host component.
The first component of the path in the URL points to the name of the shared folder. Subsequent path components will point to the directory/folder/file.
The URL components workgroup, user, password and port may be optional.
Note
This implementation requires smbprotocol to be installed, e.g.:
$ pip install smbprotocol
# or
# pip install smbprotocol[kerberos]
Note: if using this with open or open_files with full URLs, there is no way to tell if a path is relative, so all paths are assumed to be absolute.
- __init__(host, port=None, username=None, password=None, timeout=60, encrypt=None, share_access=None, register_session_retries=4, register_session_retry_wait=1, register_session_retry_factor=10, auto_mkdir=False, **kwargs)[source]
You can use _get_kwargs_from_urls to get some kwargs from a reasonable SMB url.
Authentication will be anonymous or integrated if username/password are notgiven.
- Parameters:
- host: str
The remote server name/ip to connect to
- port: int or None
Port to connect with. Usually 445, sometimes 139.
- username: str or None
Username to connect with. Required if Kerberos auth is not being used.
- password: str or None
User’s password on the server, if using username
- timeout: int
Connection timeout in seconds
- encrypt: bool
Whether to force encryption. Once this has been set to True, the session cannot be changed back to False.
- share_access: str or None
Specifies the default access applied to file open operations performed with this file system object. This affects whether other processes can concurrently open a handle to the same file.
None (the default): exclusively locks the file until closed.
‘r’: Allow other handles to be opened with read access.
‘w’: Allow other handles to be opened with write access.
‘d’: Allow other handles to be opened with delete access.
- register_session_retries: int
Number of retries to register a session with the server. Retries are not performed for authentication errors, as they are considered invalid credentials rather than network issues. If set to a negative value, no register attempts will be performed.
- register_session_retry_wait: int
Time in seconds to wait between each retry. Number must be non-negative.
- register_session_retry_factor: int
Base factor for the wait time between each retry. The wait time is calculated using an exponential function. For factor=1 all wait times will be equal to register_session_retry_wait. For any number of retries, the last wait time will be equal to register_session_retry_wait, and for retries>1 the first wait time will be equal to register_session_retry_wait/factor. Number must be equal to or greater than 1. Optimal factor is 10.
- auto_mkdir: bool
Whether, when opening a file, the directory containing it should be created (if it doesn't already exist). This is assumed by pyarrow and zarr-python code.
- class fsspec.implementations.tar.TarFileSystem(*args, **kwargs)[source]
Compressed Tar archives as a file-system (read-only)
Supports the following formats: tar.gz, tar.bz2, tar.xz
- __init__(fo='', index_store=None, target_options=None, target_protocol=None, compression=None, **kwargs)[source]
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen, a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters:
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
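As a sketch of the read-only archive view described above, a small uncompressed tar built in memory with the standard library can be browsed through fsspec (the member names are invented):

```python
import io
import tarfile

import fsspec

# Build a small uncompressed tar archive in memory
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    payload = b"hello tar"
    info = tarfile.TarInfo(name="docs/readme.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
buf.seek(0)

# Expose the archive read-only through fsspec
fs = fsspec.filesystem("tar", fo=buf)
print(fs.ls("docs", detail=False))
print(fs.cat("docs/readme.txt"))  # b'hello tar'
```

Passing a URL string as fo instead of a file-like object works too, with compression inferred from the file name where possible.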
- class fsspec.implementations.webhdfs.WebHDFS(*args, **kwargs)[source]
Interface to HDFS over HTTP using the WebHDFS API. Also supports HttpFS gateways.
Four auth mechanisms are supported:
- insecure: no auth is done, and the user is assumed to be whoever they say they are (parameter user), or a predefined value such as "dr.who" if not given
- spnego: when kerberos authentication is enabled, auth is negotiated by requests_kerberos https://github.com/requests/requests-kerberos. This establishes a session based on an existing kinit login and/or a specified principal/password; parameters are passed with kerb_kwargs
- token: uses an existing Hadoop delegation token from another secured service. Indeed, this client can also generate such tokens when not insecure. Note that tokens expire, but can be renewed (by a previously specified user) and may allow for proxying.
- basic-auth: used when both parameter user and parameter password are provided.
- __init__(host, port=50070, kerberos=False, token=None, user=None, password=None, proxy_to=None, kerb_kwargs=None, data_proxy=None, use_https=False, session_cert=None, session_verify=True, **kwargs)[source]
- Parameters:
- host: str
Name-node address
- port: int
Port for webHDFS
- kerberos: bool
Whether to authenticate with kerberos for this connection
- token: str or None
If given, use this token on every call to authenticate. A user and user-proxy may be encoded in the token and should not also be given
- user: str or None
If given, assert the user name to connect with
- password: str or None
If given, assert the password to use for basic auth. If password is provided, user must also be provided
- proxy_to: str or None
If given, the user has the authority to proxy, and this value is the user in whose name actions are taken
- kerb_kwargs: dict
Any extra arguments for HTTPKerberosAuth, see https://github.com/requests/requests-kerberos/blob/master/requests_kerberos/kerberos_.py
- data_proxy: dict, callable or None
If given, map data-node addresses. This can be necessary if the HDFS cluster is behind a proxy, running on Docker, or otherwise has a mismatch between the host-names given by the name-node and the address by which to refer to them from the client. If a dict, maps host names host->data_proxy[host]; if a callable, full URLs are passed, and the function must conform to url->data_proxy(url).
- use_https: bool
Whether to connect to the Name-node using HTTPS instead of HTTP
- session_cert: str or Tuple[str, str] or None
Path to a certificate file, or tuple of (cert, key) files to use for the requests.Session
- session_verify: str, bool or None
Path to a certificate file to use for verifying the requests.Session.
- kwargs
- class fsspec.implementations.zip.ZipFileSystem(*args, **kwargs)[source]
Read/Write contents of ZIP archive as a file-system
Keeps file object open while instance lives.
This class is pickleable, but not necessarily thread-safe
- __init__(fo='', mode='r', target_protocol=None, target_options=None, compression=0, allowZip64=True, compresslevel=None, **kwargs)[source]
- Parameters:
- fo: str or file-like
Contains ZIP, and must exist. If a str, will fetch the file using open_files(), which must return exactly one file.
- mode: str
Accept: "r", "w", "a"
- target_protocol: str (optional)
If fo is a string, this value can be used to override the FS protocol inferred from a URL
- target_options: dict (optional)
Kwargs passed when instantiating the target FS, if fo is a string.
- compression, allowZip64, compresslevel: passed to ZipFile
Only relevant when creating a ZIP
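For example, a ZIP built in memory with the standard library can be browsed as a read-only filesystem (the member names here are invented):

```python
import io
import zipfile

from fsspec.implementations.zip import ZipFileSystem

# Create a ZIP archive in memory with the standard library
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/a.csv", "x,y\n1,2\n")
buf.seek(0)

# Open it as a read-only file-system
fs = ZipFileSystem(fo=buf, mode="r")
print(fs.ls("data", detail=False))
print(fs.cat("data/a.csv"))  # b'x,y\n1,2\n'
```

The same class, given mode="w", writes a new archive instead; fo may also be a path string, in which case the file is opened via open_files().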
Other Known Implementations
Note that most of these projects are hosted outside of the fsspec organisation. Please read their documentation carefully before using any particular package.
abfs for Azure Blob service, with protocol “abfs://”
adl for Azure DataLake storage, with protocol “adl://”
alluxiofs to access fsspec implemented filesystem with Alluxio distributed cache
boxfs for access to Box file storage, with protocol “box://”
csvbase for access to csvbase.com hosted CSV files, with protocol “csvbase://”
dropbox for access to dropbox shares, with protocol “dropbox://”
dvc to access DVC/Git repository as a filesystem
fsspec-encrypted for transparent encryption on top of other fsspec filesystems.
gcsfs for Google Cloud Storage, with protocol “gs://” or “gcs://”
gdrive to access Google Drive and shares (experimental)
git to access Git repositories
huggingface_hub to access the Hugging Face Hub filesystem, with protocol “hf://”
hdfs-native to access Hadoop filesystem, with protocol “hdfs://”
httpfs-sync to access HTTP(s) files in a synchronous manner to offer an alternative to the aiohttp-based implementation.
ipfsspec for the InterPlanetary File System (IPFS), with protocol “ipfs://”
irods for access to iRODS servers, with protocol “irods://”
lakefs for lakeFS data lakes, with protocol “lakefs://”
morefs for OverlayFileSystem, DictFileSystem, and others
ocifs for access to Oracle Cloud Object Storage, with protocol "oci://"
ocilake for OCI Data Lake storage
ossfs for Alibaba Cloud (Aliyun) Object Storage System (OSS)
p9fs for 9P (Plan 9 Filesystem Protocol) servers
PyAthena for S3 access to Amazon Athena, with protocol “s3://” or “s3a://”
PyDrive2 for Google Drive access
fsspec-proxy for “pyscript:” URLs via a proxy server
s3fs for Amazon S3 and other compatible stores, with protocol “s3://”
sshfs for access to SSH servers, with protocol “ssh://” or “sftp://”
swiftspec for OpenStack SWIFT, with protocol “swift://”
tosfs for ByteDance volcano engine Tinder Object Storage (TOS)
wandbfs to access Wandb run data (experimental)
wandbfsspec to access Weights & Biases (experimental)
webdav4 for WebDAV, with protocol “webdav://” or “dav://”
xrootd for xrootd, with protocol “root://”
Read Buffering
| Cache holding memory as a set of blocks. |
| Cache which holds data in an in-memory bytes object |
| memory-mapped sparse file cache |
| Cache which reads only when we get beyond a block of data |
| Caches the first block of a file only |
| Cache holding memory as a set of blocks with pre-loading of the next block in the background. |
- class fsspec.caching.BlockCache(blocksize: int, fetcher: Callable[[int, int], bytes], size: int, maxblocks: int = 32)[source]
Cache holding memory as a set of blocks.
Requests are only ever made blocksize at a time, and are stored in an LRU cache. The least recently accessed block is discarded when more than maxblocks are stored.
- Parameters:
- blocksize: int
The number of bytes to store in each block. Requests are only ever made for blocksize, so this should balance the overhead of making a request against the granularity of the blocks.
- fetcher: Callable
- size: int
The total size of the file being cached.
- maxblocks: int
The maximum number of blocks to cache. The maximum memory use for this cache is then blocksize*maxblocks.
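The fetcher/blocksize interplay can be sketched with a toy fetcher over an in-memory byte string. Note that _fetch is the (internal) method file objects use to consume a cache, so this is only an illustration of the behaviour, not public API:

```python
from fsspec.caching import BlockCache

data = bytes(range(256)) * 4  # 1024 bytes standing in for remote content

def fetcher(start: int, end: int) -> bytes:
    # Stands in for a network request for the byte range [start, end)
    return data[start:end]

cache = BlockCache(blocksize=64, fetcher=fetcher, size=len(data), maxblocks=4)

# Reads are served block-by-block; repeated reads of the same region
# hit the LRU cache instead of calling the fetcher again
print(cache._fetch(10, 30) == data[10:30])
print(cache._fetch(100, 200) == data[100:200])
```

In practice you rarely construct a cache directly: passing cache_type="blockcache" (plus cache_options) to a filesystem's open() selects it for you.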
- class fsspec.caching.BytesCache(blocksize: int, fetcher: Callable[[int, int], bytes], size: int, trim: bool = True)[source]
Cache which holds data in an in-memory bytes object
Implements read-ahead by the block size, for semi-random reads progressing through the file.
- Parameters:
- trim: bool
As we read more data, whether to discard the start of the buffer when we are more than a blocksize ahead of it.
- class fsspec.caching.MMapCache(blocksize: int, fetcher: Fetcher, size: int, location: str | None = None, blocks: set[int] | None = None, multi_fetcher: MultiFetcher | None = None)[source]
memory-mapped sparse file cache
Opens a temporary file, which is filled block-wise when data is requested. Ensure there is enough disk space in the temporary location.
This cache method might only work on posix
- Parameters:
- blocksize: int
How far to read ahead in numbers of bytes
- fetcher: Fetcher
Function of the form f(start, end) which gets bytes from remote as specified
- size: int
How big this file is
- location: str
Where to create the temporary file. If None, a temporary file is created using tempfile.TemporaryFile().
- blocks: set[int]
Set of block numbers that have already been fetched. If None, an empty set is created.
- multi_fetcher: MultiFetcher
Function of the form f([(start, end)]) which gets bytes from remote as specified. This function is used to fetch multiple blocks at once. If not specified, the fetcher function is used instead.
- class fsspec.caching.ReadAheadCache(blocksize: int, fetcher: Callable[[int, int], bytes], size: int)[source]
Cache which reads only when we get beyond a block of data
This is a much simpler version of BytesCache, and does not attempt to fill holes in the cache or keep fragments alive. It is best suited to many small reads in a sequential order (e.g., reading lines from a file).
- class fsspec.caching.FirstChunkCache(blocksize: int, fetcher: Callable[[int, int], bytes], size: int)[source]
Caches the first block of a file only
This may be useful for file types where the metadata is stored in the header, but is randomly accessed.
- class fsspec.caching.BackgroundBlockCache(blocksize: int, fetcher: Callable[[int, int], bytes], size: int, maxblocks: int = 32)[source]
Cache holding memory as a set of blocks with pre-loading of the next block in the background.
Requests are only ever made blocksize at a time, and are stored in an LRU cache. The least recently accessed block is discarded when more than maxblocks are stored. If the next block is not in the cache, it is loaded in a separate thread in a non-blocking way.
- Parameters:
- blocksize: int
The number of bytes to store in each block. Requests are only ever made for blocksize, so this should balance the overhead of making a request against the granularity of the blocks.
- fetcher: Callable
- size: int
The total size of the file being cached.
- maxblocks: int
The maximum number of blocks to cache. The maximum memory use for this cache is then blocksize*maxblocks.
Utilities
| Read a block of bytes from a file |
- fsspec.utils.read_block(f: IO[bytes], offset: int, length: int | None, delimiter: bytes | None = None, split_before: bool = False) → bytes[source]
Read a block of bytes from a file
- Parameters:
- f: File
Open file
- offset: int
Byte offset to start read
- length: int
Number of bytes to read, read through end of file if None
- delimiter: bytes (optional)
Ensure reading starts and stops at delimiter bytestring
- split_before: bool (optional)
Start/stop read before delimiter bytestring.
If using the ``delimiter=`` keyword argument we ensure that the read starts and stops at delimiter boundaries that follow the locations ``offset`` and ``offset + length``. If ``offset`` is zero then we start at zero, regardless of delimiter. The bytestring returned WILL include the terminating delimiter string.
Examples
>>> from io import BytesIO
>>> f = BytesIO(b'Alice, 100\nBob, 200\nCharlie, 300')
>>> read_block(f, 0, 13)
b'Alice, 100\nBo'
>>> read_block(f, 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'
>>> read_block(f, 10, 10, delimiter=b'\n')
b'Bob, 200\nCharlie, 300'