This PEP proposes the inclusion of a third-party module, pathlib, in the standard library. The inclusion is proposed under the provisional label, as described in PEP 411. Therefore, API changes can be made either as part of the PEP process or after acceptance in the standard library (until the provisional label is removed).
The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users perform on them.
An object-oriented API for filesystem paths has already been proposed and rejected in PEP 355. Several third-party implementations of the idea of object-oriented filesystem paths exist in the wild:
- str-subclassing Path class;
- tuple rather than str;
- AbstractPath class for operations which don't do I/O and a Path class for all common operations.

This proposal attempts to learn from these previous attempts and the rejection of PEP 355.
The implementation of this proposal is tracked in the pep428 branch of pathlib's Mercurial repository.
The rationale for representing filesystem paths with dedicated classes is the same as for other kinds of stateless objects, such as dates, times or IP addresses. Python has been slowly moving away from strictly replicating the C language's APIs and towards providing better, more helpful abstractions around all kinds of common functionality. Even if this PEP isn't accepted, another form of filesystem handling abstraction will likely be adopted into the standard library one day.
Indeed, many people prefer handling dates and times with the high-level objects provided by the datetime module, rather than with numeric timestamps and the time module API. Moreover, a dedicated class makes it possible to enable desirable behaviours by default, for example the case insensitivity of Windows paths.
The pathlib module implements a simple hierarchy of classes:
                +----------+
                |          |
       ---------| PurePath |--------
       |        |          |       |
       |        +----------+       |
       |             |             |
       |             |             |
       v             |             v
+---------------+    |    +-----------------+
|               |    |    |                 |
| PurePosixPath |    |    | PureWindowsPath |
|               |    |    |                 |
+---------------+    |    +-----------------+
       |             v             |
       |          +------+         |
       |          |      |         |
       |   -------| Path |------   |
       |   |      |      |     |   |
       |   |      +------+     |   |
       |   |                   |   |
       |   |                   |   |
       v   v                   v   v
  +-----------+           +-------------+
  |           |           |             |
  | PosixPath |           | WindowsPath |
  |           |           |             |
  +-----------+           +-------------+
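This hierarchy divides path classes along two dimensions. The first is whether a class is pure (path manipulation only, no I/O) or concrete (adding operations that actually access the filesystem). The second is which flavour of paths it represents: POSIX or Windows.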
Any pure class can be instantiated on any system: for example, you can manipulate PurePosixPath objects under Windows, PureWindowsPath objects under Unix, and so on. However, concrete classes can only be instantiated on a matching system, since it would be error-prone to start doing I/O with WindowsPath objects under Unix, or vice versa.
Furthermore, there are two base classes which also act as system-dependent factories: PurePath will instantiate either a PurePosixPath or a PureWindowsPath depending on the operating system. Similarly, Path will instantiate either a PosixPath or a WindowsPath.
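The sketch below shows the pure/concrete split and the factory behaviour in practice. It assumes a modern CPython pathlib (3.8 or later). In current releases, instantiating the concrete class of the other flavour raises NotImplementedError; on 3.13+ this is its subclass pathlib.UnsupportedOperation. The try block relies on that behaviour.

```python
import os
from pathlib import (Path, PurePath, PurePosixPath, PureWindowsPath,
                     PosixPath, WindowsPath)

# Pure classes work on any system, whatever flavour they represent.
win = PureWindowsPath("c:/Users/antoine")
posix = PurePosixPath("/home/antoine")

# PurePath and Path act as factories for the running system's flavour.
native_pure = type(PurePath("setup.py"))
native_concrete = type(Path("setup.py"))

# The concrete class of the *other* flavour refuses to be instantiated
# (NotImplementedError; pathlib.UnsupportedOperation subclasses it on 3.13+).
foreign = PosixPath if os.name == "nt" else WindowsPath
try:
    foreign("setup.py")
    foreign_ok = True
except NotImplementedError:
    foreign_ok = False

# On Linux or macOS this prints: PurePosixPath PosixPath False
print(native_pure.__name__, native_concrete.__name__, foreign_ok)
```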
It is expected that, in most uses, the Path class is adequate, which is why it has the shortest name of all.
In this proposal, the path classes do not derive from a builtin type. This contrasts with some other Path class proposals, which were derived from str. They also do not pretend to implement the sequence protocol: if you want a path to act as a sequence, you have to look up a dedicated attribute (the parts attribute).
The key reason for not inheriting from str is to prevent accidentally performing operations between a string representing a path and a string that doesn't, e.g. path + an_accident. Since operations with a string will not necessarily lead to a valid or expected filesystem path, not subclassing str follows "explicit is better than implicit" by ruling out such accidental operations. A blog post by a Python core developer goes into more detail on the reasons behind this specific design decision.
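Here is a small sketch of both design choices, assuming Python 3's pathlib. Sequence-style access goes through parts. String concatenation and len() are rejected with TypeError instead of silently producing something that may not be a valid path.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/etc/init.d")

# Sequence access is explicit, through the parts tuple.
components = p.parts              # ('/', 'etc', 'init.d')

# A path is not a str, so accidental concatenation fails loudly.
try:
    p + "-backup"
    concat_error = None
except TypeError as exc:
    concat_error = type(exc).__name__

# Nor does a path pretend to be a sequence.
try:
    len(p)
    len_error = None
except TypeError as exc:
    len_error = type(exc).__name__

# The intended spellings say explicitly what is meant.
backup = p.with_name(p.name + "-backup")   # PurePosixPath('/etc/init.d-backup')
as_text = str(p) + "-backup"               # a plain str: '/etc/init.d-backup'
```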
Path objects are immutable, which makes them hashable and also prevents a class of programming errors.
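A minimal sketch of what immutability and hashability buy in practice, assuming CPython's pathlib. Equal paths hash equally, so they work as set members and dict keys. Attributes cannot be rebound, so "changing" a path always means deriving a new one.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Equal paths hash equally, so normalised spellings collapse in a set.
seen = {PurePosixPath("a/b"), PurePosixPath("a//b"), PurePosixPath("a/./b")}

# Windows paths compare case-insensitively, and their hashes agree with that.
sizes = {PureWindowsPath("C:/Data/Report.TXT"): 42}
lookup = sizes[PureWindowsPath("c:/data/report.txt")]

# Attributes cannot be rebound; derive a new path instead.
p = PurePosixPath("docs/conf.py")
try:
    p.name = "index.rst"
    rebound = True
except AttributeError:
    rebound = False
renamed = p.with_name("index.rst")   # a new object; p itself is unchanged
```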
Little of the functionality from os.path is reused. Backwards compatibility ties many os.path functions to confusing or plain wrong behaviour (for example, os.path.abspath() simplifies ".." path components without resolving symlinks first).
Paths of the same flavour are comparable and orderable, whether pure or not:
>>> PurePosixPath('a') == PurePosixPath('b')
False
>>> PurePosixPath('a') < PurePosixPath('b')
True
>>> PurePosixPath('a') == PosixPath('a')
True
Comparing and ordering Windows path objects is case-insensitive:
>>> PureWindowsPath('a') == PureWindowsPath('A')
True
Paths of different flavours always compare unequal, and cannot be ordered:
>>> PurePosixPath('a') == PureWindowsPath('a')
False
>>> PurePosixPath('a') < PureWindowsPath('a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: PurePosixPath() < PureWindowsPath()
Paths compare unequal to, and cannot be ordered with, instances of builtin types (such as str) and any other types.
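The comparison rules above can be gathered into one runnable sketch, assuming CPython's pathlib. Same-flavour paths sort normally. Cross-flavour and path-versus-str comparisons are never equal, and ordering a path against a str raises TypeError.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same flavour: comparable and orderable, so sorting just works.
ordered = sorted([PurePosixPath("b"), PurePosixPath("a/c"), PurePosixPath("a")])

# Different flavours: never equal.
same = PurePosixPath("a") == PureWindowsPath("a")      # False

# Builtin types: a path never equals its own string form...
equals_str = PurePosixPath("a") == "a"                 # False

# ...and ordering against a str raises TypeError.
try:
    PurePosixPath("a") < "b"
    ordered_with_str = True
except TypeError:
    ordered_with_str = False
```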
The API tries to provide useful notations while avoiding magic. Some examples:
>>> p = Path('/home/antoine/pathlib/setup.py')
>>> p.name
'setup.py'
>>> p.suffix
'.py'
>>> p.root
'/'
>>> p.parts
('/', 'home', 'antoine', 'pathlib', 'setup.py')
>>> p.relative_to('/home/antoine')
PosixPath('pathlib/setup.py')
>>> p.exists()
True
The philosophy of the PurePath API is to provide a consistent array of useful path manipulation operations, without exposing a hodge-podge of functions like os.path does.
First a couple of conventions:
\\host\share\myfile.txt) always has a drive and a root (here, \\host\share and \, respectively).

We will present construction and joining together, since they expose similar semantics.
The simplest way to construct a path is to pass it its string representation:
>>> PurePath('setup.py')
PurePosixPath('setup.py')
Extraneous path separators and "." components are eliminated:
>>> PurePath('a///b/c/./d/')
PurePosixPath('a/b/c/d')
If you pass several arguments, they will be automatically joined:
>>> PurePath('docs', 'Makefile')
PurePosixPath('docs/Makefile')
Joining semantics are similar to os.path.join, in that anchored paths ignore the information from the previously joined components:
>>> PurePath('/etc', '/usr', 'bin')
PurePosixPath('/usr/bin')
However, with Windows paths, the drive is retained as necessary:
>>> PureWindowsPath('c:/foo', '/Windows')
PureWindowsPath('c:/Windows')
>>> PureWindowsPath('c:/foo', 'd:')
PureWindowsPath('d:')
Also, path separators are normalized to the platform default:
>>> PureWindowsPath('a/b') == PureWindowsPath('a\\b')
True
Extraneous path separators and "." components are eliminated, but ".." components are not:
>>> PurePosixPath('a//b/./c/')
PurePosixPath('a/b/c')
>>> PurePosixPath('a/../b')
PurePosixPath('a/../b')
Multiple leading slashes are treated differently depending on the path flavour. They are always retained on Windows paths (because of the UNC notation):
>>> PureWindowsPath('//some/path')
PureWindowsPath('//some/path/')
On POSIX, they are collapsed unless there are exactly two leading slashes, which is a special case in the POSIX specification on pathname resolution (this is also necessary for Cygwin compatibility):
>>> PurePosixPath('///some/path')
PurePosixPath('/some/path')
>>> PurePosixPath('//some/path')
PurePosixPath('//some/path')
Calling the constructor without any argument creates a path object pointing to the logical "current directory" (without looking up its absolute path, which is the job of the cwd() classmethod on concrete paths):
>>> PurePosixPath()
PurePosixPath('.')
To represent a path as a string (e.g. to pass it to third-party libraries), just call str() on it:
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> str(p)
'/home/antoine/pathlib/setup.py'
>>> p = PureWindowsPath('c:/windows')
>>> str(p)
'c:\\windows'
To force the string representation to use forward slashes, use the as_posix() method:
>>> p.as_posix()
'c:/windows'
To get the bytes representation (which might be useful under Unix systems), call bytes() on it, which internally uses os.fsencode():
>>> p = PurePosixPath('/home/antoine/pathlib/setup.py')
>>> bytes(p)
b'/home/antoine/pathlib/setup.py'
To represent the path as a file: URI, call the as_uri() method:
>>> p = PurePosixPath('/etc/passwd')
>>> p.as_uri()
'file:///etc/passwd'
>>> p = PureWindowsPath('c:/Windows')
>>> p.as_uri()
'file:///c:/Windows'
The repr() of a path always uses forward slashes, even under Windows, for readability and to remind users that forward slashes are ok:
>>> p = PureWindowsPath('c:/Windows')
>>> p
PureWindowsPath('c:/Windows')
Several simple properties are provided on every path (each can be empty):
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
>>> p.drive
'c:'
>>> p.root
'\\'
>>> p.anchor
'c:\\'
>>> p.name
'pathlib.tar.gz'
>>> p.stem
'pathlib.tar'
>>> p.suffix
'.gz'
>>> p.suffixes
['.tar', '.gz']
A path can be joined with another using the / operator:
>>> p = PurePosixPath('foo')
>>> p / 'bar'
PurePosixPath('foo/bar')
>>> p / PurePosixPath('bar')
PurePosixPath('foo/bar')
>>> 'bar' / p
PurePosixPath('bar/foo')
As with the constructor, multiple path components can be specified, either collapsed or separately:
>>> p / 'bar/xyzzy'
PurePosixPath('foo/bar/xyzzy')
>>> p / 'bar' / 'xyzzy'
PurePosixPath('foo/bar/xyzzy')
A joinpath() method is also provided, with the same behaviour:
>>> p.joinpath('Python')
PurePosixPath('foo/Python')
The with_name() method returns a new path, with the name changed:
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
>>> p.with_name('setup.py')
PureWindowsPath('c:/Downloads/setup.py')
It fails with a ValueError if the path doesn't have an actual name:
>>> p = PureWindowsPath('c:/')
>>> p.with_name('setup.py')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pathlib.py", line 875, in with_name
    raise ValueError("%r has an empty name" % (self,))
ValueError: PureWindowsPath('c:/') has an empty name
>>> p.name
''
The with_suffix() method returns a new path with the suffix changed. If the path has no suffix, the new suffix is appended instead:
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
>>> p.with_suffix('.bz2')
PureWindowsPath('c:/Downloads/pathlib.tar.bz2')
>>> p = PureWindowsPath('README')
>>> p.with_suffix('.bz2')
PureWindowsPath('README.bz2')
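Two details of with_suffix() that the examples above do not show are sketched below, assuming CPython's shipped pathlib. An empty string removes the last suffix, so chaining calls replaces a double suffix such as ".tar.gz". A suffix without a leading dot is rejected with ValueError.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("c:/Downloads/pathlib.tar.gz")

# An empty suffix removes only the last suffix.
no_gz = p.with_suffix("")                     # PureWindowsPath('c:/Downloads/pathlib.tar')

# Chaining replaces the whole ".tar.gz" pair.
as_zip = p.with_suffix("").with_suffix(".zip")

# A suffix must start with a dot.
try:
    p.with_suffix("bz2")
    accepted = True
except ValueError:
    accepted = False
```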
The relative_to() method computes the relative difference of a path to another:
>>> PurePosixPath('/usr/bin/python').relative_to('/usr')
PurePosixPath('bin/python')
ValueError is raised if the method cannot return a meaningful value:
>>> PurePosixPath('/usr/bin/python').relative_to('/etc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pathlib.py", line 926, in relative_to
    .format(str(self), str(formatted)))
ValueError: '/usr/bin/python' does not start with '/etc'
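Because relative_to() signals "not under this base" with ValueError, callers that only want a yes/no answer usually wrap it. Below is a minimal sketch; relative_or_none is a hypothetical helper name, not part of pathlib. On Python 3.9+, PurePath.is_relative_to() offers the boolean check directly.

```python
from pathlib import PurePosixPath

def relative_or_none(path, base):
    """Hypothetical helper: return path relative to base, or None."""
    try:
        return path.relative_to(base)
    except ValueError:
        # relative_to() raises ValueError when path is not under base
        return None

py = PurePosixPath("/usr/bin/python")
inside = relative_or_none(py, "/usr")     # PurePosixPath('bin/python')
outside = relative_or_none(py, "/etc")    # None
```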
The parts property returns a tuple providing read-only sequence access to a path's components:
>>> p = PurePosixPath('/etc/init.d')
>>> p.parts
('/', 'etc', 'init.d')
Windows paths handle the drive and the root as a single path component:
>>> p = PureWindowsPath('c:/setup.py')
>>> p.parts
('c:\\', 'setup.py')
(Separating them would be wrong, since C: is not the parent of C:\.)
The parent property returns the logical parent of the path:
>>> p = PureWindowsPath('c:/python33/bin/python.exe')
>>> p.parent
PureWindowsPath('c:/python33/bin')
The parents property returns an immutable sequence of the path's logical ancestors:
>>> p = PureWindowsPath('c:/python33/bin/python.exe')
>>> len(p.parents)
3
>>> p.parents[0]
PureWindowsPath('c:/python33/bin')
>>> p.parents[1]
PureWindowsPath('c:/python33')
>>> p.parents[2]
PureWindowsPath('c:/')
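Since parents runs from the nearest ancestor up to the anchor, a common use is to search upwards for a matching directory. The sketch below assumes CPython's pathlib. The "name starts with python" test is only an illustrative criterion.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("c:/python33/bin/python.exe")

# parents is ordered from the nearest ancestor up to the anchor.
names = [str(a) for a in p.parents]   # ['c:\\python33\\bin', 'c:\\python33', 'c:\\']

# Illustrative use: the nearest ancestor whose name starts with "python".
install_dir = next(a for a in p.parents if a.name.lower().startswith("python"))
```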
is_relative() returns True if the path is relative (see definition above), False otherwise.
is_reserved() returns True if a Windows path is a reserved path such as CON or NUL. It always returns False for POSIX paths.
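In the pathlib module that eventually shipped, this predicate is spelled is_absolute(), the inverse of the is_relative() described here. The sketch below assumes that shipped API. It shows how drive and root combine into the anchor on each flavour, and when the result counts as absolute.

```python
from pathlib import PurePosixPath, PureWindowsPath

paths = {
    "posix_rooted": PurePosixPath("/etc"),
    "win_drive_and_root": PureWindowsPath("c:/foo"),
    "win_drive_only": PureWindowsPath("c:foo"),          # relative to c:'s current directory
    "win_root_only": PureWindowsPath("/foo"),             # relative to the current drive
    "win_unc": PureWindowsPath("//host/share/f.txt"),     # UNC: has both drive and root
}

# is_absolute() is the shipped spelling; "relative" is simply its negation.
absolute = {name: p.is_absolute() for name, p in paths.items()}
anchors = {name: p.anchor for name, p in paths.items()}   # anchor == drive + root
```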
match() matches the path against a glob pattern. It operates on individual parts and matches from the right:
>>> p = PurePosixPath('/usr/bin')
>>> p.match('/usr/b*')
True
>>> p.match('usr/b*')
True
>>> p.match('b*')
True
>>> p.match('/u*')
False
This behaviour means that a relative pattern only needs to match the trailing parts of the path, while an absolute pattern must match the whole path.
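The sketch below exercises the two matching modes on a longer path, assuming the match() semantics of CPython's pathlib. Relative patterns match from the right. Absolute patterns must cover every part.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/foo/bar.py")

results = {
    "*.py": p.match("*.py"),                    # True: the last part matches
    "foo/*.py": p.match("foo/*.py"),            # True: the last two parts match
    "/usr/*.py": p.match("/usr/*.py"),          # False: absolute pattern, whole path must match
    "/usr/foo/*.py": p.match("/usr/foo/*.py"),  # True: every part matches
}
```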
In addition to the operations of the pure API, concrete paths provide additional methods which actually access the filesystem to query or mutate information.
The classmethod cwd() creates a path object pointing to the current working directory in absolute form:
>>> Path.cwd()
PosixPath('/home/antoine/pathlib')
The stat() method returns the file's stat() result; similarly, lstat() returns the file's lstat() result (which differs if and only if the file is a symbolic link):
>>> p.stat()
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
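To see where stat() and lstat() diverge, the sketch below creates a symlink in a temporary directory. It assumes a system where unprivileged symlink creation works (typically POSIX; on Windows it may need extra privileges). The file names are illustrative only.

```python
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "setup.py"
    target.write_text("print('hi')\n")
    link = Path(tmp) / "link.py"
    link.symlink_to(target)                               # link.py -> setup.py

    followed = stat.S_ISREG(link.stat().st_mode)          # stat() follows the link
    not_followed = stat.S_ISLNK(link.lstat().st_mode)     # lstat() describes the link itself
    same_size = link.stat().st_size == target.stat().st_size
    kinds = (link.is_symlink(), link.is_file(), target.is_symlink())
```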
Higher-level methods help examine the kind of the file:
>>> p.exists()
True
>>> p.is_file()
True
>>> p.is_dir()
False
>>> p.is_symlink()
False
>>> p.is_socket()
False
>>> p.is_fifo()
False
>>> p.is_block_device()
False
>>> p.is_char_device()
False
The file owner and group names (rather than numeric ids) are queried through corresponding methods:
>>> p = Path('/etc/shadow')
>>> p.owner()
'root'
>>> p.group()
'shadow'
The resolve() method makes a path absolute, resolving any symlink on the way (like the POSIX realpath() call). It is the only operation which will remove ".." path components. On Windows, this method will also take care to return the canonical path (with the right casing).
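The difference from os.path.abspath() shows up with a symlink followed by "..". abspath() removes the ".." textually, while resolve() follows the link first. The sketch below assumes a system where symlinks can be created without extra privileges (typically POSIX). The directory names are illustrative only.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()              # canonical temp dir (e.g. /tmp -> /private/tmp on macOS)
    target = base / "real" / "sub"
    target.mkdir(parents=True)
    link = base / "link"
    link.symlink_to(target, target_is_directory=True)   # link -> real/sub

    dotdot = os.path.join(str(link), "..")  # ".../link/.."
    lexical = os.path.abspath(dotdot)       # ".." dropped textually: back to base
    resolved = Path(dotdot).resolve()       # link followed first: ends in base/real

print(lexical == str(base))        # True
print(resolved == base / "real")   # True
```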
Simple (non-recursive) directory access is done by calling the iterdir() method, which returns an iterator over the child paths:
>>> p = Path('docs')
>>> for child in p.iterdir(): child
...
PosixPath('docs/conf.py')
PosixPath('docs/_templates')
PosixPath('docs/make.bat')
PosixPath('docs/index.rst')
PosixPath('docs/_build')
PosixPath('docs/_static')
PosixPath('docs/Makefile')
This allows simple filtering through list comprehensions:
>>> p = Path('.')
>>> [child for child in p.iterdir() if child.is_dir()]
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
Simple and recursive globbing is also provided:
>>> for child in p.glob('**/*.py'): child
...
PosixPath('test_pathlib.py')
PosixPath('setup.py')
PosixPath('pathlib.py')
PosixPath('docs/conf.py')
PosixPath('build/lib/pathlib.py')
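The sketch below contrasts non-recursive iterdir() with recursive globbing on a small, self-contained tree. It assumes CPython's pathlib, and the file layout is made up for the demonstration. Results are sorted because neither method guarantees an order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Illustrative layout, created only for this demonstration.
    (root / "docs").mkdir()
    (root / "setup.py").touch()
    (root / "docs" / "conf.py").touch()
    (root / "docs" / "index.rst").touch()

    # iterdir() is non-recursive: only direct children are yielded.
    top_dirs = sorted(c.name for c in root.iterdir() if c.is_dir())
    # "**" matches zero or more directories, so glob() recurses.
    py_files = sorted(c.relative_to(root).as_posix() for c in root.glob("**/*.py"))

print(top_dirs)   # ['docs']
print(py_files)   # ['docs/conf.py', 'setup.py']
```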
The open() method provides a file opening API similar to the builtin open() function:
>>> p = Path('setup.py')
>>> with p.open() as f: f.readline()
...
'#!/usr/bin/env python3\n'
Several common filesystem operations are provided as methods: touch(), mkdir(), rename(), replace(), unlink(), rmdir(), chmod(), lchmod(), symlink_to(). More operations could be provided, for example some of the functionality of the shutil module.
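The sketch below strings several of these methods together in a temporary directory: mkdir(), touch(), open(), replace(), unlink() and rmdir(). It assumes Python 3.8+, where replace() returns the new path. lchmod() is left out because it is not supported on every platform.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    work = base / "work"
    work.mkdir()                                # create a directory

    draft = work / "draft.txt"
    draft.touch()                               # create an empty file
    with draft.open("w") as f:                  # same arguments as builtin open()
        f.write("new\n")
    (work / "final.txt").write_text("old\n")

    # replace() overwrites an existing target on every platform,
    # whereas rename() raises on Windows if the target already exists.
    final = draft.replace(work / "final.txt")   # returns the new path (3.8+)
    with final.open() as f:
        first_line = f.readline()
    draft_gone = not draft.exists()

    final.unlink()                              # remove the file
    work.rmdir()                                # remove the now-empty directory
    leftovers = list(base.iterdir())
```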
Detailed documentation of the proposed API can be found in the pathlib docs.
The division operator came out first in a poll about the path joining operator. Initial versions of pathlib used square brackets (i.e. __getitem__) instead.
The joinpath() method was initially called join(), but several people objected that it could be confused with str.join(), which has different semantics. Therefore, it was renamed to joinpath().
Windows users consider filesystem paths to be case-insensitive and expect path objects to observe that characteristic, even though in some rare situations some foreign filesystem mounts may be case-sensitive under Windows.
In the words of one commenter,
"If glob("*.py") failed to find SETUP.PY on Windows, that would be a usability disaster." (Paul Moore, in https://mail.python.org/pipermail/python-dev/2013-April/125254.html)
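The scenario in the quote can be checked with pure paths alone, as sketched below. It assumes CPython's pathlib, where match() and equality follow the flavour's case rules. The original spelling is still kept for display.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: matching and equality ignore case.
win_match = PureWindowsPath("C:/src/SETUP.PY").match("*.py")          # True
win_equal = PureWindowsPath("C:/Src") == PureWindowsPath("c:/src")   # True

# POSIX flavour: case is significant.
posix_match = PurePosixPath("/src/SETUP.PY").match("*.py")           # False

# The original casing is preserved in the string form.
shown = str(PureWindowsPath("C:/src/SETUP.PY"))                      # 'C:\\src\\SETUP.PY'
```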
This document has been placed into the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0428.rst
Last modified: 2025-02-01 08:59:27 GMT