This PEP proposes a protocol for classes which represent a file system path to be able to provide a str or bytes representation. Changes to Python’s standard library are also proposed to utilize this protocol where appropriate to facilitate the use of path objects where historically only str and/or bytes file system paths are accepted. The goal is to facilitate the migration of users towards rich path objects while providing an easy way to work with code expecting str or bytes.
Historically in Python, file system paths have been represented as strings or bytes. This choice of representation has stemmed from C’s own decision to represent file system paths as const char * [3]. While that is a totally serviceable format to use for file system paths, it’s not necessarily optimal. At issue is the fact that while all file system paths can be represented as strings or bytes, not all strings or bytes represent a file system path. This can lead to issues where any e.g. string duck-types to a file system path whether it actually represents a path or not.
To help elevate the representation of file system paths from their representation as strings and bytes to a richer object representation, the pathlib module [4] was provisionally introduced in Python 3.4 through PEP 428. While considered by some as an improvement over strings and bytes for file system paths, it has suffered from a lack of adoption. Typically the key issue listed for the low adoption rate has been the lack of support in the standard library. This lack of support required users of pathlib to manually convert path objects to strings by calling str(path) which many found error-prone.
One issue in converting path objects to strings comes from the fact that the only generic way to get a string representation of the path was to pass the object to str(). This can pose a problem when done blindly as nearly all Python objects have some string representation whether they are a path or not, e.g. str(None) will give a result that builtins.open() [5] will happily use to create a new file.
Exacerbating this whole situation is the DirEntry object [8]. While path objects have a representation that can be extracted using str(), DirEntry objects expose a path attribute instead. Having no common interface between path objects, DirEntry, and any other third-party path library has become an issue. A solution that allows any path-representing object to declare that it is a path and a way to extract a low-level representation that all path objects could support is desired.
This PEP then proposes to introduce a new protocol to be followed by objects which represent file system paths. Providing a protocol allows for explicit signaling of what objects represent file system paths as well as a way to extract a lower-level representation that can be used with older APIs which only support strings or bytes.
Discussions regarding path objects that led to this PEP can be found in multiple threads on the python-ideas mailing list archive [1] for the months of March and April 2016 and on the python-dev mailing list archives [2] during April 2016.
This proposal is split into two parts. One part is the proposal of a protocol for objects to declare and provide support for exposing a file system path representation. The other part deals with changes to Python’s standard library to support the new protocol. These changes will also lead to the pathlib module dropping its provisional status.
The following abstract base class defines the protocol for an object to be considered a path object:
    import abc
    import typing as t


    class PathLike(abc.ABC):

        """Abstract base class for implementing the file system path protocol."""

        @abc.abstractmethod
        def __fspath__(self) -> t.Union[str, bytes]:
            """Return the file system path representation of the object."""
            raise NotImplementedError
Objects representing file system paths will implement the __fspath__() method which will return the str or bytes representation of the path. The str representation is the preferred low-level path representation as it is human-readable and what people historically represent paths as.
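As a minimal sketch of what opting into the protocol might look like, the class below implements __fspath__() and returns the preferred str form. The class name ProjectFile and its attributes are hypothetical and chosen only for illustration; the example assumes Python 3.6 or later, where os.fspath() and os.PathLike exist.

```python
import os


class ProjectFile:
    """Hypothetical path-like class; not part of the PEP or the stdlib."""

    def __init__(self, root, name):
        self._root = root
        self._name = name

    def __fspath__(self):
        # Return the str representation, the form the PEP prefers.
        return os.path.join(self._root, self._name)


config = ProjectFile("project", "setup.cfg")
config_path = os.fspath(config)

# In released Python versions os.PathLike recognizes any class that
# defines __fspath__(), so explicit inheritance is not required here.
is_path_like = isinstance(config, os.PathLike)
```

Note that the class does not need to inherit from str or from any pathlib class; defining the method is what signals that the object represents a path.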
It is expected that most APIs in Python’s standard library that currently accept a file system path will be updated appropriately to accept path objects (whether that requires code or simply an update to documentation will vary). The modules mentioned below, though, deserve specific details as they have either fundamental changes that empower the ability to use path objects, or entail additions/removal of APIs.
open() [5] will be updated to accept path objects as well as continue to accept str and bytes.
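A short sketch of the resulting behaviour, assuming Python 3.6 or later: a pathlib.Path is handed straight to open() with no manual str() conversion. The file name and the temporary directory are arbitrary choices for the example.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "example.txt"

    # open() receives the Path object directly; no str(target) is needed.
    with open(target, "w", encoding="utf-8") as f:
        f.write("hello")

    with open(target, encoding="utf-8") as f:
        content = f.read()
```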
The fspath() function will be added with the following semantics:
    import typing as t


    def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
        """Return the string representation of the path.

        If str or bytes is passed in, it is returned unchanged. If __fspath__()
        returns something other than str or bytes then TypeError is raised. If
        this function is given something that is not str, bytes, or os.PathLike
        then TypeError is raised.
        """
        if isinstance(path, (str, bytes)):
            return path

        # Work from the object's type to match method resolution of other magic
        # methods.
        path_type = type(path)
        try:
            path = path_type.__fspath__(path)
        except AttributeError:
            if hasattr(path_type, '__fspath__'):
                raise
        else:
            if isinstance(path, (str, bytes)):
                return path
            else:
                raise TypeError("expected __fspath__() to return str or bytes, "
                                "not " + type(path).__name__)

        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + path_type.__name__)
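The semantics above can be exercised against the os.fspath() that shipped in Python 3.6 and later. The sketch below is illustrative only: the Broken class and the raises_type_error() helper are hypothetical names introduced for the example.

```python
import os


class Broken:
    """Hypothetical class whose __fspath__() returns an unsupported type."""

    def __fspath__(self):
        return 123


def raises_type_error(value):
    """Report whether os.fspath() rejects the given value with TypeError."""
    try:
        os.fspath(value)
    except TypeError:
        return True
    return False


results = {
    "str": os.fspath("a.txt"),              # returned unchanged
    "bytes": os.fspath(b"a.txt"),           # returned unchanged
    "None": raises_type_error(None),        # not str, bytes, or path-like
    "Broken": raises_type_error(Broken()),  # __fspath__() gave a non-path type
}
```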
The os.fsencode() [6] and os.fsdecode() [7] functions will be updated to accept path objects. As both functions coerce their arguments to bytes and str, respectively, they will be updated to call __fspath__() if present to convert the path object to a str or bytes representation, and then perform their appropriate coercion operations as if the return value from __fspath__() had been the original argument to the coercion function in question.
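To illustrate the coercion described above, here is a small sketch using a hypothetical BytesPath class whose __fspath__() returns bytes. It assumes Python 3.6 or later and an ASCII-only path, so the result does not depend on the file system encoding in use.

```python
import os


class BytesPath:
    """Hypothetical path-like object whose __fspath__() returns bytes."""

    def __fspath__(self):
        return b"logs/app.log"


raw = os.fspath(BytesPath())        # bytes, returned without coercion
decoded = os.fsdecode(BytesPath())  # __fspath__() result decoded to str
encoded = os.fsencode(BytesPath())  # already bytes, so passed through
```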
The addition of os.fspath(), the updates to os.fsencode()/os.fsdecode(), and the current semantics of pathlib.PurePath provide the semantics necessary to get the path representation one prefers. For a path object, pathlib.PurePath/Path can be used. To obtain the str or bytes representation without any coercion, then os.fspath() can be used. If a str is desired and the encoding of bytes should be assumed to be the default file system encoding, then os.fsdecode() should be used. If a bytes representation is desired and any strings should be encoded using the default file system encoding, then os.fsencode() is used. This PEP recommends using path objects when possible and falling back to string paths as necessary and using bytes as a last resort.
Another way to view this is as a hierarchy of file system path representations (highest- to lowest-level): path → str → bytes. The functions and classes under discussion can all accept objects on the same level of the hierarchy, but they vary in whether they promote or demote objects to another level. The pathlib.PurePath class can promote a str to a path object. The os.fspath() function can demote a path object to a str or bytes instance, depending on what __fspath__() returns. The os.fsdecode() function will demote a path object to a string or promote a bytes object to a str. The os.fsencode() function will demote a path or string object to bytes. There is no function that provides a way to demote a path object directly to bytes while bypassing string demotion.
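The promotions and demotions in this hierarchy can be sketched as follows, assuming Python 3.6 or later. PurePosixPath is used so the result is the same on every platform, and the path is ASCII-only so the file system encoding does not affect the outcome.

```python
import os
import pathlib

as_str = "docs/index.rst"

as_path = pathlib.PurePosixPath(as_str)  # promote: str -> path object
back_to_str = os.fspath(as_path)         # demote: path object -> str
as_bytes = os.fsencode(as_path)          # demote: path object -> bytes (via str)
from_bytes = os.fsdecode(as_bytes)       # promote: bytes -> str
```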
The DirEntry object [8] will gain an __fspath__() method. It will return the same value as currently found on the path attribute of DirEntry instances.
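A brief sketch of this equivalence, assuming Python 3.6 or later (where os.scandir() also works as a context manager). The temporary directory and the file name a.txt are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory scan yields a single DirEntry.
    open(os.path.join(tmp, "a.txt"), "w").close()

    with os.scandir(tmp) as entries:
        # Pair the protocol result with the long-standing path attribute.
        pairs = [(os.fspath(entry), entry.path) for entry in entries]
```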
The Protocol ABC will be added to the os module under the name os.PathLike.
The various path-manipulation functions of os.path [9] will be updated to accept path objects. For polymorphic functions that accept both bytes and strings, they will be updated to simply use os.fspath().
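For instance, under Python 3.6 or later a path object can be passed to os.path functions without converting it first. The path below is an arbitrary example value.

```python
import os.path
import pathlib

p = pathlib.PurePosixPath("/srv/app/settings.toml")

# Both calls receive the path object itself and return plain strings.
name = os.path.basename(p)
stem, ext = os.path.splitext(p)
```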
During the discussions leading up to this PEP it was suggested that os.path not be updated using an “explicit is better than implicit” argument. The thinking was that since __fspath__() is polymorphic itself it may be better to have code working with os.path extract the path representation from path objects explicitly. There is also the consideration that adding support this deep into the low-level OS APIs will lead to code magically supporting path objects without requiring any documentation updated, leading to potential complaints when it doesn’t work, unbeknownst to the project author.
But it is the view of this PEP that “practicality beats purity” in this instance. To help facilitate the transition to supporting path objects, it is better to make the transition as easy as possible than to worry about unexpected/undocumented duck typing support for path objects by projects.
There has also been the suggestion that os.path functions could be used in a tight loop and the overhead of checking or calling __fspath__() would be too costly. In this scenario only path-consuming APIs would be directly updated and path-manipulating APIs like the ones in os.path would go unmodified. This would require library authors to update their code to support path objects if they performed any path manipulations, but if the library code passed the path straight through then the library wouldn’t need to be updated. It is the view of this PEP and Guido, though, that this is an unnecessary worry and that performance will still be acceptable.
The constructor for pathlib.PurePath and pathlib.Path will be updated to accept PathLike objects. Both PurePath and Path will continue to not accept bytes path representations, and so if __fspath__() returns bytes it will raise an exception.
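The constructor behaviour can be sketched as below, assuming Python 3.6 or later. StrPath and BytesPath are hypothetical classes written for this example; in released Python versions the exception raised for a bytes result is a TypeError.

```python
import pathlib


class StrPath:
    """Hypothetical path-like object returning str."""

    def __fspath__(self):
        return "data/input.csv"


class BytesPath:
    """Hypothetical path-like object returning bytes."""

    def __fspath__(self):
        return b"data/input.csv"


# A str-returning path-like object is accepted by the constructor.
accepted = pathlib.PurePosixPath(StrPath())

# A bytes-returning path-like object is rejected.
try:
    pathlib.PurePosixPath(BytesPath())
except TypeError:
    rejected = True
else:
    rejected = False
```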
The path attribute will be removed as this PEP makes it redundant (it has not been included in any released version of Python and so is not a backwards-compatibility concern).
The C API will gain an equivalent function to os.fspath():
    /*
        Return the file system path representation of the object.

        If the object is str or bytes, then allow it to pass through with
        an incremented refcount. If the object defines __fspath__(), then
        return the result of that method. All other types raise a TypeError.
    */
    PyObject *
    PyOS_FSPath(PyObject *path)
    {
        _Py_IDENTIFIER(__fspath__);
        PyObject *func = NULL;
        PyObject *path_repr = NULL;

        if (PyUnicode_Check(path) || PyBytes_Check(path)) {
            Py_INCREF(path);
            return path;
        }

        func = _PyObject_LookupSpecial(path, &PyId___fspath__);
        if (NULL == func) {
            return PyErr_Format(PyExc_TypeError,
                                "expected str, bytes or os.PathLike object, "
                                "not %S",
                                path->ob_type);
        }

        path_repr = PyObject_CallFunctionObjArgs(func, NULL);
        Py_DECREF(func);
        if (NULL == path_repr) {
            /* __fspath__() raised; propagate its exception. */
            return NULL;
        }
        if (!PyUnicode_Check(path_repr) && !PyBytes_Check(path_repr)) {
            /* Format the message before releasing the reference it reads. */
            PyErr_Format(PyExc_TypeError,
                         "expected __fspath__() to return str or bytes, "
                         "not %S",
                         path_repr->ob_type);
            Py_DECREF(path_repr);
            return NULL;
        }

        return path_repr;
    }

There are no explicit backwards-compatibility concerns. Unless an object incidentally already defines a __fspath__() method there is no reason to expect the pre-existing code to break or expect to have its semantics implicitly changed.
Libraries wishing to support path objects and a version of Python prior to Python 3.6 and the existence of os.fspath() can use the idiom of path.__fspath__() if hasattr(path, "__fspath__") else path.
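One way a library might package that idiom is a small helper function. The name fspath_compat and the LegacyPath class are hypothetical; unlike os.fspath(), this idiom does not validate the argument or the return type.

```python
def fspath_compat(path):
    """Hypothetical helper wrapping the idiom for pre-3.6 compatibility."""
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class LegacyPath:
    """Hypothetical third-party path class that implements the protocol."""

    def __fspath__(self):
        return "legacy/file.txt"


from_object = fspath_compat(LegacyPath())  # protocol method is called
from_string = fspath_compat("plain.txt")   # passed through unchanged
```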
This is the task list for what this PEP proposes to be changed in Python 3.6:
- Remove the path attribute from pathlib (done)
- Add os.PathLike (code and docs done)
- Add PyOS_FSPath() (code and docs done)
- Add os.fspath() (done)
- Update os.fsencode() (done)
- Update os.fsdecode() (done)
- Update pathlib.PurePath and pathlib.Path (done)
  - Add __fspath__()
  - Add os.PathLike support to the constructors
- Add __fspath__() to DirEntry (done)
- Update builtins.open() (done)
- Update os.path (done)

Various names were proposed during discussions leading to this PEP, including __path__, __pathname__, and __fspathname__. In the end people seemed to gravitate towards __fspath__ for being unambiguous without being unnecessarily long.
At one point it was suggested that __fspath__() only return strings and another method named __fspathb__() be introduced to return bytes. The thinking is that by making __fspath__() not be polymorphic it could make dealing with the potential string or bytes representations easier. But the general consensus was that returning bytes will more than likely be rare and that the various functions in the os module are the better abstraction to promote over direct calls to __fspath__().
Providing a path attribute

To help deal with the issue of pathlib.PurePath not inheriting from str, originally it was proposed to introduce a path attribute to mirror what os.DirEntry provides. In the end, though, it was determined that a protocol would provide the same result while not directly exposing an API that most people will never need to interact with directly.
Have __fspath__() only return strings

Much of the discussion that led to this PEP revolved around whether __fspath__() should be polymorphic and return bytes as well as str or only return str. The general sentiment for this view was that bytes are difficult to work with due to their inherent lack of information about their encoding and PEP 383 makes it possible to represent all file system paths using str with the surrogateescape handler. Thus, it would be better to forcibly promote the use of str as the low-level path representation for high-level path objects.
In the end, it was decided that using bytes to represent paths is simply not going to go away and thus they should be supported to some degree. The hope is that people will gravitate towards path objects like pathlib and that will move people away from operating directly with bytes.
At one point there was a discussion of developing a generic mechanism to extract a string representation of an object that had semantic meaning (__str__() does not necessarily return anything of semantic significance beyond what may be helpful for debugging). In the end, it was deemed to lack a motivating need beyond the one this PEP is trying to solve in a specific fashion.
It was briefly considered to have __fspath__ be an attribute instead of a method. This was rejected for two reasons. One, historically protocols have been implemented as “magic methods” and not “magic methods and attributes”. Two, there is no guarantee that the lower-level representation of a path object will be pre-computed, potentially misleading users that there was no expensive computation behind the scenes in case the attribute was implemented as a property.
This also indirectly ties into the idea of introducing a path attribute to accomplish the same thing. This idea has an added issue, though, of accidentally having any object with a path attribute meet the protocol’s duck typing. Introducing a new magic method for the protocol helpfully avoids any accidental opting into the protocol.
There was some consideration to providing a generic typing.PathLike class which would allow for e.g. typing.PathLike[str] to specify a type hint for a path object which returned a string representation. While potentially beneficial, the usefulness was deemed too small to bother adding the type hint class.
This also removed any desire to have a class in the typing module which represented the union of all acceptable path-representing types as that can be represented with typing.Union[str, bytes, os.PathLike] easily enough and the hope is users will slowly gravitate to path objects only.
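As a sketch of how that union might be used in an annotation, assuming Python 3.6 or later: the alias name AnyPath and the function describe() are hypothetical and exist only for this example.

```python
import os
import typing as t

# Hypothetical alias for the union mentioned above.
AnyPath = t.Union[str, bytes, os.PathLike]


def describe(path: AnyPath) -> str:
    """Return a str form of any accepted path representation."""
    return os.fsdecode(path)


from_str = describe("notes.txt")
from_bytes = describe(b"notes.txt")
```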
Provide os.fspathb()

It was suggested that to mirror the structure of e.g. os.getcwd()/os.getcwdb(), that os.fspath() only return str and that another function named os.fspathb() be introduced that only returned bytes. This was rejected as the purposes of the *b() functions are tied to querying the file system where there is a need to get the raw bytes back. As this PEP does not work directly with data on a file system (but which may be), the view was taken this distinction is unnecessary. It’s also believed that the need for only bytes will not be common enough to need to support in such a specific manner as os.fsencode() will provide similar functionality.
Call __fspath__() off of the instance

An earlier draft of this PEP had os.fspath() calling path.__fspath__() instead of type(path).__fspath__(path). This was changed to be consistent with how other magic methods in Python are resolved.
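The practical difference between the two lookups can be sketched as follows, assuming Python 3.6 or later. The class name Fixed and the returned strings are hypothetical; the point is that an attribute set on the instance does not affect what os.fspath() returns.

```python
import os


class Fixed:
    """Hypothetical class defining __fspath__() on the type."""

    def __fspath__(self):
        return "from/the/class"


obj = Fixed()

# Shadow the method on the instance only; the type is left untouched.
obj.__fspath__ = lambda: "from/the/instance"

via_protocol = os.fspath(obj)    # resolved through type(obj)
via_instance = obj.__fspath__()  # ordinary attribute lookup on the instance
```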
Thanks to everyone who participated in the various discussions related to this PEP that spanned both python-ideas and python-dev. Special thanks to Stephen Turnbull for direct feedback on early drafts of this PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not only feedback on early drafts of this PEP but also helping to drive the overall discussion on this topic across the two mailing lists.
[3] open() documentation for the C standard library (http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html)
[4] pathlib module (https://docs.python.org/3/library/pathlib.html#module-pathlib)
[5] builtins.open() function (https://docs.python.org/3/library/functions.html#open)
[6] os.fsencode() function (https://docs.python.org/3/library/os.html#os.fsencode)
[7] os.fsdecode() function (https://docs.python.org/3/library/os.html#os.fsdecode)
[8] os.DirEntry class (https://docs.python.org/3/library/os.html#os.DirEntry)
[9] os.path module (https://docs.python.org/3/library/os.path.html#module-os.path)

This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0519.rst
Last modified: 2025-02-01 08:59:27 GMT