Python allows for a variety of stream-like (a.k.a. file-like) objects that can be used via read() and write() calls. Anything that provides read() and write() is stream-like. However, more exotic and extremely useful functions like readline() or seek() may or may not be available on every stream-like object. Python needs a specification for basic byte-based I/O streams to which we can add buffering and text-handling features.
Once we have a defined raw byte-based I/O interface, we can add buffering and text handling layers on top of any byte-based I/O class. The same buffering and text handling logic can be used for files, sockets, byte arrays, or custom I/O classes developed by Python programmers. Developing a standard definition of a stream lets us separate stream-based operations like read() and write() from implementation-specific operations like fileno() and isatty(). It encourages programmers to write code that uses streams as streams and does not require that all streams support file-specific or socket-specific operations.
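As a minimal sketch of what "using streams as streams" might look like, the helper below relies only on read() and write(), so it should work on any object that offers those two calls. The name copy_stream is hypothetical and not part of this specification. It is shown with io.BytesIO from today's io module, which is assumed here to behave like the byte-array stream this proposal describes.

```python
import io

def copy_stream(src, dst, chunk_size=4096):
    """Copy bytes from src to dst using only read() and write().

    Hypothetical helper for illustration; assumes blocking streams whose
    read() returns b"" at end of file.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of file
            break
        # write() may accept fewer bytes than offered on a raw stream,
        # so keep writing until the whole chunk has been consumed.
        view = memoryview(chunk)
        while view:
            written = dst.write(view)
            view = view[written:]
            total += written
    return total

source = io.BytesIO(b"stream-like objects only need read() and write()")
sink = io.BytesIO()
copied = copy_stream(source, sink, chunk_size=8)
```

Because the helper never calls fileno(), seek(), or isatty(), the same function could be pointed at a file, a socket wrapper, or a custom stream class.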
The new I/O spec is intended to be similar to the Java I/O libraries, but generally less confusing. Programmers who don’t want to muck about in the new I/O world can expect that the open() factory method will produce an object backwards-compatible with old-style file objects.
The Python I/O Library will consist of three layers: a raw I/O layer, a buffered I/O layer, and a text I/O layer. Each layer is defined by an abstract base class, which may have multiple implementations. The raw I/O and buffered I/O layers deal with units of bytes, while the text I/O layer deals with units of characters.
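The three layers can be stacked by hand. The sketch below uses the classes of today's io module (FileIO, BufferedReader, TextIOWrapper) on a temporary file. The file contents and variable names are illustrative assumptions, and current library behavior may differ in detail from this text.

```python
import io
import os
import tempfile

# Create a small UTF-8 file to read back through the three layers.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("naïve café\n".encode("utf-8"))

raw = io.FileIO(path, "r")                            # raw layer: units of bytes
buffered = io.BufferedReader(raw)                     # buffered layer: units of bytes
text = io.TextIOWrapper(buffered, encoding="utf-8")   # text layer: units of characters

line = text.readline()
same_buffer = text.buffer is buffered   # the text layer exposes the buffered layer
same_raw = buffered.raw is raw          # the buffered layer exposes the raw layer

text.close()   # closing the top layer also closes the layers beneath it
os.remove(path)
```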
The abstract base class for raw I/O is RawIOBase. It has several methods which are wrappers around the appropriate operating system calls. If one of these functions would not make sense on the object, the implementation must raise an IOError exception. For example, if a file is opened read-only, the .write() method will raise an IOError. As another example, if the object represents a socket, then .seek(), .tell(), and .truncate() will raise an IOError. Generally, a call to one of these functions maps to exactly one operating system call.
.read(n: int) -> bytes
    Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.
.readinto(b: bytes) -> int
    Read up to len(b) bytes from the object and store them in b, returning the number of bytes read. Like .read(), fewer than len(b) bytes may be read, and 0 indicates end of file. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
.write(b: bytes) -> int
    Returns the number of bytes written, which may be < len(b).
.seek(pos: int, whence: int = 0) -> int
.tell() -> int
.truncate(n: int = None) -> int
.close() -> None
Additionally, it defines a few other methods:
.readable() -> bool
    Returns True if the object was opened for reading, False otherwise. If False, .read() will raise an IOError if called.
.writable() -> bool
    Returns True if the object was opened for writing, False otherwise. If False, .write() and .truncate() will raise an IOError if called.
.seekable() -> bool
    Returns True if the object supports random access (such as disk files), or False if the object only supports sequential access (such as sockets, pipes, and ttys). If False, .seek(), .tell(), and .truncate() will raise an IOError if called.
.__enter__() -> ContextManager
    Context management protocol. Returns self.
.__exit__(...) -> None
    Context management protocol. Same as .close().
If and only if a RawIOBase implementation operates on an underlying file descriptor, it must additionally provide a .fileno() member function. This could be defined specifically by the implementation, or a mix-in class could be used (need to decide about this).
.fileno() -> int
    Returns the underlying file descriptor (an integer).
Initially, three implementations will be provided that implement the RawIOBase interface: FileIO, SocketIO (in the socket module), and BytesIO. Each implementation must determine whether the object supports random access, as the information provided by the user may not be sufficient (consider open("/dev/tty", "rw") or open("/tmp/named-pipe", "rw")). As an example, FileIO can determine this by calling the seek() system call; if it returns an error, the object does not support random access. Each implementation may provide additional methods appropriate to its type. The BytesIO object is analogous to Python 2’s cStringIO library, but operating on the new bytes type instead of strings.
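Since a raw read may return fewer bytes than requested, code that works directly at this layer usually loops. The sketch below shows one way to do that with io.FileIO as it exists in today's io module. The helper read_exactly is hypothetical, and it passes a bytearray to readinto() because bytes objects are immutable in current Python, which is an assumption beyond this text.

```python
import io
import os
import tempfile

def read_exactly(raw, n):
    """Read n bytes from a blocking raw stream, tolerating short reads.

    Hypothetical helper; returns fewer than n bytes only at end of file.
    """
    buf = bytearray(n)
    view = memoryview(buf)
    filled = 0
    while filled < n:
        got = raw.readinto(view[filled:])
        if not got:  # 0 means end of file (None would mean "would block")
            break
        filled += got
    return bytes(buf[:filled])

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")
os.close(fd)

with io.FileIO(path, "r") as raw:
    can_write = raw.writable()     # False: the file was opened read-only
    can_seek = raw.seekable()      # True for a regular disk file
    head = read_exactly(raw, 4)
    rest = read_exactly(raw, 100)  # reaches end of file before 100 bytes
os.remove(path)
```

Querying readable(), writable(), and seekable() up front lets a caller avoid the IOError that an unsupported operation would raise.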
The next layer is the Buffered I/O layer, which provides more efficient access to file-like objects. The abstract base class for all Buffered I/O implementations is BufferedIOBase, which provides similar methods to RawIOBase:
.read(n: int = -1) -> bytes
    Returns the next n bytes from the object. It may return fewer than n bytes if end-of-file is reached or the object is non-blocking. 0 bytes indicates end-of-file. This method may make multiple calls to RawIOBase.read() to gather the bytes, or may make no calls to RawIOBase.read() if all of the needed bytes are already buffered.
.readinto(b: bytes) -> int
.write(b: bytes) -> int
    Write b bytes to the buffer. The bytes are not guaranteed to be written to the Raw I/O object immediately; they may be buffered. Returns len(b).
.seek(pos: int, whence: int = 0) -> int
.tell() -> int
.truncate(pos: int = None) -> int
.flush() -> None
.close() -> None
.readable() -> bool
.writable() -> bool
.seekable() -> bool
.__enter__() -> ContextManager
.__exit__(...) -> None
Additionally, the abstract base class provides one member variable:
.raw
    A reference to the underlying RawIOBase object.
The BufferedIOBase method signatures are mostly identical to those of RawIOBase (exceptions: write() returns None, read()’s argument is optional), but may have different semantics. In particular, BufferedIOBase implementations may read more data than requested or delay writing data using buffers. For the most part, this will be transparent to the user (unless, for example, they open the same file through a different descriptor). Also, raw reads may return a short read without any particular reason; buffered reads will only return a short read if EOF is reached; and raw writes may return a short count (even when non-blocking I/O is not enabled!), while buffered writes will raise IOError when not all bytes could be written or buffered.
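The difference between raw and buffered short reads can be illustrated with a deliberately stingy raw stream. In the sketch below, ChunkyRaw is a hypothetical class built on today's io.RawIOBase that hands out at most three bytes per call. Wrapping it in io.BufferedReader is assumed to follow the semantics described above.

```python
import io

class ChunkyRaw(io.RawIOBase):
    """Hypothetical raw stream that returns at most `chunk` bytes per call."""

    def __init__(self, data, chunk=3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk = chunk
        self.calls = 0  # how many times readinto() was invoked

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        n = min(self._chunk, len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n  # 0 signals end of file

payload = b"0123456789abcdef"

# A raw read may come back short for no particular reason.
raw_result = ChunkyRaw(payload).read(10)

# A buffered read keeps calling the raw object until the request is
# satisfied or end of file is reached.
raw = ChunkyRaw(payload)
buffered = io.BufferedReader(raw)
buffered_result = buffered.read(10)
```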
There are four implementations of the BufferedIOBase abstract base class, described below.
BufferedReader
    The BufferedReader implementation is for sequential-access read-only objects. Its .flush() method is a no-op.
BufferedWriter
    The BufferedWriter implementation is for sequential-access write-only objects. Its .flush() method forces all cached data to be written to the underlying RawIOBase object.
BufferedRWPair
    The BufferedRWPair implementation is for sequential-access read-write objects such as sockets and ttys. As the read and write streams of these objects are completely independent, it could be implemented by simply incorporating a BufferedReader and BufferedWriter instance. It provides a .flush() method that has the same semantics as a BufferedWriter’s .flush() method.
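A brief sketch of the pairing idea, using io.BufferedRWPair from today's io module. Two io.BytesIO objects stand in for the independent read and write sides of a socket or tty; that substitution is an assumption made only to keep the example self-contained.

```python
import io

incoming = io.BytesIO(b"request\n")   # stands in for the read side
outgoing = io.BytesIO()               # stands in for the write side

pair = io.BufferedRWPair(incoming, outgoing)
received = pair.read(8)

pair.write(b"response\n")   # may sit in the write buffer for now
pair.flush()                # forces buffered data out to the write side
sent = outgoing.getvalue()
```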
BufferedRandom
    The BufferedRandom implementation is for all random-access objects, whether they are read-only, write-only, or read-write. Compared to the previous classes that operate on sequential-access objects, the BufferedRandom class must contend with the user calling .seek() to reposition the stream. Therefore, an instance of BufferedRandom must keep track of both the logical and true position within the object. It provides a .flush() method that forces all cached write data to be written to the underlying RawIOBase object and all cached read data to be forgotten (so that future reads are forced to go back to the disk).
    Q: Do we want to mandate in the specification that switching between reading and writing on a read-write object implies a .flush()? Or is that an implementation convenience that users should not rely on?
    For a read-only BufferedRandom object, .writable() returns False and the .write() and .truncate() methods throw IOError.
    For a write-only BufferedRandom object, .readable() returns False and the .read() method throws IOError.
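The distinction between the logical and the true position can be observed with io.BufferedRandom from today's io module. In this sketch an io.BytesIO object stands in for a seekable raw object, which is an assumption made for brevity. How far the underlying object is read ahead depends on the buffer size and the implementation.

```python
import io

raw = io.BytesIO(b"0123456789")   # stand-in for a seekable raw object
f = io.BufferedRandom(raw)

first = f.read(1)
logical_pos = f.tell()   # position as the user sees it
true_pos = raw.tell()    # position of the underlying object after read-ahead

f.seek(8)
f.write(b"XY")           # held in the buffer until flushed
f.flush()
result = raw.getvalue()
```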
The text I/O layer provides functions to read and write strings from streams. Some new features include universal newlines and character set encoding and decoding. The Text I/O layer is defined by a TextIOBase abstract base class. It provides several methods that are similar to the BufferedIOBase methods, but operate on a per-character basis instead of a per-byte basis. These methods are:
.read(n: int = -1) -> str
.write(s: str) -> int
.tell() -> object
    Return a cookie describing the current file position. The only supported use for the cookie is with .seek() with whence set to 0 (i.e. absolute seek).
.seek(pos: object, whence: int = 0) -> int
    Seek to position pos. If pos is non-zero, it must be a cookie returned from .tell() and whence must be zero.
.truncate(pos: object = None) -> int
    Like BufferedIOBase.truncate(), except that pos (if not None) must be a cookie previously returned by .tell().
Unlike with raw I/O, the units for .seek() are not specified: some implementations (e.g. StringIO) use characters and others (e.g. TextIOWrapper) use bytes. The special case for zero is to allow going to the start or end of a stream without a prior .tell(). An implementation could include stream encoder state in the cookie returned from .tell().
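The cookie contract can be sketched with io.TextIOWrapper from today's io module. The sample text is an illustrative assumption. The code treats the value returned by tell() as opaque and only ever hands it back to seek().

```python
import io

encoded = "héllo\nwörld\n".encode("utf-8")
text = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8")

first = text.readline()
cookie = text.tell()        # opaque: do not do arithmetic on it
second = text.readline()

text.seek(cookie)           # absolute seek back to the saved position
second_again = text.readline()

text.seek(0)                # zero is allowed without a prior tell()
first_again = text.readline()
```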
TextIOBase implementations also provide several methods that are pass-throughs to the underlying BufferedIOBase objects:
.flush() -> None
.close() -> None
.readable() -> bool
.writable() -> bool
.seekable() -> bool
TextIOBase class implementations additionally provide the following methods:
.readline() -> str
    Read until newline or EOF and return the line, or "" if EOF hit immediately.
.__iter__() -> Iterator
    Returns an iterator that returns lines from the file (which happens to be self).
.next() -> str
    Same as readline() except raises StopIteration if EOF hit immediately.
Two implementations will be provided by the Python library. The primary implementation, TextIOWrapper, wraps a Buffered I/O object. Each TextIOWrapper object has a property named “.buffer” that provides a reference to the underlying BufferedIOBase object. Its initializer has the following signature:
.__init__(self, buffer, encoding=None, errors=None, newline=None, line_buffering=False)
    buffer is a reference to the BufferedIOBase object to be wrapped with the TextIOWrapper.
    encoding refers to an encoding to be used for translating between the byte-representation and character-representation. If it is None, then the system’s locale setting will be used as the default.
    errors is an optional string indicating error handling. It may be set whenever encoding may be set. It defaults to 'strict'.
    newline can be None, '', '\n', '\r', or '\r\n'; all other values are illegal. It controls the handling of line endings. It works as follows:
    - On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated. (In other words, translation to '\n' only occurs if newline is None.)
    - On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string. (Note that the rules guiding translation are different for output than for input.)
    line_buffering, if True, causes write() calls to imply a flush() if the string written contains at least one '\n' or '\r' character. This is set by open() when it detects that the underlying stream is a TTY device, or when a buffering argument of 1 is passed.
    Further notes on the newline parameter:
    - '\r' support is still needed for some OSX applications that produce files using '\r' line endings; Excel (when exporting to text) and Adobe Illustrator EPS files are the most common examples.
    - If translation is enabled, it happens regardless of which method is called for reading or writing. For example, f.read() will always produce the same result as ''.join(f.readlines()).
    - If universal newlines without translation are requested on input (i.e. newline=''), if a system read operation returns a buffer ending in '\r', another system read operation is done to determine whether it is followed by '\n' or not. In universal newlines mode with translation, the second system read operation may be postponed until the next read request, and if the following system read operation returns a buffer starting with '\n', that character is simply discarded.
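The newline rules above can be exercised with io.TextIOWrapper from today's io module. The sample bytes and the helper name reader are illustrative assumptions; the encoding is fixed to ASCII so the result does not depend on the locale.

```python
import io

data = b"a\r\nb\rc\n"

def reader(newline):
    """Hypothetical helper: wrap the sample bytes with the given newline mode."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)

translated = reader(None).read()        # universal newlines, translated to "\n"
untranslated = reader("").readlines()   # universal newlines, endings preserved
cr_only = reader("\r").readlines()      # only "\r" terminates a line

# On output, "\n" is translated to the requested line ending.
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="ascii", newline="\r\n")
writer.write("x\ny\n")
writer.flush()
written = out.getvalue()
```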
Another implementation, StringIO, creates a file-like TextIO implementation without an underlying Buffered I/O object. While similar functionality could be provided by wrapping a BytesIO object in a TextIOWrapper, the StringIO object allows for much greater efficiency as it does not need to actually perform encoding and decoding. A String I/O object can just store the encoded string as-is. The StringIO object’s __init__ signature takes an optional string specifying the initial value; the initial position is always 0. It does not support encodings or newline translations; you always read back exactly the characters you wrote.
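A short sketch of this behavior using io.StringIO from today's io module with its default arguments. The current class accepts a newline argument that is not described here, so the example leaves it at its default. The strings are illustrative.

```python
import io

s = io.StringIO("first line\n")
start = s.tell()            # the initial position is the start of the value

s.seek(0, io.SEEK_END)      # move to the end before appending
s.write("second line\r\n")  # written as-is, no newline translation

s.seek(0)
read_back = s.read()
```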
We should allow changing the encoding and error-handling setting later. The behavior of Text I/O operations in the face of Unicode problems and ambiguities (e.g. diacritics, surrogates, invalid bytes in an encoding) should be the same as that of the unicode encode()/decode() methods. UnicodeError may be raised.
Implementation note: we should be able to reuse much of the infrastructure provided by the codecs module. If it doesn’t provide the exact APIs we need, we should refactor it to avoid reinventing the wheel.
Non-blocking I/O is fully supported on the Raw I/O level only. If a raw object is in non-blocking mode and an operation would block, then .read() and .readinto() return None, while .write() returns 0. In order to put an object in non-blocking mode, the user must extract the fileno and do it by hand.
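The raw-level behavior can be sketched with a pipe. The example assumes a POSIX-like platform and uses os.set_blocking() from today's os module as the "by hand" step on the extracted file descriptor; neither is part of this specification.

```python
import io
import os

read_fd, write_fd = os.pipe()
raw = io.FileIO(read_fd, "r")

# Put the object in non-blocking mode by hand, via its file descriptor.
os.set_blocking(raw.fileno(), False)

empty = raw.read(16)      # nothing written yet: would block, so None
os.write(write_fd, b"ping")
data = raw.read(16)       # data is available now

os.close(write_fd)
eof = raw.read(16)        # writer closed and pipe drained: end of file
raw.close()
```

The three results show why callers need to tell None (no data yet) apart from an empty bytes object (end of file).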
At the Buffered I/O and Text I/O layers, if a read or write fails due to a non-blocking condition, they raise an IOError with errno set to EAGAIN.
Originally, we considered propagating up the Raw I/O behavior, but many corner cases and problems were raised. To address these issues, significant changes would need to have been made to the Buffered I/O and Text I/O layers. For example, what should .flush() do on a Buffered non-blocking object? How would the user instruct the object to “Write as much as you can from your buffer, but don’t block”? A non-blocking .flush() that doesn’t necessarily flush all available data is counter-intuitive. Since non-blocking and blocking objects would have such different semantics at these layers, it was agreed to abandon efforts to combine them into a single type.
open() Built-in Function
The open() built-in function is specified by the following pseudo-code:
    def open(filename, mode="r", buffering=None, *,
             encoding=None, errors=None, newline=None):
        assert isinstance(filename, (str, int))
        assert isinstance(mode, str)
        assert buffering is None or isinstance(buffering, int)
        assert encoding is None or isinstance(encoding, str)
        assert newline in (None, "", "\n", "\r", "\r\n")
        modes = set(mode)
        if modes - set("arwb+t") or len(mode) > len(modes):
            raise ValueError("invalid mode: %r" % mode)
        reading = "r" in modes
        writing = "w" in modes
        binary = "b" in modes
        appending = "a" in modes
        updating = "+" in modes
        text = "t" in modes or not binary
        if text and binary:
            raise ValueError("can't have text and binary mode at once")
        if reading + writing + appending > 1:
            raise ValueError("can't have read/write/append mode at once")
        if not (reading or writing or appending):
            raise ValueError("must have exactly one of read/write/append mode")
        if binary and encoding is not None:
            raise ValueError("binary modes doesn't take an encoding arg")
        if binary and errors is not None:
            raise ValueError("binary modes doesn't take an errors arg")
        if binary and newline is not None:
            raise ValueError("binary modes doesn't take a newline arg")
        # XXX Need to spec the signature for FileIO()
        raw = FileIO(filename, mode)
        line_buffering = (buffering == 1 or buffering is None and raw.isatty())
        if line_buffering or buffering is None:
            buffering = 8*1024  # International standard buffer size
            # XXX Try setting it to fstat().st_blksize
        if buffering < 0:
            raise ValueError("invalid buffering size")
        if buffering == 0:
            if binary:
                return raw
            raise ValueError("can't have unbuffered text I/O")
        if updating:
            buffer = BufferedRandom(raw, buffering)
        elif writing or appending:
            buffer = BufferedWriter(raw, buffering)
        else:
            assert reading
            buffer = BufferedReader(raw, buffering)
        if binary:
            return buffer
        assert text
        return TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
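The layer selection in the pseudo-code can be observed through the built-in open() of current Python, which is assumed here to follow the same branching. The helper layer_for and the temporary file are illustrative only.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

def layer_for(mode, **kwargs):
    """Return the class name of the object open() produces for these arguments."""
    with open(path, mode, **kwargs) as f:
        return type(f).__name__

layers = {
    "rb, buffering=0": layer_for("rb", buffering=0),   # raw object returned directly
    "rb": layer_for("rb"),                             # read-only buffered
    "wb": layer_for("wb"),                             # write-only buffered
    "r+b": layer_for("r+b"),                           # updating: random access
    "r": layer_for("r", encoding="utf-8"),             # text wrapper on top
}

# Unbuffered text I/O is rejected.
try:
    open(path, "r", buffering=0)
    unbuffered_text_error = None
except ValueError as exc:
    unbuffered_text_error = str(exc)
os.remove(path)
```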
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-3116.rst
Last modified: 2025-02-01 08:59:27 GMT