PEP 3116 – New I/O

Author: Daniel Stutzbach <daniel at stutzbachenterprises.com>, Guido van Rossum <guido at python.org>, Mike Verdone <mike.verdone at gmail.com>
Status:

Rationale and Goals

Python allows for a variety of stream-like (a.k.a. file-like) objects that can be used via read() and write() calls. Anything that provides read() and write() is stream-like. However, more exotic and extremely useful functions like readline() or seek() may or may not be available on every stream-like object. Python needs a specification for basic byte-based I/O streams to which we can add buffering and text-handling features.

Once we have a defined raw byte-based I/O interface, we can add buffering and text handling layers on top of any byte-based I/O class. The same buffering and text handling logic can be used for files, sockets, byte arrays, or custom I/O classes developed by Python programmers. Developing a standard definition of a stream lets us separate stream-based operations like read() and write() from implementation-specific operations like fileno() and isatty(). It encourages programmers to write code that uses streams as streams, rather than requiring that all streams support file-specific or socket-specific operations.

The new I/O spec is intended to be similar to the Java I/O libraries, but generally less confusing. Programmers who don’t want to muck about in the new I/O world can expect that the open() factory method will produce an object backwards-compatible with old-style file objects.

Specification

The Python I/O Library will consist of three layers: a raw I/O layer, a buffered I/O layer, and a text I/O layer. Each layer is defined by an abstract base class, which may have multiple implementations. The raw I/O and buffered I/O layers deal with units of bytes, while the text I/O layer deals with units of characters.
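As a rough illustration of the three layers, the sketch below stacks them by hand over a temporary file. It assumes the class names that later shipped in Python 3's io module (io.FileIO, io.BufferedReader, io.TextIOWrapper); the helper name stack_layers is hypothetical and not part of the proposal.

```python
import io
import os
import tempfile


def stack_layers(payload: bytes, encoding: str = "utf-8"):
    """Build raw -> buffered -> text layers by hand and read through the top one."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.close(fd)
        raw = io.FileIO(path, "r")           # raw layer: units of bytes
        buffered = io.BufferedReader(raw)    # buffered layer: bytes plus a read buffer
        text = io.TextIOWrapper(buffered, encoding=encoding)  # text layer: characters
        with text:                           # closing the top layer closes the ones below
            content = text.read()
            linked = text.buffer is buffered and buffered.raw is raw
        return content, linked, raw.closed
    finally:
        os.remove(path)
```

Each layer only talks to the one directly beneath it, which is why the same buffered and text classes can sit on top of any raw object.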

Raw I/O

The abstract base class for raw I/O is RawIOBase. It has several methods which are wrappers around the appropriate operating system calls. If one of these functions would not make sense on the object, the implementation must raise an IOError exception. For example, if a file is opened read-only, the .write() method will raise an IOError. As another example, if the object represents a socket, then .seek(), .tell(), and .truncate() will raise an IOError. Generally, a call to one of these functions maps to exactly one operating system call.

.read(n: int) -> bytes
Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.
.readinto(b: bytes) -> int
Read up to len(b) bytes from the object and store them in b, returning the number of bytes read. Like .read, fewer than len(b) bytes may be read, and 0 indicates end of file. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
.write(b: bytes) -> int
Returns the number of bytes written, which may be < len(b).
.seek(pos: int, whence: int = 0) -> int
.tell() -> int
.truncate(n: int = None) -> int
.close() -> None

Additionally, it defines a few other methods:

.readable() -> bool
Returns True if the object was opened for reading, False otherwise. If False, .read() will raise an IOError if called.
.writable() -> bool
Returns True if the object was opened for writing, False otherwise. If False, .write() and .truncate() will raise an IOError if called.
.seekable() -> bool
Returns True if the object supports random access (such as disk files), or False if the object only supports sequential access (such as sockets, pipes, and ttys). If False, .seek(), .tell(), and .truncate() will raise an IOError if called.
.__enter__() -> ContextManager
Context management protocol. Returns self.
.__exit__(...) -> None
Context management protocol. Same as .close().

If and only if a RawIOBase implementation operates on an underlying file descriptor, it must additionally provide a .fileno() member function. This could be defined specifically by the implementation, or a mix-in class could be used (need to decide about this).

.fileno() -> int
Returns the underlying file descriptor (an integer).

Initially, three implementations will be provided that implement the RawIOBase interface: FileIO, SocketIO (in the socket module), and BytesIO. Each implementation must determine whether the object supports random access, as the information provided by the user may not be sufficient (consider open("/dev/tty", "rw") or open("/tmp/named-pipe", "rw")). As an example, FileIO can determine this by calling the seek() system call; if it returns an error, the object does not support random access. Each implementation may provide additional methods appropriate to its type. The BytesIO object is analogous to Python 2’s cStringIO library, but operating on the new bytes type instead of strings.
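The raw-layer contract described above can be exercised with a short sketch. It assumes the io.FileIO class as shipped in Python 3 and a hypothetical helper named raw_demo. In current CPython the error raised for an unsupported operation is io.UnsupportedOperation, a subclass of OSError (of which IOError is now an alias), so the sketch catches OSError.

```python
import io
import os
import tempfile


def raw_demo():
    """Write a file with raw I/O, then read it back read-only."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.FileIO(path, "w") as raw:
            raw.write(b"hello world")        # returns a count that may be short in general
        with io.FileIO(path, "r") as raw:
            caps = (raw.readable(), raw.writable(), raw.seekable())
            first = raw.read(5)              # up to 5 bytes
            buf = bytearray(16)              # caller-supplied buffer; its length never changes
            n = raw.readinto(buf)            # number of bytes actually stored
            eof = raw.read(1)                # empty bytes object signals end of file
            try:
                raw.write(b"x")              # not opened for writing
                write_failed = False
            except OSError:
                write_failed = True
        return caps, first, bytes(buf[:n]), eof, write_failed
    finally:
        os.remove(path)
```

Note that readinto() is passed a mutable bytearray here; the proposal's signature says bytes because it was written when the new bytes type was still mutable.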

Buffered I/O

The next layer is the Buffered I/O layer which provides more efficient access to file-like objects. The abstract base class for all Buffered I/O implementations is BufferedIOBase, which provides similar methods to RawIOBase:

.read(n: int = -1) -> bytes
Returns the next n bytes from the object. It may return fewer than n bytes if end-of-file is reached or the object is non-blocking. 0 bytes indicates end-of-file. This method may make multiple calls to RawIOBase.read() to gather the bytes, or may make no calls to RawIOBase.read() if all of the needed bytes are already buffered.
.readinto(b: bytes) -> int
.write(b: bytes) -> int
Write b bytes to the buffer. The bytes are not guaranteed to be written to the Raw I/O object immediately; they may be buffered. Returns len(b).
.seek(pos: int, whence: int = 0) -> int
.tell() -> int
.truncate(pos: int = None) -> int
.flush() -> None
.close() -> None
.readable() -> bool
.writable() -> bool
.seekable() -> bool
.__enter__() -> ContextManager
.__exit__(...) -> None

Additionally, the abstract base class provides one member variable:

.raw
A reference to the underlying RawIOBase object.

The BufferedIOBase method signatures are mostly identical to those of RawIOBase (exceptions: write() returns None, read()’s argument is optional), but may have different semantics. In particular, BufferedIOBase implementations may read more data than requested or delay writing data using buffers. For the most part, this will be transparent to the user (unless, for example, they open the same file through a different descriptor). Also, raw reads may return a short read without any particular reason; buffered reads will only return a short read if EOF is reached; and raw writes may return a short count (even when non-blocking I/O is not enabled!), while buffered writes will raise IOError when not all bytes could be written or buffered.
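The difference between short raw reads and complete buffered reads can be sketched with a deliberately stingy raw object. ChunkyRaw and compare_reads are hypothetical names invented for this illustration; io.RawIOBase and io.BufferedReader are the classes as shipped in Python 3.

```python
import io


class ChunkyRaw(io.RawIOBase):
    """Hypothetical raw stream that hands out at most `chunk` bytes per call."""

    def __init__(self, data: bytes, chunk: int = 3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk = chunk
        self.calls = 0                       # how many raw reads were issued

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        n = min(self._chunk, len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n                             # 0 means end of file


def compare_reads():
    payload = b"0123456789abcdef"
    short = ChunkyRaw(payload).read(10)      # one raw call, so a short read
    raw = ChunkyRaw(payload)
    full = io.BufferedReader(raw).read(10)   # buffered layer keeps asking until satisfied
    return short, full, raw.calls
```

The raw read returns only what one call produced, while the buffered read issues several raw reads behind the scenes to gather all ten bytes.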

There are four implementations of the BufferedIOBase abstract base class, described below.

`BufferedReader`

The BufferedReader implementation is for sequential-access read-only objects. Its .flush() method is a no-op.

`BufferedWriter`

The BufferedWriter implementation is for sequential-access write-only objects. Its .flush() method forces all cached data to be written to the underlying RawIOBase object.
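A minimal sketch of the delayed-write behaviour, assuming io.BufferedWriter as shipped in Python 3 and a hypothetical RecordingRaw class that simply remembers every chunk the buffered layer hands down:

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw sink that records each write it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)                        # pretend everything was accepted


def flush_demo():
    raw = RecordingRaw()
    writer = io.BufferedWriter(raw, buffer_size=64)
    accepted = writer.write(b"abc")          # small write stays in the buffer
    before = list(raw.chunks)                # snapshot before flushing
    writer.flush()                           # forces cached data down to the raw object
    after = list(raw.chunks)
    return accepted, before, after
```

Nothing reaches the raw object until flush() is called (or the buffer fills, or the writer is closed).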

`BufferedRWPair`

The BufferedRWPair implementation is for sequential-access read-write objects such as sockets and ttys. As the read and write streams of these objects are completely independent, it could be implemented by simply incorporating a BufferedReader and BufferedWriter instance. It provides a .flush() method that has the same semantics as a BufferedWriter’s .flush() method.

`BufferedRandom`

The BufferedRandom implementation is for all random-access objects, whether they are read-only, write-only, or read-write. Compared to the previous classes that operate on sequential-access objects, the BufferedRandom class must contend with the user calling .seek() to reposition the stream. Therefore, an instance of BufferedRandom must keep track of both the logical and true position within the object. It provides a .flush() method that forces all cached write data to be written to the underlying RawIOBase object and all cached read data to be forgotten (so that future reads are forced to go back to the disk).

Q: Do we want to mandate in the specification that switching between reading and writing on a read-write object implies a .flush()? Or is that an implementation convenience that users should not rely on?

For a read-only BufferedRandom object, .writable() returns False and the .write() and .truncate() methods raise IOError.

For a write-only BufferedRandom object, .readable() returns False and the .read() method raises IOError.
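The interplay of seeking, reading, and writing on a random-access object can be sketched as follows. This assumes io.BufferedRandom and io.FileIO as shipped in Python 3; random_access_demo is a hypothetical helper name.

```python
import io
import os
import tempfile


def random_access_demo():
    """Write, seek, read, and overwrite through one buffered read-write object."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # "w+" opens the raw file for both reading and writing.
        with io.BufferedRandom(io.FileIO(path, "w+")) as f:
            f.write(b"hello world")
            f.seek(6)                        # reposition the logical stream position
            tail = f.read(5)
            f.seek(6)
            f.write(b"W")                    # overwrite a single byte in place
            f.seek(0)
            whole = f.read()
            caps = (f.readable(), f.writable(), f.seekable())
        return tail, whole, caps
    finally:
        os.remove(path)
```

The sketch always seeks between a write and a read, so it does not depend on the answer to the open question about an implicit flush when switching direction.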

Text I/O

The text I/O layer provides functions to read and write strings from streams. Some new features include universal newlines and character set encoding and decoding. The Text I/O layer is defined by a TextIOBase abstract base class. It provides several methods that are similar to the BufferedIOBase methods, but operate on a per-character basis instead of a per-byte basis. These methods are:

.read(n: int = -1) -> str
.write(s: str) -> int
.tell() -> object
Return a cookie describing the current file position. The only supported use for the cookie is with .seek() with whence set to 0 (i.e. absolute seek).
.seek(pos: object, whence: int = 0) -> int
Seek to position pos. If pos is non-zero, it must be a cookie returned from .tell() and whence must be zero.
.truncate(pos: object = None) -> int
Like BufferedIOBase.truncate(), except that pos (if not None) must be a cookie previously returned by .tell().

Unlike with raw I/O, the units for .seek() are not specified - some implementations (e.g. StringIO) use characters and others (e.g. TextIOWrapper) use bytes. The special case for zero is to allow going to the start or end of a stream without a prior .tell(). An implementation could include stream encoder state in the cookie returned from .tell().
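A small sketch of the cookie protocol, assuming io.TextIOWrapper and io.BytesIO as shipped in Python 3 (reread_from_cookie is a hypothetical helper). The cookie is treated as opaque: it is only ever passed back to .seek() unchanged.

```python
import io


def reread_from_cookie():
    """Remember a position with tell() and return to it with an absolute seek()."""
    data = "héllo\nwörld\n".encode("utf-8")  # multi-byte characters: chars != bytes
    text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
    first = text.readline()
    cookie = text.tell()                     # opaque position marker
    second = text.readline()
    text.seek(cookie)                        # whence defaults to 0 (absolute)
    again = text.readline()
    text.seek(0)                             # zero is allowed without a prior tell()
    start = text.readline()
    return first, second, again, start
```

Arithmetic on the cookie is deliberately avoided, since its units differ between implementations.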

TextIOBase implementations also provide several methods that are pass-throughs to the underlying BufferedIOBase objects:

.flush() -> None
.close() -> None
.readable() -> bool
.writable() -> bool
.seekable() -> bool

TextIOBase class implementations additionally provide the following methods:

.readline() -> str
Read until newline or EOF and return the line, or "" if EOF hit immediately.
.__iter__() -> Iterator
Returns an iterator that returns lines from the file (which happens to be self).
.next() -> str
Same as readline() except raises StopIteration if EOF hit immediately.
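The line-oriented methods can be sketched with io.StringIO as shipped in Python 3. One difference from the text above: in released Python 3 the iterator method is spelled __next__() and is normally invoked through the built-in next(). The helper name line_reading_demo is hypothetical.

```python
import io


def line_reading_demo():
    """Contrast readline() at EOF with iteration at EOF."""
    stream = io.StringIO("one\ntwo\n")
    first = stream.readline()                # includes the trailing newline
    rest = list(stream)                      # iterating yields the remaining lines
    at_eof = stream.readline()               # empty string signals EOF
    try:
        next(stream)                         # iterator protocol raises instead
        stopped = False
    except StopIteration:
        stopped = True
    return first, rest, at_eof, stopped
```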

Two implementations will be provided by the Python library. The primary implementation, TextIOWrapper, wraps a Buffered I/O object. Each TextIOWrapper object has a property named “.buffer” that provides a reference to the underlying BufferedIOBase object. Its initializer has the following signature:

.__init__(self, buffer, encoding=None, errors=None, newline=None, line_buffering=False)
buffer is a reference to the BufferedIOBase object to be wrapped with the TextIOWrapper.
encoding refers to an encoding to be used for translating between the byte-representation and character-representation. If it is None, then the system’s locale setting will be used as the default.
errors is an optional string indicating error handling. It may be set whenever encoding may be set. It defaults to 'strict'.
newline can be None, '', '\n', '\r', or '\r\n'; all other values are illegal. It controls the handling of line endings. It works as follows:
On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated. (In other words, translation to '\n' only occurs if newline is None.)
On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string. (Note that the rules guiding translation are different for output than for input.)
line_buffering, if True, causes write() calls to imply a flush() if the string written contains at least one '\n' or '\r' character. This is set by open() when it detects that the underlying stream is a TTY device, or when a buffering argument of 1 is passed.
Further notes on the newline parameter:
'\r' support is still needed for some OSX applications that produce files using '\r' line endings; Excel (when exporting to text) and Adobe Illustrator EPS files are the most common examples.
If translation is enabled, it happens regardless of which method is called for reading or writing. For example, f.read() will always produce the same result as ''.join(f.readlines()).
If universal newlines without translation are requested on input (i.e. newline=''), if a system read operation returns a buffer ending in '\r', another system read operation is done to determine whether it is followed by '\n' or not. In universal newlines mode with translation, the second system read operation may be postponed until the next read request, and if the following system read operation returns a buffer starting with '\n', that character is simply discarded.
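The newline rules above are easier to follow with concrete input. The sketch below assumes io.TextIOWrapper as shipped in Python 3; SAMPLE, read_lines, and written_bytes are hypothetical names chosen for the illustration.

```python
import io

# One line ending of each kind: '\r\n', then '\r', then '\n'.
SAMPLE = b"a\r\nb\rc\n"


def read_lines(newline):
    """Read SAMPLE through a text wrapper configured with the given newline mode."""
    wrapper = io.TextIOWrapper(io.BytesIO(SAMPLE), encoding="ascii", newline=newline)
    return wrapper.readlines()


def written_bytes(text, newline):
    """Write text through a wrapper and return the bytes that reached the buffer."""
    sink = io.BytesIO()
    wrapper = io.TextIOWrapper(sink, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()                          # push pending text down to the byte layer
    return sink.getvalue()


universal_translated = read_lines(None)      # all endings recognised and turned into '\n'
universal_untranslated = read_lines("")      # all endings recognised, returned as-is
only_lf = read_lines("\n")                   # only '\n' terminates a line
crlf_output = written_bytes("x\ny\n", "\r\n")  # '\n' translated on the way out
raw_output = written_bytes("x\ny\n", "")       # no translation on output
```

The output case with newline=None is left out of the sketch because its result depends on os.linesep for the platform running it.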

Another implementation, StringIO, creates a file-like TextIO implementation without an underlying Buffered I/O object. While similar functionality could be provided by wrapping a BytesIO object in a TextIOWrapper, the StringIO object allows for much greater efficiency as it does not need to actually perform encoding and decoding. A String I/O object can just store the encoded string as-is. The StringIO object’s __init__ signature takes an optional string specifying the initial value; the initial position is always 0. It does not support encodings or newline translations; you always read back exactly the characters you wrote.
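A brief sketch of the "initial position is always 0" rule, using io.StringIO as shipped in Python 3 (stringio_demo is a hypothetical helper). Because the position starts at 0 even when an initial value is given, the first write overwrites rather than appends.

```python
import io


def stringio_demo():
    """Show that an initial value does not move the starting position."""
    s = io.StringIO("hello")
    start = s.tell()                         # position before any I/O
    s.write("J")                             # overwrites the first character
    s.seek(0)
    return start, s.read(), s.getvalue()
```

To append to an initial value instead, a caller would first seek to the end of the stream.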

Unicode encoding/decoding Issues

We should allow changing the encoding and error-handling setting later. The behavior of Text I/O operations in the face of Unicode problems and ambiguities (e.g. diacritics, surrogates, invalid bytes in an encoding) should be the same as that of the unicode encode()/decode() methods. UnicodeError may be raised.

Implementation note: we should be able to reuse much of the infrastructure provided by the codecs module. If it doesn’t provide the exact APIs we need, we should refactor it to avoid reinventing the wheel.

Non-blocking I/O

Non-blocking I/O is fully supported on the Raw I/O level only. If a raw object is in non-blocking mode and an operation would block, then .read() and .readinto() return None, while .write() returns 0. In order to put an object in non-blocking mode, the user must extract the fileno and do it by hand.
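The raw-level read behaviour can be sketched with a pipe. This assumes a POSIX-like platform where os.set_blocking() works on pipe descriptors, and uses io.FileIO as shipped in Python 3; nonblocking_read_demo is a hypothetical helper name. Only the read side is shown.

```python
import io
import os


def nonblocking_read_demo():
    """Distinguish 'no data yet' (None) from end of file (b'') on a raw object."""
    r_fd, w_fd = os.pipe()
    raw = None
    try:
        os.set_blocking(r_fd, False)         # done by hand on the file descriptor
        raw = io.FileIO(r_fd, "r", closefd=False)
        empty = raw.read(16)                 # nothing written yet: would block
        os.write(w_fd, b"ping")
        data = raw.read(16)                  # data is now available
        os.close(w_fd)
        w_fd = None
        eof = raw.read(16)                   # writer closed and pipe drained
        return empty, data, eof
    finally:
        if raw is not None:
            raw.close()                      # closefd=False leaves r_fd open
        os.close(r_fd)
        if w_fd is not None:
            os.close(w_fd)
```

The three return values show why None and an empty bytes object must be kept distinct: the first means "try again later", the second means the stream has ended.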

At the Buffered I/O and Text I/O layers, if a read or write fails due to a non-blocking condition, they raise an IOError with errno set to EAGAIN.

Originally, we considered propagating up the Raw I/O behavior, but many corner cases and problems were raised. To address these issues, significant changes would need to have been made to the Buffered I/O and Text I/O layers. For example, what should .flush() do on a Buffered non-blocking object? How would the user instruct the object to “Write as much as you can from your buffer, but don’t block”? A non-blocking .flush() that doesn’t necessarily flush all available data is counter-intuitive. Since non-blocking and blocking objects would have such different semantics at these layers, it was agreed to abandon efforts to combine them into a single type.

The `open()` Built-in Function

The open() built-in function is specified by the following pseudo-code:

    def open(filename, mode="r", buffering=None, *,
             encoding=None, errors=None, newline=None):
        assert isinstance(filename, (str, int))
        assert isinstance(mode, str)
        assert buffering is None or isinstance(buffering, int)
        assert encoding is None or isinstance(encoding, str)
        assert newline in (None, "", "\n", "\r", "\r\n")
        modes = set(mode)
        if modes - set("arwb+t") or len(mode) > len(modes):
            raise ValueError("invalid mode: %r" % mode)
        reading = "r" in modes
        writing = "w" in modes
        binary = "b" in modes
        appending = "a" in modes
        updating = "+" in modes
        text = "t" in modes or not binary
        if text and binary:
            raise ValueError("can't have text and binary mode at once")
        if reading + writing + appending > 1:
            raise ValueError("can't have read/write/append mode at once")
        if not (reading or writing or appending):
            raise ValueError("must have exactly one of read/write/append mode")
        if binary and encoding is not None:
            raise ValueError("binary modes doesn't take an encoding arg")
        if binary and errors is not None:
            raise ValueError("binary modes doesn't take an errors arg")
        if binary and newline is not None:
            raise ValueError("binary modes doesn't take a newline arg")
        # XXX Need to spec the signature for FileIO()
        raw = FileIO(filename, mode)
        line_buffering = (buffering == 1 or buffering is None and raw.isatty())
        if line_buffering or buffering is None:
            buffering = 8*1024  # International standard buffer size
            # XXX Try setting it to fstat().st_blksize
        if buffering < 0:
            raise ValueError("invalid buffering size")
        if buffering == 0:
            if binary:
                return raw
            raise ValueError("can't have unbuffered text I/O")
        if updating:
            buffer = BufferedRandom(raw, buffering)
        elif writing or appending:
            buffer = BufferedWriter(raw, buffering)
        else:
            assert reading
            buffer = BufferedReader(raw, buffering)
        if binary:
            return buffer
        assert text
        return TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
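The layer selection in the pseudo-code can be observed with the built-in open() of a released Python 3. This is a sketch rather than a restatement of the specification: the shipped function uses buffering=-1 as its default where the pseudo-code uses None, and layers_for_modes is a hypothetical helper name.

```python
import os
import tempfile


def layers_for_modes():
    """Report which class open() returns for several mode/buffering combinations."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    kinds = {}
    try:
        cases = [("rb", -1), ("wb", -1), ("r+b", -1), ("rb", 0), ("r", -1)]
        for mode, buffering in cases:
            with open(path, mode, buffering=buffering) as f:
                kinds[(mode, buffering)] = type(f).__name__
        try:
            open(path, "r", buffering=0)     # unbuffered text I/O is rejected
            unbuffered_text_ok = True
        except ValueError:
            unbuffered_text_ok = False
        return kinds, unbuffered_text_ok
    finally:
        os.remove(path)
```

Binary modes return a buffered object (or the raw object when buffering is 0), while text mode wraps the buffered object in a TextIOWrapper.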

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-3116.rst

Last modified: 2025-02-01 08:59:27 GMT
