File Descriptors in IronPython
The conceptual picture of file descriptor (FD) usage on Windows, for the most interesting case of `FileStream`:
graph LR;FileIO --> StreamBox --> FileStream --> Handle(Handle) --> OSFile[OS File];FD(FD) <--> StreamBox;
Conceptually, the relationship between `FD` (a number) and `StreamBox` (a class) is bidirectional because `PythonFileManager` (a global singleton) maintains the association between the two, so it is cost-free to obtain one given the other. The FD is not the same as the handle, which is created by the OS. The FD is an emulated (fake) file descriptor, assigned by the `PythonFileManager` for the purpose of supporting the Python API. Descriptors are allocated lazily, i.e. only if the user code makes an API call that accesses one. Once assigned, the descriptor does not change. The FD number is released once the FD is closed (or the associated `FileIO` is closed and had `closefd` set to true).
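The lifetime rules above can be observed from the Python side. The following is a minimal sketch using only standard-library calls; the file name is arbitrary, and the behaviour shown is the Python API contract rather than a description of IronPython internals.

```python
import io
import os
import tempfile

# Scratch file in a fresh temporary directory (the name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "fd_demo.bin")

f = io.FileIO(path, "w")      # opened by name, so closefd is necessarily True
fd = f.fileno()               # the descriptor is a plain integer
stable = (fd == f.fileno())   # asking again yields the same number
f.close()                     # closing the FileIO also releases the descriptor

# After the close, the number no longer refers to an open file.
try:
    os.fstat(fd)
    released = False
except OSError:
    released = True
```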
It is possible to have the structure above without `FileIO`, for instance when an OS file is opened with one of the low-level functions in `os`, or when an existing FD is duplicated. It is also possible to associate one FD with several `FileIO` objects. In such cases it is the responsibility of the user code to ensure that the FD is closed at the right time.
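As an illustration of a descriptor that exists without a `FileIO` and is then shared by several of them, here is a small sketch using standard-library calls. Passing `closefd=False` is one way for user code to keep control over when the FD is closed; the variable names are illustrative only.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared_fd.bin")

# A "naked" descriptor: opened through the low-level API, no FileIO involved yet.
fd = os.open(path, os.O_RDWR | os.O_CREAT)

# Two FileIO objects over the same descriptor; neither owns it.
a = io.FileIO(fd, "r+", closefd=False)
b = io.FileIO(fd, "r+", closefd=False)

a.write(b"abc")
b.write(b"def")   # same descriptor, same position: continues after "abc"

a.close()         # closefd=False, so the descriptor stays open
b.close()

os.lseek(fd, 0, os.SEEK_SET)
content = os.read(fd, 16)
os.close(fd)      # the user code closes the descriptor exactly once
```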
When an FD is duplicated (using `dup` or `dup2`), the associated `StreamBox` is duplicated too (there is always a 1-to-1 relationship between FD and `StreamBox`), but the underlying `FileStream` object remains the same, and so does the underlying OS handle. The new FD may be used to create a `FileIO` (or several, just as for the original FD). All read/seek/write operations on both descriptors go through the same `FileStream` object and the same OS handle.
graph LR;FD1(FD1) <--> StreamBox --> FileStream --> Handle(Handle) --> OSFile[OS File];FD2(FD2) <--> StreamBox2[StreamBox] --> FileStream;
The descriptors can be closed independently, and the underlying `FileStream` is closed when the last `StreamBox` using it is closed.
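From the point of view of user code, independent closing looks like the sketch below. It uses only `os` functions from the standard library and shows the expected Python-level behaviour, not the internal `StreamBox` bookkeeping.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "dup_close.bin")

fd1 = os.open(path, os.O_RDWR | os.O_CREAT)
fd2 = os.dup(fd1)            # a second descriptor for the same open file

os.write(fd1, b"first")
os.close(fd1)                # closing the original does not affect the duplicate

os.write(fd2, b"second")     # continues where fd1 left off
os.lseek(fd2, 0, os.SEEK_SET)
content = os.read(fd2, 64)
os.close(fd2)                # the file is fully closed only now
```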
On Unix-like systems (Linux, macOS), `FileStream` uses the actual file descriptor as the handle. In the past, IronPython ignored this and still issued its own fake file descriptors, as it does on Windows. Now, however, the genuine FD is extracted from the handle and used as the FD at the `PythonFileManager` level, ensuring that clients of the Python API obtain the genuine FD.
graph LR;FileIO --> StreamBox --> FileStream --> FDH(FD) --> OSFile[OS File];FD(FD) <--> StreamBox;
When a file descriptor FD is duplicated, an actual OS call is made to create the duplicate FD2. In order to use FD2 directly, a new `Stream` object has to be created around it.
The straightforward solution is to create another `FileStream` using the constructor that accepts an already opened file descriptor.
graph LR;FD1(FD1) <--> StreamBox --> FileStream --> FDH1(FD1) --> OSFile[OS File];FD2(FD2) <--> StreamBox2[StreamBox] --> FileStream2[FileStream] --> FDH2(FD2) --> OSFile;
In this way, the file descriptor at the `PythonFileManager` level is the same as the file descriptor used by `FileStream`.
Unfortunately, on .NET, two `FileStream` instances using the same file descriptor have two independent read/write positions. This is not how duplicated file descriptors should work: both descriptors should point to the same open file description and share the read/seek/write position. In practice, on .NET, writing through the second file object overwrites data already written through the first file object. In regular Unix applications (including CPython), subsequent writes append data, regardless of which file object is used. The same principle should apply to reads.
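The expected semantics can be checked with a short sketch in plain Python. It contrasts a duplicated descriptor, which shares its position with the original, with an independently opened one, which starts at offset 0 and therefore overwrites; the latter mimics the undesired behaviour described above. The file name and variable names are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared_offset.bin")

fd1 = os.open(path, os.O_RDWR | os.O_CREAT)
fd2 = os.dup(fd1)

os.write(fd1, b"AAAA")
pos_via_fd2 = os.lseek(fd2, 0, os.SEEK_CUR)   # position as seen through the duplicate
os.write(fd2, b"BB")                          # lands after "AAAA"
pos_via_fd1 = os.lseek(fd1, 0, os.SEEK_CUR)   # the original sees the new position too

# For contrast: a separately opened descriptor has its own position.
fd3 = os.open(path, os.O_WRONLY)
os.write(fd3, b"zz")                          # overwrites the first two bytes
os.close(fd3)

os.lseek(fd1, 0, os.SEEK_SET)                 # rewinding fd1 also rewinds fd2
content = os.read(fd2, 64)
os.close(fd1)
os.close(fd2)
```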
Also unfortunately, on Mono, the `FileStream` constructor accepts only descriptors opened by another call to a `FileStream` constructor[1]. So descriptors obtained from direct OS calls, like `open`, `creat`, `dup`, or `dup2`, are rejected.
On .NET, the `FileStream` that was backing an open `FileIO` or an open FD from a direct call to `os.open` has been replaced by `PosixFileStream`. This class operates directly on the given file descriptor, providing unbuffered file access and replicating CPython's behaviour. A duplicated file descriptor therefore looks like the following diagram:
graph LR;FD1(FD1) <--> StreamBox --> PosixFileStream --> FDH1(FD1) --> OSFile[OS File];FD2(FD2) <--> StreamBox2[StreamBox] --> PosixFileStream2[PosixFileStream] --> FDH2(FD2) --> OSFile;
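To illustrate why a stream that works directly on the descriptor avoids the position problem, here is a hypothetical Python analogue. The class `FdStream` is invented for this sketch and is not the actual `PosixFileStream` implementation; it merely forwards every operation to the OS, so no position is cached in the stream object.

```python
import io
import os
import tempfile

class FdStream(io.RawIOBase):
    """Hypothetical unbuffered stream: every call goes straight to the descriptor."""

    def __init__(self, fd):
        self._fd = fd

    def readable(self):
        return True

    def writable(self):
        return True

    def seekable(self):
        return True

    def fileno(self):
        return self._fd

    def readinto(self, buffer):
        data = os.read(self._fd, len(buffer))
        buffer[:len(data)] = data
        return len(data)

    def write(self, data):
        return os.write(self._fd, data)

    def seek(self, offset, whence=os.SEEK_SET):
        # The position lives in the OS, not in this object.
        return os.lseek(self._fd, offset, whence)

path = os.path.join(tempfile.mkdtemp(), "fdstream.bin")
fd1 = os.open(path, os.O_RDWR | os.O_CREAT)
fd2 = os.dup(fd1)

s1, s2 = FdStream(fd1), FdStream(fd2)
s1.write(b"abc")
s2.write(b"def")          # appended, because the OS-level position is shared
s1.seek(0)                # rewinding through one stream affects the other
content = s2.read(6)

os.close(fd1)
os.close(fd2)
```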
The solution on .NET 6 is the same as on .NET 8: `PosixFileStream` is used instead of `FileStream`. However, an issue arises when an `mmap` object is requested for a given FD. The `mmap` implementation is backed by `MemoryMappedFile` from the .NET library. On .NET 8, a `MemoryMappedFile` instance can be created from a given FD. .NET 6 lacks this constructor and only accepts `FileStream` (for maps that are backed by a regular file). Therefore, for the purpose of supporting `MemoryMappedFile`, a dedicated `FileStream` is created around the given FD. This instance of `FileStream` is not registered with `PythonFileManager` but is managed directly by `MmapDefault`, which implements `mmap`.
graph LR;FD(FD) <--> StreamBox --> PosixFileStream --> FDH(FD) --> OSFile[OS File];MmapDefault --> FileStream2[FileStream] --> FDH;
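The Python-level scenario that triggers this path is requesting an `mmap` for an existing descriptor. A minimal sketch with the standard `mmap` module follows; how the mapping is backed internally depends on the runtime, as described above.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mmap_demo.bin")

fd = os.open(path, os.O_RDWR | os.O_CREAT)
os.write(fd, b"hello mmap")      # a mapping needs a non-empty file

m = mmap.mmap(fd, 0)             # length 0 maps the whole file
first = m[:5]
m[0:5] = b"HELLO"                # modify the file through the mapping
m.flush()
m.close()                        # closing the map leaves the descriptor open

os.lseek(fd, 0, os.SEEK_SET)
content = os.read(fd, 64)
os.close(fd)
```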
To use system-opened file descriptors on Mono, `UnixStream` could be used instead of `FileStream`.
graph LR;FD1(FD1) <--> StreamBox --> FileStream --> FDH1(FD1) --> OSFile[OS File];FD2(FD2) <--> StreamBox2[StreamBox] --> UnixStream --> FDH2(FD2) --> OSFile;
Since `FileIO` works with various types of the underlying `Stream`, using `UnixStream` should be OK.
Although `UnixStream` is available in .NET through the package `Mono.Posix`, this solution still does not work around the desynchronized read/write position, which the `FileStream` using the original FD1 must somehow maintain independently.
Another problem with using `UnixStream` is that this class is unsuitable for creating a `MemoryMappedFile`, which on Mono (like on .NET before 8.0) has to be created from a `FileStream` (for file-backed mmaps). Therefore, on Mono, `FileStream` is used as the backing for `FileIO` and for a naked FD, just as is the case on Windows. The difference from Windows, however, is that `PythonFileManager` uses actual FDs when managing files, not emulated ones. When those actual descriptors are duplicated, the code first tries to use `FileStream` to access the duplicated descriptor. This leads to the situation described in the "Straightforward Mechanism" section, with all the caveats listed there. If using `FileStream` fails, `UnixStream` is employed, as presented in the diagram above.
As mentioned before, using `UnixStream` may lead to problems when such an FD is used to create an `mmap`, but an `mmap` created on a file opened regularly (not duplicated) will work.
In Python, a file can be opened with mode "ab+". The file is opened for appending to the end (and created if it does not exist), and the `+` means that it is also opened for updating, i.e. reading and writing. The file pointer is initially set at the end of the file (ready to append) but can be moved around to read already existing data. However, each write appends data to the end and resets the read/write pointer to the end again.
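The append-and-update behaviour can be seen in the following sketch. It opens the file unbuffered (`buffering=0`) so that positions are reported directly by the descriptor; the file name and contents are arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "append_plus.bin")
with open(path, "wb") as f:
    f.write(b"12345")                 # pre-existing content

with open(path, "ab+", buffering=0) as f:
    start = f.tell()                  # initially positioned at the end of the file
    f.seek(0)
    head = f.read(2)                  # existing data can be read after seeking
    f.write(b"XY")                    # the write goes to the end, not to offset 2
    after_write = f.tell()            # the pointer is at the end again
    f.seek(0)
    content = f.read()
```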
This opening mode is not supported by `FileStream`. On platforms that don't rely on `FileStream` (.NET 6.0+/POSIX), this is not an issue, as `PosixFileStream` handles it the same way as CPython. On other platforms (Windows on all frameworks, and Mono), mode "ab+" is simulated by using two file streams, one for reading and one for writing. Both are maintained in a single `StreamBox` but have different file handles (on Mono: file descriptors).
graph LR;FileIO --> StreamBox --> FileStreamR["FileStream (R)"] --> HandleR("Handle (R)") --> OSFile[OS File];StreamBox --> FileStreamW["FileStream (W)"] --> HandleW("Handle (W)") --> OSFile;FD(FD) <--> StreamBox;
On Windows, since the file descriptor is emulated, this does not create problems. The question might arise of which `FileStream` should be used as the backing for `MemoryMappedFile`, but it is not relevant since a file opened in mode "a" is not suitable for use with `mmap` anyway.
On Mono, the file descriptor reported by such a combination is the genuine descriptor of the write-stream. When the descriptor is duplicated, it is the write-stream's descriptor that gets duplicated, with the exception that if the target FD (using `dup2`) is 0 (`stdin`), the read-stream's descriptor gets duplicated.