[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Ben Hoyt benhoyt at gmail.com
Sat Jun 28 21:48:03 CEST 2014


>> But the underlying system calls -- ``FindFirstFile`` /
>> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
>
> What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir?

I guess it'd be better to say "Windows" and "Unix-based OSs"
throughout the PEP? Because all of these (including Mac OS X) are
Unix-based.

> It looks like the WIN32_FIND_DATA has a dwFileAttributes field. So we
> should mimic stat_result recent addition: the new
> stat_result.file_attributes field. Add DirEntry.file_attributes which
> would only be available on Windows.
>
> The Windows structure also contains
>
>   FILETIME ftCreationTime;
>   FILETIME ftLastAccessTime;
>   FILETIME ftLastWriteTime;
>   DWORD    nFileSizeHigh;
>   DWORD    nFileSizeLow;
>
> It would be nice to expose them as well. I'm no longer surprised that
> the exact API is different depending on the OS for functions of the os
> module.

I think you've misunderstood how DirEntry.lstat() works on Windows --
it's basically a no-op, as Windows returns the full stat information
with the original FindFirst/FindNext OS calls. This is fairly explicit
in the PEP, but I'm sure I could make it clearer:

    DirEntry.lstat(): "like os.lstat(), but requires no system calls on Windows"

So you can already get the dwFileAttributes for free by saying
entry.lstat().st_file_attributes. You can also get all the other
fields you mentioned for free via .lstat() with no additional OS calls
on Windows, for example: entry.lstat().st_size.

Feel free to suggest changes to the PEP or scandir docs if this isn't
clear. Note that is_dir()/is_file()/is_symlink() are free on all
systems, but .lstat() is only free on Windows.

> Does your implementation use a free list to avoid the cost of memory
> allocation? A short free list of 10 or maybe just 1 may help. The free
> list may be stored directly in the generator object.

No, it doesn't. I might add this to the PEP under "possible
improvements". However, I think the speed increase from removing the
extra OS call and/or disk seek is going to be way more than any memory
allocation improvements, so I'm not sure this would be worth it.

> Does it support also bytes filenames on UNIX?
> Python now supports undecodable filenames thanks to the PEP 383
> (surrogateescape). I prefer to use the same type for filenames on
> Linux and Windows, so Unicode is better. But some users might prefer
> bytes for other reasons.

I forget exactly now what my scandir module does, but for os.scandir()
I think this should behave exactly like os.listdir() does for
Unicode/bytes filenames.

> Crazy idea: would it be possible to "convert" a DirEntry object to a
> pathlib.Path object without losing the cache? I guess that
> pathlib.Path expects a full stat_result object.

The main problem is that pathlib.Path objects explicitly don't cache
stat info (and Guido doesn't want them to, for good reason I think).
There was a thread on python-dev about this earlier. I'll add it to a
"Rejected ideas" section.

> I don't understand how you can build a full lstat() result without
> really calling stat. I see that WIN32_FIND_DATA contains the size, but
> here you call lstat().

See above.

> Do you plan to continue to maintain your module for Python < 3.5, but
> upgrade your module for the final PEP?

Yes, I intend to maintain the standalone scandir module for 2.6 <=
Python < 3.5, at least for a good while. For integration into the
Python 3.5 stdlib, the implementation will be integrated into
posixmodule.c, of course.

>> Should there be a way to access the full path?
>> ----------------------------------------------
>>
>> Should ``DirEntry``'s have a way to get the full path without using
>> ``os.path.join(path, entry.name)``? This is a pretty common pattern,
>> and it may be useful to add pathlib-like ``str(entry)`` functionality.
>> This functionality has also been requested in `issue 13`_ on GitHub.
>>
>> .. _`issue 13`: https://github.com/benhoyt/scandir/issues/13
>
> I think that it would be very convenient to store the directory name
> in the DirEntry. It should be light, it's just a reference.
>
> And provide a fullname() name which would just return
> os.path.join(path, entry.name) without trying to resolve path to get
> an absolute path.

Yeah, fair suggestion. I'm still slightly on the fence about this, but
I think an explicit fullname() is a good suggestion. Ideally I think
it'd be better to mimic pathlib.Path.__str__(), which is kind of the
equivalent of fullname(). But how does pathlib deal with unicode/bytes
issues if it's the str function which has to return a str object? Or
at least, it'd be very weird if __str__() returned bytes. But I think
it'd need to if you passed bytes into scandir(). Do others have
thoughts?

> Would it be hard to implement the wildcard feature on UNIX to compare
> performances of scandir('*.jpg') with and without the wildcard built
> in os.scandir?

It's a good idea, but the problem is that the Windows wildcard
implementation has a bunch of crazy edge cases where *.ext will catch
more things than just a simple regex/glob. This was discussed on
python-dev or python-ideas previously, so I'll dig it up and add it to
a Rejected Ideas section. In any case, this could be added later if
there's a way to iron out the Windows quirks.

-Ben
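
For readers trying this on Python 3.6 or later, here is a minimal sketch of the points discussed above. It assumes the API as it eventually shipped in the standard library, which is not quite the draft quoted in this message: the draft's entry.lstat() is spelled entry.stat(follow_symlinks=False), and the proposed fullname() became the entry.path attribute. The helper name summarize is hypothetical and used only for illustration.

```python
import os
import tempfile


def summarize(path):
    """Return sorted (name, kind, size, full_path) tuples for the entries in path.

    Hypothetical helper, not part of the PEP or the scandir module.
    """
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Type checks can usually be answered from the directory
            # listing itself, without a separate stat call per entry.
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "dir"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "other"
            # Shipped spelling of the draft's entry.lstat(). The result is
            # cached on the entry object after the first call.
            info = entry.stat(follow_symlinks=False)
            # entry.path is the shipped counterpart of the proposed
            # fullname(): the scandir argument joined with entry.name.
            rows.append((entry.name, kind, info.st_size, entry.path))
    return sorted(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "wb") as f:
            f.write(b"hello")
        os.mkdir(os.path.join(tmp, "sub"))
        for row in summarize(tmp):
            print(row)
```

Passing a bytes path to os.scandir() yields bytes for entry.name and entry.path, mirroring os.listdir(), which is the behaviour suggested above for Unicode/bytes filenames.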


