
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2016-01-02 17:57 byserhiy.storchaka, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| walk_consume_fds_opt1.patch | serhiy.storchaka,2016-01-03 08:48 | review | ||
| walk_consume_fds_opt2.patch | serhiy.storchaka,2016-01-03 08:49 | review | ||
| Messages (13) | |||
|---|---|---|---|
| msg257352 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-01-02 17:57 | |
Since 3.5 os.walk() consumes a lot of file descriptors. Its number equals to the deep of directories tree. Since the number of file descriptors is limited, this can cause problems.This was the main reason for rejecting fwalk-based implementation of os.walk() (issue15200). | |||
| msg257404 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-01-03 08:48 | |
Here are two alternative solutions for this issue. | |||
| msg258018 -(view) | Author: Guido van Rossum (gvanrossum)*![]() | Date: 2016-01-11 21:10 | |
I don't like the first solution (it just collects all scandir() results in a list, which defeats one of the purposes of using scandir()), but I don't understand the second solution.Ideally the number of FDs used should be limited by the depth of the tree being walked, right? | |||
| msg258022 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-01-11 21:35 | |
Both patches are basically equivalent. The first one collects all scandir() results in a list, the second one collects only directory names in a list. The purpose of using os.scandir() in os.walk() was a speed up (issue23605), and both patches preserve it.Yes, the number of FDs used is equivalent to the depth of the tree which can be very deep (I just created a tree depth of 1000 levels). And what is worse, all these FDs can be effectively leaked if the walking was not finished. This is unwanted behavior change. | |||
| msg258025 -(view) | Author: Guido van Rossum (gvanrossum)*![]() | Date: 2016-01-11 21:42 | |
I am all for preventing the leaks. But using FDs proportional to the treedepth seems reasonable to me. (If you are worried about some kind of DoSattack on the algorithm by someone who can build a tree with depth 1000,well, if they can do that they can also create a flat folder with a millionfiles in it.)Is there a potential hybrid strategy, where for small directories we closethe FD but for large directories we keep it open? | |||
| msg258096 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-01-12 10:06 | |
> But using FDs proportional to the tree depth seems reasonable to me.This was the main reason for rejectingissue15200. May be we can allow this for globbing, but only explicitly enabled with special parameter or separate functions. The benefit for globbing is starting to yield results without internal caching. The benefit for os.walk() is smaller, because in any case it yields files and directories not one-by-one, but collected into lists.> Is there a potential hybrid strategy, where for small directories we close> the FD but for large directories we keep it open?May be with more complicated code. I think this needs explicit close() method, so it can go only in 3.6. For now I think we have to commit one of my patches (the difference only stylistic). | |||
| msg258116 -(view) | Author: Guido van Rossum (gvanrossum)*![]() | Date: 2016-01-12 18:36 | |
I like them both, if I had to choose I'd pick patch 2.But yes, we need to add a close() method to the scandir iterator object.In the meantime, I am still worried about what would happen if somehow the loop got interrupted and the frame got kept alive and the iterator wasn't closed by its dealloc until much later. This kind of thing was common in asyncio and we had to resort to similar tricks to break some cycles. Maybe you can add a try/finally that *deletes* scandir_it to force it to close itself (at least in CPython)? That can go into 3.5. | |||
| msg258238 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-01-14 22:26 | |
Yes, OSError, MemoryError and KeyboardInterrupt can be raised during iterating scandir object or calling is_dir() or is_symlink() methods of the result. They stop iterating and left file descriptor open. If close file descriptor on error during iteration (issue26117), patch 1 becomes more safe, because it minimizes the lifetime of opened descriptor. | |||
| msg260089 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-02-11 11:32 | |
New changesete951d76f1945 by Serhiy Storchaka in branch '3.5':Issue#25995: os.walk() no longer uses FDs proportional to the tree depth.https://hg.python.org/cpython/rev/e951d76f1945New changeset6197a09a56b1 by Serhiy Storchaka in branch 'default':Issue#25995: os.walk() no longer uses FDs proportional to the tree depth.https://hg.python.org/cpython/rev/6197a09a56b1 | |||
| msg260095 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-02-11 11:43 | |
walk_consume_fds_opt1.patch is applied in 3.5 (it is more reliable with the absence of close()) and walk_consume_fds_opt2.patch is applied in 3.6. | |||
| msg260784 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2016-02-24 10:40 | |
The change seems to be causing some Windows buildbot failures <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.5/builds/666/steps/test/logs/stdio>:======================================================================ERROR: test_walk_bad_dir (test.test_os.BytesWalkTests)----------------------------------------------------------------------Traceback (most recent call last): File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\test\test_os.py", line 931, in test_walk_bad_dir root, dirs, files = next(walk_it) File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\test\test_os.py", line 1027, in walk for broot, bdirs, bfiles in os.walk(os.fsencode(top), **kwargs): File "C:\buildbot.python.org\3.5.kloth-win64\build\lib\os.py", line 372, in walk entries = list(scandir(top))TypeError: os.scandir() doesn't support bytes path on Windows, use Unicode instead======================================================================ERROR: test_walk_bottom_up (test.test_os.BytesWalkTests)ERROR: test_walk_prune (test.test_os.BytesWalkTests)Was this lineentries = list(scandir(top))meant to beentries = list(scandir_it) | |||
| msg260788 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-02-24 11:05 | |
New changeset9bffe39e8273 by Serhiy Storchaka in branch '3.5':Fixed a bug in os.walk() with bytes path on Windows caused by merging fixeshttps://hg.python.org/cpython/rev/9bffe39e8273 | |||
| msg260789 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-02-24 11:05 | |
Thanks Martin! | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:25 | admin | set | github: 70183 |
| 2016-02-24 11:05:58 | serhiy.storchaka | set | status: open -> closed messages: +msg260789 |
| 2016-02-24 11:05:05 | python-dev | set | messages: +msg260788 |
| 2016-02-24 10:40:27 | martin.panter | set | status: closed -> open nosy: +martin.panter messages: +msg260784 |
| 2016-02-11 11:43:04 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: +msg260095 stage: patch review -> resolved |
| 2016-02-11 11:32:19 | python-dev | set | nosy: +python-dev messages: +msg260089 |
| 2016-02-08 15:59:59 | serhiy.storchaka | set | assignee:serhiy.storchaka |
| 2016-01-14 22:26:51 | serhiy.storchaka | set | messages: +msg258238 |
| 2016-01-12 18:36:17 | gvanrossum | set | messages: +msg258116 |
| 2016-01-12 10:06:48 | serhiy.storchaka | set | messages: +msg258096 |
| 2016-01-11 21:42:12 | gvanrossum | set | messages: +msg258025 |
| 2016-01-11 21:35:22 | serhiy.storchaka | set | messages: +msg258022 |
| 2016-01-11 21:10:38 | gvanrossum | set | nosy: +gvanrossum messages: +msg258018 |
| 2016-01-03 08:49:03 | serhiy.storchaka | set | files: +walk_consume_fds_opt2.patch |
| 2016-01-03 08:48:40 | serhiy.storchaka | set | files: +walk_consume_fds_opt1.patch keywords: +patch messages: +msg257404 stage: patch review |
| 2016-01-02 17:57:05 | serhiy.storchaka | create | |