Issue 33695

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Have shutil.copytree(), copy() and copystat() use cached scandir() stat()s
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: giampaolo.rodola
Nosy List: benhoyt, benjamin.peterson, brett.cannon, giampaolo.rodola, ncoghlan, serhiy.storchaka, stutzbach, tarek, vstinner, yselivanov
Priority: normal
Keywords: patch

Created on 2018-05-30 12:22 by giampaolo.rodola, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name        Uploaded                            Description
bench.py         giampaolo.rodola, 2018-05-30 12:22
bpo-33695.patch  giampaolo.rodola, 2018-05-30 12:23
Pull Requests
URL       Status  Linked
PR 7874   merged  giampaolo.rodola, 2018-06-23 10:34
PR 11425  closed  xxxxxxx, 2019-02-23 15:13
PR 11997  merged  giampaolo.rodola, 2019-02-23 17:14
PR 17098  merged  kinow, 2019-11-09 13:25
Messages (13)
msg318175 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-05-30 12:22
Patch in attachment makes shutil.copytree() use os.scandir() and (unlike #33414) DirEntry instances are passed around so that cached stat()s are also used from within the copy2() and copystat() functions. The number of times the filesystem gets accessed via os.stat() is therefore reduced considerably. A similar improvement can be done for rmtree() (but that's for another ticket). Patch and benchmark script are in attachment.

Linux (+13.5% speedup)
======================

--- without patch:

    ./python  bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 0.551s
    7956 files and dirs, repeat 2/3... min = 0.548s
    7956 files and dirs, repeat 3/3... min = 0.548s
    best result = 0.548s

--- with patch:

    $ ./python  bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 0.481s
    7956 files and dirs, repeat 2/3... min = 0.479s
    7956 files and dirs, repeat 3/3... min = 0.474s
    best result = 0.474s

Windows (+17% speedup)
======================

--- without patch:

    ./python  bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 9.015s
    7956 files and dirs, repeat 2/3... min = 8.747s
    7956 files and dirs, repeat 3/3... min = 8.614s
    best result = 8.614s

--- with patch:

    $ ./python  bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 7.827s
    7956 files and dirs, repeat 2/3... min = 7.369s
    7956 files and dirs, repeat 3/3... min = 7.153s
    best result = 7.153s

Windows SMB share (+30%)
========================

--- without patch:

    C:\Users\user\Desktop\cpython>PCbuild\win32\python.exe bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 46.853s
    7956 files and dirs, repeat 2/3... min = 46.330s
    7956 files and dirs, repeat 3/3... min = 44.720s
    best result = 44.720s

--- with patch:

    C:\Users\user\Desktop\cpython>PCbuild\win32\python.exe bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 31.729s
    7956 files and dirs, repeat 2/3... min = 30.936s
    7956 files and dirs, repeat 3/3... min = 30.936s
    best result = 30.936s

Number of stat() syscalls (-38%)
================================

--- without patch:

    $ strace ./python bench.py  2>&1 | grep "stat(" | wc -l
    324808

--- with patch:

    $ strace ./python bench.py  2>&1 | grep "stat(" | wc -l
    198768
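
The idea described in msg318175 can be illustrated with a minimal sketch. This is not the attached patch: `stat_of` and `copy_tree_sketch` are hypothetical names, symlinks are skipped for brevity, and only timestamps and permission bits are copied. It only shows how passing a DirEntry (rather than a path string) lets later steps reuse the stat result cached on the entry.

```python
import os
import shutil
import stat


def stat_of(src):
    """Hypothetical helper: reuse the stat result cached on a DirEntry,
    falling back to a fresh os.stat() call for plain paths."""
    if isinstance(src, os.DirEntry):
        # Cached on the entry after the first call.
        return src.stat()
    return os.stat(src)


def copy_tree_sketch(src, dst):
    """Simplified recursive copy (assumption: dst does not exist yet).
    Returns the number of regular files copied."""
    os.makedirs(dst)
    copied = 0
    with os.scandir(src) as it:
        for entry in it:
            target = os.path.join(dst, entry.name)
            if entry.is_symlink():
                # Simplification: this sketch does not handle symlinks.
                continue
            if entry.is_dir():
                copied += copy_tree_sketch(entry.path, target)
            else:
                shutil.copyfile(entry.path, target)
                # Metadata comes from the entry instead of a new os.stat(path).
                st = stat_of(entry)
                os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns))
                os.chmod(target, stat.S_IMODE(st.st_mode))
                copied += 1
    return copied
```

Calling `copy_tree_sketch("some/src", "some/dst")` on an existing source directory copies its regular files and returns how many were copied; the paths here are placeholders.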
msg320303 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-06-23 10:53
PR at: https://github.com/python/cpython/pull/7874.

I re-ran benchmarks since shutil code changed after #33695. Linux went from +13.5% to 8.8% and Windows went from +17% to 20.7%.

In the PR I explicitly avoided using a context manager around os.scandir() for now so that the patch is easier to review (will add it before pushing).

Linux (+8.8%)
=============

without patch:

    $ ./python  bench-copytree.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 0.604s
    7956 files and dirs, repeat 2/3... min = 0.603s
    7956 files and dirs, repeat 3/3... min = 0.601s

with patch:

    $ ./python  bench-copytree.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 0.557s
    7956 files and dirs, repeat 2/3... min = 0.548s
    7956 files and dirs, repeat 3/3... min = 0.548s
    best result = 0.548s

Windows (+20.7%)
================

without patch:

    C:\Users\user\Desktop>cpython\PCbuild\win32\python.exe cpython\bench-copytree.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 8.275s
    7956 files and dirs, repeat 2/3... min = 8.018s
    7956 files and dirs, repeat 3/3... min = 7.978s
    best result = 7.978s

with patch:

    C:\Users\user\Desktop>cpython\PCbuild\win32\python.exe cpython\bench-copytree.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 6.609s
    7956 files and dirs, repeat 2/3... min = 6.609s
    7956 files and dirs, repeat 3/3... min = 6.609s
    best result = 6.609s
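
As a quick arithmetic check of the percentages in msg320303, the sketch below recomputes them from the quoted timings, assuming the lowest "min" value of each run is the best result. The function names are illustrative. Under that assumption, the Linux figure matches a reduction measured against the unpatched time, while the Windows figure of 20.7% matches a gain measured against the patched time; against the unpatched time it would be about 17.2%.

```python
def reduction_vs_old(old, new):
    """Percent of time saved, relative to the unpatched measurement."""
    return (old - new) / old * 100


def speedup_vs_new(old, new):
    """Percent gain, relative to the patched measurement."""
    return (old - new) / new * 100


# Lowest timings quoted in the message, in seconds: (without patch, with patch).
linux = (0.601, 0.548)
windows = (7.978, 6.609)

print(f"Linux,   relative to unpatched: {reduction_vs_old(*linux):.1f}%")    # 8.8%
print(f"Windows, relative to unpatched: {reduction_vs_old(*windows):.1f}%")  # 17.2%
print(f"Windows, relative to patched:   {speedup_vs_new(*windows):.1f}%")    # 20.7%
```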
msg320304 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-06-23 11:08
> I re-ran benchmarks since shutil code changed after #33695.

Sorry, I meant #33671.
msg321852 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-07-17 21:45
Unless somebody has complaints I think I'm gonna merge this soon.
msg322872 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-08-01 16:20
I'm not convinced that this change should be merged. The benefit is small, and 1) it is only for an artificial set of tiny files, 2) the benchmarking ignores the real IO, it measures the work with a cache. When copying real files (/usr/include or Lib/) with dropped caches the difference is insignificant. On the other hand, this optimization makes the code more complex. It can make the case with specifying the ignore argument slower.
msg322873 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-08-01 16:24
For dropping disc caches on Linux run

    with open('/proc/sys/vm/drop_caches', 'ab') as f: f.write(b'3\n')

before every test.
msg322901 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-08-02 00:16
I agree the provided benchmark on Linux should be more refined. And I'm not sure if "echo 3 | sudo tee /proc/sys/vm/drop_caches" before running it is enough honestly.

The main point here is the reduction of stat() syscalls (-38%) and that can make a considerable difference, especially with network filesystems. That's basically the reason why scandir() was introduced in the first place and used in os.walk(), glob.glob() and shutil.rmtree(), so I'm not sure why we should use a different rationale for shutil.copytree().
msg322912 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-08-02 03:52
os.walk() and glob.glob() used *only* stat(), opendir() and readdir() syscalls (and stat() syscalls dominated). The effect of reducing the number of the stat() syscalls is significant. shutil.rmtree() also uses the unlink() syscall. Since it is usually cheap (but see issue32453), the benefit is still good, but not as large. Actually I had concerns about using scandir() in shutil.rmtree().

shutil.copytree() needs to open, read, and write files. This is not so cheap, and the benefit of reducing the number of the stat() syscalls is hardly noticed in real cases. shutil.copytree() was not converted to using scandir() intentionally.
msg322933 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-08-02 09:24
When I worked on the os.scandir() implementation, I recall that an interesting test was NFS. Depending on the configuration, stat() in a network filesystem can be between very slow and slow.
msg322975 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-08-02 16:10
Yes, file copy (open() + read() + write()) is of course more expensive than just "reading" a tree (os.walk(), glob()) or deleting it (rmtree()) and the "pure file copy" time adds up to the benchmark. And indeed it's not a coincidence that #33671 (which replaced read() + write() with sendfile()) shaved off a 5% gain from the benchmark I posted initially for Linux.

Still, in an 8k small-files-tree scenario we're seeing ~9% gain on Linux, 20% on Windows and 30% on an SMB share on localhost vs. VirtualBox. I do not consider this a "hardly noticeable gain" as you imply: it is noticeable, exponential and measurable, even with cache being involved (as it is). Note that the number of stat() syscalls per file is being reduced from 6 to 1 (or more if follow_symlinks=False), and that is the real gist here. That *does* make a difference on a regular Windows fs and makes a huge difference with network filesystems in general, as a simple stat() call implies access to the network, not the disk.
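
To make the per-file stat() argument in msg322975 concrete, here is a small sketch (not taken from the patch; both function names are hypothetical) that gathers the same facts about each directory entry in two ways. The path-based variant asks the filesystem again for every question, while the DirEntry-based variant relies on information from the directory scan plus one cached stat result.

```python
import os


def describe_with_paths(directory):
    """Path-based variant: each helper typically issues its own
    stat()-family call for the same file."""
    out = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        out[name] = (
            os.path.islink(path),       # lstat()
            os.path.isdir(path),        # stat()
            os.path.getsize(path),      # stat()
            os.stat(path).st_mtime_ns,  # stat()
        )
    return out


def describe_with_entries(directory):
    """DirEntry-based variant: type checks usually need no extra system
    call, and entry.stat() is cached on the entry after its first use."""
    out = {}
    with os.scandir(directory) as it:
        for entry in it:
            st = entry.stat()
            out[entry.name] = (
                entry.is_symlink(),
                entry.is_dir(),
                st.st_size,
                st.st_mtime_ns,
            )
    return out
```

Both functions should return the same mapping for a given directory. Running each under strace, as done for the benchmark earlier in this issue, is one way to compare how many stat()-family calls they actually make on a particular platform.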
msg322984 - Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-08-02 18:02
> Depending on the configuration, stat() in a network filesystem can be between very slow and slow.

+1. I also quickly glanced over the patch and I think it looks like a clear win.
msg328267 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-10-22 18:09
@Serhiy: I would like to proceed with this. Do you have further comments? Do you prefer to bring this up on python-dev for further discussion?
msg329732 - Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2018-11-12 14:18
New changeset 19c46a4c96553b2a8390bf8a0e138f2b23e28ed6 by Giampaolo Rodola in branch 'master':
bpo-33695 shutil.copytree() + os.scandir() cache (#7874)
https://github.com/python/cpython/commit/19c46a4c96553b2a8390bf8a0e138f2b23e28ed6
History
Date                 User              Action  Args
2022-04-11 14:59:01  admin             set     github: 77876
2019-11-09 13:25:26  kinow             set     pull_requests: +pull_request16605
2019-02-23 17:14:29  giampaolo.rodola  set     pull_requests: +pull_request12025
2019-02-23 15:13:39  xxxxxxx           set     pull_requests: +pull_request12022
2018-11-12 14:19:18  giampaolo.rodola  set     status: open -> closed
                                               assignee: giampaolo.rodola
                                               resolution: fixed
                                               stage: patch review -> resolved
2018-11-12 14:18:24  giampaolo.rodola  set     messages: +msg329732
2018-10-22 18:09:26  giampaolo.rodola  set     messages: +msg328267
2018-08-02 18:02:38  yselivanov        set     messages: +msg322984
2018-08-02 16:10:19  giampaolo.rodola  set     messages: +msg322975
2018-08-02 09:24:17  vstinner          set     messages: +msg322933
2018-08-02 03:52:46  serhiy.storchaka  set     messages: +msg322912
2018-08-02 00:16:31  giampaolo.rodola  set     messages: +msg322901
2018-08-01 16:24:07  serhiy.storchaka  set     messages: +msg322873
2018-08-01 16:20:44  serhiy.storchaka  set     messages: +msg322872
2018-07-17 21:45:24  giampaolo.rodola  set     messages: +msg321852
2018-06-23 11:08:10  giampaolo.rodola  set     messages: +msg320304
2018-06-23 10:53:20  giampaolo.rodola  set     messages: +msg320303
2018-06-23 10:34:07  giampaolo.rodola  set     pull_requests: +pull_request7481
2018-05-30 12:41:32  giampaolo.rodola  set     nosy: +brett.cannon, ncoghlan, vstinner, benjamin.peterson, tarek, stutzbach, benhoyt, serhiy.storchaka, yselivanov
2018-05-30 12:23:33  giampaolo.rodola  set     files: +bpo-33695.patch
                                               keywords: +patch
2018-05-30 12:22:35  giampaolo.rodola  create
Supported by The Python Software Foundation,
Powered by Roundup
Copyright © 1990-2022, Python Software Foundation