Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win)#7160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
giampaolo merged 114 commits intopython:masterfromgiampaolo:shutil-zero-copy
Jun 12, 2018
Merged

bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win)#7160

giampaolo merged 114 commits intopython:masterfromgiampaolo:shutil-zero-copy
Jun 12, 2018

Conversation

@giampaolo
Copy link
Contributor

@giampaologiampaolo commentedMay 28, 2018
edited
Loading

@giampaolo
Copy link
ContributorAuthor

Code updated. Hopefully this is now ready for final review.

...as an extra safety measure: in case src/dst are "exotic" files (nonregular or living on a network fs etc.) we better fail on open() insteadof copyfile(3) as we're not quite sure what's gonna happen in thatcase.
@giampaolo
Copy link
ContributorAuthor

I'm not sure how to interpret these CI failure but they appear unrelated with the changes at hand. If nobody has complaints I'm gonna merge this PR soon.

@ned-deily
Copy link
Member

I noticed a lot of VSTS CI failures over the past 24 hours or so, unrelated to our tests. So I would recommend not giving those failures much weight and, as you can see, those failures do not prevent doing a merge.

@giampaologiampaolo merged commit4a172cc intopython:masterJun 12, 2018
@bedevere-bot
Copy link

@giampaolo: Please replace# withGH- in the commit message next time. Thanks!

@giampaologiampaolo deleted the shutil-zero-copy branchJune 12, 2018 21:04
@ned-deily
Copy link
Member

Hi,@giampaolo. for future reference, please also manually "squash" edit the commit message in the web interface down to its essence. It's a bit jarring to have all of the details of the now squashed intermediate commits in the final cpython commit. Thanks!

serhiy-storchaka reacted with thumbs up emoji

@giampaolo
Copy link
ContributorAuthor

Will do. Sorry about that.

ned-deily reacted with thumbs up emoji

@giampaolo
Copy link
ContributorAuthor

@eryksun thanks for all your support. I really appreciate it.

@ned-deily
Copy link
Member

This looks like a great feature. Thanks for doing it!

if not n:
break
elif n < length:
fdst_write(mv[:n])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In my test case I used anotherwith block to release thismv[:n] view, rather than depend on implicit deallocation to release it. For example:

>>> b = bytearray(100)>>> mv1 = memoryview(b)>>> mv2 = mv1[:10]>>> mv1.release()>>> b.append(0)Traceback (most recent call last):  File "<stdin>", line 1, in <module>BufferError: Existing exports of data: object cannot be re-sized>>> mv2.release()>>> b.append(0)>>> len(b)101

Thisbytearray is internal, but is there any issue with memory usage in garbage-collected versions of Python (e.g. Jython, IronPython) if the views on the buffer (1 MiB in Windows) aren't released explicitly? If not you can remove the firstwith block as well.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Uhm... yes, given the big bufsize I think it makes sense to also immediately release the sliced memoryview.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@eryksun If I recall correctly, memory views inadvertently keeping large memory buffers alive on GC based implementations was a key driver in adding context management support tomemoryview in the first place, so that's definitely a concern worth keeping in mind for this kind of code.

DK96-OS reacted with thumbs up emoji

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.


def _is_binary_files_pair(fsrc, fdst):
return hasattr(fsrc, 'readinto') and \
isinstance(fsrc, io.BytesIO) or 'b' in getattr(fsrc, 'mode', '') and \
Copy link
Contributor

@eryksuneryksunJun 12, 2018
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Which objects providereadinto in text mode? Is it worth the function call and extra tests rather than handlingAttributeError (noreadinto) andTypeError (can't write bytes) with an inline try-except incopyfileobj?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think catchingTypeError onfdst.write() is too risky as we can deal with any kind of custom file-like object being passed here. It must be noted that the extra cost of this function is payed by users ofcopyfileobj() only (e.g.tarfile andzipfile modules).copyfile() (and others) will skip this check and call_copybinfileobj() directly.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I benchmarked_copybinfileobj() (I hadn't yet) and it turns out it's only slightly faster for 512MB files but considerably slower for 8MB and 128KB files so I am gonna remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't know how you're testing, but the performance difference withreadinto depends on whether the source file is already in the system cache. Otherwise, of course memory operations will be dwarfed by considerably slower disk I/O.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is how I'm testing it:

$ python -c"import os; f = open('f1', 'wb'); f.write(os.urandom(8 * 1024 * 1024))"$time ./python -m timeit -s'import shutil; p1 = "f1"; p2 = "f2"''shutil.copyfile(p1, p2)'

Copy link
ContributorAuthor

@giampaologiampaoloJun 13, 2018
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wrote a batch script to figure out timings on Windows more easily and this is the result (first value is originalcopyfileobj() implementation, second value is thememoryview() variant):

8MB file1000 loops, best of 5: 343 usec per loop500 loops, best of 5: 478 usec per loop64MB file500 loops, best of 5: 474 usec per loop500 loops, best of 5: 554 usec per loop128MB file200 loops, best of 5: 1.06 msec per loop500 loops, best of 5: 640 usec per loop256MB file1 loop, best of 5: 286 msec per loop5 loops, best of 5: 36.1 msec per loop512MB file1 loop, best of 5: 293 msec per loop5 loops, best of 5: 36.7 msec per loop

I think thememoryview() variant after 128MB is so much faster that it is worth to have the dual implementation and use itfromcopyfile() function only if on Windows and file size > 128MB. I will do it in my other PR/branch.

On the other hand, the same test on Linux shows there is no relevant difference for 512 MB files and a performance degradation for smaller ones.

DK96-OS reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This command is wrong:

./python  -m timeit -s "import shutil; f1 = open('f1', 'rb'); f2 = open('f2', 'wb')" "shutil.copyfileobj(f1, f2)"

The files need to be opened for each pass of the loop, not the setup. That explains the unexpected results. I corrected it to open the files in the loop statement instead of the setup and tested a broad range of file sizes. In the table below all times are best of five for the give number of loops, and normalized overall to make the 64KiB result in the RI_S column equal to 100 time units. I discuss the RI_S case in more detail below.

   SIZE | LOOPS |   R_16K ||    R_1M   %CHNG |  RI_1M   %CHNG |  RI_S   %CHNG--------+-------+---------||-----------------+----------------+--------------  1 GiB |     5 | 1060870 || 1003478    -5.4 | 977391    -7.9 | 963478   -9.2512 MiB |    10 |  323478 ||  283478   -12.4 | 217391   -32.8 | 213913  -33.9256 MiB |    20 |  163304 ||  146435   -10.3 | 112870   -30.9 | 110957  -32.1128 MiB |    40 |   80174 ||   74609    -6.9 |  55478   -30.8 |  55478  -30.8 64 MiB |    80 |   39652 ||   34783   -12.3 |  27304   -31.1 |  27130  -31.6 32 MiB |   160 |   19478 ||   17913    -8.0 |  13809   -29.1 |  13478  -30.8 16 MiB |   320 |    9739 ||    8887    -8.7 |   6887   -29.3 |   6835  -29.8  8 MiB |   640 |    4904 ||    4713    -3.9 |   3652   -25.5 |   3548  -27.7  4 MiB |  1280 |    2504 ||    2348    -6.2 |   1948   -22.2 |   1913  -23.6  2 MiB |  2560 |    1150 ||    1193     3.7 |   1002   -12.9 |    991  -13.8  1 MiB |  5120 |     649 ||     697     7.4 |    659     1.5 |    663    2.2512 KiB | 10240 |     388 ||     553    42.5 |    499    28.6 |    322  -17.0256 KiB | 20480 |     245 ||     459    87.3 |    410    67.3 |    205  -16.3128 KiB | 30720 |     170 ||     388   128.2 |    362   112.9 |    139  -18.2 64 KiB | 40960 |     123 ||     353   187.0 |    343   178.9 |    100  -18.7    R_16K -- read 16 KiB    R_1M  -- read 1 MiB    RI_1M -- readinto 1 MiB    RI_S  -- readinto source size up to 1 MiB

Originally I had tested at 128 MiB with a custom test script to focus on the effects of cached vs non-cached I/O. I assumed the results would be similar for other cases. As shown in the RI_1M column, that's basically true for files larger than 1 megabyte. But there's a significant performance degradation for smaller files. In the RI_S case, I address this by callingos.fstat on the source file to cap thelength of thebytearray at its size. This avoids wastefully over-allocating a zeroedbyterray.

RI_S also experiments with callingSetInformationByHandle :FileEndOfFileInfo (prototyped using ctypes) to avoid having to repeatedly extend the file whenlength is less than the size of the source file. (Note that Python'struncate method is of no use here since it zeros the file.)CopyFileEx does this, so I figured it was worth a try. This does provide a modest performance increase. (Note that I mistakenly included thelength == size boundary, which is apparent in the 1 MiB trial.) I don't know if it's significant enough to justify implementing_winapi.SetFileInformationByHandle.

If I have time, I may run another experiment usingmmap to read into a sliding window of the destination file. It already implements setting the end of the file, albeit with the less efficient combination ofSetFilePointer andSetEndOfFile. (This way requires 4 system calls instead of 1 to set the end of the file.)

Below is the code for RI_S:

 import os_WINDOWS = (os.name == 'nt')if _WINDOWS:    import msvcrt    import ctypes    from ctypes import wintypes        COPY_BUFSIZE = 1024 *1024        kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)    FileEndOfFileInfo = 6    FILE_INFO_BY_HANDLE_CLASS = wintypes.DWORD    kernel32.SetFileInformationByHandle.argtypes = (        wintypes.HANDLE,           # _In_ hFile        FILE_INFO_BY_HANDLE_CLASS, # _In_ FileInformationClass        wintypes.LPVOID,           # _In_ lpFileInformation        wintypes.DWORD)            # _In_ dwBufferSizedef copyfileobj(fsrc, fdst, length=COPY_BUFSIZE):    size = os.fstat(fsrc.fileno()).st_size    if length > size:        length = size    elif _WINDOWS:        info = wintypes.LARGE_INTEGER(size)        if not kernel32.SetFileInformationByHandle(                    msvcrt.get_osfhandle(fdst.fileno()), FileEndOfFileInfo,                     ctypes.byref(info), ctypes.sizeof(info)):            raise ctypes.WinError(ctypes.get_last_error())    fsrc_readinto = fsrc.readinto    fdst_write = fdst.write    with memoryview(bytearray(length)) as mv:        while True:            n = fsrc_readinto(mv)            if not n:                break            elif n < length:                with mv[:n] as smv:                    fdst_write(smv)            else:                fdst_write(mv)

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Interesting. Thanks for the very detailed benchmark. I updated the other branch which now dynamically setsmemoryview() size based on file size (93ebc1f) and I confirm using thereadinto() variant is faster also for smaller files.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

#7681 was merged

@vstinner
Copy link
Member

@ned-deily: " +On OSXfcopyfile_ is used to copy the file content (not metadata)."

Apple changed the OS name from OSX to macOS, no?

@giampaolo
Copy link
ContributorAuthor

True. Will make another PR soon with this and another couple of small changes.

@vstinner
Copy link
Member

True. Will make another PR soon with this and another couple of small changes.

If you change it, you might also rename _fastcopy_osx() to _fastcopy_fcopyfile().

@vstinner
Copy link
Member

"default buffer size on Windows was increased from 16KB to 1MB"

Nitpicking: 16 KiB to 1 MiB. I now prefer "iB" units avoid confusion between base 2 and base 10.

@giampaolo
Copy link
ContributorAuthor

If nobody has complaints I'm gonna merge#7681 soon .

giampaolo added a commit that referenced this pull requestJun 19, 2018
…ndows (#7681)bpo-33671* use memoryview() with size == file size on Windows, see#7160 (comment)* release intermediate (sliced) memoryview immediately* replace "OSX" occurrences with "macOS"* add some unittests for copyfileobj()
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@scoderscoderscoder left review comments

@eryksuneryksuneryksun left review comments

@ncoghlanncoghlanncoghlan approved these changes

@zoobazoobazooba approved these changes

+2 more reviewers

@DK96-OSDK96-OSDK96-OS left review comments

@Ryanb58Ryanb58Ryanb58 approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

11 participants

@giampaolo@ncoghlan@eryksun@ned-deily@bedevere-bot@vstinner@scoder@zooba@Ryanb58@DK96-OS@the-knights-who-say-ni

[8]ページ先頭

©2009-2026 Movatter.jp