Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-81340: Usecopy_file_range inshutil.copyfile copy functions#93152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
zooba merged 9 commits intopython:mainfromillia-v:fix-issue-81340
Feb 3, 2025

Conversation

illia-v
Copy link
Contributor

@illia-villia-v commentedMay 23, 2022
edited by bedevere-appbot
Loading

This makesshutil.copyfile prefer thecopy_file_range system call on Linux.

copy_file_range gives filesystems an opportunity to implement the use of reflinks or server-side copy, but we cannot determine whether any of them are implemented. Therefore, I added anallow_reflink argument in case anyone wants to disable copy-on-write.

GNU Coreutilsenables copy-on-write by default, that is why I setallow_reflink to true by default (unlike@vstinner proposed in#81338).

Note, there isa knowncopy_file_range bug for copying from special filesystems like procfs and sysfs. It seems not to be fixed yet, so we have to check for a silentcopy_file_range fail asCoreutils andGo do.

danielzgtg, thesamesam, and nanonyme reacted with thumbs up emoji
@illia-v
Copy link
ContributorAuthor

@giampaolo I saw that you are marked as a code owner for shutil, but GitHub did not request your review for some reason.

**/*shutil@giampaolo

Would you like to take a look?

@giampaolo
Copy link
Contributor

@illia-v to my understandingcopy_file_range() on Linux enables the so called "server side copy", not "reflink" / "copy on write". I posted a patch some time ago, but I should update it, see:
#81340

As for reflink / CoW copy, I also submitted a patch some time ago, but I wanted to add Windows support before merging it.
#81338

To my understanding reflink / CoW copy on Linux is achieved by using FICLONE, not copy_file_range().

@illia-v
Copy link
ContributorAuthor

@vstinner
Copy link
Member

cc@pablogsal

@vstinner
Copy link
Member

@giampaolo copy_file_range does try cloning. Take a look athttps://stackoverflow.com/a/65518879 andhttps://github.com/torvalds/linux/blob/1e57930e9f4083ad5854ab6eadffe790a8167fb4/fs/read_write.c#L1491-L1507.

Would you mind to create a first PR to mention it in copy_file_range() documentation?https://docs.python.org/dev/library/os.html#os.copy_file_range

@vstinner
Copy link
Member

copyfile(src, dst, *, follow_symlinks=True, allow_reflink=True)

It terms of API, I would prefer to change the parameter name to justreflink=True. IMO it's clear enough thatreflink=True will not use CoW if it's not supported or cannot be used. Python has a long habit of trying to use the most efficient method, but falls back on the most compatible code if it doesn't work.

For example, socket.sendfile() tries to use sendfile() but falls back on send() if sendfile() doesn't work or is not available.

@vstinner
Copy link
Member

The Unix cp command has 3 modes for--reflink=WHEN:

  • always: "perform a lightweight copy (...) if this is not possible the copy fails"
  • never (False in your PR)
  • auto (True in your PR)

Here you only propose 2 modes. Do we need to implement the 3rd "always" mode?

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Your PR is quite big. Would it be possible to extract the tests refactoring and start with a first PR just for that?

@vstinner
Copy link
Member

I see a lot of confusion between which function/syscall is used, server-side copy, and copy of write. There are expectections, hopes, and ... disillusions :-) IMO the doc should be completed to better explain differences between server-side copy and copy-on-write, and attempt to document when and how one method is attempted, when it can fail, etc.

@illia-v
Copy link
ContributorAuthor

illia-v commentedMay 23, 2022
edited
Loading

I am glad the issue has received this attention.

@vstinner thanks for valid points, especially one about the three modes. I think it will be nice to support all of them in Python too.

Butcopy_file_range has no way to specify that reflink is required.

  • cp --reflink=always command seems to useioctl(dest_fd, FICLONE, src_fd) (1 &2) as the only copying method or fail if the call is not possible;
  • cp --reflink=auto triesioctl(dest_fd, FICLONE, src_fd) firstly, thencopy_file_range, then others;
  • cp --reflink=never skips bothioctl(dest_fd, FICLONE, src_fd) andcopy_file_range.

I will create a PR to extendos.copy_file_range documentation. Then I will need to experiment a bit with FICLONE.

Do you think it is possible to launch the three-mode reflink functionality for Linux firstly and add support for other OSes in later pull requests?
I am most interested in supporting this on Linux, but I may consider writing some code for other platforms later.

@vstinner
Copy link
Member

Do you think it is possible to launch the three-mode reflink functionality for Linux firstly and add support for other OSes in later pull requests?

If possible, the API should be the same and should be usable on all platform. For example, "as an user, I would like to call shutil.copyfile() with the same arguments on all platforms and get a similar behavior on all platforms". For example, the API should attempt to use server-side copy and/or use Copy-on-Write, but silently switch to regular read+write copy if no modern copy function is available on the OS or on the source/destination filesystems.

For the os module, it's fine to have different functions depending on the OS. But the shutil module is more a high-level module with a (mostly) portable API.

illia-v reacted with thumbs up emoji

@illia-v
Copy link
ContributorAuthor

@giampaolo copy_file_range does try cloning. Take a look athttps://stackoverflow.com/a/65518879 andhttps://github.com/torvalds/linux/blob/1e57930e9f4083ad5854ab6eadffe790a8167fb4/fs/read_write.c#L1491-L1507.

Would you mind to create a first PR to mention it in copy_file_range() documentation?https://docs.python.org/dev/library/os.html#os.copy_file_range

Done in#93182, please review it.

@illia-villia-v marked this pull request as draftMay 26, 2022 10:21
@illia-v
Copy link
ContributorAuthor

I am swaying away from the idea in favor of a separatereflink function#81338 (comment).

@thesamesam
Copy link
Contributor

thesamesam commentedJul 6, 2023
edited
Loading

I am swaying away from the idea in favor of a separatereflink function#81338 (comment).

The problem with that is you don't get the automatic free win, which is contrary to what others are doing. e.g. GNOME's glib (not glibc), KDE's kio, and even GCC's libstdc++ upgrade automatically. GCC 14 willautomatically usecopy_file_range if available forfilesystem::copy_file.

There doesn't appear to be much precedent for carving this out into its own opt-in functionality?

EDIT: It's been pointed out to me thatEmacs, of all things, unconditionally usescopy_file_range if the system supports it forcopy-file.

Jturnerusa reacted with thumbs up emojiJturnerusa reacted with eyes emoji

@barneygale
Copy link
Contributor

@illia-v are you still interested in working on this? I wonder if we even need theallow_reflinks argument - perhaps we could enable use ofcopy_file_range() (where available) without adding any new arguments?

thesamesam and skshetry reacted with thumbs up emoji

@illia-v
Copy link
ContributorAuthor

@barneygale yes, I am interested and can revise the changes this week

barneygale and thesamesam reacted with heart emoji

@illia-villia-v marked this pull request as ready for reviewJune 5, 2024 16:10
@illia-v
Copy link
ContributorAuthor

@barneygale I droppedallow_reflinks, please take a look

@barneygalebarneygale requested a review fromvstinnerJune 5, 2024 19:16
@illia-villia-v requested a review frombarneygaleJune 5, 2024 19:32
Copy link
Contributor

@barneygalebarneygale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@giampaolo
Copy link
Contributor

giampaolo commentedJun 7, 2024
edited
Loading

As I've previously stated in#81338 (comment), personally I consider copy and CoW / clone 2 distinct operations. E.g. one may want to occupy disk spaceright now instead of later. For this reason I thinkshutil.copyfile should give the possibility to disable this behavior.cp command defaults to--reflink=auto, but it allows to disable CoW with--reflink=never. The problem withcopy_file_range() is that it doesn't allow you to opt-out from CoW, which is why back in 2020 I was more keen on providing a separateshutil.reflink function using FICLONE. But then again,copy_file_range() does provide fast server-side copy in some circumstances, which is something youalways want. What a mess...

It appears to me that the tool which solved all these controversies iscp via--reflink=auto/always/never. I have the feeling thatshutil.copyfile (and possibly also pathlib.Path.copy /#73991) should try to follow its lead and replicate that. E.g. given the signature:

def copyfile(..., reflink=None)

We may do:

  • reflink=None ==cp --reflink=auto == usecopy_file_range() - on failure usesendfile() orread/write
  • reflink=True ==cp --reflink=always == useFICLONE and fail on error
  • reflink=False ==cp --reflink=never == usesendfile() orread/write

But if we do this, I think it would make sense to exposeshutil.reflink() first, which would also support macOS and Windows.

My 2 cents and sorry for not replying earlier on this topic.

barneygale reacted with thumbs up emoji

@barneygale
Copy link
Contributor

There are some compelling arguments and examples from other languages in#81338. Personally I'm persuaded that we should enable it (without an opt-out), rather than second-guessing what the OS provides and what other language runtimes are doing. We have plenty of time til 3.14 ships if we need to back it out. But this is by no means my area of expertise, so I'm happy to be overruled.

thesamesam, zooba, illia-v, and nanonyme reacted with thumbs up emoji

barneygale added a commit that referenced this pull requestJun 14, 2024
Add a `Path.copy()` method that copies the content of one file to another.This method is similar to `shutil.copyfile()` but differs in the following ways:- Uses `fcntl.FICLONE` where available (seeGH-81338)- Uses `os.copy_file_range` where available (seeGH-81340)- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.Incorporates code fromGH-81338 andGH-93152.Co-authored-by: Eryk Sun <eryksun@gmail.com>
@zooba
Copy link
Member

Personally I'm persuaded that we should enable it (without an opt-out), rather than second-guessing what the OS provides and what other language runtimes are doing.

This seems alright to me, too. We're expecting/hoping Windows will start doing transparent CoW when doing file copies, so that ought to give an idea pretty quickly if there's a genuineneed to avoid it, or if we'd be adding the opt-out without justification.

mrahtz pushed a commit to mrahtz/cpython that referenced this pull requestJun 30, 2024
Add a `Path.copy()` method that copies the content of one file to another.This method is similar to `shutil.copyfile()` but differs in the following ways:- Uses `fcntl.FICLONE` where available (seepythonGH-81338)- Uses `os.copy_file_range` where available (seepythonGH-81340)- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.Incorporates code frompythonGH-81338 andpythonGH-93152.Co-authored-by: Eryk Sun <eryksun@gmail.com>
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull requestJul 11, 2024
Add a `Path.copy()` method that copies the content of one file to another.This method is similar to `shutil.copyfile()` but differs in the following ways:- Uses `fcntl.FICLONE` where available (seepythonGH-81338)- Uses `os.copy_file_range` where available (seepythonGH-81340)- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.Incorporates code frompythonGH-81338 andpythonGH-93152.Co-authored-by: Eryk Sun <eryksun@gmail.com>
estyxx pushed a commit to estyxx/cpython that referenced this pull requestJul 17, 2024
Add a `Path.copy()` method that copies the content of one file to another.This method is similar to `shutil.copyfile()` but differs in the following ways:- Uses `fcntl.FICLONE` where available (seepythonGH-81338)- Uses `os.copy_file_range` where available (seepythonGH-81340)- Uses `_winapi.CopyFile2` where available, even though this copies more metadata than the other implementations. This makes `WindowsPath.copy()` more similar to `shutil.copy2()`.The method is presently _less_ specified than the `shutil` functions to allow OS-specific optimizations that might copy more or less metadata.Incorporates code frompythonGH-81338 andpythonGH-93152.Co-authored-by: Eryk Sun <eryksun@gmail.com>
@illia-v
Copy link
ContributorAuthor

@barneygale based on your experience of having#119058 present in main for more than a month, can we merge this too?

nanonyme reacted with rocket emoji

@barneygale
Copy link
Contributor

There have been no releases frommain in that time, and the first alpha of 3.14 isn't due until October, so I wouldn't read too much into the lack ofpathlib.Path.copy() bug reports.

@illia-v
Copy link
ContributorAuthor

@barneygale thank you for the response. Can we risk merging this PR before the alpha to be able to discover issues (if any) in early stages of 3.14?

@cburroughs
Copy link
Contributor

Sorry I'm not familiar with the details of CPython development. I see this PR has theawaiting merge, is it waiting on anything else before said merge?

@zooba
Copy link
Member

The merge conflicts need addressing, and it sounds like we want someone knowledgeable about the Linux API to chime in on whether this is using it properly.

But I agree, we ought to get it in asap, now that 3.14 is shipping.

@illia-v
Copy link
ContributorAuthor

The merge conflicts need addressing, and it sounds like we want someone knowledgeable about the Linux API to chime in on whether this is using it properly.

I resolved the conflicts, thanks
The PR has an approval from@barneygale who integratedcopy_file_range inpathlib#119058

@zoobazooba merged commita33dcb9 intopython:mainFeb 3, 2025
39 checks passed
@zooba
Copy link
Member

Apologies for being slow. Despite the lack of an explicit endorsement from someone (else) claiming to understand the API well, I think the best option here is to have it in the 3.14 alphas.

I likely won't get tagged on any issues that are reported, but anyone who is should feel free to modify or revert the code if needed to fix users. I'm merging as a point of process, not because this is "my" code - I don't need to be consulted on changes.

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull requestFeb 7, 2025
…pythonGH-93152)This allows the underlying file system an opportunity to optimise or avoid the actual copy.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@barneygalebarneygalebarneygale approved these changes

@vstinnervstinnerAwaiting requested review from vstinner

@zoobazoobaAwaiting requested review from zooba

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

8 participants
@illia-v@giampaolo@vstinner@thesamesam@barneygale@zooba@cburroughs@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp