Reduce copies when reading files in pyio, match behavior of _io #129005

Closed
Labels: performance, stdlib, type-feature
@cmaloney

Description

Feature or enhancement

Proposal:

Currently _pyio uses ~2x as much memory to read all data from a file compared to _io. This is because it makes more than one copy of the data.

Details from test_largefile run

    $ ./python -m test -M8g -uall test_largefile -m test_large_read -vvv
    == CPython 3.14.0a4+ (heads/main-dirty:3829104ab41, Jan 17 2025, 21:40:47) [Clang 19.1.6 ]
    == Linux-6.12.9-arch1-1-x86_64-with-glibc2.40 little-endian
    == Python build: debug
    == cwd: <$HOME>/python/build/build/test_python_worker_32392æ
    == CPU count: 32
    == encodings: locale=UTF-8 FS=utf-8
    == resources: all

    Using random seed: 1740056613
    0:00:00 load avg: 0.53 Run 1 test sequentially in a single process
    0:00:00 load avg: 0.53 [1/1] test_largefile
    test_large_read (test.test_largefile.CLargeFileTest.test_large_read) ...
     ... expected peak memory use: 4.7G
     ... process data size: 2.3G
    ok
    test_large_read (test.test_largefile.PyLargeFileTest.test_large_read) ...
     ... expected peak memory use: 4.7G
     ... process data size: 2.3G
     ... process data size: 4.3G
     ... process data size: 4.7G
    ok

    ----------------------------------------------------------------------
    Ran 2 tests in 3.711s

    OK

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 3.7 sec
    Total tests: run=2 (filtered)
    Total test files: run=1/1 (filtered)
    Result: SUCCESS

Plan:

  1. Switch to os.readinto() (originally proposed as os.readv()) to do readinto like the C _Py_read used by _io does. os.read() can't take a buffer to use. This aligns behavior between _io.FileIO.readall and _pyio.FileIO.readall. os.readv works well today and takes a caller-allocated buffer rather than needing to add a new os API. readv(2) mirrors the behavior and errors of read(2), so this should keep the same end behavior.
  2. Update _pyio.BufferedIO to not force a copy of the buffer for readall when its internal buffer is empty. Currently it always slices its internal buffer, then adds the result of _pyio.FileIO.readall to it.
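
The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in _pyio: `readall_readv` and `buffered_readall` are hypothetical names, `size_hint` is an invented parameter, and the sketch uses os.readv (Unix only) because it is the caller-allocated-buffer call discussed above.

```python
import os


def readall_readv(fd, size_hint=8192):
    """Read fd to EOF into a single bytearray using os.readv (Unix only)."""
    buf = bytearray(max(size_hint, 1))
    filled = 0
    while True:
        if filled == len(buf):
            # Buffer is full: grow it in place before the next read.
            buf.extend(bytes(len(buf)))
        # readv writes directly into the unused tail of buf.
        view = memoryview(buf)[filled:]
        try:
            n = os.readv(fd, [view])
        finally:
            view.release()  # drop the export so buf can be resized again
        if n == 0:  # EOF
            break
        filled += n
    del buf[filled:]  # trim the unused tail
    return buf


def buffered_readall(pending, raw_readall):
    """Combine already-buffered bytes with the rest of the file.

    Hypothetical stand-in for the buffered layer's read-all path: when
    nothing is pending, return the raw result as-is instead of building
    a new object by concatenation.
    """
    rest = raw_readall()
    if not pending:
        return rest
    return pending + rest
```

The point of the second function is only the early return: concatenating an empty prefix would still allocate and fill a second object the size of the file.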

For iterating, I'm using a small tracemalloc script to find where copies are:

    from _pyio import open
    import tracemalloc

    with open("README.rst", 'rb') as file:
        tracemalloc.start()
        data = file.read()
        snap = tracemalloc.take_snapshot()

    stats = snap.statistics('lineno')
    for stat in stats:
        print(stat)

Loose Ends

  • os.readv seems to be well supported but is currently guarded by a configure check. I'd like to just make _pyio require readv, but can do conditional code if needed. If making readv non-optional generally is feasible, I'm happy to work on that.
    • os.readv is not supported on WASI, so conditional code is needed.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
