gh-132551: make BytesIO object free-thread safe #132616


Merged
kumaraditya303 merged 13 commits into python:main from tom-pytel:fix-issue-132551 on May 8, 2025

Conversation

@tom-pytel (Contributor) commented Apr 16, 2025
Full pyperformance suite times (can anyone suggest a better benchmark for BytesIO?):

```
time diff      mean      stdev
GIL          -0.01%   +- 2.31%
noGIL        +0.22%   +- 2.93%
```

Changes: functions marked with * had a critical section added, those marked with - did not. Atomics were added for the exports field:

```c
static PyGetSetDef bytesio_getsetlist[] = {
*   {"closed",  bytesio_get_closed, NULL,    // @critical_section to avoid making `buf` atomic in many places

static struct PyMethodDef bytesio_methods[] = {
-   _IO_BYTESIO_READABLE_METHODDEF
-   _IO_BYTESIO_SEEKABLE_METHODDEF
-   _IO_BYTESIO_WRITABLE_METHODDEF
*   _IO_BYTESIO_CLOSE_METHODDEF
-   _IO_BYTESIO_FLUSH_METHODDEF
-   _IO_BYTESIO_ISATTY_METHODDEF
*   _IO_BYTESIO_TELL_METHODDEF      // @critical_section to avoid making `pos` atomic in many places
*   _IO_BYTESIO_WRITE_METHODDEF
*   _IO_BYTESIO_WRITELINES_METHODDEF
*   _IO_BYTESIO_READ1_METHODDEF
*   _IO_BYTESIO_READINTO_METHODDEF
*   _IO_BYTESIO_READLINE_METHODDEF
*   _IO_BYTESIO_READLINES_METHODDEF
*   _IO_BYTESIO_READ_METHODDEF
*   _IO_BYTESIO_GETBUFFER_METHODDEF
*   _IO_BYTESIO_GETVALUE_METHODDEF
*   _IO_BYTESIO_SEEK_METHODDEF      // @critical_section to avoid making `pos` atomic in many places
*   _IO_BYTESIO_TRUNCATE_METHODDEF
*   {"__getstate__", bytesio_getstate,  METH_NOARGS, NULL},
*   {"__setstate__", bytesio_setstate,  METH_O, NULL},
*   {"__sizeof__",   bytesio_sizeof,     METH_NOARGS, NULL},

static PyType_Slot bytesio_slots[] = {
-   {Py_tp_dealloc,   bytesio_dealloc},
-   {Py_tp_traverse,  bytesio_traverse},
-   {Py_tp_clear,     bytesio_clear},
-   {Py_tp_iter,      PyObject_SelfIter},
*   {Py_tp_iternext,  bytesio_iternext},
*   {Py_tp_init,      _io_BytesIO___init__},
-   {Py_tp_new,       bytesio_new},

static PyType_Slot bytesiobuf_slots[] = {
-   {Py_tp_dealloc, bytesiobuf_dealloc},
-   {Py_tp_traverse, bytesiobuf_traverse},
-   {Py_bf_getbuffer, bytesiobuf_getbuffer},
-   {Py_bf_releasebuffer, bytesiobuf_releasebuffer},
```

Notes:

  • Opted for critical sections in three functions instead of making buf and/or pos atomic everywhere (see the sketch after this list).
  • New test test_free_threading.test_io.TestBytesIO is clean under --parallel-threads TSAN.
  • bytesiobuf_getbuffer doesn't need critical sections because, as far as I can see, it's only used for BytesIO in bytesio.c, where its use is protected by a critical section in _io_BytesIO_getbuffer_impl (it's also used for a random non-BytesIO-related test in _testcapimodule.c).
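
To make the approach concrete, here is a minimal sketch (not code from this PR) of the locking pattern described above, using the public Py_BEGIN_CRITICAL_SECTION / Py_END_CRITICAL_SECTION macros available in CPython 3.13+; the struct layout and function name are simplified for illustration.

```c
#include "Python.h"

/* Simplified stand-in for the bytesio struct in Modules/_io/bytesio.c. */
typedef struct {
    PyObject_HEAD
    PyObject  *buf;   /* backing buffer */
    Py_ssize_t pos;   /* current stream position */
} bytesio;

/* tell()-style method guarded by the per-object lock instead of making
 * `pos` atomic; on default (GIL) builds the macros are effectively no-ops. */
static PyObject *
bytesio_tell_locked(PyObject *op, PyObject *Py_UNUSED(ignored))
{
    bytesio *self = (bytesio *)op;
    PyObject *result;

    Py_BEGIN_CRITICAL_SECTION(self);
    result = PyLong_FromSsize_t(self->pos);
    Py_END_CRITICAL_SECTION();

    return result;
}
```

In the PR itself this wrapping is mostly generated by Argument Clinic rather than written by hand.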


@ZeroIntensity (Member) left a comment

The implementation looks fine, but I'm worried that concurrent use of BytesIO is an actual use-case that we need to consider.

For example, I could see someone using a logger that writes to an IO-like object from multiple threads. If they wanted to log to a BytesIO buffer (e.g., for temporarily suppressing the output), then that would scale pretty terribly with critical sections. I'm speculating, though.

@tom-pytel (Contributor, author)

For example, I could see someone using a logger that writes to an IO-like object from multiple threads. If they wanted to log to a BytesIO buffer (e.g., for temporarily suppressing the output), then that would scale pretty terribly with critical sections. I'm speculating, though.

Well, that's why I asked about a specific better benchmark to test with, because pyperformance is not specific enough here to show a perf hit. But as for the case you mention, if someone is writing to the BytesIO object from multiple threads then performance is not really the issue, as they will eventually segfault.

```
$ ./python ../t.py
1
2
Segmentation fault (core dumped)
```

t.py:

```python
from io import BytesIO
from random import randint
import threading

def write(barrier, b):
    barrier.wait()
    b.write(b'0' * randint(100, 1000))

def check(funcs, *args):
    barrier = threading.Barrier(len(funcs))
    thrds = []
    for func in funcs:
        thrd = threading.Thread(target=func, args=(barrier, *args))
        thrds.append(thrd)
        thrd.start()
    for thrd in thrds:
        thrd.join()

if __name__ == "__main__":
    count = 0
    while True:
        print(count := count + 1)
        check([write] * 10, BytesIO())
```

@ZeroIntensity (Member) commented Apr 16, 2025 (edited)

Well, yeah, we need to fix the races. I'm just wondering how we should fix it. Would you mind benchmarking (with this PR) the case I brought up against different numbers of threads? I want to see how bad the contention gets (and compare it to a GILful build as well).

@tom-pytel (Contributor, author)

Well, yeah, we need to fix the races. I'm just wondering how we should fix it. Would you mind benchmarking (with this PR) the case I brought up against different numbers of threads? I want to see how bad the contention gets (and compare it to a GILful build as well).

Ok, will do a bunch of tests tomorrow.

@tom-pytel (Contributor, author) commented Apr 17, 2025 (edited)

EDIT: I just realized that with a good bit of rewriting, lock-free reads might be possible. EDIT2: Nope, see below.

TL;DR: GIL build performance is the same as expected, while noGIL makes multithreaded writes possible and otherwise seems OK.

Detail: the noGIL PR has a minor hit to single-threaded read and a smaller hit to single-threaded write (cmdline timeit). Multithreaded timings are unstable (especially write; run the script and you will get 2x variation for that), but read looks slower than main, while write is much faster than main with explicit locking (which is the only thing to compare it to, as otherwise writes crash).

```
./python -m timeit -s "from io import BytesIO; b = BytesIO((b'.'*99+b'\n')*50_000)" "while b.read(100): pass"
./python -m timeit -s "from io import BytesIO; b = BytesIO((b'.'*99+b'\n')*50_000)" "while b.readline(): pass"
./python -m timeit -s "from io import BytesIO; b = BytesIO(); l = b'.'*99+b'\n'" "for _ in range(50_000): b.write(l)"
```

Times in ns, lower is better.

```
SINGLE MAIN THREAD '-m timeit' cmdline (above):
  (per loop, read/readline = 20000000 loops, write = 100 loops)
=====================================
             read  readline     write
GIL main     18.3      14.4      2580
GIL PR       18.2      14.2      2590
noGIL main   18.8      14.7      3710
noGIL PR     20.3      16.1      3860

MULTITHREADING benchmark script (below):
========================================

read:
-----
cfg / #thrds     1      2      3      4      5
GIL main      24.4   23.6   23.8   23.7   23.7
GIL PR        23.4   23.7   24.1   23.5   24.8
noGIL main    27.4   44.0   64.8    124    116
noGIL PR      28.6   57.6   90.5    129    163

readline:
---------
cfg / #thrds     1      2      3      4      5
GIL main      22.4   22.3   22.8   22.2   23.7
GIL PR        22.0   22.2   22.7   21.9   25.0
noGIL main    25.4   46.2   85.8    198    261
noGIL PR      26.1   54.7   82.8    114    152

write:  (* = with threading.Lock for 'noGIL main', otherwise crash)
------
cfg / #thrds     1      2      3      4      5
GIL main      22.6   22.2   24.6   22.4   23.0
GIL PR        22.1   23.2   22.4   23.0   23.4
noGIL main    42.2    286*   571*   805*  1077*
noGIL PR      39.7    123    287    535    677

write:  (different script (not below) to double check unstable timings, total time)
------
cfg / #thrds         1          2          3          4          5
GIL main       1171556    2474471    3664172    4817181    6343492
GIL PR         1195404    2442720    3698038    4949697    6276943
noGIL main     5997412   22573890*  27660777*  43626582*  61317669*
noGIL PR       2332928    6786042   19316932   38951468   31712621
```

Ping @ZeroIntensity, timings as you requested.

```python
import sys
from io import BytesIO
from threading import Barrier, Lock, Thread
from time import time_ns

BESTOF = 5
NLINES = 50_000
LINE = b'.' * 99 + b'\n'
LLEN = len(LINE)
LOCK = Lock()

def read(barrier, out, b):
    barrier.wait()
    t0 = time_ns()
    while b.read(LLEN):
        pass
    out[0] = time_ns() - t0

def readline(barrier, out, b):
    barrier.wait()
    t0 = time_ns()
    while b.readline():
        pass
    out[0] = time_ns() - t0

def write_locked(barrier, out, b):
    barrier.wait()
    t0 = time_ns()
    for _ in range(NLINES):
        with LOCK:
            b.write(LINE)
    out[0] = time_ns() - t0

def write(barrier, out, b):
    barrier.wait()
    t0 = time_ns()
    for _ in range(NLINES):
        b.write(LINE)
    out[0] = time_ns() - t0

def check(func, nthrds, b, is_wr):
    # is_wr compensates for write total size *= nthrds while read total size is fixed
    barrier = Barrier(nthrds)
    outs = []
    thrds = []
    for _ in range(nthrds):
        out = [0]
        thrd = Thread(target=func, args=(barrier, out, b))
        outs.append(out)
        thrds.append(thrd)
        thrd.start()
    for thrd in thrds:
        thrd.join()
    return sum(o[0] for o in outs) / ((nthrds if is_wr else 1) * NLINES)

def checkbest(func, nthrds, setup, is_wr):
    best = float('inf')
    for _ in range(BESTOF):
        best = min(best, check(func, nthrds, setup(), is_wr))
    return best

if __name__ == "__main__":
    nthrds = int((sys.argv + ['1'])[1])
    print(f'read:         ', checkbest(read, nthrds, lambda: BytesIO(LINE * NLINES), False), 'ns')
    print(f'readline:     ', checkbest(readline, nthrds, lambda: BytesIO(LINE * NLINES), False), 'ns')
    print(f'write_locked: ', checkbest(write_locked, nthrds, lambda: BytesIO(), True), 'ns')
    print(f'write:        ', checkbest(write, nthrds, lambda: BytesIO(), True), 'ns')
```

@serhiy-storchaka (Member) left a comment
Many functions are just wrapped in a critical section. I wonder, what if we add a special flag, METH_FT_SYNC, and when it is set, wrap the method-calling code in a critical section?

@tom-pytel (Contributor, author)

Many functions are just wrapped in a critical section. I wonder, what if we add a special flag, METH_FT_SYNC, and when it is set, wrap the method-calling code in a critical section?

If you're talking about stuff like bytesio_sizeof, I could have just handed that over to Argument Clinic to wrap, but I guess I was being lazy. Argument Clinic works well enough for the critical section wrapping, I think.
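
For readers following along, this is roughly what the Argument Clinic route looks like in bytesio.c: a @critical_section directive in the clinic input block makes the generated wrapper take the per-object lock around the call into the _impl function. The block below is an illustrative sketch, not copied from the PR, and the generated code between the markers is omitted.

```c
/*[clinic input]
@critical_section
_io.BytesIO.tell

Current file position, an integer.
[clinic start generated code]*/

/* Clinic emits a wrapper that brackets the call to the *_impl function with
 * Py_BEGIN_CRITICAL_SECTION(self) / Py_END_CRITICAL_SECTION(), so the
 * hand-written implementation can assume the object is already locked. */
static PyObject *
_io_BytesIO_tell_impl(bytesio *self)
{
    return PyLong_FromSsize_t(self->pos);
}
```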

@serhiy-storchaka (Member)

I mean that this would work also for methods that do not use Argument Clinic. Was this idea discussed before?

@tom-pytel (Contributor, author)

I mean that this would work also for methods that do not use Argument Clinic. Was this idea discussed before?

Couldn't tell you that, I'm relatively new here (: But one drawback I can see is that it would only work when calling via the Python mechanism; if you called directly from C you would still have to take the critical section on your own, as opposed to a clinic-generated function, which would have it, no?
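
To illustrate that drawback: a hypothetical C caller (not part of the PR) that bypasses the clinic-generated wrapper and calls an unlocked *_impl helper directly has to take the critical section itself.

```c
/* Hypothetical helper: grabbing the buffer contents from C without going
 * through the Python-level call machinery means the caller must hold the
 * per-object lock around the unlocked _impl call. */
static PyObject *
snapshot_bytesio(bytesio *self)
{
    PyObject *data;

    Py_BEGIN_CRITICAL_SECTION(self);
    data = _io_BytesIO_getvalue_impl(self);
    Py_END_CRITICAL_SECTION();

    return data;   /* may be NULL with an exception set */
}
```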

@ZeroIntensity (Member)

I mean that this would work also for methods that do not use Argument Clinic. Was this idea discussed before?

Yeah, I've suggested it in the past. Apparently, it's been tried and failed before. I don't know much of the details.

@tom-pytel (Contributor, author)

Ok, had a look, and lock-free reads can almost certainly be done, but they would not have the data integrity guarantees you would want from a file-like object without sync. They would work the same way lock-free multithreaded iteration works: you could get duplicate data, and the position might not sync correctly wrt writes, which is not what you expect from a "file". So scratch that idea.

@kumaraditya303 merged commit 5dd3a3a into python:main on May 8, 2025
42 checks passed
Pranjal095 pushed a commit to Pranjal095/cpython that referenced this pull request on Jul 12, 2025
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@ngoldbaum (Contributor)

Just to clarify - this didn't make it into 3.14, right?

@tom-pytel (Contributor, author)

Just to clarify - this didn't make it into 3.14, right?

Looking at the code I don't see it, so no.

@ngoldbaum (Contributor) commented Jul 24, 2025 (edited)

OK - I think we'll make it so pytest-run-parallel marks tests that use BytesIO on 3.13 and 3.14 as thread-unsafe: Quansight-Labs/pytest-run-parallel#100

I happened to hit the issues fixed by this PR running the tests for python-isal under pytest-run-parallel.

taegyunkim pushed a commit to taegyunkim/cpython that referenced this pull request on Aug 4, 2025
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@picnixz (Member)

@kumaraditya303 should this be backported?

@kumaraditya303 (Contributor)

should this be backported?

I didn't backport this as the change is large, and it only crashes when multiple threads modify it concurrently, which shouldn't be done.

@tom-pytel (Contributor, author)

should this be backported?

I didn't backport this as the change is large, and it only crashes when multiple threads modify it concurrently, which shouldn't be done.

It was at the end of the 3.14 cycle, so a backport shouldn't be very painful; if you want, I could have a look.

@ngoldbaum (Contributor)

It was at the end of the 3.14 cycle, so a backport shouldn't be very painful; if you want, I could have a look.

A backport would be appreciated; I think it would help resolve this discussion and get a downstream release supporting 3.14 out: pycompression/python-isal#233 (comment)

@kumaraditya303 added the needs backport to 3.14 label on Sep 5, 2025
@miss-islington-app

Thanks @tom-pytel for the PR, and @kumaraditya303 for merging it 🌮🎉. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @tom-pytel and @kumaraditya303, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 5dd3a3a58cca4798ebfc2edd673211067453e81e 3.14

@bedevere-app

GH-138551 is a backport of this pull request to the 3.14 branch.

@bedevere-app bot removed the needs backport to 3.14 label on Sep 5, 2025
kumaraditya303 added a commit to kumaraditya303/cpython that referenced this pull request on Sep 5, 2025
(cherry picked from commit 5dd3a3a)
Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@kumaraditya303 (Contributor)

I created #138551 to backport it if it helps, although I think that the isal issue is about sharing io.BytesIO in a fixture across threads, and that is still incorrect.

kumaraditya303 added a commit that referenced this pull request on Oct 7, 2025
* [3.14] gh-132551: make `io.BytesIO` thread safe (GH-132616)
(cherry picked from commit 5dd3a3a)
Co-authored-by: Tomasz Pytel <tompytel@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>

Reviewers

@serhiy-storchaka left review comments
@ZeroIntensity left review comments
@kumaraditya303 approved these changes

Assignees

@kumaraditya303


6 participants

@tom-pytel @ZeroIntensity @serhiy-storchaka @ngoldbaum @picnixz @kumaraditya303
