Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-121954: Don't mutate tuples in _PyCode_New#122180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
encukou wants to merge20 commits intopython:main
base:main
Choose a base branch
Loading
fromencukou:gh-121863

Conversation

encukou
Copy link
Member

@encukouencukou commentedJul 23, 2024
edited by bedevere-appbot
Loading

Change_PyCode_New (intern_strings &intern_constants) to return new tuples
rather than mutating their arguments.
If possible, return the argument rather than create a copy.

In order to make copying less common, intern strings destined for
_PyCode_New earlier -- in:

  • _PyCode_ConstantKey, used in the compiler to de-duplicate constants;
  • symtable_implicit_arg, which generates the names.0,.1 etc.
    • The most common implicit arg,.0, is baked in as a_Py_STR
    • others are immortalized

This means all string constants produced by the compiler are interned;
previously it was only names and identifier-like strings.
(We alreadyhash all code constants in order to de-duplicate them,
and we treat deleting code objects as rare event. Also, constants were
interned in the free-threaded build.)
Remove a test that verified a string constant isnot interned.

Mergevalidate_and_copy_tuple, the routine for cleaning up user input
forcode.__new__, into the internalintern_names.
Note that this changes the exception type for non-string names
inPyCode_NewWithPosOnlyArgs and such fromSystemError toTypeError;
IMO that doesn't need documenting.

Renameintern_strings tointern_names to further distinguish it from
intern_constants. Unlike constants, names are immortalized.

Comment on lines +170 to +180
Py_ssize_t size = PyTuple_GET_SIZE(orig_tuple);
new_tuple = PyTuple_New(size);
if (!new_tuple) {
Py_DECREF(new_item);
return -1;
}
for (Py_ssize_t i = size; --i >= 0; ) {
_PyTuple_ITEMS(new_tuple)[i] = Py_NewRef(
_PyTuple_ITEMS(orig_tuple)[i]);
}
*new_tuple_p = new_tuple;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copying the tuple should be a rare event, mostly for creating code from user data.
Should it use some kind of faster-cpython stat counter?

@encukouencukou marked this pull request as draftJuly 23, 2024 14:52
@carljmcarljm removed their request for reviewJuly 26, 2024 17:29
@carljm
Copy link
Member

(Symtable change looks fine to me, would rather have someone from faster-cpython review the rest, once the PR is ready.)

@encukou
Copy link
MemberAuthor

encukou commentedJul 27, 2024
edited
Loading

Thanks!

(I'm aware of some of the problems in the PR; since it's not a 3.13 regression I'll get back to this after 3.13rc1 is out.)

@encukou
Copy link
MemberAuthor

I'm blocked on not understanding some of the internals for free-threaded builds.

The free-threaded build tries to “intern” all code constants, but it doesn't -- see for example this, adapted fromtest.test_code.CodeConstsTest but with a set added:

import textwrapglobals = {}exec(textwrap.dedent("""    def func1():        return (0.0, (1, 2, "hello"), {1, 2})"""), globals)exec(textwrap.dedent("""    def func2():        return (0.0, (1, 2, "hello"), {1, 2})"""), globals)print(globals["func1"]() is globals["func2"]())

Which constants aremeant to be interned here?

Worse, when a container (i.e. tuple) is interned, it's latercleared in theclear_containers function, replacing all its contents with NULL. This sounds dangerous for objects that might be shared with other code. What makesclear_containers safe?

@colesbury, do you know?

@colesbury
Copy link
Contributor

colesbury commentedAug 5, 2024
edited
Loading

The free-threaded build tries to “intern” all code constants, but it doesn't...

All types of code constants (frozenset, slices, strings, numbers) are interned, but there are some code paths that skip interning and immortalization:

I thinkexec calls compile internally.

Worse, when a container (i.e. tuple) is interned, it's later cleared in the clear_containers function... This sounds dangerous for objects that might be shared with other code

We are deallocating the immortalized constants near the end of the interpreter shutdown process. Your comment is aboutclear_containers, but the hazards are withdeallocating immortalized objects, and they're the same as with_PyUnicode_ClearInterned. If some (broken) extension holds onto aPyObject* across interpreter lifetimes, it will have an invalid pointer. But that's true in general because some mimalloc state is per-interpreter, and that state gets destroyed infinalize_interp_clear shortly after we free the constants.

The constants form an acyclic graph so we have to be careful about the order in which we free them. The emptying of containers breaks up this graph, which allows us to call_Py_ClearImmortal() in no particular order. That's not the only option; we could have implemented this differently:

  • We could first topologically sort the constants and then call_Py_ClearImmortal() in topological order. That would ensure that we don't free an object while another constant still points to it. This seems more complicated to implement.
  • We could do it in three passes over the constants. First, call_Py_SetMortal(op, 1) to set each object's refcount to 1. Second, traverse containers callingPy_INCREF() on pointed-to constants to fixup the reference counts. Finally, loop over constants and callPy_DECREF() once on each object.

@colesbury
Copy link
Contributor

I'll take a closer look at the free-threading test failures

@colesbury
Copy link
Contributor

@encukou, sorry I think the reported leaks were all pre-existing. They seem to be fixed by:

When I merge that PR into yours, all the tests pass. Would you please review that PR when you get a chance?

@encukou
Copy link
MemberAuthor

Thank you! That makes things much clearer.
Let me poke you with some questions to make sure we're on the same page:

All types of code constants (frozenset, slices, strings, numbers) are interned

Those are constants used by the compiler and marshal. But, users can put anything inco_consts, including mutable values or subtypes of core types.
When they do that, things will break, but I think it should not affect other code objects, or cause crashes or memory corruption.

For example:

deff():print(())code=f.__code__code=code.replace(co_consts=(None, []))exec(code)

Would it make sense to limit this interning to the marshallable immutable types only?

frozenset

The frozenset code seems to rely on the order of elements, so it might intern two distinct but equal sets. Would that be a problem? (If so I'll try to find an example.)

We are deallocating the immortalized constants near the end of the interpreter shutdown process.

The danger I see is that the values interned here are assumed to be “owned” by this machinery, but it seems they could be shared with other code, and perhaps needed in some later stage of interpreter shutdown.
Or is there a mechanism that prevents, for example, the MRO tuple of a built-in class to find its way here?

@colesbury
Copy link
Contributor

But, users can put anything in co_consts...

I'm concerned that allowing non-immortal or mutable types in constants will cause problems downstream in the interpreter. My preference would be to more thoroughly validate the replacementco_consts to ensure that they only contain the smae types the compiler produces.

it should not ... cause crashes or memory corruption...

This is generally our requirement for APIs, butcode.replace is unsafe -- if theco_code is wrong the interpreter will crash horribly.

I still thinks it's a good goal -- the more footguns we remove, the better -- but I think we'd be better off if we get there by more thorough validation, rather than expanding scope/support.

The frozenset code seems to rely on the order of elements, so it might intern two distinct but equal sets.

I think the behavior matches logic in the bytecode compiler, which also de-dupes constants. We only de-dupe two sets if they contain the same elements and their iteration orders match. The iteration order requirement is not strictly necessary, but it simplifies the comparison logic.

...and perhaps needed in some later stage of interpreter shutdown

This happens very late in interpreter shutdown for that reason. The thread state and interpreter state are already cleared and static types have been finalized.

@encukou
Copy link
MemberAuthor

My preference would be to more thoroughly validate the replacement co_consts to ensure that they only contain the same types the compiler produces.

Sounds reasonable. I filed#122850, to not block this PR.

code.replace is unsafe -- if theco_code is wrong the interpreter will crash horribly.

Sure, but people could reasonably expect a “copy-and-patch” operation onco_consts to be safe. Let's fail cleanly rather than break the more obscure invariants.

This happens very late in interpreter shutdown for that reason. The thread state and interpreter state are already cleared and static types have been finalized.

I see. I'll add a comment to make it less likely for someone to add/rearrange the finalization phases.

(Leaving now; I'll update the PR next week.)

@encukouencukou added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelAug 13, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@encukou for commited6115f 🤖

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelAug 13, 2024
@encukou
Copy link
MemberAuthor

Today, I wasn't able to figure out the refleak free-threaded buildbots found intest__interpchannels.
It goes away when I revert per-thread heap type refcounts (gh-122418). It also goes away when I revert this change (i.e. modify the parent tuple) for the one-element tuple('encoding',).
I'll continue debugging later.

@colesbury
Copy link
Contributor

@encukou let me know if you'd like me to take a look at the remaining refleak

@encukou
Copy link
MemberAuthor

Don't spend hours on it, but if there's a quick trick or tool you know of, or if can point to where you'd look next, I'd appreciate it.

@encukou
Copy link
MemberAuthor

This hides the refleak:

diff --git a/Lib/test/test__interpreters.py b/Lib/test/test__interpreters.pyindex f493a92e0dd..9ffec59c8bf 100644--- a/Lib/test/test__interpreters.py+++ b/Lib/test/test__interpreters.py@@ -26,7 +26,7 @@ def _captured_script(script):     indented = script.replace('\n', '\n                ')     wrapped = dedent(f"""         import contextlib-        with open({w}, 'w', encoding="utf-8") as spipe:+        with open({w}, 'w', -1, "utf-8") as spipe:             with contextlib.redirect_stdout(spipe):                 {indented}         """)

@JelleZijlstraJelleZijlstra removed their request for reviewMay 4, 2025 16:15
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

@ericsnowcurrentlyericsnowcurrentlyAwaiting requested review from ericsnowcurrentlyericsnowcurrently will be requested when the pull request is marked ready for reviewericsnowcurrently is a code owner

@JelleZijlstraJelleZijlstraAwaiting requested review from JelleZijlstraJelleZijlstra will be requested when the pull request is marked ready for reviewJelleZijlstra is a code owner

@carljmcarljmAwaiting requested review from carljmcarljm will be requested when the pull request is marked ready for reviewcarljm is a code owner

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@encukou@carljm@colesbury@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp