
gh-131798: JIT: Split `CALL_ISINSTANCE` into several uops #133339


Merged

brandtbucher merged 9 commits into python:main from tomasr8:jit-split-isinstance

May 8, 2025


Conversation

Member

tomasr8 commented May 3, 2025 • edited by bedevere-appbot

Split `CALL_ISINSTANCE` into two guards and the uop itself.

It will be easier to implement #133172 with this in place.

Issue: Better uop coverage in the JIT optimizer #131798

tomasr8 added 3 commits

May 3, 2025 13:34

Split CALL_ISIINSTANCE into several uops

43ec167

Add news entry

0ab70a9

Close all stackrefs

900472a

tomasr8 requested review from Fidget-Spinner and markshannon as code owners

May 3, 2025 11:54

bedevere-appbot mentioned this pull request

May 3, 2025

Better uop coverage in the JIT optimizer #131798

Open

bedevere-appbot added the awaiting review label

May 3, 2025

tomasr8 mentioned this pull request

May 3, 2025

gh-131798: JIT: Narrow the return type of `isinstance` for some known arguments #133172

Merged

tomasr8 requested a review from brandtbucher

May 3, 2025 17:18

brandtbucher requested changes

May 3, 2025


Member

brandtbucher left a comment


Thanks, I just see an opportunity for further cleanup!

Python/bytecodes.c Outdated

Comment on lines 4352 to 4354

		op(_GUARD_CALLABLE_ISINSTANCE_NULL, (callable, null, unused[oparg] -- callable, null, unused[oparg])) {
		    DEOPT_IF(!PyStackRef_IsNull(null));
		}

Member

brandtbucher May 3, 2025 • edited

Since now we know that the oparg is two, we can split out the two args and get rid of the array. Also, let's give this a better name (since it's part of the same logical family as `_GUARD_TOS_NULL` and `_GUARD_NOS_NULL`).

Suggested change

	- op(_GUARD_CALLABLE_ISINSTANCE_NULL, (callable, null, unused[oparg] -- callable, null, unused[oparg])) {
	-     DEOPT_IF(!PyStackRef_IsNull(null));
	- }
	+ op(_GUARD_THIRD_NULL, (null, unused, unused -- null, unused, unused)) {
	+     DEOPT_IF(!PyStackRef_IsNull(null));
	+ }

Member Author

tomasr8 May 3, 2025

Nice! I agree it makes sense to make it reusable. I also moved it just under `_GUARD_NOS_NULL` to help with discoverability.
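To make the split easier to picture, here is a toy Python model of the two guards and the call uop discussed in this thread. It is a sketch only, not CPython's implementation: the list-based stack, the `NULL` sentinel, the `Deopt` exception, and every function name are hypothetical stand-ins, and the position of the callable (fourth from the top) is assumed from the `(callable, null, ..., ... -- res)` stack effect shown above.

```python
# Toy model only: a Python list stands in for the evaluation stack and
# NULL is a hypothetical sentinel for an empty (NULL) stack slot.
NULL = object()


class Deopt(Exception):
    """Raised when a guard fails (the real uop would deoptimize instead)."""


def guard_third_null(stack):
    # Modelled on _GUARD_THIRD_NULL: the third item from the top must be NULL.
    if stack[-3] is not NULL:
        raise Deopt("third stack item is not NULL")


def guard_callable_isinstance(stack):
    # Modelled on _GUARD_CALLABLE_ISINSTANCE: the callable sits below the
    # NULL and the two arguments and must be the builtin isinstance.
    if stack[-4] is not isinstance:
        raise Deopt("callable is not the builtin isinstance")


def call_isinstance(stack):
    # Modelled on _CALL_ISINSTANCE: (callable, null, instance, cls -- res).
    cls = stack.pop()
    instance = stack.pop()
    stack.pop()  # null
    stack.pop()  # callable
    stack.append(isinstance(instance, cls))


def run_split_call(stack):
    # The guards run first, so the call step needs no checks of its own.
    guard_third_null(stack)
    guard_callable_isinstance(stack)
    call_isinstance(stack)
    return stack[-1]


print(run_split_call([isinstance, NULL, 1, int]))
print(run_split_call([isinstance, NULL, "a", int]))
```

Because each guard is a separate step in this model, either one can be dropped on its own when the surrounding context already proves it, which is the kind of flexibility the split is meant to give the optimizer.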

Python/bytecodes.c Outdated (resolved)

Python/bytecodes.c Outdated

		PyInterpreterState *interp = tstate->interp;
		DEOPT_IF(callable_o != interp->callable_cache.isinstance);
		}

		op(_CALL_ISINSTANCE, (callable, null, args[oparg] -- res)) {

Member

brandtbucher May 3, 2025

And here:

Suggested change

	- op(_CALL_ISINSTANCE, (callable, null, args[oparg] -- res)) {
	+ op(_CALL_ISINSTANCE, (callable, null, inst, cls -- res)) {

Member Author

tomasr8 May 3, 2025

`inst` is a reserved keyword so I went with `inst_` :)

Python/optimizer_bytecodes.c Outdated (resolved)

bedevere-appbot added awaiting changes and removed awaiting review labels

May 3, 2025

bedevere-appbot commented May 3, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

tomasr8 added 2 commits

May 3, 2025 22:22

Rename to _GUARD_THIRD_NULL

6f49dca

Unpack args array into separate stack variables

8ba53c3

tomasr8 commented

May 3, 2025


Python/bytecodes.c Outdated

		if (retval < 0) {
		ERROR_NO_POP();
		}
		(void)null; // Silence compiler warnings about unused variables
		PyStackRef_CLOSE(cls);
		PyStackRef_CLOSE(inst_);

Member Author

tomasr8 May 3, 2025

Interesting that with named stackrefs it won't let me do this:

	DEAD(null);
	PyStackRef_CLOSE(cls);
	PyStackRef_CLOSE(inst_);

(the error is `SyntaxError: Input 'null' is not live, but 'inst_' is`)
While with the previous `args` version this was fine:

	DEAD(null);
	PyStackRef_CLOSE(args[0]);
	PyStackRef_CLOSE(args[1]);

I guess the cases generator can't reason about arrays? Another reason to use named stackrefs instead :)

Member Author

tomasr8 commented May 3, 2025

CI looks good so: I have made the requested changes; please review again :)

bedevere-appbot added awaiting change review and removed awaiting changes labels

May 3, 2025

bedevere-appbot commented May 3, 2025

Thanks for making the requested changes!

@brandtbucher: please review the changes made to this pull request.

bedevere-appbot requested a review from brandtbucher

May 3, 2025 21:10

brandtbucher reviewed

May 6, 2025


Member

brandtbucher left a comment


One fix to the test:

Lib/test/test_capi/test_opt.py Outdated (resolved)

Python/bytecodes.c Outdated

		PyInterpreterState *interp = tstate->interp;
		DEOPT_IF(callable_o != interp->callable_cache.isinstance);
		}

		op(_CALL_ISINSTANCE, (callable, null, inst_, cls -- res)) {

Member

brandtbucher May 6, 2025

I don't love the trailing underscore. Not a huge deal, but maybe just rename to `instance` or `obj` or something?

Member Author

tomasr8 May 8, 2025

I renamed it to `instance` in be50e24

brandtbucher changed the title ~~gh-131798: JIT: Split `CALL_ISINSTANCE` into severeal uops~~ gh-131798: JIT: Split `CALL_ISINSTANCE` into several uops

May 6, 2025

tomasr8 and others added 4 commits

May 8, 2025 18:54

Fix tests

6e11442

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>

Rename parameter

be50e24

Merge remote-tracking branch 'upstream/main' into jit-split-isinstance

ec61bc5

Regen cases

b0b31dd

Member

Fidget-Spinner commented May 8, 2025

> I think the tests could be more robust if there was a way to run the executors without optimizing the uops. Each test could then first use assertIn to ensure the relevant uops are in fact present and only then run with optimizations enabled. Though dynamically turning off optimizations would require some changes in the interpreter so I'm not sure if it's worth it.

You could do this if you want. Mark wanted to do this, but no one has time to implement it proper.

The steps to do this would be the following:

1. Export some C API for `optimize_uops` in `_testinternalcapi.c` or something.
2. The API should take a bytecode offset/slice to start trace projection from.
3. Write our optimizer tests to just use that.

Member Author

tomasr8 commented May 8, 2025

Ok I'll give it a try! Seems like a nice project for the weekend (or for PyCon if I can't get it working by then 😄)

Thanks for writing down the individual steps for me, it's super helpful :)

tomasr8 requested a review from brandtbucher

May 8, 2025 20:18

brandtbucher approved these changes

May 8, 2025


bedevere-appbot added awaiting merge and removed awaiting change review labels

May 8, 2025

brandtbucher merged commit c492ac7 into python:main

May 8, 2025

63 of 64 checks passed

bedevere-appbot removed the awaiting merge label

May 8, 2025

tomasr8 deleted the jit-split-isinstance branch

May 8, 2025 21:30

Member Author

tomasr8 commented May 9, 2025

> > I think the tests could be more robust if there was a way to run the executors without optimizing the uops. Each test could then first use assertIn to ensure the relevant uops are in fact present and only then run with optimizations enabled. Though dynamically turning off optimizations would require some changes in the interpreter so I'm not sure if it's worth it.
>
> You could do this if you want. Mark wanted to do this, but no one has time to implement it proper.
>
> The steps to do this would be the following:
>
> 1. Export some C API for `optimize_uops` in `_testinternalcapi.c` or something.
> 2. The API should take a bytecode offset/slice to start trace projection from.
> 3. Write our optimizer tests to just use that.

@Fidget-Spinner, I noticed that the optimizer can already be turned off by setting `PYTHON_UOPS_OPTIMIZE` to `'0'`:

cpython/Python/optimizer.c

Lines 1280 to 1288 in 98e2c3a

	char *env_var = Py_GETENV("PYTHON_UOPS_OPTIMIZE");
	if (env_var == NULL || *env_var == '\0' || *env_var > '0') {
	    length = _Py_uop_analyze_and_optimize(frame, buffer,
	                                       length,
	                                       curr_stackentries, &dependencies);
	    if (length <= 0) {
	        return length;
	    }
	}

Rather than exposing a new internal api, I think we could simply toggle this variable in the tests. I'm not sure if this is what Mark intended as this would still test more than just the optimizer proper. Though I like that these tests are more end-to-end.
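For reference, the condition in that snippet can be restated as a small Python predicate. This is a sketch of my reading of the check, assuming it looks only at the first character of the value; the function name `uops_optimizer_enabled` is hypothetical and not part of CPython.

```python
def uops_optimizer_enabled(value):
    """Restate the C condition for a PYTHON_UOPS_OPTIMIZE value.

    `value` is the variable's string value, or None when it is unset.
    """
    # Assumed C condition:
    #   env_var == NULL || *env_var == '\0' || *env_var > '0'
    # i.e. unset, empty, or a first character greater than '0'.
    return value is None or value == "" or value[0] > "0"


for value in (None, "", "0", "1"):
    print(repr(value), uops_optimizer_enabled(value))
```

Under this reading, only a value whose first character is `'0'` (or sorts below it) turns the optimization pass off; leaving the variable unset or empty keeps it on.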

Anyway, here's what it'd look like. Let me know if this is something we want to pursue, otherwise I'll go back to thinking how to expose just the optimizer :)

	diff --git a/Lib/test/test_capi/test_opt.py b/Lib/test/test_capi/test_opt.py
	index 651148336f7..b82b36fa2f5 100644
	--- a/Lib/test/test_capi/test_opt.py
	+++ b/Lib/test/test_capi/test_opt.py
	@@ -11,6 +11,7 @@
	 from test.support import (script_helper, requires_specialization,
	                           import_helper, Py_GIL_DISABLED, requires_jit_enabled,
	                           reset_code)
	+from test.support.os_helper import EnvironmentVarGuard

	 _testinternalcapi = import_helper.import_module("_testinternalcapi")
	@@ -458,6 +459,12 @@ def _run_with_optimizer(self, testfunc, arg):
	         ex = get_first_executor(testfunc)
	         return res, ex
	+    def _run_without_optimizer(self, testfunc, arg):
	+        with EnvironmentVarGuard() as env:
	+            env["PYTHON_UOPS_OPTIMIZE"] = "0"
	+            res = testfunc(arg)
	+        ex = get_first_executor(testfunc)
	+        return res, ex

	     def test_int_type_propagation(self):
	         def testfunc(loops):
	@@ -1951,6 +1958,17 @@ def testfunc(n):
	                     x += 1
	             return x
	+        res, ex = self._run_without_optimizer(testfunc, TIER2_THRESHOLD)
	+        self.assertEqual(res, TIER2_THRESHOLD)
	+        self.assertIsNotNone(ex)
	+        uops = get_opnames(ex)
	+        self.assertIn("_CALL_ISINSTANCE", uops)
	+        self.assertIn("_GUARD_THIRD_NULL", uops)
	+        self.assertIn("_GUARD_CALLABLE_ISINSTANCE", uops)
	+
	+        # Invalidate the executor to force a reoptimization
	+        _testinternalcapi.invalidate_executors(testfunc.__code__)
	+
	         res, ex = self._run_with_optimizer(testfunc, TIER2_THRESHOLD)
	         self.assertEqual(res, TIER2_THRESHOLD)
	         self.assertIsNotNone(ex)
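The proposal depends on the variable being set only while the test function runs and restored afterwards. Outside CPython's test suite, the same scoped-override pattern can be sketched with the standard library's `unittest.mock.patch.dict`, which is swapped in here for `EnvironmentVarGuard`. The helpers `read_flag` and `run_with_env` are hypothetical names used only for illustration.

```python
import os
from unittest import mock


def read_flag():
    # Stand-in for code that consults the environment while it runs.
    return os.environ.get("PYTHON_UOPS_OPTIMIZE")


def run_with_env(func, name, value):
    # patch.dict restores the previous contents of os.environ on exit,
    # including removing keys that were not set before.
    with mock.patch.dict(os.environ, {name: value}):
        return func()


seen_inside = run_with_env(read_flag, "PYTHON_UOPS_OPTIMIZE", "0")
print(seen_inside)
```

The override is visible only inside the `with` block, so a later run in the same process sees the original environment again.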

Member

Fidget-Spinner commented May 9, 2025

@tomasr8 yeah I'm not advocating for removing the end-to-end tests altogether. Rather, the optimizer tests should have a mix of both end-to-end and unit.

For example, it would be nice if we could generate an optimized trace without having to even wrap it in a for loop and run till JIT threshold. This actually makes the tests really slow because JIT threshold is quite high right now.

Member Author

tomasr8 commented May 10, 2025

Got it! Yeah, being able to simplify the tests and make them run faster is definitely worth it :)

Labels

None yet

3 participants
