NotificationsYou must be signed in to change notification settings
Fork32.4k
Star67.9k

gh-106727: Make`inspect.getsource` smarter for class for same name definitions#106815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

carljm merged 4 commits intopython:mainfromgaogaotiantian:inspect-class

Jul 18, 2023

Merged

gh-106727: Make`inspect.getsource` smarter for class for same name definitions#106815

carljm merged 4 commits intopython:mainfromgaogaotiantian:inspect-class

Jul 18, 2023

Conversation

Copy link

Member

gaogaotiantian commentedJul 16, 2023•
edited by bedevere-bot
Loading

When there are multiple class definitions with the same class name found in a file, we currently just return the first one found. This is not the best solution.

It would be pretty difficult(maybe not impossible) to do this perfectly. A couple of heuristics are introduced:

If a method with afirst_lineno is found in the class definition, we think that's our target.
If we find the docstring in the definition and only one of the candidates has the match, that's our target.
If we can't figure it out, pick the last one. (This is a slightly better guess than the first one).

One test case was removed - we happened to be correct on that case, just lucky. We can't solve it right now (flipping the if condition won't make a difference).

Issue:inspect.getsource (findsource) not working as expected when duplicate class is used #106727

Improve inspect.getsource for class

9e9d7af

bedevere-bot added the awaiting review label

Jul 16, 2023

bedevere-bot mentioned this pull request

Jul 16, 2023

inspect.getsource (findsource) not working as expected when duplicate class is used#106727

Closed

📜🤖 Added by blurb_it.

867730b

gaogaotiantian requested review fromcarljm andpablogsal

July 17, 2023 00:02

Add newline at the end of the file

fc0a563

Copy link

Member

carljm commentedJul 18, 2023

The rationale for switching to last-wins instead of first-wins is reasonable: barring conditionals, last-wins at runtime.

I can also see some rationale to prefer leftmost (i.e. prefer top-level definition over something nested in e.g. a conditional.)

I'm not sure the rationale for the docstring or method based heuristics is strong enough. Simplicity of behavior has value. More heuristics make the behavior less predictable and harder to explain; we should be quite sure that they significantly increase our probability of giving the right answer.

Also, it seems like any changes to this heuristic could be backwards-incompatible and change behavior of existing code. Are we sure the benefit of changes here is sufficient to justify this potential breakage?

Copy link

MemberAuthor

gaogaotiantian commentedJul 18, 2023

Using methods is almost definitive when there's a positive match - that's the only thing we have for classes that contain some compile time info. As long as the class has a method, we can find it with a super high certainty.

docstring would work as long as the class to be found has a unique docstring - which may not be super common, but not super rare either.

The bottom line is - I don't think either of the heuristics would generate a false positive in any case (unless the user manually assign a method of one class to another).

Yes this is not completely backward-compatible, but the code we potentially could break, is the code that we want to improve. (I know the sentence is a bit twisted)

I think the real way to solve this is to add a compile time info on the class, but that's a long shot.

Copy link

MemberAuthor

gaogaotiantian commentedJul 18, 2023•
edited
Loading

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

We have the information all along when we create the class, we just do not record it. This would solve this whole thing and may be useful in the future. An extra field won't break the existing code either. Just don't know how difficult it is to propose a new field on all classes.

carljm reviewed

Jul 18, 2023

View reviewed changes

Copy link

Member

carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, agreed, the new heuristics do look like they should pretty much always be correct if they match.

The docstring heuristic looks like it could be expensive for a file with multiple large class definitions of the same name, but this should be quite unusual.

Can we add a test specifically showing the docstring heuristic in action?

Lib/inspect.pyShow resolvedHide resolved

Copy link

Member

carljm commentedJul 18, 2023

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

A new entire Python object attached to every class is a significant memory cost. It's not out of the question, but the value provided should be quite strong; I don't think makinginspect.getsource work better in a few marginal edge cases meets the bar.

Merge branch 'python:main' into inspect-class

3c880fe

Copy link

MemberAuthor

gaogaotiantian commentedJul 18, 2023

Okay I agree adding a class field only forinspect.getsource probably is not the best idea. As for the test - it's not obvious on the diff, but the existing test forcls238.cls239 is exactly testing this:

ifTrue:classcls238:classcls239:'''if clause cls239'''else:classcls238:classcls239:'''else clause 239'''pass

This is the perfect case where the old way just get it correct by luck (it would've got it wrong if it'sif False), but the new way can always get it correctly.

carljm approved these changes

Jul 18, 2023

View reviewed changes

Copy link

Member

carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, looks good to me! I guess there's no documentation update needed here, since this is a bugfix:inspect.getsource doesn't document its heuristics, it claims to just get the right source for the given class, and this will make it do that more often.

bedevere-bot added awaiting merge and removed awaiting review labels

Jul 18, 2023

carljmenabled auto-merge (squash)

July 18, 2023 23:17

carljm merged commit663854d intopython:main

Jul 18, 2023

bedevere-bot removed the awaiting merge label

Jul 18, 2023

Copy link

MemberAuthor

gaogaotiantian commentedJul 18, 2023

Thank you for the review!

Viicos reviewed

Jul 19, 2023

View reviewed changes

Lib/inspect.py

Comment on lines +1081 to +1084

		if isinstance(member, types.FunctionType):
		for lineno, end_lineno in self.lineno_found:
		if lineno <= member.__code__.co_firstlineno <= end_lineno:
		return lineno

Copy link

Contributor

ViicosJul 19, 2023•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Using the line numbers of the__code__ attribute of the methods is really smart, I haven't thought about that! We can be pretty sure the right class is being detected this way.

As I said in the issue:#106727 (comment), I'm afraid of how it will behave with metaclasses:

classMeta(type):def__new__(mcs,name,bases,namespace):cls=super().__new__(mcs,name,bases,namespace)cls.my_method=lambdaa:areturnclsclassA(metaclass=Meta):passA.__dict__["my_method"].__code__.co_firstlineno#> 4

Will it raise false positives if the__code__.co_firstlineno attribute of the method defined in the metaclass happen to falls within thelineno <= ... <= end_lineno range?

Copy link

Member

carljmJul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, if the metaclass and the class using it both have the same name, your example will confuse this heuristic. Nice observation!

But since the metaclass must come before the class using it, this just means we effectively maintain the previous first-one-wins behavior in this unusual scenario.

So while there may be scope for further improvement in edge cases, I'm still satisfied this PR was strictly an improvement over the previous behavior.

Copy link

Contributor

ViicosJul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What I meant is if the metaclass is defined in another file, the__code__.co_firstlineno integer of any method of that metaclass could clash with a method defined on the inspected class. I'm greatly satisfied by the improvements made in this PR as well; I'll see if there's a way to handle this metaclass edge case in some way!

Copy link

MemberAuthor

gaogaotiantianJul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actually, we can deal with this. For all the methods defined in the class, they should have__module__ attribute to indicate in which module it was defined. We can simply compare this to the class passed in to rule out all the cases where the metaclass is not defined within the same file. This will also rule out the dynamically added methods.

Of course, there could be other evil cases like assigning the method of one class to another, that's a different story.

But we should get a pretty decent improvement by checking the modules, I'll work on the PR.