Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-106727: Makeinspect.getsource smarter for class for same name definitions#106815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
carljm merged 4 commits intopython:mainfromgaogaotiantian:inspect-class
Jul 18, 2023

Conversation

gaogaotiantian
Copy link
Member

@gaogaotiantiangaogaotiantian commentedJul 16, 2023
edited by bedevere-bot
Loading

When there are multiple class definitions with the same class name found in a file, we currently just return the first one found. This is not the best solution.

It would be pretty difficult(maybe not impossible) to do this perfectly. A couple of heuristics are introduced:

  • If a method with afirst_lineno is found in the class definition, we think that's our target.
  • If we find the docstring in the definition and only one of the candidates has the match, that's our target.
  • If we can't figure it out, pick the last one. (This is a slightly better guess than the first one).

One test case was removed - we happened to be correct on that case, just lucky. We can't solve it right now (flipping the if condition won't make a difference).

@carljm
Copy link
Member

The rationale for switching to last-wins instead of first-wins is reasonable: barring conditionals, last-wins at runtime.

I can also see some rationale to prefer leftmost (i.e. prefer top-level definition over something nested in e.g. a conditional.)

I'm not sure the rationale for the docstring or method based heuristics is strong enough. Simplicity of behavior has value. More heuristics make the behavior less predictable and harder to explain; we should be quite sure that they significantly increase our probability of giving the right answer.

Also, it seems like any changes to this heuristic could be backwards-incompatible and change behavior of existing code. Are we sure the benefit of changes here is sufficient to justify this potential breakage?

@gaogaotiantian
Copy link
MemberAuthor

Using methods is almost definitive when there's a positive match - that's the only thing we have for classes that contain some compile time info. As long as the class has a method, we can find it with a super high certainty.

docstring would work as long as the class to be found has a unique docstring - which may not be super common, but not super rare either.

The bottom line is - I don't think either of the heuristics would generate a false positive in any case (unless the user manually assign a method of one class to another).

Yes this is not completely backward-compatible, but the code we potentially could break, is the code that we want to improve. (I know the sentence is a bit twisted)

I think the real way to solve this is to add a compile time info on the class, but that's a long shot.

carljm reacted with thumbs up emoji

@gaogaotiantian
Copy link
MemberAuthor

gaogaotiantian commentedJul 18, 2023
edited
Loading

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

We have the information all along when we create the class, we just do not record it. This would solve this whole thing and may be useful in the future. An extra field won't break the existing code either. Just don't know how difficult it is to propose a new field on all classes.

Copy link
Member

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, agreed, the new heuristics do look like they should pretty much always be correct if they match.

The docstring heuristic looks like it could be expensive for a file with multiple large class definitions of the same name, but this should be quite unusual.

Can we add a test specifically showing the docstring heuristic in action?

@carljm
Copy link
Member

How bad it is to add an extra dunderscore field that is not listed in dir for classes?

A new entire Python object attached to every class is a significant memory cost. It's not out of the question, but the value provided should be quite strong; I don't think makinginspect.getsource work better in a few marginal edge cases meets the bar.

@gaogaotiantian
Copy link
MemberAuthor

Okay I agree adding a class field only forinspect.getsource probably is not the best idea. As for the test - it's not obvious on the diff, but the existing test forcls238.cls239 is exactly testing this:

ifTrue:classcls238:classcls239:'''if clause cls239'''else:classcls238:classcls239:'''else clause 239'''pass

This is the perfect case where the old way just get it correct by luck (it would've got it wrong if it'sif False), but the new way can always get it correctly.

carljm reacted with thumbs up emoji

Copy link
Member

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, looks good to me! I guess there's no documentation update needed here, since this is a bugfix:inspect.getsource doesn't document its heuristics, it claims to just get the right source for the given class, and this will make it do that more often.

@carljmcarljmenabled auto-merge (squash)July 18, 2023 23:17
@carljmcarljm merged commit663854d intopython:mainJul 18, 2023
@gaogaotiantian
Copy link
MemberAuthor

Thank you for the review!

carljm reacted with thumbs up emoji

Comment on lines +1081 to +1084
if isinstance(member, types.FunctionType):
for lineno, end_lineno in self.lineno_found:
if lineno <= member.__code__.co_firstlineno <= end_lineno:
return lineno
Copy link
Contributor

@ViicosViicosJul 19, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Using the line numbers of the__code__ attribute of the methods is really smart, I haven't thought about that! We can be pretty sure the right class is being detected this way.

As I said in the issue:#106727 (comment), I'm afraid of how it will behave with metaclasses:

classMeta(type):def__new__(mcs,name,bases,namespace):cls=super().__new__(mcs,name,bases,namespace)cls.my_method=lambdaa:areturnclsclassA(metaclass=Meta):passA.__dict__["my_method"].__code__.co_firstlineno#> 4

Will it raise false positives if the__code__.co_firstlineno attribute of the method defined in the metaclass happen to falls within thelineno <= ... <= end_lineno range?

carljm reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, if the metaclass and the class using it both have the same name, your example will confuse this heuristic. Nice observation!

But since the metaclass must come before the class using it, this just means we effectively maintain the previous first-one-wins behavior in this unusual scenario.

So while there may be scope for further improvement in edge cases, I'm still satisfied this PR was strictly an improvement over the previous behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What I meant is if the metaclass is defined in another file, the__code__.co_firstlineno integer of any method of that metaclass could clash with a method defined on the inspected class. I'm greatly satisfied by the improvements made in this PR as well; I'll see if there's a way to handle this metaclass edge case in some way!

carljm reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actually, we can deal with this. For all the methods defined in the class, they should have__module__ attribute to indicate in which module it was defined. We can simply compare this to the class passed in to rule out all the cases where the metaclass is not defined within the same file. This will also rule out the dynamically added methods.

Of course, there could be other evil cases like assigning the method of one class to another, that's a different story.

But we should get a pretty decent improvement by checking the modules, I'll work on the PR.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@carljmcarljmcarljm approved these changes

@ViicosViicosViicos left review comments

@pablogsalpablogsalAwaiting requested review from pablogsal

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@gaogaotiantian@carljm@Viicos@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp