Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Refactor namespace logic for annotations evaluation#10530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Viicos merged 27 commits intomainfromns-refactor
Oct 10, 2024
Merged

Conversation

Viicos
Copy link
Member

Change Summary

This can be reviewed commit per commit, with details added in each commit message.

There are some similarities with#10425, but here are the main things that differ:

  • As said inImprove namespace handling performance in core schema building #10425 (comment), we are no longer only using the local argument. Instead, theNamespacesTuple (name TBD) is used to tightly couple globals and locals together.
  • TheNsResolver class is still present, but is closer to theTypesNamespaceStack we had previously and simplified. It is not a mapping. The lazy evaluation capability is still available for the locals, thanks to theLazyLocalNamespace mapping. It also supports the concept of a fallback and override namespace, as wespecified.

TheNsResolver can be used in two ways:

  • By explicitly specifying globals and locals. This is used for example when generating a schema fromvalidate_call or fromTypeAdapter. Forvalidate_call, the globals and locals are fetched usingns_from and when we enterGenerateSchema._callable_schema, the explicitly passed globals and locals will be used.
  • By not specifying any globals or locals. This is used for example when generating a schema when a Pydantic model is created. In this case,NsResolver.push will be called when we enterGenerateSchema._model_schema, and the initial globals and locals are then irrelevant.

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review,please add a comment including the phrase "please review" to assign reviewers

sydney-runkle reacted with heart emoji
@github-actionsgithub-actionsbot added the relnotes-fixUsed for bugfixes. labelOct 1, 2024
@codspeed-hqCodSpeed HQ
Copy link

codspeed-hqbot commentedOct 1, 2024
edited
Loading

CodSpeed Performance Report

Merging#10530 willnot alter performance

Comparingns-refactor (86fe4cc) withmain (07e2b72)

Summary

✅ 38 untouched benchmarks

@sydney-runklesydney-runkle added relnotes-performanceUsed for performance improvements. and removed relnotes-fixUsed for bugfixes. labelsOct 3, 2024
@Viicos
Copy link
MemberAuthor

Viicos commentedOct 3, 2024
edited
Loading

One issue that I have and I'm unsure how to solve it yet:

The following currently unexpectedly works on main/Markus' PR:

# module1.pyfromtypingimportOptionalfrompydanticimportBaseModelclassFoo(BaseModel):# note that this is a type checking error,# Bar is not defined in this module:a:Optional['Bar']=None# main.pyfrompydanticimportBaseModelfromreproimportFooclassBar(BaseModel):b:FooBar.__pydantic_complete__#> True# When building `Bar`, we will re-enter `GenerateSchema_model_schema` for `Foo`,# but because we merge every ns together, `'Bar'` successfully resolves.

It doesn't (and shouldn't) with this PR. However, it also doesn't (but in theory should) work when in the same module:

fromtypingimportOptionalfrompydanticimportBaseModelclassFoo(BaseModel):a:Optional['Bar']=NoneclassBar(BaseModel):b:FooBar.__pydantic_complete__#> False (on this PR)

Similarly, when buildingBar, we will re-enterGenerateSchema_model_schema forFoo. On this PR, only the module namespace ofFoo (which doesn't containBar because we are currently defining it!) and locals ofFoo (vars(Foo) +{Foo.__name__: Foo}) is used to resolve annotations (see theNsResolver.types_namespace implementation).


So there's a tradeoff here, either we keep using the ns of all the types in the stack (but then we allow invalid examples such as the first one), or we don't but we need to introduce extramodel_rebuild calls for users.

Or we could implement something even smarter: when rebuildingFoo, becauseBar was defined in the same scope (i.e. same locals), we can "share" locals fromBar (which includes{Bar.__name__: Bar}) to the schema rebuilding ofFoo. Not sure how easy it is to do so, however...

Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@Viicos,

This is amazing work. I'm particularly excited about:

  1. The abundance of new documentation. This is really important, as we manage namespaces slightly differently for each ofTypeAdapters,BaseModels,dataclasses, etc, but the differences are slight and confusing.
  2. The ease of reviewing this PR - thanks for breaking things down into small commits with in depth explanations
  3. The conceptual changes here - inspired by@MarkusSintonen, the abstraction of some of this namespace management logic is really useful.
  4. As mentioned above with the different namespace management paths, there's a lot to address here, and you've done a wonderful job wrapping your head around these issues.

I still need to review47fdd78, but I felt there was enough in this review already to go ahead and submit.

globals: GlobalsNamespace
"""The namespace to be used as the `globals` argument during annotations evaluation."""

locals: MappingNamespace
Copy link
Contributor

@MarkusSintonenMarkusSintonenOct 3, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why this isnt alwaysLazyLocalNamespace? Is something actually allowed to initialize this with any kind of Mapping? There is some empties somewhere in the PR but it could be also then just empty LazyLocalNamespace

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wanted to haveLazyLocalNamespace as an implementation detail, as this gives us more flexibility, especially if in the future we expose theGenerateSchema class to Pydantic users (thetypes_namespace property exposesNamespaceTuple).

sydney-runkle reacted with thumbs up emoji
def __init__(self, *namespaces: MappingNamespace) -> None:
self._namespaces = namespaces

@cached_property
Copy link
Contributor

@MarkusSintonenMarkusSintonenOct 3, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hows the perf looking with this in ForwardRefs cases like in the mega kube file thing when using future annotations? So how this works by flatting the dicts vs just iterating them, perf wise? I guess thedef data is really never active when ForwardRefs are not being used by user, right?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I guess thedef data is really never active when ForwardRefs are not being used by user, right?

Correct.

Running memray on the k8s file, I get:

  • ~3 GiB total allocations on main
  • ~1.5GiB total allocations here
  • ~1.9GiB total allocations here, with the future import.

So how this works by flatting the dicts vs just iterating them, perf wise?

Considering most of the local namespaces are really small (compared to globals), If I remember correctly the difference is marginal, at least by experimenting quickly withtimeit and some dummy examples.

sydney-runkle and MarkusSintonen reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.

@cloudflare-workers-and-pagesCloudflare Workers and Pages
Copy link

cloudflare-workers-and-pagesbot commentedOct 4, 2024
edited
Loading

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit:86fe4cc
Status: ✅  Deploy successful!
Preview URL:https://50667832.pydantic-docs.pages.dev
Branch Preview URL:https://ns-refactor.pydantic-docs.pages.dev

View logs

This defines:- two type aliases, representing the global and local namespaces- a `NamespacesTuple` datastructure, to make sure globals and locals  are coupled together- a `LazyLocalNamespace` mapping implementation. This is considered  an implementation detail and is used to lazily merge locals if  required when calling `eval`- a `ns_from` function returning the global and local ns to be used  for annotations evalutation. It handles some edge cases not covered  by the Python stdlib- A `NsResolver` class to be used *only* with the `GenerateSchema`  class
Using the newly defined `ns_from` function, we improve correctnessby fetching the correct globals and locals for each base of the class.Previously, the same global namespace was used, meaning base classesdefined in different modules could have forward annotations resolvedto the wrong type.We also fix type annotations of the eval like functions, using ourtwo type aliases.
Only set type params if no local namespace was provided. Otherwise,it is assumed that `get_function_type_hints` is called from`GenerateSchema._callable_schema` and `ns_from` was used (whichis smarter when it comes to handle type params).
Make it compatible with the new arguments that `GenerateSchema`and `collect_model_fields` expects.
This is where we use the concept of a "fallback" and "override"namespace.
Use new type aliases, correctly instantiate the `GenerateSchema`class
This is a pretty big changes, but basically this removes the hackmerging all the global namespaces for each dataclass base, andinstead improve the `collect_dataclass_fields` to mimic whatwe do for Pydantic models. The core config handling seemsunnecessary as well, so it was simplified.
Similar to the previous commit for Pydantic models.
Use the `ns_from` function that will take care of type params.
This is still WIP, I'm unsure about the fallback ns
Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looking better and better! We're definitely close here, great work.

def __init__(self, *namespaces: MappingNamespace) -> None:
self._namespaces = namespaces

@cached_property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.

@sydney-runkle
Copy link
Contributor

Changes look good thus far 👍

FastAPI uses it (but shouldn't!). No runtime warning is emitted,to avoid end users having to deal with this.
The optimization avoids computing `types_namespace` if it isn't necessary.The `GetSetDescriptorType` check was also fixed, `isinstance` is the way to go(see `typing.get_type_hints`).
@github-actionsGitHub Actions
Copy link
Contributor

github-actionsbot commentedOct 9, 2024
edited
Loading

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  annotated_handlers.py
  dataclasses.py
  fields.py
  functional_serializers.py
  main.py
  type_adapter.py
  validate_call_decorator.py
  pydantic/_internal
  _dataclasses.py
  _decorators.py
  _fields.py
  _generate_schema.py1465-1466,1532-1533
  _generics.py
  _model_construction.py
  _namespace_utils.py
  _schema_generation_shared.py
  _typing_extra.py271,285
  _validate_call.py
Project Total 

This report was generated bypython-coverage-comment-action

Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Amazing work,@Viicos!

Comment on lines -74 to -77
family: Annotated[ # (7)!
list[User],
validate_as_deferred(lambda: list[User]).transform(lambda x: x[1:]),
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

CC@adriangb I think this is broken by this PR but maybe worth breaking. I think it's possible to fix by fiddling with the__get_pydantic_core_schema__ on the Pipeline and making sure we plumb through the namespace stuff, but maybe harder than it's worth doing this second

sydney-runkle reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think@Viicos decided this was worth breaking. As long as there's clear documentation showing the alternative path forward I'm okay with that.

sydney-runkle reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copy pasting what I added on Slack so that it's not lost:

the TL;DR is using lambdas with references to other symbols in annotations opens the door to a lot of weird behaviors.


This is the (simplified) test failing on the PR:

fromtyping_extensionsimportAnnotatedfrompydanticimportBaseModelfrompydantic.experimental.pipelineimportvalidate_as_deferredclassUser(BaseModel):family:'Annotated[list[User], validate_as_deferred(lambda: list[User])]'# The `family` annotation is successfully evaluated, but at some point during# core schema generation, the pipeline API logic is triggered and when the lambda# gets called, we end up with:# NameError: name 'User' is not defined

Onmain, when we evaluate (read: Python will call eval) the annotation for family, we use the following as globals:
{'User': __main__.User, 'Annotated': ..., 'BaseModel': ..., ...}
and locals are empty.

On this PR, we cleaned up how globals and locals were mixed up before. This means that we now use the following as globals:
{'Annotated': ..., 'BaseModel': ..., ...}
and locals:
{'User': __main__.User, ...}

And the issue comes from what could be considered as a limitation ofeval. Consider this example:

deffunc():A=intworks=lambda:list[A]fails=eval('lambda: list[A]',globals(),locals())works()# list[int]fails()# NameError: A is not defined.

Theeval limitation is that it does not have access to the non-locals of thelambda environment (which is a new scope, like with adef statement). Even thoughA is present in locals, it won't be used to resolveA and soeval will look up in theglobals instead (that's why it works onmain becauseUser was added inglobals for theeval call).

This limitation is documented in this open CPythonPR.


@cached_property
def data(self) -> Mapping[str, Any]:
return {k: v for ns in self._namespaces for k, v in ns.items()}
Copy link
Contributor

@dmontagudmontaguOct 10, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Any reason not to usecollections.ChainMap here instead of looping over all the items in all the dicts?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I thought about using chain maps but couldn't remember all the reasons I did not. Looking into it, we could also
ditchLazyLocalNamespace and use a chain map instead, butChainMap is a mutable mapping, and ideally we'd like to enforce immutability (at least from at the type checker level). I added a comment about this.

Regarding whetherdata should return a chain map instead, we discussed about it inthis comment.

sydney-runkle reacted with thumbs up emoji
parent_ns = _model_construction.unpack_lenient_weakvaluedict(cls.__pydantic_parent_namespace__) or {}

ns_resolver = _namespace_utils.NsResolver(
parent_namespace={**rebuild_ns, **parent_ns},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Another place collections.ChainMap could maybe be used, maybe not necessary

@ViicosViicos merged commitc772b43 intomainOct 10, 2024
63 checks passed
@ViicosViicos deleted the ns-refactor branchOctober 10, 2024 19:55
RF-Tar-Railt added a commit to RF-Tar-Railt/nonebot-plugin-uninfo that referenced this pull requestFeb 17, 2025
RF-Tar-Railt added a commit to RF-Tar-Railt/nonebot-plugin-uninfo that referenced this pull requestFeb 17, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@adriangbadriangbadriangb left review comments

@MarkusSintonenMarkusSintonenMarkusSintonen left review comments

@dmontagudmontagudmontagu left review comments

@sydney-runklesydney-runklesydney-runkle approved these changes

Assignees
No one assigned
Labels
relnotes-performanceUsed for performance improvements.
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@Viicos@sydney-runkle@adriangb@MarkusSintonen@dmontagu

[8]ページ先頭

©2009-2025 Movatter.jp