Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork2.2k
Refactor namespace logic for annotations evaluation#10530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
codspeed-hqbot commentedOct 1, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
CodSpeed Performance ReportMerging#10530 willnot alter performanceComparing Summary
|
Viicos commentedOct 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
One issue that I have and I'm unsure how to solve it yet: The following currently unexpectedly works on main/Markus' PR: # module1.pyfromtypingimportOptionalfrompydanticimportBaseModelclassFoo(BaseModel):# note that this is a type checking error,# Bar is not defined in this module:a:Optional['Bar']=None# main.pyfrompydanticimportBaseModelfromreproimportFooclassBar(BaseModel):b:FooBar.__pydantic_complete__#> True# When building `Bar`, we will re-enter `GenerateSchema_model_schema` for `Foo`,# but because we merge every ns together, `'Bar'` successfully resolves. It doesn't (and shouldn't) with this PR. However, it also doesn't (but in theory should) work when in the same module: fromtypingimportOptionalfrompydanticimportBaseModelclassFoo(BaseModel):a:Optional['Bar']=NoneclassBar(BaseModel):b:FooBar.__pydantic_complete__#> False (on this PR) Similarly, when building So there's a tradeoff here, either we keep using the ns of all the types in the stack (but then we allow invalid examples such as the first one), or we don't but we need to introduce extra Or we could implement something even smarter: when rebuilding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is amazing work. I'm particularly excited about:
- The abundance of new documentation. This is really important, as we manage namespaces slightly differently for each of
TypeAdapter
s,BaseModel
s,dataclasses
, etc, but the differences are slight and confusing. - The ease of reviewing this PR - thanks for breaking things down into small commits with in depth explanations
- The conceptual changes here - inspired by@MarkusSintonen, the abstraction of some of this namespace management logic is really useful.
- As mentioned above with the different namespace management paths, there's a lot to address here, and you've done a wonderful job wrapping your head around these issues.
I still need to review47fdd78, but I felt there was enough in this review already to go ahead and submit.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
globals: GlobalsNamespace | ||
"""The namespace to be used as the `globals` argument during annotations evaluation.""" | ||
locals: MappingNamespace |
MarkusSintonenOct 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why this isnt alwaysLazyLocalNamespace
? Is something actually allowed to initialize this with any kind of Mapping? There is some empties somewhere in the PR but it could be also then just empty LazyLocalNamespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wanted to haveLazyLocalNamespace
as an implementation detail, as this gives us more flexibility, especially if in the future we expose theGenerateSchema
class to Pydantic users (thetypes_namespace
property exposesNamespaceTuple
).
def __init__(self, *namespaces: MappingNamespace) -> None: | ||
self._namespaces = namespaces | ||
@cached_property |
MarkusSintonenOct 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hows the perf looking with this in ForwardRefs cases like in the mega kube file thing when using future annotations? So how this works by flatting the dicts vs just iterating them, perf wise? I guess thedef data
is really never active when ForwardRefs are not being used by user, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I guess the
def data
is really never active when ForwardRefs are not being used by user, right?
Correct.
Running memray on the k8s file, I get:
- ~3 GiB total allocations on main
- ~1.5GiB total allocations here
- ~1.9GiB total allocations here, with the future import.
So how this works by flatting the dicts vs just iterating them, perf wise?
Considering most of the local namespaces are really small (compared to globals), If I remember correctly the difference is marginal, at least by experimenting quickly withtimeit
and some dummy examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.
cloudflare-workers-and-pagesbot commentedOct 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Deploying pydantic-docs with |
Latest commit: | 86fe4cc |
Status: | ✅ Deploy successful! |
Preview URL: | https://50667832.pydantic-docs.pages.dev |
Branch Preview URL: | https://ns-refactor.pydantic-docs.pages.dev |
Uh oh!
There was an error while loading.Please reload this page.
This defines:- two type aliases, representing the global and local namespaces- a `NamespacesTuple` datastructure, to make sure globals and locals are coupled together- a `LazyLocalNamespace` mapping implementation. This is considered an implementation detail and is used to lazily merge locals if required when calling `eval`- a `ns_from` function returning the global and local ns to be used for annotations evalutation. It handles some edge cases not covered by the Python stdlib- A `NsResolver` class to be used *only* with the `GenerateSchema` class
Using the newly defined `ns_from` function, we improve correctnessby fetching the correct globals and locals for each base of the class.Previously, the same global namespace was used, meaning base classesdefined in different modules could have forward annotations resolvedto the wrong type.We also fix type annotations of the eval like functions, using ourtwo type aliases.
Only set type params if no local namespace was provided. Otherwise,it is assumed that `get_function_type_hints` is called from`GenerateSchema._callable_schema` and `ns_from` was used (whichis smarter when it comes to handle type params).
and use new type aliases
Make it compatible with the new arguments that `GenerateSchema`and `collect_model_fields` expects.
This is where we use the concept of a "fallback" and "override"namespace.
Use new type aliases, correctly instantiate the `GenerateSchema`class
This is a pretty big changes, but basically this removes the hackmerging all the global namespaces for each dataclass base, andinstead improve the `collect_dataclass_fields` to mimic whatwe do for Pydantic models. The core config handling seemsunnecessary as well, so it was simplified.
Similar to the previous commit for Pydantic models.
Use the `ns_from` function that will take care of type params.
This is still WIP, I'm unsure about the fallback ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looking better and better! We're definitely close here, great work.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
def __init__(self, *namespaces: MappingNamespace) -> None: | ||
self._namespaces = namespaces | ||
@cached_property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Changes look good thus far 👍 |
FastAPI uses it (but shouldn't!). No runtime warning is emitted,to avoid end users having to deal with this.
The optimization avoids computing `types_namespace` if it isn't necessary.The `GetSetDescriptorType` check was also fixed, `isinstance` is the way to go(see `typing.get_type_hints`).
github-actionsbot commentedOct 9, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Coverage reportClick to see where and how coverage changed
This report was generated bypython-coverage-comment-action |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Amazing work,@Viicos!
family: Annotated[ # (7)! | ||
list[User], | ||
validate_as_deferred(lambda: list[User]).transform(lambda x: x[1:]), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
CC@adriangb I think this is broken by this PR but maybe worth breaking. I think it's possible to fix by fiddling with the__get_pydantic_core_schema__
on the Pipeline and making sure we plumb through the namespace stuff, but maybe harder than it's worth doing this second
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think@Viicos decided this was worth breaking. As long as there's clear documentation showing the alternative path forward I'm okay with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Copy pasting what I added on Slack so that it's not lost:
the TL;DR is using lambdas with references to other symbols in annotations opens the door to a lot of weird behaviors.
This is the (simplified) test failing on the PR:
fromtyping_extensionsimportAnnotatedfrompydanticimportBaseModelfrompydantic.experimental.pipelineimportvalidate_as_deferredclassUser(BaseModel):family:'Annotated[list[User], validate_as_deferred(lambda: list[User])]'# The `family` annotation is successfully evaluated, but at some point during# core schema generation, the pipeline API logic is triggered and when the lambda# gets called, we end up with:# NameError: name 'User' is not defined
Onmain
, when we evaluate (read: Python will call eval) the annotation for family, we use the following as globals:{'User': __main__.User, 'Annotated': ..., 'BaseModel': ..., ...}
and locals are empty.
On this PR, we cleaned up how globals and locals were mixed up before. This means that we now use the following as globals:{'Annotated': ..., 'BaseModel': ..., ...}
and locals:{'User': __main__.User, ...}
And the issue comes from what could be considered as a limitation ofeval
. Consider this example:
deffunc():A=intworks=lambda:list[A]fails=eval('lambda: list[A]',globals(),locals())works()# list[int]fails()# NameError: A is not defined.
Theeval
limitation is that it does not have access to the non-locals of thelambda
environment (which is a new scope, like with adef
statement). Even thoughA
is present in locals, it won't be used to resolveA
and soeval
will look up in theglobals
instead (that's why it works onmain
becauseUser
was added inglobals
for theeval
call).
This limitation is documented in this open CPythonPR.
@cached_property | ||
def data(self) -> Mapping[str, Any]: | ||
return {k: v for ns in self._namespaces for k, v in ns.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Any reason not to usecollections.ChainMap
here instead of looping over all the items in all the dicts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I thought about using chain maps but couldn't remember all the reasons I did not. Looking into it, we could also
ditchLazyLocalNamespace
and use a chain map instead, butChainMap
is a mutable mapping, and ideally we'd like to enforce immutability (at least from at the type checker level). I added a comment about this.
Regarding whetherdata
should return a chain map instead, we discussed about it inthis comment.
Uh oh!
There was an error while loading.Please reload this page.
parent_ns = _model_construction.unpack_lenient_weakvaluedict(cls.__pydantic_parent_namespace__) or {} | ||
ns_resolver = _namespace_utils.NsResolver( | ||
parent_namespace={**rebuild_ns, **parent_ns}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Another place collections.ChainMap could maybe be used, maybe not necessary
c772b43
intomainUh oh!
There was an error while loading.Please reload this page.
adapt pydantic BC in 2.10 (pydantic/pydantic#10530)
adapt pydantic BC in 2.10 (pydantic/pydantic#10530)
Change Summary
This can be reviewed commit per commit, with details added in each commit message.
There are some similarities with#10425, but here are the main things that differ:
NamespacesTuple
(name TBD) is used to tightly couple globals and locals together.NsResolver
class is still present, but is closer to theTypesNamespaceStack
we had previously and simplified. It is not a mapping. The lazy evaluation capability is still available for the locals, thanks to theLazyLocalNamespace
mapping. It also supports the concept of a fallback and override namespace, as wespecified.The
NsResolver
can be used in two ways:validate_call
or fromTypeAdapter
. Forvalidate_call
, the globals and locals are fetched usingns_from
and when we enterGenerateSchema._callable_schema
, the explicitly passed globals and locals will be used.NsResolver.push
will be called when we enterGenerateSchema._model_schema
, and the initial globals and locals are then irrelevant.Related issue number
Checklist