Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork2.2k
Collect model fields when rebuilding a model#11388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This aligns with dataclasses.
This is the biggest change in this PR, and the hardest toreason about and the trickiest to implement.Instead of re-evaluating annotations on a field-by-field basis,we assume all fields have their annotation evaluated (andconsequently the `FieldInfo` attributes -- some of themset depending on the annotation -- are correct). To do so,we collect model fields (if necessary, thanks to the new`__pydantic_fields_complete__` attribute) in `_model_schema()`,and then have `_common_field_schema()` called with a fully built`FieldInfo` instance (this mimics what is done for dataclasses).When `model_rebuild()` is called, we also want to recollect fieldsif it wasn't successful the first time. This introduces challenges,such as keeping the model assignments to be able to recollect fieldsand call `FieldInfo.from_annotated_attribute()`.
codspeed-hqbot commentedFeb 4, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
CodSpeed Performance ReportMerging#11388 willdegrade performances by 15.94%Comparing Summary
Benchmarks breakdown
|
9f735a6
to131fa9f
Comparecloudflare-workers-and-pagesbot commentedFeb 4, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Deploying pydantic-docs with |
Latest commit: | de1adec |
Status: | ✅ Deploy successful! |
Preview URL: | https://203f8a46.pydantic-docs.pages.dev |
Branch Preview URL: | https://model-fields-cleanup.pydantic-docs.pages.dev |
The `has_instance_in_type` function is no longer used.We don't need to track attributes explicitly set on `FieldInfo`(this was a code smell and I'm glad it's gone now).
bba626b
toa463e08
Compare…s()`They need to be performed once we know fields are complete.
0e3cd84
to10a643a
Compare74e241e
to783c3be
CompareUh oh!
There was an error while loading.Please reload this page.
783c3be
to1743828
CompareUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Great, this seems like a good step.
Seems good to be consistent here. We should comb through
I really like this pattern as well. Not only more performant, but also makes more sense logically.
Sure, willing to move forward here, these regressions aren't super concerning. As I review, I'll see if I can suggest any changes that might lead to slight reductions in the regressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for the in depth PR description.
Code changes generally look good. I'd like to learn more about:
a) memory usage consequences
b) where more time is now being spent - the codspeed benchmarks aren't super clear on what's slowing things down
I understand that the slower benchmarks are for pretty specific use cases, but want to make sure we understand perf consequences before merging.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
# The `from_annotated_attribute()` call below mutates the assigned `Field()`, so make a copy: | ||
original_assignment = ( | ||
copy(assigned_value) if not evaluated and isinstance(assigned_value, FieldInfo_) else assigned_value | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Any metrics on perf impact of this copy? Seems like a pretty special case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It's only relevant if an annotation fails to evaluate and has aField()
assigned:
a:Undefined=Field()
So not that common, + this copy could be removed if we manage to refactor how we merge field infos together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think it's definitely worth investigating refactoring how we merge field infos :).
""" | ||
FieldInfo_ = import_cached_field_info() | ||
rebuilt_fields: dict[str, FieldInfo] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Understandable, but downside - this is going to increate memory usage. Could you benchmark this with memray and see what the difference is like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It's only going to make temporary memory allocations, as:
- if this is called from
model_rebuild()
, it will override the existing__pydantic_fields__
, so the old ones will be gc'ed. - if this is called in
GenerateSchema._model_schema()
, as part of another model build, they are going to be gc'ed at the end of the method.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
5fb8716
to29f7702
Compare83bdb4c
to76d535d
Comparegithub-actionsbot commentedFeb 12, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Coverage reportClick to see where and how coverage changed
This report was generated bypython-coverage-comment-action |
Uh oh!
There was an error while loading.Please reload this page.
if config_wrapper.defer_build: | ||
# TODO we can also stop there if `__pydantic_fields_complete__` is False. | ||
# However, `set_model_fields()` is currently lenient and we don't have access to the `NameError`. | ||
# (which is useful as we can provide the name in the error message: `set_model_mock(cls, e.name)`) | ||
set_model_mocks(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we link an issue to this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Created#11453.
# Private attributes, used to rebuild FieldInfo instances: | ||
self._complete = True | ||
self._original_annotation: Any = PydanticUndefined | ||
self._original_assignment: Any = PydanticUndefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Should this be_original_default
? Or is that confusing for theField()
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think it's confusing forField()
assignments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Great work here - this takes us in a good direction in terms of refactoring field collection and merging, which have the potential for some modest perf benefits and minimal breaking changes.
Note - failing 3rd party tests are unrelated and should be fixed by#11429 |
76d535d
tode1adec
Compare72c77bf
intomainUh oh!
There was an error while loading.Please reload this page.
This results in parameterized classes being different type objects,although they represent the same type.A new refactor (#11388)was introduced since then and fixes the root issue.
This applying roughly the same logic from#11388for dataclasses.
This applying roughly the same logic from#11388for dataclasses.
This applying roughly the same logic from#11388for dataclasses.
This applying roughly the same logic from#11388 for dataclasses.Also fix third-party tests using uv.
Uh oh!
There was an error while loading.Please reload this page.
Fixes#11390,fixes#10695, works towards#9969 (we don't get the
KeyError
anymore, but we still get a clean schema error because of a missing definition).Change Summary
#5305 made it so that unknown forward annotations would not stop field collection. The PR is huge and doesn't explain why this was done, but this introduced subtle bugs and code smells that this PR tries removing.
The first issue arising from this is that the model fields can be inaccurate when a model is not fully built yet:
and I won't be surprised if some users currently make use of
model_fields
on an incomplete model (especially as we implicitly rebuild on instance creation — meaning most users don't explicitly callmodel_rebuild()
). A relatively big concern I also have is FastAPI relying onFieldInfo
instances to validate data. They do manual stuff before calling the Pydantic validate methods, such as checking forFieldInfo.is_required()
,alias
.When a model is being rebuilt, we reevaluate the annotation and recreate a
FieldInfo
instance during the core schema gen process (and we used to even to this in a less performant way; see#10769):pydantic/pydantic/_internal/_generate_schema.py
Lines 1301 to 1320 in929e8f4
Doing so introduced a code smell where we track explicitly set attributes using
_attributes_set
. This is known to be a source of bugs as we need to keep track of such explicitly set attrs when you alter theFieldInfo
instance:...
as the default value in some cases. See this added test case:pydantic/tests/test_edge_cases.py
Lines 1919 to 1927 in929e8f4
deprecated
(first bug ofDeprecated fields bugs with forward annotations #11390).And more generally:
_generate_schema.py#L1311
, wedirectly mutate the original field info instance to take the newly evaluated annotation. This is the reason we get theKeyError
on generics (KeyError when inhering from a self-referencing generic model #9969).ModelMetaclass.__new__
, at model creation (i.e. stuff thatisn't reprocessed onmodel_rebuild()
), such as setting deprecated descriptors (second bug of#11390: we set these descriptors based onFieldInfo.deprecated
).Instead, this PR does three things:
__pydantic_fields_complete__
class property on models (for private use). It isTrue
if at least one field annotation failed to evaluate duringcollect_model_fields()
. On a related note: we still set__pydantic_fields__
for backwards compatibility, but a follow up PR can now easily add a user/deprecation warning when accessingmodel_fields
if__pydantic_fields_complete__ == False
. In V3, we could remove this backward compatibility logic (TBD).Also note that now wedon't enter
GenerateSchema
if the fields collection failed, as it is guaranteed to fail anyway (if an annotation failed to evaluate during fields collection, it will also immediately fail to be evaluated inGenerateSchema
).model_rebuild()
will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.GenerateSchema
), we first callrebuild_model_fields()
; a new function that iterates over the existing (but potentially not complete) field infos, and recreate instances if the annotation needs to be evaluated again. This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:sub: Sub
, we don't callSub.model_rebuild()
. Doing so might be the right thing to do, but is unfortunately not backwards compatible. The reason is that the NS Resolver used forModel
might contain annotations (e.g. provided through theparent_namespace
) that should be available forSub
. This means that even ifsub: Sub
successfully builds,Sub
isn't complete yet — each model should be rebuilt independently, throughmodel_rebuild()
).TODO:
Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the newrebuild_model_fields()
might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example withf1..100
above and thetest_failed_rebuild
benchmark).Related issue number
Checklist