NotificationsYou must be signed in to change notification settings
Fork2.2k
Star24.6k

Collect model fields when rebuilding a model#11388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

Viicos merged 10 commits intomainfrommodel-fields-cleanup

Feb 12, 2025

Merged

Collect model fields when rebuilding a model#11388

Viicos merged 10 commits intomainfrommodel-fields-cleanup

Feb 12, 2025

Conversation

Copy link

Member

Viicos commentedFeb 4, 2025•
edited
Loading

Fixes#11390,fixes#10695, works towards#9969 (we don't get theKeyError anymore, but we still get a clean schema error because of a missing definition).

Change Summary

#5305 made it so that unknown forward annotations would not stop field collection. The PR is huge and doesn't explain why this was done, but this introduced subtle bugs and code smells that this PR tries removing.

The first issue arising from this is that the model fields can be inaccurate when a model is not fully built yet:

fromtypingimportAnnotatedfrompydanticimportBaseModelclassModel(BaseModel):a:'Annotated[Forward, Field(alias="b")]'Model.model_fields['a'].alias#> None, should be 'b'

and I won't be surprised if some users currently make use ofmodel_fields on an incomplete model (especially as we implicitly rebuild on instance creation — meaning most users don't explicitly callmodel_rebuild()). A relatively big concern I also have is FastAPI relying onFieldInfo instances to validate data. They do manual stuff before calling the Pydantic validate methods, such as checking forFieldInfo.is_required(),alias.

When a model is being rebuilt, we reevaluate the annotation and recreate aFieldInfo instance during the core schema gen process (and we used to even to this in a less performant way; see#10769):

pydantic/pydantic/_internal/_generate_schema.py

Lines 1301 to 1320 in929e8f4

	ifnotfield_info.evaluated:
	# TODO Can we use field_info.apply_typevars_map here?
	try:
	evaluated_type=_typing_extra.eval_type(field_info.annotation,*self._types_namespace)
	exceptNameErrorase:
	raisePydanticUndefinedAnnotation.from_name_error(e)frome
	evaluated_type=replace_types(evaluated_type,self._typevars_map)
	field_info.evaluated=True
	ifnothas_instance_in_type(evaluated_type,PydanticRecursiveRef):
	new_field_info=FieldInfo.from_annotation(evaluated_type)
	field_info.annotation=new_field_info.annotation

	# Handle any field info attributes that may have been obtained from now-resolved annotations
	fork,vinnew_field_info._attributes_set.items():
	# If an attribute is already set, it means it was set by assigning to a call to Field (or just a
	# default value), and that should take the highest priority. So don't overwrite existing attributes.
	# We skip over "attributes" that are present in the metadata_lookup dict because these won't
	# actually end up as attributes of the `FieldInfo` instance.
	ifknotinfield_info._attributes_setandknotinfield_info.metadata_lookup:
	setattr(field_info,k,v)

Doing so introduced a code smell where we track explicitly set attributes using_attributes_set. This is known to be a source of bugs as we need to keep track of such explicitly set attrs when you alter theFieldInfo instance:

We forgot to update it when using... as the default value in some cases. See this added test case:

pydantic/tests/test_edge_cases.py

Lines 1919 to 1927 in929e8f4

	deftest_ellipsis_forward_ref_annotated()->None:
	"""This previously resulted in the ellipsis being used as a default value."""

	classModel(BaseModel):
	f:'Forward'

	Forward=Annotated[int,Field(...)]

	assertModel.model_fields['f'].defaultisPydanticUndefined

We currently forget to updatedeprecated (first bug ofDeprecated fields bugs with forward annotations #11390).

And more generally:

On_generate_schema.py#L1311, wedirectly mutate the original field info instance to take the newly evaluated annotation. This is the reason we get theKeyError on generics (KeyError when inhering from a self-referencing generic model #9969).
We make use of the model fields (even though they might be incorrect) insideModelMetaclass.__new__, at model creation (i.e. stuff thatisn't reprocessed onmodel_rebuild()), such as setting deprecated descriptors (second bug of#11390: we set these descriptors based onFieldInfo.deprecated).

Instead, this PR does three things:

Add a new__pydantic_fields_complete__ class property on models (for private use). It isTrue if at least one field annotation failed to evaluate duringcollect_model_fields(). On a related note: we still set__pydantic_fields__ for backwards compatibility, but a follow up PR can now easily add a user/deprecation warning when accessingmodel_fieldsif__pydantic_fields_complete__ == False. In V3, we could remove this backward compatibility logic (TBD).
Also note that now wedon't enterGenerateSchema if the fields collection failed, as it is guaranteed to fail anyway (if an annotation failed to evaluate during fields collection, it will also immediately fail to be evaluated inGenerateSchema).
Callingmodel_rebuild()will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.
When an incomplete Pydantic model is encountered as an annotation (insideGenerateSchema), we first callrebuild_model_fields(); a new function that iterates over the existing (but potentially not complete) field infos, and recreate instances if the annotation needs to be evaluated again. This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:
```
classSub(BaseModel):f1:int    ...# Many fields...f100:'Forward'classModel(BaseModel):sub:Sub# Note that Sub is incomplete, so in the process of generating the schema for# `Model`, we don't use the cached schema attribute# Thus previously, we would generate a schema for f1, f2, ..., f99. f100 would fail,# meaning we generated 99 fields core schemas for nothing.# In this PR, we first (re)collect fields for `Sub`, and immediately raise if fields collection did not suceed.
```
On important thing to acknowledge: when encounteringsub: Sub, we don't callSub.model_rebuild(). Doing so might be the right thing to do, but is unfortunately not backwards compatible. The reason is that the NS Resolver used forModel might contain annotations (e.g. provided through theparent_namespace) that should be available forSub. This means that even ifsub: Sub successfully builds,Sub isn't complete yet — each model should be rebuilt independently, throughmodel_rebuild()).

TODO:

Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the newrebuild_model_fields() might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example withf1..100 above and thetest_failed_rebuild benchmark).

Related issue number

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review,please add a comment including the phrase "please review" to assign reviewers

Viicos added2 commits

February 3, 2025 21:29

Collect and set model fields when rebuilding model

7175058

This aligns with dataclasses.

Recollect fields during rebuild if collection failed

2b2d8d3

This is the biggest change in this PR, and the hardest toreason about and the trickiest to implement.Instead of re-evaluating annotations on a field-by-field basis,we assume all fields have their annotation evaluated (andconsequently the `FieldInfo` attributes -- some of themset depending on the annotation -- are correct). To do so,we collect model fields (if necessary, thanks to the new`__pydantic_fields_complete__` attribute) in `_model_schema()`,and then have `_common_field_schema()` called with a fully built`FieldInfo` instance (this mimics what is done for dataclasses).When `model_rebuild()` is called, we also want to recollect fieldsif it wasn't successful the first time. This introduces challenges,such as keeping the model assignments to be able to recollect fieldsand call `FieldInfo.from_annotated_attribute()`.

github-actionsbot added the relnotes-fixUsed for bugfixes. label

Feb 4, 2025

Copy link

codspeed-hqbot commentedFeb 4, 2025•
edited
Loading

CodSpeed Performance Report

Merging#11388 willdegrade performances by 15.94%

_{Comparingmodel-fields-cleanup (de1adec) withmain (53f8ece)}

Summary

⚡ 2 improvements
❌ 3 regressions
✅ 41 untouched benchmarks

⚠️Please fix the performance issues oracknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`test_schema_build`	2.5 ms	2.8 ms	-11.53%
⚡	`test_construct_dataclass_schema`	1.2 ms	1.1 ms	+8.6%
⚡	`test_failed_rebuild`	3,851.1 µs	211.9 µs	×18
❌	`test_deeply_nested_recursive_model_schema_generation`	1.1 ms	1.3 ms	-15.94%
❌	`test_recursive_discriminated_union_with_base_model`	1.4 ms	1.5 ms	-8.22%

Viicos force-pushed themodel-fields-cleanup branch from9f735a6 to131fa9fCompare

February 4, 2025 17:04

Copy link

cloudflare-workers-and-pagesbot commentedFeb 4, 2025•
edited
Loading

Deploying pydantic-docs with Cloudflare Pages

Latest commit:	`de1adec`
Status:	✅ Deploy successful!
Preview URL:	https://203f8a46.pydantic-docs.pages.dev
Branch Preview URL:	https://model-fields-cleanup.pydantic-docs.pages.dev

View logs

Clean up unused code

4e1f4c5

The `has_instance_in_type` function is no longer used.We don't need to track attributes explicitly set on `FieldInfo`(this was a code smell and I'm glad it's gone now).

Viicos added the third-party-testsAdd this label on a PR to trigger 3rd party tests label

Feb 4, 2025

Viicos closed this

Feb 4, 2025

Viicos reopened this

Feb 4, 2025

Viicos force-pushed themodel-fields-cleanup branch frombba626b toa463e08Compare

February 4, 2025 19:58

Move operations depending on model fields inside `complete_model_clas…

361847c

…s()`They need to be performed once we know fields are complete.

Viicos force-pushed themodel-fields-cleanup branch from0e3cd84 to10a643aCompare

February 6, 2025 13:54

Viicos added2 commits

February 6, 2025 16:47

Simplify condition for field info creation

07f2c36

Alternative approach

5a73371

Viicos force-pushed themodel-fields-cleanup branch 4 times, most recently from74e241e to783c3beCompare

February 7, 2025 10:53

Viicos commented

Feb 7, 2025

View reviewed changes

pydantic/fields.pyShow resolvedHide resolved

Viicos added2 commits

February 7, 2025 17:01

Add deprecated issue test

1896c54

Add generics test

125c697

Viicos force-pushed themodel-fields-cleanup branch from783c3be to1743828Compare

February 7, 2025 16:32

Viicos commented

Feb 7, 2025

View reviewed changes

pydantic/_internal/_generics.py OutdatedShow resolvedHide resolved

Viicos commented

Feb 7, 2025

View reviewed changes

pydantic/_internal/_model_construction.pyShow resolvedHide resolved

Copy link

Contributor

sydney-runkle commentedFeb 10, 2025

Add a new__pydantic_fields_complete__ class property on models

Great, this seems like a good step.

Calling model_rebuild() will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.

Seems good to be consistent here. We should comb throughmodel_rebuild issues, this might close some of those.

This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:

I really like this pattern as well. Not only more performant, but also makes more sense logically.

Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the new rebuild_model_fields() might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example with f1..100 above and the test_failed_rebuild benchmark).

Sure, willing to move forward here, these regressions aren't super concerning. As I review, I'll see if I can suggest any changes that might lead to slight reductions in the regressions.

sydney-runkle reviewed

Feb 10, 2025

View reviewed changes

Copy link

Contributor

sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the in depth PR description.

Code changes generally look good. I'd like to learn more about:
a) memory usage consequences
b) where more time is now being spent - the codspeed benchmarks aren't super clear on what's slowing things down

I understand that the slower benchmarks are for pretty specific use cases, but want to make sure we understand perf consequences before merging.

pydantic/fields.pyShow resolvedHide resolved

pydantic/_internal/_model_construction.py OutdatedShow resolvedHide resolved

pydantic/main.py OutdatedShow resolvedHide resolved

pydantic/_internal/_fields.pyShow resolvedHide resolved

pydantic/_internal/_fields.py OutdatedShow resolvedHide resolved

pydantic/_internal/_fields.py

Comment on lines +241 to +245

		# The `from_annotated_attribute()` call below mutates the assigned `Field()`, so make a copy:
		original_assignment = (
		copy(assigned_value) if not evaluated and isinstance(assigned_value, FieldInfo_) else assigned_value
		)

Copy link

Contributor

sydney-runkleFeb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Any metrics on perf impact of this copy? Seems like a pretty special case...

Copy link

MemberAuthor

ViicosFeb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's only relevant if an annotation fails to evaluate and has aField() assigned:

a:Undefined=Field()

So not that common, + this copy could be removed if we manage to refactor how we merge field infos together.

Copy link

Contributor

sydney-runkleFeb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's definitely worth investigating refactoring how we merge field infos :).

pydantic/_internal/_fields.py

		"""
		FieldInfo_ = import_cached_field_info()

		rebuilt_fields: dict[str, FieldInfo] = {}

Copy link

Contributor

sydney-runkleFeb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Understandable, but downside - this is going to increate memory usage. Could you benchmark this with memray and see what the difference is like?

Copy link

MemberAuthor

ViicosFeb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's only going to make temporary memory allocations, as:

if this is called frommodel_rebuild(), it will override the existing__pydantic_fields__, so the old ones will be gc'ed.
if this is called inGenerateSchema._model_schema(), as part of another model build, they are going to be gc'ed at the end of the method.

pydantic/_internal/_generics.py OutdatedShow resolvedHide resolved

pydantic/_internal/_fields.pyShow resolvedHide resolved

Feedback

da90550

Viicos force-pushed themodel-fields-cleanup branch from5fb8716 to29f7702Compare

February 10, 2025 21:21

Last fixes

de1adec

Viicos marked this pull request as ready for review

February 12, 2025 14:05

Viicos force-pushed themodel-fields-cleanup branch from83bdb4c to76d535dCompare

February 12, 2025 14:06

Copy link

Contributor

github-actionsbot commentedFeb 12, 2025•
edited
Loading

Coverage report

Click to see where and how coverage changed

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
pydantic
fields.py
main.py
pydantic/_internal
_fields.py
_generate_schema.py
_model_construction.py
Project Total

_{This report was generated bypython-coverage-comment-action}

sydney-runkle reviewed

Feb 12, 2025

View reviewed changes

pydantic/_internal/_fields.pyShow resolvedHide resolved

sydney-runkle reviewed

Feb 12, 2025

View reviewed changes

pydantic/_internal/_model_construction.py

Comment on lines +229 to +233

		if config_wrapper.defer_build:
		# TODO we can also stop there if `__pydantic_fields_complete__` is False.
		# However, `set_model_fields()` is currently lenient and we don't have access to the `NameError`.
		# (which is useful as we can provide the name in the error message: `set_model_mock(cls, e.name)`)
		set_model_mocks(cls)

Copy link

Contributor

sydney-runkleFeb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we link an issue to this comment?

Copy link

MemberAuthor

ViicosFeb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Created#11453.

sydney-runkle reviewed

Feb 12, 2025

View reviewed changes

pydantic/fields.py

		# Private attributes, used to rebuild FieldInfo instances:
		self._complete = True
		self._original_annotation: Any = PydanticUndefined
		self._original_assignment: Any = PydanticUndefined

Copy link

Contributor

sydney-runkleFeb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Should this be_original_default? Or is that confusing for theField() case?

Copy link

MemberAuthor

ViicosFeb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's confusing forField() assignments

sydney-runkle approved these changes

Feb 12, 2025

View reviewed changes

Copy link

Contributor

sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Great work here - this takes us in a good direction in terms of refactoring field collection and merging, which have the potential for some modest perf benefits and minimal breaking changes.

Copy link

Contributor

sydney-runkle commentedFeb 12, 2025

Note - failing 3rd party tests are unrelated and should be fixed by#11429

Viicos force-pushed themodel-fields-cleanup branch from76d535d tode1adecCompare

February 12, 2025 16:28

Viicos merged commit72c77bf intomain

Feb 12, 2025

76 of 77 checks passed

Viicos deleted the model-fields-cleanup branch

February 12, 2025 16:41

This was referencedFeb 17, 2025

Raise a warning when accessing incomplete model fields#11453

Open

FormPage.__pydantic_init_subclass__ doesn't behave as expectedworkfloworchestrator/pydantic-forms#39

Closed

This was referencedApr 1, 2025

Do not overrideFieldInfo._complete when using field from parent class#11668

Merged

Recursion ForwardRef issue starting from pydantic 2.11.0#11689

Closed

Preserve field description when rebuilding model fields#11698

Merged

Viicos added a commit that referenced this pull request

Apr 15, 2025

Remove generics cache workaround

e7d3bbf

This results in parameterized classes being different type objects,although they represent the same type.A new refactor (#11388)was introduced since then and fixes the root issue.

Viicos mentioned this pull request

Apr 15, 2025

Remove generics cache workaround#11755

Merged

5 tasks

Viicos added a commit that referenced this pull request

Jun 4, 2025

Rebuild dataclass fields before schema generation

158a17d

This applying roughly the same logic from#11388for dataclasses.

Viicos mentioned this pull request

Jun 4, 2025

Rebuild dataclass fields before schema generation#11949

Merged

5 tasks

Viicos added a commit that referenced this pull request

Jun 4, 2025

Rebuild dataclass fields before schema generation

fc1858d

This applying roughly the same logic from#11388for dataclasses.

Viicos added a commit that referenced this pull request

Jun 12, 2025

Rebuild dataclass fields before schema generation

e8a3c45

This applying roughly the same logic from#11388for dataclasses.

Viicos added a commit that referenced this pull request

Jun 12, 2025

Rebuild dataclass fields before schema generation (#11949)

43bbdc0

This applying roughly the same logic from#11388 for dataclasses.Also fix third-party tests using uv.

Labels

relnotes-fix

Used for bugfixes.

third-party-tests

Add this label on a PR to trigger 3rd party tests

2 participants

Movatterモバイル変換

Uh oh!

Collect model fields when rebuilding a model#11388

Collect model fields when rebuilding a model#11388

Uh oh!

Conversation

Viicos commentedFeb 4, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Change Summary

Related issue number

Checklist

Uh oh!

codspeed-hqbot commentedFeb 4, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Merging#11388 willdegrade performances by 15.94%

Summary

Benchmarks breakdown

Uh oh!

cloudflare-workers-and-pagesbot commentedFeb 4, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Deploying pydantic-docs with Cloudflare Pages

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sydney-runkle commentedFeb 10, 2025

Uh oh!

sydney-runkle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actionsbot commentedFeb 12, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Coverage report

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sydney-runkle left a comment

Choose a reason for hiding this comment

Uh oh!

sydney-runkle commentedFeb 12, 2025

Uh oh!

Uh oh!

Uh oh!

Viicos commentedFeb 4, 2025•
edited
Loading

codspeed-hqbot commentedFeb 4, 2025•
edited
Loading

cloudflare-workers-and-pagesbot commentedFeb 4, 2025•
edited
Loading

github-actionsbot commentedFeb 12, 2025•
edited
Loading