Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Collect model fields when rebuilding a model#11388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Viicos merged 10 commits intomainfrommodel-fields-cleanup
Feb 12, 2025
Merged

Conversation

Viicos
Copy link
Member

@ViicosViicos commentedFeb 4, 2025
edited
Loading

Fixes#11390,fixes#10695, works towards#9969 (we don't get theKeyError anymore, but we still get a clean schema error because of a missing definition).

Change Summary

#5305 made it so that unknown forward annotations would not stop field collection. The PR is huge and doesn't explain why this was done, but this introduced subtle bugs and code smells that this PR tries removing.

The first issue arising from this is that the model fields can be inaccurate when a model is not fully built yet:

fromtypingimportAnnotatedfrompydanticimportBaseModelclassModel(BaseModel):a:'Annotated[Forward, Field(alias="b")]'Model.model_fields['a'].alias#> None, should be 'b'

and I won't be surprised if some users currently make use ofmodel_fields on an incomplete model (especially as we implicitly rebuild on instance creation — meaning most users don't explicitly callmodel_rebuild()). A relatively big concern I also have is FastAPI relying onFieldInfo instances to validate data. They do manual stuff before calling the Pydantic validate methods, such as checking forFieldInfo.is_required(),alias.

When a model is being rebuilt, we reevaluate the annotation and recreate aFieldInfo instance during the core schema gen process (and we used to even to this in a less performant way; see#10769):

ifnotfield_info.evaluated:
# TODO Can we use field_info.apply_typevars_map here?
try:
evaluated_type=_typing_extra.eval_type(field_info.annotation,*self._types_namespace)
exceptNameErrorase:
raisePydanticUndefinedAnnotation.from_name_error(e)frome
evaluated_type=replace_types(evaluated_type,self._typevars_map)
field_info.evaluated=True
ifnothas_instance_in_type(evaluated_type,PydanticRecursiveRef):
new_field_info=FieldInfo.from_annotation(evaluated_type)
field_info.annotation=new_field_info.annotation
# Handle any field info attributes that may have been obtained from now-resolved annotations
fork,vinnew_field_info._attributes_set.items():
# If an attribute is already set, it means it was set by assigning to a call to Field (or just a
# default value), and that should take the highest priority. So don't overwrite existing attributes.
# We skip over "attributes" that are present in the metadata_lookup dict because these won't
# actually end up as attributes of the `FieldInfo` instance.
ifknotinfield_info._attributes_setandknotinfield_info.metadata_lookup:
setattr(field_info,k,v)

Doing so introduced a code smell where we track explicitly set attributes using_attributes_set. This is known to be a source of bugs as we need to keep track of such explicitly set attrs when you alter theFieldInfo instance:

  • We forgot to update it when using... as the default value in some cases. See this added test case:
    deftest_ellipsis_forward_ref_annotated()->None:
    """This previously resulted in the ellipsis being used as a default value."""
    classModel(BaseModel):
    f:'Forward'
    Forward=Annotated[int,Field(...)]
    assertModel.model_fields['f'].defaultisPydanticUndefined
  • We currently forget to updatedeprecated (first bug ofDeprecated fields bugs with forward annotations #11390).

And more generally:

  • On_generate_schema.py#L1311, wedirectly mutate the original field info instance to take the newly evaluated annotation. This is the reason we get theKeyError on generics (KeyError when inhering from a self-referencing generic model #9969).
  • We make use of the model fields (even though they might be incorrect) insideModelMetaclass.__new__, at model creation (i.e. stuff thatisn't reprocessed onmodel_rebuild()), such as setting deprecated descriptors (second bug of#11390: we set these descriptors based onFieldInfo.deprecated).

Instead, this PR does three things:

  • Add a new__pydantic_fields_complete__ class property on models (for private use). It isTrue if at least one field annotation failed to evaluate duringcollect_model_fields(). On a related note: we still set__pydantic_fields__ for backwards compatibility, but a follow up PR can now easily add a user/deprecation warning when accessingmodel_fieldsif__pydantic_fields_complete__ == False. In V3, we could remove this backward compatibility logic (TBD).
    Also note that now wedon't enterGenerateSchema if the fields collection failed, as it is guaranteed to fail anyway (if an annotation failed to evaluate during fields collection, it will also immediately fail to be evaluated inGenerateSchema).
  • Callingmodel_rebuild()will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.
  • When an incomplete Pydantic model is encountered as an annotation (insideGenerateSchema), we first callrebuild_model_fields(); a new function that iterates over the existing (but potentially not complete) field infos, and recreate instances if the annotation needs to be evaluated again. This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:
    classSub(BaseModel):f1:int    ...# Many fields...f100:'Forward'classModel(BaseModel):sub:Sub# Note that Sub is incomplete, so in the process of generating the schema for# `Model`, we don't use the cached schema attribute# Thus previously, we would generate a schema for f1, f2, ..., f99. f100 would fail,# meaning we generated 99 fields core schemas for nothing.# In this PR, we first (re)collect fields for `Sub`, and immediately raise if fields collection did not suceed.
    On important thing to acknowledge: when encounteringsub: Sub, we don't callSub.model_rebuild(). Doing so might be the right thing to do, but is unfortunately not backwards compatible. The reason is that the NS Resolver used forModel might contain annotations (e.g. provided through theparent_namespace) that should be available forSub. This means that even ifsub: Sub successfully builds,Sub isn't complete yet — each model should be rebuilt independently, throughmodel_rebuild()).

TODO:

  • Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the newrebuild_model_fields() might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example withf1..100 above and thetest_failed_rebuild benchmark).

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review,please add a comment including the phrase "please review" to assign reviewers

This is the biggest change in this PR, and the hardest toreason about and the trickiest to implement.Instead of re-evaluating annotations on a field-by-field basis,we assume all fields have their annotation evaluated (andconsequently the `FieldInfo` attributes -- some of themset depending on the annotation -- are correct). To do so,we collect model fields (if necessary, thanks to the new`__pydantic_fields_complete__` attribute) in `_model_schema()`,and then have `_common_field_schema()` called with a fully built`FieldInfo` instance (this mimics what is done for dataclasses).When `model_rebuild()` is called, we also want to recollect fieldsif it wasn't successful the first time. This introduces challenges,such as keeping the model assignments to be able to recollect fieldsand call `FieldInfo.from_annotated_attribute()`.
@github-actionsgithub-actionsbot added the relnotes-fixUsed for bugfixes. labelFeb 4, 2025
@codspeed-hqCodSpeed HQ
Copy link

codspeed-hqbot commentedFeb 4, 2025
edited
Loading

CodSpeed Performance Report

Merging#11388 willdegrade performances by 15.94%

Comparingmodel-fields-cleanup (de1adec) withmain (53f8ece)

Summary

⚡ 2 improvements
❌ 3 regressions
✅ 41 untouched benchmarks

⚠️Please fix the performance issues oracknowledge them on CodSpeed.

Benchmarks breakdown

BenchmarkBASEHEADChange
test_schema_build2.5 ms2.8 ms-11.53%
test_construct_dataclass_schema1.2 ms1.1 ms+8.6%
test_failed_rebuild3,851.1 µs211.9 µs×18
test_deeply_nested_recursive_model_schema_generation1.1 ms1.3 ms-15.94%
test_recursive_discriminated_union_with_base_model1.4 ms1.5 ms-8.22%

@cloudflare-workers-and-pagesCloudflare Workers and Pages
Copy link

cloudflare-workers-and-pagesbot commentedFeb 4, 2025
edited
Loading

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit:de1adec
Status: ✅  Deploy successful!
Preview URL:https://203f8a46.pydantic-docs.pages.dev
Branch Preview URL:https://model-fields-cleanup.pydantic-docs.pages.dev

View logs

The `has_instance_in_type` function is no longer used.We don't need to track attributes explicitly set on `FieldInfo`(this was a code smell and I'm glad it's gone now).
@ViicosViicos added the third-party-testsAdd this label on a PR to trigger 3rd party tests labelFeb 4, 2025
@ViicosViicos closed thisFeb 4, 2025
@ViicosViicos reopened thisFeb 4, 2025
…s()`They need to be performed once we know fields are complete.
@ViicosViicosforce-pushed themodel-fields-cleanup branch 4 times, most recently from74e241e to783c3beCompareFebruary 7, 2025 10:53
@sydney-runkle
Copy link
Contributor

Add a new__pydantic_fields_complete__ class property on models

Great, this seems like a good step.

Calling model_rebuild() will redo the fields collection if it wasn't successful the first time. Note that this aligns with Pydantic dataclasses.

Seems good to be consistent here. We should comb throughmodel_rebuild issues, this might close some of those.

This way, we avoid generating a schema for each field just to realize that one field still has a forward annotation:

I really like this pattern as well. Not only more performant, but also makes more sense logically.

Performance issues: I believe this is a tradeoff to have, as the couple affected benchmarks are models with one field defined, and the new rebuild_model_fields() might add a bit of overhead. However, it will be beneficial if many fields are defined (see the example with f1..100 above and the test_failed_rebuild benchmark).

Sure, willing to move forward here, these regressions aren't super concerning. As I review, I'll see if I can suggest any changes that might lead to slight reductions in the regressions.

Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the in depth PR description.

Code changes generally look good. I'd like to learn more about:
a) memory usage consequences
b) where more time is now being spent - the codspeed benchmarks aren't super clear on what's slowing things down

I understand that the slower benchmarks are for pretty specific use cases, but want to make sure we understand perf consequences before merging.

Comment on lines +241 to +245
# The `from_annotated_attribute()` call below mutates the assigned `Field()`, so make a copy:
original_assignment = (
copy(assigned_value) if not evaluated and isinstance(assigned_value, FieldInfo_) else assigned_value
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Any metrics on perf impact of this copy? Seems like a pretty special case...

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's only relevant if an annotation fails to evaluate and has aField() assigned:

a:Undefined=Field()

So not that common, + this copy could be removed if we manage to refactor how we merge field infos together.

sydney-runkle reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's definitely worth investigating refactoring how we merge field infos :).

"""
FieldInfo_ = import_cached_field_info()

rebuilt_fields: dict[str, FieldInfo] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Understandable, but downside - this is going to increate memory usage. Could you benchmark this with memray and see what the difference is like?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's only going to make temporary memory allocations, as:

  • if this is called frommodel_rebuild(), it will override the existing__pydantic_fields__, so the old ones will be gc'ed.
  • if this is called inGenerateSchema._model_schema(), as part of another model build, they are going to be gc'ed at the end of the method.

sydney-runkle reacted with thumbs up emoji
@ViicosViicos marked this pull request as ready for reviewFebruary 12, 2025 14:05
@github-actionsGitHub Actions
Copy link
Contributor

github-actionsbot commentedFeb 12, 2025
edited
Loading

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  fields.py
  main.py
  pydantic/_internal
  _fields.py
  _generate_schema.py
  _model_construction.py
Project Total 

This report was generated bypython-coverage-comment-action

Comment on lines +229 to +233
if config_wrapper.defer_build:
# TODO we can also stop there if `__pydantic_fields_complete__` is False.
# However, `set_model_fields()` is currently lenient and we don't have access to the `NameError`.
# (which is useful as we can provide the name in the error message: `set_model_mock(cls, e.name)`)
set_model_mocks(cls)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we link an issue to this comment?

Viicos reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Created#11453.

# Private attributes, used to rebuild FieldInfo instances:
self._complete = True
self._original_annotation: Any = PydanticUndefined
self._original_assignment: Any = PydanticUndefined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Should this be_original_default? Or is that confusing for theField() case?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's confusing forField() assignments

Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Great work here - this takes us in a good direction in terms of refactoring field collection and merging, which have the potential for some modest perf benefits and minimal breaking changes.

@sydney-runkle
Copy link
Contributor

Note - failing 3rd party tests are unrelated and should be fixed by#11429

@ViicosViicos merged commit72c77bf intomainFeb 12, 2025
76 of 77 checks passed
@ViicosViicos deleted the model-fields-cleanup branchFebruary 12, 2025 16:41
Viicos added a commit that referenced this pull requestApr 15, 2025
This results in parameterized classes being different type objects,although they represent the same type.A new refactor (#11388)was introduced since then and fixes the root issue.
@ViicosViicos mentioned this pull requestApr 15, 2025
5 tasks
Viicos added a commit that referenced this pull requestJun 4, 2025
This applying roughly the same logic from#11388for dataclasses.
Viicos added a commit that referenced this pull requestJun 4, 2025
This applying roughly the same logic from#11388for dataclasses.
Viicos added a commit that referenced this pull requestJun 12, 2025
This applying roughly the same logic from#11388for dataclasses.
Viicos added a commit that referenced this pull requestJun 12, 2025
This applying roughly the same logic from#11388 for dataclasses.Also fix third-party tests using uv.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@sydney-runklesydney-runklesydney-runkle approved these changes

Assignees
No one assigned
Labels
relnotes-fixUsed for bugfixes.third-party-testsAdd this label on a PR to trigger 3rd party tests
Projects
None yet
Milestone
No milestone
2 participants
@Viicos@sydney-runkle

[8]ページ先頭

©2009-2025 Movatter.jp