Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Optimize calls toget_type_ref#10863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Viicos merged 12 commits intomainfromtype-ref-perf
Jan 15, 2025
Merged

Optimize calls toget_type_ref#10863

Viicos merged 12 commits intomainfromtype-ref-perf
Jan 15, 2025

Conversation

Viicos
Copy link
Member

@ViicosViicos commentedNov 17, 2024
edited
Loading

Indirectlycloses#9443.

What we're trying to achieve and how

get_type_ref is pretty expensive, and we'll try to reduce the amount of calls to this function.

Currently, for every call toGenerateSchema.generate_schema, we unconditionally end up with the following call chain:_generate_schema_from_property ->get_schema_or_ref ->get_type_ref.

The idea is thus to remove this part (L750-752):

def_generate_schema_from_property(self,obj:Any,source:Any)->core_schema.CoreSchema|None:
"""Try to generate schema from either the `__get_pydantic_core_schema__` function or
`__pydantic_core_schema__` property.
Note: `__get_pydantic_core_schema__` takes priority so it can
decide whether to use a `__pydantic_core_schema__` attribute, or generate a fresh schema.
"""
# avoid calling `__get_pydantic_core_schema__` if we've already visited this object
if_typing_extra.is_self(obj):
obj=self._resolve_self_type(obj)
withself.defs.get_schema_or_ref(obj)as (_,maybe_schema):
ifmaybe_schemaisnotNone:
returnmaybe_schema

So that anyGenerateSchema method responsible for generating a core schema for a referenceable type handle it. For example,_model_schema, which already has this check:

def_model_schema(self,cls:type[BaseModel])->core_schema.CoreSchema:
"""Generate schema for a Pydantic model."""
withself.defs.get_schema_or_ref(cls)as (model_ref,maybe_schema):
ifmaybe_schemaisnotNone:
returnmaybe_schema

Note that theif maybe_schema is not None branch is never reached (same forall the other referenceable types such as dataclasses, typeddict, etc) because the branch we're trying to remove handles it already.

So surely removing the branch from_generate_schema_from_property should work.

Challenges

However, this breaks for Pydantic models:

Let's say you have the following:

classSub(BaseModel):passclassModel(BaseModel):sub1:Subsub2:Sub

When we generate a schema forsub2, we follow the call chaingenerate_schema ->_generate_schema_from_property. If wedon't check forget_schema_or_ref, we would want it to be checked in_model_schema as shown above. However, the rest of the_generate_schema_from_property implementation will check for__get_pydantic_core_schema__(), and the__pydantic_core_schema__ property. In our example,Sub is already in the definitions thanks tosub1, so it would break by returning the complete core schema ofSub1 instead of adefinition reference schema (produced byget_schema_or_ref).

So let's try to push things a bit further. Why not move the__get_pydantic_core_schema__()/__pydantic_core_schema__ logic to the_model_schema method? (same for dataclasses).

The issue is that while it would be fine for the__pydantic_core_schema__ property (only relevant for Pydantic models and dataclasses), the__get_pydantic_core_schema__() method can be set on pretty much anything.

Solution

That's why I had to remove the__get_pydantic_core_schema__ method fromBaseModel. Thecurrent implementation is really only returning the cached__pydantic_core_schema__ property or delegating the generation to theGenerateSchema class.

During model construction, we were doing something even more confusing:

handler=CallbackGetCoreSchemaHandler(
partial(gen_schema.generate_schema,from_dunder_get_core_schema=False),
gen_schema,
ref_mode='unpack',
)
try:
schema=cls.__get_pydantic_core_schema__(cls,handler)

The following call chain happens:cls.__get_pydantic_core_schema__(cls, handler) ->handler(cls) ~GenerateSchema().generate_schema(cls, from_dunder_get_core_schema=False).

I think havingfrom_dunder_get_core_schema to avoid an infinite loop is a sign that the current implementation is quite convoluted.

Instead, by:

  • removing theBaseModel.__get_pydantic_core_schema__ method, and moving the check for the__pydantic_core_schema__ attribute toGenerateSchema._model_schema
  • Only checking for a__get_pydantic_core_schema__ method (and not the property) when callingGenerateSchema.generate_schema (which isn't an issue for Pydantic models as we removed it).

We:

  • allow for a more consistent approach when it comes to schema generation of models. We can now callGenerateSchema().generate_schema(some_model_cls), and the call chain is way simpler:generate_schema -> ... ->_model_schema.
  • move the logic to potentially callget_schema_or_ref (and check for the__pydantic_core_schema__ property) to the responsible methods:_model_schema,_dataclass_schema, etc.

Breaking changes concerns

We had a couple reports from people overridingBaseModel.__get_pydantic_core_schema__:

Feels like doing so always led to unexpected behavior, and a hacky fix. That's why I'm in support of having this pattern forBaseModel removed. It might break existing code, but we are probably helping the users in avoiding unexpected issues in the future.


We could try to still support the existingBaseModel.__get_pydantic_core_schema__ method, but honestly it's been quite hard to implement this already, and it might be that it's impossible to have both this perf improvement and support for the existing behavior at the same time

Footnotes

  1. By callingSub.__get_pydantic_core_schema__(), which would return it from the cached attribute. Confusing! Adds an extra level of indirection.

sydney-runkle and zzstoatzz reacted with thumbs up emoji
@github-actionsgithub-actionsbot added the relnotes-fixUsed for bugfixes. labelNov 17, 2024
@cloudflare-workers-and-pagesCloudflare Workers and Pages
Copy link

cloudflare-workers-and-pagesbot commentedNov 17, 2024
edited
Loading

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit:000e60e
Status: ✅  Deploy successful!
Preview URL:https://286f9f34.pydantic-docs.pages.dev
Branch Preview URL:https://type-ref-perf.pydantic-docs.pages.dev

View logs

@codspeed-hqCodSpeed HQ
Copy link

codspeed-hqbot commentedNov 17, 2024
edited
Loading

CodSpeed Performance Report

Merging#10863 willimprove performances by 30.11%

Comparingtype-ref-perf (000e60e) withmain (c371bd1)

Summary

⚡ 14 improvements
✅ 31 untouched benchmarks

Benchmarks breakdown

Benchmarkmaintype-ref-perfChange
test_schema_build3.3 ms3 ms+9.69%
test_fastapi_startup_perf243 ms214.7 ms+13.21%
test_fastapi_startup_perf29 ms27.5 ms+5.29%
test_complex_model_schema_generation2.5 ms1.9 ms+30.11%
test_lots_of_models_with_lots_of_fields3.1 s2.8 s+10.95%
test_recursive_model_schema_generation1.2 ms1.1 ms+11.62%
test_simple_model_schema_generation854.8 µs804.5 µs+6.25%
test_simple_model_schema_lots_of_fields_generation38.1 ms29.7 ms+28.36%
test_tagged_union_with_callable_discriminator_schema_generation1.8 ms1.6 ms+9.27%
test_deeply_nested_recursive_model_schema_generation1.5 ms1.4 ms+6.48%
test_generic_recursive_model_schema_generation1,049.3 µs953.6 µs+10.04%
test_nested_recursive_generic_model_schema_generation2.1 ms1.8 ms+15.54%
test_nested_recursive_model_schema_generation2.1 ms1.9 ms+8.78%
test_recursive_discriminated_union_with_base_model1.9 ms1.8 ms+8.11%

@sydney-runkle
Copy link
Contributor

I don't think it is a viable option to remove support for__get_pydantic_core_schema__. That's pretty breaking. Happy to see what other folks think as well.

@@ -671,30 +671,6 @@ def model_validate_strings(
__tracebackhide__ = True
return cls.__pydantic_validator__.validate_strings(obj, strict=strict, context=context)

@classmethod
def __get_pydantic_core_schema__(cls, source: type[BaseModel], handler: GetCoreSchemaHandler, /) -> CoreSchema:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I dont think this is a good idea. This is definitely being relied on. Its not internal and also documented for customizing the schema.

sydney-runkle reacted with thumbs up emoji
Copy link
Contributor

@MarkusSintonenMarkusSintonenNov 19, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But the perf improvement could be maybe done so that the__get_pydantic_core_schema__ can not be overriden but instead coming from model_config (with similar args). Users then have option to migrate into that. Then if its defined in model config the perf improvements are not applied. (Didnt fully follow why its causing issues here but maybe possible)

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

To be clear, this doesn't remove support for__get_pydantic_core_schema__. This is still handled by the added_generate_schema_from_get_schema_method method.

Users are still free to implement__get_pydantic_core_schema__ on Pydantic models, however we provide no guarantees (and never did, see thebreaking changes concerns section in my original post) it will work flawlessly.

sydney-runkle reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ah interesting. Will give this another review then. Didn't realize users could still implement support. Nice!

@sydney-runkle
Copy link
Contributor

Ah, I jumped the gun here on my review. Reread your new description, excited to look again.

Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice work, I'm in favor of this change, after better understanding the scope.

Comment on lines -793 to -873
if is_function_with_inner_schema(schema):
ref = schema['schema'].pop('ref', None) # pyright: ignore[reportCallIssue, reportArgumentType]
if ref:
schema['ref'] = ref
else:
ref = get_ref(schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Seems like we lost this logic - is this needed anywhere?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is related to what I mentioned:

the first one I could find:#7102. The#7102 (comment) isn't really compelling, the user was doing things not the intended way. It also led toa scary and hacky fix that I would really like not having.

These removed lines had the effect of moving the ref from the inner schema of afunction-* schema to the thefunction-* schema itself. With the following simplified test code added in the mentioned issue above:

classNumeric(BaseModel):value:float@classmethoddef__get_pydantic_core_schema__(cls,source_type,handler):returncore_schema.no_info_before_validator_function(cls.validate,handler(source_type))@classmethoddefvalidate(cls,v):        ...classOuterModel(BaseModel):x:Numericy:Numeric

Onmain, the schema ofOuterModel would look like:

{│'type':'definitions',│'schema': {│   │'type':'model',│   │'cls':<class'__main__.OuterModel'>,│   │'schema': {│   │   │'type':'model-fields',│   │   │'fields': {│   │   │   │'x': {'type':'model-field','schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:109238506193520'},'metadata': {}},│   │   │   │'y': {'type':'model-field','schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:109238506193520'},'metadata': {}}│   │   │   },│   │   │'model_name':'OuterModel',│   │   │'computed_fields': []│   │   },│   │'config': {'title':'OuterModel'},│   │'ref':'__main__.OuterModel:109238503123056',│   │'metadata': {'<stripped>'}│   },│'definitions': [│   │   {│   │   │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│   │   │'schema': {│   │   │   │'type':'model',│   │   │   │'cls':<class'__main__.Numeric'>,│   │   │   │'schema': {│   │   │   │   │'type':'model-fields',│   │   │   │   │'fields': {'value': {'type':'model-field','schema': {'type':'float'},'metadata': {}}},│   │   │   │   │'model_name':'Numeric',│   │   │   │   │'computed_fields': []│   │   │   │   },│   │   │   │'config': {'title':'Numeric'}│   │   │   },│   │   │'ref':'__main__.Numeric:109238506193520',│   │   │'metadata': {'<stripped>'},│   │   │'type':'function-before'│   │   }│   ]}

On this PR, it looks like:

{│'type':'definitions',│'schema': {│   │'type':'model',│   │'cls':<class'__main__.OuterModel'>,│   │'schema': {│   │   │'type':'model-fields',│   │   │'fields': {│   │   │   │'x': {│   │   │   │   │'type':'model-field',│   │   │   │   │'schema': {│   │   │   │   │   │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│   │   │   │   │   │'schema': {│   │   │   │   │   │   │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│   │   │   │   │   │   │'schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:105945921898336'},│   │   │   │   │   │   │'metadata': {'<stripped>'},│   │   │   │   │   │   │'type':'function-before'│   │   │   │   │   │   },│   │   │   │   │   │'metadata': {'<stripped>'},│   │   │   │   │   │'type':'function-before'│   │   │   │   │   },│   │   │   │   │'metadata': {}│   │   │   │   },│   │   │   │'y': {│   │   │   │   │'type':'model-field',│   │   │   │   │'schema': {│   │   │   │   │   │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│   │   │   │   │   │'schema': {│   │   │   │   │   │   │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│   │   │   │   │   │   │'schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:105945921898336'},│   │   │   │   │   │   │'metadata': {'<stripped>'},│   │   │   │   │   │   │'type':'function-before'│   │   │   │   │   │   },│   │   │   │   │   │'metadata': {'<stripped>'},│   │   │   │   │   │'type':'function-before'│   │   │   │   │   },│   │   │   │   │'metadata': {}│   │   │   │   }│   │   │   },│   │   │'model_name':'OuterModel',│   │   │'computed_fields': []│   │   },│   │'config': {'title':'OuterModel'},│   │'ref':'__main__.OuterModel:105945922827312',│   │'metadata': {'<stripped>'}│   },│'definitions': [│   │   {│   │   │'type':'model',│   │   │'cls':<class'__main__.Numeric'>,│   │   │'schema': {│   │   │   │'type':'model-fields',│   │   │   │'fields': {'value': {'type':'model-field','schema': {'type':'float'},'metadata': {}}},│   │   │   │'model_name':'Numeric',│   │   │   │'computed_fields': []│   │   │   },│   │   │'config': {'title':'Numeric'},│   │   │'ref':'__main__.Numeric:105945921898336'│   │   }│   ]}

Essentially, the difference in these two schemas is that we don't "move" the ref from the inner schema to thefunction-before schemas.

The changes in this PR + removing this reference moving coincidentally make it work still.

However, doing so was a dangerous game: onL793,schema directly comes from another model. Thepop calls removes the reference to the schema, and mutating schemas from other models has been a known issue.

You may be wondering: why this doesn't break things in the example I gave? Surely thepop call should have mutated the core schema ofNumeric. Turns out it doesn't, becauseNumeric.__get_pydantic_core_schema__ does not cache the schema, so calling it will generate a new one every time (and this is what happens during the schema gen ofOuterModel). But on a similar issue,I mentioned that explicitly caching the core schema in the__get_pydantic_core_schema__ method would resolve the user issue (as the use case was slightly different)!

So to conclude, overridingBaseModel.__get_pydantic_core_schema__ is full of unexpected behaviors, but that's fine as officially supporting them would be a huge pain.

sydney-runkle reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it make sense to break this change out into a different PR?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

iirc (but I'm not sure), I was able to remove it only thanks to the other changes. This won't clutter the git diff though, because it's just a removal. Probably by having a proper commit description when merging, I can add a note about this?

@sydney-runkle
Copy link
Contributor

I've removed the example I addedhere to address#7833. It's no longer relevant, as there's no__get_pydantic_core_schema__ onsuper, so things go as planned without this workaround.

cc@chbndrhnns, you might find this interesting. This simplification now works:

fromtypingimportTypefrompydantic_coreimportCoreSchemafromtyping_extensionsimportAnnotatedfrompydanticimportBaseModel,GetCoreSchemaHandlerclassMetadata(BaseModel):foo:str='metadata!'bar:int=100classModel(BaseModel):state:Annotated[int,Metadata()]m=Model.model_validate({'state':2})print(repr(m))#> Model(state=2)print(m.model_fields)"""{    'state': FieldInfo(        annotation=int,        required=True,        metadata=[Metadata(foo='metadata!', bar=100)],    )}"""
Viicos reacted with thumbs up emojichbndrhnns reacted with heart emoji

@github-actionsGitHub Actions
Copy link
Contributor

github-actionsbot commentedNov 20, 2024
edited
Loading

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  main.py692
  pydantic/_internal
  _core_utils.py
  _dataclasses.py
  _generate_schema.py856
  _model_construction.py
Project Total 

This report was generated bypython-coverage-comment-action

@sydney-runklesydney-runkle marked this pull request as ready for reviewNovember 20, 2024 21:13
Comment on lines -989 to -973
As seen above, annotating a field with a `BaseModel` type can be used to modify or override the generated json schema.
However, if you want to take advantage of storing metadata via `Annotated`, but you don't want to override the generated JSON
schema, you can use the following approach with a no-op version of `__get_pydantic_core_schema__` implemented on the
metadata class:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This does seem like a terrible example, but is ther ean alternative way to do what it is trying to do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yep, see#10863 (comment)

You just don't have to use the nasty no-op, so it's even better!

@sydney-runkle
Copy link
Contributor

@Viicos, I'd love to chat about a few more nitty gritty details here tomorrow, but other than that, I think we can move forward with this. Great find!

@chbndrhnns
Copy link
Contributor

cc@chbndrhnns, you might find this interesting. This simplification now works:

Thanks for the ping,@sydney-runkle. This sounds exciting! I am waiting for the release now ;)

sydney-runkle reacted with heart emoji

@sydney-runkle
Copy link
Contributor

Sure thing. We just released v2.10, and this won't fit in until v2.11, which will be a bit 😅

Feel free to install frommain once we've merged this, if you want the change soon!

@Viicos
Copy link
MemberAuthor

I went and tested this branch on a couple libraries (using the followingGH search), and couldn't find any issues. This should imo be good to merge once approved.

@ViicosViicos added the third-party-testsAdd this label on a PR to trigger 3rd party tests labelDec 20, 2024
@sydney-runkle
Copy link
Contributor

Nice, tests passing! Let's add a few other integrations before we move forward with a merge here?

@sydney-runkle
Copy link
Contributor

@Viicos, I can review this again early next week once we have some more third party tests added?

@Viicos
Copy link
MemberAuthor

@Viicos, I can review this again early next week once we have some more third party tests added?

Two integration tests were added since#10863 (comment), but overall it is pretty hard to see how this affects existing code, as most 3rd-party libraries are not defining a lot of custom types/__get_pydantic_core_schema__ implementations. Only code possibly affected seems ok (#10863 (comment))

sydney-runkle reacted with thumbs up emoji

@sydney-runkle
Copy link
Contributor

Fair enough, I'll review today then 👍

@sydney-runklesydney-runkle added relnotes-changeUsed for changes to existing functionality which don't have a better categorization. relnotes-performanceUsed for performance improvements. deferredDeferred until future release or until something else gets done and removed relnotes-fixUsed for bugfixes. labelsJan 8, 2025
@sydney-runklesydney-runkle removed the deferredDeferred until future release or until something else gets done labelJan 14, 2025
Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looking great, a few follow up questions 👍

Comment on lines -793 to -873
if is_function_with_inner_schema(schema):
ref = schema['schema'].pop('ref', None) # pyright: ignore[reportCallIssue, reportArgumentType]
if ref:
schema['ref'] = ref
else:
ref = get_ref(schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it make sense to break this change out into a different PR?

Copy link
Contributor

@sydney-runklesydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's wait on merging 3.8, but otherwise LGTM :)

@ViicosViicos merged commitdac74dc intomainJan 15, 2025
79 checks passed
@ViicosViicos deleted the type-ref-perf branchJanuary 15, 2025 16:20
@chbndrhnns
Copy link
Contributor

This simplification now works:

@sydney-runkle
The no-op override seems still required with 2.11.5.
What were you referring to when saying it's no longer required?

FAILED                                             [100%]tests/test_.py:16 (test_)def test_():>       m = Model.model_validate({'state': 2})E       pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelE       stateE         Input should be a valid dictionary or instance of Metadata [type=model_type, input_value=2, input_type=int]E           For further information visit https://errors.pydantic.dev/2.11/v/model_typetests/test_.py:18: ValidationError

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@adriangbadriangbadriangb left review comments

@MarkusSintonenMarkusSintonenMarkusSintonen left review comments

@sydney-runklesydney-runklesydney-runkle approved these changes

Assignees
No one assigned
Labels
relnotes-changeUsed for changes to existing functionality which don't have a better categorization.relnotes-performanceUsed for performance improvements.third-party-testsAdd this label on a PR to trigger 3rd party tests
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Pydantic dataclasses cannot be used as Annotated Validators
5 participants
@Viicos@sydney-runkle@chbndrhnns@adriangb@MarkusSintonen

[8]ページ先頭

©2009-2025 Movatter.jp