Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork2.2k
Optimize calls toget_type_ref
#10863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
cloudflare-workers-and-pagesbot commentedNov 17, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Deploying pydantic-docs with |
Latest commit: | 000e60e |
Status: | ✅ Deploy successful! |
Preview URL: | https://286f9f34.pydantic-docs.pages.dev |
Branch Preview URL: | https://type-ref-perf.pydantic-docs.pages.dev |
codspeed-hqbot commentedNov 17, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
CodSpeed Performance ReportMerging#10863 willimprove performances by 30.11%Comparing Summary
Benchmarks breakdown
|
I don't think it is a viable option to remove support for |
@@ -671,30 +671,6 @@ def model_validate_strings( | |||
__tracebackhide__ = True | |||
return cls.__pydantic_validator__.validate_strings(obj, strict=strict, context=context) | |||
@classmethod | |||
def __get_pydantic_core_schema__(cls, source: type[BaseModel], handler: GetCoreSchemaHandler, /) -> CoreSchema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I dont think this is a good idea. This is definitely being relied on. Its not internal and also documented for customizing the schema.
MarkusSintonenNov 19, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
But the perf improvement could be maybe done so that the__get_pydantic_core_schema__
can not be overriden but instead coming from model_config (with similar args). Users then have option to migrate into that. Then if its defined in model config the perf improvements are not applied. (Didnt fully follow why its causing issues here but maybe possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
To be clear, this doesn't remove support for__get_pydantic_core_schema__
. This is still handled by the added_generate_schema_from_get_schema_method
method.
Users are still free to implement__get_pydantic_core_schema__
on Pydantic models, however we provide no guarantees (and never did, see thebreaking changes concerns section in my original post) it will work flawlessly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ah interesting. Will give this another review then. Didn't realize users could still implement support. Nice!
Uh oh!
There was an error while loading.Please reload this page.
Ah, I jumped the gun here on my review. Reread your new description, excited to look again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Nice work, I'm in favor of this change, after better understanding the scope.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
if is_function_with_inner_schema(schema): | ||
ref = schema['schema'].pop('ref', None) # pyright: ignore[reportCallIssue, reportArgumentType] | ||
if ref: | ||
schema['ref'] = ref | ||
else: | ||
ref = get_ref(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Seems like we lost this logic - is this needed anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is related to what I mentioned:
the first one I could find:#7102. The#7102 (comment) isn't really compelling, the user was doing things not the intended way. It also led toa scary and hacky fix that I would really like not having.
These removed lines had the effect of moving the ref from the inner schema of afunction-*
schema to the thefunction-*
schema itself. With the following simplified test code added in the mentioned issue above:
classNumeric(BaseModel):value:float@classmethoddef__get_pydantic_core_schema__(cls,source_type,handler):returncore_schema.no_info_before_validator_function(cls.validate,handler(source_type))@classmethoddefvalidate(cls,v): ...classOuterModel(BaseModel):x:Numericy:Numeric
Onmain
, the schema ofOuterModel
would look like:
{│'type':'definitions',│'schema': {│ │'type':'model',│ │'cls':<class'__main__.OuterModel'>,│ │'schema': {│ │ │'type':'model-fields',│ │ │'fields': {│ │ │ │'x': {'type':'model-field','schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:109238506193520'},'metadata': {}},│ │ │ │'y': {'type':'model-field','schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:109238506193520'},'metadata': {}}│ │ │ },│ │ │'model_name':'OuterModel',│ │ │'computed_fields': []│ │ },│ │'config': {'title':'OuterModel'},│ │'ref':'__main__.OuterModel:109238503123056',│ │'metadata': {'<stripped>'}│ },│'definitions': [│ │ {│ │ │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│ │ │'schema': {│ │ │ │'type':'model',│ │ │ │'cls':<class'__main__.Numeric'>,│ │ │ │'schema': {│ │ │ │ │'type':'model-fields',│ │ │ │ │'fields': {'value': {'type':'model-field','schema': {'type':'float'},'metadata': {}}},│ │ │ │ │'model_name':'Numeric',│ │ │ │ │'computed_fields': []│ │ │ │ },│ │ │ │'config': {'title':'Numeric'}│ │ │ },│ │ │'ref':'__main__.Numeric:109238506193520',│ │ │'metadata': {'<stripped>'},│ │ │'type':'function-before'│ │ }│ ]}
On this PR, it looks like:
{│'type':'definitions',│'schema': {│ │'type':'model',│ │'cls':<class'__main__.OuterModel'>,│ │'schema': {│ │ │'type':'model-fields',│ │ │'fields': {│ │ │ │'x': {│ │ │ │ │'type':'model-field',│ │ │ │ │'schema': {│ │ │ │ │ │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│ │ │ │ │ │'schema': {│ │ │ │ │ │ │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│ │ │ │ │ │ │'schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:105945921898336'},│ │ │ │ │ │ │'metadata': {'<stripped>'},│ │ │ │ │ │ │'type':'function-before'│ │ │ │ │ │ },│ │ │ │ │ │'metadata': {'<stripped>'},│ │ │ │ │ │'type':'function-before'│ │ │ │ │ },│ │ │ │ │'metadata': {}│ │ │ │ },│ │ │ │'y': {│ │ │ │ │'type':'model-field',│ │ │ │ │'schema': {│ │ │ │ │ │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│ │ │ │ │ │'schema': {│ │ │ │ │ │ │'function': {'type':'no-info','function':<boundmethodNumeric.validateof<class'__main__.Numeric'>>},│ │ │ │ │ │ │'schema': {'type':'definition-ref','schema_ref':'__main__.Numeric:105945921898336'},│ │ │ │ │ │ │'metadata': {'<stripped>'},│ │ │ │ │ │ │'type':'function-before'│ │ │ │ │ │ },│ │ │ │ │ │'metadata': {'<stripped>'},│ │ │ │ │ │'type':'function-before'│ │ │ │ │ },│ │ │ │ │'metadata': {}│ │ │ │ }│ │ │ },│ │ │'model_name':'OuterModel',│ │ │'computed_fields': []│ │ },│ │'config': {'title':'OuterModel'},│ │'ref':'__main__.OuterModel:105945922827312',│ │'metadata': {'<stripped>'}│ },│'definitions': [│ │ {│ │ │'type':'model',│ │ │'cls':<class'__main__.Numeric'>,│ │ │'schema': {│ │ │ │'type':'model-fields',│ │ │ │'fields': {'value': {'type':'model-field','schema': {'type':'float'},'metadata': {}}},│ │ │ │'model_name':'Numeric',│ │ │ │'computed_fields': []│ │ │ },│ │ │'config': {'title':'Numeric'},│ │ │'ref':'__main__.Numeric:105945921898336'│ │ }│ ]}
Essentially, the difference in these two schemas is that we don't "move" the ref from the inner schema to thefunction-before
schemas.
The changes in this PR + removing this reference moving coincidentally make it work still.
However, doing so was a dangerous game: onL793,schema
directly comes from another model. Thepop
calls removes the reference to the schema, and mutating schemas from other models has been a known issue.
You may be wondering: why this doesn't break things in the example I gave? Surely thepop
call should have mutated the core schema ofNumeric
. Turns out it doesn't, becauseNumeric.__get_pydantic_core_schema__
does not cache the schema, so calling it will generate a new one every time (and this is what happens during the schema gen ofOuterModel
). But on a similar issue,I mentioned that explicitly caching the core schema in the__get_pydantic_core_schema__
method would resolve the user issue (as the use case was slightly different)!
So to conclude, overridingBaseModel.__get_pydantic_core_schema__
is full of unexpected behaviors, but that's fine as officially supporting them would be a huge pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would it make sense to break this change out into a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
iirc (but I'm not sure), I was able to remove it only thanks to the other changes. This won't clutter the git diff though, because it's just a removal. Probably by having a proper commit description when merging, I can add a note about this?
I've removed the example I addedhere to address#7833. It's no longer relevant, as there's no cc@chbndrhnns, you might find this interesting. This simplification now works: fromtypingimportTypefrompydantic_coreimportCoreSchemafromtyping_extensionsimportAnnotatedfrompydanticimportBaseModel,GetCoreSchemaHandlerclassMetadata(BaseModel):foo:str='metadata!'bar:int=100classModel(BaseModel):state:Annotated[int,Metadata()]m=Model.model_validate({'state':2})print(repr(m))#> Model(state=2)print(m.model_fields)"""{ 'state': FieldInfo( annotation=int, required=True, metadata=[Metadata(foo='metadata!', bar=100)], )}""" |
github-actionsbot commentedNov 20, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Coverage reportClick to see where and how coverage changed
This report was generated bypython-coverage-comment-action |
As seen above, annotating a field with a `BaseModel` type can be used to modify or override the generated json schema. | ||
However, if you want to take advantage of storing metadata via `Annotated`, but you don't want to override the generated JSON | ||
schema, you can use the following approach with a no-op version of `__get_pydantic_core_schema__` implemented on the | ||
metadata class: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This does seem like a terrible example, but is ther ean alternative way to do what it is trying to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yep, see#10863 (comment)
You just don't have to use the nasty no-op, so it's even better!
@Viicos, I'd love to chat about a few more nitty gritty details here tomorrow, but other than that, I think we can move forward with this. Great find! |
Thanks for the ping,@sydney-runkle. This sounds exciting! I am waiting for the release now ;) |
Sure thing. We just released v2.10, and this won't fit in until v2.11, which will be a bit 😅 Feel free to install from |
I went and tested this branch on a couple libraries (using the followingGH search), and couldn't find any issues. This should imo be good to merge once approved. |
Nice, tests passing! Let's add a few other integrations before we move forward with a merge here? |
@Viicos, I can review this again early next week once we have some more third party tests added? |
Two integration tests were added since#10863 (comment), but overall it is pretty hard to see how this affects existing code, as most 3rd-party libraries are not defining a lot of custom types/ |
Fair enough, I'll review today then 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looking great, a few follow up questions 👍
Uh oh!
There was an error while loading.Please reload this page.
if is_function_with_inner_schema(schema): | ||
ref = schema['schema'].pop('ref', None) # pyright: ignore[reportCallIssue, reportArgumentType] | ||
if ref: | ||
schema['ref'] = ref | ||
else: | ||
ref = get_ref(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would it make sense to break this change out into a different PR?
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Let's wait on merging 3.8, but otherwise LGTM :)
dac74dc
intomainUh oh!
There was an error while loading.Please reload this page.
@sydney-runkle
|
Uh oh!
There was an error while loading.Please reload this page.
Indirectlycloses#9443.
What we're trying to achieve and how
get_type_ref
is pretty expensive, and we'll try to reduce the amount of calls to this function.Currently, for every call to
GenerateSchema.generate_schema
, we unconditionally end up with the following call chain:_generate_schema_from_property
->get_schema_or_ref
->get_type_ref
.The idea is thus to remove this part (L750-752):
pydantic/pydantic/_internal/_generate_schema.py
Lines 740 to 752 in30ee4f4
So that any
GenerateSchema
method responsible for generating a core schema for a referenceable type handle it. For example,_model_schema
, which already has this check:pydantic/pydantic/_internal/_generate_schema.py
Lines 624 to 628 in30ee4f4
Note that the
if maybe_schema is not None
branch is never reached (same forall the other referenceable types such as dataclasses, typeddict, etc) because the branch we're trying to remove handles it already.So surely removing the branch from
_generate_schema_from_property
should work.Challenges
However, this breaks for Pydantic models:
Let's say you have the following:
When we generate a schema for
sub2
, we follow the call chaingenerate_schema
->_generate_schema_from_property
. If wedon't check forget_schema_or_ref
, we would want it to be checked in_model_schema
as shown above. However, the rest of the_generate_schema_from_property
implementation will check for__get_pydantic_core_schema__()
, and the__pydantic_core_schema__
property. In our example,Sub
is already in the definitions thanks tosub1
, so it would break by returning the complete core schema ofSub
1 instead of adefinition reference schema (produced byget_schema_or_ref
).So let's try to push things a bit further. Why not move the
__get_pydantic_core_schema__()
/__pydantic_core_schema__
logic to the_model_schema
method? (same for dataclasses).The issue is that while it would be fine for the
__pydantic_core_schema__
property (only relevant for Pydantic models and dataclasses), the__get_pydantic_core_schema__()
method can be set on pretty much anything.Solution
That's why I had to remove the
__get_pydantic_core_schema__
method fromBaseModel
. Thecurrent implementation is really only returning the cached__pydantic_core_schema__
property or delegating the generation to theGenerateSchema
class.During model construction, we were doing something even more confusing:
pydantic/pydantic/_internal/_model_construction.py
Lines 648 to 655 in30ee4f4
The following call chain happens:
cls.__get_pydantic_core_schema__(cls, handler)
->handler(cls)
~GenerateSchema().generate_schema(cls, from_dunder_get_core_schema=False)
.I think having
from_dunder_get_core_schema
to avoid an infinite loop is a sign that the current implementation is quite convoluted.Instead, by:
BaseModel.__get_pydantic_core_schema__
method, and moving the check for the__pydantic_core_schema__
attribute toGenerateSchema._model_schema
__get_pydantic_core_schema__
method (and not the property) when callingGenerateSchema.generate_schema
(which isn't an issue for Pydantic models as we removed it).We:
GenerateSchema().generate_schema(some_model_cls)
, and the call chain is way simpler:generate_schema
-> ... ->_model_schema
.get_schema_or_ref
(and check for the__pydantic_core_schema__
property) to the responsible methods:_model_schema
,_dataclass_schema
, etc.Breaking changes concerns
We had a couple reports from people overriding
BaseModel.__get_pydantic_core_schema__
:__get_pydantic_core_schema__
ignored in self-referencing models #10582, a user also facing unexpected issues (maybe the same ones).Feels like doing so always led to unexpected behavior, and a hacky fix. That's why I'm in support of having this pattern for
BaseModel
removed. It might break existing code, but we are probably helping the users in avoiding unexpected issues in the future.We could try to still support the existing
BaseModel.__get_pydantic_core_schema__
method, but honestly it's been quite hard to implement this already, and it might be that it's impossible to have both this perf improvement and support for the existing behavior at the same timeFootnotes
By calling
Sub.__get_pydantic_core_schema__()
, which would return it from the cached attribute. Confusing! Adds an extra level of indirection.↩