Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

allow sorting keys on to_json and to_python by passing in sort_keys#1637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
aezomz wants to merge2 commits intopydantic:main
base:main
Choose a base branch
Loading
fromaezomz:allow_model_dump_sort_keys

Conversation

aezomz
Copy link

@aezomzaezomz commentedFeb 15, 2025
edited
Loading

Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense?
Since I have been trying to do model_dump_json with sort keys too.

This feature should simulate the same as how we usejson.dumps(data, sort_keys=True)

Will sort from:        {            'field_123': b'test_123',            'field_b': 12,            'field_a': b'test',            'field_c': {'mango': 2, 'banana': 3, 'apple': 1},            'field_n': [                {'mango': 3, 'banana': 2, 'apple': 1},                [{'mango': 3, 'banana': 2, 'apple': 1}, {'d': 3, 'b': 2, 'a': 1}],                3,            ],            'field_d': [                {'d': 3, 'b': 2, 'a': {'nested3': 3, 'nested1': 1, 'nested2': 2}},                [[{'mango': 3, 'banana': 2, 'apple': 1}], {'d': 3, 'b': 2, 'a': 1}],                3,            ],            'field_none': None,        }To:    assert s.to_python(m, exclude_none=True, sort_keys=True) == snapshot(        {            'field_123': b'test_123',            'field_a': b'test',            'field_b': 12,            'field_c': {'apple': 1, 'banana': 3, 'mango': 2},            'field_n': [                {'apple': 1, 'banana': 2, 'mango': 3},                [{'apple': 1, 'banana': 2, 'mango': 3}, {'a': 1, 'b': 2, 'd': 3}],                3,            ],            'field_d': [                {'a': {'nested1': 1, 'nested2': 2, 'nested3': 3}, 'b': 2, 'd': 3},                [[{'apple': 1, 'banana': 2, 'mango': 3}], {'a': 1, 'b': 2, 'd': 3}],                3,            ],        }    )

Take note that

  1. field_d is extra and still manage to sort
  2. sorting recursively for both defined schema and extras
  3. sorting including dictionary in array or nested array array

Please let me know if I miss out any other features thatsort_keys=True is suppose to do!

Thanks!

Change Summary

allow sorting keys on to_json and to_python by passing in sort_keys

Related issue number

shouldfixpydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with thispydantic-core (except for expected changes)
  • My PR is ready to review,please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer:@davidhewitt

balderdash, choucavalier, edgarrmondragon, and darabos reacted with heart emojiedgarrmondragon and choucavalier reacted with rocket emoji
@codecovCodecov
Copy link

codecovbot commentedFeb 15, 2025
edited
Loading

@codspeed-hqCodSpeed HQ
Copy link

codspeed-hqbot commentedFeb 15, 2025
edited
Loading

CodSpeed Performance Report

Merging#1637 willnot alter performance

Comparingaezomz:allow_model_dump_sort_keys (ad37c91) withmain (d9dacb0)

Summary

✅ 157 untouched benchmarks

@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch from07a31f5 to7222c8dCompareMarch 4, 2025 10:22
@aezomz
Copy link
Author

please review

pydantic-hooky[bot] reacted with thumbs up emoji

@aezomzaezomz marked this pull request as ready for reviewMarch 4, 2025 13:19
Copy link
Member

@adriangbadriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Main issue is perf regression

@aezomz
Copy link
Author

I separated the test out, I also refactor the functions so we can reuse whensort_keys=true.
To keep the original perf benchmark, I have done a simple bool check on sort_keys before using expensive function like sorting.

Let me know what else I need to improve. Thanks

@zzstoatzzzzstoatzz mentioned this pull requestMar 10, 2025
@aezomz
Copy link
Author

please review, not sure how I can take it from here

pydantic-hooky[bot] reacted with thumbs up emoji

Copy link
Member

@adriangbadriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it aLiteral['recursive', 'top-level', 'unsorted'] or something like that.

aezomz reacted with eyes emoji
@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch from0cf4b6d to95f9329CompareMarch 23, 2025 14:44
@aezomz
Copy link
Author

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it aLiteral['recursive', 'top-level', 'unsorted'] or something like that.

Added different sort mode as above, updated the PR description.

@aezomz
Copy link
Author

please review 👍

pydantic-hooky[bot] reacted with thumbs up emoji

@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch from95f9329 toc83a212CompareMarch 28, 2025 17:28
@aezomz
Copy link
Author

please review 👍

pydantic-hooky[bot] reacted with thumbs up emoji

@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch fromc83a212 to0075a4bCompareMarch 31, 2025 17:30
@aezomz
Copy link
Author

please review 👍

There is this test that is failing, unclear if its related to my change.
build-wasm-emscripten

Let me know any other places i can optimize. Thanks!

pydantic-hooky[bot] reacted with thumbs up emoji

@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch fromdce2689 to5ce696cCompareApril 4, 2025 15:47
@aezomz
Copy link
Author

please review 👍

pydantic-hooky[bot] reacted with thumbs up emoji

@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch fromdfe7380 to1b8d824CompareApril 14, 2025 17:48
@aezomz
Copy link
Author

please review 👍

pydantic-hooky[bot] reacted with thumbs up emoji

1 similar comment
@aezomz
Copy link
Author

please review 👍

pydantic-hooky[bot] and choucavalier reacted with thumbs up emoji

@aezomzaezomz requested a review fromadriangbApril 23, 2025 15:15
@DouweMDouweM self-assigned thisApr 23, 2025
@DouweMDouweM self-requested a reviewApril 23, 2025 19:24
@aezomz
Copy link
Author

@DouweM please review :)

pydantic-hooky[bot] reacted with thumbs up emoji

@DouweMDouweMforce-pushed theallow_model_dump_sort_keys branch fromdc0ec59 to11501f6CompareApril 25, 2025 21:58
Copy link
Contributor

@DouweMDouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for your work on this@aezomz! I'm a Rust newbie as well, but I have a feeling we can significantly simplify this code by dropping the recursive dict sorting and moving this to the serializer for the dict type specifically. Can you give that a try please?

I also think it's worth seeing if we can reduce the duplication between theif sort_keys/else branches, but I know Rust typing may make that tricky...

Let me know if you get stuck, we can bring in some of our Rust experts!

Copy link
Contributor

@DouweMDouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.

I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.

@davidhewitt Would you mind having a Rusty eyed look for us? :)

if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? {
// Get potentially sorted value
if extra.sort_keys {
let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why do we still need tosort_dict_recursive here if we're already doing that in the dict serialization itself? (Same question for the 2 calls below)

Copy link
Author

@aezomzaezomzMay 16, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I tried ur recommendation and it doesn't work.

It will only work when converting to Python type (to_python). Since we wrote the serializing logic.
As forto_json we use the serde package if am not wrong.
They are separate into defined field processing and non defined fields processing. Cause we need to sort before using the serde package.

This is based on my high level understanding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am unconvinced we need the recursive sort here either.serde will serialize the content in the order we call.serialize_entry, I think we just need to make sure that everywhere we do that we're sorting first.

let value =
value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?;
let value = if extra.sort_keys {
let sorted_value = sort_dict_recursive(py, &value)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Could we not do a recursive sort here, but sortnew_dict specifically after we've built it here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it might be less efficient for sorting new_dict at the end? It would require an extra iteration over all items. The current approach sorts values recursively as they're being processed, which should be more optimal since we are already iterating and setting key values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

From a quick scan, I think the implementation here would have exponential blowup, because for a dict-of-dicts, when serializing the top-level dict we'll callsort_dict_recursive, and then again we'll call it for each dict when we start serializing them below.

);
map.serialize_entry(&key, &value_serialize)?;
if extra.sort_keys {
let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Same as above -- I'd like to get rid of recursive sorting and do this after we build themap here. But I'm not 100% sure the data types here allow that...

davidhewitt reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We haveserialize_pairs_python andserialize_pairs_json functions ininfer.rs which are probably good starting points for code re-use which might lead to a solution.

Fully agree the recursive sort is questionable.

aezomz reacted with thumbs up emoji
Copy link
Contributor

@DouweMDouweM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@aezomz Thanks, this looks a lot better already, but I have a feeling we can get rid of the recursive sorting entirely, if we do it at each level (model + dict) explicitly.

I'm running into the limits of my Rust skills though, and I'm not sure the data types involved actually allow us to do this.

@davidhewitt Would you mind having a Rusty eyed look for us? :)

..*extra
};

let filter = self.filter.key_filter(key, include, exclude).map_err(py_err_se_err)?;
Copy link
Author

@aezomzaezomzMay 16, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This check for include and exclude defined in the parameters. Those fields are definied in pydantic schema.

So this function handles the serialization or extra handling of those fields.

@@ -444,8 +546,19 @@ impl TypeSerializer for GeneralFieldsSerializer {
let filter = self.filter.key_filter(&key, include, exclude).map_err(py_err_se_err)?;
if let Some((next_include, next_exclude)) = filter {
let output_key = infer_json_key(&key, extra).map_err(py_err_se_err)?;
let s = SerializeInfer::new(&value, next_include.as_ref(), next_exclude.as_ref(), extra);
map.serialize_entry(&output_key, &s)?;
if extra.sort_keys {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

While this block is cater for extra fields (not defined in pydantic schema)
So we need to sort recursively for that as well.
Both of them doesn't seem to use dict.rs serializer . As the objective isto_json and using serde package directly...

https://github.com/pydantic/pydantic-core/pull/1637/files#diff-d32436e9ac9b3e5dfaf920749269f6cff3dae3b3e030561f9b9bf50447067450R540

@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch from66c7f53 to91e98f9CompareMay 16, 2025 13:44
@aezomz
Copy link
Author

@DouweM can you help to review again?
I didn't make much changes except for rebase and commit.
I have shared my comments above on the concerns.
Thanks!

Copy link
Contributor

@davidhewittdavidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sorry for the very slow review by me. Thank you for working on this.

I think this is a good starting point however there's a fair bit of work still needed to make this implementation both efficient and correct.

sorted_dict.set_item(k, sorted_v)?;
}
Ok(sorted_dict.into_any())
} else if let Ok(list) = value.downcast::<PyList>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why a branch for lists here?

let value =
value_serializer.to_python(&value, next_include.as_ref(), next_exclude.as_ref(), extra)?;
let value = if extra.sort_keys {
let sorted_value = sort_dict_recursive(py, &value)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

From a quick scan, I think the implementation here would have exponential blowup, because for a dict-of-dicts, when serializing the top-level dict we'll callsort_dict_recursive, and then again we'll call it for each dict when we start serializing them below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There's now a lot of code duplication in this file. It feels like what we really need is afor_all_fields function which takes a closure and calls it for each field to serialize in turn.

That function can take responsibility for, if sorting requested, collecting all fields into a vec and sorting them first. Otherwise can just do the serializing in a streaming fashion.

aezomz reacted with thumbs up emoji
if !exclude_default(value, &field_extra, serializer).map_err(py_err_se_err)? {
// Get potentially sorted value
if extra.sort_keys {
let sorted_dict = sort_dict_recursive(value.py(), value).map_err(py_err_se_err)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am unconvinced we need the recursive sort here either.serde will serialize the content in the order we call.serialize_entry, I think we just need to make sure that everywhere we do that we're sorting first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There are places in this file where we serialize dicts, those presumably need sorting added (and tests). Best way to test those code paths is by using theserialize_as_any=True runtime flag.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oic, i just realized that the fields.rs json serialization uses infer.rs ... thats a great hint. Managed to implement it. Will look at adding the tests... Thanks David.

);
map.serialize_entry(&key, &value_serialize)?;
if extra.sort_keys {
let sorted_dict = sort_dict_recursive(value.py(), &value).map_err(py_err_se_err)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We haveserialize_pairs_python andserialize_pairs_json functions ininfer.rs which are probably good starting points for code re-use which might lead to a solution.

Fully agree the recursive sort is questionable.

aezomz reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We need to add tests for computed fields; at the moment the implementation neither supports them nor tests them.

aezomz reacted with thumbs up emoji
@aezomzaezomzforce-pushed theallow_model_dump_sort_keys branch from2efb8ae toad37c91CompareJune 25, 2025 18:07
@aezomzaezomz requested a review fromdavidhewittJune 25, 2025 18:19
@aezomz
Copy link
Author

@davidhewitt please give it another review when u have the chance. thanks~

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@adriangbadriangbAwaiting requested review from adriangb

@DouweMDouweMAwaiting requested review from DouweM

@samuelcolvinsamuelcolvinAwaiting requested review from samuelcolvin

@davidhewittdavidhewittAwaiting requested review from davidhewitt

Requested changes must be addressed to merge this pull request.

Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Add sorting keys to model_dump_json
5 participants
@aezomz@DouweM@adriangb@davidhewitt@samuelcolvin

[8]ページ先頭

©2009-2025 Movatter.jp