
Fix fill_value serialization issues #2802


Conversation

@moradology (Contributor) commented Feb 5, 2025 (edited)

The current serialization of fill_value in ArrayV2Metadata does not fully conform to the spec, particularly for:

  • NaN and Infinity values, which must be serialized as strings ("NaN", "Infinity", "-Infinity").
  • Complex numbers (np.complex64, np.complex128), which must be stored as two-element arrays [real, imag], with each part following the NaN/Inf rules above.
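Those two rules can be sketched as a small standalone helper (a hypothetical `serialize_fill_value`; the PR's actual logic lives in `_sanitize_fill_value()`):

```python
import math

def serialize_fill_value(v):
    """Sketch of the zarr v2 fill_value JSON rules: non-finite floats become
    strings, complex numbers become [real, imag] with the same rules per part."""
    if isinstance(v, complex):
        return [serialize_fill_value(v.real), serialize_fill_value(v.imag)]
    if isinstance(v, float):
        if math.isnan(v):
            return "NaN"
        if v == math.inf:
            return "Infinity"
        if v == -math.inf:
            return "-Infinity"
    return v

serialize_fill_value(float("nan"))                # "NaN"
serialize_fill_value(complex(1.0, float("inf")))  # [1.0, "Infinity"]
```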

Changes

  • Updated _sanitize_fill_value() to enforce correct JSON serialization.
  • Fixed test_v2meta_fill_value_serialization() to compare expected and actual JSON using a normalized representation.
  • Introduced property-based testing with Hypothesis to generate valid input cases and verify compliance.
  • Enforced compliance (to some degree) by setting allow_nan to False in json.dumps.
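The allow_nan=False guard works because Python's json module otherwise emits non-standard tokens for non-finite floats; a quick demonstration:

```python
import json

# By default, json.dumps emits non-standard tokens for non-finite floats:
assert json.dumps(float("nan")) == "NaN"  # not valid JSON

# With allow_nan=False it raises instead, forcing callers to pre-convert
# to the spec's string forms ("NaN", "Infinity", "-Infinity"):
try:
    json.dumps(float("nan"), allow_nan=False)
    raised = False
except ValueError:
    raised = True
assert raised

assert json.dumps("NaN", allow_nan=False) == '"NaN"'  # the string form is fine
```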

Resolves: #2741

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions bot added the needs release notes label (automatically applied to PRs which haven't added release notes) Feb 5, 2025
@moradology (Contributor, Author)

The bad news is that this issue is going to be slightly more involved than I've described here. The good news is that the property-based tests caught some edge cases.

@dcherian (Contributor)

Nice, I've been meaning to add this to Zarr:

```python
@st.composite
def v3_array_metadata(draw: st.DrawFn) -> bytes:
    from zarr.codecs.bytes import BytesCodec
    from zarr.core.chunk_grids import RegularChunkGrid
    from zarr.core.chunk_key_encodings import DefaultChunkKeyEncoding
    from zarr.core.metadata.v3 import ArrayV3Metadata

    # separator = draw(st.sampled_from(['/', '\\']))
    shape = draw(array_shapes)
    ndim = len(shape)
    chunk_shape = draw(npst.array_shapes(min_dims=ndim, max_dims=ndim))
    dtype = draw(zrst.v3_dtypes())
    fill_value = draw(npst.from_dtype(dtype))
    dimension_names = draw(
        st.none() | st.lists(st.none() | simple_text, min_size=ndim, max_size=ndim)
    )
    metadata = ArrayV3Metadata(
        shape=shape,
        data_type=dtype,
        chunk_grid=RegularChunkGrid(chunk_shape=chunk_shape),
        fill_value=fill_value,
        attributes=draw(simple_attrs),
        dimension_names=dimension_names,
        chunk_key_encoding=DefaultChunkKeyEncoding(separator="/"),  # FIXME
        codecs=[BytesCodec()],
        storage_transformers=(),
    )
    return metadata.to_buffer_dict(prototype=default_buffer_prototype())["zarr.json"]
```

What do you think of an array_metadata_json(zarr_formats...) strategy that just returns the JSON, so we can test whether it satisfies the spec for v2 and v3?


@moradology (Contributor, Author) commented Feb 5, 2025 (edited)

I love the idea. I was thinking the other day that the obvious path out of the bugs currently popping up would be property-based testing, so I was pretty pleased to see that there's already some work in that direction.

Out of curiosity, do we have something like JSON Schema that we could apply against the outputs to at least verify structure? We'd still need to define all the rules that exist in terms of value/type dependencies, etc., but that's an easy win if it exists somewhere.

@dcherian (Contributor)

> Out of curiosity, do we have something like JSON Schema that we could apply against the outputs to at least verify structure?

Don't know. ping @jhamman @d-v-b

@d-v-b (Contributor)

I'm not aware of a JSON Schema definition for the array metadata. If one existed, it would necessarily only support partial validation, because JSON Schema can't express certain invariants in the metadata document, like the requirement that dimensional attributes (shape, chunk_shape, etc.) be consistent.
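For instance, the consistency requirement between shape and chunks is a cross-field invariant that a schema alone can't state, but a few lines of Python can (a sketch, not zarr-python code):

```python
def check_dimensional_consistency(meta: dict) -> None:
    """Check a cross-field invariant JSON Schema can't express:
    shape and chunks must have the same number of dimensions."""
    shape, chunks = meta["shape"], meta["chunks"]
    if len(shape) != len(chunks):
        raise ValueError(
            f"shape has {len(shape)} dims but chunks has {len(chunks)}"
        )

check_dimensional_consistency({"shape": [100, 100], "chunks": [10, 10]})  # ok
```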


@dcherian (Contributor)

A fairly easy alternative would be to simply write a test that takes the arrays strategy, extracts the metadata, converts it to JSON, and then asserts that the JSON meets the spec (as best we can).

I still think a generic metadata strategy is probably useful.

@moradology (Contributor, Author)

Yeah, you'd definitely need json-schema and custom validation rules that encode relationships among different fields.

@moradology (Contributor, Author)

Did some refactoring and organization of the property-based testing code and added round-trip testing for ArrayV2Metadata (which hopefully captures some important expectations about these things). Still need to fully satisfy the linter and make sure similar guarantees are tested for ArrayV3Metadata.

@dcherian (Contributor)

Hey @moradology I added an array_metadata strategy in

would you mind merging your changes in with that strategy please?


@moradology (Contributor, Author) commented Feb 18, 2025 (edited)

Definitely. Do you mind my pulling the generators up into their own submodule, as done here, given their related but distinct functionality?

@dcherian (Contributor)

I think testing/strategies.py is the right place. Everything in there already handles the v2 vs v3 complexity.

@moradology (Contributor, Author) commented Feb 18, 2025 (edited)

@dcherian Maybe I got out ahead of myself. Do you think the small amount of extra organization I added is undesirable? Basically, I broke strategies.py out into strategies/array_metadata_generators.py, strategies/array_generators.py, and strategies/dtype_generators.py, while using strategies/__init__.py to keep imports the same.

My thought was just that continuing to add property-based tests is probably a good idea, and a single strategies.py file will likely get a bit long.

@dcherian (Contributor) commented Feb 18, 2025 (edited)

It feels a bit premature to me. In any case, I find it nice to keep such refactoring PRs separate so that we can review (much) smaller diffs. In my experience, keeping the diffs small is the best way to keep open-source contributions easy to merge.

How about just keeping them in strategies for now? We can refactor later as needed.


@moradology (Contributor, Author)

Sounds like a plan. As a relatively new contributor to this library, I appreciate the firm opinions!


@dcherian (Contributor)

Thank you for considering the opinions in a constructive manner!


@moradology force-pushed the fix/v2meta_fill_serialization branch 2 times, most recently from 9cd0ccd to 251253e on February 18, 2025 19:37
@moradology marked this pull request as ready for review February 18, 2025 19:37
@github-actions bot removed the needs release notes label Feb 18, 2025
@moradology (Contributor, Author)

OK, so to get the property-based tests working I had to make some decisions about what the serialization strategies looked like. I tried to follow the instructions available here, but other issues that have surfaced in the history of zarr-python suggest there are likely blind spots in this document. This part, in particular, deserves special attention.

@moradology (Contributor, Author)

A bit confused about the code coverage going down. If anything, the tested constraints are definitely a bit tighter than they were.



```python
@given(npst.from_dtype(dtype=np.dtype("float64"), allow_nan=True, allow_infinity=True))
def test_v2meta_nan_and_infinity(fill_value):
```
Contributor:

This could look more like https://github.com/zarr-developers/zarr-python/pull/2847/files#diff-d318cba7c9e4a6983338cf21df1db66aab796137a2fb4a76ce48c0afa17de2f9

We could abstract out an assert_valid_v2_json_dict and do similarly for v3.
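A sketch of what such an assert_valid_v2_json_dict helper might check (assumed field names from the v2 metadata document; not the actual implementation):

```python
import math

def assert_valid_v2_json_dict(d: dict) -> None:
    """Sketch of a v2 metadata JSON validator; not exhaustive."""
    assert d.get("zarr_format") == 2
    assert isinstance(d["shape"], list) and isinstance(d["chunks"], list)
    assert len(d["shape"]) == len(d["chunks"])
    fv = d.get("fill_value")
    if isinstance(fv, float):
        # Non-finite floats must already be the spec's string forms.
        assert math.isfinite(fv), "NaN/Inf must be serialized as strings"

assert_valid_v2_json_dict(
    {"zarr_format": 2, "shape": [4], "chunks": [2], "fill_value": "NaN"}
)
```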

@moradology (Contributor, Author) commented Feb 19, 2025 (edited)

Makes sense to me - would you like me to wait for that branch to go in and then add these validations to the broader test you have in mind?

Contributor:

Let's just copy the test over and you can expand them here. I can handle the merge conflicts. It may very well be that this one goes in first :)

Contributor (Author):

🫡

@moradology (Contributor, Author) commented Feb 19, 2025 (edited)

OK, so I'm curious about something. See this line: https://github.com/zarr-developers/zarr-python/blob/main/src/zarr/core/metadata/v3.py#L413C1-L414C1
Is that type correct? If so, it seems that numpy types should be serialized after to_dict. As of now, they're not (this PR changes that for v2 but not yet for v3). So what's the desired behavior?

Contributor:

Don't know. ping @d-v-b

Contributor:

No, that type is wrong and it bugs me! For v3 metadata, to_dict returns an instance of DataType for the data_type key, and I think we registered that type with a custom JSON encoder.
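Registering a type with a custom JSON encoder typically looks like the following sketch (the DataType stand-in and MetadataEncoder are illustrative, not zarr-python's actual classes):

```python
import enum
import json

class DataType(enum.Enum):          # stand-in for zarr's DataType
    float64 = "float64"
    int32 = "int32"

class MetadataEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, DataType):
            return o.value          # serialize the enum as its string value
        return super().default(o)

json.dumps({"data_type": DataType.float64}, cls=MetadataEncoder)
# '{"data_type": "float64"}'
```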

Contributor:

I don't remember why this decision was made, but we should definitely fix it. FWIW, dict[str, JSON] is also sub-optimal: the keys of the metadata document are (almost) entirely static, so we could use TypedDicts here.
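A TypedDict along those lines might look like this (a sketch with an abbreviated, assumed field set; the class name is illustrative):

```python
from typing import TypedDict, Union

JSONScalar = Union[str, int, float, bool, None]

class ArrayV2MetadataDict(TypedDict):
    """Sketch of a static type for the v2 metadata document.
    Keys are fixed, so a TypedDict is tighter than dict[str, JSON]."""
    zarr_format: int
    shape: list
    chunks: list
    dtype: str
    fill_value: Union[JSONScalar, list]

meta: ArrayV2MetadataDict = {
    "zarr_format": 2,
    "shape": [10],
    "chunks": [5],
    "dtype": "<f8",
    "fill_value": "NaN",
}
```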

@moradology (Contributor, Author) commented Feb 20, 2025 (edited)

OK, so the immediately obvious thing is that the serialization logic I've added to to_dict needs to be pushed down into to_buffer_dict, and to_dict should retain its (potentially) not-directly-serializable Python types. I assumed serialization should happen in to_dict because of JSON in the type signature, but also because there's some serialization happening in that function already: https://github.com/zarr-developers/zarr-python/blob/main/src/zarr/core/metadata/v2.py#L199-L204. I will address that, too.

From there, a TypedDict can be implemented for the to_dict output.

@moradology (Contributor, Author) commented Feb 24, 2025 (edited)

Alright, after a fair bit of back and forth I've gotten to a better (but imperfect) place in terms of serialization/deserialization. A couple of high points:

  1. Fill value serialization happens in exactly one function, inside the to_buffer_dict method rather than to_dict: https://github.com/zarr-developers/zarr-python/pull/2802/files#diff-535b8bc19e7bf0cd5b07eb0d481ba8676f21b41cd352ddcbfc92c4667f124804R112-R133
  2. Deserialization/parsing happens entirely inside one function rather than in a few different places. Hopefully this makes things a bit easier to follow: https://github.com/zarr-developers/zarr-python/pull/2802/files#diff-535b8bc19e7bf0cd5b07eb0d481ba8676f21b41cd352ddcbfc92c4667f124804R318-R392

I also played around with removing the dict[str, JSON] type signature from the to_dict/from_dict methods, but upon further inspection those seem to be expected/baked in for other Metadata subclasses, in particular ChunkKeyEncoding, GroupMetadata, and ChunkGrid. It seems to me that really disentangling this type signature is out of scope here and probably deserves a separate PR.
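A minimal illustration of why round-tripping non-finite fill values needs care (hypothetical serialize/parse helpers, not the PR's code): NaN compares unequal to itself, so the round-trip property can't be plain equality:

```python
import json
import math

def serialize(v):
    # Toy version of the v2 rule: NaN becomes the string "NaN".
    if isinstance(v, float) and math.isnan(v):
        return "NaN"
    return v

def parse(v):
    # Inverse of serialize for the NaN case.
    if v == "NaN":
        return float("nan")
    return v

original = float("nan")
restored = parse(json.loads(json.dumps(serialize(original), allow_nan=False)))
# NaN != NaN, so the round-trip check must use isnan rather than ==:
assert math.isnan(restored)
```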

@moradology force-pushed the fix/v2meta_fill_serialization branch from 404213a to 935ac71 on February 19, 2025 15:26
@moradology force-pushed the fix/v2meta_fill_serialization branch from 807470f to 6301b15 on February 19, 2025 17:41
@moradology changed the title from "Fix fill_value serialization of NaN; add property-based tests" to "Fix fill_value serialization of NaN" Feb 19, 2025
@moradology changed the title from "Fix fill_value serialization of NaN" to "Fix fill_value serialization issues" Feb 19, 2025
@moradology (Contributor, Author)

@d-v-b or @dcherian - thoughts on this set of changes? No pressure, I just don't want it to get too stale if possible. The PR is now non-trivial, but I think it does a decent job of organizing and defining serialization/deserialization behaviors that were previously hard to see at a glance or else not well defined. It also verifies round-tripping with property-based tests, which is how certain undefined behaviors were uncovered (e.g. 'NaT' for datetime instances).

```python
        f"fill_value {fill_value!r} is not valid for dtype {dtype}; must be a unicode string"
    )
elif dtype.kind in "SV" and isinstance(fill_value, str):
    fill_value = base64.standard_b64decode(fill_value)
```
Contributor:

Out of scope, but does anyone know why we base64-encode scalars with dtype S? S is ASCII-encoded strings, which shouldn't need any encoding/decoding.
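One plausible reason: an S-dtype scalar is a Python bytes object, and bytes is not JSON-serializable, so some string encoding is needed even for pure-ASCII payloads (a sketch, not zarr-python's rationale):

```python
import base64
import json

raw = b"abc"          # an S3 scalar is bytes, not str
try:
    json.dumps(raw)   # bytes can't go into JSON directly
    serializable = True
except TypeError:
    serializable = False
assert not serializable

encoded = base64.standard_b64encode(raw).decode("ascii")
assert encoded == "YWJj"
assert base64.standard_b64decode(encoded) == raw  # lossless round trip
```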

@d-v-b (Contributor)

This looks good to me; I'm going to merge soon unless there are objections. Good to keep things moving.


@dstansby (Contributor) left a comment

This looks great overall. I think there are some improvements that could be made in terms of code structuring/readability; I left some comments inline.

There are also lots of lines that codecov is claiming aren't covered here - are these real, or is codecov not working?

@moradology (Contributor, Author) commented Mar 5, 2025 (edited)

Glad I got the pushback to go through things again, because the property-based tests came across an error related to floating-point precision at really big numbers that (in extremely rare cases of integers near sys.maxsize) was causing problems. Also trimmed some redundancy.
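The underlying issue is easy to demonstrate: doubles carry 53 bits of mantissa, so integers near sys.maxsize (2**63 - 1) are not exactly representable as floats:

```python
import sys

n = sys.maxsize                  # 2**63 - 1
assert float(n) != n             # rounds up to 2.0**63
assert int(float(n)) == n + 1    # round-tripping through float changes the value
```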

@moradology force-pushed the fix/v2meta_fill_serialization branch from 428ddb3 to 58b4070 on March 5, 2025 20:22
@moradology force-pushed the fix/v2meta_fill_serialization branch from 58b4070 to 1388a3b on March 5, 2025 20:26
@moradology (Contributor, Author) commented Mar 5, 2025 (edited)

Well, this timeout error is pretty annoying! Can't reliably reproduce it locally, but I don't really like fiddling with the timeout settings either. Oh well. I think it should be fine to ignore timeout settings for test_basic_indexing.

@moradology (Contributor, Author)

@dstansby Bumping this again just to make sure I don't lose track of it. If you all need some help doing PR reviews, etc. (obviously there's a lot going on), I'd be happy to lend a hand on smaller/less controversial PRs!

@dstansby (Contributor) left a comment

I spotted a wild print() statement that should probably be removed 😄 - otherwise looks good to me now, thanks for making the changes!

@dcherian enabled auto-merge (squash) April 4, 2025 14:51
@dcherian merged commit 3b6565b into zarr-developers:main Apr 4, 2025
29 of 30 checks passed
d-v-b pushed a commit to d-v-b/zarr-python that referenced this pull request Apr 20, 2025:

* Fix fill_value serialization of NaN
* Round trip serialization for array metadata v2/v3
* Unify metadata v2 fill value parsing
* Test structured fill_value parsing
* Remove redundancies, fix integral handling
* Reorganize structured fill parsing
* Bump up hypothesis deadline
* Remove hypothesis deadline
* Update tests/test_v2.py

Co-authored-by: Deepak Cherian <deepak@cherian.net>
Co-authored-by: David Stansby <dstansby@gmail.com>

Reviewers

@d-v-b approved these changes
@dcherian awaiting requested review
@dstansby approved these changes


Development

Successfully merging this pull request may close these issues.

Broken NaN encoding when writing v2 storage format from v3 library

4 participants: @moradology, @dcherian, @d-v-b, @dstansby
