STAC metadata as multidimensional coordinates #230

Draft
gjoseph92 wants to merge 31 commits into main from multidimensional-coords

@gjoseph92 (Owner) commented Nov 30, 2023 (edited)

This refactors the STAC metadata -> xarray coordinates logic to support multi-dimensional coordinates. For example, imagine a field like `rescaling` (or `href`, for that matter), that's different for every asset of every item. Previously, we'd drop this field. Now, we can store it as a 2D array indexing the dimensions `time`, `band`.

This builds off of @Berhinj's work in #222, which provided a lot of inspiration for this design.

This is a significant rewrite of the metadata logic. The basic idea now is, for each field in STAC metadata:

  1. Make a 2D NumPy array (for the `time`, `band` dimensions)
  2. Iterate through all the STAC metadata, and write values into the array
  3. De-duplicate the array: if all values along a dimension are identical, drop that dimension. For example, the `type` field will be an MxN array of `image/tiff; application=geotiff` over and over; this is collapsed down into a 0D scalar.
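
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: the items are made-up, heavily simplified STAC-like dicts, and `field_to_coord` and `drop_constant_dims` are hypothetical helper names.

```python
import numpy as np

# Made-up, simplified STAC-like items: 2 items (time) x 2 assets (band).
GEOTIFF = "image/tiff; application=geotiff"
items = [
    {"assets": {
        "red": {"type": GEOTIFF, "href": "a/red.tif", "gsd": 10},
        "nir": {"type": GEOTIFF, "href": "a/nir.tif", "gsd": 20},
    }},
    {"assets": {
        "red": {"type": GEOTIFF, "href": "b/red.tif", "gsd": 10},
        "nir": {"type": GEOTIFF, "href": "b/nir.tif", "gsd": 20},
    }},
]
bands = ["red", "nir"]


def drop_constant_dims(arr, dims):
    """Step 3: drop every dimension along which all values are identical."""
    dims = list(dims)
    # Go from the last axis to the first so earlier axis numbers stay valid.
    for axis in reversed(range(len(dims))):
        first = np.take(arr, [0], axis=axis)  # length-1 slice, axis kept
        if (arr == first).all():
            arr = np.squeeze(first, axis=axis)
            del dims[axis]
    return arr, tuple(dims)


def field_to_coord(items, bands, field):
    """Return (values, dims) for one asset-level field."""
    # Step 1: a 2D object array over (time, band); None marks a missing value.
    arr = np.full((len(items), len(bands)), None, dtype=object)
    # Step 2: write each asset's value into its (time, band) cell.
    for i, item in enumerate(items):
        for j, band in enumerate(bands):
            arr[i, j] = item["assets"].get(band, {}).get(field)
    return drop_constant_dims(arr, ("time", "band"))


for field in ("type", "gsd", "href"):
    values, dims = field_to_coord(items, bands, field)
    print(field, dims, values.shape)
```

With this sample data, `type` collapses to a 0D value, `gsd` keeps only the `band` dimension, and `href` stays 2D over `time` and `band`.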

Unfortunately, it does come at a slight performance cost: `stackstac.stack` on 10,000 Landsat-8 items is 13.6% slower, going from 13.2s to 15s. (Of course, there's no performance difference with actually computing the results, which is where the bulk of time is spent anyway.)

There are a number of benefits, however:

  • Robust handling of JSON-y subfields. For example, the `roles` field, which contains variable-length lists of tags per band like `["data"]`, `["cloud", "cloud-shadow"]`, is now stored. Same with complex things like `classification:bitfields`. More interestingly, we could now easily retain `geometry`, `bbox`, etc. (xref: Store per-item bounding boxes as a coordinate #6)
  • Generally, should be more robust to new fields/extensions adopted in STAC, or even inserted by users (they may not be formatted ideally, but almost certainly won't be dropped)
  • `raster:bands` subfields are now in coordinates, and supporting extensions like `eo:bands`, `raster:bands`, etc. is now generalized and easy to extend in the future
  • Finally have tests for coordinates logic!
  • rescale logic could be simplified into a one-liner after the stack is built (stack * stack.coords['raster:bands_scale'] + stack.coords['raster:bands_offset']), rather than plumbing scale/offset factors through many layers. (TBD if this makes a nasty Dask graph though.)
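
The rescale bullet relies on per-band scale and offset coordinates broadcasting against the full stack. Below is a NumPy-only sketch of that arithmetic, with made-up shapes and values. In xarray the alignment would happen by dimension name, so the manual reshaping shown here would not be needed.

```python
import numpy as np

# Made-up stack with dims (time, band, y, x), holding raw digital numbers.
stack = np.full((2, 3, 4, 4), 1000.0)

# Hypothetical per-band coordinates: values that were constant over time
# and therefore collapsed to 1D along "band".
scale = np.array([0.0001, 0.0001, 0.01])
offset = np.array([-0.1, -0.1, 0.0])

# Reshape to (1, band, 1, 1) so both broadcast over time, y and x.
rescaled = stack * scale[None, :, None, None] + offset[None, :, None, None]

print(rescaled.shape)
```

If scale or offset also varied per item, the same expression would apply with 2D `(time, band)` arrays reshaped to `(time, band, 1, 1)`.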

There's maybe even a future where the logic to generate the Dask array takes these coordinates as input, rather than raw STAC dicts or pystac items? Perhaps if we can someday store the geometries as a GeoSeries, or something like that.

Closes #216

types are so messed up, idk what to do with this, or if it matters
- unnesting is slow. can optimize.
    - could probably move into inner loop, so we don't have to iterate over values twice
- deobjectifying is slowest.

makes surprisingly little difference. even with micro-optimization, this is just slow python. could maybe replace `isinstance` with `type is`; beyond that, idk.

this gets us to just 3x slower (was 4-5x before, with deobjectifying). string fields are now left as objects, mostly because we'd have to scan them to see if any values were missing before `astype('U')`, because otherwise missing values would just be the string `'None'`
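
The `astype('U')` concern can be reproduced in isolation. This is a small standalone illustration with made-up values, not code from this branch:

```python
import numpy as np

# Object array of strings with one missing value.
values = np.array(["05.09", None, "05.10"], dtype=object)

# Naive conversion: None goes through str(), so the gap turns into the
# literal string 'None' and can no longer be told apart from real data.
as_str = values.astype("U")

# Detecting gaps beforehand needs a full pass over the values.
has_missing = any(v is None for v in values)

print(as_str.tolist(), has_missing)
```

Leaving such fields as `object` arrays avoids both the extra scan and the ambiguous `'None'` strings.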

@Berhinj commented

@gjoseph92 what is preventing you from merging? Would a deep code review help? I tried going through it quickly, but it's too far in my head and there have been a lot of changes from my initial code, in a good way though.

@gjoseph92 (Owner, Author) commented

@Berhinj simply haven't gotten around to taking another look at it. I wanted to give it a few days before taking another pass at self-review. If you could try out this branch on your use case and confirm it helps, that would be very helpful to get a real-world test!

@Berhinj commented Dec 11, 2023 (edited)

@gjoseph92 just tried it and compared to the results I was getting from my PR, the multi-coordinates aspect is working well!

One issue though: I notice that the "earthsearch:boa_offset_applied" boolean array from Sentinel-2 became float...
Same for the "s2:processing_baseline" coords, which went from characters to float

@gjoseph92 (Owner, Author) commented

> I notice that the "earthsearch:boa_offset_applied" boolean array from Sentinel-2 became float

Good point. I should special-case booleans.

> @gjoseph92 just tried it and compared to the results I was getting from my PR, the multi-coordinates aspect is working well!
>
> Same for the "s2:processing_baseline" coords, which went from characters to float

I can't reproduce this. I'm getting `array('05.09', dtype=object)`, which is still a string. I'm just running through the basic example from the docs for this.

@gjoseph92 (Owner, Author) commented

@Berhinj booleans should be handled correctly now; I'm getting `earthsearch:boa_offset_applied` as a bool now. Do you have any other feedback here?
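
For illustration only, one way such boolean special-casing could look. `deobjectify` here is a hypothetical helper written for this sketch, and the actual logic in this branch may differ. The key point is that `bool` is a subclass of `int` in Python, so a generic numeric check would otherwise sweep booleans into the float branch.

```python
import numpy as np


def deobjectify(arr):
    """Hypothetical helper: pick a narrower dtype for an object array."""
    flat = arr.ravel().tolist()
    # Check for exact bools first; isinstance(True, int) is True, so a
    # numeric check alone would turn booleans into floats.
    if all(type(v) is bool for v in flat):
        return arr.astype(bool)
    if all(v is None or isinstance(v, (int, float)) for v in flat):
        # Missing values become NaN, which forces a float dtype.
        filled = [np.nan if v is None else v for v in flat]
        return np.array(filled, dtype=float).reshape(arr.shape)
    # Strings and JSON-like values stay as objects.
    return arr


flags = np.array([[True, False], [True, True]], dtype=object)
print(deobjectify(flags).dtype)
```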

@Berhinj commented

@gjoseph92 no, that's perfect, thanks

@Berhinj commented

any chance this will be merged soon? :)

Successfully merging this pull request may close these issues.

Multidimensional coordinnates is not supported
2 participants: @gjoseph92, @Berhinj
