WIP: Add memory efficient meta data summary #1030


Open
moloney wants to merge 9 commits into nipy:master from moloney:enh-metasum

Conversation

@moloney (Contributor) commented Jul 9, 2021 (edited)
This is work in progress on adding data structures that build a memory-efficient summary of a sequence of metadata dictionaries (assuming a large number of keys/values repeat), which is then used to determine how to sort the associated images into an nD array.

This approach was inspired by this dcmstack issue.
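For readers unfamiliar with the underlying idea, below is a minimal sketch of the kind of structure being proposed, assuming the common case where most tag values repeat across files. The class and method names are illustrative only and are not taken from this PR's nibabel/metasum.py.

```python
from collections import defaultdict


class MetaSummarySketch:
    """Illustrative sketch only: compactly store a sequence of metadata dicts.

    Each distinct value of each key is stored once; for every appended dict we
    record just a small integer index per key.  For simplicity this sketch
    assumes every dict has the same set of keys.
    """

    def __init__(self):
        self._n_inputs = 0
        self._values = defaultdict(list)   # key -> list of distinct values seen
        self._indices = defaultdict(list)  # key -> per-input index into _values[key]

    def append(self, meta):
        """Add one metadata dict to the summary."""
        for key, val in meta.items():
            vals = self._values[key]
            try:
                idx = vals.index(val)      # value already seen, reuse it
            except ValueError:
                idx = len(vals)
                vals.append(val)
            self._indices[key].append(idx)
        self._n_inputs += 1

    def get_meta(self, i):
        """Reconstruct the i-th input dict."""
        return {key: self._values[key][self._indices[key][i]]
                for key in self._values}

    def const_keys(self):
        """Keys that held a single value across all inputs."""
        return [k for k, v in self._values.items() if len(v) == 1]

    def varying_keys(self):
        """Keys whose values vary; candidates for sorting images into an nD array."""
        return [k for k, v in self._values.items() if len(v) > 1]
```

Because every distinct value is stored only once per key, memory grows with the number of distinct values rather than with the number of input dictionaries, and the varying keys fall out directly as the axes along which the images can be sorted.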

@pep8speaks commented Jul 9, 2021 (edited)
Hello @moloney, thank you for updating!

Line 568:101: E501 line too long (102 > 100 characters)

To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2021-07-13 03:30:41 UTC

@codecov (bot) commented Jul 9, 2021 (edited)

Codecov Report

Merging #1030 (bf8ecfc) into master (ea68c4e) will decrease coverage by 1.21%.
The diff coverage is 58.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1030      +/-   ##
==========================================
- Coverage   92.26%   91.04%   -1.22%
==========================================
  Files         100      101       +1
  Lines       12205    12668     +463
  Branches     2136     2267     +131
==========================================
+ Hits        11261    11534     +273
- Misses        616      781     +165
- Partials      328      353      +25
Impacted Files          Coverage Δ
nibabel/metasum.py      58.96% <58.96%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea68c4e...bf8ecfc. Read the comment docs.

moloney and others added 2 commits July 12, 2021 12:24
@effigies (Member)

Merged master in to resolve conflicts and get the tests going. Let me know if you'd prefer I didn't do that.

@ZviBaratz (Contributor)

@matthew-brett / @moloney / @effigies
If DICOM related functionality is moved to dicom_parser, do you still think the MetaSummary implementation will be required?
I feel like we could simply cache a dictionary of lazily evaluated header values within each Series instance. The higher-level Dataset class (to be implemented) can simply query those.

@moloney (Contributor, Author)

@ZviBaratz Can you explain in more detail what you have in mind? I don't see how a cache helps to solve the problem of determining what meta data is varying when someone hands us a list of Dicom files we have never seen before (that could come from multiple Dicom series).

@ZviBaratz (Contributor)

The idea is that there will be a Dataset class which will receive a root directory and iterate its files to create the representations for the contained series. When a user tries to query based on any particular header field, the dataset queries all the created Series instances' headers to retrieve the value (at which point it could be saved to a cache dictionary in order to avoid repeating computations). Of course some evaluation time is to be expected, but I don't think it should be anything too bad up to a few dozen series. If you're working with more than that, it might be best to export the metadata to some external table anyway.
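As a rough sketch of that caching idea, under the assumption of hypothetical Dataset and Series classes (not the actual dicom_parser API):

```python
from pathlib import Path

import pydicom


class Series:
    """Hypothetical stand-in for a dicom_parser Series."""

    def __init__(self, files):
        self.files = sorted(files)
        self._header_cache = {}            # keyword -> lazily read value

    def get_header_value(self, keyword):
        if keyword not in self._header_cache:
            # Read the header of the first file only and cache the result.
            ds = pydicom.dcmread(self.files[0], stop_before_pixels=True)
            self._header_cache[keyword] = getattr(ds, keyword, None)
        return self._header_cache[keyword]


class Dataset:
    """Hypothetical dataset built from one Series per subdirectory."""

    def __init__(self, root):
        self.series = [Series(d.glob("*.dcm"))
                       for d in Path(root).iterdir() if d.is_dir()]

    def query(self, keyword, value):
        """Return all series whose (cached) header value matches."""
        return [s for s in self.series
                if s.get_header_value(keyword) == value]
```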

@moloney (Contributor, Author) commented Aug 9, 2021 (edited)

We really don't want to require that all the files live in a single directory. The assumption is that you are passed a list of files, possibly massive even for a single series (e.g. 36K), that you have never seen before and you want to efficiently convert them into an xarray on the fly. My original implementation in dcmstack wasn't totally naive (metadata values that were constant were only stored once), and yet it required orders of magnitude more memory (18GB vs ~800MB with 36K files) compared to this approach.
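A toy illustration of that distinction, with made-up data rather than a benchmark of dcmstack or this PR: even when constant values are stored once, keeping one small dict per file for the varying tags still allocates tens of thousands of Python objects, whereas a summary only keeps the distinct values plus one integer index per file.

```python
# Toy illustration (hypothetical data, not a benchmark).
metas = [{"EchoTime": 30.0, "InstanceNumber": i % 100, "SeriesDescription": "fMRI"}
         for i in range(36_000)]

# "Constant stored once" layout: still one dict per file for the varying tags.
constant = {"EchoTime": 30.0, "SeriesDescription": "fMRI"}
per_file = [{"InstanceNumber": m["InstanceNumber"]} for m in metas]    # 36,000 dicts

# Summary layout: each distinct value stored once, one small index per file.
distinct = sorted({m["InstanceNumber"] for m in metas})                # 100 values
pos = {v: i for i, v in enumerate(distinct)}
indices = [pos[m["InstanceNumber"]] for m in metas]                    # 36,000 ints

print(f"{len(per_file)} dicts vs {len(distinct)} values + {len(indices)} indices")
```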

@ZviBaratz (Contributor)

I see.
I'll be working on the issues that are already piling up in dicom_parser for the next couple of weeks; after that I'll start thinking about how this would best be integrated into dicom_parser. We could discuss it in more detail in our next meeting.

@moloney (Contributor, Author)

If we want to support using multiprocessing to speed up the parsing of very large series, this would also provide a nice compact representation to pass around.
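For example, header parsing could be farmed out to worker processes and the results folded into one compact summary in the parent, which is then cheap to pass between processes. This sketch reuses the illustrative MetaSummarySketch class from earlier; the real PR may organize this quite differently.

```python
from multiprocessing import Pool

import pydicom


def read_meta(path):
    """Read one file's header into a plain dict (runs in a worker process)."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return {elem.keyword: str(elem.value) for elem in ds if elem.keyword}


def summarize(paths, n_procs=4):
    """Parse headers in parallel and fold them into one compact summary."""
    summary = MetaSummarySketch()   # illustrative class from the earlier sketch
    with Pool(n_procs) as pool:
        for meta in pool.imap(read_meta, paths, chunksize=64):
            summary.append(meta)
    return summary
```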

@effigies (Member)

Sorry, I lost track of this one. What's the status? Are we still trying to get this into nibabel?


Reviewers

@effigies left review comments

4 participants

@moloney @pep8speaks @effigies @ZviBaratz
