-
I'm looking for ways to improve the performance of my data-loading pipeline and I found Zarr. To get an idea about throughput, I wrote a small benchmark script in Python. To get a baseline I also ran tests using numpy memory-mapped arrays. I'm working with 4D arrays which are quite large. One of my criteria is that I need to access them as a key-value store. From each value, I access random indices along the first axis. I created some dummy arrays to test throughput. Here is my complete benchmarking code that compares Zarr to accessing raw Numpy arrays on disk:

```python
import numpy as np
from pathlib import Path
import time
from tqdm import tqdm
import os
import random
import zarr

def write_numpy_dataset(data_path, raw_data):
    Path(data_path).mkdir(parents=True, exist_ok=True)
    for sample_name, sample_data in tqdm(raw_data.items()):
        mmap = np.memmap(
            f"{data_path}/{sample_name}.npy",
            dtype=sample_data.dtype,
            mode='w+',
            shape=sample_data.shape
        )
        mmap[:] = sample_data[:]
        mmap.flush()

def write_zarr_dataset(data_path, raw_data):
    zarr_store = zarr.DirectoryStore(f'{data_path}/data.zarr')
    zarr_group = zarr.group(zarr_store, overwrite=True)
    for data_key, sample in tqdm(raw_data.items()):
        zarr_group.create_dataset(
            data_key,
            data=sample,
            chunks=[28, 28, 28],  # I also tried sample.shape[1:] and sample.shape here
            overwrite=True,
            compressor=None  # remove compression because I want as much throughput as possible
        )

def load_samples_numpy(data_path):
    numpy_files = [f"{data_path}/{f}" for f in os.listdir(data_path)]
    while True:
        for numpy_file in numpy_files:
            index = random.randint(0, 499)
            # open a new memory map for every access. This way I will never cache samples in RAM
            volume = np.memmap(
                numpy_file,
                mode='r',
                dtype=np.float32
            ).reshape([500, -1, 28, 28, 28])
            yield volume[index]

def load_samples_zarr(data_path):
    zarr_store = zarr.DirectoryStore(f"{data_path}/data.zarr")
    zarr_group = zarr.open_group(zarr_store, mode="r")
    data_keys = [s for s in zarr_group]
    while True:
        for data_key in data_keys:
            index = random.randint(0, 499)
            yield zarr_group[data_key][index]

def benchmark(dataset_path, load_fn):
    total_size = 0
    start_time = time.time()
    print("\nStarting benchmark")
    print(f"reading from: {dataset_path}")
    sample_sums = []
    for i, sample in tqdm(enumerate(load_fn(dataset_path))):
        if i == 2000:
            break
        total_size += sample.nbytes
        sample_sums.append(sample.sum())  # do some dummy calculation to force the data into RAM
    end_time = time.time()
    throughput = (total_size / (end_time - start_time)) / (1024 * 1024)
    print(f"Throughput: {throughput:.2f} MB/s")

if __name__ == '__main__':
    data_path_numpy = "/mnt/cache/dataset_numpy"
    data_path_zarr = "/mnt/cache/dataset_zarr"
    # the second axis of my data samples is not fixed
    raw_data = {
        'sample0': np.random.normal(size=(500, 14, 28, 28, 28)).astype(np.float32),
        'sample1': np.random.normal(size=(500, 16, 28, 28, 28)).astype(np.float32),
        'sample2': np.random.normal(size=(500, 12, 28, 28, 28)).astype(np.float32),
        'sample3': np.random.normal(size=(500, 11, 28, 28, 28)).astype(np.float32),
        'sample4': np.random.normal(size=(500, 13, 28, 28, 28)).astype(np.float32),
        'sample5': np.random.normal(size=(500, 19, 28, 28, 28)).astype(np.float32),
        'sample6': np.random.normal(size=(500, 20, 28, 28, 28)).astype(np.float32),
        'sample7': np.random.normal(size=(500, 15, 28, 28, 28)).astype(np.float32),
        'sample8': np.random.normal(size=(500, 18, 28, 28, 28)).astype(np.float32),
        'sample9': np.random.normal(size=(500, 17, 28, 28, 28)).astype(np.float32),
    }
    write_numpy_dataset(data_path_numpy, raw_data)
    benchmark(data_path_numpy, load_samples_numpy)
    write_zarr_dataset(data_path_zarr, raw_data)
    benchmark(data_path_zarr, load_samples_zarr)
```

It turns out that accessing the Numpy arrays outperforms Zarr by a factor of ~6-7. This difference is so big that I feel like I must be missing something. I tried different chunk sizes and disabled compression completely. Still, Zarr never reaches a throughput that comes even close to Numpy's memory-mapped arrays. Does anyone have a hint on what I'm missing? Is Zarr generally slow for my use case of accessing large 4D arrays?
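For what it's worth, one chunk layout that aligns with this access pattern is chunk size 1 along the first axis, so each random read touches exactly one chunk file. A minimal sketch using the same zarr 2.x API as the script above (path and sizes are illustrative, not what the benchmark used):

```python
import numpy as np
import zarr

# Illustrative standalone variant: chunk one sample per first-axis index so a
# random read of a single sample maps to a single chunk file on disk.
sample = np.random.normal(size=(500, 14, 28, 28, 28)).astype(np.float32)
store = zarr.DirectoryStore("/tmp/chunk_layout_test.zarr")  # placeholder path
group = zarr.group(store, overwrite=True)
arr = group.create_dataset(
    "sample0",
    data=sample,
    chunks=(1,) + sample.shape[1:],  # (1, 14, 28, 28, 28): one chunk per sample
    overwrite=True,
    compressor=None,
)
volume = arr[123]  # reads exactly one chunk file
```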
-
It's important to distinguish between zarr the format and zarr-python the library. We are trying to make indexing faster in zarr-python 3.
-
Thank you for this quick response. Indeed, performance is my main criterion here. Previously I've used the TFRecords format, which has worked great for me so far but only allows sequential read access to my data instead of (random) index-based access, which would be really beneficial for my use case. My datasets exceed 50 TB nowadays, and since I got some new hardware to play with, data loading has become my main bottleneck. I'm currently benchmarking on a Linux system with an SSD that gives me 500 MB/s read access, but later I want to read data from an NFS share that has much higher throughput but also adds latency when opening file handles. This is why I don't think I can afford to store each sample as a single file on disk, so I group samples with common shapes into chunks of 500, as you can see in the code example above. Still, I think I'm opening far too many file handles when storing my data like this. I thought about grouping even more samples into single files and accessing those with memory mapping. Can you think of any better approach? My main criteria are:
Other stuff like prefetching and batching is an afterthought until I've figured out how to max out my throughput in a single process. I'm willing to redo my whole data pipeline if necessary, and I'm happy for any hints. So far I've tried TFRecords, HDF5, DALI, memory-mapped Numpy arrays and Zarr. Thanks again for the response.
-
I think the tensorstore tutorial is a good place to start. You would probably not want to use the format in that example (n5); instead, you can use tensorstore to read and write zarr v2 and v3 arrays (to write zarr groups you can just use …).
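A minimal sketch of reading one sample from an existing zarr v2 array with tensorstore; the path and index are placeholders, and the spec follows tensorstore's standard driver/kvstore pattern:

```python
import tensorstore as ts

# Open the array written by the benchmark above (placeholder path).
# Use driver='zarr3' instead for zarr v3 arrays.
arr = ts.open({
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': '/mnt/cache/dataset_zarr/data.zarr/sample0'},
}).result()

# Random access along the first axis; .read().result() returns a numpy array.
sample = arr[42].read().result()
print(sample.shape)
```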
-
Do you also happen to have any guidance on how to set up a Linux system to improve reading speed through caching? I'm asking because my current benchmarks seem to improve when they run multiple times, and it looks like I'm hitting some filesystem caching mechanism on Linux, even though my files exceed my available RAM. Running some new benchmarks on my system multiple times shows this:

```
$ python benchmark.py
Bytes read: 20580.00 MB
Throughput: 530.77 MB/s
$ python benchmark.py
Bytes read: 20580.00 MB
Throughput: 775.01 MB/s
$ python benchmark.py
Bytes read: 20580.00 MB
Throughput: 996.57 MB/s
```

I've been reading this StackExchange post: but I'm not sure it applies to my NFS mount with filesystem type nfs4. File handle caching could also be helpful to me. Anyway, thanks a lot for the help! We can also close this discussion since it's moving away from Zarr-related questions.
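One generic way to take the Linux page cache out of the equation between runs (a sketch, not something suggested in the thread; whether it also clears the NFS client cache depends on the mount):

```python
import subprocess

# Flush dirty pages, then ask the kernel to drop the page cache, dentries and
# inodes (requires root). Run between benchmark iterations for cold-cache numbers.
subprocess.run(["sync"], check=True)
subprocess.run(["sudo", "sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"], check=True)
```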
-
Unfortunately I know basically nothing about tuning local file systems for throughput -- most of the data I work with is on cloud storage, where we have different challenges :) That being said, if you come up with a good story for how to get the most out of zarr on local disk(s), I'd be interested in reading about it (and it could potentially end up in our docs).
-
So what turned out to work really well for me is saving my data as HDF5 with LZF compression and an appropriate chunk size. I group a ton of samples into single files to avoid file handle overhead. This way I can read 13.4 Gbit/s of data on my test setup, even though my bandwidth would only allow for 10 Gbit/s of raw data (the compression makes up the difference). With Zarr I can't get similar performance, and it seems to me this is only because of the on-disk layout: HDF5 puts everything into single files, while Zarr stores chunks as separate files. Maybe this is too specific to my setup, but I think there is a use case for a storage option that minimizes the number of files on disk.
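For reference, a minimal h5py sketch of the layout described here (LZF compression, one chunk per sample, many samples per file); shapes and names are illustrative, not the poster's actual code:

```python
import h5py
import numpy as np

data = np.random.normal(size=(500, 14, 28, 28, 28)).astype(np.float32)

# Write one dataset per sample group; a single .h5 file can hold many of these.
with h5py.File("dataset.h5", "w") as f:
    f.create_dataset(
        "sample0",
        data=data,
        chunks=(1,) + data.shape[1:],  # one chunk per first-axis index
        compression="lzf",             # fast, lightweight compression
    )

# Random-access read of a single sample touches only one compressed chunk.
with h5py.File("dataset.h5", "r") as f:
    sample = f["sample0"][123]
```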
-
Zarr v3 introduces a storage feature (sharding) that can pack multiple "read chunks" inside a larger file, called a "shard", to address the problem of too many files. The upcoming …
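A rough sketch of what this can look like, assuming the `shards=` argument of `zarr.create_array` available in recent zarr-python 3 releases (path and sizes are illustrative):

```python
import numpy as np
import zarr

data = np.random.normal(size=(500, 14, 28, 28, 28)).astype(np.float32)

arr = zarr.create_array(
    store="data_sharded.zarr",       # placeholder local path
    shape=data.shape,
    dtype=data.dtype,
    chunks=(1, 14, 28, 28, 28),      # read chunk: one sample per chunk
    shards=(100, 14, 28, 28, 28),    # shard: 100 read chunks packed into one file
    compressors=None,
)
arr[:] = data

sample = arr[123]  # still reads only the bytes of one chunk inside its shard
```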
-
Sooo, 5 months later and I'm still here 😓 Since you mentioned you are using cloud storage for your zarr datasets, I wondered which kind of cloud technology you use. Is it as simple as some S3 buckets, or are you using something that is tuned for latency and throughput? Is there anything you can recommend for reading data from a network as fast as possible?
-
I tend to use S3, and reading from S3 or any other network-bound storage efficiently typically requires making a lot of concurrent requests.
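As a rough illustration of "a lot of concurrent requests" (not code from this reply), a thread pool issuing many small reads at once; the bucket URL and keys are placeholders and assume s3fs/fsspec is installed so zarr can resolve the s3:// path:

```python
import random
from concurrent.futures import ThreadPoolExecutor

import zarr

group = zarr.open_group("s3://my-bucket/data.zarr", mode="r")  # hypothetical bucket
arrays = {key: group[key] for key in group}  # open each array once
keys = list(arrays)

def fetch(_):
    # one small random read; each call becomes an independent request to S3
    key = random.choice(keys)
    index = random.randint(0, arrays[key].shape[0] - 1)
    return arrays[key][index]

# Keep many requests in flight; tune max_workers to the storage backend.
with ThreadPoolExecutor(max_workers=64) as pool:
    batch = list(pool.map(fetch, range(1024)))
```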
-
I see. Do you host your own S3, or do you use some publicly available cloud? What kind of throughput can you expect if you store huge datasets and access small chunks like 0.5 MB per sample? I have a storage system attached that could theoretically run S3. To feed my GPUs I'd need to reach something like 4 GB/s while accessing only small chunks (about 500 KB per sample, with a batch size of 1024). This means I would have to reach a throughput of roughly 8,000 samples per second. The complete dataset would be above 20 TB if uncompressed. I guess I would have to run thousands of requests in parallel? Just trying to avoid another dead end here, because I've wasted too much time on other approaches.
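A quick sanity check of that figure, using the numbers in this post (4 GB/s target, ~500 KB per sample):

```python
# required request rate = target throughput / per-sample read size
print((4 * 1024**3) / (500 * 1024))  # ≈ 8389 samples per second, i.e. roughly 8,000
```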
-
If you already have a fast hard drive on your system, it's very unlikely that you would ever get better performance by putting an additional S3 layer on top of it; that would only introduce overhead. We should return to your original problem and understand why your throughput is not as high as expected.
-
I have many of those datasets, and I already mount my storage as NFS because I don't have local hard drives of the required size on my system. Basically I'm killed by random access: I can read my dataset sequentially and reach my desired throughput, but it breaks down once I do random access into those datasets. HDF5 becomes really slow with random indexing. I also tried saving those datasets as raw numpy arrays on disk and memory-mapping them, but even if I access them with many processes/threads at once I can't reach more than 1 GB/s with random access. I can give more details on my previous approaches if you think there's still something to be gained. I'm totally open to the idea that I'm overlooking something obvious.
-
I just tried running the original example on my MacBook using Zarr 3.0.5. I made a few changes.
In this example, Zarr was about 4x slower than memmapped numpy. Memmapping is very efficient, so I'm not super surprised by this. Interestingly, when I ran the exact same code from a Jupyter notebook in VS Code, the timing was almost identical! I wonder if there is some clue here.
-
Alright, digging deeper again. The folder I'm using is located on an NFS share that is mounted on my Linux system and connected with 100 Gbit/s Ethernet.

```python
import random
import time

import numpy as np
from tqdm import tqdm

def load_samples_numpy(data_path, dataset_len):
    while True:
        index = random.randint(0, dataset_len - 1)
        volume = np.memmap(
            data_path,
            mode='r',
            dtype=np.float32,
            shape=(dataset_len, 10, 28, 28, 28)
        )
        yield volume[index]

def benchmark(dataset_path, dataset_len):
    total_size = 0
    start_time = time.time()
    print("\nStarting benchmark")
    print(f"reading from: {dataset_path}")
    sample_sums = []
    loader = load_samples_numpy(dataset_path, dataset_len)
    for i, sample in tqdm(enumerate(loader)):
        if i == 5000:
            break
        total_size += sample.nbytes
        sample_sums.append(sample.sum())
    end_time = time.time()
    throughput = (total_size / (end_time - start_time)) / (1024 * 1024)
    print(f"Throughput: {throughput:.2f} MB/s")

dataset_path_numpy = '...'
dataset_len = 100
benchmark(dataset_path_numpy, dataset_len)
```

```
5000it [00:02, 2106.75it/s]
Throughput: 1758.60 MB/s
```

Looks good so far; however, this dataset is much bigger than 100 samples.

```python
dataset_path_numpy = '...'
dataset_len = 1800000
benchmark(dataset_path_numpy, dataset_len)
```

```
5000it [00:31, 157.78it/s]
Throughput: 132.09 MB/s
```

Seems like using the actual file size is killing my performance.

```python
dataset_path_numpy = '...'
dataset_len = 10000
benchmark(dataset_path_numpy, dataset_len)
```

```
5000it [00:10, 470.81it/s]
Throughput: 393.96 MB/s
```

Still quite slow. From playing around, it seems like I should split my files into something like ~500 samples per file; otherwise I lose too much throughput. However, I'm actually using float16 instead of float32, which hurts my throughput even more (fewer bytes per read). Also, this is only a single dataset. I will need to use at least 15 of those, and I need to sample from them with a predefined probability. I'll easily reach 20,000 files if I only save 500 samples per file.
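One hypothetical workaround (not something tried in this post) is to avoid mapping the whole multi-terabyte file and instead seek to the sample's byte offset and read only that record. The per-sample shape below matches the benchmark above; the path is a placeholder:

```python
import numpy as np

SAMPLE_SHAPE = (10, 28, 28, 28)  # per-sample shape, as in the benchmark above
SAMPLE_BYTES = int(np.prod(SAMPLE_SHAPE)) * np.dtype(np.float32).itemsize

def read_sample(data_path, index):
    # Read exactly one record: seek to its offset and pull SAMPLE_BYTES bytes,
    # instead of memory-mapping the entire file.
    with open(data_path, 'rb') as f:
        f.seek(index * SAMPLE_BYTES)
        buf = f.read(SAMPLE_BYTES)
    return np.frombuffer(buf, dtype=np.float32).reshape(SAMPLE_SHAPE)
```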
-
I dug into this example a bit more with the goal of understanding the impact of memmap. I wrote a very naive flat binary storage layer which uses regular filesystem calls:

```python
import numpy as np
from pathlib import Path
import time
from tqdm import tqdm
import os
import random
import zarr
import tempfile
import zarr.codecs

def write_numpy_dataset(data_path, raw_data):
    Path(data_path).mkdir(parents=True, exist_ok=True)
    for sample_name, sample_data in tqdm(raw_data.items()):
        mmap = np.memmap(
            f"{data_path}/{sample_name}.npy",
            dtype=sample_data.dtype,
            mode='w+',
            shape=sample_data.shape
        )
        mmap[:] = sample_data[:]
        mmap.flush()

def write_raw_dataset(data_path, raw_data):
    for sample_name, sample_data in tqdm(raw_data.items()):
        Path(f"{data_path}/{sample_name}").mkdir(parents=True, exist_ok=True)
        for n in range(sample_data.shape[0]):
            raw = sample_data[n].tobytes()
            with open(f"{data_path}/{sample_name}/{n:02d}.raw", 'wb') as f:
                f.write(raw)

def write_zarr_dataset(data_path, raw_data):
    zarr_store = f'{data_path}/data.zarr'
    zarr_group = zarr.group(zarr_store, zarr_format=3, overwrite=True)
    for data_key, sample in tqdm(raw_data.items()):
        a = zarr_group.create_array(
            data_key,
            dtype=sample.dtype,
            shape=sample.shape,
            chunks=(1,) + tuple(sample.shape[1:]),
            compressors=None,
        )
        a[:] = sample

def load_samples_numpy(data_path):
    numpy_files = [f"{data_path}/{f}" for f in os.listdir(data_path)]
    while True:
        for numpy_file in numpy_files:
            index = random.randint(0, 99)
            # open a new memory map for every access. This way I will never cache samples in RAM
            volume = np.memmap(
                numpy_file,
                mode='r',
                dtype=np.float32
            ).reshape([100, -1, 28, 28, 28])
            yield volume[index]

def load_samples_raw(data_path):
    sample_names = os.listdir(data_path)
    while True:
        for sample_name in sample_names:
            index = random.randint(0, 99)
            fname = f"{data_path}/{sample_name}/{index:02d}.raw"
            with open(fname, 'rb') as f:
                data = np.frombuffer(f.read(), dtype=np.float32)
            data = data.reshape([-1, 28, 28, 28])
            yield data

def load_samples_zarr(data_path):
    zarr_store = f"{data_path}/data.zarr"
    zarr_group = zarr.open_group(zarr_store, mode="r")
    data_keys = [s for s in zarr_group]
    while True:
        for data_key in data_keys:
            index = random.randint(0, 99)
            yield zarr_group[data_key][index]

def benchmark(dataset_path, load_fn):
    total_size = 0
    start_time = time.time()
    print("\nStarting benchmark")
    print(f"reading from: {dataset_path}")
    sample_sums = []
    for i, sample in tqdm(enumerate(load_fn(dataset_path))):
        if i == 2000:
            break
        total_size += sample.nbytes
        # assert sample.shape == (1, 1, 28, 28, 28), f"Unexpected shape: {sample.shape}"
        sample_sums.append(sample.sum())  # do some dummy calculation to force the data into RAM
    end_time = time.time()
    throughput = (total_size / (end_time - start_time)) / (1024 * 1024)
    print(f"Throughput: {throughput:.2f} MB/s")

def main():
    data_path_numpy = tempfile.TemporaryDirectory()
    data_path_raw = tempfile.TemporaryDirectory()
    data_path_zarr = tempfile.TemporaryDirectory()
    raw_data = {
        'sample0': np.random.normal(size=(100, 14, 28, 28, 28)).astype(np.float32),
        'sample1': np.random.normal(size=(100, 16, 28, 28, 28)).astype(np.float32),
        'sample2': np.random.normal(size=(100, 12, 28, 28, 28)).astype(np.float32),
        'sample3': np.random.normal(size=(100, 11, 28, 28, 28)).astype(np.float32),
        'sample4': np.random.normal(size=(100, 13, 28, 28, 28)).astype(np.float32),
        'sample5': np.random.normal(size=(100, 19, 28, 28, 28)).astype(np.float32),
        'sample6': np.random.normal(size=(100, 20, 28, 28, 28)).astype(np.float32),
        'sample7': np.random.normal(size=(100, 15, 28, 28, 28)).astype(np.float32),
        'sample8': np.random.normal(size=(100, 18, 28, 28, 28)).astype(np.float32),
        'sample9': np.random.normal(size=(100, 17, 28, 28, 28)).astype(np.float32),
    }
    write_numpy_dataset(data_path_numpy.name, raw_data)
    write_raw_dataset(data_path_raw.name, raw_data)
    write_zarr_dataset(data_path_zarr.name, raw_data)
    benchmark(data_path_numpy.name, load_samples_numpy)
    benchmark(data_path_raw.name, load_samples_raw)
    benchmark(data_path_zarr.name, load_samples_zarr)

if __name__ == "__main__":
    main()
```

I feel that Zarr should be able to at least match the "Numpy raw" performance. I wonder if this is related to the unnecessary memory copies issue described in #2904.