Inconsistent reading performance with multiple CPU threads #2184

Unanswered
FelipeMoser asked this question in Q&A

Zarr version

2.18.2

Numcodecs version

0.13.0

Python Version

3.12.4

Operating System

Linux

Installation

pip install zarr

Description

I've converted some ome.tiff files to .zarr and have had issues with the reading time of the zarr files.
For this example I'm using an image of shape [4, 16484, 11620], and I have stored it with chunk size (1, 1024, 1024) as well as unchunked.
All files are stored in a RAID0 nvme ssd and have the same compression.

I've compared the reading times using 1, 10, and 50 logical threads (with taskset) and noticed that performance can vary greatly depending on the settings. If unchunked, additional threads significantly improve reading time, just as when reading ome.tiffs. In fact, reading unchunked files is significantly faster than reading ome.tiffs. However, chunked files do not seem to benefit from additional threads, and sometimes even read more slowly. Reading with the dask library also shows inconsistent performance, although in a different way.

Additionally, considering the hardware (RAID0 NVMe SSD, dual 56-core Intel Xeon Platinum 8280 CPUs), I'd assume that reading chunked files with multiple workers would be much faster, since the chunks can be processed in parallel. But here we see that it not only fails to benefit from more workers, it's an order of magnitude slower than reading an unchunked file.
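One way to probe whether per-chunk decompression (rather than disk I/O) is the serial bottleneck here is to time chunk-sized decompression with and without a thread pool. A stdlib-only sketch, with `zlib` standing in for the array's actual compressor (the sizes and worker count are illustrative, not taken from the original setup):

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

# Compressible ~4 MiB payloads, standing in for 1024x1024 float32 chunks.
raw = bytes(range(256)) * (4 * 4096)  # 4 MiB of repeating bytes
chunks = [zlib.compress(raw) for _ in range(16)]

# Serial decompression, as a single-threaded reader would do it.
t0 = time.perf_counter()
serial = [zlib.decompress(c) for c in chunks]
t_serial = time.perf_counter() - t0

# Threaded decompression; zlib releases the GIL while decompressing,
# so a thread pool can overlap work on multiple chunks.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as ex:
    threaded = list(ex.map(zlib.decompress, chunks))
t_threaded = time.perf_counter() - t0

print(f"serial: {t_serial:.3f}s  threaded: {t_threaded:.3f}s")
```

If the threaded timing is not clearly better, that would point at the decode path (or the store) serializing somewhere above the compressor.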

Is there something I could be missing here?

Steps to reproduce

This is the code I'm using:

```python
import time

import dask.array
import tifffile
import zarr

start = time.time()
z = zarr.open(path_zarr)[:]
print(f"Zarr read time (chunked): {time.time() - start}")

start = time.time()
z_nochunk = zarr.open(path_zarr_nochunk)[:]
print(f"Zarr read time (no chunks): {time.time() - start}")

start = time.time()
d = dask.array.from_zarr(path_zarr).compute()
print(f"Dask read time (chunked): {time.time() - start}")

start = time.time()
d_nochunk = dask.array.from_zarr(path_zarr_nochunk).compute()
print(f"Dask read time (no chunks): {time.time() - start}")

start = time.time()
t = tifffile.imread(path_tiff)
print(f"Tiff read time: {time.time() - start}")
```

Results:

```
# 1 thread:
Zarr read time (chunked): 2.8317720890045166
Zarr read time (no chunks): 0.7048866748809814
Dask read time (chunked): 3.4919939041137695
Dask read time (no chunks): 0.7005000114440918
Tiff read time: 1.094351053237915

# 10 threads:
Zarr read time (chunked): 2.8606531620025635
Zarr read time (no chunks): 0.32688140869140625
Dask read time (chunked): 2.7447142601013184
Dask read time (no chunks): 0.712876558303833
Tiff read time: 0.4734377861022949

# 50 threads:
Zarr read time (chunked): 2.8490779399871826
Zarr read time (no chunks): 0.2691495418548584
Dask read time (chunked): 2.9153594970703125
Dask read time (no chunks): 0.716036319732666
Tiff read time: 0.4784407615661621
```

Additional output

No response


Replies: 1 comment 1 reply


Is there any update to this issue from the developers?

I'm encountering the same issue in my project.

@jhamman

As is, this post doesn't have enough information in it to be actionable.

  1. Can we get a reproducible example added to this post?
  2. I'd also be curious to hear if this is reproducible using zarr 3.
Category
Q&A
Labels
bug (Potential issues with the zarr-python library), V2 (Affects the v2 branch)
3 participants
@FelipeMoser @jhamman @ziyuanzhao2000
Converted from issue

This discussion was converted from issue #2084 on September 13, 2024 23:37.

