Commit 991ee91

Merge branch 'main' into numpy==2.1

2 parents 30990a3 + e3ee591, commit 991ee91

File tree: 14 files changed, +160 -13 lines changed

‎.github/workflows/check_changelogs.yml‎

Lines changed: 1 addition & 1 deletion

```diff
@@ -12,7 +12,7 @@ jobs:
       - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
       - name: Install uv
-        uses: astral-sh/setup-uv@b75a909f75acd358c2196fb9a5f1299a9a8868a4 # v6.7.0
+        uses: astral-sh/setup-uv@2ddd2b9cb38ad8efd50337e8ab201519a34c9f24 # v7.1.1
       - name: Check changelog entries
         run: uv run --no-sync python ci/check_changelog_entries.py
```

‎.github/workflows/issue-metrics.yml‎

Lines changed: 1 addition & 1 deletion

```diff
@@ -35,7 +35,7 @@ jobs:
           SEARCH_QUERY: 'repo:zarr-developers/zarr-python is:issue created:${{ env.last_month }} -reason:"not planned"'
       - name: Create issue
-        uses: peter-evans/create-issue-from-file@v5
+        uses: peter-evans/create-issue-from-file@v6
         with:
           title: Monthly issue metrics report
           token: ${{ secrets.GITHUB_TOKEN }}
```

‎.pre-commit-config.yaml‎

Lines changed: 2 additions & 2 deletions

```diff
@@ -6,7 +6,7 @@ ci:
   default_stages: [pre-commit, pre-push]
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.13.1
+    rev: v0.13.3
     hooks:
       - id: ruff-check
         args: ["--fix", "--show-fixes"]
@@ -42,7 +42,7 @@ repos:
           - hypothesis
           - s3fs
   - repo: https://github.com/scientific-python/cookie
-    rev: 2025.05.02
+    rev: 2025.10.01
     hooks:
       - id: sp-repo-review
   - repo: https://github.com/pre-commit/pygrep-hooks
```

‎changes/3526.feature.md‎

Lines changed: 1 addition & 0 deletions

```diff
@@ -0,0 +1 @@
+Increased the default value of `async.concurrency` from 10 to 64 to improve parallelism and throughput for concurrent I/O operations. This change enables better performance out-of-the-box for most workloads. Users with specific resource constraints or when using many Dask threads may want to lower this value via the `ZARR_ASYNC_CONCURRENCY` environment variable or by setting `zarr.config.set({'async.concurrency': N})`.
```
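
A minimal sketch of checking and temporarily lowering this setting (the in-memory store and array are illustrative, not part of this commit; it assumes `zarr.config` exposes donfig-style `get` and a `set` usable as a context manager):

```python
import numpy as np
import zarr

# Inspect the effective default (64 after this change).
print(zarr.config.get("async.concurrency"))

# Temporarily lower the limit, e.g. for a resource-constrained workload;
# the override only applies inside the `with` block.
with zarr.config.set({"async.concurrency": 8}):
    store = zarr.storage.MemoryStore()
    z = zarr.create_array(store, shape=(100,), chunks=(10,), dtype="f8")
    z[:] = np.arange(100)
```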

‎changes/3532.misc.md‎

Lines changed: 2 additions & 0 deletions

```diff
@@ -0,0 +1,2 @@
+Accept `"bytes"` as an alias for `"variable_length_bytes"` when parsing `JSON`-encoded Zarr V3
+data types.
```
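
A minimal standalone sketch of the relaxed check (the helper and constant names here are hypothetical; the real change is in `src/zarr/core/dtype/npy/bytes.py`, shown later in this commit):

```python
# Hypothetical mirror of the membership test added in this commit.
CANONICAL_NAME = "variable_length_bytes"


def is_variable_length_bytes(data_type: str) -> bool:
    # Either spelling now identifies the variable-length bytes dtype in Zarr V3 metadata.
    return data_type in (CANONICAL_NAME, "bytes")


assert is_variable_length_bytes("bytes")
assert is_variable_length_bytes("variable_length_bytes")
```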

‎changes/3535.bugfix.md‎

Lines changed: 2 additions & 0 deletions

```diff
@@ -0,0 +1,2 @@
+Fixed a bug where the `"consolidated_metadata"` key was written to metadata documents even when
+consolidated metadata was not used, resulting in invalid metadata documents.
```
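
For context, a small sketch of how consolidated metadata is opted into (assuming the public `zarr.create_group`, `Group.create_array`, and `zarr.consolidate_metadata` APIs); after this fix, the plain group below no longer gains a spurious `consolidated_metadata` key:

```python
import zarr

store = zarr.storage.MemoryStore()

# A plain group: consolidated metadata was never requested, so its metadata
# document should not contain a "consolidated_metadata" key.
group = zarr.create_group(store)
group.create_array("x", shape=(10,), dtype="i4")

# Consolidation is a separate, explicit step when it is actually wanted.
zarr.consolidate_metadata(store)
```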

‎docs/release-notes.md‎

Lines changed: 1 addition & 1 deletion

```diff
@@ -9,7 +9,7 @@
 - Add a command-line interface to migrate v2 Zarr metadata to v3. Corresponding functions are also provided under zarr.metadata. ([#1798](https://github.com/zarr-developers/zarr-python/issues/1798))
 - Add obstore implementation of delete_dir. ([#3310](https://github.com/zarr-developers/zarr-python/issues/3310))
 - Adds a registry for chunk key encodings for extensibility. This allows users to implement a custom `ChunkKeyEncoding`, which can be registered via `register_chunk_key_encoding` or as an entry point under `zarr.chunk_key_encoding`. ([#3436](https://github.com/zarr-developers/zarr-python/issues/3436))
-- Trying to open a group at a path were a array already exists now raises a helpful error. ([#3444](https://github.com/zarr-developers/zarr-python/issues/3444))
+- Trying to open a group at a path where an array already exists now raises a helpful error. ([#3444](https://github.com/zarr-developers/zarr-python/issues/3444))
 
 ## Bugfixes
```

‎docs/user-guide/performance.md‎

Lines changed: 78 additions & 1 deletion

```diff
@@ -175,7 +175,84 @@ Coming soon.
 
 ## Parallel computing and synchronization
 
-Coming soon.
+Zarr is designed to support parallel computing and enables concurrent reads and writes to arrays. This section covers how to optimize Zarr's concurrency settings for different parallel computing scenarios.
+
+### Concurrent I/O operations
+
+Zarr uses asynchronous I/O internally to enable concurrent reads and writes across multiple chunks. The level of concurrency is controlled by the `async.concurrency` configuration setting, which determines the maximum number of concurrent I/O operations.
+
+The default value is 64, which provides good performance for most workloads. You can adjust this value based on your specific needs:
+
+```python
+import zarr
+
+# Set concurrency for the current session
+zarr.config.set({'async.concurrency': 128})
+
+# Or use environment variable
+# export ZARR_ASYNC_CONCURRENCY=128
+```
+
+Higher concurrency values can improve throughput when:
+- Working with remote storage (e.g., S3, GCS) where network latency is high
+- Reading/writing many small chunks in parallel
+- The storage backend can handle many concurrent requests
+
+Lower concurrency values may be beneficial when:
+- Working with local storage with limited I/O bandwidth
+- Memory is constrained (each concurrent operation requires buffer space)
+- Using Zarr within a parallel computing framework (see below)
+
+### Using Zarr with Dask
+
+[Dask](https://www.dask.org/) is a popular parallel computing library that works well with Zarr for processing large arrays. When using Zarr with Dask, it's important to consider the interaction between Dask's thread pool and Zarr's concurrency settings.
+
+**Important**: When using many Dask threads, you may need to reduce both Zarr's `async.concurrency` and `threading.max_workers` settings to avoid creating too many concurrent operations. The total number of concurrent I/O operations can be roughly estimated as:
+
+```
+total_concurrency ≈ dask_threads × zarr_async_concurrency
+```
+
+For example, if you're running Dask with 10 threads and Zarr's default concurrency of 64, you could potentially have up to 640 concurrent operations, which may overwhelm your storage system or cause memory issues.
+
+**Recommendation**: When using Dask with many threads, configure Zarr's concurrency settings:
+
+```python
+import zarr
+import dask.array as da
+
+# If using Dask with many threads (e.g., 8-16), reduce Zarr's concurrency settings
+zarr.config.set({
+    'async.concurrency': 4,  # Limit concurrent async operations
+    'threading.max_workers': 4,  # Limit Zarr's internal thread pool
+})
+
+# Open Zarr array
+z = zarr.open_array('data/large_array.zarr', mode='r')
+
+# Create Dask array from Zarr array
+arr = da.from_array(z, chunks=z.chunks)
+
+# Process with Dask
+result = arr.mean(axis=0).compute()
+```
+
+**Configuration guidelines for Dask workloads**:
+
+- `async.concurrency`: Controls the maximum number of concurrent async I/O operations. Start with a lower value (e.g., 4-8) when using many Dask threads.
+- `threading.max_workers`: Controls Zarr's internal thread pool size for blocking operations (defaults to CPU count). Reduce this to avoid thread contention with Dask's scheduler.
+
+You may need to experiment with different values to find the optimal balance for your workload. Monitor your system's resource usage and adjust these settings based on whether your storage system or CPU is the bottleneck.
+
+### Thread safety and process safety
+
+Zarr arrays are designed to be thread-safe for concurrent reads and writes from multiple threads within the same process. However, proper synchronization is required when writing to overlapping regions from multiple threads.
+
+For multi-process parallelism, Zarr provides safe concurrent writes as long as:
+- Different processes write to different chunks
+- The storage backend supports atomic writes (most do)
+
+When writing to the same chunks from multiple processes, you should use external synchronization mechanisms or ensure that writes are coordinated to avoid race conditions.
 
 ## Pickle support
 
```
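
As a concrete illustration of the process-safety guidance added above, here is a minimal sketch (not part of this commit) of chunk-aligned writes from multiple processes; the path, block size, and worker count are arbitrary, and it assumes a local-filesystem store where each worker can write its own chunks independently:

```python
from multiprocessing import Pool

import numpy as np
import zarr


def write_block(block_index: int) -> None:
    # Each worker opens the existing array independently and writes only the
    # chunks it owns, so no cross-process locking is required.
    z = zarr.open_array("data/parallel_demo.zarr", mode="r+")
    start = block_index * 1_000  # block boundaries match the chunk size below
    z[start : start + 1_000] = np.full(1_000, block_index, dtype="f8")


if __name__ == "__main__":
    # Create the array once, chunked so each worker owns whole chunks.
    zarr.open_array(
        "data/parallel_demo.zarr",
        mode="w",
        shape=(10_000,),
        chunks=(1_000,),
        dtype="f8",
    )
    with Pool(processes=4) as pool:
        pool.map(write_block, range(10))
```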

‎src/zarr/core/config.py‎

Lines changed: 1 addition & 1 deletion

```diff
@@ -107,7 +107,7 @@ def enable_gpu(self) -> ConfigSet:
                 "order": "C",
                 "write_empty_chunks": False,
             },
-            "async": {"concurrency": 10, "timeout": None},
+            "async": {"concurrency": 64, "timeout": None},
             "threading": {"max_workers": None},
             "json_indent": 2,
             "codec_pipeline": {
```

‎src/zarr/core/dtype/npy/bytes.py‎

Lines changed: 1 addition & 1 deletion

```diff
@@ -1046,7 +1046,7 @@ def _check_json_v3(cls, data: DTypeJSON) -> TypeGuard[Literal["variable_length_b
             True if the input is a valid representation of this class in Zarr V3, False otherwise.
         """
 
-        return data == cls._zarr_v3_name
+        return data in (cls._zarr_v3_name, "bytes")
 
     @classmethod
     def _from_json_v2(cls, data: DTypeJSON) -> Self:
```

0 commit comments