Quickstart#
Welcome to the Zarr-Python Quickstart guide! This page will help you get up and running withthe Zarr library in Python to efficiently manage and analyze multi-dimensional arrays.
Zarr is a powerful library for storage of n-dimensional arrays, supporting chunking,compression, and various backends, making it a versatile choice for scientific andlarge-scale data.
Installation#
Zarr requires Python 3.11 or higher. You can install it viapip:
pipinstallzarr
orconda:
condainstall--channelconda-forgezarr
Creating an Array#
To get started, you can create a simple Zarr array:
>>>importzarr>>>importnumpyasnp>>>>>># Create a 2D Zarr array>>>z=zarr.create_array(...store="data/example-1.zarr",...shape=(100,100),...chunks=(10,10),...dtype="f4"...)>>>>>># Assign data to the array>>>z[:,:]=np.random.random((100,100))>>>z.infoType : ArrayZarr format : 3Data type : DataType.float32Shape : (100, 100)Chunk shape : (10, 10)Order : CRead-only : FalseStore type : LocalStoreCodecs : [{'endian': <Endian.little: 'little'>}, {'level': 0, 'checksum': False}]No. bytes : 40000 (39.1K)
Here, we created a 2D array of shape(100,100), chunked into blocks of(10,10), and filled it with random floating-point data. This array waswritten to aLocalStore in thedata/example-1.zarr directory.
Compression and Filters#
Zarr supports data compression and filters. For example, to use Blosc compression:
>>>z=zarr.create_array(..."data/example-3.zarr",...mode="w",shape=(100,100),...chunks=(10,10),dtype="f4",...compressors=zarr.codecs.BloscCodec(cname="zstd",clevel=3,shuffle=zarr.codecs.BloscShuffle.shuffle)...)>>>z[:,:]=np.random.random((100,100))>>>>>>z.infoType : ArrayZarr format : 3Data type : DataType.float32Shape : (100, 100)Chunk shape : (10, 10)Order : CRead-only : FalseStore type : LocalStoreCodecs : [{'endian': <Endian.little: 'little'>}, {'level': 0, 'checksum': False}]No. bytes : 40000 (39.1K)
This compresses the data using the Zstandard codec with shuffle enabled for better compression.
Hierarchical Groups#
Zarr allows you to create hierarchical groups, similar to directories:
>>># Create nested groups and add arrays>>>root=zarr.group("data/example-2.zarr")>>>foo=root.create_group(name="foo")>>>bar=root.create_array(...name="bar",shape=(100,10),chunks=(10,10),dtype="f4"...)>>>spam=foo.create_array(name="spam",shape=(10,),dtype="i4")>>>>>># Assign values>>>bar[:,:]=np.random.random((100,10))>>>spam[:]=np.arange(10)>>>>>># print the hierarchy>>>root.tree()/├── bar (100, 10) float32└── foo └── spam (10,) int32
This creates a group with two datasets:foo andbar.
Batch Hierarchy Creation#
Zarr provides tools for creating a collection of arrays and groups with a single function call.Suppose we want to copy existing groups and arrays into a new storage backend:
>>># Create nested groups and add arrays>>>root=zarr.group("data/example-3.zarr",attributes={'name':'root'})>>>foo=root.create_group(name="foo")>>>bar=root.create_array(...name="bar",shape=(100,10),chunks=(10,10),dtype="f4"...)>>>nodes={'':root.metadata}|{k:v.metadatafork,vinroot.members()}>>>print(nodes)>>>fromzarr.storageimportMemoryStore>>>new_nodes=dict(zarr.create_hierarchy(store=MemoryStore(),nodes=nodes))>>>new_root=new_nodes['']>>>assertnew_root.attrs==root.attrs
Note thatzarr.create_hierarchy() will only initialize arrays and groups – copying array data mustbe done in a separate step.
Persistent Storage#
Zarr supports persistent storage to disk or cloud-compatible backends. While examples aboveutilized azarr.storage.LocalStore, a number of other storage options are available.
Zarr integrates seamlessly with cloud object storage such as Amazon S3 and Google Cloud Storageusing external libraries likes3fs orgcsfs:
>>>imports3fs>>>>>>z=zarr.create_array("s3://example-bucket/foo",mode="w",shape=(100,100),chunks=(10,10),dtype="f4")>>>z[:,:]=np.random.random((100,100))
A single-file store can also be created using the thezarr.storage.ZipStore:
>>># Store the array in a ZIP file>>>store=zarr.storage.ZipStore("data/example-3.zip",mode='w')>>>>>>z=zarr.create_array(...store=store,...mode="w",...shape=(100,100),...chunks=(10,10),...dtype="f4"...)>>>>>># write to the array>>>z[:,:]=np.random.random((100,100))>>>>>># the ZipStore must be explicitly closed>>>store.close()
To open an existing array from a ZIP file:
>>># Open the ZipStore in read-only mode>>>store=zarr.storage.ZipStore("data/example-3.zip",read_only=True)>>>>>>z=zarr.open_array(store,mode='r')>>>>>># read the data as a NumPy Array>>>z[:]array([[0.66734236, 0.15667458, 0.98720884, ..., 0.36229587, 0.67443246, 0.34315267], [0.65787303, 0.9544212 , 0.4830079 , ..., 0.33097172, 0.60423803, 0.45621237], [0.27632037, 0.9947008 , 0.42434934, ..., 0.94860053, 0.6226942 , 0.6386924 ], ..., [0.12854576, 0.934397 , 0.19524333, ..., 0.11838563, 0.4967675 , 0.43074256], [0.82029045, 0.4671437 , 0.8090906 , ..., 0.7814118 , 0.42650765, 0.95929915], [0.4335856 , 0.7565437 , 0.7828931 , ..., 0.48119593, 0.66220033, 0.6652362 ]], shape=(100, 100), dtype=float32)
Read more about Zarr’s storage options in theUser Guide.
Next Steps#
Now that you’re familiar with the basics, explore the following resources:
