CINPLA/exdir

Directory structure standard for experimental pipelines.

License

MIT license

Important: The reference implementation contained in this repository is intended for feedback and as a basis for future library implementations. It is not ready for production use.

Experimental Directory Structure

Experimental Directory Structure (exdir) is a proposed, open specification for experimental pipelines. Exdir is currently a prototype published to invite researchers to give feedback on the standard.

Exdir is a hierarchical format based on open standards. It is inspired by already existing formats, such as HDF5 and NumPy, and attempts to solve some of the problems associated with these while retaining their benefits. The development of exdir owes a great deal to the efforts of others to standardize data formats in science in general and neuroscience in particular, among them the Klusta Kwik Team and Neurodata Without Borders.

Installation

Exdir can be installed with Anaconda:

conda install exdir -c cinpla -c conda-forge

Usage

The following code creates an Exdir directory with a group and a dataset:

import numpy as np
import exdir
experiment = exdir.File("experiment.exdir")
group = experiment.create_group("group")
data = np.arange(10)
dataset = group.create_dataset("dataset", data=data)

The data can be retrieved using the same keys:

group = experiment["group"]
dataset = group["dataset"]
print(dataset)

Attributes can be added to all objects, including files, groups and datasets:

group.attrs["room_number"] = 1234
dataset.attrs["recording_date"] = "2018-02-04"

See the documentation for more information.

Benchmarks

See benchmarks.ipynb.

A live version can be explored using Binder.

Quick introduction

Exdir is not a file format in itself, but rather a standardized folder structure. The abstract data model is almost equivalent to that of HDF5, with groups, datasets, and attributes. This was done to simplify the transition from either format. However, data in Exdir is not stored in a single file, but rather multiple files within the hierarchy. The metadata is stored in a restricted version of the YAML 1.2 format and the binary data in the NumPy 2.0 format.
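Because the binary data uses NumPy's documented .npy layout, the shape and type of a dataset can be inspected without exdir, and even without NumPy. The following is a minimal sketch using only the Python standard library. The helper name read_npy_header is hypothetical and not part of the exdir API, and the sketch assumes the file follows version 1.0, 2.0 or 3.0 of the .npy format.

```python
import ast
import struct
from pathlib import Path

NPY_MAGIC = b"\x93NUMPY"


def read_npy_header(path):
    """Return ((major, minor), header_dict) for a .npy file.

    Hypothetical helper for illustration; not part of the exdir API.
    """
    with Path(path).open("rb") as f:
        if f.read(6) != NPY_MAGIC:
            raise ValueError(f"{path} is not a .npy file")
        major, minor = f.read(2)
        # Version 1.0 stores the header length in 2 bytes, later versions in 4.
        if major == 1:
            (header_len,) = struct.unpack("<H", f.read(2))
        else:
            (header_len,) = struct.unpack("<I", f.read(4))
        # Version 3.0 allows UTF-8 in the header; earlier versions use latin-1.
        encoding = "utf-8" if major >= 3 else "latin1"
        header = f.read(header_len).decode(encoding)
    # The header is a Python dict literal with the keys
    # 'descr', 'fortran_order' and 'shape'.
    return (major, minor), ast.literal_eval(header.strip())
```

For a dataset folder laid out as described here, a call such as read_npy_header("example.exdir/dataset1/data.npy") would return the format version together with the stored dtype description and shape, without loading the array itself.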

Here is an example structure:

example.exdir (File, folder)
│   attributes.yaml (-, file)
│   exdir.yaml (-, file)
│
├── dataset1 (Dataset, folder)
│   ├── data.npy (-, file)
│   ├── attributes.yaml (-, file)
│   └── exdir.yaml (-, file)
│
└── group1 (Group, folder)
    ├── attributes.yaml (-, file)
    ├── exdir.yaml (-, file)
    │
    ├── dataset3 (Dataset, folder)
    │   ├── data.npy (-, file)
    │   ├── attributes.yaml (-, file)
    │   └── exdir.yaml (-, file)
    │
    ├── link1 (Link, folder)
    │   └── exdir.yaml (-, file)
    │
    └── dataset4 (Dataset, folder)
        ├── data.npy (-, file)
        ├── attributes.yaml (-, file)
        ├── exdir.yaml (-, file)
        │
        └── raw (Raw, folder)
            ├── image0001.tif (-, file)
            ├── image0002.tif (-, file)
            └── ...

The above structure shows the name of the object, the type of the object in exdir and the type of the object on the file system as follows:

[name] ([EXP type], [file system type])

A dash (-) indicates that the object doesn't have a separate internal representation in the format, but is used indirectly. It is however explicitly stored in the file system.

The above structure shows that the example.exdir file is simply a folder in the file system, but when read by an exdir parser, it appears as a File. The File is the root object of any structure. The metadata of the File is stored in a file named exdir.yaml. This is internal to exdir. Attributes of the File are stored in a file named attributes.yaml. This is optional.

Below the File, multiple objects may appear, among them Datasets and Groups. Both Datasets and Groups are stored as folders in the file system. Both have their metadata stored in files named exdir.yaml. These are not visible as files within the exdir format, but appear simply as the metadata for the Datasets and Groups.
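To illustrate how a parser can hide the internal files, the sketch below lists only those sub-folders that carry a metadata file, using the exdir.yaml name shown in the example structure. It relies on the standard library alone. The helper name child_objects is hypothetical, and the rule "a folder with a metadata file is an object" is a simplification that does not read the metadata to tell Groups, Datasets and Links apart.

```python
from pathlib import Path

# Metadata file name as shown in the example structure.
META_FILENAME = "exdir.yaml"


def child_objects(folder):
    """Return the sorted names of sub-folders that look like exdir objects.

    Hypothetical helper: a sub-folder counts as an object if it contains
    the metadata file. Plain files and folders without metadata (for
    example a raw folder) are skipped.
    """
    folder = Path(folder)
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_dir() and (entry / META_FILENAME).is_file()
    )
```

Applied to the example structure, child_objects("example.exdir") would report dataset1 and group1, while attributes.yaml and exdir.yaml stay out of the listing.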

If there is any additional data associated with the dataset, it may (optionally) be stored in a folder named raw. This differs from HDF5, but allows storing raw data from experiments (such as TIFF images from an external microscopy system) locally with the data converted to the NumPy format.
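Since a raw folder is an ordinary folder, acquisition files can be placed next to the converted data with plain file operations. The following is a minimal sketch under that assumption, using only the standard library. The helper name store_raw_files and its raw_name parameter are hypothetical; a library implementation may offer its own way to create raw folders.

```python
import shutil
from pathlib import Path


def store_raw_files(dataset_folder, source_files, raw_name="raw"):
    """Copy source files into the raw folder of a dataset folder.

    Hypothetical helper working on the file system only.
    Returns the list of paths of the copied files.
    """
    raw_folder = Path(dataset_folder) / raw_name
    raw_folder.mkdir(exist_ok=True)
    copied = []
    for source in source_files:
        target = raw_folder / Path(source).name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

A call such as store_raw_files("example.exdir/group1/dataset4", ["image0001.tif", "image0002.tif"]) would give the raw layout shown in the example structure.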

Goals and benefits

By reusing the structure of HDF5, exdir should be familiar to researchers that have experience with this format. However, by not storing the data in a single file, the data is much less prone to corruption. Further, HDF5 is not optimal for modifications, parallelization or data exploration.

By storing the data in separate files, we get the many benefits of modern file systems in protection against data corruption. The data is more easily accessible in parallel computing and is stored in a well known and tested format. It is easier to explore the data by use of standard command line tools or simply the file explorer.

However, we intend to develop a graphical user interface along the lines of HDFView that allows similarly simple data exploration.
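As an illustration of the parallel access point, the sketch below computes a checksum for every data.npy file below a root folder, with one independent task per file. It uses only the standard library. The function names are hypothetical, and the thread pool is just one possible choice; because each dataset lives in its own file, separate processes or machines could split the work the same way.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_datasets(root, max_workers=4):
    """Checksum every data.npy below root, one independent task per file.

    Returns a dict mapping each file's path relative to root
    (with forward slashes) to its hex digest.
    """
    root = Path(root)
    files = sorted(root.rglob("data.npy"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        digests = list(pool.map(sha256_of, files))
    return {
        f.relative_to(root).as_posix(): d for f, d in zip(files, digests)
    }
```

Stored checksums of this kind can also serve as a simple integrity check for individual datasets, which is harder to do per dataset when everything sits in one container file.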

Principles

Exdir should be based on existing open standards
Exdir should not solve problems that have already been solved, such as storing binary data
Exdir should be lightweight

Background

Exdir was designed due to a need at the Centre for Integrative Neuroplasticity (CINPLA) at the University of Oslo for a format that would fit the experimental pipeline. While researching the different options, we found that the neuroscience community had several formats for storing experimental data. A large effort at standardizing the format in the community was spawned by Neurodata Without Borders (NWB). An initial version of the NWB format was published, based on the HDF5 format. However, shortly after the first publication of NWB, concerns about the HDF5 format were voiced by the developers of the klusta project [1]. They had been using HDF5 as the underlying file format for their software suite and started seeing problems with the file format among their users. They saw multiple problems with HDF5 in the form of data corruption, performance issues, bugs and poor support for parallelization.

HDF5 is not optimal for modifications. This is not a problem if you only store data from acquisition, as this shouldn't be changed. However, for analysis it is often necessary to modify the data multiple times as different methods and parameters are tested. At the same time, it is beneficial to keep the analysed data stored together with the acquisition data.

[1] http://cyrille.rossant.net/moving-away-hdf5/

Documentation: exdir.rtfd.io

Latest release: v0.5.0 (Sep 30, 2023)
