
A Python library for numpy arrays that persist on disk in a format that is simple, self-documented and tool-independent, and maximizes universal readability.


gbeckers/Darr


Darr is a Python science library that allows you to work efficiently with potentially very large, disk-based NumPy arrays that are widely readable and self-documented. Documentation includes copy-paste ready code to read arrays in many popular data science languages such as R, Julia, Scilab, IDL, Matlab, Maple, and Mathematica, or in Python/NumPy without Darr, all without exporting the data and with minimal effort.

Universal readability of data is a pillar of good scientific practice. It is also generally a good idea for anyone who wants to move flexibly between analysis environments, who wants to save data for the longer term, or who wants to share data with others without spending much time on figuring out or explaining how the receiver can read it. No idea how to read your 7-dimensional uint32 NumPy array in Matlab to quickly try out an algorithm your colleague wrote? No worries, a quick copy-paste of code from the array documentation is all that is needed to read your data in, e.g., R or Matlab (see example). As you work with your array, its documentation is automatically kept up to date. No need to export anything, make notes, or provide elaborate explanation. No looking up things. No dependence on complicated formats or specialized libraries for reading your data elsewhere later.

In essence, Darr makes it trivially easy to share your numerical arrays with others or with yourself when working in different computing environments, and stores them in a future-proof way.

More rationale for a tool-independent approach to numeric array storage is provided here.

Under the hood, Darr uses NumPy memory-mapped arrays, a widely established and trusted way of working with disk-based numerical data, which makes Darr fully NumPy compatible. This enables efficient out-of-core read/write access to potentially very large arrays. In addition to automatic documentation, Darr adds other functionality to NumPy's memmap, such as easy appending and truncating of data, support for ragged arrays, the ability to create arrays from iterators, and easy use of metadata. Flat binary files and (JSON) text files are accompanied by a README text file that explains how the array and metadata are stored (see example arrays).
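To make the memory-mapping idea concrete, here is a minimal sketch using plain NumPy only. It is not Darr's implementation or API; the file name and array sizes are assumptions made up for the illustration. It shows in-place writes to a disk-backed array, and why appending along the first axis is cheap when the data is a headerless, C-ordered binary file.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical file name, chosen for this sketch only.
path = Path(tempfile.mkdtemp()) / "values.bin"

# Create a disk-backed 1000 x 3 float64 array; only the pages that are
# touched need to be held in RAM.
mm = np.memmap(path, dtype="float64", mode="w+", shape=(1000, 3))
mm[10:20, 1] = 1.5   # write a small slice in place
mm.flush()           # push changes to disk
del mm               # release the memory map

# Append two rows by writing raw bytes to the end of the file. This works
# because the file is a headerless, C-ordered block, so new rows along the
# first axis simply follow the existing ones.
extra = np.full((2, 3), 7.0, dtype="float64")
with open(path, "ab") as f:
    f.write(extra.tobytes())

# Re-open read-only; the new length follows from the file size.
itemsize = np.dtype("float64").itemsize
nrows = path.stat().st_size // (3 * itemsize)
ro = np.memmap(path, dtype="float64", mode="r", shape=(nrows, 3))
total = float(ro[:, 1].sum())   # reads from disk on demand
last_row = ro[-1].tolist()
```

A library such as Darr wraps this kind of bookkeeping (shape tracking, documentation updates) so that you do not have to do it by hand.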

See this tutorial for a brief introduction, or the documentation for more information.

Darr is currently pre-1.0 and still undergoing development. It is open source and freely available under the New BSD License terms.

Features

  • Data is stored purely in flat binary and text files, maximizing universal readability.
  • Automatic self-documentation, including copy-paste ready code snippets for reading the array in a number of popular data analysis environments, such as Python (without Darr), R, Julia, Scilab, Octave/Matlab, GDL/IDL, and Mathematica (see example array).
  • Disk-persistent array data is directly accessible through NumPy indexing and may be larger than RAM.
  • Easy and efficient appending of data (see example).
  • Supports ragged arrays.
  • Easy use of metadata, stored in a separate, widely readable JSON text file (see example).
  • Many numeric types are supported: (u)int8-(u)int64, float16-float64, complex64, complex128.
  • Integrates easily with the Dask library for out-of-core computation on very large arrays.
  • Minimal dependencies, only NumPy.
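The combination of a flat binary file and a plain-text description is what makes the data readable without any special library. The sketch below illustrates the principle with NumPy and the standard library only. The file names and JSON keys are hypothetical, picked for this example; they are not claimed to match the files Darr actually writes.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# All file names and JSON keys below are hypothetical, for illustration.
folder = Path(tempfile.mkdtemp())
data = np.arange(24, dtype="<u4").reshape(2, 3, 4)

# Writer side: raw values plus a plain-text description of their layout.
(folder / "values.bin").write_bytes(data.tobytes(order="C"))
description = {
    "numtype": "uint32",
    "byteorder": "little",
    "shape": list(data.shape),
    "arrayorder": "C",
}
(folder / "description.json").write_text(json.dumps(description, indent=2))

# Reader side: only the description and NumPy are needed.
desc = json.loads((folder / "description.json").read_text())
prefix = {"little": "<", "big": ">"}[desc["byteorder"]]
dtype = np.dtype(desc["numtype"]).newbyteorder(prefix)
restored = np.fromfile(folder / "values.bin", dtype=dtype).reshape(
    desc["shape"], order=desc["arrayorder"]
)
```

Because the description states the numeric type, byte order, shape, and memory order explicitly, the same four facts are enough to read the file in R, Julia, Matlab, or any other environment that can read raw binary data.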

Limitations

  • No structured (record) arrays supported yet, just ndarrays.
  • No string data, just numeric.
  • No compression, although compression for archiving purposes is supported.
  • Uses multiple files per array, as binary data is separated from text documentation and metadata. This can be a disadvantage in terms of storage space if you have very many very small arrays.
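Since an array is just a directory of ordinary files, it can be bundled into a single compressed archive with generic tools when it is no longer being modified. The sketch below uses only the Python standard library and is not Darr's own archiving feature; the directory and file names are assumptions made up for the example.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

# Hypothetical directory holding one array: a binary file plus a text file.
root = Path(tempfile.mkdtemp())
array_dir = root / "recording.darr"
array_dir.mkdir()
(array_dir / "values.bin").write_bytes(bytes(8000))   # 8000 zero bytes
(array_dir / "README.txt").write_text("1000 float64 values\n")

# Bundle the whole directory into a single gzip-compressed tar archive.
archive_path = Path(
    shutil.make_archive(
        str(root / "recording"),   # archive name without extension
        "gztar",
        root_dir=root,
        base_dir=array_dir.name,
    )
)

# List the archive contents to confirm that everything was included.
with tarfile.open(archive_path) as tar:
    names = sorted(tar.getnames())
```

The archive has to be unpacked again before the array can be memory-mapped, so this suits long-term storage or transfer rather than day-to-day work.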

Installation

Darr officially depends on Python 3.9 or higher. Older versions may work (probably >= 3.6) but are not tested.

Install Darr from PyPI:

$ pip install darr

Or, install Darr via conda:

$ conda install -c conda-forge darr

To install the latest development version, use pip with the latest GitHub master:

$ pip install git+https://github.com/gbeckers/darr@master

Documentation

See the documentation for more information.

Contributing

Any help, suggestions, ideas, or contributions are welcome and very much appreciated. For any comment, question, or error, please open an issue or propose a pull request.

Other interesting projects

If Darr is not exactly what you are looking for, have a look at these projects:

Darr is BSD licensed (BSD 3-Clause License). (c) 2017-2024, Gabriël Beckers
