numpy.lib.format#
Binary serialization
NPY format#
A simple format for saving numpy arrays to disk with the fullinformation about them.
The.npy format is the standard binary file format in NumPy forpersisting asingle arbitrary NumPy array on disk. The format stores allof the shape and dtype information necessary to reconstruct the arraycorrectly even on another machine with a different architecture.The format is designed to be as simple as possible while achievingits limited goals.
The.npz format is the standard format for persistingmultiple NumPyarrays on disk. A.npz file is a zip file containing multiple.npyfiles, one for each array.
Capabilities#
Can represent all NumPy arrays including nested record arrays andobject arrays.
Represents the data in its native binary form.
Supports Fortran-contiguous arrays directly.
Stores all of the necessary information to reconstruct the arrayincluding shape and dtype on a machine of a differentarchitecture. Both little-endian and big-endian arrays aresupported, and a file with little-endian numbers will yielda little-endian array on any machine reading the file. Thetypes are described in terms of their actual sizes. For example,if a machine with a 64-bit C “long int” writes out an array with“long ints”, a reading machine with 32-bit C “long ints” will yieldan array with 64-bit integers.
Is straightforward to reverse engineer. Datasets often live longer thanthe programs that created them. A competent developer should beable to create a solution in their preferred programming language toread most
.npyfiles that they have been given without muchdocumentation.Allows memory-mapping of the data. See
open_memmap.Can be read from a filelike stream object instead of an actual file.
Stores object arrays, i.e. arrays containing elements that are arbitraryPython objects. Files with object arrays are not to be mmapable, butcan be read and written to disk.
Limitations#
Arbitrary subclasses of numpy.ndarray are not completely preserved.Subclasses will be accepted for writing, but only the array data willbe written out. A regular numpy.ndarray object will be createdupon reading the file.
Warning
Due to limitations in the interpretation of structured dtypes, dtypeswith fields with empty names will have the names replaced by ‘f0’, ‘f1’,etc. Such arrays will not round-trip through the format entirelyaccurately. The data is intact; only the field names will differ. We areworking on a fix for this. This fix will not require a change in thefile format. The arrays with such structures can still be saved andrestored, and the correct dtype may be restored by using theloadedarray.view(correct_dtype) method.
File extensions#
We recommend using the.npy and.npz extensions for files savedin this format. This is by no means a requirement; applications may wishto use these file formats but use an extension specific to theapplication. In the absence of an obvious alternative, however,we suggest using.npy and.npz.
Version numbering#
The version numbering of these formats is independent of NumPy versionnumbering. If the format is upgraded, the code innumpy.io will stillbe able to read and write Version 1.0 files.
Format Version 1.0#
The first 6 bytes are a magic string: exactly\x93NUMPY.
The next 1 byte is an unsigned byte: the major version number of the fileformat, e.g.\x01.
The next 1 byte is an unsigned byte: the minor version number of the fileformat, e.g.\x00. Note: the version of the file format is not tiedto the version of the numpy package.
The next 2 bytes form a little-endian unsigned short int: the length ofthe header data HEADER_LEN.
The next HEADER_LEN bytes form the header data describing the array’sformat. It is an ASCII string which contains a Python literal expressionof a dictionary. It is terminated by a newline (\n) and padded withspaces (\x20) to make the total oflen(magicstring)+2+len(length)+HEADER_LEN be evenly divisibleby 64 for alignment purposes.
The dictionary contains three keys:
- “descr”dtype.descr
An object that can be passed as an argument to the
numpy.dtypeconstructor to create the array’s dtype.- “fortran_order”bool
Whether the array data is Fortran-contiguous or not. SinceFortran-contiguous arrays are a common form of non-C-contiguity,we allow them to be written directly to disk for efficiency.
- “shape”tuple of int
The shape of the array.
For repeatability and readability, the dictionary keys are sorted inalphabetic order. This is for convenience only. A writer SHOULD implementthis if possible. A reader MUST NOT depend on this.
Following the header comes the array data. If the dtype contains Pythonobjects (i.e.dtype.hasobjectisTrue), then the data is a Pythonpickle of the array. Otherwise the data is the contiguous (either C-or Fortran-, depending onfortran_order) bytes of the array.Consumers can figure out the number of bytes by multiplying the numberof elements given by the shape (noting thatshape=() means there is1 element) bydtype.itemsize.
Format Version 2.0#
The version 1.0 format only allowed the array header to have a total size of65535 bytes. This can be exceeded by structured arrays with a large number ofcolumns. The version 2.0 format extends the header size to 4 GiB.numpy.save will automatically save in 2.0 format if the data requires it,else it will always use the more compatible 1.0 format.
The description of the fourth element of the header therefore has become:“The next 4 bytes form a little-endian unsigned int: the length of the headerdata HEADER_LEN.”
Format Version 3.0#
This version replaces the ASCII string (which in practice was latin1) witha utf8-encoded string, so supports structured types with any unicode fieldnames.
Notes#
The.npy format, including motivation for creating it and a comparison ofalternatives, is described in the“npy-format” NEP, however details haveevolved with time and this document is more current.
Functions
| Returns a dtype based off the given description. |
| Returns the dtype unchanged if it contained no metadata or a copy of the dtype if it (or any of its structure dtypes) contained metadata. |
| Get a serializable descriptor from the dtype. |
| Get the dictionary of header metadata from a numpy.ndarray. |
| |
| Return the magic string for the given file format version. |
| Open a .npy file as a memory-mapped array. |
| Read an array from an NPY file. |
| Read an array header from a filelike object using the 1.0 file format version. |
| Read an array header from a filelike object using the 2.0 file format version. |
| Read the magic string to get the version of the file format. |
| Write an array to an NPY file, including a header. |
| Write the header for an array using the 1.0 format. |
| Write the header for an array using the 2.0 format. |