HDFStore appending for mixed datatypes, including NumPy arrays #3032

Closed
Labels: IO (Data IO issues that don't fit into a more specific label)

@alexbw

Description

A pandas DataFrame I have contains some image data recorded from a camera during a behavioral experiment. A simplified version looks like this:

import numpy as np
from pandas import DataFrame

num_frames = 100
mouse = [{"velocity": np.random.random((1,))[0],
          "image": np.random.random((80, 80)).astype('float32'),
          "spine": np.r_[0:80].astype('float32'),
          # "time": millisec(i * 33),
          "mouse_id": "mouse1",
          "special": i} for i in range(num_frames)]
df = DataFrame(mouse)

I understand I can't query over the image or spine entries. Of course, I can easily query for low-velocity frames, like this:

low_velocity = df[df['velocity'] < 0.5]

However, there is a lot of this data (several hundred gigabytes), so I'd like to keep it in an HDF5 file, and pull up frames only as needed from disk.
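Concretely, the kind of on-disk selection I'm hoping for looks something like this (a sketch of intent only; it assumes the append succeeds, that Term comes from pandas.io.pytables in this version, and that velocity is declared a data column so it is queryable on disk):

from pandas import HDFStore
from pandas.io.pytables import Term

store = HDFStore("mouse.h5", "w")
store.append("mouse", df, data_columns=["velocity"])  # this is the step that fails below
low_velocity = store.select("mouse", where=[Term("velocity", "<", 0.5)])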

In v0.10, I understand that mixed-type frames can now be appended to the HDFStore. However, I get an error when trying to append this DataFrame:

store = HDFStore("mouse.h5", "w")
store.append("mouse", df)

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-30-8f0da271e75f> in <module>()
      1 store = HDFStore("mouse.h5", "w")
----> 2 store.append("mouse", df)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in append(self, key, value, columns, **kwargs)
    543             raise Exception("columns is not a supported keyword in append, try data_columns")
    544 
--> 545         self._write_to_group(key, value, table=True, append=True, **kwargs)
    546 
    547     def append_to_multiple(self, d, value, selector, data_columns=None, axes=None, **kwargs):

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in _write_to_group(self, key, value, index, table, append, complib, **kwargs)
    799             raise ValueError('Compression not supported on non-table')
    800 
--> 801         s.write(obj = value, append=append, complib=complib, **kwargs)
    802         if s.is_table and index:
    803             s.create_index(columns = index)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, **kwargs)
   2537         # create the axes
   2538         self.create_axes(axes=axes, obj=obj, validate=append,
-> 2539                          min_itemsize=min_itemsize, **kwargs)
   2540 
   2541         if not self.is_exists:

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   2279                 raise
   2280             except (Exception), detail:
-> 2281                 raise Exception("cannot find the correct atom type -> [dtype->%s,items->%s] %s" % (b.dtype.name, b.items, str(detail)))
   2282             j += 1
   2283 

Exception: cannot find the correct atom type -> [dtype->object,items->Index([image, mouse_id, spine], dtype=object)] cannot set an array element with a sequence
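For what it's worth, dropping the array-valued columns and appending only the scalars should go through (a sketch I haven't verified; the filename is illustrative), since the atom-inference error above names the object block holding image and spine:

scalars = df[["velocity", "mouse_id", "special"]]
store = HDFStore("mouse_scalars.h5", "w")  # illustrative filename
store.append("mouse", scalars)
store.close()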

I'm working with a relatively new development build of pandas:

pandas.__version__
'0.11.0.dev-95a5326'
import tables
tables.__version__
'2.4.0+1.dev'

It would be immensely convenient to have a single repository for all of this data, instead of splitting just the queryable parts off into separate nodes.
Is this currently possible with some workaround (maybe with record arrays), and will it be supported officially in the future?
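The closest workaround I can see (a sketch, not an official API) is to split the frame within one file: keep the scalar, queryable columns in an appendable HDFStore table, and write the fixed-shape image/spine stacks as PyTables EArray nodes in the same file, aligned by row number. The node names images and spines are just illustrative:

import numpy as np
import tables
from pandas import HDFStore

# Scalar, queryable part goes into an appendable table.
store = HDFStore("mouse.h5", "w")
store.append("mouse", df[["velocity", "mouse_id", "special"]],
             data_columns=["velocity"])
store.close()

# Fixed-shape array part goes into EArray nodes in the same file,
# aligned with the table by row number.
h5 = tables.openFile("mouse.h5", "a")
images = h5.createEArray(h5.root, "images", tables.Float32Atom(),
                         shape=(0, 80, 80))
spines = h5.createEArray(h5.root, "spines", tables.Float32Atom(),
                         shape=(0, 80))
images.append(np.array(df["image"].tolist(), dtype="float32"))
spines.append(np.array(df["spine"].tolist(), dtype="float32"))
h5.close()

Selecting low-velocity frames would then become a table query for the matching row numbers, followed by indexing the array nodes with those rows.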

As a side note, this kind of heterogeneous data ("ragged" arrays) is incredibly widespread in neurobiology and the biological sciences in general. Any extra support along these lines would be very well received.
