HDFStore appending for mixed datatypes, including NumPy arrays #3032

Closed
Labels: IO (Data IO issues that don't fit into a more specific label)

@alexbw

Description

A pandas DataFrame I have contains some image data recorded from a camera during a behavioral experiment. A simplified version looks like this:

import numpy as np
from pandas import DataFrame

num_frames = 100
mouse = [{"velocity": np.random.random((1,))[0],
          "image": np.random.random((80, 80)).astype('float32'),
          "spine": np.r_[0:80].astype('float32'),
          # "time": millisec(i * 33),
          "mouse_id": "mouse1",
          "special": i} for i in range(num_frames)]
df = DataFrame(mouse)

I understand I can't query over the image or spine entries. Of course, I can easily query for low-velocity frames, like this:

low_velocity = df[df['velocity'] < 0.5]

However, there is a lot of this data (several hundred gigabytes), so I'd like to keep it in an HDF5 file, and pull up frames only as needed from disk.
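Concretely, the kind of on-disk selection I'm hoping for looks something like this (a sketch of intent only; it assumes the append succeeds, that Term comes from pandas.io.pytables in this version, and that velocity is declared a data column so it is queryable on disk):

from pandas import HDFStore
from pandas.io.pytables import Term

store = HDFStore("mouse.h5", "w")
store.append("mouse", df, data_columns=["velocity"])  # this is the step that fails below
low_velocity = store.select("mouse", where=[Term("velocity", "<", 0.5)])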

In v0.10, I understand that mixed-type frames can now be appended to the HDFStore. However, I get an error when trying to append this DataFrame:

store = HDFStore("mouse.h5", "w")
store.append("mouse", df)

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-30-8f0da271e75f> in <module>()
      1 store = HDFStore("mouse.h5", "w")
----> 2 store.append("mouse", df)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in append(self, key, value, columns, **kwargs)
    543             raise Exception("columns is not a supported keyword in append, try data_columns")
    544 
--> 545         self._write_to_group(key, value, table=True, append=True, **kwargs)
    546 
    547     def append_to_multiple(self, d, value, selector, data_columns=None, axes=None, **kwargs):

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in _write_to_group(self, key, value, index, table, append, complib, **kwargs)
    799             raise ValueError('Compression not supported on non-table')
    800 
--> 801         s.write(obj = value, append=append, complib=complib, **kwargs)
    802         if s.is_table and index:
    803             s.create_index(columns = index)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, **kwargs)
   2537         # create the axes
   2538         self.create_axes(axes=axes, obj=obj, validate=append,
-> 2539                          min_itemsize=min_itemsize, **kwargs)
   2540 
   2541         if not self.is_exists:

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   2279                 raise
   2280             except (Exception), detail:
-> 2281                 raise Exception("cannot find the correct atom type -> [dtype->%s,items->%s] %s" % (b.dtype.name, b.items, str(detail)))
   2282             j += 1
   2283 

Exception: cannot find the correct atom type -> [dtype->object,items->Index([image, mouse_id, spine], dtype=object)] cannot set an array element with a sequence
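For what it's worth, dropping the array-valued columns and appending only the scalars should go through (a sketch I haven't verified; the filename is illustrative), since the atom-inference error above names the object block holding image and spine:

scalars = df[["velocity", "mouse_id", "special"]]
store = HDFStore("mouse_scalars.h5", "w")  # illustrative filename
store.append("mouse", scalars)
store.close()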

I'm working with a relatively new development build of pandas:

pandas.__version__
'0.11.0.dev-95a5326'
import tables
tables.__version__
'2.4.0+1.dev'

It would be immensely convenient to have a single repository for all of this data, instead of splitting just the queryable parts off into separate nodes.
Is this currently possible with some workaround (maybe with record arrays), and will it be supported officially in the future?
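The closest workaround I can see (a sketch, not an official API) is to split the frame within one file: keep the scalar, queryable columns in an appendable HDFStore table, and write the fixed-shape image/spine stacks as PyTables EArray nodes in the same file, aligned by row number. The node names images and spines are just illustrative:

import numpy as np
import tables
from pandas import HDFStore

# Scalar, queryable part goes into an appendable table.
store = HDFStore("mouse.h5", "w")
store.append("mouse", df[["velocity", "mouse_id", "special"]],
             data_columns=["velocity"])
store.close()

# Fixed-shape array part goes into EArray nodes in the same file,
# aligned with the table by row number.
h5 = tables.openFile("mouse.h5", "a")
images = h5.createEArray(h5.root, "images", tables.Float32Atom(),
                         shape=(0, 80, 80))
spines = h5.createEArray(h5.root, "spines", tables.Float32Atom(),
                         shape=(0, 80))
images.append(np.array(df["image"].tolist(), dtype="float32"))
spines.append(np.array(df["spine"].tolist(), dtype="float32"))
h5.close()

Selecting low-velocity frames would then become a table query for the matching row numbers, followed by indexing the array nodes with those rows.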

As a side note, this kind of heterogeneous data ("ragged" arrays) is incredibly widespread in neurobiology and the biological sciences in general. Any extra support along these lines would be very well received.
