Description
A pandas DataFrame I have contains image data recorded from a camera during a behavioral experiment. A simplified version looks like this:
```python
num_frames = 100
mouse = [{"velocity": np.random.random((1,))[0],
          "image": np.random.random((80, 80)).astype('float32'),
          "spine": np.r_[0:80].astype('float32'),
          # "time": millisec(i*33),
          "mouse_id": "mouse1",
          "special": i} for i in range(num_frames)]
df = DataFrame(mouse)
```

I understand I can't query over the `image` or `spine` entries. Of course, I can easily query for low-velocity frames, like this:
```python
low_velocity = df[df['velocity'] < 0.5]
```

However, there is a lot of this data (several hundred gigabytes), so I'd like to keep it in an HDF5 file and pull up frames only as needed from disk.
In v0.10, I understand that "mixed-type" frames can now be appended to the HDFStore. However, I get an error when trying to append this DataFrame:
```python
store = HDFStore("mouse.h5", "w")
store.append("mouse", df)
```

```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-30-8f0da271e75f> in <module>()
      1 store = HDFStore("mouse.h5", "w")
----> 2 store.append("mouse", df)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in append(self, key, value, columns, **kwargs)
    543             raise Exception("columns is not a supported keyword in append, try data_columns")
    544
--> 545         self._write_to_group(key, value, table=True, append=True, **kwargs)
    546
    547     def append_to_multiple(self, d, value, selector, data_columns=None, axes=None, **kwargs):

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in _write_to_group(self, key, value, index, table, append, complib, **kwargs)
    799             raise ValueError('Compression not supported on non-table')
    800
--> 801         s.write(obj = value, append=append, complib=complib, **kwargs)
    802         if s.is_table and index:
    803             s.create_index(columns = index)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, **kwargs)
   2537         # create the axes
   2538         self.create_axes(axes=axes, obj=obj, validate=append,
-> 2539                          min_itemsize=min_itemsize, **kwargs)
   2540
   2541         if not self.is_exists:

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.11.0.dev_95a5326-py2.7-macosx-10.5-x86_64.egg/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   2279                 raise
   2280             except (Exception), detail:
-> 2281                 raise Exception("cannot find the correct atom type -> [dtype->%s,items->%s] %s" %
                                        (b.dtype.name, b.items, str(detail)))
   2282             j += 1
   2283

Exception: cannot find the correct atom type -> [dtype->object,items->Index([image, mouse_id, spine], dtype=object)] cannot set an array element with a sequence
```

I'm working with a relatively new release of pandas:
```python
>>> pandas.__version__
'0.11.0.dev-95a5326'
>>> import tables
>>> tables.__version__
'2.4.0+1.dev'
```

It would be immensely convenient to have a single repository for all of this data, instead of fragmenting just the queryable parts off to separate nodes.
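For what it's worth, the trouble seems to come from object-dtype columns whose cells are themselves ndarrays (the error message lists the whole object block, `mouse_id` included, but the ndarray cells are what defeat the atom inference). A minimal sketch, using plain NumPy/pandas with no HDF5 involved, that identifies those columns:

```python
import numpy as np
import pandas as pd

# Rebuild a small version of the frame above.
df = pd.DataFrame([{"velocity": np.random.random(),
                    "image": np.random.random((80, 80)).astype("float32"),
                    "spine": np.r_[0:80].astype("float32"),
                    "mouse_id": "mouse1",
                    "special": i} for i in range(3)])

# Columns whose cells are ndarrays end up with dtype=object; a scalar
# string column like mouse_id is also object dtype, but its cells are
# not sequences, so it is the ndarray-valued columns that matter here.
array_cols = sorted(c for c in df.columns
                    if df[c].dtype == object
                    and isinstance(df[c].iloc[0], np.ndarray))
print(array_cols)  # ['image', 'spine']
```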
Is this currently possible with some workaround (maybe with record arrays), and will it be supported officially in the future?
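One interim workaround I could imagine (a sketch, not pandas API, and it assumes all images share a shape): split the frame into a queryable scalar table and a dense 3-D image block keyed by row position. On disk the scalar part would go through `store.append` as usual, and the image block to a single array node (e.g. a PyTables `CArray` of shape `(num_frames, 80, 80)`). Shown here with the split done in memory:

```python
import numpy as np
import pandas as pd

num_frames = 5
records = [{"velocity": np.random.random(),
            "image": np.random.random((80, 80)).astype("float32"),
            "mouse_id": "mouse1",
            "special": i} for i in range(num_frames)]
df = pd.DataFrame(records)

# Split: scalar columns stay queryable; images become one uniform block.
scalars = df.drop(columns=["image"])
images = np.stack(df["image"].tolist())   # shape (num_frames, 80, 80)

# Query the scalar table, then pull only the matching images by position.
idx = scalars.index[scalars["velocity"] < 0.5]
low_velocity_images = images[np.asarray(idx)]
```

The row index acts as the join key between the two stores, so only the frames that match a query ever need to be read from disk.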
As a side note, this kind of heterogeneous data ("ragged" arrays) is incredibly widespread in neurobiology and the biological sciences in general. Any extra support along these lines would be very well received.