Internals#

This section provides a look into some of pandas’ internals. It’s primarily intended for developers of pandas itself.

Indexing#

In pandas there are a few objects implemented which can serve as valid containers for the axis labels:

  • Index: the generic “ordered set” object, an ndarray of object dtype assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do O(1) lookups.

  • MultiIndex: the standard hierarchical index object

  • DatetimeIndex: An Index object with Timestamp boxed elements (the underlying implementation stores int64 values)

  • TimedeltaIndex: An Index object with Timedelta boxed elements (the underlying implementation stores int64 values)

  • PeriodIndex: An Index object with Period elements
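As a rough illustration of the list above, the sketch below looks up a label's position in a plain Index and inspects the int64 values behind a DatetimeIndex. The variable names are illustrative only, and `asi8` is used here simply as a convenient way to view the stored integers; their exact unit depends on the index's resolution.

```python
import numpy as np
import pandas as pd

# A plain Index maps each label to its integer position.
idx = pd.Index(["a", "b", "c"])
pos = idx.get_loc("b")  # position of the label "b"

# A DatetimeIndex stores int64 values; indexing boxes them as Timestamp.
dti = pd.DatetimeIndex(["2020-01-01", "2020-01-02"])
raw = dti.asi8   # the underlying int64 values
boxed = dti[0]   # a Timestamp built from the first stored integer

# A TimedeltaIndex follows the same pattern with Timedelta elements.
tdi = pd.TimedeltaIndex(["1 days", "2 days"])
raw_td = tdi.asi8
boxed_td = tdi[0]
```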

There are functions that make the creation of a regular index easy:

  • date_range(): fixed frequency date range generated from a time rule or DateOffset. Returns a DatetimeIndex

  • period_range(): fixed frequency date range generated from a time rule or DateOffset. Returns a PeriodIndex of Period objects, representing timespans
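A minimal sketch of both helpers follows; the start dates, lengths, and frequencies are arbitrary values chosen for illustration.

```python
import pandas as pd

# Three consecutive days starting on 2024-01-01.
dates = pd.date_range("2024-01-01", periods=3, freq="D")

# Three consecutive monthly periods starting with January 2024.
periods = pd.period_range("2024-01", periods=3, freq="M")

first_date = dates[0]      # a Timestamp
first_period = periods[0]  # a Period spanning the whole month
```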

Warning

Custom Index subclasses are not supported; custom behavior should be implemented using the ExtensionArray interface instead.

MultiIndex#

Internally, the MultiIndex consists of a few things: the levels, the integer codes, and the level names:

In [1]: index = pd.MultiIndex.from_product(
   ...:     [range(3), ["one", "two"]], names=["first", "second"]
   ...: )
   ...: 

In [2]: index
Out[2]: 
MultiIndex([(0, 'one'),
            (0, 'two'),
            (1, 'one'),
            (1, 'two'),
            (2, 'one'),
            (2, 'two')],
           names=['first', 'second'])

In [3]: index.levels
Out[3]: FrozenList([[0, 1, 2], ['one', 'two']])

In [4]: index.codes
Out[4]: FrozenList([[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

In [5]: index.names
Out[5]: FrozenList(['first', 'second'])

You can probably guess that the codes determine which unique element is identified with that location at each layer of the index. It’s important to note that sortedness is determined solely from the integer codes and does not check (or care) whether the levels themselves are sorted. Fortunately, the constructors from_tuples() and from_arrays() ensure that the levels are sorted, but if you compute the levels and codes yourself, please be careful.
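To make the relationship between levels and codes concrete, the sketch below builds the same labels twice: once by passing hand-written levels and codes to the MultiIndex constructor (with a deliberately unsorted first level), and once through from_arrays(). The data is made up for illustration; the point is only that identical labels can be backed by different level orderings.

```python
import pandas as pd

# Hand-built: the first level is deliberately NOT sorted.
manual = pd.MultiIndex(
    levels=[["b", "a"], ["one", "two"]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=["first", "second"],
)

# Each code is a position into the matching level.
manual_first = list(manual.get_level_values("first"))

# The same labels, built by a constructor that derives levels and codes.
built = pd.MultiIndex.from_arrays(
    [["b", "b", "a", "a"], ["one", "two", "one", "two"]],
    names=["first", "second"],
)

built_levels = [list(level) for level in built.levels]
built_codes = [list(code) for code in built.codes]
```

Here both objects hold the same tuples in the same order, but `built` stores a sorted first level and adjusts the codes to match, whereas `manual` keeps whatever ordering it was given.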

Values#

pandas extends NumPy’s type system with custom types, like Categorical or datetimes with a timezone, so we have multiple notions of “values”. For 1-D containers (Index classes and Series) we have the following convention:

  • cls._values is the “best possible” array. This could be an ndarray or ExtensionArray.

So, for example, Series[category]._values is a Categorical.
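The convention can be checked with a short sketch. Note that `_values` is a private attribute intended for pandas developers, and the sample data below is invented for illustration.

```python
import numpy as np
import pandas as pd

# A categorical Series is backed by an ExtensionArray (Categorical).
cat_values = pd.Series(["a", "b", "a"], dtype="category")._values

# A plain float Series is backed by a NumPy ndarray.
num_values = pd.Series([1.0, 2.0])._values

# The same convention applies to Index classes.
idx_values = pd.Index([1.0, 2.0])._values
```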

Subclassing pandas data structures#

This section has been moved to Subclassing pandas data structures.