pandas.DataFrame.reindex
- DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)
Conform DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
- Parameters:
- labels : array-like, optional
New labels / index to conform the axis specified by ‘axis’ to.
- index : array-like, optional
New labels for the index. Preferably an Index object to avoid duplicating data.
- columns : array-like, optional
New labels for the columns. Preferably an Index object to avoid duplicating data.
- axis : int or str, optional
Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).
- method : {None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’}
Method to use for filling holes in the reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.
None (default): don’t fill gaps.
pad / ffill: Propagate last valid observation forward to next valid.
backfill / bfill: Use next valid observation to fill gap.
nearest: Use nearest valid observations to fill gap.
- copy : bool, default True
Return a new object, even if the passed indexes are the same.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements by enabling Copy-on-Write:
pd.options.mode.copy_on_write = True
- level : int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any “compatible” value.
- limit : int, default None
Maximum number of consecutive elements to forward or backward fill.
- tolerance : optional
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.
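The interaction between method and tolerance can be sketched as follows. The frame and its labels are hypothetical and chosen only for illustration: a new label picks up the row of the nearest original label, but only when the distance between the two labels is within the tolerance.

```python
import pandas as pd

# Hypothetical frame with a monotonically increasing integer index.
readings = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[0, 10, 20])

# Each new label takes the row of the nearest original label, provided
# abs(original_label - new_label) <= tolerance; otherwise it stays NaN.
result = readings.reindex([1, 9, 14, 20], method="nearest", tolerance=2)
print(result)
```

Here labels 1 and 9 are within 2 of an original label and 20 matches exactly, while 14 is 4 away from its nearest neighbour (10) and is therefore left as NaN.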
- Returns:
- DataFrame with changed index.
See also
DataFrame.set_index
Set row labels.
DataFrame.reset_index
Remove row labels or move them to new columns.
DataFrame.reindex_like
Change to same indices as other DataFrame.
Examples
DataFrame.reindex supports two calling conventions:
(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)
We highly recommend using keyword arguments to clarify your intent.
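As a minimal sketch of the two conventions, the snippet below checks that both spellings give the same result on a small frame. The names frame, rows and cols are made up for this illustration.

```python
import pandas as pd

# Hypothetical two-by-two frame used only to compare the call styles.
frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
rows = ["y", "z"]
cols = ["b", "c"]

# Keyword convention: name the axis through the parameter itself.
by_keyword_rows = frame.reindex(index=rows)
by_keyword_cols = frame.reindex(columns=cols)

# Axis convention: pass the labels first, then say which axis they target.
by_axis_rows = frame.reindex(rows, axis="index")
by_axis_cols = frame.reindex(cols, axis="columns")

# DataFrame.equals treats NaN values in the same position as equal.
print(by_keyword_rows.equals(by_axis_rows))
print(by_keyword_cols.equals(by_axis_cols))
```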
Create a dataframe with some fictional data.
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00
Create a new index and reindex the dataframe. By default, values in the new index that do not have corresponding records in the dataframe are assigned NaN.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02
We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.
>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
>>> df.reindex(new_index, fill_value='missing')
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02
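The outputs above also differ in dtype: the integer column prints as 404.0 when gaps are filled with NaN but as 404 when they are filled with 0. The following sketch makes that visible on a reduced, hypothetical version of the frame; the variable names are illustrative only.

```python
import pandas as pd

# Reduced stand-in for the example frame: one integer column.
df = pd.DataFrame({"http_status": [200, 404]}, index=["Chrome", "Safari"])
new_index = ["Safari", "Iceweasel"]

with_nan = df.reindex(new_index)                         # gap filled with NaN
with_zero = df.reindex(new_index, fill_value=0)          # gap filled with an int
with_text = df.reindex(new_index, fill_value="missing")  # gap filled with a str

# Print the resulting dtype of the column for each fill choice.
for name, frame in [("NaN", with_nan), ("0", with_zero), ("'missing'", with_text)]:
    print(name, frame["http_status"].dtype)
```

Filling with NaN forces the integer column to a floating-point dtype, an integer fill value keeps it integral, and a string fill value produces an object column.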
We can also reindex the columns.
>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN
Or we can use “axis-style” keyword arguments.
>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN
To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).
>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
Suppose we decide to expand the dataframe to cover a wider date range.
>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN
The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.
For example, to propagate the next valid value backward to fill the NaN values, pass bfill as an argument to the method keyword.
>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN
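The limit parameter caps how many consecutive new labels a fill may reach. A minimal sketch, rebuilding the same df2 and date_index2 so that it runs on its own (the name limited is introduced here for illustration):

```python
import numpy as np
import pandas as pd

# Same data as the date example in this section.
date_index = pd.date_range("1/1/2010", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
date_index2 = pd.date_range("12/29/2009", periods=10, freq="D")

# Back-fill at most one consecutive new label before each original label.
limited = df2.reindex(date_index2, method="bfill", limit=1)
print(limited)
```

With limit=1 only 2009-12-31, the new label directly before the first original label, receives the back-filled value; 2009-12-29 and 2009-12-30 remain NaN.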
Please note that the NaN value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.
See the user guide for more.
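One way to combine the two steps is to reindex first and then call fillna() on the result. This is a sketch only: the constant 0 is an arbitrary choice for illustration, and the names reindexed and filled are not part of the original example.

```python
import numpy as np
import pandas as pd

# Same data as the date example in this section.
date_index = pd.date_range("1/1/2010", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
date_index2 = pd.date_range("12/29/2009", periods=10, freq="D")

# Step 1: reindex with back-fill; this only fills labels new to the index.
reindexed = df2.reindex(date_index2, method="bfill")

# Step 2: fill whatever NaN values remain, including the one that was
# already present in the original data, with a constant.
filled = reindexed.fillna(0)

print(reindexed["prices"].isna().sum())
print(filled["prices"].isna().sum())
```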