- API reference
- General functions
- pandas.wide_to_long
pandas.wide_to_long#
- pandas.wide_to_long(df,stubnames,i,j,sep='',suffix='\\d+')[source]#
Unpivot a DataFrame from wide to long format.
Less flexible but more user-friendly than melt.
With stubnames [‘A’, ‘B’], this function expects to find one or moregroup of columns with formatA-suffix1, A-suffix2,…, B-suffix1, B-suffix2,…You specify what you want to call this suffix in the resulting long formatwithj (for examplej=’year’)
Each row of these wide variables are assumed to be uniquely identified byi (can be a single column name or a list of column names)
All remaining variables in the data frame are left intact.
- Parameters:
- dfDataFrame
The wide-format DataFrame.
- stubnamesstr or list-like
The stub name(s). The wide format variables are assumed tostart with the stub names.
- istr or list-like
Column(s) to use as id variable(s).
- jstr
The name of the sub-observation variable. What you wish to name yoursuffix in the long format.
- sepstr, default “”
A character indicating the separation of the variable namesin the wide format, to be stripped from the names in the long format.For example, if your column names are A-suffix1, A-suffix2, youcan strip the hyphen by specifyingsep=’-’.
- suffixstr, default ‘\d+’
A regular expression capturing the wanted suffixes. ‘\d+’ capturesnumeric suffixes. Suffixes with no numbers could be specified with thenegated character class ‘\D+’. You can also further disambiguatesuffixes, for example, if your wide variables are of the form A-one,B-two,.., and you have an unrelated column A-rating, you can ignore thelast one by specifyingsuffix=’(!?one|two)’. When all suffixes arenumeric, they are cast to int64/float64.
- Returns:
- DataFrame
A DataFrame that contains each stub name as a variable, with new index(i, j).
See also
meltUnpivot a DataFrame from wide to long format, optionally leaving identifiers set.
pivotCreate a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivotPivot without aggregation that can handle non-numeric data.
DataFrame.pivot_tableGeneralization of pivot that can handle duplicate values for one index/column pair.
DataFrame.unstackPivot based on the index values instead of a column.
Notes
All extra variables are left untouched. This simply usespandas.melt under the hood, but is hard-coded to “do the right thing”in a typical case.
Examples
>>>np.random.seed(123)>>>df=pd.DataFrame({"A1970":{0:"a",1:"b",2:"c"},..."A1980":{0:"d",1:"e",2:"f"},..."B1970":{0:2.5,1:1.2,2:.7},..."B1980":{0:3.2,1:1.3,2:.1},..."X":dict(zip(range(3),np.random.randn(3)))...})>>>df["id"]=df.index>>>df A1970 A1980 B1970 B1980 X id0 a d 2.5 3.2 -1.085631 01 b e 1.2 1.3 0.997345 12 c f 0.7 0.1 0.282978 2>>>pd.wide_to_long(df,["A","B"],i="id",j="year")... X A Bid year0 1970 -1.085631 a 2.51 1970 0.997345 b 1.22 1970 0.282978 c 0.70 1980 -1.085631 d 3.21 1980 0.997345 e 1.32 1980 0.282978 f 0.1
With multiple id columns
>>>df=pd.DataFrame({...'famid':[1,1,1,2,2,2,3,3,3],...'birth':[1,2,3,1,2,3,1,2,3],...'ht1':[2.8,2.9,2.2,2,1.8,1.9,2.2,2.3,2.1],...'ht2':[3.4,3.8,2.9,3.2,2.8,2.4,3.3,3.4,2.9]...})>>>df famid birth ht1 ht20 1 1 2.8 3.41 1 2 2.9 3.82 1 3 2.2 2.93 2 1 2.0 3.24 2 2 1.8 2.85 2 3 1.9 2.46 3 1 2.2 3.37 3 2 2.3 3.48 3 3 2.1 2.9>>>l=pd.wide_to_long(df,stubnames='ht',i=['famid','birth'],j='age')>>>l... htfamid birth age1 1 1 2.8 2 3.4 2 1 2.9 2 3.8 3 1 2.2 2 2.92 1 1 2.0 2 3.2 2 1 1.8 2 2.8 3 1 1.9 2 2.43 1 1 2.2 2 3.3 2 1 2.3 2 3.4 3 1 2.1 2 2.9
Going from long back to wide just takes some creative use ofunstack
>>>w=l.unstack()>>>w.columns=w.columns.map('{0[0]}{0[1]}'.format)>>>w.reset_index() famid birth ht1 ht20 1 1 2.8 3.41 1 2 2.9 3.82 1 3 2.2 2.93 2 1 2.0 3.24 2 2 1.8 2.85 2 3 1.9 2.46 3 1 2.2 3.37 3 2 2.3 3.48 3 3 2.1 2.9
Less wieldy column names are also handled
>>>np.random.seed(0)>>>df=pd.DataFrame({'A(weekly)-2010':np.random.rand(3),...'A(weekly)-2011':np.random.rand(3),...'B(weekly)-2010':np.random.rand(3),...'B(weekly)-2011':np.random.rand(3),...'X':np.random.randint(3,size=3)})>>>df['id']=df.index>>>df A(weekly)-2010 A(weekly)-2011 B(weekly)-2010 B(weekly)-2011 X id0 0.548814 0.544883 0.437587 0.383442 0 01 0.715189 0.423655 0.891773 0.791725 1 12 0.602763 0.645894 0.963663 0.528895 1 2
>>>pd.wide_to_long(df,['A(weekly)','B(weekly)'],i='id',...j='year',sep='-')... X A(weekly) B(weekly)id year0 2010 0 0.548814 0.4375871 2010 1 0.715189 0.8917732 2010 1 0.602763 0.9636630 2011 0 0.544883 0.3834421 2011 1 0.423655 0.7917252 2011 1 0.645894 0.528895
If we have many columns, we could also use a regex to find ourstubnames and pass that list on to wide_to_long
>>>stubnames=sorted(...set([match[0]formatchindf.columns.str.findall(...r'[A-B]\(.*\)').valuesifmatch!=[]])...)>>>list(stubnames)['A(weekly)', 'B(weekly)']
All of the above examples have integers as suffixes. It is possible tohave non-integers as suffixes.
>>>df=pd.DataFrame({...'famid':[1,1,1,2,2,2,3,3,3],...'birth':[1,2,3,1,2,3,1,2,3],...'ht_one':[2.8,2.9,2.2,2,1.8,1.9,2.2,2.3,2.1],...'ht_two':[3.4,3.8,2.9,3.2,2.8,2.4,3.3,3.4,2.9]...})>>>df famid birth ht_one ht_two0 1 1 2.8 3.41 1 2 2.9 3.82 1 3 2.2 2.93 2 1 2.0 3.24 2 2 1.8 2.85 2 3 1.9 2.46 3 1 2.2 3.37 3 2 2.3 3.48 3 3 2.1 2.9
>>>l=pd.wide_to_long(df,stubnames='ht',i=['famid','birth'],j='age',...sep='_',suffix=r'\w+')>>>l... htfamid birth age1 1 one 2.8 two 3.4 2 one 2.9 two 3.8 3 one 2.2 two 2.92 1 one 2.0 two 3.2 2 one 1.8 two 2.8 3 one 1.9 two 2.43 1 one 2.2 two 3.3 2 one 2.3 two 3.4 3 one 2.1 two 2.9