Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
Ctrl+K

pandas.wide_to_long#

pandas.wide_to_long(df,stubnames,i,j,sep='',suffix='\\d+')[source]#

Unpivot a DataFrame from wide to long format.

Less flexible but more user-friendly than melt.

With stubnames [‘A’, ‘B’], this function expects to find one or moregroup of columns with formatA-suffix1, A-suffix2,…, B-suffix1, B-suffix2,…You specify what you want to call this suffix in the resulting long formatwithj (for examplej=’year’)

Each row of these wide variables are assumed to be uniquely identified byi (can be a single column name or a list of column names)

All remaining variables in the data frame are left intact.

Parameters:
dfDataFrame

The wide-format DataFrame.

stubnamesstr or list-like

The stub name(s). The wide format variables are assumed tostart with the stub names.

istr or list-like

Column(s) to use as id variable(s).

jstr

The name of the sub-observation variable. What you wish to name yoursuffix in the long format.

sepstr, default “”

A character indicating the separation of the variable namesin the wide format, to be stripped from the names in the long format.For example, if your column names are A-suffix1, A-suffix2, youcan strip the hyphen by specifyingsep=’-’.

suffixstr, default ‘\d+’

A regular expression capturing the wanted suffixes. ‘\d+’ capturesnumeric suffixes. Suffixes with no numbers could be specified with thenegated character class ‘\D+’. You can also further disambiguatesuffixes, for example, if your wide variables are of the form A-one,B-two,.., and you have an unrelated column A-rating, you can ignore thelast one by specifyingsuffix=’(!?one|two)’. When all suffixes arenumeric, they are cast to int64/float64.

Returns:
DataFrame

A DataFrame that contains each stub name as a variable, with new index(i, j).

See also

melt

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

pivot

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.pivot

Pivot without aggregation that can handle non-numeric data.

DataFrame.pivot_table

Generalization of pivot that can handle duplicate values for one index/column pair.

DataFrame.unstack

Pivot based on the index values instead of a column.

Notes

All extra variables are left untouched. This simply usespandas.melt under the hood, but is hard-coded to “do the right thing”in a typical case.

Examples

>>>np.random.seed(123)>>>df=pd.DataFrame({"A1970":{0:"a",1:"b",2:"c"},..."A1980":{0:"d",1:"e",2:"f"},..."B1970":{0:2.5,1:1.2,2:.7},..."B1980":{0:3.2,1:1.3,2:.1},..."X":dict(zip(range(3),np.random.randn(3)))...})>>>df["id"]=df.index>>>df  A1970 A1980  B1970  B1980         X  id0     a     d    2.5    3.2 -1.085631   01     b     e    1.2    1.3  0.997345   12     c     f    0.7    0.1  0.282978   2>>>pd.wide_to_long(df,["A","B"],i="id",j="year")...                X  A    Bid year0  1970 -1.085631  a  2.51  1970  0.997345  b  1.22  1970  0.282978  c  0.70  1980 -1.085631  d  3.21  1980  0.997345  e  1.32  1980  0.282978  f  0.1

With multiple id columns

>>>df=pd.DataFrame({...'famid':[1,1,1,2,2,2,3,3,3],...'birth':[1,2,3,1,2,3,1,2,3],...'ht1':[2.8,2.9,2.2,2,1.8,1.9,2.2,2.3,2.1],...'ht2':[3.4,3.8,2.9,3.2,2.8,2.4,3.3,3.4,2.9]...})>>>df   famid  birth  ht1  ht20      1      1  2.8  3.41      1      2  2.9  3.82      1      3  2.2  2.93      2      1  2.0  3.24      2      2  1.8  2.85      2      3  1.9  2.46      3      1  2.2  3.37      3      2  2.3  3.48      3      3  2.1  2.9>>>l=pd.wide_to_long(df,stubnames='ht',i=['famid','birth'],j='age')>>>l...                  htfamid birth age1     1     1    2.8            2    3.4      2     1    2.9            2    3.8      3     1    2.2            2    2.92     1     1    2.0            2    3.2      2     1    1.8            2    2.8      3     1    1.9            2    2.43     1     1    2.2            2    3.3      2     1    2.3            2    3.4      3     1    2.1            2    2.9

Going from long back to wide just takes some creative use ofunstack

>>>w=l.unstack()>>>w.columns=w.columns.map('{0[0]}{0[1]}'.format)>>>w.reset_index()   famid  birth  ht1  ht20      1      1  2.8  3.41      1      2  2.9  3.82      1      3  2.2  2.93      2      1  2.0  3.24      2      2  1.8  2.85      2      3  1.9  2.46      3      1  2.2  3.37      3      2  2.3  3.48      3      3  2.1  2.9

Less wieldy column names are also handled

>>>np.random.seed(0)>>>df=pd.DataFrame({'A(weekly)-2010':np.random.rand(3),...'A(weekly)-2011':np.random.rand(3),...'B(weekly)-2010':np.random.rand(3),...'B(weekly)-2011':np.random.rand(3),...'X':np.random.randint(3,size=3)})>>>df['id']=df.index>>>df   A(weekly)-2010  A(weekly)-2011  B(weekly)-2010  B(weekly)-2011  X  id0        0.548814        0.544883        0.437587        0.383442  0   01        0.715189        0.423655        0.891773        0.791725  1   12        0.602763        0.645894        0.963663        0.528895  1   2
>>>pd.wide_to_long(df,['A(weekly)','B(weekly)'],i='id',...j='year',sep='-')...         X  A(weekly)  B(weekly)id year0  2010  0   0.548814   0.4375871  2010  1   0.715189   0.8917732  2010  1   0.602763   0.9636630  2011  0   0.544883   0.3834421  2011  1   0.423655   0.7917252  2011  1   0.645894   0.528895

If we have many columns, we could also use a regex to find ourstubnames and pass that list on to wide_to_long

>>>stubnames=sorted(...set([match[0]formatchindf.columns.str.findall(...r'[A-B]\(.*\)').valuesifmatch!=[]])...)>>>list(stubnames)['A(weekly)', 'B(weekly)']

All of the above examples have integers as suffixes. It is possible tohave non-integers as suffixes.

>>>df=pd.DataFrame({...'famid':[1,1,1,2,2,2,3,3,3],...'birth':[1,2,3,1,2,3,1,2,3],...'ht_one':[2.8,2.9,2.2,2,1.8,1.9,2.2,2.3,2.1],...'ht_two':[3.4,3.8,2.9,3.2,2.8,2.4,3.3,3.4,2.9]...})>>>df   famid  birth  ht_one  ht_two0      1      1     2.8     3.41      1      2     2.9     3.82      1      3     2.2     2.93      2      1     2.0     3.24      2      2     1.8     2.85      2      3     1.9     2.46      3      1     2.2     3.37      3      2     2.3     3.48      3      3     2.1     2.9
>>>l=pd.wide_to_long(df,stubnames='ht',i=['famid','birth'],j='age',...sep='_',suffix=r'\w+')>>>l...                  htfamid birth age1     1     one  2.8            two  3.4      2     one  2.9            two  3.8      3     one  2.2            two  2.92     1     one  2.0            two  3.2      2     one  1.8            two  2.8      3     one  1.9            two  2.43     1     one  2.2            two  3.3      2     one  2.3            two  3.4      3     one  2.1            two  2.9

[8]ページ先頭

©2009-2025 Movatter.jp