- API reference
- DataFrame
- pandas.DataF...
pandas.DataFrame.join#
- DataFrame.join(other,on=None,how='left',lsuffix='',rsuffix='',sort=False,validate=None)[source]#
Join columns of another DataFrame.
Join columns withother DataFrame either on index or on a keycolumn. Efficiently join multiple DataFrame objects by index at once bypassing a list.
- Parameters:
- otherDataFrame, Series, or a list containing any combination of them
Index should be similar to one of the columns in this one. If aSeries is passed, its name attribute must be set, and that will beused as the column name in the resulting joined DataFrame.
- onstr, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the indexinother, otherwise joins index-on-index. If multiplevalues given, theother DataFrame must have a MultiIndex. Canpass an array as the join key if it is not already contained inthe calling DataFrame. Like an Excel VLOOKUP operation.
- how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘left’
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: useother’s index.
outer: form union of calling frame’s index (or column if on isspecified) withother’s index, and sort it lexicographically.
inner: form intersection of calling frame’s index (or column ifon is specified) withother’s index, preserving the orderof the calling’s one.
cross: creates the cartesian product from both frames, preserves the orderof the left keys.
- lsuffixstr, default ‘’
Suffix to use from left frame’s overlapping columns.
- rsuffixstr, default ‘’
Suffix to use from right frame’s overlapping columns.
- sortbool, default False
Order result DataFrame lexicographically by the join key. If False,the order of the join key depends on the join type (how keyword).
- validatestr, optional
If specified, checks if join is of specified type.
“one_to_one” or “1:1”: check if join keys are unique in both leftand right datasets.
“one_to_many” or “1:m”: check if join keys are unique in left dataset.
“many_to_one” or “m:1”: check if join keys are unique in right dataset.
“many_to_many” or “m:m”: allowed, but does not result in checks.
Added in version 1.5.0.
- Returns:
- DataFrame
A dataframe containing columns from both the caller andother.
See also
DataFrame.merge
For column(s)-on-column(s) operations.
Notes
Parameterson,lsuffix, andrsuffix are not supported whenpassing a list ofDataFrame objects.
Examples
>>>df=pd.DataFrame({'key':['K0','K1','K2','K3','K4','K5'],...'A':['A0','A1','A2','A3','A4','A5']})
>>>df key A0 K0 A01 K1 A12 K2 A23 K3 A34 K4 A45 K5 A5
>>>other=pd.DataFrame({'key':['K0','K1','K2'],...'B':['B0','B1','B2']})
>>>other key B0 K0 B01 K1 B12 K2 B2
Join DataFrames using their indexes.
>>>df.join(other,lsuffix='_caller',rsuffix='_other') key_caller A key_other B0 K0 A0 K0 B01 K1 A1 K1 B12 K2 A2 K2 B23 K3 A3 NaN NaN4 K4 A4 NaN NaN5 K5 A5 NaN NaN
If we want to join using the key columns, we need to set key to bethe index in bothdf andother. The joined DataFrame will havekey as its index.
>>>df.set_index('key').join(other.set_index('key')) A BkeyK0 A0 B0K1 A1 B1K2 A2 B2K3 A3 NaNK4 A4 NaNK5 A5 NaN
Another option to join using the key columns is to use theonparameter. DataFrame.join always usesother’s index but we can useany column indf. This method preserves the original DataFrame’sindex in the result.
>>>df.join(other.set_index('key'),on='key') key A B0 K0 A0 B01 K1 A1 B12 K2 A2 B23 K3 A3 NaN4 K4 A4 NaN5 K5 A5 NaN
Using non-unique key values shows how they are matched.
>>>df=pd.DataFrame({'key':['K0','K1','K1','K3','K0','K1'],...'A':['A0','A1','A2','A3','A4','A5']})
>>>df key A0 K0 A01 K1 A12 K1 A23 K3 A34 K0 A45 K1 A5
>>>df.join(other.set_index('key'),on='key',validate='m:1') key A B0 K0 A0 B01 K1 A1 B12 K1 A2 B13 K3 A3 NaN4 K0 A4 B05 K1 A5 B1