bigframes.pandas.DataFrame.merge#
- DataFrame.merge(right:DataFrame,how:Literal['inner','left','outer','right','cross']='inner',on:Hashable|Sequence[Hashable]|None=None,*,left_on:Hashable|Sequence[Hashable]|None=None,right_on:Hashable|Sequence[Hashable]|None=None,left_index:bool=False,right_index:bool=False,sort:bool=False,suffixes:tuple[str,str]=('_x','_y'))→DataFrame[source]#
Merge DataFrame objects with a database-style join.
The join is done on columns or indexes. If joining columns oncolumns, the DataFrame indexeswill be ignored. Otherwise if joining indexeson indexes or indexes on a column or columns, the index will be passed on.When performing a cross merge, no column specifications to merge on areallowed.
Warning
If both key columns contain rows where the key is a null value, thoserows will be matched against each other. This is different from usual SQLjoin behaviour and can lead to unexpected results.
Examples:
Merge DataFrames df1 and df2 by specifying type of merge:
>>>df1=bpd.DataFrame({'a':['foo','bar'],'b':[1,2]})>>>df1 a b0 foo 11 bar 2[2 rows x 2 columns]
>>>df2=bpd.DataFrame({'a':['foo','baz'],'c':[3,4]})>>>df2 a c0 foo 31 baz 4[2 rows x 2 columns]
>>>df1.merge(df2,how="inner",on="a") a b c0 foo 1 3[1 rows x 3 columns]
>>>df1.merge(df2,how='left',on='a') a b c0 foo 1 31 bar 2 <NA>[2 rows x 3 columns]
Merge df1 and df2 on the lkey and rkey columns. The value columns havethe default suffixes, _x and _y, appended.
>>>df1=bpd.DataFrame({'lkey':['foo','bar','baz','foo'],...'value':[1,2,3,5]})>>>df1 lkey value0 foo 11 bar 22 baz 33 foo 5[4 rows x 2 columns]
>>>df2=bpd.DataFrame({'rkey':['foo','bar','baz','foo'],...'value':[5,6,7,8]})>>>df2 rkey value0 foo 51 bar 62 baz 73 foo 8[4 rows x 2 columns]
>>>df1.merge(df2,left_on='lkey',right_on='rkey') lkey value_x rkey value_y0 foo 1 foo 51 foo 1 foo 82 bar 2 bar 63 baz 3 baz 74 foo 5 foo 55 foo 5 foo 8[6 rows x 4 columns]
- Parameters:
right – Object to merge with.
how –
{'left','right','outer','inner','cross'},default'inner'Type of merge to be performed.left: use only keys from left frame, similar to a SQL left outer join;preserve key order.right: use only keys from right frame, similar to a SQL right outer join;preserve key order.outer: use union of keys from both frames, similar to a SQL full outerjoin; sort keys lexicographically.inner: use intersection of keys from both frames, similar to a SQL innerjoin; preserve the order of the left keys.cross: creates the cartesian product from both frames, preserves the orderof the left keys.on (label orlist oflabels) – Columns to join on. It must be found in both DataFrames. Either on or left_on + right_onmust be passed in.
left_on (label orlist oflabels) – Columns to join on in the left DataFrame. Either on or left_on + right_onmust be passed in.
right_on (label orlist oflabels) – Columns to join on in the right DataFrame. Either on or left_on + right_onmust be passed in.
left_index (bool,default False) – Use the index from the left DataFrame as the join key.
right_index (bool,default False) – Use the index from the right DataFrame as the join key.
sort – Default False. Sort the join keys lexicographically in theresult DataFrame. If False, the order of the join keys dependson the join type (how keyword).
suffixes – Default
("_x","_y"). A length-2 sequence where eachelement is optionally a string indicating the suffix to add tooverlapping column names inleft andright respectively.Pass a value ofNone instead of a string to indicate that thecolumn name fromleft orright should be left as-is, withno suffix. At least one of the values must not be None.
- Returns:
A DataFrame of the two merged objects.
- Return type:
- Raises:
ValueError – If value for
onis specified for cross join.ValueError – If
onorleft_on+right_onare not specified whenonisNone.ValueError – If
onandleft_on+right_onare specified whenonis notNone.ValueError – If no column with the provided label is found in
selffor left join.ValueError – If no column with the provided label is found in
selffor right join.