bigframes.pandas.DataFrame.merge#

DataFrame.merge(right:DataFrame,how:Literal['inner','left','outer','right','cross']='inner',on:Hashable|Sequence[Hashable]|None=None,*,left_on:Hashable|Sequence[Hashable]|None=None,right_on:Hashable|Sequence[Hashable]|None=None,left_index:bool=False,right_index:bool=False,sort:bool=False,suffixes:tuple[str,str]=('_x','_y'))DataFrame[source]#

Merge DataFrame objects with a database-style join.

The join is done on columns or indexes. If joining columns oncolumns, the DataFrame indexeswill be ignored. Otherwise if joining indexeson indexes or indexes on a column or columns, the index will be passed on.When performing a cross merge, no column specifications to merge on areallowed.

Warning

If both key columns contain rows where the key is a null value, thoserows will be matched against each other. This is different from usual SQLjoin behaviour and can lead to unexpected results.

Examples:

Merge DataFrames df1 and df2 by specifying type of merge:

>>>df1=bpd.DataFrame({'a':['foo','bar'],'b':[1,2]})>>>df1     a  b0  foo  11  bar  2[2 rows x 2 columns]
>>>df2=bpd.DataFrame({'a':['foo','baz'],'c':[3,4]})>>>df2     a  c0  foo  31  baz  4[2 rows x 2 columns]
>>>df1.merge(df2,how="inner",on="a")     a  b  c0  foo  1  3[1 rows x 3 columns]
>>>df1.merge(df2,how='left',on='a')     a  b     c0  foo  1     31  bar  2  <NA>[2 rows x 3 columns]

Merge df1 and df2 on the lkey and rkey columns. The value columns havethe default suffixes, _x and _y, appended.

>>>df1=bpd.DataFrame({'lkey':['foo','bar','baz','foo'],...'value':[1,2,3,5]})>>>df1  lkey  value0  foo      11  bar      22  baz      33  foo      5[4 rows x 2 columns]
>>>df2=bpd.DataFrame({'rkey':['foo','bar','baz','foo'],...'value':[5,6,7,8]})>>>df2  rkey  value0  foo      51  bar      62  baz      73  foo      8[4 rows x 2 columns]
>>>df1.merge(df2,left_on='lkey',right_on='rkey')  lkey  value_x rkey  value_y0  foo        1  foo        51  foo        1  foo        82  bar        2  bar        63  baz        3  baz        74  foo        5  foo        55  foo        5  foo        8[6 rows x 4 columns]
Parameters:
  • right – Object to merge with.

  • how{'left','right','outer','inner','cross'},default'inner'Type of merge to be performed.left: use only keys from left frame, similar to a SQL left outer join;preserve key order.right: use only keys from right frame, similar to a SQL right outer join;preserve key order.outer: use union of keys from both frames, similar to a SQL full outerjoin; sort keys lexicographically.inner: use intersection of keys from both frames, similar to a SQL innerjoin; preserve the order of the left keys.cross: creates the cartesian product from both frames, preserves the orderof the left keys.

  • on (label orlist oflabels) – Columns to join on. It must be found in both DataFrames. Either on or left_on + right_onmust be passed in.

  • left_on (label orlist oflabels) – Columns to join on in the left DataFrame. Either on or left_on + right_onmust be passed in.

  • right_on (label orlist oflabels) – Columns to join on in the right DataFrame. Either on or left_on + right_onmust be passed in.

  • left_index (bool,default False) – Use the index from the left DataFrame as the join key.

  • right_index (bool,default False) – Use the index from the right DataFrame as the join key.

  • sort – Default False. Sort the join keys lexicographically in theresult DataFrame. If False, the order of the join keys dependson the join type (how keyword).

  • suffixes – Default("_x","_y"). A length-2 sequence where eachelement is optionally a string indicating the suffix to add tooverlapping column names inleft andright respectively.Pass a value ofNone instead of a string to indicate that thecolumn name fromleft orright should be left as-is, withno suffix. At least one of the values must not be None.

Returns:

A DataFrame of the two merged objects.

Return type:

bigframes.pandas.DataFrame

Raises:
  • ValueError – If value foron is specified for cross join.

  • ValueError – Ifon orleft_on +right_on are not specified whenon isNone.

  • ValueError – Ifon andleft_on +right_on are specified whenon is notNone.

  • ValueError – If no column with the provided label is found inself for left join.

  • ValueError – If no column with the provided label is found inself for right join.

On this page

This Page