Movatterモバイル変換


[0]ホーム

URL:


Python Pandas Tutorial

Python Pandas - Merging/Joining



Pandas provides high-performance, in-memory join operations similar to those in SQL databases. These operations allow you to merge multipleDataFrame objects based on common keys or indexes efficiently.

The merge() Method in Pandas

TheDataFrame.merge() method in Pandas enables merging of DataFrame or named Series objects using database-style joins. A named Series is treated as a DataFrame with a single named column. Joins can be performed on columns or indexes.

If merging on columns, DataFrame indexes are ignored. If merging on indexes or indexes with columns, then the index will remains the same. However, in cross merges (how='cross'), you cannot specify column names for merging.

Below is the syntax of this method −

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)

The key parameters are −

  • right: A DataFrame or a named Series to merge with.

  • on: Columns (names) to join on. Must be found in both the DataFrame objects.

  • left_on: Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.

  • right_on: Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.

  • left_index: IfTrue, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.

  • right_index: Same usage asleft_index for the right DataFrame.

  • how: Determines type of join operation, available options are 'left', 'right', 'outer', 'inner', and 'cross'. Defaults to 'inner'. Each method has been described below.

  • sort: Sort the result DataFrame by the join keys in lexicographical order. Defaults toTrue, setting toFalse will improve the performance substantially in many cases.

Example

Let's create two DataFrames and perform merge operations on them.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})print("Left DataFrame:")print(left)print("\nRight DataFrame:")print(right)

Itsoutput is as follows −

Left DataFrame:
idNamesubject_id
01Alexsub1
12Amysub2
23Allensub4
34Alicesub6
45Ayoungsub5
Right DataFrame:
idNamesubject_id
01Billysub2
12Briansub4
23Bransub3
34Brycesub6
45Bettysub5

Merge Two DataFrames on a Key

You can merge two DataFrames using a common key column by specifying the column name in theon parameter of themerge() method.

Example

The following example demonstrates how to merge two DataFrames on a key using theDataFrame.merge() method.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})# Merging DataFrames on a key 'id'result = left.merge(right, on='id')print(result)

Itsoutput is as follows −

idName_xsubject_id_xName_ysubject_id_y
01Alexsub1Billysub2
12Amysub2Briansub4
23Allensub4Bransub3
34Alicesub6Brycesub6
45Ayoungsub5Bettysub5

Merge Two DataFrames on Multiple Keys

To merge two DataFrames on multiple keys, provide a list of column names to theon parameter.

Example

The following example demonstrates how to merge DataFrames on multiple keys using themerge() method.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id':[1,2,3,4,5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id':['sub2','sub4','sub3','sub6','sub5']})# Merging on multiple keys 'id' and 'subject_id'result = left.merge(right, on=['id', 'subject_id'])print(result)

Itsoutput is as follows −

idName_xsubject_idName_y
04Alicesub6Bryce
15Ayoungsub5Betty

Merge Using 'how' Argument

Thehow argument determines which keys to include in the resulting DataFrame. If a key combination does not appear in either the left or right DataFrame, the values in the joined table will beNaN.

Merge Methods and Their SQL Equivalents

The following table summarizes thehow options and their SQL equivalents −

Merge MethodSQL EquivalentDescription
leftLEFT OUTER JOINUse keys from left object
rightRIGHT OUTER JOINUse keys from right object
outerFULL OUTER JOINUnion of keys from both DataFrames.
innerINNER JOINIntersection of keys from both DataFrames.

Example: Left Join

This example demonstrates merging the DataFrame by using theleft method.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})# Merging DataFrames using the left join method print(left.merge(right, on='subject_id', how='left'))

Itsoutput is as follows −

id_xName_xsubject_idid_yName_y
01Alexsub1NaNNaN
12Amysub21.0Billy
23Allensub42.0Brian
34Alicesub64.0Bryce
45Ayoungsub55.0Betty

Example: Right Join

This example performs the right join operation on two DataFrames using themerge() method by setting thehow='right'.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})# Merging DataFrames using the right join methodprint(left.merge(right, on='subject_id', how='right'))

Itsoutput is as follows −

id_xName_xsubject_idid_yName_y
02.0Amysub21Billy
13.0Allensub42Brian
2NaNNaNsub33Bran
34.0Alicesub64Bryce
45.0Ayoungsub55Betty

Example: Outer Join

This example will apply the outer join operation on two DataFrames by specifying thehow='outer' in themerge() method.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})# Merging the DataFrames using the outer join print(left.merge(right, how='outer', on='subject_id'))

Itsoutput is as follows −

id_xName_xsubject_idid_yName_y
01.0Alexsub1NaNNaN
12.0Amysub21.0Billy
23.0Allensub42.0Brian
34.0Alicesub64.0Bryce
45.0Ayoungsub55.0Betty
5NaNNaNsub33.0Bran

Inner Join

Joining will be performed on index. Join operation honors the object on which it is called.So,a.join(b) is not equal tob.join(a).

Example

The following example demonstrates how to apply inner join operation to the two DataFrames using theDataFrame.merge() method.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})# Merge the DataFrames using the inner join methodprint(left.merge(right, on='subject_id', how='inner'))

Itsoutput is as follows −

id_xName_xsubject_idid_yName_y
02Amysub21Billy
13Allensub42Brian
24Alicesub64Bryce
35Ayoungsub55Betty

The join() Method in Pandas

Pandas also provides aDataFrame.join() method, which is useful for merging DataFrames based on their index. It works similarly toDataFrame.merge() but is more efficient for index-based operations.

Below is the syntax of this method −

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='')

Example

This example demonstrates how to use theDataFrame.join() method for merging DataFrames using indexes instead of columns.

import pandas as pd# Creating the first DataFrameleft = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})# Creating the second DataFrameright = pd.DataFrame({'id': [1, 2, 3, 4, 5],'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})# Merge the DataFrames using the join() methodresult = left.join(right, lsuffix='_left', rsuffix='_right')print(result)

Itsoutput is as follows −

id_leftName_leftsubject_id_leftid_rightName_rightsubject_id_right
01Alexsub11Billysub2
12Amysub22Briansub4
23Allensub43Bransub3
34Alicesub64Brycesub6
45Ayoungsub55Bettysub5
Print Page
Advertisements

[8]ページ先頭

©2009-2025 Movatter.jp