

Python Pandas Tutorial

Python Pandas - GroupBy



Pandas groupby() is an essential method for data aggregation and analysis in Python. It follows the "Split-Apply-Combine" pattern, which means it allows users to −

  • Split data into groups based on specific criteria.

  • Apply functions independently to each group.

  • Combine the results into a structured format.
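To make the three steps concrete, the sketch below performs them by hand on the IPL dataset used later in this tutorial, then compares the result with a single groupby() call. The variable names (pieces, totals, manual, builtin) are illustrative only, and summing the Points column is just one possible function to apply. Note that 'kings' (lowercase) ends up in its own group, because group keys are matched exactly, case included.

```python
import pandas as pd

# The same IPL dataset used in the examples of this tutorial
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Split: one sub-DataFrame per distinct Team value
pieces = {team: df[df['Team'] == team] for team in df['Team'].unique()}

# Apply: compute the total points of each piece independently
totals = {team: piece['Points'].sum() for team, piece in pieces.items()}

# Combine: gather the per-group results into one Series, sorted by key
manual = pd.Series(totals).sort_index()

# groupby() performs all three steps in one call (keys are sorted by default)
builtin = df.groupby('Team')['Points'].sum()

print(manual)
print(builtin)
```

Both Series hold the same totals under the same sorted team names; groupby() simply saves you from writing the split and combine steps yourself.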

In this tutorial, we will learn the basics of groupby operations in pandas, such as splitting data, viewing groups, and selecting specific groups, using an example dataset.

Introduction to GroupBy Operations

Every groupby() operation involves three key steps: splitting the data into groups based on some criteria, applying functions independently to each group, and combining the results into a meaningful structure.

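In many situations, we apply a function to each of the split groups. In the apply step, we can perform the following operations −

  • Aggregation: computing a summary statistic, such as a sum or a mean, for each group.

  • Transformation: performing a group-specific operation that returns a result aligned with the original rows.

  • Filtration: discarding whole groups based on a condition evaluated on each group.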
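The three kinds of apply operations commonly distinguished for groupby() are aggregation, transformation and filtration. The sketch below shows one of each on the IPL dataset from this tutorial; the particular functions (mean points per team, points centered on the team mean, teams with at least three rows) are arbitrary choices made for illustration.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Team')

# Aggregation: one summary value per group (mean points of each team)
mean_points = grouped['Points'].mean()

# Transformation: one value per original row
# (each row's points minus the mean points of its team)
centered = grouped['Points'].transform(lambda s: s - s.mean())

# Filtration: keep only the rows of teams that appear at least 3 times
frequent = grouped.filter(lambda g: len(g) >= 3)

print(mean_points)
print(centered)
print(frequent)
```

Aggregation shrinks each group to a single value, transformation keeps the original row index, and filtration returns a subset of the original rows unchanged.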

Split Data into Groups

Pandas objects can be split into groups based on the values of any of their columns using the groupby() method.

Example

Let us now see how to group the rows of a Pandas DataFrame using the groupby() method.

# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
   'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
   'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
   'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
   'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Display the Original DataFrame
print("Original DataFrame:")
print(df)

# Display the Grouped Data
print('\nGrouped Data:')
print(df.groupby('Team'))

Following is the output of the above code −

Original DataFrame:
      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
2   Devils     2  2014     863
3   Devils     3  2015     673
4    Kings     3  2014     741
5    kings     4  2015     812
6    Kings     1  2016     756
7    Kings     1  2017     788
8   Riders     2  2016     694
9   Royals     4  2014     701
10  Royals     1  2015     804
11  Riders     2  2017     690

Grouped Data:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1a11545060>
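Printing df.groupby('Team') shows only a DataFrameGroupBy object, because groupby() returns an intermediate object rather than the groups themselves. One way to look at the groups is to iterate over that object, which yields (key, sub-DataFrame) pairs. A short sketch on the same dataset; the sizes dictionary is only an illustrative way to record each group's row count.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Team')

# Iterating yields (group key, sub-DataFrame) pairs, in sorted key order
sizes = {}
for name, group in grouped:
    sizes[name] = len(group)
    print(name)
    print(group)
    print()
```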

GroupBy with Multiple Columns

You can group data by multiple columns by passing a list of column names to the groupby() method.

Example

Here is an example where the data is grouped by multiple columns.

# import the pandas library
import pandas as pd

# Create a DataFrame
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
   'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
   'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
   'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
   'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Display the Grouped Data
print('Grouped Data:')
print(df.groupby(['Team', 'Year']).groups)

Its output is as follows −

Grouped Data:
{('Devils', 2014): [2], ('Devils', 2015): [3], ('Kings', 2014): [4], ('Kings', 2016): [6], ('Kings', 2017): [7], ('Riders', 2014): [0], ('Riders', 2015): [1], ('Riders', 2016): [8], ('Riders', 2017): [11], ('Royals', 2014): [9], ('Royals', 2015): [10], ('kings', 2015): [5]}
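Grouping by several columns is usually followed by an aggregation. In the sketch below, summing Points per (Team, Year) pair produces a Series indexed by a MultiIndex with one level per grouping column, and a single combination can then be looked up with a tuple. The choice of sum() is only an example; because every (Team, Year) pair occurs once in this dataset, each total simply equals that row's points.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Group by two columns, then aggregate one column
grouped = df.groupby(['Team', 'Year'])
points = grouped['Points'].sum()
print(points)

# The result has a MultiIndex with one level per grouping column
print(points.index.names)

# Look up a single (Team, Year) combination with a tuple
print(points.loc[('Riders', 2016)])
```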

Viewing Grouped Data

Once your data is split into groups, you can view them in different ways. One of the simplest is the .groups attribute, which shows how the groups are stored internally: a dictionary that maps each group key to the index labels of the rows in that group.

Example

The following example demonstrates how to view the grouped data using the .groups attribute.

# import the pandas library
import pandas as pd

# Create DataFrame
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
   'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
   'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
   'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
   'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

print('Viewing Grouped Data:')
print(df.groupby('Team').groups)

Its output is as follows −

Viewing Grouped Data:
{'Devils': [2, 3], 'Kings': [4, 6, 7], 'Riders': [0, 1, 8, 11], 'Royals': [9, 10], 'kings': [5]}
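Besides .groups, a GroupBy object offers a few other quick summaries. The sketch below, on the same dataset, uses ngroups (the number of groups), size() (the number of rows in each group) and indices (integer positions of each group's rows instead of index labels; with the default RangeIndex used here, positions and labels coincide). The variable names are illustrative.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Team')

# Number of distinct groups
n_groups = grouped.ngroups
print(n_groups)

# Row count of each group, as a Series indexed by Team
group_sizes = grouped.size()
print(group_sizes)

# Integer positions of each group's rows (dict of NumPy arrays)
positions = grouped.indices
print(positions)
```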

Selecting a Specific Group

Using the get_group() method, we can select a single group by its key.

Example

The following example demonstrates selecting a group from grouped data using the get_group() method.

# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
   'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
   'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
   'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
   'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

grouped = df.groupby('Year')

# Display the Selected Data
print('Selected Group Data:')
print(grouped.get_group(2014))

Its output is as follows −

Selected Group Data:
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
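get_group() also works when grouping by several columns; the key is then a tuple with one value per grouping column. A small sketch on the same dataset; asking for a combination that does not exist raises a KeyError, which the example catches. The names riders_2016 and missing_key_raised are illustrative only.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Group by two columns: each group key is a (Team, Year) tuple
grouped = df.groupby(['Team', 'Year'])

# Select the rows of one (Team, Year) combination
riders_2016 = grouped.get_group(('Riders', 2016))
print(riders_2016)

# A key with no matching rows raises KeyError
try:
    grouped.get_group(('Riders', 2020))
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
print('KeyError raised:', missing_key_raised)
```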