Python Pandas Tutorial

Python Pandas - Concatenation



Concatenation in Pandas refers to the process of joining two or more Pandas objects (like DataFrames or Series) along a specified axis. This operation is very useful when you need to merge data from different sources or datasets.

The primary tool for this operation is the pd.concat() function, which works with both Series and DataFrame objects and can combine them either row-wise or column-wise.

In this tutorial, we'll explore how to concatenate Pandas objects using the pd.concat() function. We'll cover several scenarios: concatenating along rows, using keys to distinguish the concatenated DataFrames, ignoring indexes during concatenation, and concatenating along columns.

Understanding the pd.concat() Function

The pandas.concat() function is the primary method used for concatenation in Pandas. It allows you to concatenate pandas objects along a particular axis with various options for handling indexes.
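As a quick illustration of "along a particular axis", here is a minimal sketch using two small Series. The values and the names "a" and "b" are made up for this illustration and do not come from the examples later in this tutorial.

```python
import pandas as pd

# Two small Series; the data and the names "a" and "b" are illustrative only
s1 = pd.Series([1, 2], name="a")
s2 = pd.Series([3, 4], name="b")

# axis=0 (the default) stacks the values end to end and returns a Series
stacked = pd.concat([s1, s2])
print(type(stacked).__name__, len(stacked))

# axis=1 places the Series side by side and returns a DataFrame,
# using each Series name as a column label
side_by_side = pd.concat([s1, s2], axis=1)
print(type(side_by_side).__name__, list(side_by_side.columns))
```

Stacking along axis 0 keeps the original index labels, so the stacked Series carries the labels 0 and 1 twice.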

The syntax of the pd.concat() function is as follows:

pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)

Where,

  • objs: This is a sequence or mapping of Series or DataFrame objects.

  • axis: {0 or 'index', 1 or 'columns'}, default 0. This is the axis to concatenate along.

  • join: {"inner", "outer"}, default "outer". How to handle indexes on other axis(es). Outer for union and inner for intersection.

  • ignore_index: boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.

  • keys: Used to create a hierarchical index along the concatenation axis.

  • levels: Specific levels to use for the MultiIndex in the result.

  • names: Names for the levels in the resulting hierarchical index.

  • verify_integrity: If True, checks for duplicate entries in the new axis and raises an error if duplicates are found.

  • sort: boolean, default False. If True, sorts the non-concatenation axis (for example, the column labels when concatenating along rows) when the inputs are not already aligned.

  • copy: default None. If False, do not copy data unnecessarily.
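Several of these parameters are easier to see in action. The following is a minimal sketch, assuming two hypothetical DataFrames (left and right) that share only the column "b"; it contrasts join="outer" with join="inner" and shows verify_integrity=True rejecting duplicate index labels.

```python
import pandas as pd

# Hypothetical frames that share only column "b"; both use the default index 0, 1
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join="outer" (the default) keeps the union of the columns;
# cells with no source value are filled with NaN
outer = pd.concat([left, right], join="outer")
print(list(outer.columns))

# join="inner" keeps only the columns present in every input
inner = pd.concat([left, right], join="inner")
print(list(inner.columns))

# verify_integrity=True raises ValueError because both inputs
# carry the index labels 0 and 1
try:
    pd.concat([left, right], verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)
```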

The concat() function does all of the heavy lifting of performing concatenation operations along an axis. Let us create different objects and concatenate them.

Example: Concatenating DataFrames

In this example, the two DataFrames are concatenated along rows, with the resulting DataFrame having duplicated indices.

import pandas as pd

# Creating two DataFrames
one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Concatenating DataFrames
result = pd.concat([one, two])
print(result)

Its output is as follows:

     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
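The duplicated index labels have a practical consequence for label-based lookups. The sketch below uses shortened two-row versions of the DataFrames, trimmed here purely for brevity, to show that a single label can now match more than one row.

```python
import pandas as pd

# Shortened versions of the two DataFrames, both indexed 1 and 2
one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

result = pd.concat([one, two])

# The index is no longer unique after concatenation
print(result.index.is_unique)

# A lookup by label returns every row that carries that label
rows_for_label_1 = result.loc[1]
print(rows_for_label_1["Name"].tolist())
```

If each row needs its own label, the keys and ignore_index options covered in this tutorial are two ways to get there.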

Example: Concatenating with Keys

If you want to distinguish between the concatenated DataFrames, you can use the keys parameter to associate specific keys with each part of the DataFrame.

import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Label the rows from each DataFrame with the keys 'x' and 'y'
print(pd.concat([one, two], keys=['x', 'y']))

Its output is as follows:

       Name subject_id  Marks_scored
x 1    Alex       sub1            98
  2     Amy       sub2            90
  3   Allen       sub4            87
  4   Alice       sub6            69
  5  Ayoung       sub5            78
y 1   Billy       sub2            89
  2   Brian       sub4            80
  3    Bran       sub3            79
  4   Bryce       sub6            97
  5   Betty       sub5            88

Here, the x and y keys create a hierarchical index, allowing easy identification of which original DataFrame each row came from.
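Once the keys are in place, they can be used for selection. This is a minimal sketch with shortened two-row DataFrames; the level names "source" and "row" passed to names are hypothetical labels chosen for this illustration.

```python
import pandas as pd

# Shortened versions of the two DataFrames, both indexed 1 and 2
one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

# names= labels the two index levels ("source" and "row" are made-up names)
result = pd.concat([one, two], keys=["x", "y"], names=["source", "row"])

# Select every row that came from the first DataFrame
from_x = result.loc["x"]
print(from_x["Name"].tolist())

# Select a single value by (key, original label)
name_y2 = result.loc[("y", 2), "Name"]
print(name_y2)
```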

Example: Ignoring Indexes During Concatenation

If the resulting object should get a fresh index of its own rather than reuse the original labels, set ignore_index to True.

import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# keys is passed here as well, to show that ignore_index=True overrides it
print(pd.concat([one, two], keys=['x', 'y'], ignore_index=True))

Its output is as follows:

     Name subject_id  Marks_scored
0    Alex       sub1            98
1     Amy       sub2            90
2   Allen       sub4            87
3   Alice       sub6            69
4  Ayoung       sub5            78
5   Billy       sub2            89
6   Brian       sub4            80
7    Bran       sub3            79
8   Bryce       sub6            97
9   Betty       sub5            88

Observe that the original index is replaced entirely by the labels 0 through 9, and the keys 'x' and 'y' are discarded as well.
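This behaviour can be checked directly on the result's index. The sketch below also shows one possible way to keep track of each row's origin alongside a fresh index, by adding a column before concatenating; that workaround and the column name "source" are assumptions made for this illustration, not part of the original example.

```python
import pandas as pd

# Shortened versions of the two DataFrames, both indexed 1 and 2
one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

# With ignore_index=True the keys do not appear in the result
result = pd.concat([one, two], keys=["x", "y"], ignore_index=True)
print(result.index.nlevels)
print(result.index.tolist())

# One option for keeping the origin: store it in a regular column first
# ("source" is a hypothetical column name)
labeled = pd.concat(
    [one.assign(source="x"), two.assign(source="y")],
    ignore_index=True,
)
print(labeled["source"].tolist())
```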

Example: Concatenating Along Columns

Instead of concatenating along rows, you can concatenate along columns by setting the axis parameter to 1.

import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# axis=1 places the DataFrames side by side, aligned on the index
print(pd.concat([one, two], axis=1))

Its output is as follows:

     Name subject_id  Marks_scored   Name subject_id  Marks_scored
1    Alex       sub1            98  Billy       sub2            89
2     Amy       sub2            90  Brian       sub4            80
3   Allen       sub4            87   Bran       sub3            79
4   Alice       sub6            69  Bryce       sub6            97
5  Ayoung       sub5            78  Betty       sub5            88
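In the output above, both DataFrames share the same index, so every row lines up and the column names simply repeat. The following sketch, using small hypothetical DataFrames with only partly overlapping indexes, shows how column-wise concatenation aligns on index labels, and how keys can disambiguate the repeated column names.

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only at label 2
one = pd.DataFrame({"Name": ["Alex", "Amy"]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"]}, index=[2, 3])

# Rows are aligned on index labels; keys become the outer column level
wide = pd.concat([one, two], axis=1, keys=["one", "two"])
print(wide.index.tolist())
print(wide.columns.tolist())

# Label 1 exists only in the first DataFrame, so the second block is NaN there
print(pd.isna(wide.loc[1, ("two", "Name")]))

# join="inner" keeps only the index labels present in both DataFrames
common = pd.concat([one, two], axis=1, join="inner")
print(common.index.tolist())
```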