Python Pandas Tutorial

Python Pandas - Sorting



Sorting is a fundamental operation when working with data in Pandas, whether you're organizing rows, columns, or specific values. Sorting can help you to arrange your data in a meaningful way for better understanding and easy analysis.

Pandas provides powerful tools for sorting your data efficiently, which can be done by labels or actual values. In this tutorial, we'll explore various methods for sorting data in Pandas, from basic sorting by index or column labels to more advanced techniques like sorting by multiple columns and choosing specific sorting algorithms.

Types of Sorting in Pandas

There are two kinds of sorting available in Pandas. They are −

  • Sorting by Label − This involves sorting the data based on the index labels.

  • Sorting by Value − This involves sorting data based on the actual values in the DataFrame or Series.
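The difference between the two kinds is easiest to see on a tiny Series whose labels and values are in different orders. The following is a minimal sketch; the Series s and its data are made up for illustration and are not part of the tutorial's examples.

```python
import pandas as pd

# Hypothetical data: the labels and the values are deliberately out of order.
s = pd.Series([30, 10, 20], index=["b", "c", "a"])

by_label = s.sort_index()    # rows ordered by index label: a, b, c
by_value = s.sort_values()   # rows ordered by stored value: 10, 20, 30

print(by_label)
print(by_value)
```

In both cases each value stays attached to its own label; only the order of the rows changes.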

Sorting by Label

To sort by index labels, use the sort_index() method. Its axis argument selects whether the row labels or the column labels are sorted, and its ascending argument sets the sort order. By default, the method sorts a DataFrame in ascending order of its row labels.

Example

The following basic example sorts a DataFrame by its row labels using the sort_index() method.

import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by its row labels
sorted_df = unsorted_df.sort_index()
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows (the numbers differ on every run because the data is random) −

Original DataFrame:
        col2      col1
1  1.116188  1.631727
4  0.287900 -1.097359
6  0.058885 -0.642273
2 -2.070172  0.148255
3 -1.458229  1.298907
5 -0.723663  2.220048
9 -1.271494  2.001025
8 -0.412954 -0.808688
0  0.922697 -0.429393
7 -0.476054 -0.351621

Output Sorted DataFrame:
        col2      col1
0  0.922697 -0.429393
1  1.116188  1.631727
2 -2.070172  0.148255
3 -1.458229  1.298907
4  0.287900 -1.097359
5 -0.723663  2.220048
6  0.058885 -0.642273
7 -0.476054 -0.351621
8 -0.412954 -0.808688
9 -1.271494  2.001025
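Because the example draws random numbers, a rerun will not reproduce the values printed above. If repeatable output is wanted, one option is a seeded NumPy generator. This is a sketch under an assumption: the seed value 0 and the names rng and df are arbitrary choices, not something the tutorial prescribes.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (0 here) is used only so that reruns are identical.
rng = np.random.default_rng(0)

df = pd.DataFrame(rng.standard_normal((10, 2)),
                  index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                  columns=['col2', 'col1'])

# Sorting by label reorders the rows; the values in each row are unchanged.
sorted_df = df.sort_index()
print(sorted_df.index.tolist())
```

The index of sorted_df runs from 0 to 9 regardless of the seed; only the cell values depend on it.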

Example − Controlling the Order of Sorting

Passing a Boolean value to the ascending parameter controls the sort order: True (the default) sorts in ascending order and False sorts in descending order. The following example sorts the row labels in descending order.

import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by its row labels in descending order
sorted_df = unsorted_df.sort_index(ascending=False)
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows (the values vary between runs) −

Original DataFrame:
        col2      col1
1 -0.668366  0.576422
4  0.605218 -0.066065
6  1.140478  0.236687
2  0.137617  0.312423
3 -0.055631  0.774057
5  0.108002  1.038820
9 -0.929134 -0.982358
8 -0.207542 -1.283386
0 -0.210571 -0.656371
7 -0.106388  0.672418

Output Sorted DataFrame:
        col2      col1
9 -0.929134 -0.982358
8 -0.207542 -1.283386
7 -0.106388  0.672418
6  1.140478  0.236687
5  0.108002  1.038820
4  0.605218 -0.066065
3 -0.055631  0.774057
2  0.137617  0.312423
1 -0.668366  0.576422
0 -0.210571 -0.656371

Example − Sort the Columns

The axis argument selects which labels are sorted. Passing axis=1 sorts by the column labels; the default, axis=0, sorts by the row labels. The following example sorts the columns.

import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(6, 4),
                           index=[1, 4, 2, 3, 5, 0],
                           columns=['col2', 'col1', 'col4', 'col3'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by its column labels
sorted_df = unsorted_df.sort_index(axis=1)
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows (the values vary between runs) −

Original DataFrame:
        col2      col1      col4      col3
1 -0.828951 -0.798286 -1.794752 -0.082656
4  0.440243 -0.693218 -0.218277 -0.790168
2  1.017670  1.443679 -1.939119 -1.887223
3 -0.992471 -1.425046  0.651336 -0.278247
5 -0.103537 -0.879433  0.471838  0.860885
0 -0.222297  1.094805  0.501531 -0.580382

Output Sorted DataFrame:
        col1      col2      col3      col4
1 -0.798286 -0.828951 -0.082656 -1.794752
4 -0.693218  0.440243 -0.790168 -0.218277
2  1.443679  1.017670 -1.887223 -1.939119
3 -1.425046 -0.992471 -0.278247  0.651336
5 -0.879433 -0.103537  0.860885  0.471838
0  1.094805 -0.222297 -0.580382  0.501531

Sorting by Actual Values

Sorting by actual values is done with the sort_values() method. For a DataFrame, it takes a 'by' argument naming the column, or list of columns, whose values determine the row order. A Series has only one set of values, so no 'by' argument is needed.

Example − Sorting Series Values

The following example demonstrates how to sort a pandas Series object using the sort_values() method.

import pandas as pd

panda_series = pd.Series([18, 95, 66, 12, 55, 0])
print("Unsorted Pandas Series: \n", panda_series)

# Sort the Series values in ascending order
panda_series_sorted = panda_series.sort_values(ascending=True)
print("\nSorted Pandas Series: \n", panda_series_sorted)

On executing the above code you will get the following output −

Unsorted Pandas Series:
 0    18
1    95
2    66
3    12
4    55
5     0
dtype: int64

Sorted Pandas Series:
 5     0
3    12
0    18
4    55
2    66
1    95
dtype: int64

Example − Sorting DataFrame Values

The following example demonstrates how the sort_values() method works on a DataFrame object.

import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 9, 5, 0], 'col2': [1, 3, 2, 4]})
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame rows by the values in col1
sorted_df = unsorted_df.sort_values(by='col1')
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
    col1  col2
0     2     1
1     9     3
2     5     2
3     0     4

Output Sorted DataFrame:
    col1  col2
3     0     4
0     2     1
2     5     2
1     9     3

Observe that the col1 values are now in ascending order, and that each row's col2 value and index label moved together with its col1 value. That is why col2 and the index look unsorted.
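If the original index labels are not needed after sorting, sort_values() also accepts ignore_index=True (available in pandas 1.0 and later), which relabels the sorted rows from 0 upward. The sketch below reuses the data from the example above; the variable names kept and renumbered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [2, 9, 5, 0], 'col2': [1, 3, 2, 4]})

# Default behaviour: every row keeps its original index label.
kept = df.sort_values(by='col1')

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = df.sort_values(by='col1', ignore_index=True)

print(kept.index.tolist())        # [3, 0, 2, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

The row contents are identical in both results; only the index labels differ.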

Example − Sorting by Multiple Columns

You can also sort by multiple columns by passing a list of column names to the 'by' parameter. Rows are ordered by the first column, and ties are broken by the next column in the list.

import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 0, 1], 'col2': [1, 3, 4, 2]})
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by col1, then by col2 where col1 values are equal
sorted_df = unsorted_df.sort_values(by=['col1', 'col2'])
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
    col1  col2
0     2     1
1     1     3
2     0     4
3     1     2

Output Sorted DataFrame:
    col1  col2
2     0     4
3     1     2
1     1     3
0     2     1
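When sorting by several columns, the ascending parameter may also be a list with one Boolean per column, so each column can have its own direction. The following is a small sketch on the same data; the variable name mixed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [2, 1, 0, 1], 'col2': [1, 3, 4, 2]})

# col1 ascending; within equal col1 values, col2 descending.
mixed = df.sort_values(by=['col1', 'col2'], ascending=[True, False])

print(mixed.index.tolist())  # [2, 1, 3, 0]
```

Compared with the output above, the two rows with col1 equal to 1 swap places, because col2 is now sorted from high to low within that group.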

Choosing a Sorting Algorithm

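Pandas allows you to specify the sorting algorithm using the kind parameter of the sort_values() method. You can choose between 'quicksort' (the default), 'mergesort', 'heapsort', and 'stable'. Only 'mergesort' and 'stable' are stable, meaning that rows with equal values keep their original relative order. For a DataFrame, kind is applied only when sorting on a single column or label.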

Example

The following example sorts a DataFrame using the sort_values() method with a specific algorithm.

import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 5, 0, 1], 'col2': [1, 3, 0, 4]})
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by col1 using the merge sort algorithm
sorted_df = unsorted_df.sort_values(by='col1', kind='mergesort')
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
    col1  col2
0     2     1
1     5     3
2     0     0
3     1     4

Output Sorted DataFrame:
    col1  col2
2     0     0
3     1     4
0     2     1
1     5     3
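The effect of a stable algorithm is visible only when the sort key contains ties, which the data above does not. The following minimal sketch uses made-up data with repeated col1 values (the frame and the name stable are illustrative) to show that 'mergesort' keeps tied rows in their original relative order.

```python
import pandas as pd

# Hypothetical data: col1 contains ties; col2 marks the original row order.
df = pd.DataFrame({'col1': [1, 0, 1, 0], 'col2': ['a', 'b', 'c', 'd']})

# A stable sort keeps 'b' before 'd' and 'a' before 'c'.
stable = df.sort_values(by='col1', kind='mergesort')

print(stable['col2'].tolist())  # ['b', 'd', 'a', 'c']
```

With an unstable algorithm such as 'quicksort' or 'heapsort', the order within each group of ties is not guaranteed, even if it happens to come out the same on small inputs.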