Python Pandas - Comparison with SQL

Pandas is a powerful Python library for data manipulation and analysis, widely used in data science and engineering. Many potential Pandas users come from a background in SQL, a language designed for managing and querying relational databases. Understanding how to perform SQL-like operations using Pandas can significantly ease the transition and enhance productivity.

This tutorial provides a side-by-side comparison of common SQL operations and their equivalents in Pandas, using the popular "tips" dataset.

Importing the Necessary Libraries

Before we dive into the comparison, let's start by importing the necessary libraries.

import pandas as pd
import numpy as np

We will also load the "tips" dataset, which will be used throughout this tutorial.

import pandas as pd

url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv'
tips = pd.read_csv(url)
print(tips.head())

Its output is as follows −

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Selecting Columns

In SQL, the SELECT statement retrieves specific columns from a table. You list the columns to return, separated by commas, or use * to select all columns −

SELECT total_bill, tip, smoker, time
FROM tips
LIMIT 5;

In Pandas, you can achieve the same result by selecting columns from a DataFrame using a list of column names −

tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

Example

The following complete program displays the first five rows of the selected columns −

import pandas as pd

url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv'
tips = pd.read_csv(url)
print(tips[['total_bill', 'tip', 'smoker', 'time']].head(5))

Its output is as follows −

   total_bill   tip smoker    time
0       16.99  1.01     No  Dinner
1       10.34  1.66     No  Dinner
2       21.01  3.50     No  Dinner
3       23.68  3.31     No  Dinner
4       24.59  3.61     No  Dinner

Referencing the DataFrame without a list of column names displays all columns (akin to SQL's *).
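One way to check that the SQL and Pandas forms really select the same data is to run both against the same rows. The sketch below is only an illustration under stated assumptions: it rebuilds the five rows printed earlier instead of downloading the file, and it uses an in-memory SQLite database (Python's built-in sqlite3 module) as a stand-in for whichever SQL engine you actually use.

```python
import sqlite3
import pandas as pd

# The first five rows of the "tips" dataset, copied from the output above
# so that the sketch runs without network access.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3, 2, 4],
})

# Assumption: an in-memory SQLite table named "tips" stands in for a real database.
conn = sqlite3.connect(":memory:")
tips.to_sql("tips", conn, index=False)

# The SQL form of the column selection.
sql_result = pd.read_sql_query(
    "SELECT total_bill, tip, smoker, time FROM tips LIMIT 5;", conn
)
conn.close()

# The Pandas form of the same selection.
pandas_result = tips[["total_bill", "tip", "smoker", "time"]].head(5)

# Compare column names and values of the two results.
same = sql_result.to_dict("list") == pandas_result.to_dict("list")
print(sql_result)
print(pandas_result)
print("Same columns and values:", same)
```

Passing a list inside the square brackets is what makes Pandas return a DataFrame with several columns; a single column name in plain brackets, such as tips['tip'], returns a Series instead.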

Filtering Rows

In SQL, the WHERE clause is used to filter records based on specific conditions.

SELECT * FROM tips WHERE time = 'Dinner' LIMIT 5;

DataFrames can be filtered in multiple ways; the most intuitive is Boolean indexing.

tips[tips['time'] == 'Dinner'].head(5)

Example

The following complete program displays the first five records where the time is equal to 'Dinner' −

import pandas as pd

url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv'
tips = pd.read_csv(url)
print(tips[tips['time'] == 'Dinner'].head(5))

Its output is as follows −

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

The above statement passes a Series of True/False objects to the DataFrame, returning all rows with True.
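To make the True/False Series visible, the sketch below builds the mask as a separate variable before using it. It reuses the five rows printed earlier instead of downloading the file, and it filters on the sex and tip columns (a choice made here purely for illustration) because every one of those five rows has time equal to 'Dinner'.

```python
import pandas as pd

# The first five rows of the "tips" dataset, copied from the output above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
})

# The comparison produces a Boolean Series, one True/False value per row.
mask = tips["sex"] == "Female"
print(mask)

# Indexing with the mask keeps only the rows where it is True.
# Rough SQL equivalent: SELECT * FROM tips WHERE sex = 'Female';
females = tips[mask]
print(females)

# Conditions are combined with & (AND) and | (OR); each one needs parentheses.
# Rough SQL equivalent: SELECT * FROM tips WHERE sex = 'Female' AND tip > 3.00;
combined = tips[(tips["sex"] == "Female") & (tips["tip"] > 3.00)]
print(combined)
```

The parentheses around each condition matter because & and | bind more tightly than == and > in Python.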

Grouping Data

SQL's GROUP BY clause groups rows that have the same values in the specified columns so that aggregate functions can be applied to each group. For example, to count the number of tips left by each gender −

SELECT sex, count(*)
FROM tips
GROUP BY sex;

In Pandas, the groupby() method is used to achieve the same result −

tips.groupby('sex').size()

Example

The following complete program displays the count of tips grouped by gender −

import pandas as pd

url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv'
tips = pd.read_csv(url)
print(tips.groupby('sex').size())

Its output is as follows −

sex
Female     87
Male      157
dtype: int64
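The example uses size() rather than count() because the two behave differently when values are missing: size() counts rows in each group, like SQL's count(*), while count() counts only the non-null values of a column, like SQL's count(column). The sketch below illustrates this on the five rows printed earlier, with one tip deliberately replaced by a missing value. That replacement is an assumption made for the demonstration and is not part of the real dataset.

```python
import numpy as np
import pandas as pd

# Five rows based on the output above; the second tip is set to NaN
# on purpose (illustration only, not real data).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, np.nan, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Rows per group, missing values included.
# Rough SQL equivalent: SELECT sex, count(*) FROM tips GROUP BY sex;
group_sizes = tips.groupby("sex").size()
print(group_sizes)

# Non-null tips per group.
# Rough SQL equivalent: SELECT sex, count(tip) FROM tips GROUP BY sex;
non_null_tips = tips.groupby("sex")["tip"].count()
print(non_null_tips)
```

Here the Male group has three rows but only two recorded tips, so the two results differ for that group.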

Limiting the Number of Rows

In SQL, the LIMIT clause is used to limit the number of rows returned by a query. For example −

SELECT *
FROM tips
LIMIT 5;

In Pandas, the head() method is used to achieve this −

tips.head(5)

Example

The following complete example displays the first five rows of three selected columns of the DataFrame −

import pandas as pd

url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv'
tips = pd.read_csv(url)
tips = tips[['smoker', 'day', 'time']].head(5)
print(tips)

Its output is as follows −

  smoker  day    time
0     No  Sun  Dinner
1     No  Sun  Dinner
2     No  Sun  Dinner
3     No  Sun  Dinner
4     No  Sun  Dinner
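Note that head() returns a new DataFrame and leaves the original untouched; the example above shortens tips only because it assigns the result back to the same name. The sketch below, again built from the five rows printed earlier instead of the downloaded file, keeps the result in a separate variable and also shows one possible way to express "ORDER BY ... LIMIT n" by sorting before calling head(). The variable names are chosen here for illustration.

```python
import pandas as pd

# The first five rows of the "tips" dataset, copied from the output above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
})

# Rough SQL equivalent: SELECT * FROM tips LIMIT 3;
first_three = tips.head(3)
print(first_three)

# The original DataFrame still has all of its rows.
print(len(tips), len(first_three))

# head(3) selects the same rows as positional slicing with iloc.
print(first_three.equals(tips.iloc[:3]))

# Rough SQL equivalent: SELECT * FROM tips ORDER BY tip DESC LIMIT 2;
top_two = tips.sort_values("tip", ascending=False).head(2)
print(top_two)
```

Without an ORDER BY, SQL does not guarantee which rows a LIMIT returns, whereas head() always follows the current row order of the DataFrame.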

These are a few of the basic operations covered in the previous chapters of this Pandas tutorial, compared here with their SQL equivalents.
