Movatterモバイル変換


[0]ホーム

URL:


Skip to main content

How to Execute SQL Queries in Python and R Tutorial

Learn easy and effective ways to run SQL queries in Python and R for data analysis and database management.
Apr 4, 2022 · 13 min read

What is SQL

Structured Query Language (SQL) is the most common language used for running various data analysis tasks. It is also used for maintaining a relational database, for example: adding tables, removing values, and optimizing the database. A simple relational database consists of multiple tables that are interconnected, and each table consists of rows and columns.

On average, a technology company generates millions of data points every day. A storage solution that is robust and effective is required so they can use the data to improve the current system or come up with a new product. A relational database such as MySQL, PostgreSQL, and SQLite solve these problems by providing robust database management, security, and high performance.

Core functionalities of SQL

  • Create new tables in a database
  • Execute queries against a database
  • Retrieve data from a database
  • Insert records into a database
  • Update records in a database
  • Delete records from a database
  • Optimize any database

SQL is a high-demand skill that will help you land any job in the tech industry. Companies like Meta, Google, and Netflix are always on the lookout for data professionals that can extract information from SQL databases and come up with innovative solutions. You can learn the basics of SQL by taking theIntroduction to SQL tutorial on DataCamp.

Why use SQL with Python and R?

SQL can help us discover the company's performance, understand customer behaviors, and monitor the success metrics of marketing campaigns. Most data analysts can perform the majority of business intelligence tasks by running SQL queries, so why do we need tools such as PoweBI, Python, and R? By using SQL queries, you can tell what has happened in the past, but you cannot predict future projections. These tools help us understand more about the current performance and potential growth.

Python and R are multipurpose languages that allow professionals to run advanced statistical analysis, build machine learning models, create data APIs, and eventually help companies to think beyond KPIs. In this tutorial, we will learn to connect SQL databases, populate databases, and run SQL queries using Python and R.

Note: If you are new to SQL, then take theSQL skill track to understand the fundamentals of writing SQL queries.

Python Tutorial

The Python tutorial will cover the basics of connecting with various databases (MySQL and SQLite), creating tables, adding records, running queries, and learning about thePandas functionread_sql.

Setting Up

We can connect the database usingSQLAlchemy, but in this tutorial, we are going to use the inbuilt Python packageSQLite3 to run queries on the database. SQLAlchemy provides support for all kinds of databases by providing a unified API. If you are interested in learning more about SQLAlchemy and how it works with other databases then check out theIntroduction to Databases in Python course.

MySQL is the most popular database engine in the world, and it is widely used by companies like Youtube, Paypal, LinkedIn, and GitHub. Here we will learn how to connect the database. The rest of the steps for using MySQL are similar to the SQLite3 package.

First, install themysql package using '!pip install mysql' and then create a local database engine by providing your username, password, and database name.

import mysql.connector as sqlconn = sql.connect(  host="localhost",  user="abid",  password="12345",  database="datacamp_python")

Similarly, we can create or load a SQLite database by using thesqlite3.connect function.SQLite is a library that implements a self-contained, zero-configuration, and serverless database engine. It isDataLab-friendly, so we will use it in our project to avoid local host errors.

import sqlite3import pandas as pdconn= sqlite3.connect("datacamp_python.db")

Start Learning Python and SQL For Free

Introduction to Databases in Python

BeginnerSkill Level
4 hr
97.3K learners
In this course, you'll learn the basics of relational databases and how to interact with them.

Introduction to SQL

BeginnerSkill Level
2 hr
975.9K learners
Learn how to create and query relational databases using SQL in just two hours.

Creating a Database

In this part, we will learn how to load theCOVID-19's impact on airport traffic dataset, under theCC BY-NC-SA 4.0 license, into our SQLite database. We will also learn how to create tables from scratch.

Image fromKaggle Dataset

The airport traffic dataset consists of a percentage of the traffic volume during the baseline period from February 1, 2020, to March 15, 2020. We will load a CSV file by usingPandas functionread_csv and then use theto_sql function to transfer the dataframe into our SQLite table. Theto_sql function requires a table name (String) and connection to the SQLite engine.

data = pd.read_csv("data/covid_impact_on_airport_traffic.csv")data.to_sql(            'airport',             # Name of the sql table            conn,                 # sqlite.Connection or sqlalchemy.engine.Engine            if_exists='replace'           )

We will now test whether we have succeeded by running a quick SQL query. Before running a query, we need to create acursor that will help us execute queries, as shown in the code block below. You can have multiple cursors on the same database within a single connection.

In our case, the SQL query returned three columns and five rows from theairport table. To display the first row, we will usecursor.fetchone().

cursor = conn.cursor()cursor.execute("""SELECT Date, AirportName, PercentOfBaseline                    FROM airport                  LIMIT 5""")cursor.fetchone()>>> ('2020-04-03', 'Kingsford Smith', 64)

To display the rest of the records, we will usecursor.fetchall(). The airport dataset is successfully loaded into the database with a few lines of code.

cursor.fetchall()>>> [('2020-04-13', 'Kingsford Smith', 29),     ('2020-07-10', 'Kingsford Smith', 54),     ('2020-09-02', 'Kingsford Smith', 18),     ('2020-10-31', 'Kingsford Smith', 22)]

Now, let’s learn to create a table from scratch and populate it by adding sample values. We will create astudentinfo table withid (integer, primary key, auto-increment),name (text), andsubject (text).

Note: The SQLite syntax is a bit different. It is recommended to check out the SQLite cheat sheet to understand the SQL queries mentioned in this tutorial.

cursor.execute("""                CREATE TABLE studentinfo                (                id INTEGER PRIMARY KEY AUTOINCREMENT,                name TEXT,                subject TEXT                )                 """)

Let’s check how many tables we have added to the database by running a simple SQLite query.

cursor.execute("""                SELECT name                FROM sqlite_master                WHERE type='table'                 """)cursor.fetchall()>>> [('airport',), ('studentinfo',)]

Running Queries

In this section, we will add values to thestudentinfo table and run simple SQL queries. By usingINSERT INTO, we can add a single row to thestudentinfo table.

To insert values, we need to provide aquery andvalue arguments to theexecute function. The function fills the“?” inputs with the values that we have provided.

query = """            INSERT INTO studentinfo            (name, subject)            VALUES (?, ?)        """value = ("Marry", "Math")cursor.execute(query,value)

Repeat the above query by adding multiple records.

query = """           INSERT INTO studentinfo           (name, subject)           VALUES (?, ?)        """values = [("Abid", "Stat"),          ("Carry", "Math"),          ("Ali","Data Science"),          ("Nisha","Data Science"),          ("Matthew","Math"),          ("Henry","Data Science")]cursor.executemany(query,values)

It’s time to verify the record. To do this we will run a simple SQL query that will return rows where thesubject isData Science.

cursor.execute("""                SELECT *                FROM studentinfo                WHERE subject LIKE 'Data Science'                 """)cursor.fetchall()>>> [(4, 'Ali', 'Data Science'),     (5, 'Nisha', 'Data Science'),     (7, 'Henry', 'Data Science')]

TheDISTINCT subject command is used to display unique values present in subject columns. In our case, it’s Math, Stat, and Data Science.

cursor.execute("SELECT DISTINCT subject from studentinfo")cursor.fetchall()>>> [('Math',), ('Stat',), ('Data Science',)]

To save all of the changes, we will use thecommit() function. Without a commit, the data will be lost after the machine restarts.

conn.commit()

SQL with Pandas

In this part we will learn how to extract the data from the SQLite database and convert it into a Pandas dataframe with one line of code.read_sql provides more than just running SQL queries. We can use it for setting index columns, parsing date and time, adding values, and filtering out column names. Learn more about importing data in Python by taking a shortDataCamp course.

read_sql requires two arguments: aSQL query andconnection to the SQLite engine. The output contains the top five rows from theairport table wherePercentOfBaseline is greater than 20.

data_sql_1 = pd.read_sql("""                        SELECT Date,City,PercentOfBaseline                        FROM airport                        WHERE PercentOfBaseline > 20                        ORDER BY Date DESC                        """,                        conn)print(data_sql_1.head())         Date          City  PercentOfBaseline0  2020-12-02        Sydney                 271  2020-12-02      Santiago                 482  2020-12-02       Calgary                 993  2020-12-02  Leduc County                1004  2020-12-02      Richmond                 86

Running data analysis on relational databases has become easier with Pandas integration. We can also use this data to forecast the values and run complex statistical analyses.

Theplot function is used to visualize the histogram of thePercentOfBaseline column.

data_sql_1.plot(y="PercentOfBaseline",kind="hist");

Similarly, we can limit the values to the top 20 and display a time series line chart.

data_sql_2 = pd.read_sql("""                        SELECT Date,City,PercentOfBaseline                        FROM airport                        WHERE PercentOfBaseline > 20                        ORDER BY Date DESC                        LIMIT 20                        """,                        conn)data_sql_2.plot(x="Date",y="PercentOfBaseline",kind="line");

Finally, we will close the connection to free up resources. Most packages do this automatically, but it is preferred to close the connections after finalizing the changes.

conn.close()

R Tutorial

We are going to replicate all the tasks from the Python tutorial using R. The tutorial includes creating connections, writing tables, appending rows, running queries, and data analysis withdplyr.

Setting Up

TheDBI package is used for connecting with the most popular databases such asMariaDB,Postgres,Duckdb, andSQLite. For example, install theRMySQL package and create a database by providing a username, password, database name, and host address.

install.packages("RMySQL")library(RMySQL)conn = dbConnect(                MySQL(),                user = 'abid',                password = '1234',                dbname = 'datacamp_R',                host = 'localhost'                )

In this tutorial, we are going to create an SQLite database by providing a name and the SQLite function.

library(RSQLite)library(DBI)library(tidyverse)conn = dbConnect(SQLite(), dbname = 'datacamp_R.db')

Creating Database

By importing thetidyverse library, we will have access to the dplyr, ggplot, and defaults datasets.

dbWriteTable function takesdata.frame and adds it into the SQL table. It takes three arguments: connection to SQLite, name of the table, and data frame. WithdbReadTable, we can view the entire table. To view the top 6 rows, we have usedhead.

dbWriteTable(conn, "cars", mtcars)head(dbReadTable(conn, "cars"))    mpg    cyl    disp    hp    drat    wt    qsec    vs    am    gear    carb1    21.0    6    160    110    3.90    2.620    16.46    0    1    4    42    21.0    6    160    110    3.90    2.875    17.02    0    1    4    43    22.8    4    108    93    3.85    2.320    18.61    1    1    4    14    21.4    6    258    110    3.08    3.215    19.44    1    0    3    15    18.7    8    360    175    3.15    3.440    17.02    0    0    3    26    18.1    6    225    105    2.76    3.460    20.22    1    0    3    1

dbExecute lets us execute any SQLite query, so we will be using it to create a table calledidcard.

To display the names of the tables in the database, we will usedbListTables.

dbExecute(conn, 'CREATE TABLE idcard (id int, name text)')dbListTables(conn)>>> 'cars''idcard'

Let’s add a single row to theidcard table and usedbGetQuery to display results.

Note: dbGetQuery runs a query and returns the records whereasdbExecute runs SQL query but does not return any records.

dbExecute(conn, "INSERT INTO idcard (id,name)\                 VALUES(1,'love')")dbGetQuery(conn,"SELECT * FROM idcard")id    name1    love

We will now add two more rows and display results by usingdbReadTable.

dbExecute(conn,"INSERT INTO idcard (id,name)\                VALUES(2,'Kill'),(3,'Game')               ")dbReadTable(conn,'idcard')id    name1    love2    Kill3    Game

dbCreateTable lets us create a hassle-free table. It requires three arguments; connection, name of the table, and either a character vector or a data.frame. The character vector consists of names (column names) and values (types). In our case, we are going to provide a default populationdata.frame to create the initial structure.

dbCreateTable(conn,'population',population)dbReadTable(conn,'population')country    year    population

Then, we are going to usedbAppendTable to add values in thepopulation table.

dbAppendTable(conn,'population',head(population))dbReadTable(conn,'population')country    year    populationAfghanistan    1995    17586073Afghanistan    1996    18415307Afghanistan    1997    19021226Afghanistan    1998    19496836Afghanistan    1999    19987071Afghanistan    2000    20595360

Running Queries

We will usedbGetQuery to perform all of our data analytics tasks. Let’s try to run a simple query and then learn more about other functions.

dbGetQuery(conn,"SELECT * FROM idcard")id    name1    love2    Kill3    Game

You can also run a complex SQL query to filter horsepower and display limited rows and columns.

dbGetQuery(conn, "SELECT mpg,hp,gear\                  FROM cars\                  WHERE hp > 50\                  LIMIT 5")mpg    hp    gear21.0    110    421.0    110    422.8    93    421.4    110    318.7    175    3

To remove tables, usedbRemoveTable. As we can now see, we have successfully removed theidcard table.

dbRemoveTable(conn,'idcard')dbListTables(conn)>>> 'cars''population'

To understand more about tables we will usedbListFields which will display the column names in a particular table.

dbListFields(conn, "cars")>>> 'mpg''cyl''disp''hp''drat''wt''qsec''vs''am''gear''carb'

SQL with dplyr

In this section, we will use dplyr to read tables and then run queries using filter, select, and collect. If you don’t want tolearn SQL syntax and want to perform all the tasks using pure R, then this method is for you. We have pulled thecars table, filtered it bygears andmpg, and then selected three columns as shown below.

cars_results <-  tbl(conn, "cars") %>%  filter(gear %in% c(4, 3),         mpg >= 14,         mpg <= 21) %>%  select(mpg, hp, gear) %>%  collect()cars_resultsmpg    hp    gear21.0    110    421.0    110    418.7    175    318.1    105    314.3    245    3...   ...   ...

We can use the filtered data frame to display a boxplot graph usingggplot.

ggplot(cars_results,aes(fill=as.character(gear), y=mpg)) +  geom_boxplot()

Or we can display a facet point plot divided by the number ofgears.

ggplot(cars_results,  aes(mpg, ..count.. ) ) +  geom_point(stat = "count", size = 4) +  coord_flip()+  facet_grid( as.character(gear) ~ . )

Conclusion

In this tutorial, we have learned the importance of running SQL queries with Python and R, creating databases, adding tables, and performing data analysis using SQL queries. We have also learned how Pandas and dplyr help us run queries with a single line of code.

SQL is a must-learn skill for all tech-related jobs. If you are starting your career as a data analyst, we recommend completing theData Analyst with SQL Server career track within two months. This career track will teach you everything about SQL queries, servers, and managing resources.

You can run all the scripts used in this tutorial for free:

Related Python and SQL Courses

Course

Introduction to Python

4 hr
6.6M
Master the basics of data analysis with Python in just four hours. This online course will introduce the Python interface and explore popular packages.

Course

Introduction to Databases in Python

4 hr
99.7K
In this course, you'll learn the basics of relational databases and how to interact with them.
See MoreRight Arrow
Related

Tutorial

How to Execute Python/R in SQL Tutorial

Discover how to execute Python and R in SQL and unlock powerful new machine learning possibilities for your databases.
Moez Ali's photo

Moez Ali

Tutorial

How to Execute Python/R in SQL

After reading this tutorial, you'll know how to embed R & Python scripts in T-SQL statements & know what data types are used to pass data between SQL & Python/R.
Kyle Weller's photo

Kyle Weller

Tutorial

Remote Python and R in SQL

Learn how to remotely send R and Python execution to a SQL Server from Jupyter Notebooks.
Kyle Weller's photo

Kyle Weller

SQLAlchemy_Tutorial.

Tutorial

SQLAlchemy Tutorial With Examples

Learn to access and run SQL queries on all types of relational databases using Python objects.
Abid Ali Awan's photo

Abid Ali Awan

Tutorial

How to Use SQLite in Python

In this tutorial, you will learn about using SQLite, an extremely light-weight Relational database management system (RDBMS) in Python.
Sayak Paul's photo

Sayak Paul

Tutorial

SQL Query Examples and Tutorial

If you are looking to get started with SQL, we’ve got you covered. In this SQL tutorial, we will introduce you to SQL queries - a powerful tool that enables us to work with the data stored in a database.
Sejal Jaiswal's photo

Sejal Jaiswal

See MoreSee More

[8]ページ先頭

©2009-2025 Movatter.jp