
Structured Query Language (SQL) is the most common language used for running various data analysis tasks. It is also used for maintaining a relational database, for example: adding tables, removing values, and optimizing the database. A simple relational database consists of multiple tables that are interconnected, and each table consists of rows and columns.
On average, a technology company generates millions of data points every day. A storage solution that is robust and effective is required so they can use the data to improve the current system or come up with a new product. A relational database such as MySQL, PostgreSQL, and SQLite solve these problems by providing robust database management, security, and high performance.
Core functionalities of SQL
SQL is a high-demand skill that will help you land any job in the tech industry. Companies like Meta, Google, and Netflix are always on the lookout for data professionals that can extract information from SQL databases and come up with innovative solutions. You can learn the basics of SQL by taking theIntroduction to SQL tutorial on DataCamp.
SQL can help us discover the company's performance, understand customer behaviors, and monitor the success metrics of marketing campaigns. Most data analysts can perform the majority of business intelligence tasks by running SQL queries, so why do we need tools such as PoweBI, Python, and R? By using SQL queries, you can tell what has happened in the past, but you cannot predict future projections. These tools help us understand more about the current performance and potential growth.
Python and R are multipurpose languages that allow professionals to run advanced statistical analysis, build machine learning models, create data APIs, and eventually help companies to think beyond KPIs. In this tutorial, we will learn to connect SQL databases, populate databases, and run SQL queries using Python and R.
Note: If you are new to SQL, then take theSQL skill track to understand the fundamentals of writing SQL queries.
The Python tutorial will cover the basics of connecting with various databases (MySQL and SQLite), creating tables, adding records, running queries, and learning about thePandas functionread_sql.
We can connect the database usingSQLAlchemy, but in this tutorial, we are going to use the inbuilt Python packageSQLite3 to run queries on the database. SQLAlchemy provides support for all kinds of databases by providing a unified API. If you are interested in learning more about SQLAlchemy and how it works with other databases then check out theIntroduction to Databases in Python course.
MySQL is the most popular database engine in the world, and it is widely used by companies like Youtube, Paypal, LinkedIn, and GitHub. Here we will learn how to connect the database. The rest of the steps for using MySQL are similar to the SQLite3 package.
First, install themysql package using '!pip install mysql' and then create a local database engine by providing your username, password, and database name.
import mysql.connector as sqlconn = sql.connect( host="localhost", user="abid", password="12345", database="datacamp_python")Similarly, we can create or load a SQLite database by using thesqlite3.connect function.SQLite is a library that implements a self-contained, zero-configuration, and serverless database engine. It isDataLab-friendly, so we will use it in our project to avoid local host errors.
import sqlite3import pandas as pdconn= sqlite3.connect("datacamp_python.db")In this part, we will learn how to load theCOVID-19's impact on airport traffic dataset, under theCC BY-NC-SA 4.0 license, into our SQLite database. We will also learn how to create tables from scratch.

The airport traffic dataset consists of a percentage of the traffic volume during the baseline period from February 1, 2020, to March 15, 2020. We will load a CSV file by usingPandas functionread_csv and then use theto_sql function to transfer the dataframe into our SQLite table. Theto_sql function requires a table name (String) and connection to the SQLite engine.
data = pd.read_csv("data/covid_impact_on_airport_traffic.csv")data.to_sql( 'airport', # Name of the sql table conn, # sqlite.Connection or sqlalchemy.engine.Engine if_exists='replace' )We will now test whether we have succeeded by running a quick SQL query. Before running a query, we need to create acursor that will help us execute queries, as shown in the code block below. You can have multiple cursors on the same database within a single connection.
In our case, the SQL query returned three columns and five rows from theairport table. To display the first row, we will usecursor.fetchone().
cursor = conn.cursor()cursor.execute("""SELECT Date, AirportName, PercentOfBaseline FROM airport LIMIT 5""")cursor.fetchone()>>> ('2020-04-03', 'Kingsford Smith', 64)To display the rest of the records, we will usecursor.fetchall(). The airport dataset is successfully loaded into the database with a few lines of code.
cursor.fetchall()>>> [('2020-04-13', 'Kingsford Smith', 29), ('2020-07-10', 'Kingsford Smith', 54), ('2020-09-02', 'Kingsford Smith', 18), ('2020-10-31', 'Kingsford Smith', 22)]Now, let’s learn to create a table from scratch and populate it by adding sample values. We will create astudentinfo table withid (integer, primary key, auto-increment),name (text), andsubject (text).
Note: The SQLite syntax is a bit different. It is recommended to check out the SQLite cheat sheet to understand the SQL queries mentioned in this tutorial.
cursor.execute(""" CREATE TABLE studentinfo ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, subject TEXT ) """)Let’s check how many tables we have added to the database by running a simple SQLite query.
cursor.execute(""" SELECT name FROM sqlite_master WHERE type='table' """)cursor.fetchall()>>> [('airport',), ('studentinfo',)]In this section, we will add values to thestudentinfo table and run simple SQL queries. By usingINSERT INTO, we can add a single row to thestudentinfo table.
To insert values, we need to provide aquery andvalue arguments to theexecute function. The function fills the“?” inputs with the values that we have provided.
query = """ INSERT INTO studentinfo (name, subject) VALUES (?, ?) """value = ("Marry", "Math")cursor.execute(query,value)Repeat the above query by adding multiple records.
query = """ INSERT INTO studentinfo (name, subject) VALUES (?, ?) """values = [("Abid", "Stat"), ("Carry", "Math"), ("Ali","Data Science"), ("Nisha","Data Science"), ("Matthew","Math"), ("Henry","Data Science")]cursor.executemany(query,values)It’s time to verify the record. To do this we will run a simple SQL query that will return rows where thesubject isData Science.
cursor.execute(""" SELECT * FROM studentinfo WHERE subject LIKE 'Data Science' """)cursor.fetchall()>>> [(4, 'Ali', 'Data Science'), (5, 'Nisha', 'Data Science'), (7, 'Henry', 'Data Science')]TheDISTINCT subject command is used to display unique values present in subject columns. In our case, it’s Math, Stat, and Data Science.
cursor.execute("SELECT DISTINCT subject from studentinfo")cursor.fetchall()>>> [('Math',), ('Stat',), ('Data Science',)]To save all of the changes, we will use thecommit() function. Without a commit, the data will be lost after the machine restarts.
conn.commit()In this part we will learn how to extract the data from the SQLite database and convert it into a Pandas dataframe with one line of code.read_sql provides more than just running SQL queries. We can use it for setting index columns, parsing date and time, adding values, and filtering out column names. Learn more about importing data in Python by taking a shortDataCamp course.
read_sql requires two arguments: aSQL query andconnection to the SQLite engine. The output contains the top five rows from theairport table wherePercentOfBaseline is greater than 20.
data_sql_1 = pd.read_sql(""" SELECT Date,City,PercentOfBaseline FROM airport WHERE PercentOfBaseline > 20 ORDER BY Date DESC """, conn)print(data_sql_1.head()) Date City PercentOfBaseline0 2020-12-02 Sydney 271 2020-12-02 Santiago 482 2020-12-02 Calgary 993 2020-12-02 Leduc County 1004 2020-12-02 Richmond 86Running data analysis on relational databases has become easier with Pandas integration. We can also use this data to forecast the values and run complex statistical analyses.
Theplot function is used to visualize the histogram of thePercentOfBaseline column.
data_sql_1.plot(y="PercentOfBaseline",kind="hist");
Similarly, we can limit the values to the top 20 and display a time series line chart.
data_sql_2 = pd.read_sql(""" SELECT Date,City,PercentOfBaseline FROM airport WHERE PercentOfBaseline > 20 ORDER BY Date DESC LIMIT 20 """, conn)data_sql_2.plot(x="Date",y="PercentOfBaseline",kind="line");
Finally, we will close the connection to free up resources. Most packages do this automatically, but it is preferred to close the connections after finalizing the changes.
conn.close()We are going to replicate all the tasks from the Python tutorial using R. The tutorial includes creating connections, writing tables, appending rows, running queries, and data analysis withdplyr.
TheDBI package is used for connecting with the most popular databases such asMariaDB,Postgres,Duckdb, andSQLite. For example, install theRMySQL package and create a database by providing a username, password, database name, and host address.
install.packages("RMySQL")library(RMySQL)conn = dbConnect( MySQL(), user = 'abid', password = '1234', dbname = 'datacamp_R', host = 'localhost' )In this tutorial, we are going to create an SQLite database by providing a name and the SQLite function.
library(RSQLite)library(DBI)library(tidyverse)conn = dbConnect(SQLite(), dbname = 'datacamp_R.db')By importing thetidyverse library, we will have access to the dplyr, ggplot, and defaults datasets.
dbWriteTable function takesdata.frame and adds it into the SQL table. It takes three arguments: connection to SQLite, name of the table, and data frame. WithdbReadTable, we can view the entire table. To view the top 6 rows, we have usedhead.
dbWriteTable(conn, "cars", mtcars)head(dbReadTable(conn, "cars")) mpg cyl disp hp drat wt qsec vs am gear carb1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 42 21.0 6 160 110 3.90 2.875 17.02 0 1 4 43 22.8 4 108 93 3.85 2.320 18.61 1 1 4 14 21.4 6 258 110 3.08 3.215 19.44 1 0 3 15 18.7 8 360 175 3.15 3.440 17.02 0 0 3 26 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1dbExecute lets us execute any SQLite query, so we will be using it to create a table calledidcard.
To display the names of the tables in the database, we will usedbListTables.
dbExecute(conn, 'CREATE TABLE idcard (id int, name text)')dbListTables(conn)>>> 'cars''idcard'Let’s add a single row to theidcard table and usedbGetQuery to display results.
Note: dbGetQuery runs a query and returns the records whereasdbExecute runs SQL query but does not return any records.
dbExecute(conn, "INSERT INTO idcard (id,name)\ VALUES(1,'love')")dbGetQuery(conn,"SELECT * FROM idcard")id name1 loveWe will now add two more rows and display results by usingdbReadTable.
dbExecute(conn,"INSERT INTO idcard (id,name)\ VALUES(2,'Kill'),(3,'Game') ")dbReadTable(conn,'idcard')id name1 love2 Kill3 GamedbCreateTable lets us create a hassle-free table. It requires three arguments; connection, name of the table, and either a character vector or a data.frame. The character vector consists of names (column names) and values (types). In our case, we are going to provide a default populationdata.frame to create the initial structure.
dbCreateTable(conn,'population',population)dbReadTable(conn,'population')country year populationThen, we are going to usedbAppendTable to add values in thepopulation table.
dbAppendTable(conn,'population',head(population))dbReadTable(conn,'population')country year populationAfghanistan 1995 17586073Afghanistan 1996 18415307Afghanistan 1997 19021226Afghanistan 1998 19496836Afghanistan 1999 19987071Afghanistan 2000 20595360We will usedbGetQuery to perform all of our data analytics tasks. Let’s try to run a simple query and then learn more about other functions.
dbGetQuery(conn,"SELECT * FROM idcard")id name1 love2 Kill3 GameYou can also run a complex SQL query to filter horsepower and display limited rows and columns.
dbGetQuery(conn, "SELECT mpg,hp,gear\ FROM cars\ WHERE hp > 50\ LIMIT 5")mpg hp gear21.0 110 421.0 110 422.8 93 421.4 110 318.7 175 3To remove tables, usedbRemoveTable. As we can now see, we have successfully removed theidcard table.
dbRemoveTable(conn,'idcard')dbListTables(conn)>>> 'cars''population'To understand more about tables we will usedbListFields which will display the column names in a particular table.
dbListFields(conn, "cars")>>> 'mpg''cyl''disp''hp''drat''wt''qsec''vs''am''gear''carb'In this section, we will use dplyr to read tables and then run queries using filter, select, and collect. If you don’t want tolearn SQL syntax and want to perform all the tasks using pure R, then this method is for you. We have pulled thecars table, filtered it bygears andmpg, and then selected three columns as shown below.
cars_results <- tbl(conn, "cars") %>% filter(gear %in% c(4, 3), mpg >= 14, mpg <= 21) %>% select(mpg, hp, gear) %>% collect()cars_resultsmpg hp gear21.0 110 421.0 110 418.7 175 318.1 105 314.3 245 3... ... ...We can use the filtered data frame to display a boxplot graph usingggplot.
ggplot(cars_results,aes(fill=as.character(gear), y=mpg)) + geom_boxplot()
Or we can display a facet point plot divided by the number ofgears.
ggplot(cars_results, aes(mpg, ..count.. ) ) + geom_point(stat = "count", size = 4) + coord_flip()+ facet_grid( as.character(gear) ~ . )
In this tutorial, we have learned the importance of running SQL queries with Python and R, creating databases, adding tables, and performing data analysis using SQL queries. We have also learned how Pandas and dplyr help us run queries with a single line of code.
SQL is a must-learn skill for all tech-related jobs. If you are starting your career as a data analyst, we recommend completing theData Analyst with SQL Server career track within two months. This career track will teach you everything about SQL queries, servers, and managing resources.
You can run all the scripts used in this tutorial for free:

Related Python and SQL Courses
Course
Course
Tutorial
Tutorial
Tutorial

Tutorial
Tutorial
Tutorial