Run Queries Safely
library(dplyr)library(DBI)library(dbplyr)library(glue)con<- DBI::dbConnect(RSQLite::SQLite(),path =":memory:")db_airports<- nycflights13::airports%>%select(-tzone)copy_to(con, db_airports,"airports",temporary =FALSE)We will review four options to run SQL commands safely using theDBI package:
SQL Injection Attack
ThedbGetQuery() command allows us to write queries and retrieve the results. The query has to be written using the SQL syntax that matches to the database type.
For example, here is a database that contains theairports data from NYC Flights data:
dbGetQuery(con,"SELECT * FROM airports LIMIT 5")Often you need to write queries that depend on user input. For example, you might want to allow the user to pick an airport to focus their analysis on. To do this, it’s tempting to create the SQL string yourself by pasting strings together:
airport_code<-"GPT"dbGetQuery(con,paste0("SELECT * FROM airports WHERE faa = '", airport_code ,"'"))Hereairport_code is created in the script, in real-life it might be an input typed into a Shiny app.
The problem with creating SQL strings withpaste0() is that a careful attacker can create inputs that return more rows than you want:
airport_code<-"GPT' or faa = 'MSY"dbGetQuery(con,paste0("SELECT * FROM airports WHERE faa = '", airport_code ,"'"))Or takedestructive actions on your database:
airport_code<-"GPT'; DROP TABLE 'airports"dbGetQuery(con,paste0("SELECT * FROM airports WHERE faa = '", airport_code ,"'"))This is calledSQL injection attack.
There are three ways to avoid this problem:
- Use a parameterised query with
dbSendQuery()anddbBind() - Use the
sqlInterpolate()function to safely combine a SQL string with data - Manually escape the inputs using
dbQuoteString()
These are ordered by the level of safety they provide: if you can usedbSendQuery() anddbBind(), you should.
Parameterized queries
All modern database engines provide a way to writeparameterised queries, queries that contain some placeholder that allows you to re-run the query multiple times with different inputs. This protects you from SQL injection attacks, and as an added benefit, the database can often optimise the query so it runs faster.
Using a parameterised query with DBI requires three steps.
You create a query containing a
?placeholder and send it to the database withdbSendQuery():airport<-dbSendQuery(con,"SELECT * FROM airports WHERE faa = ?")Use
dbBind()to execute the query with specific values, thendbFetch()to get the results:dbBind(airport,list("GPT"))dbFetch(airport)Once you’re done using the parameterised query, clean it up by calling
dbClearResult()dbClearResult(airport)
Usingglue_sql()
Parameterized queries are generally the safest and most efficient way to pass user defined values in a query, however not every database driver supports them. The functionglue_sql(), part of theglue package, is able to handle the SQL quoting and variable placement.
library(glue)airport_sql<-glue_sql("SELECT * FROM airports WHERE faa = ?")airport<-dbSendQuery(con, airport_sql)dbBind(airport,list("GPT"))dbFetch(airport)dbClearResult(airport)If you place an asterisk* at the end of a glue expression the values will be collapsed with commas. This is useful for the SQL IN Operator for instance.
airport_sql<-glue_sql("SELECT * FROM airports WHERE faa IN ({airports*})",airports =c("GPT","MSY"),.con = con )airport<-dbSendQuery(con, airport_sql)dbFetch(airport)dbClearResult(airport)Interpolation by “hand”
While all modern databases support parameterised queries, they are not always supported in individual database drivers. If you find thatdbBind() doesn’t work with the database connector you are using, you can fall back onsqlInterpolate(), which will safely do the interpolation for you.
airport_code<-"GPT"sql<-sqlInterpolate(con,"SELECT * FROM airports where faa = ?code",code = airport_code)sqldbGetQuery(con, sql)The query returns no records if we try the same SQL injection attack:
airport_code<-"GPT' or faa = 'MSY"sql<-sqlInterpolate(con,"SELECT * FROM airports where faa = ?code",code = airport_code)sqldbGetQuery(con, sql)Manual escaping
Sometimes you can’t create the SQL you want using either of the previous methods. If you’re in this unhappy situation, first make absolutely sure that you haven’t missed an existing DBI helper function that does what you need. You need to be extremely careful when doing the escaping yourself, and it’s better to rely on existing code that multiple people have carefully reviewed.
However, if there’s no other way around it, you can usedbQuoteString() to add the quotes for you. This method will automatically take care of dangerous characters in the same way assqlInterpolate() (better) anddbBind() (best).
airport_code<-"GPT' or faa = 'MSY"sql<-paste0("SELECT * FROM airports WHERE faa = ",dbQuoteString(con, airport_code))sqldbGetQuery(con, sql)You may also needdbQuoteIdentifier() if you are creating tables or relying on user input to choose which column to filter on.