dittodbdittodb is designed to make it easy and fun to testfunctions that interact with a database. It works by looking for mockresponses for each query you send while you run your tests and willseamlessly pretend that those mocks were provided by the databaseconnection without needing a connection at all.
mean_delays()To get started, imagine that we are working on a package that queriesa database that consists of thenycflights13data1. We have the following function which takesa column to aggregate by and returns a dataframe with that column andthe mean delay for groups based on the values in the column namegiven.
library(DBI)mean_delays<-function(group_col) { con<-dbConnect( RMariaDB::MariaDB(),dbname ="nycflights" )on.exit(dbDisconnect(con)) query<- glue::glue("SELECT {group_col}, AVG(arr_delay) as mean_delay from nycflights13.flights ","WHERE arr_delay > 0 GROUP BY {group_col}" )return(dbGetQuery(con, query))}library(DBI)mean_delays<-function(group_col) { con<-dbConnect( RPostgres::Postgres(),dbname ="nycflights" )on.exit(dbDisconnect(con)) query<- glue::glue("SELECT {group_col}, AVG(arr_delay) as mean_delay from nycflights13.flights ","WHERE arr_delay > 0 GROUP BY {group_col}" )return(dbGetQuery(con, query))}library(DBI)mean_delays<-function(group_col) { con<-dbConnect( RSQLite::SQLite(),dbname ="nycflights" )on.exit(dbDisconnect(con)) query<- glue::glue("SELECT {group_col}, AVG(arr_delay) as mean_delay from nycflights13.flights ","WHERE arr_delay > 0 GROUP BY {group_col}" )return(dbGetQuery(con, query))}If we give it the column"month", we get the followingdataframe:
#> month mean_delay#> 1 1 34.47749#> 2 2 33.68921#> 3 3 40.57166#> 4 4 42.73958#> 5 5 41.88586#> 6 6 53.73827#> 7 7 53.95152#> 8 8 39.51294#> 9 9 38.80555#> 10 10 29.03665#> 11 11 27.48459#> 12 12 39.72725Great, now that we have our function we want to test it to make sureit is operating as expected. Normally, we could write somethinglike:
library(testthat)test_that("mean_delays()", { out<-mean_delays("month")expect_named(out,c("month","mean_delay"))expect_equal(dim(out),c(12,2))})And this works just fine if we only ever run your tests locally, butif we want torunour tests with a Continuous Integration system (and yes, we want todo that!), this won’t work without first setting up our productiondatabase of flights. For our tests, we don’t actually need to connect tothe database and get new data (and, in fact, that would make some testsfail erroneously suddenly if the underlying changed). Instead, what wewant is to take a snapshot of what happens when running the test code,and then be able to use that snapshot when we run tests later. Thesesnapshots are frequently called fixtures (though you might hear peopleuse other names like stubs or mocks).
We can record fixtures of the database interactions with the commandsstart_db_capturing(), run the functions we want to record,and then stop recording withstop_db_capturing().
This will write a new folder (by default in./tests/testthat/) with the name of the database (here:nycflights) and then write one file with the nameSELECT-e53189.R which is the fixture for this example. ThisSELECT-* file contains the data that was received from thedatabase for use in tests.
with_mock_db()Now that we have a fixture, we can use that fixture by wrapping ourcall that includes a database interaction with the functionwith_mock_db(). This will look for fixtures and usethose.
with_mock_db(mean_delays("month"))#> month mean_delay#> 1 1 34.47749#> 2 2 33.68921#> 3 3 40.57166#> 4 4 42.73958#> 5 5 41.88586#> 6 6 53.73827#> 7 7 53.95152#> 8 8 39.51294#> 9 9 38.80555#> 10 10 29.03665#> 11 11 27.48459#> 12 12 39.72725So, now we can write our tests like:
library(testthat)library(dittodb)with_mock_db(test_that("mean_delays()", { out<-mean_delays("month")expect_named(out,c("month","mean_delay"))expect_equal(dim(out),c(12,2)) }))#> Test passed 🥇#> [1] TRUEWhen placed inside ofwith_mock_db(...) a call tomean_delays("month") will return what we saved as ourfixtureas if it had actually connected to the database withoutneeding the database to be installed, reachable, operational, or toexist at all anywhere.
If we wanted to test that a day-based aggregation works, we can!Although we will have to make a new fixture. First we would run thefollowing interactively:
This will create a new file (SELECT-16d120.R) whichcontains the response when aggregating by day. dittodb saves eachdatabase interaction with a hash of the query that is sent, so that anumber of different responses from a database can be saved and thecorrect one will be used when called inside ofwith_mock_db(...). So now, we could write our new testwith:
dittodbUse the functiondittodb::use_dittodb() to easily getstarted usingdittodb. It will adddittodb toSuggests in theDESCRIPTION file and addlibrary(dittodb) totests/testthat/helper.R.
There are a few things to be careful about when using dittodb.
dbConnect()Always calldbConnect() inside ofwith_mock_db(...). You can make as many calls as you wantto the mock database inside of awith_mock_db(...), but youshould always make sure that you connect to the database inside of andnot outside ofwith_mock_db(...). This is because when you“connect” to the mock database, a few variables are set that telldittodb where to look for mocks. It’s less important (though still agood idea) to calldbDisconnect() inside ofwith_mock_db(). This is also true when recording fixtureswithstart_db_recording(), you should start the recordingand then calldbConnect().
Recording fixtures saves the whole query to disk in a relativelyinefficient way (from a data storage perspective), so be careful withwhat you save. And you’ll want to not save extremely large results if atall possible. This is also a best-practice for writing tests anyway: youshould have mocks that are as minimal as possible to test thefunctionality you need to. Minimal mocks make it easier to change thingsthat aren’t relevant to the test (you don’t have to change the way datais represented if it’s not important to what you’re testing) and itmakes your tests run faster.
There are a number of advanced features that might be useful. Howeverthey take a bit of configuration to use.
You can control where mocks are read from (when you’re usingwith_mock_db(...)) as well as where they are written to(when usingstart_db_capturing()). To do this, use thefunctiondb_mock_paths().
You can see what paths are being used by callingdb_mock_paths() with no arguments. dittodb will look formocks in each path starting with the first one. When recording mocks,dittodb always uses the first path that is returned bydb_mock_paths().
You can add a new path by callingdb_mock_paths("new/path/here") which will add the pathprovided to the top of the list of paths to use.
Sometimes (much? most? of the time!) there is sensitive data in yourdatabase that you don’t actually want to put into your test fixtures.dittodb allows you to specify columns that should always beredacted by specifying them like so:
start_db_capturing(redact_columns = c("sensitive_column", "other_sensitive_column"))This will always redact the columns “sensitive_column” and“other_sensitive_column” every time a query is recorded that includeseither. The redactor replaces every value in the column with a standardvalue (for example “[redacted]” for characters,9 fornumerics,1988-10-11T17:00:00 for date times) seeredact_columns() for more information.
When we usestart_db_recording() to record fixtures, weare creating what some people call fixtures (though other terms forthese abound). These are files that are used during testing to representand provide some data or state necessary to execute the test. In thecase of dittodb, these files contain the data that dittodb uses when itpretends to be a live database. During recording, each query that issent to the database gets a unique identifier (the first 6 digits of thehash of the query) and when the response is received, that response issaved to a file with the first SQL (Structured Query Language) verb(e.g. SELECT), a dash, and the hash using thedput() function. This lets you craft a fixture that testsexactly what you need to without having extraneous rows or columns thatmight not be relevant.
You can save our own responses for queries by getting figuring outthe hash (the easiest way to do this now is to write the test that youwant to create a fixture for, run it and see the error message thatlooks something like “Couldn’t find the file nycflights/SELECT-16d120.Rin any of the mock directories.” and use the filename from there.) andthen saving the dataframe that you want the test to use with the commanddput(df, file = "nycflights/SELECT-16d120.R", control = c("all", "hexNumeric"))(if the dataframe you want to save isdf and we are usingthe path we saw in the example error message). And you’ve created yourown fixture!
You can also take the approach of recording fixtures and then editingthem manually to pare them down. The workflow for that would besomething like:
# read in the recorded fixturedf_fixt<-source("nycflights/SELECT-16d120.R",keep.source =FALSE)$value# filter out anything after february and all days after the 9th of the monthdf_fixt<- dplyr::filter(df_fixt, month<=2& day<10)# save the fixture for use in testsdput(df_fixt,file ="nycflights/SELECT-16d120.R",control =c("all","hexNumeric"))These code snippets won’t work right out of the box —you will need to make sure the database you’re connecting to has thenycflights13 data data in it. You could dothis with SQLite by runningnycflights13_create_sqlite(location = "nycflights") beforethe rest of the commands here.↩︎