Movatterモバイル変換

dittodb is designed to make it easy and fun to testfunctions that interact with a database. It works by looking for mockresponses for each query you send while you run your tests and willseamlessly pretend that those mocks were provided by the databaseconnection without needing a connection at all.

RMariaDB

library(DBI)mean_delays<-function(group_col) {  con<-dbConnect(    RMariaDB::MariaDB(),dbname ="nycflights"  )on.exit(dbDisconnect(con))  query<- glue::glue("SELECT {group_col}, AVG(arr_delay) as mean_delay from nycflights13.flights ","WHERE arr_delay > 0 GROUP BY {group_col}"  )return(dbGetQuery(con, query))}

RPostgres

library(DBI)mean_delays<-function(group_col) {  con<-dbConnect(    RPostgres::Postgres(),dbname ="nycflights"  )on.exit(dbDisconnect(con))  query<- glue::glue("SELECT {group_col}, AVG(arr_delay) as mean_delay from nycflights13.flights ","WHERE arr_delay > 0 GROUP BY {group_col}"  )return(dbGetQuery(con, query))}

RSQLite

library(DBI)mean_delays<-function(group_col) {  con<-dbConnect(    RSQLite::SQLite(),dbname ="nycflights"  )on.exit(dbDisconnect(con))  query<- glue::glue("SELECT {group_col}, AVG(arr_delay) as mean_delay from nycflights13.flights ","WHERE arr_delay > 0 GROUP BY {group_col}"  )return(dbGetQuery(con, query))}

If we give it the column"month", we get the followingdataframe:

mean_delays("month")

#>    month mean_delay#> 1      1   34.47749#> 2      2   33.68921#> 3      3   40.57166#> 4      4   42.73958#> 5      5   41.88586#> 6      6   53.73827#> 7      7   53.95152#> 8      8   39.51294#> 9      9   38.80555#> 10    10   29.03665#> 11    11   27.48459#> 12    12   39.72725

Great, now that we have our function we want to test it to make sureit is operating as expected. Normally, we could write somethinglike:

library(testthat)test_that("mean_delays()", {  out<-mean_delays("month")expect_named(out,c("month","mean_delay"))expect_equal(dim(out),c(12,2))})

And this works just fine if we only ever run your tests locally, butif we want torunour tests with a Continuous Integration system (and yes, we want todo that!), this won’t work without first setting up our productiondatabase of flights. For our tests, we don’t actually need to connect tothe database and get new data (and, in fact, that would make some testsfail erroneously suddenly if the underlying changed). Instead, what wewant is to take a snapshot of what happens when running the test code,and then be able to use that snapshot when we run tests later. Thesesnapshots are frequently called fixtures (though you might hear peopleuse other names like stubs or mocks).

Recording fixtures

We can record fixtures of the database interactions with the commandsstart_db_capturing(), run the functions we want to record,and then stop recording withstop_db_capturing().

start_db_capturing()out<-mean_delays("month")stop_db_capturing()

This will write a new folder (by default in./tests/testthat/) with the name of the database (here:nycflights) and then write one file with the nameSELECT-e53189.R which is the fixture for this example. ThisSELECT-* file contains the data that was received from thedatabase for use in tests.

`with_mock_db()`

Now that we have a fixture, we can use that fixture by wrapping ourcall that includes a database interaction with the functionwith_mock_db(). This will look for fixtures and usethose.

with_mock_db(mean_delays("month"))#>    month mean_delay#> 1      1   34.47749#> 2      2   33.68921#> 3      3   40.57166#> 4      4   42.73958#> 5      5   41.88586#> 6      6   53.73827#> 7      7   53.95152#> 8      8   39.51294#> 9      9   38.80555#> 10    10   29.03665#> 11    11   27.48459#> 12    12   39.72725

So, now we can write our tests like:

library(testthat)library(dittodb)with_mock_db(test_that("mean_delays()", {    out<-mean_delays("month")expect_named(out,c("month","mean_delay"))expect_equal(dim(out),c(12,2))  }))#> Test passed 🥇#> [1] TRUE

When placed inside ofwith_mock_db(...) a call tomean_delays("month") will return what we saved as ourfixtureas if it had actually connected to the database withoutneeding the database to be installed, reachable, operational, or toexist at all anywhere.

If we wanted to test that a day-based aggregation works, we can!Although we will have to make a new fixture. First we would run thefollowing interactively:

start_db_capturing()out<-mean_delays("day")stop_db_capturing()

This will create a new file (SELECT-16d120.R) whichcontains the response when aggregating by day. dittodb saves eachdatabase interaction with a hash of the query that is sent, so that anumber of different responses from a database can be saved and thecorrect one will be used when called inside ofwith_mock_db(...). So now, we could write our new testwith:

with_mock_db(test_that("mean_delays()", {    out<-mean_delays("day")expect_named(out,c("day","mean_delay"))expect_equal(dim(out),c(31,2))  }))#> Test passed 🥇#> [1] TRUE

Things to be careful about

There are a few things to be careful about when using dittodb.

When to call`dbConnect()`

Always calldbConnect() inside ofwith_mock_db(...). You can make as many calls as you wantto the mock database inside of awith_mock_db(...), but youshould always make sure that you connect to the database inside of andnot outside ofwith_mock_db(...). This is because when you“connect” to the mock database, a few variables are set that telldittodb where to look for mocks. It’s less important (though still agood idea) to calldbDisconnect() inside ofwith_mock_db(). This is also true when recording fixtureswithstart_db_recording(), you should start the recordingand then calldbConnect().

Query size

Recording fixtures saves the whole query to disk in a relativelyinefficient way (from a data storage perspective), so be careful withwhat you save. And you’ll want to not save extremely large results if atall possible. This is also a best-practice for writing tests anyway: youshould have mocks that are as minimal as possible to test thefunctionality you need to. Minimal mocks make it easier to change thingsthat aren’t relevant to the test (you don’t have to change the way datais represented if it’s not important to what you’re testing) and itmakes your tests run faster.

Advanced uses

There are a number of advanced features that might be useful. Howeverthey take a bit of configuration to use.

Specify a new path

You can control where mocks are read from (when you’re usingwith_mock_db(...)) as well as where they are written to(when usingstart_db_capturing()). To do this, use thefunctiondb_mock_paths().

You can see what paths are being used by callingdb_mock_paths() with no arguments. dittodb will look formocks in each path starting with the first one. When recording mocks,dittodb always uses the first path that is returned bydb_mock_paths().

You can add a new path by callingdb_mock_paths("new/path/here") which will add the pathprovided to the top of the list of paths to use.

Redacting

Sometimes (much? most? of the time!) there is sensitive data in yourdatabase that you don’t actually want to put into your test fixtures.dittodb allows you to specify columns that should always beredacted by specifying them like so:

start_db_capturing(redact_columns = c("sensitive_column", "other_sensitive_column"))

This will always redact the columns “sensitive_column” and“other_sensitive_column” every time a query is recorded that includeseither. The redactor replaces every value in the column with a standardvalue (for example “[redacted]” for characters,9 fornumerics,1988-10-11T17:00:00 for date times) seeredact_columns() for more information.

You, too, can write a fixture!

When we usestart_db_recording() to record fixtures, weare creating what some people call fixtures (though other terms forthese abound). These are files that are used during testing to representand provide some data or state necessary to execute the test. In thecase of dittodb, these files contain the data that dittodb uses when itpretends to be a live database. During recording, each query that issent to the database gets a unique identifier (the first 6 digits of thehash of the query) and when the response is received, that response issaved to a file with the first SQL (Structured Query Language) verb(e.g. SELECT), a dash, and the hash using thedput() function. This lets you craft a fixture that testsexactly what you need to without having extraneous rows or columns thatmight not be relevant.

You can save our own responses for queries by getting figuring outthe hash (the easiest way to do this now is to write the test that youwant to create a fixture for, run it and see the error message thatlooks something like “Couldn’t find the file nycflights/SELECT-16d120.Rin any of the mock directories.” and use the filename from there.) andthen saving the dataframe that you want the test to use with the commanddput(df, file = "nycflights/SELECT-16d120.R", control = c("all", "hexNumeric"))(if the dataframe you want to save isdf and we are usingthe path we saw in the example error message). And you’ve created yourown fixture!

You can also take the approach of recording fixtures and then editingthem manually to pare them down. The workflow for that would besomething like:

# read in the recorded fixturedf_fixt<-source("nycflights/SELECT-16d120.R",keep.source =FALSE)$value# filter out anything after february and all days after the 9th of the monthdf_fixt<- dplyr::filter(df_fixt, month<=2& day<10)# save the fixture for use in testsdput(df_fixt,file ="nycflights/SELECT-16d120.R",control =c("all","hexNumeric"))

Movatterモバイル変換

Getting Started with`dittodb`

Jonathan Keane

2025-08-18

Our function`mean_delays()`

RMariaDB

RPostgres

RSQLite

Recording fixtures

`with_mock_db()`

Getting setup to use`dittodb`

Things to be careful about

When to call`dbConnect()`

Query size

Advanced uses

Specify a new path

Redacting

You, too, can write a fixture!

Movatterモバイル変換

Getting Started withdittodb

Jonathan Keane

2025-08-18

Our functionmean_delays()

RMariaDB

RPostgres

RSQLite

Recording fixtures

with_mock_db()

Getting setup to usedittodb

Things to be careful about

When to calldbConnect()

Query size

Advanced uses

Specify a new path

Redacting

You, too, can write a fixture!

Getting Started with`dittodb`

Our function`mean_delays()`

`with_mock_db()`

Getting setup to use`dittodb`

When to call`dbConnect()`