The goal of the {dm} package and thedm class that comeswith it, is to make your life easier when you are dealing with data fromseveral different tables.
Let’s take a look at thedm class.
dmThedm class consists of a collection of tables andmetadata about the tables, such as
All tables in adm must be obtained from the same datasource; csv files and spreadsheets would need to be imported to dataframes in R.
dm objectsThere are currently three options available for creating adm object. The relevant functions for creatingdm objects are:
dm()as_dm()new_dm()dm_from_con()To illustrate these options, we will now create the samedm in several different ways. We can use the tables fromthe well-known {nycflights13} package.
Create adm object directly by providing data frames todm():
dmStart with an emptydm object that has been created withdm() ornew_dm(), and add tables to thatobject:
Turn a named list of tables into adm withas_dm():
src into admSqueeze all (or a subset of) tables belonging to asrcobject into adm usingdm_from_con():
The functiondm_from_con(con, table_names = NULL)includes all available tables on a source in thedm object.This means that you can use this, for example, on a postgres databasethat you access viaDBI::dbConnect(RPostgres::Postgres())(with the appropriate argumentsdbname,host,port, …), to produce adm object with all thetables on the database.
Another way of creating adm object is callingnew_dm() on a list oftbl objects:
base_dm<-new_dm(list(airlines = airlines,airports = airports,flights = flights,planes = planes,weather = weather))base_dmThis constructor is optimized for speed and does not performintegrity checks. Use with caution, validate usingdm_validate() if necessary.
We can get the list of tables withdm_get_tables() andthesrc object withdm_get_con().
In order to pull a specific table from adm, use:
But how can we use {dm}-functions to manage the primary keys of thetables in adm object?
dm objectsSome useful functions for managing primary key settings are:
dm_add_pk()dm_get_all_pks()dm_rm_pk()dm_enum_pk_candidates()If you created adm object according to the examples in“Examples ofdm objects”, your objectdoes not yet have any primary keys set. So let’s add one.
We use thenycflights13 tables,i.e. flights_dm from above.
dm_has_pk(flights_dm, airports)flights_dm_with_key<-dm_add_pk(flights_dm, airports, faa)flights_dm_with_keyThedm now has a primary key:
To get an overview over all tables with primary keys, usedm_get_all_pks():
Remove a primary key:
If you still need to get to know your data better, and it is alreadyavailable in the form of adm object, you can use thedm_enum_pk_candidates() function in order to getinformation about which columns of the table are unique keys:
Theflights table does not have any one-column primarykey candidates:
dm_add_pk() has acheck argument. If set toTRUE, the function checks if the column of the table givenby the user is unique. For performance reasons, the default ischeck = FALSE. See also [dm_examine_constraints()] forchecking all constraints in adm.
Useful functions for managing foreign key relations include:
dm_add_fk()dm_get_all_fks()dm_rm_fk()dm_enum_fk_candidates()Now it gets (even more) interesting: we want to define relationsbetween different tables. With thedm_add_fk() function youcan define which column of which table points to another table’scolumn.
This is done by choosing a foreign key from one table that will pointto a primary key of another table. The primary key of the referred tablemust be set withdm_add_pk().dm_add_fk() willfind the primary key column of the referenced table by itself and makethe indicated column of the child table point to it.
This will throw an error:
Let’s create adm object with a foreign key relation towork with later on:
What if we tried to add another foreign key relation fromflights toairports to the object? Columndest might work, since it also contains airport codes:
Checks are opt-in and executed only ifcheck = TRUE. Youcan still add a foreign key with the defaultcheck = FALSE.See alsodm_examine_constraints() for checking allconstraints in adm.
Get an overview of all foreign key relationswithdm_get_all_fks():
Remove foreign key relations withdm_rm_fk() (parametercolumns = NULL means that all relations will be removed,with a message):
try( flights_dm_with_fk%>%dm_rm_fk(table = flights,column = dest,ref_table = airports)%>%dm_get_all_fks(c(flights, airports)))flights_dm_with_fk%>%dm_rm_fk(flights, origin, airports)%>%dm_get_all_fks(c(flights, airports))flights_dm_with_fk%>%dm_rm_fk(flights,columns =NULL, airports)%>%dm_get_all_fks(c(flights, airports))Since the primary keys are defined in thedm object, youdo not usually need to provide the referenced column name ofref_table.
Another function for getting to know your data better(cf. dm_enum_pk_candidates() in“Primary keysofdm objects”) isdm_enum_fk_candidates(). Use it to get an overview overforeign key candidates that point from one table to another: