In this article, we’re going show you how easy it is to move fromconnecting to the database holding your data to producing the resultsyou need. It’s meant to be a quick and friendly introduction to {dm}, soit is low on details and caveats. Links to detailed documentation areprovided at the end. (If your data is in data frames instead of adatabase and you’re in a hurry, jump over tovignette("howto-dm-df").)
dm objects can be created from individual tables or loaded directlyfrom a relational data model on an RDBMS (relational database managementsystem).
For this demonstration, we’re going to work with a model hosted on apublic server. The first thing we need is a connection to the RDBMShosting the data.
library(RMariaDB)fin_db<-dbConnect(MariaDB(),username ="guest",password ="ctu-relational",dbname ="Financial_ijs",host ="relational.fel.cvut.cz")We create a dm object from an RDBMS usingdm_from_con(),passing in the connection object we just created as the firstargument.
The dm object interrogates the RDBMS for table and columninformation, and primary and foreign keys. Currently, primary andforeign keys are only available from SQL Server, Postgres andMariaDB.
The dm object can be accessed like a named list of tables:
Additionally, mostdm functions arepipe-friendly and supporttidyevaluation. We can use[ or thedm_select_tbl() verb to derive a smaller dm with theloans,accounts,districts andtrans tables:
In many cases,dm_from_con() already returns a dm withall keys set. If not, dm allows us to define primary and foreign keysourselves. For this, we uselearn_keys = FALSE to obtain adm object with only the tables.
library(dm)fin_dm_small<-dm_from_con(fin_db,learn_keys =FALSE)%>%dm_select_tbl(loans, accounts, districts, trans)In our data model,id columns uniquely identify recordsin theaccounts andloans tables, and was usedas a primary key. A primary key is defined withdm_add_pk(). Each loan is linked to one account via theaccount_id column in theloans table, therelationship is established withdm_add_fk().
Having a diagram of the data model is the quickest way to verifywe’re on the right track. We can display a visual summary of the dm atany time. The default is to display the table name, any defined keys,and their links to other tables.
Visualizing the dm in its current state, we can see the keys we havecreated and how they link the tables together. Color guides the eye.
If we want to perform modeling or analysis on this relational model,we need to transform it into a tabular format that R functions can workwith. With the argumentrecursive = TRUE,dm_flatten_to_tbl() will automatically follow foreign keysacross tables to gather all the available columns into a singletable.
Apart from the rows printed above, no data has been fetched from thedatabase. Useselect() to reduce the number of columnsfetched, andcollect() to retrieve the entire result forlocal processing.
We don’t need to take the extra step of exporting the data to workwith it. Through the dm object, we have complete access to dplyr’s datamanipulation verbs. These operate on the data within individualtables.
To work with a particular table we usedm_zoom_to() toset the context to our chosen table. Then we can perform any of thedplyr operations we want.
fin_dm_total<- fin_dm_keys%>%dm_zoom_to(loans)%>%group_by(account_id)%>%summarize(total_amount =sum(amount,na.rm =TRUE))%>%ungroup()%>%dm_insert_zoomed("total_loans")fin_dm_total$total_loansNote that, in the above example, we usedm_insert_zoomed() to add the results as a new table to ourdata model. This table is temporary and will be deleted when our sessionends. If you want to make permanent changes to your data model on anRDBMS, please see the “Persisting results” section invignette("howto-dm-db").
It’s always smart to check that your data model follows itsspecifications. When building our own model or changing existing modelsby adding tables or keys, it is even more important that the new modelis validated.
dm_examine_constraints() checks all primary and foreignkeys and reports if they violate their expected constraints.
For more on constraint checking, including cardinality, findingcandidate columns for keys, and normalization, seevignette("tech-dm-low-level").
Now that you have been introduced to the basic operation of dm, thenext step is to learn more about the dm methods that your particular usecase requires.
Is your data in an RDBMS? Then move on tovignette("howto-dm-db") for a more detailed look at workingwith an existing relational data model.
If your data is in data frames, then you want to readvignette("howto-dm-df") next.
If you would like to know more about relational data models in orderto get the most out of dm, check outvignette("howto-dm-theory").
If you’re familiar with relational data models, but want to know howto work with them in dm, then any ofvignette("tech-dm-join"),vignette("tech-dm-filter"), orvignette("tech-dm-zoom") is a good next step.
The {dm} package follows the tidyverse principles:
dm objects are immutable (your data will never beoverwritten in place)dm objects are pipeable (i.e.,return newdm or table objects)The {dm} package builds heavily upon the{datamodelr} package,and upon thetidyverse. We’relooking forward to a good collaboration!
The{polyply}package has a similar intent with a slightly differentinterface.
The{data.cube}package has quite the same intent usingarray-likeinterface.
Articles in the{rquery} package discussjoincontrollers andjoindependency sorting, with the intent to move the declaration of tablerelationships from code to data.
The{tidygraph}package stores a network as two related tables ofnodesandedges, compatible with {dplyr} workflows.
In object-oriented programming languages,object-relationalmapping is a similar concept that attempts to map a set of relatedtables to a class hierarchy.