Movatterモバイル変換


[0]ホーム

URL:


Create a dm object from a database

James Wondrasek, Kirill Müller

2025-07-02

{dm} was designed to make connecting to and working with a relationaldatabase management system (RDBMS) as straightforward as possible. Tothis end, a dm object can be created from any database that has a {DBI} backend available (see list).

When a dm object is created via a DBI connection to an RDBMS, it canimport all the tables in the database, the active schema, or a limitedset. For some RDBMS, such as Postgres, SQL Server and MariaDB, primaryand foreign keys are also imported and do not have to be manually addedafterwards.

To demonstrate, we will connect to a relational dataset repository(https://relational.fel.cvut.cz/) with a database server that ispublicly accessible without registration. It hosts a financial dataset(https://relational.fel.cvut.cz/dataset/Financial) that contains loandata along with relevant account information and transactions. We chosethis dataset because the relationships betweenloan,account, andtransactions tables are a goodrepresentation of databases that record real-world businesstransactions.

Below, we open a connection to the publicly accessible databaseserver using their documented connection parameters. Connection detailsvary from database to database. Before connecting to your own RDBMS, youmay want to readvignette("DBI", package = "DBI") forfurther information.

library(RMariaDB)my_db<-dbConnect(MariaDB(),username ="guest",password ="ctu-relational",dbname ="Financial_ijs",host ="relational.fel.cvut.cz")

Creating a dm object takes a single call todm_from_con() with the DBI connection object as itsargument.

library(dm)my_dm<-dm_from_con(my_db)my_dm

The components of themy_dm object are lazy tablespowered by {dbplyr}.{dbplyr} translates the {dplyr} grammar of datamanipulation into queries the database server understands. Lazy tablesdefer downloading of table data until results are required for printingor local processing.

Building a dm from a subset of tables

A dm can also be constructed from individual tables or views. This isuseful for when you want to work with a subset of a database’s tables,perhaps from different schemas.

Below, we use the$ notation to extract two tables fromthe financial database. Then we create our dm by passing the tables inas arguments. Note that the tables arguments have to all be from thesame source, in this casemy_db.

dbListTables(my_db)library(dbplyr)loans<-tbl(my_db,"loans")accounts<-tbl(my_db,"accounts")my_manual_dm<-dm(loans, accounts)my_manual_dm

Defining Primary and Foreign Keys

Primary keys and foreign keys are how relational database tables arelinked with each other. A primary key is a column or column tuple thathas a unique value for each row within a table. A foreign key is acolumn or column tuple containing the primary key for a row in anothertable. Foreign keys act as cross references between tables. They specifythe relationships that gives us therelational database. Formore information on keys and a crash course on databases, seevignette("howto-dm-theory").

In many cases,dm_from_con() already returns a dm withall keys set. If not, dm allows us to define primary and foreign keysourselves. For this, we uselearn_keys = FALSE to obtain adm object with only the tables.

library(dm)fin_dm<-dm_from_con(my_db,learn_keys =FALSE)fin_dm

The model diagram(https://relational.fel.cvut.cz/assets/img/datasets-generated/financial.svg)provided by our test database loosely illustrates the intendedrelationships between tables. In the diagram, we can see that theloans table should be linked to theaccountstable. Below, we create those links in 3 steps:

  1. Add a primary keyid to theaccountstable
  2. Add a primary keyid to theloanstable
  3. Add a foreign keyaccount_id to theloanstable referencing theaccounts table

Then we assign colors to the tables and draw the structure of thedm.

Note that when the foreign key is created, the primary key in thereferenced table does not need to bespecified, but the primarykey must already bedefined. And, as mentioned above, primaryand foreign key constraints on the database are currently only importedfor Postgres, SQL Server databases and MariaDB, and only whendm_from_con() is used. This process of key definition needsto be done manually for other databases.

my_dm_keys<-  my_manual_dm%>%dm_add_pk(accounts, id)%>%dm_add_pk(loans, id)%>%dm_add_fk(loans, account_id, accounts)%>%dm_set_colors(green = loans,orange = accounts)my_dm_keys%>%dm_draw()

Once you have instantiated a dm object, you can continue to addtables to it. For tables from the original source for the dm, usedm()

trans<-tbl(my_db,"trans")my_dm_keys%>%dm(trans)

Serializing a dm object

A dm object is always linked to a database connection. Thisconnection is lost when the dm object is saved to disk, e.g., whensaving the workspace in R or in Posit Workbench, or when using knitrchunks:

unserialize(serialize(my_dm_keys,NULL))

The connection is tightly coupled with the tables in the dm objectand cannot be replaced. A practical solution is to define, for each dmobject your project uses, a function that recreates it using a newdatabase connection:

my_db_fun<-function() {dbConnect(MariaDB(),username ="guest",password ="ctu-relational",dbname ="Financial_ijs",host ="relational.fel.cvut.cz"  )}my_dm_fun<-function(my_db =my_db_fun()) {  loans<-tbl(my_db,"loans")  accounts<-tbl(my_db,"accounts")dm(loans, accounts)%>%dm_add_pk(accounts, id)%>%dm_add_pk(loans, id)%>%dm_add_fk(loans, account_id, accounts)%>%dm_set_colors(green = loans,orange = accounts)}

To avoid reconnecting and/or recreating every time you need a dmobject, you can usememoise::memoise() to memoize theconnection and/or dm functions.

Transient nature of operations

Like other R objects, a dm is immutable and all operations performedon it are transient unless stored in a new variable.

my_dm_keysmy_dm_trans<-  my_dm_keys%>%dm(trans)my_dm_trans

And, like {dbplyr}, results are never written to a database unlessexplicitly requested.

my_dm_keys%>%dm_flatten_to_tbl(loans)my_dm_keys%>%dm_flatten_to_tbl(loans)%>%sql_render()

Performing operations on tables by “zooming”

As the dm is a collection of tables, if we wish to perform operationson an individual table, we set it as the context for those operationsusingdm_zoom_to(). Seevignette("tech-dm-zoom") for more detail on zooming.

dm operations are transient unless persistence is explicitlyrequested. To make our chain of manipulations on the selected tablepermanent, we assign the result ofdm_insert_zoomed() to anew object,my_dm_total. This is a new dm object, derivedfrommy_dm_keys, with a new lazy tabletotal_loans linked to theaccounts table.

my_dm_total<-  my_dm_keys%>%dm_zoom_to(loans)%>%group_by(account_id)%>%summarize(total_amount =sum(amount,na.rm =TRUE))%>%ungroup()%>%dm_insert_zoomed("total_loans")

Context is set to the table “loans” usingdm_zoom_to(loans). You can learn more about zooming in thetutorialvignette("tech-dm-zoom"). We then use {dplyr} functions on the zoomedtable to generate a new summary table.

summarize() returns a temporary table with one row foreach group created by the precedinggroup_by() function.The columns in the temporary table are constrained to the columns passedas arguments to thegroup_by() function and the column(s)created by thesummarize() function.

dm_insert_zoomed("total_loans") adds the temporary tablecreated bysummarize() to the data model under a new name,total_loans. Because the grouping variableaccount_id is a primary key, the new derived table isautomatically linked to theaccounts table.

my_dm_total%>%dm_set_colors(violet = total_loans)%>%dm_draw()

The resulting tabletotal_loans can be accessed like anyother table in the dm object.

my_dm_total$total_loans

It is alazy table powered by the {dbplyr} package: the resultsare not materialized; instead, an SQL query is built and executed eachtime the data is requested.

my_dm_total$total_loans%>%sql_render()

Usecompute() on a zoomed table to materialize it to atemporary table and avoid recomputing. Seevignette("howto-dm-copy") for more details.

Downloading data

When it becomes necessary to move data locally for analysis orreporting, the {dm} methodcollect() is used. Operations ondm objects for databases are limited to report only the first tenresults.collect() forces the evaluation of all SQL queriesand the generation of the complete set of results. The resulting tablesare transferred from the RDBMS and stored as local tibbles.

my_dm_local<-  my_dm_total%>%collect()my_dm_local$total_loans

Use this method with caution. If you are not sure of the size of thedataset you will be downloading, you can calldm_nrow() onyourdm for the row count of your data model’s tables.

my_dm_total%>%dm_nrow()

Persisting results

It is just as simple to move a local relational model into an RDBMSas is usingcollect() to download it. The method used iscopy_dm_to() and it takes as arguments a databaseconnection and a dm object. In the example below, a local SQLitedatabase is used to demonstrate it, but {dm} is designed to work withany RDBMS supported by {DBI}.

destination_db<- DBI::dbConnect(RSQLite::SQLite())deployed_dm<-copy_dm_to(destination_db, my_dm_local)deployed_dmmy_dm_local

In the output, you can observe that thesrc fordeployed_dm is the SQLite database, while formy_dm_local the source is the local R environment.

Persisting tables are covered in more detail invignette("howto-dm-copy").

When done, do not forget to disconnect:

DBI::dbDisconnect(destination_db)DBI::dbDisconnect(my_db)

Conclusion

In this tutorial, we have demonstrated how simple it is to load adatabase into adm object and begin working with it.Currently, loading a dm from most RDBMS requires you to manually set keyrelations, but {dm} provides methods to make this straightforward. It isplanned that future versions of dm will support automatic key creationfor more RDBMS.

The next step is to readvignette("howto-dm-copy"),where copying your tables to and from an RDBMS is covered.vignette("howto-dm-rows") discusses manipulation ofindividual rows in a database.

Further reading

vignette("howto-dm-df") – Is your data in local dataframes? This article covers creating a data model from your local dataframes, including building the relationships in your data model,verifying your model, and leveraging the power of dplyr to operate onyour data model.

vignette("howto-dm-theory") – Do you know all about dataframes but very little about relational data models? This quickintroduction will walk you through the key similarities and differences,and show you how to move from individual data frames to a relationaldata model.


[8]ページ先頭

©2009-2025 Movatter.jp