mgirlich/dm (forked from cynkra/dm)

Working with relational data models in R

TL;DR

Are you using multiple data frames or database tables in R? Organize them with dm.

  • Use it today (if only like a list of tables).
  • Build data models tomorrow.
  • Deploy the data models to your organization’s RDBMS the day after.

Overview

dm bridges the gap in the data pipeline between individual data frames and relational databases. It’s a grammar of joined tables that provides a consistent set of verbs for consuming, creating, and deploying relational data models. For individual researchers, it broadens the scope of datasets they can work with and how they work with them. For organizations, it enables teams to quickly and efficiently create and share large, complex datasets.

dm objects encapsulate relational data models constructed from local data frames or lazy tables connected to an RDBMS. dm objects support the full suite of dplyr data manipulation verbs along with additional methods for constructing and verifying relational data models, including key selection, key creation, and rigorous constraint checking. Once a data model is complete, dm provides methods for deploying it to an RDBMS. This allows it to scale from datasets that fit in memory to databases with billions of rows.
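The workflow described above (build a model from local data frames, add and check keys, then deploy) can be sketched as follows. This is a minimal illustration, not the package's canonical example: the `customers` and `orders` tables and the name `shop` are invented for the sketch, and the in-memory SQLite target assumes the DBI, RSQLite, dplyr and dbplyr packages are installed.

```r
library(dm)

# Hypothetical local tables, invented for this sketch.
customers <- tibble::tibble(
  customer_id = 1:3,
  name = c("Ada", "Grace", "Linus")
)
orders <- tibble::tibble(
  order_id = 101:104,
  customer_id = c(1L, 1L, 2L, 3L),
  amount = c(10, 25, 7.5, 40)
)

# Build the dm and declare how the tables relate.
shop <- dm(customers, orders) %>%
  dm_add_pk(customers, customer_id) %>%
  dm_add_pk(orders, order_id) %>%
  dm_add_fk(orders, customer_id, customers)

# Verify that every declared key actually holds in the data.
checks <- dm_examine_constraints(shop)

# Deploy to a database; an in-memory SQLite database stands in
# for an organization's RDBMS here (an assumption of this sketch).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
shop_db <- copy_dm_to(con, shop)

# Tables of the deployed dm are lazy; collect() pulls the rows back into R.
remote_orders <- dplyr::collect(shop_db$orders)
DBI::dbDisconnect(con)
```

Declaring the keys before deployment means the same checks can be run locally, where failures are cheap to inspect, before any data reaches the database.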

Features

dm makes it easy to bring an existing relational data model into your R session. Because the dm object behaves like a named list of tables, it requires little change to incorporate into existing workflows. The dm interface and behavior are modeled after dplyr, so you may already be familiar with many of its verbs. dm also offers:

  • visualization to help you understand relationships between entities represented by the tables
  • simpler joins that “know” how tables are related, including a “flatten” operation that automatically follows keys and performs column name disambiguation
  • consistency and constraint checks to help you understand (and fix) the limitations of your data

That’s just the tip of the iceberg. See Getting started to hit the ground running and explore all the features.

Installation

The latest stable version of the {dm} package can be obtained from CRAN with the command:

install.packages("dm")

The latest development version of {dm} can be installed from GitHub:

# install.packages("devtools")
devtools::install_github("cynkra/dm")

Usage

Create a dm object (see Getting started for details).

library(dm)
dm <- dm_nycflights13()
dm
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `airlines`, `airports`, `flights`, `planes`, `weather`
#> Columns: 53
#> Primary keys: 3
#> Foreign keys: 3

dm is a named list of tables:

library(dplyr) # provides count()

names(dm)
#> [1] "airlines" "airports" "flights"  "planes"   "weather"
nrow(dm$airports)
#> [1] 1458
dm$flights %>%
  count(origin)
#> # A tibble: 3 x 2
#>   origin     n
#> * <chr>  <int>
#> 1 EWR     4043
#> 2 JFK     3661
#> 3 LGA     3523
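Because a dm behaves like a named list, ordinary list tools apply to it as well. The sketch below, which assumes the nycflights13 package is installed and uses the illustrative name `flights_dm`, extracts the tables with `dm_get_tables()` and counts the rows of each one.

```r
library(dm)

flights_dm <- dm_nycflights13()

# dm_get_tables() returns the tables as a plain named list,
# so base R iteration works without any dm-specific verbs.
row_counts <- vapply(dm_get_tables(flights_dm), nrow, integer(1))
row_counts
```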

Visualize relationships at any time:

dm %>%
  dm_draw()

Simple joins:

dm %>%
  dm_flatten_to_tbl(flights)
#> Renamed columns:
#> * year -> flights.year, planes.year
#> * name -> airlines.name, airports.name
#> # A tibble: 11,227 x 35
#>    flights.year month   day dep_time sched_dep_time dep_delay arr_time
#>           <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1         2013     1    10        3           2359         4      426
#>  2         2013     1    10       16           2359        17      447
#>  3         2013     1    10      450            500       -10      634
#>  4         2013     1    10      520            525        -5      813
#>  5         2013     1    10      530            530         0      824
#>  6         2013     1    10      531            540        -9      832
#>  7         2013     1    10      535            540        -5     1015
#>  8         2013     1    10      546            600       -14      645
#>  9         2013     1    10      549            600       -11      652
#> 10         2013     1    10      550            600       -10      649
#> # … with 11,217 more rows, and 28 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>,
#> #   dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#> #   time_hour <dttm>, airlines.name <chr>, airports.name <chr>, lat <dbl>,
#> #   lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>, tzone <chr>, planes.year <int>,
#> #   type <chr>, manufacturer <chr>, model <chr>, engines <int>, seats <int>,
#> #   speed <int>, engine <chr>

Check consistency:

dm %>%
  dm_examine_constraints()
#> ! Unsatisfied constraints:
#> Table `flights`: foreign key tailnum into table `planes`: 1640 entries (14.6%) of `flights$tailnum` not in `planes$tailnum`: N722MQ (27), N725MQ (20), N520MQ (19), N723MQ (19), N508MQ (16), …
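To look at the rows behind an unsatisfied foreign key, one option is an ordinary dplyr anti-join between the two tables involved. This is a sketch rather than a dm-specific feature; it assumes the nycflights13 and dplyr packages are installed, and the names `flights_dm` and `orphans` are illustrative. Depending on how missing values are treated, the counts need not match the report above exactly.

```r
library(dm)
library(dplyr)

flights_dm <- dm_nycflights13()

# Flights whose tailnum has no match in planes, counted by tailnum.
orphans <- flights_dm$flights %>%
  anti_join(flights_dm$planes, by = "tailnum") %>%
  count(tailnum, sort = TRUE)

orphans
```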

Learn more in the Getting started article.

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.


License: MIT © cynkra GmbH.

Funded by:

energie360°, cynkra


Please note that the ‘dm’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
