Working with relational data models in R

cynkra/dm

Lifecycle: stable

Are you using multiple data frames or database tables in R? Organize them with dm.

  • Use it for data analysis today.
  • Build data models tomorrow.
  • Deploy the data models to your organization’s Relational Database Management System (RDBMS) the day after.

Overview

dm bridges the gap in the data pipeline between individual data frames and relational databases. It’s a grammar of joined tables that provides a consistent set of verbs for consuming, creating, and deploying relational data models. For individual researchers, it broadens the scope of datasets they can work with and how they work with them. For organizations, it enables teams to quickly and efficiently create and share large, complex datasets.

dm objects encapsulate relational data models constructed from local data frames or lazy tables connected to an RDBMS. dm objects support the full suite of dplyr data manipulation verbs along with additional methods for constructing and verifying relational data models, including key selection, key creation, and rigorous constraint checking. Once a data model is complete, dm provides methods for deploying it to an RDBMS. This allows it to scale from datasets that fit in memory to databases with billions of rows.
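As a rough illustration of key creation and constraint checking on local data frames, here is a minimal sketch. The `customers` and `orders` tables and the `shop` object are hypothetical toy data invented for this example; only `dm()`, `dm_add_pk()`, `dm_add_fk()`, and `dm_examine_constraints()` come from the package.

```r
library(dm)

# Hypothetical toy tables, not shipped with dm
customers <- data.frame(
  customer_id = c(1L, 2L, 3L),
  name = c("Ada", "Grace", "Linus")
)
orders <- data.frame(
  order_id = c(10L, 11L, 12L),
  customer_id = c(1L, 1L, 4L) # customer 4 does not exist in `customers`
)

# Combine the tables into a dm, then declare primary and foreign keys
shop <- dm(customers = customers, orders = orders) %>%
  dm_add_pk(customers, customer_id) %>%
  dm_add_pk(orders, order_id) %>%
  dm_add_fk(orders, customer_id, customers)

# Check every declared key against the data;
# the dangling customer_id should be reported as an unsatisfied foreign key
checks <- dm_examine_constraints(shop)
checks
```

Declaring keys does not validate them by default, so running `dm_examine_constraints()` afterwards is one way to find out whether the data actually honors the model.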

Features

dm makes it easy to bring an existing relational data model into your R session. As the dm object behaves like a named list of tables, it requires little change to incorporate it within existing workflows. The dm interface and behavior are modeled after dplyr, so you may already be familiar with many of its verbs. dm also offers:

  • visualization to help you understand relationships between entities represented by the tables
  • simpler joins that “know” how tables are related, including a “flatten” operation that automatically follows keys and performs column name disambiguation
  • consistency and constraint checks to help you understand (and fix) the limitations of your data

That’s just the tip of the iceberg. See Getting started to hit the ground running and explore all the features.

Installation

The latest stable version of the {dm} package can be obtained from CRAN with the command:

install.packages("dm")

The latest development version of {dm} can be installed from R-universe:

# Enable repository from cynkra
options(
  repos = c(
    cynkra = "https://cynkra.r-universe.dev",
    CRAN = "https://cloud.r-project.org"
  )
)
# Download and install dm in R
install.packages('dm')

or from GitHub:

# install.packages("devtools")
devtools::install_github("cynkra/dm")

Usage

Create a dm object (see Getting started for details).

library(dm)
dm <- dm_nycflights13(table_description = TRUE)
dm
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `airlines`, `airports`, `flights`, `planes`, `weather`
#> Columns: 53
#> Primary keys: 4
#> Foreign keys: 4

dm is a named list of tables:

names(dm)
#> [1] "airlines" "airports" "flights"  "planes"   "weather"
nrow(dm$airports)
#> [1] 86
dm$flights %>%
  count(origin)
#> # A tibble: 3 × 2
#>   origin     n
#>   <chr>  <int>
#> 1 EWR      641
#> 2 JFK      602
#> 3 LGA      518

Visualize relationships at any time:

dm %>%
  dm_draw()

Simple joins:

dm %>%
  dm_flatten_to_tbl(flights)
#> Renaming ambiguous columns: %>%
#>   dm_rename(flights, year.flights = year) %>%
#>   dm_rename(flights, month.flights = month) %>%
#>   dm_rename(flights, day.flights = day) %>%
#>   dm_rename(flights, hour.flights = hour) %>%
#>   dm_rename(airlines, name.airlines = name) %>%
#>   dm_rename(airports, name.airports = name) %>%
#>   dm_rename(planes, year.planes = year) %>%
#>   dm_rename(weather, year.weather = year) %>%
#>   dm_rename(weather, month.weather = month) %>%
#>   dm_rename(weather, day.weather = day) %>%
#>   dm_rename(weather, hour.weather = hour)
#> # A tibble: 1,761 × 48
#>    year.flights month.…¹ day.f…² dep_t…³ sched…⁴ dep_d…⁵ arr_t…⁶ sched…⁷ arr_d…⁸
#>           <int>    <int>   <int>   <int>   <int>   <dbl>   <int>   <int>   <dbl>
#>  1         2013        1      10       3    2359       4     426     437     -11
#>  2         2013        1      10      16    2359      17     447     444       3
#>  3         2013        1      10     450     500     -10     634     648     -14
#>  4         2013        1      10     520     525      -5     813     820      -7
#>  5         2013        1      10     530     530       0     824     829      -5
#>  6         2013        1      10     531     540      -9     832     850     -18
#>  7         2013        1      10     535     540      -5    1015    1017      -2
#>  8         2013        1      10     546     600     -14     645     709     -24
#>  9         2013        1      10     549     600     -11     652     724     -32
#> 10         2013        1      10     550     600     -10     649     703     -14
#> # ℹ 1,751 more rows
#> # ℹ abbreviated names: ¹​month.flights, ²​day.flights, ³​dep_time,
#> #   ⁴​sched_dep_time, ⁵​dep_delay, ⁶​arr_time, ⁷​sched_arr_time, ⁸​arr_delay
#> # ℹ 39 more variables: carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour.flights <dbl>, minute <dbl>, time_hour <dttm>, name.airlines <chr>,
#> #   name.airports <chr>, lat <dbl>, lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>,
#> #   tzone <chr>, year.planes <int>, type <chr>, manufacturer <chr>,
#> #   model <chr>, engines <int>, seats <int>, speed <int>, engine <chr>,
#> #   year.weather <int>, month.weather <int>, day.weather <int>,
#> #   hour.weather <int>, temp <dbl>, dewp <dbl>, humid <dbl>, wind_dir <dbl>,
#> #   wind_speed <dbl>, wind_gust <dbl>, precip <dbl>, pressure <dbl>, …

Check consistency:

dm %>%
  dm_examine_constraints()
#> ! Unsatisfied constraints:
#> • Table `flights`: foreign key `tailnum` into table `planes`: values of `flights$tailnum` not in `planes$tailnum`: N725MQ (6), N537MQ (5), N722MQ (5), N730MQ (5), N736MQ (5), …
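Deploying a data model to a database, as mentioned in the Overview, might look roughly like the sketch below. It assumes the DBI, RSQLite, and dbplyr packages are installed and uses an in-memory SQLite database as a stand-in for a real RDBMS. The `customers` and `orders` tables are hypothetical toy data, and this is only one possible way to use `copy_dm_to()`.

```r
library(dm)

# Hypothetical toy tables with consistent keys
customers <- data.frame(
  customer_id = c(1L, 2L),
  name = c("Ada", "Grace")
)
orders <- data.frame(
  order_id = c(10L, 11L, 12L),
  customer_id = c(1L, 1L, 2L)
)

local_dm <- dm(customers = customers, orders = orders) %>%
  dm_add_pk(customers, customer_id) %>%
  dm_add_pk(orders, order_id) %>%
  dm_add_fk(orders, customer_id, customers)

# Assumption: an in-memory SQLite database stands in for the target RDBMS
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy_dm_to() creates temporary tables by default;
# temporary = FALSE would make them persistent
remote_dm <- copy_dm_to(con, local_dm)

# The tables of the remote dm are lazy; collect() pulls the rows back into R
remote_customers <- dplyr::collect(remote_dm$customers)

DBI::dbDisconnect(con)
```

The returned object is again a dm, so the same verbs shown above apply to it while the data stays in the database.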

Learn more in the Getting started article.

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.


License: MIT © cynkra GmbH.

Funded by:

energie360°, cynkra


Please note that the ‘dm’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
