joyn provides a set of tools to analyze the quality of merging (i.e., joining) data frames. It is a JOY to join with joyn

License

LICENSE (type not identified) and LICENSE.md (MIT).

randrescastaneda/joyn


`joyn` empowers you to assess the results of joining data frames, making it easier and more efficient to combine your tables. Similar in philosophy to the `merge` command in Stata, `joyn` lets you match on key variables and produces detailed join reports to ensure accurate and insightful results.

Motivation

Merging tables in R can be tricky. Ensuring accuracy and fully understanding the joined data can be tedious tasks. That's where `joyn` comes in. Inspired by Stata's informative approach to merging, `joyn` makes the process smoother and more insightful.

While standard R merge functions are powerful, they often lack features like assessing join accuracy, detecting potential issues, and providing detailed reports. `joyn` fills this gap by offering:

  • Intuitive join handling: Whether you're dealing with one-to-one, one-to-many, or many-to-many relationships, `joyn` helps you navigate them confidently.
  • Informative reports: Get clear insights into the join process with helpful reports that identify duplicate observations, missing values, and potential inconsistencies.
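To make the idea of a join report concrete, here is a minimal base-R sketch of a Stata-style match indicator. It is not how `joyn` is implemented: `report_join()` and the `.report` column are hypothetical names chosen for illustration (`joyn` itself stores its report in `.joyn`). The sketch tags every row with its table of origin, runs a full join with `merge(all = TRUE)`, and labels each output row as coming from x only, y only, or both.

```r
# Hypothetical helper for illustration only; not part of joyn's API.
report_join <- function(x, y, by) {
  # Mark every row with its table of origin before joining.
  x$.in_x <- TRUE
  y$.in_y <- TRUE
  # all = TRUE requests a full join; base merge() defaults to an inner join.
  out <- merge(x, y, by = by, all = TRUE)
  # A missing marker means the row did not come from that table.
  in_x <- !is.na(out$.in_x)
  in_y <- !is.na(out$.in_y)
  out$.report <- ifelse(in_x & in_y, "x & y", ifelse(in_x, "x", "y"))
  # Drop the helper markers so only the report column remains.
  out$.in_x <- NULL
  out$.in_y <- NULL
  out
}

x <- data.frame(id = c(1, 2, 3), a = c(10, 20, 30))
y <- data.frame(id = c(2, 3, 4), b = c("p", "q", "r"))

res <- report_join(x, y, by = "id")
print(res)
# Counts per category, similar in spirit to joyn's summary table
print(table(res$.report))
```

Tagging rows before the join, rather than inferring origin from missing values afterwards, keeps the classification correct even when non-key columns legitimately contain `NA`.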

What makes `joyn` special?

While standard R merge functions offer basic functionality, `joyn` goes above and beyond by providing comprehensive tools and features tailored to your data joining needs:

1. Flexibility in join types: Choose your ideal join type ("left", "right", or "inner") with the `keep` argument. Unlike base R's `merge()`, which performs an inner join by default, `joyn` performs a full join by default, so all observations are included unless you choose otherwise.

2. Seamless variable handling: No more wrestling with duplicate variable names! `joyn` offers multiple options:

  • Update values: Use `update_values` or `update_NAs` to update conflicting variables in the left table with values from the right table. `update_NAs` fills only missing values, while `update_values` also replaces existing ones.

  • Keep both (with different names): Set `keep_common_vars = TRUE` to retain both variables, each with a unique suffix.

  • Selective inclusion: Choose specific variables from the right table with `y_vars_to_keep`, so you get only the data you need.

3. Relationship awareness: `joyn` recognizes one-to-one, one-to-many, many-to-one, and many-to-many relationships between tables. While it defaults to many-to-many for compatibility, remember that this is often not ideal. Always specify the correct relationship with `match_type` (e.g., `"m:1"`) and the key variables with `by` for accurate and meaningful results.

4. Join success at a glance: Get instant feedback on your join with the automatically generated reporting variable (named `.joyn` in the examples). Identify potential issues like unmatched observations or missing values to ensure data integrity and informed decision-making.
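The relationship between two tables can be checked before joining by asking whether the key variables uniquely identify rows in each table. The sketch below does this in base R with `anyDuplicated()`. `infer_match_type()` is a hypothetical helper, not a `joyn` function; it only borrows the `"1:1"`/`"1:m"`/`"m:1"`/`"m:m"` notation used by `joyn`'s `match_type` argument. The data frames reproduce the values of `x1` and `y1` from the examples.

```r
# Hypothetical helper for illustration only; not part of joyn's API.
# Returns "1" for a side whose key columns are unique and "m" otherwise.
infer_match_type <- function(x, y, by) {
  # anyDuplicated() returns 0 when no combination of key values repeats.
  x_side <- if (anyDuplicated(x[, by, drop = FALSE]) == 0) "1" else "m"
  y_side <- if (anyDuplicated(y[, by, drop = FALSE]) == 0) "1" else "m"
  paste0(x_side, ":", y_side)
}

# Same values as x1 and y1 in the examples, as base data frames
x1 <- data.frame(id = c(1L, 1L, 2L, 3L, NA), t = c(1L, 2L, 1L, 2L, NA), x = 11:15)
y1 <- data.frame(id = c(1, 2, 4), y = c(11, 15, 16))

infer_match_type(x1, y1, by = "id")          # "m:1": id repeats in x1 only
infer_match_type(x1, x1, by = c("id", "t"))  # "1:1": (id, t) is unique in x1
```

The first call agrees with the `match_type = "m:1"` used for `x1` and `y1` in the examples. A declared relationship that disagrees with what the data show is exactly the situation worth stopping on before joining.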

By addressing these common pain points and offering enhanced flexibility, `joyn` empowers you to confidently and effectively join your data frames, paving the way for deeper insights and data-driven success.

Performance and flexibility

The cost of reliability

While raw speed is essential, understanding your joins every step of the way is equally crucial. `joyn` prioritizes insightful information and error prevention over raw speed. Compared with other join functions, it adds:

  • Meticulous checks: `joyn` performs comprehensive checks to ensure your join is accurate and to avoid potential missteps, like unmatched observations or missing values.
  • Detailed reporting: Get a clear picture of your join with a dedicated report highlighting any issues you should be aware of.
  • User-friendly summary: Quickly grasp the join's outcome with a concise overview presented in a clear table.
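Checks of this kind can be approximated in a few lines of base R. The sketch below uses a hypothetical `prejoin_checks()` helper (not a `joyn` function) that, for a single key column, counts missing keys and keys with no partner in the other table. The key values are those of `x2` and `y2` from the examples.

```r
# Hypothetical helper for illustration only; not part of joyn's API.
# Counts common join pitfalls for a single key column before joining.
prejoin_checks <- function(x, y, key) {
  kx <- x[[key]]
  ky <- y[[key]]
  c(
    na_keys_x   = sum(is.na(kx)),                   # x rows with a missing key
    na_keys_y   = sum(is.na(ky)),                   # y rows with a missing key
    unmatched_x = sum(!is.na(kx) & !(kx %in% ky)),  # x keys absent from y
    unmatched_y = sum(!is.na(ky) & !(ky %in% kx))   # y keys absent from x
  )
}

# Same key values as x2 and y2 in the examples
x2 <- data.frame(id = c(1, 4, 2, 3, NA), x = c(16, 12, NA, NA, 15))
y2 <- data.frame(id = c(1, 2, 5, 6, 3), y = c(11L, 15L, 20L, 13L, 10L))

prejoin_checks(x2, y2, key = "id")
```

For these tables the helper finds one missing key and one unmatched key (id 4) in `x2`, and two unmatched keys (ids 5 and 6) in `y2`. That lines up with the two x-only and two y-only rows reported by the `by = "id"` example.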

These features make `joyn` slightly slower than functions like `data.table::merge.data.table()` or `collapse::join()`. In most workflows, however, the benefits of preventing errors and understanding your joins outweigh the speed difference.

Know your needs, choose your tool

  • Speed is your top priority for massive datasets? Consider using `data.table` or `collapse` directly.
  • Seeking a clear understanding of your joins and protection against errors? `joyn` is your trusted guide.

Protective by design

`joyn` intentionally restricts certain actions and provides clear messages when encountering unexpected data configurations. This might seem opinionated, but it's designed to protect you from accidentally creating inaccurate or misleading joins. This "safety net" empowers you to confidently merge your data, knowing `joyn` has your back.

Flexibility

Currently, `joyn` focuses on the most common and valuable join types. Future development might explore expanding its flexibility based on user needs and feedback.

joyn as wrapper: Familiar Syntax, Familiar Power

While `joyn::joyn()` offers the core functionality and Stata-inspired arguments, you might prefer a syntax more aligned with your existing workflow. `joyn` has you covered!

Embrace base R and `data.table`:

  • `joyn::merge()`: Leverage familiar base R and `data.table` syntax for seamless integration with your existing code.

Join with flair using `dplyr`:

  • `joyn::{dplyr verbs}()`, such as `joyn::left_join()` or `joyn::inner_join()`: Enjoy the intuitive verb-based syntax of `dplyr` for a powerful and expressive way to perform joins.

Dive deeper: Explore the corresponding vignettes to unlock the full potential of these alternative interfaces and find the perfect fit for your data manipulation style.

Installation

You can install the stable version of `joyn` from CRAN with:

```r
install.packages("joyn")
```

You can install the development version from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("randrescastaneda/joyn")
```

Examples

```r
library(joyn)
#>
#> Attaching package: 'joyn'
#> The following object is masked from 'package:base':
#>
#>     merge
library(data.table)

x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                 t  = c(1L, 2L, 1L, 2L, NA_integer_),
                 x  = 11:15)

y1 <- data.table(id = c(1, 2, 4),
                 y  = c(11L, 15L, 16))

x2 <- data.table(id = c(1, 4, 2, 3, NA),
                 t  = c(1L, 2L, 1L, 2L, NA_integer_),
                 x  = c(16, 12, NA, NA, 15))

y2 <- data.table(id = c(1, 2, 5, 6, 3),
                 yd = c(1, 2, 5, 6, 3),
                 y  = c(11L, 15L, 20L, 13L, 10L),
                 x  = 16:20)

# using common variable `id` as key.
joyn(x = x1, y = y1, match_type = "m:1")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   33.3%
#> 2     y 1   16.7%
#> 3 x & y 3     50%
#> 4 total 6    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#>       id     t     x     y  .joyn
#>    <num> <int> <int> <num> <fctr>
#> 1:     1     1    11    11  x & y
#> 2:     1     2    12    11  x & y
#> 3:     2     1    13    15  x & y
#> 4:     3     2    14    NA      x
#> 5:    NA    NA    15    NA      x
#> 6:     4    NA    NA    16      y

# keep just those observations that match
joyn(x = x1, y = y1, match_type = "m:1", keep = "inner")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   66.7%
#> 2     y 1   33.3%
#> 3 total 3    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#>       id     t     x     y  .joyn
#>    <num> <int> <int> <num> <fctr>
#> 1:     1     1    11    11  x & y
#> 2:     1     2    12    11  x & y
#> 3:     2     1    13    15  x & y

# Bad merge for not specifying by argument
joyn(x = x2, y = y2, match_type = "1:1")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 4   44.4%
#> 2     y 4   44.4%
#> 3 x & y 1   11.1%
#> 4 total 9    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id and x from id, yd, y, and x
#>       id     t     x    yd     y  .joyn
#>    <num> <int> <num> <num> <int> <fctr>
#> 1:     1     1    16     1    11  x & y
#> 2:     4     2    12    NA    NA      x
#> 3:     2     1    NA    NA    NA      x
#> 4:     3     2    NA    NA    NA      x
#> 5:    NA    NA    15    NA    NA      x
#> 6:     2    NA    17     2    15      y
#> 7:     5    NA    18     5    20      y
#> 8:     6    NA    19     6    13      y
#> 9:     3    NA    20     3    10      y

# good merge, ignoring variable x from y
joyn(x = x2, y = y2, by = "id", match_type = "1:1")
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, yd, y, and x
#>       id     t     x    yd     y  .joyn
#>    <num> <int> <num> <num> <int> <fctr>
#> 1:     1     1    16     1    11  x & y
#> 2:     4     2    12    NA    NA      x
#> 3:     2     1    NA     2    15  x & y
#> 4:     3     2    NA     3    10  x & y
#> 5:    NA    NA    15    NA    NA      x
#> 6:     5    NA    NA     5    20      y
#> 7:     6    NA    NA     6    13      y

# update NAs in var x in table x from var x in y
joyn(x = x2, y = y2, by = "id", update_NAs = TRUE)
#>
#> ── JOYn Report ──
#>
#>         .joyn     n percent
#>        <char> <int>  <char>
#> 1:          x     2   28.6%
#> 2:      x & y     1   14.3%
#> 3: NA updated     4   57.1%
#> 4:      total     7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, yd, y, and x
#>       id     t     x    yd     y      .joyn
#>    <num> <int> <num> <num> <int>     <fctr>
#> 1:     1     1    16     1    11      x & y
#> 2:     4     2    12    NA    NA          x
#> 3:     2     1    17     2    15 NA updated
#> 4:     3     2    20     3    10 NA updated
#> 5:    NA    NA    15    NA    NA          x
#> 6:     5    NA    18     5    20 NA updated
#> 7:     6    NA    19     6    13 NA updated

# update values in var x in table x from var x in y
joyn(x = x2, y = y2, by = "id", update_values = TRUE)
#>
#> ── JOYn Report ──
#>
#>            .joyn     n percent
#>           <char> <int>  <char>
#> 1:    NA updated     4   57.1%
#> 2: value updated     1   14.3%
#> 3:   not updated     2   28.6%
#> 4:         total     7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, yd, y, and x
#>       id     t     x    yd     y         .joyn
#>    <num> <int> <num> <num> <int>        <fctr>
#> 1:     1     1    16     1    11 value updated
#> 2:     4     2    12    NA    NA   not updated
#> 3:     2     1    17     2    15    NA updated
#> 4:     3     2    20     3    10    NA updated
#> 5:    NA    NA    15    NA    NA   not updated
#> 6:     5    NA    18     5    20    NA updated
#> 7:     6    NA    19     6    13    NA updated

# do not bring any variable from y into x, just the report
joyn(x = x2, y = y2, by = "id", y_vars_to_keep = NULL)
#>
#> ── JOYn Report ──
#>
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#>       id     t     x  .joyn
#>    <num> <int> <num> <fctr>
#> 1:     1     1    16  x & y
#> 2:     4     2    12      x
#> 3:     2     1    NA  x & y
#> 4:     3     2    NA  x & y
#> 5:    NA    NA    15      x
#> 6:     5    NA    NA      y
#> 7:     6    NA    NA      y
```
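The `update_NAs` and `update_values` examples differ in which values of `x` get replaced. As we read the example comments and the report labels, `update_NAs` fills only missing values of `x` from `y`, while `update_values` also overwrites non-missing values of `x` wherever `y` has a value. The base-R sketch below illustrates that reading for one conflicting column after a full join; `update_nas()` and `update_vals()` are hypothetical helpers, not `joyn`'s implementation.

```r
# Hypothetical helpers for illustration only; not joyn's implementation.

# Fill x only where it is missing and y has a value.
update_nas <- function(x_val, y_val) {
  ifelse(is.na(x_val) & !is.na(y_val), y_val, x_val)
}

# Take y wherever it has a value; otherwise keep x (this also fills NAs).
update_vals <- function(x_val, y_val) {
  ifelse(!is.na(y_val), y_val, x_val)
}

x <- data.frame(id = c(1, 2, 3), v = c(10, NA, 30))
y <- data.frame(id = c(1, 2, 4), v = c(99, 20, 40))

# Full join; the conflicting column v becomes v.x and v.y.
j <- merge(x, y, by = "id", all = TRUE, suffixes = c(".x", ".y"))

j$v_nas  <- update_nas(j$v.x, j$v.y)   # 10 20 30 40
j$v_vals <- update_vals(j$v.x, j$v.y)  # 99 20 30 40
print(j)
```

Only id 1 differs between the two results: its existing value 10 is kept by `update_nas()` and replaced by 99 in `update_vals()`.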
