joyn empowers you to assess the results of joining data frames, making
it easier and more efficient to combine your tables. Similar in
philosophy to the merge command in Stata, joyn matches tables on key
variables and produces detailed join reports to ensure accurate and
insightful results.
Merging tables in R can be tricky. Ensuring accuracy and fully
understanding the joined data can be tedious tasks. That’s where joyn
comes in. Inspired by Stata’s informative approach to merging, joyn
makes the process smoother and more insightful.
While standard R merge functions are powerful, they often lack
features like assessing join accuracy, detecting potential issues, and
providing detailed reports. joyn fills this gap and helps you navigate
your joins confidently.
What makes joyn special?
While standard R merge functions offer basic functionality, joyn goes
above and beyond by providing comprehensive tools and features tailored
to your data joining needs:
1. Flexibility in join types: Choose your ideal join type (“left”,
“right”, or “inner”) with the keep argument. Unlike base R’s merge(),
which defaults to an inner join, joyn performs a full join by default,
ensuring all observations are included, but you have full control to
tailor the results.
2. Seamless variable handling: No more wrestling with duplicate
variable names! joyn offers multiple options:
   - Update values: Use update_values or update_NAs to automatically
     update conflicting variables in the left table with values from
     the right table.
   - Keep both (with different names): Enable keep_common_vars = TRUE
     to retain both variables, each with a unique suffix.
   - Selective inclusion: Choose specific variables from the right
     table with y_vars_to_keep, ensuring you get only the data you
     need.
3. Relationship awareness: joyn recognizes one-to-one, one-to-many,
many-to-one, and many-to-many relationships between tables. While it
defaults to many-to-many for compatibility, remember this is often not
ideal. Always specify the correct relationship with the match_type
argument (for example, “1:1” or “m:1”) for accurate and meaningful
results.
4. Join success at a glance: Get instant feedback on your join with the automatically generated reporting variable. Identify potential issues like unmatched observations or missing values to ensure data integrity and informed decision-making.
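The reporting variable described in point 4 can also be used programmatically. The sketch below assumes the report column is named .joyn, as in the printed notes of this README's examples; the tables `orders` and `customers` are hypothetical toy data, not part of the package.

```r
library(joyn)
library(data.table)

# Hypothetical toy tables: `orders` repeats ids, `customers` is unique by id
orders    <- data.table(id = c(1L, 1L, 2L, 3L), amount = c(10, 20, 30, 40))
customers <- data.table(id = c(1L, 2L, 4L),
                        region = c("north", "south", "east"))

# Many-to-one join on `id`; no `keep` argument, so all rows are retained
res <- joyn(x = orders, y = customers, by = "id", match_type = "m:1")

# Tabulate match status, then isolate rows present only in `orders`
status_counts <- table(res$.joyn)
only_in_x     <- res[.joyn == "x"]
```

Filtering on the report column this way is one simple approach to inspecting unmatched observations before deciding whether to keep or drop them.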
By addressing these common pain points and offering enhanced
flexibility, joyn empowers you to join your data frames confidently and
effectively, paving the way for deeper insights.
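As a minimal sketch of the keep argument described in the list above, the following compares the default full join with left and inner joins. The tables `x` and `y` and their columns are hypothetical; only the arguments by, match_type, and keep are taken from this README.

```r
library(joyn)
library(data.table)

# Hypothetical tables, each uniquely identified by `id`
x <- data.table(id = c(1L, 2L, 3L), x_val = c("a", "b", "c"))
y <- data.table(id = c(2L, 3L, 4L), y_val = c(20, 30, 40))

# Same join, three different values of `keep`
full_res  <- joyn(x, y, by = "id", match_type = "1:1")  # default: full join
left_res  <- joyn(x, y, by = "id", match_type = "1:1", keep = "left")
inner_res <- joyn(x, y, by = "id", match_type = "1:1", keep = "inner")

# Compare how many rows each join type retains
row_counts <- sapply(list(full = full_res, left = left_res, inner = inner_res),
                     nrow)
```

With these inputs the full join should keep all four distinct ids, the left join the three ids in `x`, and the inner join only the two ids found in both tables.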
While raw speed is essential, understanding your joins every step of
the way is equally crucial. joyn prioritizes providing insightful
information and preventing errors over solely focusing on speed. Unlike
other functions, joyn performs comprehensive checks to ensure your join
is accurate and avoids potential missteps, like unmatched observations
or missing values.
These valuable features contribute to slightly slower performance
compared to functions like data.table::merge.data.table() or
collapse::join(). However, the benefits of preventing errors and
gaining invaluable insights far outweigh the minor speed difference.
If raw speed is your priority, consider using data.table or collapse
directly. Otherwise, joyn is your trusted guide.
joyn intentionally restricts certain actions and provides clear
messages when encountering unexpected data configurations. This might
seem opinionated, but it’s designed to protect you from accidentally
creating inaccurate or misleading joins. This “safety net” empowers you
to confidently merge your data, knowing joyn has your back.
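One way to see this protective behavior is to declare a relationship the data does not satisfy. The sketch below assumes that joyn raises an error when match_type = "1:1" is requested but the left table has duplicated keys; the tables are hypothetical and the exact message text is not reproduced here.

```r
library(joyn)
library(data.table)

# Hypothetical tables: `dup_x` has a duplicated id, so it is not "1" on its side
dup_x <- data.table(id = c(1L, 1L, 2L), x_val = c("a", "b", "c"))
y     <- data.table(id = c(1L, 2L), y_val = c(10, 20))

# Declaring a one-to-one relationship is expected to be rejected
res <- tryCatch(
  joyn(dup_x, y, by = "id", match_type = "1:1"),
  error = function(e) e
)
if (inherits(res, "error")) {
  message("joyn refused the join: ", conditionMessage(res))
}

# Declaring the relationship the data actually has lets the join proceed
ok <- joyn(dup_x, y, by = "id", match_type = "m:1")
```

Wrapping the call in tryCatch() is only for demonstration; in interactive use the error message itself is the useful part.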
Currently, joyn focuses on the most common and valuable join types.
Future development might explore expanding its flexibility based on
user needs and feedback.
joyn as wrapper: Familiar Syntax, Familiar Power
While joyn::joyn() offers the core functionality and Stata-inspired
arguments, you might prefer a syntax more aligned with your existing
workflow. joyn has you covered!
Embrace base R and data.table:
   - joyn::merge(): Leverage familiar base R and data.table syntax for
     seamless integration with your existing code.
Join with flair using dplyr:
   - joyn::{dplyr verbs}() (for example, joyn::left_join()): Enjoy the
     intuitive verb-based syntax of dplyr for a powerful and expressive
     way to perform joins.
Dive deeper: Explore the corresponding vignettes to unlock the full potential of these alternative interfaces and find the perfect fit for your data manipulation style.
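The two alternative interfaces can be sketched as follows. This assumes that joyn::left_join() and joyn::merge() accept the same basic arguments as their dplyr and base R namesakes (by, and all for merge); the tables are hypothetical, and the vignettes remain the authoritative reference for the full argument lists.

```r
library(joyn)
library(data.table)

# Hypothetical tables, each uniquely identified by `id`
x <- data.table(id = c(1L, 2L, 3L), x_val = c("a", "b", "c"))
y <- data.table(id = c(2L, 3L, 4L), y_val = c(20, 30, 40))

# dplyr-style verb: keep every row of `x`
lj <- joyn::left_join(x, y, by = "id")

# base R / data.table style: all = TRUE requests a full join
mg <- joyn::merge(x, y, by = "id", all = TRUE)
```

Both calls are written with the joyn:: prefix so that it is explicit which implementation is used, since merge() and left_join() also exist in other packages.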
You can install the stable version of joyn from CRAN with:
install.packages("joyn")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("randrescastaneda/joyn")
library(joyn)
#>
#> Attaching package: 'joyn'
#> The following object is masked from 'package:base':
#>
#> merge
library(data.table)
#> Warning: package 'data.table' was built under R version 4.4.2
x1 = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = 11:15)
y1 = data.table(id = c(1, 2, 4),
                y  = c(11L, 15L, 16))
x2 = data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = c(16, 12, NA, NA, 15))
y2 = data.table(id = c(1, 2, 5, 6, 3),
                yd = c(1, 2, 5, 6, 3),
                y  = c(11L, 15L, 20L, 13L, 10L),
                x  = c(16:20))
# using common variable `id` as key.
joyn(x = x1,
y = y1,
match_type = "m:1")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 33.3%
#> 2 y 1 16.7%
#> 3 x & y 3 50%
#> 4 total 6 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#> id t x y .joyn
#> <num> <int> <int> <num> <fctr>
#> 1: 1 1 11 11 x & y
#> 2: 1 2 12 11 x & y
#> 3: 2 1 13 15 x & y
#> 4: 3 2 14 NA x
#> 5: NA NA 15 NA x
#> 6: 4 NA NA 16 y
# keep just those observations that match
joyn(x = x1,
y = y1,
match_type = "m:1",
keep = "inner")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 66.7%
#> 2 y 1 33.3%
#> 3 total 3 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id and y
#> id t x y .joyn
#> <num> <int> <int> <num> <fctr>
#> 1: 1 1 11 11 x & y
#> 2: 1 2 12 11 x & y
#> 3: 2 1 13 15 x & y
# Bad merge: `by` is not specified, so all common variables (id and x) are used as keys
joyn(x = x2,
y = y2,
match_type = "1:1")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 4 44.4%
#> 2 y 4 44.4%
#> 3 x & y 1 11.1%
#> 4 total 9 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id and x from id, yd, y, and x
#> id t x yd y .joyn
#> <num> <int> <num> <num> <int> <fctr>
#> 1: 1 1 16 1 11 x & y
#> 2: 4 2 12 NA NA x
#> 3: 2 1 NA NA NA x
#> 4: 3 2 NA NA NA x
#> 5: NA NA 15 NA NA x
#> 6: 2 NA 17 2 15 y
#> 7: 5 NA 18 5 20 y
#> 8: 6 NA 19 6 13 y
#> 9: 3 NA 20 3 10 y
# good merge, ignoring variable x from y
joyn(x = x2,
y = y2,
by = "id",
match_type = "1:1")
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 28.6%
#> 2 y 2 28.6%
#> 3 x & y 3 42.9%
#> 4 total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, yd, y, and x
#> id t x yd y .joyn
#> <num> <int> <num> <num> <int> <fctr>
#> 1: 1 1 16 1 11 x & y
#> 2: 4 2 12 NA NA x
#> 3: 2 1 NA 2 15 x & y
#> 4: 3 2 NA 3 10 x & y
#> 5: NA NA 15 NA NA x
#> 6: 5 NA NA 5 20 y
#> 7: 6 NA NA 6 13 y
# update NAs in var x in table x from var x in y
joyn(x = x2,
y = y2,
by = "id",
update_NAs = TRUE)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> <char> <int> <char>
#> 1: x 2 28.6%
#> 2: x & y 1 14.3%
#> 3: NA updated 4 57.1%
#> 4: total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, yd, y, and x
#> id t x yd y .joyn
#> <num> <int> <num> <num> <int> <fctr>
#> 1: 1 1 16 1 11 x & y
#> 2: 4 2 12 NA NA x
#> 3: 2 1 17 2 15 NA updated
#> 4: 3 2 20 3 10 NA updated
#> 5: NA NA 15 NA NA x
#> 6: 5 NA 18 5 20 NA updated
#> 7: 6 NA 19 6 13 NA updated
# update values in var x in table x from var x in y
joyn(x = x2,
y = y2,
by = "id",
update_values = TRUE)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> <char> <int> <char>
#> 1: NA updated 4 57.1%
#> 2: value updated 1 14.3%
#> 3: not updated 2 28.6%
#> 4: total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, yd, y, and x
#> id t x yd y .joyn
#> <num> <int> <num> <num> <int> <fctr>
#> 1: 1 1 16 1 11 value updated
#> 2: 4 2 12 NA NA not updated
#> 3: 2 1 17 2 15 NA updated
#> 4: 3 2 20 3 10 NA updated
#> 5: NA NA 15 NA NA not updated
#> 6: 5 NA 18 5 20 NA updated
#> 7: 6 NA 19 6 13 NA updated
# do not bring any variable from y into x, just the report
joyn(x = x2,
y = y2,
by = "id",
y_vars_to_keep = NULL)
#>
#> ── JOYn Report ──
#>
#> .joyn n percent
#> 1 x 2 28.6%
#> 2 y 2 28.6%
#> 3 x & y 3 42.9%
#> 4 total 7 100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> id t x .joyn
#> <num> <int> <num> <fctr>
#> 1: 1 1 16 x & y
#> 2: 4 2 12 x
#> 3: 2 1 NA x & y
#> 4: 3 2 NA x & y
#> 5: NA NA 15 x
#> 6: 5 NA NA y
#> 7: 6 NA NA y