| Title: | A Traceability Focused Grammar of Clinical Data Summary |
| Version: | 1.2.1 |
| Description: | A traceability focused tool created to simplify the data manipulation necessary to create clinical summaries. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/atorus-research/Tplyr |
| BugReports: | https://github.com/atorus-research/Tplyr/issues |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.5.0) |
| Imports: | rlang (≥ 0.4.6), assertthat (≥ 0.2.1), magrittr (≥ 1.5), dplyr (≥ 1.0.0), purrr (≥ 0.3.3), stringr (≥ 1.4.0), tidyr (≥ 1.0.2), tidyselect (≥ 1.1.0), tibble (≥ 3.0.1), lifecycle, forcats (≥ 1.0.0) |
| Suggests: | testthat (≥ 2.1.0), haven (≥ 2.2.0), knitr, rmarkdown, huxtable, tidyverse, readr, kableExtra, pharmaRTF, withr |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.2.3 |
| RdMacros: | lifecycle |
| Config/testthat/edition: | 3 |
| LazyData: | true |
| NeedsCompilation: | no |
| Packaged: | 2024-02-19 21:00:25 UTC; mike.stackhouse |
| Author: | Eli Miller |
| Maintainer: | Mike Stackhouse <mike.stackhouse@atorusresearch.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-02-20 08:00:02 UTC |
A grammar of summary data for clinical reports
Description
[Experimental]
Details
'Tplyr' is a package dedicated to simplifying the data manipulation necessary to create clinical reports. Clinical data summaries can often be broken down into two factors - counting discrete variables (or counting shifts in state), and descriptive statistics around a continuous variable. Many of the tables that go into a clinical report are made up of these two scenarios. By abstracting this process away, 'Tplyr' allows you to rapidly build these tables without worrying about the underlying data manipulation.
'Tplyr' takes this process a few steps further by abstracting away most of the programming that goes into proper presentation, which is where a great deal of programming time is spent. For example, 'Tplyr' allows you to easily control:
- String formatting
Different reports warrant different presentation of your strings. Programming this can get tedious, as you typically want to make sure that your decimals properly align. 'Tplyr' abstracts this process away and provides you with a simple interface to specify how you want your data presented
- Treatment groups
Need a total column? Need to group summaries of multiple treatments? 'Tplyr' makes it simple to add additional treatment groups into your report
- Denominators
n (%) counts often vary based on the summary being performed. 'Tplyr' allows you to easily control what denominators are used based on a few common scenarios
- Sorting
Summarizing data is one thing, but ordering it for presentation is another. 'Tplyr' automatically derives sorting variables to give you the data you need to order your table properly. This process is flexible so you can easily get what you want by leveraging your data or characteristics of R.
Another powerful aspect of 'Tplyr' is the objects themselves. 'Tplyr' does more than format your data. Metadata about your table is kept under the hood, and functions allow you to access information that you need. For example, 'Tplyr' allows you to calculate and access the raw numeric data of calculations as well, and easily pick out just the pieces of information that you need.
Lastly, 'Tplyr' was built to be flexible, yet intuitive. A common pitfall of building tools like this is over-automation. By doing too much, you end up not doing enough. 'Tplyr' aims to hit the sweet spot in between. Additionally, we designed our function interfaces to be clean. Modifier functions offer you flexibility when you need it, but defaults can be set to keep the code concise. This allows you to quickly assemble your table, and easily make changes where necessary.
Author(s)
Maintainer: Mike Stackhouse <mike.stackhouse@atorusresearch.com> (ORCID)
Authors:
Eli Miller <Eli.Miller@AtorusResearch.com> (ORCID)
Ashley Tarasiewicz <Ashley.Tarasiewicz@atorusresearch.com>
Other contributors:
Nathan Kosiba <Nathan.Kosiba@atorusresearch.com> (ORCID) [contributor]
Sadchla Mascary <sadchla.mascary@atorusresearch.com> [contributor]
Andrew Bates <andrew.bates@atorusresearch.com> [contributor]
Shiyu Chen <shiyu.chen@atorusresearch.com> [contributor]
Oleksii Mikryukov <alex.mikryukov@atorusresearch.com> [contributor]
Atorus Research LLC [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/atorus-research/Tplyr/issues
Examples
# Load in pipe
library(magrittr)

# Use just the defaults
tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(mpg, by=cyl)
  ) %>%
  add_layer(
    group_count(carb, by=cyl)
  ) %>%
  build()

# Customize and modify
tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(mpg, by=cyl) %>%
      set_format_strings(
        "n" = f_str("xx", n),
        "Mean (SD)" = f_str("a.a+1 (a.a+2)", mean, sd, empty='NA'),
        "Median" = f_str("a.a+1", median),
        "Q1, Q3" = f_str("a, a", q1, q3, empty=c(.overall='NA')),
        "Min, Max" = f_str("a, a", min, max),
        "Missing" = f_str("xx", missing)
      )
  ) %>%
  add_layer(
    group_count(carb, by=cyl) %>%
      add_risk_diff(
        c('5', '3'),
        c('4', '3')
      ) %>%
      set_format_strings(
        n_counts = f_str('xx (xx%)', n, pct),
        riskdiff = f_str('xx.xxx (xx.xxx, xx.xxx)', dif, low, high)
      ) %>%
      set_order_count_method("bycount") %>%
      set_ordering_cols('4') %>%
      set_result_order_var(pct)
  ) %>%
  build()

# A Shift Table
tplyr_table(mtcars, am) %>%
  add_layer(
    group_shift(vars(row=gear, column=carb), by=cyl) %>%
      set_format_strings(f_str("xxx (xx.xx%)", n, pct))
  ) %>%
  build()
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs | A value or the magrittr placeholder. |
rhs | A function call using the magrittr semantics. |
Value
The result of calling 'rhs(lhs)'.
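As a minimal illustration (not one of the package's own examples), the sketch below shows that the piped form and the direct call give the same result. It assumes only that magrittr is installed; the variable names are arbitrary.

```r
library(magrittr)

x <- c(1, 4, 9)

# Piped form: lhs %>% rhs
piped <- x %>% sqrt()

# Direct form: rhs(lhs)
direct <- sqrt(x)

# Both forms evaluate the same call
identical(piped, direct)
```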
Add an anti-join onto a tplyr_meta object
Description
An anti-join allows a tplyr_meta object to refer to data that should be extracted from a separate dataset, like the population data of a Tplyr table, that is unavailable in the target dataset. The primary use case for this is the presentation of missing subjects, which in a Tplyr table is presented using the function add_missing_subjects_row(). The missing subjects themselves are not present in the target data, and are thus only available in the population data. The add_anti_join() function allows you to provide the meta information relevant to the population data, and then specify the on variable that should be used to join with the target dataset and find the values present in the population data that are missing from the target data.
Usage
add_anti_join(meta, join_meta, on)
Arguments
meta | A tplyr_meta object referring to the target data |
join_meta | A tplyr_meta object referring to the population data |
on | A list of quosures containing symbols - most likely set to USUBJID. |
Value
A tplyr_meta object
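The anti-join idea itself can be sketched in base R. This is only an illustration of the concept, not how Tplyr resolves a tplyr_meta object; the data frames pop and target and their contents are hypothetical.

```r
# Hypothetical population data: one row per subject
pop <- data.frame(
  USUBJID = c("S01", "S02", "S03", "S04"),
  TRT01A  = c("Placebo", "Placebo", "Drug", "Drug"),
  stringsAsFactors = FALSE
)

# Hypothetical target data: only some subjects have records
target <- data.frame(
  USUBJID = c("S01", "S01", "S04"),
  AEDECOD = c("HEADACHE", "NAUSEA", "HEADACHE"),
  stringsAsFactors = FALSE
)

# Anti-join on USUBJID: keep population rows with no match in the target data
missing_subjects <- pop[!pop$USUBJID %in% target$USUBJID, ]
missing_subjects$USUBJID
```

Here the subjects present in pop but absent from target are the ones an anti-join on USUBJID would return.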
Examples
tm <- tplyr_meta(
  rlang::quos(TRT01A, SEX, ETHNIC, RACE),
  rlang::quos(TRT01A == "Placebo", TRT01A == "SEX", ETHNIC == "HISPANIC OR LATINO")
)

tm %>%
  add_anti_join(
    tplyr_meta(
      rlang::quos(TRT01A, ETHNIC),
      rlang::quos(TRT01A == "Placebo", ETHNIC == "HISPANIC OR LATINO")
    ),
    on = rlang::quos(USUBJID)
  )
Attach column headers to a Tplyr output
Description
When working with 'huxtable' tables, column headers can be controlled as if they are rows in the data frame. add_column_headers eases the process of introducing these headers.
Usage
add_column_headers(.data, s, header_n = NULL)
Arguments
.data | The data.frame/tibble on which the headers shall be attached |
s | The text containing the intended header string |
header_n | A header_n or generic data.frame to use for binding count values. This is required if you are using the token replacement. |
Details
Headers are created by providing a single string. Columns are specified by delimiting each header with a '|' symbol. Instead of specifying the destination of each header, add_column_headers assumes that you have organized the columns of your data frame beforehand. This means that after you use Tplyr::build(), if you'd like to reorganize the default column order (which is simply alphabetical), simply pass the build output to a dplyr::select or dplyr::relocate statement before passing into add_column_headers.
Spanning headers are also supported. A spanning header is an overarching header that sits across multiple columns. Spanning headers are introduced to add_column_headers by providing the spanner text (i.e. the text that you'd like to sit in the top row), and then the spanned text (the bottom row) within curly brackets ('{}'). For example, take the iris dataset. We have the names:
"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
If we wanted to provide a header string for this dataset, with spanners to help with categorization of the variables, we could provide the following string:
"Sepal {Length | Width} | Petal {Length | Width} | Species"
Value
A data.frame with the processed header string elements attached as the top rows
Important note
Make sure you are aware of the order of your variables prior to passing in to add_column_headers. The only requirement is that the number of columns match. The rest is up to you.
Development notes
There are a few features of add_column_headers that are intended but not yet supported:
- Nested spanners are not yet supported. Only a spanning row and a bottom row can currently be created
- Different delimiters and indicators for a spanned group may be used in the future. The current choices were intuitive, but based on feedback it could be determined that less common characters may be necessary.
Token Replacement
This function has support for reading values from the header_n object in a Tplyr table and adding them in the column headers. Note: The order of the parameters passed in the token is important. They should be first the treatment variable, then any cols variables in the order they were passed in the table construction.
Use a double asterisk "**" at the beginning to start the token and another double asterisk to close it. You can separate column parameters in the token with a single underscore. For example, **group1_flag2_param3** will pull the count from the header_n binding for group1 in the treat_var, flag2 in the first cols argument, and param3 in the second cols argument.
You can pass fewer arguments in the token to get the sum of multiple columns. For example, **group1** would get the sum of the group1 treat_var, and all cols from the header_n.
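The token layout described above can be assembled programmatically. The sketch below uses a hypothetical helper, make_token(), which is not part of Tplyr; it only pastes the pieces together in the order the text describes (treatment value first, then any cols values).

```r
# Hypothetical helper (not part of Tplyr): build a header_n token from the
# treatment value followed by any cols values, in table-construction order
make_token <- function(...) {
  paste0("**", paste(..., sep = "_"), "**")
}

full_token  <- make_token("group1", "flag2", "param3")
group_token <- make_token("group1")

# Tokens can then be pasted into a header string
header <- paste0("Group 1 N=", group_token, " | Total")

full_token
group_token
header
```

The resulting header string would then be supplied as the s argument, together with a header_n object so the tokens can be resolved.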
Examples
# Load in pipe
library(magrittr)
library(dplyr)

header_string <- "Sepal {Length | Width} | Petal {Length | Width} | Species"

iris2 <- iris %>%
  mutate_all(as.character)

iris2 %>% add_column_headers(header_string)

# Example with counts
mtcars2 <- mtcars %>%
  mutate_all(as.character)

t <- tplyr_table(mtcars2, vs, cols = am) %>%
  add_layer(
    group_count(cyl)
  )

b_t <- build(t) %>%
  mutate_all(as.character)

count_string <- paste0(" | V N=**0** {auto N=**0_0** | man N=**0_1**} |",
                       " S N=**1** {auto N=**1_0** | man N=**1_1**} | | ")

add_column_headers(b_t, count_string, header_n(t))
Attach a layer to a tplyr_table object
Description
add_layer attaches a tplyr_layer to a tplyr_table object. This allows for a tidy style of programming (using magrittr piping, i.e. %>%) with a secondary advantage - the construction of the layer object may consist of a series of piped functions itself.
Tplyr encourages a user to view the construction of a table as a series of "layers". The construction of each of these layers is isolated and independent of the others - but each of these layers is a child of the table itself. add_layer isolates the construction of an individual layer and allows the user to construct that layer and insert it back into the parent. The syntax for this is intuitive and allows for tidy piping. Simply pipe the current table object in, and write the code to construct your layer within the layer parameter.
add_layers is another approach to attaching layers to a tplyr_table. Instead of constructing the entire table at once, add_layers allows you to construct layers as different objects. These layers can then be attached into the tplyr_table all at once.
add_layer and add_layers both additionally allow you to name the layers as you attach them. This is helpful when using functions like get_numeric_data or get_stats_data, where you can access information from a layer directly. add_layer has a name parameter, and layers can be named in add_layers by submitting the layer as a named argument.
Usage
add_layer(parent, layer, name = NULL)
add_layers(parent, ...)
Arguments
parent | A tplyr_table or tplyr_layer/tplyr_subgroup_layer object |
layer | A layer construction function and associated modifier functions |
name | A name to provide the layer in the table layers container |
... | Layers to be added |
Value
A tplyr_table or tplyr_layer/tplyr_subgroup_layer with a new layer inserted into the layer binding
See Also
[tplyr_table(), tplyr_layer(), group_count(), group_desc(), group_shift()]
Examples
# Load in pipe
library(magrittr)

## Single layer
t <- tplyr_table(mtcars, cyl) %>%
  add_layer(
    group_desc(target_var=mpg)
  )

## Single layer with name
t <- tplyr_table(mtcars, cyl) %>%
  add_layer(name='mpg',
    group_desc(target_var=mpg)
  )

# Using add_layers
t <- tplyr_table(mtcars, cyl)
l1 <- group_desc(t, target_var=mpg)
l2 <- group_count(t, target_var=cyl)

t <- add_layers(t, l1, 'cyl' = l2)
Add a missing subject row into a count summary.
Description
This function calculates the number of subjects missing from a particular group of results. The calculation is done by examining the total number of subjects potentially available from the Header N values within the result column, and finding the difference with the total number of subjects present in the result group. Note that for accurate results, the subject variable needs to be defined using the 'set_distinct_by()' function. As with other methods, this function instructs how distinct results should be identified.
Usage
add_missing_subjects_row(e, fmt = NULL, sort_value = NULL)
Arguments
e | A 'count_layer' object |
fmt | An f_str object used to format the total row. If none is provided, display is based on the layer formatting. |
sort_value | The value that will appear in the ordering column for total rows. This must be a numeric value. |
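The arithmetic described above (Header N minus the distinct subjects present in a result group) can be sketched in base R. This is an illustration under assumed data, not Tplyr's internal code; pop, target and the helper n_distinct_by() are hypothetical.

```r
# Hypothetical population data: six subjects, three per treatment group
pop <- data.frame(
  USUBJID = sprintf("S%02d", 1:6),
  TRT     = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical target data: subject S01 has two records
target <- data.frame(
  USUBJID = c("S01", "S01", "S02", "S04"),
  TRT     = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct subject IDs per group
n_distinct_by <- function(ids, grp) {
  vapply(split(ids, grp), function(x) length(unique(x)), integer(1))
}

header_n  <- n_distinct_by(pop$USUBJID, pop$TRT)        # subjects available
present   <- n_distinct_by(target$USUBJID, target$TRT)  # subjects with records
missing_n <- header_n - present[names(header_n)]        # subjects missing

missing_n
```

Counting distinct subject IDs rather than rows is what makes the difference meaningful, which mirrors why the subject variable needs to be set with set_distinct_by().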
Examples
tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      add_missing_subjects_row(f_str("xxxx", n))
  ) %>%
  build()
Add risk difference to a count layer
Description
A very common requirement for summary tables is to calculate the risk difference between treatment groups. add_risk_diff allows you to do this. The underlying risk difference calculations are performed using the Base R function prop.test - so prior to using this function, be sure to familiarize yourself with its functionality.
Usage
add_risk_diff(layer, ..., args = list(), distinct = TRUE)
Arguments
layer | Layer upon which the risk difference will be attached |
... | Comparison groups, provided as character vectors where the first group is the comparison, and the second is the reference |
args | Arguments passed directly into prop.test |
distinct | Logical - Use distinct counts (if available). |
Details
add_risk_diff can only be attached to a count layer, so the count layer must be constructed first. add_risk_diff allows you to compare the difference between treatment groups, so all comparisons should be based upon the values within the specified treat_var in your tplyr_table object.
Comparisons are specified by providing two-element character vectors. You can provide as many of these groups as you want. You can also use groups that have been constructed using add_treat_grps or add_total_group. The first element provided will be considered the 'comparison' group (i.e. the left side of the comparison), and the second element will be considered the 'reference' group. So if you'd like to see the risk difference of 'T1 - Placebo', you would specify this as c('T1', 'Placebo').
Tplyr forms your two-way table in the background, and then runs prop.test appropriately. Similar to the way that the display of layers is specified, the exact values and format of how you'd like the risk difference displayed are set using set_format_strings. This controls both the values and the format of how the risk difference is displayed. Risk difference formats are set within set_format_strings by using the name 'riskdiff'.
You have 5 variables to choose from in your data presentation:
- comp
Probability of the left hand side group (i.e. comparison)
- ref
Probability of the right hand side group (i.e. reference)
- dif
Difference of comparison - reference
- low
Lower end of the confidence interval (default is 95%, override with the args parameter)
- high
Upper end of the confidence interval (default is 95%, override with the args parameter)
Use these variable names when forming your f_str objects. The default presentation, if no string format is specified, will be:
f_str('xx.xxx (xx.xxx, xx.xxx)', dif, low, high)
Note - within Tplyr, you can account for negatives by allowing an extra space within your integer side settings. This will help with your alignment.
If columns are specified on a Tplyr table, risk difference comparisons still only take place between groups within the treat_var variable - but they are instead calculated treating the cols variables as by variables. Just like the tplyr layers themselves, the risk difference will then be transposed and display each risk difference as separate variables by each of the cols variables.
If distinct is TRUE (the default), all calculations will take place on the distinct counts, if they are available. Otherwise, non-distinct counts will be used.
One final note - prop.test may throw quite a few warnings. This is natural, because it alerts you when there's not enough data for the approximations to be correct. This may be unnerving coming from a SAS programming world, but this is R trying to alert you that the values provided don't have enough data to truly be statistically accurate.
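To make the five display variables concrete, the sketch below calls prop.test directly on hypothetical counts and pulls out values corresponding to comp, ref, dif, low and high. It is an assumption-laden illustration of how those values relate to prop.test output, not Tplyr's internal implementation; the counts and the final sprintf layout are made up for the example.

```r
# Hypothetical counts: subjects with the event, and group sizes,
# ordered as comparison group first, reference group second
events <- c(15, 25)
totals <- c(50, 50)

res <- prop.test(x = events, n = totals, conf.level = 0.95)

comp <- unname(res$estimate[1])  # proportion in the comparison group
ref  <- unname(res$estimate[2])  # proportion in the reference group
dif  <- comp - ref               # comparison - reference
low  <- res$conf.int[1]          # lower bound of the interval for the difference
high <- res$conf.int[2]          # upper bound of the interval for the difference

# Lay the values out in a fixed-width 'dif (low, high)' style string
sprintf("%6.3f (%6.3f, %6.3f)", dif, low, high)
```

Additional prop.test arguments such as conf.level or correct are the kind of values that would be supplied through the args parameter.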
Examples
library(magrittr)

## Two group comparisons with default options applied
t <- tplyr_table(mtcars, gear)

# Basic risk diff for two groups, using defaults
l1 <- group_count(t, carb) %>%
  # Compare 3 vs. 4, 3 vs. 5
  add_risk_diff(
    c('3', '4'),
    c('3', '5')
  )

# Build and show output
add_layers(t, l1) %>% build()

## Specify custom formats and display variables
t <- tplyr_table(mtcars, gear)

# Create the layer with custom formatting
l2 <- group_count(t, carb) %>%
  # Compare 3 vs. 4, 3 vs. 5
  add_risk_diff(
    c('3', '4'),
    c('3', '5')
  ) %>%
  set_format_strings(
    'n_counts' = f_str('xx (xx.x)', n, pct),
    'riskdiff' = f_str('xx.xxx, xx.xxx, xx.xxx, xx.xxx, xx.xxx', comp, ref, dif, low, high)
  )

# Build and show output
add_layers(t, l2) %>% build()

## Passing arguments to prop.test
t <- tplyr_table(mtcars, gear)

# Create the layer with args option
l3 <- group_count(t, carb) %>%
  # Compare 3 vs. 4, 3 vs. 5
  add_risk_diff(
    c('3', '4'),
    c('3', '5'),
    args = list(conf.level = 0.9, correct=FALSE, alternative='less')
  )

# Build and show output
add_layers(t, l3) %>% build()
Add a Total row into a count summary.
Description
Adding a total row creates an additional observation in the count summary that presents the total counts (i.e. the n's that are summarized). The total row will be formatted in the same way as the other count strings.
Usage
add_total_row(e, fmt = NULL, count_missings = TRUE, sort_value = NULL)
Arguments
e | A 'count_layer' object |
fmt | An f_str object used to format the total row. If none is provided, display is based on the layer formatting. |
count_missings | Whether or not to ignore the named arguments passed in 'set_count_missing()' when calculating the total row counts. This is useful if you need to exclude/include the missing counts in your total row. Defaults to TRUE, meaning the total row will not ignore any values. |
sort_value | The value that will appear in the ordering column for total rows. This must be a numeric value. |
Details
Totals are calculated using all grouping variables, including treat_var and cols from the table level. If by variables are included, the grouping of the total and the application of denominators becomes ambiguous. You will be warned specifically if a percent is included in the format. To rectify this, use set_denoms_by(), and the grouping of add_total_row() will be updated accordingly.
Note that when using add_total_row() with set_pop_data(), you should call add_total_row() AFTER calling set_pop_data(), otherwise there is potential for unexpected behavior with treatment groups.
Examples
# Load in Pipe
library(magrittr)

tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      add_total_row(f_str("xxxx", n))
  ) %>%
  build()
Combine existing treatment groups for summary
Description
Summary tables often present individual treatment groups, but may additionally have a "Treatment vs. Placebo" or "Total" group added to show grouped summary statistics or counts. This set of functions offers an interface to add these groups at a table level and be consumed by subsequent layers.
Usage
add_treat_grps(table, ...)
add_total_group(table, group_name = "Total")
treat_grps(table)
Arguments
table | A tplyr_table object |
... | A named vector where names will become the new treatment group names, and values will be used to construct those treatment groups |
group_name | The treatment group name used for the constructed 'Total' group |
Details
add_treat_grps allows you to specify specific groupings. This is done by supplying named arguments, where the name becomes the new treatment group's name, and those treatment groups are made up of the argument's values.
add_total_group is a simple wrapper around add_treat_grps. Instead of producing custom groupings, it produces a "Total" group by the supplied name, which defaults to "Total". This "Total" group is made up of all existing treatment groups within the population dataset.
Note that when using add_treat_grps or add_total_row() with set_pop_data(), you should call add_total_row() AFTER calling set_pop_data(), otherwise there is potential for unexpected behavior with treatment groups.
The function treat_grps allows you to see the custom treatment groups available in your tplyr_table object
Value
The modified table object
Examples
tab <- tplyr_table(iris, Species)

# A custom group
add_treat_grps(tab, "Not Setosa" = c("versicolor", "virginica"))

# Add a total group
add_total_group(tab)

treat_grps(tab)
# Returns:
# $`Not Setosa`
#[1] "versicolor" "virginica"
#
#$Total
#[1] "setosa" "versicolor" "virginica"
Add variables to a tplyr_meta object
Description
Add additional variable names to a tplyr_meta() object.
Usage
add_variables(meta, names)
add_filters(meta, filters)
Arguments
meta | A tplyr_meta object |
names | A list of names, providing variable names of interest. Provide as a list of quosures using 'rlang::quos()' |
filters | A list of filter conditions on the variables of interest. Provide as a list of quosures using 'rlang::quos()' |
Value
tplyr_meta object
Examples
m <- tplyr_meta()
m <- add_variables(m, rlang::quos(a, b, c))
m <- add_filters(m, rlang::quos(a==1, b==2, c==3))
m
Append the Tplyr table metadata dataframe
Description
append_metadata() allows a user to extend the Tplyr metadata data frame with user-provided data. In some tables, Tplyr may be able to provide most of the data, but a user may have to extend the table with other summaries, statistics, etc. This function allows the user to extend the tplyr_table's metadata with their own metadata content using custom data frames created using the tplyr_meta object.
Usage
append_metadata(t, meta)
Arguments
t | A tplyr_table object |
meta | A dataframe fitting the specifications of the details section of this function |
Details
As this is an advanced feature of Tplyr, ownership is on the user to make sure the metadata data frame is assembled properly. The only restrictions applied by append_metadata() are that meta must have a column named row_id, and the values in row_id cannot be duplicates of any row_id value already present in the Tplyr metadata dataframe. tplyr_meta() objects align with constructed dataframes using the row_id and output dataset column name. As such, tplyr_meta() objects should be inserted into a data frame using a list column.
Value
A tplyr_table object
Examples
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(wt)
  )

t %>%
  build(metadata=TRUE)

m <- tibble::tibble(
  row_id = c('x1_1'),
  var1_3 = list(tplyr_meta(rlang::quos(a, b, c), rlang::quos(a==1, b==2, c==3)))
)

append_metadata(t, m)
Conditional reformatting of a pre-populated string of numbers
Description
This function allows you to conditionally re-format a string of numbers based on a numeric value within the string itself. By selecting a "format group", which is targeting a specific number within the string, a user can establish a condition upon which a provided replacement string can be used. Either the entire replacement can be used to replace the entire string, or the replacement text can refill the "format group" while preserving the original width and alignment of the target string.
Usage
apply_conditional_format(
  string,
  format_group,
  condition,
  replacement,
  full_string = FALSE
)
Arguments
string | Target character vector where text may be replaced |
format_group | An integer representing the targeted numeric field within the string, numbered from left to right |
condition | An expression, using the variable name 'x' as the target variable within the condition |
replacement | A string to use as the replacement value |
full_string | TRUE if the full string should be replaced, FALSE if the replacement should be done within the format group |
Value
A character vector
Examples
string <- c(" 0 (0.0%)", " 8 (9.3%)", "78 (90.7%)")

apply_conditional_format(string, 2, x == 0, " 0 ", full_string=TRUE)

apply_conditional_format(string, 2, x < 1, "(<1%)")
Apply Format Strings outside of a Tplyr table
Description
The f_str object in Tplyr is used to drive formatting of the output strings within a Tplyr table. This function allows a user to use the same interface to apply formatted strings to any data frame within a dplyr::mutate() context.
Usage
apply_formats(format_string, ..., empty = c(.overall = ""))
Arguments
format_string | The desired display format. X's indicate digits. On the left, the number of x's indicates the integer length. On the right, the number of x's controls decimal precision and rounding. Variables are inferred by any separation of the 'x' values other than a decimal. |
... | The variables to be formatted using the format specified in format_string |
empty | The string to display when the numeric data is not available. Use a single element character vector, with the element named '.overall' to instead replace the whole string. |
Details
Note that auto-precision is not currently supported within apply_formats()
Value
Character vector of formatted values
Examples
library(dplyr)

mtcars %>%
  head() %>%
  mutate(
    fmt_example = apply_formats('xxx (xx.x)', hp, wt)
  )
Replace repeating row label variables with blanks in preparation for display.
Description
Depending on the display package being used, row label values may need to be blanked out if they are repeating. This gives the data frame supporting the table the appearance of the grouping variables being grouped together in blocks. apply_row_masks does this work by blanking out the value of any row_label variable where the current value is equal to the value before it. Note - apply_row_masks assumes that the data frame has already been sorted and therefore should only be applied once the data frame is in its final sort sequence.
Usage
apply_row_masks(dat, row_breaks = FALSE, ...)
Arguments
dat | Data.frame / tibble to mask repeating row_labels |
row_breaks | Boolean - set to TRUE to insert row breaks |
... | Variable used to determine where row-breaks should be inserted. Breaks will be inserted when this group of variables changes values. This is determined by dataset order, so sorting should be done prior to using apply_row_masks |
Details
Additionally, apply_row_masks can add row breaks for you between each layer. Row breaks are inserted as blank rows. This relies on the "break by" variables (submitted via ...) constructed in build still being attached to the dataset. An additional order variable is attached named ord_break, but the output dataset is sorted to properly insert the row breaks between layers.
Value
tibble with blanked out rows where values are repeating
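The masking idea can be sketched for a single, already-sorted column in base R. The helper mask_repeats() below is hypothetical and only illustrates blanking a value when it equals the value before it; it is not the package's implementation and ignores the multi-column and row-break handling described above.

```r
# Hypothetical helper: blank out any value equal to the one directly before it
mask_repeats <- function(x) {
  x <- as.character(x)
  # TRUE where an element repeats its predecessor; the first is never masked
  repeated <- c(FALSE, x[-1] == x[-length(x)])
  x[repeated] <- ""
  x
}

# Labels must already be in their final sort order
labels <- c("A", "A", "B", "B", "B", "C")
masked <- mask_repeats(labels)
masked
```

Because the comparison is purely positional, applying it before the final sort would blank the wrong rows, which is why the sort order matters.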
Trigger the execution of the tplyr_table
Description
The functions used to assemble a tplyr_table object and each of the layers do not trigger the processing of any data. Rather, a lazy execution style is used to allow you to construct your table and then explicitly state when the data processing should happen. build triggers this event.
Usage
build(x, metadata = FALSE)
Arguments
x | A tplyr_table object |
metadata | Trigger to build metadata. Defaults to FALSE |
Details
When the build command is executed, all of the data processing commences. Any pre-processing necessary within the table environment takes place first. Next, each of the layers begins executing. Once the layers complete executing, the output of each layer is stacked into the resulting data frame.
Once this process is complete, any post-processing necessary within the table environment takes place, and the final output can be delivered. Metadata and traceability information are kept within each of the layer environments, which allows an investigation into the source of the resulting data points. For example, numeric data from any summaries performed is maintained and accessible within a layer using get_numeric_data.
The 'metadata' option of build will trigger the construction of traceability metadata for the constructed data frame. Essentially, for every "result" that Tplyr produces, Tplyr can also generate the steps necessary to obtain the source data which produced that result from the input. For more information, see vignette("metadata").
Value
An executed tplyr_table
See Also
tplyr_table, tplyr_layer, add_layer, add_layers, layer_constructors
Examples
# Load in Pipe
library(magrittr)

tplyr_table(iris, Species) %>%
  add_layer(
    group_desc(Sepal.Length, by = "Sepal Length")
  ) %>%
  add_layer(
    group_desc(Sepal.Width, by = "Sepal Width")
  ) %>%
  build()
Collapse row labels into a single column
Description
This is a generalized post-processing function that allows you to take groups of by variables and collapse them into a single column. Repeating values are split into separate rows, and for each level of nesting, a specified indentation level can be applied.
Usage
collapse_row_labels(x, ..., indent = " ", target_col = row_label)
Arguments
x | Input data frame |
... | Row labels to be collapsed |
indent | Indentation string to be used, which is multiplied at each indentation level |
target_col | The desired name of the output column containing collapsed row labels |
Value
data.frame with row labels collapsed into a single column
Examples
x <- tibble::tribble(
  ~row_label1, ~row_label2, ~row_label3, ~row_label4, ~var1,
  "A", "C", "G", "M", 1L,
  "A", "C", "G", "N", 2L,
  "A", "C", "H", "O", 3L,
  "A", "D", "H", "P", 4L,
  "A", "D", "I", "Q", 5L,
  "A", "D", "I", "R", 6L,
  "B", "E", "J", "S", 7L,
  "B", "E", "J", "T", 8L,
  "B", "E", "K", "U", 9L,
  "B", "F", "K", "V", 10L,
  "B", "F", "L", "W", 11L
)

collapse_row_labels(x, row_label1, row_label2, row_label3, row_label4)

collapse_row_labels(x, row_label1, row_label2, row_label3)

collapse_row_labels(x, row_label1, row_label2, indent = " ", target_col = rl)
Create a f_str object
Description
f_str objects are intended to be used within the function set_format_strings. The f_str object carries information that powers a significant amount of layer processing. The format_string parameter is capable of controlling the display of a data point and decimal precision. The variables provided in ... control which data points are used to populate the string formatted output.
Usage
f_str(format_string, ..., empty = c(.overall = ""))
Arguments
format_string | The desired display format. X's indicate digits. On the left, the number of x's indicates the integer length. On the right, the number of x's controls decimal precision and rounding. Variables are inferred by any separation of the 'x' values other than a decimal. |
... | The variables to be formatted using the format specified in format_string |
empty | The string to display when the numeric data is not available. For desc layers, an unnamed character vector will populate within the provided format string, set to the same width as the fitted numbers. Use a single element character vector, with the element named '.overall' to instead replace the whole string. |
Details
Format strings are one of the most powerful components of 'Tplyr'. Traditionally, converting numeric values into strings for presentation can consume a good deal of time. Values and decimals need to align between rows, rounding before trimming is sometimes forgotten - it can become a tedious mess that is realistically not an important part of the analysis being performed. 'Tplyr' makes this process as simple as we can, while still allowing flexibility to the user.
Tplyr provides both manual and automatic decimal precision formatting. The display of the numbers in the resulting data frame is controlled by the format_string parameter. For manual precision, just like dummy values may be presented on your mocks, integer and decimal precision is specified by the user providing a string of 'x's for how you'd like your numbers formatted. If you'd like 2 integers with 3 decimal places, you specify your string as 'xx.xxx'. 'Tplyr' does the work to get the numbers in the right place.
To take this a step further, automatic decimal precision can also be obtained based on the collected precision within the data. When creating tables where results vary by some parameter, different results may call for different degrees of precision. To use automatic precision, use a single 'a' on either the integer or decimal side. If you'd like to use increased precision (i.e. you'd like mean to be collected precision +1), use 'a+1'. So if you'd like both integer and decimal precision to be based on the data as collected, you can use a format like 'a.a' - or for collected+1 decimal precision, 'a.a+1'. You can mix and match this with manual formats as well, making format strings such as 'xx.a+1'.
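The idea of "collected precision" can be illustrated with a small base R sketch. This is not Tplyr's internal implementation: the helper `collected_precision()` is a hypothetical name introduced only for illustration, and it simply shows how the integer and decimal widths implied by a format such as 'a.a+1' might be derived from the data.

```r
# Hypothetical helper (not part of Tplyr): find the widest integer part and
# the longest decimal part observed in a numeric vector.
collected_precision <- function(x) {
  txt <- as.character(abs(x[!is.na(x)]))
  int_part <- sub("\\..*$", "", txt)
  dec_part <- ifelse(grepl(".", txt, fixed = TRUE), sub("^[^.]*\\.", "", txt), "")
  c(int = max(nchar(int_part)), dec = max(nchar(dec_part)))
}

x <- c(1.5, 10.25, 100)
p <- collected_precision(x)   # int = 3, dec = 2

# 'a.a+1': collected integer width, collected decimal precision plus one
digits <- p[["dec"]] + 1
width <- p[["int"]] + 1 + digits   # integer digits + decimal point + decimals
formatted_mean <- formatC(mean(x), width = width, format = "f", digits = digits)
formatted_mean
# " 37.250"
```

Within a Tplyr table the precision is collected per group (see the precision_by and precision_on bindings), so the widths can differ between by groups.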
If you want two numbers on the same line, you provide two sets of x's. For example, if you're presenting a value like "mean (sd)" - you could provide the string 'xx.xx (xx.xxx)', or perhaps 'a.a+1 (a.a+2)'. Note that you're able to provide different integer lengths and different decimal precision for the two values. Each format string is independent and relates only to the format specified.
As described above, when using 'x' or 'a', any other character within the format string will stay stationary. So for example, if your format string is 'xx (xxx.x)', your number may format as '12 ( 34.5)'. So the left side parenthesis stays fixed. In some displays, you may want the parenthesis to 'hug' your number. Following this example, when allotting 3 spaces for the integer within parentheses, the parenthesis should shift to the right, making the numbers appear '12 (34.5)'. Using f_str() you can achieve this by using a capital 'X' or 'A'. For this example, the format string would be 'xx (XXX.x)'.
There are two rules when using 'parenthesis hugging':
- Capital letters should only be used on the integer side of a number
- A character must precede the capital letter, otherwise there's no character to 'hug'
The other parameters of the f_str call specify what values should fill the x's. f_str objects are used slightly differently between different layers. When declaring a format string within a count layer, f_str() expects to see the values n or distinct_n for event or distinct counts, pct or distinct_pct for event or distinct percentages, or total or distinct_total for denominator calculations. Note that in an f_str() for a count layer 'A' or 'a' are based on n counts, and therefore don't make sense to use in percentages. But in descriptive statistic layers, f_str parameters refer to the names of the summaries being performed, either by built in defaults, or custom summaries declared using set_custom_summaries(). See set_format_strings() for some more notes about layer specific implementation.
An f_str() may also be used outside of a Tplyr table. The function apply_formats() allows you to apply an f_str within the context of dplyr::mutate() or more generally a vectorized function.
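As a minimal sketch of using a format string outside of a table, the following applies two formats inside dplyr::mutate(). It assumes the Tplyr and dplyr packages are installed; the column names `fixed` and `hugged` are illustrative only, and manual ('x'/'X') formats are used rather than auto-precision.

```r
library(dplyr)
library(Tplyr)

# apply_formats() takes the format string followed by the variables that
# fill it, one variable per number in the format.
out <- mtcars %>%
  head() %>%
  mutate(
    fixed  = apply_formats("xxx (xx.x)", hp, wt),  # parenthesis stays in place
    hugged = apply_formats("xxx (XX.x)", hp, wt)   # parenthesis hugs the number
  )

out[, c("hp", "wt", "fixed", "hugged")]
```

Comparing the two new columns side by side is a quick way to see the effect of a capital 'X' on the placement of the parenthesis.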
Value
An f_str object
Valid f_str() Variables by Layer Type
Valid variables allowed within the ... parameter of f_str() differ by layer type.
Count layers
- n
- pct
- total
- distinct_n
- distinct_pct
- distinct_total
Shift layers
- n
- pct
- total
Desc layers
- n
- mean
- sd
- median
- var
- min
- max
- iqr
- q1
- q3
- missing
- Custom summaries created by set_custom_summaries()
Examples
f_str("xx.x (xx.x)", mean, sd)
f_str("a.a+1 (a.a+2)", mean, sd)
f_str("xx.a (xx.a+1)", mean, sd)
f_str("xx.x, xx.x, xx.x", q1, median, q3)
f_str("xx (XXX.x%)", n, pct)
f_str("a.a+1 (A.a+2)", mean, sd)
Set or return by layer binding
Description
Set or return by layer binding
Usage
get_by(layer)
set_by(layer, by)
Arguments
layer | A |
by | A string, a variable name, or a list of variable names supplied using |
Value
For get_by, the by binding of the supplied layer. For set_by, the modified layer environment.
Examples
# Load in pipe
library(magrittr)
iris$Species2 <- iris$Species
lay <- tplyr_table(iris, Species) %>%
  group_count(Species) %>%
  set_by(vars(Species2, Sepal.Width))
Get Data Labels
Description
Get labels for data sets included in Tplyr.
Usage
get_data_labels(data)
Arguments
data | A Tplyr data set. |
Value
A data.frame with columns 'name' and 'label' containing the names and labels of each column.
Get or set the default format strings for descriptive statistics layers
Description
Tplyr provides you with the ability to set table-wide defaults of format strings. You may wish to reuse the same format strings across numerous layers. set_desc_layer_formats and set_count_layer_formats allow you to apply your desired format strings within the entire scope of the table.
Usage
get_desc_layer_formats(obj)
set_desc_layer_formats(obj, ...)
get_count_layer_formats(obj)
set_count_layer_formats(obj, ...)
get_shift_layer_formats(obj)
set_shift_layer_formats(obj, ...)
Arguments
obj | A tplyr_table object |
... | formats to pass forward |
Details
For descriptive statistic layers, you can also use set_format_strings and set_desc_layer_formats together within a table, but not within the same layer. In the absence of specified format strings, first the table will be checked for any available defaults, and otherwise the tplyr.desc_layer_default_formats option will be used. set_format_strings will always take precedence over either. Defaults cannot be combined between set_format_strings, set_desc_layer_formats, and the tplyr.desc_layer_default_formats option because the order of presentation of results is controlled by the format strings, so relying on combinations of these settings would not be intuitive.
For count layers, you can override the n_counts or riskdiff format strings separately, and the narrowest scope available will be used from layer, to table, to default options.
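The scoping described above can be sketched as follows. This is an illustrative example rather than a definitive recipe: the row labels ('Mean (SD)', 'Median', 'Mean') and the chosen format strings are assumptions, and it assumes that table-level formats are supplied as named f_str objects, with the count format supplied under the name n_counts mentioned above.

```r
library(Tplyr)
library(magrittr)

t <- tplyr_table(mtcars, gear) %>%
  # Table-wide defaults for descriptive statistics layers
  set_desc_layer_formats(
    "Mean (SD)" = f_str("xx.x (xx.xx)", mean, sd),
    "Median"    = f_str("xx.x", median)
  ) %>%
  # Table-wide default for count layers
  set_count_layer_formats(n_counts = f_str("xx (xx.x%)", n, pct)) %>%
  add_layer(group_desc(mpg)) %>%    # falls back to the table-level desc formats
  add_layer(group_count(cyl)) %>%   # falls back to the table-level count format
  add_layer(
    group_desc(wt) %>%
      # Layer-level format strings take precedence for this layer only
      set_format_strings("Mean" = f_str("x.xx", mean))
  )

res <- build(t)
```

The third layer presents only the 'Mean' row, since layer-level format strings replace the table defaults rather than combining with them.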
Extract the result metadata of a Tplyr table
Description
Given a row_id value and a result column, this function will return the tplyr_meta object associated with that 'cell'.
Usage
get_meta_result(x, row_id, column, ...)
Arguments
x | A built Tplyr table or a dataframe |
row_id | The row_id value of the desired cell, provided as a character string |
column | The result column of interest, provided as a character string |
... | additional arguments |
Details
If a Tplyr table is built with the metadata=TRUE option specified, then metadata is assembled behind the scenes to provide traceability on each result cell derived. The functions get_meta_result() and get_meta_subset() allow you to access that metadata by using an ID provided in the row_id column and the column name of the result you'd like to access. The purpose of the row_id variable, instead of a simple row index, is to provide a sort resistant reference of the originating column, so the output Tplyr table can be sorted in any order but the metadata are still easily accessible.
The tplyr_meta object provides a list with two elements - names and filters. The names contain every column from the target data.frame of the Tplyr table that factored into the specified result cell, and the filters contain all the necessary filters to subset to the data summarized to create the specified result cell. get_meta_subset() additionally provides a parameter to specify any additional columns you would like to include in the returned subset data frame.
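To make the two elements concrete, the sketch below pulls the tplyr_meta object for one cell and applies its filters and names by hand with dplyr. It assumes both elements are lists of expressions that can be spliced with `!!!`; the object names `meta` and `by_hand` are illustrative, and the row_id and column values are borrowed from the package example.

```r
library(Tplyr)
library(dplyr)

t <- tplyr_table(mtcars, cyl) %>%
  add_layer(group_desc(hp))
dat <- t %>% build(metadata = TRUE)

# Metadata for the cell in row 'd1_1' of result column 'var1_4'
meta <- get_meta_result(t, "d1_1", "var1_4")

# Recreate the contributing records manually from the filters and names
by_hand <- mtcars %>%
  filter(!!!meta$filters) %>%
  select(!!!meta$names)
```

get_meta_subset() performs this kind of subsetting for you; the manual version only shows what the names and filters represent.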
Value
A tplyr_meta object
Examples
t <- tplyr_table(mtcars, cyl) %>%
  add_layer(
    group_desc(hp)
  )
dat <- t %>% build(metadata = TRUE)
get_meta_result(t, 'd1_1', 'var1_4')
m <- t$metadata
dat <- t$target
get_meta_result(t, 'd1_1', 'var1_4')
Extract the subset of data based on result metadata
Description
Given a row_id value and a result column, this function will return the subset of data referenced by the tplyr_meta object associated with that 'cell', which provides traceability to tie a result to its source.
Usage
get_meta_subset(x, row_id, column, add_cols = vars(USUBJID), ...)
## S3 method for class 'data.frame'
get_meta_subset(
  x,
  row_id,
  column,
  add_cols = vars(USUBJID),
  target = NULL,
  pop_data = NULL,
  ...
)
## S3 method for class 'tplyr_table'
get_meta_subset(x, row_id, column, add_cols = vars(USUBJID), ...)
Arguments
x | A built Tplyr table or a dataframe |
row_id | The row_id value of the desired cell, provided as a character string |
column | The result column of interest, provided as a character string |
add_cols | Additional columns to include in subset data.frame output |
... | additional arguments |
target | A data frame to be subset (if not pulled from a Tplyr table) |
pop_data | A data frame to be subset through an anti-join (if not pulled from a Tplyr table) |
Details
If a Tplyr table is built with the metadata=TRUE option specified, then metadata is assembled behind the scenes to provide traceability on each result cell derived. The functions get_meta_result() and get_meta_subset() allow you to access that metadata by using an ID provided in the row_id column and the column name of the result you'd like to access. The purpose of the row_id variable, instead of a simple row index, is to provide a sort resistant reference of the originating column, so the output Tplyr table can be sorted in any order but the metadata are still easily accessible.
The tplyr_meta object provides a list with two elements - names and filters. The names contain every column from the target data.frame of the Tplyr table that factored into the specified result cell, and the filters contain all the necessary filters to subset to the data summarized to create the specified result cell. get_meta_subset() additionally provides a parameter to specify any additional columns you would like to include in the returned subset data frame.
Value
A data.frame
Examples
t <- tplyr_table(mtcars, cyl) %>%
  add_layer(
    group_desc(hp)
  )
dat <- t %>% build(metadata = TRUE)
get_meta_subset(t, 'd1_1', 'var1_4', add_cols = dplyr::vars(carb))
m <- t$metadata
dat <- t$target
get_meta_subset(t, 'd1_1', 'var1_4', add_cols = dplyr::vars(carb), target = target)
Get the metadata dataframe from a tplyr_table
Description
Pull out the metadata dataframe from a tplyr_table to work with it directly
Usage
get_metadata(t)
Arguments
t | A Tplyr table with metadata built |
Value
Tplyr metadata dataframe
Examples
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(wt)
  )
t %>%
  build(metadata=TRUE)
get_metadata(t)
Retrieve the numeric data from a tplyr object
Description
get_numeric_data provides access to the un-formatted numeric data for each of the layers within a tplyr_table, with options to allow you to extract distinct layers and filter as desired.
Usage
get_numeric_data(x, layer = NULL, where = TRUE, ...)
Arguments
x | A tplyr_table or tplyr_layer object |
layer | Layer name or index to select out specifically |
where | Subset criteria passed to dplyr::filter |
... | Additional arguments to pass forward |
Details
When used on a tplyr_table object, this method will aggregate the numeric data from all Tplyr layers. The data will be returned to the user in a list of data frames. If the data has already been processed (i.e. build has been run), the numeric data is already available and will be returned without reprocessing. Otherwise, the numeric portion of the layer will be processed.
Using the layer and where parameters, data for a specific layer can beextracted and subset. This is most clear when layers are given text namesinstead of using a layer index, but a numeric index works as well.
Value
Numeric data from the Tplyr layer
Examples
# Load in pipe
library(magrittr)
t <- tplyr_table(mtcars, gear) %>%
  add_layer(name='drat',
    group_desc(drat)
  ) %>%
  add_layer(name='cyl',
    group_count(cyl)
  )
# Return a list of the numeric data frames
get_numeric_data(t)
# Get the data from a specific layer
get_numeric_data(t, layer='drat')
get_numeric_data(t, layer=1)
# Choose multiple layers by name or index
get_numeric_data(t, layer=c('cyl', 'drat'))
get_numeric_data(t, layer=c(2, 1))
# Get the data and filter it
get_numeric_data(t, layer='drat', where = gear==3)
Set or return precision_by layer binding
Description
The precision_by variables are used to collect the integer and decimal precision when auto-precision is used. These by variables are used to group the input data and identify the maximum precision available within the dataset for each by group. The precision_by variables must be a subset of the by variables.
Usage
get_precision_by(layer)
set_precision_by(layer, precision_by)
Arguments
layer | A |
precision_by | A string, a variable name, or a list of variable names supplied using |
Value
For get_precision_by, the precision_by binding of the supplied layer. For set_precision_by, the modified layer environment.
Examples
# Load in pipe
library(magrittr)
lay <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(mpg, by=vars(carb, am)) %>%
      set_precision_by(carb)
  )
Set or return precision_on layer binding
Description
The precision_on variable is the variable used to establish numeric precision. This variable must be included in the list of target_var variables.
Usage
get_precision_on(layer)
set_precision_on(layer, precision_on)
Arguments
layer | A |
precision_on | A string, a variable name, or a list of variable names supplied using |
Value
For get_precision_on, the precision_on binding of the supplied layer. For set_precision_on, the modified layer environment.
Examples
# Load in pipe
library(magrittr)
lay <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(vars(mpg, disp), by=vars(carb, am)) %>%
      set_precision_on(disp)
  )
Get statistics data
Description
Like the layer numeric data, Tplyr also stores the numeric data produced from statistics like risk difference. This helper function gives you access to obtain that data from the environment.
Usage
get_stats_data(x, layer = NULL, statistic = NULL, where = TRUE, ...)
Arguments
x | A tplyr_table or tplyr_layer object |
layer | Layer name or index to select out specifically |
statistic | Statistic name or index to select |
where | Subset criteria passed to dplyr::filter |
... | Additional arguments passed to dispatch |
Details
When used on a tplyr_table object, this method will aggregate the numeric data from all Tplyr layers and calculate all statistics. The data will be returned to the user in a list of data frames. If the data has already been processed (i.e. build has been run), the numeric data is already available and the statistic data will simply be returned. Otherwise, the numeric portion of the layer will be processed.
Using the layer, where, and statistic parameters, data for a specific layer statistic can be extracted and subset, allowing you to directly access data of interest. This is most clear when layers are given text names instead of using a layer index, but a numeric index works as well. If just a statistic is specified, that statistic will be collected and returned in a list of data frames, allowing you to grab, for example, just the risk difference statistics across all layers.
Value
The statistics data of the supplied layer
Examples
library(magrittr)
t <- tplyr_table(mtcars, gear) %>%
  add_layer(name='drat',
    group_desc(drat)
  ) %>%
  add_layer(name="cyl",
    group_count(cyl)
  ) %>%
  add_layer(name="am",
    group_count(am) %>%
      add_risk_diff(c('4', '3'))
  ) %>%
  add_layer(name="carb",
    group_count(carb) %>%
      add_risk_diff(c('4', '3'))
  )
# Returns a list of lists, containing stats data from each layer
get_stats_data(t)
# Returns just the riskdiff statistics from each layer - NULL
# for layers without riskdiff
get_stats_data(t, statistic="riskdiff")
# Return the statistic data for just the "am" layer - a list
get_stats_data(t, layer="am")
get_stats_data(t, layer=3)
# Return the statistic data for just the "am" and "cyl" layers - a
# list of lists
get_stats_data(t, layer=c("am", "cyl"))
get_stats_data(t, layer=c(3, 2))
# Return just the riskdiff statistic data for "am" and "cyl" - a list
get_stats_data(t, layer=c("am", "cyl"), statistic="riskdiff")
get_stats_data(t, layer=c(3, 2), statistic="riskdiff")
# Return the riskdiff for the "am" layer - a data frame
get_stats_data(t, layer="am", statistic="riskdiff")
# Return and filter the riskdiff for the am layer - a data frame
get_stats_data(t, layer="am", statistic="riskdiff", where = summary_var==1)
Set or return target_var binding
Description
Set or return target_var binding
Usage
get_target_var(layer)
set_target_var(layer, target_var)
Arguments
layer | A |
target_var | A symbol to perform the analysis on |
Value
For get_target_var, the target variable binding of the layer object. For set_target_var, the modified layer environment.
Examples
# Load in pipe
library(magrittr)
iris$Species2 <- iris$Species
lay <- tplyr_table(iris, Species) %>%
  group_count(Species) %>%
  set_target_var(Species2)
Retrieve one of Tplyr's regular expressions
Description
This function allows you to extract important regular expressions used inside Tplyr.
Usage
get_tplyr_regex(rx = c("format_string", "format_group"))
Arguments
rx | A character string with either the value 'format_string' or 'format_group' |
Details
There are two important regular expressions used within Tplyr. The format_string expression is the expression to parse format strings. This is what is used to make sense out of strings like 'xx (XX.x%)' or 'a+1 (A.a+2)' by inferring what the user is specifying about number formatting.
The 'format_group' regex is the opposite of this, and when given a string of numbers, such as ' 5 (34%) [9]' will return the separate segments of numbers broken into their format groups, which in this example would be ' 5', '(34%)', and '[9]'.
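A short sketch of how the two expressions might be used with stringr, assuming the returned regular expression objects can be passed directly as stringr patterns. The object names `groups` and `fields` are illustrative only.

```r
library(Tplyr)
library(stringr)

# Split an already formatted result into its format groups
groups <- str_extract_all(" 5 (34%) [9]", get_tplyr_regex("format_group"))
groups[[1]]

# Pull the individual number fields out of a format string
fields <- str_extract_all("xx (XX.x%)", get_tplyr_regex("format_string"))
fields[[1]]
```

The first call corresponds to the ' 5 (34%) [9]' example described above, which is expected to yield three format groups.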
Value
A regular expression object
Examples
get_tplyr_regex('format_string')
get_tplyr_regex('format_group')
Set or return where binding for layer or table
Description
Set or return where binding for layer or table
Usage
## S3 method for class 'tplyr_layer'
get_where(obj)
## S3 method for class 'tplyr_layer'
set_where(obj, where)
get_where(obj)
## S3 method for class 'tplyr_table'
get_where(obj)
set_where(obj, where)
## S3 method for class 'tplyr_table'
set_where(obj, where)
set_pop_where(obj, where)
get_pop_where(obj)
Arguments
obj | A |
where | An expression (i.e. syntax) to be used to subset the data. Supply as programming logic (i.e. x < 5 & y == 10) |
Value
For get_where, the where binding of the supplied object. For set_where, the modified object.
Examples
# Load in pipe
library(magrittr)
iris$Species2 <- iris$Species
lay <- tplyr_table(iris, Species) %>%
  group_count(Species) %>%
  set_where(Petal.Length > 3) %>%
  # Set logic for pop_data as well
  set_pop_where(Petal.Length > 3)
Create a count, desc, or shift layer for discrete count based summaries, descriptive statistics summaries, or shift count summaries
Description
This family of functions specifies the type of summary that is to be performed within a layer. count layers are used to create summary counts of some discrete variable. desc layers create summary statistics, and shift layers summarize the counts of different changes in states. See the "details" section below for more information.
Usage
group_count(parent, target_var, by = vars(), where = TRUE, ...)
group_desc(parent, target_var, by = vars(), where = TRUE, ...)
group_shift(parent, target_var, by = vars(), where = TRUE, ...)
Arguments
parent | Required. The parent environment of the layer. This must be the |
target_var | Symbol. Required. The variable name(s) on which the summary is to be performed. Must be a variable within the target dataset. Enter unquoted - i.e. target_var = AEBODSYS. You may also provide multiple variables with |
by | A string, a variable name, or a list of variable names supplied using |
where | Call. Filter logic used to subset the target data when performing a summary. |
... | Additional arguments to pass forward |
Details
- Count Layers
Count layers allow you to create summaries based on counting values within a variable. Additionally, this layer allows you to create n (%) summaries where you're also summarizing the proportion of instances a value occurs compared to some denominator. Count layers are also capable of producing counts of nested relationships. For example, if you want to produce counts of an overall outside group, and then the subgroup counts within that group, you can specify the target variable as vars(OutsideVariable, InsideVariable). This allows you to do tables like Adverse Events where you want to see the Preferred Terms within Body Systems, all in one layer. Further control over denominators is available using the function set_denoms_by, and distinct counts can be set using set_distinct_by.
- Descriptive Statistics Layers
Descriptive statistics layers perform summaries on continuous variables. There are a number of summaries built into Tplyr already that you can perform, including n, mean, median, standard deviation, variance, min, max, inter-quartile range, Q1, Q3, and missing value counts. From these available summaries, the default presentation of a descriptive statistic layer will output 'n', 'Mean (SD)', 'Median', 'Q1, Q3', 'Min, Max', and 'Missing'. You can change these summaries using set_format_strings, and you can also add your own summaries using set_custom_summaries. This allows you to implement any additional summary statistics you want presented.
- Shift Layers
A shift layer displays an endpoint's 'shift' throughout the duration of the study. It is an abstraction over the count layer; however, we have provided an interface that is more efficient and intuitive. Targets are passed as named symbols using dplyr::vars. Generally the baseline is passed with the name 'row' and the shift is passed with the name 'column'. Both counts (n) and percentages (pct) are supported and can be specified with the set_format_strings function. To allow for flexibility when defining percentages, you can define the denominator using the set_denoms_by function. This function takes variable names and uses those to determine the denominator for the counts.
Value
A tplyr_layer environment that is a child of the specified parent. The environment contains the object as listed below.
A tplyr_layer object
See Also
add_layer, add_layers, tplyr_table, tplyr_layer
Examples
# Load in pipe
library(magrittr)
t <- tplyr_table(iris, Species) %>%
  add_layer(
    group_desc(target_var=Sepal.Width)
  )
t <- tplyr_table(mtcars, am) %>%
  add_layer(
    group_shift(vars(row=gear, column=carb), by=cyl)
  )
Return or set header_n binding
Description
The 'header_n()' functions can be used to automatically pull the header_n derivations from the table or change them for future use.
Usage
header_n(table)
header_n(x) <- value
set_header_n(table, value)
Arguments
table | A |
x | A |
value | A data.frame with columns with the treatment variable, column variables, and a variable with counts named 'n'. |
header_n | A data.frame with columns with the treatment variable, column variables, and a variable with counts named 'n'. |
Details
The 'header_n' object is created by Tplyr when a table is built and intended to be used by the 'add_column_headers()' function when displaying table level population totals. These methods are intended to be used for calling the population totals calculated by Tplyr, and to overwrite them if a user chooses to.
If you have a need to change the header Ns that appear in your table headers, say you know you are working with a subset of the data that doesn't represent the totals, you can replace the data used with 'set_header_n()'.
Value
For header_n, the header_n binding of the tplyr_table object. For header_n<- and set_header_n, the modified object.
Examples
tab <- tplyr_table(mtcars, gear)
header_n(tab) <- data.frame(
  gear = c(3, 4, 5),
  n = c(10, 15, 45)
)
Select levels to keep in a count layer
Description
In certain cases you only want a layer to include certain values of a factor. The 'keep_levels()' function allows you to pass character values to be included in the layer. The others are ignored.
**NOTE: Denominator calculation is unaffected by this function, see the examples on how to include this logic in your percentages.**
Usage
keep_levels(e, ...)
Arguments
e | A |
... | Character values to count in the layer |
Value
The modified Tplyr layer object
Examples
library(dplyr)
mtcars <- mtcars %>%
  mutate_all(as.character)
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      keep_levels("4", "8") %>%
      set_denom_where(cyl %in% c("4", "8"))
  ) %>%
  build()
Create, view, extract, remove, and use Tplyr layer templates
Description
There are several scenarios where a layer template may be useful. Some tables, like demographics tables, may have many layers that will all essentially look the same. Categorical variables will have the same count layer settings, and continuous variables will have the same desc layer settings. A template allows a user to build those settings once per layer, then reference the template when the Tplyr table is actually built.
Usage
new_layer_template(name, template)
remove_layer_template(name)
get_layer_template(name)
get_layer_templates()
use_template(name, ..., add_params = NULL)
Arguments
name | Template name |
template | Template layer syntax, starting with a layer constructor |
... | Arguments passed directly into a layer constructor, matching the target, by, and where parameters. |
add_params | Additional parameters passed into layer modifier functions. These arguments are specified in a template within curly brackets such as {param}. Supply as a named list, where the element name is the parameter. |
Details
This suite of functions allows a user to create and use layer templates. Layer templates allow a user to pre-build and reuse an entire layer configuration, from the layer constructor down to all modifying functions. Furthermore, users can specify parameters they may want to be interchangeable. Additionally, layer templates are extensible, so a template can be used and then further extended with additional layer modifying functions.
Layer templates are created using new_layer_template(). To use a template, use the function use_template() in place of group_count|desc|shift(). If you want to view a specific template, use get_layer_template(). If you want to view all templates, use get_layer_templates(). And to remove a layer template use remove_layer_template(). Layer templates themselves are stored in the option tplyr.layer_templates, but a user should not access this directly and instead use the Tplyr supplied functions.
When providing the template layer syntax, the layer must start with a layer constructor. These are one of the functions group_count(), group_desc(), or group_shift(). Instead of passing arguments into these functions, templates are specified using an ellipsis in the constructor, i.e. group_count(...). This is required, as after the template is built a user supplies these arguments via use_template().
use_template() takes the group_count|desc|shift() arguments by default. If a user specified additional arguments in the template, these are provided in a list through the argument add_params. Provide these arguments exactly as you would in a normal layer. When creating the template, these parameters can be specified by using curly brackets. See the examples for details.
Examples
op <- options()
new_layer_template(
  "example_template",
  group_count(...) %>%
    set_format_strings(f_str('xx (xx%)', n, pct))
)
get_layer_templates()
get_layer_template("example_template")
tplyr_table(mtcars, vs) %>%
  add_layer(
    use_template("example_template", gear)
  ) %>%
  build()
remove_layer_template("example_template")
new_layer_template(
  "example_template",
  group_count(...) %>%
    set_format_strings(f_str('xx (xx%)', n, pct)) %>%
    set_order_count_method({sort_meth}) %>%
    set_ordering_cols({sort_cols})
)
get_layer_template("example_template")
tplyr_table(mtcars, vs) %>%
  add_layer(
    use_template("example_template", gear, add_params =
      list(
        sort_meth = "bycount",
        sort_cols = `1`
      ))
  ) %>%
  build()
remove_layer_template("example_template")
options(op)
Return or set population data bindings
Description
The population data is used to gather information that may not be available from the target dataset. For example, missing treatment groups, population N counts, and proper N counts for denominators will be provided through the population dataset. The population dataset defaults to the target dataset unless otherwise specified using set_pop_data.
Usage
pop_data(table)
pop_data(x) <- value
set_pop_data(table, pop_data)
Arguments
table | A |
x | A |
value | A data.frame with population level information |
pop_data | A data.frame with population level information |
Value
For pop_data, the pop_data binding of the tplyr_table object. For pop_data<-, nothing is returned; the pop_data binding is set silently. For set_pop_data, the modified object.
Examples
tab <- tplyr_table(iris, Species)
pop_data(tab) <- mtcars
tab <- tplyr_table(iris, Species) %>%
  set_pop_data(mtcars)
Return or set pop_treat_var binding
Description
The treatment variable used in the target data may be different than the variable within the population dataset. set_pop_treat_var allows you to change this.
Usage
pop_treat_var(table)
set_pop_treat_var(table, pop_treat_var)
Arguments
table | A |
pop_treat_var | Variable containing treatment group assignments within the |
Value
For pop_treat_var, the pop_treat_var binding of the tplyr_table object. For set_pop_treat_var, the modified object.
Examples
tab <- tplyr_table(iris, Species)
pop_data(tab) <- mtcars
set_pop_treat_var(tab, mpg)
Process layers to get formatted and pivoted tables.
Description
This is an internal method, but is exported to support S3 dispatch. Not intended for direct use by a user.
Usage
process_formatting(x, ...)
Arguments
x | A tplyr_layer object |
... | arguments passed to dispatch |
Value
The formatted_table object that is bound to the layer
Process layers to get metadata tables
Description
This is an internal method, but is exported to support S3 dispatch. Not intended for direct use by a user.
Usage
process_metadata(x, ...)
Arguments
x | A tplyr_layer object |
... | arguments passed to dispatch |
Value
The formatted_meta object that is bound to the layer
Process a tplyr_statistic object
Description
This is an internal function that is not meant for use externally, but must be exported. Use with caution.
Usage
process_statistic_data(x, ...)
Arguments
x | A tplyr_statistic environment |
... | Additional pass through parameters |
Value
Numeric statistic data from a tplyr statistic
Process string formatting on a tplyr_statistic object
Description
This is an internal function that is not meant for use externally, but must be exported. Use with caution.
Usage
process_statistic_formatting(x, ...)
Arguments
x | A tplyr_statistic environment |
... | Additional pass through parameters |
Value
Formatted tplyr_statistic data
Process layers to get numeric results of layer
Description
This is an internal method, but is exported to support S3 dispatch. Not intended for direct use by a user.
Usage
process_summaries(x, ...)
Arguments
x | A tplyr_layer object |
... | arguments passed to dispatch |
Value
The tplyr_layer object with a 'built_table' binding
Reformat strings with leading whitespace for HTML
Description
Reformat strings with leading whitespace for HTML
Usage
replace_leading_whitespace(x, tab_width = 4)
Arguments
x | Target string |
tab_width | Number of spaces to compensate for tabs |
Value
String with leading whitespace replaced
Examples
x <- c(" Hello there", " Goodbye Friend ", "\tNice to meet you",
       " \t What are you up to? \t \t ")
replace_leading_whitespace(x)
replace_leading_whitespace(x, tab_width = 2)
Set custom summaries to be performed within a descriptive statistics layer
Description
This function allows a user to define custom summaries to be performed in a call to dplyr::summarize(). A custom summary by the same name as a default summary will override the default. This allows the user to override the default behavior of summaries built into 'Tplyr', while also adding new desired summary functions.
Usage
set_custom_summaries(e, ...)
Arguments
e | The descriptive statistics layer to which the custom summaries are bound |
... | Named parameters containing syntax to be used in a call to dplyr::summarize() |
Details
When programming the logic of the summary function, use the variable name .var within your summary functions. This allows you to apply the summary function to each variable when multiple target variables are declared.
An important, yet not immediately obvious, part of using set_custom_summaries is to understand the link between the named parameters you set in set_custom_summaries and the names called in f_str objects within set_format_strings. In f_str, after you supply the string format you'd like your numbers to take, you specify the summaries that fill those strings.
When you go to set your format strings, the name you use to declare a summary in set_custom_summaries is the same name that you use in your f_str call. This is necessary because set_format_strings needs some means of putting two summaries in the same value, and setting a row label for the summary being performed.
Review the examples to see this put into practice. Note the relationship between the name created in set_custom_summaries and the name used in set_format_strings within the f_str call.
Value
Binds a variable custom_summaries to the specified layer
Examples
# Load in pipe
library(magrittr)

tplyr_table(iris, Species) %>%
  add_layer(
    group_desc(Sepal.Length, by = "Sepal Length") %>%
      set_custom_summaries(
        geometric_mean = exp(sum(log(.var[.var > 0]), na.rm=TRUE) / length(.var))
      ) %>%
      set_format_strings(
        'Geometric Mean' = f_str('xx.xx', geometric_mean)
      )
  ) %>%
  build()
Set values the denominator calculation will ignore
Description
'r lifecycle::badge("defunct")'
This is generally used for missing values. Values like "", NA, "NA" are common ways missing values are presented in a data frame. In certain cases, percentages do not use "missing" values in the denominator. This function notes different values as "missing" and excludes them from the denominators.
Usage
set_denom_ignore(e, ...)
Arguments
e | A |
... | Values to exclude from the percentage calculation. If you use 'set_missing_count()' this should be the name of the parameters instead of the values, see the example below. |
Value
The modified layer object
Examples
library(magrittr)

mtcars2 <- mtcars
mtcars2[mtcars$cyl == 6, "cyl"] <- NA
mtcars2[mtcars$cyl == 8, "cyl"] <- "Not Found"

tplyr_table(mtcars2, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_missing_count(f_str("xx ", n), Missing = c(NA, "Not Found"))
      # This function is currently deprecated. It was replaced with an
      # argument in set_missing_count
      # set_denom_ignore("Missing")
  ) %>%
  build()
Set Logic for denominator subsetting
Description
By default, denominators in count layers are subset based on the layer-level where logic. In some cases this might not be correct. This function allows the user to override this behavior and pass custom logic that will be used to subset the target dataset when calculating denominators for the layer.
Usage
set_denom_where(e, denom_where)
Arguments
e | A |
denom_where | An expression (i.e. syntax) to be used to subset the target dataset for calculating layer denominators. Supply as programming logic (i.e. x < 5 & y == 10). To remove the layer where parameter subsetting for the total row and thus the percentage denominators, pass 'TRUE' to this function. |
Value
The modified Tplyr layer object
Examples
library(magrittr)

t10 <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl, where = cyl != 6) %>%
      set_denom_where(TRUE)
      # The denominators will be based on all of the values, including 6
  ) %>%
  build()
Set variables used in pct denominator calculation
Description
This function is used when calculating pct in count or shift layers. The percentages default to the treatment variable and any column variables but can be calculated on any variables passed to target_var, treat_var, by, or cols.
Usage
set_denoms_by(e, ...)
Arguments
e | A count/shift layer object |
... | Unquoted variable names |
Value
The modified layer object
Examples
library(magrittr)

# Default has matrix of treatment group, additional columns,
# and by variables sum to 1
tplyr_table(mtcars, am) %>%
  add_layer(
    group_shift(vars(row=gear, column=carb), by=cyl) %>%
      set_format_strings(f_str("xxx (xx.xx%)", n, pct))
  ) %>%
  build()

tplyr_table(mtcars, am) %>%
  add_layer(
    group_shift(vars(row=gear, column=carb), by=cyl) %>%
      set_format_strings(f_str("xxx (xx.xx%)", n, pct)) %>%
      set_denoms_by(cyl, gear) # Row % sums to 1
  ) %>%
  build()

tplyr_table(mtcars, am) %>%
  add_layer(
    group_shift(vars(row=gear, column=carb), by=cyl) %>%
      set_format_strings(f_str("xxx (xx.xx%)", n, pct)) %>%
      set_denoms_by(cyl, gear, am) # % within treatment group sums to 1
  ) %>%
  build()
Set counts to be distinct by some grouping variable.
Description
In some situations, count summaries may want to see distinct counts by a variable like subject. For example, the number of subjects in a population who had a particular adverse event. set_distinct_by allows you to set the by variables used to determine a distinct count.
Usage
set_distinct_by(e, distinct_by)
Arguments
e | A |
distinct_by | Variable(s) to get the distinct data. |
Details
When a distinct_by value is set, distinct counts will be used by default. If you wish to combine distinct and not distinct counts, you can choose which to display in your f_str() objects using n, pct, distinct_n, and distinct_pct. Additionally, denominators may be presented using total and distinct_total.
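As a hedged sketch of the paragraph above, the layer below shows distinct and non-distinct counts side by side. Using carb as the distinct-by variable is purely illustrative (it stands in for a subject identifier), and the format string is an assumption chosen for this example.

```r
library(Tplyr)
library(magrittr)

# carb stands in for a subject identifier here; this is illustrative only
res <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_distinct_by(carb) %>%
      # distinct_n and distinct_pct use the distinct counts,
      # while n is the non-distinct row count
      set_format_strings(
        f_str("xx (xx.x%) [xx]", distinct_n, distinct_pct, n)
      )
  ) %>%
  build()

res
```

Each result cell then carries the distinct count and percentage, followed by the non-distinct count in brackets.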
Value
The layer object with
Examples
# Load in pipe
library(magrittr)

tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_distinct_by(carb)
  ) %>%
  build()
Set the format strings and associated summaries to be performed in a layer
Description
'Tplyr' gives you extensive control over how strings are presented. set_format_strings allows you to apply these string formats to your layer. This behaves slightly differently between layers.
Usage
set_format_strings(e, ...)

## S3 method for class 'desc_layer'
set_format_strings(e, ..., cap = getOption("tplyr.precision_cap"))

## S3 method for class 'count_layer'
set_format_strings(e, ...)
Arguments
e | Layer on which to bind format strings |
... | Named parameters containing calls to f_str() |
cap | A named character vector containing an 'int' element for the cap on integer precision, and a 'dec' element for the cap on decimal precision. |
Details
Format strings are one of the most powerful components of 'Tplyr'. Traditionally, converting numeric values into strings for presentation can consume a good deal of time. Values and decimals need to align between rows, rounding before trimming is sometimes forgotten - it can become a tedious mess that, in the grand scheme of things, is not an important part of the analysis being performed. 'Tplyr' makes this process as simple as we can, while still allowing flexibility to the user.
In a count layer, you can simply provide a single f_str object to specify how you want your n's, percentages, and denominators formatted. If you are additionally supplying a statistic, like risk difference using add_risk_diff, you specify the count formats using the name 'n_counts'. The risk difference formats would then be specified using the name 'riskdiff'. In a descriptive statistic layer, set_format_strings allows you to do a couple more things:
- By naming parameters with character strings, those character strings become a row label in the resulting data frame
- The actual summaries that are performed come from the variable names used within the f_str calls
- Using multiple summaries (declared by your f_str calls), multiple summary values can appear within the same line. For example, to present "Mean (SD)" like displays.
- Format strings in the desc layer also allow you to configure how empty values should be presented. In the f_str call, use the empty parameter to specify how missing values should present. A single element character vector should be provided. If the vector is unnamed, that value will be used in the format string and fill the space similar to how the numbers will display. Meaning - if your empty string is 'NA' and your format string is 'xx (xxx)', the empty values will populate as 'NA ( NA)'. If you name the character vector in the 'empty' parameter '.overall', like empty = c(.overall=''), then that exact string will fill the value instead. For example, providing 'NA' will instead create the formatted string as 'NA' exactly.
See the f_str documentation for more details about how this implementation works.
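The 'n_counts' and 'riskdiff' names described above can be sketched as follows. This is a minimal illustration, not a recommended display: the comparison of gear groups 3 and 4 and both format strings are assumptions made for the example, and suppressWarnings() is used only because small cell counts in mtcars can trigger warnings from the underlying proportion test.

```r
library(Tplyr)
library(magrittr)

res <- suppressWarnings(
  tplyr_table(mtcars, gear) %>%
    add_layer(
      group_count(carb) %>%
        # Compare gear group 3 against gear group 4
        add_risk_diff(c("3", "4")) %>%
        set_format_strings(
          # Format for the ordinary count cells
          "n_counts" = f_str("xx (xx.x%)", n, pct),
          # Format for the risk difference cells
          "riskdiff" = f_str("xx.xxx (xx.xxx, xx.xxx)", dif, low, high)
        )
    ) %>%
    build()
)

res
```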
Value
The layer environment with the format string binding added
tplyr_layer object with formats attached
Returns the modified layer object.
Examples
# Load in pipe
library(magrittr)

# In a count layer
tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_format_strings(f_str('xx (xx%)', n, pct))
  ) %>%
  build()

# In a descriptive statistics layer
tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(mpg) %>%
      set_format_strings(
        "n" = f_str("xx", n),
        "Mean (SD)" = f_str("xx.x", mean, empty='NA'),
        "SD" = f_str("xx.xx", sd),
        "Median" = f_str("xx.x", median),
        "Q1, Q3" = f_str("xx, xx", q1, q3, empty=c(.overall='NA')),
        "Min, Max" = f_str("xx, xx", min, max),
        "Missing" = f_str("xx", missing)
      )
  ) %>%
  build()

# In a shift layer
tplyr_table(mtcars, am) %>%
  add_layer(
    group_shift(vars(row=gear, column=carb), by=cyl) %>%
      set_format_strings(f_str("xxx (xx.xx%)", n, pct))
  ) %>%
  build()
Set the option to prefix the row_labels in the inner count_layer
Description
When a count layer uses nesting (i.e. triggered by set_nest_count), the indentation argument's value will be used as a prefix for the inner layer's records.
Usage
set_indentation(e, indentation)
Arguments
e | A count_layer object |
indentation | A character to prefix the row labels in an inner count layer |
Value
The modified count_layer environment
Set variables to limit reported data values only to those that exist rather than fully completing all possible levels
Description
This function allows you to select a combination of by variables or potentially target variables for which you only want to display values present in the data. By default, Tplyr will create a cartesian combination of potential values of the data. For example, if you have 2 by variables present, then each potential combination of those by variables will have a row present in the final table. set_limit_data_by() allows you to choose the by variables whose combination you wish to limit to values physically present in the available data.
Usage
set_limit_data_by(e, ...)
Arguments
e | A tplyr_layer |
... | Subset of variables within by or target variables |
Value
a tplyr_table
Examples
tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_desc(AVAL, by = vars(PECAT, PARAM, AVISIT))
  ) %>%
  build()

tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_desc(AVAL, by = vars(PECAT, PARAM, AVISIT)) %>%
      set_limit_data_by(PARAM, AVISIT)
  ) %>%
  build()

tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_count(AVALC, by = vars(PECAT, PARAM, AVISIT)) %>%
      set_limit_data_by(PARAM, AVISIT)
  ) %>%
  build()

tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_count(AVALC, by = vars(PECAT, PARAM, AVISIT)) %>%
      set_limit_data_by(PECAT, PARAM, AVISIT)
  ) %>%
  build()
Set the display for missing strings
Description
Controls how missing counts are handled and displayed in the layer
Usage
set_missing_count(e, fmt = NULL, sort_value = NULL, denom_ignore = FALSE, ...)
Arguments
e | A |
fmt | An f_str object to change the display of the missing counts |
sort_value | A numeric value that will be used in the ordering column. This should be numeric. If it is not supplied the ordering column will be the maximum value of what appears in the table plus one. |
denom_ignore | A boolean. Specifies whether or not to include the missing counts specified within the ... parameter within denominators. If set to TRUE, the values specified within ... will be ignored. |
... | Parameters used to note which values to describe as missing. Generally NA and "Missing" would be used here. Parameters can be named character vectors where the names become the row label. |
Value
The modified layer
Examples
library(magrittr)
library(dplyr)

mtcars2 <- mtcars %>%
  mutate_all(as.character)
mtcars2[mtcars$cyl == 6, "cyl"] <- NA

tplyr_table(mtcars2, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_missing_count(f_str("xx ", n), Missing = NA)
  ) %>%
  build()
Set the label for the missing subjects row
Description
Set the label for the missing subjects row
Usage
set_missing_subjects_row_label(e, missing_subjects_row_label)
Arguments
e | A count_layer object |
missing_subjects_row_label | A character to label the missing subjects row |
Value
The modified count_layer object
Examples
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      add_missing_subjects_row() %>%
      set_missing_subjects_row_label("Missing")
  )
build(t)
Set the option to nest count layers
Description
If set to TRUE, the second variable specified in target_var will be nested inside of the first variable. This allows you to create displays like those commonly used in adverse event tables, where one column holds both the labels of the outer categorical variable and the inside event variable (i.e. AEBODSYS and AEDECOD).
Usage
set_nest_count(e, nest_count)
Arguments
e | A count_layer object |
nest_count | A logical value to set the nest option |
Value
The modified layer
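Neither set_nest_count() nor set_indentation() comes with an example in this manual, so here is a minimal sketch of how the two might be combined for an adverse event display. It assumes the bundled tplyr_adae data contains the usual ADaM variables TRTA, AEBODSYS, and AEDECOD; the "--" prefix is an arbitrary choice made so the indentation is easy to see.

```r
library(Tplyr)
library(magrittr)

res <- tplyr_table(tplyr_adae, TRTA) %>%
  add_layer(
    # Two target variables: AEDECOD is nested inside AEBODSYS
    group_count(vars(AEBODSYS, AEDECOD)) %>%
      # Collapse outer and inner labels into a single label column
      set_nest_count(TRUE) %>%
      # Prefix the inner (AEDECOD) rows so they stand out
      set_indentation("--")
  ) %>%
  build()

head(res$row_label1)
```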
Set a numeric cutoff
Description
In certain tables, it may be necessary to only include rows that meet numeric conditions. Rows that are less than a certain cutoff can be suppressed from the output. This function allows you to pass a cutoff and a cutoff stat (n, distinct_n, pct, or distinct_pct) to suppress values that are less than the cutoff.
Usage
set_numeric_threshold(e, numeric_cutoff, stat, column = NULL)
Arguments
e | A |
numeric_cutoff | A numeric value where only values greater than or equal to will be displayed. |
stat | The statistic to use when filtering out rows. Either 'n', 'distinct_n', or 'pct' are allowable |
column | If only a particular column should be used to cutoff values, it can be supplied here as a character value. |
Value
The modified Tplyr layer object
Examples
mtcars %>%
  tplyr_table(gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_numeric_threshold(10, "n") %>%
      add_total_row() %>%
      set_order_count_method("bycount")
  )
Set the ordering logic for the count layer
Description
The sorting of a table can greatly vary depending on the situation at hand. For count layers, when creating tables like adverse event summaries, you may wish to order the table by descending occurrence within a particular treatment group. But in other situations, such as AEs of special interest, or subject disposition, there may be a specific order you wish to display values. Tplyr offers solutions to each of these situations.
Instead of allowing you to specify a custom sort order, Tplyr instead provides you with order variables that can be used to sort your table after the data are summarized. Tplyr has a default order in which the table will be returned, but the order variables will always persist. This allows you to use powerful sorting functions like arrange to get your desired order, and in double programming situations, helps your validator understand how you achieved a particular sort order and where discrepancies may be coming from.
When creating order variables for a layer, for each 'by' variable Tplyr will search for a <VAR>N version of that variable (i.e. VISIT <-> VISITN, PARAM <-> PARAMN). If available, this variable will be used for sorting. If not available, Tplyr will create a new ordered factor version of that variable to use in alphanumeric sorting. This allows the user to control a custom sorting order by leaving an existing <VAR>N variable in your dataset if it exists, or create one based on the order in which you wish to sort - no custom functions in Tplyr required.
Ordering of results is where things start to differ. Different situations call for different methods. Descriptive statistics layers keep it simple - the order in which you input your formats using set_format_strings is the order in which the results will appear (with an order variable added). For count layers, Tplyr offers three solutions: If there is a <VAR>N version of your target variable, use that. If not, if the target variable is a factor, use the factor orders. Finally, you can use a specific data point from your results columns. The result column can often have multiple data points, between the n counts, percent, distinct n, and distinct percent. Tplyr allows you to choose which of these values will be used when creating the order columns for a specified result column (i.e. based on the treat_var and cols arguments). See the 'Sorting a Table' section for more information.
Shift layers sort very similarly to count layers, but to order your row shift variable, use an ordered factor.
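To illustrate the point about sorting after summarization, here is a minimal sketch that builds a count layer and then reorders the result with dplyr::arrange() on the persisted order columns. The column names ord_layer_index and ord_layer_1 are what a single count layer with no by variables is expected to produce; treat them as an assumption to verify against your own output.

```r
library(Tplyr)
library(dplyr)

res <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      # Index rows by the count in the default ordering column
      set_order_count_method("bycount")
  ) %>%
  build()

# Sort by layer first, then by descending count within the layer
sorted <- res %>%
  arrange(ord_layer_index, desc(ord_layer_1))

sorted %>%
  select(row_label1, starts_with("var1_"), starts_with("ord_"))
```

Because the order columns stay in the output, a validator can see exactly which values drove the final sort.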
Usage
set_order_count_method(e, order_count_method, break_ties = NULL)

set_ordering_cols(e, ...)

set_result_order_var(e, result_order_var)
Arguments
e | A |
order_count_method | The logic determining how the rows in the final layer output will be indexed. Options are 'bycount', 'byfactor', and 'byvarn'. |
break_ties | In certain cases, a 'bycount' sort will result in conflicts if the counts aren't unique. break_ties will add a decimal to the sorting column to resolve conflicts. A character value of 'asc' will add a decimal based on the alphabetical sorting. 'desc' will do the same but sort descending in case that is the intention. |
... | Unquoted variables used to select the columns whose values will be extracted for ordering. |
result_order_var | The numeric value the ordering will be done on. This can be either n, distinct_n, pct, or distinct_pct. Due to the evaluation of the layer you can add a value that isn't actually being evaluated; if this happens, it will only error out in the ordering. |
Value
Returns the modified layer object. The 'ord_' columns are added during the build process.
Sorting a Table
When a table is built, the output has several ordering (ord_) columns that are appended. The first represents the layer index. The index is determined by the order the layer was added to the table. Following are the indices for the by variables and the target variable. The by variables are ordered based on:
- The 'by' variable is a factor in the target dataset
- If the variable isn't a factor, but has a <VAR>N variable (i.e. VISIT -> VISITN, TRT -> TRTN)
- If the variable is not a factor in the target dataset, it is coerced to one and ordered alphabetically.
The target variable is ordered depending on the type of layer. See more below.
Ordering a Count Layer
There are many ways to order a count layer depending on the preferences of the table programmer. Tplyr supports sorting by a descending amount in a column in the table, sorting by a <VAR>N variable, and sorting by a custom order. These can be set using the 'set_order_count_method' function.
- Sorting by a numeric count
A selected numeric value from a selected column will be indexed based on the descending numeric value. The numeric value extracted defaults to 'n' but can be changed with 'set_result_order_var'. The column selected for sorting defaults to the first value in the treatment group variable. If there were arguments passed to the 'cols' argument in the table those must be specified with 'set_ordering_cols'.
- Sorting by a 'varn' variable
If the treatment variable has a <VAR>N variable, it can be indexed to that variable.
- Sorting by a factor (Default)
If a factor is found for the target variable in the target dataset, that is used to order; if no factor is found, the variable is coerced to a factor and sorted alphabetically.
- Sorting a nested count layer
If two variables are targeted by a count layer, two methods can be passed to 'set_order_count_method'. If two are passed, the first is used to sort the blocks, the second is used to sort the "inside" of the blocks. If one method is passed, that will be used to sort both.
Ordering a Desc Layer
The order of a desc layer is mostly set during the object construction. The by variables are resolved and indexed with the same logic as the count layers. The target variable is ordered based on the format strings that were used when the layer was created.
Examples
library(dplyr)

# Default sorting by factor
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl)
  )
build(t)

# Sorting by <VAR>N
mtcars$cylN <- mtcars$cyl
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_order_count_method("byvarn")
  )

# Sorting by row count
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_order_count_method("bycount") %>%
      # Orders based on the 6 gear group
      set_ordering_cols(6)
  )

# Sorting by row count by percentages
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      set_order_count_method("bycount") %>%
      set_result_order_var(pct)
  )

# Sorting when you have column arguments in the table
t <- tplyr_table(mtcars, gear, cols = vs) %>%
  add_layer(
    group_count(cyl) %>%
      # Uses the fourth gear group and the 0 vs group in ordering
      set_ordering_cols(4, 0)
  )

# Using a custom factor to order
mtcars$cyl <- factor(mtcars$cyl, c(6, 4, 8))
t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      # This is the default but can be used to change the setting if it is
      # set at the table level.
      set_order_count_method("byfactor")
  )
Set the value of an outer nested count layer to Inf or -Inf
Description
Set the value of an outer nested count layer to Inf or -Inf
Usage
set_outer_sort_position(e, outer_sort_position)
Arguments
e | A count_layer object |
outer_sort_position | Either 'asc' or 'desc'. If 'desc', the final ordering helper will be set to Inf; if 'asc', the ordering helper is set to -Inf. |
Value
The modified count layer.
Set precision data
Description
In some cases, there may be organizational standards surrounding decimal precision. For example, there may be a specific standard around the representation of precision relating to lab results. As such, set_precision_data() provides an interface to provide integer and decimal precision from an external data source.
Usage
set_precision_data(layer, prec, default = c("error", "auto"))
Arguments
layer | A |
prec | A dataframe following the structure specified in the function details |
default | Handling of unspecified by variable groupings. Defaults to 'error'. Set to 'auto' to automatically infer any missing groups. |
Details
The ultimate behavior of this feature is just that of the existing auto precision method, except that the precision is specified in the provided precision dataset rather than inferred from the source data. At a minimum, the precision dataset must contain the integer variables max_int and max_dec. If by variables are provided, those variables must be available in the layer by variables.
When the table is built, by default Tplyr will error if the precision dataset is missing by variable groupings that exist in the target dataset. This can be overridden using the default parameter. If default is set to "auto", any missing values will be automatically inferred from the source data.
Examples
prec <- tibble::tribble(
  ~vs, ~max_int, ~max_dec,
    0,        1,        1,
    1,        2,        2
)

tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(wt, by = vs) %>%
      set_format_strings(
        'Mean (SD)' = f_str('a.a+1 (a.a+2)', mean, sd)
      ) %>%
      set_precision_data(prec) %>%
      set_precision_on(wt)
  ) %>%
  build()
Set descriptive statistics as columns
Description
In many cases, treatment groups are represented as columns within a table. But some tables call for a transposed presentation, where the treatment groups are displayed by row, and the descriptive statistics are represented as columns. set_stats_as_columns() allows Tplyr to output a built table using this transposed format and deviate away from the standard representation of treatment groups as columns.
Usage
set_stats_as_columns(e, stats_as_columns = TRUE)
Arguments
e |
|
stats_as_columns | Boolean to set stats as columns |
Details
This function leaves all specified by variables intact. The only switch that happens during the build process is that the provided descriptive statistics are transposed as columns and the treatment variable is left as rows. Column variables will remain represented as columns, and multiple target variables will also be respected properly.
Value
The input tplyr_layer
Examples
dat <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(wt, by = vs) %>%
      set_format_strings(
        "n" = f_str("xx", n),
        "sd" = f_str("xx.x", sd, empty = c(.overall = "BLAH")),
        "Median" = f_str("xx.x", median),
        "Q1, Q3" = f_str("xx, xx", q1, q3),
        "Min, Max" = f_str("xx, xx", min, max),
        "Missing" = f_str("xx", missing)
      ) %>%
      set_stats_as_columns()
  ) %>%
  build()
Set the label for the total row
Description
The row label for a total row defaults to "Total"; however, this can be overridden using this function.
Usage
set_total_row_label(e, total_row_label)
Arguments
e | A count_layer object |
total_row_label | A character to label the total row |
Value
The modified count_layer object
Examples
# Load in pipe
library(magrittr)

t <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      add_total_row() %>%
      set_total_row_label("Total Cyl")
  )
build(t)
Extract format group strings or numbers
Description
These functions allow you to extract segments of information from within a result string by targeting specific format groups. str_extract_fmt_group() allows you to pull out the individual format group string, while str_extract_num() allows you to pull out that specific numeric result.
Usage
str_extract_fmt_group(string, format_group)

str_extract_num(string, format_group)
Arguments
string | A string of number results from which to extract format groups |
format_group | An integer representing the format group that should be extracted |
Details
Format groups refer to individual segments of a string. For example, given the string ' 5 (34.4%) [9]', there are three separate format groups, which are ' 5', '(34.4%)', and '[9]'.
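As a hedged illustration of format groups in practice, the sketch below pulls the first and second groups back out of a built result column. The format string and the new column names n_3 and pct_3 are assumptions made for this example; var1_3 is the result column for gear group 3 in this particular table.

```r
library(Tplyr)
library(dplyr)

res <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      # Two format groups: the count, then the parenthesized percent
      set_format_strings(f_str("xx (xx.x%)", n, pct))
  ) %>%
  build()

out <- res %>%
  mutate(
    n_3   = str_extract_num(var1_3, 1),        # number from group 1
    pct_3 = str_extract_fmt_group(var1_3, 2)   # text of group 2
  ) %>%
  select(row_label1, var1_3, n_3, pct_3)

out
```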
Value
A character vector
Examples
string <- c(" 0 (0.0%)", " 8 (9.3%)", "78 (90.7%)")
str_extract_fmt_group(string, 2)
str_extract_num(string, 2)
Wrap strings to a specific width with hyphenation while preserving indentation
Description
str_indent_wrap() leverages stringr::str_wrap() under the hood, but takes some extra steps to preserve any indentation that has been applied to a character element, and use hyphenated wrapping of single words that run longer than the allotted wrapping width.
Usage
str_indent_wrap(x, width = 10, tab_width = 5)
Arguments
x | An input character vector |
width | The desired width of elements within the output character vector |
tab_width | The number of spaces to which tabs should be converted |
Details
The function stringr::str_wrap() is highly efficient, but in the context of table creation there are two select features missing - hyphenation for long running strings that overflow width, and respect for pre-indentation of a character element. For example, in an adverse event table, you may have body system rows as an un-indented column, and preferred terms as indented columns. These strings may run long and require wrapping to not surpass the column width. Furthermore, for crowded tables a single word may be longer than the column width itself.
This function takes steps to resolve these two issues, while trying to minimize additional overhead required to apply the wrapping of strings.
Note: This function automatically converts tabs to spaces. Tab width varies depending on font, so width cannot automatically be determined within a data frame. As such, users can specify the width.
Value
A character vector with string wrapping applied
Examples
ex_text1 <- c("RENAL AND URINARY DISORDERS", " NEPHROLITHIASIS")
ex_text2 <- c("RENAL AND URINARY DISORDERS", "\tNEPHROLITHIASIS")

cat(paste(str_indent_wrap(ex_text1, width=8), collapse="\n\n"),"\n")
cat(paste(str_indent_wrap(ex_text2, tab_width=4), collapse="\n\n"),"\n")
ADAE Data
Description
A subset of the PHUSE Test Data Factory ADAE data set.
Usage
tplyr_adae
Format
A data.frame with 276 rows and 55 columns.
Source
https://github.com/phuse-org/TestDataFactory
See Also
[get_data_labels()]
ADAS Data
Description
A subset of the PHUSE Test Data Factory ADAS data set.
Usage
tplyr_adas
Format
A data.frame with 1,040 rows and 40 columns.
Source
https://github.com/phuse-org/TestDataFactory
See Also
[get_data_labels()]
ADLB Data
Description
A subset of the PHUSE Test Data Factory ADLB data set.
Usage
tplyr_adlb
Format
A data.frame with 311 rows and 46 columns.
Source
https://github.com/phuse-org/TestDataFactory
See Also
[get_data_labels()]
ADPE Data
Description
A mock-up dataset that is fit for testing data limiting
Usage
tplyr_adpe
Format
A data.frame with 21 rows and 8 columns.
ADSL Data
Description
A subset of the PHUSE Test Data Factory ADSL data set.
Usage
tplyr_adsl
Format
A data.frame with 254 rows and 49 columns.
Source
https://github.com/phuse-org/TestDataFactory
See Also
[get_data_labels()]
Create a tplyr_layer object
Description
This object is the workhorse of the tplyr package. A tplyr_layer can be thought of as a block, or "layer" of a table. Summary tables typically consist of different sections that require different summaries. When programming these sections, your code will create different layers that need to be stacked or merged together. A tplyr_layer is the container for those isolated building blocks.
When building the tplyr_table, each layer will execute independently. When all of the data processing has completed, the layers are brought together to construct the output.
tplyr_layer objects are not created directly, but are rather created using the layer constructor functions group_count, group_desc, and group_shift.
Usage
tplyr_layer(parent, target_var, by, where, type, ...)
Arguments
parent |
|
target_var | Symbol. Required. The variable name on which the summary is to be performed. Must be a variable within the target dataset. Enter unquoted - i.e. target_var = AEBODSYS. |
by | A string, a variable name, or a list of variable names supplied using vars() |
where | Call. Filter logic used to subset the target data when performing a summary. |
type | "count", "desc", or "shift". Required. The category of layer - either "count" for categorical counts, "desc" for descriptive statistics, or "shift" for shift table counts |
... | Additional arguments |
Value
A tplyr_layer environment that is a child of the specified parent. The environment contains the objects listed below.
tplyr_layer Core Object Structure
- type: This is an attribute. A string indicating the layer type, which controls the summary that will be performed.
- target_var: A quosure of a name, which is the variable on which a summary will be performed.
- by: A list of quosures representing either text labels or variable names used in grouping. Variable names must exist within the target dataset. Text strings submitted do not need to exist in the target dataset.
- cols: A list of quosures used to determine the variables that are used to display in columns.
- where: A quosure of a call that contains the filter logic used to subset the target dataset. This filtering is in addition to any subsetting done based on where criteria specified in tplyr_table.
- layers: A list with class tplyr_layer_container. Initialized as empty, but serves as the container for any sublayers of the current layer. Used internally.
Different layer types will have some different bindings specific to that layer's needs.
Examples
tab <- tplyr_table(iris, Sepal.Width)

l <- group_count(tab, by=vars('Label Text', Species),
                 target_var=Species, where= Sepal.Width < 5.5,
                 cols = Species)
Tplyr Metadata Object
Description
If a Tplyr table is built with the 'metadata=TRUE' option specified, then metadata is assembled behind the scenes to provide traceability on each result cell derived. The functions 'get_meta_result()' and 'get_meta_subset()' allow you to access that metadata by using an ID provided in the row_id column and the column name of the result you'd like to access. The purpose of the row_id variable, instead of a simple row index, is to provide a sort-resistant reference of the originating column, so the output Tplyr table can be sorted in any order but the metadata are still easily accessible.
Usage
tplyr_meta(names = list(), filters = exprs())
Arguments
names | List of symbols |
filters | List of expressions |
Details
The 'tplyr_meta' object provides a list with two elements: names and filters. The names contain every column from the target data.frame of the Tplyr table that factored into the specified result cell, and the filters contain all the filters necessary to subset the target data to create the specified result cell. 'get_meta_subset()' additionally provides a parameter to specify any additional columns you would like to include in the returned subset data frame.
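As an illustration of the workflow described above, the following sketch builds a small table with metadata and looks up one result cell. It is not taken from the package examples: the choice of mtcars, the result column name "var1_4" (assumed to be the column produced for cyl == 4), and the extra column carb are assumptions made for demonstration.

```r
library(Tplyr)

# Descriptive statistics of hp by cyl, built with metadata enabled
tab <- tplyr_table(mtcars, cyl)
tab <- add_layer(tab, group_desc(hp))
dat <- build(tab, metadata = TRUE)

# Use the row_id column rather than a row index to identify the result row
rid <- dat$row_id[1]

# Names and filters behind the selected result cell
meta <- get_meta_result(tab, rid, "var1_4")

# The source records behind that cell, with one extra column added
sub <- get_meta_subset(tab, rid, "var1_4", add_cols = dplyr::vars(carb))

print(meta)
print(head(sub))
```

Looking up the cell through row_id means the same call keeps working after dat has been re-sorted.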
Value
tplyr_meta object
Examples
tplyr_meta(
  names = rlang::quos(x, y, z),
  filters = rlang::quos(x == 1, y == 2, z == 3)
)
Create a Tplyr table object
Description
The tplyr_table object is the main container upon which a Tplyr table is constructed. Tplyr tables are made up of one or more layers. Each layer contains an instruction for a summary to be performed. The tplyr_table object contains those layers, and the general data, metadata, and logic necessary.
Usage
tplyr_table(target, treat_var, where = TRUE, cols = vars())
Arguments
target | Dataset upon which summaries will be performed |
treat_var | Variable containing treatment group assignments. Supply unquoted. |
where | A general subset to be applied to all layers. Supply as programming logic (i.e. x < 5 & y == 10) |
cols | A grouping variable to summarize data by column (in addition to treat_var). Provide multiple column variables by using |
Details
When a tplyr_table is created, it will contain the following bindings:
target - The dataset upon which summaries will be performed
pop_data - The data containing population information. This defaults to the target dataset
cols - A categorical variable to present summaries grouped by column (in addition to treat_var)
table_where - The where parameter provided, used to subset the target data
treat_var - Variable used to distinguish treatment groups.
header_n - Default header N values based on treat_var
pop_treat_var - The treatment variable for pop_data (if different)
layers - The container for individual layers of a tplyr_table
treat_grps - Additional treatment groups to be added to the summary (i.e. Total)
tplyr_table provides a basic interface to instantiate the object. Modifier functions are available to change individual parameters to suit your analysis. For example, to add a total group, you can use add_total_group.
In future releases, we will provide vignettes to fully demonstrate these capabilities.
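As a brief sketch of the modifier pattern mentioned above, the following adds a total group to a table before building it. The descriptive layer on Sepal.Width and the check for a column containing "Total" are illustrative assumptions, not part of the original example.

```r
library(Tplyr)

# Instantiate the table, then adjust it with a modifier function
tab <- tplyr_table(iris, Species, where = Sepal.Length < 5.8)
tab <- add_total_group(tab)

# Add a single descriptive statistics layer and build the output
tab <- add_layer(tab, group_desc(Sepal.Width))
res <- build(tab)

# A result column for the total group is expected alongside the species columns
has_total <- any(grepl("Total", names(res)))
print(names(res))
```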
Value
A tplyr_table object
Examples
tab <- tplyr_table(iris, Species, where = Sepal.Length < 5.8)
Return or set the treatment variable binding
Description
Return or set the treatment variable binding
Usage
treat_var(table)
set_treat_var(table, treat_var)
Arguments
table | A |
treat_var | Variable containing treatment group assignments. Supply unquoted. |
Value
For treat_var, the treat_var binding of the tplyr_table object. For set_treat_var, the modified object.
Examples
tab <- tplyr_table(mtcars, cyl)
set_treat_var(tab, gear)
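To see the getter and setter together, the sketch below reads the binding before and after it is changed. It assumes the binding is stored as a quosure of a variable name, so rlang::as_name() is used here only to turn it into a string for display. The names before and after are illustrative.

```r
library(Tplyr)

tab <- tplyr_table(mtcars, cyl)

# Read the current treatment variable binding
before <- rlang::as_name(treat_var(tab))

# Replace the binding, then read it again
tab <- set_treat_var(tab, gear)
after <- rlang::as_name(treat_var(tab))

print(c(before = before, after = after))
```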