| Type: | Package |
| Title: | Functions to Support Data Management and Processing Using the Maelstrom Research Approach |
| Version: | 2.0.0 |
| Maintainer: | Guillaume Fabre <guijoseph.fabre@gmail.com> |
| Description: | Functions to support data cleaning, evaluation, and description, developed for integration with Maelstrom Research software tools. 'madshapR' provides functions primarily to evaluate and manipulate datasets and data dictionaries in preparation for data harmonization with the package 'Rmonize' and to facilitate integration and transfer between RStudio servers and secure Opal environments. 'madshapR' functions can be used independently but are optimized in conjunction with 'Rmonize' functions for streamlined and coherent harmonization processing. |
| License: | GPL-3 |
| LazyData: | true |
| Depends: | R (≥ 3.5) |
| Imports: | dplyr (≥ 1.1.0), rlang, stringr, crayon, ggplot2, grDevices, graphics, lubridate, janitor, forcats, knitr, haven, bookdown, stats, DT, readr, tidyr, fs, utils, fabR (≥ 2.1.1) |
| URL: | https://github.com/maelstrom-research/madshapR |
| BugReports: | https://github.com/maelstrom-research/madshapR/issues |
| RoxygenNote: | 7.2.3 |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| Language: | en-US |
| NeedsCompilation: | no |
| Packaged: | 2025-06-27 01:10:15 UTC; guill |
| Author: | Guillaume Fabre |
| Repository: | CRAN |
| Date/Publication: | 2025-06-27 07:50:05 UTC |
madshapR: Functions to Support Data Management and Processing Using the Maelstrom Research Approach
Description
Functions to support data cleaning, evaluation, and description, developed for integration with Maelstrom Research software tools. 'madshapR' provides functions primarily to evaluate and manipulate datasets and data dictionaries in preparation for data harmonization with the package 'Rmonize' and to facilitate integration and transfer between RStudio servers and secure Opal environments. 'madshapR' functions can be used independently but are optimized in conjunction with 'Rmonize' functions for streamlined and coherent harmonization processing.
Author(s)
Maintainer: Guillaume Fabre <guijoseph.fabre@gmail.com> (ORCID)
Authors:
Maelstrom Research <info@maelstrom-research.org> [funder, copyright holder]
See Also
Useful links:
Report bugs at https://github.com/maelstrom-research/madshapR/issues
Validate and coerce any object as a categorical variable.
Description
Converts a vector object to a categorical object, typically a column in a data frame.
Usage
as_category(
  x,
  labels = as.vector(c(na.omit(unique(x)))),
  na_values = NULL,
  as_factor = FALSE
)
Arguments
x | A vector object to be coerced to categorical. |
labels | An optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)). |
na_values | An optional vector of the unique values (as character strings) among labels, for which the value is considered as missing. The default is NULL. Note that this set can be specified as smaller than labels. |
as_factor | Whether the output is a categorical variable (haven labelled object) or is a factor (labels and na_values will be lost, but the order of the levels will be preserved). FALSE by default. |
Value
A vector with class haven_labelled.
Examples
{
library(dplyr)
##### Example 1: use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  mutate(prg_ever = as_category(prg_ever))
head(dataset$prg_ever)
###### Example 2: any data frame can be a dataset
cat_cyl <- as_category(mtcars[['cyl']])
head(cat_cyl)
}
Validate and coerce any object as a data dictionary
Description
Checks if an object is a valid data dictionary and returns it with the appropriate madshapR::class attribute. This function mainly helps validate inputs within other functions of the package but could be used to check if an object is valid for use in a function. If either of the columns 'typeof' or 'class' already exists in 'Variables', or 'na_values' or 'labels' in 'Categories', the function will return the same data dictionary. Otherwise, these columns will be added, using 'valueType' in 'Variables', and 'label' and 'missing' in 'Categories'.
Usage
as_data_dict(object)
Arguments
object | A potential data dictionary object to be coerced. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
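The structure described above can be sketched in base R without calling any madshapR function. This is only an illustration under stated assumptions: the variable names, labels, and category codes below are invented, and the two checks mirror the constraints stated in the paragraph rather than the package's own validation code.

```r
# A toy data dictionary: a named list of data frames (all values are made up).
data_dict <- list(
  Variables = data.frame(
    name  = c("id", "smoker"),
    label = c("Participant identifier", "Current smoker"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("smoker", "smoker"),
    name     = c("0", "1"),
    label    = c("No", "Yes"),
    stringsAsFactors = FALSE
  )
)

# 'Variables' needs a 'name' column with unique, non-missing entries.
variables_ok <-
  "name" %in% names(data_dict$Variables) &&
  !anyNA(data_dict$Variables$name) &&
  anyDuplicated(data_dict$Variables$name) == 0

# 'Categories' needs unique combinations of 'variable' and 'name'.
categories_ok <-
  all(c("variable", "name") %in% names(data_dict$Categories)) &&
  anyDuplicated(data_dict$Categories[c("variable", "name")]) == 0

variables_ok
categories_ok
```

A list shaped like this is the kind of object that would then be passed to as_data_dict() for the actual validation.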
Value
A list of data frame(s) with madshapR::class 'data_dict'.
See Also
For a better assessment, please use data_dict_evaluate().
Examples
{
library(dplyr)
# use madshapR_examples provided by the package
###### Example 1 : use the function to apply the attribute "data_dict" to the
# object.
data_dict <-
  as_data_dict(madshapR_examples$`data_dictionary_example - as_data_dict`)
glimpse(data_dict)
###### Example 2 : use the function to shape the data dictionary formatted as
# data_dict_mlstr to data_dict object. The function mainly converts valueType
# column into corresponding typeof/class columns in 'Variables', and converts
# missing column into "na_values" column.
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
data_dict <- as_data_dict(data_dict)
glimpse(data_dict)
}
Validate and coerce any object as an Opal data dictionary format
Description
Validates the input object as a valid data dictionary compliant with formats used in the Maelstrom Research ecosystem, including Opal, and returns it with the appropriate madshapR::class attribute. This function mainly helps validate input within other functions of the package but could be used to check if an object is valid for use in a function.
Usage
as_data_dict_mlstr(object, name_standard = FALSE)
Arguments
object | A potential valid data dictionary to be coerced. |
name_standard | Whether the input data dictionary has variable names compatible with the Maelstrom Research ecosystem (including Opal) or not. FALSE by default. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
Value
A list of data frame(s) with madshapR::class 'data_dict_mlstr'.
See Also
For a better assessment, please use data_dict_evaluate().
Examples
{
library(dplyr)
###### Example 1 : use the function to apply the attribute "data_dict" to the
# object.
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
glimpse(data_dict)
###### Example 2 : use the function to shape the data dictionary formatted as
# data_dict_mlstr to data_dict object. The function mainly converts valueType
# column into corresponding typeof/class columns in 'Variables', and converts
# missing column into "na_values" column.
data_dict <-
  as_data_dict_mlstr(madshapR_examples$`data_dictionary_example - as_data_dict`)
glimpse(data_dict)
}
Validate and coerce any object as a workable data dictionary structure
Description
Validates the input object as a workable data dictionary structure and returns it with the appropriate madshapR::class attribute. This function mainly helps validate input within other functions of the package but could be used to check if a data dictionary is valid for use in a function.
Usage
as_data_dict_shape(object)
Arguments
object | A potential valid data dictionary to be coerced. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A list of data frame(s) with madshapR::class 'data_dict_shape'.
See Also
For a better assessment, please use data_dict_evaluate().
Examples
{
library(dplyr)
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example`
data_dict <- as_data_dict_shape(data_dict)
glimpse(data_dict)
}
Validate and coerce any object as a dataset
Description
Checks if an object is a valid dataset and returns it with the appropriate madshapR::class attribute. This function mainly helps validate inputs within other functions of the package but could be used separately to check if a dataset is valid.
Usage
as_dataset(object, col_id = NULL)
Arguments
object | A potential dataset object to be coerced. |
col_id | An optional character string specifying the name(s) or position(s) of the column(s) used as identifiers. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame with madshapR::class 'dataset'.
Examples
{
# use madshapR_examples provided by the package
library(dplyr)
library(fabR)
###### Example 1: A dataset can have an id column specified as an attribute.
dataset <- as_dataset(madshapR_examples$`dataset_example`, col_id = "part_id")
print(attributes(dataset)$`madshapR::col_id`)
glimpse(dataset)
###### Example 2: Any data frame can be a dataset by definition.
dataset <- tibble(iris %>% add_index("my_index"))
dataset <- as_dataset(dataset, "my_index")
print(attributes(dataset)$`madshapR::col_id`)
}
Validate and coerce any object as a dossier (list of dataset(s))
Description
Checks if an object is a valid dossier (list of datasets) and returns it with the appropriate madshapR::class attribute. This function mainly helps validate inputs within other functions of the package but could be used to check if a dossier is valid.
Usage
as_dossier(object)
Arguments
object | A potential dossier object to be coerced. |
Details
A dossier is a named list containing one or more data frames, each of them being a dataset. The name of each data frame will be used as the reference name of the dataset.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A list of data frame(s) with madshapR::class 'dossier'.
See Also
For a better assessment, please use dataset_evaluate().
Examples
{
# use madshapR_examples provided by the package
library(dplyr)
library(stringr)
###### Example 1: a dataset list is a dossier by definition.
dossier <-
  as_dossier(madshapR_examples[str_detect(names(madshapR_examples),"^dataset_example")])
glimpse(dossier)
###### Example 2: any list of data frame can be a dossier by
# definition.
dossier <- as_dossier(list(dataset_1 = iris, dataset_2 = mtcars))
glimpse(dossier)
}
Validate and coerce any object as a taxonomy
Description
Confirms that the input object is a valid taxonomy and returns it as a taxonomy with the appropriate madshapR::class attribute. This function mainly helps validate input within other functions of the package but could be used to check if a taxonomy is valid.
Usage
as_taxonomy(object)
Arguments
object | A potential taxonomy to be coerced. |
Details
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. Additional details about Opal taxonomies are available online.
Value
A list of data frame(s) with madshapR::class 'taxonomy'.
Examples
{
# use madshapR_examples provided by the package
taxonomy <- as_taxonomy(madshapR_examples$`taxonomy_example`)
head(taxonomy)
}
Validate and coerce any object according to a given valueType
Description
Assigns a valueType to an object, which can be a vector or a column in a data frame (using dplyr::mutate).
Usage
as_valueType(x, valueType = "text")
Arguments
x | Object to be coerced. Can be a vector. |
valueType | A character string of the valueType used to coerce x. |
Details
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
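The idea of coercing a variable according to a declared valueType can be illustrated with a small base R sketch. The helper coerce_by_valueType() is hypothetical and is not part of madshapR, and the mapping it uses is an assumption made for illustration; the correspondence the package actually applies is the one listed in valueType_list.

```r
# Hypothetical helper: map a few valueTypes to base R coercion functions.
# This mapping is an assumption for illustration, not the package's own table.
coerce_by_valueType <- function(x, valueType = "text") {
  switch(
    valueType,
    text    = as.character(x),
    integer = as.integer(x),
    decimal = as.numeric(x),
    boolean = as.logical(x),
    date    = as.Date(x),
    stop("valueType not covered by this sketch: ", valueType)
  )
}

# Character input coerced to integer, then to Date (ISO formatted strings).
as_int  <- coerce_by_valueType(c("4", "6", "8"), "integer")
as_date <- coerce_by_valueType(c("2001-05-17", "1999-12-31"), "date")

class(as_int)
class(as_date)
```

In practice as_valueType() would be used instead, since it also handles the valueTypes and edge cases this sketch leaves out.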
Value
The object coerced according to the input valueType.
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
as_valueType(head(dataset$dob),'date')
# as_valueType is compatible with tidyverse syntax
library(dplyr)
dataset <-
  tibble(mtcars) %>%
  mutate(cyl = as_valueType(cyl,'integer'))
head(dataset)
}
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- fabR
Assess a data dictionary for potential issues in categories
Description
Generates a data frame report of any categorical variable name present in the 'Categories' element but not present in 'Variables'. The data frame also reports any non-unique combinations of 'variable' and 'name' in the 'Categories' element. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_data_dict_categories(data_dict)
Arguments
data_dict | A list of data frame(s) representing metadata to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A data frame listing the categorical variables that have issues within a data dictionary.
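The two kinds of issues this report covers can be reproduced on toy data with base R. The sketch below is an assumption-laden illustration (invented variable names and codes, and not the package's implementation): it looks for categories whose variable is not declared in 'Variables' and for repeated combinations of 'variable' and 'name'.

```r
# Toy 'Variables' and 'Categories' elements (all values are made up).
variables <- data.frame(name = c("smoker", "age"), stringsAsFactors = FALSE)
categories <- data.frame(
  variable = c("smoker", "smoker", "smoker", "drinker"),
  name     = c("0", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Categories attached to a variable that 'Variables' does not declare.
orphans <- categories[!categories$variable %in% variables$name, ]

# Repeated ('variable', 'name') combinations (every copy after the first).
repeated <- categories[duplicated(categories[c("variable", "name")]), ]

orphans
repeated
```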
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors`
check_data_dict_categories(data_dict)
}
Assess categorical variables for non-Boolean values in 'missing' column
Description
Generates a data frame report of any categorical variables whose values in the 'missing' column of the 'Categories' element are not Boolean (or compatible with Boolean). This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_data_dict_missing_categories(data_dict)
Arguments
data_dict | A list of data frame(s) representing metadata to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A data frame listing the categorical values whose 'missing' column entry is not a Boolean.
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors`
check_data_dict_missing_categories(data_dict)
}
Assess a data dictionary for non-valid valueType values
Description
Generates a data frame report of any variable with a valueType that is not in the list of allowed valueType values. This function also assesses if the valueType is compatible with any associated categorical values declared. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_data_dict_valueType(data_dict)
Arguments
data_dict | A list of data frame(s) representing metadata to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A data frame listing the non-standard valueTypes declared in a data dictionary.
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors with data`
check_data_dict_valueType(data_dict)
}
Assess a data dictionary for potential issues in variables
Description
Generates a data frame report of any non-unique variable names in the 'Variables' element. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_data_dict_variables(data_dict)
Arguments
data_dict | A list of data frame(s) representing metadata to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A data frame listing the non-unique variables across a data dictionary.
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors`
check_data_dict_variables(data_dict)
}
Assess a data dictionary and associated dataset for category differences
Description
Generates a data frame report of any categorical value options (the combination of 'variable' and 'name' in 'Categories') in a data dictionary that are not in the associated dataset and any categorical variable values in a dataset that are not declared in the associated data dictionary. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_dataset_categories(dataset, data_dict = NULL)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame listing the categorical values that differ between a dataset and its data dictionary.
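The comparison behind this report runs in both directions, which a base R sketch on toy data can make concrete. Everything here is an assumption for illustration (the variable 'smoker', its codes, and the use of setdiff are not taken from the package).

```r
# Declared categories for one variable (made-up codes).
declared <- data.frame(
  variable = "smoker",
  name     = c("0", "1", "9"),
  stringsAsFactors = FALSE
)

# Observed data for the same variable; the value 2 is not declared.
dataset <- data.frame(smoker = c(0, 1, 1, 2))

observed       <- unique(as.character(dataset$smoker))
declared_names <- declared$name[declared$variable == "smoker"]

# Values present in the dataset but not declared in the data dictionary.
undeclared_values <- setdiff(observed, declared_names)

# Categories declared in the data dictionary but never observed.
unused_categories <- setdiff(declared_names, observed)

undeclared_values
unused_categories
```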
Examples
{
library(dplyr)
# use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example - errors with data` %>%
  mutate(gndr = as_category(gndr))
data_dict <-
  as_data_dict(madshapR_examples$`data_dictionary_example - errors with data`)
check_dataset_categories(dataset['gndr'], data_dict)
}
Assess a data dictionary and associated dataset for valueType differences
Description
Generates a data frame report of any incompatibility between variable values in a dataset and the declared valueType in the associated data dictionary. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_dataset_valueType(dataset, data_dict = NULL, valueType_guess = FALSE)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata to be evaluated. |
valueType_guess | Whether the output should include a more accurate valueType that could be applied to the dataset. FALSE by default. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A data frame listing the values whose valueType differs between a dataset and its data dictionary.
Examples
{
data_dict <- madshapR_examples$`data_dictionary_example - errors with data`
dataset <- madshapR_examples$`dataset_example`
check_dataset_valueType(dataset, data_dict, valueType_guess = TRUE)
check_dataset_valueType(dataset, data_dict, valueType_guess = FALSE)
}
Assess a data dictionary and associated dataset for undeclared variables
Description
Generates a data frame report of any variable that is present in a dataset but not in the associated data dictionary or present in a data dictionary but not in the associated dataset. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_dataset_variables(dataset, data_dict = NULL)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame listing the undeclared variables across a dataset and its data dictionary.
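Conceptually, this check compares the column names of the dataset with the 'name' column of 'Variables'. A minimal base R sketch on invented data (not the package's implementation) could look like this:

```r
# Toy dataset and data dictionary that disagree on one variable each way.
dataset <- data.frame(
  id     = 1:3,
  age    = c(34, 51, 47),
  height = c(170, 165, 180)
)
data_dict <- list(
  Variables = data.frame(
    name = c("id", "age", "weight"),
    stringsAsFactors = FALSE
  )
)

# Present in the dataset but not declared in the data dictionary.
in_dataset_only <- setdiff(names(dataset), data_dict$Variables$name)

# Declared in the data dictionary but absent from the dataset.
in_data_dict_only <- setdiff(data_dict$Variables$name, names(dataset))

in_dataset_only
in_data_dict_only
```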
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example - errors`
data_dict <- madshapR_examples$`data_dictionary_example - errors`
check_dataset_variables(dataset,data_dict)
}
Assess variable names in a data dictionary for non-standard formats
Description
Generates a data frame report of any variable names that are not compatible with the Maelstrom Research ecosystem, including Opal. This report can be used to help assess data structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
check_name_standards(var_names)
Arguments
var_names | A character vector of names. |
Details
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
Value
A data frame listing the non-standard names found in a vector.
Examples
{
# use madshapR_examples provided by the package
names_in_data_dict <-
  madshapR_examples$`data_dictionary_example - errors`$`Variables`$`name`
check_name_standards(names_in_data_dict)
check_name_standards(c("coucou", "cou cou", "$coucou",NA))
}
Return the id column name(s) of a dataset
Description
Returns the id column name(s) of a dataset, if any. Otherwise, the function returns NULL.
Usage
col_id(dataset)
Arguments
dataset | A data frame object. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
Name(s) of identifier column(s). NULL if none.
Examples
{
# use madshapR_examples provided by the package
library(dplyr)
library(fabR)
###### Example 1: A dataset can have an id column specified as an attribute.
dataset <- as_dataset(madshapR_examples$`dataset_example`)
col_id(dataset)
dataset <- as_dataset(dataset, col_id = "part_id")
col_id(dataset)
###### Example 2: Any data frame can be a dataset by definition.
dataset <- tibble(iris %>% add_index("my_index"))
dataset <- as_dataset(dataset, "my_index")
col_id(dataset)
}
Built-in data frame of colors used in the graphs and charts.
Description
Provides a built-in data frame of the colors used in the graphs and charts.
Usage
color_palette_maelstrom
Format
data.frame
A data frame with 51 rows and 2 columns:
- values: possible class value in a dataset.
- color_palette: associated color.
Examples
{
head(color_palette_maelstrom)
}
Apply a data dictionary to a dataset
Description
Applies a data dictionary to a dataset, creating a labelled dataset with variable attributes. Any previous attributes will be preserved. For variables that are factors, variables will be transformed into haven-labelled variables. The data dictionary will be added as an attribute (attributes(dataset)$`madshapR::Data dictionary`) and can be extracted using the function data_dict_extract().
Usage
data_dict_apply(dataset, data_dict = NULL)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A labelled data frame with metadata as attributes, specified for each variable from the input data dictionary.
See Also
attributes(), haven::labelled(), data_dict_extract()
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- data_dict_apply(dataset, data_dict)
head(dataset)
}
Transform multi-row category column(s) to single rows and join to "Variables"
Description
Collapses a data dictionary element (the parameter 'from'),into column(s) in another element (the parameter 'to')If the element 'to' exists, and contains any column 'xx' or 'yy', thesecolumns will be added to the element 'from' under the names 'to:xx'and 'to:yy'. (unique names will be generated if necessary). Each elementof these column will gather all information to process the reverse operation.Separator of each element is the following structure :'name = xx1 ; name = xx2'.This function is mainly used to collapse the 'Categories' element intocolumns in 'Variables'.This function is the reversed operation ofdata_dict_expand()
Usage
data_dict_collapse(
  data_dict,
  from = "Categories",
  to = "Variables",
  name_prefix = "Categories::"
)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
from | A symbol identifying the name of the element (data frame) to take column(s) from. Default is 'Categories'. |
to | A symbol identifying the name of the element (data frame) to create column(s) to. Default is 'Variables'. |
name_prefix | A character string of the prefix of columns of interest. This prefix will be used to select columns, and to rename them in the 'to' element. Default is 'Categories::'. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
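To make the 'name = xx1 ; name = xx2' structure concrete, here is a minimal base R sketch of how category rows could be collapsed into one string per variable. It is not the package's implementation; the example values are hypothetical, and it assumes each entry has the form 'category name = value'.

```r
# Hypothetical 'Categories' element.
categories <- data.frame(
  variable = c("gndr", "gndr", "smoke"),
  name     = c("1", "2", "0"),
  label    = c("Female", "Male", "Never"),
  stringsAsFactors = FALSE
)

# Build one "name = label" entry per category row.
entries <- paste(categories$name, categories$label, sep = " = ")

# Concatenate the entries of each variable, separated by " ; ".
collapsed <- tapply(entries, categories$variable, paste, collapse = " ; ")

collapsed[["gndr"]]
```

The resulting strings keep enough information (category name and value, in order) to rebuild the original rows.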
Value
A list of data frame(s) identifying a data dictionary.
See Also
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example`
data_dict_collapsed <- data_dict_collapse(data_dict)

head(data_dict_collapse(data_dict_collapsed))
}

Generate an assessment report for a data dictionary
Description
Assesses the content and structure of a data dictionary and generates reports of the results. The report can be used to help assess data dictionary structure, presence of fields, coherence across elements, and taxonomy or data dictionary formats.
Usage
data_dict_evaluate(data_dict, taxonomy = NULL, is_data_dict_mlstr = TRUE)
Arguments
data_dict | A list of data frame(s) representing metadata to be evaluated. |
taxonomy | An optional data frame identifying a variable classification schema. |
is_data_dict_mlstr | Whether the input data dictionary should be coerced with specific format restrictions for compatibility with other Maelstrom Research software. TRUE by default. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'. The function truncates each cell to a maximum of 10000 characters, to be readable and compatible with Excel.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. Additional details about Opal taxonomies are available online.
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
Value
A list of data frames containing assessment reports.
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors`
eval_data_dict <- data_dict_evaluate(data_dict, is_data_dict_mlstr = TRUE)

glimpse(eval_data_dict)
}

Transform single-row category information to multiple rows as element
Description
Expands data dictionary column(s) in an element (the parameter 'from') into another element (the parameter 'to'). If the element 'from' contains any column starting with 'prefix' (xx, yy), these columns will be added as 'xx' and 'yy' in the element identified by 'to'. This data frame will be created if necessary, and columns will be added from left to right. Unique names will be generated if necessary. Entries within an element are separated using the following structure: 'name = xx1 ; name = xx2'. This function is mainly used to expand the column(s) 'Categories::xx' in "Variables" to the "Categories" element with column(s) xx. It is the reverse operation of data_dict_collapse().
Usage
data_dict_expand(
  data_dict,
  from = "Variables",
  name_prefix = "Categories::",
  to = "Categories"
)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
from | A symbol identifying the name of the element (data frame) to take column(s) from. Default is 'Variables'. |
name_prefix | Character string of the prefix of columns of interest. This prefix will be used to select columns, and to rename them in the 'to' element. Default is 'Categories::'. |
to | A symbol identifying the name of the element (data frame) to create column(s) to. Default is 'Categories'. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
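The expansion of a 'Categories::xx' column into category rows can be sketched in base R as follows. This is an assumption-laden illustration rather than the package's code: the example values are hypothetical, and each entry is assumed to have the form 'category name = value', with entries separated by ' ; '.

```r
# Hypothetical 'Variables' element with a collapsed category column.
variables <- data.frame(
  name = c("gndr", "smoke"),
  `Categories::label` = c("1 = Female ; 2 = Male", "0 = Never"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Split each cell on " ; " to get one "name = label" entry per category.
entries <- strsplit(variables[["Categories::label"]], " ; ", fixed = TRUE)

# Split each entry on the first " = " into a category name and a label,
# then stack the per-variable tables into one 'Categories' data frame.
categories <- do.call(rbind, Map(function(var, ent) {
  pos <- regexpr(" = ", ent, fixed = TRUE)
  data.frame(
    variable = var,
    name     = substr(ent, 1, pos - 1),
    label    = substr(ent, pos + 3, nchar(ent)),
    stringsAsFactors = FALSE
  )
}, variables$name, entries))
rownames(categories) <- NULL

categories
```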
Value
A list of data frame(s) identifying a data dictionary.
See Also
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
data_dict_collapsed <- madshapR_examples$`data_dictionary_example - collapsed`
data_dict_expanded <- data_dict_expand(data_dict_collapsed)

glimpse(data_dict_expand(data_dict_expanded))
}

Generate a data dictionary from a dataset
Description
Generates a data dictionary from a dataset. If the dataset variables have no associated metadata, a minimum data dictionary is created by using variable attributes.
Usage
data_dict_extract(dataset, as_data_dict_mlstr = TRUE)
Arguments
dataset | A dataset object. |
as_data_dict_mlstr | Whether the input data dictionary should be coerced with specific format restrictions for compatibility with other Maelstrom Research software. TRUE by default. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
Value
A list of data frame(s) representing metadata of the dataset variables.
Examples
{
library(dplyr)

###### Example 1: use madshapR_examples provided by the package
# download a dataset and its data dictionary
# apply the data dictionary to its dataset
dataset <- madshapR_examples$`dataset_example`
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- data_dict_apply(dataset, data_dict)

# extract the data dictionary from the dataset
data_dict <- data_dict_extract(dataset)
glimpse(data_dict)

###### Example 2: extract data dictionary from any dataset (the
# data dictionary will be created upon attributes of the dataset. Factors
# will be considered as categorical variables)
glimpse(data_dict)
}

Subset data dictionary by row values
Description
Subsets either or both the 'Variables' and 'Categories' elements of a data dictionary. Rows are conserved if their values satisfy the condition. This is a wrapper function analogous to dplyr::filter().
Usage
data_dict_filter(
  data_dict,
  filter_var = NULL,
  filter_cat = NULL,
  filter_all = NULL
)
Arguments
data_dict | A list of data frame(s) representing metadata to be filtered. |
filter_var | Expressions that are defined in the element 'Variables' in the data dictionary. |
filter_cat | Expressions that are defined in the element 'Categories' in the data dictionary. |
filter_all | Expressions that are defined both in the 'Categories' and 'Variables' in the data dictionary. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
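As a rough illustration of row filtering on each element, the following base R sketch evaluates a condition given as a character string against a data frame. It does not reproduce how data_dict_filter() is implemented; the helper filter_rows() and the example values are hypothetical.

```r
# Hypothetical data dictionary elements.
variables <- data.frame(
  name = c("id", "gndr"),
  stringsAsFactors = FALSE
)
categories <- data.frame(
  variable = c("gndr", "gndr", "gndr"),
  name     = c("1", "2", "9"),
  missing  = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of madshapR): keeps the rows for which the
# condition, written as a string, evaluates to TRUE.
filter_rows <- function(df, condition) {
  keep <- eval(parse(text = condition), envir = df, enclos = baseenv())
  df[!is.na(keep) & keep, , drop = FALSE]
}

filtered <- list(
  Variables  = filter_rows(variables, "name == 'gndr'"),
  Categories = filter_rows(categories, "missing == TRUE")
)

filtered
```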
Value
A list of data frame(s) identifying a workable data dictionary structure.
See Also
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
# Data dictionary where the column 'table' is added to
# refer to the associated dataset.
data_dict <-
  madshapR_examples$`data_dictionary_example` %>%
  lapply(function(x) mutate(x, table = "dataset"))

###### Example 1 search and filter through a column in 'Variables' element
data_dict_f1 <- data_dict_filter(data_dict, filter_var = "name == 'gndr'")
glimpse(data_dict_f1)

###### Example 2 search and filter through a column in 'Categories' element
data_dict_f2 <- data_dict_filter(data_dict, filter_cat = "missing == TRUE")
glimpse(data_dict_f2)

###### Example 3 search and filter through a column across all elements.
# The column must exist in both 'Variables' and 'Categories' and have the
# same meaning
data_dict_f3 <- data_dict_filter(data_dict, filter_all = "table == 'dataset'")
glimpse(data_dict_f3)
}

Group listed data dictionaries by specified column names
Description
Groups the data dictionary element(s) by the groups defined by the query. This function groups both the 'Variables' and 'Categories' elements (if the group exists under the same definition in both). This function is analogous to running dplyr::group_by(). Each element is named using the group values. data_dict_ungroup() reverses the effect.
Usage
data_dict_group_by(data_dict, col)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
col | Variable to group by. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A list of data frame(s) identifying a workable data dictionary structure.
See Also
dplyr::group_by(), data_dict_ungroup()
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
# Create a list of data dictionaries where the column 'table' is added to
# refer to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.
data_dict_list <- list(
  data_dict_1 = madshapR_examples$`data_dictionary_example`,
  data_dict_2 = madshapR_examples$`data_dictionary_example - collapsed`)

data_dict_ns <- data_dict_list_nest(data_dict_list, name_group = "table")
data_dict_gp <- data_dict_group_by(data_dict_ns, col = "table")

glimpse(data_dict_gp)
}

Split grouped data dictionaries into a named list
Description
Divides data dictionary element(s) into the groups defined by the query. This function divides both the 'Variables' and 'Categories' elements (if the group exists under the same definition in both) into a list of data dictionaries, each with the rows of the associated group and all the original columns, including grouping variables. This function is analogous to running dplyr::group_by() and dplyr::group_split(). Each element is named using the group values. data_dict_list_nest() reverses the effect.
Usage
data_dict_group_split(data_dict, ...)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
... | Column in the data dictionary to split it by. If not provided, the splitting will be done on the grouping element of a grouped data dictionary. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A list of data frame(s) identifying a list of workable data dictionary structures.
See Also
dplyr::group_by(), dplyr::group_split(), data_dict_group_by(), data_dict_list_nest()
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
# Create a list of data dictionaries where the column 'table' is added to
# refer to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.
data_dict_list <- list(
  data_dict_1 = madshapR_examples$`data_dictionary_example - collapsed`,
  data_dict_2 = madshapR_examples$`data_dictionary_example`)

data_dict_ns <-
  data_dict_list_nest(data_dict_list, name_group = "table") %>%
  data_dict_group_by(col = "table")

data_dict_sp <- data_dict_group_split(data_dict_ns, col = "table")

glimpse(data_dict_sp)
}

Bind listed data dictionaries
Description
Binds a list of data dictionaries into one data dictionary. This is a wrapper function analogous to dplyr::bind_rows().
Usage
data_dict_list_nest(data_dict_list, name_group = NULL)
Arguments
data_dict_list | A list of data frame(s) representing metadata to be transformed. |
name_group | A character string of one column in the dataset that can be taken as a grouping column. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
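The idea of binding several tables while recording their origin in a grouping column, and of splitting them back, can be sketched in base R. This is a simplified illustration on a single 'Variables'-like table, not the package's implementation; the list names and the column name 'table' are hypothetical.

```r
# Hypothetical 'Variables' elements from two data dictionaries.
variables_list <- list(
  dataset_1 = data.frame(name = c("id", "gndr"), stringsAsFactors = FALSE),
  dataset_2 = data.frame(name = c("id", "age"), stringsAsFactors = FALSE)
)

# Bind: stack the tables and record each list name in a 'table' column.
nested <- do.call(rbind, Map(function(df, nm) {
  df$table <- nm
  df
}, variables_list, names(variables_list)))
rownames(nested) <- NULL

# Split: divide the stacked table again using the grouping column.
split_back <- split(nested, nested$table)

names(split_back)
```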
Value
A list of data frame(s) identifying a workable data dictionary structure.
See Also
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
# Create a list of data dictionaries where the column 'table' is added to
# refer to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.
data_dict_list <- list(
  data_dict_1 = madshapR_examples$`data_dictionary_example`,
  data_dict_2 = madshapR_examples$`data_dictionary_example - collapsed`)

data_dict_ns <-
  data_dict_list_nest(data_dict_list, name_group = "table") %>%
  data_dict_group_by(col = "table")

glimpse(data_dict_ns)
}

Inner join between a dataset and its associated data dictionary
Description
Performs an inner join between a dataset and its associated data dictionary, keeping only variables present in both. This function returns the matched dataset rows, the matched data dictionary rows, or both, in a list.
Usage
data_dict_match_dataset(
  dataset,
  data_dict,
  data_dict_apply = FALSE,
  output = c("dataset", "data_dict")
)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. |
data_dict_apply | Whether data dictionary(ies) should be applied to associated dataset(s), creating labelled dataset(s) with variable attributes. Any previous attributes will be preserved. FALSE by default. |
output | A vector of character strings indicating whether the function returns a dataset ('dataset'), a data dictionary ('data_dict'), or both. Default is c('dataset','data_dict'). |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
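The matching logic (keep only the variables present in both the dataset and the data dictionary) can be sketched in base R. This is an illustration under simplifying assumptions, not the package's code: only the 'Variables' element is handled and the example values are hypothetical.

```r
# Hypothetical dataset with one column absent from the data dictionary.
dataset <- data.frame(
  id    = 1:3,
  gndr  = c(1, 2, 1),
  extra = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical 'Variables' element with one variable absent from the dataset.
variables <- data.frame(
  name  = c("id", "gndr", "age"),
  label = c("Participant identifier", "Gender", "Age"),
  stringsAsFactors = FALSE
)

# Variables present on both sides.
common <- intersect(names(dataset), variables$name)

# Keep only the matched columns and the matched data dictionary rows.
matched_dataset   <- dataset[, common, drop = FALSE]
matched_variables <- variables[variables$name %in% common, , drop = FALSE]

list(dataset = matched_dataset, data_dict = list(Variables = matched_variables))
```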
Value
Either a data frame, identifying the dataset, or a list of data frame(s) identifying a data dictionary. Returns both in a list by default.
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
data_dict <- madshapR_examples$`data_dictionary_example - errors`

match_data_dict <-
  data_dict_match_dataset(dataset, data_dict, output = 'data_dict')
match_dataset <-
  data_dict_match_dataset(dataset, data_dict, output = 'dataset')

head(match_data_dict)
glimpse(match_dataset)
}

Transform column(s) of a data dictionary from wide format to long format
Description
Transforms column(s) of a data dictionary from wide format to long format. If a taxonomy is provided, the corresponding columns in the data dictionary will be converted to a standardized format with fewer columns. This operation is equivalent to performing a tidyr::pivot_longer() on these columns following the taxonomy structure provided. Variable names in the data dictionary must be unique.
Usage
data_dict_pivot_longer(data_dict, taxonomy = NULL)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
taxonomy | An optional data frame identifying a variable classification schema. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. Additional details about Opal taxonomies are available online.
Value
A list of data frame(s) identifying a data dictionary.
See Also
tidyr::pivot_longer(), as_data_dict()
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example`
taxonomy <- madshapR_examples$`taxonomy_example`
data_dict_longer <- data_dict_pivot_longer(data_dict, taxonomy)

glimpse(data_dict_longer)
}

Transform column(s) of a data dictionary from long format to wide format
Description
Transforms column(s) of a data dictionary from long format to wide format. If a taxonomy is provided, the corresponding columns in the data dictionary will be converted to a format with the taxonomy expanded. This operation is equivalent to performing a tidyr::pivot_wider() on these columns following the taxonomy structure provided. Variable names in the data dictionary must be unique.
Usage
data_dict_pivot_wider(data_dict, taxonomy = NULL)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
taxonomy | An optional data frame identifying a variable classification schema. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns 'taxonomy', 'vocabulary', and 'terms'. Additional details about Opal taxonomies are available online.
Value
A list of data frame(s) identifying a data dictionary.
See Also
tidyr::pivot_wider(), as_data_dict()
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example`
taxonomy <- madshapR_examples$`taxonomy_example`
data_dict_longer <- data_dict_pivot_longer(data_dict, taxonomy)
data_dict_wider <- data_dict_pivot_wider(data_dict_longer, taxonomy)

glimpse(data_dict_wider)
}

Add shortened labels to data dictionary
Description
This function modifies a data dictionary by adding shortened labels for both variables and categories. The shortened labels are created based on specified maximum lengths for the variable and category names and labels. The function first validates the input using as_data_dict_shape and extracts the first variable and category labels using first_label_get. It then calculates the lengths of names and labels, ensuring that they do not exceed the specified maximum lengths. The function handles both variables and categories, creating short labels while replacing any missing values with "Empty".
Usage
data_dict_trim_labels(
  data_dict,
  max_length_var_name = 31,
  max_length_var_label = 255,
  max_length_cat_name = 15,
  max_length_cat_label_short = 15,
  max_length_cat_label_long = 63,
  .keep_columns = TRUE
)
Arguments
data_dict | A data dictionary, typically a list containing 'Variables' and 'Categories' data frames. |
max_length_var_name | An integer specifying the maximum length for variable names (default is 31). |
max_length_var_label | An integer specifying the maximum length for variable labels (default is 255). |
max_length_cat_name | An integer specifying the maximum length for category names (default is 15). |
max_length_cat_label_short | An integer specifying the maximum total length for category labels (short) (default is 15). |
max_length_cat_label_long | An integer specifying the maximum total length for category labels (long) (default is 63). |
.keep_columns | A boolean specifying whether the output preserves the other columns of the data dictionary. TRUE by default. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
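The general idea of shortening labels to a maximum length and replacing missing values with "Empty" can be sketched in base R. This is only an approximation: the actual function may shorten labels differently (for example by combining names and labels), and the example labels are hypothetical.

```r
# Hypothetical category labels, one of which is missing.
labels <- c("Gender of the participant at baseline", NA, "Age")

# Maximum length, mirroring the default of max_length_cat_label_short.
max_length <- 15

# Truncate each label to at most 'max_length' characters.
short_labels <- substr(labels, 1, max_length)

# Replace missing labels with "Empty", as described for this function.
short_labels[is.na(short_labels)] <- "Empty"

short_labels
```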
Value
A modified data dictionary with additional columns for shortened labels:
madshapR::label_var_short: Shortened variable labels.
madshapR::label_cat_long: Shortened category labels (if categories are present).
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example - errors`
data_dict_with_short_labels <- data_dict_trim_labels(data_dict)

attributes(data_dict_with_short_labels)
}

Ungroup data dictionary
Description
Ungroups the data dictionary element(s). This function ungroups both the 'Variables' and 'Categories' elements (if both are grouped data frames). This function is analogous to running dplyr::ungroup(). data_dict_group_by() groups a data dictionary, and this function reverses the effect.
Usage
data_dict_ungroup(data_dict)
Arguments
data_dict | A list of data frame(s) representing metadata to be transformed. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A list of data frame(s) identifying a workable data dictionary structure.
See Also
dplyr::ungroup(), data_dict_group_by()
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
# Create a list of data dictionaries where the column 'table' is added to
# refer to the associated dataset. The object created is not a
# data dictionary per se, but can be used as a structure which can be
# shaped into a data dictionary.
data_dict_list <- list(
  data_dict_1 = madshapR_examples$`data_dictionary_example`,
  data_dict_2 = madshapR_examples$`data_dictionary_example`)

data_dict_nest <-
  data_dict_list_nest(data_dict_list, name_group = "table") %>%
  data_dict_group_by(col = "table")

glimpse(data_dict_ungroup(data_dict_nest))
}

Update a data dictionary from a dataset
Description
Updates a data dictionary from a dataset, creating a new data dictionary with updated content, from variables selected in the dataset. Any other previous metadata will be preserved. The new data dictionary can be applied to the dataset using data_dict_apply().
Usage
data_dict_update(data_dict = NULL, dataset, cols = names(dataset))
Arguments
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
dataset | A dataset object. |
cols | An optional character string specifying the name(s) or position(s) of the column(s) for which metadata will be updated. All by default. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
Value
A list of data frame(s) identifying a data dictionary.
See Also
data_dict_apply(), data_dict_extract()
Examples
{
library(dplyr)

# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- data_dict_apply(dataset, data_dict)

# the data dictionary contains no categorical variable.
# create a category in the dataset
dataset <-
  dataset %>%
  mutate(gndr = as_category(gndr, labels = c("coucou" = 1), na_values = 2))

new_data_dict <- data_dict_update(data_dict, dataset, "gndr")

head(dataset)
}

Create an empty dataset from a data dictionary
Description
Creates an empty dataset using information contained in a data dictionary. The column names are taken from 'name' in the 'Variables' element of the data dictionary. If a 'valueType' or alternatively 'typeof' column is provided, the class of each column is set accordingly (default is text).
Usage
data_extract(data_dict, data_dict_apply = FALSE)
Arguments
data_dict | A list of data frame(s) representing metadata. |
data_dict_apply | Whether data dictionary(ies) should be applied to associated dataset(s), creating labelled dataset(s) with variable attributes. Any previous attributes will be preserved. FALSE by default. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
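Building an empty dataset from the 'name' and 'valueType' columns can be sketched in base R. This is a simplified illustration, not the package's code: the helper empty_column() is hypothetical, and the mapping covers only a few assumed valueType values, with text as the fallback.

```r
# Hypothetical 'Variables' element with a 'valueType' column.
variables <- data.frame(
  name      = c("id", "gndr", "age"),
  valueType = c("text", "integer", "decimal"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of madshapR): returns a zero-length vector
# whose class depends on the value type; text is the default.
empty_column <- function(value_type) {
  switch(value_type,
    integer = integer(0),
    decimal = numeric(0),
    character(0)
  )
}

# One zero-length column per variable, named after 'name'.
columns <- lapply(variables$valueType, empty_column)
names(columns) <- variables$name

dataset <- as.data.frame(columns, stringsAsFactors = FALSE)

str(dataset)
```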
Value
A data frame identifying the dataset created from the variable names listed in the 'Variables' element of the data dictionary.
Examples
{
# use madshapR_examples provided by the package
# from a data dictionary, you can use the function to extract and generate an
# empty dataset
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- data_extract(data_dict)

head(dataset)
}

Apply data dictionary category labels to the associated dataset variables
Description
Applies category labels declared in a data dictionary to the associated columns (variables) in the dataset.
Usage
dataset_cat_as_labels(dataset, data_dict = NULL, col_names = names(dataset))
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
col_names | A character string specifying the name(s) of the column(s) which refer to existing column(s) in the dataset. The column(s) can be named or indicated by position. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the 'name' column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the 'variable' and 'name' columns, with unique combination of 'variable' and 'name'.
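Replacing category codes in a dataset column with the labels declared in 'Categories' can be sketched in base R. This is an illustration only, not how dataset_cat_as_labels() is implemented; the example values are hypothetical and codes without a declared category would become NA in this sketch.

```r
# Hypothetical dataset with a coded categorical variable.
dataset <- data.frame(gndr = c(1, 2, 2, 1))

# Hypothetical 'Categories' element declaring the labels.
categories <- data.frame(
  variable = c("gndr", "gndr"),
  name     = c("1", "2"),
  label    = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Categories declared for the column of interest.
cats <- categories[categories$variable == "gndr", ]

# Look up the label of each value by matching it to the category name.
dataset$gndr <- cats$label[match(as.character(dataset$gndr), cats$name)]

dataset
```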
Value
A data frame identifying a dataset.
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset_cat_as_labels(dataset, data_dict, col_names = 'gndr')
}
Generate an assessment report for a dataset
Description
Assesses the content and structure of a dataset object and generates reports of the results. This function can be used to evaluate data structure, presence of specific fields, coherence across elements, and data dictionary formats.
Usage
dataset_evaluate(
  dataset,
  data_dict = NULL,
  is_data_dict_mlstr = TRUE,
  valueType_guess = TRUE,
  taxonomy = NULL,
  dataset_name = NULL
)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
is_data_dict_mlstr | Whether the input data dictionary should be coerced with specific format restrictions for compatibility with other Maelstrom Research software. TRUE by default. |
valueType_guess | Whether the output should include a more accurate valueType that could be applied to the dataset. TRUE by default. |
taxonomy | An optional data frame identifying a variable classification schema. |
dataset_name | A character string specifying the name of the dataset (used internally in the function |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name. The function truncates each cell to a maximum of 10000 characters, to be readable and compatible with Excel.
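The truncation rule can be pictured with a short base R sketch. The helper name truncate_cells and the example report are hypothetical, and this is not the package's internal code; it only shows what capping each text cell at 10000 characters looks like.

```r
# Sketch (hypothetical helper): cap every character cell at max_chars characters.
truncate_cells <- function(df, max_chars = 10000L) {
  df[] <- lapply(df, function(col) {
    # Only character columns are shortened; other types pass through unchanged.
    if (is.character(col)) substr(col, 1L, max_chars) else col
  })
  df
}

# Illustrative report table with one oversized text cell.
report <- data.frame(
  note = c("short", strrep("x", 12000)),
  n = c(1L, 2L),
  stringsAsFactors = FALSE
)

truncated <- truncate_cells(report)
nchar(truncated$note)
```

Columns that are not character vectors are left untouched in this sketch, which is a simplifying assumption.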
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
Value
A list of data frames containing assessment reports.
See Also
Examples
library(dplyr)
###### Example 1: use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example - errors with data`
data_dict <- madshapR_examples$`data_dictionary_example - errors with data`
eval_dataset <- dataset_evaluate(dataset, data_dict)
glimpse(eval_dataset)
###### Example 2: Any data frame can be a dataset by definition
eval_iris <- dataset_evaluate(iris)
glimpse(eval_iris)
Generate an evaluation of all variable values in a dataset
Description
Analyses the content of a dataset and its data dictionary (if any), identifies the data type and values of each variable accordingly, and preprocesses the variables. The elements of the generated data frame are evaluations of valid, non-valid, and empty values (based on the data dictionary information, if provided). This function can be used to personalize report parameters and is used internally in the function dataset_summarize().
Generates a data frame that evaluates and aggregates all columns in a dataset with its data dictionary (if any). The data dictionary (if present) separates observations between open values, empty values, categorical values, and categorical non-valid values (which correspond to the 'missing' column in the 'Categories' sheet). This internal function is mainly used inside summary functions.
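The four-way separation of observations described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper classify_values, its arguments, the status labels, and the example codes are all hypothetical, and the actual output of dataset_preprocess() is structured differently.

```r
# Sketch (hypothetical helper): label each observation of one variable.
# valid_codes   : category codes declared in 'Categories' (assumed input)
# missing_codes : category codes flagged in the 'missing' column (assumed input)
classify_values <- function(x, valid_codes, missing_codes) {
  ifelse(
    is.na(x), "empty",
    ifelse(
      x %in% missing_codes, "categorical non-valid",
      ifelse(x %in% valid_codes, "categorical", "open")
    )
  )
}

# Illustrative values: two declared categories, one missing code,
# one free (open) value, and one empty cell.
values <- c(1, 2, -7, 35, NA)
status <- classify_values(values, valid_codes = c(1, 2), missing_codes = -7)
table(status)
```

Empty values are tested first so that NA never reaches the category comparisons.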
Usage
dataset_preprocess(
  dataset,
  data_dict = data_dict_extract(dataset),
  group_by = group_vars(dataset)
)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
group_by | A character string identifying the column in the dataset to use as a grouping variable. Elements will be grouped by this column. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame providing summary elements of a dataset, including its values and data dictionary elements.
See Also
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
head(dataset_preprocess(dataset))
}
Generate an assessment report and summary of a dataset
Description
Assesses and summarizes the content and structure of a dataset and generates reports of the results. This function can be used to evaluate data structure, presence of specific fields, coherence across elements, and data dictionary formats, and to summarize additional information about variable distributions and descriptive statistics.
Usage
dataset_summarize(
  dataset,
  data_dict = data_dict_extract(dataset),
  group_by = group_vars(dataset),
  taxonomy = NULL,
  valueType_guess = TRUE,
  dataset_name = NULL
)
Arguments
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
group_by | A character string identifying the column in the dataset to use as a grouping variable. Elements will be grouped by this column. |
taxonomy | An optional data frame identifying a variable classification schema. |
valueType_guess | Whether the output should include a more accurate valueType that could be applied to the dataset. TRUE by default. |
dataset_name | A character string specifying the name of the dataset (used internally in the function |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name. The function truncates each cell to a maximum of 10000 characters, to be readable and compatible with Excel.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
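To make the idea of coercing a variable by its valueType concrete, here is a small base R sketch. The helper coerce_by_valueType is hypothetical, and the mapping from valueType to R type shown here is an assumption made for illustration; the authoritative correspondence is the one given by valueType_list.

```r
# Sketch (hypothetical helper): coerce a vector according to a valueType name.
# The valueType-to-R mapping below is assumed, not taken from valueType_list.
coerce_by_valueType <- function(x, valueType) {
  switch(
    valueType,
    text    = as.character(x),
    integer = as.integer(x),
    decimal = as.numeric(x),
    boolean = as.logical(x),
    date    = as.Date(x),
    stop("valueType not covered by this sketch: ", valueType)
  )
}

ages   <- coerce_by_valueType(c("1", "2"), "integer")
visits <- coerce_by_valueType("2020-01-31", "date")
class(ages)
class(visits)
```

The 'datetime' valueType is deliberately left out, since its handling depends on time zone and format choices that this section does not specify.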
Value
A list of data frames containing assessment reports and summaries.
See Also
Examples
library(dplyr)
###### Example 1: use madshapR_examples provided by the package
dataset <- as_dataset(madshapR_examples$`dataset_example`, col_id = 'part_id')
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
summary_dataset <- dataset_summarize(dataset, data_dict, group_by = "gndr")
glimpse(summary_dataset)
###### Example 2: Any data frame can be a dataset by definition
summary_iris <- dataset_summarize(iris, group_by = "Species")
glimpse(summary_iris)
Generate a web-based visual report for a dataset
Description
Generates a visual report of a dataset in an HTML bookdown document, with summary figures and statistics for each variable. The report outputs can be grouped by a categorical variable.
Usage
dataset_visualize(
  dataset = tibble(id = as.character()),
  bookdown_path,
  data_dict = data_dict_extract(dataset),
  group_by = attributes(dataset_summary)[["madshapR_group::group_by"]],
  valueType_guess = FALSE,
  taxonomy = NULL,
  dataset_summary = NULL,
  dataset_name = NULL
)
Arguments
dataset | A dataset object. |
bookdown_path | A character string identifying the folder path where the bookdown report files will be saved. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
group_by | A character string identifying the column in the dataset to use as a grouping variable. Elements will be grouped by this column. |
valueType_guess | Whether the output should include a more accurate valueType that could be applied to the dataset. FALSE by default. |
taxonomy | An optional data frame identifying a variable classification schema. |
dataset_summary | A list which identifies an existing summary produced by |
dataset_name | A character string specifying the name of the dataset (used internally in the function |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
Value
A folder containing files for the bookdown site. To open the bookdown site in a browser, open 'docs/index.html', or use bookdown_open() with the folder path.
See Also
Examples
library(fs)
library(dplyr)
# use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  group_by(gndr) %>%
  as_dataset(col_id = "part_id")
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- data_dict_apply(dataset, data_dict)
dataset_summary <- dataset_summarize(dataset, data_dict)
if (dir_exists(tempdir())) dir_delete(tempdir())
bookdown_path <- tempdir()
dataset_visualize(
  dataset,
  data_dict,
  dataset_summary = dataset_summary,
  bookdown_path = bookdown_path)
# To open the file in browser, open 'bookdown_path/docs/index.html'.
# Or use bookdown_open(bookdown_path) function.
Remove labels (attributes) from a data frame, leaving its unlabelled columns
Description
Removes any attributes attached to a data frame. Any value in columns will be preserved. Any 'Date' (typeof) column will be recast as character to preserve information.
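The behavior described above can be approximated in base R as shown below. This is a minimal sketch and not the package's implementation: the helper strip_labels and the example attribute are hypothetical, and factor columns are simply left untouched here, which may differ from what dataset_zap_data_dict() does.

```r
# Sketch (hypothetical helper): drop column attributes while keeping the values.
strip_labels <- function(df) {
  out <- lapply(df, function(col) {
    # Dates are recast as character so the printed value is preserved.
    if (inherits(col, "Date")) return(as.character(col))
    # Assumption of this sketch: factors are returned unchanged.
    if (is.factor(col)) return(col)
    attributes(col) <- NULL
    col
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Illustrative labelled data frame.
labelled <- data.frame(
  id = 1:2,
  visit = as.Date(c("2020-01-01", "2020-02-01"))
)
attr(labelled$id, "label") <- "Participant identifier"

plain <- strip_labels(labelled)
attributes(plain$id)
class(plain$visit)
```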
Usage
dataset_zap_data_dict(dataset, zap_factor = FALSE)
Arguments
dataset | A dataset object. |
zap_factor | Whether factor columns should be coerced to their corresponding valueType. FALSE by default. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame identifying a dataset.
See Also
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- data_dict_apply(dataset, data_dict)
unlabelled_dataset <- dataset_zap_data_dict(dataset)
head(unlabelled_dataset)
}
Generate a dossier from a list of one or more datasets
Description
Generates a dossier object (list of one or more datasets).
Usage
dossier_create(dataset_list, data_dict_apply = FALSE)
Arguments
dataset_list | A list of data frames, each of which is a dataset object. |
data_dict_apply | Whether data dictionary(ies) should be applied to associated dataset(s), creating labelled dataset(s) with variable attributes. Any previous attributes will be preserved. FALSE by default. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A list of data frame(s), containing input dataset(s).
Examples
{
# use madshapR_examples provided by the package
library(dplyr)
###### Example 1: datasets can be gathered into a dossier which is a list.
dataset_example1 <- madshapR_examples$`dataset_example`
dataset_example2 <- madshapR_examples$`dataset_example - errors with data`
dossier <- dossier_create(list(dataset_example1, dataset_example2))
glimpse(dossier)
###### Example 2: Any data frame can be gathered into a dossier
dossier <- dossier_create(list(iris, mtcars))
glimpse(dossier)
}
Generate an assessment report of a dossier
Description
Assesses the content and structure of a dossier object (list of datasets) and generates reports of the results. This function can be used to evaluate data structure, presence of specific fields, coherence across elements, and data dictionary formats.
Usage
dossier_evaluate(dossier, taxonomy = NULL, is_data_dict_mlstr = TRUE)
Arguments
dossier | A list of data frames, each of which is a dataset. |
taxonomy | An optional data frame identifying a variable classification schema. |
is_data_dict_mlstr | Whether the input data dictionary should be coerced with specific format restrictions for compatibility with other Maelstrom Research software. TRUE by default. |
Details
A dossier is a named list containing one or more data frames, each of which is a dataset. The name of each data frame will be used as the reference name of the dataset.
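The definition above can be expressed as a short base R check. The helper looks_like_dossier is hypothetical and only mirrors the wording of this paragraph (a named list of data frames); the package's own is_dossier() may apply different or additional rules.

```r
# Sketch (hypothetical helper): a named list holding at least one data frame,
# where every element is a data frame.
looks_like_dossier <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) >= 1 &&
    !is.null(names(x)) && all(nzchar(names(x))) &&
    all(vapply(x, is.data.frame, logical(1)))
}

# The list names serve as the reference names of the datasets.
dossier <- list(dataset_1 = iris, dataset_2 = mtcars)

looks_like_dossier(dossier)
looks_like_dossier(iris)
names(dossier)
```

A single data frame is rejected by this sketch because it is a dataset, not a list of datasets.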
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
The object may be specifically formatted to be compatible with additional Maelstrom Research software, in particular Opal environments.
Value
A list of data frames containing assessment reports.
Examples
library(dplyr)
# use madshapR_examples provided by the package
dataset1 <- as_dataset(madshapR_examples$`dataset_example`)
dataset2 <- as_dataset(madshapR_examples$`dataset_example - error`, col_id = "part_id")
dossier <- dossier_create(list(dataset1, dataset2))
eval_dossier <- dossier_evaluate(dossier, is_data_dict_mlstr = TRUE)
glimpse(eval_dossier)
Generate an assessment report and summary of a dossier
Description
Assesses and summarizes the content and structure of a dossier (list of datasets) and generates reports of the results. This function can be used to evaluate data structure, presence of specific fields, coherence across elements, and data dictionary formats, and to summarize additional information about variable distributions and descriptive statistics.
Usage
dossier_summarize(
  dossier,
  group_by = NULL,
  taxonomy = NULL,
  valueType_guess = TRUE
)
Arguments
dossier | A list of data frame(s), each of which is a dataset. |
group_by | A character string identifying the column in the dataset to use as a grouping variable. Elements will be grouped by this column. |
taxonomy | An optional data frame identifying a variable classification schema. |
valueType_guess | Whether the output should include a more accurate valueType that could be applied to the dataset. TRUE by default. |
Details
A dossier is a named list containing one or more data frames, each of which is a dataset. The name of each data frame will be used as the reference name of the dataset.
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A list of data frames containing overall assessment reports and summaries grouped by dataset.
Examples
# use madshapR_examples provided by the package
library(dplyr)
###### Example : a dataset list is a dossier by definition.
dataset1 <- as_dataset(madshapR_examples$`dataset_example` %>% group_by(pick('gndr')))
dataset2 <- as_dataset(madshapR_examples$`dataset_example - error`, col_id = "part_id")
dossier <- dossier_create(list(dataset1, dataset2))
summary_dossier <- dossier_summarize(dossier)
glimpse(summary_dossier)
Validate and coerce any object as a non-categorical variable.
Description
Converts a vector object to a non-categorical object, typically a column in a data frame. The categories come from valid values present in the object and are removed from an associated data dictionary (when present).
Usage
drop_category(x)
Arguments
x | Object to be coerced. |
Value
An R object.
Examples
{
library(dplyr)
###### Example 1: use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  mutate(prg_ever_cat = as_category(prg_ever)) %>%
  mutate(prg_ever_no_cat = drop_category(prg_ever))
head(dataset[c("prg_ever_cat","prg_ever_no_cat")])
###### Example 2: any data frame can be a dataset
iris_no_cat <-
  tibble(iris) %>%
  mutate(Species = drop_category(Species))
head(iris_no_cat)
}
Get the first label from a data dictionary
Description
This function retrieves the first variable and category labels from a data dictionary. It checks if the labels are present, and if not, returns empty strings. The function first validates the input using as_data_dict_shape. It then attempts to extract the first variable label from the 'Variables' section of the data dictionary. If categories are present, it will also extract the first relevant category label. The function also determines the class of the data dictionary based on its attributes and structure.
Usage
first_label_get(data_dict)
Arguments
data_dict | A list of data frame(s) representing metadata. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
Value
A named character vector with the following elements:
- Variables: The first variable label found, or an empty string if none are found.
- Categories: The first category label found, or NULL if no categories are present.
- madshapR::class: A string indicating the class of the data dictionary.
Examples
{
# use madshapR_examples provided by the package
data_dict <- madshapR_examples$`data_dictionary_example`
first_label_get(data_dict)
}
Test if an object has categorical variables.
Description
Tests if the object has categorical variables, typically a data frame or categorical entries in the data dictionary. This function mainly helps validate input within other functions of the package but could be used to check if a dataset or a data dictionary has categorical variables.
Usage
has_categories(...)
Arguments
... | Object that can be either a dataset or a data dictionary. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A logical.
Examples
{
library(dplyr)
###### Example 1: use madshapR_examples provided by the package
dataset_with_cat <-
  madshapR_examples$`dataset_example` %>%
  mutate(prg_ever_cat = as_category(prg_ever))
has_categories(madshapR_examples$`dataset_example`)
has_categories(dataset_with_cat)
has_categories(madshapR_examples$`data_dictionary_example`)
###### Example 2: any data frame can be a dataset
has_categories(iris)
has_categories(mtcars)
}
Test and validate if an object is a categorical variable.
Description
Tests if the input object is a categorical variable. This function mainly helps validate input within other functions of the package but could be used to check if a column is categorical.
Usage
is_category(x, threshold = NULL)
Arguments
x | Object to be tested. |
threshold | Optional. The function returns TRUE if the number of unique values in the input vector is lower than this threshold. |
Value
A logical.
Examples
{
library(dplyr)
###### Example 1: use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  mutate(prg_ever_cat = as_category(prg_ever)) %>%
  mutate(prg_ever_no_cat = drop_category(prg_ever))
is_category(dataset[['prg_ever_cat']])
is_category(dataset[['prg_ever_no_cat']])
###### Example 2: any data frame can be a dataset
iris %>% reframe(across(everything(), is_category))
}
Test if an object is a valid data dictionary
Description
Tests if the input object is a valid data dictionary. This function mainly helps validate input within other functions of the package but could be used to check if an object is valid for use in a function.
Usage
is_data_dict(object)
Arguments
object | A potential data dictionary to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
Value
A logical.
See Also
For a better assessment, please use data_dict_evaluate().
Examples
{
# use madshapR_examples provided by the package
is_data_dict(madshapR_examples$`data_dictionary_example - errors`)
is_data_dict(madshapR_examples$`data_dictionary_example - errors with data`)
is_data_dict(madshapR_examples$`data_dictionary_example`)
is_data_dict(iris)
}
Test if an object is a valid Maelstrom data dictionary
Description
Tests if the input object is a valid data dictionary compliant with formats used in the Maelstrom Research ecosystem, including Opal. This function mainly helps validate input within other functions of the package but could be used to check if an object is valid for use in a function.
Usage
is_data_dict_mlstr(object)
Arguments
object | A potential Maelstrom formatted data dictionary to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
Value
A logical.
See Also
For a better assessment, please use data_dict_evaluate().
Examples
{
# use madshapR_examples provided by the package
is_data_dict_mlstr(madshapR_examples$`data_dictionary_example - errors`)
is_data_dict_mlstr(madshapR_examples$`data_dictionary_example - errors with data`)
is_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
is_data_dict_mlstr(iris)
}
Test if an object is a workable data dictionary structure
Description
Tests if the input object has adequate structure to work with functions involving data dictionary shaping. This function mainly helps validate input within other functions of the package but could be used to check if an object is valid for use in a function.
Usage
is_data_dict_shape(object)
Arguments
object | A potential data dictionary structure to be evaluated. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
Value
A logical.
See Also
For a better assessment, please use data_dict_evaluate().
Examples
{
# use madshapR_examples provided by the package
is_data_dict_shape(madshapR_examples$`data_dictionary_example - errors`)
is_data_dict_shape(madshapR_examples$`data_dictionary_example - errors with data`)
is_data_dict_shape(madshapR_examples$`data_dictionary_example`)
is_data_dict_shape(iris)
}
Test if an object is a valid dataset
Description
Tests if the input object is a valid dataset. This function mainly helps validate input within other functions of the package but could be used to check if a dataset is valid.
Usage
is_dataset(object)
Arguments
object | A potential dataset to be evaluated. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A logical.
See Also
For a better assessment, please use dataset_evaluate().
Examples
{
# use madshapR_examples provided by the package
# any data frame can be a dataset by definition.
is_dataset(madshapR_examples$`dataset_example`)
is_dataset(iris)
is_dataset(AirPassengers)
}
Test if an object is a valid dossier (list of dataset(s))
Description
Tests if the input object is a valid dossier. This function mainly helps validate input within other functions of the package but could be used to check if a dossier is valid.
Usage
is_dossier(object)
Arguments
object | A potential dossier to be evaluated. |
Details
A dossier is a named list containing one or more data frames, each of which is a dataset. The name of each tibble will be used as the reference name of the dataset.
Value
A logical.
Examples
{
# use madshapR_examples provided by the package
# Any list of data frames can be a dossier by definition.
library(stringr)
dossier <- madshapR_examples[str_detect(names(madshapR_examples),"^dataset_example")]
is_dossier(dossier)
is_dossier(list(dataset_1 = iris, dataset_2 = mtcars))
is_dossier(iris)
}
Test if an object is a valid taxonomy
Description
Confirms whether the input object is a valid taxonomy. This function mainly helps validate input within other functions of the package but could be used to check if a taxonomy is valid.
Usage
is_taxonomy(object)
Arguments
object | A potential taxonomy to be evaluated. |
Details
A taxonomy is a classification schema that can be defined for variable attributes. A taxonomy is usually extracted from an Opal environment, and a taxonomy object is a data frame that must contain at least the columns taxonomy, vocabulary, and terms. Additional details about Opal taxonomies are available online.
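As a rough illustration of the minimum structure described above, the following base R sketch checks only for the three required columns. `has_taxonomy_columns()` and the toy values are hypothetical and are not taken from madshapR or from any real Opal taxonomy; is_taxonomy() may apply further checks.

```r
# Hypothetical helper (not part of madshapR): checks only that the three
# columns named in the text are present in a data frame.
has_taxonomy_columns <- function(object) {
  required <- c("taxonomy", "vocabulary", "terms")
  is.data.frame(object) && all(required %in% names(object))
}

# Toy taxonomy with made-up values, for illustration only.
toy_taxonomy <- data.frame(
  taxonomy   = "Toy_area",
  vocabulary = "Toy_vocabulary",
  terms      = c("term_a", "term_b")
)

has_taxonomy_columns(toy_taxonomy)
has_taxonomy_columns(iris)
```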
Value
A logical.
Examples
{
# use madshapR_examples provided by the package
is_taxonomy(madshapR_examples$`taxonomy_example`)
is_taxonomy(madshapR_examples$`dataset_example`)
}
Test if a character object is one of the valid valueType values
Description
Confirms whether the input object is a valid valueType. This function mainly helps validate input within other functions of the package but could be used to check if a valueType is valid.
Usage
is_valueType(object)
Arguments
object | A potential valueType name to be evaluated. |
Details
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
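A membership test against the six valueTypes named above gives a feel for what this kind of validation involves. The sketch below is an assumption-laden stand-in: `common_value_types` and `is_common_value_type()` are hypothetical names, and the package's own check relies on valueType_list, which is described as containing more entries.

```r
# Only the six valueTypes listed in the text; valueType_list holds the
# full set used by the package.
common_value_types <- c("text", "integer", "decimal",
                        "boolean", "datetime", "date")

# Hypothetical helper, not the package's is_valueType().
is_common_value_type <- function(x) {
  is.character(x) && length(x) == 1 && !is.na(x) &&
    x %in% common_value_types
}

is_common_value_type("integer")
is_common_value_type("intereg")  # misspelled on purpose
```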
Value
A logical.
See Also
Examples
{
is_valueType('integer')
is_valueType('intereg')
}
Built-in material allowing the user to test the package with example data
Description
Example datasets, data dictionaries, and a taxonomy that provide illustrative examples of objects used by madshapR.
Usage
madshapR_examples
Format
list
A list with 9 elements (data frames and lists) providing example objects for testing the package:
- dataset_example
Example dataset
- data_dictionary_example - as_data_dict
Data dictionary for the example dataset, structured as a data dictionary
- data_dictionary_example - as_data_dict_mlstr
Data dictionary for the example dataset, structured as a data dictionary compliant with Maelstrom
- dataset_example - errors with data
Example dataset with errors relative to the example data dictionary
- data_dictionary_example - errors with data
Data dictionary for the example dataset, with errors relative to the example dataset
- data_dictionary_example - errors
Data dictionary for the example dataset, with errors
- data_dictionary_example - collapsed
Data dictionary for the example dataset, with collapsed categories
- taxonomy_example
Taxonomy for the example dataset
- summary - dataset_example
Summary of the example dataset
...
Examples
{
library(dplyr)
head(madshapR_examples$`dataset_example`)
glimpse(madshapR_examples$`data_dictionary_example`)
}
Call to online documentation
Description
Direct call to the online documentation for the package, which includes a description of the latest version of the package, vignettes, user guides, and a reference list of functions and help pages.
Usage
madshapR_website()
Value
Nothing to be returned. The function opens a web page.
Examples
{
madshapR_website()
}
Provide descriptive statistics for variables in a dataset
Description
Summarizes (in a data frame) the columns in a dataset and its data dictionary (if any). The summary provides information about quality, type, composition, and descriptive statistics of variables. Statistics are generated by valueType.
Usage
summary_variables(
  dataset_preprocess,
  dataset = NULL,
  data_dict = NULL,
  group_by = NULL
)
Arguments
dataset_preprocess | A data frame which provides summary of the variables (used for internal processes and programming). |
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
group_by | A character string identifying the column in the dataset to use as a grouping variable. Elements will be grouped by this column. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
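The minimum data dictionary structure described above can be sketched in base R as a list of two data frames plus a check of the stated constraints. The toy labels and the helper `meets_minimum_structure()` are hypothetical and serve only as illustration; the package's own validation functions may enforce more than this.

```r
# Toy data dictionary; labels are illustrative, not taken from the package.
data_dict <- list(
  Variables = data.frame(
    name  = c("part_id", "gndr"),
    label = c("Participant identifier", "Gender")
  ),
  Categories = data.frame(
    variable = c("gndr", "gndr"),
    name     = c("1", "2"),
    label    = c("Category one", "Category two")
  )
)

# Hypothetical helper (not part of madshapR) checking the stated minimum:
# 'Variables' has a unique, non-missing 'name' column; 'Categories', if
# present, has unique combinations of 'variable' and 'name'.
meets_minimum_structure <- function(dd) {
  vars <- dd[["Variables"]]
  cats <- dd[["Categories"]]
  vars_ok <- is.data.frame(vars) &&
    "name" %in% names(vars) &&
    !anyNA(vars$name) &&
    !anyDuplicated(vars$name)
  cats_ok <- is.null(cats) || (
    is.data.frame(cats) &&
      all(c("variable", "name") %in% names(cats)) &&
      !anyDuplicated(cats[c("variable", "name")])
  )
  vars_ok && cats_ok
}

meets_minimum_structure(data_dict)
```

A dictionary with only the 'Variables' element also passes this check, since 'Categories' is optional.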
Value
A data frame providing statistical description of variables present in a dataset.
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
dataset_preprocess <- dataset_preprocess(dataset)
summary_prep <- summary_variables(dataset_preprocess = dataset_preprocess)
head(summary_prep)
}
Provide descriptive statistics for variables of type 'categorical' in a dataset
Description
Summarizes (in a data frame) the columns of type 'categorical' in a dataset and its data dictionary (if any). The summary provides information about quality, type, composition, and descriptive statistics of variables. Statistics are generated by valueType.
Usage
summary_variables_categorical(
  dataset_preprocess,
  dataset = NULL,
  data_dict = NULL
)
Arguments
dataset_preprocess | A data frame which provides summary of the variables (for internal processes and programming). |
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A data frame providing statistical description of 'categorical' variables present in a dataset.
Examples
{
library(dplyr)
library(fabR)
# use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  mutate(prg_ever = as_category(prg_ever)) %>%
  select(prg_ever)
dataset_preprocess <- dataset_preprocess(dataset)
summary_prep <- summary_variables_categorical(
  dataset_preprocess = dataset_preprocess[[1]])
head(summary_prep)
}
Provide descriptive statistics for variables of type 'date' in a dataset
Description
Summarizes (in a data frame) the columns of type 'date' in a dataset and its data dictionary (if any). The summary provides information about quality, type, composition, and descriptive statistics of variables. Statistics are generated by valueType.
Usage
summary_variables_date(dataset_preprocess, dataset = NULL, data_dict = NULL)
Arguments
dataset_preprocess | A data frame which provides summary of the variables (for internal processes and programming). |
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame providing statistical description of 'date' variables present in a dataset.
Examples
{
library(dplyr)
library(fabR)
# use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  mutate(dob = as_any_date(dob)) %>%
  select(dob)
dataset_preprocess <- dataset_preprocess(dataset)
summary_prep <- summary_variables_date(
  dataset_preprocess = dataset_preprocess[[1]])
head(summary_prep)
}
Provide descriptive statistics for variables of type 'datetime' in a dataset
Description
Summarizes (in a data frame) the columns of type 'datetime' in a dataset and its data dictionary (if any). The summary provides information about quality, type, composition, and descriptive statistics of variables. Statistics are generated by valueType.
Usage
summary_variables_datetime(
  dataset_preprocess,
  dataset = NULL,
  data_dict = NULL
)
Arguments
dataset_preprocess | A data frame which provides summary of the variables (for internal processes and programming). |
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame providing statistical description of 'datetime' variables present in a dataset.
Examples
{
library(dplyr)
library(lubridate)
library(fabR)
# use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  mutate(dob = as_datetime(as_any_date(dob))) %>%
  select(dob)
dataset_preprocess <- dataset_preprocess(dataset)
summary_prep <- summary_variables_datetime(
  dataset_preprocess = dataset_preprocess[[1]])
head(summary_prep)
}
Provide descriptive statistics for variables of type 'numeric' in a dataset
Description
Summarizes (in a data frame) the columns of type 'numeric' in a dataset and its data dictionary (if any). The summary provides information about quality, type, composition, and descriptive statistics of variables. Statistics are generated by valueType.
Usage
summary_variables_numeric(dataset_preprocess, dataset = NULL, data_dict = NULL)
Arguments
dataset_preprocess | A data frame which provides summary of the variables (for internal processes and programming). |
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame providing statistical description of 'numeric' variables present in a dataset.
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
dataset_preprocess <- dataset_preprocess(dataset)
summary_prep <- summary_variables_numeric(
  dataset_preprocess = dataset_preprocess[[1]])
head(summary_prep)
}
Provide descriptive statistics for variables of type 'text' in a dataset
Description
Summarizes (in a data frame) the columns of type 'text' in a dataset and its data dictionary (if any). The summary provides information about quality, type, composition, and descriptive statistics of variables. Statistics are generated by valueType.
Usage
summary_variables_text(dataset_preprocess, dataset = NULL, data_dict = NULL)
Arguments
dataset_preprocess | A data frame which provides summary of the variables (for internal processes and programming). |
dataset | A dataset object. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
Value
A data frame providing statistical description of 'text' variables present in a dataset.
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
dataset_preprocess <- dataset_preprocess(dataset)
summary_prep <- summary_variables_text(
  dataset_preprocess = dataset_preprocess[[1]])
head(summary_prep)
}
Convert typeof (and class if any) into its corresponding valueType
Description
The function converts a given typeof string into its corresponding valueType representation. This function is particularly useful for mapping different data types to their equivalent value types in contexts such as data modeling and data dictionary creation. An optional class parameter allows for more specific conversions when necessary.
Usage
typeof_convert_to_valueType(typeof, class = NA_character_)
Arguments
typeof | A string representing the type to be converted. Supported values include "character", "integer", "double", "logical". |
class | An optional parameter that specifies a class context. If provided, the function may return a more refined value type based on the class type; if not, the function will return a general equivalent. Supported values include "character", "integer", "numeric", "logical", "Date" and "POSIXct". NA_character_ is the default. |
Details
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
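To make the idea of a typeof and class based mapping concrete, here is a base R sketch that works on an object rather than on strings. The mapping in `sketch_value_type()` is an assumption inferred from the supported values and valueTypes listed above; the helper is hypothetical, and the authoritative correspondence is the one stored in valueType_list.

```r
# Hypothetical re-implementation for illustration only; the class is
# consulted first because Date and POSIXct objects are stored as doubles.
sketch_value_type <- function(x) {
  if (inherits(x, "Date"))    return("date")
  if (inherits(x, "POSIXct")) return("datetime")
  switch(typeof(x),
    logical   = "boolean",
    integer   = "integer",
    double    = "decimal",
    character = "text",
    "text"  # assumed fallback for any other storage type
  )
}

sketch_value_type(Sys.Date())
sketch_value_type(1.5)
sketch_value_type(1:3)
```

Checking the class before the storage type is the design point here: without it, a Date would be reported as 'decimal'.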
Value
A character vector, named 'valueType'.
Examples
{
typeof_convert_to_valueType(typeof = "character")
typeof_convert_to_valueType(typeof = "double")
typeof_convert_to_valueType(typeof = "double", class = "Date")
}
Attribute the valueType from a data dictionary to a dataset, or vice versa
Description
It is sometimes useful to take variable valueTypes from a dataset and attribute them to the associated data dictionary, or vice versa. valueType_adjust() takes the valueType of the input (from) and attributes it to the output (to). The parameters 'from' and 'to' can be either a dataset or a data dictionary. Depending on the input provided, the valueType replaced is either in the 'valueType' column of a data dictionary or cast to a column in a dataset. If 'to' is not provided, the function calls valueType_self_adjust() instead. The possible values of the valueTypes returned are 'date', 'datetime', 'boolean', 'integer', 'decimal', and 'text'.
Usage
valueType_adjust(from, to = NULL)
Arguments
from | Object to take attributes from. Can be either a dataset or a data dictionary. |
to | Object to be adjusted. Can be either a dataset or a data dictionary. NULL by default. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
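The dictionary-to-dataset direction can be pictured as casting each column according to the valueType declared for it. The base R sketch below uses toy inputs and an assumed set of casting functions (`casters` is a hypothetical name, and 'date' and 'datetime' are left out for brevity); it does not reproduce how valueType_adjust() is implemented.

```r
# Toy dataset read in as text; column names are illustrative only.
dataset <- data.frame(
  id     = c("a", "b"),
  height = c("1.70", "1.82"),
  visits = c("3", "5"),
  stringsAsFactors = FALSE
)

# Toy 'Variables' table declaring a valueType for each column.
variables <- data.frame(
  name      = c("id", "height", "visits"),
  valueType = c("text", "decimal", "integer"),
  stringsAsFactors = FALSE
)

# Assumed casting functions for four of the valueTypes.
casters <- list(
  text    = as.character,
  decimal = as.numeric,
  integer = as.integer,
  boolean = as.logical
)

# Cast each dataset column to the type declared in the dictionary.
for (i in seq_len(nrow(variables))) {
  col <- variables$name[i]
  dataset[[col]] <- casters[[variables$valueType[i]]](dataset[[col]])
}

str(dataset)
```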
Value
Either a data frame, identifying the dataset, or a list of data frame(s) identifying a data dictionary, depending on which is 'to'.
See Also
Examples
{
library(dplyr)
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
data_dict <- as_data_dict_mlstr(madshapR_examples$`data_dictionary_example`)
dataset <- valueType_adjust(from = data_dict,to = dataset)
head(dataset)
}
Convert valueType into its corresponding typeof and class in R representation
Description
The function converts a given valueType string into its corresponding typeof representation. This function is particularly useful for mapping different data types to their equivalent value types in contexts such as data modeling and data dictionary creation. The class is provided and allows for more specific conversions when necessary.
Usage
valueType_convert_to_typeof(valueType)
Arguments
valueType | A character string of the valueType to convert. |
Details
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A character vector, named 'typeof' and 'class'.
Examples
{
valueType_convert_to_typeof(valueType = NA)
valueType_convert_to_typeof(valueType = "text")
valueType_convert_to_typeof(valueType = "date")
valueType_convert_to_typeof(valueType = "decimal")
}
Guess the first possible valueType of an object (Can be a vector)
Description
Provides the first possible valueType of a variable. The function tries to assign the valueType of the object first to 'boolean', then 'integer', then 'decimal', then 'date'. If all others fail, the default valueType is 'text'.
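The ordered trial described above can be sketched in base R as a cascade of tests that returns at the first success. Everything in `sketch_guess()` is an assumption made for illustration: the helper is hypothetical, it only accepts TRUE/FALSE strings as booleans and ISO formatted dates (YYYY-MM-DD), and the rules actually applied by valueType_guess() may be broader.

```r
# Hypothetical sketch of an ordered guess; not the package's algorithm.
sketch_guess <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x) & nzchar(x)]        # ignore missing and empty values
  if (length(x) == 0) return("text")

  # 1. boolean: every value is TRUE or FALSE (case-insensitive)
  if (all(toupper(x) %in% c("TRUE", "FALSE"))) return("boolean")

  # 2. integer, then 3. decimal: every value parses as a number
  num <- suppressWarnings(as.numeric(x))
  if (!anyNA(num)) {
    if (all(num == round(num))) return("integer")
    return("decimal")
  }

  # 4. date: every value parses with the assumed ISO format
  dates <- as.Date(x, format = "%Y-%m-%d")
  if (!anyNA(dates)) return("date")

  # 5. fallback
  "text"
}

sketch_guess(c("1", "2", "3"))
sketch_guess(c("1.5", "2"))
sketch_guess(c("2020-01-31", "1999-12-01"))
```

The order matters: a column of whole numbers would also pass the decimal test, so the more specific type is tried first.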
Usage
valueType_guess(x)
Arguments
x | Object. Can be a vector. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A character string which is the first possible valueType of the input object.
See Also
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
valueType_of(dataset$dob)
valueType_guess(dataset$dob)
# any data frame can be a dataset by definition
valueType_guess(mtcars$cyl)
valueType_guess(mtcars$cyl)
}
Built-in data frame of allowed valueType values
Description
Provides a built-in data frame showing the list of allowed Opal valueType values and their corresponding R data types. This data frame is mainly used for internal processes and programming.
Usage
valueType_list
Format
data.frame
A data frame with 12 rows and 7 columns:
- valueType
data type as described in Opal
- typeof
data type provided by base::typeof(x)
- class
data class provided by attributes(x)
- class
data class provided by base::class(x) explicit class
- call
function used to convert the object, called through base::do.call
- toValueType
ensemble data type as described in Opal
- genericType
ensemble data type to which the valueType belongs
...
Details
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
See Also
Examples
{
head(valueType_list)
}
Return the valueType of an object
Description
Determines the valueType of an object based on typeof() and class(). The possible values returned are 'date', 'datetime', 'boolean', 'integer', 'decimal', and 'text'.
Usage
valueType_of(x)
Arguments
x | Object. Can be a vector. |
Details
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
A character string which is the valueType of the input object.
See Also
typeof(), class()
Opal documentation
Examples
{
# use madshapR_examples provided by the package
dataset <- madshapR_examples$`dataset_example`
valueType_of(dataset[['part_id']])
# any data frame can be a dataset by definition
valueType_of(iris[['Sepal.Length']])
}
Self-adjust the valueType from a data dictionary or a dataset.
Description
It is sometimes useful to take variable valueTypes from a dataset and attribute them to the associated data dictionary, or vice versa. valueType_self_adjust() takes the guessed valueType of the input and attributes it to itself. The parameter can be either a dataset or a data dictionary. Depending on the input provided, the valueType replaced is either in the 'valueType' column of the data dictionary or cast to a column in the dataset. The possible values of the valueTypes returned are 'date', 'datetime', 'boolean', 'integer', 'decimal', and 'text'.
Usage
valueType_self_adjust(...)
Arguments
... | Object that can be either a dataset or a data dictionary. |
Details
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combination of variable and name.
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require it. If no identifier variable is specified, indexing is handled automatically by the function.
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
Value
Either a data frame, identifying the dataset, or a list of data frame(s) identifying a data dictionary, depending on which the input is.
See Also
Examples
{
# use madshapR_examples provided by the package
###### Example 1: The valueType of a dataset can be adjusted. Each column is
# evaluated as a whole, and the best valueType match found is applied. If
# there is no better match found, the column is left as it is.
dataset <- madshapR_examples$`dataset_example`
dataset <- valueType_self_adjust(dataset["gndr"])
head(dataset)
###### Example 2: Any data frame can be a dataset by definition
dataset <- valueType_self_adjust(mtcars)
head(dataset)
}
Generate a list of charts, figures and summary tables of a variable
Description
Analyses the content of a variable and its data dictionary (if any), identifies its data type and values accordingly and generates figures and summaries (datatable format). The figures and tables are representations of data distribution, statistics and valid/non-valid/empty values (based on the data dictionary information if provided and the data type of the variable). This function can be used to personalize report parameters and is internally used in the function dataset_visualize(). Up to seven objects are generated, which include: one datatable of the key elements of the data dictionary, one datatable summarizing statistics (such as mean, quartile, most common values, most recent date, ..., depending on the data type of the variable), two graphs showing the distribution of the variable, one bar chart for categorical values (if any), one bar chart for non-valid values (if any), and one pie chart for the proportion of valid and non-valid values (if any). The variable can be grouped using the group_by parameter, which is a (categorical) column in the dataset. The user may need to use as_category() in this context. To speed up the process (and allow recycling objects in a workflow), the user can feed the function with a variable_summary, which is the output of the function dataset_summarize() for the column(s) col and group_by. The summary must have the same parameters to operate.
Usage
variable_visualize(
  dataset,
  col,
  data_dict = data_dict_extract(dataset),
  group_by = attributes(variable_summary)[["madshapR_group::group_by"]],
  variable_summary = NULL,
  valueType_guess = TRUE
)
Arguments
dataset | A dataset object. |
col | A character string specifying the name of the column. |
data_dict | A list of data frame(s) representing metadata of the input dataset. Automatically generated if not provided. |
group_by | A character string identifying the column in the dataset to use as a grouping variable. Elements will be grouped by this column. |
variable_summary | A summary list, as returned by dataset_summarize() for the column(s) col and group_by. |
valueType_guess | Whether the output should include a more accurate valueType that could be applied to the dataset. TRUE by default. |
Details
A dataset is a data table containing variables. A dataset object is a data frame and can be associated with a data dictionary. If no data dictionary is provided with a dataset, a minimum workable data dictionary will be generated as needed within relevant functions. Identifier variable(s) for indexing can be specified by the user. The id values must be non-missing and will be used in functions that require them. If no identifier variable is specified, indexing is handled automatically by the function.
A data dictionary contains the list of variables in a dataset and metadata about the variables and can be associated with a dataset. A data dictionary object is a list of data frame(s) named 'Variables' (required) and 'Categories' (if any). To be usable in any function, the data frame 'Variables' must contain at least the name column, with all unique and non-missing entries, and the data frame 'Categories' must contain at least the variable and name columns, with unique combinations of variable and name.
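The structure described above can be sketched in base R. This is only an illustration: the labels and category codes are invented, and check_data_dict() is a hypothetical helper written for this sketch, not a madshapR function.

```r
# Minimal data dictionary: a named list of data frames (base R only).
# All labels and category codes below are invented for illustration.
data_dict <- list(
  Variables = data.frame(
    name  = c("part_id", "gndr"),
    label = c("Participant identifier", "Gender"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("gndr", "gndr"),
    name     = c("1", "2"),
    label    = c("Category one", "Category two"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper (not part of madshapR): checks the minimum
# requirements stated in the text and returns TRUE or FALSE.
check_data_dict <- function(dd) {
  vars <- dd[["Variables"]]
  # 'Variables' is required and must have a 'name' column
  if (!is.data.frame(vars) || !"name" %in% names(vars)) return(FALSE)
  # names must be non-missing and unique
  if (anyNA(vars$name) || anyDuplicated(vars$name) > 0) return(FALSE)
  cats <- dd[["Categories"]]
  # 'Categories' is optional
  if (is.null(cats)) return(TRUE)
  if (!is.data.frame(cats) ||
      !all(c("variable", "name") %in% names(cats))) return(FALSE)
  # each (variable, name) combination must appear only once
  anyDuplicated(cats[c("variable", "name")]) == 0
}

check_data_dict(data_dict)
```

A check of this kind only covers the minimum requirements listed above; the package's own functions may apply further validation.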
The valueType is a declared property of a variable that is required in certain functions to determine handling of the variables. Specifically, valueType refers to the OBiBa data type of a variable. The valueType is specified in a data dictionary in a column 'valueType' and can be associated with variables as attributes. Acceptable valueTypes include 'text', 'integer', 'decimal', 'boolean', 'datetime', 'date'. The full list of OBiBa valueType possibilities and their correspondence with R data types are available using valueType_list. The valueType can be used to coerce the variable to the corresponding data type.
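As a rough illustration of coercion driven by a declared valueType, the base R sketch below maps each valueType named above to a coercion function. The mapping is an assumption made for this example (the authoritative correspondence is the one in valueType_list), and coerce_by_valueType() is a hypothetical helper, not a madshapR function.

```r
# Assumed correspondence between valueType and base R coercion functions.
# The authoritative mapping is the one given in valueType_list.
coerce_as <- list(
  text     = as.character,
  integer  = as.integer,
  decimal  = as.numeric,
  boolean  = as.logical,
  date     = function(x) as.Date(x),
  datetime = function(x) as.POSIXct(x, tz = "UTC")
)

# Hypothetical helper: coerce a vector according to a declared valueType.
coerce_by_valueType <- function(x, valueType) {
  if (!valueType %in% names(coerce_as)) {
    stop("unsupported valueType: ", valueType)
  }
  coerce_as[[valueType]](x)
}

age  <- coerce_by_valueType(c("34", "51", NA), "integer")
born <- coerce_by_valueType(c("1990-02-11", "1973-08-30"), "date")
class(age)
class(born)
```

In the package itself, the declared valueType would come from the 'valueType' column of the data dictionary rather than being passed by hand.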
Value
A list of up to seven elements (charts, figures and datatables) which can be used to summarize and visualize data.
See Also
DT::datatable(), ggplot2::ggplot(), dataset_summarize(), dataset_visualize()
Examples
library(dplyr)
library(fs)

# use madshapR_examples provided by the package
dataset <-
  madshapR_examples$`dataset_example` %>%
  group_by(pick('gndr')) %>%
  as_dataset(col_id = "part_id")

data_dict <- madshapR_examples$`data_dictionary_example`
variable_summary <- dataset_summarize(dataset, data_dict)

plots <- variable_visualize(
  dataset, data_dict, col = 'prg_ever',
  variable_summary = variable_summary, valueType_guess = TRUE)

print(plots$main_values_1)