100% AI-free: we did not use any AI technologies in developing this package.
The goal of mrap is to provide wrapper functions that reduce the user’s effort in writing machine-readable data with the dtreg package. The set of all-in-one wrappers will cover functions from stats and other well-known packages. These are very easy to use; see Example III: an all-in-one wrapper for anova. The package also contains wrappers for analytical schemata used by TIB Knowledge Loom. This vignette discusses in detail how to apply such a wrapper to write the results of your data analysis as JSON-LD in five steps:
1. Select a wrapper for the schema you will use.
2. Check the types of arguments the wrapper requires.
3. Create an instance of the schema-related class.
4. Modify the instance by setting or correcting its fields manually.
5. Write the finalised instance as a machine-readable JSON-LD file.
To select a wrapper for an analytical schema, please check the help page. For instance, for a t-test you will need a group_comparison wrapper.
The wrappers are easy to use once the required arguments are specified correctly, which is crucial for transparent reporting of results. This section explains how to do so.
Argument code_string should be a string (in R, a character vector). The argument cannot be omitted; please indicate “N/A” if this information is not provided. In Example I, we use the following code string:

    'stats::t.test(setosa, virginica, var.equal = FALSE)'
Specifying the name of the package in the code is always good practice. In mrap, we made it a requirement, and you will get an error message if the code_string does not contain package::function. In most cases, it is at the beginning of the string, but we also allow the generic summary method, in which case it is summary(package::function(formula)). For base R, please indicate base::.
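The package::function requirement can be pictured with a small base R check. This is only a sketch: the helper name has_package_prefix and its regular expression are assumptions made for illustration, not mrap's actual validation code.

```r
# Hypothetical helper (not part of mrap): returns TRUE when the string
# names a function as package::function anywhere in it.
has_package_prefix <- function(code_string) {
  grepl("[A-Za-z][A-Za-z0-9.]*::[A-Za-z.][A-Za-z0-9._]*", code_string)
}

# The prefix at the beginning of the string
has_package_prefix("stats::t.test(setosa, virginica, var.equal = FALSE)")   # TRUE
# The prefix inside a call to the generic summary method
has_package_prefix("summary(stats::aov(Petal.Length ~ Species, data = iris))")  # TRUE
# No package prefix at all
has_package_prefix("t.test(setosa, virginica)")                             # FALSE
```

Running a check like this on your own code string before calling a wrapper may help you spot a missing prefix early.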
Your data can be a string (URL), a named list, or a data frame (see Input data below). In the case of a string, you can add the data name manually (see Modify the instance); if your data is a named list, as in Example I, mrap easily extracts the elements’ names. In these cases, the code_string does not play a role, and the data name is not specified in it. However, if your data is a single data frame and you want mrap to extract its name from the code_string, please indicate it as 'data = dataset_name' (e.g., 'data = iris'), although most R packages also accept just dataset_name.
Our wrappers extract the name of a target variable from the code_string if the variable is before the ~ sign in the formula:

    "package::function(Petal.Length ~ Species, data = iris)"
    "package::function(iris$Petal.Length ~ iris$Species, data = iris)"

We also allow for a few target variables in special cases such as MANOVA:
Alternatively, a target variable can be explicitly specified in two or more vectors:
In the following case we cannot extract the name, and you can add the target label manually to the instance:
You will get a warning reminding you to do so.
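To make the rule about the ~ sign concrete, here is a minimal base R sketch that reads the left-hand side of a formula from a code string. The function sketch_target_name is hypothetical and only illustrates the idea; mrap's own extraction may work differently, and this sketch does not handle the summary(package::function(formula)) form.

```r
# Hypothetical illustration (not mrap's implementation): parse the code
# string and return the left-hand side of the first two-sided formula
# found among the call's arguments, or NA if there is none.
sketch_target_name <- function(code_string) {
  parsed <- str2lang(code_string)          # parsing does not evaluate the call
  for (arg in as.list(parsed)[-1]) {       # skip the function name itself
    is_formula <- is.call(arg) &&
      identical(arg[[1]], as.name("~")) &&
      length(arg) == 3                     # two-sided formula: `~`, lhs, rhs
    if (is_formula) {
      return(deparse(arg[[2]]))
    }
  }
  NA_character_
}

sketch_target_name("stats::aov(Petal.Length ~ Species, data = iris)")
# "Petal.Length"
sketch_target_name("stats::t.test(setosa, virginica, var.equal = FALSE)")
# NA: no formula, so the target label has to be set manually
```

The second call mirrors the situation in Example I, where no formula is present and the label is added to the instance afterwards.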
In code_string, a level variable is recognized by our wrappers in “x | level” or “x || level” syntax:

    "lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)"
    "lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)"

A level can be written more than once in a formula; in this case mrap also recognizes it:
More than one level is possible; mrap will capture all level names:
If we cannot extract the name, you will get a warning reminding you to add the level label manually to the instance.
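The “x | level” and “x || level” patterns can be illustrated with a short regular-expression sketch in base R. The helper sketch_level_names is an assumption made for this example and is not mrap's parser; it only handles plain level names, not nested or interaction groupings, and the variables school and class in the last call are made up.

```r
# Hypothetical illustration (not mrap's implementation): collect the names
# that follow "|" or "||" in a code string, without duplicates.
sketch_level_names <- function(code_string) {
  pattern <- "\\|{1,2}\\s*[A-Za-z.][A-Za-z0-9._]*"
  hits <- regmatches(code_string, gregexpr(pattern, code_string, perl = TRUE))[[1]]
  unique(sub("^\\|+\\s*", "", hits, perl = TRUE))
}

sketch_level_names("lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)")
# "Subject"
sketch_level_names("lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)")
# "Subject"
# A level repeated in the formula is reported once
sketch_level_names("lme4::lmer(y ~ x + (1 | Subject) + (0 + x | Subject), data = d)")
# "Subject"
# Two different levels are both captured
sketch_level_names("lme4::lmer(y ~ x + (1 | school) + (1 | class), data = d)")
# "school" "class"
```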
Argument input_data can be a string (URL), a named list, or a data frame. For a named list, you can check the argument as follows:

    species_list <- list("setosa" = setosa, "virginica" = virginica)
    # check it is a list
    is.list(species_list)
    # check that the list is named
    names(species_list)

Please be sure that the argument is one of these three types. You will get an error message if the type is wrong (for instance, a list instead of a named list).
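If you want to test an object against all three accepted types at once, a small base R helper along the following lines may be convenient. The function sketch_input_type and the URL are hypothetical and are not part of mrap; the sketch only mirrors the three types named in this section.

```r
# Hypothetical helper (not part of mrap): report which of the three
# accepted input_data types an object matches.
sketch_input_type <- function(x) {
  if (is.data.frame(x)) {
    # checked first, because a data frame is also a list in R
    "data frame"
  } else if (is.character(x) && length(x) == 1) {
    "string"
  } else if (is.list(x) && !is.null(names(x)) && all(names(x) != "")) {
    "named list"
  } else {
    "unsupported"
  }
}

sketch_input_type(iris)                            # "data frame"
sketch_input_type("https://example.org/data.csv")  # "string" (made-up URL)
sketch_input_type(list(a = 1:3, b = 4:6))          # "named list"
sketch_input_type(list(1:3, 4:6))                  # "unsupported": the list has no names
```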
Argument test_results can be either a data frame or a list of data frames. You can check whether you are specifying the argument correctly. For a data frame:
For a list of data frames:
    # assume you have a few data frames in a list
    iris_new <- iris[, -1]
    my_results <- list(iris, iris_new)
    # check each of them in a loop
    for (element in my_results) {
      print(is.data.frame(element))
    }

Argument named_list_results is only used for the algorithm_evaluation schema.
Now that we know which arguments to use, let us create a group_comparison instance as in Example I:
    inst_gc <- mrap::group_comparison(
      "stats::t.test(setosa, virginica, var.equal = FALSE)",
      list("setosa" = setosa, "virginica" = virginica),
      df_results
    )

Here, the code_string is a string and contains the package name; there is no need for the data name, as the input data argument is specified as a named list; and the test_results argument is a data frame.
For the instance specified above, you will receive a warning message: “Target label is not available, you can set it manually”. Let us add the target name:
This is how you can add or correct any information after creating an instance.
The data_analysis instance should include all analytic instances. For one instance:
For more than one instance, use a list:
Let us assume you conducted a t-test on the Iris data comparing petal length in setosa and virginica species:

    data(iris)
    library(dplyr)
    setosa <- iris |>
      dplyr::filter(Species == "setosa") |>
      dplyr::select(Petal.Length)
    virginica <- iris |>
      dplyr::filter(Species == "virginica") |>
      dplyr::select(Petal.Length)
    tt <- stats::t.test(setosa, virginica, var.equal = FALSE)

The results of the test should be presented as a data frame:

    df_results <- data.frame(
      t.statistic = tt$statistic,
      df = tt$parameter,
      p.value = tt$p.value
    )
    rownames(df_results) <- "value"

Now, let us follow the steps described above to create a group_comparison instance, modify it, include it in a data_analysis instance, and write it as a JSON-LD file:
To report the performance of an algorithm, you write the evaluation results as a named list:
Typically, there is no specific line of code to report as code_string; therefore “N/A” is allowed, as explained in the Code string section above. The data is reported as a URL string:
You need to add the name of the algorithm and the task manually:
This can be further included in the data_analysis instance and written as a JSON-LD file, as explained above.
Currently, mrap contains an all-in-one wrapper for the stats::aov function, and more such wrappers will be added in the future. Let us assume you are currently using stats::aov for conducting your ANOVA tests:
The all-in-one wrapper is as easy to use as the original function:
The wrapper returns a list, the first element of which is the resulting object from the original function:
The second element is a group_comparison instance:
The instance includes all required information. Of course, it is still possible to modify it, e.g., to add a label:
This can be further included in the data_analysis instance and written as a JSON-LD file, as explained above.