Introduction to mrap

100% AI-free: we did not use any AI technologies in developing this package.

library(mrap)

The goal of mrap is to provide wrapper functions that reduce the user’s effort in writing machine-readable data with the dtreg package. The set of all-in-one wrappers will cover functions from stats and other well-known packages. These are very easy to use; see Example III: an all-in-one wrapper for anova. The package also contains wrappers for analytical schemata used by TIB Knowledge Loom. This vignette discusses in detail how to apply such a wrapper to write the results of your data analysis as JSON-LD in six steps:

1. Select a wrapper

To select a wrapper for an analytical schema, please check the help page. For instance, for a t-test you will need a group_comparison wrapper.

2. Check arguments

The wrappers are easy to use once the required arguments are specified correctly, which is crucial for transparent reporting of results. This section explains how to do it.

2.1. Code string

Argument code_string should be a string (in R, a character vector). The argument cannot be omitted; please indicate “N/A” if this information is not provided. In Example I, we use the following code string: 'stats::t.test(setosa, virginica, var.equal = FALSE)'

Package name

Specifying the package name in the code is always good practice. In mrap, we made it a requirement, and you will get an error message if the code_string does not contain package::function. In most cases, it is at the beginning of the string, but we also allow the generic summary method, in which case the form is summary(package::function(formula)). For base R, please indicate base::.
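The requirement above can be pictured with a small check. The following is only a sketch in base R, not mrap's actual validation: the helper name has_namespaced_call and its regular expression are assumptions made for illustration.

```r
# Hypothetical helper, not part of mrap: TRUE when the string contains
# a namespaced call such as stats::t.test( or lme4::lmer(
has_namespaced_call <- function(code_string) {
  stopifnot(is.character(code_string), length(code_string) == 1)
  grepl("[A-Za-z][A-Za-z0-9.]*::[A-Za-z.][A-Za-z0-9._]*\\(", code_string)
}

# package::function at the beginning of the string
has_namespaced_call("stats::t.test(setosa, virginica, var.equal = FALSE)")
# package::function wrapped in the generic summary method
has_namespaced_call("summary(stats::aov(Petal.Length ~ Species, data = iris))")
# no package name, so the check fails
has_namespaced_call("t.test(setosa, virginica)")
```

Running a check like this on your own code string before calling a wrapper may help you spot a missing package name early.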

Data name

Your data can be a string (URL), a named list, or a data frame (see Input data below). In the case of a string, you can add the data name manually (see Modify the instance); if your data is a named list, as in Example I, mrap extracts the elements’ names. In these cases, the code_string does not play a role, and the data name is not specified in it. However, if your data is a single data frame and you want mrap to extract its name from the code_string, please indicate it as 'data = dataset_name' (e.g., 'data = iris'), although most R packages also accept a bare dataset_name.

Target variable(s)

Our wrappers extract the name of a target variable from the code_string if the variable is before the ~ sign in the formula:

"package::function(Petal.Length ~ Species), data = iris"
"package::function(iris$Petal.Length ~ iris$Species), data = iris"

We also allow for a few target variables in special cases such as MANOVA:

"package::function(cbind(Petal.Length, Petal.Width) ~ Species), data = iris"

Alternatively, a target variable can be explicitly specified in two or more vectors:

"package::function(setosa$Petal.Length, virginica$Petal.Length)"

In the following case we cannot extract the name, and you can add the target label manually to the instance:

"package::function(one_vector, another_vector)"

You will get a warning reminding you to do so.
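To make the formula patterns above concrete, here is a rough base R sketch of how a target name could be read from the left-hand side of ~. It is not mrap's implementation: extract_targets is a hypothetical helper, and it only covers the formula strings shown in this section (it does not handle the summary(...) form or the two-vector form).

```r
# Hypothetical helper, not mrap's implementation: return the names found
# on the left-hand side of the first "~" in a code string
extract_targets <- function(code_string) {
  if (!grepl("~", code_string, fixed = TRUE)) {
    return(character(0))  # no formula, nothing to extract
  }
  # keep everything before the first "~"
  lhs <- sub("~.*$", "", code_string)
  # drop the leading "package::function(" part
  lhs <- sub("^[^(]*\\(", "", lhs)
  # unwrap cbind(...) if present
  lhs <- sub("^[[:space:]]*cbind\\((.*)\\)[[:space:]]*$", "\\1", lhs)
  # split on commas, then drop "dataset$" prefixes and whitespace
  parts <- strsplit(lhs, ",", fixed = TRUE)[[1]]
  trimws(sub("^.*\\$", "", parts))
}

extract_targets("package::function(Petal.Length ~ Species), data = iris")
extract_targets("package::function(iris$Petal.Length ~ iris$Species), data = iris")
extract_targets("package::function(cbind(Petal.Length, Petal.Width) ~ Species), data = iris")
extract_targets("package::function(one_vector, another_vector)")
```

The last call returns an empty character vector, which corresponds to the case where the target label has to be set by hand.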

Level variable(s)

In code_string, a level variable is recognized by our wrappers in the “x | level” or “x || level” syntax:

"lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)"
"lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)"

A level can be written more than once in a formula; in this case mrap also recognizes it:

"lme4::lmer(math ~ homework + (homework | schid) + (class_size | schid))"

More than one level is possible; mrap will capture all level names:

"lme4::lmer(math ~ homework + (1 | schid) + (1 | classid))"

If we cannot extract the name, you will get a warning reminding you to add the level label manually to the instance.
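The “x | level” and “x || level” patterns can be illustrated with a short base R sketch. This is an assumption about how such names might be picked out of a string, not mrap's own parser; extract_levels is a hypothetical helper.

```r
# Hypothetical helper, not mrap's implementation: collect the names that
# follow "|" or "||" inside a random-effects term such as (Days | Subject)
extract_levels <- function(code_string) {
  pattern <- "\\|{1,2}[[:space:]]*[A-Za-z.][A-Za-z0-9._]*[[:space:]]*\\)"
  hits <- regmatches(code_string, gregexpr(pattern, code_string))[[1]]
  # strip bars, whitespace and the closing parenthesis; drop repeats
  unique(gsub("[|)[:space:]]", "", hits))
}

extract_levels("lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)")
extract_levels("lme4::lmer(math ~ homework + (homework | schid) + (class_size | schid))")
extract_levels("lme4::lmer(math ~ homework + (1 | schid) + (1 | classid))")
```

A level that appears twice is reported once, and two different levels are both returned, mirroring the behaviour described above.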

2.2. Input data

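Argument input_data can be one of three types, each of which you can check as follows.

A string (for instance, a URL):

is.character("ABC")

A data frame:

is.data.frame(iris)

A named list:

species_list <- list("setosa" = setosa, "virginica" = virginica)
# check it is a list
is.list(species_list)
# check that the list is named
names(species_list)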

Please make sure that the argument is one of these three types. You will get an error message if the type is wrong (for instance, an unnamed list instead of a named list).
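The three accepted types can also be checked in one place. The sketch below uses only base R; check_input_data is a hypothetical helper and its error message is not the one mrap produces.

```r
# Hypothetical helper, not part of mrap: report which accepted type
# input_data has, or stop with an error
check_input_data <- function(input_data) {
  if (is.character(input_data) && length(input_data) == 1) {
    return("string")
  }
  # test for a data frame before the list test: a data frame is also a list
  if (is.data.frame(input_data)) {
    return("data frame")
  }
  if (is.list(input_data)) {
    nms <- names(input_data)
    if (!is.null(nms) && all(nzchar(nms))) {
      return("named list")
    }
  }
  stop("input_data must be a string, a data frame, or a named list")
}

check_input_data("https://example.org/data.csv")  # placeholder URL
check_input_data(iris)
check_input_data(list(a = 1:3, b = 4:6))
```

An unnamed or only partly named list, such as list(1, 2) or list(a = 1, 2), falls through to the error.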

2.3. Test results or named list results

Argument test_results can be either a data frame or a list of data frames. You can check whether you are specifying the argument correctly. For a data frame:

is.data.frame(iris)

For a list of data frames:

# assume you have a few data frames in a list
iris_new <- iris[, -1]
my_results <- list(iris, iris_new)
# check each of them in a loop
for (element in my_results) {
  print(is.data.frame(element))
}
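Both checks can be folded into a single function that returns one TRUE or FALSE. This is a base R sketch; is_valid_test_results is a hypothetical name and not an mrap function.

```r
# Hypothetical helper, not part of mrap: TRUE for a data frame or for a
# non-empty list in which every element is a data frame
is_valid_test_results <- function(x) {
  if (is.data.frame(x)) {
    return(TRUE)
  }
  is.list(x) && length(x) > 0 && all(vapply(x, is.data.frame, logical(1)))
}

is_valid_test_results(iris)
is_valid_test_results(list(iris, iris[, -1]))
is_valid_test_results(list(iris, 1:3))
```

Using vapply with logical(1) guarantees one logical value per list element, so a single non-data-frame element makes the whole check fail.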

Argument named_list_results is only used for the algorithm_evaluation schema.

3. Create an instance

Now that we know which arguments to use, let us create a group_comparison instance as in Example I:

inst_gc <- mrap::group_comparison(
  "stats::t.test(setosa, virginica, var.equal = FALSE)",
  list("setosa" = setosa, "virginica" = virginica),
  df_results
)

Here, the code_string is a string and contains the package name; there is no need for the data name, as the input data argument is specified as a named list; and the test_results argument is a data frame.

4. Modify the instance

For the instance specified above, you will receive a warning message: “Target label is not available, you can set it manually”. Let us add the target name:

inst_gc$targets <- "Petal.Length"

This is how you can add or correct any information after creating an instance.

5. Include the instance in the overarching data_analysis instance

The data_analysis instance should include all analytic instances. For one instance:

inst_da <- mrap::data_analysis(inst_gc)

For more than one instance, use a list:

inst_da_all <- mrap::data_analysis(list(inst_preprocessing, inst_regression))

6. Write JSON-LD

json <- mrap::to_jsonld(inst_da)
write(json, "data-analysis-1.json")

Example I: group comparison

Let us assume you conducted a t-test on the Iris data comparing petal length in the setosa and virginica species:

data(iris)
library(dplyr)
setosa <- iris |>
  dplyr::filter(Species == "setosa") |>
  dplyr::select(Petal.Length)
virginica <- iris |>
  dplyr::filter(Species == "virginica") |>
  dplyr::select(Petal.Length)
tt <- stats::t.test(setosa, virginica, var.equal = FALSE)

The results of the test should be presented as a data frame:

df_results <- data.frame(
  t.statistic = tt$statistic,
  df = tt$parameter,
  p.value = tt$p.value
)
rownames(df_results) <- "value"

Now, let us follow the steps described above to create a group_comparison instance, modify it, include it in a data_analysis instance, and write it as a JSON-LD file:

inst_gc <- mrap::group_comparison(
  "stats::t.test(setosa, virginica, var.equal = FALSE)",
  list("setosa" = setosa, "virginica" = virginica),
  df_results
)
inst_gc$targets <- "Petal.Length"
inst_da <- mrap::data_analysis(inst_gc)
json <- mrap::to_jsonld(inst_da)
write(json, "data-analysis-1.json")

Example II: algorithm evaluation

To report an algorithm's performance, you write the evaluation results as a named list:

eval_results <- list(F1 = 0.46, recall = 0.51)
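If the metrics are not yet computed, a named list of this shape can be built from predictions in base R. The labels and predictions below are made up for illustration, and the object name eval_results_sketch is hypothetical; the numbers are unrelated to the values used in this example.

```r
# Hypothetical binary labels and predictions, for illustration only
truth     <- c(1, 1, 1, 0, 0, 1, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 1, 0, 0)

tp <- sum(predicted == 1 & truth == 1)  # true positives
fp <- sum(predicted == 1 & truth == 0)  # false positives
fn <- sum(predicted == 0 & truth == 1)  # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# named list in the same shape as eval_results
eval_results_sketch <- list(F1 = f1, recall = recall)
names(eval_results_sketch)
```

The element names become the metric labels, so it is worth choosing them deliberately.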

Typically, there is no specific line of code to report as code_string, therefore “N/A” is allowed, as explained in the Code string section above. The data is reported as a URL string:

inst_ae <- algorithm_evaluation("N/A", "data_url", eval_results)

You need to add the name of the algorithm and the task manually:

inst_ae$evaluates <- "my_algorithm_name"
inst_ae$evaluates_for <- "Classification"

This can be further included in the data_analysis instance and written as a JSON-LD file as explained above.

Example III: an all-in-one wrapper for anova

Currently, mrap contains an all-in-one wrapper for the stats::aov function, and more such wrappers will be added in the future. Let us assume you are currently using stats::aov for conducting your ANOVA tests:

data(iris)
anova_stats_results <- stats::aov(Petal.Length ~ Species, data = iris)

The all-in-one wrapper is as easy to use as the original function:

aov <- mrap::stats_aov(Petal.Length ~ Species, data = iris)

The wrapper returns a list, the first element of which is the resulting object from the original function:

anova_mrap_results <- aov$anova

The second element is a group_comparison instance:

inst_gc_anova <- aov$dtreg_object

The instance includes all required information. Of course, you can still modify it, e.g., to add a label:

inst_gc_anova$label <- "my_fancy_results"

This can be further included in the data_analysis instance and written as a JSON-LD file as explained above.
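For comparison, here is a sketch of the manual preparation that the all-in-one wrapper spares you: turning the ANOVA table from stats::aov into a plain data frame, as one would before calling a wrapper by hand. It uses base R only. Whether mrap::group_comparison accepts exactly this table layout as test_results is an assumption not confirmed by this vignette, and the object names fit and df_anova are hypothetical.

```r
# Fit the same model as above with the original function
fit <- stats::aov(Petal.Length ~ Species, data = iris)

# summary() returns a list whose first element is the ANOVA table;
# as.data.frame() reduces it to a plain data frame
df_anova <- as.data.frame(summary(fit)[[1]])

# summary() pads the row names with spaces; trim them
rownames(df_anova) <- trimws(rownames(df_anova))

is.data.frame(df_anova)
names(df_anova)
```

The resulting data frame has one row per term (Species and Residuals) and the usual ANOVA columns.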

