matrixset Matrices
There are two ways to apply functions to the matrices of a matrixset object. The first one is through the apply_* family, which will be covered here.
The second is through mutate_matrix(), covered in the next section.
There are 3 functions in the apply_* family:
apply_matrix(): The functions must take a matrix as input. In base R, this is similar to simply calling fun(matrix_object).
apply_row(): The functions must take a vector as input. The vector will be a matrix row. In base R, this is akin to apply(matrix_object, 1, fun, simplify = FALSE).
apply_column(): The functions must take a vector as input. The vector will be a matrix column. In base R, this is similar to apply(matrix_object, 2, fun, simplify = FALSE).
Each of these functions loops over the matrixset object's matrices to apply the functions. In the case of apply_row() and apply_column(), an additional loop on the margin (row or column, as applicable) is executed, so that the functions are applied to each matrix and margin.
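To make these base-R analogies concrete, here is a minimal sketch in plain base R (no matrixset involved) on a hypothetical 3 × 2 matrix m, with sum() standing in for fun. It only mimics the three loop shapes for a single matrix. Note that the simplify argument of apply() requires R 4.1.0 or later.

```r
# Hypothetical toy matrix standing in for one matrix of a matrixset object
m <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("x", "y")))

# apply_matrix()-like: the function receives the whole matrix
whole <- sum(m)

# apply_row()-like: the function receives each row as a vector;
# simplify = FALSE keeps one list element per row
by_row <- apply(m, 1, sum, simplify = FALSE)

# apply_column()-like: the function receives each column as a vector
by_col <- apply(m, 2, sum, simplify = FALSE)
```

As described above, the apply_* functions add an outer loop over all the matrices of the object on top of these patterns.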
To see the functions in action, we will use the following object:
animals_ms
#> matrixset of 2 28 × 2 matrices
#>
#> matrix_set: msr
#> A 28 × 2 <dbl> matrix
#>                   body  brain
#> Mountain beaver   1.35   8.10
#> ...                ...    ...
#> Pig             192.00 180.00
#>
#> matrix_set: log_msr
#> A 28 × 2 <dbl> matrix
#>                 body brain
#> Mountain beaver 0.30  2.09
#> ...              ...   ...
#> Pig             5.26  5.19
#>
#>
#> row_info:
#> # A tibble: 28 × 3
#>    .rowname        is_extinct class
#>    <chr>           <lgl>      <chr>
#>  1 Mountain beaver FALSE      Rodent
#>  2 Cow             FALSE      Ruminant
#>  3 Grey wolf       FALSE      Canine
#>  4 Goat            FALSE      Ruminant
#>  5 Guinea pig      FALSE      Rodent
#>  6 Dipliodocus     TRUE       Dinosaurs
#>  7 Asian elephant  FALSE      Elephantidae
#>  8 Donkey          FALSE      Equidae
#>  9 Horse           FALSE      Equidae
#> 10 Potar monkey    FALSE      Primate
#> # ℹ 18 more rows
#>
#>
#> column_info:
#> # A tibble: 2 × 2
#>   .colname unit
#>   <chr>    <chr>
#> 1 body     kg
#> 2 brain    g
We will use the following custom printing functions for compactness purposes.
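# Show at most 4 rows of a matrix, followed by a "..." row
show_matrix <- function(x) {
  if (nrow(x) > 4) {
    newx <- head(x, 4)
    storage.mode(newx) <- "character"
    newx <- rbind(newx, rep("...", ncol(x)))
  } else newx <- x
  newx
}
# Show at most 4 elements of a vector, followed by "..."
show_vector <- function(x) {
  newx <- if (length(x) > 4) {
    c(as.character(x[1:4]), "...")
  } else x
  newx
}
# Apply the appropriate compact printer to each element of a list
show_lst <- function(x) {
  lapply(x, function(u) {
    if (is.matrix(u)) show_matrix(u)
    else if (is.vector(u)) show_vector(u)
    else u
  })
}
So now, let’s see apply_matrix() in action.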
library(magrittr)
library(purrr)
out <- animals_ms %>%
  apply_matrix(exp,
               ~mean(.m, trim = .1),
               foo = asinh,
               pow = ~2^.m,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.m ~ is_alive + class)
               })
#> Warning: Formatting NULL matrices was deprecated in matrixset 0.4.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
# out[[1]] %>% map(~ if (is.matrix(.x)) {head(.x, 5)} else .x)
show_lst(out[[1]])
#> $exp
#>                 body                    brain
#> Mountain beaver "3.85742553069697"      "3294.46807528384"
#> Cow             "8.84981281719581e+201" "5.08821958272978e+183"
#> Grey wolf       "5996785676464821"      "7.91025688556692e+51"
#> Goat            "1029402857448.45"      "8.78750163583702e+49"
#>                 "..."                   "..."
#>
#> $`mean(.m, trim = 0.1)`
#> [1] 335.1291
#>
#> $foo
#>                 body               brain
#> Mountain beaver "1.10857244179685" "2.78880004092018"
#> Cow             "6.83518574234833" "6.74052075680554"
#> Grey wolf       "4.28598038575143" "5.47648105816811"
#> Goat            "4.01346111184316" "5.43809821197888"
#>                 "..."              "..."
#>
#> $pow
#>                 body                    brain
#> Mountain beaver "2.54912125463852"      "274.374006409291"
#> Cow             "9.52682052708738e+139" "2.16614819853189e+127"
#> Grey wolf       "86381301347.2935"      "9.39906129562518e+35"
#> Goat            "212075099.808884"      "4.15383748682786e+34"
#>                 "..."                   "..."
#>
#> $reg
#>
#> Call:
#> lm(formula = .m ~ is_alive + class)
#>
#> Coefficients:
#>                    body       brain
#> (Intercept)         36033.33      91.50
#> is_aliveTRUE       -35997.00      28.00
#> classDinosaurs            NA         NA
#> classElephantidae    4564.17    5038.00
#> classEquidae          317.72     417.50
#> classFeline            15.32     -28.20
#> classMacropodidae      -1.33     -63.50
#> classPrimate           31.26     372.50
#> classRodent           -35.44    -114.67
#> classRuminant         232.96     228.75
#> classSus              155.67      60.50
#> classTalpidae         -36.21    -116.50
We have showcased several features of the apply_* functions:
You probably have noticed the use of .m. This is a pronoun that is accessible inside apply_matrix() and refers to the current matrix in the internal loop. Similar pronouns exist for apply_row() and apply_column(): .i and .j, respectively.
The returned object is a list of lists. The first layer is for eachmatrix and the second layer is for each function call.
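The two-layer shape can be reproduced with nested lapply() calls. In this base-R sketch, the matrices in mats and the named functions in fns are hypothetical stand-ins; it illustrates the shape of the result only, not how matrixset computes it.

```r
# Hypothetical stand-ins for the matrices of a matrixset object
mats <- list(msr = matrix(1:4, nrow = 2), log_msr = log(matrix(1:4, nrow = 2)))

# Hypothetical named functions, like foo = asinh in the call above
fns <- list(total = sum, biggest = max)

# Outer layer: one element per matrix; inner layer: one element per function
nested <- lapply(mats, function(m) lapply(fns, function(f) f(m)))

nested$msr$total        # sum of 1:4
nested$log_msr$biggest  # log(4)
```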
Let’s now showcase the row/column version with an apply_column() example:
out <- animals_ms %>%
  apply_column(exp,
               ~mean(.j, trim = .1),
               foo = asinh,
               pow = ~2^.j,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.j ~ is_alive + class)
               })
out[[1]] %>% map(show_lst)
#> $body
#> $body$exp
#> [1] "3.85742553069697"      "8.84981281719581e+201" "5996785676464821"
#> [4] "1029402857448.45"      "..."
#>
#> $body$`mean(.j, trim = 0.1)`
#> [1] 879.0059
#>
#> $body$foo
#> [1] "1.10857244179685" "6.83518574234833" "4.28598038575143" "4.01346111184316"
#> [5] "..."
#>
#> $body$pow
#> [1] "2.54912125463852"      "9.52682052708738e+139" "86381301347.2935"
#> [4] "212075099.808884"      "..."
#>
#> $body$reg
#>
#> Call:
#> lm(formula = .j ~ is_alive + class)
#>
#> Coefficients:
#>       (Intercept)       is_aliveTRUE     classDinosaurs  classElephantidae
#>          36033.33          -35997.00                 NA            4564.17
#>      classEquidae        classFeline  classMacropodidae       classPrimate
#>            317.72              15.32              -1.33              31.26
#>       classRodent      classRuminant           classSus      classTalpidae
#>            -35.44             232.96             155.67             -36.21
#>
#>
#>
#> $brain
#> $brain$exp
#> [1] "3294.46807528384"      "5.08821958272978e+183" "7.91025688556692e+51"
#> [4] "8.78750163583702e+49"  "..."
#>
#> $brain$`mean(.j, trim = 0.1)`
#> [1] 240.425
#>
#> $brain$foo
#> [1] "2.78880004092018" "6.74052075680554" "5.47648105816811" "5.43809821197888"
#> [5] "..."
#>
#> $brain$pow
#> [1] "274.374006409291"      "2.16614819853189e+127" "9.39906129562518e+35"
#> [4] "4.15383748682786e+34"  "..."
#>
#> $brain$reg
#>
#> Call:
#> lm(formula = .j ~ is_alive + class)
#>
#> Coefficients:
#>       (Intercept)       is_aliveTRUE     classDinosaurs  classElephantidae
#>              91.5               28.0                 NA             5038.0
#>      classEquidae        classFeline  classMacropodidae       classPrimate
#>             417.5              -28.2              -63.5              372.5
#>       classRodent      classRuminant           classSus      classTalpidae
#>            -114.7              228.7               60.5             -116.5
The idea is similar, but in the returned object there is a third list layer: the first layer for the matrices, the second layer for the columns (it would be rows for apply_row()) and the third layer for the functions.
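Adding a column loop in the middle gives the three-layer shape just described. This is again a base-R sketch with hypothetical data and functions, meant to show the nesting order (matrix, then column, then function) rather than matrixset's implementation.

```r
# Hypothetical single matrix with named columns, as in animals_ms
mats <- list(msr = matrix(1:6, nrow = 3,
                          dimnames = list(NULL, c("body", "brain"))))
fns <- list(total = sum, biggest = max)

nested3 <- lapply(mats, function(m) {
  # Named index vector so that the column layer keeps the column names
  cols <- setNames(seq_len(ncol(m)), colnames(m))
  lapply(cols, function(j) {
    column <- m[, j]                    # the vector a column function receives
    lapply(fns, function(f) f(column))  # innermost layer: one entry per function
  })
})

nested3$msr$brain$total  # 4 + 5 + 6
```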
Note as well the use of the .j pronoun instead of .m.
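One way to picture how a formula such as ~mean(.j, trim = .1) gets access to .j is to evaluate the right-hand side of the formula in a small environment where the pronoun is bound to the current column. The helper eval_with_pronoun() below is hypothetical and written in base R for illustration only; matrixset's own mechanism may well differ.

```r
# Hypothetical helper: evaluate the right-hand side of a one-sided formula
# with a pronoun (e.g. `.m`, `.i` or `.j`) bound to the current object
eval_with_pronoun <- function(f, value, pronoun = ".j") {
  mask <- new.env(parent = environment(f))  # other names resolve where f was made
  assign(pronoun, value, envir = mask)
  eval(f[[2]], mask)                        # f[[2]] is the right-hand side
}

m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("body", "brain")))

# Loop on the columns, binding each one to `.j` in turn
col_means <- lapply(setNames(seq_len(ncol(m)), colnames(m)),
                    function(j) eval_with_pronoun(~mean(.j, trim = .1), m[, j]))

# Same idea with `.m` bound to the whole matrix
whole <- eval_with_pronoun(~sum(.m), m, pronoun = ".m")
```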
The apply_* functions understand data grouping and will execute on the proper matrix/vector subsets.
animals_ms %>%
  row_group_by(class) %>%
  apply_matrix(exp,
               ~mean(.m, trim = .1),
               foo = asinh,
               pow = ~2^.m,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.m ~ is_alive)
               })
#> $msr
#> # A tibble: 11 × 2
#>    class        .vals
#>    <chr>        <list>
#>  1 Canine       <named list [5]>
#>  2 Dinosaurs    <named list [5]>
#>  3 Elephantidae <named list [5]>
#>  4 Equidae      <named list [5]>
#>  5 Feline       <named list [5]>
#>  6 Macropodidae <named list [5]>
#>  7 Primate      <named list [5]>
#>  8 Rodent       <named list [5]>
#>  9 Ruminant     <named list [5]>
#> 10 Sus          <named list [5]>
#> 11 Talpidae     <named list [5]>
#>
#> $log_msr
#> # A tibble: 11 × 2
#>    class        .vals
#>    <chr>        <list>
#>  1 Canine       <named list [5]>
#>  2 Dinosaurs    <named list [5]>
#>  3 Elephantidae <named list [5]>
#>  4 Equidae      <named list [5]>
#>  5 Feline       <named list [5]>
#>  6 Macropodidae <named list [5]>
#>  7 Primate      <named list [5]>
#>  8 Rodent       <named list [5]>
#>  9 Ruminant     <named list [5]>
#> 10 Sus          <named list [5]>
#> 11 Talpidae     <named list [5]>
As one can see, the output format differs when the data are grouped. We still end up with a list containing an element for each matrix, but each of these elements is now a tibble.
Each tibble has a column called .vals, where the function results are stored. This column is a list, with one element per group. The group labels are given by the other columns of the tibble. For a given group, things are as in the ungrouped version: further sub-lists for rows/columns (if applicable) and function values.
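Mirroring the tibble described above, here is a base-R sketch of one matrix's grouped result: row indices are split by a group label, every function is run on each row subset, and the per-group results are stored in a list column named .vals. The toy matrix, the grp labels and the use of a plain data.frame instead of a tibble are all assumptions made for illustration.

```r
# Hypothetical 4 x 2 matrix and a row trait used for grouping
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4,
            dimnames = list(paste0("r", 1:4), c("x", "y")))
grp <- c("A", "B", "A", "B")
fns <- list(total = sum, n = length)

# Split the row indices by group and apply every function to each row subset
idx_by_group <- split(seq_len(nrow(m)), grp)
vals <- lapply(idx_by_group, function(idx) {
  sub <- m[idx, , drop = FALSE]
  lapply(fns, function(f) f(sub))
})

# One row per group: the label column plus a list column of results
res <- data.frame(class = names(vals), stringsAsFactors = FALSE)
res$.vals <- unname(vals)
```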
Similar to the apply() function, which has a simplify argument, the output structure can be simplified, provided two conditions are met:
is.vector returns TRUE.
If the conditions are met, each apply_* function has two simplified versions available: _dfl and _dfw.
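The is.vector() condition explains why a model fit cannot be returned as-is by the simplified variants, while wrapping it in list() is accepted (the example that follows uses this trick). A quick base-R check with the built-in cars data set; the model itself is only an example:

```r
# Built-in 'cars' data set
fit <- lm(dist ~ speed, data = cars)

is.vector(mean(cars$dist))                   # TRUE: plain numeric
is.vector(quantile(cars$dist, c(.25, .75)))  # TRUE: names are allowed
is.vector(fit)                               # FALSE: carries a class attribute
is.vector(list(fit))                         # TRUE: an unclassed list of length 1
```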
Below is the _dfl flavor in action. Two things are worth noticing:
For apply_column_dfl (and _dfw), a .colname column stores the column ID (.rowname for apply_row_*).
The lm result is wrapped in a list so that the outcome is a vector.
animals_ms %>%
  apply_matrix_dfl(~mean(.m, trim = .1),
                   MAD = mad,
                   reg = ~ {
                     is_alive <- !is_extinct
                     list(lm(.m ~ is_alive + class))
                   })
#> $msr
#> # A tibble: 1 × 3
#>   `mean(.m, trim = 0.1)`   MAD reg
#>                    <dbl> <dbl> <list>
#> 1                   335.  155. <mlm>
#>
#> $log_msr
#> # A tibble: 1 × 3
#>   `mean(.m, trim = 0.1)`   MAD reg
#>                    <dbl> <dbl> <list>
#> 1                   4.18  2.35 <mlm>
animals_ms %>%
  apply_column_dfl(~mean(.j, trim = .1),
                   MAD = mad,
                   reg = ~ {
                     is_alive <- !is_extinct
                     list(lm(.j ~ is_alive + class))
                   })
#> $msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       879.  79.5 <lm>
#> 2 brain                      240. 193.  <lm>
#>
#> $log_msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       3.78  3.38 <lm>
#> 2 brain                      4.49  1.71 <lm>
If using apply_column_dfw in this context, you wouldn’t notice a difference in output format.
The difference between the two appears when the function outputs are vectors of length > 1.
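As a base-R sketch of the two layouts, take the Mountain beaver row of msr (body 1.35, brain 8.1, as shown in the output that follows) and apply range() and quantile(). The column names (..1, 25%) are copied from that output; building the layouts with plain data frames is an illustration, not how matrixset constructs them.

```r
# One hypothetical row (body, brain), as `.i` would be
x <- c(body = 1.35, brain = 8.10)
res <- list(rg = range(x), qt = quantile(x, probs = c(.25, .75)))

# Long layout: one line per vector element, with a companion name column
long <- data.frame(rg.name = paste0("..", seq_along(res$rg)), rg = res$rg,
                   qt.name = names(res$qt), qt = unname(res$qt),
                   stringsAsFactors = FALSE)

# Wide layout: a single line, with one column per vector element
wide <- data.frame(`rg ..1` = res$rg[1], `rg ..2` = res$rg[2],
                   `qt 25%` = res$qt[[1]], `qt 75%` = res$qt[[2]],
                   check.names = FALSE)
```

The long layout doubles the number of lines per row (hence 56 rows for 28 animals below), while the wide layout keeps one line per row and multiplies the columns instead.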
animals_ms %>%
  apply_row_dfl(rg = ~range(.i),
                qt = ~quantile(.i, probs = c(.25, .75)))
#> $msr
#> # A tibble: 56 × 5
#>    .rowname        rg.name    rg qt.name    qt
#>    <chr>           <chr>   <dbl> <chr>   <dbl>
#>  1 Mountain beaver ..1      1.35 25%      3.04
#>  2 Mountain beaver ..2      8.1  75%      6.41
#>  3 Cow             ..1    423    25%    434.
#>  4 Cow             ..2    465    75%    454.
#>  5 Grey wolf       ..1     36.3  25%     57.1
#>  6 Grey wolf       ..2    120.   75%     98.7
#>  7 Goat            ..1     27.7  25%     49.5
#>  8 Goat            ..2    115    75%     93.2
#>  9 Guinea pig      ..1      1.04 25%      2.16
#> 10 Guinea pig      ..2      5.5  75%      4.38
#> # ℹ 46 more rows
#>
#> $log_msr
#> # A tibble: 56 × 5
#>    .rowname        rg.name     rg qt.name    qt
#>    <chr>           <chr>    <dbl> <chr>   <dbl>
#>  1 Mountain beaver ..1     0.300  25%     0.748
#>  2 Mountain beaver ..2     2.09   75%     1.64
#>  3 Cow             ..1     6.05   25%     6.07
#>  4 Cow             ..2     6.14   75%     6.12
#>  5 Grey wolf       ..1     3.59   25%     3.89
#>  6 Grey wolf       ..2     4.78   75%     4.49
#>  7 Goat            ..1     3.32   25%     3.68
#>  8 Goat            ..2     4.74   75%     4.39
#>  9 Guinea pig      ..1     0.0392 25%     0.456
#> 10 Guinea pig      ..2     1.70   75%     1.29
#> # ℹ 46 more rows
animals_ms %>%
  apply_row_dfw(rg = ~range(.i),
                qt = ~quantile(.i, probs = c(.25, .75)))
#> $msr
#> # A tibble: 28 × 5
#>    .rowname        `rg ..1` `rg ..2` `qt 25%` `qt 75%`
#>    <chr>              <dbl>    <dbl>    <dbl>    <dbl>
#>  1 Mountain beaver     1.35      8.1     3.04     6.41
#>  2 Cow               423       465     434.     454.
#>  3 Grey wolf          36.3     120.     57.1     98.7
#>  4 Goat               27.7     115      49.5     93.2
#>  5 Guinea pig          1.04      5.5     2.16     4.38
#>  6 Dipliodocus        50     11700    2962.    8788.
#>  7 Asian elephant   2547      4603    3061     4089
#>  8 Donkey            187.      419     245.     361.
#>  9 Horse             521       655     554.     622.
#> 10 Potar monkey       10       115      36.2     88.8
#> # ℹ 18 more rows
#>
#> $log_msr
#> # A tibble: 28 × 5
#>    .rowname        `rg ..1` `rg ..2` `qt 25%` `qt 75%`
#>    <chr>              <dbl>    <dbl>    <dbl>    <dbl>
#>  1 Mountain beaver   0.300      2.09    0.748     1.64
#>  2 Cow               6.05       6.14    6.07      6.12
#>  3 Grey wolf         3.59       4.78    3.89      4.49
#>  4 Goat              3.32       4.74    3.68      4.39
#>  5 Guinea pig        0.0392     1.70    0.456     1.29
#>  6 Dipliodocus       3.91       9.37    5.28      8.00
#>  7 Asian elephant    7.84       8.43    7.99      8.29
#>  8 Donkey            5.23       6.04    5.43      5.84
#>  9 Horse             6.26       6.48    6.31      6.43
#> 10 Potar monkey      2.30       4.74    2.91      4.13
#> # ℹ 18 more rows
We can observe three things:
It may happen that you need information about the current group. For this reason, the following context functions are available:
current_n_row() and current_n_column(). They give the number of rows and columns, respectively, of the current matrix. They are the context equivalents of nrow() and ncol().
current_row_info() and current_column_info(). They give access to the current row/column annotation data frame. They are the context equivalents of row_info() and column_info().
row_pos() and column_pos(). They give the current row/column indices. These indices refer to the matrix before subsetting, i.e., they are positions in the full matrix.
row_rel_pos() and column_rel_pos(). They give the row/column indices relative to the current matrix. They are equivalent to seq_len(current_n_row()) / seq_len(current_n_column()).
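For a grouped matrix, the difference between row_pos() and row_rel_pos() can be pictured in base R: split the row indices by group, then compare the original indices with the positions inside each subset. The labels in cls and the matrix m are hypothetical, and the list entries are merely named after the context functions they mimic.

```r
# Hypothetical row labels (e.g. the class trait) and a toy 5 x 2 matrix
cls <- c("Rodent", "Ruminant", "Canine", "Ruminant", "Rodent")
m <- matrix(seq_len(10), nrow = 5)

ctx <- lapply(split(seq_len(nrow(m)), cls), function(pos) {
  sub <- m[pos, , drop = FALSE]          # the current (grouped) matrix
  list(n_row       = nrow(sub),          # like current_n_row()
       row_pos     = pos,                # like row_pos(): indices in the full matrix
       row_rel_pos = seq_len(nrow(sub))) # like row_rel_pos(): indices within sub
})

ctx$Ruminant
```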
For instance, a simple way of knowing the number of animals per group could be:
The context functions can also be useful when rows and columns share one or more trait names.
Here’s a pseudo-code example:
It may happen that a variable in the calling environment shares its name with a trait of a matrixset object.
You can make explicit which version of the variable you are using with the pronouns .data (the trait annotation version) and .env (the calling environment version).
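A rough base-R picture of this disambiguation: the expression is evaluated in an environment where trait names take precedence, while .data and .env point explicitly at the annotation and at the calling environment. The helper eval_masked() and the unit example are hypothetical; unlike the real .data pronoun, this toy version silently returns NULL for a missing trait instead of raising an error.

```r
# Hypothetical helper: trait names mask the caller's variables,
# while `.data` and `.env` give explicit access to each side
eval_masked <- function(expr, data, env = parent.frame()) {
  mask <- list2env(data, parent = env)
  assign(".data", data, envir = mask)
  assign(".env", env, envir = mask)
  eval(expr, mask)
}

traits <- list(unit = "kg")  # stands in for a trait annotation
unit <- "g"                  # same name in the calling environment

eval_masked(quote(unit), traits)        # "kg": the trait masks the variable
eval_masked(quote(.data$unit), traits)  # "kg"
eval_masked(quote(.env$unit), traits)   # "g"
```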
library(rlang)  # provides expr()
reg_expr <- expr({
  is_alive <- !is_extinct
  list(lm(.j ~ is_alive + class))
})
animals_ms %>%
  apply_column_dfl(~mean(.j, trim = .1),
                   MAD = mad,
                   reg = ~ !!reg_expr)
#> $msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       879.  79.5 <lm>
#> 2 brain                      240. 193.  <lm>
#>
#> $log_msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       3.78  3.38 <lm>
#> 2 brain                      4.49  1.71 <lm>
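The !! operator injects a pre-built expression into the formula before it is evaluated. For readers more used to base R, bquote() with .() performs a comparable splice. The sketch below uses toy content and a plain list in place of the pronoun machinery; it is an analogy, not what matrixset or rlang do internally.

```r
# A pre-built expression, playing the role of reg_expr above (toy content)
body_expr <- quote({
  centred <- .j - mean(.j)
  sum(centred^2)
})

# bquote() splices the expression into a one-sided formula, much like ~ !!reg_expr
f <- eval(bquote(~ .(body_expr)))

# Evaluate the formula's right-hand side with `.j` bound to a toy vector
val <- eval(f[[2]], list(.j = c(1, 2, 3)))
```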