Applying Functions

Apply Functions to matrixset Matrices

There are two ways to apply functions to the matrices of a matrixset object. The first one is through the apply_* family, which is covered here.

The second is through mutate_matrix(), covered in the next section.

There are 3 functions in the apply_* family:

  1. apply_matrix()
  2. apply_row()
  3. apply_column()

Each of these functions loops over the matrixset object’s matrices to apply the functions. In the case of apply_row() and apply_column(), an additional loop over the margin (row or column, as applicable) is executed, so that the functions are applied to each matrix and each margin element.
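The nesting produced by these loops can be pictured with a minimal base-R sketch. The helpers apply_matrix_sketch() and apply_column_sketch(), and the toy matrices, are hypothetical and only illustrate how the results are layered; this is not how matrixset implements the apply_* family, and it ignores formulas, pronouns, annotation traits and grouping.

```r
# Hypothetical sketch of the apply_* looping; NOT matrixset's implementation.
apply_matrix_sketch <- function(mats, fns) {
  # outer loop: one element per matrix
  lapply(mats, function(m) {
    # inner loop: one element per function
    lapply(fns, function(f) f(m))
  })
}

apply_column_sketch <- function(mats, fns) {
  lapply(mats, function(m) {
    # additional loop over the margin (here, the columns)
    cols <- setNames(seq_len(ncol(m)), colnames(m))
    lapply(cols, function(j) {
      lapply(fns, function(f) f(m[, j]))
    })
  })
}

# Toy data (hypothetical)
mats <- list(a = matrix(1:6, nrow = 3, dimnames = list(NULL, c("x", "y"))),
             b = matrix(7:12, nrow = 3, dimnames = list(NULL, c("x", "y"))))
fns <- list(total = sum, biggest = max)

r1 <- apply_matrix_sketch(mats, fns)  # layers: matrix -> function
r2 <- apply_column_sketch(mats, fns)  # layers: matrix -> column -> function
str(r1)
str(r2)
```

The column version simply adds one list layer between the matrix and the function, which is the same layering the real functions return.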

To see the functions in action, we will use the following object:

animals_ms
#> matrixset of 2 28 × 2 matrices
#>
#> matrix_set: msr
#> A 28 × 2 <dbl> matrix
#>                   body  brain
#> Mountain beaver   1.35   8.10
#>             ...    ...    ...
#>             Pig 192.00 180.00
#>
#> matrix_set: log_msr
#> A 28 × 2 <dbl> matrix
#>                 body brain
#> Mountain beaver 0.30  2.09
#>             ...  ...   ...
#>             Pig 5.26  5.19
#>
#>
#> row_info:
#> # A tibble: 28 × 3
#>    .rowname        is_extinct class
#>    <chr>           <lgl>      <chr>
#>  1 Mountain beaver FALSE      Rodent
#>  2 Cow             FALSE      Ruminant
#>  3 Grey wolf       FALSE      Canine
#>  4 Goat            FALSE      Ruminant
#>  5 Guinea pig      FALSE      Rodent
#>  6 Dipliodocus     TRUE       Dinosaurs
#>  7 Asian elephant  FALSE      Elephantidae
#>  8 Donkey          FALSE      Equidae
#>  9 Horse           FALSE      Equidae
#> 10 Potar monkey    FALSE      Primate
#> # ℹ 18 more rows
#>
#>
#> column_info:
#> # A tibble: 2 × 2
#>   .colname unit
#>   <chr>    <chr>
#> 1 body     kg
#> 2 brain    g

We will use the following custom printing functions to keep the output compact.

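# Show at most the first 4 rows of a matrix, followed by a row of "..."
show_matrix <- function(x) {
  if (nrow(x) > 4) {
    newx <- head(x, 4)
    storage.mode(newx) <- "character"
    newx <- rbind(newx, rep("...", ncol(x)))
  } else newx <- x
  newx
}

# Show at most the first 4 elements of a vector, followed by "..."
show_vector <- function(x) {
  newx <- if (length(x) > 4) {
    c(as.character(x[1:4]), "...")
  } else x
  newx
}

# Apply the appropriate compact printer to each element of a list
show_lst <- function(x) {
  lapply(x, function(u) {
    if (is.matrix(u)) show_matrix(u)
    else if (is.vector(u)) show_vector(u)
    else u
  })
}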

So now, let’s see apply_matrix() in action.

library(magrittr)
library(purrr)

out <- animals_ms %>%
  apply_matrix(exp,
               ~ mean(.m, trim = .1),
               foo = asinh,
               pow = ~ 2^.m,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.m ~ is_alive + class)
               })
#> Warning: Formatting NULL matrices was deprecated in matrixset 0.4.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
# out[[1]] %>% map(~ if (is.matrix(.x)) {head(.x, 5)} else .x)
show_lst(out[[1]])
#> $exp
#>                 body                    brain
#> Mountain beaver "3.85742553069697"      "3294.46807528384"
#> Cow             "8.84981281719581e+201" "5.08821958272978e+183"
#> Grey wolf       "5996785676464821"      "7.91025688556692e+51"
#> Goat            "1029402857448.45"      "8.78750163583702e+49"
#>                 "..."                   "..."
#>
#> $`mean(.m, trim = 0.1)`
#> [1] 335.1291
#>
#> $foo
#>                 body               brain
#> Mountain beaver "1.10857244179685" "2.78880004092018"
#> Cow             "6.83518574234833" "6.74052075680554"
#> Grey wolf       "4.28598038575143" "5.47648105816811"
#> Goat            "4.01346111184316" "5.43809821197888"
#>                 "..."              "..."
#>
#> $pow
#>                 body                    brain
#> Mountain beaver "2.54912125463852"      "274.374006409291"
#> Cow             "9.52682052708738e+139" "2.16614819853189e+127"
#> Grey wolf       "86381301347.2935"      "9.39906129562518e+35"
#> Goat            "212075099.808884"      "4.15383748682786e+34"
#>                 "..."                   "..."
#>
#> $reg
#>
#> Call:
#> lm(formula = .m ~ is_alive + class)
#>
#> Coefficients:
#>                    body       brain
#> (Intercept)         36033.33      91.50
#> is_aliveTRUE       -35997.00      28.00
#> classDinosaurs            NA         NA
#> classElephantidae    4564.17    5038.00
#> classEquidae          317.72     417.50
#> classFeline            15.32     -28.20
#> classMacropodidae      -1.33     -63.50
#> classPrimate           31.26     372.50
#> classRodent           -35.44    -114.67
#> classRuminant         232.96     228.75
#> classSus              155.67      60.50
#> classTalpidae         -36.21    -116.50

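We have showcased several features of the apply_* functions:

  1. A function can be passed by name, like exp.
  2. A function can be given as a formula, like ~mean(.m, trim = .1), which makes it easy to pass extra arguments.
  3. Results can be named, like foo = asinh; unnamed results are named after the function or expression, as with exp and `mean(.m, trim = 0.1)`.
  4. A formula can hold several statements wrapped in braces, as in reg.
  5. Row annotation traits, like is_extinct and class, can be used directly inside the expressions.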

You probably noticed the use of .m. This is a pronoun that is accessible inside apply_matrix() and refers to the current matrix in the internal loop. Similar pronouns exist for apply_row() and apply_column(): .i and .j, respectively.
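To get a feel for how a pronoun works, the sketch below evaluates the right-hand side of a formula with .m (or .j) bound to a value, using rlang's eval_tidy(), f_rhs() and f_env(). The helper eval_with_pronoun() is hypothetical and only mimics the idea; we are not claiming that matrixset evaluates formulas this way.

```r
library(rlang)

# Hypothetical helper: evaluate the right-hand side of a formula such as
# ~ mean(.m, trim = .1) with the pronoun (".m" by default) bound to `value`.
eval_with_pronoun <- function(f, value, pronoun = ".m") {
  data <- list(value)
  names(data) <- pronoun
  eval_tidy(f_rhs(f), data = data, env = f_env(f))
}

m <- matrix(c(1, 2, 3, 100), nrow = 2)
eval_with_pronoun(~ mean(.m), m)             # .m is the whole matrix: 26.5
eval_with_pronoun(~ max(.j), m[, 1], ".j")   # .j is a single column: 2
```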

The returned object is a list of lists. The first layer is for eachmatrix and the second layer is for each function call.

Let’s now showcase the row/column version with an apply_column() example:

out <- animals_ms %>%
  apply_column(exp,
               ~ mean(.j, trim = .1),
               foo = asinh,
               pow = ~ 2^.j,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.j ~ is_alive + class)
               })
out[[1]] %>% map(show_lst)
#> $body
#> $body$exp
#> [1] "3.85742553069697"      "8.84981281719581e+201" "5996785676464821"
#> [4] "1029402857448.45"      "..."
#>
#> $body$`mean(.j, trim = 0.1)`
#> [1] 879.0059
#>
#> $body$foo
#> [1] "1.10857244179685" "6.83518574234833" "4.28598038575143" "4.01346111184316"
#> [5] "..."
#>
#> $body$pow
#> [1] "2.54912125463852"      "9.52682052708738e+139" "86381301347.2935"
#> [4] "212075099.808884"      "..."
#>
#> $body$reg
#>
#> Call:
#> lm(formula = .j ~ is_alive + class)
#>
#> Coefficients:
#>       (Intercept)       is_aliveTRUE     classDinosaurs  classElephantidae
#>          36033.33          -35997.00                 NA            4564.17
#>      classEquidae        classFeline  classMacropodidae       classPrimate
#>            317.72              15.32              -1.33              31.26
#>       classRodent      classRuminant           classSus      classTalpidae
#>            -35.44             232.96             155.67             -36.21
#>
#>
#>
#> $brain
#> $brain$exp
#> [1] "3294.46807528384"      "5.08821958272978e+183" "7.91025688556692e+51"
#> [4] "8.78750163583702e+49"  "..."
#>
#> $brain$`mean(.j, trim = 0.1)`
#> [1] 240.425
#>
#> $brain$foo
#> [1] "2.78880004092018" "6.74052075680554" "5.47648105816811" "5.43809821197888"
#> [5] "..."
#>
#> $brain$pow
#> [1] "274.374006409291"      "2.16614819853189e+127" "9.39906129562518e+35"
#> [4] "4.15383748682786e+34"  "..."
#>
#> $brain$reg
#>
#> Call:
#> lm(formula = .j ~ is_alive + class)
#>
#> Coefficients:
#>       (Intercept)       is_aliveTRUE     classDinosaurs  classElephantidae
#>              91.5               28.0                 NA             5038.0
#>      classEquidae        classFeline  classMacropodidae       classPrimate
#>             417.5              -28.2              -63.5              372.5
#>       classRodent      classRuminant           classSus      classTalpidae
#>            -114.7              228.7               60.5             -116.5

The idea is similar, but the returned object has a third list layer: the first layer is for the matrices, the second for the columns (it would be rows for apply_row()), and the third for the functions.

Note as well the use of the .j pronoun instead of .m.

Grouped Data

The apply_* functions understand data grouping and will execute on the proper matrix/vector subsets.

animals_ms %>%
  row_group_by(class) %>%
  apply_matrix(exp,
               ~ mean(.m, trim = .1),
               foo = asinh,
               pow = ~ 2^.m,
               reg = ~ {
                 is_alive <- !is_extinct
                 lm(.m ~ is_alive)
               })
#> $msr
#> # A tibble: 11 × 2
#>    class        .vals
#>    <chr>        <list>
#>  1 Canine       <named list [5]>
#>  2 Dinosaurs    <named list [5]>
#>  3 Elephantidae <named list [5]>
#>  4 Equidae      <named list [5]>
#>  5 Feline       <named list [5]>
#>  6 Macropodidae <named list [5]>
#>  7 Primate      <named list [5]>
#>  8 Rodent       <named list [5]>
#>  9 Ruminant     <named list [5]>
#> 10 Sus          <named list [5]>
#> 11 Talpidae     <named list [5]>
#>
#> $log_msr
#> # A tibble: 11 × 2
#>    class        .vals
#>    <chr>        <list>
#>  1 Canine       <named list [5]>
#>  2 Dinosaurs    <named list [5]>
#>  3 Elephantidae <named list [5]>
#>  4 Equidae      <named list [5]>
#>  5 Feline       <named list [5]>
#>  6 Macropodidae <named list [5]>
#>  7 Primate      <named list [5]>
#>  8 Rodent       <named list [5]>
#>  9 Ruminant     <named list [5]>
#> 10 Sus          <named list [5]>
#> 11 Talpidae     <named list [5]>

As one can see, the output format differs when the data is grouped. We still end up with a list with an element for each matrix, but each of these elements is now a tibble.

Each tibble has a column called .vals, where the function results are stored. This column is a list, with one element per group. The group labels are given by the other columns of the tibble. For a given group, things are as in the ungrouped version: further sub-lists for rows/columns (if applicable) and function values.
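The grouped layout can be mimicked in base R: split the rows by a grouping variable, run the functions on each subset, and store the per-group results in a list column called .vals. This is a hypothetical sketch: grouped_apply_sketch() and its toy data are made up, and it returns a plain data.frame, whereas matrixset returns a tibble.

```r
# Hypothetical sketch of the grouped output layout; NOT matrixset's code.
grouped_apply_sketch <- function(m, group, fns) {
  idx <- split(seq_len(nrow(m)), group)   # row indices, one element per group
  vals <- lapply(idx, function(rows) {
    sub <- m[rows, , drop = FALSE]        # rows of the current group only
    lapply(fns, function(f) f(sub))       # same inner loop as ungrouped
  })
  out <- data.frame(group = names(idx), stringsAsFactors = FALSE)
  out$.vals <- unname(vals)               # list column: one element per group
  out
}

m <- matrix(1:8, nrow = 4, dimnames = list(NULL, c("body", "brain")))
res <- grouped_apply_sketch(m, c("A", "B", "A", "B"), list(total = sum))
res$group               # group labels
res$.vals[[1]]$total    # sum over rows 1 and 3 (group "A")
```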

Simplified Results

Similar to the apply() function, which has a simplify argument, the output structure can be simplified, provided two conditions are met:

If the conditions are met, each apply_* function has two simplified versions available: _dfl and _dfw.

Below is the _dfl flavor in action. Two things are worth pointing out:

animals_ms %>%
  apply_matrix_dfl(~ mean(.m, trim = .1),
                   MAD = mad,
                   reg = ~ {
                     is_alive <- !is_extinct
                     list(lm(.m ~ is_alive + class))
                   })
#> $msr
#> # A tibble: 1 × 3
#>   `mean(.m, trim = 0.1)`   MAD reg
#>                    <dbl> <dbl> <list>
#> 1                   335.  155. <mlm>
#>
#> $log_msr
#> # A tibble: 1 × 3
#>   `mean(.m, trim = 0.1)`   MAD reg
#>                    <dbl> <dbl> <list>
#> 1                   4.18  2.35 <mlm>

animals_ms %>%
  apply_column_dfl(~ mean(.j, trim = .1),
                   MAD = mad,
                   reg = ~ {
                     is_alive <- !is_extinct
                     list(lm(.j ~ is_alive + class))
                   })
#> $msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       879.  79.5 <lm>
#> 2 brain                      240. 193.  <lm>
#>
#> $log_msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       3.78  3.38 <lm>
#> 2 brain                      4.49  1.71 <lm>

If you used apply_column_dfw in this context, you wouldn’t notice a difference in output format.

The difference between the two appears when the function outputs are vectors of length > 1.

animals_ms %>%
  apply_row_dfl(rg = ~ range(.i),
                qt = ~ quantile(.i, probs = c(.25, .75)))
#> $msr
#> # A tibble: 56 × 5
#>    .rowname        rg.name     rg qt.name     qt
#>    <chr>           <chr>    <dbl> <chr>    <dbl>
#>  1 Mountain beaver ..1       1.35 25%       3.04
#>  2 Mountain beaver ..2       8.1  75%       6.41
#>  3 Cow             ..1     423    25%     434.
#>  4 Cow             ..2     465    75%     454.
#>  5 Grey wolf       ..1      36.3  25%      57.1
#>  6 Grey wolf       ..2     120.   75%      98.7
#>  7 Goat            ..1      27.7  25%      49.5
#>  8 Goat            ..2     115    75%      93.2
#>  9 Guinea pig      ..1       1.04 25%       2.16
#> 10 Guinea pig      ..2       5.5  75%       4.38
#> # ℹ 46 more rows
#>
#> $log_msr
#> # A tibble: 56 × 5
#>    .rowname        rg.name     rg qt.name    qt
#>    <chr>           <chr>    <dbl> <chr>   <dbl>
#>  1 Mountain beaver ..1     0.300  25%     0.748
#>  2 Mountain beaver ..2     2.09   75%     1.64
#>  3 Cow             ..1     6.05   25%     6.07
#>  4 Cow             ..2     6.14   75%     6.12
#>  5 Grey wolf       ..1     3.59   25%     3.89
#>  6 Grey wolf       ..2     4.78   75%     4.49
#>  7 Goat            ..1     3.32   25%     3.68
#>  8 Goat            ..2     4.74   75%     4.39
#>  9 Guinea pig      ..1     0.0392 25%     0.456
#> 10 Guinea pig      ..2     1.70   75%     1.29
#> # ℹ 46 more rows

animals_ms %>%
  apply_row_dfw(rg = ~ range(.i),
                qt = ~ quantile(.i, probs = c(.25, .75)))
#> $msr
#> # A tibble: 28 × 5
#>    .rowname        `rg ..1` `rg ..2` `qt 25%` `qt 75%`
#>    <chr>              <dbl>    <dbl>    <dbl>    <dbl>
#>  1 Mountain beaver     1.35      8.1     3.04     6.41
#>  2 Cow               423       465     434.     454.
#>  3 Grey wolf          36.3     120.     57.1     98.7
#>  4 Goat               27.7     115      49.5     93.2
#>  5 Guinea pig          1.04      5.5     2.16     4.38
#>  6 Dipliodocus        50     11700    2962.    8788.
#>  7 Asian elephant   2547      4603    3061     4089
#>  8 Donkey            187.      419     245.     361.
#>  9 Horse             521       655     554.     622.
#> 10 Potar monkey       10       115      36.2     88.8
#> # ℹ 18 more rows
#>
#> $log_msr
#> # A tibble: 28 × 5
#>    .rowname        `rg ..1` `rg ..2` `qt 25%` `qt 75%`
#>    <chr>              <dbl>    <dbl>    <dbl>    <dbl>
#>  1 Mountain beaver   0.300      2.09    0.748     1.64
#>  2 Cow               6.05       6.14    6.07      6.12
#>  3 Grey wolf         3.59       4.78    3.89      4.49
#>  4 Goat              3.32       4.74    3.68      4.39
#>  5 Guinea pig        0.0392     1.70    0.456     1.29
#>  6 Dipliodocus       3.91       9.37    5.28      8.00
#>  7 Asian elephant    7.84       8.43    7.99      8.29
#>  8 Donkey            5.23       6.04    5.43      5.84
#>  9 Horse             6.26       6.48    6.31      6.43
#> 10 Potar monkey      2.30       4.74    2.91      4.13
#> # ℹ 18 more rows

We can observe three things:

  1. dfl stands for long and stacks the elements of the function output into different rows, adding a column to identify the different elements.
  2. dfw stands for wide and puts the elements of the function output into different columns.
  3. Element names are made unique if necessary; unnamed elements, such as the output of range(), get placeholder names like ..1 and ..2.
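The two layouts can be reproduced by hand for a length-2 output such as range(). In this base-R sketch the toy rows r1 and r2 are hypothetical, and the ..1/..2 labels only imitate the placeholder names in the output above; it is not how matrixset builds its tibbles.

```r
# Hypothetical per-row outputs of a length-2 function (range)
outputs <- lapply(list(r1 = c(2, 8), r2 = c(5, 1)), range)

# Long layout: one row per output element, plus a column naming the element
long <- do.call(rbind, lapply(names(outputs), function(r) {
  data.frame(.rowname = r,
             rg.name = paste0("..", seq_along(outputs[[r]])),
             rg = outputs[[r]],
             stringsAsFactors = FALSE)
}))

# Wide layout: one row per margin element, one column per output element
wide <- data.frame(.rowname = names(outputs),
                   do.call(rbind, outputs),
                   stringsAsFactors = FALSE)
names(wide)[-1] <- paste("rg", paste0("..", seq_len(ncol(wide) - 1)))

long
wide
```

With length-1 outputs both layouts would have one row per margin element, which is why _dfl and _dfw look the same in the apply_column_dfl example.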

Knowing the current context

It may happen that you need information about the current group. For this reason, the following context functions are made available:

For instance, a simple way of getting the number of animals per group could be:

animals_ms %>%
  row_group_by(class) %>%
  apply_matrix_dfl(n = ~ current_n_row()) %>%
  .$msr
#> # A tibble: 11 × 2
#>    class            n
#>    <chr>        <int>
#>  1 Canine           1
#>  2 Dinosaurs        3
#>  3 Elephantidae     2
#>  4 Equidae          2
#>  5 Feline           2
#>  6 Macropodidae     1
#>  7 Primate          5
#>  8 Rodent           6
#>  9 Ruminant         4
#> 10 Sus              1
#> 11 Talpidae         1

With common row and column annotation trait

The context functions can also be useful when one or more traits are shared (by name) between rows and columns.

Here’s a pseudo-code example:

# ms_object %>%
#     apply_matrix( ~ {
#       ctrt <- current_column_info()$common_trait
#       rtrt <- current_row_info()$common_trait
#
#       do something with ctrt and rtrt
#     })

Pronouns, or dealing with ambiguous variables

It may happen that a variable in the calling environment shares its name with a trait of a matrixset object.

You can make explicit which version of the variable you are using with the pronouns .data (the trait annotation version) and .env (the calling environment version).
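The .data and .env pronouns come from rlang's tidy evaluation framework. The sketch below shows them with rlang::eval_tidy() directly, not through matrixset, on a hypothetical trait list named traits and a clashing variable unit; we assume the matrixset pronouns behave analogously.

```r
library(rlang)

# Hypothetical column-annotation trait, plus a variable with the same name
traits <- list(unit = c("kg", "g"))
unit <- "lb"

eval_tidy(quote(.data$unit), data = traits)  # trait version: "kg" "g"
eval_tidy(quote(.env$unit), data = traits)   # calling-environment version: "lb"
eval_tidy(quote(unit), data = traits)        # bare name: the data mask wins
```

The last line shows why the pronouns matter: without them, the annotation silently shadows the variable from the calling environment.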

Quasi quotation

library(rlang)  # provides expr()

reg_expr <- expr({
  is_alive <- !is_extinct
  list(lm(.j ~ is_alive + class))
})
animals_ms %>%
  apply_column_dfl(~ mean(.j, trim = .1),
                   MAD = mad,
                   reg = ~ !!reg_expr)
#> $msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       879.  79.5 <lm>
#> 2 brain                      240. 193.  <lm>
#>
#> $log_msr
#> # A tibble: 2 × 4
#>   .colname `mean(.j, trim = 0.1)`   MAD reg
#>   <chr>                     <dbl> <dbl> <list>
#> 1 body                       3.78  3.38 <lm>
#> 2 brain                      4.49  1.71 <lm>
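The !! (bang-bang) operator is rlang's unquoting operator: it evaluates its operand immediately and splices the result into the captured expression. The sketch below shows the mechanism on its own with rlang::expr(), outside of matrixset; trim_level and e are illustrative names.

```r
library(rlang)

trim_level <- 0.1
# !! evaluates trim_level now and splices its value into the captured call
e <- expr(mean(.m, trim = !!trim_level))
e  # mean(.m, trim = 0.1)

# The captured call can later be evaluated with .m bound to some data
eval_tidy(e, data = list(.m = c(1, 2, 3, 100)))
```

Splicing a whole braced expression, as with reg_expr above, works the same way; only the spliced object is a call rather than a number.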

Multivariate

mutate_matrix

