A probability mass function can be represented by a multi-dimensional array. However, for high-dimensional distributions where each variable may have a large state space, computer memory can become a problem. For example, an \(80\)-dimensional random vector in which each variable has \(10\) levels leads to a state space with \(10^{80}\) cells. Such a distribution cannot be stored in a computer; in fact, \(10^{80}\) is one of the estimates of the number of atoms in the universe. However, if the array consists of only a few non-zero values, we need only store these values along with information about their location, that is, a sparse representation of the table. The sparta package was created for efficient multiplication and marginalization of sparse tables.
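The storage argument can be made concrete with a little arithmetic. The sketch below uses base R only; `n_nonzero` is a hypothetical count chosen for illustration, and the sparse cost model (one index per variable plus one value for each non-zero cell) is an assumption about the general idea rather than a statement about sparta's internals.

```r
# Dense storage: one cell for every combination of levels.
n_vars   <- 80                      # dimension of the random vector
n_levels <- 10                      # levels per variable
dense_cells <- n_levels^n_vars      # 10^80 cells, far beyond any memory

# Sparse storage: for each non-zero cell, keep its n_vars indices and its value.
n_nonzero <- 1000                   # hypothetical number of non-zero cells
sparse_numbers <- n_nonzero * (n_vars + 1)

format(dense_cells, scientific = TRUE)
sparse_numbers
#> [1] 81000
```

Under these assumptions the sparse table needs 81,000 stored numbers, regardless of how large the full state space is.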

How to use sparta

library(sparta)

Consider two arrays `f` and `g`:

dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d  <- c(2, 2, 2)
f  <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g  <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))

with flat layouts

ftable(f, row.vars = "X")
#>    Y y1    y2
#>    Z z1 z2 z1 z2
#> X
#> x1    5  0  0  0
#> x2    4  9  7  0
ftable(g, row.vars = "W")
#>    Y y1    y2
#>    Z z1 z2 z1 z2
#> W
#> w1    7  0  6  6
#> w2    0  9  0  0

We can convert these to their equivalent `sparta` versions as

sf <- as_sparta(f); sg <- as_sparta(g)

Printing the object by the default printing method yields

print.default(sf)
#>   [,1] [,2] [,3] [,4]
#> X    1    2    2    2
#> Y    1    1    2    1
#> Z    1    1    1    2
#> attr(,"vals")
#> [1] 5 4 7 9
#> attr(,"dim_names")
#> attr(,"dim_names")$X
#> [1] "x1" "x2"
#>
#> attr(,"dim_names")$Y
#> [1] "y1" "y2"
#>
#> attr(,"dim_names")$Z
#> [1] "z1" "z2"
#>
#> attr(,"class")
#> [1] "sparta" "matrix"
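The layout printed above, one column of level indices per non-zero cell plus a vector of values, can be reproduced from the dense array with base R. This is only a sketch of the idea and makes no claim about how `as_sparta` is implemented; the names `nz`, `cells` and `values` are hypothetical.

```r
f <- array(c(5, 4, 0, 7, 0, 9, 0, 0), dim = c(2, 2, 2),
           dimnames = list(X = c("x1", "x2"),
                           Y = c("y1", "y2"),
                           Z = c("z1", "z2")))

# Index triples of the non-zero cells, one row per cell, in column-major order.
nz <- which(f != 0, arr.ind = TRUE, useNames = FALSE)

# Transpose so that each column is a cell, with one row per variable.
cells <- t(nz)
rownames(cells) <- names(dimnames(f))

# Values of the non-zero cells, in the same order as the columns of `cells`.
values <- f[nz]

cells
values
```

For this `f`, `values` is `5 4 7 9` and the third column of `cells` is `(2, 2, 1)`, which matches the third column and third value in the printed object.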

The columns are the cells in the sparse matrix, and the `vals` attribute holds the corresponding values, which can be extracted with the `vals` function. Furthermore, the domain resides in the `dim_names` attribute, which can also be extracted using the `dim_names` function. From the output, we see that (`x2`, `y2`, `z1`) has a value of \(7\). Using the `sparta` print method prettifies things:

print(sf)
#>   X Y Z val
#> 1 1 1 1   5
#> 2 2 1 1   4
#> 3 2 2 1   7
#> 4 2 1 2   9

where row \(i\) corresponds to column \(i\) in the sparse matrix. The product of `sf` and `sg` is

mfg <- mult(sf, sg); mfg
#>   X Y Z W val
#> 1 2 1 2 2  81
#> 2 2 2 1 1  42
#> 3 1 1 1 1  35
#> 4 2 1 1 1  28
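As a cross-check, the same product can be computed on the dense arrays with base R. The sketch below assumes the usual table product \(h(X,Y,Z,W) = f(X,Y,Z)\,g(Y,Z,W)\); the name `h` is hypothetical and the loop is written for clarity, not speed.

```r
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d  <- c(2, 2, 2)
f  <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g  <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))

# Dense result over (X, Y, Z, W): 16 cells in total.
h <- array(0, c(2, 2, 2, 2), dn(c("x", "y", "z", "w")))
for (w in 1:2) {
  # g[, , w] is a (Y, Z) slice; repeating each entry twice aligns it with
  # the X dimension of f, which varies fastest in memory.
  h[, , , w] <- f * rep(g[, , w], each = 2)
}

sum(h != 0)               # number of non-zero cells in the dense product
h["x2", "y1", "z2", "w2"] # 9 * 9
```

Only 4 of the 16 dense cells are non-zero, and they are the four rows of `mfg`; the dense version still has to store the other 12 zeros.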

Converting `sf` into a conditional probability table (CPT) with conditioning variable `Z`:

sf_cpt <- as_cpt(sf, y = "Z"); sf_cpt
#>   X Y Z   val
#> 1 1 1 1 0.312
#> 2 2 1 1 0.250
#> 3 2 2 1 0.438
#> 4 2 1 2 1.000
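The values above are consistent with normalizing `f` separately within each level of `Z`. A base R sketch of that reading on the dense array follows; it is an assumption inferred from the printed output, not a description of the internals of `as_cpt`, and `z_totals` and `cpt` are hypothetical names.

```r
f <- array(c(5, 4, 0, 7, 0, 9, 0, 0), dim = c(2, 2, 2),
           dimnames = list(X = c("x1", "x2"),
                           Y = c("y1", "y2"),
                           Z = c("z1", "z2")))

# Total mass within each level of the conditioning variable Z.
z_totals <- apply(f, 3, sum)        # z1 = 16, z2 = 9

# Divide every Z-slice by its own total.
cpt <- sweep(f, 3, z_totals, "/")

round(cpt[, , "z1"], 3)             # 5/16, 4/16 and 7/16, rounded
apply(cpt, 3, sum)                  # each slice now sums to one
```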

Slicing `sf` on `X = x1` and dropping the `X` dimension

slice(sf, s = c(X = "x1"), drop = TRUE)
#>   Y Z val
#> 1 1 1   5

reduces `sf` to a single non-zero element, whereas the equivalent dense case would result in a `(Y,Z)` table with one non-zero element and three zero elements.
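The dense counterpart of this slice is ordinary array indexing in base R. The sketch below only illustrates the comparison made above; `dense_slice` is a hypothetical name.

```r
f <- array(c(5, 4, 0, 7, 0, 9, 0, 0), dim = c(2, 2, 2),
           dimnames = list(X = c("x1", "x2"),
                           Y = c("y1", "y2"),
                           Z = c("z1", "z2")))

# Fix X = x1; the X dimension is dropped by default, leaving a (Y, Z) table.
dense_slice <- f["x1", , ]

dim(dense_slice)          # 2 x 2, so four stored cells
sum(dense_slice != 0)     # only one of them is non-zero
```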

Marginalizing (or summing) out `Y` in `sg` yields

marg(sg, y = c("Y"))
#>   Z W val
#> 1 2 2   9
#> 2 2 1   6
#> 3 1 1  13
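The same marginal can be checked on the dense array by summing over the `Y` dimension with base R. This is a sketch for comparison only; `dense_marg` is a hypothetical name.

```r
g <- array(c(7, 6, 0, 6, 0, 0, 9, 0), dim = c(2, 2, 2),
           dimnames = list(Y = c("y1", "y2"),
                           Z = c("z1", "z2"),
                           W = c("w1", "w2")))

# Keep dimensions 2 (Z) and 3 (W) and sum over dimension 1 (Y).
dense_marg <- apply(g, c(2, 3), sum)

dense_marg["z1", "w1"]    # 7 + 6 = 13
dense_marg["z1", "w2"]    # 0: the dense table keeps this cell, marg() does not
```

The dense marginal has four cells, one of which is zero, while the sparse result lists only the three non-zero ones.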

Finally, we mention that a sparse table can be created using the constructor `sparta_struct`, which can be necessary if the corresponding dense table is too large to hold in memory.

Functionalities in sparta

Function name	Description
`as_<sparta>`	Convert array-like object to a `sparta`
`as_<array/df/cpt>`	Convert `sparta` object to an `array/data.frame/CPT`
`sparta_struct`	Constructor for `sparta` objects
`mult, div, marg, slice`	Multiply/divide/marginalize/slice
`normalize`	Normalize (the values of the result sum to one)
`get_val`	Extract the value for a specific named cell
`get_cell_name`	Extract the named cell
`get_values`	Extract the values
`dim_names`	Extract the domain
`names`	Extract the variable names
`max/min`	The maximum/minimum value
`which_<max/min>_cell`	The column index referring to the max/min value
`which_<max/min>_idx`	The configuration corresponding to the max/min value
`sum`	Sum the values
`equiv`	Test if two tables are identical up to permutations of the columns
