A probability mass function can be represented by a multi-dimensional array. However, for high-dimensional distributions where each variable may have a large state space, lack of computer memory can become a problem. For example, an \(80\)-dimensional random vector in which each variable has \(10\) levels leads to a state space with \(10^{80}\) cells. Such a distribution cannot be stored in a computer; in fact, \(10^{80}\) is one of the estimates of the number of atoms in the universe. If, on the other hand, the array contains only a few non-zero values, we need only store these values along with information about their location, that is, a sparse representation of the table. The sparta package was created for efficient multiplication and marginalization of such sparse tables.
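The idea of storing only the non-zero values together with their location can be sketched in base R. The helper `to_sparse` below is hypothetical: it is not part of sparta and not necessarily how sparta stores tables internally. It turns each non-zero cell into one column of a matrix of level codes and keeps the values and the domain alongside.

```r
# Hypothetical helper, for illustration only (not part of sparta):
# keep only the non-zero cells of a dense array.
to_sparse <- function(a) {
  nz <- which(a != 0, arr.ind = TRUE)     # one row per non-zero cell
  cells <- t(unname(nz))                  # transpose: one column per cell
  rownames(cells) <- names(dimnames(a))   # one row per variable
  list(cells = cells, vals = a[nz], dim_names = dimnames(a))
}

# A 3 x 3 x 3 table (27 cells) with only two non-zero entries
lv <- function(p) paste0(p, 1:3)
a <- array(0, c(3, 3, 3), list(A = lv("a"), B = lv("b"), C = lv("c")))
a["a1", "b2", "c3"] <- 0.4
a["a3", "b1", "c1"] <- 0.6

s <- to_sparse(a)
s$cells   # columns (3, 1, 1) and (1, 2, 3): the cell coordinates
s$vals    # the two stored values, 0.6 and 0.4
```

Here 27 dense cells shrink to 2 stored values plus 6 level codes. The sparse form grows with the number of non-zero cells times the number of variables, not with the size of the full state space.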
Consider two arrays `f` and `g`:
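```r
# dn() builds dimnames, e.g. c("x", "y") -> list(X = c("x1", "x2"), Y = c("y1", "y2"))
dn <- function(x) setNames(lapply(x, paste0, 1:2), toupper(x))
d  <- c(2, 2, 2)
f  <- array(c(5, 4, 0, 7, 0, 9, 0, 0), d, dn(c("x", "y", "z")))
g  <- array(c(7, 6, 0, 6, 0, 0, 9, 0), d, dn(c("y", "z", "w")))
```

The arrays have the following flat layouts: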
```r
ftable(f, row.vars = "X")
#>    Y y1    y2
#>    Z z1 z2 z1 z2
#> X
#> x1    5  0  0  0
#> x2    4  9  7  0
ftable(g, row.vars = "W")
#>    Y y1    y2
#>    Z z1 z2 z1 z2
#> W
#> w1    7  0  6  6
#> w2    0  9  0  0
```

We can convert these to their equivalent sparta versions, `sf` and `sg`, using `as_sparta()`.
Printing `sf` with the default printing method yields:
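```r
print.default(sf)
#>   [,1] [,2] [,3] [,4]
#> X    1    2    2    2
#> Y    1    1    2    1
#> Z    1    1    1    2
#> attr(,"vals")
#> [1] 5 4 7 9
#> attr(,"dim_names")
#> attr(,"dim_names")$X
#> [1] "x1" "x2"
#>
#> attr(,"dim_names")$Y
#> [1] "y1" "y2"
#>
#> attr(,"dim_names")$Z
#> [1] "z1" "z2"
#>
#> attr(,"class")
#> [1] "sparta" "matrix"
```

The columns are the cells in the sparse matrix, and the `vals` attribute holds the corresponding values, which can be extracted with the `vals` function. Furthermore, the domain resides in the `dim_names` attribute, which can also be extracted using the `dim_names` function. From the output, we see that the cell (x2, y2, z1), the third column, has the value \(7\). The sparta print method, `print(sf)`, prettifies this into a table with one row per non-zero cell and the values in a final `val` column, where row \(i\) corresponds to column \(i\) in the sparse matrix. The product of `sf` and `sg` is computed with `mult(sf, sg)`; it has only four non-zero cells (with values \(35\), \(28\), \(42\) and \(81\)), because a cell of the product is non-zero only when the matching cells of both factors are non-zero.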
Converting `sf` into a conditional probability table (CPT) with conditioning variable `Z`:
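```r
sf_cpt <- as_cpt(sf, y = "Z"); sf_cpt
#>   X Y Z   val
#> 1 1 1 1 0.312
#> 2 2 1 1 0.250
#> 3 2 2 1 0.438
#> 4 2 1 2 1.000
```

Within each level of `Z` the values now sum to one: the three cells with `Z = z1` have the values \(5\), \(4\) and \(7\) in `sf`, and each is divided by their total \(16\).

Slicing `sf` on `X = x1` and dropping the `X` dimension reduces `sf` to a single non-zero element, whereas the equivalent dense slice would be a \((Y, Z)\) table with one non-zero element and three zero elements.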
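Both operations above can be mimicked directly on the cell matrix of `sf`. The base-R sketch below is illustrative only: `sparse_cpt` and `sparse_slice` are hypothetical helpers, not sparta functions, and `sparse_cpt` handles just a single conditioning variable. The cell matrix and values are copied from the `print.default(sf)` output.

```r
# Sparse form of f: one column per non-zero cell, one row per variable
cells <- rbind(X = c(1, 2, 2, 2),
               Y = c(1, 1, 2, 1),
               Z = c(1, 1, 1, 2))
vals <- c(5, 4, 7, 9)
dim_names <- list(X = c("x1", "x2"), Y = c("y1", "y2"), Z = c("z1", "z2"))

# Hypothetical: divide each value by the total of its conditioning group
sparse_cpt <- function(cells, vals, given) {
  vals / ave(vals, cells[given, ], FUN = sum)
}

# Hypothetical: keep the cells where `var` equals `level`, then drop `var`
sparse_slice <- function(cells, vals, dim_names, var, level) {
  keep <- cells[var, ] == match(level, dim_names[[var]])
  rest <- setdiff(rownames(cells), var)
  list(cells = cells[rest, keep, drop = FALSE], vals = vals[keep])
}

sparse_cpt(cells, vals, "Z")   # 5/16, 4/16, 7/16 and 9/9
s <- sparse_slice(cells, vals, dim_names, "X", "x1")
s$cells   # a single cell: Y = 1, Z = 1
s$vals    # 5

# Dense slice for comparison: a 2 x 2 table, three of whose cells are zero
f <- array(c(5, 4, 0, 7, 0, 9, 0, 0), c(2, 2, 2), dim_names)
f["x1", , ]
```

The sparse slice only filters columns, so its cost depends on the number of stored cells rather than on the size of the dense domain.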
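Marginalizing (or summing) out `Y` in `sg` with `marg()` gives a table over \((Z, W)\) with three non-zero cells, (z1, w1) = 13, (z2, w1) = 6 and (z2, w2) = 9, since the cells (y1, z1, w1) = 7 and (y2, z1, w1) = 6 of `sg` are added together.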
Finally, a sparse table can also be created directly with the constructor `sparta_struct`, which is necessary when the corresponding dense table is too large to fit in memory.
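Why direct construction matters can be illustrated in base R with the \(80\)-variable, \(10\)-level example from the introduction. This sketch does not call `sparta_struct`, whose interface this section does not describe; the layout (a matrix of level codes plus a value vector) and the `lookup` helper are assumptions made for illustration only.

```r
n_vars <- 80
n_lev  <- 10

# Three non-zero cells given directly as level codes (one column per cell);
# the 10^80-cell dense array is never allocated
cells <- cbind(rep(1L, n_vars),
               rep(2L, n_vars),
               rep(1:2, length.out = n_vars))
rownames(cells) <- paste0("V", seq_len(n_vars))
vals <- c(0.2, 0.3, 0.5)

# Hypothetical lookup: the stored value of a configuration, or 0 if absent
lookup <- function(cells, vals, config) {
  hit <- which(colSums(cells == config) == nrow(cells))
  if (length(hit) > 0) vals[hit] else 0
}

length(cells) + length(vals)            # 243 numbers stored
n_lev^n_vars                            # dense cell count, 1e+80
lookup(cells, vals, rep(2L, n_vars))    # 0.3
lookup(cells, vals, rep(3L, n_vars))    # 0: cells not stored are zero
```

A cell that is not stored is implicitly zero, which is what lets the representation skip the vast majority of the state space.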
| Function name | Description |
|---|---|
| `as_sparta` | Convert a dense (e.g. array-like) object to a `sparta` object |
| `as_<array/df/cpt>` | Convert a `sparta` object to an array/data.frame/CPT |
| `sparta_struct` | Constructor for `sparta` objects |
| `mult`, `div`, `marg`, `slice` | Multiply/divide/marginalize/slice |
| `normalize` | Normalize (the values of the result sum to one) |
| `get_val` | Extract the value for a specific named cell |
| `get_cell_name` | Extract the named cell |
| `get_values` | Extract the values |
| `dim_names` | Extract the domain |
| `names` | Extract the variable names |
| `max`/`min` | The maximum/minimum value |
| `which_<max/min>_cell` | The column index referring to the max/min value |
| `which_<max/min>_idx` | The configuration corresponding to the max/min value |
| `sum` | Sum the values |
| `equiv` | Test if two tables are identical up to permutations of the columns |
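The semantics of `mult` and `marg` on sparse tables can be sketched in base R by storing a table as a data frame with one column of level codes per variable plus a `val` column, the layout shown for `sf_cpt` above. Multiplication is then an inner join on the shared variables, since a product cell is non-zero only if both factor cells are; marginalization is a grouped sum. `sparse_mult` and `sparse_marg` are hypothetical helpers, not sparta's implementation, and the data frames are transcribed from the non-zero cells of `f` and `g`.

```r
# Non-zero cells of f and g as data frames (level codes plus values)
sf_df <- data.frame(X = c(1, 2, 2, 2), Y = c(1, 1, 2, 1), Z = c(1, 1, 1, 2),
                    val = c(5, 4, 7, 9))
sg_df <- data.frame(Y = c(1, 2, 2, 1), Z = c(1, 1, 2, 2), W = c(1, 1, 1, 2),
                    val = c(7, 6, 6, 9))

# Hypothetical product: join on shared variables, multiply the values
sparse_mult <- function(a, b) {
  shared <- setdiff(intersect(names(a), names(b)), "val")
  m <- merge(a, b, by = shared, suffixes = c(".a", ".b"))  # inner join
  m$val <- m$val.a * m$val.b
  m$val.a <- NULL
  m$val.b <- NULL
  m
}

# Hypothetical marginalization: sum values over the variables in `out`
sparse_marg <- function(a, out) {
  keep <- setdiff(names(a), c(out, "val"))
  aggregate(a["val"], by = a[keep], FUN = sum)
}

sparse_mult(sf_df, sg_df)   # four cells with values 28, 35, 42 and 81
sparse_marg(sg_df, "Y")     # (Z, W) cells: (1, 1) = 13, (2, 1) = 6, (2, 2) = 9
```

The join never touches combinations that are zero in either factor, which is the source of the savings when both tables are sparse.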