corybrunson/pltPublic

NotificationsYou must be signed in to change notification settings
Fork2
Star1

R interface & extension to an adapted Persistence Landscapes Toolbox

License

GPL-3.0 license

1 star 2 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 205 Commits
R		R
docs		docs
inst/examples		inst/examples
man		man
sandbox		sandbox
src		src
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
plt.Rproj		plt.Rproj

Repository files navigation

plt

Persistence landscapes are a vectorization of persistence data/diagramsthat have useful statistical properties including linearity and an innerproduct.¹ This is an R package interface to a C++ library toefficienctly compute and calculate with persistence landscapes.²

Installation

Until the package is on CRAN, usepak to install the package fromthe GitHub repository as follows:

install.packages("pak")pak::pkg_install("corybrunson/plt")

Alternatively—and especially if you want to contribute—you can clone ordownload the code repository and, from within the directory, install thepackage from source:

devtools::install()

You should now be able to load the package normally from an R session:

library(plt)

Quickstart Guide

Theplt package supports various operations involving persistencelandscapes:

Compute persistence landscapes from persistence data
Plot persistence landscapes
Perform Hilbert space operations (scaling, addition, inner product)plus some additional queries and transformations (extremal values,absolute value, integration) on persistence landscapes
Conduct hypothesis tests on samples of persistence landscapes
Vectorize persistence landscapes for other purposes including machinelearning

Examples and tests inplt rely on other packages to simulate dataand to compute persistence diagrams from data:

tdaunif provides functions to sample uniformly from variousimmersed manifolds.
ripserr andTDA provide functions to compute persistence datafrom point clouds and distance matrices.

plt introduces the ‘Rcpp_PersistenceLandscape’ S4 class, which isexposed usingRcpp from the underlying ‘PersistenceLandscape’ C++class. Instances of this class can be created usingnew() but therecommended way is to usepl_new(). This function accepts either asingle matrix of persistence data or a specially formatted list with theclass'persistence_diagram". The$pairs entry of the list is itselfa list, of a 2-column matrix of persistence pairs for each homologicaldegree from 0 ($pairs[[1]]) to the maximum degree calculated. Thegeneric converteras_persistence() (imported fromphutil) includesmethods for outputs fromripserr::vietoris_rips() and fromTDA::*Diag(); it operates under the hood ofpl_new(), but we invokeit explicitly here for illustration.

Calculation

To begin an illustration, we noisily sample 60 points from a figureeight and compute the persistence diagram of the point cloud:

set.seed(513611L)pc<-tdaunif::sample_lemniscate_gerono(60,sd=.1)plot(pc,asp=1,pch=16L)

pd<-ripserr::vietoris_rips(pc,dim=1,threshold=2,p=2)#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.print(pd)#> PHom object containing persistence data for 63 features.#>#> Contains:#> * 59 0-dim features#> * 4 1-dim features#>#> Radius/diameter: min = 0; max = 0.63582.

We the convert the persistence data to the preferred persistence diagramformat and inspect some of its features:

pd<- as_persistence(pd)print(pd)#> 'persistence' data computed up to degree 1:#>#> * 0-degree features: 59#> * 1-degree features:  4print(head(pd$pairs[[1]]))#>      [,1]       [,2]#> [1,]    0 0.01918952#> [2,]    0 0.01947548#> [3,]    0 0.02604350#> [4,]    0 0.04218479#> [5,]    0 0.04542467#> [6,]    0 0.05941691print(head(pd$pairs[[2]]))#>           [,1]      [,2]#> [1,] 0.4809292 0.6358225#> [2,] 0.3016234 0.6075172#> [3,] 0.2504500 0.2727915#> [4,] 0.2251884 0.2300871

This allows us to compute a persistence landscape—in this case, for the1-dimensional features. Here we compute the landscape exactly, which canbe cost-prohibitive for larger persistence data, and print its summary:

pl1<- pl_new(pd,degree=1,exact=TRUE)print(pl1)#> Persistence landscape (exact format) of 2 levels over (0,0.636)summary(pl1)#> Internal representation:  exact#> Number of levels:  2#> Representation limits: ( 0.22519 , 0.6358 )#> Landscape range: ( 0 , 0.15295 )#> Magnitude:  0.0026959#> Integral:   0.029522

Some advanced concepts like the magnitude of a landscape will beexplained below.

Class

The objectpl1 is not an array, but rather an object that encapsulatesboth the data that encode a landscape and several basic operations thatcan be performed on it. This allows us to work with persistencelandscapes without worrying about pre-processing their representations.At any point, the underlying encoding of the landscape can be extractedusing$getInternal(), which in the case of an exactly calculatedlandscape returns a list of 2-column matrices, each matrix containingcoordinates that define one level of the landscape as a piecewise linearfunction:

print(length(pl1$getInternal()))#> [1] 2print(pl1$getInternal())#> [[1]]#>            [,1]        [,2]#>  [1,]      -Inf 0.000000000#>  [2,] 0.2251884 0.000000000#>  [3,] 0.2276378 0.002449358#>  [4,] 0.2300871 0.000000000#>  [5,] 0.2504500 0.000000000#>  [6,] 0.2616207 0.011170764#>  [7,] 0.2727915 0.000000000#>  [8,] 0.3016234 0.000000000#>  [9,] 0.4545703 0.152946885#> [10,] 0.5442232 0.063293964#> [11,] 0.5583759 0.077446647#> [12,] 0.6358225 0.000000000#> [13,]       Inf 0.000000000#>#> [[2]]#>           [,1]       [,2]#> [1,]      -Inf 0.00000000#> [2,] 0.4809292 0.00000000#> [3,] 0.5442232 0.06329396#> [4,] 0.6075172 0.00000000#> [5,]       Inf 0.00000000

An alternative, approximate construction computes the value of eachlevel of the landscape at each point on a 1-dimensional grid, rangingfromxmin toxmax at increments ofxby. A landscape constructedusing a discrete approximation is stored as a 3-dimensional array ofdimensions (levels, values, 2), with one level per feature (some ofwhich may be trivial) and one value per grid point, stored as$x,y$pairs along the third dimension.

b_ran<- pl_support(pl1)pl1d<- pl_new(pd,degree=1,xmin=b_ran[[1L]],xmax=b_ran[[2L]],xby=0.05)print(dim(pl1d$getInternal()))#> [1]  4 10  2print(pl1d$getInternal())#> , , 1#>#>           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]#> [1,] 0.2251884 0.2751884 0.3251884 0.3751884 0.4251884 0.4751884 0.5251884#> [2,] 0.2251884 0.2751884 0.3251884 0.3751884 0.4251884 0.4751884 0.5251884#> [3,] 0.2251884 0.2751884 0.3251884 0.3751884 0.4251884 0.4751884 0.5251884#> [4,] 0.2251884 0.2751884 0.3251884 0.3751884 0.4251884 0.4751884 0.5251884#>           [,8]      [,9]     [,10]#> [1,] 0.5751884 0.6251884 0.6751884#> [2,] 0.5751884 0.6251884 0.6751884#> [3,] 0.5751884 0.6251884 0.6751884#> [4,] 0.5751884 0.6251884 0.6751884#>#> , , 2#>#>      [,1] [,2]       [,3]       [,4]     [,5]      [,6]       [,7]       [,8]#> [1,]    0    0 0.02356502 0.07356502 0.123565 0.1323288 0.08232875 0.06063412#> [2,]    0    0 0.00000000 0.00000000 0.000000 0.0000000 0.04425917 0.03232875#> [3,]    0    0 0.00000000 0.00000000 0.000000 0.0000000 0.00000000 0.00000000#> [4,]    0    0 0.00000000 0.00000000 0.000000 0.0000000 0.00000000 0.00000000#>            [,9] [,10]#> [1,] 0.01063412     0#> [2,] 0.00000000     0#> [3,] 0.00000000     0#> [4,] 0.00000000     0

Exactly computed landscapes can be converted to discrete landscapeobjects, but the other direction is not well-defined. Below, we view aportion of the discretized exact landscape, using the default bounds andresolution given topl1:

# default conversion to discrete uses `xby = 0.001`pl1_<-pl1$discretize()print(dim(pl1_$getInternal()))#> [1]   2 636   2# print first 12 x-coordinatespl1_$getInternal()[, seq(230L,270L), ,drop=FALSE]#> , , 1#>#>       [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]#> [1,] 0.229 0.23 0.231 0.232 0.233 0.234 0.235 0.236 0.237 0.238 0.239  0.24#> [2,] 0.229 0.23 0.231 0.232 0.233 0.234 0.235 0.236 0.237 0.238 0.239  0.24#>      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]#> [1,] 0.241 0.242 0.243 0.244 0.245 0.246 0.247 0.248 0.249  0.25 0.251 0.252#> [2,] 0.241 0.242 0.243 0.244 0.245 0.246 0.247 0.248 0.249  0.25 0.251 0.252#>      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]#> [1,] 0.253 0.254 0.255 0.256 0.257 0.258 0.259  0.26 0.261 0.262 0.263 0.264#> [2,] 0.253 0.254 0.255 0.256 0.257 0.258 0.259  0.26 0.261 0.262 0.263 0.264#>      [,37] [,38] [,39] [,40] [,41]#> [1,] 0.265 0.266 0.267 0.268 0.269#> [2,] 0.265 0.266 0.267 0.268 0.269#>#> , , 2#>#>              [,1]         [,2]         [,3]         [,4]         [,5]#> [1,] 0.0006539916 5.241935e-05 4.985605e-05 4.729276e-05 4.472946e-05#> [2,] 0.0000000000 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00#>              [,6]         [,7]         [,8]         [,9]        [,10]#> [1,] 4.216616e-05 3.960287e-05 3.703957e-05 3.447627e-05 3.191297e-05#> [2,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00#>             [,11]        [,12]        [,13]        [,14]        [,15]#> [1,] 2.934968e-05 2.678638e-05 2.422308e-05 2.165979e-05 1.909649e-05#> [2,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00#>             [,16]        [,17]       [,18]      [,19]        [,20]        [,21]#> [1,] 1.653319e-05 1.396989e-05 1.14066e-05 8.8433e-06 6.280002e-06 3.716705e-06#> [2,] 0.000000e+00 0.000000e+00 0.00000e+00 0.0000e+00 0.000000e+00 0.000000e+00#>             [,22]        [,23]       [,24]       [,25]       [,26]      [,27]#> [1,] 1.153408e-06 0.0009623328 0.001923512 0.002884692 0.003845871 0.00480705#> [2,] 0.000000e+00 0.0000000000 0.000000000 0.000000000 0.000000000 0.00000000#>           [,28]       [,29]       [,30]       [,31]       [,32]      [,33]#> [1,] 0.00576823 0.006729409 0.007690589 0.008651768 0.009612947 0.01057413#> [2,] 0.00000000 0.000000000 0.000000000 0.000000000 0.000000000 0.00000000#>            [,34]      [,35]       [,36]       [,37]       [,38]       [,39]#> [1,] 0.009677368 0.00878061 0.007883851 0.006987093 0.006090334 0.005193576#> [2,] 0.000000000 0.00000000 0.000000000 0.000000000 0.000000000 0.000000000#>            [,40]       [,41]#> [1,] 0.004296817 0.003400059#> [2,] 0.000000000 0.000000000

We can also specify the bounds and the resolution of the discretization:

pl1<- pl_delimit(pl1,xmin=0,xmax=1,xby=0.1)pl1_<- pl_discretize(pl1)pl1_$getInternal()#> , , 1#>#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]#> [1,]    0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8   0.9     1#> [2,]    0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8   0.9     1#>#> , , 2#>#>      [,1] [,2] [,3]       [,4]      [,5]       [,6]        [,7] [,8] [,9] [,10]#> [1,]    0    0    0 0.09837659 0.1075172 0.03582254 0.000000000    0    0     0#> [2,]    0    0    0 0.00000000 0.0000000 0.04388611 0.003068344    0    0     0#>      [,11]#> [1,]     0#> [2,]     0

Visualization

plt provides aplot() method for the ‘Rcpp_PersistenceLandscape’class. It usesgrDevices to build color palettes, and as such itsdefault palette is viridis; but the user may supply the name of arecognized palette or a sequence of colors between which to interpolate:

n_levs<- max(pl_num_levels(pl1), pl_num_levels(pl1d))par(mfrow= c(2L,1L),mar= c(2,2,0,2))plot(pl1,palette="terrain",n_levels=n_levs,asp=1)plot(pl1d,palette="terrain",n_levels=n_levs,asp=1)

par(mfrow= c(1L,1L),mar= c(5.1,4.1,4.1,2.1))

Hilbert Space Operations

To illustrate these features, we first generate a companion data set:

# a new landscape and its discretizationset.seed(772888L)pc2<-tdaunif::sample_circle(60,sd=.1)/2pd2<-ripserr::vietoris_rips(pc2,dim=1,threshold=2,p=2)#> Warning in vietoris_rips.matrix(pc2, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.pl2<- pl_new(pd2,degree=1,exact=TRUE)pl2<- pl_delimit(pl2,xmin=0,xmax=2,xby=0.1)pl2_<- pl_discretize(pl2)

Several infix operators have been taught to work in the natural way withlandscapes, so that users can explore vector space operations and innerproducts more conveniently:

par(mfcol= c(3L,2L),mar= c(2,2,0,2))# vector space operations on exact landscapesplot(pl1*2)plot(pl2)plot(pl1*2+pl2)# vector space operations on discrete landscapesplot(pl1_)plot(-pl2_)plot(2*pl1_-pl2_)

par(mfrow= c(1L,1L),mar= c(5.1,4.1,4.1,2.1))# inner products of exact and discrete landscapespl1%*%pl2#> [1] 0.005336876pl1_%*%pl2_#> [1] 0.004917171pl1%*%pl2_#> [1] 0.004917171

(Note that the landscapes are automatically delimited to a compatibledomain.)

Calculus

Thesummary() method above reported the magnitude and the integral ofthe persistence landscapepl1. The magnitude is the inner product ofpl1 with itself:pl1 %*% pl1 = 0.0026959. Meanwhile, the integral isthe (signed) area under the curve, itself also a linear operator:

pl_integrate(pl1)#> [1] 0.02952152pl_integrate(pl2)#> [1] 0.07295263# 1-integral obeys linearitypl_integrate(pl1)*2- pl_integrate(pl2)#> [1] -0.01390959pl_integrate(pl1*2-pl2)#> [1] -0.01390959

The distance between two landscapes is defined in terms of the integralof their absolute difference for finite norms and the maximum pointwisedistance for the infinite norm. Note that, becausepl_integrate()defaults to the 1-norm andpl_distance() defaults to the 2-norm, wemust be careful when comparing their results:

# using the 1-normpl_integrate(pl_abs(pl1*2-pl2),p=1)#> [1] 0.03560593pl_distance(pl1*2,pl2,p=1)#> [1] 0.03560593# using the 2-normpl_integrate(pl_abs(pl1*2-pl2),p=2)^ (1/2)#> [1] 0.05041889pl_distance(pl1*2,pl2,p=2)#> [1] 0.05041889# using the infinity normpl_vmax(pl_abs(pl1*2-pl2))#> [1] 0.1178478 0.1265879pl_distance(pl1*2,pl2,p=Inf)#> [1] 0.1265879

The norm of a persistence landscape is then defined as its distance fromthe null landscape that is constant at zero. (The following code chunkis not evaluated, pending a debug ofpl_new() when fed an emptymatrix.)

# null landscapepd0<-data.frame(start= double(0L),end= double(0L))# FIXME: Enable landscape construction from empty persistence data.pl0<- pl_new(pd0,degree=1,exact=TRUE)pl_distance(pl1,pl0)pl_norm(pl1)

Statistical Analysis

Finally,plt implements the two hypothesis tests described in theoriginal paper. To illustrate, we first generate lists of landscapes forsamples from two spaces:

# samples of landscapes from lemniscatespl1s<- replicate(6, {pc<-tdaunif::sample_lemniscate_gerono(60,sd=.1)pd<-ripserr::vietoris_rips(pc,dim=1,threshold=2,p=2)  pl_new(pd,degree=1,xby=.01)})#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.# samples of landscapes from circlespl2s<- replicate(8, {pc<-tdaunif::sample_circle(60,sd=.1)/2pd<-ripserr::vietoris_rips(pc,dim=1,threshold=2,p=2)  pl_new(pd,degree=1,xby=.01)})#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.#> Warning in vietoris_rips.matrix(pc, dim = 1, threshold = 2, p = 2): `dim`#> parameter has been deprecated; use `max_dim` instead.

An inspection of the mean landscapes makes clear that they are distinct:

# average landscape from each samplepar(mfcol= c(2L,1L),mar= c(2,2,0,2))plot(pl_mean(pl1s))plot(pl_mean(pl2s))

par(mfrow= c(1L,1L),mar= c(5.1,4.1,4.1,2.1))

However, the two hypothesis tests use different procedures and rely ondifferent test statistics, so one may be more effective than another.For convenience, both methods return objects of class'htest' forconvenient printing:

# z-test of difference in integrals of first levelpl_z_test(pl1s,pl2s)#>#>  z-test#>#> data:#> z = -2.3584, df = 12, p-value = 0.01835#> alternative hypothesis: true difference in means is not equal to 0#> 95 percent confidence interval:#>  -0.067781063 -0.006254089#> sample estimates:#> mean integral of x mean integral of y#>         0.02664640         0.06366397# permutation test of pairwise distances between landscapespl_perm_test(pl1s,pl2s)#>#>  permutation test#>#> data:#> p-value < 2.2e-16#> alternative hypothesis: true distance between mean landscapes is greater than 0#> sample estimates:#> distance between mean landscapes#>                       0.06631177

Acknowledgments

Precursors

The C++ library is adapted fromPaweł Dłotko’s Persistence LandscapeToolbox.It was originally adapted and ported to R inJose Bouza’stda-toolspackage.

Resources

Development of this package benefitted from the use of equipment and thesupport of colleagues at theUniversity ofFlorida, especiallyPeter Bubenik’s researchgroup and theLaboratory for SystemsMedicine.

Contribute

Bug reports, unit tests, documentation, use cases, feature suggestions,and other contributions are welcome. See theCONTRIBUTINGfile for guidance, and please respect theCode ofConduct.

Cite

If you useplt in published work, please include a citationfollowingcitation("plt").

Footnotes

Bubenik P (2015) “Statistical Topological Data Analysis usingPersistence Landscapes”.Journal of Machine Learning Research16(3):77–102.https://jmlr.csail.mit.edu/papers/v16/bubenik15a.html ↩
Bubenik P & Dłotko P (2017) “A persistence landscapes toolbox fortopological statistics”.Journal of Symbolic Computation78(1):91–114.https://www.sciencedirect.com/science/article/pii/S0747717116300104 ↩

About

R interface & extension to an adapted Persistence Landscapes Toolbox

corybrunson.github.io/plt

Movatterモバイル変換

License

corybrunson/plt

Folders and files

Latest commit

History

Repository files navigation

plt

Installation

Quickstart Guide

Calculation

Class

Visualization

Hilbert Space Operations

Calculus

Statistical Analysis

Acknowledgments

Precursors

Resources

Contribute

Cite

Footnotes

About

Topics

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Contributors4

Uh oh!

Languages

Packages