This repository was archived by the owner on Mar 31, 2022. It is now read-only.

Estimation of Neural Network Dimension using Algebraic Topology and Lie Theory.

arxiv.org/abs/2004.02881

License

GPL-3.0 license

Folders and files

.gitignore
LICENSE
README.md
_config.yml
autoencoderInvertible.py
countHomgroups.py
counter3DPlot.py
main.pdf
persistenceLandscapes.py
persistenceStatistics.py
talk.md


Estimate of the Neural Network Dimension Using Algebraic Topology and Lie Theory

In this paper we present an approach to determine the smallest possible number of perceptrons in a neural net in such a way that the topology of the input space can be learned sufficiently well. We introduce a general procedure based on persistent homology to investigate topological invariants of the manifold on which we suspect the data set. We specify the required dimensions precisely, assuming that there is a smooth manifold on or near which the data are located. Furthermore, we require that this space is connected and has a commutative group structure in the mathematical sense. These assumptions allow us to derive a decomposition of the underlying space whose topology is well known. We use the representatives of the k-dimensional homology groups from the persistence landscape to determine an integer dimension for this decomposition. This number is the dimension of the embedding that is capable of capturing the topology of the data manifold. We derive the theory and validate it experimentally on toy data sets.

Keywords: Embedding Dimension, Parameterization, Persistent Homology, Neural Networks and Manifold Learning.

Citation

@inproceedings{imta7/MelodiaL21,
  author    = {Luciano Melodia and
               Richard Lenz},
  editor    = {Del Bimbo, A.,
               Cucchiara, R.,
               Sclaroff, S.,
               Farinella, G.M.,
               Mei, T.,
               Bertini, M.,
               Escalante, H.J.,
               Vezzani, R.},
  title     = {Estimate of the Neural Network Dimension using Algebraic Topology and Lie Theory},
  booktitle = {Pattern Recognition. ICPR International Workshops and Challenges, {IMTA VII}
               2021, Milano, Italy, January 11, 2021, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12665},
  pages     = {15--29},
  publisher = {Springer},
  year      = {2021},
  url       = {https://doi.org/10.1007/978-3-030-68821-9_2},
  doi       = {10.1007/978-3-030-68821-9_2},
}

Content

imageAutoencode

take_out_element

take_out_element(k:tuple,r)->tuple

A function taking out specific values.

param k: tuple object to be processed, type tuple.
param r: value to be removed, type int, float, string, None.
return k2: cropped tuple object, type tuple.
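
The behaviour described above can be sketched in a few lines. This is an illustration only, not the repository's code: the name take_out_element_sketch is hypothetical, and it assumes that every occurrence of r is dropped, which the description does not state.

```python
def take_out_element_sketch(k: tuple, r) -> tuple:
    """Return a copy of `k` without the entries equal to `r`.

    Assumption: all occurrences are removed, not only the first one.
    """
    return tuple(item for item in k if item != r)


# Example: strip the batch placeholder from a keras-style shape tuple.
shape = (None, 28, 28, 1)
print(take_out_element_sketch(shape, None))
```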

primeFactors

primeFactors(n)

A function that returns the prime factors of an integer.

param n: an integer, type int.
return factors: a list of prime factors, type list.
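
A minimal sketch of such a factorisation by trial division follows. The name prime_factors_sketch is hypothetical, and the choice to return factors with multiplicity in ascending order is an assumption; the repository's primeFactors may differ.

```python
def prime_factors_sketch(n: int) -> list:
    """Return the prime factors of `n` with multiplicity, smallest first."""
    factors = []
    if n < 2:
        return factors  # assumption: no factors for 0, 1 and negative input
    divisor = 2
    while divisor * divisor <= n:
        # Divide out each prime completely before moving on.
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


# 784 = 28 * 28 = 2^4 * 7^2, the flattened size of an mnist image.
print(prime_factors_sketch(784))
```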

load_data_keras

load_data_keras(dimensions:tuple,factor:float=255.0,dataset:str='mnist')->tuple

A utility function to load datasets.

This function helps to load particular datasets ready for processing with convolutional or dense autoencoders. It depends on the specified shape (the input dimensions). This function is for validation purposes and works for keras datasets only. Supported datasets are mnist (default), cifar10, cifar100 and boston_housing. The shapes: mnist (28,28,1), cifar10 (32,32,3), cifar100 (32,32,3).

param dimensions: dimension of the data, type tuple.
param factor: division factor, default is 255, type float.
param dataset: keras dataset, default is mnist, type str.
return X_train, X_test, input_image: type tuple.

add_gaussian_noise

add_gaussian_noise(data:numpy.ndarray,noise_factor:float=0.5,mean:float=0.0,std:float=1.0)->numpy.ndarray

A utility function to add gaussian noise to data.

The purpose of this function is to validate certain models under gaussian noise. The noise can be adjusted by changing the mean, the standard deviation and the amount of noisy points added.

param noise_factor: amount of noise in percent, type float.
param data: dataset, type np.ndarray.
param mean: mean, type float.
param std: standard deviation, type float.
return x_train_noisy: noisy data, type np.ndarray.
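
One common way to implement this kind of corruption is sketched below. It is an assumption, not the repository's code: the sketch scales the noise by noise_factor, adds it to every entry and clips the result to [0, 1] (the range produced by dividing image data by 255). The name add_gaussian_noise_sketch and the seed parameter are hypothetical.

```python
import numpy as np


def add_gaussian_noise_sketch(data, noise_factor=0.5, mean=0.0, std=1.0, seed=None):
    """Add scaled gaussian noise to `data` and clip to the range [0, 1]."""
    rng = np.random.default_rng(seed)  # seed is an added convenience for reproducibility
    noise = rng.normal(loc=mean, scale=std, size=data.shape)
    noisy = data + noise_factor * noise
    # Assumption: inputs were normalised to [0, 1], so the output is kept there too.
    return np.clip(noisy, 0.0, 1.0)


# Example on a small batch of constant grey images.
batch = np.full((4, 28, 28, 1), 0.5)
noisy_batch = add_gaussian_noise_sketch(batch, noise_factor=0.2, seed=0)
print(noisy_batch.shape)
```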

crop_tensor

crop_tensor(dimension:int,start:int,end:int)->Callable

A utility function cropping a tensor along a given dimension.

The purpose of this function is to be used for multivariate cropping and to serve as a procedure for the invertible autoencoders, which need cropping to make the matrices trivially invertible, as can be seen in the Real NVP architecture. This procedure works up to dimension 4.

param dimension: the dimension of cropping, type int.
param start: starting index for cropping, type int.
param end: ending index for cropping, type int.
return Lambda(func): Lambda function on the tensor, type Callable.
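
The slicing that such a cropping layer performs can be illustrated on a plain NumPy array. This sketch is an assumption about the indexing only; it does not build a keras Lambda layer, and the name crop_along_axis is hypothetical.

```python
import numpy as np


def crop_along_axis(tensor, dimension, start, end):
    """Keep the entries start..end-1 along `dimension`, all entries elsewhere."""
    index = [slice(None)] * tensor.ndim   # ":" for every axis
    index[dimension] = slice(start, end)  # "start:end" for the cropped axis
    return tensor[tuple(index)]


# Example: keep only the first channel of a (batch, height, width, channels) array.
images = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)
print(crop_along_axis(images, 3, 0, 1).shape)
```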

convolutional_group

convolutional_group(_input:numpy.ndarray,filterNumber:int,alpha:float=5.5,kernelSize:tuple= (2,2),kernelInitializer:str='uniform',padding:str='same',useBias:bool=True,biasInitializer:str='zeros')

This group can be extended for deep learning models and is a sequence of convolutional layers.

The convolution is a 2D convolution and uses a LeakyRelu activation function. After the activation function, batch normalization is performed by default to take care of the covariate shift. By default the padding is set to same, to avoid difficulties with the convolution.

param _input: data from previous convolutional layer, type np.ndarray.
param filterNumber: multiple of the filters per layer, type int.
param alpha: parameter for LeakyRelu activation function, default 5.5, type float.
param kernelSize: size of the 2D kernel, default (2,2), type tuple.
param kernelInitializer: keras kernel initializer, default uniform, type str.
param padding: padding for convolution, default same, type str.
param useBias: whether or not to use the bias term throughout the network, type bool.
param biasInitializer: initializing distribution of the bias values, type str.
return data: processed data by neural layers, type np.ndarray.

loop_group

loop_group(group:Callable,groupLayers:int,element:numpy.ndarray,filterNumber:int,kernelSize:tuple,useBias:bool=True,kernelInitializer:str='uniform',biasInitializer:str='zeros')->numpy.ndarray

This callable is a loop over a group specification.

The neural embedding always ends with dimension 1 in the color channel. For other specifications use the parameter colorChannel. The function operates on every keras group of layers using the same parameter set as the 2D convolution.

param group: a callable that sets up the neural architecture, type Callable.
param groupLayers: depth of the neural network, type int.
param element: data, type np.ndarray.
param filterNumber: number of filters as exponential of 2, type int.
param kernelSize: size of the kernels, type tuple.
param useBias: whether or not to use the bias term throughout the network, type bool.
param biasInitializer: initializing distribution of the bias values, type str.
return data: processed data by neural network, type np.ndarray.

invertible_layer

invertible_layer(data:numpy.ndarray,alpha:float=5.5,kernelSize:tuple= (2,2),kernelInitializer:str='uniform',groupLayers:int=6,filterNumber:int=2,croppingFactor:int=4,useBias:bool=True,biasInitializer:str='zeros')->numpy.ndarray

Returns an invertible neural network layer.

This neural network layer learns invertible subspaces, parameterized by higher dimensional functions with a trivial invertibility. The higher dimensional functions are also neural subnetworks, trained during the learning process.

param data: data from previous convolutional layer, type np.ndarray.
param alpha: parameter for LeakyRelu activation function, default 5.5, type float.
param groupLayers: depth of the neural network, type int.
param kernelSize: size of the kernels, type tuple.
param filterNumber: multiple of the filters per layer, type int.
param croppingFactor: should be a multiple of the strides length, type int.
param useBias: whether or not to use the bias term throughout the network, type bool.
param biasInitializer: initializing distribution of the bias values, type str.
return data: processed data, type np.ndarray.

invertible_subspace_dimension2

invertible_subspace_dimension2(units:int)

A helper function converting dimensions into 2D convolution shapes.

This function works only for a quadratic dimension size. It reshapes the data according to an embedding with the same dimension, represented by a 2D array.

param units: type int.
return embedding: type tuple.
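
The conversion from a quadratic number of units to a 2D shape can be sketched as follows. The name square_embedding_shape is hypothetical, and both the returned (side, side) tuple and the error on non-square input are assumptions; the repository's function may return a different tuple layout.

```python
import math


def square_embedding_shape(units: int) -> tuple:
    """Return (side, side) with side * side == units, or raise for non-squares."""
    side = math.isqrt(units)  # integer square root, rounded down
    if side * side != units:
        raise ValueError(f"units={units} is not a perfect square")
    return (side, side)


# Example: a dense layer with 784 units maps to a 28 x 28 grid.
print(square_embedding_shape(784))
```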

invertible_subspace_autoencoder

invertible_subspace_autoencoder(data:numpy.ndarray,units:int,invertibleLayers:int,alpha:float=5.5,kernelSize:tuple= (2,2),kernelInitializer:str='uniform',groupLayers:int=6,filterNumber:int=2,useBias:bool=True,biasInitializer:str='zeros')

A function returning an invertible autoencoder model.

This model works only with a quadratic number as units. The convolutional embedding dimension in 2D is determined, for the quadratic matrix, as the square root of the respective dimension of the dense layer. This module is for testing purposes and not meant to be part of a production environment.

param data: data, type np.ndarray.
param units: projection dim. into lower dim. by dense layer, type int.
param invertibleLayers: amount of invertible layers in the middle of the network, type int.
param alpha: parameter for LeakyRelu activation function, default 5.5, type float.
param kernelSize: size of the kernels, type tuple.
param kernelInitializer: initializing distribution of the kernel values, type str.
param groupLayers: depth of the neural network, type int.
param filterNumber: multiple of the filters per layer (an integer factor for each convolutional layer), type int.
param useBias: whether or not to use the bias term throughout the network, type bool.
param biasInitializer: initializing distribution of the bias values, type str.
return output: an output layer for keras neural networks, type np.ndarray.

persistenceLandscapes

concatenate_landscapes

concatenate_landscapes(persLandscape1:numpy.ndarray,persLandscape2:numpy.ndarray,resolution:int)->list

This function concatenates the persistence landscapes according to homology groups.

The computation of homology groups requires a certain resolution for each homology class. According to this resolution the direct sum of persistence landscapes has to be concatenated in a correct manner, such that the persistent homology can be plotted according to the n-dimensional persistent homology groups.

param persLandscape1: persistence landscape, type np.ndarray.
param persLandscape2: persistence landscape, type np.ndarray.
return concatenatedLandscape: direct sum of persistence landscapes, type list.

compute_persistence_landscape

compute_persistence_landscape(data:numpy.ndarray,res:int=1000,persistenceIntervals:int=1,maxAlphaSquare:float=1000000000000.0,filtration:str= ['alphaComplex','vietorisRips','tangential'],maxDimensions:int=10,edgeLength:float=0.1,plot:bool=False,smoothen:bool=False,sigma:int=3)->numpy.ndarray

A function for computing persistence landscapes for 2D images.

This function computes the filtration of a 2D image dataset, the simplicial complex, the persistent homology and then returns the persistence landscape as an array. It takes the resolution of the landscape as parameter, the maximum size for alphaSquare and options for certain filtrations.

param data: data set, type np.ndarray.
param res: resolution, default is 1000, type int.
param persistenceIntervals: interval for persistent homology, default is 1, type int.
param maxAlphaSquare: max. parameter for Delaunay expansion, default is 1e12, type float.
param filtration: alphaComplex, vietorisRips, cech, delaunay, tangential, type str.
param maxDimensions: only needed for VietorisRips, type int.
param edgeLength: only needed for VietorisRips, type float.
param plot: whether or not to plot, type bool.
param smoothen: whether or not to smoothen the landscapes, type bool.
param sigma: smoothing factor for gaussian mixtures, type int.
return landscapeTransformed: persistence landscape, type np.ndarray.
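
For readers unfamiliar with the last step, the sketch below evaluates a persistence landscape on a grid directly from a list of (birth, death) intervals, using the standard definition: the k-th landscape at t is the k-th largest value of max(0, min(t - birth, death - t)). It is not the repository's routine and does not compute the filtration or the homology; the names landscape_sketch, num_landscapes, t_min and t_max are hypothetical.

```python
import numpy as np


def landscape_sketch(intervals, num_landscapes, resolution, t_min, t_max):
    """Evaluate the first `num_landscapes` landscape functions on a regular grid."""
    grid = np.linspace(t_min, t_max, resolution)
    births = np.array([b for b, _ in intervals], dtype=float)[:, None]
    deaths = np.array([d for _, d in intervals], dtype=float)[:, None]
    # One "tent" function per interval, evaluated at every grid point.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births, deaths - grid[None, :]))
    # Sort each grid column in descending order: row k is then the (k+1)-th landscape.
    tents = -np.sort(-tents, axis=0)
    landscapes = np.zeros((num_landscapes, resolution))
    rows = min(num_landscapes, tents.shape[0])
    landscapes[:rows] = tents[:rows]
    return grid, landscapes


# Example with two overlapping intervals of one homology dimension.
grid, lam = landscape_sketch([(0.0, 2.0), (1.0, 3.0)], num_landscapes=2,
                             resolution=7, t_min=0.0, t_max=3.0)
print(lam)
```

Intervals with infinite death time are not handled here and would need to be capped or removed first.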

compute_mean_persistence_landscapes

compute_mean_persistence_landscapes(data:numpy.ndarray,resolution:int=1000,persistenceIntervals:int=1,maxAlphaSquare:float=1000000000000.0,filtration:str= ['alphaComplex','vietorisRips','tangential'],maxDimensions:int=10,edgeLength:float=0.1,plot:bool=False,tikzplot:bool=False,name:str='persistenceLandscape',smoothen:bool=False,sigma:int=2)->numpy.ndarray

This function computes mean persistence landscapes over 2D datasets.

The function shows a progress bar of the processed data and takes the direct sum of the persistence modules to get a summary of the landscapes of the various samples. Further, it can be decided whether or not to smoothen the persistence landscape by a gaussian filter. A plot can be created with matplotlib or, as another option for scientific reporting, with tikzplotlib, or both.

Information: The color scheme has 5 colors defined. Thus 5 homology groups can be displayed in different colors.

param data: data set, type np.ndarray.
param resolution: resolution of persistent homology per group, type int.
param persistenceIntervals: intervals for persistence classes, type int.
param maxAlphaSquare: max. parameter for Delaunay expansion, type float.
param filtration: alphaComplex, vietorisRips or tangential, type str.
param maxDimensions: maximal dimension of simplices, type int.
param edgeLength: length of simplex edge, type float.
param plot: whether or not to plot, type bool.
param tikzplot: whether or not to plot as tikz-picture, type bool.
param name: name of the file to be saved, type str.
param smoothen: whether or not to smoothen the landscapes, type bool.
param sigma: smoothing factor for gaussian mixtures, type int.
return meanPersistenceLandscape: mean persistence landscape, type np.ndarray.

persistenceStatistics

hausd_interval

hausd_interval(data:numpy.ndarray,confidenceLevel:float=0.95,subsampleSize:int=-1,subsampleNumber:int=1000,pairwiseDist:bool=False,leafSize:int=2,ncores:int=2)->float

Computation of Hausdorff distance based confidence values.

Measures the confidence between two persistent features, whether they are drawn from a distribution fitting the underlying manifold of the data. This function is based on the Hausdorff distance between the points.

param data: a data set, type np.ndarray.
param confidenceLevel: confidence level, default 0.95, type float.
param subsampleSize: size of each subsample, type int.
param subsampleNumber: number of subsamples, type int.
param pairwiseDist: if true, a symmetric n x n matrix is generated out of the data, type bool.
param leafSize: leaf size for KDTree, type int.
param ncores: number of cores for parallel computing, type int.
return confidence: the confidence to be a persistent homology class, type float.
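
A common subsampling scheme for Hausdorff-based confidence values is sketched below: draw subsamples, measure the Hausdorff distance between each subsample and the full data set, and return twice the empirical quantile at the confidence level. Whether the repository follows exactly this scheme is an assumption. The name hausd_interval_sketch, the brute-force distance computation (no KDTree, no parallelism), the seed parameter and the default subsample size of half the data are all hypothetical.

```python
import math
import random


def hausdorff_to_subsample(points, subsample):
    """Hausdorff distance between `points` and a subset of it."""
    # Every subsample point lies in `points`, so only one direction is non-zero.
    return max(min(math.dist(p, s) for s in subsample) for p in points)


def hausd_interval_sketch(points, confidence_level=0.95, subsample_size=None,
                          subsample_number=100, seed=0):
    """Return twice the `confidence_level` quantile of the subsample distances."""
    rng = random.Random(seed)
    if subsample_size is None:
        subsample_size = max(1, len(points) // 2)  # hypothetical default
    distances = sorted(
        hausdorff_to_subsample(points, rng.sample(points, subsample_size))
        for _ in range(subsample_number)
    )
    # Empirical quantile: smallest value with at least `confidence_level` mass below it.
    index = min(len(distances) - 1,
                max(0, math.ceil(confidence_level * len(distances)) - 1))
    return 2.0 * distances[index]


# Example: 40 points sampled evenly on the unit circle.
circle = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
          for i in range(40)]
print(hausd_interval_sketch(circle, subsample_number=50))
```

In this reading, persistence intervals whose lifetime exceeds the returned value would be regarded as significant at the chosen level.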

truncated_simplex_tree

truncated_simplex_tree(simplexTree:numpy.ndarray,int_trunc:int=100)->tuple

This function returns a truncated simplex tree.

A sparse representation of the persistence diagram in the form of a truncated persistence tree. Speeds up computation on large scale data sets.

param simplexTree: simplex tree, type np.ndarray.
param int_trunc: number of persistent intervals kept per dimension, default is 100, type int.
return simplexTreeTruncatedPersistence: truncated simplex tree, type np.ndarray.
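
The truncation idea can be illustrated on a plain list of persistence pairs. The sketch assumes the diagram is given as (dimension, (birth, death)) tuples and that the intervals kept are the int_trunc longest ones in each dimension; both points are assumptions, and the name truncate_intervals_sketch is hypothetical.

```python
import math
from collections import defaultdict


def truncate_intervals_sketch(diagram, int_trunc=100):
    """Keep at most `int_trunc` intervals per dimension, longest lifetime first."""
    by_dimension = defaultdict(list)
    for dimension, (birth, death) in diagram:
        by_dimension[dimension].append((birth, death))
    truncated = []
    for dimension in sorted(by_dimension):
        longest = sorted(by_dimension[dimension],
                         key=lambda interval: interval[1] - interval[0],
                         reverse=True)[:int_trunc]
        truncated.extend((dimension, interval) for interval in longest)
    return truncated


# Example: keep the two most persistent classes in each dimension.
diagram = [(0, (0.0, math.inf)), (0, (0.0, 0.1)), (0, (0.0, 0.5)),
           (1, (0.2, 0.9)), (1, (0.3, 0.35))]
print(truncate_intervals_sketch(diagram, int_trunc=2))
```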


Releases: 1

NTOPL v.1.0 (latest), Mar 19, 2022

Languages: Python 100.0%
