
Title:

SOM Bound to Realize Euclidean and Relational Outputs

Version:

1.5.0

Date:

2025-10-06

Maintainer:

Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Description:

The stochastic (also called on-line) version of the Self-Organising Map (SOM) algorithm is provided. Different versions of the algorithm are implemented, for numeric and relational data and for contingency tables as described, respectively, in Kohonen (2001) <isbn:3-540-67921-9>, Olteanu & Villa-Vialaneix (2015) <doi:10.1016/j.neucom.2013.11.047> and Cottrell et al. (2004) <doi:10.1016/j.neunet.2004.07.010>. The package also contains many plotting features (to help the user interpret the results), can handle (and impute) missing values and is delivered with a graphical user interface based on 'shiny'.

BugReports:

https://github.com/tuxette/SOMbrero/issues

URL:

https://forge.inrae.fr/nathalie.villa-vialaneix/sombrero,http://sombrero.clementine.wf/

Depends:

R (≥ 3.1.0), igraph (≥ 1.0), markdown

Imports:

scatterplot3d, shiny, grDevices, graphics, stats, ggplot2, ggwordcloud, metR, interp, rlang

Suggests:

testthat, rmarkdown, knitr, hexbin, shinycssloaders, shinyBS, shinyjs, shinyjqui, RColorBrewer

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Repository:

CRAN

VignetteBuilder:

knitr

Encoding:

UTF-8

RoxygenNote:

7.3.2

Language:

en-US

NeedsCompilation:

Packaged:

2025-10-06 15:28:38 UTC; nathalie

Author:

Nathalie Vialaneix [aut, cre], Elise Maigne [aut], Jerome Mariette [aut], Madalina Olteanu [aut], Fabrice Rossi [aut], Laura Bendhaiba [ctb], Julien Boelaert [ctb]

Date/Publication:

2025-10-06 16:00:07 UTC

Self Organizing Maps Bound to Realize Euclidean and Relational Outputs

Description

This package implements the stochastic (also called on-line) Self-Organizing Map (SOM) algorithms for numeric and relational data.

It is based on a grid (see initGrid), which is part of the parameters given to the algorithm (see initSOM and trainSOM). Many graphs can help you with the results (see plot.somRes).

The version of the SOM algorithm implemented in this package is the stochastic version.

Several variants able to handle non-vectorial data are also implemented in their stochastic versions: type = "korresp" for contingency tables, as described in Cottrell et al. (2004) (with the observation weights defined in Cottrell and Letrémy, 2005a) and type = "relational" for dissimilarity data, as described in Olteanu and Villa-Vialaneix (2015a) with the fast implementation of Mariette et al. (2017). A special focus has been put on representing graphs, as described in Olteanu and Villa-Vialaneix (2015b).

In addition, the numeric version of the algorithm handles missing values: missing entries are not used during training but the resulting map can be used to fill missing entries (using the entry of the corresponding prototype). The method is taken from Cottrell and Letrémy (2005b).

Author(s)

Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>
Élise Maigné <elise.maigne@inrae.fr>
Jérome Mariette <jerome.mariette@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Fabrice Rossi <fabrice.rossi@apiacoa.org>
Laura Bendhaïba <laurabendhaiba@gmail.com>
Julien Boelaert <julien.boelaert@gmail.com>

Maintainer: Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Kohonen T. (2001) Self-Organizing Maps. Berlin/Heidelberg: Springer-Verlag, 3rd edition.

Cottrell M., Ibbou S., Letrémy P. (2004) SOM-based algorithms for qualitative variables. Neural Networks, 17, 1149-1167.

Cottrell M., Letrémy P. (2005a) How to use the Kohonen algorithm to simultaneously analyse individuals in a survey. Neurocomputing, 21, 119-138.

Cottrell M., Letrémy P. (2005b) Missing values: processing with the Kohonen algorithm. Proceedings of Applied Stochastic Models and Data Analysis (ASMDA 2005), 489-496.

Letrémy P. (2005) Programmes basés sur l'algorithme de Kohonen et dédiés à l'analyse des données. SAS/IML programs for 'korresp'.

Mariette J., Rossi F., Olteanu M., Villa-Vialaneix N. (2017) Accelerating stochastic kernel SOM. In: M. Verleysen, XXVth European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017), i6doc, Bruges, Belgium, 269-274.

Olteanu M., Villa-Vialaneix N. (2015a) On-line relational and multiple relational SOM. Neurocomputing, 147, 15-30.

Olteanu M., Villa-Vialaneix N. (2015b) Using SOMbrero for clustering and visualizing graphs. Journal de la Société Française de Statistique, 156, 95-119.

Rossi F. (2013) yasomi: Yet Another Self-Organising Map Implementation. R package, version 0.3. https://github.com/fabrice-rossi/yasomi

Villa-Vialaneix N. (2017) Stochastic self-organizing map variants with the R package SOMbrero. In: J.C. Lamirel, M. Cottrell, M. Olteanu, 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (Proceedings of WSOM 2017), IEEE, Nancy, France.

Impute values from prototype information

Description

Impute values by replacing missing entries with the corresponding assigned prototype entries

Usage

impute(object, ...)

Arguments

object

a somRes object.

...

unused.

Value

Imputed matrix as in Cottrell and Letrémy (2005).
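The principle behind prototype-based imputation can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: `impute_from_prototypes` is a hypothetical helper, the prototypes are supplied by hand, and each incomplete row is assigned to its closest prototype using the observed entries only.

```r
# Hypothetical helper (not part of SOMbrero): fill each missing entry with the
# matching entry of the closest prototype, where "closest" is measured on the
# observed entries of the row only.
impute_from_prototypes <- function(x, prototypes) {
  x <- as.matrix(x)
  for (i in seq_len(nrow(x))) {
    miss <- is.na(x[i, ])
    if (!any(miss)) next
    obs <- !miss
    # squared Euclidean distance to every prototype, restricted to observed columns
    target <- matrix(x[i, obs], nrow(prototypes), sum(obs), byrow = TRUE)
    d2 <- rowSums((prototypes[, obs, drop = FALSE] - target)^2)
    x[i, miss] <- prototypes[which.min(d2), miss]
  }
  x
}

# Toy data: two hand-made prototypes and three incomplete observations
protos <- rbind(c(0, 0, 0), c(10, 10, 10))
x <- rbind(c(1, NA, 0), c(9, 11, NA), c(NA, NA, 8))
imputed <- impute_from_prototypes(x, protos)
imputed
```

With a trained map, `impute(object)` performs the imputation directly; the sketch above only makes the "replace by the assigned prototype entry" idea explicit.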

Author(s)

Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Cottrell M., Letrémy P. (2005) Missing values: processing with the Kohonen algorithm. Proceedings of Applied Stochastic Models and Data Analysis (ASMDA 2005), 489-496.

Examples

# Run trainSOM algorithm on the iris data with 500 iterations
set.seed(1505)
missings <- cbind(sample(1:150, 50, replace = TRUE),
                  sample(1:4, 50, replace = TRUE))
x.data <- as.matrix(iris[, 1:4])
x.data[missings] <- NA
iris.som <- trainSOM(x.data = x.data)
iris.som
impute(iris.som)

Create an Empty Grid

Description

Create an empty (square) grid equipped with topology.

Usage

initGrid(
  dimension = c(5, 5),
  topo = c("square", "hexagonal"),
  dist.type = c("euclidean", "maximum", "manhattan", "canberra", "minkowski", "letremy")
)

Arguments

dimension

a 2-dimensional vector giving the dimensions (width, length) of the grid

topo

topology of the grid. Accepted values are "square" (default) or "hexagonal".

dist.type

distance type that defines the topology of the grid (see 'Details'). Defaults to "euclidean".

Details

The units (neurons) of the grid are positioned at coordinates (1,1), (1,2), (1,3), ..., (2,1), (2,2), ..., for the square topology. The topology of the map is defined by a distance based on those coordinates, which can be one of "euclidean", "maximum", "manhattan", "canberra", "minkowski", "letremy". The first five correspond to distance methods implemented in dist, and "letremy" is the distance of the original implementation by Patrick Letrémy, which switches between "maximum" and "euclidean" during the training.
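The coordinate layout and the role of the distance type can be reproduced with base R. The following is a minimal sketch, assuming a 5 x 5 square grid; the object names are illustrative and this is not the code used by initGrid itself.

```r
# Coordinates in the documented order (1,1), (1,2), (1,3), ..., (2,1), (2,2), ...
# expand.grid() varies its first argument fastest, so y is listed first.
dimension <- c(5, 5)
coord <- as.matrix(
  expand.grid(y = seq_len(dimension[2]), x = seq_len(dimension[1]))[, c("x", "y")]
)

# Two of the distance types accepted by stats::dist()
d_euclidean <- as.matrix(dist(coord, method = "euclidean"))
d_maximum <- as.matrix(dist(coord, method = "maximum"))

# Unit 1 sits at (1,1) and unit 7 at (2,2): diagonal neighbours
d_euclidean[1, 7]  # sqrt(2)
d_maximum[1, 7]    # 1
```

Under "maximum", diagonal units are as close as horizontal or vertical ones, so the distance type changes which units count as neighbours at a given radius.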

Value

an object of class myGrid with the following entries:

coord 2-column matrix with x and y coordinates of the grid units
topo topology of the grid;
dim dimensions of the grid (width corresponds to x coordinates)
dist.type distance type that defines the topology of the grid.

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Letrémy P. (2005) Programmes basés sur l'algorithme de Kohonen et dédiés à l'analyse des données. SAS/IML programs for 'korresp'.

Examples

initGrid()
initGrid(dimension = c(5, 7), dist.type = "maximum")

Initialize Parameters for the SOM Algorithm

Description

The initSOM function returns a paramSOM class object that contains the parameters needed to run the SOM algorithm.

Usage

initSOM(
  dimension = c(5, 5),
  topo = c("square", "hexagonal"),
  radius.type = c("gaussian", "letremy"),
  dist.type = switch(match.arg(radius.type), letremy = "letremy", gaussian = "euclidean"),
  type = c("numeric", "relational", "korresp"),
  mode = c("online"),
  affectation = c("standard", "heskes"),
  maxit = 500,
  nb.save = 0,
  verbose = FALSE,
  proto0 = NULL,
  init.proto = switch(type, numeric = "random", relational = "obs", korresp = "random"),
  scaling = switch(type, numeric = "unitvar", relational = "none", korresp = "chi2"),
  eps0 = 1
)

## S3 method for class 'paramSOM'
print(x, ...)

## S3 method for class 'paramSOM'
summary(object, ...)

Arguments

dimension

Vector of two integers corresponding to the x dimension and the y dimension of the myGrid class object. Default values are: (5,5). Other data-driven defaults are set by function trainSOM.

topo

The topology to be used to build the grid of the myGrid class object. Accepted values are "square" (default) or "hexagonal".

radius.type

The neighborhood type. Default value is "gaussian", which corresponds to a Gaussian neighborhood. The annealing of the neighborhood during the training step is similar to the one implemented in yasomi. The alternative value, "letremy", corresponds to a piecewise linear neighborhood as implemented by Patrick Letrémy in his SAS scripts.

dist.type

The neighborhood relationship on the grid. One of c("letremy", "euclidean", "maximum", "manhattan", "canberra", "minkowski"). When radius.type is letremy, default value is letremy, which is the original implementation by Patrick Letrémy. When radius.type is gaussian, default value is euclidean. The other possible values are passed to method in function dist. dist.type = "letremy" is not permitted with radius.type = "gaussian". Only euclidean is allowed with hexagonal topology.

type

The SOM algorithm type. Possible values are: numeric (default value), korresp and relational.

mode

The SOM algorithm mode. Default value is online.

affectation

The SOM affectation type. Default value is standard, which corresponds to a hard affectation. Alternative is heskes, which corresponds to Heskes's soft affectation.

maxit

The maximum number of iterations to be done during the SOM algorithm process. Default value is 500. Other data-driven defaults are set by function trainSOM.

nb.save

The number of intermediate back-ups to be done during the algorithm process. Default value is 0.

verbose

The boolean value which activates the verbose mode during the SOM algorithm process. Default value is FALSE.

proto0

The initial prototypes. Default value is NULL.

init.proto

The method to be used to initialize the prototypes, which may be "random" (randomization), "obs" (each prototype is assigned a random observation) or "pca". In pca the prototypes are initialized to the observations closest to a grid along the two first principal components of the data (numeric case) or along a two-dimensional multidimensional scaling (relational case, equivalent to a relational PCA). Default value is random for the numeric and korresp types, and obs for the relational type. pca is not available for korresp SOM.

scaling

The type of data pre-processing. For numeric SOM, possibilities are unitvar (data are centered and scaled; this is the default value for a numeric SOM), none (no pre-processing), and center (data are centered but not scaled). For korresp SOM, the only available value is chi2. For relational SOM, possibilities are none (no pre-processing, default value for relational SOM) and cosine. This last one first turns the dissimilarity into a similarity using the suggestion in (Lee and Verleysen, 2007). Then, a cosine normalization as described in (Ben-Hur and Weston, 2010) is applied to the kernel, which is finally turned back into its induced distance. For further details on this processing, have a look at the corresponding documentation in the directory "doc" of the package's installation directory.

eps0

The scaling value for the stochastic gradient descent step in the prototypes' update. The step is equal to $\frac{0.3\epsilon_0}{1+0.2t/\textrm{dim}}$, where $t$ is the current step number and $\textrm{dim}$ is the grid dimension (width multiplied by height).

x

an object of class paramSOM.

...

not used

object

an object of class paramSOM.

Value

The initSOM function returns an object of class paramSOM, which is a list of the parameters passed to the initSOM function, plus the default parameters for the ones not specified by the user.
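To make the effect of eps0 concrete, the step-size schedule given for that argument, 0.3 * eps0 / (1 + 0.2 * t / dim), can be evaluated directly in base R. The function name `sgd_step` and its argument names are hypothetical and chosen for this sketch only.

```r
# Step size of the stochastic gradient descent at iteration t, following the
# formula documented for eps0; grid_dim is width * height of the grid.
sgd_step <- function(t, eps0 = 1, grid_dim = 25) {
  0.3 * eps0 / (1 + 0.2 * t / grid_dim)
}

# Default 5 x 5 grid (grid_dim = 25) and eps0 = 1
steps <- sgd_step(c(0, 125, 500))
steps  # 0.30 0.15 0.06
```

The step starts at 0.3 * eps0 and decays hyperbolically; larger grids decay more slowly because t is divided by the number of units.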

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Ben-Hur A., Weston J. (2010) A user's guide to support vector machine. In: Data Mining Techniques for the Life Sciences, Springer-Verlag, 223-239.

Heskes T. (1999) Energy functions for self-organizing maps. In: Kohonen Maps, Oja E., Kaski S. (Eds.), Elsevier, 303-315.

Lee J., Verleysen M. (2007) Nonlinear Dimensionality Reduction. Information Science and Statistics series, Springer.

Letrémy P. (2005) Programmes basés sur l'algorithme de Kohonen et dédiés à l'analyse des données. SAS/IML programs for 'korresp'.

Rossi F. (2013) yasomi: Yet Another Self-Organising Map Implementation. R package, version 0.3. https://github.com/fabrice-rossi/yasomi

Examples

# create a default 'paramSOM' class object
default.paramSOM <- initSOM()
summary(default.paramSOM)

Dataset "Les Misérables"

Description

This dataset contains the co-appearance network (igraph object) of characters in the novel Les Misérables (written by the French writer Victor Hugo).

Format

lesmis is an igraph object. Its vertices are the characters of the novel and an edge indicates that the two characters appear together in the same chapter of the novel, at least once. Vertex attributes for this graph are id, a vertex number between 1 and 77, and label, the character's name. The edge attribute value gives the number of co-appearances between the two characters afferent to the edge (the igraph can thus be made a weighted graph using this attribute). Finally, a graph attribute layout is used to provide a layout (generated with the igraph function layout_with_fr) for visualizing the graph.

dissim.lesmis is a dissimilarity matrix computed with the function shortest_paths and containing the length of the shortest paths between pairs of nodes.

Details

Les Misérables is a French historical novel, written by Victor Hugo and published in 1862. The co-appearance network has been extracted by D.E. Knuth (1993).

References

Hugo V. (1862) Les Misérables.

Knuth D.E. (1993) The Stanford GraphBase: A Platform for Combinatorial Computing. Reading (MA): Addison-Wesley.

Examples

data(lesmis)
## Not run: 
summary(lesmis)
plot(lesmis, vertex.size = 0)
## End(Not run)

Methods for 'myGrid' Objects.

Description

Methods for the result of initGrid (myGrid object)

Usage

## S3 method for class 'myGrid'
print(x, ...)

## S3 method for class 'myGrid'
summary(object, ...)

## S3 method for class 'myGrid'
plot(x, show.names = TRUE, names = 1:prod(x$dim), ...)

Arguments

x

myGrid object

...

Further arguments to the plot function.

object

myGrid object

show.names

Whether the cluster names must be printed in the center of the grid or not. Defaults to TRUE (names displayed).

names

If show.names = TRUE, values of the names to display. Defaults to the cluster number.

Details

The myGrid class has the following entries:

coord 2-column matrix with x and y coordinates of the grid units
topo topology of the grid;
dim dimensions of the grid (width corresponds to x coordinates)
dist.type distance type that defines the topology of the grid.

During plotting, the color filling process uses the coordinates of the object x included in x$coord.

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Examples

# creating grid
a.grid <- initGrid(dimension = c(5, 5), topo = "square", dist.type = "maximum")

# plotting grid
# without any color specification
plot(a.grid)

# generating colors from rainbow() function
my.colors <- grDevices::rainbow(5 * 5)
plot(a.grid) + ggplot2::scale_fill_manual(values = my.colors)

Plot a `somRes` Object

Description

Produce graphics to help interpret a somRes object.

Usage

## S3 method for class 'somRes'
plot(
  x,
  what = c("obs", "prototypes", "energy", "add"),
  type = switch(what, obs = "hitmap", prototypes = "color", add = "pie",
    energy = "energy"),
  variable = NULL,
  my.palette = NULL,
  is.scaled = if (x$parameters$type == "numeric") TRUE else FALSE,
  show.names = TRUE,
  names = if (what != "energy") switch(type, graph = 1:prod(x$parameters$the.grid$dim),
    1:prod(x$parameters$the.grid$dim)) else NULL,
  proportional = TRUE,
  pie.graph = FALSE,
  pie.variable = NULL,
  s.radius = 1,
  view = if (x$parameters$type == "korresp") "r" else NULL,
  ...
)

Arguments

x

A somRes class object.

what

What you want to plot. Either the observations (obs, default case), the evolution of energy (energy), the prototypes (prototypes) or an additional variable (add).

type

Further argument indicating which type of chart you want to have. Choices depend on the value of what (what="energy" has no type argument). Default values are "hitmap" for obs, "color" for prototypes and "pie" for add. See section “Details” below for further details.

variable

Either the variable to be used for what="add" or the index of the variable of the data set to consider. For type="boxplot", the default value is the sequence from 1 to the minimum between 5 and the number of columns of the data set. In all other cases, default value is 1. See somRes.plotting for further details.

my.palette

A vector of colors. If omitted, predefined palettes are used, depending on the plot case. This argument is used for the following combinations: all "color" types and "prototypes"/"poly.dist".

is.scaled

A boolean indicating whether values should be scaled prior to plotting or not. Default value is TRUE when type="numeric" and FALSE in the other cases.

show.names

Boolean used to indicate whether each neuron should have a title or not, if relevant. Defaults to TRUE. It is available in the following cases: all "color", "lines", "meanline", "barplot", "boxplot", "names" types, "add"/"pie", "prototypes"/"umatrix", "prototypes"/"poly.dist" and "add"/"words".

names

The names to be printed for each neuron if show.names=TRUE. Defaults to a number which identifies the neuron.

proportional

Boolean used when what="add" and type="pie". It indicates if the pies should be proportional to the number of observations in the class. Default value is TRUE.

pie.graph

Boolean used when what="add" and type="graph". It indicates if the vertices should be pies or not.

pie.variable

The variable needed to plot the pies when what="add", type="graph" and argument pie.graph=TRUE.

s.radius

The size of the pies to be plotted (maximum size when proportional=TRUE) for what="add", type="graph" and pie.graph=TRUE. The default value is 1.

view

Used only when the algorithm's type is "korresp". It indicates whether rows ("r") or columns ("c") must be drawn.

...

Further arguments to be passed to the underlying plot function (which can be plot, barplot, pie... depending on type; see somRes.plotting for further details).

Details

See somRes.plotting for further details and more examples.

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Examples

# run the SOM algorithm on the numerical data of 'iris' data set
iris.som <- trainSOM(x.data = iris[, 1:4], nb.save = 2)

# plots
# on energy
plot(iris.som, what = "energy")
# on observations
plot(iris.som, what = "obs", type = "lines")
# on prototypes
plot(iris.som, what = "prototypes", type = "3d", variable = "Sepal.Length")
# on an additional variable: the flower species
plot(iris.som, what = "add", type = "pie", variable = iris$Species)

Predict the Class of a New Observation

Description

Predict the neuron where a new observation is classified

Usage

## S3 method for class 'somRes'
predict(object, x.new = NULL, ..., radius = 0)

Arguments

object

a somRes object.

x.new

a new observation (optional). Default value is NULL, which corresponds to performing prediction on the training dataset.

...

not used.

radius

current radius used to perform soft affectation (when affectation = "heskes", see initSOM for further details about Heskes' soft affectation). Default value is 0, which corresponds to a hard affectation.

Details

The number of columns of the new observations (or its length if only one observation is provided) must match the number of columns of the data set given to the SOM algorithm (see trainSOM).

Value

predict.somRes returns the number of the neuron to which the new observation is assigned (i.e., neuron with the closest prototype).

When the algorithm's type is "korresp", x.new must be the original contingency table passed to the algorithm.
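The hard affectation rule (assignment to the neuron with the closest prototype) can be illustrated in base R. This is a sketch under assumptions: `nearest_prototype` is a hypothetical helper, the prototypes are hand-made, plain squared Euclidean distance is used, and any pre-processing applied during training is ignored.

```r
# Hypothetical helper: index of the closest prototype for each new observation.
nearest_prototype <- function(x_new, prototypes) {
  if (is.data.frame(x_new)) x_new <- as.matrix(x_new)
  if (is.null(dim(x_new))) x_new <- matrix(x_new, nrow = 1)
  # the number of columns must match the prototypes, as required by predict()
  stopifnot(ncol(x_new) == ncol(prototypes))
  apply(x_new, 1, function(obs) {
    # t(prototypes) has one column per prototype; obs is recycled down columns
    which.min(colSums((t(prototypes) - obs)^2))
  })
}

protos <- rbind(c(0, 0), c(5, 5), c(10, 0))
single <- nearest_prototype(c(4, 6), protos)
several <- nearest_prototype(rbind(c(1, 1), c(9, 1)), protos)
single   # 2
several  # 1 3
```

In practice, `predict(my.som, x.new)` should be preferred, since it operates on the trained somRes object directly.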

Author(s)

Jérome Mariette <jerome.mariette@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Fabrice Rossi <fabrice.rossi@apiacoa.org>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Examples

set.seed(2343)
my.som <- trainSOM(x.data = iris[-100, 1:4], dimension = c(5, 5))
predict(my.som, iris[100, 1:4])

2002 French Presidential Election Dataset

Description

This data set provides the number of votes at the first round of the 2002 French presidential election for each of the 16 candidates for 106 administrative districts called "Départements".

Format

presidentielles2002 is a data frame of 106 rows (the French administrative districts called "Départements") and 16 columns (the candidates).

Source

The data are provided by the French ministry "Ministère de l'Intérieur". The original data can be downloaded at https://www.interieur.gouv.fr/Elections/Les-resultats/Presidentielles (2002 élections and "Résultats par départements").

References

The 2002 French presidential election consisted of two rounds. The second round attracted a greater than usual amount of international attention because far-right candidate Jean-Marie Le Pen unexpectedly qualified for it ahead of Socialist candidate Lionel Jospin. The event is well known because, on the one hand, the number of candidates was unusually high (16) and, on the other hand, the polls had failed to predict that Jean-Marie Le Pen would be in the second round.

Further comments at https://en.wikipedia.org/wiki/2002_French_presidential_election.

Examples

data(presidentielles2002)
apply(presidentielles2002, 2, sum)

Compute the Projection of a Graph on a Grid

Description

Compute the projection of a graph, provided as an igraph object, on the grid of the somRes object.

Usage

projectIGraph(object, init.graph, ...)

Arguments

object

a somRes object.

init.graph

an igraph whose number of vertices is equal to the clustering length of the somRes object.

...

Not used.

Value

The result is an igraph object whose vertices are the clusters (the clustering is thus understood as a vertex clustering) and whose edges are the counts of edges in the original graph between two vertices corresponding to the two clusters in the projected graph or, if init.graph is a weighted graph, the sum of the weights between the pairs of vertices corresponding to the two clusters.

The resulting igraph object's attributes are:

the graph attribute layout, which provides the layout of the projected graph according to the grid of the SOM;
the vertex attributes name and size, which are, respectively, the vertex number on the grid and the number of vertices included in the corresponding cluster;
the edge attribute weight, which gives the number of edges (or the sum of the weights) between the vertices of the two corresponding clusters.
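The edge-counting rule described above can be sketched without igraph, using an edge list and a clustering vector. This is only an illustration: `project_edges` is a hypothetical helper, and dropping edges whose two endpoints fall in the same cluster is an assumption of this sketch rather than a documented behaviour.

```r
# Hypothetical helper: aggregate the edges of a graph between clusters.
# edges: 2-column matrix of vertex indices; clustering: cluster of each vertex.
project_edges <- function(edges, clustering, weights = rep(1, nrow(edges))) {
  from <- clustering[edges[, 1]]
  to <- clustering[edges[, 2]]
  keep <- from != to  # assumption: within-cluster edges are ignored
  lo <- pmin(from, to)[keep]
  hi <- pmax(from, to)[keep]
  # sum of weights (or count of edges when all weights are 1) per cluster pair
  aggregate(list(weight = weights[keep]), by = list(from = lo, to = hi), FUN = sum)
}

clustering <- c(1, 1, 2, 2, 3)  # five vertices in three clusters
edges <- rbind(c(1, 2), c(1, 3), c(2, 4), c(3, 4), c(4, 5))
proj <- project_edges(edges, clustering)
sizes <- table(clustering)      # analogue of the 'size' vertex attribute
proj
```

Here clusters 1 and 2 are linked by two original edges and clusters 2 and 3 by one, which is what the weight column reports.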

Author(s)

Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Olteanu M., Villa-Vialaneix N. (2015) Using SOMbrero for clustering and visualizing graphs. Journal de la Société Française de Statistique, 156, 95-119.

Examples

data(lesmis)
set.seed(7383)
mis.som <- trainSOM(x.data = dissim.lesmis, type = "relational", nb.save = 10)
proj.lesmis <- projectIGraph(mis.som, lesmis)
## Not run: 
plot(proj.lesmis)

Compute Distances Between Prototypes

Description

Compute distances, either between all prototypes (mode = "complete") or only between prototypes' neighbours (mode = "neighbors").

Usage

protoDist(object, mode = c("complete", "neighbors"), radius = 1, ...)

Arguments

object

a somRes object.

mode

Specifies which distances should be computed (defaults to "complete").

radius

Radius used to fetch the neighbors (defaults to 1). The distance used to compute the neighbors is the Euclidean distance.

...

Not used.

Details

When mode="complete", distances between all prototypes are computed. When mode="neighbors", distances are computed only between the prototypes and their neighbors. If the data were preprocessed during the SOM training procedure, the distances are computed on the normalized values of the prototypes.

Value

When mode = "complete", the function returns a square matrix whose dimensions are equal to the product of the grid dimensions.

When mode = "neighbors", the function returns a list whose length is equal to the product of the grid dimensions; the length of each item is equal to the number of neighbors. Neurons are considered to have 8 neighbors at most (i.e., two neurons are neighbors if they have a Euclidean distance smaller than radius). A natural choice for radius is 1 for hexagonal topology and 1 or $\sqrt{2}$ for square topology (4 and 8 neighbors respectively).
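The relation between radius and the number of neighbors on a square grid can be checked with base R. The sketch below rebuilds a 5 x 5 grid by hand; `neighbors_within` is a hypothetical helper, and the comparison uses "less than or equal to radius" with a small tolerance, which is an assumption of this example.

```r
# 5 x 5 square grid and Euclidean distances between units
coord <- as.matrix(expand.grid(y = 1:5, x = 1:5)[, c("x", "y")])
grid_dist <- as.matrix(dist(coord))

# Hypothetical helper: units within 'radius' of each unit (the unit itself excluded)
neighbors_within <- function(grid_dist, radius, tol = 1e-8) {
  lapply(seq_len(nrow(grid_dist)), function(i) {
    setdiff(which(grid_dist[i, ] <= radius + tol), i)
  })
}

n_radius_1 <- lengths(neighbors_within(grid_dist, 1))
n_radius_sqrt2 <- lengths(neighbors_within(grid_dist, sqrt(2)))

# Unit 13 is the centre (3,3); unit 1 is the corner (1,1)
c(centre = n_radius_1[13], corner = n_radius_1[1])          # 4 and 2
c(centre = n_radius_sqrt2[13], corner = n_radius_sqrt2[1])  # 8 and 3
```

Interior units reach the stated 4 or 8 neighbors, while border and corner units have fewer, so the items of the list returned for mode = "neighbors" need not all have the same length.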

Author(s)

Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

Examples

set.seed(2343)
my.som <- trainSOM(x.data = iris[, 1:4], dimension = c(5, 5))
protoDist(my.som)

Compute SOM Quality Criteria

Description

The quality function computes several quality criteria for the result of a SOM algorithm.

Usage

quality(sommap, quality.type, ...)

Arguments

sommap

A somRes object (see trainSOM for details).

quality.type

The quality type to compute. Two types are implemented: quantization and topographic. The output of the function is one of those or both of them using the option "all". Default value is the latter.

...

Not used.

Value

The quality function returns either a numeric value (if only one type is computed) or a list of numeric values (if all types are computed).

The quantization error calculates the mean squared Euclidean distance between the sample vectors and their respective cluster prototypes. It is a decreasing function of the size of the map.

The topographic error is the simplest of the topology preservation measures: it calculates the ratio of sample vectors for which the second best matching unit is not in the direct neighborhood of the best matching unit.
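Both criteria can be written in a few lines of base R. The following is a sketch under stated assumptions rather than the package's implementation: `som_quality` is a hypothetical helper, prototypes and grid coordinates are given by hand, and "direct neighborhood" is taken to mean a grid Euclidean distance of at most 1.

```r
# Hypothetical helper computing both criteria from raw ingredients.
# x: n x p data; prototypes: K x p; coord: K x 2 grid coordinates.
som_quality <- function(x, prototypes, coord) {
  x <- as.matrix(x)
  # squared Euclidean distance of every observation to every prototype (n x K)
  d2 <- sapply(seq_len(nrow(prototypes)), function(k) {
    rowSums(sweep(x, 2, prototypes[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))
  ranked <- t(apply(d2, 1, order))  # prototypes sorted by closeness, per row
  bmu <- ranked[, 1]                # best matching unit
  second <- ranked[, 2]             # second best matching unit
  grid_dist <- as.matrix(dist(coord))
  list(
    quantization = mean(d2[cbind(seq_len(nrow(x)), bmu)]),
    # assumption: direct neighbours are units at grid distance <= 1
    topographic = mean(grid_dist[cbind(bmu, second)] > 1 + 1e-8)
  )
}

# Toy 1 x 3 map: units at (1,1), (1,2), (1,3)
coord <- rbind(c(1, 1), c(1, 2), c(1, 3))
protos <- rbind(c(0, 0), c(10, 0), c(1, 0))
x <- rbind(c(0, 1), c(10, 1))
q <- som_quality(x, protos, coord)
q
```

In this toy map, the first observation has units 1 and 3 as its two closest prototypes, which are not adjacent on the grid, so half of the observations count towards the topographic error.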

Author(s)

Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Polzlbauer G. (2004) Survey and comparison of quality measures for self-organizing maps. In: Proceedings of the Fifth Workshop on Data Analysis (WDA'04), Paralic, J., Polzlbauer, G., Rauber, A. (eds) Sliezsky dom, Vysoke Tatry, Slovakia: Elfa Academic Press, 67-82.

Examples

my.som <- trainSOM(x.data = iris[, 1:4])
quality(my.som, quality.type = "all")
quality(my.som, quality.type = "topographic")

Complete Documentation on `somRes` Plots

Description

Useful details on how to produce graphics to help interpret a somRes object.

Important: the graphics available for the different types of SOM are marked with an N, a K or an R (N = numerical SOM, K = korresp SOM and R = relational SOM).

Graphics on the observations: `what = "obs"`

For the cases what = "obs" and what = "add", if a neuron is empty, nothing will be plotted at its location.

The possible values for type are:

"hitmap" (K, R): plots proportional areas according to the number of observations per neuron. It is the default plot when what="obs".
"color" (N): can have one more argument, variable, the name or index of the variable to be considered (default, 1, the first variable). Neurons are filled using the given colors according to the average value level of the observations for the chosen variable.
"lines" (N): plots a line for each observation in every neuron, between variables. A vector of variables (names or indexes) can be provided with the argument variable.
"meanline" (N): plots, for each neuron, the average value level of the observations, with lines and points. One point represents a variable. By default, all variables of the dataset used to train the algorithm are plotted but a vector of variables (names or indexes) can be provided with the argument variable.
"barplot" (N): is similar to "meanline" but using barplots. Then, a bar represents a variable.
"boxplot" (N): plots boxplots for the observations in every neuron, by variable. Like "lines", "meanline" and "barplot", a vector of variables (names or indexes) can be provided with the argument variable.
"names" (N, K, R): prints on the grid the element names (i.e., the row names or row and column names in the case of korresp) in the neuron to which they belong.

Graphic on the energy: `what = "energy"` (N, K, R)

This graphic is only available if some intermediate backups have been registered (i.e., with the argument nb.save of trainSOM or initSOM resulting in x$parameters$nb.save>1). The graphic plots the evolution of the level of the energy according to the registered steps.

Graphics on the prototypes: `what = "prototypes"`

The possible values for type are:

"lines" (N, K, R): has the same behavior as the "lines" case described in the observations section, but according to the prototypes level.
"barplot" (N, K, R): has the same behavior as the "barplot" case described in the observations section, but according to the prototypes level.
"color" (N, K): has the same behavior as the "color" case described in the observations section, but according to the prototypes level.
"3d" (N): is similar to the "color" case, but in 3 dimensions, with x and y the coordinates of the grid and z the value of the prototypes for the considered variable. This function can take two more arguments: maxsize (defaults to 2) and minsize (defaults to 0.5) for the size of the points representing neurons.
"smooth.dist" (N, K, R): depicts the average distance between a prototype and its neighbors on a map where x and y are the coordinates of the prototypes on the grid.
"poly.dist" (N, K, R): also represents the distances between prototypes but with polygons plotted for each neuron. The closer to the border a polygon point is, the closer the corresponding pair of prototypes is. The color used for filling the polygon shows the number of observations in each neuron. A white polygon means that there is no observation. With the default colors, a red polygon means a high number of observations.
"umatrix" (N, K, R): is another way of plotting distances between prototypes. The grid is plotted and filled with my.palette colors according to the mean distance between the current neuron and the neighboring neurons. With the default colors, red indicates proximity.
"mds" (N, K, R): plots the number of the neuron on a map according to a Multi Dimensional Scaling (MDS) projection on a two dimensional space.
"grid.dist" (N, K, R): plots all distances on a 2-dimensional map. The number of points on this picture is equal to $\frac{\textrm{number of neurons}\times(\textrm{number of neurons}-1)}{2}$. The x axis corresponds to the prototype distances whereas the y axis depicts the grid distances.

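The point count quoted for "grid.dist" is simply the number of unordered pairs of neurons. A small base R sketch, assuming a 3 x 3 grid and randomly generated stand-in prototypes (not a trained map), shows how the two coordinates of such a plot can be assembled:

```r
set.seed(1)
# 3 x 3 grid coordinates and hypothetical prototypes (random, for illustration only)
coord <- as.matrix(expand.grid(y = 1:3, x = 1:3)[, c("x", "y")])
prototypes <- matrix(rnorm(9 * 4), nrow = 9)

# One entry per unordered pair of neurons, in the same order for both distances
pairs_df <- data.frame(
  proto = as.vector(dist(prototypes)),  # x axis: distances between prototypes
  grid = as.vector(dist(coord))         # y axis: distances on the grid
)

n_neurons <- nrow(prototypes)
nrow(pairs_df) == n_neurons * (n_neurons - 1) / 2  # 36 pairs for 9 neurons
# plot(pairs_df$proto, pairs_df$grid)  # uncomment to draw the scatter plot
```

On a well-organized map the two distances tend to increase together, which is what this type of plot is meant to reveal.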
Graphics on an additional variable: `what = "add"`

The case what = "add" considers an additional variable, which has to be given to the argument variable. Its length must match the number of observations in the original data.

When the algorithm's type is korresp, no graphic is available for what = "add".

The possible values for type are:

"color" (N, R): has the same behavior as the "color" case described in the observations section. In this case, the additional variable must be a numerical vector.
"lines" (N, R): has the same behavior as the "lines" case described in the observations section. In this case, the additional variable must be a numerical matrix or a data frame.
"boxplot" (N, R): has the same behavior as the "boxplot" case described in the observations section. In this case, the additional variable must be either a numeric vector or a numeric matrix/data frame.
"barplot" (N, R): has the same behavior as the "barplot" case described in the observations section. In this case, the additional variable must be either a numeric vector or a numeric matrix/data frame.
"pie" (N): requires the argument variable to be a vector, which will be passed to the function as.factor, and plots one pie for each neuron according to this factor. By default, the size of the pie is proportional to the number of observations assigned to its neuron, but this can be changed with the argument proportional = FALSE.
"names" (N, R): has the same behavior as the "names" case described in the observations section. In this case, the names to be printed are the elements of the variable given to the variable argument. This case can take one more argument: size (default to 4) for the size of the words.
"words" (N, R): needs the argument variable to be a numeric matrix or a data.frame: the names of the columns are used as words and the values express the frequency of a given word in the observation. Then, for each neuron of the grid, the words are printed with sizes proportional to the sum of their values in the neuron. If the variable given is a contingency table, the frequency of the words in the neurons is plotted directly.
"graph" (N, R): requires the argument variable to be an igraph object (see library("igraph")). According to the existing edges in the graph and to the clustering obtained with the SOM algorithm, a clustered graph is produced in which each vertex represents a neuron and the width of an edge is proportional to the number of edges in the given graph between the vertices assigned to the corresponding neurons. This option can handle two more arguments: pie.graph and pie.variable. These are used to display the vertices as pie charts: in this case, pie.graph must be set to TRUE and a factor vector must be supplied through pie.variable.
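
The aggregation behind type = "words" can be sketched in base R as follows. The frequency matrix and the vector of neurons are invented for the example (in practice the clustering would come from a trained map), and the code only mimics the "sum of values per neuron" step, not the plotting done by the package.

```r
# Hypothetical word frequencies: 6 observations, 3 words (column names are the words)
freq <- matrix(c(2, 0, 1,
                 0, 3, 0,
                 1, 1, 0,
                 0, 0, 4,
                 5, 0, 0,
                 0, 2, 2),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("petal", "sepal", "stem")))

# Hypothetical neuron assigned to each observation
neuron <- c(1, 1, 2, 2, 3, 3)

# Sum the frequencies of each word within each neuron;
# these sums would drive the size of the printed words
word.size <- rowsum(freq, group = neuron)
word.size
```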

Further arguments via ...

Further arguments, their reference functions and the plot.somRes cases are summarized in the following list:

plot.igraph is called by the cases:
- what = "add" / type = "graph"
- what = "add" / type = "projgraph" (for a superclass object)
persp is called by the case what = "prototypes" / type = "3d"
ggplot is called in all the other cases.

In complement to ggplot,

geom_text_wordcloud is called by the cases:
- type = "names"
- what = "add" / type = "words"
geom_contour_fill is called by the case what = "prototypes" / type = "smooth.dist"

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Madalina Olteanu <madalina.olteanu@univ-paris1.fr>
Nathalie Vialaneix <nathalie.vialaneix@inra.fr>

Examples

### Numerical SOM
# run the SOM algorithm on the numerical data of 'iris' data set
iris.som <- trainSOM(x.data = iris[,1:4], nb.save = 2)

####### energy plot
plot(iris.som, what = "energy") # energy

####### plots on observations
plot(iris.som, what = "obs", type = "hitmap")
## Not run: 
plot(iris.som, what = "obs", type = "lines")
plot(iris.som, what = "obs", type = "barplot")
plot(iris.som, what = "obs", type = "boxplot")
plot(iris.som, what = "obs", type = "meanline")
plot(iris.som, what = "obs", type = "color", variable = 1)
plot(iris.som, what = "obs", type = "names")
## End(Not run)

####### plots on prototypes
plot(iris.som, what = "prototypes", type = "3d", variable = "Sepal.Length")
## Not run: 
plot(iris.som, what = "prototypes", type = "lines")
plot(iris.som, what = "prototypes", type = "barplot")
plot(iris.som, what = "prototypes", type = "umatrix")
plot(iris.som, what = "prototypes", type = "color", variable = "Petal.Length")
plot(iris.som, what = "prototypes", type = "smooth.dist")
plot(iris.som, what = "prototypes", type = "poly.dist")
plot(iris.som, what = "prototypes", type = "grid.dist")
plot(iris.som, what = "prototypes", type = "mds")
## End(Not run)

####### plots on an additional variable: the flower species
plot(iris.som, what = "add", type = "pie", variable = iris$Species)
## Not run: 
plot(iris.som, what = "add", type = "names", variable = iris$Species)
plot(iris.som, what = "add", type = "words", variable = iris[,1:2])
## End(Not run)

Graphical Web User Interface for SOMbrero

Description

Start the SOMbrero GUI.

Usage

sombreroGUI()

Value

This function starts the graphical user interface in the default system browser. This interface is more likely to work properly with Firefox (https://www.firefox.com/fr/?redirect_source=mozilla-org). If Firefox is not your default browser, copy and paste http://localhost:8100 into its URL bar.
Note that the same interface is available online at https://sombrero.sk8.inrae.fr/.

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Julien Boelaert <julien.boelaert@gmail.com>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

RStudio and Inc. (2013). shiny: Web Application Framework for R. R package version 0.7.0. https://cran.r-project.org/package=shiny

Create Super-Clusters from SOM Results

Description

Aggregate the resulting clustering of the SOM algorithm into super-clusters.

Usage

superClass(sommap, method, members, k, h, clustering = NULL, ...)

## S3 method for class 'somSC'
print(x, ...)

## S3 method for class 'somSC'
summary(object, ...)

## S3 method for class 'somSC'
plot(
  x,
  what = c("obs", "prototypes", "add"),
  type = c("dendrogram", "grid", "hitmap", "lines", "meanline", "barplot", "boxplot",
    "mds", "color", "poly.dist", "pie", "graph", "dendro3d", "projgraph"),
  plot.var = TRUE,
  show.names = TRUE,
  names = 1:prod(x$som$parameters$the.grid$dim),
  ...
)

## S3 method for class 'somSC'
projectIGraph(object, init.graph, ...)

cutree(object, k = NULL, h = NULL)

Arguments

sommap

A somRes object.

method

Argument passed to the hclust function.

members

Argument passed to the hclust function.

k

Argument passed to the cutree function (number of super-clusters to cut the dendrogram).

h

Argument passed to the cutree function (height at which to cut the dendrogram).

clustering

Precomputed clustering provided by the user. In this case, the function just returns a somSC object with this clustering but no associated dendrogram. Not all methods and plots apply to this case.

...

Used for plot.somSC: further arguments passed either to the function plot (case type = "dendrogram") or to plot.myGrid (case type = "grid") or to plot.somRes (all other cases).

x

A somSC object.

object

A somSC object.

what

What to plot. Can be either the observations (obs), the prototypes (prototypes), an additional variable (add), or NULL if not appropriate.
Automatically set for types "hitmap" (to "obs") and "grid" (to "prototypes"). Defaults to "obs" otherwise.
If what = "add", the function plot.somRes is also run with the argument what set to "add".

type

The type of plot to draw. The default value is "dendrogram", which plots the dendrogram of the clustering. Case "grid" plots the grid with colors corresponding to the clusters of the super clustering. Case "projgraph" uses an igraph object passed to the argument variable and plots the projected graph as defined by the method projectIGraph. All other cases are those available in the function plot.somRes and superimpose the super-clusters over these plots.

plot.var

A boolean indicating whether a plot showing the evolution of the explained variance should be plotted. This argument is only used when type = "dendrogram"; its default value is TRUE.

show.names

Whether the cluster titles must be printed in the center of the grid or not for type = "grid". Defaults to TRUE in the usage signature (titles displayed).

names

If show.names = TRUE, values of the titles to display for type = "grid". Defaults to "Cluster " followed by the cluster number.

init.graph

An igraph object which is projected according to the super-clusters. The vertices of init.graph must correspond to the rows of the original dataset processed by SOM (note that case "korresp" is not handled by this function). In the projected graph, the vertices are positioned at the center of gravity of the super-clusters (more details in the section Details below).

Details

The superClass method can be used in 2 ways:

to choose the number of super clusters via an hclust object: then, both arguments k and h can be NULL. In this case, superClass only returns the dendrogram of the hierarchical clustering, which can then be cut with the method cutree (for which either k or h must be specified);
to cut the clustering into super clusters. Then, either argument k or argument h must be specified (see cutree for details).

The squared distance between prototypes is passed to the algorithm.
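
The two ways of working can be mimicked with base R alone. The sketch below is not the package's code: the prototype matrix is made up, and hclust's default linkage is used only for illustration (in superClass the linkage is controlled by the method argument).

```r
set.seed(1)
# Hypothetical prototypes: 25 neurons described by 4 variables
prototypes <- matrix(rnorm(25 * 4), ncol = 4)

# Squared distances between prototypes feed the hierarchical clustering
tree <- hclust(dist(prototypes)^2)

# First use: inspect the dendrogram, then decide where to cut
plot(tree)

# Second use: cut into a given number of super-clusters ...
sc.k <- cutree(tree, k = 4)
# ... or at a given height
sc.h <- cutree(tree, h = median(tree$height))

table(sc.k)
```

Cutting by k fixes the number of super-clusters in advance, whereas cutting by h lets the dendrogram heights decide how many groups remain.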

summary on a superClass object produces a complete summary of the results that displays the number of clusters and super-clusters and the clustering itself, and performs ANOVA analyses. For type = "numeric", the ANOVA is performed for each input variable and tests the difference of this variable across the super-clusters of the map. For type = "relational", a dissimilarity ANOVA is performed (see Anderson, 2001), except that, in the present version, a crude estimate of the p-value is used, which is based on the Fisher distribution and not on a permutation test.

On plots, the different super classes are identified in the following ways:

either with different colors, when type is set among: "grid" (N, K, R), "hitmap" (N, K, R), "lines" (N, K, R), "barplot" (N, K, R), "boxplot", "poly.dist" (N, K, R), "mds" (N, K, R), "dendro3d" (N, K, R), "graph" (R), "projgraph" (R);
or with a title, when type is set among: "color" (N, K), "pie" (N, R).

In the list above, the charts available for a numerical SOM are indicated with an N, with a K for a korresp SOM and with an R for a relational SOM.

projectIGraph produces a projected graph from the igraph object passed to the argument variable as described in (Olteanu and Villa-Vialaneix, 2015). The attributes of this graph are the same as the ones obtained from the SOM map itself in the function projectIGraph. plot.somSC used with type = "projgraph" calculates this graph and represents it by positioning the super-vertexes at the center of gravity of the super-clusters. This feature can be combined with pie.graph = TRUE to superimpose the information from an external factor related to the individuals in the original dataset (or, equivalently, to the vertexes of the graph).

Value

The superClass method returns an object of class somSC, which is a list of the following elements:

cluster

The super clustering of the prototypes (only if either k or h is given by the user).

tree

An hclust object.

som

The somRes object given as argument (see trainSOM for details).

The projectIGraph method returns an object of class igraph with the following attributes:

layout

provides the layout of the projected graph according to the center of gravity of the super-clusters positioned on the SOM grid (graph attribute);

name and size

respectively are the vertex number on the grid and the number of vertexes included in the corresponding cluster (vertex attribute);

weight

gives the number of edges (or the sum of the weights) between the vertexes of the two corresponding clusters (edge attribute).
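
To make the size and weight attributes concrete, here is a small base R sketch that counts them by hand. The edge list and the cluster assignment are hypothetical, the graph is treated as unweighted and undirected, and edges inside a cluster are simply ignored (an assumption of this sketch, not a statement about the package).

```r
# Hypothetical edge list of a small graph on 6 vertices
edges <- data.frame(from = c(1, 1, 2, 3, 4, 5),
                    to   = c(2, 3, 4, 4, 5, 6))

# Hypothetical super-cluster of each vertex
cluster <- c(1, 1, 1, 2, 2, 3)

# Map both ends of every edge to clusters, ordered so that (a, b) and (b, a) coincide
ca <- pmin(cluster[edges$from], cluster[edges$to])
cb <- pmax(cluster[edges$from], cluster[edges$to])

# Keep the edges linking two different clusters and count them: the "weight"
between <- ca != cb
weight <- table(from = ca[between], to = cb[between])
weight

# Number of vertices per cluster: the "size"
size <- table(cluster)
size
```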

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Anderson M.J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26, 32-46.

Olteanu M., Villa-Vialaneix N. (2015) Using SOMbrero for clustering and visualizing graphs. Journal de la Societe Francaise de Statistique, 156, 95-119.

Examples

set.seed(11051729)
my.som <- trainSOM(x.data = iris[, 1:4])

# choose the number of super-clusters
sc <- superClass(my.som)
plot(sc)

# cut the clustering
sc <- superClass(my.som, k = 4)
summary(sc)
plot(sc)
plot(sc, type = "grid")
plot(sc, what = "obs", type = "hitmap")

# cut the clustering with a different number of clusters
sc <- superClass(my.som, k = 5)
summary(sc)

# provide a precomputed clustering
sc2 <- superClass(my.som, clustering = sample(1:3, 25, replace = TRUE))

Run the SOM Algorithm

Description

The trainSOM function returns a somRes class object which contains the outputs of the algorithm.

Usage

trainSOM(x.data, ...)

## S3 method for class 'somRes'
print(x, ...)

## S3 method for class 'somRes'
summary(object, ...)

Arguments

x.data

a data frame or matrix containing the observations to be mapped on the grid by the SOM algorithm.

...

Further arguments to be passed to the function initSOM for specifying the parameters of the algorithm. The default values of the arguments maxit and dimension are calculated according to the SOM type if the user does not set them:

maxit is equal to (number of rows + number of columns) * 5 if the SOM type is korresp. It is equal to number of rows * 5 for all other SOM types.
dimension: for a korresp SOM, is approximately equal to the square root of the number of observations to be classified divided by 10, but it is never smaller than 5 or larger than 10.

x

an object of class somRes.

object

an object of class somRes.
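
The default for maxit described under the ... argument is simple arithmetic, which the following sketch spells out. default.maxit is a hypothetical helper written for this illustration only; it is not a function of the package.

```r
# Hypothetical helper reproducing the documented default for maxit
default.maxit <- function(x.data, type = c("numeric", "relational", "korresp")) {
  type <- match.arg(type)
  if (type == "korresp") {
    # rows and columns are both mapped for contingency tables
    (nrow(x.data) + ncol(x.data)) * 5
  } else {
    nrow(x.data) * 5
  }
}

default.maxit(iris[, 1:4])                  # 150 rows -> 750
default.maxit(matrix(1, 10, 4), "korresp")  # (10 + 4) * 5 -> 70
```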

Details

The version of the SOM algorithm implemented in this package is the stochastic version.

Several variants able to handle non-vectorial data are also implemented in their stochastic versions: type = "korresp" for contingency tables, as described in Cottrell et al. (2004) (with weights as in Cottrell and Letrémy, 2005a); type = "relational" for dissimilarity matrices, as described in Olteanu and Villa-Vialaneix (2015), with the fast implementation introduced in Mariette et al. (2017).

Missing values are handled as described in Cottrell and Letrémy (2005b): the missing entries of the selected observation are not used during winner computation or prototype updates. This makes it possible to impute missing entries with the corresponding entries of the cluster prototype (with impute).
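
The idea of ignoring missing entries when looking for the winner, then filling them from the winning prototype, can be sketched in base R as follows. The prototypes and the observation are invented, and the code is a simplified illustration rather than the package's implementation.

```r
# Hypothetical prototypes (3 neurons, 4 variables)
prototypes <- rbind(c(5.0, 3.4, 1.5, 0.2),
                    c(5.9, 2.8, 4.3, 1.3),
                    c(6.6, 3.0, 5.6, 2.0))

# One observation with a missing second entry
obs <- c(6.0, NA, 4.5, 1.4)
observed <- !is.na(obs)

# Winner computation restricted to the observed entries
d2 <- rowSums(sweep(prototypes[, observed, drop = FALSE], 2, obs[observed])^2)
winner <- which.min(d2)

# Imputation: missing entries take the value of the winning prototype
imputed <- obs
imputed[!observed] <- prototypes[winner, !observed]
imputed
```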

summary produces a complete summary of the results that displays the parameters of the SOM, quality criteria and ANOVA. For type = "numeric", the ANOVA is performed for each input variable and tests the difference of this variable across the clusters of the map. For type = "relational", a dissimilarity ANOVA is performed (Anderson, 2001), except that, in the present version, a crude estimate of the p-value is used, which is based on the Fisher distribution and not on a permutation test.

Value

The trainSOM function returns an object of class somRes which contains the following components:

clustering

the final classification of the data.

prototypes

the final coordinates of the prototypes.

energy

the final energy of the map. For the numeric case, energy with data having missing entries is based on data imputation as described in Cottrell and Letrémy (2005b).

backup

a list containing some intermediate backups of the prototypes coordinates, clustering, energy and the indexes of the recorded backups, if nb.save is set to a value larger than 1.

data

the original dataset used to train the algorithm.

parameters

a list of the map's parameters, which is an object of class paramSOM as produced by the function initSOM.

The function summary.somRes also provides an ANOVA (ANalysis Of VAriance) of each numeric input variable as a function of the map's clusters. This is helpful to see which variables contribute to the clustering.
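
A per-variable ANOVA of this kind can be reproduced outside the package with base R. In the sketch below, a k-means partition stands in for the clusters of a trained map (an assumption made only so that the example runs without the package), and the output format differs from what summary.somRes prints.

```r
# Hypothetical clustering standing in for the clusters of a trained map
set.seed(3)
clusters <- factor(kmeans(iris[, 1:4], centers = 3)$cluster)

# One-way ANOVA of each numeric variable against the clusters;
# the p-value of the cluster effect is extracted from each table
p.values <- sapply(iris[, 1:4], function(v) {
  fit <- aov(v ~ clusters)
  summary(fit)[[1]][["Pr(>F)"]][1]
})
p.values
```

Small p-values point to variables whose means differ across clusters, which is the sense in which a variable takes part in the clustering.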

Note

Warning! Recording intermediate backups with the argument nb.save can strongly increase the computational time since calculating the entire clustering and the energy is time consuming. Use this option with care and only when it is strictly necessary.
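
The following sketch shows one way to request backups and then look at the documented components of the result. It assumes that the SOMbrero package is installed (the block is skipped otherwise); the seed value is arbitrary and the exact content of the printed objects is not shown because it depends on the run.

```r
# Requires the SOMbrero package; the block is skipped if it is not installed
if (requireNamespace("SOMbrero", quietly = TRUE)) {
  library(SOMbrero)
  set.seed(255)  # arbitrary seed, for reproducibility of this sketch

  # Keep 2 intermediate backups (slower than a run without backups)
  iris.som <- trainSOM(x.data = iris[, 1:4], nb.save = 2)

  print(table(iris.som$clustering))  # number of observations per neuron
  print(dim(iris.som$prototypes))    # dimensions of the prototype matrix
  print(iris.som$energy)             # final energy of the map
  print(names(iris.som$backup))      # elements stored in the backups
}
```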

Author(s)

Élise Maigné <elise.maigne@inrae.fr>
Jérome Mariette <jerome.mariette@inrae.fr>
Madalina Olteanu <olteanu@ceremade.dauphine.fr>
Fabrice Rossi <fabrice.rossi@apiacoa.org>
Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Anderson M.J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26, 32-46.

Kohonen T. (2001) Self-Organizing Maps. Berlin/Heidelberg: Springer-Verlag, 3rd edition.

Cottrell M., Ibbou S., Letrémy P. (2004) SOM-based algorithms for qualitative variables. Neural Networks, 17, 1149-1167.

Cottrell M., Letrémy P. (2005a) How to use the Kohonen algorithm to simultaneously analyse individuals in a survey. Neurocomputing, 21, 119-138.

Cottrell M., Letrémy P. (2005b) Missing values: processing with the Kohonen algorithm. Proceedings of Applied Stochastic Models and Data Analysis (ASMDA 2005), 489-496.

Olteanu M., Villa-Vialaneix N. (2015) On-line relational and multiple relational SOM. Neurocomputing, 147, 15-30.

Mariette J., Rossi F., Olteanu M., Villa-Vialaneix N. (2017) Accelerating stochastic kernel SOM. In: M. Verleysen, XXVth European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017), i6doc, Bruges, Belgium, 269-274.

Examples

# Run the SOM algorithm on the iris data with the default settings
iris.som <- trainSOM(x.data=iris[,1:4])
iris.som
summary(iris.som)
