| Type: | Package |
| Title: | Hierarchical Clustering with Spatial Constraints |
| Version: | 2.1 |
| Author: | Marie Chavent [aut, cre], Vanessa Kuentz [aut], Amaury Labenne [aut], Jerome Saracco [aut] |
| Maintainer: | Marie Chavent <Marie.Chavent@u-bordeaux.fr> |
| Description: | Implements a Ward-like hierarchical clustering algorithm including soft spatial/geographical constraints. |
| Depends: | R (≥ 3.0.0) |
| Imports: | graphics, stats, sp, spdep |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2.0)] |
| LazyData: | true |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.1.2 |
| NeedsCompilation: | no |
| Packaged: | 2021-09-30 12:43:23 UTC; mchavent |
| Repository: | CRAN |
| Date/Publication: | 2021-09-30 14:20:13 UTC |
Choice of the mixing parameter
Description
This function calculates the proportion of inertia explained by the partitions in K clusters for a range of mixing parameters alpha. When the proportion of explained inertia calculated with D0 decreases, the proportion of explained inertia calculated with D1 increases. The plot of the two curves of explained inertia (one for D0 and one for D1) helps the user to choose the mixing parameter alpha.
Usage
choicealpha(D0, D1, range.alpha, K, wt = NULL, scale = TRUE, graph = TRUE)
Arguments
D0 | a dissimilarity matrix of class |
D1 | another dissimilarity matrix of class |
range.alpha | a vector of real values between 0 and 1. |
K | the number of clusters. |
wt | vector with the weights of the observations. By default, wt=NULL corresponds to the case where all observations are weighted by 1/n. |
scale | if TRUE the two dissimilarity matrices are scaled, i.e. divided by their max. |
graph | if TRUE, two graphics (proportion and normalized proportion of explained inertia) are drawn. |
Value
An object with S3 class "choicealpha" and the following components:
Q | a matrix of dimension |
Qnorm | a matrix of dimension |
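The computation described above can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the package's code: the helpers pseudo_inertia and explained are hypothetical names, the weights are uniform, both dissimilarity matrices are scaled by their maximum, and the partition for each alpha is obtained by running stats::hclust with method "ward.D" on a convex combination of Ward aggregation measures.

```r
set.seed(123)
n <- 30
feat <- matrix(rnorm(n * 2), n, 2)   # simulated "feature space"
geo  <- matrix(runif(n * 2), n, 2)   # simulated "constraint space"
D0 <- as.matrix(dist(feat)); D0 <- D0 / max(D0)
D1 <- as.matrix(dist(geo));  D1 <- D1 / max(D1)
wt <- rep(1 / n, n)                  # uniform weights (assumption)
K <- 4

# Hypothetical helper: pseudo inertia of the observations idx,
# sum_{i,j} w_i w_j d_ij^2 / (2 * sum of the weights in idx).
pseudo_inertia <- function(D, idx, wt) {
  w <- wt[idx]
  sum(outer(w, w) * D[idx, idx, drop = FALSE]^2) / (2 * sum(w))
}

# Hypothetical helper: proportion of pseudo inertia explained by a partition.
explained <- function(D, part, wt) {
  within <- sum(vapply(split(seq_along(part), part),
                       function(idx) pseudo_inertia(D, idx, wt), numeric(1)))
  1 - within / pseudo_inertia(D, seq_along(part), wt)
}

ward0 <- outer(wt, wt) / outer(wt, wt, "+")   # w_i w_j / (w_i + w_j)

range_alpha <- seq(0, 1, 0.25)
Q <- t(vapply(range_alpha, function(alpha) {
  # mixed aggregation measures between singletons for this alpha
  delta <- as.dist(ward0 * ((1 - alpha) * D0^2 + alpha * D1^2))
  part <- cutree(hclust(delta, method = "ward.D", members = wt), k = K)
  c(Q0 = explained(D0, part, wt), Q1 = explained(D1, part, wt))
}, numeric(2)))
rownames(Q) <- paste0("alpha=", range_alpha)
Q   # one row per alpha, one column per dissimilarity matrix
```

Reading the two columns side by side shows the trade-off that the plot is meant to expose: a suitable alpha is typically one where the second column has gained noticeably while the first has lost little.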
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.
See Also
Examples
data(estuary)
D0 <- dist(estuary$dat) # the socio-demographic distances
D1 <- as.dist(estuary$D.geo) # the geographic distances between the cities
range.alpha <- seq(0,1,0.1)
K <- 5
cr <- choicealpha(D0,D1,range.alpha,K,graph=TRUE)
cr$Q # proportion of explained pseudo inertia
cr$Qnorm # normalized proportion of explained pseudo inertia
estuary data
Description
Data referring to n=303 French municipalities of the Gironde estuary (a south-west French county). The data come from the French population census conducted by the National Institute of Statistics and Economic Studies. The dataset is an extraction of four quantitative socio-economic variables for a subsample of 303 French municipalities located on the Atlantic coast between Royan and Mimizan.
employ.rate.city is the employment rate of the municipality, that is, the ratio of the number of individuals who have a job to the population of working age (generally defined, for the purposes of international comparison, as persons between 15 and 64 years of age).
graduate.rate refers to the level of education of the population, that is, the highest degree declared by the individual. It is defined here as the proportion of the whole population holding a diploma equivalent to or above two years of higher education (DUT, BTS, DEUG, nursing and social training courses, license, maitrise, master, DEA, DESS, doctorate, or Grande Ecole diploma).
housing.appart is the ratio of apartment housing.
agri.land is the share of agricultural area in the municipality.
Format
The R dataset estuary is a list of three objects:
dat: a data frame with the description of the n=303 municipalities on p=4 socio-demographic variables.
D.geo: a matrix with the geographical distances between the town halls of the n=303 municipalities.
map: an object of class SpatialPolygonsDataFrame with the map of the Gironde estuary.
Source
The original data come from the 2009 French population census of the National Institute of Statistics and Economic Studies. The agricultural surface was calculated from data provided by the French National Institute of Geographical and Forestry Information. The calculation of the ratios and the recoding of categories were made by Irstea Bordeaux.
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.
Examples
data(estuary)
names(estuary)
head(estuary$dat)
Ward clustering with soft contiguity constraints
Description
Implements a Ward-like hierarchical clustering algorithm including soft contiguity constraints. The algorithm takes as input two dissimilarity matrices D0 and D1 and a mixing parameter alpha between 0 and 1. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature space". The second matrix gives the dissimilarities in the "constraint space". For instance, D1 can be a matrix of geographical distances or a matrix built from a contiguity matrix. The mixing parameter alpha sets the importance of the constraint in the clustering process.
Usage
hclustgeo(D0, D1 = NULL, alpha = 0, scale = TRUE, wt = NULL)
Arguments
D0 | an object of class |
D1 | an object of class "dist" with other dissimilarities between the same n observations. |
alpha | a real value between 0 and 1. This mixing parameter gives the relative importance of |
scale | if TRUE the two dissimilarity matric |
wt | vector with the weights of the observations. By default, wt=NULL corresponds to the case where all observations are weighted by 1/n. |
Details
The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with D0 and the homogeneity criterion calculated with D1. The parameter alpha (the weight of this convex combination) controls the importance of the constraint in the quality of the solutions. When alpha increases, the homogeneity calculated with D0 decreases whereas the homogeneity calculated with D1 increases.
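One way to picture this convex combination is the base R sketch below, on simulated data. It is an assumption about how such a criterion can be computed, not a copy of the package's implementation: the Ward aggregation measures between singletons are computed for each matrix, mixed with weight alpha, and passed to stats::hclust with method "ward.D" and the observation weights as members. All variable names are hypothetical.

```r
set.seed(1)
n <- 20
feat <- matrix(rnorm(n * 2), n, 2)   # simulated "feature space"
geo  <- matrix(runif(n * 2), n, 2)   # simulated "constraint space"
D0 <- as.matrix(dist(feat)); D0 <- D0 / max(D0)   # scaled by the max
D1 <- as.matrix(dist(geo));  D1 <- D1 / max(D1)
wt <- rep(1 / n, n)                  # uniform weights (assumption)
alpha <- 0.2                         # mixing parameter

# Ward aggregation measures between singletons, one matrix per space
ward_weights <- outer(wt, wt) / outer(wt, wt, "+")   # w_i w_j / (w_i + w_j)
delta0 <- ward_weights * D0^2
delta1 <- ward_weights * D1^2

# Convex combination of the two criteria, then a Ward-type agglomeration
delta <- as.dist((1 - alpha) * delta0 + alpha * delta1)
tree <- hclust(delta, method = "ward.D", members = wt)
part <- cutree(tree, k = 3)

# The heights add up to the same convex combination of the two total
# pseudo inertias, sum_{i,j} w_i w_j d_ij^2 / (2 * sum(w)).
total0 <- sum(outer(wt, wt) * D0^2) / (2 * sum(wt))
total1 <- sum(outer(wt, wt) * D1^2) / (2 * sum(wt))
expected_total <- (1 - alpha) * total0 + alpha * total1
c(sum(tree$height), expected_total)
```

With alpha = 0 the sketch reduces to a Ward clustering on D0 alone; with alpha = 1 only D1 drives the merges.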
Value
Returns an object of class hclust.
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.
See Also
Examples
data(estuary)
# with one dissimilarity matrix
w <- estuary$map@data$POPULATION # non uniform weights
D <- dist(estuary$dat)
tree <- hclustgeo(D,wt=w)
sum(tree$height)
inertdiss(D,wt=w)
inert(estuary$dat,w=w)
plot(tree,labels=FALSE)
part <- cutree(tree,k=5)
sp::plot(estuary$map, border = "grey", col = part)
# with two dissimilarity matrices
D0 <- dist(estuary$dat) # the socio-demographic distances
D1 <- as.dist(estuary$D.geo) # the geographical distances
alpha <- 0.2 # the mixing parameter
tree <- hclustgeo(D0,D1,alpha=alpha,wt=w)
plot(tree,labels=FALSE)
part <- cutree(tree,k=5)
sp::plot(estuary$map, border = "grey", col = part)
Inertia of a cluster
Description
Computes the inertia of a cluster, i.e. of a subset of rows of a data matrix.
Usage
inert(Z, indices = 1:nrow(Z), wt = rep(1/nrow(Z), nrow(Z)), M = rep(1, ncol(Z)))
Arguments
Z | matrix data |
indices | vectors representing the subset of rows |
wt | weight vector |
M | diagonal distance matrix |
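As an illustration of what such an inertia looks like, here is a minimal base R sketch. It assumes the usual definition (weighted sum of squared distances to the weighted centroid, with M read as the diagonal of the metric) and is not the package's code; cluster_inertia is a hypothetical name.

```r
# Hypothetical helper (not the package's inert()): weighted inertia of the
# rows `indices` of Z around their weighted centroid, with diagonal metric M.
cluster_inertia <- function(Z, indices = seq_len(nrow(Z)),
                            wt = rep(1 / nrow(Z), nrow(Z)),
                            M = rep(1, ncol(Z))) {
  Zc <- as.matrix(Z)[indices, , drop = FALSE]
  w <- wt[indices]
  g <- colSums(Zc * w) / sum(w)        # weighted centroid of the cluster
  dev2 <- sweep(Zc, 2, g)^2            # squared deviations from the centroid
  sum(w * as.vector(dev2 %*% M))       # sum_i w_i * sum_j M_j (z_ij - g_j)^2
}

set.seed(42)
n <- 50
p <- 3
X <- matrix(rnorm(n * p), n, p)
Z <- scale(X) * sqrt(n / (n - 1))      # columns with population variance 1
cluster_inertia(Z)                     # total inertia, equal to p here
cluster_inertia(Z, indices = 1:10)     # inertia of a 10-row cluster
```

With uniform weights 1/n and standardized columns, the total inertia equals the number of variables, which is the check made in the package example.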
Examples
data(estuary)
n <- nrow(estuary$dat)
Z <- scale(estuary$dat)*sqrt(n/(n-1))
inert(Z) # number of variables
w <- estuary$map@data$POPULATION # non uniform weights
inert(Z,wt=w)
Pseudo inertia of a cluster
Description
The pseudo inertia of a cluster is calculated from a dissimilarity matrix and not from a data matrix.
Usage
inertdiss(D, indices = NULL, wt = NULL)
Arguments
D | an object of class "dist" with the dissimilarities between the n observations. The function |
indices | a vector with the indices of the subset of observations. |
wt | vector with the weights of the n observations |
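A base R sketch of a pseudo inertia is given below. It assumes the definition used in the reference paper, sum over pairs (i, j) of the cluster of w_i w_j d_ij^2 divided by twice the total weight of the cluster; pseudo_inertia is a hypothetical name and the code is not taken from the package.

```r
# Hypothetical helper (not the package's inertdiss()).
pseudo_inertia <- function(D, indices = NULL, wt = NULL) {
  D <- as.matrix(D)
  n <- nrow(D)
  if (is.null(indices)) indices <- seq_len(n)
  if (is.null(wt)) wt <- rep(1 / n, n)   # uniform weights by default
  w <- wt[indices]
  Dk <- D[indices, indices, drop = FALSE]
  sum(outer(w, w) * Dk^2) / (2 * sum(w))
}

set.seed(7)
X <- matrix(rnorm(40), 10, 4)
w <- runif(10)                           # non-uniform weights

# For Euclidean distances the pseudo inertia coincides with the usual
# inertia around the weighted centroid.
g <- colSums(X * w) / sum(w)
direct <- sum(w * rowSums(sweep(X, 2, g)^2))
pseudo <- pseudo_inertia(dist(X), wt = w)
c(direct = direct, pseudo = pseudo)
```

The point of the dissimilarity-based form is that it needs no centroid, so it remains defined when D is not Euclidean.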
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.
Examples
data(estuary)
n <- nrow(estuary$dat)
Z <- scale(estuary$dat)*sqrt(n/(n-1))
inertdiss(dist(Z)) # pseudo inertia
inert(Z) # equal for Euclidean distances
w <- estuary$map@data$POPULATION # non uniform weights
inertdiss(dist(Z),wt=w)
Plot to choose the mixing parameter
Description
Plot two curves of explained inertia (one for D0 and one for D1) calculated with choicealpha.
Usage
## S3 method for class 'choicealpha'
plot(x, norm = FALSE, lty = 1:2, pch = c(8, 16), type = c("b", "b"), col = 1:2, xlab = "alpha", ylab = NULL, legend = NULL, cex = 1, ...)
Arguments
x | an object of class |
norm | if TRUE, the normalized proportions of explained inertia are plotted. Otherwise, the proportions of explained inertia are plotted. |
lty | a vector of size 2 with the line types of the two curves. See par |
pch | a vector of size 2 specifying the symbol for the points of the two curves. See par |
type | a vector of size 2 specifying the type of lines of the two curves. See par |
col | a vector of size 2 specifying the colors of the two curves. See par |
xlab | the title for the x axis. |
ylab | the title for the y axis. |
legend | a vector of size 2 with the text for the legend of the two curves. |
cex | text size in the legend. |
... | further arguments passed to or from other methods. |
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.
See Also
Examples
data(estuary)
D0 <- dist(estuary$dat)
D1 <- as.dist(estuary$D.geo) # the geographic distances between the cities
range.alpha <- seq(0,1,0.1)
K <- 5
cr <- choicealpha(D0,D1,range.alpha,K,graph=FALSE)
plot(cr,cex=0.8,norm=FALSE,cex.lab=0.8,ylab="pev", col=3:4,legend=c("socio-demo","geo"), xlab="mixing parameter")
plot(cr,cex=0.8,norm=TRUE,cex.lab=0.8,ylab="pev", col=5:6,pch=5:6,legend=c("socio-demo","geo"), xlab="mixing parameter")
Ward aggregation measures between singletons
Description
This function calculates the Ward aggregation measures between pairs of singletons.
Usage
wardinit(D, wt = NULL)
Arguments
D | an object of class "dist" with the dissimilarities between the n observations. The function |
wt | vector with the weights of the observations. By default, wt=NULL corresponds to the case where all observations are weighted by 1/n. |
Details
The Ward aggregation measure between two singletons i and j weighted by $w_i$ and $w_j$ is $\frac{w_i w_j}{w_i + w_j} d_{ij}^2$, where $d_{ij}$ is the dissimilarity between i and j.
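The formula can be checked by hand on three points of the real line. The sketch below is a base R illustration only; ward_singletons is a hypothetical name, not the package's function.

```r
# Hypothetical helper: (w_i w_j) / (w_i + w_j) * d_ij^2 for every pair (i, j).
ward_singletons <- function(D, wt = NULL) {
  S <- as.matrix(D)^2
  n <- nrow(S)
  if (is.null(wt)) wt <- rep(1 / n, n)   # uniform weights by default
  as.dist(outer(wt, wt) / outer(wt, wt, "+") * S)
}

D <- dist(c(0, 3, 7))                    # pairwise distances 3, 7 and 4

# Uniform weights 1/3: every pair gets the factor (1/9) / (2/3) = 1/6,
# so the measures are 9/6, 49/6 and 16/6.
delta_unif <- ward_singletons(D)

# Weights 1, 2, 3: the factors are 2/3, 3/4 and 6/5,
# so the measures are 6, 36.75 and 19.2.
delta_w <- ward_singletons(D, wt = c(1, 2, 3))
delta_unif
delta_w
```

Heavier observations are more costly to merge, which is why the third pair (weights 2 and 3) gets a larger measure than the first pair despite a similar distance.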
Value
Returns an object of class dist with the Ward aggregation measures between the n singletons.
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.
Dissimilarity based pseudo within-cluster inertia of a partition
Description
This function computes the pseudo within-cluster inertia of a partition from a dissimilarity matrix.
Usage
withindiss(D, part, wt = NULL)
Arguments
D | an object of class "dist" with the dissimilarities between the n observations. The function |
part | a vector with group membership. |
wt | vector with the weights of the observations |
References
M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco. ClustGeo: an R package for hierarchical clustering with spatial constraints. Comput Stat (2018) 33: 1799-1822.