This document shows how to use some functions included in the package to document and reproduce a clustering workflow from the app. Although these functions do not cover all the steps, such as selecting features, they allow users of the Shiny app to show the resulting heatmaps and boxplots. The example dataset used here is the iris dataset.
First we split the numeric and categorical variables and scale the data.
Let’s check the dataset for highly correlated variables that will likely skew the clusters with redundant information:
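The splitting and scaling step can be sketched with base R alone. This is a minimal sketch, not necessarily the code used by the package authors: the name `numeric_data` matches the object used in the correlation check, while `categorical_data` and `scaled_data` are names assumed here for illustration.

```r
# Sketch using base R only; `categorical_data` and `scaled_data` are illustrative names.
data(iris)

# Logical mask marking which columns are numeric
is_num <- vapply(iris, is.numeric, logical(1))

# Split the dataset by column type
numeric_data <- iris[, is_num, drop = FALSE]
categorical_data <- iris[, !is_num, drop = FALSE]

# Center each numeric column and divide it by its standard deviation
scaled_data <- as.data.frame(scale(numeric_data))
```

Keeping the categorical columns in a separate data frame makes them easy to reuse later as annotations, without letting them enter the distance computation.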
correlation_heatmap(numeric_data)
As seen above, petal length and width are highly correlated, so we keep only one of them:
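One way to do this is sketched below with base R. It assumes that `Petal.Length` is the variable dropped (keeping it and dropping `Petal.Width` would remove the redundancy just as well), and it stores the result as `subset_data`, the name used in the following steps. The names `cor_matrix` and `petal_cor` are hypothetical.

```r
# Sketch with base R; which of the two petal variables to drop is an assumption.
numeric_data <- iris[, vapply(iris, is.numeric, logical(1)), drop = FALSE]

# Pairwise Pearson correlations between the numeric variables
cor_matrix <- cor(numeric_data)
petal_cor <- cor_matrix["Petal.Length", "Petal.Width"]

# Keep every numeric column except Petal.Length (assumed choice)
kept_columns <- setdiff(names(numeric_data), "Petal.Length")
subset_data <- numeric_data[, kept_columns, drop = FALSE]
```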
The clustering itself takes three steps: computing a distance matrix, computing the hierarchical clusters, and cutting the tree to find the desired number of clusters. In the app, each of these steps has matching parameters: whether to apply scaling and which distance/similarity metric to use, the linkage method, and the number of clusters.
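For readers who want to see what these three steps amount to, they can be sketched with base R's `dist()`, `hclust()` and `cutree()`. This is an illustration under assumptions, not the package's implementation: the variable names ending in `_base` are hypothetical, and the choice of the three retained columns is assumed.

```r
# Base R sketch of the three steps; not the package's own code.
# Assumes these three columns remain after removing one petal variable.
subset_data <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")]

# Step 1: scale the data and compute a Euclidean distance matrix
dmat_base <- dist(scale(subset_data), method = "euclidean")

# Step 2: hierarchical clustering with Ward's linkage
clusters_base <- hclust(dmat_base, method = "ward.D2")

# Step 3: cut the tree into 3 clusters; returns one integer label per row
labels_base <- cutree(clusters_base, k = 3)
```

Each step consumes the output of the previous one, which is why the app exposes the parameters in the same order.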
scaling <- TRUE
distance_method <- "euclidean"
linkage_method <- "ward.D2"
# this assumes that, in the app, we identified 3 as the optimal number of clusters
k <- 3
These parameters are used in three functions that the app also uses: compute_dmat, compute_clusters and cut_clusters. You can check the documentation for each function in the package website, or interactively through ?compute_dmat.
dmat <- compute_dmat(subset_data, distance_method, TRUE)
clusters <- compute_clusters(dmat, linkage_method)
cluster_labels <- cut_clusters(clusters, k)
Now we can check both the heatmap with its dendrogram and the boxplots. The package includes a function that covers most of the steps needed to produce the heatmap: cluster_heatmaps(). It plots the dendrogram, the annotation layer, the heatmap of the clustered data, and the heatmap of the remaining data that was not used for clustering. In the Shiny app this is done automatically. Outside the app, plotting the annotations and the unselected data is optional, and the annotations require an extra step with the function create_annotations(). The colors used in the app are also exported by the package as the variable cluster_colors.
species_annotation <- create_annotations(iris, "Species")
cluster_heatmaps(scale(subset_data), clusters, k, cluster_colors, annotation = species_annotation)
In addition to the heatmap, the boxplots in the app are also available through functions. There are two steps required to show data through boxplots: annotating the original data with the cluster and plotting it.
annotated_data <- annotate_clusters(subset_data, cluster_labels, TRUE)
cluster_boxplots(annotated_data)
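The same two steps can be approximated with base R for a quick check outside the package. This sketch uses the base R clustering calls (`dist()`, `hclust()`, `cutree()`) in place of the package functions, and the column name `cluster` is chosen here for illustration; it may not match what annotate_clusters() produces.

```r
# Base R sketch: annotate the data with cluster labels, then draw boxplots.
subset_data <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")]
scaled <- as.data.frame(scale(subset_data))

# Cluster labels from Euclidean distances and Ward linkage, cut at k = 3
cluster_ids <- cutree(hclust(dist(scaled), method = "ward.D2"), k = 3)

# Step 1: add the labels as a factor column (the name `cluster` is illustrative)
annotated <- cbind(scaled, cluster = factor(cluster_ids))

# Step 2: one boxplot panel per variable, grouped by cluster
old_par <- par(mfrow = c(1, ncol(scaled)))
for (variable in names(scaled)) {
  boxplot(annotated[[variable]] ~ annotated$cluster,
          main = variable, xlab = "Cluster", ylab = "Scaled value")
}
par(old_par)
```

Storing the labels as a factor keeps the clusters in a fixed order across panels, which makes the boxplots easier to compare.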