AlineTalhouk/diceR: Diverse Cluster Ensemble in R

License: Unknown, MIT
The goal of diceR is to provide a systematic framework for generating diverse cluster ensembles in R. Cluster analysis involves many nuances to consider. We provide a process and a suite of functions and tools that implement a systematic framework for cluster discovery, guiding the user through the generation of diverse clustering solutions from data, ensemble formation, algorithm selection, and the arrival at a final consensus solution. We have additionally developed visual and analytical validation tools to help assess the final result. We implemented a wrapper function, `dice()`, that allows the user to easily obtain results and assess them. Thus, the package is accessible to end users with limited statistical knowledge, while informaticians and statisticians have full access to the package and can easily extend its functions. More details can be found in our companion paper published in BMC Bioinformatics.
You can install diceR from CRAN with:

```r
install.packages("diceR")
```

Or get the latest development version from GitHub:

```r
# install.packages("devtools")
devtools::install_github("AlineTalhouk/diceR")
```
The following example shows how to use the main function of the package, `dice()`. The data matrix `hgsc` contains a subset of gene expression measurements from high-grade serous ovarian carcinoma patients, taken from publicly available datasets of The Cancer Genome Atlas (TCGA). Samples are rows and features are columns. Below, we run the pipeline through `dice()`: we specify (a range of) `nk` clusters over `reps` subsamples of the data, each containing 80% of the full set of samples. We also specify the clustering `algorithms` to be used and, in `cons.funs`, the ensemble functions used to aggregate them.
```r
library(diceR)
data(hgsc)
obj <- dice(hgsc, nk = 4, reps = 5, algorithms = c("hc", "diana"),
            cons.funs = c("kmodes", "majority"), progress = FALSE, verbose = FALSE)
```
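To make the subsampling step more concrete, the base-R sketch below repeatedly draws 80% of the samples without replacement, clusters each subsample with hierarchical clustering, and records the labels in a samples-by-repetitions matrix, leaving unsampled entries as `NA`. It is a simplified illustration under stated assumptions, not diceR's internal code: the helper `subsample_cluster()` is hypothetical, the data are random toy values standing in for `hgsc`, and the Euclidean distance with the default complete linkage of `hclust()` may differ from what `dice()` uses for `"hc"`.

```r
# Hypothetical helper illustrating repeated subsampling + clustering.
# Not diceR's implementation; distance and linkage are assumptions.
subsample_cluster <- function(x, k, reps, prop = 0.8, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  # One column per repetition; samples left out of a subsample stay NA
  assignments <- matrix(NA_integer_, nrow = n, ncol = reps,
                        dimnames = list(rownames(x), paste0("rep", seq_len(reps))))
  for (r in seq_len(reps)) {
    idx <- sample(n, size = floor(prop * n))       # draw 80% of samples without replacement
    hc <- hclust(dist(x[idx, , drop = FALSE]))     # Euclidean distance, complete linkage
    assignments[idx, r] <- cutree(hc, k = k)       # cut the tree into k clusters
  }
  assignments
}

# Toy samples-by-features matrix standing in for data such as hgsc
set.seed(42)
toy <- matrix(rnorm(50 * 5), nrow = 50,
              dimnames = list(paste0("s", 1:50), paste0("f", 1:5)))
res <- subsample_cluster(toy, k = 4, reps = 5)
colSums(!is.na(res))  # 40 of the 50 samples are clustered in each repetition
```

The `NA` entries matter later: any consensus step has to cope with samples that were not drawn in a given repetition.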
The first few cluster assignments are shown below:
```r
knitr::kable(head(obj$clusters))
```
| | kmodes | majority |
|---|---|---|
| TCGA.04.1331_PRO.C5 | 2 | 2 |
| TCGA.04.1332_MES.C1 | 2 | 2 |
| TCGA.04.1336_DIF.C4 | 4 | 2 |
| TCGA.04.1337_MES.C1 | 2 | 2 |
| TCGA.04.1338_MES.C1 | 2 | 2 |
| TCGA.04.1341_PRO.C5 | 2 | 2 |
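The `majority` column combines the individual clustering runs by majority voting. A minimal sketch of that idea, assuming the cluster labels have already been aligned across runs (cluster numbers are arbitrary within each run, so a relabelling step is needed first), assigns each sample its most frequent label. The helper `majority_vote()` and the small matrix `E` are hypothetical, for illustration only, and need not match how diceR breaks ties or handles missing values.

```r
# Hypothetical helper: majority voting over aligned cluster labels.
# E is a samples-by-runs matrix; NA marks runs where a sample was not clustered.
majority_vote <- function(E) {
  apply(E, 1, function(labels) {
    labels <- labels[!is.na(labels)]             # ignore runs without this sample
    if (length(labels) == 0) return(NA_integer_)
    counts <- table(labels)
    as.integer(names(counts)[which.max(counts)]) # ties go to the smallest label
  })
}

E <- rbind(a = c(1, 1, 2, NA),
           b = c(2, 2, 2, 1),
           c = c(3, NA, 3, 1),
           d = c(NA, NA, NA, NA))
majority_vote(E)  # a = 1, b = 2, c = 3, d = NA
```

Sample `d` was never clustered in any run, so it keeps an `NA` consensus label.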
You can also compare the base `algorithms` with the `cons.funs` using internal evaluation indices:

```r
knitr::kable(obj$indices$ii$`4`)
```
| | Algorithms | calinski_harabasz | dunn | pbm | tau | gamma | c_index | davies_bouldin | mcclain_rao | sd_dis | ray_turi | g_plus | silhouette | s_dbw | Compactness | Connectivity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HC_Euclidean | HC_Euclidean | 3.104106 | 0.2608547 | 59.73711 | 0 | 0.4285714 | 0.2844073 | 1.839182 | 0.8009149 | 0.1306062 | 1.4765665 | 0 | NaN | NaN | 24.83225 | 41.62183 |
| DIANA_Euclidean | DIANA_Euclidean | 53.647400 | 0.3348103 | 33.87817 | 0 | -1.8750000 | 0.1589442 | 2.824201 | 0.8051915 | 0.2119281 | 3.2978986 | 0 | 0.0692233 | NaN | 21.93396 | 241.66310 |
| kmodes | kmodes | 55.138600 | 0.3396909 | 50.51722 | 0 | -0.6822430 | 0.1453599 | 2.006752 | 0.7972999 | 0.1170829 | 1.1408258 | 0 | 0.1253664 | NaN | 21.91494 | 201.42540 |
| majority | majority | 19.373248 | 0.3544371 | 85.05173 | 0 | -1.1651376 | 0.2102487 | 1.622799 | 0.8019453 | 0.1108674 | 0.9200511 | 0 | 0.1884934 | NaN | 23.85408 | 64.04921 |
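As an example of how one of these indices is computed, the sketch below implements the Calinski-Harabasz index: the between-cluster dispersion divided by the within-cluster dispersion, each scaled by its degrees of freedom. Higher values indicate more compact, better-separated clusters. It is a standalone base-R illustration with a hypothetical helper name, `calinski_harabasz()`, and is not necessarily identical to the implementation behind the `calinski_harabasz` column above.

```r
# Hypothetical helper: Calinski-Harabasz index
#   CH = [B / (k - 1)] / [W / (n - k)]
# B = between-cluster sum of squares, W = within-cluster sum of squares.
calinski_harabasz <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  overall <- colMeans(x)
  B <- 0
  W <- 0
  for (g in unique(cl)) {
    xg <- x[cl == g, , drop = FALSE]
    centroid <- colMeans(xg)
    B <- B + nrow(xg) * sum((centroid - overall)^2)  # between-cluster dispersion
    W <- W + sum(sweep(xg, 2, centroid)^2)           # within-cluster dispersion
  }
  (B / (k - 1)) / (W / (n - k))
}

# Two well-separated 1-D groups give a large index value
x <- matrix(c(0, 1, 10, 11), ncol = 1)
calinski_harabasz(x, cl = c(1, 1, 2, 2))  # 200
```

Here B = 100 and W = 1, so CH = (100 / 1) / (1 / 2) = 200.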
Figure: a visual schematic of the pipeline that `dice()` implements.
Please visit the overview page for more details.