Title:

Longitudinal Consensus Clustering with 'flexmix'

Version:

1.0.0

Description:

An adaptation of the consensus clustering approach from 'ConsensusClusterPlus' for longitudinal data. The longitudinal data is clustered with flexible mixture models from 'flexmix', while the consensus matrices are hierarchically clustered as in 'ConsensusClusterPlus'. By using the flexibility of 'flexmix' and 'FactoMineR', one can use mixed data types for the clustering.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.1.1

URL:

https://cellmapslab.github.io/longmixr/

BugReports:

https://github.com/cellmapslab/longmixr/issues

Depends:

R (≥ 3.5.0)

Imports:

checkmate, ConsensusClusterPlus, graphics, grDevices, flexmix, StatMatch, stats, utils

Suggests:

testthat (≥ 3.0.0), knitr, rmarkdown, dplyr, tidyr, ggplot2, ggalluvial, FactoMineR, factoextra, lme4, purrr

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2022-01-13 15:26:02 UTC; jonas_hagenberg

Author:

Jonas Hagenberg [aut, cre], Matt Wilkerson [aut, cph], Peter Waltman [aut, cph], Max Planck Institute of Psychiatry [cph]

Maintainer:

Jonas Hagenberg <jonas_hagenberg@psych.mpg.de>

Repository:

CRAN

Date/Publication:

2022-01-13 20:32:42 UTC

Cross-sectional clustering with categorical variables

Description

This function uses the ConsensusClusterPlus function from the package of the same name, with defaults for clustering data with categorical variables. The Gower distance is used as the distance function.

Usage

crosssectional_consensus_cluster(
  data,
  reps = 1000,
  finalLinkage = "ward.D2",
  innerLinkage = "ward.D2",
  ...
)

Arguments

data

a matrix or data.frame containing the variables that should be used for computing the distance. This argument is passed to StatMatch::gower.dist

reps

number of repetitions, same as in ConsensusClusterPlus

finalLinkage

linkage method for final clustering, same as in ConsensusClusterPlus

innerLinkage

linkage method for clustering steps, same as in ConsensusClusterPlus

...

other arguments passed to ConsensusClusterPlus. Note: the d argument cannot be set, as it is computed directly by crosssectional_consensus_cluster

Details

data can take all input data types that gower.dist can handle, i.e. numeric, character/factor, ordered and logical.
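
To make the role of the Gower distance more concrete, the following base R sketch computes a simplified version of it for numeric and factor columns and feeds it into a hierarchical clustering. The helper simple_gower and the toy data frame are hypothetical and only for illustration; this is not the StatMatch::gower.dist implementation (which also covers ordered and logical columns), and a single hclust call stands in for the repeated consensus step.

```r
# Simplified Gower distance (illustrative only; numeric and factor columns)
simple_gower <- function(df) {
  n <- nrow(df)
  total <- matrix(0, n, n)
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x)) {
      # absolute difference scaled by the range of the variable
      d <- abs(outer(x, x, "-")) / diff(range(x))
    } else {
      # simple matching: 0 if the categories agree, 1 otherwise
      x <- as.character(x)
      d <- outer(x, x, "!=") * 1
    }
    total <- total + d
  }
  # average over all variables, so every entry lies in [0, 1]
  total / ncol(df)
}

# toy data with two continuous and two categorical variables
toy <- data.frame(
  mpg = mtcars$mpg,
  hp = mtcars$hp,
  am = as.factor(mtcars$am),
  gear = as.factor(mtcars$gear)
)

gower <- simple_gower(toy)

# one hierarchical clustering on the distance, with the default linkage
# of crosssectional_consensus_cluster
tree <- hclust(as.dist(gower), method = "ward.D2")
classes <- cutree(tree, k = 2)
table(classes)
```

Because every per-variable contribution is scaled to the interval from 0 to 1, continuous and categorical variables enter the distance with comparable weight.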

Value

The output is produced by ConsensusClusterPlus

Examples

dc <- mtcars
# scale continuous variables
dc <- sapply(mtcars[, 1:7], scale)
# code factor variables
dc <- cbind(as.data.frame(dc),
            vs = as.factor(mtcars$vs),
            am = as.factor(mtcars$am),
            gear = as.factor(mtcars$gear),
            carb = as.factor(mtcars$carb))
cc <- crosssectional_consensus_cluster(
  data = dc,
  reps = 10,
  seed = 1
)

Fake questionnaire data

Description

A simulated data set containing observations of 100 individuals at four time points. The data was simulated in two groups (50 individuals each) and contains two questionnaires with five items each, one questionnaire with five continuous variables and one additional cross-sectional continuous variable. In this data set the group variable from the simulation is included. You typically don't have this group variable in your data.

Usage

fake_questionnaire_data

Format

A data frame with 400 rows and 20 variables:

ID: patient ID
visit: time point of the observation
group: the simulated group to which the observation belongs
age_visit_1: age of the patient at time point 1
single_continuous_variable: a cross-sectional continuous variable, i.e. there is only one unique value per individual
questionnaire_A_1: the first item of questionnaire A with categories 1 to 5
questionnaire_A_2: the second item of questionnaire A with categories 1 to 5
questionnaire_A_3: the third item of questionnaire A with categories 1 to 5
questionnaire_A_4: the fourth item of questionnaire A with categories 1 to 5
questionnaire_A_5: the fifth item of questionnaire A with categories 1 to 5
questionnaire_B_1: the first item of questionnaire B with categories 1 to 5
questionnaire_B_2: the second item of questionnaire B with categories 1 to 5
questionnaire_B_3: the third item of questionnaire B with categories 1 to 5
questionnaire_B_4: the fourth item of questionnaire B with categories 1 to 5
questionnaire_B_5: the fifth item of questionnaire B with categories 1 to 5
questionnaire_C_1: the first continuous variable of questionnaire C
questionnaire_C_2: the second continuous variable of questionnaire C
questionnaire_C_3: the third continuous variable of questionnaire C
questionnaire_C_4: the fourth continuous variable of questionnaire C
questionnaire_C_5: the fifth continuous variable of questionnaire C

Source

simulated data

Extract the cluster assignments

Description

This function extracts the cluster assignments from an lcc object. One can specify for which number of clusters the assignments should be returned.

Usage

get_clusters(cluster_solution, number_clusters = NULL)

Arguments

cluster_solution

an lcc object

number_clusters

default is NULL to return all assignments. Otherwise specify a numeric vector with the number of clusters for which the assignments should be returned, e.g. 2:4

Value

a data.frame with an ID column (the name of the ID column was specified by the user when calling the longitudinal_consensus_cluster function) and one column with cluster assignments for every specified number of clusters. Only the assignments included in number_clusters are returned, in the form of columns with the names assignment_num_clus_x
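
The column naming can be illustrated with a mock data frame in base R. The object mock_assignments below is made up by hand and only imitates the documented layout; it is not output of get_clusters, and the subsetting shown is just one way to pick the columns for a given number_clusters.

```r
# hand-made stand-in for the documented return value (hypothetical values)
mock_assignments <- data.frame(
  patient_id = 1:4,
  assignment_num_clus_2 = c(1, 1, 2, 2),
  assignment_num_clus_3 = c(1, 2, 3, 3),
  assignment_num_clus_4 = c(1, 2, 3, 4)
)

# build the column names that belong to the requested numbers of clusters
number_clusters <- 2:3
wanted <- paste0("assignment_num_clus_", number_clusters)

# keep the ID column plus the requested assignment columns
subset_assignments <- mock_assignments[, c("patient_id", wanted)]

# cluster sizes for the two-cluster solution
table(subset_assignments$assignment_num_clus_2)
```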

Examples

# not run
set.seed(5)
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)
model_list <- list(
  flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
  flexmix::FLXMRmgcv(as.formula("var_2 ~ ."))
)
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
cluster_assignments <- get_clusters(clustering, number_clusters = 2)
# end not run

Longitudinal consensus clustering with flexmix

Description

This function performs longitudinal clustering with flexmix. To get robust results, the data is subsampled and the clustering is performed on this subsample. The results are combined in a consensus matrix and a final hierarchical clustering step is performed on this matrix. In this, it follows the approach from the ConsensusClusterPlus package.
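
The subsample, cluster and combine idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the implementation used by longitudinal_consensus_cluster: kmeans on a single cross-sectional variable stands in for the flexmix model fit, and the variable names (together, sampled, consensus) are chosen here for readability only.

```r
set.seed(1)

# toy data: 20 subjects, one numeric feature, two well separated groups
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 6))
n <- length(x)
reps <- 20     # number of subsampling repetitions
p_item <- 0.8  # fraction of subjects drawn in every repetition
k <- 2         # number of clusters

together <- matrix(0, n, n)  # how often i and j ended up in the same cluster
sampled <- matrix(0, n, n)   # how often i and j were in the same subsample

for (r in seq_len(reps)) {
  idx <- sort(sample(n, size = floor(p_item * n)))
  # kmeans is only a stand-in for fitting the mixture model
  cl <- kmeans(x[idx], centers = k)$cluster
  sampled[idx, idx] <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# consensus value: share of co-sampled runs in which i and j were co-clustered
consensus <- ifelse(sampled > 0, together / sampled, 0)
diag(consensus) <- 1

# final hierarchical clustering on the consensus matrix
tree <- hclust(as.dist(1 - consensus), method = "average")
consensus_class <- cutree(tree, k = k)
table(consensus_class)
```

Pairs of subjects that are clustered together in most subsamples get a consensus value close to 1, so 1 - consensus behaves like a distance for the final hierarchical step.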

Usage

longitudinal_consensus_cluster(
  data = NULL,
  id_column = NULL,
  max_k = 3,
  reps = 10,
  p_item = 0.8,
  model_list = NULL,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id"),
  title = "untitled_consensus_cluster",
  final_linkage = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid"),
  seed = 3794,
  verbose = FALSE
)

Arguments

data

a data.frame with one or several observations per subject. It needs to contain one column that specifies to which subject the entry (row) belongs. This ID column is specified in id_column. Otherwise, there are no restrictions on the column names, as the model is specified in flexmix_formula.

id_column

name (character vector) of the ID column in data to identify all observations of one subject

max_k

maximum number of clusters, default is 3

reps

number of repetitions, default is 10

p_item

fraction of samples contained in each subsample, default is 0.8

model_list

either one flexmix driver or a list of flexmix drivers of class FLXMR

flexmix_formula

a formula object that describes the flexmix model relative to the formula in the flexmix drivers (the dot in the flexmix drivers is replaced, see the example). That means that you usually only specify the right-hand side of the formula here. However, this is not enforced or checked, to give you more flexibility over the flexmix interface
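
How a dot in a driver formula can be filled in from a right-hand-side formula is shown below with update() from base R. This only illustrates the dot replacement; whether flexmix performs the substitution in exactly this way is an assumption, and the grouping part (| patient_id) is left out because flexmix processes it separately.

```r
# formula as given to a driver, with a dot as placeholder
driver_formula <- var_1 ~ .

# right-hand side as it would be given in flexmix_formula (without grouping)
flexmix_rhs <- ~ s(visit, k = 4)

# the dot in driver_formula is replaced by the right-hand side of flexmix_rhs
combined <- update(flexmix_rhs, driver_formula)
print(combined)

# variables that the combined formula refers to
all.vars(combined)
```

With several drivers in model_list, every outcome variable therefore shares the same right-hand side.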

title

name of the clustering; used if writeTable = TRUE

final_linkage

linkage used for the last hierarchical clustering step on the consensus matrix; has to be average, ward.D, ward.D2, single, complete, mcquitty, median or centroid. The default is average

seed

seed for reproducibility

verbose

boolean indicating whether status messages should be displayed. Default is FALSE

Details

The data types longitudinal_consensus_cluster can handle depend on how the flexmix models are set up. In principle, all data types are supported for which there is a flexmix driver with the desired outcome variable.

If you follow the dimension reduction approach outlined in vignette("Example clustering analysis", package = "longmixr"), the input data types depend on what FAMD from the FactoMineR package can handle. FAMD accepts numeric variables and treats all other variables as factor variables, which it can handle as well.

Value

An object (list) of class lcc with length max_k. The first entry general_information contains the entries:

`consensus_matrices`	a list of all consensus matrices (for all specified clusters)

`cluster_assignments`	a `data.frame` with an ID column named after `id_column` and a column for every specified number of clusters, e.g. `assignment_num_clus_2`

`call`	the call, i.e. all arguments with which `longitudinal_consensus_cluster` was called

The other entries correspond to the number of specified clusters (e.g. the second entry corresponds to 2 specified clusters) and each contains a list with the following entries:

`consensus_matrix`	the consensus matrix

`consensus_tree`	the result of the hierarchical clustering on the consensus matrix

`consensus_class`	the resulting class for every observation

`found_flexmix_clusters`	a vector of the number of clusters actually found by `flexmix` (which can deviate from the specified number)

Examples

set.seed(5)
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)
model_list <- list(
  flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
  flexmix::FLXMRmgcv(as.formula("var_2 ~ ."))
)
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
# not run
# plot(clustering)
# end not run

Plot a longitudinal consensus clustering

Description

Plot a longitudinal consensus clustering

Usage

## S3 method for class 'lcc'
plot(x, color_palette = NULL, ...)

Arguments

x

lcc object (output from longitudinal_consensus_cluster)

color_palette

optional character vector of colors for consensus matrix

...

additional parameters for plotting; currently not used

Value

Produces the following plots:

`consensus matrix legend`	the legend for the following consensus matrix plots

`consensus matrix plot`	for every specified number of clusters, a heatmap of the consensus matrix and the result of the final clustering is shown

`consensus CDF`	a line plot of the CDFs for all different specified numbers of clusters

`Delta area`	elbow plot of the difference in the CDFs between the different numbers of clusters

`tracking plot`	cluster assignment of the subjects throughout the different cluster solutions

`item-consensus`	for every item (subject), calculate the average consensus value with all items that are assigned to one consensus cluster. This is repeated for every cluster and for all different numbers of clusters

`cluster-consensus`	every bar represents the average pair-wise item-consensus within one consensus cluster
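
The two consensus summaries can be computed by hand for a small example. The 4 x 4 consensus matrix and the class vector below are invented for illustration, and the calculation follows the verbal definitions above rather than the plotting code of the package.

```r
# invented consensus matrix for four subjects
consensus <- matrix(
  c(1.0, 0.9, 0.1, 0.2,
    0.9, 1.0, 0.0, 0.1,
    0.1, 0.0, 1.0, 0.8,
    0.2, 0.1, 0.8, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(letters[1:4], letters[1:4])
)
# invented consensus classes: a and b in cluster 1, c and d in cluster 2
classes <- c(a = 1, b = 1, c = 2, d = 2)
cluster_ids <- sort(unique(classes))

# cluster-consensus: mean pairwise consensus of the members of one cluster
cluster_consensus <- sapply(cluster_ids, function(k) {
  members <- which(classes == k)
  sub <- consensus[members, members, drop = FALSE]
  mean(sub[upper.tri(sub)])
})

# item-consensus: mean consensus of one item with the members of a cluster,
# leaving out the item itself
item_consensus <- sapply(cluster_ids, function(k) {
  sapply(seq_along(classes), function(i) {
    others <- setdiff(which(classes == k), i)
    mean(consensus[i, others])
  })
})
dimnames(item_consensus) <- list(names(classes), paste0("cluster_", cluster_ids))

cluster_consensus
item_consensus
```

An item that fits its cluster well has a high item-consensus with its own cluster and a low one with all other clusters.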

Try out different linkage methods

Description

In the final step, the consensus clustering performs a hierarchical clustering step on the consensus matrix. This function tries out different linkage methods and returns the corresponding clusterings. The outputs can be plotted like the results from longitudinal_consensus_cluster.
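
What trying out different linkage methods means can be sketched with hclust from base R on an invented consensus matrix. This is an illustration of the idea only and does not reproduce the lcc objects that test_clustering_methods returns; the names consensus, linkage_methods and assignments are chosen for this example.

```r
# invented consensus matrix for four subjects
consensus <- matrix(
  c(1.0, 0.9, 0.2, 0.1,
    0.9, 1.0, 0.3, 0.2,
    0.2, 0.3, 1.0, 0.8,
    0.1, 0.2, 0.8, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("id_", 1:4), paste0("id_", 1:4))
)

# a high consensus corresponds to a small distance
dissimilarity <- as.dist(1 - consensus)

# repeat the final hierarchical clustering with several linkage methods
linkage_methods <- c("average", "single", "complete")
assignments <- lapply(linkage_methods, function(m) {
  cutree(hclust(dissimilarity, method = m), k = 2)
})
names(assignments) <- linkage_methods

# one row per linkage method, one column per subject
do.call(rbind, assignments)
```

In this clear-cut toy case all linkage methods agree; on real consensus matrices the assignments can differ, which is the reason to compare them.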

Usage

test_clustering_methods(
  results,
  use_methods = c("average", "ward.D", "ward.D2", "single", "complete", "mcquitty",
    "median", "centroid")
)

Arguments

results

clustering result of class lcc

use_methods

character vector of one or several items of average, ward.D, ward.D2, single, complete, mcquitty, median or centroid

Value

a list of elements, each element of class lcc. The entries are named after the linkage method used.

Examples

set.seed(5)
test_data <- data.frame(
  patient_id = rep(1:10, each = 4),
  visit = rep(1:4, 10),
  var_1 = c(rnorm(20, -1), rnorm(20, 3)) +
    rep(seq(from = 0, to = 1.5, length.out = 4), 10),
  var_2 = c(rnorm(20, 0.5, 1.5), rnorm(20, -2, 0.3)) +
    rep(seq(from = 1.5, to = 0, length.out = 4), 10)
)
model_list <- list(
  flexmix::FLXMRmgcv(as.formula("var_1 ~ .")),
  flexmix::FLXMRmgcv(as.formula("var_2 ~ ."))
)
clustering <- longitudinal_consensus_cluster(
  data = test_data,
  id_column = "patient_id",
  max_k = 2,
  reps = 3,
  model_list = model_list,
  flexmix_formula = as.formula("~s(visit, k = 4) | patient_id")
)
clustering_linkage <- test_clustering_methods(
  results = clustering,
  use_methods = c("average", "single")
)
# not run
# plot(clustering_linkage[["single"]])
# end not run