| Type: | Package |
| Title: | Sequence Clustering with Discrete-Output HMMs |
| Version: | 0.0.3 |
| Date: | 2022-12-21 |
| Author: | Gabriel Budel [aut, cre], Flavius Frasincar [aut] |
| Maintainer: | Gabriel Budel <gabysp_budel@hotmail.com> |
| Description: | Provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM Clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| URL: | https://github.com/gabybudel/DBHC |
| BugReports: | https://github.com/gabybudel/DBHC/issues |
| Imports: | seqHMM (≥ 1.0.8), TraMineR (≥ 2.0-7), reshape2 (≥ 1.2.1), ggplot2 (≥ 2.2.1), methods (≥ 4.2.2) |
| NeedsCompilation: | no |
| Repository: | CRAN |
| RoxygenNote: | 7.2.3 |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Packaged: | 2022-12-22 07:49:05 UTC; gabys |
| Date/Publication: | 2022-12-22 13:10:15 UTC |
Cluster Assignment
Description
Assign sequences to the cluster models that give the highest sequence-to-HMM likelihood. Used in hmm.clust.
Usage
assign.clusters(partition, memberships, sequences, smoothing = 1e-04)
Arguments
partition | A list object with the partition, a mixture of HMMs. Each element in the list is an hmm object. |
memberships | A matrix with cluster memberships for each sequence. |
sequences | An stslist object of sequences with discrete observations. |
smoothing | Smoothing parameter for absolute discounting in |
Value
The updated matrix with cluster memberships for each sequence.
See Also
Used in the main function for the DBHC algorithm, hmm.clust.
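The assignment rule described above amounts to a row-wise argmax over sequence-to-HMM log likelihoods. The following is a minimal base R sketch of that idea, not the package's implementation: the helper name `assign_by_loglik` and the layout of the `loglik` matrix (one row per sequence, one column per cluster model) are assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): hard assignment of each
# sequence to the cluster model with the highest log likelihood.
# `loglik` is assumed to be an n x K matrix whose entry [i, k] is the
# log likelihood of sequence i under cluster model k.
assign_by_loglik <- function(loglik) {
  stopifnot(is.matrix(loglik), ncol(loglik) >= 1)
  # which.max returns the first maximum, so ties go to the lowest index
  apply(loglik, 1, which.max)
}

loglik <- rbind(
  c(-10.2, -12.5),
  c(-15.0,  -9.1),
  c(-Inf,  -20.3)   # -Inf: the first model cannot emit this sequence
)
assigned <- assign_by_loglik(loglik)
print(assigned)
```

A log likelihood of minus infinity never wins the argmax unless every model is infeasible for that sequence, which is one reason smoothing the HMM parameters before assignment can matter.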
HMM BIC
Description
Compute the BIC of a single HMM given a threshold epsilon for counting parameters. Auxiliary function used in size.search.
Usage
cluster.bic(hmm, eps = 0.001)
Arguments
hmm | An |
eps | A threshold epsilon for counting parameters. |
Value
The BIC of hmm.
See Also
Used in size.search.
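For orientation, the textbook form of the BIC can be sketched in a few lines of base R. This is the general definition only; the helper name `bic_sketch`, the sign convention (lower is better), and the choice of `n_obs` are assumptions, since this page does not state how cluster.bic defines the sample size.

```r
# Sketch of the standard BIC definition (lower values are better).
# Assumption: `n_obs` is whatever sample size the analyst chooses to use;
# this page does not say which count the package uses.
bic_sketch <- function(log_lik, n_params, n_obs) {
  -2 * log_lik + n_params * log(n_obs)
}

# Illustrative numbers only, not output from the package
example_bic <- bic_sketch(log_lik = -350.4, n_params = 9, n_obs = 500)
print(example_bic)
```

The penalty term grows with the number of counted parameters, which is why the threshold eps used for counting affects the resulting BIC.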
Count HMM Parameters
Description
Count the number of parameters in an HMM larger than a small number epsilon. Auxiliary function used in partition.bic and cluster.bic.
Usage
count.parameters(hmm, eps = 0.001)
Arguments
hmm | An |
eps | A threshold epsilon for counting parameters. |
Value
The number of parameters larger than eps.
See Also
Used in partition.bic and cluster.bic.
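The thresholded count can be illustrated on plain probability objects. The sketch below is an assumption-laden illustration, not the package's code: the helper `count_above_eps` is hypothetical, and whether count.parameters includes the initial probabilities or adjusts for sum-to-one constraints is not stated on this page.

```r
# Hypothetical helper: count probability entries strictly larger than eps.
# Assumption: initial, transition, and emission probabilities are all
# counted, with no correction for sum-to-one constraints.
count_above_eps <- function(initial, transition, emission, eps = 0.001) {
  sum(initial > eps) + sum(transition > eps) + sum(emission > eps)
}

initial <- c(0.5, 0.5)
transition <- matrix(c(0.9995, 0.0005,
                       0.2,    0.8), nrow = 2, byrow = TRUE)
emission <- matrix(c(1.0, 0.0,
                     0.3, 0.7), nrow = 2, byrow = TRUE)

n_params <- count_above_eps(initial, transition, emission)
print(n_params)  # 8: the entries 0.0005 and 0.0 fall below the threshold
```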
Heatmap Emission Probabilities
Description
Plots a heatmap of an HMM's emission probabilities.
Usage
emission.heatmap(emission, base_size = 10)
Arguments
emission | A matrix with emission probabilities (see also |
base_size | Numerical, a size parameter for the plots made using |
See Also
See hmm.clust for an example.
DBHC Algorithm
Description
Implementation of the DBHC algorithm, an HMM clustering algorithm that finds a mixture of discrete-output HMMs. The algorithm uses heuristics based on BIC to search for the optimal number of hidden states in each HMM and the optimal number of clusters.
Usage
hmm.clust(
  sequences,
  id = NULL,
  smoothing = 1e-04,
  eps = 0.001,
  init.size = 2,
  alphabet = NULL,
  K.max = NULL,
  log_space = FALSE,
  print = FALSE,
  seed.size = 3
)
Arguments
sequences | An stslist object of sequences with discrete observations. |
id | A vector with ids that identify the sequences in sequences. |
smoothing | Smoothing parameter for absolute discounting in |
eps | A threshold epsilon for counting parameters in |
init.size | The number of HMM states in an initial HMM. |
alphabet | The alphabet of output labels, if not provided alphabet is taken from |
K.max | Maximum number of clusters; if not provided, the algorithm searches for the optimal number itself. |
log_space | Logical, parameter provided to |
print | Logical, whether to print intermediate steps or not. |
seed.size | Seed size, the number of sequences to be selected for a seed |
Value
A list with components:
sequences | An stslist object of sequences with discrete observations. |
id | A vector with ids that identify the sequences in sequences. |
cluster | A vector with found cluster memberships for the sequences. |
partition | A list object with the partition, a mixture of HMMs. Each element in the list is an hmm object. |
memberships | A matrix with cluster memberships for each sequence. |
n.clusters | Numerical, the found number of clusters. |
sizes | A vector with the number of HMM states for each cluster model. |
bic | A vector with the BICs for each cluster model. |
Examples
## Simulated data
library(seqHMM)
output.labels <- c("H", "T")

# HMM 1
states.1 <- c("A", "B", "C")
transitions.1 <- matrix(c(0.8,0.1,0.1,0.1,0.8,0.1,0.1,0.1,0.8), nrow = 3)
rownames(transitions.1) <- states.1
colnames(transitions.1) <- states.1
emissions.1 <- matrix(c(0.5,0.75,0.25,0.5,0.25,0.75), nrow = 3)
rownames(emissions.1) <- states.1
colnames(emissions.1) <- output.labels
initials.1 <- c(1/3,1/3,1/3)

# HMM 2
states.2 <- c("A", "B")
transitions.2 <- matrix(c(0.75,0.25,0.25,0.75), nrow = 2)
rownames(transitions.2) <- states.2
colnames(transitions.2) <- states.2
emissions.2 <- matrix(c(0.8,0.6,0.2,0.4), nrow = 2)
rownames(emissions.2) <- states.2
colnames(emissions.2) <- output.labels
initials.2 <- c(0.5,0.5)

# Simulate
hmm.sim.1 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.1,
                          transition_probs = transitions.1,
                          emission_probs = emissions.1,
                          sequence_length = 25)
hmm.sim.2 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.2,
                          transition_probs = transitions.2,
                          emission_probs = emissions.2,
                          sequence_length = 25)
sequences <- rbind(hmm.sim.1$observations, hmm.sim.2$observations)
n <- nrow(sequences)

# Clustering algorithm
id <- paste0("K-", 1:n)
rownames(sequences) <- id
sequences <- sequences[sample(1:n, n),]
res <- hmm.clust(sequences, id = rownames(sequences))

#############################################################################

## Swiss Household Data
data("biofam", package = "TraMineR")

# Clustering algorithm
new.alphabet <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
sequences <- seqdef(biofam[,10:25], alphabet = 0:7, states = new.alphabet)

## Not run:
res <- hmm.clust(sequences)

# Heatmaps
cluster <- 1  # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)
## End(Not run)

## A smaller example, which takes less time to run
subset <- sequences[sample(1:nrow(sequences), 20, replace = FALSE),]

# Clustering algorithm, limiting number of clusters to 2
res <- hmm.clust(subset, K.max = 2)

# Number of clusters
print(res$n.clusters)

# Table of cluster memberships
table(res$memberships[,"cluster"])

# BIC for each number of clusters
print(res$bic)

# Heatmaps
cluster <- 1  # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)
Get HMM Log Likelihood
Description
Get the log likelihood of an HMM object and check if it is feasible (i.e., contains no illegal emissions). Auxiliary function used in partition.bic.
Usage
model.ll(hmm)
Arguments
hmm | An |
Value
The log likelihood of the hmm object. A warning is printed if the model is infeasible (i.e., if the log likelihood is evaluated for a sequence that contains emissions that are assigned probability 0 in the hmm object).
See Also
Used in partition.bic.
Partition BIC
Description
Compute the BIC of a partition given a threshold epsilon for counting parameters. Auxiliary function used in hmm.clust.
Usage
partition.bic(partition, eps = 0.001)
Arguments
partition | A list object with the partition of HMMs, a mixture of HMMs. |
eps | A threshold epsilon for counting parameters in |
Value
The BIC of the partition.
See Also
Used in the main function for the DBHC algorithm, hmm.clust.
Seed Selection Procedure
Description
Seed selection procedure of the DBHC algorithm; it also invokes the size search algorithm for a seed in size.search. Used in hmm.clust.
Usage
select.seeds(
  sequences,
  log_space = FALSE,
  K,
  seed.size = 3,
  init.size = 2,
  print = FALSE,
  smoothing = 1e-04
)
Arguments
sequences | An stslist object of sequences with discrete observations. |
log_space | Logical, parameter provided to |
K | The number of seeds to select, equal to the number of clusters in a partition. |
seed.size | Seed size, the number of sequences to be selected for a seed. |
init.size | The number of HMM states in an initial HMM. |
print | Logical, whether to print intermediate steps or not. |
smoothing | Smoothing parameter for absolute discounting in |
Value
A partition as a list object with HMMs for the selected seeds.
See Also
Used in the main function for the DBHC algorithm, hmm.clust.
Sequence-to-HMM Likelihood
Description
Compute the sequence-to-HMM likelihood of an HMM evaluated for a single sequence and check if the sequence contains emissions that are not possible according to the HMM. Auxiliary function used in select.seeds and assign.clusters.
Usage
seq2hmm.ll(hmm)
Arguments
hmm | An |
Value
The log likelihood of the sequence contained in hmm. The value is set to minus infinity if the sequence contains illegal emissions.
See Also
Used in select.seeds and assign.clusters.
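To make the quantity concrete, the log likelihood of one sequence under a discrete-output HMM can be computed with the scaled forward algorithm. The base R sketch below is a generic illustration and not the package's code (seq2hmm.ll takes an hmm object, not raw matrices); the helper name `forward_loglik` and its argument conventions are assumptions. The parameter values reuse HMM 2 from the hmm.clust example.

```r
# Sketch only: scaled forward algorithm for one discrete-output sequence.
# Assumed conventions (not taken from the package):
#   obs        integer vector of symbol indices (columns of `emission`)
#   initial    vector of initial state probabilities
#   transition matrix, rows = from-state, columns = to-state
#   emission   matrix, rows = states, columns = output symbols
forward_loglik <- function(obs, initial, transition, emission) {
  alpha <- initial * emission[, obs[1]]
  loglik <- 0
  for (t in seq_along(obs)) {
    if (t > 1) {
      # propagate one step, then weight by the emission probability
      alpha <- as.vector(alpha %*% transition) * emission[, obs[t]]
    }
    scale <- sum(alpha)
    if (scale == 0) return(-Inf)  # no state path can emit this sequence
    loglik <- loglik + log(scale)
    alpha <- alpha / scale        # rescale to avoid numerical underflow
  }
  loglik
}

transition <- matrix(c(0.75, 0.25, 0.25, 0.75), nrow = 2)
emission <- matrix(c(0.8, 0.6, 0.2, 0.4), nrow = 2)  # columns: "H", "T"
initial <- c(0.5, 0.5)

print(forward_loglik(c(1, 1, 2), initial, transition, emission))

# A single state that can never emit symbol 2 gives minus infinity
print(forward_loglik(c(1, 2), 1, matrix(1), matrix(c(1, 0), nrow = 1)))
```

The early return of `-Inf` mirrors the documented behaviour for illegal emissions: a single zero-probability emission makes the whole sequence impossible under the model.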
Size Search Algorithm
Description
The size search algorithm finds the optimal number of HMM states for a set of sequences and returns both the optimal hmm object and the corresponding number of hidden states. Used in select.seeds.
Usage
size.search(sequences, log_space = FALSE, print = FALSE)
Arguments
sequences | An stslist object of sequences with discrete observations. |
log_space | Logical, parameter provided to |
print | Logical, whether to print intermediate steps or not. |
Value
A list with the optimal number of HMM states and the optimal hmm object.
See Also
Used in the DBHC seed selection procedure in select.seeds.
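This page does not spell out the search strategy or stopping rule of size.search. One plausible shape for a BIC-driven search is a greedy loop that keeps adding states while the BIC improves; the sketch below shows that pattern only. The helper `greedy_size_search`, the `max_size` cap, the lower-is-better convention, and the toy scoring function are all assumptions for illustration.

```r
# Hypothetical sketch of a greedy BIC-driven size search.
# `bic_for_size` is a user-supplied function mapping a number of states
# to a BIC value; in practice it would fit an HMM of that size first.
greedy_size_search <- function(bic_for_size, init_size = 2, max_size = 10) {
  best_size <- init_size
  best_bic <- bic_for_size(init_size)
  size <- init_size
  while (size < max_size) {
    size <- size + 1
    candidate <- bic_for_size(size)
    if (candidate >= best_bic) break  # no improvement: stop searching
    best_size <- size
    best_bic <- candidate
  }
  list(size = best_size, bic = best_bic)
}

# Toy score with its minimum at four states (not real model output)
toy_bic <- function(size) (size - 4)^2 + 100
found <- greedy_size_search(toy_bic)
print(found$size)
```

A greedy loop like this can stop at a local optimum, so the result depends on the starting size.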
Smooth HMM Parameters
Description
Smooth the parameters of an HMM using absolute discounting given a threshold epsilon. Auxiliary function used in select.seeds, assign.clusters, and hmm.clust.
Usage
smooth.hmm(hmm, smoothing = 1e-04)
Arguments
hmm | A raw |
smoothing | Smoothing parameter for absolute discounting in |
Value
An hmm object with smoothed probabilities.
See Also
Used in select.seeds, assign.clusters, and the main function for the DBHC algorithm, hmm.clust.
Smooth Probabilities
Description
Smooth a vector of probabilities using absolute discounting. Auxiliary function used in smooth.hmm.
Usage
smooth.probabilities(probs, smoothing = 1e-04)
Arguments
probs | A vector of raw probabilities. |
smoothing | Smoothing parameter for absolute discounting. |
Value
A vector of smoothed probabilities.
See Also
Used in smooth.hmm.
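Absolute discounting can be sketched as follows: every zero probability is raised to the smoothing value, and the mass added this way is subtracted in equal parts from the nonzero entries so that the vector still sums to one. This is one common formulation, shown with a hypothetical helper `discount_probs`; the exact redistribution rule used by smooth.probabilities is not given on this page.

```r
# Hypothetical sketch of absolute discounting on a probability vector.
# Assumption: the added mass is removed evenly from the nonzero entries.
# Note: a nonzero entry smaller than its share of the discount would
# turn negative; this sketch does not guard against that case.
discount_probs <- function(probs, smoothing = 1e-04) {
  zero <- probs == 0
  if (!any(zero) || all(zero)) return(probs)  # nothing to redistribute
  added <- smoothing * sum(zero)
  out <- probs
  out[zero] <- smoothing
  out[!zero] <- probs[!zero] - added / sum(!zero)
  out
}

smoothed <- discount_probs(c(0.7, 0.3, 0, 0), smoothing = 0.01)
print(smoothed)       # 0.69 0.29 0.01 0.01
print(sum(smoothed))  # still sums to 1
```

After smoothing, no emission or transition has probability exactly zero, so a sequence containing a previously unseen symbol gets a very small likelihood instead of minus infinity.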
Heatmap Transition Probabilities
Description
Plots a heatmap of an HMM's initial and transition probabilities.
Usage
transition.heatmap(transition, initial = NULL, base_size = 10)
Arguments
transition | A matrix with transition probabilities (see also |
initial | An (optional) vector of initial probabilities. |
base_size | Numerical, a size parameter for the plots made using |
See Also
See hmm.clust for an example.