Type: Package
Title: Sequence Clustering with Discrete-Output HMMs
Version: 0.0.3
Date: 2022-12-21
Author: Gabriel Budel [aut, cre], Flavius Frasincar [aut]
Maintainer: Gabriel Budel <gabysp_budel@hotmail.com>
Description: Provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on the Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters.
License: GPL (≥ 3)
Encoding: UTF-8
URL: https://github.com/gabybudel/DBHC
BugReports: https://github.com/gabybudel/DBHC/issues
Imports: seqHMM (≥ 1.0.8), TraMineR (≥ 2.0-7), reshape2 (≥ 1.2.1), ggplot2 (≥ 2.2.1), methods (≥ 4.2.2)
NeedsCompilation: no
Repository: CRAN
RoxygenNote: 7.2.3
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
Packaged: 2022-12-22 07:49:05 UTC; gabys
Date/Publication: 2022-12-22 13:10:15 UTC

Cluster Assignment

Description

Assign sequences to the cluster models that give the highest sequence-to-HMM likelihood. Used in hmm.clust.

Usage

assign.clusters(partition, memberships, sequences, smoothing = 1e-04)

Arguments

partition

A list object with the partition, a mixture of HMMs. Each element in the list is an hmm object (see build_hmm).

memberships

A matrix with cluster memberships for each sequence.

sequences

An stslist object (see seqdef) of sequences with discrete observations.

smoothing

Smoothing parameter for absolute discounting in smooth.probabilities.

Value

The updated matrix with cluster memberships for each sequence.

See Also

Used in the main function for the DBHC algorithm, hmm.clust.
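
The assignment rule described above can be sketched in base R. This is an illustration only, not the package's implementation: the matrix ll of sequence-to-HMM log likelihoods and the helper assign_by_max_ll are hypothetical names introduced for the example.

```r
# Hypothetical log likelihoods: rows are sequences, columns are cluster models.
# -Inf marks a sequence that is impossible under a given model.
ll <- matrix(c(-12.1, -15.3,
               -20.4, -18.9,
               -Inf,  -11.2),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("s1", "s2", "s3"), c("k1", "k2")))

# Hypothetical helper: for each sequence, pick the column (cluster model)
# with the highest log likelihood.
assign_by_max_ll <- function(ll) {
  apply(ll, 1, which.max)
}

cluster <- assign_by_max_ll(ll)
print(cluster)
```

Each sequence ends up in exactly one cluster, and a model that assigns a sequence a log likelihood of minus infinity can never win the comparison as long as another model gives a finite value.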


HMM BIC

Description

Compute the BIC of a single HMM given a threshold epsilon for counting parameters. Auxiliary function used in size.search.

Usage

cluster.bic(hmm, eps = 0.001)

Arguments

hmm

An hmm object (see build_hmm).

eps

A threshold epsilon for counting parameters.

Value

The BIC of hmm.

See Also

Used in size.search.
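
For orientation, the textbook BIC penalizes the log likelihood by the number of parameters. The sketch below uses that generic definition. The helper name bic_generic and its arguments are hypothetical, and this page does not state what the package uses as the sample size (for example, sequences or individual observations), so n_obs is an assumption.

```r
# Generic BIC: -2 * log likelihood + (number of parameters) * log(sample size).
# Lower values indicate a better trade-off between fit and complexity.
bic_generic <- function(log_lik, n_params, n_obs) {
  -2 * log_lik + n_params * log(n_obs)
}

# Illustrative numbers only, not output from the package.
print(bic_generic(log_lik = -350.2, n_params = 9, n_obs = 500))
```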


Count HMM Parameters

Description

Count the number of parameters in an HMM larger than a small number epsilon. Auxiliary function used in partition.bic and cluster.bic.

Usage

count.parameters(hmm, eps = 0.001)

Arguments

hmm

An hmm object (see build_hmm).

eps

A threshold epsilon for counting parameters.

Value

The number of parameters larger than eps.

See Also

Used in partition.bic and cluster.bic.
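
The idea of thresholded counting can be sketched as follows. The helper count_above_eps is hypothetical and works on plain probability vectors and matrices rather than on an hmm object. Whether the package additionally adjusts the count for sum-to-one constraints is not stated on this page.

```r
# Hypothetical helper: count the probabilities strictly greater than eps
# across the initial, transition, and emission parameters.
count_above_eps <- function(initial, transition, emission, eps = 0.001) {
  sum(initial > eps) + sum(transition > eps) + sum(emission > eps)
}

initial <- c(0.5, 0.5)
transition <- matrix(c(0.9995, 0.0005,
                       0.25,   0.75), nrow = 2, byrow = TRUE)
emission <- matrix(c(0.8, 0.2,
                     0.0, 1.0), nrow = 2, byrow = TRUE)

# 0.0005 and 0.0 fall below the threshold and are not counted.
n_params <- count_above_eps(initial, transition, emission)
print(n_params)
```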


Heatmap Emission Probabilities

Description

Plots a heatmap of an HMM's emission probabilities.

Usage

emission.heatmap(emission, base_size = 10)

Arguments

emission

A matrix with emission probabilities (see also build_hmm).

base_size

Numerical, a size parameter for the plots made using ggplot2 (see theme), default = 10.

See Also

See hmm.clust for an example.


DBHC Algorithm

Description

Implementation of the DBHC algorithm, an HMM clustering algorithm that finds a mixture of discrete-output HMMs. The algorithm uses heuristics based on BIC to search for the optimal number of hidden states in each HMM and the optimal number of clusters.

Usage

hmm.clust(
  sequences,
  id = NULL,
  smoothing = 1e-04,
  eps = 0.001,
  init.size = 2,
  alphabet = NULL,
  K.max = NULL,
  log_space = FALSE,
  print = FALSE,
  seed.size = 3
)

Arguments

sequences

An stslist object (see seqdef) of sequences with discrete observations or a data.frame.

id

A vector with ids that identify the sequences in sequences.

smoothing

Smoothing parameter for absolute discounting in smooth.probabilities.

eps

A threshold epsilon for counting parameters in count.parameters.

init.size

The number of HMM states in an initial HMM.

alphabet

The alphabet of output labels. If not provided, the alphabet is taken from the stslist object (see seqdef).

K.max

Maximum number of clusters. If not provided, the algorithm searches for the optimal number itself.

log_space

Logical, parameter provided to fit_model for whether to use optimization in log space or not.

print

Logical, whether to print intermediate steps or not.

seed.size

Seed size, the number of sequences to be selected for a seed.

Value

A list with components:

sequences

An stslist object of sequences with discrete observations.

id

A vector with ids that identify the sequences in sequences.

cluster

A vector with found cluster memberships for the sequences.

partition

A list object with the partition, a mixture of HMMs. Each element in the list is an hmm object.

memberships

A matrix with cluster memberships for each sequence.

n.clusters

Numerical, the found number of clusters.

sizes

A vector with the number of HMM states for each cluster model.

bic

A vector with the BICs for each cluster model.

Examples

## Simulated data
library(seqHMM)
output.labels <- c("H", "T")

# HMM 1
states.1 <- c("A", "B", "C")
transitions.1 <- matrix(c(0.8,0.1,0.1,0.1,0.8,0.1,0.1,0.1,0.8), nrow = 3)
rownames(transitions.1) <- states.1
colnames(transitions.1) <- states.1
emissions.1 <- matrix(c(0.5,0.75,0.25,0.5,0.25,0.75), nrow = 3)
rownames(emissions.1) <- states.1
colnames(emissions.1) <- output.labels
initials.1 <- c(1/3,1/3,1/3)

# HMM 2
states.2 <- c("A", "B")
transitions.2 <- matrix(c(0.75,0.25,0.25,0.75), nrow = 2)
rownames(transitions.2) <- states.2
colnames(transitions.2) <- states.2
emissions.2 <- matrix(c(0.8,0.6,0.2,0.4), nrow = 2)
rownames(emissions.2) <- states.2
colnames(emissions.2) <- output.labels
initials.2 <- c(0.5,0.5)

# Simulate
hmm.sim.1 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.1,
                          transition_probs = transitions.1,
                          emission_probs = emissions.1,
                          sequence_length = 25)
hmm.sim.2 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.2,
                          transition_probs = transitions.2,
                          emission_probs = emissions.2,
                          sequence_length = 25)
sequences <- rbind(hmm.sim.1$observations, hmm.sim.2$observations)
n <- nrow(sequences)

# Clustering algorithm
id <- paste0("K-", 1:n)
rownames(sequences) <- id
sequences <- sequences[sample(1:n, n),]
res <- hmm.clust(sequences, id = rownames(sequences))

#############################################################################

## Swiss Household Data
data("biofam", package = "TraMineR")

# Clustering algorithm
new.alphabet <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
sequences <- seqdef(biofam[,10:25], alphabet = 0:7, states = new.alphabet)

## Not run:
res <- hmm.clust(sequences)

# Heatmaps
cluster <- 1  # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)

## End(Not run)

## A smaller example, which takes less time to run
subset <- sequences[sample(1:nrow(sequences), 20, replace = FALSE),]

# Clustering algorithm, limiting number of clusters to 2
res <- hmm.clust(subset, K.max = 2)

# Number of clusters
print(res$n.clusters)

# Table of cluster memberships
table(res$memberships[,"cluster"])

# BIC for each number of clusters
print(res$bic)

# Heatmaps
cluster <- 1  # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)

Get HMM Log Likelihood

Description

Get the log likelihood of an HMM object and check if it is feasible (i.e., contains no illegal emissions). Auxiliary function used in partition.bic.

Usage

model.ll(hmm)

Arguments

hmm

An hmm object (see build_hmm).

Value

The log likelihood of the hmm object. A warning is printed if the model is infeasible (i.e., if the log likelihood is evaluated for a sequence that contains emissions that are assigned probability 0 in the hmm object).

See Also

Used in partition.bic.


Partition BIC

Description

Compute the BIC of a partition given a threshold epsilon for counting parameters. Auxiliary function used in hmm.clust.

Usage

partition.bic(partition, eps = 0.001)

Arguments

partition

A list object with the partition of HMMs, a mixture of HMMs.

eps

A threshold epsilon for counting parameters in count.parameters.

Value

The BIC of the partition.

See Also

Used in the main function for the DBHC algorithm, hmm.clust.


Seed Selection Procedure

Description

Seed selection procedure of the DBHC algorithm. It also invokes the size search algorithm for each seed in size.search. Used in hmm.clust.

Usage

select.seeds(
  sequences,
  log_space = FALSE,
  K,
  seed.size = 3,
  init.size = 2,
  print = FALSE,
  smoothing = 1e-04
)

Arguments

sequences

An stslist object (see seqdef) of sequences with discrete observations.

log_space

Logical, parameter provided to fit_model for whether to use optimization in log space or not.

K

The number of seeds to select, equal to the number of clusters in a partition.

seed.size

Seed size, the number of sequences to be selected for a seed.

init.size

The number of HMM states in an initial HMM.

print

Logical, whether to print intermediate steps or not.

smoothing

Smoothing parameter for absolute discounting in smooth.probabilities.

Value

A partition as a list object with HMMs for the selected seeds.

See Also

Used in the main function for the DBHC algorithm, hmm.clust.


Sequence-to-HMM Likelihood

Description

Compute the sequence-to-HMM likelihood of an HMM evaluated for a single sequence and check if the sequence contains emissions that are not possible according to the HMM. Auxiliary function used in select.seeds and assign.clusters.

Usage

seq2hmm.ll(hmm)

Arguments

hmm

An hmm object (see build_hmm) containing a single sequence.

Value

The log likelihood of the sequence contained in hmm. The value is set to minus infinity if the sequence contains illegal emissions.

See Also

Used in select.seeds and assign.clusters.
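
To make the minus-infinity convention concrete, here is a self-contained sketch of the standard scaled forward algorithm in base R. It is not the package's code: forward_loglik is a hypothetical helper that takes the sequence and the probability matrices directly instead of an hmm object, and it assumes transition rows sum to one and emission columns are named by output label.

```r
# Hypothetical helper: log likelihood of one observation sequence under an HMM,
# computed with the scaled forward algorithm.
forward_loglik <- function(obs, initial, transition, emission) {
  alpha <- initial * emission[, obs[1]]
  loglik <- 0
  for (t in seq_along(obs)) {
    if (t > 1) {
      # Propagate one step through the transition matrix, then weight by emission.
      alpha <- as.vector(alpha %*% transition) * emission[, obs[t]]
    }
    scale <- sum(alpha)
    if (scale == 0) return(-Inf)  # the sequence is impossible under this model
    loglik <- loglik + log(scale)
    alpha <- alpha / scale        # rescale to avoid numerical underflow
  }
  loglik
}

states <- c("A", "B")
labels <- c("H", "T")
initial <- c(0.5, 0.5)
transition <- matrix(c(0.75, 0.25, 0.25, 0.75), nrow = 2,
                     dimnames = list(states, states))
emission <- matrix(c(0.8, 0.6, 0.2, 0.4), nrow = 2,
                   dimnames = list(states, labels))

print(forward_loglik(c("H", "T", "H"), initial, transition, emission))

# A model that can never emit "T" makes any sequence containing "T" illegal.
emission.noT <- matrix(c(1, 1, 0, 0), nrow = 2,
                       dimnames = list(states, labels))
print(forward_loglik(c("H", "T", "H"), initial, transition, emission.noT))
```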


Size Search Algorithm

Description

The size search algorithm finds the optimal number of HMM states for a set of sequences and returns both the optimal hmm object and the corresponding number of hidden states. Used in select.seeds.

Usage

size.search(sequences, log_space = FALSE, print = FALSE)

Arguments

sequences

An stslist object (see seqdef) of sequences with discrete observations.

log_space

Logical, parameter provided to fit_model for whether to use optimization in log space or not.

print

Logical, whether to print intermediate steps or not.

Value

A list with the optimal number of HMM states and the optimal hmm object.

See Also

Used in the DBHC seed selection procedure in select.seeds.
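
One plausible shape for a BIC-guided size search is a greedy loop that adds states while the BIC keeps improving. The sketch below is an assumption, not the documented procedure: this page does not give the exact stopping rule, and search_size, bic_for_size, start, and max.size are hypothetical names. A toy scoring function stands in for fitting and scoring an HMM of each size.

```r
# Hypothetical greedy search: grow the number of states while BIC decreases.
# bic_for_size is any function that returns a BIC for a given number of states.
search_size <- function(bic_for_size, start = 2, max.size = 10) {
  best.size <- start
  best.bic <- bic_for_size(best.size)
  while (best.size < max.size) {
    candidate.bic <- bic_for_size(best.size + 1)
    if (candidate.bic >= best.bic) break  # no improvement, stop searching
    best.size <- best.size + 1
    best.bic <- candidate.bic
  }
  list(size = best.size, bic = best.bic)
}

# Toy stand-in for "fit an HMM with this many states and compute its BIC".
toy_bic <- function(size) (size - 4)^2 + 100

found <- search_size(toy_bic)
print(found$size)
```

A greedy rule like this stops at the first size that fails to improve, so it can miss a better size further along if the BIC is not unimodal.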


Smooth HMM Parameters

Description

Smooth the parameters of an HMM using absolute discounting given a threshold epsilon. Auxiliary function used in select.seeds, assign.clusters, and hmm.clust.

Usage

smooth.hmm(hmm, smoothing = 1e-04)

Arguments

hmm

A raw hmm object (see build_hmm).

smoothing

Smoothing parameter for absolute discounting in smooth.probabilities.

Value

An hmm object with smoothed probabilities.

See Also

Used in select.seeds, assign.clusters, and the main function for the DBHC algorithm, hmm.clust.


Smooth Probabilities

Description

Smooth a vector of probabilities using absolute discounting. Auxiliary function used in smooth.hmm.

Usage

smooth.probabilities(probs, smoothing = 1e-04)

Arguments

probs

A vector of raw probabilities.

smoothing

Smoothing parameter for absolute discounting.

Value

A vector of smoothed probabilities.

See Also

Used in smooth.hmm.
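
Absolute discounting comes in several variants. The sketch below shows one common form, in which a fixed amount is subtracted from every nonzero probability and the freed mass is spread evenly over the zero entries. The package's exact scheme may differ. The helper absolute_discount is hypothetical, and it assumes that smoothing is smaller than the smallest nonzero probability.

```r
# Hypothetical helper: one common variant of absolute discounting.
# Assumes smoothing is smaller than every nonzero entry of probs.
absolute_discount <- function(probs, smoothing = 1e-04) {
  zero <- probs == 0
  # Nothing to redistribute if there are no zeros (or only zeros).
  if (!any(zero) || all(zero)) return(probs)
  out <- probs
  out[!zero] <- probs[!zero] - smoothing           # discount nonzero entries
  out[zero] <- smoothing * sum(!zero) / sum(zero)  # share the freed mass
  out
}

p <- absolute_discount(c(0.7, 0.3, 0, 0))
print(p)
```

The result still sums to one, and no outcome keeps probability zero, which prevents an otherwise plausible sequence from receiving a likelihood of exactly zero.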


Heatmap Transition Probabilities

Description

Plots a heatmap of an HMM's initial and transition probabilities.

Usage

transition.heatmap(transition, initial = NULL, base_size = 10)

Arguments

transition

A matrix with transition probabilities (see also build_hmm).

initial

An (optional) vector of initial probabilities.

base_size

Numerical, a size parameter for the plots made using ggplot2 (see theme), default = 10.

See Also

See hmm.clust for an example.

