
Package Overview

Implements the Expectation Maximisation (EM) algorithm for clustering multivariate and univariate datasets. Two versions of EM are implemented: EM* (which converges faster by avoiding revisiting the data) and EM. For more details on EM*, see the ‘References’ section below.
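
To make the idea concrete, here is a minimal sketch of the plain EM iteration for a univariate Gaussian mixture in base R. It is not the package's implementation and does not include the EM* optimisation. The function name em_univariate() is hypothetical, and its argument names are only borrowed from dcem_train() for readability.

```r
# Minimal EM sketch for a univariate Gaussian mixture (base R only).
# NOTE: em_univariate() is a hypothetical helper, not part of DCEM.
em_univariate <- function(x, num_clusters = 2, iteration_count = 100, threshold = 1e-6) {
  n <- length(x)
  # Initialise: spread the means over the data quantiles, share one sd, equal priors.
  meu <- as.numeric(quantile(x, probs = seq_len(num_clusters) / (num_clusters + 1)))
  sigma <- rep(sd(x), num_clusters)
  prior <- rep(1 / num_clusters, num_clusters)

  for (iter in seq_len(iteration_count)) {
    # E-step: weighted densities, one row per point and one column per cluster.
    dens <- sapply(seq_len(num_clusters), function(k) prior[k] * dnorm(x, meu[k], sigma[k]))
    prob <- dens / rowSums(dens)

    # M-step: re-estimate the parameters from the posterior weights.
    weight <- colSums(prob)
    new_meu <- colSums(prob * x) / weight
    sigma <- sqrt(colSums(prob * outer(x, new_meu, "-")^2) / weight)
    prior <- weight / n

    # Stop once the means move less than the threshold.
    converged <- max(abs(new_meu - meu)) < threshold
    meu <- new_meu
    if (converged) break
  }

  list(prob = prob, meu = meu, sigma = sigma, prior = prior,
       membership = max.col(prob, ties.method = "first"))
}

# Simulated data: two well-separated Gaussians.
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 8, sd = 1))
fit <- em_univariate(x)
print(round(sort(fit$meu), 1))
```

The sketch stops when the estimated means change by less than the threshold or when the iteration limit is reached. That is one common stopping rule, assumed here for illustration.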

The package has been tested with both real and simulated datasets. The package comes bundled with a dataset for demonstration (ionosphere_data.csv). More help about the package can be seen by typing ?DCEM in the R console (after installing the package).

Currently, data imputation is not supported, so the user has to handle missing data before using the package.
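
As a rough illustration of handling missing values beforehand, the sketch below shows two common base R options on a made-up data frame: dropping incomplete rows, or filling each gap with the column mean. The data and variable names are hypothetical, and which option is appropriate depends on the dataset.

```r
# Hypothetical toy data frame with missing values.
raw_data <- data.frame(
  a = c(1.0, NA, 3.0, 4.0),
  b = c(0.5, 0.7, NA, 0.9)
)

# Option 1: drop every row that contains at least one NA.
complete_rows <- raw_data[complete.cases(raw_data), ]

# Option 2: replace each NA with the mean of its column.
mean_imputed <- as.data.frame(lapply(raw_data, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))

print(complete_rows)
print(mean_imputed)
```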

Contact

For any Bug Fixes/Feature Update(s)

[Parichit Sharma: parishar@iu.edu]

For Reporting Issues

Issues

Package Link on CRAN

DCEM on CRAN

Installation Instructions

Dependencies: First, install all the required packages as follows:

install.packages(c("matrixcalc", "mvtnorm", "MASS", "Rcpp"))
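
If some of these packages may already be present, a small check like the one below avoids reinstalling them. This is only a sketch: find_missing_packages() is a hypothetical helper, and the install call is left commented out so that nothing is downloaded unintentionally.

```r
# Hypothetical helper: return the packages that cannot be loaded from the
# current library paths.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Dependencies listed in this README.
missing_pkgs <- find_missing_packages(c("matrixcalc", "mvtnorm", "MASS", "Rcpp"))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
}
```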

Installing from CRAN

install.packages("DCEM")

Installing from the Source Package

R CMD INSTALL DCEM_2.0.5.tar.gz

How to use the Package (Example: Working with the default bundled dataset)

For demonstration purposes, users can call the dcem_test() function from the R console. This function invokes dcem_star_train() on the bundled ionosphere_data. Alternatively, a minimal quick start example is given below that explains how to cluster the ionosphere_data from scratch.

# Example: Using the dcem_test()
# Load the library
library("DCEM")
# Call the dcem_test() function and store the result in a variable
sample_out = dcem_test()
# Probe the returned values
# Note: Detailed description of the returned values is also given in the section
# "Displaying the output"
sample_out$prob         # estimated posterior probabilities
sample_out$meu          # estimated mean of the clusters
sample_out$sigma        # estimated covariance matrices
sample_out$prior        # estimated priors
sample_out$membership   # membership of data points based on maximum likelihood (posterior probabilities)

An example of clustering the ionosphere data

The DCEM package comes bundled with the ionosphere_data.csv for demonstration. Help about the dataset can be seen by typing ?ionosphere_data in the R console. Additional details can be seen at the link Ionosphere data. To use this dataset, paste the following code into the R console.

ionosphere_data = read.csv2(
  file = paste(trimws(getwd()), "/data/", "ionosphere_data.csv", sep = ""),
  sep = ",",
  header = FALSE,
  stringsAsFactors = FALSE
)

Cleaning the data: Before the model can be trained (dcem_train() function), the data must be cleaned. This simply means removing all redundant columns (for example, a label column). This dataset contains labels in the last column (35th) and only 0’s in the 2nd column, so let’s remove them.

Paste the code below in the R session to clean the dataset.

ionosphere_data =  trim_data("35, 2", ionosphere_data)
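
For readers who want to see what this step amounts to, the sketch below drops the same column positions with base R indexing on a made-up data frame. It assumes that trim_data() simply removes the listed columns, as the text above describes. The toy data and variable names are hypothetical.

```r
# Hypothetical stand-in for the dataset: 5 rows, 35 columns.
toy_data <- as.data.frame(matrix(seq_len(5 * 35), nrow = 5, ncol = 35))

# Parse a comma-separated column specification such as "35, 2".
columns_to_drop <- as.integer(trimws(strsplit("35, 2", ",")[[1]]))

# Negative indices drop the listed columns; drop = FALSE keeps a data frame.
cleaned <- toy_data[, -columns_to_drop, drop = FALSE]

print(dim(cleaned))
```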

Clustering the data: The dcem_train() function learns the parameters of the Gaussian(s) from the input data.

Paste the code below in the R session to call the dcem_train() function.

dcem_out = dcem_train(data = ionosphere_data, threshold = 0.0001, iteration_count = 50, num_clusters = 2)

Displaying the output: The list returned by dcem_train() is stored in the dcem_out object. It contains the parameters associated with the clusters (Gaussian(s)), namely the posterior probabilities, meu, sigma and priors, together with the cluster membership. Use the following expressions in the R session to access any or all of the output parameters.

[1] Posterior Probabilities: dcem_out$prob: A matrix of posterior probabilities for the points in the dataset.

[2] Meu(s): dcem_out$meu
    For multivariate data: A matrix of meu(s). Each row in the matrix corresponds to one meu.
    For univariate data: A vector of meu(s). Each element of the vector corresponds to one meu.

[3] Co-variance matrices:
    For multivariate data: dcem_out$sigma: List of co-variance matrices.
    For univariate data: dcem_out$sigma: Vector of standard deviation(s).

[4] Priors: dcem_out$prior: A vector of priors.

[5] Membership: dcem_out$membership: A vector of cluster membership for the data.
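
To illustrate how membership relates to the posterior probabilities, the sketch below assigns each point to the cluster with the highest posterior, using a made-up matrix. It assumes one row per data point and one column per cluster. The layout of dcem_out$prob is not stated here, so check it (and transpose if needed) before applying the same idea to real output.

```r
# Hypothetical posterior matrix: 4 data points (rows) x 2 clusters (columns).
# ASSUMPTION: rows are points; transpose first if the layout is the other way round.
posterior <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.6, 0.4,
    0.3, 0.7),
  ncol = 2, byrow = TRUE
)

# Assign each point to the cluster with the highest posterior probability.
membership <- max.col(posterior, ties.method = "first")

# Count how many points fall into each cluster.
cluster_sizes <- table(factor(membership, levels = seq_len(ncol(posterior))))

print(membership)
print(cluster_sizes)
```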

How to access the help (after installing the package)

?DCEM
?dcem_test
?dcem_star_train
?dcem_train
