Movatterモバイル変換

Type:

Package

Title:

Cluster Ordinal Data via Proportional Odds or Ordered Stereotype

Version:

1.3.4

Date:

2025-05-21

Maintainer:

Louise McMillan <louise.mcmillan@vuw.ac.nz>

Description:

Biclustering, row clustering and column clustering using the proportional odds model (POM), ordered stereotype model (OSM) or binary model for ordinal categorical data. Fernández, D., Arnold, R., Pledger, S., Liu, I., & Costilla, R. (2019) <doi:10.1007/s11634-018-0324-3>.

License:

GPL-3

URL:

https://vuw-clustering.github.io/clustord/

Depends:

R (≥ 3.5.0), stats, utils

Imports:

Rcpp (≥ 1.0.1), MASS, nnet, flexclust, methods

LinkingTo:

Rcpp, RcppArmadillo, RcppClock

Suggests:

knitr, formatR, rmarkdown, testthat (≥ 2.1.0), multgee,parallel

Encoding:

UTF-8

RoxygenNote:

7.3.2

VignetteBuilder:

knitr, formatR

NeedsCompilation:

yes

Packaged:

2025-05-26 23:09:00 UTC; LFM

Author:

Louise McMillan

[aut, cre, cph], Daniel Fernández Martínez

[aut], Ying Cui [aut], Eleni Matechou

[aut], W. N. Venables [ctb, cph] (clustord osm regression functions and S3 methods derived by Louise McMillan from MASS package polr function by Venables and Ripley), B. D. Ripley [ctb, cph] (clustord osm regression functions and S3 methods derived by Louise McMillan from MASS package polr function by Venables and Ripley)

Repository:

CRAN

Date/Publication:

2025-05-29 08:20:02 UTC

clustord: Clustering Using Proportional Odds Model, Ordered Stereotype Model or Binary Model.

Description

Biclustering, row clustering and column clustering using the proportionalodds model (POM), ordered stereotype model (OSM) or binary model for ordinalcategorical data.

Details

The clustord package provides six functions:clustord(),rerun(),mat2df(),calc.SE.rowcluster(),calc.SE.bicluster(), andcalc.cluster.comparisons().

Clustering function

The main function isclustord(), whichfits a clustering model to the data. The model is fitted usinglikelihood-based clustering via the EM algorithm. The package assumes thatyou started with a data matrix of responses, though you will need toconvert that data matrix into a long-form data frame before runningclustord. Every element in the original data matrix becomes onerow in the data frame, and the row and column indices from the data matrixbecome the columns ROW and COL in the data frame. You can performclustering on rows or columns of the data matrix, or biclustering on bothrows and columns simultaneously. You can include any number of covariatesfor rows and covariates for columns. Ordinal models used in the package areOrdered Stereotype Model (OSM), Proportional Odds Model (POM) and adedicated Binary Model for binary data.

Thererun() function is useful for continuing clustering runs that didnot converge on the first attempt, and for running new clustering runs usingthe estimated parameters of a previous run as a starting point. The maininput for this function is aclustord object output byclustord,and internally thererun function runsclustord, after settingup all the input parameters based on the original model fitting run.#'

Utility function

mat2df() is a utility function provided to convert a data matrix ofresponses into the long-form data frame format required byclustord(), and can also attach any covariates to that long-formdata frame if needed.

SE calculation functions

calc.SE.rowcluster() andcalc.SE.bicluster() are functions torun after runningclustord(), to calculate the standard errors onthe parameters fitted usingclustord().

Clustering comparisons

calc.cluster.comparisons() can be used to compare the assigned clustermemberships of the rows or columns of the data matrix from two differentclustering fits, in a way that avoids the label-switching problem.

Author(s)

Maintainer: Louise McMillanlouise.mcmillan@vuw.ac.nz (ORCID) [copyright holder]

Authors:

Daniel Fernández Martínezdaniel.fernandez.martinez@upc.edu (ORCID)
Ying Cuiying.cui@sms.vuw.ac.nz
Eleni Matechoue.matechou@kent.ac.uk (ORCID)

Other contributors:

W. N. Venables (clustord osm regression functions and S3 methods derived by Louise McMillan from MASS package polr function by Venables and Ripley) [contributor, copyright holder]
B. D. Ripley (clustord osm regression functions and S3 methods derived by Louise McMillan from MASS package polr function by Venables and Ripley) [contributor, copyright holder]

Calculate standard errors of clustering parameters.

Description

Calculate SE of parameters fitted usingclustord.

Usage

calc.SE.rowcluster(long.df, clust.out, optim.control = default.optim.control())calc.SE.bicluster(long.df, clust.out, optim.control = default.optim.control())

Arguments

long.df

The data frame, in long format, as passed toclustord.

clust.out

Aclustord object.

optim.control

control list for theoptim call within the M stepof the EM algorithm. See the control list Details in theoptimmanual for more info.

Details

Usecalc.SE.rowcluster to calculate SE for row clustering and columnclustering, orcalc.SE.bicluster to calculate SE for biclustering.

Calculates SE by runningoptimHess (seeoptim) onthe incomplete-data log-likelihood to find the hessian at the fitted parametervalues fromclustord.Then the square roots of the diagonal elements of the negative inverse of thehessian are the standard errors of the parametersi.e.SE <- sqrt(diag(solve(-optim.hess)).

Note that SE values areonly calculated for the independentparameters. For example, if the constraint on the row clustering parametersis set to constraint_sum_zero = TRUE, where the last row clustering parameteris the negative sum of the other parameters, SE values will only becalculated for the first RG-1 parameters, the independent ones. This appliessimilarly to individual column effect coefficients, etc.

The function requires an input which is the output ofclustord, which includes the componentoutvect, thefinal vector of independent parameter values from the EM algorithm, whichwill correspond to a subset of the parameter values inparlist.out.

Value

The standard errors corresponding to the elements ofclust.out$outvect.

Functions

calc.SE.rowcluster(): SE for rowclustering
calc.SE.bicluster(): SE for biclustering

Calculate comparison measures between two sets of clustering results

Description

Given two sets of posterior probabilities of membership for clusters,calculate three measures to compare the clustering memberships.

Usage

calc.cluster.comparisons(ppr1, ppr2)

Arguments

ppr1

Posterior probabilities of cluster membership, namedppr_morppc_m in the output ofclustord. If you haveperformed biclustering, thenppr1 should be the clustering resultsfor just one of the dimensions i.e. just the row clustering results, orjust the column clustering results. The rows ofppr1 give theentries that have been clustered, and each column corresponds to onecluster.

ppr2

Posterior probabilities of cluster membership from a differentclustering run, which will be compared toppr1.

Details

The three measures are the Adjusted Rand Index (ARI), the NormalisedVariation of Information (NVI) and the Normalised Information Distance (NID).

The three measures are documented in

Value

A list with components:

ARI: Adjusted Rand Index.

NVI: Normalised Variation of Information.

NID: Normalised Information Distance.

References

Fernández, D., & Pledger, S. (2016). Categorising count data intoordinal responses with application to ecological communities. Journal ofagricultural, biological, and environmental statistics (JABES), 21(2),348–362.

Likelihood-based clustering using Ordered Stereotype Models (OSM), ProportionalOdds Models (POM) or Binary Models

Description

Likelihood-based clustering with parameters fitted using the EM algorithm.You can perform clustering on rows or columns of a data matrix, or biclusteringon both rows and columns simultaneously. You can include any number ofcovariates for rows and covariates for columns.Ordinal models used in the package are Ordered Stereotype Model (OSM),Proportional Odds Model (POM) and a dedicated Binary Model for binary data.

Usage

clustord(  formula,  model,  nclus.row = NULL,  nclus.column = NULL,  long.df,  initvect = NULL,  pi.init = NULL,  kappa.init = NULL,  EM.control = default.EM.control(),  optim.method = "L-BFGS-B",  optim.control = default.optim.control(),  constraint_sum_zero = TRUE,  start_from_simple_model = TRUE,  parallel_starts = FALSE,  nstarts = 5,  verbose = FALSE)

Arguments

formula

model formula (see 'Details').

model

"OSM" for Ordered Stereotype Model or"POM" forProportional Odds Model or"Binary" for binary data model.

nclus.row

number of row clustering groups.

nclus.column

number of column clustering groups.

long.df

data frame with at least three columns,Y andROWandCOL. Each row in the data frame corresponds to a single cellin the original data matrix; the response value in that cell is given byY, and the row and column indices of that cell in the matrix aregiven byROW andCOL. Usemat2df to createthis data frame from your data matrix of responses.mat2df also allows you to supply data frames of row orcolumn covariates which will be incorporated intolong.df.

initvect

(default NULL) vector of starting parameter values for the model.Note: if the user enters an initial vector of parameter values, it isstrongly recommend that the user also check the values ofparlist.init in the output object, tomake sure that theconstructed parameters are as expected.

IfNULL, starting parameter values will be generated automatically.

See 'Details' for definitions of the parameters used for different models.

pi.init

(defaultNULL) starting parameter values for the proportionsof observations in the different row clusters.

IfNULL, starting values will be generated automatically.

User-specified values ofpi.init must be of length(nclus.row-1)because the final value will be automatically calculated so that thevalues ofpi sum to 1.

kappa.init

(defaultNULL) starting parameter values for theproportions of observations in the different column clusters.

IfNULL, starting values will be generated automatically.

User-specified values ofkappa.init must be of length(nclus.column-1) because the final value will be automaticallycalculated so that the values ofkappa sum to 1.

EM.control

(default =list(EMcycles=50, EMlikelihoodtol=1e-4, EMparamtol=1e-2, paramstopping=TRUE, startEMcycles=10, keepallparams=FALSE, epsilon=1e-6))list of parameters controlling the EM algorithm.

EMcycles controls how many EM iterations of the main EM algorithmare used to fit the chosen submodel.

EMlikelihoodtol is the tolerance for the stopping criterion forthelog-likelihood in the EM algorithm. The criterion is theabsolute change in theincomplete log-likelihood since theprevious iteration, scaled by the size of the datasetn*p, wheren is the number of rows in the data matrix andp is thenumber of columns in the data matrix. The scaling is applied because theincomplete log-likelihood is predominantly affected by the dataset size.

EMparamtol is the tolerance for the stopping criterion for theparameters in the EM algorithm. This is a tolerance for thesum of the scaled parameter changes from the last iteration,i.e. the tolerance is not for any individual parameter but for the sum ofchanges in all the parameters. Thus the default tolerance is 1e-2.The individual parameter criteria are the absolute differences betweenthe exponentiated absolute parameter value at the current timestep andthe exponentiated absolute parameter value at the previous timestep, as aproportion of the exponentiated absolute parameter value at the currenttimestep. The exponentiation is to rescale parameter values that areclose to zero.

there are around 5 independent parameter values, then at the point ofconvergence using default tolerances for the log-likelihood and theparameters, each parameter will have a scaled absolute change since theprevious iteration of about 1e-4; if there are 20 or 30 independentparameters, then each will have a scaled aboslute change of about 1e-6.

paramstopping: ifFALSE, indicates that the EM algorithmshould only check convergence based on the change in incomplete-datalog-likelihood, relative to the current difference between thecomplete-data and incomplete-data log-likelihoods, i.e.abs(delta_lli)/abs(llc[iter] - lli[iter]);ifTRUE, indicates that as well as checking the likelihoodcriterion, the EM algorithm should also check whether the relative changein the exponentials of the absolute values of the current parameters isbelow the toleranceEMstoppingpar, to see whether the parametersand the likelihood have both converged.

startEMcycles controls how many EM iterations are used whenfitting the simpler submodels to get starting values for fitting modelswith interaction.

keepallparams: ifTRUE, keep a record of parameter values(including pi_r and kappa_c) for every EM iteration.

rerunestepbeforelli: ifTRUE, and only when using biclustering,rerun the E-step before calculating the incomplete-data log-likelihood.The EM algorithm runs the E-step to estimate the cluster membershipprobabilities and then runs the M-step to estimate the parameters bymaximising the complete-data log-likelihood (LLC), and then recalculatesthe estimated incomplete-data log-likelihood (LLI) to check forconvergence. The biclustering LLI approximation uses the clustermembership probabilities so will be slightly more accurate if thesecluster membership probabilities are recalculated using the very latestparameter estimates. So theTRUE setting for this controlrecalculates the cluster memberships before calculating the LLI approx.

uselatestlli: Kept for backwards compatibility with originalversion of clustord algorithm. Original version of the algorithm had thisset toFALSE, which keeps the best LLI from any previous iterationrather than the latest LLI. The default,TRUE, instead keeps thelatest LLI even if there was a better LLI in a previous iteration. Thisis appropriat: In row clustering the exact LLI is used and the EMalgorithm creators proved the LLI should always not decrease with eachiteration. In biclustering the approximation to the LLI may be veryinaccurate when the algorithm is a long way from convergence, so evenwhen the value of LLI appears to be very high in early iterations thismay not be accurate, so it is better to take the latest value.

Forcolumnclustering, the parameters saved from each iterationwill NOT be converted to column clustering format, and will be in the rowclustering format, soalpha inEM.status$params.every.iteration will correspond to beta_c andpi will correspond to kappa.

epsilon: default 1e-6, small value used to adjust values of pi,kappa and theta that are too close to zero so that taking logs of themdoes not create infinite values.

optim.method

(default "L-BFGS-B") method to use in optim within the Mstep of the EM algorithm. Must be one of 'L-BFGS-B', 'BFGS', 'CG' or'Nelder-Mead' (i.e. not the SANN method).

optim.control

control list for theoptim call within the M stepof the EM algorithm. See the control list Details in theoptimmanual for more info.Please note that althoughoptim, by default, usespgtol=0andfactr=1e7 in the L-BFGS-B method,clustord, by default,alters these topgtol=1e-4 andfactr=1e11, but you can usethisoptim.control argument inclustord to revert them tothe defaults if you want. The reason for the change is that the chosenvalues inclustord reduce the tolerance on the log-likelihoodfunction optimization in order to speed up the algorithm, and because thelog-likelihood is on the scale of 1e4 for <100 rows in the data matrixand 1e6 for 5000 rows in the data matrix, tolerance at the defaultoptim scale is not as important as the choice of model type andstructure or the number of starting points. If one model is better thananother, it will probably have a likelihood that is better by about thesize of the data matrix, which is far larger than the tolerance in theoptimization. If one starting point is better than another, it willprobably have a likelihood that is better on about 1/10th or 1/100th thesize of the data matrix, which is still far larger than the tolerance inthe optimization.If you need accurate parameter estimates, firstly make sure to try morestarting points, then perform model selection first, and then finallyrerun the chosen model with finer tolerance, e.g. theoptimdefaults,pgtol=0 andfactr=1e7.

constraint_sum_zero

(defaultTRUE) ifTRUE, use constraintsthat cluster effects sum to zero; ifFALSE, use constraints thatthe cluster effect for the first cluster will be 0.Both versions have the same constraints for joint row-column clustereffects: these effects are described by a matrix of parameters gamma_rc,indexed by row cluster and column cluster indices, and the constraintsare that the final column of gamma_rc is equal to the negative sum of theother columns (sogamma columns sum to zero) and first row ofgamma_rc is equal to the negative sum of the other rows (sogammarows sum to zero).

start_from_simple_model

(defaultTRUE) ifTRUE, fit asimpler clustering model first and use that to provide starting values forall parameters for the model with interactions;ifFALSE, use the more basic models to provide starting values onlyforpi.init andkappa.init.If the full model has interaction terms, then simpler models are oneswithout the interactions. If the model has individual row/column effectsalongside the clusters, then simpler models are ones without the individualrow/column effects. If the full model has covariates, then simpler modelsare ones without the covariates (to get starting values for the clusterparameters), and ones with the covariates but no clustering (to getstarting values for the covariates).

parallel_starts

(default FALSE) if TRUE, by generating multiple randomstarts, those random starts will be parallelised over as many cores as areavailable. For example, on a personal computer this will be one fewer thanthe number of cores in the machine, to make sure one is left for systemtasks external to R.

nstarts

(default 5) number of random starts to generate, if generatingrandom starting points for the EM algorithm.

verbose

(defaultFALSE) changes how much is reported to theconsole during the algorithm's progress. IfTRUE, reports theincomplete-data log-likelihood at every EM algorithm iteration and thetrace information from nnet::multinom() during the process of fittinginitial values for themu parameters. IfFALSE, theincomplete log-likelihood is only reported every 10 iterations of the EMalgorithm and the initial fitting reporting is suppressed. Regardless ofthe verbosity setting the algorithm reports each of the random starts andwhenever it finds a better log-likelihood than all previous starts, and italso reports when it is fitting simpler models to find starting values forthe parameters vs fitting the final, more complex model.If wanting the detailed output fromoptim(), useoptim.control=list(trace=X), where X is 1 to 6, with 6 being thehighest level of verbosity for the L-BFGS-B algorithm.

Details

You can select your own input parameters, or starting values will begenerated by running kmeans or by fitting simpler models and feeding theoutputs into the final model as starting values.

The starting point for clustering is a data matrix of response values thatare binary or categorical. You may also have a data frame of covariates thatare linked to the rows of the data matrix, and may also have a data frame ofcovariates that are linked to the columns of the data matrix.

For example, if clustering data from fishing trawls, where the rows are trawlevents and columns are species caught, then you could also supply a gearcovariate linked to the rows, representing gear used on each trawl event, andcould additionally supply species covariates linked to the columns,representing auxiliary information about each species. There is norequirement to provide any covariates, and you can provide only rowcovariates, or only column covariates.

Before runningclustord, you need to runmat2df toconvert the data matrix into a long form data frame. The data frame needs tohave at least three columns,Y andROW andCOL. Each rowin the data frame corresponds to a single cell in the original data matrix;the response value in that cell is given byY, and the row and columnindices of that cell in the matrix are given byROW andCOL.

mat2df also allows you to supply data frames of row or columncovariates which will be incorporated intolong.df.

Then, to run theclustord function, you need to enter your chosenformula and model, and the number of clusters you want to fit. The formulamodel_structure is akin to that forglm, but with a few restrictions. Youcan include any number of covariates in the same way as for a multipleregression model, though unlike forglm, you can include both row andcolumn covariates.

Note that, unlikeglm, you should not specify afamilyargument; themodel argument is used instead.

formulaargument details

In the following description of different models, the Binary model is usedfor simplicity when giving the mathematical descriptions of the models, butyou can use any of the following models with the Ordered Stereotype orProportional Odds Models as well.

In theformula argument, the response must be exactlyY. Youcannot use any functions ofY as the response, nor can you includeY in any terms on the right hand side of the formula.Y is thename inclustord of the response values in the original data matrix.

Theformula argument has 4 special variables:ROWCLUST,COLCLUST,ROW andCOL. There are some restrictions onhow these can be used in the formula, as they are not covariates, but insteadact as indicators of the clustering model_structure you want to use.

All other variables in the formula will be any covariates that you want toinclude in the model, and these are unrestricted, and can be used in the sameway as inglm.

ROWCLUST andCOLCLUST are used to indicate what row clusteringmodel_structure you want, and what column clustering model_structure you want,respectively. The inclusion ofROWCLUST as a single term indicatesthat you want to include a row clustering effect in the model. In thesimplest row clustering model, for Binary data withrow clusteringeffects only, the basic function call would be

clustord(Y ~ ROWCLUST, model="Binary", long.df=long.df)

and the model fitted would have the form:

Logit(P(Y = 1)) = mu + rowc_coef_r

where mu is the intercept term, and rowc_coef_r is the row cluster effectthat will be applied to every row from the original data matrix that is amember of row cluster r. The inclusion ofROWCLUST corresponds to theinclusion of rowc_coef_r.

Note that we are not using notation involving greek letters, because (a) weran out of letters for all the different types of parameters in the model and(b) with this many parameters, it would be difficult to remember which onesare which.

Similarly to row clustering, the formulaY ~ COLCLUST would performcolumn clustering, with model Logit(P(Y = 1)) = mu + colc_coef_c,where colc_coef_c is the column cluster effect that will be applied to everycolumn from the original data matrix that is a member of column cluster c.

Including bothROWCLUST andCOLCLUST in the same formulaindicates that you want to perform biclustering, i.e. you want to cluster therows and the columns of the original data matrix simultaneously. If includedwithout interaction, then the terms just correspond to including rowc_coef_rand colc_coef_c in the model:

The formula

Y ~ ROWCLUST + COLCLUST

is the simplest possiblebiclustering model,Logit(P(Y = 1)) = mu + rowc_coef_r + colc_coef_c

If you want to include interaction between the rows and columns, i.e. youwant to perform block biclustering where each block corresponds to a rowcluster r and a column cluster c, then that model has a matrix of parametersindexed by r and c.

clustord(Y ~ ROWCLUST*COLCLUST, model="Binary", ...) has the modelLogit(P(Y = 1)) = mu + rowc_colc_coef_rc

This model can instead be called using the equivalent formulaY ~ ROWCLUST + COLCLUST + ROWCLUST:COLCLUST.

You can instead use the formulaY ~ ROWCLUST:COLCLUST. Mathematically,this is equivalent to the previous two. In regression, the models would notbe equivalent but in clustering, they are equivalent, and have the samenumber of independent parameters overall. If you include the main effects,then that reduces the number of independent parameters in the interactionterm compared to if you just use the interaction term (see below section aboutinitvect).

You cannot include just one of the main effects alongside the interactionterm, i.e. you cannot useY ~ ROWCLUST + ROWCLUST:COLCLUST orY ~ COLCLUST + ROWCLUST:COLCLUST. This is for simplicity in the code,and to avoid confusion when interpreting the results.

However,clustord allows a lot more flexibility than this. ThevariablesROW andCOL are used to indicate that you want toalso includeindividual row or column effects, respectively.

For example, if you are clustering binary data that indicates the presence/absence of different species (columns) at different trawl events (rows), andyou know that one particular species is incredibly common, then you caninclude column effects in the model, which will allow for the possibilitythat two columns may correspond to species with different probabilities ofappearing in the trawl.

You can add individual column effects along withrow clustering, or you can add individual row effects along with column clustering.The formula for row clustering with individual column effects (withoutinteraction) is

Y ~ ROWCLUST + COL

which corresponds to Binary model

Logit(P(Y = 1)) = mu + rowc_coef_r + col_coef_j

So if two cells from the data matrix are in the same row cluster, but indifferent columns, they will not have the same probability of Y = 1.

You can also add interaction between the individual row/column effects andthe clustering effects.

If you still want to be able to see the row cluster and column effectsseparately, then you useY ~ ROWCLUST*COL orY ~ ROWCLUST + COL + ROWCLUST:COL (these are both the same), whichhave model

Logit(P(Y = 1)) = mu + rowc_coef_r + col_coef_j + rowc_col_coef_rj

As before, rowc_coef_r and col_coef_j are the row cluster effects andindividual column effects, and rowc_col_coef_rj are the interaction terms.

Alternatively, you can use the mathematically-equivalent formula

Y ~ ROWCLUST:COL which has model

Logit(P(Y = 1)) = mu + rowc_col_coef_rj

where the row cluster effects and individual column effects are absorbed intothe matrix rowc_col_coef_rj. These models are the same mathematically, theonly differences between them are in how they are constrained (see below inthe section about theinitvect argument) and how they should beinterpreted.

Note that if you were using covariates, then it would not be equivalent toleave out the main effects and just use the interaction terms, but theclustering models don't work quite the same as regression models withcovariates.

Equivalently, if you want to cluster the columns, you can include individualrow effects alongside the column clusters, i.e.

Y ~ COLCLUST + ROW orY ~ COLCLUST + ROW + COLCLUST:ROW,

depending on whether you want the interaction terms or not.

You arenot able to include individual row effects with row clusters,or include individual column effects with column clusters, because there isnot enough information in ordinal or binary data to fit these models. As aconsequence, you cannot include individual row or column effects if you aredoing biclustering, e.g.

Y ~ ROWCLUST + COLCLUST + ROW orY ~ ROwCLUST + COLCLUST + COL

are not permitted.

From version 1 of the package, you can now also includecovariatesalongside the clustering patterns. The basic way to do this is include themas additions to the clustering model_structure. For example, including one rowcovariatexr to a row clustering model would have the formula

Y ~ ROWCLUST + xr

with Binary model Logit(P(Y = 1)) = mu + rowc_coef_r + row_coef_1*xr_i

where row_coef_1 is the coefficient of xr_i, just as in a typical regressionmodel.

Additional row covariates can also be included, and you can includeinteractions between them, and functions of them, as in regression models, e.g.

Y ~ ROWCLUST + xr1*log(xr2)

which would have the Binary model

Logit(P(Y = 1)) = mu + rowc_coef_r + row_coef1*xr1_i + row_coef2*log(xr2_i) +row_coef3*xr1_i*log(xr2_i)

If instead you want to add column covariates to the model, they work in thesame way after they've been added to thelong.df data frame usingmat2df, but they are indexed by j instead of i. Simplest model,with one single column covariate xc, would have formula

Y ~ ROWCLUST + xc

with Binary model Logit(P(Y = 1)) = mu + rowc_coef_r + col_coef1*xc_j

You can use any functions of or interactions between column covariates, justas with row covariates. You can similarly add row or column covariates tocolumn clustering or biclustering models.

You can includeinteractions between covariates andROWCLUSTorCOLCLUST in the formula. But these arenot quite the sameas interactions between covariates. The formula

Y ~ ROWCLUST*xr

wherexr is some row covariate, corresponds to the Binary model

Logit(P(Y = 1)) = mu + rowc_coef_r + cov_coef*xr_i + rowc_row_coef_r1*xr_i

What this means is that there is a term in the linear predictor that involvesthe row covariate xr (which has the index i because it is a row covariate),and each cluster (indexed by r) has a different coefficient forthat covariate (as distinct from the non-interaction covariate models above,which have the same coefficients for the covariates regardless of whichcluster the row is in).

This is different from interaction terms involving only covariates, where twoor more covariates appear multiplied together in the model and then have ashared coefficient term attached to them. In a clustering/covariateinteraction, the row or column clustering pattern controls the coefficientsrather than adding a different type of covariate.

Note that the pure cluster effect rowc_coef_r is also included in the modelautomatically, in the same way that a regression formulaY ~ x1*x2would include the individual x1 and x2 effects as well as the interactionbetween x1 and x2.

The coefficients for row clusters interacting with row coefficients are namedrow_cluster_row_coef in the output ofclustord because youcan also have coefficients for interactions between row clustering and columncovariates, or column clustering and row covariates, or column clustering andcolumn covariates. Row clustering interacting with column covariates wouldlook something like

Y ~ ROWCLUST*xc

with Binary model Logit(P(Y = 1)) = mu + rowc_coef_r + rowc_col_coef_r1*xc_j

The other combinations of clustering and covariates work similarly.rowc_col_coef_rl and the other similar coefficients have two indices.Their first index is the index of the cluster, and their second index is theindex of the covariate among the list of covariates interacting with thatdirection of clustering. So if there are two row covariatesxr1 andxr2 interacting with three row clusters, that gives you 6coefficients:

rowc_col_coef_11, rowc_col_coef_12,rowc_col_coef_21, rowc_col_coef_22,rowc_col_coef_31, rowc_col_coef_32.

and you can also have a three-way interaction between row cluster and thosetwo covariates, which would add the coefficientsrowc_col_coef_r3for thexr1:xr2 term.

You can instead add covariates that interact with column clusters, which willhave parameterscolc_row_coef_cm, wherem here indexes just thecovariates interacting with column cluster.

If you have covariates interacting with row clusters and other covariatesinteracting with column clusters, then you will have parametersrowc_cov_coef_rlandcolc_cov_coef_cm.

An example of this is the model

Y ~ ROWCLUST + xr1 + ROWCLUST:xr1 + xc1 + COLCLUST + COLCLUST:log(xc1)

This has main effects for row clusters and column clusters, i.e.ROWCLUST andCOLCLUST. It also has two covariate terms notinteracting with clusters,xr1 andxc1. It also has 1 covariateterm interacting with row clusters,xr1, with coefficientsrowc_cov_coef_r1, and 1 covariate term interacting with columnclusters,log(xc1), with coefficientscolc_cov_coef_c1.

Restrictions onformula

The primary restriction on theformula argument is that that youcannot use functions ofROW,COL,ROWCLUST orCOLCLUST, such aslog(ROW) or I(COLCLUST^2). That is becausethey are not covariates, and cannot be manipulated like that; instead, theyare indicators for particular elements of the clustering model_structure.

If performing biclustering, i.e. ifROWCLUST andCOLCLUST areboth in the model, and you want to include the interaction between them, thenyou can use the interaction between them on its own, or you can include bothmain effects, but you are not allowed to use just one main effect alongsidethe interaction. That is, you can use

Y ~ ROWCLUST + COLCLUST + ROWCLUST:COLCLUST orY ~ ROWCLUST*COLCLUST,

or you can useY ~ ROWCLUST:COLCLUST, and these two types ofbiclustering model will have different parameter constraints (see below underinitvect details), but youcannot use

Y ~ ROWCLUST + ROWCLUST:COLCLUST orY ~ COLCLUST + ROWCLUST:COLCLUST

As stated above, you also cannot include individual row effects alongsiderow clustering, and you cannot use individual column effects alongsidecolumn clustering, i.e. ifROWCLUST is in the formula, thenROWcannnot be in the formula, and ifCOLCLUST is in the formulathenCOLcannot be in the formula.

If you are includingCOL withROWCLUST, then you can includethe interaction between them but that is theonly permitted interactionterm that involvesCOL, and similarly the interaction betweenROW andCOLCLUST is theonly permitted interactionterm that involvesROW. But you can include those interactions in theform

Y ~ ROWCLUST + COL + ROWCLUST:COL or asY ~ ROWCLUST*COL, or asY ~ ROWCLUST:COL.

These are the only permitted uses of theCOL term, and there areequivalent constraints on the inclusion ofROW.

As stated above, you can include interactions betweenROWCLUST orCOLCLUST and covariates, but youcannot include three-wayinteractions betweenROWCLUST,COLCLUST and one or morecovariates arenot permitted inclustord, mostly becauseof the prohibitive number of parameter values that would need to be fitted,and the difficulty of interpreting such a model. That is, you cannot useformulae such asY ~ ROWCLUST*COLCLUST*xr, which would have Binarymodel Logit(P(Y = 1)) = mu + bi_cluster_row_coef_rc1*xr_i.

modelargument details

The three models available inclustord are the Binary model, whichis a Bernoulli model equivalent to the binary model in the packageclustglm, the Proportional Odds Model (POM) and the Ordered StereotypeModel (OSM).

Many Binary model examples have been given above, which have the generalform

logit(P(Y = 1)) = mu + <<linear terms>>

where the linear terms can include row or column clustering effects,individual row or column effects, and row or column covariates, with orwithout interactions with row or column clustering.

The Proportional Odds Model and the Ordered Stereotype Model have the samemodel_structure for the linear terms, but the overall model equation is different.

The Proportional Odds Model (model = "POM") has the form

logit(P(Y <= k)) = log(P(Y <= k)/P(Y > k)) = mu_k - <<linear terms>>

So the simplest POM for row clustering would be

logit(P(Y <= k)) = mu_k - rowc_coef_r

and the model including individual column effects and no interactions would be

logit(P(Y <= k)) = mu_k - rowc_coef_r - col_coef_j

Note that the linear-term coefficients have negative signs for theProportional Odds Models. This is so that as the row cluster index increases,or as the column index increases, Y is more likely to fall at higher values(see Ch4 of Agresti, 2010).

The Ordered Stereotype model (model = "OSM") has the form

log(P(Y = k)/P(Y = 1)) = mu_k + phi_k(<<linear terms>>)

So the simplest OSM for row clustering would be

log(P(Y = k)/P(Y = 1)) = mu_k + phi_k*rowc_coef_r

and the model including individual column effects and no interactions would be

log(P(Y = k)/P(Y = 1)) = mu_k + phi_k(rowc_coef_r + col_coef_j)

Note that the OSM isnot a cumulative logit model, unlike the POM.The model describes the log of the kth level relative to the first level,which is the baseline category, but the patterns for k = 2 may be differentthan the patterns for k = 3. They are linked, because the linear terms willbe the same, but they may not have the same shape. In this sense, the OSM ismore flexible/less restrictive than the POM.

See Anderson (1984) for the original definition of the ordered stereotypemodel, and see Fernández et al. (2016) for the application to clustering.

The phi_k parameters may be treated as "score" parameters. After fitting theOSM, the fitted phi_k values can give some indication of what the trueseparation is between the categories. Even if the default labelling of thecategories is from 1 to n, that doesn't mean that the categories are actuallyequally spaced in reality. But the fitted phi_k values from the OSM can betreated as data-derived numerical labels for the categories. Moreover, if twocategories have very similar fitted phi_k values, e.g. if phi_2 = 0.11 andphi_3 = 0.13, that suggests that there is not enough information in the datato distinguish between categories 2 and 3, and so you might as well mergethem into a single category to simplify the model-fitting process and theinterpretation of the results.

initvectargument details

Initvect is the vector of starting values for the parameters, made up ofsections for each different type of parameter in the model. Note that thelength of each section of initvect is the number ofindependentparameter values, not the overall number of parameter values of that type.

If you want to supply a vector of starting values for the EM algorithm, youneed to be careful how many values you supply, and the order in which youinclude them ininitvect, and you shouldCHECK the outputlist of parameters (which is the full set of parameter values, includingdependent ones, broken up into each type of parameter) to check that yourinitvect model_structure is correct for the formula you have specified.

For example, the number ofmu values will always be 1 fewer than thenumber of categories in the data, and the remaining value of mu is dependenton those q-1 values. In the OSM for data with 3 categories, the first valueof mu for category 1 will be 0, and then the other 2 values of mu forcategories 2 and 3 will be the independent values of mu. For the POM for datawith 5 categories, the first 4 values of mu will be the independent valuesand then the last value of mu is infinity, because the probability of Y beingin category 5 is defined as 1 minus the sum of the probabilities for theother 4 levels.

q is the number of levels in the values of y,n is the numberof rows in the original data matrix, andp is the number of columns inthe original data matrix.

For Binary,There is one independent value formu, i.e. q = 2.

Ignorephi, which is not used in the Binary model.

For OSM,The starting values formu_k are lengthq-1, and the modelhasmu_1 = 0 always, so the initvect values formu will becomemu_2,mu_3, etc. up tomu_q.

The starting values forphi_k are lengthq-2.

Note that the starting values forphi do not correspond directly tophi, becausephi is restricted to being increasing and between0 and 1, so instead the starting values are treated as elementsu[2:q-1] of a vectoru which can be between-Inf and+Inf, and then

phi[2] <- expit(u[2]) and

phi[k] <- expit(u[2] + sum(exp(u[3:k]))) for k between 3 and q-1

(phi[1] = 0 and phi[q] = 1).

For POM,The starting values formu_k are lengthq-1, but the startingvalues do not correspond directly tomu_k, becausemu_k isrestricted to being increasing, i.e. the model has to havemu_1 <=mu_2 <= ...mu_q =+Inf

So instead of using the initvect values directly formu_k, the 2nd to(q-1)th elements of initvect are used to constructmu_k as follows:

mu_1 <- initvect[1]

mu_2 <- initvect[1] + exp(initvect[2])

mu_3 <- initvect[1] + exp(initvect[2]) + exp(initvect[3])

... and so on up tomu_{k-1}, andmu_k is infinity, becauseit is not used directly to construct the probability of Y = q.

Thus the values that are used to constructmu_k can be unconstrained,which makes it easier to specify initvect and easier to optimize the parametervalues.

Ignorephi, which is not used in POM.

Forall three models,

The starting values forrowc_coef_r are lengthnclus.row-1,wherenclus.row is the number of row clusters. The final row clusterparameter is dependent on the others (see the input parameter info forconstraint_sum_zero), whereas if it were independent it would be colinearwith themu_k parameters and thus not identifiable.

Similarly the starting values forcolc_coef_c are lengthnclus.column-1, wherenclus.column is the number of columnclusters, to avoid problems of colinearity and nonidentifiability.

If you have biclustering with an interaction term between row clusters andcolumn clusters, then the number of independent values in the matrix ofinteraction terms depends on whether you include the main effects of row andcolumn clusters separately. That is, if you use the biclustering model

Y ~ ROWCLUST + COLCLUST + ROWCLUST:COLCLUST, or equivalently

Y ~ ROWCLUST*COLCLUST,

then the main effect termROWCLUST hasnclus.row-1 independentparameters ininitvect, andCOLCLUST hasnclus.column-1independent parameters ininitvect, andROWCLUST:COLCLUST willhave(nclus.row - 1)*(nclus.column - 1) independent parameter values.The final matrix of interaction terms will be constrained to have its lastrow equal to the negative sum of the other rows, and the last column equal tothe negative sum of the other columns.

On the other hand, if you want to use only the interaction term and not themain effects (which for the clustering model is mathematically equivalent),i.e.

Y ~ ROWCLUST:COLCLUST,

then that matrix of interaction terms will havenclus.row*nclus.column - 1independent parameters, i.e. more independent parameters than if you includedthe main effects.

If you have column effects alongside row clusters (they are not permittedalongside column clusters), without interactions, i.e. the formulaY ~ ROWCLUST + COL with Binary model Logit(P(Y = 1)) = mu +rowc_coef_r + col_coef_jthen the row cluster coefficients havenclus.row - 1 independentparameters, and the column effect coefficients havep - 1 independentparameters, where p is the number of columns in the original data matrix,i.e. the maximum value oflong.df$COL.

If you include the interaction term, then the number of independent parametersagain depends on whether you just use the interaction term, or include themain effects.

In the formulaY ~ ROWCLUST + COL + ROWCLUST:COL or its equivalent with"*", the interaction term will have(nclus.row - 1)*(p-1) independentparameters.

If you instead use the formulaY ~ ROWCLUST:COL, then the interactionterm will havenclus.row*p - 1 independent parameters. Either way, thetotal number of independent parameters in the model will benclus.row*p.

Similarly, if you have row effects alongside column clusters, withoutinteractions, i.e. the formula

Y ~ COLCLUST + ROW,

with Binary model Logit(P(Y = 1)) = mu + colc_coef_c + row_coef_i

then the column cluster coefficients will havenclus.column - 1independent parameters, and the row coefficients will haven-1independent parameters, where n is the number of rows in the original datamatrix, i.e. the maximum value oflong.df$ROW.

If you include the interaction term alongside the main effects, i.e.

Y ~ COLCLUST + ROW + COLCLUST:ROW, or its equivalent with "*", theinteraction term will have(nclus.column - 1)*(n-1) independentparameters.

If you instead use the formulaY ~ COLCLUST:ROW, that interactioncoefficient matrix will havenclus.column*n - 1 independent parameters.

Any covariate terms included in the formula will be split up byclustord into the covariates that interact with row clusters, thecovariates that interact with column clusters, and the covariates that do notinteract with row or column clusters.

The number of independent parameters for row-cluster-interacting covariateswill benclus.row*L, whereL is the number of terms involvingrow clusters and covariates after any "*" terms have been expanded.

So in this formula, for example,

Y ~ ROWCLUST*xr1 + xr2 + ROWCLUST:log(xc1)

where xr1 and xr2 are row covariates, and xc1 is a column covariate, the fullyexpanded formula would be

Y ~ ROWCLUST + xr1 + xr2 + ROWCLUST:xr1 + ROWCLUST:log(xc1)

and the terms interacting withROWCLUST would beROWCLUST:xr1andROWCLUST:log(xc1), so there would benclus.row*2independent coefficients for those covariates.

The number of independent parameters for column-cluster-interactingcovariates will benclus.column*M, whereM is the number ofterms involving column clusters and covariates after any "*" terms have beenexpanded.

So this formula, for example,

Y ~ I(xr1^2) + COLCLUST*xc1 + COLCLUST:xc2:xc3 + COLCLUST*xr1

would be expanded as

Y ~ COLCLUST + xr1 + I(xr1^2) + xc1 + COLCLUST:xc1 + COLCLUST:xc2:xc3 + COLCLUST:xr1

and the terms interacting withCOLCLUST would beCOLCLUST:xc1,COLCLUST:xc2:xc3 andCOLCLUST:xr1, so there would benclus.column*3 independent coefficients for those covariates.

The number of independent parameters for covariates that do not interact withrow or column clusters will be the same as the number of those covariate terms,after any "*" terms have been expanded.

So this formula, for example,

Y ~ ROWCLUST*xr1 + xr2 + ROWCLUST:log(xc1) + COLCLUST*xc1

would be expanded as

Y ~ ROWCLUST +COLCLUST + xr1 + xr2 + xc1 + ROWCLUST:xr1 + ROWCLUST:log(xc1) + COLCLUST:xc1,

so there would be 3 independent coefficients for the termsxr1, xr2, xc1.

Note that there areno intercept terms for the coefficients,because those are incorporated into the parametersmu_k.

Theorder of theinitvectentries is as follows, andany entries that are not included in the formula will be ignored and notincluded ininitvect. That is, you should NOT provide values ininitvect for components that are not included in the formula.

1) mu (or values used to construct mu, POM only)2) values used to construct phi (OSM only)3) row cluster coefficients4) column cluster coefficients5) [matrix] bicluster coefficients (i.e. interaction between row and column clusters)6) individual row coefficients7) individual column coefficients8) [matrix] interactions between row clusters and individual column coefficients9) [matrix] interactions between column clusters and individual row coefficients10) [matrix] row-cluster-specific coefficients for covariates interacting with row clusters11) [matrix] column-cluster-specific coefficients for covariates interacting with column clusters12) coefficients for covariates that do not interact with row or column clusters

Any entries marked as [matrix] will be constructed into matrices by fillingthose matrices row-wise, e.g. if you want starting values 1:6 for a matrix of2 row clusters and 3 covariates interacting with those row clusters, thematrix of coefficients will become

1 2 34 5 6

For the formulaY ~ ROWCLUST*COLCLUST, where the matrix of interactionsbetween row and column clusters has(nclus.row - 1)*(nclus.column - 1)independent parameters, the last row and column of the matrix will be thenegative sums of the rest, so e.g. if you have 2 row clusters and 3 columnclusters, there will only be 2 independent values, so if you provide thestarting values -0.5 and 1.2, the final matrix of parameters will be:

column cluster 1 column cluster 2 column cluster 3row cluster 1 -0.5 1.2 -0.7row cluster 2 0.5 -1.2 0.7

If the matrix is a matrix relating to row clusters, then the row clusters arein the rows, and if it's a matrix relating to column clusters but not rowclusters, then the column clusters are in the rows, i.e. the matrix ofcoefficients for column clusters interacting with individual row effects willhave the rows of the matrix corresponding to the clusters, i.e. the matrixwould be indexed colc_row_coef_ci, c being the column cluster index and ibeing the row index.

Similarly, if the matrix is a matrix relating to column clusters and covariates,then the rows of the matrix will correspond to the column clusters, i.e. thematrix would be indexed colc_cov_coef_cl, c being the column cluster indexand l being the covariate index.

If using biclustering with interaction between row and column clusters, thenthe row clusters will be the rows and the column clusters will be the columns,i.e. the matrix would be indexed rowc_colc_coef_rc, r being the row clusterindex and c being the column cluster index.

Value

Aclustord object, i.e. a list with components:

info: Basic info n, p, q, the number of parameters, the number ofrow clusters and the number of column clusters, where relevant.

model: The model used for fitting, "OSM" for Ordered StereotypeModel, "POM" for Proportional Odds Model, or "Binary" for Binary model.

EM.status: a list containing the latest iterationiter,latest incomplete-data and complete-data log-likelihoodsnew.lliandnew.llc, the best incomplete-data log-likelihoodbest.lliand the corresponding complete-data log-likelihood,llc.for.best.lli,and the parameters for the best incomplete-data log-likelihood,params.for.best.lli, indicator of whether the algorithm convergedconverged, and if the user chose to keep all parameter values fromevery iteration, alsoparams.every.iteration.

Note that forbiclustering, i.e. whenROWCLUST andCOLCLUST are both included in the model, theincompletelog-likelihood is calculated using the entropy approximation, and thismay beinaccurate unless the algorithm has converged or is closeto converging. So beware of using the incomplete log-likelihood and thecorresponding AIC valueunless the EM algorithm has converged.

criteria: the calculated values of AIC, BIC,etc. from the best incomplete-data log-likelihood.

epsilon: the very small value (default 1e-6) used to adjust valuesof pi and kappa and theta that are too close to zero, so that taking logsof them does not produce infinite values. Use the EM.control argument toadjust epsilon.

constraints_sum_zero: the chosen value of constraints_sum_zero.

param_lengths: vector of total number of parameters/coefficientsfor each part of the model, labelled with the names of the components.The value is 0 for each component that is not included in the model, e.g.if there are no covariates interacting with row clusters then therowc_cov_coef value will be 0. If the component is included, thenthe value given will include any dependent parameter/coefficient values,so if column clusters are included then thecolc_coef value willbenclus.column, whereas the number of independent values will benclus.column - 1.

initvect: the initialvector of parameter values, eitherspecified by the user or generated automatically. This vector has onlytheindependent values of the parameters, not the full set.

outvect: the finalvector of parameter values, containingonly the independent parameter values fromparlist.out.

parlist.init: the initial list of parameters, constructed fromthe initial parameter vectorinitvect. Note that if the initialvector has been incorrectly specified, the values ofparlist.initmay not be as expected, and they should be checked by the user.

parlist.out: fitted values of parameters.

pi,kappa: fitted values of pi and kappa, where relevant.

ppr,ppc: the posterior probabilities of membership of therow clusters and the column clusters, where relevant.

rowc_mm,colc_mm,cov_mm: the model matrices for,respectively, the covariates interacting with row clusters, the covariatesinteracting with column clusters, and the covariates not interacting withrow or column clusters (i.e. the covariates with constant coefficients).Note that one row of each model matrix corresponds to one row of long.df.

RowClusters,ColumnClusters: the assigned row and columnclusters, where relevant, where each row/column is assigned to a clusterbased on maximum posterior probability of cluster membership (pprandppc).

RowClusterMembers,ColumnClusterMembers: vectors ofassigned members of each row or column cluster, where each row/column isassigned to a cluster based on maximum posterior probability of clustermembership (ppr andppc)

References

Fernandez, D., Arnold, R., & Pledger, S. (2016). Mixture-based clustering for the ordered stereotype model.Computational Statistics & Data Analysis, 93, 46-75.

Anderson, J. A. (1984). Regression and ordered categorical variables.Journal of the Royal Statistical Society: Series B (Methodological), 46(1), 1-22.

Agresti, A. (2010).Analysis of ordinal categorical data (Vol. 656). John Wiley & Sons.

Examples

set.seed(1)long.df <- data.frame(Y=factor(sample(1:3,5*20,replace=TRUE)),               ROW=factor(rep(1:20,times=5)),COL=rep(1:5,each=20))# Model Log(P(Y=k)/P(Y=1))=mu_k+phi_k*rowc_coef_r with 3 row clustering groups:clustord(Y~ROWCLUST,model="OSM",3,long.df=long.df,             EM.control=list(EMcycles=2,startEMcycles=2), nstarts=2)# Model Log(P(Y=k)/P(Y=1))=mu_k+phi_k*(rowc_coef_r + col_coef_j) with 3 row clustering groups:clustord(Y~ROWCLUST+COL,model="OSM",3,long.df=long.df,             EM.control=list(EMcycles=2,startEMcycles=2), nstarts=2)# Model Logit(P(Y <= k))=mu_k-rowc_coef_r-col_coef_j-rowc_col_coef_rj with 2 row clustering groups:clustord(Y~ROWCLUST*COL,model="POM",nclus.row=2,long.df=long.df,             EM.control=list(EMcycles=2,startEMcycles=2), nstarts=2)# Model Log(P(Y=k)/P(Y=1))=mu_k+phi_k*(colc_coef_c) with 3 column clustering groups:clustord(Y~COLCLUST,model="OSM",nclus.column=3,long.df=long.df,             EM.control=list(EMcycles=2,startEMcycles=2), nstarts=2)# Model Log(P(Y=k)/P(Y=1))=mu_k+phi_k*(colc_coef_c + row_coef_i) with 3 column clustering groups:clustord(Y~COLCLUST+ROW,model="OSM",nclus.column=3,long.df=long.df,             EM.control=list(EMcycles=2,startEMcycles=2), nstarts=2)# Model Log(P(Y=k)/P(Y=1))=mu_k+phi_k*(rowc_coef_r + colc_coef_c)#    with 3 row clustering groups and 2 column clustering groups:clustord(Y~ROWCLUST+COLCLUST,model="OSM",nclus.row=3,nclus.column=2,long.df=long.df,             EM.control=list(EMcycles=2), nstarts=1)# Model Logit(P(Y<=k))=mu_k-rowc_coef_r-colc_coef_c-rowc_colc_coef_rc#    with 2 row clustering groups and 4 column clustering groups, and#    interactions between them:clustord(Y~ROWCLUST*COLCLUST, model="POM", nclus.row=2, nclus.column=4,             long.df=long.df,EM.control=list(EMcycles=2), nstarts=1,             start_from_simple_model=FALSE)

Converting matrix of responses into a long-form data frame and incorporatingcovariates, if supplied.

Description

Converting matrix of responses into a long-form data frame and incorporatingcovariates, if supplied.

Usage

mat2df(mat, xr.df = NULL, xc.df = NULL)

Arguments

mat

matrix of responses to be clustered

xr.df

optional data frame of covariates corresponding to the rows ofmat. Each row ofxr.df corresponds to one row ofmat,and each column ofxr.df is a covariate.

xc.df

optional data frame of covariates corresponding to the columnsofmat. Each row ofxc.df corresponds to onecolumnofmat, and each column ofxc.df is a covariate.

Value

A data frame with columnsY,ROW andCOL, andadditional columns for covariates fromxr.df andxc.df, ifincluded.

TheY column of the output contains the entries inmat, withone row in the output per one cell inmat, and theROW andCOL entries indicate the row and column of the data matrix thatcorrespond to the given cell. Any cells that were NA are left out of theoutput data frame.

Ifxr.df is supplied, then there are additional columns in theoutput corresponding to the columns ofxr.df, and the values foreach covariate are repeated for every entry that was in the correspondingrow of the data matrix.

Similarly, ifxc.df is supplied, there are additional columns inthe output corresponding to the columns ofxc.df, and the valuesfor each covariate are repeated for every entry that was in thecorresponding column of the data matrix.

Ordinal data regression using the Ordered Stereotype Model (OSM).

Description

Fit a regression model to an ordered factor response. The model is NOT alogistic or probit model because the link function is not the logit, but thelink function is log-based.

Usage

osm(  formula,  data,  weights,  start,  ...,  subset,  na.action,  Hess = FALSE,  model = TRUE)

Arguments

formula

a formula expression as for regression models, of the formresponse ~ predictors. The response should be a factor (preferably anordered factor), which will be interpreted as an ordinal response, withlevels ordered as in the factor. The model must have an intercept: attemptsto remove one will lead to a warning and be ignored. An offset may be used.See the documentation of formula for other details.

data

a data frame, list or environment in which to interpret thevariables occurring informula.

weights

optional case weights in fitting. Default to 1.

start

initial values for the parameters. See the Details section forinformation about this argument.

...

additional arguments to be passed to optim, most often a controlargument.

subset

expression saying which subset of the rows of the data shouldbe used in the fit. All observations are included by default.

na.action

a function to filter missing data.

Hess

logical for whether the Hessian (the observed information matrix)should be returned.

model

logical for whether the model matrix should be returned.

Details

This function should be used in a very similar way toMASS::polr, andsome of the arguments are the same aspolr, but the ordinal model usedhere is less restrictive in its assumptions than the proportional odds model.However, it is still parsimonious i.e. it uses only a small number ofadditional parameters compared with the proportional odds model.

This model is theordered stereotype model (Anderson 1984,Agresti 2010)

It is more flexible than the proportional odds model but only adds a handfulof additional parameters. It is not a cumulative model, being instead definedin terms of the relationships between each of the higher categories and thelowest category that is treated as the reference category.

Each of the higher categories has its own intercept term, mu_k, which issimilar to the zeta parameters inpolr, but in the OSM each highercategory also has its own scaling parameter, phi_k, which adjusts the effectof the covariates on the response. This allows the effect of the covariateson the response to be slightly different for each category of the response,thus making the model more flexible than the proportional odds model.

The final set of parameters are coefficients for each of the covariates, andthese are equivalent to the coefs inpolr. Higher or more positivevalues of the coefficients increases the probability of the response being inthe higher categories, and lower or more negative values of the coefficientsincrease the probability of the response being in the lower categories.

The overall model takes the following form:

log(P(Y = k | X)/P(Y = 1 | X)) = mu_k + phi_k*beta_vec^T x_vec

for k = 2, ..., q, where x_vec is the vector of covariates for theobservation Y.

mu_1 is fixed at 0 for identifiability of the model, and the phi_k parametersare constrained to be ordered (giving the model its name) in the followingway:

0 = phi_1 <= phi_2 <= ... <= phi_k <= ... <= phi_q = 1.

(The unordered stereotype model restricts phi_1 and phi_q but allows theremaining phi_k to be in any order, and this is suitable for fitting themodel for nominal data. However, this package does not provide that option,as it is already available in other packages which can fit the stereotypemodel.)

After fitting the model, the estimated values of the intermediate phi_kvalues indicate a suitable numerical spacing of the ordinal responsecategories that is based on the data. The spacings indicate how much distinctinformation each of the corresponding levels provide. For example, if youhave five response categories and the fitted phi values are(0, 0.04,0.6, 0.62, 1) then this indicates that levels 1 and 2 provide very similarinformation about the effect of the covariates on the response, and levels 3and 4 provide very similar information as each other. The meaning of this isthat you could simplify the response by combining levels 1 and 2 andcombining levels 3 and 4 (i.e. reduce the levels to 1, 3 and 5) and you wouldstill be able to estimate the beta coefficients with similar accuracy.

Another use for the phi_k values is that if you want to carry out furtheranalysis of the response, treating it as a numerical variable, then the phivalues are a better choice of numerical values for the response categoriesthan the default values 1 to q.

start argument values:start is a vector of startvalues for estimating the model parameters.

The first part of thestart vector is starting values for thecoefficients of the covariates, the second part is starting values for the muvalues (per-category intercepts), and the third part is starting values forthe raw parameters used to construct the phi values.

The length of the vector is [number of covariate terms] + [number ofcategories in response variable - 1] + [number of categories in responsevariable - 2]. Every one of the values can take any real value.

The second part is the starting values for the mu_k per-category interceptparameters, and since mu_1 is fixed at 0 for identifiability, the number ofnon-fixed mu_k parameters is one fewer than the number of categories.

The third part of the starting vector is a re-parametrization used toconstruct starting values for the estimated phi parameters such that the phiparameters observe the ordering restriction of the ordered stereotype model,but the raw parameters are not restricted which makes it easier to optimiseover them. phi_1 is always 0 and phi_q is always 1 (where q is the number ofresponse categories). If the raw parameters are u_2 up to u_(q-1), then phi_2is constructed as expit(u_2), phi_3 is expit(u_2 + exp(u_3)), phi_4 isexpit(exp(u_3) + exp(u_4)) etc. which ensures that the phi_k values arenon-decreasing.

This code was adapted from file MASS/R/polr.Rcopyright (C) 1994-2013 W. N. Venables and B. D. RipleyUse of transformed intercepts contributed by David FirthThe osm and osm.fit functions were written by Louise McMillan, 2020.

This program is free software; you can redistribute it and/or modify it underthe terms of the GNU General Public License as published by the Free SoftwareFoundation; either version 2 or 3 of the License (at your option).

This program is distributed in the hope that it will be useful, but WITHOUTANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESSFOR A PARTICULAR PURPOSE. See the GNU General Public License for moredetails.

A copy of the GNU General Public License is available athttp://www.r-project.org/Licenses/

Value

An object of class"osm". This has components

beta the coefficients of the covariates, with NO intercept.

mu the intercepts for the categories.

phi the score parameters for the categories (restricted to beordered).

deviance the residual deviance.

fitted.values a matrix of fitted values, with a column for eachlevel of the response.

lev the names of the response levels.

terms theterms structure describing the model.

df.residual the number of residual degrees of freedom, calculatedusing the weights.

edf the (effective) number of degrees of freedom used by the model

n, nobs the (effective) number of observations, calculated using theweights.

call the matched call.

convergence the convergence code returned byoptim.

niter the number of function and gradient evaluations used byoptim.

eta

Hessian (ifHess is true). Note that this is a numericalapproximation derived from the optimization proces.

model (ifmodel is true), the model used in the fitting.

na.action the NA function used

xlevels factor levels from any categorical predictors

References

Agresti, A. (2010).Analysis of ordinal categorical data (Vol. 656). John Wiley & Sons.

Anderson, J. A. (1984). Regression and ordered categorical variables.Journal of the Royal Statistical Society: Series B (Methodological), 46(1), 1-22.

Rerun clustord using the results of a previous run as the starting point.

Description

This function is designed for two purposes.(1) You tried to run clustord and the results did not converge. You can supplythis function with the previous results and the previous data object, and itwill carry on running clustord from the endpoint of the previous run, whichis quicker than starting the run again from scratch with more iterations.

Usage

rerun(  results.original,  long.df,  EM.control = NULL,  verbose = FALSE,  optim.control = NULL)

Arguments

results.original

The results of the previous run that you want to useas a starting point. The model, number of clusters, and final parametervalues will be used, and the cluster controls such as EMcycles will bereused unless the user specifies new values. But the row cluster and/orcolumn cluster memberships will NOT be reused, and nor will the dataset,so you can change the dataset slightly and the rest of the details willbe applied to this new dataset.

long.df

The dataset to use for this run, which may be slightlydifferent to the original. Please note that the only compatibility checkperformed is comparing the sizes of the original and new datasets, and itis up to the user to check that the new dataset is sufficiently similarto the old one.

EM.control

Options to use for this run such as EMcycles (number of EMiterations). Note that "startEMcycles" will not be relevant as this runwill not generate random starts, it will run from the end parameters ofthe other run. Seeclustord documentation for more info.

verbose

(defaultFALSE) changes how much is reported to theconsole during the algorithm's progress. Seeclustorddocumentation for more info.

optim.control

Options to use for this run withinoptim(), whichis used to estimate the parameters during each M-step. Seeclustorddocumentation for more info.

Details

(2) The previous result converged, but you have changed the dataset slightly,and want to rerun from the previous endpoint to save time.

Either way, you call the function in the same way, supplying the previousresults object and a dataset, and optionally a new number of iterations('EM.control=list(EMcycles=XXX)', where 'XXX' is the new number of iterations.)

The output parameters of the old result will be used as the new initialparameters.

Value

An object of classclustord. Seeclustord for more info.

Examples

set.seed(1)long.df <- data.frame(Y=factor(sample(1:3,5*20,replace=TRUE)),               ROW=factor(rep(1:20,times=5)),COL=rep(1:5,each=20))results.original <- clustord(Y ~ ROWCLUST, model="OSM", nclus.row=4,                             long.df=long.df, EM.control=list(EMcycles=2))results.original$EM.status$converged# FALSE## Since original run did not converge, rerun from that finishing point and## allow more iterations this timeresults.new <- rerun(results.original, long.df, EM.control=list(EMcycles=10))## Alternatively, if dataset has changed slightly then rerun from the## previous finishing point to give the new results a helping handlong.df.new <- long.df[-c(4,25,140),]results.new <- rerun(results.original, long.df.new)

Movatterモバイル変換

clustord: Clustering Using Proportional Odds Model, Ordered Stereotype Model or Binary Model.

Description

Details

Clustering function

Utility function

SE calculation functions

Clustering comparisons

Author(s)

See Also

Calculate standard errors of clustering parameters.

Description

Usage

Arguments

Details

Value

Functions

Calculate comparison measures between two sets of clustering results

Description

Usage

Arguments

Details

Value

References

Likelihood-based clustering using Ordered Stereotype Models (OSM), ProportionalOdds Models (POM) or Binary Models

Description

Usage

Arguments

Details

Value

References

Examples

Converting matrix of responses into a long-form data frame and incorporatingcovariates, if supplied.

Description

Usage

Arguments

Value

Ordinal data regression using the Ordered Stereotype Model (OSM).

Description

Usage

Arguments

Details

Value

References

See Also

Rerun clustord using the results of a previous run as the starting point.

Description

Usage

Arguments

Details

Value

Examples