Hierarchical Multinomial Logit with Sign Constraints

Introduction

bayesm’s posterior sampling function rhierMnlRwMixture permits the imposition of sign constraints on the individual-specific parameters of a hierarchical multinomial logit model. This may be desired if, for example, the researcher believes there are heterogeneous effects from, say, price, but that all responses should be negative (i.e., sign-constrained). This vignette provides exposition of the model, discussion of prior specification, and an example.

Model

The model follows the hierarchical multinomial logit specification given in Example 3 of the “bayesm Overview” Vignette, but will be repeated here succinctly. Individuals are assumed to be rational economic agents that make utility-maximizing choices. Utility is modeled as the sum of deterministic and stochastic components, where the inverse-logit of the probability of choosing an alternative is linear in the parameters and the error is assumed to follow a Type I Extreme Value distribution:

\[ U_{ij} = X_{ij}\beta_i +\varepsilon_{ij}\hspace{0.8em} \text{with} \hspace{0.8em}\varepsilon_{ij}\ \sim \text{ iid Type I EV} \]

These assumptions yield choice probabilities of:

\[ \text{Pr}(y_i=j) = \frac{\exp\{x_{ij}'\beta_i\}}{\sum_{k=1}^p\exp\{x_{ik}'\beta_i\}}\]

$x_i$ is $n_i \times k$ and $i = 1, \ldots, N$. There are $p$ alternatives, $j = 1, \ldots, p$. An outside option, often denoted $j=0$, can be introduced by assigning $0$’s to that option’s covariate ($x$) values.
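As a minimal sketch of the choice probabilities above, the following base R helper computes them for a single choice occasion. The function name mnl_probs and the toy values of X and beta are hypothetical and are not part of bayesm.

```r
# Hypothetical helper (not a bayesm function): choice probabilities for one
# choice occasion. X is a p x k matrix with one row per alternative and
# beta is a length-k coefficient vector.
mnl_probs <- function(X, beta) {
  v <- as.vector(X %*% beta)   # deterministic utilities x_ij' beta_i
  v <- v - max(v)              # subtract the max for numerical stability
  exp(v) / sum(exp(v))
}

# Toy example: 3 alternatives, 2 covariates. The last row is an outside
# option whose covariates are all set to zero.
X <- rbind(c(1, 2.0),
           c(0, 1.0),
           c(0, 0.0))
beta <- c(0.5, -1.0)
probs <- mnl_probs(X, beta)
```

Subtracting the maximum utility leaves the probabilities unchanged because the common factor cancels in the ratio.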

We impose sign constraints by defining a $k$-length constraint vector SignRes that takes values from the set $\{-1, 0, 1\}$ to define $\beta_{ik} = f(\beta_{ik}^*)$ where $f(\cdot)$ is as follows:

\[ \beta_{ik} = f(\beta_{ik}^*) = \left\{ \begin{array}{lcl} \exp(\beta_{ik}^*) & \text{if} &\texttt{SignRes[k]} = 1 \\ \beta_{ik}^* & \text{if} & \texttt{SignRes[k]}= 0 \\ -\exp(\beta_{ik}^*) & \text{if} &\texttt{SignRes[k]} = -1 \\ \end{array} \right. \]
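The transformation $f(\cdot)$ can be sketched in a few lines of base R. The helper name apply_signres is hypothetical; it only illustrates the mapping from $\beta_i^*$ to $\beta_i$ and is not how bayesm implements it internally.

```r
# Hypothetical helper (not a bayesm function): map the unconstrained
# parameters betastar to the constrained parameters beta using SignRes.
apply_signres <- function(betastar, SignRes) {
  stopifnot(length(betastar) == length(SignRes),
            all(SignRes %in% c(-1, 0, 1)))
  beta <- betastar                     # SignRes == 0: left unchanged
  pos <- SignRes == 1
  neg <- SignRes == -1
  beta[pos] <- exp(betastar[pos])      # forced positive
  beta[neg] <- -exp(betastar[neg])     # forced negative
  beta
}

# Toy example: first coefficient free, second positive, third negative
betastar <- c(0.3, -2, 1.5)
SignRes  <- c(0, 1, -1)
beta <- apply_signres(betastar, SignRes)
```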

The “deep” individual-specific parameters ($\beta_i^*$) are assumed to be drawn from a mixture of $M$ normal distributions with mean values driven by cross-sectional unit characteristics $Z$. That is, $\beta_i^* = z_i' \Delta + u_i$ where $u_i$ has a mixture-of-normals distribution.¹

Considering $\beta_i^*$ a length-$k$ row vector, we will stack the $N$ $\beta_i^*$’s vertically and write:

\[ B=Z\Delta + U \]

Thus we have $\beta_i^*$, $z_i$, and $u_i$ as the $i^\text{th}$ rows of $B$, $Z$, and $U$. $B$ is $N \times k$, $Z$ is $N \times M$, $\Delta$ is $M \times k$, and $U$ is $N \times k$ where the distribution on $U$ is such that:

\[ \Pr(\beta_{ik}^*) = \sum_{m=1}^M \pi_m\phi(z_i' \Delta \vert \mu_m, \Sigma_m) \]

$\phi$ is the normal pdf.
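To make the stacked form $B = Z\Delta + U$ concrete, here is a small base R simulation sketch. All dimensions and parameter values are made up for illustration, and the names nz (number of columns of $Z$) and ncomp (number of mixture components) are hypothetical labels chosen to keep those two counts distinct.

```r
set.seed(1)

# Illustrative sizes (assumptions, not values from the vignette)
N <- 200      # cross-sectional units
k <- 2        # coefficients per unit
nz <- 1       # columns of Z
ncomp <- 2    # normal components in the mixture

Z     <- matrix(rnorm(N * nz), N, nz)
Delta <- matrix(c(0.5, -0.5), nz, k)

# Mixture of normals for the rows of U
pvec  <- c(0.7, 0.3)
mu    <- list(c(0, 0), c(2, -1))
Sigma <- list(diag(k), 0.25 * diag(k))

comp <- sample(ncomp, N, replace = TRUE, prob = pvec)  # component labels
U <- matrix(0, N, k)
for (i in 1:N) {
  m <- comp[i]
  # mu_m + L z with L the lower Cholesky factor gives a N(mu_m, Sigma_m) draw
  U[i, ] <- mu[[m]] + as.vector(t(chol(Sigma[[m]])) %*% rnorm(k))
}

B <- Z %*% Delta + U   # row i of B is beta_i^*
```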

Priors

Natural conjugate priors are specified:

\[ \pi \sim \text{Dirichlet}(a) \]
\[ \text{vec}(\Delta) = \delta \sim MVN(\bar{\delta}, A_\delta^{-1}) \]
\[ \mu_m \sim MVN(\bar{\mu}, \Sigma_m \otimes a_\mu^{-1}) \]
\[ \Sigma_m \sim IW(\nu, V) \]

This specification of priors assumes that the $(\mu_m,\Sigma_m)$ are independent and that, conditional on the hyperparameters, the $\beta_i$’s are independent.

$a$ implements prior beliefs on the number of normal components in the mixture with a default of 5. $\nu$ is a “tightness” parameter of the inverted-Wishart distribution and $V$ is its location matrix. Without sign constraints, they default to $\nu=k+3$ and $V=\nu I$, which has the effect of centering the prior on $I$ and making it “barely proper”. $a_\mu$ is a tightness parameter for the priors on $\mu$, and when no sign constraints are imposed it defaults to an extremely diffuse prior of 0.01.

These defaults assume the logit coefficients ($\beta_{ik}$’s) are on the order of approximately 1 and, if so, are typically reasonable hyperprior values. However, when sign constraints are imposed, say, SignRes[k]=-1 such that $\beta_{ik} = -\exp\{\beta_{ik}^*\}$, then these hyperprior defaults pile up mass near zero, a result that follows from the nature of the exponential function and the fact that the $\beta_{ik}^*$’s are on the log scale. Let’s show this graphically.

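# rwishart(), rdirichlet(), and rmixture() come from bayesm
library(bayesm)

# define function: simulate ndraw draws from the prior on beta
drawprior <- function (mubar_betak, nvar, ncomp, a, nu, Amu, V, ndraw) {
  betakstar <- double(ndraw)
  betak     <- double(ndraw)
  otherbeta <- double(ndraw)
  mubar     <- c(rep(0, nvar-1), mubar_betak)
  for(i in 1:ndraw) {
    comps = list()
    for(k in 1:ncomp) {
      # draw Sigma_m ~ IW(nu, V), then mu_m ~ N(mubar, Sigma_m / Amu)
      Sigma <- rwishart(nu, chol2inv(chol(V)))$IW
      comps[[k]] <- list(mubar + t(chol(Sigma/Amu)) %*% rnorm(nvar),
                         backsolve(chol(Sigma), diag(1, nvar)))
    }
    pvec <- rdirichlet(a)
    beta <- rmixture(1, pvec, comps)$x
    betakstar[i] <- beta[nvar]        # unconstrained (log-scale) parameter
    betak[i]     <- -exp(beta[nvar])  # sign-constrained parameter
    otherbeta[i] <- beta[1]           # an unconstrained coefficient
  }
  return(list(betakstar=betakstar, betak=betak, otherbeta=otherbeta))
}
set.seed(1234)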

# specify rhierMnlRwMixture defaults
mubar_betak <- 0
nvar  <- 10
ncomp <- 3
a     <- rep(5, ncomp)
nu    <- nvar + 3
Amu   <- 0.01
V     <- nu*diag(c(rep(1,nvar-1),1))
ndraw <- 10000
defaultprior <- drawprior(mubar_betak, nvar, ncomp, a, nu, Amu, V, ndraw)

# plot priors under defaults
par(mfrow=c(1,3))
trimhist <- -20
hist(defaultprior$betakstar, breaks=40, col="magenta",
     main="Beta_k_star", xlab="", ylab="", yaxt="n")
hist(defaultprior$betak[defaultprior$betak>trimhist], breaks=40, col="magenta",
     main="Beta_k", xlab="", ylab="", yaxt="n", xlim=c(trimhist,0))
hist(defaultprior$otherbeta, breaks=40, col="magenta",
     main="Other Beta", xlab="", ylab="", yaxt="n")
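As a rough numerical companion to the histograms, the calculation below approximates how much prior mass the default settings place very close to zero. It rests on simplifying assumptions that are not the sampler's exact prior: a single normal component, $\Sigma$ fixed at the identity, and $\bar{\mu} = 0$, so that $\beta_{ik}^*$ is marginally normal with variance $1/a_\mu + 1$.

```r
# Simplifying assumptions (not the exact prior): one normal component,
# Sigma fixed at the identity, mubar = 0, and the default Amu = 0.01.
Amu <- 0.01
sd_betastar <- sqrt(1 / Amu + 1)   # Var(mu) + Var(betastar | mu) = 100 + 1

# With beta = -exp(betastar):
# prior mass within 0.01 of zero, i.e. exp(betastar) < 0.01
p_near_zero <- pnorm(log(0.01), mean = 0, sd = sd_betastar)
# prior mass below -100, i.e. exp(betastar) > 100
p_extreme <- pnorm(log(100), mean = 0, sd = sd_betastar, lower.tail = FALSE)
```

Under these assumptions each probability is roughly one third, so only about a third of the prior mass falls on magnitudes between 0.01 and 100.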

We see that the hyperprior values for constrained logit parameters are far from uninformative. As a result, rhierMnlRwMixture implements different default priors for parameters when sign constraints are imposed. In particular, $a_\mu=0.1$, $\nu = k + 15$, and $V = \nu*\text{diag}(d)$ where $d_i=4$ if $\beta_{ik}$ is unconstrained and $d_i=0.1$ if $\beta_{ik}$ is constrained. Additionally, $\bar{\mu}_m = 0$ if unconstrained and $\bar{\mu}_m = 2$ otherwise. As the following plots show, this yields substantially less informative hyperpriors on $\beta_{ik}^*$ without significantly affecting the hyperpriors on $\beta_{ik}$ or $\beta_{ij}$ ($j\ne k$).

# adjust priors for constraints
mubar_betak <- 2
nvar  <- 10
ncomp <- 3
a     <- rep(5, ncomp)
nu    <- nvar + 15
Amu   <- 0.1
V     <- nu*diag(c(rep(4,nvar-1),0.1))
ndraw <- 10000
tightprior <- drawprior(mubar_betak, nvar, ncomp, a, nu, Amu, V, ndraw)

# plot priors under adjusted values
par(mfrow=c(1,3))
trimhist <- -20
hist(tightprior$betakstar, breaks=40, col="magenta",
     main="Beta_k_star", xlab="", ylab="", yaxt="n")
hist(tightprior$betak[tightprior$betak>trimhist], breaks=40, col="magenta",
     main="Beta_k", xlab="", ylab="", yaxt="n", xlim=c(trimhist,0))
hist(tightprior$otherbeta, breaks=40, col="magenta",
     main="Other Beta", xlab="", ylab="", yaxt="n")
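The adjusted defaults described in this section can be collected into a small helper that builds the hyperparameters from an arbitrary SignRes vector. This is only a sketch: the name signres_prior_defaults is hypothetical, and the values simply restate the ones given in the text rather than being read from the package source.

```r
# Hypothetical helper (not a bayesm function): assemble the adjusted default
# hyperparameters stated in the text for a given SignRes vector.
signres_prior_defaults <- function(SignRes, ncomp, a_each = 5) {
  k <- length(SignRes)
  constrained <- SignRes != 0
  nu <- k + 15
  list(
    mubar   = matrix(ifelse(constrained, 2, 0), nrow = 1),  # 2 if constrained
    Amu     = 0.1,
    nu      = nu,
    V       = nu * diag(ifelse(constrained, 0.1, 4), nrow = k),
    a       = rep(a_each, ncomp),
    ncomp   = ncomp,
    SignRes = SignRes
  )
}

# Ten coefficients with a negativity constraint on the last one
prior_sketch <- signres_prior_defaults(c(rep(0, 9), -1), ncomp = 3)
```

For this SignRes, the resulting nu, Amu, and V match the values set by hand in the code above.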

Example

Here we demonstrate the implementation of the hierarchical multinomial logit model with sign-constrained parameters. We return to the camera data used in Example 3 of the “bayesm Overview” Vignette. This dataset contains conjoint choice data for 332 respondents who evaluated digital cameras. The data are stored in a list-of-lists format with one list per respondent, and each respondent’s list having two elements: a vector of choices (y) and a matrix of covariates (X). Notice the dimensions: there is one value for each choice occasion in each individual’s y vector but one row per alternative in each individual’s X matrix, making nrow(X) = 5 $\times$ length(y) because there are 5 alternatives per choice occasion.

library(bayesm)
data(camera)
length(camera)

## [1] 332

str(camera[[1]])

## List of 2
##  $ y: int [1:16] 1 2 2 4 2 2 1 1 1 2 ...
##  $ X: num [1:80, 1:10] 0 1 0 0 0 0 1 0 0 0 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:80] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:10] "canon" "sony" "nikon" "panasonic" ...
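The dimension relationship noted above (80 rows of X for 16 choice occasions with 5 alternatives) can be checked programmatically. The sketch below defines a hypothetical helper, check_lgtdata, and runs it on simulated toy data so that it is self-contained.

```r
# Hypothetical helper (not a bayesm function): verify that every unit has
# p rows of X per choice occasion and that choices lie in 1..p.
check_lgtdata <- function(lgtdata, p) {
  ok <- vapply(lgtdata, function(unit) {
    nrow(unit$X) == p * length(unit$y) && all(unit$y %in% seq_len(p))
  }, logical(1))
  all(ok)
}

# Toy data in the same list-of-lists layout (values are simulated)
set.seed(42)
make_unit <- function(n_occasions, p = 5, k = 3) {
  list(y = sample(p, n_occasions, replace = TRUE),
       X = matrix(rnorm(p * n_occasions * k), ncol = k))
}
toy <- list(make_unit(16), make_unit(12))
toy_ok <- check_lgtdata(toy, p = 5)
```

With bayesm loaded, the same check could be applied to the real data with check_lgtdata(camera, p = 5).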

As shown next, the first 4 covariates are binary indicators for the brands Canon, Sony, Nikon, and Panasonic. These correspond to choice (y) values of 1, 2, 3, and 4. y can also take the value 5, indicating that the respondent chose “none”. The data include binary indicators for two levels of pixel count, zoom strength, swivel video display capability, and wifi connectivity. The last covariate is price, recorded in hundreds of U.S. dollars so that the magnitude of the expected price coefficient is such that the default prior settings in rhierMnlRwMixture do not need to be adjusted.

str(camera[[1]]$y)

##  int [1:16] 1 2 2 4 2 2 1 1 1 2 ...

str(as.data.frame(camera[[1]]$X))

## 'data.frame':    80 obs. of  10 variables:
##  $ canon    : num  0 1 0 0 0 0 1 0 0 0 ...
##  $ sony     : num  0 0 0 1 0 0 0 1 0 0 ...
##  $ nikon    : num  1 0 0 0 0 0 0 0 1 0 ...
##  $ panasonic: num  0 0 1 0 0 1 0 0 0 0 ...
##  $ pixels   : num  0 1 0 0 0 1 1 1 1 0 ...
##  $ zoom     : num  1 1 0 1 0 0 0 0 0 0 ...
##  $ video    : num  0 0 0 0 0 0 1 1 1 0 ...
##  $ swivel   : num  1 1 0 1 0 1 0 0 1 0 ...
##  $ wifi     : num  0 1 1 0 0 0 1 1 1 0 ...
##  $ price    : num  0.79 2.29 1.29 2.79 0 2.79 0.79 1.79 1.29 0 ...

Let’s say we would like to estimate the part-worths of the various attributes of these digital cameras using a multinomial logit model. To incorporate individual-level heterogeneous effects, we elect to use a hierarchical (i.e., random coefficient) specification. Further, we believe that despite the heterogeneity, each consumer’s estimated price response ($\beta_{i,\text{price}}$) should be negative, which we will impose with a sign constraint. Following the above discussion, we use the default priors, which “adjust” automatically when sign constraints are imposed.

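SignRes <- c(rep(0,nvar-1),-1)
# mubar is only defined inside drawprior() above, so rebuild it here the
# same way (1 x nvar, with mubar_betak = 2 for the constrained coefficient)
mubar <- matrix(c(rep(0,nvar-1), mubar_betak), nrow=1)
data  <- list(lgtdata=camera, p=5)
prior <- list(mubar=mubar, Amu=Amu, ncomp=ncomp, a=a, nu=nu, V=V, SignRes=SignRes)
mcmc  <- list(R=1e4, nprint=0)
out   <- rhierMnlRwMixture(Data=data, Prior=prior, Mcmc=mcmc)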

While much can be done to analyze the output, we will focus here on the constrained parameters on price. We first plot the posterior distributions for the price parameter for individuals $i=1,2,3$. Notice that the posterior distributions for the selected individuals’ price parameters lie entirely below zero.

par(mfrow=c(1,3))
# histogram of posterior draws of the price coefficient (column 10) for unit i
ind_hist <- function(mod, i) {
  hist(mod$betadraw[i, 10, ], breaks = seq(-14,0,0.5),
       col = "dodgerblue3", border = "grey", yaxt = "n",
       xlim = c(-14,0), xlab = "", ylab = "", main = paste("Ind.",i))
}
ind_hist(out,1)
ind_hist(out,2)
ind_hist(out,3)

Next we plot a histogram of the posterior means for the 332 individual price parameters ($\beta_{i,\text{price}}$):

par(mfrow=c(1,1))
hist(apply(out$betadraw[ , 10, ], 1, mean),
     xlim = c(-20,0), breaks = 20,
     col = "firebrick2", border = "gray", xlab = "", ylab = "",
     main = "Posterior Means for Ind. Price Params,\nWith Sign Constraint")

As a point of comparison, we re-run the model without the sign constraint using the default priors (output omitted) and provide the same set of plots. Note now that the right tail of the posterior distribution of $\beta_{2,\text{price}}$ extends to the right of zero.

data0  <- list(lgtdata = camera, p = 5)
prior0 <- list(ncomp = 5)
mcmc0  <- list(R = 1e4, nprint = 0)

out0<-rhierMnlRwMixture(Data = data0,Prior = prior0,Mcmc = mcmc0)

par(mfrow=c(1,3))
# same plot as before, with a range that allows positive values
ind_hist <- function(mod, i) {
  hist(mod$betadraw[i, 10, ], breaks = seq(-12,2,0.5),
       col = "dodgerblue4", border = "grey", yaxt = "n",
       xlim = c(-12,2), xlab = "", ylab = "", main = paste("Ind.",i))
}
ind_hist(out0,1)
ind_hist(out0,2)
ind_hist(out0,3)

par(mfrow=c(1,1))
hist(apply(out0$betadraw[ , 10, ], 1, mean),
     xlim = c(-15,5), breaks = 20,
     col = "firebrick3", border = "gray", xlab = "", ylab = "",
     main = "Posterior Means for Ind. Price Params,\nNo Sign Constraint")

_Last updated July 2019._

¹ As documented in the help file for this function (accessible by ?bayesm::rhierMnlRwMixture), draws from the posterior of the constrained parameters ($\beta$) can be found in the betadraw element of the output, while draws from the posterior of the unconstrained parameters ($\beta^*$) are available in nmix$compdraw.
