Abstract
This paper presents a variational Bayes approach to a Lévy adaptive regression kernel (LARK) model that represents functions with an overcomplete system. In particular, we develop a variational inference method for a LARK model with multiple kernels (LARMuK), which estimates arbitrary functions that may have jump discontinuities. The algorithm is based on a variational Bayes approximation method with simulated annealing. We compare the proposed algorithm to a simulation-based reversible jump Markov chain Monte Carlo (RJMCMC) method using numerical experiments and discuss its potential and limitations.
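The coupling of a variational bound with simulated annealing named in the abstract can be sketched generically as follows. This minimal Python sketch is an illustration of the general idea only, not the authors' algorithm; every name in it (toy_elbo, anneal_over_J, the toy objective, the cooling schedule) is a hypothetical placeholder. The picture is: for a fixed number of kernels J the continuous variational parameters are optimized to yield a bound value, and an annealed accept/reject rule on J lets the search move between model sizes without getting trapped early.

```python
# A generic sketch (not the authors' implementation) of coupling an ELBO-style
# objective with simulated annealing over a discrete number of kernels J.
# All names and the toy objective are hypothetical placeholders.
import math
import random

def toy_elbo(J: int) -> float:
    # Stand-in for the evidence lower bound maximized over the continuous
    # variational parameters at a fixed J; here just a toy curve with one peak.
    return -((J - 7) ** 2) / 4.0 - 0.1 * J

def anneal_over_J(J0: int = 1, n_iter: int = 500, T0: float = 2.0, cooling: float = 0.99) -> int:
    J, best, T = J0, J0, T0
    for _ in range(n_iter):
        cand = max(1, J + random.choice([-1, 1]))      # birth/death-style proposal on J
        delta = toy_elbo(cand) - toy_elbo(J)
        if delta >= 0 or random.random() < math.exp(delta / T):
            J = cand                                   # accept uphill moves; downhill with prob exp(delta/T)
        if toy_elbo(J) > toy_elbo(best):
            best = J
        T *= cooling                                   # geometric cooling schedule
    return best

if __name__ == "__main__":
    random.seed(1)
    print(anneal_over_J())                             # settles near the toy objective's maximizer
```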
Author information
Authors and Affiliations
Samsung SDS, Seoul, South Korea
Youngseon Lee
Department of Statistics, Inha University, Incheon, South Korea
Seongil Jo
Department of Statistics, Seoul National University, Seoul, South Korea
Jaeyong Lee
Additional information
Seongil Jo was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1A01013338). Jaeyong Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Nos. 2018R1A2A3074973 and 2020R1A4A1018207).
Appendix
A. Evidence Lower Bound
Here we provide the details of the evidence lower bound (ELBO) for the proposed variational approximation algorithm. Recall that we regard the number of features J as a variational parameter. The ELBO of the LARMuK model described in Sect. 3.1 is given in (16).
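Writing \(\varvec{\theta }=(\varvec{\beta }, \varvec{\chi }, \varvec{\lambda }, \mathbf{c}, \sigma ^2, J)\), the bound in (16) has (up to rearrangement) the standard decomposition into an expected log-likelihood, an expected log-prior, and an entropy term,
$$\begin{aligned} \mathrm {ELBO}(q)=\mathbb {E}_q\log p(\mathbf{Y}|\varvec{\theta })+\mathbb {E}_q\log p(\varvec{\theta })+H_q(\varvec{\theta }), \end{aligned}$$
which are exactly the three groups of terms evaluated below.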
To derive the ELBO, we first evaluate each term on the right-hand side of (16) as follows:
For the first term, we have
$$\begin{aligned} \mathbb {E}_q\log p(\mathbf{Y}|\varvec{\beta }, \varvec{\chi }, \varvec{\lambda }, \mathbf{c},\sigma ^2, J)\propto & {} \mathbb {E}_q \left[ -\frac{n}{2}\log (2\pi \sigma ^2)-\frac{1}{2\sigma ^2} \sum _{i=1}^n (y_i-\mathbf{g_i}^T \varvec{\beta })^2\right] \\\propto & {} \frac{n}{2} \mathbb {E}_q \log \frac{1}{\sigma ^2}-\frac{1}{2} \mathbb {E}_q \frac{1}{\sigma ^2} \sum _{i=1}^n \mathbb {E}_q (y_i-\mathbf{g_i}^T \varvec{\beta })^2\\\propto & {} \frac{n}{2} \left( \Psi \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}\right) \\&-\,\frac{1}{2 R_0} \sum _{i=1}^n \mathbb {E}_q (y_i-\mathbf{g_i}^T \varvec{\beta })^2, \end{aligned}$$ where \(\mathbf{g_i} = (g_{c_1} (x_i, w_1), \ldots , g_{c_J} (x_i, w_J))^T\) and
$$\begin{aligned} \mathbb {E}_q \left[ \sum _{i=1}^n (y_i-\mathbf{g_i}^T \varvec{\beta })^2\right]= & {} \mathbb {E}_q \left[ \sum _{i=1}^n y_i^2-2\sum _{i=1}^n y_i (\mathbf{g_i}^T \varvec{\beta }) +\sum _{i=1}^n(\mathbf{g_i}^T \varvec{\beta })^2\right] \\= & {} \sum _{i=1}^ny_i^2 -2\sum _{i=1}^n y_i \mathbb {E}_q [\mathbf{g_i}^T \varvec{\beta }]+\sum _{i=1}^n\mathbb {E}_q [(\mathbf{g_i}^T \varvec{\beta })^2]\\= & {} \sum _{i=1}^ny_i^2 -2 \sum _{i=1}^n y_i \left[ \sum _{j=1}^{J_0}\mu _{0j} g_{0ij}\right] \\&+\,\sum _{i=1}^n\left[ \sum _{j=1}^{J_0}[\sigma _{0j}^2 g_{0ij}^2 +\mu _{0j}^2 g_{0ij}^2]+\sum _{l,j,l\ne j} [\mu _{0j}\mu _{0l} g_{0ij} g_{0il}]\right] \\= & {} \sum _{i=1}^n[y_i^2-2y_i (\mathbf{g_{0i}}^T {\varvec{\mu }}_0) +(\mathbf{g_{0i}}^T {\varvec{\mu }}_0)^2 + \sum _{j=1}^{J_0} \sigma _{0j}^2 g_{0ij}^2 ]\\= & {} \sum _{i=1}^n (y_i-\mathbf{g_{0i}}^T {\varvec{\mu }}_0)^2 + \sum _{j=1}^{J_0}\sum _{i=1}^n \sigma _{0j}^2 g_{0ij}^2. \end{aligned}$$ For the second term, since the variational parameters \(\varvec{\beta }, \varvec{\chi }, \varvec{\lambda }, \mathbf{c}\), and \(\sigma ^2\) are determined independently given \(J\), we have
$$\begin{aligned}&\mathbb {E}_q\log p(\varvec{\beta },\varvec{\chi }, \varvec{\lambda }, \mathbf{c}, \sigma ^2, J)\\= & {} \mathbb {E}_q\log p(\varvec{\beta }|J)+\mathbb {E}_q\log p(\varvec{\chi }|J)+\mathbb {E}_q\log p(\varvec{\lambda }|J)+\mathbb {E}_q\log p(\mathbf{c}|J)\\&+ \mathbb {E}_q\log p(\sigma ^2) + \mathbb {E}_q\log p(J), \end{aligned}$$ and each term is evaluated as
$$\begin{aligned} \mathbb {E}_q \log p(\varvec{\beta }|J)= & {} \mathbb {E}_q \left[ -\frac{J_0}{2}\log (2\pi \sigma _\beta ^2)-\frac{1}{2\sigma _\beta ^2} \sum _{j=1}^{J_0} (\beta _j-0)^2\right] \\\propto & {} \frac{J_0}{2}\log \frac{1}{\sigma _\beta ^2}-\frac{1}{2\sigma _\beta ^2} \sum _{j=1}^{J_0} (\sigma _{0j}^2 +\mu _{0j}^2), \\ \mathbb {E}_q \log p(\varvec{\chi }|J)= & {} \sum _{j=1}^{J_0} \mathbb {E}_q \log p(\chi _j)\\= & {} \sum _{j=1}^{J_0} \mathbb {E}_q [\log I(0\le \chi _j \le 1)]\\= & {} \sum _{j=1}^{J_0} \log I(0\le \chi _{0j}\le 1) = 0, \\ \mathbb {E}_q \log p(\varvec{\lambda }|J)= & {} \sum _{j=1}^{J_0} \mathbb {E}_q \log p(\lambda _j)\\= & {} \sum _{j=1}^{J_0}\mathbb {E}_q [\log b_\lambda ^{a_\lambda }-\log \Gamma (a_\lambda )+(a_\lambda -1)\log \lambda _j - b_\lambda \lambda _j]\\\propto & {} \sum _{j=1}^{J_0} \left[ (a_\lambda -1) \mathbb {E}_q\log \lambda _j -b_\lambda \mathbb {E}_q\lambda _j\right] \\= & {} \sum _{j=1}^{J_0} \left[ (a_\lambda -1)\log \lambda _{0j}-b_\lambda \lambda _{0j}\right] \\= & {} (a_\lambda -1) \sum _{j=1}^{J_0}\log \lambda _{0j} -b_\lambda \sum _{j=1}^{J_0} \lambda _{0j},\\ \mathbb {E}_q \log p(\mathbf{c}|J)= & {} \sum _{j=1}^{J_0} \mathbb {E}_q \log p(c_j)\\\propto & {} \sum _{j=1}^{J_0} \mathbb {E}_q[I(c_j=0)\cdot \log p_0+ I(c_j=1)\cdot \log p_1 + I(c_j=2)\cdot \log p_2]\\\propto & {} \sum _{j=1}^{J_0}[\nu _{0j}\log p_0+\nu _{1j}\log p_1 +\nu _{2j}\log p_2]\\= & {} \log p_0 \sum _{j=1}^{J_0} \nu _{0j}+\log p_1 \sum _{j=1}^{J_0} \nu _{1j} +\log p_2 \sum _{j=1}^{J_0} \nu _{2j}, \\ \mathbb {E}_q \log p(\sigma ^2)= & {} \mathbb {E}_q \left[ \log \frac{r R}{2}^{r/2} -\log \Gamma \left( \frac{r}{2}\right) +\left( \frac{r}{2}-1\right) \log \frac{1}{\sigma ^2} - \frac{r R}{2\sigma ^2}\right] \\\propto & {} \left( \frac{r}{2}-1\right) \mathbb {E}_q \log \frac{1}{\sigma ^2}-\frac{r R}{2} \mathbb {E}_q \frac{1}{\sigma ^2}\\\propto & {} \left( \frac{r}{2}-1\right) \left( \Psi \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}\right) -\frac{r R}{2 R_0}, \\ \mathbb {E}_q\log p(J)= & {} \mathbb {E}_q [J\log M -M -\log J!]\\\propto & {} J_0\log M -\log J_0!. \end{aligned}$$Note that the joint entropy of random variables is calculated with the marginal and conditional entropies, i.e.,\(H_q(X|Y)=H_q(X,Y)-H_q(Y)\). Therefore, we need to compute the entropy of each variational parameter as
$$\begin{aligned} H_q(\varvec{\beta }|J)= & {} -\mathbb {E}_q\log q(\varvec{\beta }|J) =-\mathbb {E}_{q(J)}\mathbb {E}_{q(\varvec{\beta }|J)}\log q(\varvec{\beta }|J) \\= & {} -\mathbb {E}_{q(J)} \sum _{j=1}^J \left[ -\frac{1}{2}\log (2\pi \sigma _{0j}^2)-\frac{1}{2\sigma _{0j}^2} \mathbb {E}_q(\beta _j-\mu _{0j})^2\right] \\= & {} \sum _{j=1}^{J_0} \left[ \frac{1}{2}\log (2\pi \sigma _{0j}^2)+\frac{1}{2\sigma _{0j}^2}\mathbb {E}_q(\beta _j-\mu _{0j})^2\right] \\= & {} \sum _{j=1}^{J_0} \left[ \frac{1}{2} \log (2\pi \sigma _{0j}^2)+\frac{1}{2}\right] \\\propto & {} \frac{1}{2} \sum _{j=1}^{J_0} \log \sigma _{0j}^2, \\ H_q(\varvec{\chi }|J)= & {} -\mathbb {E}_{q(J)}\mathbb {E}_{q(\varvec{\chi }|J)}(\log q(\varvec{\chi }|J))=-\sum _{j=1}^{J_0}\log I(\chi _j=\chi _{0j})=0,\\ H_q(\varvec{\lambda }|J)= & {} -\mathbb {E}_{q(J)}\mathbb {E}_{q(\varvec{\lambda }|J)}(\log q(\varvec{\lambda }|J))=-\sum _{j=1}^{J_0}\log I(\lambda _j=\lambda _{0j})=0, \\ H_q(\mathbf{c}|J)= & {} -\mathbb {E}_q\log q(\mathbf{c}|J) =-\mathbb {E}_{q(J)}\mathbb {E}_{q(\mathbf{c}|J)}\log q(\mathbf{c}|J)\\= & {} -\sum _{j=1}^{J_0} \mathbb {E}_q [\log q(c_j)]\\= & {} -\sum _{j=1}^{J_0} [\nu _{0j} \log \nu _{0j}+\nu _{1j} \log \nu _{1j}+\nu _{2j} \log \nu _{2j}],\\ H_q(\sigma ^2)= & {} -\mathbb {E}_q \left[ \log \left( \frac{r_0 R_0}{2}\right) ^{\frac{r_0}{2}} -\log \Gamma \left( \frac{r_0}{2}\right) +\left( \frac{r_0}{2}-1\right) \log \frac{1}{\sigma ^2} - \frac{r_0 R_0}{2\sigma ^2}\right] \\= & {} \log \Gamma \left( \frac{r_0}{2}\right) -\frac{r_0}{2} \log \frac{r_0 R_0}{2}-\left( \frac{r_0}{2}-1\right) \mathbb {E}_q \log \frac{1}{\sigma ^2} +\frac{r_0 R_0}{2}\mathbb {E}_q \frac{1}{\sigma ^2}\\= & {} \log \Gamma \left( \frac{r_0}{2}\right) -\frac{r_0}{2} \log \frac{r_0 R_0}{2}-\left( \frac{r_0}{2}-1\right) \left( \Psi \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}\right) +\frac{r_0}{2}\\= & {} \log \Gamma \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}-\left( \frac{r_0}{2}-1\right) \Psi \left( \frac{r_0}{2}\right) +\frac{r_0}{2},\\ H_q(J)= & {} -\mathbb {E}_q \log q(J)=-\log I(J=J_0)=0. \end{aligned}$$
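Two ingredients used repeatedly above can be checked numerically. The following minimal Python sketch (illustrative only, not part of the paper; the parameter values \(r_0, R_0\), and the small vectors below are arbitrary) verifies by Monte Carlo that \(\mathbb {E}_q[1/\sigma ^2]=1/R_0\) and \(\mathbb {E}_q[\log (1/\sigma ^2)]=\Psi (r_0/2)-\log (r_0R_0/2)\) when \(1/\sigma ^2\) follows the gamma form consistent with the density appearing in \(H_q(\sigma ^2)\), and that \(\mathbb {E}_q[(y_i-\mathbf{g_i}^T\varvec{\beta })^2]\) reduces to the closed form derived earlier for independent \(\beta _j\sim N(\mu _{0j},\sigma _{0j}^2)\).

```python
# Monte Carlo sanity check (illustrative only) of the expectations used in the ELBO.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
r0, R0 = 5.0, 2.0                                    # illustrative variational parameters
prec = rng.gamma(shape=r0 / 2, scale=2.0 / (r0 * R0), size=1_000_000)  # draws of 1/sigma^2

# E_q[1/sigma^2] = 1/R0 and E_q[log(1/sigma^2)] = Psi(r0/2) - log(r0*R0/2)
print(prec.mean(), 1.0 / R0)
print(np.log(prec).mean(), digamma(r0 / 2) - np.log(r0 * R0 / 2))

# E_q[(y - g^T beta)^2] = (y - g^T mu)^2 + sum_j sigma_j^2 g_j^2
g, mu, sd, y = np.array([0.7, -1.2, 0.3]), np.array([1.0, 0.5, -0.4]), np.array([0.2, 0.6, 0.9]), 0.8
beta = mu + sd * rng.standard_normal((1_000_000, 3))
print(((y - beta @ g) ** 2).mean(), (y - g @ mu) ** 2 + np.sum(sd ** 2 * g ** 2))
```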
In summary, the ELBO is obtained by collecting the terms derived above.
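Up to additive constants that do not involve the variational parameters (each term above was itself evaluated only up to such constants), summing the likelihood, prior, and entropy contributions gives
$$\begin{aligned} \mathrm {ELBO}\propto & {} \frac{n}{2} \left( \Psi \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}\right) -\frac{1}{2 R_0}\left[ \sum _{i=1}^n (y_i-\mathbf{g_{0i}}^T {\varvec{\mu }}_0)^2 + \sum _{j=1}^{J_0}\sum _{i=1}^n \sigma _{0j}^2 g_{0ij}^2\right] \\&+\,\frac{J_0}{2}\log \frac{1}{\sigma _\beta ^2}-\frac{1}{2\sigma _\beta ^2} \sum _{j=1}^{J_0} (\sigma _{0j}^2 +\mu _{0j}^2) +(a_\lambda -1) \sum _{j=1}^{J_0}\log \lambda _{0j} -b_\lambda \sum _{j=1}^{J_0} \lambda _{0j}\\&+\,\log p_0 \sum _{j=1}^{J_0} \nu _{0j}+\log p_1 \sum _{j=1}^{J_0} \nu _{1j} +\log p_2 \sum _{j=1}^{J_0} \nu _{2j}\\&+\,\left( \frac{r}{2}-1\right) \left( \Psi \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}\right) -\frac{r R}{2 R_0} + J_0\log M -\log J_0!\\&+\,\frac{1}{2} \sum _{j=1}^{J_0} \log \sigma _{0j}^2 -\sum _{j=1}^{J_0} [\nu _{0j} \log \nu _{0j}+\nu _{1j} \log \nu _{1j}+\nu _{2j} \log \nu _{2j}]\\&+\,\log \Gamma \left( \frac{r_0}{2}\right) -\log \frac{r_0 R_0}{2}-\left( \frac{r_0}{2}-1\right) \Psi \left( \frac{r_0}{2}\right) +\frac{r_0}{2}. \end{aligned}$$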
About this article
Cite this article
Lee, Y., Jo, S. & Lee, J. A variational inference for the Lévy adaptive regression with multiple kernels. Comput Stat 37, 2493–2515 (2022). https://doi.org/10.1007/s00180-022-01200-z