
Variational autoencoder

From Wikipedia, the free encyclopedia
Deep learning generative model to encode data representation

The basic scheme of a variational autoencoder. The model receives $x$ as input. The encoder compresses it into the latent space. The decoder receives as input the information sampled from the latent space and produces $x'$ as similar as possible to $x$.

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling.[1] It is part of the families of probabilistic graphical models and variational Bayesian methods.[2]

In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution.

Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together using the reparameterization trick, although the variance of the noise model can be learned separately.[citation needed]

Although this type of model was initially designed for unsupervised learning,[3][4] its effectiveness has been proven for semi-supervised learning[5][6] and supervised learning.[7]

Overview of architecture and operation


A variational autoencoder is a generative model with a prior and a noise distribution. Usually such models are trained using the expectation-maximization meta-algorithm (e.g. probabilistic PCA, (spike & slab) sparse coding). Such a scheme optimizes a lower bound of the data likelihood, which is usually computationally intractable, and in doing so requires the discovery of q-distributions, or variational posteriors. These q-distributions are normally parameterized for each individual data point in a separate optimization process. However, variational autoencoders use a neural network as an amortized approach to jointly optimize across data points. In that way, the same parameters are reused for multiple data points, which can result in massive memory savings. The first neural network takes as input the data points themselves, and outputs parameters for the variational distribution. As it maps from a known input space to the low-dimensional latent space, it is called the encoder.

The decoder is the second neural network of this model. It is a function that maps from the latent space to the input space, e.g. as the mean of the noise distribution. It is possible to use another neural network that maps to the variance, but this can be omitted for simplicity; in such a case, the variance can be optimized with gradient descent.
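As a concrete illustration (not part of the original formulation), the two networks can be sketched in PyTorch as follows; the input dimension x_dim, latent dimension z_dim, and hidden width are placeholder choices, and the encoder is assumed to output the mean and log-variance of a diagonal Gaussian variational distribution:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a data point x to the parameters of q_phi(z|x)."""
    def __init__(self, x_dim: int, z_dim: int, hidden: int = 400):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)       # mean of q_phi(z|x)
        self.log_var = nn.Linear(hidden, z_dim)  # log-variance of q_phi(z|x)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.log_var(h)

class Decoder(nn.Module):
    """Maps a latent code z to the mean of the noise distribution p_theta(x|z)."""
    def __init__(self, x_dim: int, z_dim: int, hidden: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim)
        )

    def forward(self, z):
        return self.net(z)  # variance of the noise model is kept fixed for simplicity
```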

To optimize this model, one needs to know two terms: the "reconstruction error", and the Kullback–Leibler divergence (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data, here referred to as the p-distribution. For example, a standard VAE task such as ImageNet is typically assumed to have Gaussian-distributed noise, whereas a task such as binarized MNIST requires Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value.[8]

More recent approaches replace the Kullback–Leibler divergence (KL-D) with various statistical distances; see "Statistical distance VAE variants" below.

Formulation


From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data $x$ under a chosen parameterized probability distribution $p_\theta(x) = p(x|\theta)$. This distribution is usually chosen to be a Gaussian $N(x|\mu,\sigma)$, parameterized by $\mu$ and $\sigma$ respectively, which as a member of the exponential family is easy to work with as a noise distribution. Simple distributions are easy enough to maximize, but distributions in which a prior is assumed over the latents $z$ result in intractable integrals. Let us find $p_\theta(x)$ via marginalizing over $z$.

$$p_\theta(x) = \int_z p_\theta(x,z)\,dz,$$

where $p_\theta(x,z)$ represents the joint distribution under $p_\theta$ of the observable data $x$ and its latent representation or encoding $z$. According to the chain rule, the equation can be rewritten as

$$p_\theta(x) = \int_z p_\theta(x|z)\,p_\theta(z)\,dz$$

In the vanilla variational autoencoder, $z$ is usually taken to be a finite-dimensional vector of real numbers, and $p_\theta(x|z)$ to be a Gaussian distribution. Then $p_\theta(x)$ is a mixture of Gaussian distributions.
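To make the intractability concrete, the sketch below gives a naive Monte Carlo estimate of this Gaussian mixture, assuming a standard normal prior and a hypothetical decode function that returns the Gaussian mean; such an estimator has far too high a variance to be useful in realistic dimensions, which is what motivates the variational treatment that follows:

```python
import math
import torch

def log_marginal_mc(x, decode, z_dim, n_samples=1000):
    """Naive Monte Carlo estimate of ln p_theta(x) = ln E_{z~N(0,I)}[ N(x; D_theta(z), I) ]."""
    z = torch.randn(n_samples, z_dim)        # z ~ p_theta(z) = N(0, I)
    mean = decode(z)                         # D_theta(z), shape (n_samples, x_dim)
    d = x.shape[-1]
    # ln N(x; mean, I) for each sample, including the Gaussian normalizing constant
    log_px_given_z = -0.5 * ((x - mean) ** 2).sum(-1) - 0.5 * d * math.log(2 * math.pi)
    # log of the sample average, computed stably with logsumexp
    return torch.logsumexp(log_px_given_z, dim=0) - math.log(n_samples)
```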

It is now possible to define the set of relationships between the input data and its latent representation as the prior $p_\theta(z)$, the likelihood $p_\theta(x|z)$, and the posterior $p_\theta(z|x)$.

Unfortunately, the computation of $p_\theta(z|x)$ is expensive and in most cases intractable. To speed up the calculation and make it feasible, it is necessary to introduce a further function that approximates the posterior distribution as

$$q_\phi(z|x) \approx p_\theta(z|x)$$

with $\phi$ defined as the set of real values that parametrize $q$. This is sometimes called amortized inference, since by "investing" in finding a good $q_\phi$, one can later infer $z$ from $x$ quickly without doing any integrals.

In this way, the problem is to find a good probabilistic autoencoder, in which the conditional likelihood distribution $p_\theta(x|z)$ is computed by the probabilistic decoder, and the approximated posterior distribution $q_\phi(z|x)$ is computed by the probabilistic encoder.

Parametrize the encoder as $E_\phi$, and the decoder as $D_\theta$.

Evidence lower bound (ELBO)

Main article: Evidence lower bound

Like many deep learning approaches that use gradient-based optimization, VAEs require a differentiable loss function to update the network weights through backpropagation.

For variational autoencoders, the idea is to jointly optimize the generative model parameters $\theta$ to reduce the reconstruction error between the input and the output, and $\phi$ to make $q_\phi(z|x)$ as close as possible to $p_\theta(z|x)$. As reconstruction loss, mean squared error and cross entropy are often used.

As distance loss between the two distributions, the Kullback–Leibler divergence $D_{KL}(q_\phi(z|x) \parallel p_\theta(z|x))$ is a good choice to squeeze $q_\phi(z|x)$ under $p_\theta(z|x)$.[8][9]

The distance loss just defined is expanded as

$$\begin{aligned} D_{KL}(q_\phi(z|x) \parallel p_\theta(z|x)) &= \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{q_\phi(z|x)}{p_\theta(z|x)}\right] \\ &= \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{q_\phi(z|x)\,p_\theta(x)}{p_\theta(x,z)}\right] \\ &= \ln p_\theta(x) + \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{q_\phi(z|x)}{p_\theta(x,z)}\right] \end{aligned}$$

Now define the evidence lower bound (ELBO):
$$L_{\theta,\phi}(x) := \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{p_\theta(x,z)}{q_\phi(z|x)}\right] = \ln p_\theta(x) - D_{KL}(q_\phi(\cdot|x) \parallel p_\theta(\cdot|x))$$
Maximizing the ELBO,
$$\theta^*, \phi^* = \underset{\theta,\phi}{\operatorname{argmax}}\, L_{\theta,\phi}(x),$$
is equivalent to simultaneously maximizing $\ln p_\theta(x)$ and minimizing $D_{KL}(q_\phi(z|x) \parallel p_\theta(z|x))$. That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior $q_\phi(\cdot|x)$ from the exact posterior $p_\theta(\cdot|x)$.

The form given is not very convenient for maximization, but the following, equivalent form, is:
$$L_{\theta,\phi}(x) = \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi(\cdot|x) \parallel p_\theta(\cdot))$$
where $\ln p_\theta(x|z)$ is implemented as $-\frac{1}{2}\|x - D_\theta(z)\|_2^2$, since that is, up to an additive constant, what $x|z \sim \mathcal{N}(D_\theta(z), I)$ yields. That is, we model the distribution of $x$ conditional on $z$ to be a Gaussian distribution centered on $D_\theta(z)$. The distributions of $q_\phi(z|x)$ and $p_\theta(z)$ are often also chosen to be Gaussians, as $z|x \sim \mathcal{N}(E_\phi(x), \sigma_\phi(x)^2 I)$ and $z \sim \mathcal{N}(0, I)$, with which we obtain, by the formula for the KL divergence of Gaussians:
$$L_{\theta,\phi}(x) = -\frac{1}{2}\mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\|x - D_\theta(z)\|_2^2\right] - \frac{1}{2}\left(N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x)\right) + \mathrm{Const}$$
Here $N$ is the dimension of $z$. For a more detailed derivation and more interpretations of ELBO and its maximization, see its main page.
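For illustration, the negative of this last expression can be written as a training loss in a few lines of PyTorch. This is a hedged sketch assuming the Gaussian choices above with a unit-variance decoder, and hypothetical encoder/decoder modules (as sketched earlier) returning (mean, log-variance) and a reconstruction mean respectively:

```python
import torch

def negative_elbo(x, encoder, decoder):
    mu, log_var = encoder(x)                     # parameters of q_phi(z|x)
    std = torch.exp(0.5 * log_var)
    z = mu + std * torch.randn_like(std)         # one reparameterized sample (see next section)
    recon = decoder(z)                           # D_theta(z)
    # Reconstruction term: (1/2)||x - D_theta(z)||^2, from the unit-variance Gaussian likelihood
    rec = 0.5 * ((x - recon) ** 2).sum(dim=-1)
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )
    kl = 0.5 * (log_var.exp() + mu ** 2 - 1.0 - log_var).sum(dim=-1)
    return (rec + kl).mean()
```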

Reparameterization

The scheme of the reparameterization trick. The randomness variable $\varepsilon$ is injected into the latent space $z$ as an external input. In this way, it is possible to backpropagate the gradient without involving the stochastic variable during the update.

To efficiently search for
$$\theta^*, \phi^* = \underset{\theta,\phi}{\operatorname{argmax}}\, L_{\theta,\phi}(x)$$
the typical method is gradient ascent.

It is straightforward to find
$$\nabla_\theta \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{p_\theta(x,z)}{q_\phi(z|x)}\right] = \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\nabla_\theta \ln \frac{p_\theta(x,z)}{q_\phi(z|x)}\right]$$
However,
$$\nabla_\phi \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{p_\theta(x,z)}{q_\phi(z|x)}\right]$$
does not allow one to put the $\nabla_\phi$ inside the expectation, since $\phi$ appears in the probability distribution itself. The reparameterization trick (also known as stochastic backpropagation[10]) bypasses this difficulty.[8][11][12]

The most important example is when $z \sim q_\phi(\cdot|x)$ is normally distributed, as $\mathcal{N}(\mu_\phi(x), \Sigma_\phi(x))$.

The scheme of a variational autoencoder after the reparameterization trick

This can be reparametrized by letting $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \boldsymbol{I})$ be a "standard random number generator", and constructing $z$ as $z = \mu_\phi(x) + L_\phi(x)\epsilon$. Here, $L_\phi(x)$ is obtained by the Cholesky decomposition:
$$\Sigma_\phi(x) = L_\phi(x) L_\phi(x)^T$$
Then we have
$$\nabla_\phi \mathbb{E}_{z\sim q_\phi(\cdot|x)}\left[\ln \frac{p_\theta(x,z)}{q_\phi(z|x)}\right] = \mathbb{E}_{\epsilon}\left[\nabla_\phi \ln \frac{p_\theta(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_\phi(\mu_\phi(x) + L_\phi(x)\epsilon \mid x)}\right]$$
and so we obtained an unbiased estimator of the gradient, allowing stochastic gradient descent.

Since we reparametrized $z$, we need to find $q_\phi(z|x)$. Let $q_0$ be the probability density function for $\epsilon$; then, by the change-of-variables formula for probability densities,
$$\ln q_\phi(z|x) = \ln q_0(\epsilon) - \ln |\det(\partial_\epsilon z)|$$
where $\partial_\epsilon z$ is the Jacobian matrix of $z$ with respect to $\epsilon$. Since $z = \mu_\phi(x) + L_\phi(x)\epsilon$, this is
$$\ln q_\phi(z|x) = -\frac{1}{2}\|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac{n}{2}\ln(2\pi)$$
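A short sketch of this construction, with mu and L assumed to come from a hypothetical encoder producing a full-covariance Gaussian posterior (L lower triangular with positive diagonal, so that Sigma = L Lᵀ), might look as follows:

```python
import math
import torch

def sample_z(mu, L):
    """Reparameterized sample z = mu + L @ eps, plus ln q_phi(z|x) via change of variables."""
    eps = torch.randn_like(mu)                  # eps ~ N(0, I), independent of phi
    z = mu + L @ eps                            # z = mu_phi(x) + L_phi(x) eps
    n = mu.shape[-1]
    log_q = (-0.5 * eps.pow(2).sum()
             - torch.log(torch.diagonal(L)).sum()   # ln|det L| for a triangular L
             - 0.5 * n * math.log(2 * math.pi))
    return z, log_q
```

Because the randomness now enters only through eps, gradients with respect to mu and L flow through z, which is exactly what the reparameterization trick provides.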

Variations


Many applications and extensions of variational autoencoders have been used to adapt the architecture to other domains and improve its performance.

$\beta$-VAE is an implementation with a weighted Kullback–Leibler divergence term to automatically discover and interpret factorised latent representations. With this implementation, it is possible to force manifold disentanglement for $\beta$ values greater than one. This architecture can discover disentangled latent factors without supervision.[13][14]
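In code, the only change relative to the plain ELBO loss sketched earlier is a weighting factor on the KL term; a minimal, assumption-laden sketch (rec and kl being the per-example terms from that earlier sketch):

```python
def beta_vae_loss(rec, kl, beta=4.0):
    # beta > 1 penalizes the KL term more strongly than the standard ELBO,
    # encouraging disentangled latent factors at the cost of reconstruction quality.
    return (rec + beta * kl).mean()
```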

The conditional VAE (CVAE) inserts label information in the latent space to force a deterministic constrained representation of the learned data.[15]

Some structures directly deal with the quality of the generated samples[16][17] or implement more than one latent space to further improve the representation learning.

Some architectures mix VAE and generative adversarial networks to obtain hybrid models.[18][19][20]

It is not necessary to use gradients to update the encoder. In fact, the encoder is not necessary for the generative model.[21]

Statistical distance VAE variants


After the initial work of Diederik P. Kingma and Max Welling,[22] several procedures were proposed to formulate the operation of the VAE in a more abstract way. In these approaches the loss function is composed of two parts: a reconstruction error between the input and its reconstruction, and a statistical distance between the latent prior and the distribution of the encoded data in the latent space.

We obtain the final formula for the loss:
$$L_{\theta,\phi} = \mathbb{E}_{x\sim \mathbb{P}^{real}}\left[\|x - D_\theta(E_\phi(x))\|_2^2\right] + d\left(\mu(dz),\, E_\phi \sharp \mathbb{P}^{real}\right)^2$$

The statistical distance $d$ requires special properties: for instance, it has to possess a formula as an expectation, because the loss function needs to be optimized by stochastic optimization algorithms. Several distances can be chosen, and this has given rise to several flavors of VAEs.
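As one concrete and strictly illustrative possibility (only one of the flavors, not the general method), the distance $d$ can be estimated by the kernel maximum mean discrepancy (MMD) between a batch of encoded points and a batch drawn from the latent prior; the RBF kernel and bandwidth below are arbitrary choices:

```python
import torch

def rbf_kernel(a, b, sigma=1.0):
    # Gram matrix of an RBF kernel between the rows of a and the rows of b
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_squared(z_q, z_p, sigma=1.0):
    """Biased batch estimate of MMD^2 between encoded codes z_q = E_phi(x) and prior samples z_p."""
    k_qq = rbf_kernel(z_q, z_q, sigma).mean()
    k_pp = rbf_kernel(z_p, z_p, sigma).mean()
    k_qp = rbf_kernel(z_q, z_p, sigma).mean()
    return k_qq + k_pp - 2 * k_qp
```

The total loss is then the batch reconstruction error plus this distance term, optimized by stochastic gradient methods as in the formula above.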


References

1. Kingma, Diederik P.; Welling, Max (2022-12-10). "Auto-Encoding Variational Bayes". arXiv:1312.6114 [stat.ML].
2. Pinheiro Cinelli, Lucas; et al. (2021). "Variational Autoencoder". Variational Methods for Machine Learning with Applications to Deep Networks. Springer. pp. 111–149. doi:10.1007/978-3-030-70679-1_5. ISBN 978-3-030-70681-4. S2CID 240802776.
3. Dilokthanakul, Nat; Mediano, Pedro A. M.; Garnelo, Marta; Lee, Matthew C. H.; Salimbeni, Hugh; Arulkumaran, Kai; Shanahan, Murray (2017-01-13). "Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders". arXiv:1611.02648 [cs.LG].
4. Hsu, Wei-Ning; Zhang, Yu; Glass, James (December 2017). "Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation". 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). pp. 16–23. arXiv:1707.06265. doi:10.1109/ASRU.2017.8268911. ISBN 978-1-5090-4788-8. S2CID 22681625.
5. Ehsan Abbasnejad, M.; Dick, Anthony; van den Hengel, Anton (2017). Infinite Variational Autoencoder for Semi-Supervised Learning. pp. 5888–5897.
6. Xu, Weidi; Sun, Haoze; Deng, Chao; Tan, Ying (2017-02-12). "Variational Autoencoder for Semi-Supervised Text Classification". Proceedings of the AAAI Conference on Artificial Intelligence. 31 (1). doi:10.1609/aaai.v31i1.10966. S2CID 2060721.
7. Kameoka, Hirokazu; Li, Li; Inoue, Shota; Makino, Shoji (2019-09-01). "Supervised Determined Source Separation with Multichannel Variational Autoencoder". Neural Computation. 31 (9): 1891–1914. doi:10.1162/neco_a_01217. PMID 31335290. S2CID 198168155.
8. Kingma, Diederik P.; Welling, Max (2013-12-20). "Auto-Encoding Variational Bayes". arXiv:1312.6114 [stat.ML].
9. "From Autoencoder to Beta-VAE". Lil'Log. 2018-08-12.
10. Rezende, Danilo Jimenez; Mohamed, Shakir; Wierstra, Daan (2014-06-18). "Stochastic Backpropagation and Approximate Inference in Deep Generative Models". International Conference on Machine Learning. PMLR: 1278–1286. arXiv:1401.4082.
11. Bengio, Yoshua; Courville, Aaron; Vincent, Pascal (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1798–1828. arXiv:1206.5538. doi:10.1109/TPAMI.2013.50. ISSN 1939-3539. PMID 23787338. S2CID 393948.
12. Kingma, Diederik P.; Rezende, Danilo J.; Mohamed, Shakir; Welling, Max (2014-10-31). "Semi-Supervised Learning with Deep Generative Models". arXiv:1406.5298 [cs.LG].
13. Higgins, Irina; Matthey, Loic; Pal, Arka; Burgess, Christopher; Glorot, Xavier; Botvinick, Matthew; Mohamed, Shakir; Lerchner, Alexander (2016-11-04). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. NeurIPS.
14. Burgess, Christopher P.; Higgins, Irina; Pal, Arka; Matthey, Loic; Watters, Nick; Desjardins, Guillaume; Lerchner, Alexander (2018-04-10). "Understanding disentangling in β-VAE". arXiv:1804.03599 [stat.ML].
15. Sohn, Kihyuk; Lee, Honglak; Yan, Xinchen (2015-01-01). Learning Structured Output Representation using Deep Conditional Generative Models (PDF). NeurIPS.
16. Dai, Bin; Wipf, David (2019-10-30). "Diagnosing and Enhancing VAE Models". arXiv:1903.05789 [cs.LG].
17. Dorta, Garoe; Vicente, Sara; Agapito, Lourdes; Campbell, Neill D. F.; Simpson, Ivor (2018-07-31). "Training VAEs Under Structured Residuals". arXiv:1804.01050 [stat.ML].
18. Larsen, Anders Boesen Lindbo; Sønderby, Søren Kaae; Larochelle, Hugo; Winther, Ole (2016-06-11). "Autoencoding beyond pixels using a learned similarity metric". International Conference on Machine Learning. PMLR: 1558–1566. arXiv:1512.09300.
19. Bao, Jianmin; Chen, Dong; Wen, Fang; Li, Houqiang; Hua, Gang (2017). "CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training". pp. 2745–2754. arXiv:1703.10155 [cs.CV].
20. Gao, Rui; Hou, Xingsong; Qin, Jie; Chen, Jiaxin; Liu, Li; Zhu, Fan; Zhang, Zhao; Shao, Ling (2020). "Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning". IEEE Transactions on Image Processing. 29: 3665–3680. Bibcode:2020ITIP...29.3665G. doi:10.1109/TIP.2020.2964429. ISSN 1941-0042. PMID 31940538. S2CID 210334032.
21. Drefs, J.; Guiraud, E.; Panagiotou, F.; Lücke, J. (2023). "Direct evolutionary optimization of variational autoencoders with binary latents". Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 13715. Springer Nature Switzerland. pp. 357–372. doi:10.1007/978-3-031-26409-2_22. ISBN 978-3-031-26408-5.
22. Kingma, Diederik P.; Welling, Max (2022-12-10). "Auto-Encoding Variational Bayes". arXiv:1312.6114 [stat.ML].
23. Kolouri, Soheil; Pope, Phillip E.; Martin, Charles E.; Rohde, Gustavo K. (2019). "Sliced Wasserstein Auto-Encoders". International Conference on Learning Representations.
24. Turinici, Gabriel (2021). "Radon-Sobolev Variational Auto-Encoders". Neural Networks. 141: 294–305. arXiv:1911.13135. doi:10.1016/j.neunet.2021.04.018. ISSN 0893-6080. PMID 33933889.
25. Gretton, A.; Li, Y.; Swersky, K.; Zemel, R.; Turner, R. (2017). "A Polya Contagion Model for Networks". IEEE Transactions on Control of Network Systems. 5 (4): 1998–2010. arXiv:1705.02239. doi:10.1109/TCNS.2017.2781467.
26. Tolstikhin, I.; Bousquet, O.; Gelly, S.; Schölkopf, B. (2018). "Wasserstein Auto-Encoders". arXiv:1711.01558 [stat.ML].
27. Louizos, C.; Shi, X.; Swersky, K.; Li, Y.; Welling, M. (2019). "Kernelized Variational Autoencoders". arXiv:1901.02401 [astro-ph.CO].
