
Restricted Boltzmann machine

From Wikipedia, the free encyclopedia

Class of artificial neural network


Diagram of a restricted Boltzmann machine with three visible units and four hidden units (no bias units)

A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.^[1]

RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986,^[2] and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction,^[3] classification,^[4] collaborative filtering,^[5] feature learning,^[6] topic modelling,^[7] immunology,^[8] and even many-body quantum mechanics.^[9]^[10]^[11]

They can be trained in either supervised or unsupervised ways, depending on the task.^{[citation needed]}

As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph:

a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and
there are no connections between nodes within a group.

By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.^[12]

Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation.^[13]

Structure


The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights $W$ of size $m\times n$. Each weight element $w_{i,j}$ of the matrix is associated with the connection between the visible (input) unit $v_{i}$ and the hidden unit $h_{j}$. In addition, there are bias weights (offsets) $a_{i}$ for $v_{i}$ and $b_{j}$ for $h_{j}$. Given the weights and biases, the energy of a configuration (pair of Boolean vectors) $(v,h)$ is defined as

$$E(v,h)=-\sum _{i}a_{i}v_{i}-\sum _{j}b_{j}h_{j}-\sum _{i}\sum _{j}v_{i}w_{i,j}h_{j}$$

or, in matrix notation,

$$E(v,h)=-a^{\mathrm {T} }v-b^{\mathrm {T} }h-v^{\mathrm {T} }Wh.$$
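As a minimal sketch of the matrix form above, the energy of one configuration can be evaluated with NumPy. The function name `energy` and the toy parameter values are illustrative assumptions, not part of the model definition.

```python
import numpy as np

def energy(v, h, W, a, b):
    """Energy E(v, h) = -a^T v - b^T h - v^T W h of a binary RBM."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Toy parameters (made-up values): m = 3 visible units, n = 2 hidden units.
W = np.array([[0.5, -1.0],
              [1.5,  0.0],
              [-0.5, 2.0]])          # shape (m, n)
a = np.array([0.1, -0.2, 0.3])       # visible biases
b = np.array([-0.4, 0.6])            # hidden biases

v = np.array([1, 0, 1])              # one visible configuration
h = np.array([0, 1])                 # one hidden configuration

e = energy(v, h, W, a, b)

# The same value computed from the element-wise double sum.
e_sum = (-sum(a[i] * v[i] for i in range(3))
         - sum(b[j] * h[j] for j in range(2))
         - sum(v[i] * W[i, j] * h[j] for i in range(3) for j in range(2)))
print(e, e_sum)
```

Lower energy corresponds to higher probability under the distribution defined next, so configurations in which active units are joined by large positive weights are favoured.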

This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,^[14]

$$P(v,h)={\frac {1}{Z}}e^{-E(v,h)}$$

where $Z$ is a partition function defined as the sum of $e^{-E(v,h)}$ over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of $P(v,h)$ over all possible hidden layer configurations,^[14]

$$P(v)={\frac {1}{Z}}\sum _{\{h\}}e^{-E(v,h)}$$

and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations.^[12] That is, for $m$ visible units and $n$ hidden units, the conditional probability of a configuration of the visible units $v$, given a configuration of the hidden units $h$, is

$$P(v|h)=\prod _{i=1}^{m}P(v_{i}|h)$$

Conversely, the conditional probability of $h$ given $v$ is

$$P(h|v)=\prod _{j=1}^{n}P(h_{j}|v)$$
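For a very small RBM, the partition function, the marginal, and the factorization of the conditional can all be checked by brute-force enumeration. The sketch below assumes random toy parameters and helper names (`p_joint`, `p_visible`, `product_form`) chosen only for illustration. Enumeration is feasible here because there are only $2^{m+n}=32$ configurations, and it does not scale to realistic sizes.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                              # toy sizes (assumption)
W = rng.normal(size=(m, n))
a = rng.normal(size=m)
b = rng.normal(size=n)

def energy(v, h):
    return -(a @ v) - (b @ h) - (v @ W @ h)

vis_states = [np.array(s) for s in itertools.product([0, 1], repeat=m)]
hid_states = [np.array(s) for s in itertools.product([0, 1], repeat=n)]

# Partition function: sum of exp(-E) over every (v, h) configuration.
Z = sum(np.exp(-energy(v, h)) for v in vis_states for h in hid_states)

def p_joint(v, h):
    return np.exp(-energy(v, h)) / Z

def p_visible(v):
    # Marginal P(v): sum the joint over all hidden configurations.
    return sum(p_joint(v, h) for h in hid_states)

total = sum(p_visible(v) for v in vis_states)   # should be 1

# Check that P(h | v) equals the product of its per-unit marginals.
v = vis_states[5]
pv = p_visible(v)
cond = {tuple(h): p_joint(v, h) / pv for h in hid_states}
p_on = [sum(p for hs, p in cond.items() if hs[j] == 1) for j in range(n)]

def product_form(h):
    return np.prod([p_on[j] if h[j] == 1 else 1.0 - p_on[j] for j in range(n)])

max_gap = max(abs(cond[tuple(h)] - product_form(h)) for h in hid_states)
print(total, max_gap)
```

The gap between the enumerated conditional and the product form is zero up to floating-point error, which is the conditional independence that the bipartite structure guarantees.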

The individual activation probabilities are given by

$$P(h_{j}=1|v)=\sigma \left(b_{j}+\sum _{i=1}^{m}w_{i,j}v_{i}\right)$$

and

$$P(v_{i}=1|h)=\sigma \left(a_{i}+\sum _{j=1}^{n}w_{i,j}h_{j}\right)$$

where $\sigma$ denotes the logistic sigmoid.
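Because each layer factorizes given the other, a whole layer can be sampled in one vectorized step. The following is a sketch under assumed toy sizes and small random weights. The helper names (`p_h_given_v`, `p_v_given_h`, `sample_bernoulli`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    # P(h_j = 1 | v) = sigma(b_j + sum_i w_ij v_i), for all j at once.
    return sigmoid(b + v @ W)

def p_v_given_h(h, W, a):
    # P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j), for all i at once.
    return sigmoid(a + W @ h)

def sample_bernoulli(p, rng):
    # Draw independent 0/1 values with success probabilities p.
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
m, n = 4, 3                                   # toy sizes (assumption)
W = rng.normal(scale=0.1, size=(m, n))
a = np.zeros(m)
b = np.zeros(n)

v = np.array([1.0, 0.0, 1.0, 1.0])
ph = p_h_given_v(v, W, b)                     # hidden activation probabilities
h = sample_bernoulli(ph, rng)                 # sampled hidden states
pv = p_v_given_h(h, W, a)                     # visible probabilities given h
```

Alternating these two sampling steps is the block Gibbs sampling used during training.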

The visible units of a restricted Boltzmann machine can be multinomial, although the hidden units are Bernoulli.^{[clarification needed]} In this case, the logistic function for visible units is replaced by the softmax function

$$P(v_{i}^{k}=1|h)={\frac {\exp(a_{i}^{k}+\sum _{j}W_{ij}^{k}h_{j})}{\sum _{k'=1}^{K}\exp(a_{i}^{k'}+\sum _{j}W_{ij}^{k'}h_{j})}}$$

where $K$ is the number of discrete values that each visible unit can take. Such models are applied in topic modeling^[7] and recommender systems.^[5]
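The softmax conditional can be sketched as follows. The array layout is an assumption made for this example: the weights are stored as an array of shape `(m, n, K)` so that `W[i, j, k]` plays the role of $W_{ij}^{k}$, and the visible biases have shape `(m, K)`.

```python
import numpy as np

def p_v_given_h_softmax(h, W, a):
    """Return probs[i, k] = P(v_i^k = 1 | h) for multinomial visible units.

    W has shape (m, n, K), a has shape (m, K), h has shape (n,).
    """
    logits = a + np.einsum("ijk,j->ik", W, h)        # shape (m, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
m, n, K = 4, 3, 5                                    # toy sizes (assumption)
W = rng.normal(scale=0.1, size=(m, n, K))
a = np.zeros((m, K))
h = np.array([1.0, 0.0, 1.0])

probs = p_v_given_h_softmax(h, W, a)
# Sample one of the K values for each visible unit.
v_values = np.array([rng.choice(K, p=probs[i]) for i in range(m)])
```

Subtracting the row maximum before exponentiating leaves the softmax unchanged and avoids overflow for large logits.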

Relation to other models


Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.^[15]^[16]

The graphical model of RBMs corresponds to that of factor analysis.^[17]

Training algorithm


Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set $V$ (a matrix, each row of which is treated as a visible vector $v$),

$$\arg \max _{W}\prod _{v\in V}P(v)$$

or equivalently, to maximize the expected log probability of a training sample $v$ selected randomly from $V$:^[15]^[16]

$$\arg \max _{W}\mathbb {E} \left[\log P(v)\right]$$
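For orientation, differentiating $\log P(v)$ with respect to a single weight gives a two-term expression. This is the standard identity for this class of models rather than something stated in the text above, and $v'$ is a summation variable introduced here:

$$\frac{\partial \log P(v)}{\partial w_{i,j}}=v_{i}\,P(h_{j}=1|v)-\sum _{v'}P(v')\,v'_{i}\,P(h_{j}=1|v')$$

The first term depends only on the training sample. The second is an expectation under the model distribution and involves a sum over all $2^{m}$ visible configurations, which is why exact maximum-likelihood training is impractical for all but very small models and approximate methods are used instead.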

The algorithm most often used to train RBMs, that is, to optimize the weight matrix $W$, is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models.^[18]^[19] The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute the weight update.

The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows:

Take a training sample $v$, compute the probabilities of the hidden units and sample a hidden activation vector $h$ from this probability distribution.
Compute the outer product of $v$ and $h$ and call this the positive gradient.
From $h$, sample a reconstruction $v'$ of the visible units, then resample the hidden activations $h'$ from this. (Gibbs sampling step)
Compute the outer product of $v'$ and $h'$ and call this the negative gradient.
Let the update to the weight matrix $W$ be the positive gradient minus the negative gradient, times some learning rate: $\Delta W=\epsilon (vh^{\mathsf {T}}-v'h'^{\mathsf {T}})$.
Update the biases $a$ and $b$ analogously: $\Delta a=\epsilon (v-v')$, $\Delta b=\epsilon (h-h')$.
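The procedure above can be sketched in NumPy as follows. The function name `cd1_update`, the learning rate, the layer sizes, and the two-pattern toy data set are all assumptions for illustration. The sketch follows the listed steps literally, using sampled binary states throughout; practical implementations often substitute probabilities for some of the sampled states to reduce noise.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def cd1_update(v, W, a, b, lr, rng):
    """One CD-1 update for a single binary training vector v."""
    # Positive phase: sample h from P(h | v) and form the outer product.
    h = sample_bernoulli(sigmoid(b + v @ W), rng)
    positive = np.outer(v, h)

    # Gibbs step: reconstruct v' from h, then resample h' from v'.
    v_rec = sample_bernoulli(sigmoid(a + W @ h), rng)
    h_rec = sample_bernoulli(sigmoid(b + v_rec @ W), rng)
    negative = np.outer(v_rec, h_rec)

    # Parameter updates: positive minus negative statistics, times lr.
    W_new = W + lr * (positive - negative)
    a_new = a + lr * (v - v_rec)
    b_new = b + lr * (h - h_rec)
    return W_new, a_new, b_new

# Usage on a toy data set of two binary patterns (made-up data).
rng = np.random.default_rng(0)
m, n, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(m, n))
a = np.zeros(m)
b = np.zeros(n)
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)

for epoch in range(200):
    for v in data:
        W, a, b = cd1_update(v, W, a, b, lr, rng)
```

Because every entry of the two outer products is 0 or 1, a single update changes each weight by at most the learning rate.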

A Practical Guide to Training RBMs written by Hinton can be found on his homepage.^[14]

Stacked Restricted Boltzmann Machine


Literature


Fischer, Asja; Igel, Christian (2012), "An Introduction to Restricted Boltzmann Machines", Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Lecture Notes in Computer Science, vol. 7441, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 14–36, doi:10.1007/978-3-642-33275-3_2, ISBN 978-3-642-33274-6

References


1. Sherrington, David; Kirkpatrick, Scott (1975), "Solvable Model of a Spin-Glass", Physical Review Letters, 35 (35): 1792–1796, Bibcode:1975PhRvL..35.1792S, doi:10.1103/PhysRevLett.35.1792
2. Smolensky, Paul (1986). "Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory" (PDF). In Rumelhart, David E.; McLelland, James L. (eds.). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press. pp. 194–281. ISBN 0-262-68053-X.
3. Hinton, G. E.; Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks" (PDF). Science. 313 (5786): 504–507. Bibcode:2006Sci...313..504H. doi:10.1126/science.1127647. PMID 16873662. S2CID 1658773. Archived from the original (PDF) on 2015-12-23. Retrieved 2015-12-02.
4. Larochelle, H.; Bengio, Y. (2008). Classification using discriminative restricted Boltzmann machines (PDF). Proceedings of the 25th international conference on Machine learning - ICML '08. p. 536. doi:10.1145/1390156.1390224. ISBN 978-1-60558-205-4.
5. Salakhutdinov, R.; Mnih, A.; Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th international conference on Machine learning - ICML '07. p. 791. doi:10.1145/1273496.1273596. ISBN 978-1-59593-793-3.
6. Coates, Adam; Lee, Honglak; Ng, Andrew Y. (2011). An analysis of single-layer networks in unsupervised feature learning (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS). Archived from the original (PDF) on 2014-12-20. Retrieved 2014-12-19.
7. Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model. Archived 2012-05-25 at the Wayback Machine. Neural Information Processing Systems 23.
8. Bravi, Barbara; Di Gioacchino, Andrea; Fernandez-de-Cossio-Diaz, Jorge; Walczak, Aleksandra M; Mora, Thierry; Cocco, Simona; Monasson, Rémi (2023-09-08). Bitbol, Anne-Florence; Eisen, Michael B (eds.). "A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity". eLife. 12 e85126. doi:10.7554/eLife.85126. ISSN 2050-084X. PMC 10522340. PMID 37681658.
9. Carleo, Giuseppe; Troyer, Matthias (2017-02-10). "Solving the quantum many-body problem with artificial neural networks". Science. 355 (6325): 602–606. arXiv:1606.02318. Bibcode:2017Sci...355..602C. doi:10.1126/science.aag2302. ISSN 0036-8075. PMID 28183973. S2CID 206651104.
10. Melko, Roger G.; Carleo, Giuseppe; Carrasquilla, Juan; Cirac, J. Ignacio (September 2019). "Restricted Boltzmann machines in quantum physics". Nature Physics. 15 (9): 887–892. Bibcode:2019NatPh..15..887M. doi:10.1038/s41567-019-0545-1. ISSN 1745-2481. S2CID 256704838.
11. Pan, Ruizhi; Clark, Charles W. (2024). "Efficiency of neural-network state representations of one-dimensional quantum spin systems". Physical Review Research. 6 (2) 023193. arXiv:2302.00173. Bibcode:2024PhRvR...6b3193P. doi:10.1103/PhysRevResearch.6.023193.
12. Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics.
13. Hinton, G. (2009). "Deep belief networks". Scholarpedia. 4 (5): 5947. Bibcode:2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947.
14. Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto.
15. Sutskever, Ilya; Tieleman, Tijmen (2010). "On the convergence properties of contrastive divergence" (PDF). Proc. 13th Int'l Conf. On AI and Statistics (AISTATS). Archived from the original (PDF) on 2015-06-10.
16. Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction. Archived 2015-06-10 at the Wayback Machine. Pattern Recognition 47, pp. 25–39, 2014.
17. María Angélica Cueto; Jason Morton; Bernd Sturmfels (2010). "Geometry of the restricted Boltzmann machine". Algebraic Methods in Statistics and Probability. 516. American Mathematical Society. arXiv:0908.4425. Bibcode:2009arXiv0908.4425A.
18. Geoffrey Hinton (1999). Products of Experts. ICANN 1999.
19. Hinton, G. E. (2002). "Training Products of Experts by Minimizing Contrastive Divergence" (PDF). Neural Computation. 14 (8): 1771–1800. doi:10.1162/089976602760128018. PMID 12180402. S2CID 207596505.

Bibliography


Chen, Edwin (2011-07-18). "Introduction to Restricted Boltzmann Machines". Edwin Chen's blog.
Nicholson, Chris; Gibson, Adam. "A Beginner's Tutorial for Restricted Boltzmann Machines". Deeplearning4j Documentation. Archived from the original on 2017-02-11. Retrieved 2018-11-15.
Nicholson, Chris; Gibson, Adam. "Understanding RBMs". Deeplearning4j Documentation. Archived from the original on 2016-09-20. Retrieved 2014-12-29.

External links


Python implementation of Bernoulli RBM and tutorial
SimpleRBM is a very small RBM implementation (24 kB), useful for learning how RBMs learn and work.
Julia implementation of restricted Boltzmann machines: https://github.com/cossio/RestrictedBoltzmannMachines.jl