a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and
there are no connections between nodes within a group.
By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.[12]
The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights $W$ of size $m \times n$. Each weight element $w_{ij}$ of the matrix is associated with the connection between the visible (input) unit $v_i$ and the hidden unit $h_j$. In addition, there are bias weights (offsets) $a_i$ for $v_i$ and $b_j$ for $h_j$. Given the weights and biases, the energy of a configuration (pair of Boolean vectors) (v,h) is defined as
$$E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{ij} h_j$$
or, in matrix notation,
$$E(v,h) = -a^\top v - b^\top h - v^\top W h.$$
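As a quick numerical check that the summation form and the matrix form of the energy agree, the sketch below evaluates both on a toy RBM with 3 visible and 2 hidden units. The parameter values and the helper names (`dot`, `energy_sum`, `energy_matrix`) are illustrative assumptions; plain Python lists stand in for the vectors and for $W$, with `W[i][j]` playing the role of $w_{ij}$.

```python
def dot(x, y):
    # Inner product of two equal-length lists
    return sum(xi * yi for xi, yi in zip(x, y))

def energy_sum(v, h, a, b, W):
    # E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_i sum_j v_i w_ij h_j
    return (-sum(a[i] * v[i] for i in range(len(v)))
            - sum(b[j] * h[j] for j in range(len(h)))
            - sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))))

def energy_matrix(v, h, a, b, W):
    # E(v, h) = -a^T v - b^T h - v^T W h, computing the vector W h first
    Wh = [dot(row, h) for row in W]
    return -dot(a, v) - dot(b, h) - dot(v, Wh)

# Hypothetical toy parameters: 3 visible units, 2 hidden units
a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[0.2, -0.1], [0.4, 0.0], [-0.3, 0.6]]
v, h = [1, 0, 1], [1, 1]
print(energy_sum(v, h, a, b, W), energy_matrix(v, h, a, b, W))  # both ≈ -0.8
```

Lower energy corresponds to a more probable configuration once the energy is plugged into the Boltzmann distribution.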
This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,[14]
$$P(v,h) = \frac{1}{Z} e^{-E(v,h)}$$
where $Z$ is a partition function defined as the sum of $e^{-E(v,h)}$ over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of $P(v,h)$ over all possible hidden layer configurations,[14]
$$P(v) = \frac{1}{Z} \sum_{h} e^{-E(v,h)},$$
and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations.[12] That is, for m visible units and n hidden units, the conditional probability of a configuration of the visible units v, given a configuration of the hidden units h, is
$$P(v \mid h) = \prod_{i=1}^{m} P(v_i \mid h).$$
Conversely, the conditional probability of h given v is
$$P(h \mid v) = \prod_{j=1}^{n} P(h_j \mid v).$$
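For a model small enough to enumerate, the partition function, the marginal $P(v)$ and the factorization of $P(h \mid v)$ can all be verified by brute force. The sketch below assumes arbitrary toy parameters (3 visible, 2 hidden units), and every function name in it is hypothetical. Summing over all $2^{m+n}$ configurations is only feasible for tiny models, which is why $Z$ is intractable for RBMs of realistic size.

```python
import itertools
import math

def energy(v, h, a, b, W):
    # E(v, h) = -a^T v - b^T h - v^T W h
    return (-sum(ai * vi for ai, vi in zip(a, v))
            - sum(bj * hj for bj, hj in zip(b, h))
            - sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))))

def states(k):
    # All 2^k binary vectors of length k
    return list(itertools.product((0, 1), repeat=k))

def partition_function(a, b, W):
    # Z = sum over every (v, h) of exp(-E(v, h)); cost is exponential in m + n
    return sum(math.exp(-energy(v, h, a, b, W)) for v in states(len(a)) for h in states(len(b)))

def joint(v, h, a, b, W, Z):
    # P(v, h) = exp(-E(v, h)) / Z
    return math.exp(-energy(v, h, a, b, W)) / Z

def marginal_v(v, a, b, W, Z):
    # P(v) = (1/Z) sum_h exp(-E(v, h))
    return sum(joint(v, h, a, b, W, Z) for h in states(len(b)))

def cond_h(h, v, a, b, W, Z):
    # P(h | v) = P(v, h) / P(v)
    return joint(v, h, a, b, W, Z) / marginal_v(v, a, b, W, Z)

def hidden_factorizes(v, a, b, W, Z, tol=1e-12):
    # Check that P(h | v) equals prod_j P(h_j | v) for every hidden configuration h
    hs = states(len(b))
    for h in hs:
        per_unit = [sum(cond_h(g, v, a, b, W, Z) for g in hs if g[j] == h[j])
                    for j in range(len(b))]
        if abs(cond_h(h, v, a, b, W, Z) - math.prod(per_unit)) > tol:
            return False
    return True

a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[0.2, -0.1], [0.4, 0.0], [-0.3, 0.6]]
Z = partition_function(a, b, W)
print(sum(marginal_v(v, a, b, W, Z) for v in states(3)))  # ≈ 1.0
print(all(hidden_factorizes(v, a, b, W, Z) for v in states(3)))  # True
```

The check uses only the definitions of $P(v,h)$ and $P(h \mid v)$, so the factorization it confirms comes from the bipartite form of the energy rather than from the particular parameter values.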
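The individual activation probabilities are given by
$$P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_{i=1}^{m} w_{ij} v_i\right) \quad \text{and} \quad P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_{j=1}^{n} w_{ij} h_j\right),$$
where $\sigma$ denotes the logistic sigmoid.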
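The activation formulas translate directly into code. This minimal sketch assumes binary units, list-based parameters, arbitrary toy values, and hypothetical helper names; `sample_bernoulli` draws each unit separately, which is valid precisely because the units in one layer are conditionally independent given the other layer.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function sigma(x) = 1 / (1 + exp(-x))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def hidden_probs(v, b, W):
    # P(h_j = 1 | v) = sigma(b_j + sum_i w_ij v_i), one entry per hidden unit
    return [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v)))) for j in range(len(b))]

def visible_probs(h, a, W):
    # P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j), one entry per visible unit
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(a))]

def sample_bernoulli(probs, rng):
    # Each unit is switched on independently with its own probability
    return [1 if rng.random() < p else 0 for p in probs]

a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[0.2, -0.1], [0.4, 0.0], [-0.3, 0.6]]
rng = random.Random(42)
ph = hidden_probs([1, 0, 1], b, W)  # ≈ [0.599, 0.5]
h = sample_bernoulli(ph, rng)
pv = visible_probs(h, a, W)
```

Alternating `hidden_probs`/`sample_bernoulli` and `visible_probs`/`sample_bernoulli` is the block Gibbs sampling step that the training procedure relies on.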
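Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set $V$ (a matrix, each row of which is treated as a visible vector $v$),
$$\arg\max_W \prod_{v \in V} P(v),$$
or equivalently, to maximize the expected log probability of a training sample $v$ selected randomly from $V$:
$$\arg\max_W \mathbb{E}\left[\log P(v)\right].$$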
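For toy models the training objective itself can be evaluated exactly. Summing the binary hidden units out of $e^{-E(v,h)}$ gives $\log P(v) = -F(v) - \log Z$ with the free energy $F(v) = -\sum_i a_i v_i - \sum_j \log\left(1 + e^{b_j + \sum_i v_i w_{ij}}\right)$, and averaging $\log P(v)$ over the rows of $V$ gives the quantity being maximized. The sketch below uses that identity; the data matrix, parameters and function names are illustrative assumptions, and $\log Z$ is computed by enumerating all $2^m$ visible vectors, so it only scales to tiny models.

```python
import itertools
import math

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def free_energy(v, a, b, W):
    # F(v) = -sum_i a_i v_i - sum_j log(1 + exp(b_j + sum_i v_i w_ij))
    linear = sum(ai * vi for ai, vi in zip(a, v))
    hidden = sum(softplus(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
                 for j in range(len(b)))
    return -linear - hidden

def log_partition(a, b, W):
    # log Z = log sum_v exp(-F(v)), using the log-sum-exp trick for stability
    terms = [-free_energy(v, a, b, W) for v in itertools.product([0, 1], repeat=len(a))]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def mean_log_likelihood(data, a, b, W):
    # Average of log P(v) = -F(v) - log Z over the rows of the training matrix
    log_z = log_partition(a, b, W)
    return sum(-free_energy(v, a, b, W) - log_z for v in data) / len(data)

# Hypothetical toy model and training matrix (each row is a visible vector)
a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[0.2, -0.1], [0.4, 0.0], [-0.3, 0.6]]
V = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
print(mean_log_likelihood(V, a, b, W))
```

Maximizing the product of probabilities and maximizing this average log probability select the same parameters, since the logarithm is monotonic; the log form is simply better behaved numerically.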
The algorithm most often used to train RBMs, that is, to optimize the weight matrix $W$, is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models.[18][19] The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute the weight update.
The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows:
Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution.
Compute the outer product of v and h and call this the positive gradient.
From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this. (Gibbs sampling step)
Compute the outer product of v' and h' and call this the negative gradient.
Let the update to the weight matrix $W$ be the positive gradient minus the negative gradient, times some learning rate $\epsilon$: $\Delta W = \epsilon\,(v h^\top - v' h'^\top)$.
Update the biases a and b analogously: $\Delta a = \epsilon\,(v - v')$, $\Delta b = \epsilon\,(h - h')$.
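The CD-1 steps above can be sketched in plain Python. This is a minimal single-sample illustration under stated assumptions: binary units, list-based parameters, a toy training pattern, and hypothetical names (`cd1_update`, `sample`). Following the steps literally, it samples both h and h'; many practical implementations use hidden probabilities instead of samples for some of these statistics, which this sketch does not do.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sample(probs, rng):
    # Draw a binary vector with independent Bernoulli(p) entries
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v, a, b, W, lr, rng):
    """One CD-1 step on a single binary sample v; updates a, b and W in place."""
    m, n = len(a), len(b)
    # Step 1: hidden probabilities given v, then a sampled hidden vector h
    h = sample([sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(m))) for j in range(n)], rng)
    # Step 3 (Gibbs step): reconstruct v' from h, then resample h' from v'
    v_rec = sample([sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(n))) for i in range(m)], rng)
    h_rec = sample([sigmoid(b[j] + sum(v_rec[i] * W[i][j] for i in range(m))) for j in range(n)], rng)
    # Steps 2, 4, 5: positive outer product minus negative outer product, times lr
    for i in range(m):
        for j in range(n):
            W[i][j] += lr * (v[i] * h[j] - v_rec[i] * h_rec[j])
    # Step 6: bias updates
    for i in range(m):
        a[i] += lr * (v[i] - v_rec[i])
    for j in range(n):
        b[j] += lr * (h[j] - h_rec[j])

rng = random.Random(0)
m, n = 4, 2
a = [0.0] * m
b = [0.0] * n
W = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(m)]
pattern = [1, 1, 0, 0]  # hypothetical training sample
for _ in range(200):
    cd1_update(pattern, a, b, W, lr=0.1, rng=rng)
print(a)
```

Because $v_i - v'_i \ge 0$ whenever $v_i = 1$ (and $\le 0$ whenever $v_i = 0$), repeated updates on this single pattern can only raise the visible biases of its active units and lower those of its inactive units.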
A Practical Guide to Training RBMs written by Hinton can be found on his homepage.[14]
The difference between stacked restricted Boltzmann machines and an RBM is that in an RBM, lateral connections within a layer are prohibited to make analysis tractable. The stacked Boltzmann machine, on the other hand, consists of a combination of an unsupervised three-layer network with symmetric weights and a supervised, fine-tuned top layer for recognizing three classes.
Stacked Boltzmann machines are used for natural language understanding, document retrieval, image generation, and classification. These models are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the RBM, the stacked model has an undirected, symmetric top layer, with two-way asymmetric layers for the remaining connections. Its connections span three layers with asymmetric weights, and two networks are combined into one.
Stacked Boltzmann machines do share similarities with RBMs: the neuron of a stacked Boltzmann machine is a stochastic binary Hopfield neuron, the same as in the restricted Boltzmann machine. The energy of both models, which enters the Gibbs probability measure, is given by
$$E = -\frac{1}{2} \sum_{i,j} w_{ij} s_i s_j + \sum_i \theta_i s_i,$$
where $s_i$ is the binary state of unit $i$ and $\theta_i$ its threshold. The training process of the stacked model is similar to that of an RBM: it trains one layer at a time and approximates the equilibrium state with a 3-segment pass, without performing backpropagation. It uses both supervised and unsupervised training on different RBMs for pre-training for classification and recognition. The training uses contrastive divergence with Gibbs sampling:
$$\Delta w_{ij} = \epsilon\,(p_{ij} - p'_{ij}),$$
where $\epsilon$ is the learning rate.
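Training one layer at a time can be illustrated with a greedy layer-wise sketch: each RBM is trained with batch CD-1, and its hidden probabilities become the training data for the next RBM. Here $p_{ij}$ and $p'_{ij}$ are taken, as an assumption, to be the batch-averaged co-activation statistics $v_i\,P(h_j = 1 \mid v)$ computed on the data and on a one-step reconstruction. The class and function names, layer sizes, and toy data are hypothetical, and the sketch omits the 3-segment pass and any supervised fine-tuning.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class RBM:
    """Binary RBM trained with batch CD-1 (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.a = [0.0] * n_visible
        self.b = [0.0] * n_hidden
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(self.a))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(self.W[i][j] * h[j] for j in range(len(self.b))))
                for i in range(len(self.a))]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def train_batch(self, batch, lr):
        m, n, k = len(self.a), len(self.b), len(batch)
        p_data = [[0.0] * n for _ in range(m)]   # accumulates v_i * P(h_j=1 | v)
        p_recon = [[0.0] * n for _ in range(m)]  # same statistic on reconstructions
        da, db = [0.0] * m, [0.0] * n
        for v in batch:
            ph = self.hidden_probs(v)
            v_rec = self.sample(self.visible_probs(self.sample(ph)))
            ph_rec = self.hidden_probs(v_rec)
            for i in range(m):
                da[i] += v[i] - v_rec[i]
                for j in range(n):
                    p_data[i][j] += v[i] * ph[j]
                    p_recon[i][j] += v_rec[i] * ph_rec[j]
            for j in range(n):
                db[j] += ph[j] - ph_rec[j]
        # Delta w_ij = lr * (p_ij - p'_ij), with statistics averaged over the batch
        for i in range(m):
            self.a[i] += lr * da[i] / k
            for j in range(n):
                self.W[i][j] += lr * (p_data[i][j] - p_recon[i][j]) / k
        for j in range(n):
            self.b[j] += lr * db[j] / k

def train_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise training: each RBM learns from the previous layer's features."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            rbm.train_batch(inputs, lr)
        layers.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]  # features for the next layer
    return layers, inputs

data = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1], [1, 1, 0, 0, 0, 0]]
layers, features = train_stack(data, layer_sizes=[4, 2])
print(len(layers), [len(f) for f in features])  # 2 [2, 2, 2, 2]
```

Each layer is trained only on the fixed output of the layer below, so no gradient is propagated back through the stack during this pre-training phase.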
The strength of the stacked restricted Boltzmann machine is that it performs a non-linear transformation, so it is easy to expand, and it can give a hierarchical layer of features. Its weakness is that calculations involving integer and real-valued neurons are complicated. Contrastive divergence does not follow the gradient of any function, so its approximation to maximum likelihood is improvised.[14]
Fischer, Asja; Igel, Christian (2012). "An Introduction to Restricted Boltzmann Machines". Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Lecture Notes in Computer Science. Vol. 7441. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 14–36. doi:10.1007/978-3-642-33275-3_2. ISBN 978-3-642-33274-6.
Salakhutdinov, R.; Mnih, A.; Hinton, G. (2007). "Restricted Boltzmann machines for collaborative filtering". Proceedings of the 24th international conference on Machine learning - ICML '07. p. 791. doi:10.1145/1273496.1273596. ISBN 978-1-59593-793-3.
Cueto, María Angélica; Morton, Jason; Sturmfels, Bernd (2010). "Geometry of the restricted Boltzmann machine". Algebraic Methods in Statistics and Probability. 516. American Mathematical Society. arXiv:0908.4425. Bibcode:2009arXiv0908.4425A.