ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented it.[2][3][1][4][5] It was developed by professor Bernard Widrow and his doctoral student Marcian Hoff at Stanford University in 1960. It is based on the perceptron and consists of weights, a bias, and a summation function. The weights and biases were implemented by rheostats (as seen in the "knobby ADALINE"), and later, memistors.
The difference between ADALINE and the standard (Rosenblatt) perceptron is in how they learn. ADALINE unit weights are adjusted to match a teacher signal before the Heaviside function is applied (see figure), whereas the standard perceptron unit weights are adjusted to match the correct output after the Heaviside function is applied.
A multilayer network of ADALINE units is known as a MADALINE.
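ADALINE is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
- $x$, the input vector
- $w$, the weight vector
- $n$, the number of inputs
- $\theta$, some constant (the bias)
- $y$, the output of the model,
the output is:
$$y = \sum_{j=1}^{n} x_j w_j + \theta$$
If we further assume that $x_0 = 1$ and $w_0 = \theta$, then the output further reduces to:
$$y = \sum_{j=0}^{n} x_j w_j$$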
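The following is a minimal sketch of the output computation, showing that folding the bias into the weight vector gives the same result as adding it separately. The function names and the numeric values are illustrative assumptions, not taken from the original device or papers.

```python
def adaline_output(x, w, theta):
    """Linear output: sum of x_j * w_j over the inputs, plus the bias theta."""
    return sum(xj * wj for xj, wj in zip(x, w)) + theta


def adaline_output_augmented(x, w, theta):
    """Same output with the bias folded in as x_0 = 1 and w_0 = theta."""
    x_aug = [1.0] + list(x)
    w_aug = [theta] + list(w)
    return sum(xj * wj for xj, wj in zip(x_aug, w_aug))


# Hypothetical example values.
x = [0.5, -1.0, 2.0]
w = [0.2, 0.4, -0.1]
theta = 0.3

y_plain = adaline_output(x, w, theta)
y_aug = adaline_output_augmented(x, w, theta)
```

Both calls compute the same weighted sum, so `y_plain` and `y_aug` agree up to floating-point rounding.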
The learning rule used by ADALINE is the LMS ("least mean squares") algorithm, a special case of gradient descent.
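Given the following:
- $\eta$, the learning rate (some positive constant)
- $y$, the output of the model
- $o$, the target (desired) output
- $E = (o - y)^2$, the square of the error,
the LMS algorithm updates the weights as follows:
$$w \leftarrow w + \eta (o - y) x$$
where $x$ is the input vector and $w$ the weight vector. This update rule minimizes $E$, the square of the error,[6] and is in fact the stochastic gradient descent update for linear regression.[7] Indeed, $\partial E / \partial w = -2 (o - y) x$, so the update is a step against the gradient, with the constant factor 2 absorbed into $\eta$.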
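As an illustration of the LMS update, the sketch below applies it repeatedly to a small noise-free data set. The toy data, the learning rate of 0.1, and the number of passes are assumptions chosen for the example; the bias is handled by a constant first input of 1.

```python
def lms_step(w, x, o, eta):
    """One LMS update: w <- w + eta * (o - y) * x, where x[0] == 1 is the bias input."""
    y = sum(wj * xj for wj, xj in zip(w, x))  # linear output, before any thresholding
    error = o - y
    return [wj + eta * error * xj for wj, xj in zip(w, x)]


# Hypothetical toy data: targets follow o = 1 + 2 * x1 exactly (no noise).
samples = [([1.0, x1], 1.0 + 2.0 * x1) for x1 in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w = [0.0, 0.0]
for epoch in range(200):
    for x, o in samples:
        w = lms_step(w, x, o, eta=0.1)
```

Because the targets here are an exact linear function of the input, the weights approach the generating values (bias 1, slope 2). With noisy targets the weights would instead fluctuate around the least-squares solution.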
MADALINE (Many ADALINE[8]) is a three-layer (input, hidden, output), fully connected, feedforward neural network architecture for classification that uses ADALINE units in its hidden and output layers. That is, its activation function is the sign function.[9] The three-layer network uses memistors. As the sign function is non-differentiable, backpropagation cannot be used to train MADALINE networks. Hence, three different training algorithms have been suggested, called Rule I, Rule II and Rule III.
Despite many attempts, Widrow and his colleagues never succeeded in training more than a single layer of weights in a MADALINE model. This was until Widrow saw the backpropagation algorithm at a 1985 conference in Snowbird, Utah.[10]
MADALINE Rule 1 (MRI) - The first of these dates back to 1962.[11] It consists of two layers: the first is made of ADALINE units (let the output of the $i$-th ADALINE unit be $o_i$); the second layer has two units. One is a majority-voting unit that takes in all $o_i$ and outputs +1 if there are more positives than negatives, and vice versa. The other is a "job assigner": suppose the desired output is -1 and differs from the majority-voted output; then the job assigner calculates the minimal number of ADALINE units that must change their outputs from positive to negative, picks those ADALINE units that are closest to being negative, and makes them update their weights according to the ADALINE learning rule. It was thought of as a form of "minimal disturbance principle".[12]
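The majority vote and the job assigner described above can be sketched roughly as follows. This is an illustrative reading of the description rather than the original hardware procedure: the function names, the odd number of units, the convention that a sum of exactly 0 counts as +1, and the use of a single LMS step per selected unit are all assumptions.

```python
def sign(v):
    # Assumed convention: a sum of exactly 0 is treated as +1.
    return 1 if v >= 0 else -1


def linear_sum(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def majority_vote(outputs):
    # With an odd number of +1/-1 outputs the sum is never zero.
    return sign(sum(outputs))


def mri_step(weights, x, desired, eta):
    """One Rule I style update on a single example; x[0] == 1 is the bias input."""
    sums = [linear_sum(w, x) for w in weights]
    outputs = [sign(s) for s in sums]
    if majority_vote(outputs) == desired:
        return [list(w) for w in weights]  # vote already correct, no change

    # Job assigner: minimal number of units that must flip to win the vote.
    agree = sum(1 for o in outputs if o == desired)
    needed = len(weights) // 2 + 1 - agree

    # Among the disagreeing units, pick those closest to the decision boundary.
    wrong = sorted((i for i, o in enumerate(outputs) if o != desired),
                   key=lambda i: abs(sums[i]))

    new_weights = [list(w) for w in weights]
    for i in wrong[:needed]:
        err = desired - sums[i]  # LMS error on the linear sum
        new_weights[i] = [wj + eta * err * xj for wj, xj in zip(weights[i], x)]
    return new_weights


# Hypothetical example: three units, one input plus the bias input.
x = [1.0, 1.0]
weights = [[0.1, 0.1], [-0.2, -0.1], [-1.0, -1.0]]
new_weights = mri_step(weights, x, desired=1, eta=0.5)
vote_after = majority_vote([sign(linear_sum(w, x)) for w in new_weights])
```

In this example only the second unit, whose sum is nearest to zero, is updated; the other two keep their weights. A single LMS step is not guaranteed to flip a unit in general, so a fuller implementation would repeat the step or revisit the example.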
The largest MADALINE machine built had 1000 weights, each implemented by a memistor. It was built in 1963 and used MRI for learning.[12][13]
Some MADALINE machines were demonstrated to perform tasks including inverted pendulum balancing, weather forecasting, and speech recognition.[3]
MADALINE Rule 2 (MRII) - The second training algorithm, described in 1988, improved on Rule I.[8] The Rule II training algorithm is based on a principle called "minimal disturbance". It proceeds by looping over training examples, and for each example, it:
- finds the hidden layer unit (ADALINE classifier) with the lowest confidence in its prediction,
- tentatively flips the sign of the unit,
- accepts or rejects the change based on whether the network's error is reduced,
- stops when the error is zero.
Additionally, when flipping single units' signs does not drive the error to zero for a particular example, the training algorithm starts flipping pairs of units' signs, then triples of units, etc.[8]
MADALINE Rule 3 - The third "Rule" applied to a modified network with sigmoid activations instead of sign; it was later found to be equivalent to backpropagation.[12]
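The Rule II trial-flip procedure for a single training example can be sketched as below, restricted to single-unit flips (pairs and triples are omitted). This is a simplified illustration under stated assumptions: a single output ADALINE with fixed weights `w_out`, confidence measured by the absolute value of a unit's linear sum, and an accepted flip realized by one normalized LMS step. None of these names or choices come from the cited papers.

```python
def sign(v):
    # Assumed convention: a sum of exactly 0 is treated as +1.
    return 1 if v >= 0 else -1


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def network_output(hidden_outputs, w_out):
    # Output ADALINE over the hidden outputs, with a bias input of 1 prepended.
    return sign(dot(w_out, [1] + list(hidden_outputs)))


def mrii_step(hidden_w, w_out, x, desired):
    """Single-flip Rule II style pass over one example; x[0] == 1 is the bias input."""
    hidden_w = [list(w) for w in hidden_w]
    sums = [dot(w, x) for w in hidden_w]
    outputs = [sign(s) for s in sums]
    if network_output(outputs, w_out) == desired:
        return hidden_w  # error is already zero

    # Try the least confident units first (smallest |linear sum|).
    for i in sorted(range(len(hidden_w)), key=lambda k: abs(sums[k])):
        trial = list(outputs)
        trial[i] = -trial[i]  # tentative sign flip
        if network_output(trial, w_out) == desired:
            # Accept: move unit i's linear sum onto the flipped target
            # with one normalized LMS step.
            step = (trial[i] - sums[i]) / dot(x, x)
            hidden_w[i] = [wj + step * xj for wj, xj in zip(hidden_w[i], x)]
            return hidden_w
    return hidden_w  # no single flip fixes this example


# Hypothetical example: three hidden units and a majority-like output unit.
x = [1.0, 2.0]
hidden_w = [[0.5, 0.5], [-0.1, -0.1], [-1.0, -1.0]]
w_out = [0.0, 1.0, 1.0, 1.0]
new_hidden_w = mrii_step(hidden_w, w_out, x, desired=1)
out_after = network_output([sign(dot(w, x)) for w in new_hidden_w], w_out)
```

Here the second hidden unit has the smallest absolute sum, its trial flip corrects the network output, and only its weights change. With a single output the network error is either zero or one, so "error is reduced" coincides with "output becomes correct".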