MULTIPLE MEMORY SELF-ORGANIZING
PATTERN RECOGNITION NETWORK
Background of the Invention
1. Field of the Invention
This invention relates to the field of pattern recognition and detection and, in particular, to a self-organizing network for pattern detection and matching having symmetrical triggering and learned characteristic patterns.
2. Background Art
Prior art artificial intelligence methods for pattern recognition systems tend to be subject to failures caused by small variations in the input pattern. This is called brittleness. Recently, biologically based neural network methods have been used to address this problem. These neural network methods show promise for providing high accuracy, noise-resistant, parallel algorithms and architectures for image recognition. Additionally, those prior art methods which use networks more closely based on biology also improve the convergence and learning rates of pattern recognition devices. This improvement is achieved in part by reducing the number of layers of weights to one or two. The most common of these prior art neural network methods involve multilayer perceptrons.
However, the addition of layers of processors in these neural network methods decreases recognition speed by lowering the degree of parallelism in the system. In addition to requiring multiple layers of neural processors, these methods also require a priori assumptions about the nature of the filtering or segmentation required for the pattern recognition problem. A priori assumptions can cause the system to be specialized to a narrower range of applications. The need for a priori assumptions therefore may decrease system flexibility.
Furthermore, while these neural network methods can be very effective in solving some problems in the prior art, they are still subject to some convergence and local minima problems. In addition, they must have a large known training set of examples from which to learn new patterns. The combination of large training sets and slow convergence leads to long learning times. Neural network methods also require extensive preprocessing to preselect pattern features in order to allow these methods to be used successfully for difficult problems such as accurate character recognition using low quality images. The recognition accuracy is strongly dependent on the success of the preprocessing operation. The preprocessing required in these methods is very data dependent.
Referring now to prior art Fig. 1, there is shown the generalized structure of self-organizing pattern recognition system 10. This structure is sufficiently general to provide a basic model for many types of prior art systems. In some prior art systems of this type, input processing block 14 or filtering block 14 does not closely couple to associative memory blocks 16 or to learning control 22 or triggering block 22. In other prior art systems, filtering block 14 is tightly coupled to associative memory blocks 16 by way of image data lines 18 in order to apply image data to associative memory blocks 16. Input processing block 14 is also coupled to learning control 22 by way of image data lines 20. In still other prior art systems, an additional layer of preprocessing (not shown) is placed in front of filtering block 14.
In self-organizing pattern recognition system 10, input images are applied by way of input lines 12 to input processing block 14 or filtering block 14. In input processing unit 14, input images are reconstructed and filtered. The image data are then applied both to associative memory blocks 16 and to learning control and triggering block 22 by way of image data lines 18, 20 respectively, as previously described. Associative memory blocks 16 of pattern recognition system 10 compare the filtered image data with previously stored images within associative memory blocks 16 using association rules. The association rules used within associative memory blocks 16 of pattern recognition system 10 may be any conventional association rules known to those skilled in the art. The structure of associative memory blocks 16 within a self-organizing pattern recognition system such as generalized pattern recognition system 10 may contain both short-term memory blocks and long-term memory blocks.
The association strengths determined in accordance with association rules within associative memory blocks 16 of generalized pattern recognition system 10 are used to gate image data from filtering block 14 to learning control block 22 or triggering block 22. In adaptive resonance theory based methods, resonance is used. The image data which passes through triggering block 22 is used within block 32 and is applied to associative memory blocks 16 by way of memory update lines 26 to store new data into the associative memory blocks 16. Within system 10, data in associative memory blocks 16 are used for determining classification in rules block 34.
Referring now to prior art Fig. 2, there is shown prior art adaptive resonance theory network 50. Adaptive resonance theory network 50 is within the general form of prior art generalized pattern recognition system 10. It is possible to use conventional adaptive resonance methods, such as those implemented within adaptive resonance theory network 50, for feature detection in image recognition problems if the images involved have been appropriately preprocessed. In prior art adaptive resonance theory network 50, data input is applied to top-down-weights memory 54 by way of network input lines 52. If the match to the top-down pattern achieved within network 50 is over a specific limit then a match to the best bottom-up-weight is checked and the input is used to update both sets of weights.
However, in prior art adaptive resonance theory network 50, it is necessary to preprocess input data in order to achieve good feature selection. For example, Gabor filters may be provided to operate upon the image data before the image data is applied to top-down-weights memory 54. This filtering involves the use of incomplete functions, which adds considerable complexity to the representation problem within network 50. In adaptive resonance theory network 50, resonant bottom-up and top-down-weights are used to learn and classify data.
The algorithm of prior art adaptive resonance theory network 50 requires finding two sets of weights, Z, over each of the j active memory locations which have an optimal resonance with one of the i input images. The input and output weights $Z_{up,j}$ and $Z_{down,j}$ for the j active memory locations are updated for each of the i images, $I_i$, by calculating:

$$\mu_j = I_i \cdot Z_{up,j}$$

and finding the maximum $\mu_j$ for which

$$\frac{I_i \cdot Z_{down,j}}{|I_i|} > \rho,$$

where ρ is the sensitivity of the required resonance. When an image meeting the sensitivity constraint is found, the appropriate weights may be adapted within adaptive resonance theory network 50 using:

$$Z_{up,j} = \frac{I_i \cdot Z_{down,j}}{\varepsilon + |I_i \cdot Z_{down,j}|},$$

where ε is a normalization constant, and

$$Z_{down,j} = I_i \cdot Z_{down,j}.$$
If no resonance is achieved for an image, two new memory locations in top-down-weights block 54 and bottom-up-weights block 58 are added to the active list and the weights of this blank location are adapted to the image. Adaptive resonance theory network 50 continues this process with each image until all available memory locations in blocks 54, 58 are set or all of the input images received by way of input lines 52 are used. After all vacant memory locations are used, each memory location is compared to the product and the memory location for which this product is the largest over the learning threshold is then updated.
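By way of illustration only, the resonance search and update sequence described above may be sketched as follows. This is a minimal reading of the prior art procedure, assuming binary images flattened to vectors; the function name, the constant EPS, and the use of elementwise products are assumptions of the sketch rather than the literal prior art implementation.

```python
import numpy as np

EPS = 0.5   # assumed normalization constant for the bottom-up update
RHO = 0.8   # vigilance / sensitivity limit rho

def art_present(image, z_up, z_down, rho=RHO):
    """Present one binary image I_i to an ART-style memory.

    image:  flattened binary image vector
    z_up:   (n_mem, n_pix) bottom-up weights Z_up,j
    z_down: (n_mem, n_pix) binary top-down weights Z_down,j
    Returns the index of the adapted location, or None (the caller
    then allocates a blank location and adapts it to the image)."""
    mu = z_up @ image                            # mu_j = I_i . Z_up,j
    for j in np.argsort(mu)[::-1]:               # search by decreasing mu_j
        overlap = image * z_down[j]              # I_i . Z_down,j, elementwise
        if overlap.sum() / max(image.sum(), 1) > rho:
            z_down[j] = overlap                  # Z_down,j = I_i . Z_down,j
            z_up[j] = overlap / (EPS + overlap.sum())
            return int(j)
    return None
```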
The evaluation of the self-organizing classes of this algorithm within prior art adaptive resonance system 50 is achieved by accumulation of statistics in a classification variable:

$$Z_{class,k,j} = Z_{class,k,j} + 1 \quad \text{if class of } (I_i) = k.$$
This table may then be used to determine the maximum selection strength of each memory location for all images and to assign classes to images based on resonance performance achieved over all memory locations and class assignments. This allows a new set of images to be learned wholly by example and divided into resonance classes based on the recognition results achieved in the training set.
Recognition within prior art system 50 is then achieved by finding the maximum strength of resonance for the weighted classes, $\max(\omega_{k,j})$, using:

$$\omega_{k,j} = \sum_{i:\ \text{class}(I_i) = k} \mu_{i,j}.$$
In addition, the average resonance strength of the strongest weighted resonance,

$$P = \max(\omega_{k,j} / n_{\omega,k,j}),$$

where $n_{\omega,k,j}$ is the number of terms used to form each $\omega_{k,j}$, provides a confidence limit for the evaluation of classification errors. If the value of P is less than the confidence expected for correctly classified data in the training set, then items should be classified as unknown. This significantly reduces the undetectable error rate within prior art adaptive resonance theory network 50.
It is well known in the art to apply image recognition systems such as adaptive resonance theory network 50 to the conversion of images of handwritten and machine print characters to computer representation. Additionally, both special purpose hardware and entirely software approaches have been used on this character recognition problem in the prior art. In a conventional character recognition operation, images of individual characters from either hand or machine print are recognized as specific characters or are rejected. An isolated character image is presented to the recognition device and a specific character identity or set of identities with associated confidence factors is returned. For a character recognition technology to be useful, both the accuracy and the ability to detect low confidence recognitions must be present. If human intervention is required to detect failures or if the failure rate is too high, the application of image recognition methods to this problem has no advantage over existing data entry methods.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a block diagram representation of a generalized structure of a prior art self-organizing pattern recognition system.
Fig. 2 shows a block diagram representation of a prior art adaptive resonance theory pattern recognition network.
Fig. 3 shows a block diagram representation of the self-organizing pattern detection and matching system of the present invention.
Fig. 4 shows an enlarged schematic representation of zero order statistics symmetrical triggering circuitry of the self-organizing pattern detection and matching system of Fig. 3.
Fig. 5 shows a schematic representation of first order statistics symmetrical triggering circuitry which is an alternate embodiment of the zero order statistics symmetrical triggering circuitry of Fig. 4.
Fig. 6 shows a schematic representation of second order statistics symmetrical triggering circuitry which is an alternate embodiment of the zero order statistics symmetrical triggering circuitry of Fig. 4.
Figs. 7A-E show association space diagrams used for pattern detection and matching determinations within the system of Fig. 3.
GENERAL DESCRIPTION OF THE INVENTION
Referring now to Fig. 3, there is shown self-organizing pattern detection and matching system 100 of the present invention. Pattern detection and matching system 100 retains the speed of prior art pattern recognition networks, such as adaptive resonance theory network 50, as well as their rapid convergence. However, while self-organizing pattern detection and matching system 100 may achieve recognition accuracy as high as that of adaptive resonance theory network 50, system 100 does not require the external filtering required by prior art network 50. In pattern detection and matching system 100, the required input processing is learned in parallel with pattern recognition features using a multi-map procedure similar to those believed to exist in the mid-level visual cortex.
Input lines 112 of pattern detection and recognition system 100 are applied directly to associative pattern memory 114 and to associative relevance memory 118. This permits any preprocessing of data required by system 100 to be learned within system 100 in parallel with a determination of a pattern in associative relevance memory 118. It will be understood by those skilled in the art that any type of suitable preprocessing may occur within system 100 provided that it yields reduced-noise image processing and image reconstruction using any type of basis set. Since separate associative memories 114, 118 are used in pattern detection and matching system 100, the memory update procedure for each associative memory 114, 118 may be optimized to extract features of different types in different memories 114, 118. It will also be understood by those skilled in the art that pattern detection and recognition system 100 may be provided with any number of blocks 114, 116, 118 and 120.
In system 100, associative memories 114, 118 are logically equivalent in that they are symmetrically triggered by zero order statistics symmetrical triggering circuitry 126. It is the symmetrical triggering of associative memories 114, 118 by symmetrical triggering circuitry 126 which permits system 100 to learn optimal reconstruction of input data within system 100 rather than using preselected external filters. Furthermore, because of this, only one layer of memories is required within detection and matching system 100. Additionally, self-organizing pattern detection and matching system 100 is provided with a feedforward adaptation using this symmetric triggering in pattern learning block 116 and relevance learning block 120. This adaptation improves the speed of computation, improves stability, and allows system 100 to operate with any required number of associative memories 114, 118.
It will be understood by those skilled in the art that three essential features are provided within self-organizing pattern detection and matching system 100. These three features are that within system 100: 1) different feature classes use individual association rules, 2) different feature classes use individual learning rules, and 3) all feature classes contribute symmetrically to learning. The architecture of system 100 according to the present invention is not restricted to any number or type of feature classes.
Pattern association 124, performed upon the data of pattern memory 114 applied by way of recall lines 122, is carried out using functions which produce a pattern association strength, ∑p. Relevance association 138, performed over the data of feature relevance memory 118, is carried out using functions which produce a relevance association strength, ∑R. The limit applied by pattern association 124 to gates 130, 146 by way of summation output line 132 is the learning threshold, ρ.
Thus, during the learning process within self-organizing pattern detection and matching system 100, image data are presented to system 100 and compared with stored patterns within associative memory blocks 114, 118. Input processing, performing substantially the functions of prior art external Gabor filtering, is performed simultaneously with detecting and matching within system 100. The association strength equations used may be those known in the art. For example, correlation, 1/(1 + tan²θ), 1/(1 + d²), Tanimoto, and offset cosine functions may be used.
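The named association strength functions may be written directly. The following is a minimal sketch assuming real-valued image and memory vectors of equal length; it relies on the identity 1/(1 + tan²θ) = cos²θ for the tangent based form, and omits the offset cosine since its offset is not specified above.

```python
import numpy as np

def correlation(image, mem):
    """Normalized dot product; lies in [0, 1] for nonnegative data."""
    denom = np.linalg.norm(image) * np.linalg.norm(mem)
    return float(image @ mem) / denom if denom else 0.0

def inv_tan_sq(image, mem):
    """1 / (1 + tan^2 theta) = cos^2 theta of the angle between vectors."""
    return correlation(image, mem) ** 2

def inv_dist_sq(image, mem):
    """1 / (1 + d^2) with d the Euclidean distance; 1 at a perfect match."""
    return 1.0 / (1.0 + float(np.sum((image - mem) ** 2)))

def tanimoto(image, mem):
    """Tanimoto similarity: A.B / (|A|^2 + |B|^2 - A.B)."""
    ab = float(image @ mem)
    denom = float(image @ image) + float(mem @ mem) - ab
    return ab / denom if denom else 0.0
```

All four map a perfect match to one and no similarity to zero, consistent with the association convention used for memories 114, 118.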
The capability for the input processing to perform substantially the same functions as prior art external Gabor filtering simultaneously with detecting and matching includes the ability to smooth an input image while providing a statistical generalization of the image. This is accomplished by determining the statistical properties of the association space for incoming data points. If incoming data further separates a memory location from others, it makes the generalization stronger, and the data are applied to associative memories 114, 118 and used for further pattern detection and matching. Those data which would decrease the level of statistical generalization are separated and not used for pattern detection and matching within system 100. The generalizing data are combined with the information in associative memories 114, 118. In addition to increasing statistical generalization, this method smooths the image, making it less noisy.
Referring now to Fig. 4, there is shown an enlarged schematic representation of zero order statistics symmetrical triggering circuitry 126a. Symmetrical triggering circuitry 126a gates all of the data in parallel from pattern memory 114 to pattern learning 116 by way of lines 122, 128. Triggering circuitry 126a also gates all of the data in parallel from relevance memory 118 to relevance learning 120 by way of lines 134, 142. The same conditions that are required by pattern gating circuit 130 for applying pattern data to pattern learning block 116 are also required by relevance gating circuit 146 for applying relevance data to relevance learning block 120. This results in the triggering of circuitry 126a being symmetrical.
The data applied by pattern memory 114 to pattern association 124 by way of recall lines 122 may be parallel data of any width, for example, it may be one kilobyte wide. Pattern association 124 performs a summation over the entire applied parallel data set using association rules such as, for example, correlation and Tanimoto. Pattern association 124 thereby produces a single output value, ∑p, for each memory block. It will be understood by those skilled in the art that n pattern memories 114 may be provided within pattern detection and matching system 100 and that system 100 would therefore be provided with n corresponding pattern associations 124 and a corresponding n single values ∑p.
The value of ∑p is applied by pattern association 124 to pattern gating circuit 130 by way of line 132 to control gating of data from pattern memory 114 to pattern learning block 116. Additionally, ∑p is applied to relevance gating circuit 146 for controlling the gating of data from relevance memory 118 to relevance learning block 120. Finally, ∑p is applied by pattern association 124 to output 150 to provide a classification signal on output line 154 of zero order statistics symmetrical triggering circuitry 126a within self-organizing pattern detection and matching system 100. Output 150 provides substantially the same function within system 100 as that provided by block 34 of the generalized structure of prior art self-organizing pattern recognition system 10.
In a completely symmetrical manner, relevance association 138 may use conventional association rules to determine a sum, ∑R, based upon parallel data applied to relevance association 138 by relevance memory 118 by way of recall lines 134. It will be understood by those skilled in the art that n relevance memories 118 may be provided within pattern recognition and detection system 100 and that system 100 would therefore be provided with n corresponding relevance associations 138 and n corresponding single values ∑R. Recall lines 134 may provide a massively parallel one kilobyte data set. The output value ∑R of relevance association 138 is applied by relevance association 138 to gating circuits 130, 146 by way of line 148. Line 148 also applies ∑R to output summation node 150 in order to provide a classification signal upon output line 154.
Referring now to Figs. 7A-E, there are shown association space diagrams 230, 240, 250, 260 and 270. Association space diagrams 230, 240 are used to illustrate the operation of the control logic of zero order statistics symmetrical triggering circuitry 126a for providing symmetrical triggering of learning within associative memories 114, 118 of pattern detection and matching system 100. Association space diagrams 250, 260 are used for maximum resonance classification and associative classification respectively.
Association space diagram 230 may be used to illustrate the operation of the logic of symmetrical triggering circuitry 126a. Each value of ∑p provided by pattern association 124 is represented by a single block 236 in association space 230. Each ∑R provided by relevance association 138 is also represented by a single block 236 in association space diagram 230. The requirement that ∑p be above the limit ρ1 is a requirement that gating by gating circuits 130, 146 only occur for a data point 236 or a box 236 which is to the right of vertical line 234 of associative space diagram 230. The requirement that ∑R be above the limit ρ2 permits triggering of gating circuits 130, 146 only for a data point 236 or a box 236 which is above horizontal line 232 of association space diagram 230.
Thus, valid triggering data points 236 may only be found within upper right quadrant 237 of association space diagram 230. The only data points 236 meeting these conditions are data point 238 and data point 239. There is a further and separate logical requirement within gating circuits 130, 146 of triggering circuitry 126a that the ∑p or ∑R farthest from the origin of association space diagram 230 be selected. This results in the selection of data point 239, rather than data point 238, as the data point 236 which produces symmetrical triggering by gating circuits 130, 146 within symmetrical triggering circuitry 126a. Thus, data point 239 symmetrically triggers pattern learning in block 116 and relevance learning in block 120 and the storing of learned data in respective memories 114, 118.
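The quadrant-and-distance logic just described may be sketched as follows, assuming one ∑p value and one ∑R value per memory location; the function and argument names are hypothetical.

```python
import numpy as np

def zero_order_trigger(sigma_p, sigma_r, rho1, rho2):
    """Zero order symmetric trigger over association space 230.

    sigma_p, sigma_r: per-location pattern / relevance strengths.
    A location qualifies only when sigma_p > rho1 and sigma_r > rho2
    (upper right quadrant 237); among qualifiers, the point farthest
    from the origin (e.g. point 239) is selected."""
    ok = (sigma_p > rho1) & (sigma_r > rho2)
    if not ok.any():
        return None                              # no learning triggered
    dist_sq = np.where(ok, sigma_p**2 + sigma_r**2, -np.inf)
    return int(np.argmax(dist_sq))
```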
Referring now to Fig. 5, there is shown mean level symmetrical triggering circuitry 126b or first order statistics symmetrical triggering circuitry 126b. Mean level symmetrical triggering circuitry 126b is an alternate embodiment of zero order statistics symmetrical triggering circuitry 126a for triggering symmetrical learning within self-organizing pattern detection and matching system 100 of the present invention.
In mean level symmetrical triggering circuitry 126b, pattern association 124 receives parallel data from pattern memory 114 by way of recall lines 122 and provides an output signal ∑p on line 132 as previously described with respect to symmetrical triggering circuitry 126a. Relevance association 138 receives parallel data from relevance memory 118 by way of recall lines 134 and provides an output signal ∑R on line 148 as also previously described with respect to symmetrical triggering circuitry 126a. The values of ∑p and ∑R are applied by lines 132, 148 to subtractors 158, 160 respectively.
Additionally, within mean level symmetrical triggering circuitry 126b, pattern association 124 applies all values of ∑p to mean determination unit 152. Mean determination unit 152 accumulates the values of ∑p applied by way of line 132 and generates the mean value $\bar{\Sigma}_p$. Mean determination unit 152 then applies $\bar{\Sigma}_p$ to subtraction unit 158. Subtraction unit 158 provides the signal $\Sigma_p - \bar{\Sigma}_p$ upon subtractor output line 162. In a symmetrical manner, relevance association 138 applies the values of ∑R to mean determination unit 156. Mean determination unit 156 accumulates the values of ∑R and calculates the mean value $\bar{\Sigma}_R$. Mean determination unit 156 applies $\bar{\Sigma}_R$ to subtraction unit 160, which provides the signal $\Sigma_R - \bar{\Sigma}_R$ upon subtractor output line 164.
Within mean level symmetrical triggering circuitry 126b, four conditions must be met in order for pattern gating circuit 130 and relevance gating circuit 146 to trigger learning within pattern detection and matching system 100 of the present invention. The two difference signals applied to pattern gating circuit 130 and relevance gating circuit 146 by way of subtractor output lines 162, 164 respectively, $\Sigma_p - \bar{\Sigma}_p$ and $\Sigma_R - \bar{\Sigma}_R$, must both be greater than zero. Additionally, both $\Sigma_p - \bar{\Sigma}_p$ and $\Sigma_R - \bar{\Sigma}_R$ must be the maximum in their respective sets of values as provided by pattern association 124 and relevance association 138 respectively.
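A minimal sketch of this mean level trigger follows, assuming that all four conditions must hold at the same memory location:

```python
import numpy as np

def mean_level_trigger(sigma_p, sigma_r):
    """First order (mean level) symmetric trigger.

    The difference signals on lines 162, 164 must both be positive,
    and each must be the maximum of its own set; here both maxima are
    required to occur at the same memory location."""
    dp = sigma_p - sigma_p.mean()    # signal on subtractor output line 162
    dr = sigma_r - sigma_r.mean()    # signal on subtractor output line 164
    i = int(np.argmax(dp))
    if dp[i] > 0 and dr[i] > 0 and i == int(np.argmax(dr)):
        return i
    return None
```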
Referring now to Fig. 6, there is shown maximum variance symmetrical triggering circuitry 126c or second order statistics symmetrical triggering circuitry 126c. Maximum variance symmetrical triggering circuitry 126c is an alternate embodiment of zero order statistics triggering circuitry 126a. In maximum variance symmetrical triggering circuitry 126c, pattern association 124 receives parallel data from pattern memory 114 by way of recall lines 122 and provides an output signal ∑p as previously described with respect to symmetrical triggering circuitry 126a. Additionally, relevance association 138 receives parallel data from relevance memory 118 by way of recall lines 134 and provides an output signal ∑R as also previously described with respect to symmetrical triggering circuitry 126a.
Within symmetrical triggering circuitry 126c, the value of ∑p and the value of ∑R are applied to linear cross-correlation unit 182 by way of lines 132, 148 respectively to determine the correlation of ∑p and ∑R. Linear cross-correlation unit 182 determines which output is the best linear correlation between all ∑p and ∑R values. The variances of the outputs of pattern association 124 and relevance association 138, ∑p and ∑R respectively, are then projected by means of projection variance unit 180 and projection variance unit 184. This provides the values var ∑p and var ∑R on lines 181, 185 respectively. The correlation of ∑p and ∑R is applied to pattern gating 130 and relevance gating 146 by way of line 186. Var ∑p and var ∑R are applied to summation node 188. The output of summation node 188 is the composite sum and is graphically represented as vector sum 272 of association space diagram 270. Vector sum 272 is the maximum variation and comprises the best projection of self-organizing pattern detection and matching system 100.
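A sketch of this second order logic is given below. It stands in for linear cross-correlation unit 182 and projection variance units 180, 184 by projecting the (∑p, ∑R) points onto the principal axis of their covariance; treating the line of maximum variance as this principal component is an assumption of the sketch, not a statement of the circuit's exact implementation.

```python
import numpy as np

def max_variance_trigger(sigma_p, sigma_r):
    """Second order trigger sketch: project the (sigma_p, sigma_r)
    points onto their line of maximum variance and select the largest
    projection (the composite sum, vector sum 272)."""
    pts = np.stack([sigma_p, sigma_r], axis=1)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / max(len(pts) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)       # 2-D covariance spectrum
    axis = eigvecs[:, np.argmax(eigvals)]        # direction of max variance
    proj = centered @ axis                       # projected components
    return int(np.argmax(np.abs(proj)))
```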
DETAILED DESCRIPTION OF THE INVENTION

Self-organizing pattern detection and matching system 100 of the present invention comprises a hierarchy of maps. This hierarchy of maps is created using unsupervised learning. In detection and matching system 100 the types of features to be learned are contained in separate memory blocks. All learning in pattern detection and matching system 100 is self-organizing and no preclassification of data is used during learning.
System 100 is a single layer, laterally parallel, symmetrically triggered system. It divides associative memories 114, 118 into associative memory blocks (not shown), each of which represents a single feature class or type which constitutes a class of memory maps. In imaging applications the memory locations contain learned images which are processed in parallel by self-organizing map type. System 100 differs from prior art resonance methods, such as that of adaptive resonance theory network 50, in that (1) the associative memory blocks 114, 118 of system 100 are coupled directly to the input of input lines 112 and (2) associative memories 114, 118 contribute to and are coupled to output 154. Associative memories 114, 118 are coupled to output 154 by way of trigger circuitry. This allows the separation of learning and association rules. This illustrates the critical role of the symmetrical triggering in the operation of the architecture of the present invention.
The pattern recognition capability of self-organizing pattern detection and matching system 100 depends upon its self-organized learning capability. The self-organizing ability of system 100 is initiated by the symmetrical triggering mechanism. The triggering of system 100 is symmetrical partly due to the use of identical triggering rules on learning gates 130, 146. Gates 130, 146 maintain a fixed relationship between the feature types. This relationship also maintains the linear character of the method of the present invention and eliminates the need for a double parameter search over all combinations of feature classes. This symmetry derives from a logical relationship between the input of input lines 112 and the information retained in associative memories 114, 118 so that each address of associative memories 114, 118 comprises an input related, self-organized, feature set.
In the architecture of pattern recognition and detection system 100, the learning trigger logic takes place in an n × m dimensional space when n pattern and m relevance feature classes are used. In the prior art adaptive resonance theory architecture, the equivalent learning logic space is one dimensional and was extended for parallel computation into two dimensions. The n binary classes within system 100 of the present invention can be regarded as existence classes or vigilance classes which allow, but do not select, learning. The m multi-bit or analog relevance classes within system 100 cannot allow learning independently but select what is learned when learning is allowed by the existence features. The relevance classes provide a concurrent parallel image processing capability as distinguished from the serial capability of preprocessing.
Incoming data is preprocessed within memories 114, 118 and presented, in parallel, to each of the associative memory maps. Associative recall methods are then used to locate the best match. After the association strength has been determined for each memory location of each memory map, learning is triggered by a set of parallel comparisons that gate input image data by way of learning gates 130, 146 to learning modules 116, 120. This symmetric triggering allows internal filtering within system 100 rather than external filtering of input data. When learning has been triggered within system 100, learning modules 116, 120 for each memory map update the selected memory locations. Once the associative learning of pattern detection and matching system 100 has progressed to a point where a sufficient number of patterns are retained in memories 114, 118, pattern classification takes place using the same logic used in symmetric triggering to access the classification history of the learned patterns.
The data, M, stored in the associative memories 114, 118 of pattern recognition and detection system 100 is compared to the image. The association between a memory 114, 118 and an image within system 100 is zero for no similarity and one for a perfect match. Associative memories 114, 118 are initialized for the pattern with the memory value M = 1 and are initialized for the relevance with the value M = 0. Everything is true but nothing is relevant.
Five functions may be used within pattern detection and matching system 100: correlation, 1/(1 + tan²θ), 1/(1 + d²), Tanimoto similarity, and an offset cosine. The tangent based function has a mathematical form similar to Tanimoto with the same properties near a perfect match as the distance based form. All of the indicated sums are carried out over all pixels in the image applied to pattern detection and matching system 100 by way of input lines 112. It will be understood by those skilled in the art that input lines 112 of pattern matching and detection system 100 may be applied to a fast Fourier transform (not shown) and that the output of the Fourier transform may be applied to memories 114, 118. This permits associations against the transform data and adds a third associative feature.
The class of similarity functions which may be used within system 100 is very large. Any monotonic, single valued relationship between the image and the memory contents, M, which can be mapped onto the interval zero to one may be used. The efficiency of any particular similarity function within system 100 depends on its ability to separate correct classifications from weak associations on marginal images. In a character recognition application, this efficiency must be balanced against computational cost.
If the match within system 100 is adequate, as determined using one of the conventional association rules, the image is used to update both memories 114, 118 in the learning phase. If the image is not sufficiently similar to the existing patterns within pattern memory 114, it is treated as a unique pattern and placed in a new memory location within pattern memory 114, and the corresponding location in relevance memory 118 is initialized to zero. Symmetric triggering is the concurrent logical operation which uses comparable logical structures to update all of associative memory blocks 114, 118 in parallel. This parallel operation occurs in an association space which has a dimension equal to the number of feature classes and associative memory blocks. In pattern detection and matching system 100, having two general classes of features, pattern and relevance, the dimension is two. If, for example, the Fourier transform of the image is added, the dimension is increased to three.
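The match-or-allocate behavior of this learning phase may be sketched as follows. The helper match_strength stands for whichever association rule is selected, and the relevance update shown is a placeholder; both are assumptions of the sketch.

```python
import numpy as np

def present_image(image, pat_mem, rel_mem, match_strength, rho):
    """Match-or-allocate step for one input image.

    pat_mem: list of binary pattern memories (new entries take the image)
    rel_mem: list of relevance memories (new entries initialized to zero)
    match_strength: any association rule returning a value in [0, 1]."""
    strengths = [match_strength(image, m) for m in pat_mem]
    if strengths and max(strengths) > rho:
        i = int(np.argmax(strengths))
        pat_mem[i] = pat_mem[i] & image.astype(bool)   # e.g. logical product rule
        rel_mem[i] = rel_mem[i] + image                # placeholder relevance update
        return i
    pat_mem.append(image.astype(bool).copy())          # unique pattern: new location
    rel_mem.append(np.zeros_like(image, dtype=float))  # relevance initialized to zero
    return len(pat_mem) - 1
```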
The most general form of the triggering logic of pattern detection and matching system 100, for N associative memories 114, 118 for each feature class and M feature classes, for a total of N × M memories which each have a combined association strength function $A_{i,j}$, such as $\Sigma_p$, $\Sigma_p - \bar{\Sigma}_p$, and var $\Sigma_p$, and logical thresholds for triggering, $\rho_j$, where j = 1, …, M and i = 1, …, N, is:

$$\max_i\left(\prod_{j=1}^{M} A_{i,j}\right) \;\&\; \left(A_{i,j} > \rho_j,\ j = 1, \ldots, M\right).$$
The combined association strength may be linear, measured from the mean, or projected along the line of maximum variance.
The learning within system 100 is triggered symmetrically in the ith set of N memories, and N features are being learned across M feature classes. This is graphically represented in two dimensions, M = 2, by association space diagram 230.
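Reading the combined strength as the product of the per-class strengths, which is one consistent reading of the reduced form given below, the generalized trigger may be sketched as:

```python
import numpy as np

def general_trigger(A, rho):
    """Generalized N x M trigger: A[i, j] is the combined association
    strength of location i in feature class j, rho[j] the threshold.

    Learning triggers at the location that exceeds every per-class
    threshold and has the largest combined (product) strength."""
    passes = (A > rho).all(axis=1)            # vigilance in every class
    if not passes.any():
        return None
    combined = np.where(passes, A.prod(axis=1), -np.inf)
    return int(np.argmax(combined))
```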
A less general type of triggering, more similar to the prior art adaptive resonance theory methods, is obtained if M takes on values R and P and N = 2. Pattern existence data is represented by n binary map feature classes indexed on j and with i occurrences of each pattern $P_i$, with association strength $P_{i,j}$. Triggering is initiated by n vigilance parameters, $\rho_j$. These vigilance parameters are not counted in the left hand distance term $r_j$ and pattern strength. Relevance data is represented by m multi-bit map feature classes indexed on k with i occurrences of each strength class, $R_{i,k}$, with association strength, $R_{i,k}$, which have $\rho_k = 0$. The generalized logic for triggering learning within self-organizing pattern detection and matching system 100 takes the following form. An i and $P = \prod_{j=1}^{n}(P_{i,j}) \times \prod_{k=1}^{m}(R_{i,k})$ are found using:

$$\max_i \left[ \prod_{j=1}^{n} \left( P_{i,j} > \rho_j \right) \times \prod_{k=1}^{m} R_{i,k} \right].$$
Let j = 1 and k = 1 so that the triggering logic of pattern detection and matching system 100 reduces to finding an i using:

$$\max_i (R_{i,1}) \;\&\; (P_{i,1} > \rho).$$
A typical association space diagram 240 for a set of $R_i$ and $P_i$ points is illustrated for i = 1, 2, …, 10. The area P for the maximum case is marked by dashed lines 242. The limit $P_i > \rho$ is represented by vertical line 224. The case which triggers learning within system 100 is indicated as point 246. Weaker associations which do not result in learning are shown as boxes 248.
After learning is triggered within pattern detection and matching system 100, information from the image is stored in one of the multi-bit memories used by the relevance feature class, relevance memory 118, and into one of the binary memories used by the pattern feature class, pattern memory 114. Any absolutely stable and convergent learning rule may be used within detection and matching system 100. Several different learning rules of varying complexity have been used within system 100 for the relevance memories 118. Additionally, several different rules have been used within system 100 for the pattern memories 114. During learning within system 100, the class of the sample images is not used. The process is self-organizing and requires no knowledge of class to construct the learned images. No explicit knowledge of class is used except in the statistical evaluation of classes, which does not occur in the learning functions in pattern learning 116 and relevance learning 120 but takes place in classification unit 150.
All of the learning methods used within detection and matching system 100 for multi-bit relevance memory 118 take the form:

$$M_{i,j}(t+1) = g\left( M_{i,j}(t) + \alpha \, \Delta M_{i,j} \right)$$

where, if ±S is the scale (maximum range) of M, the limit function g is given by:

$$g(x) = \begin{cases} x, & |x| \le S \\ \pm S, & |x| > S, \end{cases}$$

α is the learning rate, and the memories are updated from epoch t to epoch t+1. The time dependence generates a dynamically stable learning sequence. All of the rules used within system 100 of the present invention involve mechanisms which generate positive feedback between the memory locations and the image. The function g is used to provide a bound on this feedback.
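Taking g as a hard limit at the scale values ±S, which is one natural reading of the bound described above, this update form may be sketched as:

```python
import numpy as np

def bounded_update(M, dM, alpha, S):
    """One epoch of M(t+1) = g(M(t) + alpha * dM), with g taken as a
    hard limit at the scale +/-S so the positive feedback between the
    memory locations and the image remains bounded."""
    return np.clip(M + alpha * dM, -S, S)
```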
The first rule is given by:
This rule was developed to more closely approximate the behavior of neurons than previous rules. This has important consequences for the stability of the learning process within system 100. Unlike the other rules, this rule is self limiting at the scale values ±S. This rule is only applied vertically; data from each pixel only affect one memory element during learning. When a more extended field is used, image data affect memory elements over some local region referred to as a receptor field.
The second rule is a simple Hebbian rule of the form:
This rule is used because it is simpler and faster than other rules. The stability of the rule is guaranteed since only vertical interconnections are used.
The third rule is a vertically distributed Hebbian rule of the form:
The vertical part of the learning is identical to the second rule. The field used is a three-by-three receptor field. Although a loop structure is implied by the notation used for the rule, a linear sequence of shifts is used to allow for massively parallel implementation.
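The shift-based evaluation may be illustrated as follows; since the rule's equation is not reproduced above, the summand used here (the raw shifted image values) is an assumption of the sketch.

```python
import numpy as np

def receptor_field_sum(image, q=1):
    """Sum image values over a (2q+1) x (2q+1) receptor field around
    each pixel using a linear sequence of whole-image shifts, so that
    every shift is itself a massively parallel operation."""
    total = np.zeros_like(image, dtype=float)
    for dx in range(-q, q + 1):
        for dy in range(-q, q + 1):
            total += np.roll(np.roll(image, dx, axis=0), dy, axis=1)
    return total       # candidate dM for a vertically distributed term
```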
The fourth rule used is a vertically distributed Hebbian rule combined with a laterally connected anti-Hebbian rule of the form:
The vertical part of the learning is identical to rule three. The lateral component is anti-Hebbian.
The fifth rule used is a vertically distributed Hebbian rule with Gaussian weights on a field $n_f$ square which must be summed over $q = (n_f - 1)/2$ elements and takes the form:

where σ is the envelope variance of the field. This variance is not related to the variance used in Gabor filtering. The addition of the Gaussian weights increases the stability of the learning by making the equivalent shunting matrix diagonally dominant.
The sixth rule used is a vertically distributed Hebbian rule and a laterally connected anti-Hebbian rule with Gaussian weights on a field $n_f$ square which must be summed over $q = (n_f - 1)/2$ elements and takes the form:

where σ is again the envelope variance of the field.
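The Gaussian weighting of rules five and six may be illustrated the same way, using the standard Gaussian envelope exp(−(x² + y²)/2σ²) as an assumed form of the weights:

```python
import numpy as np

def gaussian_field_sum(image, n_f=5, sigma=1.0):
    """Receptor field sum with Gaussian weights on an n_f square field,
    summed over q = (n_f - 1)/2 elements on each side of a pixel."""
    q = (n_f - 1) // 2
    total = np.zeros_like(image, dtype=float)
    for dx in range(-q, q + 1):
        for dy in range(-q, q + 1):
            w = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
            total += w * np.roll(np.roll(image, dx, axis=0), dy, axis=1)
    return total
```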
The two rules used for binary pattern learning are a logical OR of the positively relevant elements and a logical product of the image and relevant elements.
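These two binary rules translate directly; the mask relevant, standing for the positively relevant elements of relevance memory 118, is an assumed thresholding of that memory.

```python
import numpy as np

def or_rule(pattern, image, relevant):
    """Logical OR of the positively relevant image elements into the
    binary pattern memory."""
    return pattern | (image.astype(bool) & relevant)

def product_rule(pattern, image, relevant):
    """Logical product of the image and the relevant elements."""
    return pattern & image.astype(bool) & relevant
```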
Additionally, three types of classification have been used within pattern detection and matching system 100 to provide classification of images based on a previously learned image feature set. In the simplest case, maximum resonance based classification was used as in adaptive resonance theory network 50.
If $M_R$ represents relevance memory 118 and $M_P$ represents pattern memory 114, the classification of an image, I, is found by finding:

$$\max \sum \left( I \cdot M_R \cdot M_P \right)$$

and then taking the classification of the majority of the images used to learn the $\{M_R, M_P\}$ pair. Since each of the terms used to generate the sum is equivalent to the summed term in the correlation association rule, the function maximized is the cross correlation of associative memories 114, 118.
The association based method calculates the values R and P using the appropriate ∑ from symmetrical triggering circuitry 126a, b, c. These values are then used to compute the mean values $\bar{R}$ and $\bar{P}$. These values are then used to find:

$$\max_i \sqrt{ (R_i - \bar{R})^2 + (P_i - \bar{P})^2 }$$

for which $R_i > \bar{R}$ and $P_i > \bar{P}$. This relationship is illustrated by vector 264 between mean 262 and classifying point 266 of association space diagram 260. As in the previous case, the classification is determined by the majority of the images used to learn the classifying memory pair.
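A sketch of this association based classification, under the reconstruction given above (the point farthest from the mean that exceeds the mean in both coordinates), is as follows:

```python
import numpy as np

def classify_by_association(R, P):
    """Association based classification sketch: select the memory pair
    whose (R_i, P_i) point lies farthest from the mean while exceeding
    the mean in both coordinates (vector 264 of diagram 260)."""
    dR, dP = R - R.mean(), P - P.mean()
    ok = (dR > 0) & (dP > 0)
    if not ok.any():
        return None
    return int(np.argmax(np.where(ok, dR**2 + dP**2, -np.inf)))
```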
The evaluation of the self-organized classes within pattern detection and matching system 100 is achieved by accumulation of statistics in a classification variable:
$$Z_{class,k,j} = Z_{class,k,j} + 1 \quad \text{if class of } (I_i) = k.$$
This may then be used to determine the most likely classification of each associative memory 114, 118 of the set for all images and to assign classes to images based on the maximum strength achieved over all memory locations and class assignments. This allows a new set of images to be learned wholly by example and divided into classes based on the recognition results achieved using the images assigned to the training set. This procedure is equivalent to inhibiting data flow between gating circuits 130, 146 and classification circuit 150.
As a demonstration of the effectiveness of system 100, a character recognition program has been constructed on a massively parallel computer which can perform ninety-nine percent accurate character recognition on medium quality machine printed digits at a speed of approximately two and one-half milliseconds per digit and ninety-three percent recognition on multiple-writer hand print with a two percent substitution error rate. Images of characters were optically scanned to provide electrical signals representative of the characters. These electrical signals were applied to input lines 112 of system 100. In the character recognition application two sets of feature maps are used. One map is used for character patterns and one map is used for the statistical relevance of the patterns. This combination of features allows image generalization or relevance maps to be learned in parallel with other image features. This type of parallelism effectively converts a multiple layer architecture to a single layer one.