CROSS REFERENCE TO RELATED APPLICATIONS The present invention is related to the following U.S. patent applications which are incorporated herein by reference:
Ser. No. ______ (Attorney Docket No. RPS920030020US1) entitled “A Configurable Bi-Directional Bus For Communicating Between Autonomous Units” filed ______; and
Ser. No. ______ (Attorney Docket No. RPS920030037US1) entitled “Intrusion Detection Using A Network Processor And A Parallel Pattern Detection Engine” filed ______.
TECHNICAL FIELD The present invention relates in general to methods and systems for performing fast partial or exact pattern matching.
BACKGROUND INFORMATION Recognizing patterns within a set of data is important in many fields, including speech recognition, image processing, seismic data, etc. Some image processors collect image data and then pre-process the data to prepare it to be correlated to reference data. Other systems, like speech recognition, are real time where the input data is compared in real time to reference data to recognize patterns. Once the patterns are “recognized” or matched to a reference, the system may output the reference. For example, a speech recognition system may output equivalent text to the processed speech patterns. Other systems, like biological systems, may use similar techniques to determine sequences in molecular strings like DNA.
In some systems, there is a need to find patterns that are imbedded in a continuous data stream. In non-aligned data streams, there are some situations where patterns may be missed if only a single byte-by-byte comparison is implemented. The situation where patterns may be missed occurs when there is a repeated or nested repeating patterns in the input stream or the pattern to be detected. A reference pattern (RP)containing the sequence that is being searched for is loaded into storage where each element of the sequence has a unique address. An address register is loaded with the address of the first element of the RP that is to be compared with the first element of the input pattern (IP). This address register is called a “pointer.” In the general case, a pointer may be loaded with an address that may be either incremented (increased) or decremented (decreased). The value of the element pointed to by the pointer is retrieved and compared with input elements (IEs) that are clocked or loaded into a comparator.
In pattern recognition, it is often desired to compare elements of an IP to many RPs. For example, it may be desired to compare an IP resulting from scanning a finger print (typically 1 Kilobyte for certain combinations of features defined in fingerprint technology) to a library of RPs (all scan results on file). To do the job quickly, elements of each RP may be compared in parallel with elements in the IP. Each RP may have repeating substrings (short patterns) which are smaller patterns embedded within the RP. Since a library of RPs may be quite large, the processing required may be considerable. It would be desirable to have a way of reducing the amount of storage necessary to hold the RPs. If the amount of data used to represent the RPs could be reduced, it may also reduce the time necessary to load and unload the RPs. Parallel processing may also be used where each one of the RPs and the IP are loaded into separate processing units to determine matches.
Other pattern recognition processing in biological systems may require the comparison of an IP to a large number of stored RPs that have substrings that are repeated. Processing in small parallel processing units may be limited by the storage size required for the RPs. Portable, inexpensive processing systems for chemical analysis, biological analysis, etc., may also be limited by the amount of storage needed to quickly process large numbers of RPs.
Pattern detection or recognition is a bottleneck in many applications today and software solutions cannot achieve the necessary performance. It is desirable to have a a hardware solution for matching patterns quickly that is expandable. It is also desirable to have a system that allows multiple modes of pattern matching. Some applications require an exact match of a pattern in an input data stream to a desired target pattern. In other cases, it is desirable to determine the longest match, the maximum number of characters matching, or a “fuzzy” match where various character inclusions or exclusions are needed.
There is, therefore, a need for a method and system for pattern detection that is mode programmable and comprises a large number of parallel processing units that is expandable to enable variable pattern lengths to be matched as well as allowing additional processing units to be added to increase matching speed.
SUMMARY OF THE INVENTION A parallel pattern detection engine comprises a large number of small processing units building blocks that perform pattern detection. These processing units have multiple modes of pattern detection and each has memory for storing patterns. A large parallel interface bus supplies input data to an input buffer. Selected input data is coupled in parallel to each processing unit so that each processing unit compares, in parallel, input data received on the parallel bus. Each processing unit is also coupled to a parallel address bus so that detection pattern data may be selectively loaded into each processing unit. The addresses of each processing unit comprises its identification (ID). Processing units that generate a match of its detection pattern to a pattern in the input data, according to its specific mode, have their corresponding ID forwarded by an ID selection unit to an output buffer. The data from the output buffer is coupled to the interface bus for sending results of a particular input data stream comparison. Each of the processing units has a “chaining” or cascading communication interface that allows a particular processing unit to be coupled to its corresponding adjacent processing units for generating larger detection patterns greater than a single PU can handle or for determining specific types of pattern matching. The parallel input bus structure and the chaining communication interface facilitate adding groups of the parallel pattern detection engines to create larger systems for speed or for handling a larger number of patterns or for both purposes.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of the architecture of a parallel pattern detection engine (PPDE) according to embodiments of the present invention comprising N processing units;
FIG. 2A-2D are block diagrams of four matching modes which may be programmed for each of the N processing units (PUs) ofFIG. 1;
FIG. 3 is a chart illustrating the various modes of scalability of the PPDE of the present invention;
FIG. 4 is a flow diagram of methods steps used in embodiments of the present invention;
FIG. 5 is an overview block diagram of an individual PU according to embodiments of the present invention;
FIG. 6 is a detailed block diagram of an individual PU according to embodiments of the present invention;
FIG. 7 is a detailed block diagram of a PU architecture;
FIG. 8 is a circuit diagram of a specific implementation of a single PU;
FIG. 9 is a flow diagram of method steps in embodiments of the present invention;
FIG. 10 is a data processing system suitable for practicing embodiments of the present invention; and
FIG. 11A-11E illustrate operation in various modes of pattern matching according to embodiments of the present invention.
DETAILED DESCRIPTION In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, it will be obvious to those skilled in the art that the present invention may be practiced without such specific details. In other instances, well-known circuits may be shown in block diagram form in order not to obscure the present invention in unnecessary detail. For the most part, details concerning timing, data formats within communication protocols, and the like have been omitted inasmuch as such details are not necessary to obtain a complete understanding of the present invention and are within the skills of persons of ordinary skill in the relevant art.
Refer now to the drawings wherein depicted elements are not necessarily shown to scale and wherein like or similar elements are designated by the same reference numeral through the several views.
Sequential matching of a data stream in software is currently a central processing unit (“CPU”) intensive task. Thus, high performance is difficult. The pattern matching processing unit (hereafter PU) architecture can provide high performance matching because it is a piece of hardware dedicated to pattern matching. The PU provides more efficient searching (matching) because every input pattern is being matched in parallel to a corresponding target pattern. Parallel matching is possible because a virtually unlimited number of the PUs may be cascaded. Additionally, each PU has built-in functionality that can reduce the number of necessary PUS by incorporating modes that allow matching comprising wild cards (don't cares in the target pattern), multiple wildcards, and inverse operations. The PU architecture's fast pattern detection capabilities are useful in network intrusion detection, database scanning, and mobile device security applications. Additionally, with their built-in distance computation, “fuzzy” pattern detection may be implemented which are particularly useful in image processing and life sciences applications.
FIG. 5 is an overview block diagram of aPU500 according to embodiments of the present invention.PU500 receives inputs from identification (ID)bus501,control bus502 andinput data bus503. The inputs of the buses are buffered inID register509,control register505 and input data register504. Control data fromcontrol register505 is coupled to controllogic circuitry508 which also receives data frommemory507. Input data from input data register504 is coupled tomemory507,address circuitry506, maskingcircuitry510.Address circuitry506 couples addresses tomemory507.Address circuitry506 also couples to maskingcircuitry510 andoutput circuitry512.Output circuitry512 receives data fromID register509,address circuitry506 anddistance circuitry511 and selectively couples data tooutput bus513.
FIG. 6 is another more detailed block diagram ofPU500 according to embodiments of the present invention. Blocks shown inFIG. 5 are repeated for clarity.PU500 receives inputs from identification (ID)bus501,control bus502 andinput data bus503. The inputs of the buses are buffered inID register509,control register505 and input data register504.Memory507 is a register array having fields forpattern data601 and operation codes (Opcodes)602.Memory507 stores patterns that are being compared to input data.Opcodes602 define what type of pattern compare is being executed.Opcodes602 and control bits fromcontrol register505 are coupled to controllogic circuitry508.Pattern data601 are coupled to mask register603 inmask circuitry510. Outputs ofmask register603 are combined in logic AND605 to generate inputs to componentdistance computation unit610 indistance circuitry511. Likewise outputs ofmask register603 are combined in a logic AND606 to form inputs todata selector604.Data selector604 selects between input data frominput register504 and addresses fromaddress register614 to provide inputs to componentdistance computation unit610. Address register614 couples address tomemory507. Componentdistance computation unit610 couples outputs to Patterndistance computation unit611. Present distance computation results are stored indistance register612. The present distance computation result is coupled back to patterndistance computation unit611 and to comparecircuitry607. The output ofdistance register612 is compared to the value stored in the final distance register to generate theoutput GT615. GT Stand for “greater than” and this signal is set active when the value stored in the final distance register is greater than the value stored in the distance register. The final distance value in store infinal distance register608 is selected from either input register504 ordistance register612 indistance selector609.
EachPU500 has limited memory to storepattern data601. If a pattern is long, it is possible to mergeseveral PU500 units for storing a long sequence ofpattern data601. For example if twoPU500 are used, then during the beginning of a pattern detection phase, thememory507 of the first of the twoPU500 units is used. The address pointer of thefirst PU500 is modified according to the matching mode and theoperation codes602. When the address pointer reaches its last memory position alast signal650 is sent to the second of the twoPU500 units in order to continue the matching process using the remainder of thepattern data601 stored in thesecond PU500. Control data oncontrol bus502 is used to initialize thesecond PU500, in this case, so that it only starts matching when it receives the “last”signal650 from thefirst PU500. Also in this case, if a “reload” pointer address is indicated during the matching process, the address pointer of both of the twoPU500 units used for the long sequence ofpattern data601 must be updated. This is accomplished by sending a “reload”signal651 to the appropriate PU500 (containing theinitial pattern 601 bytes). Since the number of bytes in a sequence ofpattern data601 is not specifically limited, more than twoPU500 units may be used in the manner discussed. Again initialization control data oncontrol bus502 configures aPU500 to execute as an independent PU or as a cascade PU.
When the matching mode is a “fuzzy” match, patterndistance computation unit611 calculates a present distance value stored indistance register612. If two ormore PU500 units are used in cascade to storepattern data601 used for a fuzzy match, then the distance value is sent ondistance signal652 to thenext PU500 in a cascade so that a final distance value may be determined and stored infinal distance register608 of thelast PU500 in a cascade.
FIG. 7 is a block diagram of more details ofcircuitry PU500. Patterns to be compared are preloaded into memory (register file)507 as bytes wherein each bit is stored as 8 bits in bits [11:4]. EachOpcode602 is stored in bits [3:0]. Aninput data stream750 are compared to stored bytes inmemory507 as determined by readaddress614. Compare anddistance unit511 computes a distance for the compare operation.Match logic709 generates logic signals that are coupled to reloadlogic710,increment logic711 or holdlogic712. Various types of matching are possible as determined byOpcodes602 stored with each byte of the pattern inmemory507. Depending on theOpcode602 and the results of the compare in compare anddistance unit511, the logic in reloadlogic710,increment logic711 and holdlogic712 determine whether to hold the present read address, increment the present read address to the next value or reload the read address to its initial value to start comparing at the beginning of the pattern.Select line logic705 is enabled by activatelogic713 via activatesignal730. Depending on the output logic states of reloadlogic710,increment logic711 and holdlogic712, one of the inputs to multiplexer (MUX)704, hold723,increment722 or reload721 will be a logic one thereby selectinginput703,702 or701 respectively. Increment by one714 adds one to the present read address and generatesinput702. The present read address is coupled intohold703 and the first address in the pattern is coupled from714.Register614 was loaded with the first address in the pattern under control ofOpcodes602.Packet reset signal751 resets the read address. Ifactive signal706 is a logic zero, then selectline logic705 is degated and all the inputs hold703,increment702 and reload701 are a logic zero andMUX704 is degated. To allow cascading of multiple PUs (e.g., PU500), thesignal730, andID707 are coupled to the next PU. Likewise,PU500 receivesID752 andactive signal753 from a preceding PUJ. Activatelogic713 is coupled to the previous PU bysignal line790.
FIG. 8 is a more detailed circuit diagram of circuitry ofPU500.FIG. 8 illustrates a more detailed circuitry for select line logic705 (AND gates760-762), reload logic710 (ORgate763 and AND gates764-765), increment logic711 (OR gate766 and AND gates767-769) and hold logic712 (AND gate770). Inverters780-784 serve to generate the complement of theOpcode602 signals.
The following description may refer betweenFIGS. 5, 6,7, and8 as these illustratePU500 in various degrees of detail.
The fast pattern match technology utilizes local memory (e.g., register array507) in eachPU500 which contains apattern601 and flag bits (Opcodes602) that specify options. These options may include a single wildcard, multiple wildcard, last, and inverse matching operations. A single wildcard matching means that a match is indicated if the byte having the singlewildcard matching Opcode602 set matches the current byte in an input stream. A multiple wildcard matching means that a match is indicated if an indeterminate number of bytes in sequence not match the byte with themultiple wildcard Opcode602. Inverse matching means a match is indicated if every byte except the byte with theinverse Opcode602 matches a byte in an input stream.Last Opcode602 means that the byte is the last byte in a pattern.
Global registers includeID register509, readaddress register614,control register505 and registers inregister array507. Additional global registers,active register706,match register708 and select register (not shown) may be used to designatePU500 as active, matched, or selected for writing configuration data. The ID of aPU500 is an ID that is unique across a chip containing multiple PUs and is used to identify what pattern has been detected in a data stream being coupled in parallel to more than onePU500. Thecounter714 is used to index through the storedpattern601 for comparison tobytes801 in an input data stream (from input bus503) and the comparator (not shown) in compareunit511 compares thepattern601 with theinput data801 one byte at a time.
WhenPU500 comes online, all registers are initialized to zero (reset).Next PU500 receives unique ID from theinput bus503 which is stored inID register509.PU500 then waits until it receives additional commands. The first command is a select command which activatesPU500 to receive further configuration commands that apply toPU500 only. At this point the global registers may be loaded. Bytes of data are sent to theregister array507 which include thepattern data601 and thecorresponding Opcode data602. When the configuration is complete and theactive register706 is set to “active”,PU500 waits for the packet reset signal802 to enable theread address614. This indicates that a new input packet is being sent to thePU500 to begin the matching phase.
During the matching phase, one byte is sent toPU500 at each clock cycle.PU500 compares the byte stored (601) in the current register array position (determined by the address614) inregister array507 with the input byte ininput register504 and checks the Opcode (602) for the byte in the current register array position of the pattern stored in601. If there is a match or theOpcode602 is set to a single wild card match, the pointer is incremented to select the next read address inaddress register614. If theOpcode602 for the current byte inpattern601 is set to multiple wildcard, the pointer to addressregister614 holds its current value. If a match was not found, then the pointer is reloaded. This process continues until the pointer is at the last position of a pattern and a match occurs. At this point, thematch register708 is set inPU500. The final phase of the process is to report the found match. If thematch register708 is set, theoutput logic circuitry512 sends the ID ofPU500 to theoutput bus513.
FIG. 1 is a block diagram of a parallel pattern matching engine (PPDE)100 integrated circuit (IC) architecture.PPDE100 provides multiple mode pattern matching and has a highly flexible, massively parallel architecture.PPDE100 can perform exact, fuzzy, longest and maximum character pattern matching. Some of the possible applications that can benefit from the capabilities ofPPDE100's high performance pattern matching are: network intrusion detection, database search image processing, lossless compression, and real-time data processing (sound, EKG, MRI, etc.). The architecture ofPPDE100 is highly flexible and scalable and may be adapted to specific applications.
PPDE100 is an IC comprisingmultiple PU500 units and other logic functions. Input/output (I/O) interface101couples PPDE chip100 to system functions. I/O interface101couples 64 bits of input data toIC input bus120 which in turn couples to inputbuffer103. Data is written intoinput buffer103 in locations determined bywrite address102. Data is read frominput buffer103 using readaddress108. Data is read frominput buffer103 in 8 bit bytes using multiplexer (MUX)115 controlled byselect line logic109.Input bus503 is coupled to each of theN PU500 units. I/O interface101 also couples control data toglobal control107 which sends 24 bits of ID data onID bus501 and 4 bits of control data oncontrol bus502 to eachPU500 unit (PU1-PUn).
FIG. 9 is a flow diagram of method steps in pattern matching using aPU500 according to embodiments of the present invention. Instep901, a packet reset is received indicating that configurations of thePU500 is complete and a new packet (input pattern) is being sent to the PU and it should begin the matching process. In step902 a first pattern byte of the pattern is retrieved. Instep903, the first pattern byte is compared to the first byte in the input data stream and a test is done to determine if they compare. The first pattern byte is indicated by an address pointer (pointer). If there is a compare instep903, then a test is done instep910 to determine ifOpcode602 is set to “match” for the present pattern byte (in this first pass it is the first pattern byte). If theOpcode602 is set to “match”, then the pointer is incremented by one to move to the next pattern byte as this is a desired result. IfOpcode602 for the present pattern byte is not set to “match”, then instep911Opcode602 is tested to determine if it is set to “inverse”. IfOpcode602 is set to “inverse”, then this is not a desired result and the pointer is reloaded back to the first pattern byte instep913 if it is not already there. A branch is then taken back tostep902. IfOpcode602 is not set to “inverse” instep911, thenOpcode602 is tested to determine if it is set to “last” indicating the pattern byte is the last byte in the pattern. IfOpcode602 is not set to “last” instep912, then the pointer is incremented instep914 and a branch is taken back tostep902. IfOpcode602 is set to “last” instep912, then the pointer is “frozen” and a branch is taken back to step901 awaiting a new packet reset to restart match processing.
If the pattern byte and the input data byte do not compare instep903, then in step904 a test is done to determine ifOpcode602 is set to “match” for the pattern byte. IfOpcode602 is set to “match” instep904, then this is not a desired result and the pointer is reloaded back to the first pattern byte instep913 if it is not already there. A branch is then taken back tostep902. IfOpcode602 is not set to “match” instep904, then a test is done instep905 to determine ifOpcode602 is set to “inverse”. IfOpcode602 is set to “inverse” instep905, then this is a desired result and the pointer is incremented instep914 and a branch is taken back tostep902. IfOpcode602 is not set to “inverse” instep905, then a test is done instep906 to determine ifOpcode602 is set to “wildcard”. IfOpcode602 is set to “wildcard” instep906, then this is a desired result and the pointer is incremented instep914 and a branch is taken back tostep902. IfOpcode602 is not set to “wildcard” instep906, then a test is done instep907 to determine ifOpcode602 is set to “multiple wildcard”. IfOpcode602 is set to “multiple wildcard” instep907, then the pointer is held instep908 and a branch is taken back tostep902. IfOpcode602 is not set to “multiple wildcard” instep907, then instep909 the pointer is reloaded and a branch is taken back tostep902.
The operations discussed relative toFIG. 9 are called regular expression matching. These regular expressions are used within matching modes used by the PPDE incorporatingmultiple PU500 units according to embodiments of the present invention.
FIGS. 11A-11F actions taken relative to apattern601 when comparing to aninput data stream750.FIG. 11A illustrates three clock cycles of thecase1100 whereinput data750 is “AAC” being compared topattern data601 as “ABC” where each pattern byte has anOpcode602. Theactions1101 are taken in response to theOpcodes602. Inclock cycle1,pointer614 starts at the byte (“A”) inpattern601. The first byte ofinput data750 is also an “A”.Opcode602 for the first byte inpattern601 is set to “match”. Since the first byte ofinput data750 andpattern601 compare andOpcode601 is set to “match”, the pointer is incremented moving to the second byte inpattern601 which is a “B”. This happens in one clock cycle, therefore, in the second clock cycle (labeled1102 because it is significant to the particular pattern inFIG. 11A), the second byte in input pattern750 (“A”) is compared to the second byte in pattern601 (“B”). TheOpcode602 for the second byte ofpattern602 is set to “match”. Since these two bytes do not compare, the sequence “AB” inpattern601 cannot match the first two bytes “AA” ofinput data750 as required by theOpcode602. Therefore, in clock cycle2 (1102),pointer614 is reloaded with the address of the first byte inpattern602 and comparison begins again. Inclock cycle3, the third byte ininput data750 is compared to the first “A” inpattern602.
FIG. 11B illustrates thecase1110 where the bytes sequence ofinput data stream750 as “CDE” does matchpattern602 as a “CDE” but anOpcode602 on one of the pattern bytes is set to “inverse” indicating that a match between a byte ininput data750 and a byte inpattern601 is not desired. Inclock cycle1, the first “C” ininput data750 matches the “C” inpattern601 and theOpcode602 is set to “match”. Since this is a desired result thepointer614 is incremented and the second byte (“D”) ofinput data750 is compared to the second byte (“D”) ofpattern601 and these bytes do compare. However, theOpcode602 is set to “inverse” and a match is not desired, therefore in clock cycle2 (1103) thepointer614 is reloaded and the first byte ofpattern601 is again selected. Inclock cycle3, the third byte “E” ininput data750 is compared to the first byte “C” ofpattern601. The example ofFIG. 11B is “looking” for an input sequence “C!DE” where the “!D” indicates any character but not “D” is acceptable.
FIG. 11C illustratescase1120 where acomplete pattern601 is shown with anOpcode602 set to “last”. Inclock cycle1, the first byte “F” ininput data750 matches with the first byte “F” inpattern601 andOpcode602 is set to “match”. Since this is a correct result,pointer614 is incremented. Inclock cycle2, the second byte “G” ininput data750 matches with the second byte “G” inpattern601 andOpcode602 is set to “match”. Again,pointer614 is incremented as this is a correct result. In clock cycle3 (1104), the third byte “H” ininput data750 matches the third byte “H” inpattern601. In this case,Opcode602 is set to “last” indicating that the third byte is the last byte in a complete pattern601 (in this case “FGH”). In this case the pattern “FGH” is detected ininput data750 and a match signal can be assert. Since there isadditional input data750,pointer614 is reloaded back to the first byte inpattern601 and the matching process continues “looking” for additional occurrences of the complete pattern “FGH” in succeeding bytes ofinput data750.
FIG. 11D illustratescase1130 where apattern601 byte hasOpcode602 set to “inverse” and the bytes do not compare. Inclock cycle1, the first byte “I” ininput data750 matches the first byte “I” inpattern601 and theOpcode602 is set to “match”. Since this is a desired result, thepointer614 is incremented and the second byte (“J”) ofinput data750 is compared to the second byte (“I”) ofpattern601 and these bytes do not compare. However, theOpcode602 is set to “inverse” and no match is a desired result; therefore, in clock cycle2 (1105), thepointer614 is incremented and the third byte “K” ofpattern601 is again selected. Inclock cycle3, the third byte “K” ininput data750 is compared to the third byte “K” ofpattern601. Again, a match is detected and thepointer614 is incremented. The example ofFIG. 11D is “looking” for an input sequence “I!JK” where the “!J” indicates any character but “J” is acceptable.
FIG. 11E illustratescase1140 wherepattern601 matches a sequence ininput data750 and theOpcodes602 are set to “match”. Inclock cycle1,pointer614 starts at the byte (“L”) inpattern601. The first byte ofinput data750 is also an “L”.Opcode602 for the first byte inpattern601 is set to “match”. Since the first byte ofinput data750 andpattern601 compare andOpcode601 is set to “match”, thepointer614 is incremented to the second byte inpattern601 which is an “M”. In the second clock cycle, the second byte in input pattern750 (“M”) is compared to the second byte in pattern601 (“M”). TheOpcode602 for the second byte “M” ofpattern602 is set to “match”. Since these two bytes compare, thepointer614 is again incremented. Inclock cycle3, the third byte “N” ininput data750 is compared to the third byte “M” inpattern602. Since they compare, the pointer is again incremented.FIG. 11E illustrates a partial match of “LMN” inpattern601 to the sequence “LMN” ininput data750.
FIG. 11F illustratescase1150 where there is NOT a pattern match and the wildcard Opcode is set for a byte in thepattern601. Inclock cycle1, the “O” ininput data750 matches with the “O” inpattern601. Since theOpcode602 is set to “match”, thepointer614 is incremented. Inclock cycle2, second byte “O” ofpattern601 does not match the “P” in the second byte ofinput data750. However, sinceOpcode602 is set to “wildcard”, any character is accepted andpointer614 is again incremented. Inclock cycle3, the third byte “Q” ofpattern601 matches the third byte “Q” ininput750 andpointer614 is incremented. In this case, the sequence “O·Q” is found where “·” indicates any character.
FIG. 11G illustratescase1160 where there is not a pattern match and a byte ofpattern601 has theOpcode602 set to “multiple wildcard” (shown as simply “multiple”). Inclock cycle1, the first byte “T” inpattern601 does not match the first byte “R” ininput data750. However, sinceOpcode602 is set to “multiple”, thepointer614 is held at its present position (in this case, first byte of pattern601). Inclock cycle2, the first byte “T” ofpattern601 does not compare with the second byte ininput data750. SinceOpcode602 remains set to “multiple”, thepointer614 is held at the first byte ofpattern601. Inclock cycle3, the first byte “T” ofpattern601 does compare with the third byte ofinput data750 andpointer614 is incremented to the second byte ofpattern601. Inclock cycle4, the second byte ofpattern601 does compare with the fourth byte ofinput data750 and thepointer614 is again incremented. In clock cycle5 (not shown), the third byte ofpattern601 matches the fifth byte ininput data750 and the pattern “TUV” is detected ininput data750.
ThePPDE100 has four matching modes: exact, longest, maximum and fuzzy. Exact matching may be used for aligned or non-aligned data and may incorporate the regular expressions such as single wildcard, multiple wildcard, inverse, or inclusive set. The exact matching mode may be utilized in applications such as network intrusion where line speed matching is critical and a binary match or not match response is only needed.
In the longest match mode, eachPU500 unit keeps track of the number of consecutive bytes matched and does not reset until the end of a pattern packet. In the longest match mode, eachPU500 outputs the number of matched bytes along with its ID to the ID selection unit114 (FIG. 1A).ID selection unit114 then outputs the ID of thePU500 with the maximum number of matched bytes along with the length value of the longest match to theoutput buffer105.
In the maximum matching mode, eachPU500 keeps track of the number of bytes matched and does not reset until the end of a pattern packet. In this mode, eachPU500 outputs the number of matched characters along with its ID to theID selection unit114. TheID selection unit114 then outputs the ID of thePU500 with the maximum number of matches and the value of the maximum number to theoutput buffer105.
In the fuzzy matching mode, eachPU500 “looks” for the closed pattern and then outputs the ID of thePU500 with the closest match and a corresponding distance value quantifying the closeness of the match toID selection unit114 which in turn outputs the results to theoutput buffer105. The distance is the result of a comparison between the input Pattern and the Reference pattern (RP) previously stored in memory. The distance calculation method is based on a norm that is user selectable. Several norm can be used, the norm can uses the “absolute value of a difference” operator. The successive elementary distances can be summed in the case of the Manhattan distance, i.e. dist=sum (abs (IEi-REi)) or the maximum value thereof is selected in the case of the maximum norm to determine the final distance. i.e. dist=max (abs (IEi-REi)) where IEi (Input Element) and REi (Reference Element) are the components of rank i (variable i varies from l to k) for the input pattern IP and the stored prototype Reference pattern RP respectively. Note that “abs” is an usual abbreviation for “absolute value”. Other norms exist, for instance the L2 norm such as dist=square root (sum (IEi-REi)2. The L2 norm is said to be “Euclidean” while the Manhattan and maximum norms are examples of “non-Euclidean” norms. Other Euclidean or non-Euclidean norms (such as the match/no match) are known for those skilled in the art. In particular, the “match/no match” norm, represented by the “match (IEi, REi)” operator is extensively used. The closest match is the pattern with the lowest result. Fuzzy matching is useful in image processing and real time data processing where the input data stream may have white noise superimposed on data.
FIG. 2A illustrates an example of theexact matching mode200 using aPPDE100 according to embodiments of the present invention.Patterns203 correspond toID numbers205 numbered l-n and identifyn PU500 units incorporated into aPPDE100.Input pattern201 would be sent in parallel to each of then PU500 units. In this mode,PPDE100 is programmed to find if any of the n patterns are found ininput data stream201. By inspection, one can see that only pattern “4” is found in its exact sequence in the portion ofinput data stream201 shown. In this case, the ID of thePU500 with the exact match (in this case, “4” is the ID) would be outputted (output204) to ID selection unit114 (not shown) which would send the value to output buffer105 (not shown).
FIG. 2B illustrates an example of thelongest match mode220 using aPPDE100 according to embodiments of the present invention. Again,input data stream201 is coupled in parallel ton PU500 units withID numbers205 numbered l-n. In this mode,PPDE100 is programmed to determine the most consecutive bytes in thepatterns213 that appear ininput data stream201. Again, by inspection one can see that pattern “4” has the longest match with 5 consecutive bytes “ABCDE” appearing in theinput data stream201. In this case, the ID of thePU500 with the longest match (in this case, “4” is the ID) would be outputted (output204) along with the longest match value of “5” (output206) to ID selection unit114 (not shown) which would send the value to output buffer105 (not shown).
FIG. 2C illustrates an example of themaximum match mode230 using aPPDE100 according to embodiments of the present invention. Again inputdata stream212 is coupled in parallel ton PU500 units withID numbers205 numbered l-n. In thismode PPDE100 is programmed to determine the maximum number of bytes in thepatterns223 that appear ininput data stream212 not necessarily in consecutive order. Again, by inspection one can see that pattern “4” has the maximum number with 5 matching bytes “ACYEF” appearing in theinput data stream212. In this case, the ID of thePU500 with the maximum number of matches (in this case, “4” is the ID) (output204) along with the maximum number value of “5” (output206) are outputted to ID selection unit114 (not shown) which would send the value to output buffer105 (not shown).
FIG. 2D illustrates an example of thefuzzy match mode240 using aPPDE100 according to embodiments of the present invention.Input data stream222 is coupled in parallel ton PU500 units withID numbers205 numbered l-n. In this example,input data stream222 is an analog signal which would be digitized and each 8 bit input value would be sent to then PU500 units in parallel. In this mode,PPDE100 is programmed to determine which of thepatterns233 most closely matchesinput data stream222. Again, by inspection one can see that pattern “4” has the closest match. In actual operation, distance circuitry611 (not shown) would be used to make this determination. In this case, the ID of thePU500 with the closest match (in this case, “4” is the ID) (output204) along with the distance value of “10” (output206) would be outputted to ID selection unit114 (not shown) which would send the value to output buffer105 (not shown).
FIG. 3 is a block diagram illustrating the scalability ofPPDE100. The architecture ofPPDE100 allows for multiple chips to be cascaded. This feature may be used to either increase the number of processing units or to increase the performance by splitting the input data amongst the several chips.FIG. 3 illustrates the direct correlation between the number of chips and the number ofPU500 units.Block303 shows the standard performance of onePPDE100 chip. AsPPDE100 chips are added (by cascading) along the X axis the performance increases. Also, as the number ofPU500 units perPPDE100 chip are added along the Y axis, the performance increases.Block301 illustrates that by adding 4 chips (1500PU500 units) processing is increased to 8 Gb/sec for 1500 patterns.Block304 illustrates using 4 chips to increase the number of patterns while maintaining the processing speed of 2 Gb/sec.Block302 illustrates adding 5 groups of 4 chips coupled to process 6000 patterns to allow a system that can process 6000 patterns at 10 Gb/sec.
FIG. 4 is a flow diagram of method steps used in embodiments of the present invention. Instep401, an N sequences of pattern data are loaded into M processing units (PUs)500 for comparing pattern data to input data. Instep402, identification data (ID) determining a unique ID for eachPU500 is loaded into eachPU500. Instep403, match mode data is loaded into eachPU500 setting criteria for indicating a match has occurred between selected pattern data and input data. Instep404, the first input data is sent in parallel to each of theM PU500 units. Instep405, the first input data is compared to selected pattern data selected by an address pointer in thePU500 units thereby generating a compare output signal in each of theM PU500 units in the same clock cycle. Instep406, the address pointers in eachPU500 unit is modified in response to a logic state of the compare output signal and an operation code stored with the selected pattern data in each PU unit. Instep407, a match ID is selected from the ID data in response to a pattern match signal indicating selected pattern data has been detected in the input data. Instep408, the match ID and the match data corresponding to the match ID are stored. Instep409, a test is done to determine if all the input data has been processed. If the result of the test instep409 is NO, then instep411 additional input data is sent to thePU500 units in parallel. If the result of the test instep409 is YES, then instep410 the process is stopped.
A representative hardware environment for practicing the present invention is depicted inFIG. 10, which illustrates a typical hardware configuration of a workstation in accordance with the subject invention having central processing unit (CPU)1034 with one PPDE or a plurality ofPPDEs100 chips and other units interconnected viasystem bus1012. The workstation shown inFIG. 10 includes random access memory (RAM)1014, read only memory (ROM)1016, and input/output (I/O)adapter1018 for connecting peripheral devices such asdisk units1020 andtape drives1040 tobus1012,user interface adapter1022 for connectingkeyboard1024,mouse1026,speaker1028,microphone1032, and/or other user interface devices such as a touch screen device (not shown) tobus1012,communication adapter1035 for connecting the workstation to a data processing network, anddisplay adapter1036 for connectingbus1012 to displaydevice1038. Input data120 ( input data stream, pattern data, and various control data) may be provided to thePPDE100 chips inCPU1034 from varioussources including network1041,disk unit1020, tape drives1040 or form various input devices such asmicrophone1032,keyboard1024, etc. Other input devices, such as fingerprint readers and voice recognition units, may provide input data streams that are matched against stored patterns using one ormore PPDE100 chips according to embodiments of the present invention.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.