FIELD OF THE DISCLOSURE
This disclosure relates generally to the field of network processing. In particular, the disclosure relates to a novel filter architecture to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection.
BACKGROUND OF THE DISCLOSURE
In modern networks, applications such as intrusion detection/prevention and virus detection are important for protecting the networks and/or network users from attacks. In such applications network packets are often inspected to identify problematic packets by finding matches to a known set of data patterns. Matching every byte of an incoming data stream against a large database of patterns (e.g. up to hundreds of thousands) is very compute-intensive. Programs have used techniques such as finite-state machines and filters to find matches to known sets.
A Bloom filter, conceived by Burton H. Bloom in 1970, is a probabilistic structure for determining whether an element is a member of a set. Hashing is performed on the element. Multiple different hash functions are used to generate multiple different hash indices into an array of bits. To add or insert an element into the set, these hash functions are used to index multiple bit locations in the array for the element, and these bit locations are then set to one. To query the filter for an arbitrary element, the hash functions are used to index multiple bit locations in the array for the element, and these bit locations are then checked to see if they are all set to one. If they are not all set to one, the arbitrary element in question is not a member of the set. If they are all set to one, the element is reported as a member of the set, although that report may be incorrect.
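The insert and query steps described above can be sketched in Python as follows. This is a minimal illustration only, not the disclosed apparatus: the class name `BloomFilter`, the method names, and the use of salted BLAKE2b digests as the k different hash functions are assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: an array of m bits indexed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _indices(self, element: bytes) -> Iterator[int]:
        # Assumption: the k "different" hash functions are obtained by
        # salting BLAKE2b with the function number j.
        for j in range(self.k):
            digest = hashlib.blake2b(
                element, digest_size=8, salt=j.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, element: bytes) -> None:
        # Insert: set every indexed bit location to one.
        for idx in self._indices(element):
            self.bits[idx] = 1

    def might_contain(self, element: bytes) -> bool:
        # Query: a single zero bit proves the element was never inserted.
        return all(self.bits[idx] for idx in self._indices(element))
```

A `False` result from `might_contain` is definitive, while a `True` result only means that the element may be in the set.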
Whenever a filter generates a positive outcome for an element that is not actually a member of the set, the outcome is called a false positive. The Bloom filter will not generate a false negative. It is a goal of any particular filter design that the probability of false positives is “small.” For Bloom filters, after inserting n elements into a set represented by an array of m bits using k different hash functions, the probability of a false positive is $\left(1-\left(1-\frac{1}{m}\right)^{kn}\right)^{k}$.
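As a worked illustration, the false-positive expression above can be evaluated numerically. The function name `false_positive_probability` and the sample values of n, m, and k are hypothetical and chosen only for the example.

```python
def false_positive_probability(n: int, m: int, k: int) -> float:
    """Probability of a false positive after inserting n elements into an
    m-bit Bloom filter that uses k hash functions."""
    # (1 - 1/m)^(k*n) is the chance that a given bit is still zero.
    still_zero = (1.0 - 1.0 / m) ** (k * n)
    # A false positive requires all k probed bits to be one.
    return (1.0 - still_zero) ** k


if __name__ == "__main__":
    # Hypothetical sizing: 1000 patterns, 16384 bits, 4 hash functions.
    print(false_positive_probability(n=1000, m=16384, k=4))
```

With these sample values the probability comes out at roughly 0.2 percent, and it grows as more elements are inserted into the same array.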
Designing a filter for a specific problem may be tedious, and at high data rates it is difficult or impossible for state-of-the-art processors to implement the design at rates even close to line-rate. To achieve rates close to one or more gigabits per second, specialized field-programmable gate array solutions or custom circuits have been proposed.
To date, more generalized reconfigurable architectures to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection have not been fully explored.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings.
FIG. 1 illustrates one embodiment of a filter apparatus to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection.
FIG. 2 illustrates a flow diagram for one embodiment of a process to initialize a filter apparatus for string matching in packet inspection.
FIG. 3 illustrates a flow diagram for one embodiment of a process to utilize a filter apparatus for string matching in packet inspection.
FIG. 4 illustrates one embodiment of a system employing a filter apparatus to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection.
DETAILED DESCRIPTION
Methods and apparatus to perform string matching for network packet inspection are disclosed below. In some embodiments, a filter apparatus may be configured as a set of string matching slice circuits, each slice circuit of the set being configured to perform string matching steps in parallel with other slice circuits. Each slice circuit may include an input window storing some number of bytes of data from an input data stream. The input window of data may be padded if necessary, and may be multiplied by a distinct Galois-field polynomial modulo an irreducible Galois-field polynomial to generate a hash index. A storage location of a memory slice corresponding to the hash index may be accessed to generate a slice-hit signal of a plurality of slice-hit signals. The slice-hit signal may be provided to an AND-OR logic array where the plurality of slice-hit signals is logically combined into a match result.
Embodiments of such methods and apparatus represent reconfigurable architectures to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. These and other embodiments of the present invention may be realized in accordance with the following teachings and it should be evident that various modifications and changes may be made in the following teachings without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense and the invention measured only in terms of the claims and their equivalents.
FIG. 1 illustrates one embodiment of a filter apparatus 101 to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection. Filter apparatus 101 as shown includes an input data stream 120, which may be in a system memory or may comprise an optional data stream buffer of filter apparatus 101 for storing packed data for inspection and/or a pattern database to initialize filter apparatus 101. Filter apparatus 101 also includes a set of H (e.g. 1-8) slice circuits 110-150, each i-th slice circuit of the set being configurable to provide an i-th slice-hit signal to a configurable AND-OR logic array 140 as one of a set of H slice-hit signals. Slice circuits 110-150 respectively include input windows 111-151, each configurable to store W_i (e.g. 2-8) bytes of data from input data stream 120, and Ghash units 112-152 coupled with input windows 111-151 and configurable to receive the W_i bytes of data, to pad the W_i bytes of data if necessary, and to multiply their respective W_i bytes of data by a polynomial modulo an irreducible Galois-field polynomial to generate an index.
It will be appreciated that some embodiments of filter apparatus 101 may use the same irreducible Galois-field polynomial in each of the Ghash units 112-152 with H distinct polynomial multipliers selected at random (each having a good mixture of 1's and 0's) to generate H distinct hash indices, thus simplifying the task of generating distinct hash indices for each Ghash unit. It will also be appreciated that in embodiments of filter apparatus 101 where, unlike the Bloom filter, input windows 111-151 are independently configurable to store W_i bytes of data from input data stream 120, the filter apparatus 101 may be used to solve multiple problems of different sizes (e.g. a 2-byte match, a 3-byte match, a 6-byte match, and an 8-byte match, etc.) at the same time in parallel.
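The Galois-field hashing performed by a Ghash unit can be sketched in software as follows. This is a hedged illustration under several assumptions that the disclosure does not fix: the field is taken to be GF(2^64) with the irreducible polynomial x^64 + x^4 + x^3 + x + 1, windows are zero-padded on the right to 8 bytes, and the index is taken from the top bits of the product. The names `gf_mul` and `ghash_index` are hypothetical.

```python
FIELD_BITS = 64
# Assumed irreducible polynomial: x^64 + x^4 + x^3 + x + 1.
IRREDUCIBLE = (1 << FIELD_BITS) | 0x1B


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^64) elements modulo IRREDUCIBLE (shift-and-XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> FIELD_BITS:
            a ^= IRREDUCIBLE     # reduce when the degree reaches 64
    return result


def ghash_index(window: bytes, multiplier: int, index_bits: int) -> int:
    """Hash a window of up to 8 bytes to an index of index_bits bits."""
    if len(window) > FIELD_BITS // 8:
        raise ValueError("window is wider than the field")
    padded = window.ljust(FIELD_BITS // 8, b"\x00")  # assumed zero padding
    value = int.from_bytes(padded, "big")
    product = gf_mul(value, multiplier)
    # Assumption: the index is the top index_bits bits of the product.
    return product >> (FIELD_BITS - index_bits)
```

Giving each slice its own randomly chosen `multiplier` while sharing `IRREDUCIBLE` mirrors the simplification described above, in which only the multipliers differ between Ghash units.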
Slice circuits 110-150, respectively, also include memories 113-153 coupled with the Ghash units 112-152 and configurable to access respective storage locations responsive to their respective indices (e.g. at the addresses specified by some field of bits from respective indices) to each generate an i-th slice-hit signal and to provide the i-th slice-hit signal to AND-OR logic array 140 as one of the set of H slice-hit signals 115-155. Some embodiments of memories 113-153 are configurable from a larger memory 130 to serve as individual memories 113-153 for slice circuits 110-150 respectively. Some alternative embodiments of memories 113-153 may be N-entry (e.g. 1K entries) read/write random-access memories (RAMs) of fixed width (e.g. 64 bits wide) and are configurable to be combined into larger memories (e.g. memory 130) as necessary (e.g. when a very large set of patterns is required). Slice circuits 110-150 may also include multiplexers 114-154, respectively, configurable to access respective bit storage locations responsive to portions of their respective indices to generate the i-th slice-hit signal and to provide the i-th slice-hit signal to AND-OR logic array 140 as one of the set of H slice-hit signals 115-155.
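One way to picture the memory and multiplexer of a slice circuit is the following sketch, which uses the example dimensions given above (1K entries, 64 bits wide). The split of the hash index into an upper word address and a lower 6-bit bit select is an assumption, as are the names `MemorySlice`, `set_hit`, and `slice_hit`.

```python
from typing import Tuple

WORD_BITS = 64  # example width from the text


class MemorySlice:
    """N-entry RAM of 64-bit words holding one slice-hit bit per index."""

    def __init__(self, entries: int = 1024) -> None:
        self.words = [0] * entries

    def _split(self, index: int) -> Tuple[int, int]:
        # Assumption: upper index bits address a word; the low 6 bits
        # drive the multiplexer that selects one bit within that word.
        address = (index // WORD_BITS) % len(self.words)
        bit_select = index % WORD_BITS
        return address, bit_select

    def set_hit(self, index: int) -> None:
        # Used during initialization to record a pattern.
        address, bit_select = self._split(index)
        self.words[address] |= 1 << bit_select

    def slice_hit(self, index: int) -> bool:
        # Used during inspection to produce the slice-hit signal.
        address, bit_select = self._split(index)
        return bool((self.words[address] >> bit_select) & 1)
```

With 1024 words of 64 bits each, a single slice in this sketch holds 65,536 bit locations, so a 16-bit index addresses it exactly.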
AND-OR logic array 140 is configurable to receive a set of H slice-hit signals 115-155 and to combine the set of H slice-hit signals 115-155 into a match result 145, a copy of which may be stored as a match result 185. Some embodiments of AND-OR logic array 140 may be configurable to perform a simple AND (e.g. as in a Bloom filter) or a simple OR (e.g. as in solving multiple problems of different sizes in parallel) of the set of H slice-hit signals 115-155 to get a match result 145. Alternative embodiments of AND-OR logic array 140 may be configurable to perform a complex AND-OR of the set of H slice-hit signals 115-155 (e.g. temp_k = (AND of slice-hit signal_i for all i in a set S_k), and then the final match result = (OR of temp_k for all k)) to get a match result 145. The complex AND-OR of the set of H slice-hit signals 115-155 may be used, for example, in embodiments of filter apparatus 101 to provide multiple Bloom filters in parallel.
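The three combining modes described above (simple AND, simple OR, and complex AND-OR) can all be expressed with one small function. This is a sketch only; the function name `and_or_match` and the representation of each set S_k as a sequence of slice numbers are assumptions.

```python
from typing import Iterable, Sequence


def and_or_match(slice_hits: Sequence[bool],
                 groups: Iterable[Sequence[int]]) -> bool:
    """Combine slice-hit signals into a match result.

    Each group plays the role of a set S_k: the hits of its slices are
    ANDed into temp_k, and the temp_k values are ORed together.
    Groups are assumed to be non-empty.
    """
    return any(all(slice_hits[i] for i in group) for group in groups)


if __name__ == "__main__":
    hits = [True, False, True, True]
    h = len(hits)
    print(and_or_match(hits, [range(h)]))               # simple AND
    print(and_or_match(hits, [[i] for i in range(h)]))  # simple OR
    print(and_or_match(hits, [[0, 1], [2, 3]]))         # two Bloom filters
```

A single group containing every slice behaves like one Bloom filter, one group per slice behaves like H independent matchers, and disjoint groups behave like several Bloom filters operating in parallel.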
It will be appreciated that when a final match result is positive, a verification process may be used to check against false positives. Such a verification process may be relatively slower than using filter apparatus 101, and so the configuration of filter apparatus 101 should be carefully made to avoid frequent false positives.
FIG. 2 illustrates a flow diagram for one embodiment of a process 201 to initialize a filter apparatus for string matching in packet inspection. Process 201 and other processes herein disclosed are performed by processing blocks that may comprise dedicated hardware or software or firmware operation codes executable by general purpose machines or by special purpose machines or by a combination of both.
In processing block 211 a set of H slice circuits is configured. In processing block 212, i is set to zero (0). In processing block 213, i is incremented. In processing block 214, i is checked to see if it has exceeded H. It will be appreciated that even though initialization of the H slice circuits is shown as an iterative process 201, in at least some preferred embodiments of process 201, the set of H slice circuits is configured to concurrently perform initialization according to processing blocks 215-220 of process 201 for use in string matching during network packet inspections. Therefore, for each of the H slice circuits, processing blocks 215-220 are executed as follows, before proceeding to processing block 222.
In processing block 215 W_i bytes of data are stored from an input data stream in an i-th input window. In processing block 216 the W_i bytes of data are padded if necessary. Then in processing block 217 the W_i bytes of data are multiplied by a Galois-field polynomial modulo an irreducible Galois-field polynomial to generate an i-th hash index. In processing block 218 a storage location of a memory corresponding to the i-th hash index is accessed, and in processing block 220 an i-th slice-hit signal is stored (i.e. set) in the storage location of the memory corresponding to the i-th hash index. When all of the H slice circuits have completed processing blocks 215-220 of process 201, processing proceeds to processing block 222 where a pointer in the input data stream is moved (e.g. to a new string in the database). Then from processing block 224, if the data stream is empty, processing terminates. Otherwise processing repeats in processing block 212.
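The initialization flow above can be sketched in software as follows. Several simplifications are assumed: every slice takes the leading W_i bytes of each pattern, padding uses zero bytes, each memory is modeled as one byte per bit location, and the caller-supplied `hash_fns` stand in for the Ghash units. The name `initialize_slices` is hypothetical, and the loop is sequential where the hardware slices would work concurrently.

```python
from typing import Callable, Iterable, List, Sequence

HashFn = Callable[[bytes], int]


def initialize_slices(patterns: Iterable[bytes],
                      window_sizes: Sequence[int],
                      hash_fns: Sequence[HashFn],
                      memory_bits: int) -> List[bytearray]:
    """Set one slice-hit bit per slice for every pattern in the database."""
    memories = [bytearray(memory_bits) for _ in window_sizes]
    for pattern in patterns:  # block 222: move to the next pattern
        for i, (width, hash_fn) in enumerate(zip(window_sizes, hash_fns)):
            # Blocks 215-216: fill the i-th window and pad if necessary.
            window = pattern[:width].ljust(width, b"\x00")
            # Block 217: generate the i-th hash index.
            index = hash_fn(window) % memory_bits
            # Blocks 218-220: access the location and set the hit bit.
            memories[i][index] = 1
    return memories  # block 224: the pattern stream is exhausted
```

Because the body of the inner loop depends only on slice i, the H iterations are independent of one another, which is what allows the slice circuits to perform them in parallel.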
It will be appreciated that the process 201 may be iterated hundreds to hundreds of thousands of times in order to initialize a filter apparatus for string matching patterns in packet inspection. Thus, when the set of H slice circuits is configured to concurrently perform initialization, substantial performance improvements may be realized. It will also be appreciated that the process 201 of initializing a filter apparatus (by setting slice-hit signals) may be performed in a manner substantially similar to a process of utilizing a filter apparatus for string matching (by reading the slice-hit signals) in packet inspection. In some embodiments of processing block 222 a pointer into the input data stream may be moved for each i-th slice in such a way as to provide each i-th slice with a new complete pattern, whereas in utilizing a filter apparatus for string matching a pointer into the input data stream may be simply incremented.
FIG. 3 illustrates a flow diagram for one embodiment of a process 301 to utilize a filter apparatus for string matching in packet inspection. In processing block 311 a set of H slice circuits is configured. In processing block 312, i is set to zero (0). In processing block 313, i is incremented. In processing block 314, i is checked to see if it has exceeded H. Again, it will be appreciated that even though utilization of the H slice circuits is shown as an iterative process 301, in at least some preferred embodiments of process 301, the set of H slice circuits is configured to concurrently perform string matching according to processing blocks 315-321 of process 301 for use during network packet inspections. Therefore, for each of the H slice circuits, processing blocks 315-321 are executed as follows, before proceeding to processing block 323.
In processing block 315 W_i bytes of data are stored from an input data stream in an i-th input window. In processing block 316 the W_i bytes of data are padded if necessary. Then in processing block 317 the W_i bytes of data are multiplied by a Galois-field polynomial modulo an irreducible Galois-field polynomial to generate an i-th hash index. In processing block 319 a storage location of a memory corresponding to the i-th hash index is accessed to generate an i-th slice-hit signal of a set of H slice-hit signals. In processing block 321 the i-th slice-hit signal is provided to an AND-OR logic array as one of the set of H slice-hit signals. When all of the H slice circuits have completed processing blocks 315-321 of process 301, processing proceeds to processing block 323 where the AND-OR logic array is configured to receive the set of H slice-hit signals and to combine the set of H slice-hit signals into a match result. Then from processing block 323 processing terminates.
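The lookup flow above differs from initialization only in that the addressed bit is read rather than set. A minimal sketch for one position of the input data stream follows. The caller-supplied `hash_fns` again stand in for the Ghash units, memories are modeled as one byte per bit location, zero padding is assumed, and the final combination shown is the simple AND configuration. The names `query_slices` and `bloom_match` are hypothetical.

```python
from typing import Callable, List, Sequence

HashFn = Callable[[bytes], int]


def query_slices(stream: bytes,
                 position: int,
                 window_sizes: Sequence[int],
                 hash_fns: Sequence[HashFn],
                 memories: Sequence[bytearray]) -> List[bool]:
    """Return the H slice-hit signals for one position in the stream."""
    hits = []
    for width, hash_fn, memory in zip(window_sizes, hash_fns, memories):
        # Blocks 315-316: fill the window from the stream, pad if short.
        window = stream[position:position + width].ljust(width, b"\x00")
        # Block 317: generate the hash index.
        index = hash_fn(window) % len(memory)
        # Blocks 319-321: read the addressed bit as the slice-hit signal.
        hits.append(bool(memory[index]))
    return hits


def bloom_match(hits: Sequence[bool]) -> bool:
    # Block 323, simple AND configuration of the AND-OR logic array.
    return all(hits)
```

Calling `query_slices` at successive positions corresponds to simply incrementing the pointer into the input data stream between lookups.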
It will be appreciated that iterations of process 301 may be configured in accordance with embodiments of filter apparatus 101 to substantially accelerate string matching in packet inspection.
FIG. 4 illustrates one embodiment of a system 401 employing a filter 480 to accelerate string matching in packet inspection for network applications such as intrusion detection/prevention and virus detection.
System 401 includes an input data stream 420, which may be in system memory 470 as shown, or may comprise an optional data stream buffer of filter 480 for storing packed data for inspection and/or a pattern database to initialize filter 480.
Filter 480 includes a set of H slice circuits 410-450, each i-th slice circuit of the set being configurable to provide an i-th slice-hit signal to a configurable AND-OR logic array 440 as one of a set of H slice-hit signals. Slice circuits 410-450 respectively include input windows 411-451, each configurable to store W_i bytes of data from input data stream 420, and Ghash units 412-452 coupled with input windows 411-451 and configurable to receive the W_i bytes of data, to pad the W_i bytes of data if necessary, and to multiply their respective W_i bytes of data by a polynomial modulo an irreducible Galois-field polynomial to generate an index.
Slice circuits 410-450, respectively, also include memories 413-453 coupled with the Ghash units 412-452 and configurable to access respective storage locations responsive to their respective indices to each generate an i-th slice-hit signal and to provide the i-th slice-hit signal to AND-OR logic array 440 as one of the set of H slice-hit signals 415-455. Memories 413-453 may be N-entry read/write RAMs of any fixed width and configurable to be combined into larger memories (e.g. memory 430) as necessary. Alternatively, some embodiments of memories 413-453 may be configurable from a larger memory 430. Slice circuits 410-450 may also include multiplexers 414-454, respectively, configurable to access respective bit storage locations responsive to portions of their respective indices to generate the i-th slice-hit signal and to provide the i-th slice-hit signal to AND-OR logic array 440 as one of the set of H slice-hit signals 415-455. AND-OR logic array 440 may receive the set of H slice-hit signals 415-455 and combine the set of H slice-hit signals 415-455 into a match result 445.
System 401 also includes system processor 460 to execute a program 471 in system memory 470 to accelerate string matching in packet inspection for network applications using filter 480, and to move or increment a pointer 461 into input data stream 420 until a match result 445 is positive (in the case of string matching for packet inspections) or until an end-of-file is reached in the input data stream 420. In some embodiments of system 401, processor 460 may check a copy of match result 445 stored in system memory 470 as a match result 485 when string matching for packet inspections to determine if match result 445 was positive.
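The role of the program running on the system processor can be sketched as a scan loop that advances a pointer through the data stream and consults the filter at each position. In this illustration the callable `filter_hit` is a stand-in for the filter, the set `patterns` is used for the slower verification of positive results mentioned earlier, and a single fixed window width is assumed. The name `scan` and the choice to keep scanning after a false positive are assumptions.

```python
from typing import Callable, Optional, Set


def scan(stream: bytes,
         window: int,
         filter_hit: Callable[[bytes], bool],
         patterns: Set[bytes]) -> Optional[int]:
    """Return the offset of the first verified match, or None at end of data."""
    pointer = 0
    while pointer + window <= len(stream):
        candidate = stream[pointer:pointer + window]
        # Fast path: the filter rejects most positions immediately.
        if filter_hit(candidate):
            # Slow path: verify, because the filter may report
            # false positives.
            if candidate in patterns:
                return pointer
        pointer += 1  # increment the pointer into the data stream
    return None
```

The slow membership check runs only when the filter reports a hit, which is why a low false-positive rate matters for overall throughput.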
The above description is intended to illustrate preferred embodiments of the present invention. From the discussion above it should also be apparent that, especially in such an area of technology, where growth is fast and further advancements are not easily foreseen, the invention may be modified in arrangement and detail by those skilled in the art without departing from the principles of the present invention within the scope of the accompanying claims and their equivalents.