WO2002093934A1 - Image compression and transmission - Google Patents

Image compression and transmission

Info

Publication number
WO2002093934A1
WO2002093934A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion vectors
array
hash values
frame
generating
Prior art date
Application number
PCT/GB2002/002236
Other languages
French (fr)
Inventor
Farrukh N. Alavi
G. M. Megson
Original Assignee
Salgen Systems Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Salgen Systems Limited
Publication of WO2002093934A1

Abstract

A method of compressing image data comprising the steps of generating a set of motion vectors representative of one or more image frames, generating, by means of a predetermined hash function, a set of hash values corresponding to said motion vectors, and storing as a code book said hash values in the form of a table or array. Vector quantisation may be used to index the hash values in the table or array to enable them to be retrieved by a decoder.

Description

IMAGE COMPRESSION AND TRANSMISSION
This invention relates to a method and apparatus for compression of images, in particular moving images such as video sequences and the like, for transmission across a communication network.
Digital video has been developed to a great extent over recent years and, in view of the large range of applications to which it lends itself, particularly with the very high uptake and growth in personal computers and workstations and the popularity of the global Internet, substantial research and development has been dedicated to techniques for compression, decompression and transmission of video. In general, the aim is to improve the efficiency of compression as well as the effectiveness of the transport medium.
From a technical perspective, the main aim is to reduce both storage and transmission costs, i.e. to improve coding efficiency. However, one of the main concerns is the inherent trade-off between coding efficiency and video fidelity. Industry standards such as H.261 and MPEG define standard formats for compressed video data (but not implementations), such that video fidelity can be improved as better codecs are developed without having to redefine the standard. Further, the defined standards enable a range of bitrates to be supported, so that the quality of the reproduced video becomes a function of the cost of the hardware that the user can afford.
Thus, MPEG specifies both a syntax and a semantics for a legal video bitstream at the encoder stage, and a definition for synchronisation and demultiplexing of the bitstream into its constituent parts (i.e. video, audio and other data) at the decoder stage, the latter permitting the video playback quality to scale with the abilities of the target hardware.
The video algorithms defined by MPEG are based on a class of video compression algorithms that aim to maximally reduce the natural spatio-temporal redundancy both within and between video frames in order to deliver compression. A key feature to exploit in such redundancy elimination is that of the motion of rigid bodies in a sequence of frames. Clearly, by encoding the relevant object once, and subsequently transmitting merely its spatial translation, much irrelevant data is eliminated from the encoding process. Algorithms which attempt to achieve this effect are known as motion compensation algorithms.
Substantial work has been carried out to develop sophisticated models for motion compensation. Two main classes of such algorithms have become predominant, namely block-matching algorithms (BMA) which look at the translation of groups of pixels, and pel-recursive algorithms which are concerned with individual pixel translations.
By their very nature, pel-recursive algorithms produce better video fidelity. However, such algorithms are also more expensive to compute. BMA routines have therefore become the de facto standard in a great majority of modern implementations, and a substantial amount of research and development has been put into improving BMA over the first basic procedure outlined back in 1981. Such improvements aim to reduce BMA's computational expense, as well as increase its overall quality.
It is well known that video sequences contain both intra-frame (spatially local) and inter-frame (temporally global) correlations, and methods to exploit this redundancy have been considered since the early 1970s. The earliest method considered interframe ('delta-coded') sequences, where the intensity differences between pels in successive frames were coded, and this method provided the basis for all modern predictive coding techniques. The basic idea is to look at the following two variables:
* p(x, y; t): the value of a pixel at location (x, y) at time t; and
* p_pred(x, y; t): the predicted value of the same pixel.
The difference between the two, ε = p_pred(x, y; t) - p(x, y; t), is known as the error signal or residual that is to be transmitted to the receiver, where it is combined with p_pred(x, y; t) to reconstruct p(x, y; t). Clearly, the better the predicted value, the smaller the error signal or residual (ε); hence, compression is optimised by minimising the residual. The overall compression can be further improved if the error signal is transformed (the Discrete Cosine Transform, or DCT, is the usual preferred approach), leading to the so-called hybrid coding techniques.
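To make the arithmetic concrete, the following sketch computes ε for a pair of tiny frames and shows that the decoder recovers the original exactly; the frame values and the trivial previous-frame predictor are illustrative assumptions, not data from the patent.

```python
import numpy as np

# Illustrative frames (values invented for this sketch).
prev_frame = np.array([[10, 12], [14, 16]], dtype=np.int16)  # p(x, y; t-1)
curr_frame = np.array([[11, 12], [15, 18]], dtype=np.int16)  # p(x, y; t)

# Trivial predictor: assume no motion, so p_pred(x, y; t) = p(x, y; t-1).
predicted = prev_frame

# Error signal, using the sign convention above:
# epsilon = p_pred(x, y; t) - p(x, y; t)
residual = predicted - curr_frame

# The receiver combines p_pred with the residual to reconstruct p(x, y; t).
reconstructed = predicted - residual
assert np.array_equal(reconstructed, curr_frame)
```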
Even further compression results if the motion of rigid bodies is taken into account. For still frames, the position vector of an object, represented as r = (x, y), does not change between frames. However, where there is motion, some translated r' = (x', y') makes a better predictor. Hence, the motion vector Δr = r - r' can be encoded and transmitted in addition to ε, at least for the pels for which motion can be identified. This is known as motion-compensated interframe predictive coding. The evaluation of Δr at the encoder is called motion estimation, whereas, at the decoder end, the exploitation of this information in pel reconstruction is called motion compensation.
Various motion compensation algorithms have been proposed and, as stated above, the block matching algorithm (BMA) remains the most widely used, primarily for the simplicity of its concept and its hardware realisability. The BMA typically begins by partitioning a frame of video pixels into non-overlapping macroblocks of size N x N. Each macroblock in the frame being encoded (the 'current block') is compared with potential matches ('candidate blocks') in the previous, or reference, frame. For a maximum vector displacement of w pixels, a given macroblock is searched within a search window of size (N+2w) x (N+2w), as shown in Figure 1 of the drawings. The range of the motion vector is constrained by controlling the size of the search window. The displacement is taken to be that comparison which maximises or minimises a function, a distortion measure, representing the matching criterion. Many such functions have been proposed, such as the cross-correlation function (CCF), the mean square error (MSE), the mean absolute error (MAE) and the cross-search algorithm (CSA).
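A minimal software rendering of this exhaustive block-matching search is sketched below, using the MAE as the distortion measure; the function name and parameters are illustrative assumptions rather than anything mandated by the standards.

```python
import numpy as np

def block_match(current_block, ref_frame, bx, by, w):
    """Exhaustive BMA sketch: scan the (N+2w) x (N+2w) window of the
    reference frame around the current block's position (bx, by) and
    return the displacement (dx, dy) that minimises the MAE."""
    N = current_block.shape[0]
    H, W = ref_frame.shape
    best_mae, best_vector = np.inf, (0, 0)
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + N > W or y + N > H:
                continue  # candidate block would fall outside the frame
            candidate = ref_frame[y:y + N, x:x + N].astype(float)
            mae = np.mean(np.abs(candidate - current_block))
            if mae < best_mae:
                best_mae, best_vector = mae, (dx, dy)
    return best_vector, best_mae
```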
As stated above, block matching algorithms represent a trade-off between block reconstruction accuracy and hardware/computational expense vis-à-vis pel-recursive techniques. Furthermore, the magnitude of the motion vectors generated by means of block-matching can be relatively large, which is counter-productive within a compression strategy, especially in the case where the video data is to be transmitted at a relatively low bitrate, in which case the proportion of the transmission burst assigned to motion vectors can become disproportionate. It is for this reason that residuals from motion estimation are first compressed themselves (by means of a lossy transform encoder and an entropy encoder) before transmission. This encoding scheme has been adopted by the MPEG, H.261 and H.263 standards.
In spite of the trade-off in quality which is inherent in block-matching algorithms, the search for good motion vectors adds substantial computational overhead to the encoding process. It takes substantially longer to perform motion estimation than it does to perform motion compensation. In fact, the block-matching process is the most time-consuming part of the entire encoding process. Thus, in encoding MPEG video, the algorithms must perform a tight balancing act between the conflicting requirements of short encoding times, high image quality and high compression ratios. Encoding times can be reduced by reducing the search area for good motion vectors, but this has a direct impact on the image quality, which in turn varies inversely with the compression ratio.
The problem is even more severe for H.261, which is intended for video conferencing applications and the like, in which case the encoding process takes place on-line.
We have now devised a technique which overcomes the problems outlined above and provides a method and apparatus for encoding image data which substantially reduces motion estimation times relative to the prior art techniques identified above.
In accordance with a first aspect of the present invention, there is provided a method of compressing image data comprising the steps of generating a set of motion vectors representative of one or more image frames, generating, by means of a predetermined hash function, a set of hash values corresponding to said motion vectors, and storing as a codebook said hash values in the form of a table or array.
Also in accordance with the first aspect of the present invention, there is provided an apparatus for compressing image data comprising means for generating a set of motion vectors representative of one or more image frames, means for generating, using a predetermined hash function, a set of hash values corresponding to said motion vectors, and codebook means for storing said hash values in the form of a table or array.
In accordance with a second aspect of the present invention, there is provided a method of compressing image data comprising the steps of generating a set of motion vectors representative of one or more image frames, storing as a codebook data representative of said motion vectors in the form of a table or array, and using vector quantisation to index the data stored in said table or array for retrieval of said data by decoding means.
Also in accordance with the second aspect of the present invention, there is provided an apparatus for compressing image data comprising means for generating a set of motion vectors representative of one or more image frames, codebook means for storing data representative of said motion vectors in the form of a table or array, and vector quantisation means for indexing the data stored in said table or array so that it can be retrieved.
An exemplary embodiment of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram illustrating a macroblock and search window used in a BMA compression technique according to the prior art;
Figure 2 is a schematic diagram illustrating the integration of a vector quantiser codebook with a hash table, the diagram showing a hash table with buckets, each having M slots;
Figure 3 is a schematic diagram illustrating an exemplary embodiment of hardware for Vector Quantised Hashing (VQH), the name of our proposed algorithm for motion estimation; and
Figure 4 is a graph showing the plot of P_BMA = N^2 A^2.
The concept of a look-up table is well known in engineering, and may be defined as a set of (name, attribute) pairings for storing data items. There are three basic operations which may be required to be performed on such a look-up table:
1. Insert a data item
2. Delete a data item
3. Search for a data item
Intuitively, since tables are stored just like arrays, such operations may be expected to cost O(n) for n items. However, in accordance with this exemplary embodiment of the present invention, better performance can be obtained by the use of a technique known as 'hashing', whereby the search criterion replaces a sequence of operations by a single operation involving the computation of a function known as a 'hash function'.
For the purpose of the present description, assume that the size of the hash table is fixed (i.e. 'static hashing', as opposed to 'dynamic hashing' in which the table size may vary). The address of a data item x stored within the hash table may be computed by evaluating the hash function h(x). Typically, hash tables are partitioned into b 'buckets', with each bucket consisting of s 'slots'. Each slot is capable of storing exactly one data item, and it is often the case that s=1, i.e. each bucket stores just one data item.
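The bucket-and-slot organisation just described can be sketched as follows; the modulus hash function and the default sizes b and s are placeholders for whatever a real design would choose.

```python
class StaticHashTable:
    """Static hash table with b buckets of s slots each (s = 1 gives
    the common one-item-per-bucket case described above)."""

    def __init__(self, b=16, s=4):
        self.b, self.s = b, s
        self.buckets = [[] for _ in range(b)]

    def h(self, x):
        # The hash function: one evaluation replaces an O(n) table scan.
        return hash(x) % self.b

    def insert(self, x):
        bucket = self.buckets[self.h(x)]
        if len(bucket) >= self.s:
            raise OverflowError("bucket full")  # the overflow case above
        bucket.append(x)

    def search(self, x):
        return x in self.buckets[self.h(x)]
```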
The construction of the hash function h(·) is the most crucial aspect of designing a hash table. Not only should h(·) be easy to compute, but it should also ideally generate a unique address within the hash table for each argument. It is not possible for the hash table to hold every possible value of the argument. Hence, it has been found that collisions often occur, i.e. h(x) = h(y) for two data items x ≠ y. Another problem is that of overflow, whereby a data item is mapped by h(·) into an existing bucket that is already full. Ideally, therefore, hash functions should be designed to minimise the possibility of both collisions and overflow.
It has been found that there are advantages to encoding groups of image sequences, as opposed to encoding individual samples. A technique known as vector quantisation (VQ) utilises this finding and offers a way of performing lossy compression along the way.
VQ is essentially the multi-dimensional generalisation of scalar quantisation, as is commonly employed in analog-to-digital conversion processes. In analytical terms, if X is an N-dimensional source vector, then VQ is a mapping Q such that:

Q : R^N → C

where C is an L-dimensional set, L < N, such that C = {Y_1, ..., Y_L}, and the Y_i ∈ R^N for i = 1, ..., L. C is usually termed the 'codebook', and the Y_i the 'code vectors'. The VQ operator Q partitions R^N into L disjoint and exhaustive regions {P_1, ..., P_L}, each of which has a single coarse-grained representation.
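In code, the mapping Q reduces to a nearest-code-vector search; the Euclidean distance and the toy codebook below are illustrative assumptions.

```python
import numpy as np

def vq_encode(X, codebook):
    """Map a source vector X in R^N to the index i of its nearest code
    vector Y_i, i.e. identify the region P_i into which X falls."""
    distances = np.linalg.norm(codebook - X, axis=1)
    return int(np.argmin(distances))

# Toy codebook C with L = 3 code vectors in R^2:
C = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
print(vq_encode(np.array([0.9, 1.2]), C))  # -> 1 (code vector [1.0, 1.0])
```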
In multi-dimensional signal processing, X may be taken to be a pixel macroblock that is quantised under the operation Q into a finite codebook. The latter is generated once, and a copy is provided to both the encoder and the decoder. It is then sufficient merely to store or transmit the output of the codebook in order to represent any source vector. The technique operates as a pattern matching algorithm. It is well known in the engineering literature and is an integral part of MPEG's repertoire of routines. In this exemplary embodiment of the present invention, Q is reinterpreted as the hash function h(X) and, together with an appropriately sized two-dimensional array, enables the implementation of a hash table.
Referring to Figure 2 of the drawings, a source vector X is mapped by Q into a bucket, and occupies a unique, but arbitrary, slot position. Each bucket therefore holds all the source vectors that are sufficiently close to the appropriate code vector which is their quantised representation within the source regions P_i. With this interpretation, and assuming that there are no restrictions on the size of the hash table, it is possible to represent the entire domain of source vectors completely accurately, in spite of the fact that Q is usually a dimensionality-reduction operator. In other words, the combination of VQ and a hash table loaded in the manner described above provides a way of achieving non-lossy representation of a source frame. This combined structure will be hereinafter referred to as a 'Vector Quantised Hash Table' or VQHT.
In order to support motion compensation, MPEG classifies video frames into three categories as follows:
1. I(ntra)-frames, which are independently coded without reference to any other frames.
2. P(redicted)-frames, which exploit motion compensation in order to improve compression. A predicted frame is coded with reference to a preceding I- or P-frame.
3. B(idirectional)-frames, which rely upon both preceding and subsequent frames. Such frames use bidirectional interpolation between I- and P-frames, but are not used for coding other frames. They also have the highest compression efficiency.
Furthermore, MPEG specifies two parameters, N and M, which keep a count of the frame distance (i.e. number of frames) between, respectively, two successive I-frames (a span also known as a GOP or 'Group of Pictures') and two successive P-frames. Typically, the boundary between GOPs is dictated by a scene cut; hence, N is a function of the number of such cuts in a video. M, however, is not defined by MPEG, and is left to the discretion of the encoder.
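For illustration, the frame-type pattern implied by particular choices of N and M can be generated as follows; the values N=12, M=3 are common in practice but are assumptions here, since M in particular is left to the encoder.

```python
def gop_pattern(N=12, M=3):
    """Return one GOP's frame types: an I-frame every N frames, an
    anchor (I- or P-) frame every M frames, B-frames in between."""
    return "".join(
        "I" if i % N == 0 else ("P" if i % M == 0 else "B")
        for i in range(N)
    )

print(gop_pattern())  # -> IBBPBBPBBPBB
```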
The generation of P-frames is crucial for efficient coding, but is also the most expensive part of MPEG, since motion estimation is directly involved. The decoding process uses a macroblock and a motion vector to reconstruct a P-frame, based on a closest match search of the preceding frame. Note that the use of the word 'preceding' does not imply frame adjacency, since B-frames typically interleave I- and P-frames.
In addition, MPEG does not specify how a closest match should be implemented; encoders have the task therefore of minimising the difference between a predicted and an actual macroblock.
In the following, the concept of performing motion estimation and compensation using the VQHT technique discussed above is described. For simplicity, both forward-predicted and bidirectionally-predicted frames are referred to as P-frames in the following description. For a given GOP, the process begins by encoding an I-frame (or a P-frame from which a subsequent P-frame is to be deduced) into a VQHT. As described above, this provides a complete and non-lossy representation of an I-frame. From an implementation perspective, encoding involves a two-stage process:
1. Codebook generation, in which a decision is made on the number of bucket entries L in the codebook C. Representative code vectors from the I-frame are computed (using any of the standard VQ training algorithms) and stored in C. Clearly, the larger the codebook, the smaller the quantisation error during encoding and look-up. It follows, therefore, that the minimum bound on L should be at least equal to the maximum number of motion vectors that any subsequent predicted frame will require. Thus, video fidelity becomes a function of the codebook size, as well as the size of the hash table.
2. Hash Table loading, in which the VQHT bucket slots are filled up by feeding every possible source vector (macroblock) from the I-frame through the hash function and storing it (together with its co-ordinates) in its appropriate bucket. Bucket slots are filled up sequentially in this manner.
Clearly, the above two processes must be performed exactly once for a given GOP. The resulting VQHT structure must be made available to the encoder.
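A software sketch of this two-stage construction is given below. Proper VQ training (e.g. K-means) is replaced by simple subsampling of the I-frame macroblocks, so the function is an illustrative assumption rather than the patent's prescribed implementation.

```python
import numpy as np

def build_vqht(i_frame, N=8, L=64):
    """Stage 1: generate a codebook of up to L code vectors from the
    I-frame. Stage 2: hash every I-frame macroblock into its bucket,
    storing the block together with its (x, y) coordinates."""
    H, W = i_frame.shape
    blocks = [
        (i_frame[y:y + N, x:x + N].astype(float).flatten(), (x, y))
        for y in range(0, H - N + 1, N)
        for x in range(0, W - N + 1, N)
    ]
    # Stage 1: codebook generation (subsampling stands in for training).
    step = max(1, len(blocks) // L)
    codebook = np.array([b[0] for b in blocks[::step]][:L])
    # Stage 2: hash table loading, with Q acting as the hash function.
    buckets = [[] for _ in range(len(codebook))]
    for vec, xy in blocks:
        idx = int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))
        buckets[idx].append((vec, xy))
    return codebook, buckets
```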
To encode a subsequent P-frame, a set of motion vectors is required for those macroblocks which will be predicted during the decoding stage. The generation of motion vectors using the VQHT involves the simple act of a hash table lookup. The P-frame macroblock whose motion vector is required is hashed directly into a bucket entry. The corresponding motion vector is then obtained simply by searching all slots for that I-frame macroblock which minimises a distance metric. The co-ordinate difference between the P-frame and I-frame macroblocks so found defines the motion vector. This can now be DCT-encoded before being transmitted to the decoder in the usual MPEG manner.
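Continuing the sketch above, a motion vector for a P-frame macroblock is then obtained by one hash evaluation and a bucket scan; the L1 distortion measure and the fall-back for an empty bucket are assumptions of this sketch.

```python
import numpy as np

def motion_vector(p_block, xy, codebook, buckets):
    """Hash the P-frame macroblock into its bucket, pick the stored
    I-frame macroblock with the least distortion, and return the
    coordinate difference as the motion vector."""
    vec = p_block.astype(float).flatten()
    idx = int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))
    if not buckets[idx]:
        return None  # empty bucket: no match available for prediction
    best_vec, best_xy = min(
        buckets[idx], key=lambda entry: np.sum(np.abs(entry[0] - vec))
    )
    return (xy[0] - best_xy[0], xy[1] - best_xy[1])
```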
The encoder structures required in a hardware implementation of VQH can be partitioned into pre-processing and postprocessing stages. For pre-processing, all that is required is a vector quantiser (which is normally a part of MPEG anyway) and some local buffer memory which stores the buckets and slots comprising the VQHT. It is possible to construct control logic that will directly fill up the VQHT from the vector quantiser's output when it is given an I-frame to encode. This, however, could also be done in software without incurring a significant performance penalty.
In BMA, the generation of motion vectors from subsequent P-frames is, as noted above, a computationally intensive task. Several high-throughput systolic designs have been suggested and implemented in order to achieve this.
In the VQH approach, it is possible to design, for post-processing purposes, efficient dataflow hardware which will give rise to a high-performance motion vector generation engine. Such a design is illustrated in Figure 3 of the drawings, and consists of the following:
* A shift register array, which takes as its input a linearised macroblock from a P-frame that is to be encoded. The geometry of this array is arranged such that the outputs are simply equal to the inputs, but with each component staggered by one computational cycle from its predecessor;
* A codebook buffer, which contains the code vectors which will be filled in by the hardware vector quantiser;
* A VQHT buffer, which contains the representation of the I-frame, and is filled in during the pre-processing stage;
* A systolic sorter: the P-frame macroblock that is to be encoded needs to be hashed into its appropriate bucket, and the corresponding I-frame macroblock with the least distance metric needs to be found. For this reason, a systolic sorter is included, the function of which is two-fold. Firstly, it sorts the output metrics from the codebook array in order to find the corresponding bucket. Secondly, it sorts the output metrics from the bucket in order to find those with the least distortion;
* An array of comparators: the distortions from the sorter array need not be unique, particularly if the macroblock represents a region of low spatial gradient (i.e. minimum motion). Thus it is necessary to compare the sorted outputs with each other in order to tag all those which are equal. The comparator array performs this task; and
* A mean absolute differencer: at this stage, there exists a set of bucket entries which have an identical (and minimum) distance metric between the P-frame macroblock to be encoded and the I-frame. It now remains to find from these entries that unique entry which minimises the co-ordinate metric. The two-dimensional differencer performs this task. It takes as its input the (x, y) coordinates of the P-frame macroblock to be encoded as well as the outputs from the comparator array. It then performs a metric computation (an L2-norm) between this coordinate and the coordinates of all candidate I-frame macroblocks. The resulting calculation tags the coordinates of the best-matching I-frame macroblock.
In BMA, to compare an N x N macroblock once requires O(N^2) operations (the O-notation is well known in complexity analysis, and provides a way of expressing an upper bound). The number of such macroblock searches required within a search window is, from Figure 1, (2w+1)^2 ≈ O(w^2). Consider a square frame of dimensions A x A pixels. Since N+2w=A, we have O(w^2) ≡ O(A^2). If an exhaustive search for a motion vector is carried out over the entire frame, a total of P_BMA = O(N^2 A^2) = O(N^2)·O(A^2) operations is required for every P-frame. The function grows relatively rapidly, as shown in Figure 4 of the drawings.
In the VQHT approach to motion estimation according to this exemplary embodiment of the present invention, it is necessary to factor in the initial, but one-off, cost of generating the codebook at the start of a GOP. Using the convention illustrated in Figure 2, if there are L bucket entries, with the largest bucket having at most M slots, then a VQHT training and loading algorithm based on K-means clustering can be shown to require O(LM) operations. (Note: M is bounded by A^2, as explained above, but it is realistic to expect that M < A^2.) The look-up for generating a motion vector requires simply O(L) operations followed by at most O(M) slot searches. This gives a total of P_VQHT = O(LM) operations to set up a GOP, followed by O(L) + O(M) operations for every P-frame that is subsequently encoded using it. With standard VQ, the greater the codebook size, the more accurate is the quantised representation. However, with the VQHT of the present invention, the reduction in accuracy entailed by small values of L is compensated for by an increase in the maximum slot size M. The extremal cases are simply L=A^2, M=1 against L=1, M=A^2. By choosing mid-point values L=M=(1/2)A, we obtain P_VQHT = O(A^2) + O(A).
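The following arithmetic makes the comparison concrete for one illustrative frame size (the numbers are examples, not figures from the patent):

```python
# Illustrative operation counts for an A x A frame with N x N macroblocks.
A, N = 352, 16
L = M = A // 2                     # the mid-point choice L = M = (1/2)A

p_bma = (N ** 2) * (A ** 2)        # P_BMA = O(N^2 A^2) per P-frame
p_vqht_setup = L * M               # one-off O(LM) cost per GOP
p_vqht_frame = L + M               # O(L) + O(M) per P-frame

print(p_bma)                       # 31719424
print(p_vqht_setup, p_vqht_frame)  # 30976 352
```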
Thus, in the above description a new method of finding the closest match in video compression is presented based on two new ideas, namely the use of a hash table for storing the motion vectors, and the use of vector quantisation (VQ) as an indexing method for a hash table. A systolic architecture is also proposed for implementing the described algorithm in hardware.
An embodiment of the present invention has been described above by way of example only and it will be apparent to persons skilled in the art that modifications and variations can be made to the described embodiment without departing from the scope of the invention.

Claims

CLAIMS:
1. A method of compressing image data comprising the steps of generating a set of motion vectors representative of one or more image frames, generating, by means of a predetermined hash function, a set of hash values corresponding to said motion vectors, and storing as a code book said hash values in the form of a table or array.
2. A method according to claim 1, including the step of using vector quantisation to index the hash values stored in said table or array for retrieval thereof by decoding means.
3. Apparatus for compressing image data comprising means for generating a set of motion vectors representative of one or more image frames, means for generating, using a predetermined hash function, a set of hash values corresponding to said motion vectors, and code book means for storing said hash values in the form of a table or array.
4. Apparatus according to claim 3, further comprising vector quantisation means for indexing the hash values stored in said table or array so that they can be retrieved.
5. A method of compressing image data comprising the steps of generating a set of motion vectors representative of one or more image frames, storing as a code book data representative of said motion vectors in the form of a table or array, and using vector quantisation to index the data stored in said table or array for retrieval of said data by decoding means.
6. A method according to claim 5, wherein said data representative of said motion vectors comprises a set of hash values corresponding to said motion vectors, said hash values being generated by means of a predetermined hash function.
7. Apparatus for compressing image data comprising means for generating a set of motion vectors representative of one or more image frames, code book means for storing data representative of said motion vectors in the form of a table or array, and vector quantisation means for indexing the data stored in said table or array so that it can be retrieved.
8. Apparatus according to claim 7, wherein said data representative of said motion vectors comprises a set of hash values corresponding to said motion vectors, said hash values being generated by means of a predetermined hash function.
9. A method or apparatus according to any one of the preceding claims, wherein said vector quantisation is a mapping Q such that:
Q : R^N → C
where C is an L-dimensional set, L < N, such that C = {Y_1, ..., Y_L}, and the Y_i ∈ R^N for i = 1, ..., L.
10. A method or apparatus according to claim 9, wherein Q is interpreted as the hash function h = h(X) (where X ∈ R^N).
PCT/GB2002/002236 | 2001-05-14 | 2002-05-14 | Image compression and transmission | WO2002093934A1 (en)

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
GB0111627.6 | 2001-05-14
GB0111627A (GB2375673A) | 2001-05-14 | 2001-05-14 | Image compression method using a table of hash values corresponding to motion vectors

Publications (1)

Publication Number | Publication Date
WO2002093934A1 (en) | 2002-11-21

Family

ID=9914510

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/GB2002/002236 (WO2002093934A1) | Image compression and transmission | 2001-05-14 | 2002-05-14

Country Status (2)

Country | Link
GB (1) | GB2375673A (en)
WO (1) | WO2002093934A1 (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US4979039A (en)* | 1989-01-30 | 1990-12-18 | Information Technologies Research Inc. | Method and apparatus for vector quantization by hashing
EP0576765A1 (en)* | 1992-06-30 | 1994-01-05 | International Business Machines Corporation | Method for coding digital data using vector quantizing techniques and device for implementing said method
US5832131A (en)* | 1995-05-03 | 1998-11-03 | National Semiconductor Corporation | Hashing-based vector quantization

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
EP0731614A2 (en)* | 1995-03-10 | 1996-09-11 | Kabushiki Kaisha Toshiba | Video coding/decoding apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHOO C Y ET AL: "A hashing-based scheme for organizing vector quantization codebook", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1995. ICASSP-95., 1995 INTERNATIONAL CONFERENCE ON DETROIT, MI, USA 9-12 MAY 1995, NEW YORK, NY, USA,IEEE, US, 9 May 1995 (1995-05-09), pages 2495 - 2498, XP010151838, ISBN: 0-7803-2431-5*
CHOO C Y ET AL: "Interframe hierarchical vector quantization using hashing-based reorganized codebook", CODING AND SIGNAL PROCESSING FOR INFORMATION STORAGE, PHILADELPHIA, PA, USA, 23-24 OCT. 1995, vol. 2605, Proceedings of the SPIE - The International Society for Optical Engineering, 1995, SPIE-Int. Soc. Opt. Eng, USA, pages 151 - 157, XP001089677, ISSN: 0277-786X*
KRUSE S-M: "SCENE SEGMENTATION FROM DENSE DISPLACEMENT VECTOR FIELDS USING RANDOMIZED HOUGH TRANSFORM", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 9, no. 1, 1 November 1996 (1996-11-01), pages 29 - 41, XP000630907, ISSN: 0923-5965*
LEI XU ET AL: "RANDOMIZED HOUGH TRANSFORM (RHT): BASIC MECHANISMS, ALGORITHMS, AND COMPUTATIONAL COMPLEXITIES", CVGIP IMAGE UNDERSTANDING, ACADEMIC PRESS, DULUTH, MA, US, vol. 57, no. 2, 1 March 1993 (1993-03-01), pages 131 - 154, XP000226771, ISSN: 1049-9660*

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8295617B2 (en) | 2008-05-19 | 2012-10-23 | Citrix Systems, Inc. | Systems and methods for enhanced image encoding
WO2009143120A3 (en)* | 2008-05-19 | 2010-04-01 | Citrix Systems, Inc. | Systems and methods for enhanced image encoding
US10264290B2 (en) | 2013-10-25 | 2019-04-16 | Microsoft Technology Licensing, Llc | Hash-based block matching in video and image coding
EP3061233A4 (en)* | 2013-10-25 | 2016-10-12 | Microsoft Technology Licensing Llc | Representing blocks with hash values in video and image coding and decoding
US11076171B2 (en) | 2013-10-25 | 2021-07-27 | Microsoft Technology Licensing, Llc | Representing blocks with hash values in video and image coding and decoding
US10567754B2 (en) | 2014-03-04 | 2020-02-18 | Microsoft Technology Licensing, Llc | Hash table construction and availability checking for hash-based block matching
US10368092B2 (en) | 2014-03-04 | 2019-07-30 | Microsoft Technology Licensing, Llc | Encoder-side decisions for block flipping and skip mode in intra block copy prediction
CN106105197B (en) | 2014-03-17 | 2019-01-15 | 高通股份有限公司 | For the encoder searches based on hash of intra block duplication
US9715559B2 (en) | 2014-03-17 | 2017-07-25 | Qualcomm Incorporated | Hash-based encoder search for intra block copy
CN106105197A (en)* | 2014-03-17 | 2016-11-09 | 高通股份有限公司 | The encoder searches based on hash replicating for intra block
WO2015142829A1 (en)* | 2014-03-17 | 2015-09-24 | Qualcomm Incorporated | Hash-based encoder search for intra block copy
US9858922B2 (en) | 2014-06-23 | 2018-01-02 | Google Inc. | Caching speech recognition scores
US10681372B2 (en) | 2014-06-23 | 2020-06-09 | Microsoft Technology Licensing, Llc | Encoder decisions based on results of hash-based block matching
TWI548266B (en)* | 2014-06-24 | 2016-09-01 | 愛爾達科技股份有限公司 | Multimedia file storage system and related devices
US11025923B2 (en) | 2014-09-30 | 2021-06-01 | Microsoft Technology Licensing, Llc | Hash-based encoder decisions for video coding
US10204619B2 (en) | 2014-10-22 | 2019-02-12 | Google Llc | Speech recognition using associative mapping
US9786270B2 (en) | 2015-07-09 | 2017-10-10 | Google Inc. | Generating acoustic models
US10803855B1 (en) | 2015-12-31 | 2020-10-13 | Google Llc | Training acoustic models using connectionist temporal classification
US11769493B2 (en) | 2015-12-31 | 2023-09-26 | Google Llc | Training acoustic models using connectionist temporal classification
US11341958B2 (en) | 2015-12-31 | 2022-05-24 | Google Llc | Training acoustic models using connectionist temporal classification
US10229672B1 (en) | 2015-12-31 | 2019-03-12 | Google Llc | Training acoustic models using connectionist temporal classification
US11594230B2 (en) | 2016-07-15 | 2023-02-28 | Google Llc | Speaker verification
US11017784B2 (en) | 2016-07-15 | 2021-05-25 | Google Llc | Speaker verification across locations, languages, and/or dialects
US10403291B2 (en) | 2016-07-15 | 2019-09-03 | Google Llc | Improving speaker verification across locations, languages, and/or dialects
US10390039B2 (en) | 2016-08-31 | 2019-08-20 | Microsoft Technology Licensing, Llc | Motion estimation for screen remoting scenarios
US11095877B2 (en) | 2016-11-30 | 2021-08-17 | Microsoft Technology Licensing, Llc | Local hash-based motion estimation for screen remoting scenarios
EP3723370A1 (en)* | 2017-04-21 | 2020-10-14 | Zenimax Media Inc. | Player input motion compensation by anticipating motion vectors
US11695951B2 (en) | 2017-04-21 | 2023-07-04 | Zenimax Media Inc. | Systems and methods for player input motion compensation by anticipating motion vectors and/or caching repetitive motion vectors
US11323740B2 (en) | 2017-04-21 | 2022-05-03 | Zenimax Media Inc. | Systems and methods for player input motion compensation by anticipating motion vectors and/or caching repetitive motion vectors
US11330291B2 (en) | 2017-04-21 | 2022-05-10 | Zenimax Media Inc. | Systems and methods for player input motion compensation by anticipating motion vectors and/or caching repetitive motion vectors
EP3723045A1 (en)* | 2017-04-21 | 2020-10-14 | Zenimax Media Inc. | Player input motion compensation by anticipating motion vectors
US11503332B2 (en) | 2017-04-21 | 2022-11-15 | Zenimax Media Inc. | Systems and methods for player input motion compensation by anticipating motion vectors and/or caching repetitive motion vectors
US11533504B2 (en) | 2017-04-21 | 2022-12-20 | Zenimax Media Inc. | Systems and methods for player input motion compensation by anticipating motion vectors and/or caching repetitive motion vectors
EP3613014A4 (en)* | 2017-04-21 | 2020-07-22 | Zenimax Media Inc. | Compensation of a player's input motions through anticipating motion vectors
US11601670B2 (en) | 2017-04-21 | 2023-03-07 | Zenimax Media Inc. | Systems and methods for player input motion compensation by anticipating motion vectors and/or caching repetitive motion vectors
US10706840B2 (en) | 2017-08-18 | 2020-07-07 | Google Llc | Encoder-decoder models for sequence to sequence mapping
US11776531B2 (en) | 2017-08-18 | 2023-10-03 | Google Llc | Encoder-decoder models for sequence to sequence mapping
US11202085B1 (en) | 2020-06-12 | 2021-12-14 | Microsoft Technology Licensing, Llc | Low-cost hash table construction and hash-based block matching for variable-size blocks

Also Published As

Publication number | Publication date
GB2375673A (en) | 2002-11-20
GB0111627D0 (en) | 2001-07-04

Similar Documents

Publication | Title
WO2002093934A1 (en) | Image compression and transmission
JP4662636B2 (en) | Improvement of motion estimation and block matching pattern
US8761254B2 (en) | Image prediction encoding device, image prediction decoding device, image prediction encoding method, image prediction decoding method, image prediction encoding program, and image prediction decoding program
EP0615386B1 (en) | A motion vector processor for compressing video signal
US20110261886A1 (en) | Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
KR100955396B1 (en) | Two-prediction encoding method and apparatus, two-prediction decoding method and apparatus and recording medium
US20060039470A1 (en) | Adaptive motion estimation and mode decision apparatus and method for H.264 video codec
US20070217515A1 (en) | Method for determining a search pattern for motion estimation
JP2000222587A (en) | Motion estimation using orthogonal transformation/domain block matching
JP2009510845A (en) | Multidimensional neighborhood block prediction for video encoding
CN1457604A (en) | Motion information encoding and decoding method
WO2012006304A2 (en) | Motion compensation using vector quantized interpolation filters
CN114390289B (en) | Reference pixel candidate list construction method, device, equipment and storage medium
EP1389875A2 (en) | Method for motion estimation adaptive to DCT block content
US5699129A (en) | Method and apparatus for motion vector determination range expansion
Bégaint et al. | Deep frame interpolation for video compression
EP4272444A1 (en) | A method, an apparatus and a computer program product for encoding and decoding
US20210235107A1 (en) | Efficient video motion estimation by reusing a reference search region
JP2006191642A (en) | Residual coding in compliance with video standard using non-standardized vector quantization coder
US20060104358A1 (en) | Method and apparatus for motion estimation using adaptive search pattern for video sequence compression
CN100584010C (en) | Power optimized collocated motion estimation method
US6931066B2 (en) | Motion vector selection based on a preferred point
Kommerla et al. | Real-Time Applications of Video Compression in the Field of Medical Environments
US12267484B2 (en) | Warped reference list for warped motion video coding
RU2701058C1 (en) | Method of motion compensation and device for its implementation

Legal Events

Date | Code | Title | Description
AK | Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL | Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 | Ep: the epo has been informed by wipo that ep was designated in this application
REG | Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC

122 | Ep: pct application non-entry in european phase
NENP | Non-entry into the national phase

Ref country code: JP

WWW | Wipo information: withdrawn in national office

Country of ref document: JP

