2350511
Image Processing Method and Apparatus
The present invention relates to a method and apparatus for processing facial images, particularly but not exclusively for use in digital animation, eg computer games.
Photogrammetric techniques are known for converting two or more overlapping 2D images acquired from different viewpoints into a common 3D representation, and in principle such techniques can be applied to the human face to generate a 3D representation which can be animated using known digital techniques.
Suitable algorithms for correlating image regions of corresponding images (eg photographs taken during airborne surveys) are already known, eg Gruen's algorithm (see Gruen, A W "Adaptive least squares correlation: a powerful image matching technique" S Afr J of Photogrammetry, Remote Sensing and Cartography Vol 14 No 3 (1985) and Gruen, A W and Baltsavias, E P "High precision image matching for digital terrain model generation" Int Arch Photogrammetry Vol 25 No 3 (1986) p254) and particularly the "region-growing" modification thereto which is described in Otto and Chau "Region-growing algorithm for matching terrain images" Image and Vision Computing Vol 7 No 2 May 1989 p83.
Essentially, Gruen's algorithm is an adaptive least squares correlation algorithm in which two image patches of typically 15 x 15 to 30 x 30 pixels are correlated (ie selected from larger left and right images in such a manner as to give the most consistent match between patches) by allowing an affine geometric distortion between coordinates in the images (ie stretching or compression in which originally parallel lines remain parallel in the transformation) and allowing an additive radiometric distortion between the grey levels of the pixels in the image patches, generating an over-constrained set of linear equations representing the discrepancies between the correlated pixels and finding a least squares solution which minimises the discrepancies.
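The correlation just described can be illustrated by a minimal sketch. It assumes a Gauss-Newton iteration, bilinear resampling and finite-difference grey-level gradients, none of which is specified in the text above; the function names `bilinear` and `least_squares_match` are hypothetical and the code is not taken from Gruen's publications.

```python
import numpy as np

def bilinear(img, x, y):
    """Sample img at float coordinates (x = column, y = row)."""
    h, w = img.shape
    x = np.clip(x, 0.0, w - 1.001)
    y = np.clip(y, 0.0, h - 1.001)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx, fy = x - x0, y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def least_squares_match(left, right, centre, guess, half=10, iterations=25):
    """Refine an approximate match for the left-image patch centred on `centre`.

    Returns p = [a0, a1, a2, b0, b1, b2, r], where a patch offset (u, v) maps to
    x' = a0 + a1*u + a2*v, y' = b0 + b1*u + b2*v in the right image and r is
    the additive radiometric shift. (a0, b0) is the matched point.
    """
    cx, cy = centre  # integer pixel coordinates (column, row)
    v, u = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u, v = u.ravel(), v.ravel()
    template = left[cy - half:cy + half + 1, cx - half:cx + half + 1].ravel()
    grad_y, grad_x = np.gradient(right)  # grey-level gradients of the right image
    p = np.array([guess[0], 1.0, 0.0, guess[1], 0.0, 1.0, 0.0])
    for _ in range(iterations):
        x = p[0] + p[1] * u + p[2] * v
        y = p[3] + p[4] * u + p[5] * v
        gx = bilinear(grad_x, x, y)
        gy = bilinear(grad_y, x, y)
        # One linear equation per pixel: the set is over-constrained
        # (hundreds of pixels, seven unknowns).
        residual = template - (bilinear(right, x, y) + p[6])
        design = np.column_stack(
            [gx, gx * u, gx * v, gy, gy * u, gy * v, np.ones_like(u)])
        step, *_ = np.linalg.lstsq(design, residual, rcond=None)
        p += step
        if np.hypot(step[0], step[3]) < 1e-4:  # matched point has settled
            break
    return p
```

As with any iteration of this kind, the sketch only converges when `guess` is already within a pixel or two of the true match.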
The Gruen algorithm is essentially an iterative algorithm and requires a reasonable approximation for the correlation to be fed in before it will converge to the correct solution. The Otto and Chau region-growing algorithm begins with an approximate match between a point in one image and a point in the other, utilises Gruen's algorithm to produce a more accurate match and to generate the geometric and radiometric distortion parameters, and uses the distortion parameters to predict approximate matches for points in the neighbourhood of the initial matching point. The neighbouring points are selected by choosing the four adjacent points on a grid having a grid spacing of eg 5 or 10 pixels in order to avoid running Gruen's algorithm for every pixel.
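The region-growing idea can be sketched as follows. The sketch assumes a simple first-in first-out queue and a caller-supplied `refine` function standing in for Gruen's algorithm; both are illustrative assumptions, as are the names `grow_matches` and `refine`, and the published Otto and Chau algorithm may order its candidates differently.

```python
from collections import deque

def grow_matches(seed_left, seed_right_guess, refine, width, height, spacing=5):
    """Grow a set of matches outwards from one approximate seed match.

    refine(left_pt, right_guess) must return (right_pt, affine) on success,
    where affine is a 2x2 geometric distortion ((a, b), (c, d)), or None when
    the match fails to converge.
    """
    matches = {}
    queue = deque([(seed_left, seed_right_guess)])
    while queue:
        left_pt, guess = queue.popleft()
        if left_pt in matches:
            continue
        result = refine(left_pt, guess)
        if result is None:
            continue
        right_pt, ((a, b), (c, d)) = result
        matches[left_pt] = right_pt
        # Visit the four adjacent grid points, predicting each approximate
        # match from the distortion parameters of the match just found.
        for dx, dy in ((spacing, 0), (-spacing, 0), (0, spacing), (0, -spacing)):
            nx, ny = left_pt[0] + dx, left_pt[1] + dy
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in matches:
                predicted = (right_pt[0] + a * dx + b * dy,
                             right_pt[1] + c * dx + d * dy)
                queue.append(((nx, ny), predicted))
    return matches
```

Because each prediction is seeded from an already refined neighbour, the refinement is run only once per grid point rather than once per pixel.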
Hu et al "Matching Point Features with Ordered Geometric, Rigidity and Disparity Constraints" IEEE Transactions on Pattern Analysis and Machine Intelligence Vol 16 No 10, 1994 pp1041-1049 (and references cited therein) discloses further methods for correlating features of overlapping images.
Our co-pending patent applications disclose a number of improvements to the Gruen algorithm, as follows:
i) the additive radiometric shift employed in the algorithm can be dispensed with;
ii) if during successive iterations a candidate matched point moves by more than a certain amount (eg 3 pixels) per iteration then it is not a valid matched point and should be rejected;
iii) during the growing of a matched region it is useful to check for sufficient contrast at at least three of the four sides of the region in order to ensure that there is sufficient data for a stable convergence; in order to facilitate this it is desirable to make the algorithm configurable to enable the parameters (eg required contrast) to be optimised for different environments, and
iv) in order to quantify the validity of the correspondences between respective patches of one image and points in the other image it has been found useful to re-derive the original grid point in the starting image by applying the algorithm to the matched point in the other image (ie reversing the stereo matching process) and measuring the distance between the original grid point and the new grid point found in the starting image from the reverse stereo matching. The smaller the distance the better the correspondence.
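Checks ii) and iv) lend themselves to a short sketch. The two matcher arguments are placeholders for whatever stereo matching routine is in use, and the function names and the default of 3 pixels (taken from the example figure in item ii) are assumptions made for illustration.

```python
import math

def step_is_valid(previous, current, max_move=3.0):
    """Check ii): reject a candidate that jumps too far in one iteration."""
    return math.dist(previous, current) <= max_move

def round_trip_error(grid_point, match_left_to_right, match_right_to_left):
    """Check iv): match forwards, match back, and measure the discrepancy.

    Each matcher takes an (x, y) point in one image and returns the
    corresponding (x, y) point in the other. A smaller return value
    indicates a better correspondence.
    """
    matched = match_left_to_right(grid_point)
    returned = match_right_to_left(matched)
    return math.dist(grid_point, returned)
```

A threshold on the round-trip distance could then be used to accept or discard each correspondence; the text does not state a value.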
However, the known photogrammetric techniques still require correlations between high quality overlapping images and, in cases where there is little texture information in the subject (which is true of large regions of the human face), it is difficult or impossible to correlate all the regions, which results in holes in the 3D reconstruction. Such difficulties can be overcome by projecting an optical (particularly infra-red) pattern (particularly a speckle pattern) onto the subject, but the requirement for pattern projection increases the expense of an already sophisticated apparatus.
One object of the present invention is to overcome or alleviate such disadvantages.
In one aspect the present invention provides a method of providing a three-dimensional representation of an object wherein two or more two-dimensional images of the object are photogrammetrically processed to generate an incomplete three-dimensional representation thereof and the incomplete three-dimensional representation is combined with a generic representation of such objects to provide the three-dimensional representation.
In one embodiment the object is a human or animal body or a part thereof.
In a preferred embodiment the three-dimensional representation derived from the combination with the generic representation is provided in the format of an animatable character.
In a preferred embodiment the resulting three-dimensional representation is converted to the file format of a computer game character and loaded into the computer game.
In another aspect the invention provides a method of personalising a computer game character wherein at least one image of a player of the game is digitally processed at a location remote from the player's computer, converted to an animatable character file and loaded onto the player's computer. For example the image can be processed on an Internet server computer and downloaded over the Internet.
Further preferred features are defined in the dependent claims.
In a preferred embodiment a fully automated system is provided whereby users of Quake, Doom, Descent and other popular games can be provided with a custom game character with their own face (preferably a 3D face) inserted into the character. This would enable them to use a visualisation of themselves in a game. This service could be provided via the Internet with little or no human intervention by the operator of the server.
Nothing similar currently exists with this level of accessibility by the gaming public.
By using a generic head during image processing, only a relatively low quality of 3D surface is required in order to get an acceptable result, and the problems of holes in 3D data-sets can be eliminated.
A low-resolution model is required for gaming, as the game will have to support the manipulation of the character in the gaming environment, in real time, on a variety of PCs.
It is assumed that, if necessary, a user would tolerate a few hours' turnaround time between submitting their images and receiving a model, either on a data medium such as floppy disk or by email.
In one embodiment the games user would be required to take a set of images of himself/herself using a digital camera or scanned photographs, under specified guidelines.
He/she will then access a Web page hosted by the server, which will provide a form requiring the user to enter the following information:
Name
Email address
Which game he/she wants the model for
The images which are to be submitted
The body on which he/she wants the face inserted
Credit card details
The server will then schedule an image processing job to perform the following tasks:
Determine 3D facial geometry from the supplied image files
Modify a generic head to this geometry
Apply a texture map from the supplied images
Polygon reduce the head model
Integrate the head model with a body
Convert the complete model to the required format for the specified game.
After appropriate processing, the completed character would be sent as an attachment to the specified email address, and a micro transaction performed to bill the user's credit card.
A preferred embodiment of the invention is described below by way of example only with reference to Figures 1 to 3 of the accompanying drawings, wherein:
Figure 1 is a schematic flow diagram of an image processing method in accordance with one aspect of the invention;
Figure 2A is a schematic plan view showing one camera arrangement for acquiring the images utilised in the method of Figure 1;
Figure 2B is a schematic plan view of another camera arrangement for acquiring the images utilised in the method of Figure 1;
Figure 2C is a schematic plan view of yet another camera arrangement for acquiring the images utilised in the method of Figure 1, and
Figure 3 is a schematic representation of an Internet-based arrangement for providing an animated games character by a method in accordance with the second aspect of the invention.
Referring to Figure 1, left and right images I1 and I2 are acquired eg by a digital camera and processed by standard photogrammetric techniques to provide an incomplete 3D representation 100 of the game player's head.
The determination of the game player's facial geometry can involve Gruen-type area matching, facial feature correlation, and facial feature recognition via a statistical model of the human face. Gruen-type area matching suffers from the problem of having no projected texture, and is thus highly susceptible to the texture in the face of the subject, the ambient lighting conditions, and the difference in colour hues and intensities between images. It is also susceptible to the lack of a camera model or optical geometry for the captured images. Facial feature correlation suffers from the problem that any facial feature that is incorrectly detected will cause a very poor model to be generated. Facial feature recognition via a statistical model prevents gross inaccuracies from occurring and should lead to a more robust solution. It is possible that part of the image submission process could involve the user in specifying certain key points on the images.
In order to alleviate the above problems, a 3D representation of a generic head 200 is provided. Given geometric information derived from the preceding stage, the generic head model can be distorted to fit the subject's roughly calculated geometry. This head could be in one of two forms, a NURBS (Non-Uniform Rational B-Spline) model or a polygon model. The NURBS model has the advantage of being easily deformable to the subject's geometry, but suffers from the drawback of higher processing overhead, and of having to be converted to polygons for subsequent processing stages.
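One simple way of distorting a polygon model of the generic head towards measured geometry is a least squares affine fit between corresponding landmark points, sketched below. The text does not say which deformation is used, so the affine model, the landmark correspondence and the names `fit_affine_3d` and `apply_affine_3d` are all assumptions made for illustration.

```python
import numpy as np

def fit_affine_3d(generic_pts, measured_pts):
    """Least squares 3D affine transform taking generic landmarks to measured ones.

    Both inputs are N x 3 with N >= 4 non-coplanar points. Returns a 4 x 3
    matrix acting on homogeneous row vectors [x, y, z, 1].
    """
    g = np.asarray(generic_pts, float)
    m = np.asarray(measured_pts, float)
    g_h = np.hstack([g, np.ones((len(g), 1))])
    transform, *_ = np.linalg.lstsq(g_h, m, rcond=None)
    return transform

def apply_affine_3d(transform, vertices):
    """Apply the fitted transform to every vertex of the generic head."""
    v = np.asarray(vertices, float)
    return np.hstack([v, np.ones((len(v), 1))]) @ transform
```

Because the transform is fitted only to the points that were successfully measured but applied to every vertex, regions where the photogrammetric data has holes are still covered by the generic surface.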
At this stage of processing (modified generic head 300) there should already be a correlation between certain points in each image, and points on the 3D model, greatly simplifying the task of texture mapping. There remains the problem of texture merging arising from the use of multiple images.
A texture map is derived (400) from the 3D head (100) and attached to the representation resulting from step 300 (step 500) (ie used to render the modified generic head) and the resulting realistic character representation is then integrated with or attached to the body of the games character (step 600).
If necessary the resulting model is converted to polygon form (step 700).
If the modified generic head is represented in polygon form, the number of polygons may have to be reduced (step 800). There are plenty of algorithms and commercially available code for polygon reduction. The completed model may be reduced to quite a low polygon count, possibly 100 or so, in order to produce a relatively small model to transmit and use within the game.
Finally the polygon-reduced representation is converted to a games file format which can be handled by the game (step 900).
This last step may require liaison and co-operation with the games manufacturers, or it is conceivable that this task could be performed completely independently.
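By way of illustration only, one of the simplest polygon reduction schemes is vertex clustering, sketched below for a triangle mesh. The text does not name a particular algorithm; the choice of vertex clustering, the function name `cluster_decimate` and the `cell` parameter are assumptions.

```python
import numpy as np

def cluster_decimate(vertices, triangles, cell):
    """Merge all vertices falling in the same cubic cell of side `cell`.

    vertices: N x 3 floats; triangles: M x 3 vertex indices.
    Returns (merged_vertices, remaining_triangles). Triangles that collapse
    to a line or point are dropped; duplicate triangles are not removed.
    """
    v = np.asarray(vertices, float)
    keys = np.floor(v / cell).astype(int)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    # Each merged vertex is the mean of the original vertices in its cell.
    merged = np.zeros((len(uniq), 3))
    np.add.at(merged, inverse, v)
    merged /= np.bincount(inverse, minlength=len(uniq))[:, None]
    tris = inverse[np.asarray(triangles)]
    keep = ((tris[:, 0] != tris[:, 1]) & (tris[:, 1] != tris[:, 2])
            & (tris[:, 0] != tris[:, 2]))
    return merged, tris[keep]
```

A larger `cell` gives a coarser mesh, so the value would be increased until the polygon count suits the target game.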
The acquisition of the 2D images I1 and I2 will now be described with reference to Figures 2A, 2B and 2C. Each of these Figures shows a different camera arrangement, which could be provided as a fixed stereoscopic camera arrangement in a dedicated booth provided in (say) a gaming arcade or could be set up by the games player. In each case a camera C acquires an image from one viewpoint and the same or a different camera C acquires an overlapping image from a different viewpoint. The fields of view V must overlap in the region of the face of the subject 1.
In Figure 2A the cameras are diagonally disposed at right angles, in Figure 2B the cameras are parallel and in Figure 2C the cameras are orthogonal, so that one camera has a front view and the other camera has a profile view of the subject 1. The arrangement of Figure 2C is particularly preferred because the front view and profile are acquired independently. The front image and profile image can be analysed to determine the size and location of features and the resulting data can be used to select one of a range of generic heads or to adjust variable parameters of the generic head as shown, prior to step 300.
By correlating a small number of points of the digitised images by means of a known algorithm (eg the Gruen algorithm) the exact camera locations and orientations can be determined and the remaining points correlated relatively easily to enable a 3D representation of the subject 1 to be generated, essentially by projecting ray lines from pairs of correlated points by virtual projectors having the same location, orientation and optical parameters as the cameras.
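The ray projection step can be sketched as follows. The sketch assumes that each correlated image point has already been converted into a ray (a camera centre plus a direction) using the recovered camera parameters, and it takes the midpoint of the shortest segment joining the two rays as the 3D point, since in practice the rays rarely intersect exactly. The midpoint rule and the name `triangulate_midpoint` are assumptions, not details given in the text.

```python
import numpy as np

def triangulate_midpoint(centre1, direction1, centre2, direction2):
    """Return the 3D point closest to two rays, one from each virtual projector."""
    c1, d1, c2, d2 = (np.asarray(a, float)
                      for a in (centre1, direction1, centre2, direction2))
    # Find s, t minimising |(c1 + s*d1) - (c2 + t*d2)| in the least squares sense.
    system = np.column_stack([d1, -d2])
    (s, t), *_ = np.linalg.lstsq(system, c2 - c1, rcond=None)
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))
```

For example, with a front camera at (0, -5, 0) and a profile camera at (5, 0, 0), as in an orthogonal arrangement, rays aimed at the same facial point from both centres return that point.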
Referring to Figure 3, the above correlation process between the generic image G and the image 1 of the character 1 provided by digital camera C can be performed by a server computer S on the Internet, and the 2D images acquired by the camera C can either be posted by the games player (eg as photographic prints) or uploaded (eg as email attachments) onto the server from the user's computer PC via a communications link CL provided by the Internet.
To this end the server computer S has stored on its hard disc HD:
i) the software required to implement the processes outlined in Figure 1, including file format conversion software for the common computer games, graphics software, image correlation software (eg based on the Gruen algorithm or a variant thereof) and stereoscopic image processing software;
ii) software to generate a WWW submission form F on the user's computer screen and to process the personal information entered therein by the user, eg credit card details and the required game format of the character;
iii) appropriate Internet server software including appropriate security software, and
iv) an appropriate operating system.
Since items iii) and iv) are well known per se and items i) and ii) have already been described in sufficient detail for programmers of reasonable skill to write the necessary code, no further description is necessary.
The user's computer PC would have stored on its hard disc HD one or more games programs, Internet access software, graphics software for handling the images provided by camera C and a conventional operating system, eg Windows 95® or Windows 98®. Both computer PC and computer S are provided with a standard microprocessor µP, eg an Intel Pentium® processor, as well as RAM and ROM and appropriate input/output circuitry I/O connected to standard modems M or other communication devices.
It is expected that the WWW submission form F would be based on a Java Applet to allow the validation of the quality of the submitted images, and the selection of body types. It is likely that the server operator would want to test images for their size, resolution, and possibly their contrast ratio, before accepting them for processing. If this can be done by an applet before accepting any credit card transaction, then it will help to reduce bad conversions. Weeding out potential failures at an early stage will reduce wasted processing time, and will reduce customer frustration, since the user will not have to wait a few hours to find out that the images were not of sufficient quality to produce a character.
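The kind of pre-submission check described above might look like the following sketch. Python is used purely for illustration, whereas the text envisages a Java Applet; the thresholds, the 8-bit grey-level assumption, the percentile-based contrast measure and the name `validate_image` are all hypothetical.

```python
import numpy as np

def validate_image(pixels, min_width=640, min_height=480, min_contrast=0.3):
    """Return a list of problems found; an empty list means the image passes.

    pixels: 2D array of grey levels in the range 0..255 (an assumption).
    """
    img = np.asarray(pixels, float)
    problems = []
    height, width = img.shape[:2]
    if width < min_width or height < min_height:
        problems.append("image too small")
    # Use the 1st and 99th percentiles so a few stray pixels do not
    # disguise an otherwise flat image.
    low, high = np.percentile(img, [1, 99])
    if (high - low) / 255.0 < min_contrast:
        problems.append("contrast too low")
    return problems
```

Only when the returned list is empty for every submitted image would the form proceed to the credit card transaction.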
Given the correct design of the web page/form, a valuable database could be constructed of games users, which may be sold or used for mail-shots for future developments.
In addition to the basic information requested on the Internet submission form F, the operator can request other information of the user for his own uses, namely:
Which games he plays
Age
How he found out about us.
It would also be possible to take a single frontal photograph of the subject, detect facial features, and map the image onto a generic model. As a large percentage of the brain is dedicated to the task of facial recognition, the model may be only a very rough approximation to the actual geometry of the subject's face, and the texture map need only be of very low resolution. This may not be acceptable for higher resolution models that may be required for games such as Tomb Raider.
Although the preferred embodiment is based on an Internet server, the invention can also be implemented in a purpose-built games booth at which the images I1 and I2 are acquired, and the processing can be carried out either locally in the booth or remotely, eg in a server computer linked to a number of such booths in a network.
In a variant, more than two cameras could be used to acquire the 3D surface of the character.