Summary of the invention
The objective of the invention is to overcome the above-mentioned defects of the prior art by providing a human-computer interaction method and device based on eye tracking and gesture recognition, which enable remote operation of a computer. The user need not carry any additional equipment; natural finger pointing and clicking actions alone suffice to control the computer. The invention is achieved through the following technical solutions.
The human-computer interaction method based on eye tracking and gesture recognition comprises the steps of: detection of the face region, detection of the hand region, locating of the eyes, locating of the fingertip, screen positioning, and gesture recognition.
The screen positioning comprises: when the user extends a finger and points at the screen, the system calculates the distances from the eye and from the fingertip to the screen according to the areas of the face region and the hand region in the image captured by the image acquisition device; the image coordinates of the eye and the fingertip are converted into coordinates in a three-dimensional coordinate system whose origin is the image acquisition device; the eye and the fingertip determine a straight line, and the point where this line intersects the screen is the position the user points to on the screen, whose coordinates in the three-dimensional coordinate system are calculated from the proportional relationship between the eye coordinates and the fingertip coordinates; finally, according to the size of the screen, the coordinates of this position are converted into the logical coordinates of the mouse on the screen.
The gesture recognition simulates mouse clicking operations by detecting finger click actions. When the user extends one finger of the right hand, points at the screen, and moves it, this is interpreted as moving the mouse; when the right-hand finger is retracted for the first time, this is interpreted as pressing the left mouse button; if the right-hand finger is then extended, pointed at the screen, and moved, this is interpreted as dragging the mouse with the left button held down; when the right-hand finger is retracted again, this is interpreted as releasing the left button. When the user extends one finger of the left hand to point at the screen and then retracts it, this is interpreted as pressing the right mouse button; when the left-hand finger is extended again, this is interpreted as releasing the right mouse button.
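The finger-to-mouse mapping described above can be sketched as a small state machine. The following Python sketch is purely illustrative; the class and event names are not part of the invention, and the finger-extended/retracted events are assumed to come from the detection steps described elsewhere.

```python
class GestureMouse:
    """Toy state machine mapping finger events to mouse actions (illustrative)."""

    def __init__(self):
        self.left_down = False
        self.right_down = False

    def right_finger(self, extended, moved=False):
        # Right hand: each retraction toggles the left button; moving while
        # extended is either a move or a drag, depending on the button state.
        if not extended:
            self.left_down = not self.left_down
            return "left_press" if self.left_down else "left_release"
        if moved:
            return "drag" if self.left_down else "move"
        return "point"

    def left_finger(self, extended):
        # Left hand: retracting presses the right button, extending releases it.
        if not extended:
            self.right_down = True
            return "right_press"
        if self.right_down:
            self.right_down = False
            return "right_release"
        return "point"
```

Feeding the event sequence of the paragraph above (extend and move, retract, extend and move, retract) through this sketch yields move, left press, drag, left release, matching the described behavior.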
In the above method, the face-region detection step comprises: judging whether a face is present in the image by means of an Adaboost face detection algorithm based on Haar-like rectangular features; the integral image is first computed and the rectangular features are extracted, and then, using a trained classifier feature database, the image is searched for the face region by the Cascade method. The training of the classifier feature database comprises: computing the integral images of the sample images and extracting their rectangular features; screening effective features with the Adaboost algorithm to form weak classifiers; combining multiple weak classifiers into a strong classifier; and cascading multiple strong classifiers to form the classifier feature database for face detection.
In the above method, the hand-region detection step comprises: searching the image for matching regions by skin-color matching according to the skin-color features of the face; after the hand region has been preliminarily segmented, removing the interference regions of the face and the neck according to the position of the face region, and removing background interference according to the areas of the connected components, thereby detecting the hand region.
In the above method, the eye-locating step comprises: on the basis of the detected face region, computing a horizontal gray-level projection of the face image; then, according to the facial-feature layout of the human face, searching the horizontal gray-projection curve for local minima and judging whether each corresponds to the eye region; after the eye region has been detected, taking the midpoint of this region as the eye coordinates.
In the above method, the fingertip-locating step comprises: first performing edge detection and grid sampling on the image of the hand region; then, taking each pixel of the sampled hand contour as a center, selecting four adjacent pixels in the counterclockwise and clockwise directions respectively to form four pixel pairs; computing the distance variance of each pixel pair, the sampled pixel whose mean distance variance is the smallest and below a threshold being the fingertip region; after the fingertip region has been detected, taking the midpoint of this region as the fingertip coordinates.
A device for realizing the above method comprises an image acquisition module, an image processing module, and a wireless transmission module. The camera of the image acquisition module is placed at the top center of the screen and captures images of the user, which are input to the image processing module. The image processing module controls the other two modules and runs the various image processing algorithms to analyze the captured user images, converting the position the user points to on the screen and the changes of gesture into the logical on-screen coordinates and control commands of the computer. The wireless transmission module comprises a receiving module and a sending module; the sending module is connected to the image processing module and transmits the processing results to the receiving module by radio signal; the receiving module is connected to the computer and converts the processing results into mouse control signals that are input to the computer.
The image acquisition module comprises a camera; the image processing module comprises an embedded processor and peripheral components; the receiving module and the sending module each comprise a radio-frequency chip and a single-chip microcontroller.
Compared with the prior art, the present invention has the following advantages and effects: the invention provides a natural and intuitive mode of human-computer interaction in which the user need not carry any additional equipment or memorize complicated operations, natural finger pointing and clicking actions sufficing to control the computer; the invention combines the two technologies of eye tracking and gesture recognition, overcoming the defect of traditional eye-tracking and gesture-recognition technologies that require the user to wear special equipment and thus restrict freedom of use, and providing a simple, unconstrained mode of operation that can control a computer at a distance; the device of the invention is compact and easy to use, requiring only that the camera be placed at the top center of the screen and the wireless communication module be connected to the computer, after which it is ready for immediate use.
Embodiment
The specific embodiments of the present invention are further described below in conjunction with the accompanying drawings.
As shown in Figure 1, the human-computer interaction system based on eye tracking and gesture recognition is mainly composed of three parts: an image acquisition module, an image processing module, and a wireless communication module. The image acquisition module comprises a camera, which captures images of the user in real time and transfers them to the image processing module. The image processing module is composed of a high-performance embedded processor and peripheral components; it controls the other two modules and runs the various image processing algorithms to analyze the captured user images, converting the position the user points to on the screen and the changes of gesture into the logical on-screen coordinates and control commands of the computer. The wireless communication module is divided into a sending part and a receiving part, each composed of a single-chip microcontroller and a radio-frequency chip. The wireless communication module transmits the results from the image processing module and converts them into mouse control signals that are input to the computer.
As shown in Figure 2, the camera 1 placed at the center of the top edge of the large screen 2 captures images of the user in real time. When the user needs to move the mouse to a certain position, it suffices to point at that position on the screen with a right-hand finger 3: the eye 4 and the tip of the finger 3 determine a straight line, and the point where this line intersects the screen is the position the user points to. When the user needs a left-mouse-button operation, a click action of the right-hand finger suffices; when the user needs a right-mouse-button operation, a click action of the left-hand finger suffices.
The flow of this embodiment of the invention is shown in Figure 3. First, the camera captures an image of the user, and the image is then analyzed by the face detection algorithm. Whether a user is currently using the system is judged by whether a face is detected in the image; subsequent processing is carried out only after a face has been detected. On the basis of the detected face, the eye region of the face image is searched according to the eye-locating algorithm to obtain the eye coordinates. The image is then analyzed by the hand detection algorithm to detect the hand region. On the basis of the detected hand region, the fingertip-locating algorithm analyzes the image of the hand region to obtain the fingertip coordinates. If the system detects the hand region and locates the fingertip, the user is considered to be pointing at the screen, and the system calculates the on-screen coordinates of the mouse according to the coordinate conversion model and the positioning model. If the system detects the hand region but cannot locate the fingertip, this indicates that the user's pointing finger has been retracted and a click action has occurred; the control command for the computer is obtained by recognizing the user's click action. After the system finishes processing each frame, the result is transmitted by radio signal to the wireless receiving module of the computer, which converts the result into a mouse control signal that is input to the computer.
In this embodiment, the face detection algorithm analyzes the image using an Adaboost face detection algorithm based on Haar-like rectangular features. The system first computes the integral image and extracts the rectangular features. It then searches the image for the face region with the trained classifier feature database, using the Cascade method. The classifier feature database used by this system is composed of 22 stages of strong classifiers, each of which is in turn composed of several weak classifiers. The system first extracts all 80 × 80 subwindows of the entire image, and each subwindow passes through the cascade classifier in turn, non-face subwindows being eliminated stage by stage. If only one subwindow passes all 22 stages, that window is determined to be the face window; if several subwindows pass all 22 stages, the adjacent candidate face windows are merged and the best face window is selected. If no qualifying subwindow is detected, the subwindow size is increased by a factor of 1.1 and the cascade detection is performed again.
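The subwindow scanning with stage-by-stage rejection and 1.1× rescaling can be sketched as follows. This is a minimal illustration only: the step size, the representation of cascade stages as boolean functions, and the function name are assumptions for the sketch, not part of the embodiment.

```python
def scan_for_face(image_w, image_h, cascade, window=80, scale=1.1):
    """Slide a square subwindow over the image and run it through the
    cascade; grow the window by 1.1x whenever no face is found.
    `cascade` is a list of stage functions, each returning True (pass)
    or False (reject) for a subwindow (x, y, size)."""
    size = window
    while size <= min(image_w, image_h):
        hits = []
        for y in range(0, image_h - size + 1, 4):      # coarse 4-px step
            for x in range(0, image_w - size + 1, 4):
                # a subwindow survives only if every cascade stage passes it
                if all(stage(x, y, size) for stage in cascade):
                    hits.append((x, y, size))
        if hits:
            return hits                                 # candidate face windows
        size = int(size * scale)                        # enlarge and rescan
    return []
```

The early-exit structure mirrors the point of the cascade: most subwindows are rejected by the cheap early stages, so the expensive later stages run on only a few candidates.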
In this embodiment, the training of the classifier feature database is performed offline and requires a large number of face and non-face samples. Because the face region is generally square, a square pixel region of each sample image is first selected, and its integral image and Haar-like rectangular features are computed. Features are then screened with the AdaBoost algorithm: effective features and thresholds are selected to form weak classifiers, and multiple weak classifiers are combined into a strong classifier. Following the Cascade method, the two most distinctive facial features form the first strong classifier, strong classifiers composed of more features serve as the further detection stages, and the face detection classifier is formed by cascading the multiple strong classifiers.
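The weak-classifier screening and combination described above can be sketched in miniature. In the sketch below, simple one-dimensional threshold stumps stand in for the Haar-like rectangular features, which is an assumption made only for illustration; the real training operates on rectangular-feature responses over the integral image.

```python
import math

def train_adaboost(samples, labels, features, rounds=3):
    """Minimal AdaBoost sketch: each weak classifier is a thresholded feature.

    samples: list of feature vectors, labels: +1 (face) / -1 (non-face),
    features: indices of candidate features. Returns a strong classifier
    as a list of (feature_index, threshold, polarity, weight) tuples."""
    n = len(samples)
    w = [1.0 / n] * n                       # sample weights, initially uniform
    strong = []
    for _ in range(rounds):
        best = None
        for f in features:                  # screen features: pick the stump
            for s in samples:               # with the lowest weighted error
                for pol in (1, -1):
                    thr = s[f]
                    err = sum(wi for wi, x, y in zip(w, samples, labels)
                              if (pol if x[f] >= thr else -pol) != y)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = max(err, 1e-10)               # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        strong.append((f, thr, pol, alpha))
        # re-weight: boost the samples this weak classifier got wrong
        for i, (x, y) in enumerate(zip(samples, labels)):
            h = pol if x[f] >= thr else -pol
            w[i] *= math.exp(-alpha * y * h)
        tot = sum(w)
        w = [wi / tot for wi in w]
    return strong

def classify(strong, x):
    """Weighted vote of the weak classifiers in one strong classifier."""
    score = sum(a * (pol if x[f] >= thr else -pol) for f, thr, pol, a in strong)
    return 1 if score >= 0 else -1
```

A cascade, as described above, would then chain several such strong classifiers, each trained to pass nearly all faces while rejecting a large share of non-faces.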
In this embodiment, the eye-locating algorithm locates the eyes by a method based on facial features. On the basis of the detected face, the system first smooths the face image and then computes its horizontal gray-level projection. According to the layout of facial features, this gray-level curve exhibits several local minima, corresponding respectively to the eyebrows, eyes, nostrils, and mouth. The system therefore first assumes the second local minimum of the curve to be the eye position, takes a band extending 1/20 of the image height above and below this point as the candidate eye region, and confirms it according to the following criteria. (1) From the position of the eyes, this local minimum must lie in the upper half of the face. (2) Because the eyebrow-to-eye distance is smaller than the eye-to-nostril distance, the distance from the second local minimum to the preceding local minimum must be smaller than its distance to the following local minimum. (3) A projection of the candidate eye region must exhibit one crest and two troughs. Only a region satisfying all three criteria is determined to be the eye region; otherwise the candidate eye region is adjusted. Once the eye region is confirmed, its midpoint is taken as the eye coordinates.
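The projection and the first two confirmation criteria can be sketched as follows. The sketch assumes, for illustration only, that the face image is given as a list of rows of gray values; the function names and the strict-minimum definition are likewise assumptions.

```python
def horizontal_gray_projection(image):
    """Mean gray value of each row; dark facial features (eyebrows, eyes,
    nostrils, mouth) produce local minima on this curve."""
    return [sum(row) / len(row) for row in image]

def local_minima(curve):
    """Indices of strict local minima of the projection curve."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]

def candidate_eye_row(curve):
    """Second local minimum, checked against two of the criteria above:
    it must lie in the upper half of the face, and it must be nearer the
    eyebrow minimum above it than the nostril minimum below it."""
    mins = local_minima(curve)
    if len(mins) < 3:
        return None
    eye = mins[1]
    if eye >= len(curve) // 2:               # criterion (1)
        return None
    if eye - mins[0] >= mins[2] - eye:       # criterion (2)
        return None
    return eye
```

Criterion (3), the crest-and-troughs check on the candidate region's own projection, would be applied to the band around the returned row before the region is accepted.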
In this embodiment, the hand detection algorithm preliminarily segments the hand region by skin-color matching. On the basis of the detected face, a 20 × 20 rectangle below the eyes is first taken as the skin-color sample region of the face, and the mean Y, Cb, and Cr values of the 400 pixels in this rectangle are computed. Taking these mean values as the center, upper and lower thresholds are set by adding and subtracting 10, and every pixel in the image is matched against them. Pixels of the frame that satisfy this skin-color model are judged to be skin pixels, and when the matching is complete the hand region has been preliminarily segmented. The interference of the face is removed according to the detected face region; according to the geometric relationship between the face and the neck, a rectangle directly below the face, twice the face width and of the same height as the face, is taken as the neck region, so that the interference of the neck can be removed even when the face is deflected. Because the area of the hand is much larger than that of the background interference regions, connected components can be judged by their area: connected regions whose area is below a threshold are regarded as background interference and removed, whereby the hand region is detected.
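The skin-color model and matching step can be sketched directly. The sample-patch averaging and the ±10 margin follow the text; the representation of pixels as (Y, Cb, Cr) tuples and the function names are assumptions for the sketch.

```python
def skin_model(sample_pixels):
    """Mean (Y, Cb, Cr) of the face sample patch, with +/-10 thresholds
    around each mean, as described in the embodiment."""
    n = len(sample_pixels)
    means = [sum(p[c] for p in sample_pixels) / n for c in range(3)]
    return [(m - 10, m + 10) for m in means]

def is_skin(pixel, model):
    """A pixel matches if every channel falls inside its threshold band."""
    return all(lo <= pixel[c] <= hi for c, (lo, hi) in enumerate(model))

def segment_skin(image, model):
    """Binary mask of the frame: 1 where the pixel matches the skin model."""
    return [[1 if is_skin(p, model) else 0 for p in row] for row in image]
```

The resulting mask would then have the face and neck rectangles cleared and small connected components discarded, as described above, leaving the hand region.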
In this embodiment, the fingertip-locating algorithm locates the fingertip region according to the features of the hand contour. The system first applies a gradient operator to the hand-region image for edge detection, obtaining the hand contour. The contour is then processed by grid sampling, i.e., each 10 × 10 region of the original image is represented by a single pixel. Taking each sampled pixel of the hand contour as a center, 4 adjacent pixels are selected in the counterclockwise and the clockwise directions respectively, forming four pixel pairs symmetric about the sampled pixel. The distance variance of each pixel pair is computed; the sampled pixel whose mean distance variance is the smallest and below a threshold is the fingertip region. Once the fingertip region is obtained, its midpoint is taken as the fingertip coordinates.
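Under one reading of the "distance variance" criterion, the test can be sketched on a sampled contour represented as an ordered list of (x, y) points. The sketch below is only that reading, stated as an assumption: it takes the variance of the four symmetric pair distances, which is small at a fingertip because the two sides of a narrow finger run roughly parallel.

```python
def pair_variance(contour, i, k=4):
    """For contour point i, pair the j-th clockwise neighbor with the j-th
    counterclockwise neighbor (j = 1..k) and return the variance of the
    four pair distances. At a fingertip the opposite sides of the finger
    are close and roughly parallel, so the distances are nearly equal."""
    n = len(contour)
    dists = []
    for j in range(1, k + 1):
        (x1, y1) = contour[(i - j) % n]      # clockwise neighbor
        (x2, y2) = contour[(i + j) % n]      # counterclockwise neighbor
        dists.append(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5)
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)

def fingertip_index(contour, threshold):
    """Index of the contour point with the smallest variance, if it is
    below the threshold; otherwise None (no fingertip, i.e. a click)."""
    scores = [pair_variance(contour, i) for i in range(len(contour))]
    best = min(range(len(contour)), key=scores.__getitem__)
    return best if scores[best] < threshold else None
```

On a thin rectangular contour (a stylized finger), the score is zero at the tip and large along the straight sides, which is the separation the threshold exploits.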
In this embodiment, the method of determining the position the user points to on the screen from the image coordinates of the eye and the fingertip is as follows:
(1) A three-dimensional coordinate system is established with the camera as the origin.
(2) After the system has detected the face region and the hand region in the image, the distances dE and dF from the eye and from the fingertip to the screen are calculated from the areas of the face region and the hand region.
(3) By eye locating and fingertip locating, the coordinates of the eye and the fingertip in the image are obtained as (xE_image, yE_image) and (xF_image, yF_image) respectively. In order to determine the position the user points to on the screen through the line connecting the eye and the fingertip, the image coordinates of the eye and the fingertip must be converted into coordinates in the three-dimensional coordinate system whose origin is the camera. Figures 4a and 4b show the positional relationship between the eye and the screen in the X-Z and Y-Z planes of this coordinate system. Because there is a proportional relationship between the image coordinates of the eye and its coordinates in this three-dimensional coordinate system, the eye coordinate in this coordinate system is as follows:
xE = g(dE) × (L/2 − xE_image)
where g(dE) is a coefficient related to the distance, L is the width of the captured image, θ is the tilt angle of the camera, and W is the height of the captured image. Similarly, the fingertip coordinates xF and yF can be obtained by the same calculation.
(4) After the system has obtained the coordinates of the eye and the fingertip in this three-dimensional coordinate system, the position the user points to on the screen is calculated from the straight line determined by the eye and the fingertip. Figure 5 shows the coordinate positioning model that determines the screen position from the coordinates of the eye and the fingertip in this coordinate system; from the proportional relationships of this model, the coordinates of the screen position pointed to by the fingertip are obtained.
Once the position the user points to on the screen has been determined, the logical coordinates of the mouse on the screen can be calculated according to the size of the screen.
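The similar-triangles relationship of step (4) and the final conversion to logical coordinates can be sketched as follows. The sketch assumes, for illustration, that the screen lies in the plane z = 0 of the camera-centered coordinate system, that the z components of the eye and fingertip coordinates are their distances dE and dF to the screen, and that the screen extends downward from the camera at its top center; the function names are likewise illustrative.

```python
def screen_intersection(eye, fingertip):
    """Intersect the eye-fingertip line with the screen plane z = 0.

    eye = (xE, yE, dE), fingertip = (xF, yF, dF) in the camera-centered
    coordinate system. By similar triangles the line reaches z = 0 at
    parameter t = dE / (dE - dF)."""
    xE, yE, dE = eye
    xF, yF, dF = fingertip
    t = dE / (dE - dF)
    return (xE + t * (xF - xE), yE + t * (yF - yE))

def to_logical(point, screen_w_mm, screen_h_mm, res_x, res_y):
    """Convert a physical screen position (mm) to logical mouse coordinates,
    assuming the camera origin sits at the top center of the screen."""
    x_mm, y_mm = point
    px = (x_mm + screen_w_mm / 2) * res_x / screen_w_mm
    py = -y_mm * res_y / screen_h_mm   # screen extends downward from the camera
    return (round(px), round(py))
```

For example, with the eye at (100, −200, 2000) and the fingertip at (50, −150, 1000), the line reaches the screen at (0, −100); on a 400 × 300 mm screen at 1920 × 1080 this maps to the logical coordinates (960, 360).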
In this embodiment, gesture recognition simulates the clicking operations of the mouse by detecting the click actions of the user's fingers. When the user extends one right-hand finger, points at the screen, and moves it, this is interpreted as moving the mouse. When the right-hand finger is retracted for the first time, this is interpreted as pressing the left mouse button; if the right-hand finger is then extended and moved, this is interpreted as dragging the mouse with the left button held down; when the right-hand finger is retracted again, this is interpreted as releasing the left button. When the user extends one left-hand finger to point at the screen and then retracts it, this is interpreted as pressing the right mouse button; when the left-hand finger is extended again, this is interpreted as releasing the right mouse button.