Embodiment
Below, the present invention is described in detail in conjunction with the accompanying drawings. The described embodiments are intended only to facilitate understanding of the present invention and do not limit it in any way.
The operating process of a motion capture method based on background modeling is further illustrated below by way of an example.
All code of this example is written in C++ and runs under the Microsoft Visual Studio 2005 environment; other software and hardware conditions may also be adopted and are not described further here.
Figure 1A shows the overall flow chart of the motion capture method of the present invention, which applies background modeling to binocular vision images.
The motion capture method of the present invention, which applies background modeling to binocular vision images, is based on binocular vision and background segmentation, and comprises the following steps:
Step S1: fix the position of the binocular camera, turn off the white balance, and acquire binocular vision images;
Step S2: for the acquired binocular vision images, perform background modeling over a set number of clean background frames to obtain a background model;
Step S3: use the binocular depth information obtained by computer binocular vision to calculate the probability that each pixel belongs to the foreground or the background;
Step S4: use the binocular depth information, the background modeling data and a dynamic graph cut algorithm to segment the binocular vision images into foreground and background, and extract the foreground contour;
Step S5: thin the foreground contour, determine the key points of the human body, and complete the motion capture.
The step of acquiring the binocular vision images described in step S2 comprises:
Step S211: ensure that the position of the camera is fixed and that there is no obvious change of light and shade in the scene;
Step S212: turn off the automatic white balance of the camera. The hardware parameters of a camera generally include automatic exposure and automatic white balance functions, so that the image quality is adjusted automatically when the scene lighting changes; for background modeling, the white balance parameters need to be set to fixed values;
Step S213: acquire a fixed number of clean background frames (100 frames) and store them in memory.
The step of performing background modeling over the clean background images of the set number of frames described in step S2 comprises:
Step S221: use a Gaussian background model to acquire the color image of each frame of the binocular vision images, where R, G and B denote the values of the red, green and blue channels respectively, each ranging from 0 to 255;
Step S222: obtain N images in the background modeling process, each image containing 320 × 240 pixels, and calculate the luminance I and the chromaticity (r, g) of each pixel, where r = R/(R+G+B), g = G/(R+G+B), and R, G, B denote the values of the red, green and blue components of the color channels respectively;
Step S223: establish the fused background model at the pixel level: calculate the mean and variance of the luminance and of the chromaticity of each pixel over the N images, and store them in memory;
Step S224: establish a feature background model in the luminance space and a chromaticity-based model in the chromaticity space, and store the background models obtained in the chromaticity and luminance spaces in memory.
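The per-pixel accumulation behind these modeling steps can be sketched as follows. This is a minimal illustration only: the structure name `PixelStat` and the simple luminance I = (R+G+B)/3 are assumptions for the sketch, not necessarily the exact statistics used by the invention.

```cpp
#include <cassert>
#include <cmath>

// Running statistics of one pixel over the N clean background frames:
// mean and variance of luminance I, mean of chromaticity (r, g).
// One PixelStat would be kept per pixel of the 320 x 240 image.
struct PixelStat {
    double sumI, sumI2;   // luminance accumulators
    double sumR, sumG;    // chromaticity accumulators
    int n;                // number of frames seen so far

    PixelStat() : sumI(0), sumI2(0), sumR(0), sumG(0), n(0) {}

    void add(double R, double G, double B) {
        double s = R + G + B;
        if (s <= 0) s = 1;              // guard against a pure black pixel
        double I = s / 3.0;             // simple luminance (assumed form)
        sumI += I;  sumI2 += I * I;
        sumR += R / s;                  // r = R/(R+G+B)
        sumG += G / s;                  // g = G/(R+G+B)
        ++n;
    }
    double meanI() const { return sumI / n; }
    double varI()  const { double m = meanI(); return sumI2 / n - m * m; }
    double meanR() const { return sumR / n; }
    double meanG() const { return sumG / n; }
};
```

A background pixel in a new frame can then be tested against `meanI()/varI()` and the chromaticity means, which is the comparison used later in the segmentation stage.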
The step, described in step S3, of calculating the depth data cost of each pixel in the binocular vision images to obtain the depth cost of each pixel, thereby introducing the binocular depth information, comprises:
Step S231: acquire and store the binocular vision images, denoted as the left image and the right image respectively;
Step S232: set a depth value for each pixel of the left image, the depth value being represented by the disparity between the left image and the right image;
Step S233: calculate the difference cost between the left image and the right image at each depth value;
Step S234: collect the cost values in the left image and divide them into four groups according to the magnitude of the cost values;
Step S235: use the cost values of each group to update the foreground and background costs of the pixel, wherein the cost of belonging to the foreground decreases exponentially with the disparity and the cost of belonging to the background increases exponentially with the disparity.
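Steps S232 and S233 can be sketched on a single scanline as follows. This is a minimal illustration assuming a per-pixel absolute intensity difference as the matching cost; a real implementation would aggregate the cost over a window. The function name `bestDisparity` is chosen for illustration only.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// For pixel x of the left scanline, try every disparity d in [0, maxD]
// and keep the d whose difference cost against the right scanline is
// smallest. The winning d is the "optimal disparity" used for grouping.
int bestDisparity(const std::vector<int>& left,
                  const std::vector<int>& right,
                  int x, int maxD) {
    int best = 0, bestCost = 1 << 30;
    for (int d = 0; d <= maxD && d <= x; ++d) {
        // difference cost of matching left pixel x to right pixel x - d
        int cost = std::abs(left[x] - right[x - d]);
        if (cost < bestCost) { bestCost = cost; best = d; }
    }
    return best;
}
```

On a left scanline that is the right scanline shifted by two pixels, the function recovers a disparity of 2, i.e. a relatively near (foreground-leaning) point.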
The step, described in step S4, of segmenting the binocular vision images into foreground and background using the binocular depth information, the background modeling data and the dynamic graph cut algorithm, and extracting the foreground contour, comprises:
Step S41: after background modeling is completed, read in a newly acquired binocular vision image, the binocular vision image comprising a left image and a right image;
Step S42: use the result of the binocular vision data cost calculation to obtain the data cost of the binocular information;
Step S43: use the background model to compare against the pixels of the left image to obtain color-based cost values, and, following the basic principle of the graph cut algorithm, build the network flow graph for max-flow/min-cut;
Step S44: use the two data cost values obtained in steps S42 and S43 to obtain the data cost values of the graph cut algorithm;
Step S45: use the contrast relationships between the pixels of the left image to assign the smoothness term of the graph cut algorithm;
Step S46: use the dynamic graph cut algorithm to segment the video stream at the pixel level; the segmentation result is divided into two parts, one part being the foreground and the other part the background;
Step S47: store the segmented foreground and background as 0s and 1s in a picture of the same size, and extract the edge contour from the 0/1 foreground-background picture;
Step S48: denoise the edge by filtering out its high-frequency components to make the edge smoother;
Step S49: use the segmented regions of the previous frames to correct data errors.
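Steps S44 to S46 can be illustrated on a tiny one-dimensional strip of pixels. This sketch uses plain Edmonds-Karp max-flow for brevity, whereas the dynamic graph cut of the invention would additionally reuse flow between consecutive frames; all names and cost values are illustrative. The t-link from the source S to pixel i carries D_i(B), the t-link from i to the terminal T carries D_i(F), and n-links between neighbouring pixels carry the smoothness weight.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Minimal max-flow/min-cut binary segmentation on a dense capacity
// matrix (fine for toy sizes; a real 320x240 grid needs a sparse graph).
struct GraphCut {
    int n;                                   // node count incl. S and T
    std::vector<std::vector<int> > cap;      // residual capacities
    GraphCut(int nodes) : n(nodes), cap(nodes, std::vector<int>(nodes, 0)) {}

    int maxflow(int S, int T) {
        int flow = 0;
        for (;;) {
            std::vector<int> par(n, -1);     // BFS parent pointers
            par[S] = S;
            std::queue<int> q; q.push(S);
            while (!q.empty() && par[T] < 0) {
                int u = q.front(); q.pop();
                for (int v = 0; v < n; ++v)
                    if (par[v] < 0 && cap[u][v] > 0) { par[v] = u; q.push(v); }
            }
            if (par[T] < 0) return flow;     // no augmenting path left
            int f = 1 << 30;
            for (int v = T; v != S; v = par[v]) f = std::min(f, cap[par[v]][v]);
            for (int v = T; v != S; v = par[v]) {
                cap[par[v]][v] -= f;         // push flow along the path
                cap[v][par[v]] += f;
            }
            flow += f;
        }
    }

    // Pixels still reachable from S in the residual graph lie on the
    // source side of the min cut, i.e. they are labeled foreground.
    std::vector<bool> sourceSide(int S) {
        std::vector<bool> vis(n, false);
        std::queue<int> q; q.push(S); vis[S] = true;
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (!vis[v] && cap[u][v] > 0) { vis[v] = true; q.push(v); }
        }
        return vis;
    }
};
```

With four pixels whose data costs strongly favor foreground, foreground, background, background and a small smoothness weight of 1 between neighbours, the min cut severs only the single n-link in the middle: the cut cost (and hence the max flow) is 1, and the labeling is F, F, B, B.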
The steps, according to step S5, of obtaining the key points of the human trunk by picture denoising and thinning, thereby achieving the motion capture effect, comprise:
Step S51: scale down the post-processed human body contour;
Step S52: thin the scaled human body contour;
Step S53: enlarge the thinned human body contour back to its original size;
Step S54: thin the contour once more;
Step S55: find the nodes having more than 2 neighborhood pixels, take their barycenter, and set it as the center of gravity of the human body;
Step S56: search upward and downward from the center of gravity to find the corresponding nodes, and set them as the head and the waist;
Step S57: search leftward and rightward from the center of gravity to find the left arm and the right arm, and determine the elbows and the shoulders proportionally by eccentricity;
Step S58: compare the nine key points thus determined with those of the previous frames to obtain comparatively stable and accurate trunk positions.
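Step S55 can be sketched as follows on a thinned binary image: a skeleton "node" is taken to be a pixel with more than 2 set pixels in its 8-neighbourhood (a branch point), and the body's center of gravity is the barycenter of those nodes. The function name `bodyCenter` and the toy grid are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Find branch points of a thinned skeleton (more than 2 set pixels in
// the 8-neighbourhood) and return their barycenter as (x, y).
// Returns (-1, -1) when the skeleton contains no branch point.
std::pair<double, double> bodyCenter(const std::vector<std::vector<int> >& skel) {
    int rows = (int)skel.size(), cols = (int)skel[0].size();
    double sx = 0, sy = 0;
    int cnt = 0;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            if (!skel[y][x]) continue;
            int nb = 0;                      // set pixels among 8 neighbours
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (!dx && !dy) continue;
                    int ny = y + dy, nx = x + dx;
                    if (ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
                        skel[ny][nx]) ++nb;
                }
            if (nb > 2) { sx += x; sy += y; ++cnt; }   // branching node
        }
    if (!cnt) return std::make_pair(-1.0, -1.0);
    return std::make_pair(sx / cnt, sy / cnt);
}
```

On a symmetric "plus"-shaped skeleton the barycenter of the branch points coincides with the crossing point, which matches the intuition of step S55 that the torso center lies where the limbs meet.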
As shown in Figure 1, the first step is image acquisition. This method uses binocular vision video as input. In the figure, (x, y, z) denotes the coordinates of the world coordinate system; (xL, yL) and (xR, yR) denote the pixel coordinates of the same object in the left and right images.
(1) The information of digital image processing is mostly two-dimensional, and the amount of information to process is very large. An image is here represented by the two-dimensional function f(x, y), where x, y are the two-dimensional coordinates and f(x, y) denotes the color information at the point (x, y). The camera collects all the optical information entering the lens from the space; after this information enters the computer, it is converted into a color model conforming to computer standards, and digital image processing is carried out by the program while guaranteeing the continuity and real-time performance of the video. In each acquired image of 320 × 240 pixels, every pixel is processed, 76800 pixels in total. The initial effect of the acquired video is shown in Figure 1. All subsequent operations and calculations of this project are based on these 320 × 240 pixels of each frame. In binocular vision, the same pixel has different imaging positions in the left and right images, and the magnitude of this position difference reflects the depth of the image. The relative displacement of the two pixels can be calculated by pixel matching. The method of the present invention uses this information to assist in completing the segmentation of foreground and background. As shown in Figure 2, the binocular information is used by means of the matching cost between the left and right images, where P denotes the position of a certain pixel in the left image, P+d denotes the position of this pixel in the right image, and d denotes the disparity of this pixel.
(2) The process of utilizing the binocular depth information in the present invention consists of two parts.
Step 1: the matching costs calculated at pixel x_i are divided into four groups according to the different disparity values (the maximum disparity d is set to 32):
Group A: the best-matching, i.e. optimal, disparity (Disparity) of pixel x_i satisfies d > 16, indicating that this pixel very probably belongs to the foreground;
Group B: the optimal disparity of pixel x_i satisfies 16 ≥ d > 12, indicating that this pixel is very likely to belong to the foreground;
Group C: the optimal disparity of pixel x_i satisfies 12 ≥ d > 5, indicating that this pixel is very likely to belong to the background;
Group D: the optimal disparity of pixel x_i satisfies d ≤ 5, indicating that this pixel very probably belongs to the background.
Under this assumption, the present invention needs less time to divide the pixels into four groups, instead of making 32 possible disparity hypotheses for each pixel.
Step 2: set suitable data cost values for the graph cut algorithm. The data term of the present invention comprises the costs of a pixel belonging to the foreground and to the background, denoted D_i(F) and D_i(B) respectively. The larger the disparity value of a pixel, the more likely it is to belong to the foreground, so the value of D_i(F) decreases correspondingly and the value of D_i(B) increases correspondingly. Based on this correspondence, the present invention proposes a scheme expressed by the following formulas:

    D_i^t(B) = D_i(B) + λ_t · e^(d/c_t)
    D_i^t(F) = D_i(F) − λ_t · e^(d/c_t)

for all t = A, B, C, D, with λ_t > 0. Here D_i^t(B) denotes the background model data term incorporating the binocular information, belonging to one of the four groups t = A, B, C, D; D_i(B) denotes the background segmentation data term of monocular vision; λ_t is the parameter of the binocular data cost; i denotes the pixel coordinate; D_i^t(F) denotes the foreground model data term incorporating the binocular information; d denotes the disparity (Disparity); and c_t is the parameter controlling d.
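The data-cost update of Step 2 can be sketched as follows. The group thresholds follow the A-D grouping described above; the `lambda` and `c` values are placeholders chosen for illustration, not the tuned parameters of the invention.

```cpp
#include <cassert>
#include <cmath>

// Group of a pixel according to its optimal disparity d (max 32).
enum Group { A = 0, B, C, D };

Group groupOf(int d) {
    if (d > 16) return A;   // very probably foreground
    if (d > 12) return B;   // likely foreground
    if (d > 5)  return C;   // likely background
    return D;               // very probably background
}

// Adjust the monocular data costs with the binocular term lambda_t * e^(d/c_t):
// a large disparity makes the foreground label cheaper and the background
// label more expensive, matching the exponential relationship described above.
void updateCosts(double& costF, double& costB, int d) {
    static const double lambda[4] = {2.0, 1.0, 1.0, 2.0};  // assumed values
    static const double c[4]      = {8.0, 8.0, 8.0, 8.0};  // assumed values
    Group t = groupOf(d);
    double bonus = lambda[t] * std::exp(d / c[t]);
    costF -= bonus;
    costB += bonus;
}
```

After the update, the adjusted costs are the D_i^t(F) and D_i^t(B) fed into the data term of the graph cut.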
Figure 3 shows the network flow graph of the max-flow/min-cut graph cut algorithm, in which p and q denote two adjacent pixels. Figure 4 shows the flow chart of the graph cut algorithm, comprising the assignment at the front end and the segmentation at the back end.
(3) The graph cut algorithm is an important component of background segmentation. Its main content is to use the principle of max-flow/min-cut to partition the pixels of the image along a certain path and to calculate which pixels belong to the foreground and which to the background.
The segmentation problem of foreground and background in an image can be regarded as a binary labeling problem in the field of computer vision. If pixel i belongs to the foreground, the label of this pixel is marked f_i = F, where F refers to the foreground. Similarly, if the pixel belongs to the background, it is marked f_i = B. Corresponding to the binary labeling problem, the label set contains only two labels. The weighted graph constructed by the graph cut algorithm contains two corresponding terminal vertices s and t. As shown in Figure 3, the left part of the figure is the weighted graph G constructed from a 3 × 3 original image, G = <V, ε>, where V is the vertex set, composed of ordinary nodes and two special nodes called the source node S and the terminal node T, S and T representing the binary labels of foreground and background respectively; ε represents the edges connecting the vertices, and the weight of an edge is indicated by its thickness in the figure.
The flow of the dynamic graph cut is shown in Figure 4. The energy function comprises a data term and a smoothness term, whose settings directly affect the final segmentation result of the graph cut algorithm. Figure 5 shows several groups of video segmentation results of the present invention, in which the 3 images on the left are left images of the input videos and the 3 images on the right are the results after segmentation.
(4) The present invention designs a low-pass filter in the frequency domain to smooth the boundary. The edge smoothing of the present invention proceeds along the boundary curve C, as shown in Figure 6, in which the upper left image represents the input source image, the upper right image represents the segmentation result, the lower left image represents the edge of the foreground or background to be smoothed, and the lower right image represents the result after smoothing. The point sequence z(i) = [x(i), y(i)] obtained by sampling the boundary at certain intervals has the complex representation:

    z(i) = x(i) + j·y(i)

The discrete Fourier transform of z(i) is:

    f(u) = (1/K) Σ_{i=0}^{K−1} z(i) · e^(−j2πui/K),  u = 0, 1, …, K−1

In the formula, j, u and K denote the imaginary unit, the frequency and the number of boundary points respectively; f(u) is the Fourier transform of z(i), called the Fourier descriptor of the boundary, and is the representation of the boundary point sequence in the frequency domain. From Fourier transform theory it is known that the high-frequency components contain the details while the low-frequency components determine the global shape. A curve is rough because of its jagged parts, and these rough regions contain the high-frequency components. Filtering out the high-frequency part of f(u) yields a smooth curve. The present invention defines the low-frequency energy ratio, filtering out the 5% of the energy lying in the high frequencies:

    r(l) = ( Σ_{u=0}^{l} |f(u)|² + Σ_{u=K−l}^{K−1} |f(u)|² ) / Σ_{u=0}^{K−1} |f(u)|²

where | · | is the modulus operation. The minimum value of l for which r(l) > 0.95 holds is taken as the cutoff frequency of the low-pass filter. Using the property of the Fourier coefficients f(K−u) = f̄(u) (f̄ being the complex conjugate of f), the high-frequency components in the range from l to K−1−l are eliminated from the coefficients f(u). An inverse Fourier transform is then performed, and the abruptly changing parts of the curve are smoothed.
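The boundary smoothing described above can be sketched as follows: take the DFT of the complex contour z(i) = x(i) + j·y(i), zero the coefficients in the high-frequency band [l, K−1−l], and transform back. A naive O(K²) DFT is used purely for clarity; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

typedef std::complex<double> cd;
const double PI = 3.141592653589793;

// Forward (sign = -1, scaled by 1/K) or inverse (sign = +1) DFT.
std::vector<cd> dft(const std::vector<cd>& in, int sign) {
    int K = (int)in.size();
    std::vector<cd> out(K);
    for (int u = 0; u < K; ++u) {
        cd s(0, 0);
        for (int i = 0; i < K; ++i)
            s += in[i] * std::polar(1.0, sign * 2.0 * PI * u * i / K);
        out[u] = (sign < 0) ? s / (double)K : s;   // 1/K on the forward pass
    }
    return out;
}

// Low-pass smoothing of a closed contour via its Fourier descriptors.
std::vector<cd> smoothContour(const std::vector<cd>& z, int l) {
    std::vector<cd> f = dft(z, -1);                // descriptors f(u)
    int K = (int)f.size();
    for (int u = l; u <= K - 1 - l; ++u)           // zero the band [l, K-1-l]
        f[u] = cd(0, 0);
    return dft(f, +1);                             // inverse transform
}
```

When the eliminated band is empty the contour is reproduced exactly (the DFT round trip), and a constant contour is unchanged for any cutoff, since all its energy sits in f(0).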
Figure 7 shows the motion capture results of the present invention, in which the images on the left are two frames of the left view of the video, and on the right are the key points and the skeleton extracted from the segmentation results. The key points are represented by circles and the skeleton by lines.
(5) The motion capture of the present invention on the basis of the segmentation comprises three steps.
Step 1: post-process the segmentation result to obtain a relatively smooth and stable contour region. Since the segmentation concerns only the contour, the boundary does not need to be calculated accurately; as long as there are no large holes, the skeleton motion tracking effect required herein can be achieved well.
Step 2: locate the segmented contour and determine the basic configuration of nine points A1, A2, A3, A4, A5, A6, A7, A8, A9. A1, A2 and A3 represent the three points of the head and trunk; A4, A5, A6 and A7, A8, A9 represent the three points of the left arm and of the right arm respectively.
Step 3: connect the nine points in the order of the skeleton configuration to complete the motion capture.
The above are only embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any transformation or replacement that a person familiar with this technology can readily conceive of within the technical scope disclosed by the present invention shall be encompassed within the protection scope of the claims of the present invention.