Embodiment
Below, the present invention is described in detail in conjunction with the accompanying drawings. The described embodiments are intended only to facilitate understanding of the present invention and do not limit it in any way.
The operating process of a motion capture method based on background modeling is further illustrated below by way of an example.
All code of this example is written in C++ and runs under the Microsoft Visual Studio 2005 environment; other software and hardware conditions may also be adopted and are not described further here.
Figure 1A shows the overall flow chart of the motion capture method of the present invention, which applies background modeling to binocular vision images.
The motion capture method of the present invention, which applies background modeling to binocular vision images, is based on binocular vision and background segmentation, and comprises the following steps:
Step S1: fix the position of the binocular camera, turn off the white balance, and acquire binocular vision images;
Step S2: for the acquired binocular vision images, perform background modeling over a set number of clean background frames to obtain a background model;
Step S3: use the binocular depth information obtained by computer binocular vision to calculate the probability that each pixel belongs to the foreground or the background;
Step S4: use the binocular depth information, the background modeling data and a dynamic graph cut algorithm to segment the binocular vision images into foreground and background, and extract the foreground contour;
Step S5: thin the foreground contour, determine the key points of the human body, and complete the motion capture.
The step of acquiring the binocular vision images described in step S2 comprises:
Step S211: ensure that the position of the camera is fixed and that there is no obvious change of light and shade in the scene;
Step S212: turn off the automatic white balance of the camera. The hardware parameters of a camera generally include automatic exposure and automatic white balance functions, so that the image quality is adjusted automatically when the scene lighting changes; for background modeling, the white balance parameters need to be set to fixed values;
Step S213: acquire a fixed number of clean background frames (100 frames) and store them in memory.
The step of performing background modeling over the clean background images of the set number of frames described in step S2 comprises:
Step S221: use a Gaussian background model to acquire the color image of each frame of the binocular vision images, where R, G and B denote the values of the red, green and blue channels respectively, each ranging from 0 to 255;
Step S222: obtain N images in the background modeling process, each image containing 320 × 240 pixels, and calculate the luminance I and the chromaticity (r, g) of each pixel, where r = R/(R+G+B), g = G/(R+G+B), and R, G, B denote the values of the red, green and blue components of the color channels respectively;
Step S223: establish the fused background model at the pixel level: calculate the mean and variance of the luminance and of the chromaticity of each pixel over the N images, and store them in memory;
Step S224: establish a feature background model in the luminance space and a chromaticity-based model in the chromaticity space, and store the background models obtained in the chromaticity and luminance spaces in memory.
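The per-pixel accumulation behind these modeling steps can be sketched as follows. This is a minimal illustration only: the structure name `PixelStat` and the simple luminance I = (R+G+B)/3 are assumptions for the sketch, not necessarily the exact statistics used by the invention.

```cpp
#include <cassert>
#include <cmath>

// Running statistics of one pixel over the N clean background frames:
// mean and variance of luminance I, mean of chromaticity (r, g).
// One PixelStat would be kept per pixel of the 320 x 240 image.
struct PixelStat {
    double sumI, sumI2;   // luminance accumulators
    double sumR, sumG;    // chromaticity accumulators
    int n;                // number of frames seen so far

    PixelStat() : sumI(0), sumI2(0), sumR(0), sumG(0), n(0) {}

    void add(double R, double G, double B) {
        double s = R + G + B;
        if (s <= 0) s = 1;              // guard against a pure black pixel
        double I = s / 3.0;             // simple luminance (assumed form)
        sumI += I;  sumI2 += I * I;
        sumR += R / s;                  // r = R/(R+G+B)
        sumG += G / s;                  // g = G/(R+G+B)
        ++n;
    }
    double meanI() const { return sumI / n; }
    double varI()  const { double m = meanI(); return sumI2 / n - m * m; }
    double meanR() const { return sumR / n; }
    double meanG() const { return sumG / n; }
};
```

A background pixel in a new frame can then be tested against `meanI()/varI()` and the chromaticity means, which is the comparison used later in the segmentation stage.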
The step, described in step S3, of calculating the depth data cost of each pixel in the binocular vision images to obtain the depth cost of each pixel, thereby introducing the binocular depth information, comprises:
Step S231: acquire and store the binocular vision images, denoted as the left image and the right image respectively;
Step S232: set a depth value for each pixel of the left image, the depth value being represented by the disparity between the left image and the right image;
Step S233: calculate the difference cost between the left image and the right image at each depth value;
Step S234: collect the cost values in the left image and divide them into four groups according to the magnitude of the cost values;
Step S235: use the cost values of each group to update the foreground and background costs of the pixel, wherein the cost of belonging to the foreground decreases exponentially with the disparity and the cost of belonging to the background increases exponentially with the disparity.
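Steps S232 and S233 can be sketched on a single scanline as follows. This is a minimal illustration assuming a per-pixel absolute intensity difference as the matching cost; a real implementation would aggregate the cost over a window. The function name `bestDisparity` is chosen for illustration only.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// For pixel x of the left scanline, try every disparity d in [0, maxD]
// and keep the d whose difference cost against the right scanline is
// smallest. The winning d is the "optimal disparity" used for grouping.
int bestDisparity(const std::vector<int>& left,
                  const std::vector<int>& right,
                  int x, int maxD) {
    int best = 0, bestCost = 1 << 30;
    for (int d = 0; d <= maxD && d <= x; ++d) {
        // difference cost of matching left pixel x to right pixel x - d
        int cost = std::abs(left[x] - right[x - d]);
        if (cost < bestCost) { bestCost = cost; best = d; }
    }
    return best;
}
```

On a left scanline that is the right scanline shifted by two pixels, the function recovers a disparity of 2, i.e. a relatively near (foreground-leaning) point.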
The step, described in step S4, of segmenting the binocular vision images into foreground and background using the binocular depth information, the background modeling data and the dynamic graph cut algorithm, and extracting the foreground contour, comprises:
Step S41: after background modeling is completed, read in a newly acquired binocular vision image, the binocular vision image comprising a left image and a right image;
Step S42: use the result of the binocular vision data cost calculation to obtain the data cost of the binocular information;
Step S43: use the background model to compare against the pixels of the left image to obtain color-based cost values, and, following the basic principle of the graph cut algorithm, build the network flow graph for max-flow/min-cut;
Step S44: use the two data cost values obtained in steps S42 and S43 to obtain the data cost values of the graph cut algorithm;
Step S45: use the contrast relationships between the pixels of the left image to assign the smoothness term of the graph cut algorithm;
Step S46: use the dynamic graph cut algorithm to segment the video stream at the pixel level; the segmentation result is divided into two parts, one part being the foreground and the other part the background;
Step S47: store the segmented foreground and background as 0s and 1s in a picture of the same size, and extract the edge contour from the 0/1 foreground-background picture;
Step S48: denoise the edge by filtering out its high-frequency components to make the edge smoother;
Step S49: use the segmented regions of the previous frames to correct data errors.
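Steps S44 to S46 can be illustrated on a tiny one-dimensional strip of pixels. This sketch uses plain Edmonds-Karp max-flow for brevity, whereas the dynamic graph cut of the invention would additionally reuse flow between consecutive frames; all names and cost values are illustrative. The t-link from the source S to pixel i carries D_i(B), the t-link from i to the terminal T carries D_i(F), and n-links between neighbouring pixels carry the smoothness weight.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Minimal max-flow/min-cut binary segmentation on a dense capacity
// matrix (fine for toy sizes; a real 320x240 grid needs a sparse graph).
struct GraphCut {
    int n;                                   // node count incl. S and T
    std::vector<std::vector<int> > cap;      // residual capacities
    GraphCut(int nodes) : n(nodes), cap(nodes, std::vector<int>(nodes, 0)) {}

    int maxflow(int S, int T) {
        int flow = 0;
        for (;;) {
            std::vector<int> par(n, -1);     // BFS parent pointers
            par[S] = S;
            std::queue<int> q; q.push(S);
            while (!q.empty() && par[T] < 0) {
                int u = q.front(); q.pop();
                for (int v = 0; v < n; ++v)
                    if (par[v] < 0 && cap[u][v] > 0) { par[v] = u; q.push(v); }
            }
            if (par[T] < 0) return flow;     // no augmenting path left
            int f = 1 << 30;
            for (int v = T; v != S; v = par[v]) f = std::min(f, cap[par[v]][v]);
            for (int v = T; v != S; v = par[v]) {
                cap[par[v]][v] -= f;         // push flow along the path
                cap[v][par[v]] += f;
            }
            flow += f;
        }
    }

    // Pixels still reachable from S in the residual graph lie on the
    // source side of the min cut, i.e. they are labeled foreground.
    std::vector<bool> sourceSide(int S) {
        std::vector<bool> vis(n, false);
        std::queue<int> q; q.push(S); vis[S] = true;
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (!vis[v] && cap[u][v] > 0) { vis[v] = true; q.push(v); }
        }
        return vis;
    }
};
```

With four pixels whose data costs strongly favor foreground, foreground, background, background and a small smoothness weight of 1 between neighbours, the min cut severs only the single n-link in the middle: the cut cost (and hence the max flow) is 1, and the labeling is F, F, B, B.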
The steps, according to step S5, of obtaining the key points of the human trunk by picture denoising and thinning, thereby achieving the motion capture effect, comprise:
Step S51: scale down the post-processed human body contour;
Step S52: thin the scaled human body contour;
Step S53: enlarge the thinned human body contour back to its original size;
Step S54: thin the contour once more;
Step S55: find the nodes having more than 2 neighborhood pixels, take their barycenter, and set it as the center of gravity of the human body;
Step S56: search upward and downward from the center of gravity to find the corresponding nodes, and set them as the head and the waist;
Step S57: search leftward and rightward from the center of gravity to find the left arm and the right arm, and determine the elbows and the shoulders proportionally by eccentricity;
Step S58: compare the nine key points thus determined with those of the previous frames to obtain comparatively stable and accurate trunk positions.
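Step S55 can be sketched as follows on a thinned binary image: a skeleton "node" is taken to be a pixel with more than 2 set pixels in its 8-neighbourhood (a branch point), and the body's center of gravity is the barycenter of those nodes. The function name `bodyCenter` and the toy grid are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Find branch points of a thinned skeleton (more than 2 set pixels in
// the 8-neighbourhood) and return their barycenter as (x, y).
// Returns (-1, -1) when the skeleton contains no branch point.
std::pair<double, double> bodyCenter(const std::vector<std::vector<int> >& skel) {
    int rows = (int)skel.size(), cols = (int)skel[0].size();
    double sx = 0, sy = 0;
    int cnt = 0;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            if (!skel[y][x]) continue;
            int nb = 0;                      // set pixels among 8 neighbours
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (!dx && !dy) continue;
                    int ny = y + dy, nx = x + dx;
                    if (ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
                        skel[ny][nx]) ++nb;
                }
            if (nb > 2) { sx += x; sy += y; ++cnt; }   // branching node
        }
    if (!cnt) return std::make_pair(-1.0, -1.0);
    return std::make_pair(sx / cnt, sy / cnt);
}
```

On a symmetric "plus"-shaped skeleton the barycenter of the branch points coincides with the crossing point, which matches the intuition of step S55 that the torso center lies where the limbs meet.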
As shown in Figure 1, the first step is image acquisition. This method uses binocular vision video as input. In the figure, (x, y, z) denotes the coordinates of the world coordinate system; (xL, yL) and (xR, yR) denote the pixel coordinates of the same object in the left and right images.
(1) The information of digital image processing is mostly two-dimensional, and the amount of information to process is very large. An image is here represented by the two-dimensional function f(x, y), where x, y are the two-dimensional coordinates and f(x, y) denotes the color information at the point (x, y). The camera collects all the optical information entering the lens from the space; after this information enters the computer, it is converted into a color model conforming to computer standards, and digital image processing is carried out by the program while guaranteeing the continuity and real-time performance of the video. In each acquired image of 320 × 240 pixels, every pixel is processed, 76800 pixels in total. The initial effect of the acquired video is shown in Figure 1. All subsequent operations and calculations of this project are based on these 320 × 240 pixels of each frame. In binocular vision, the same pixel has different imaging positions in the left and right images, and the magnitude of this position difference reflects the depth of the image. The relative displacement of the two pixels can be calculated by pixel matching. The method of the present invention uses this information to assist in completing the segmentation of foreground and background. As shown in Figure 2, the binocular information is used by means of the matching cost between the left and right images, where P denotes the position of a certain pixel in the left image, P+d denotes the position of this pixel in the right image, and d denotes the disparity of this pixel.
(2) The process of utilizing the binocular depth information in the present invention consists of two parts.
Step 1: the matching costs calculated at pixel x_i are divided into four groups according to the different disparity values (the maximum disparity d is set to 32):
Group A: the best-matching, i.e. optimal, disparity (Disparity) of pixel x_i satisfies d > 16, indicating that this pixel very probably belongs to the foreground;
Group B: the optimal disparity of pixel x_i satisfies 16 ≥ d > 12, indicating that this pixel is very likely to belong to the foreground;
Group C: the optimal disparity of pixel x_i satisfies 12 ≥ d > 5, indicating that this pixel is very likely to belong to the background;
Group D: the optimal disparity of pixel x_i satisfies d ≤ 5, indicating that this pixel very probably belongs to the background.
Under this assumption, the present invention needs less time to divide the pixels into four groups, instead of making 32 possible disparity hypotheses for each pixel.
Step 2: set suitable data cost values for the graph cut algorithm. The data term of the present invention comprises the costs of a pixel belonging to the foreground and to the background, denoted D_i(F) and D_i(B) respectively. The larger the disparity value of a pixel, the more likely it is to belong to the foreground, so the value of D_i(F) decreases correspondingly and the value of D_i(B) increases correspondingly. Based on this correspondence, the present invention proposes a scheme expressed by the following formulas:

    D_i^t(B) = D_i(B) + λ_t · e^(d/c_t)
    D_i^t(F) = D_i(F) − λ_t · e^(d/c_t)

for all t = A, B, C, D, with λ_t > 0. Here D_i^t(B) denotes the background model data term incorporating the binocular information, belonging to one of the four groups t = A, B, C, D; D_i(B) denotes the background segmentation data term of monocular vision; λ_t is the parameter of the binocular data cost; i denotes the pixel coordinate; D_i^t(F) denotes the foreground model data term incorporating the binocular information; d denotes the disparity (Disparity); and c_t is the parameter controlling d.
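The data-cost update of Step 2 can be sketched as follows. The group thresholds follow the A-D grouping described above; the `lambda` and `c` values are placeholders chosen for illustration, not the tuned parameters of the invention.

```cpp
#include <cassert>
#include <cmath>

// Group of a pixel according to its optimal disparity d (max 32).
enum Group { A = 0, B, C, D };

Group groupOf(int d) {
    if (d > 16) return A;   // very probably foreground
    if (d > 12) return B;   // likely foreground
    if (d > 5)  return C;   // likely background
    return D;               // very probably background
}

// Adjust the monocular data costs with the binocular term lambda_t * e^(d/c_t):
// a large disparity makes the foreground label cheaper and the background
// label more expensive, matching the exponential relationship described above.
void updateCosts(double& costF, double& costB, int d) {
    static const double lambda[4] = {2.0, 1.0, 1.0, 2.0};  // assumed values
    static const double c[4]      = {8.0, 8.0, 8.0, 8.0};  // assumed values
    Group t = groupOf(d);
    double bonus = lambda[t] * std::exp(d / c[t]);
    costF -= bonus;
    costB += bonus;
}
```

After the update, the adjusted costs are the D_i^t(F) and D_i^t(B) fed into the data term of the graph cut.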
Figure 3 shows the network flow graph of the max-flow/min-cut graph cut algorithm, in which p and q denote two adjacent pixels. Figure 4 shows the flow chart of the graph cut algorithm, comprising the assignment at the front end and the segmentation at the back end.
(3) The graph cut algorithm is an important component of background segmentation. Its main content is to use the principle of max-flow/min-cut to partition the pixels of the image along a certain path and to calculate which pixels belong to the foreground and which to the background.
The segmentation problem of foreground and background in an image can be regarded as a binary labeling problem in the field of computer vision. If pixel i belongs to the foreground, the label of this pixel is marked f_i = F, where F refers to the foreground. Similarly, if the pixel belongs to the background, it is marked f_i = B. Corresponding to the binary labeling problem, the label set contains only two labels. The weighted graph constructed by the graph cut algorithm contains two corresponding terminal vertices s and t. As shown in Figure 3, the left part of the figure is the weighted graph G constructed from a 3 × 3 original image, G = <V, ε>, where V is the vertex set, composed of ordinary nodes and two special nodes called the source node S and the terminal node T, S and T representing the binary labels of foreground and background respectively; ε represents the edges connecting the vertices, and the weight of an edge is indicated by its thickness in the figure.
The flow of the dynamic graph cut is shown in Figure 4. The energy function comprises a data term and a smoothness term, whose settings directly affect the final segmentation result of the graph cut algorithm. Figure 5 shows several groups of video segmentation results of the present invention, in which the 3 images on the left are left images of the input videos and the 3 images on the right are the results after segmentation.
(4) The present invention designs a low-pass filter in the frequency domain to smooth the boundary. The edge smoothing of the present invention proceeds along the boundary curve C, as shown in Figure 6, in which the upper left image represents the input source image, the upper right image represents the segmentation result, the lower left image represents the edge of the foreground or background to be smoothed, and the lower right image represents the result after smoothing. The point sequence z(i) = [x(i), y(i)] obtained by sampling the boundary at certain intervals has the complex representation:

    z(i) = x(i) + j·y(i)

The discrete Fourier transform of z(i) is:

    f(u) = (1/K) Σ_{i=0}^{K−1} z(i) · e^(−j2πui/K),  u = 0, 1, …, K−1

In the formula, j, u and K denote the imaginary unit, the frequency and the number of boundary points respectively; f(u) is the Fourier transform of z(i), called the Fourier descriptor of the boundary, and is the representation of the boundary point sequence in the frequency domain. From Fourier transform theory it is known that the high-frequency components contain the details while the low-frequency components determine the global shape. A curve is rough because of its jagged parts, and these rough regions contain the high-frequency components. Filtering out the high-frequency part of f(u) yields a smooth curve. The present invention defines the low-frequency energy ratio, filtering out the 5% of the energy lying in the high frequencies:

    r(l) = ( Σ_{u=0}^{l} |f(u)|² + Σ_{u=K−l}^{K−1} |f(u)|² ) / Σ_{u=0}^{K−1} |f(u)|²

where | · | is the modulus operation. The minimum value of l for which r(l) > 0.95 holds is taken as the cutoff frequency of the low-pass filter. Using the property of the Fourier coefficients f(K−u) = f̄(u) (f̄ being the complex conjugate of f), the high-frequency components in the range from l to K−1−l are eliminated from the coefficients f(u). An inverse Fourier transform is then performed, and the abruptly changing parts of the curve are smoothed.
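The boundary smoothing described above can be sketched as follows: take the DFT of the complex contour z(i) = x(i) + j·y(i), zero the coefficients in the high-frequency band [l, K−1−l], and transform back. A naive O(K²) DFT is used purely for clarity; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

typedef std::complex<double> cd;
const double PI = 3.141592653589793;

// Forward (sign = -1, scaled by 1/K) or inverse (sign = +1) DFT.
std::vector<cd> dft(const std::vector<cd>& in, int sign) {
    int K = (int)in.size();
    std::vector<cd> out(K);
    for (int u = 0; u < K; ++u) {
        cd s(0, 0);
        for (int i = 0; i < K; ++i)
            s += in[i] * std::polar(1.0, sign * 2.0 * PI * u * i / K);
        out[u] = (sign < 0) ? s / (double)K : s;   // 1/K on the forward pass
    }
    return out;
}

// Low-pass smoothing of a closed contour via its Fourier descriptors.
std::vector<cd> smoothContour(const std::vector<cd>& z, int l) {
    std::vector<cd> f = dft(z, -1);                // descriptors f(u)
    int K = (int)f.size();
    for (int u = l; u <= K - 1 - l; ++u)           // zero the band [l, K-1-l]
        f[u] = cd(0, 0);
    return dft(f, +1);                             // inverse transform
}
```

When the eliminated band is empty the contour is reproduced exactly (the DFT round trip), and a constant contour is unchanged for any cutoff, since all its energy sits in f(0).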
Figure 7 shows the motion capture results of the present invention, in which the images on the left are two frames of the left view of the video, and on the right are the key points and the skeleton extracted from the segmentation results. The key points are represented by circles and the skeleton by lines.
(5) The motion capture of the present invention on the basis of the segmentation comprises three steps.
Step 1: post-process the segmentation result to obtain a relatively smooth and stable contour region. Since the segmentation concerns only the contour, the boundary does not need to be calculated accurately; as long as there are no large holes, the skeleton motion tracking effect required herein can be achieved well.
Step 2: locate the segmented contour and determine the basic configuration of nine points A1, A2, A3, A4, A5, A6, A7, A8, A9. A1, A2 and A3 represent the three points of the head and trunk; A4, A5, A6 and A7, A8, A9 represent the three points of the left arm and of the right arm respectively.
Step 3: connect the nine points in the order of the skeleton configuration to complete the motion capture.
The above are only embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any transformation or replacement that a person familiar with this technology can readily conceive of within the technical scope disclosed by the present invention shall be encompassed within the protection scope of the claims of the present invention.