Summary of the invention
The technical problem to be solved by the present invention is to provide an intelligent robot rotation method based on sound source localization and face detection, which not only improves localization accuracy but also greatly improves the anthropomorphic (human-like) quality of the robot's interaction with the user.
The present invention solves the above technical problem through the following technical solution: an intelligent robot rotation method based on sound source localization and face detection, characterized in that it comprises the following steps:
Step 1: receive the wake-up word and start;
Step 2: use multi-microphone sound source localization to determine the approximate direction of the target person, and control the robot head to turn toward that direction;
Step 3: with the head camera turned toward the target person, apply face localization and capture an image with the head camera;
Step 4: after the head camera has captured the image, perform more accurate localization based on face detection, and finally rotate the robot head again so that it is aimed at the target speaker.
Preferably, in Step 1 the wake-up word is converted into syllables, which are further decomposed into a phoneme sequence.
Preferably, the multi-microphone sound source localization used in Step 2 is a speech localization technique based on the time difference of arrival (TDOA) method.
Preferably, the TDOA method is divided into two steps: the first step computes the relative time differences with which the sound arrives at each microphone, and the second step combines these differences with the physical layout of the microphone array to obtain the position of the sound source.
Preferably, the accuracy of the time delay estimation in the first step directly determines the precision of the localization in the second step.
Preferably, the time delay estimation uses the generalized cross-correlation (GCC) method.
Preferably, the face localization technique uses a computer-vision-based face detection algorithm to locate faces.
Preferably, the face detection algorithm works as follows: a digital image is captured by the camera, features are analyzed and extracted from the image data, a detection algorithm determines whether the image contains a face, and the position of the face is obtained.
The positive effects of the present invention are as follows: by combining sound source localization with face detection, the present invention not only improves localization accuracy but also greatly improves the anthropomorphic quality of the robot's interaction with the user. Localization is started selectively (only after the wake-up word is received), it is faster and more precise, and the robot behaves in a more human-like way during interaction.
Detailed description of the invention
Preferred embodiments of the present invention are given below to describe the technical solution in detail.
The intelligent robot rotation method based on sound source localization and face detection of the present invention comprises the following steps:
Step 1: receive the wake-up word and start. The wake-up word is converted into syllables, which are further decomposed into a phoneme sequence; for example, "turn off the light" -> guan bi dian deng -> g uan b i d ian d eng. Each phoneme corresponds to an acoustic model, and the captured audio signal is matched against the phoneme models of the wake-up word.
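As a minimal sketch of the syllable-to-phoneme step, the Python below splits Mandarin pinyin syllables into an initial and a final, which reproduces the decomposition in the example above. It assumes the wake-up word is already available as space-separated toneless pinyin (character-to-pinyin conversion and acoustic matching are not shown). The initial/final split is an assumed convention, not necessarily the phoneme set used by the invention, and the function names are hypothetical.

```python
# Standard Mandarin pinyin initials (assumed phoneme inventory); two-letter
# initials come first so that "zh", "ch", "sh" match before "z", "c", "s".
PINYIN_INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
)


def syllable_to_phonemes(syllable: str) -> list[str]:
    """Split one toneless pinyin syllable into [initial, final] (hypothetical helper)."""
    for initial in PINYIN_INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]  # zero-initial syllables such as "an" or "er"


def wake_word_to_phonemes(pinyin: str) -> list[str]:
    """Turn a space-separated pinyin wake-up word into a flat phoneme sequence."""
    phonemes: list[str] = []
    for syllable in pinyin.lower().split():
        phonemes.extend(syllable_to_phonemes(syllable))
    return phonemes


print(wake_word_to_phonemes("guan bi dian deng"))
# ['g', 'uan', 'b', 'i', 'd', 'ian', 'd', 'eng']
```

Each phoneme in the resulting list would then be mapped to its acoustic model for matching against the incoming audio.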
Step 2: use multi-microphone sound source localization to determine the approximate direction of the target person, and control the robot head to turn toward that direction. The multi-microphone technique used in the present invention is speech localization based on the time difference of arrival (TDOA) method, which has two steps: first, compute the relative time differences with which the sound arrives at each microphone; then, combine these differences with the physical layout of the microphone array to obtain the position of the sound source. The accuracy of the time delay estimation in the first step directly determines the precision of the localization in the second step.
There are many time delay estimation methods; the present invention uses the generalized cross-correlation (GCC) method. GCC computes the cross-power spectrum of the two signals, applies a weighting in the frequency domain to suppress the effects of noise and reflections, and transforms the result back to the time domain to obtain the cross-correlation function of the two signals. The location of the peak of this cross-correlation function gives the relative time delay between the two signals.
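Let the signals received by the two microphones be modeled by formulas (1) and (2):
$$x_1(n) = s(n) + n_1(n) \qquad (1)$$
$$x_2(n) = s(n - D) + n_2(n) \qquad (2)$$
where s(n) is the original speech, n1(n) and n2(n) are noise signals, and D is the relative delay with which the speech reaches the second microphone.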
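The generalized cross-correlation function R12(τ) of the two microphone signals can be expressed as formula (3):
$$R_{12}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \psi_{12}(\omega)\, X_1(\omega)\, X_2^{*}(\omega)\, e^{j\omega\tau}\, d\omega \qquad (3)$$
where X1(ω) and X2(ω) are the Fourier transforms of x1(n) and x2(n), X2*(ω) is the complex conjugate of X2(ω), and ψ12(ω) is the GCC weighting function. Different weighting functions can be chosen for different noise and reflection conditions.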
The GCC weighting function gives the generalized cross-correlation function of the two microphone signals a relatively sharp peak, and the lag τ at which this peak occurs is the time delay between the two microphones.
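The GCC computation in formula (3) can be sketched with NumPy's FFT routines as below. The phase transform (PHAT) weighting ψ12(ω) = 1/|X1(ω)X2*(ω)| is one common choice and is an assumption here, since the text leaves the weighting function open. The function name `gcc_phat_delay` and the `max_tau` search limit are likewise illustrative. Because R12 is built from X1(ω)X2*(ω), a signal that reaches microphone 2 D samples later produces a peak at τ = -D, so the sketch negates the peak lag.

```python
import numpy as np


def gcc_phat_delay(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 with GCC-PHAT (illustrative sketch).

    Returns (delay_seconds, delay_samples); a positive value means the sound
    reached microphone 1 before microphone 2.
    """
    n = len(x1) + len(x2)                     # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)                  # cross-power spectrum X1(w) X2*(w)
    cross /= np.abs(cross) + 1e-12            # assumed PHAT weighting psi12(w) = 1 / |X1(w) X2*(w)|
    r12 = np.fft.irfft(cross, n=n)            # R12(tau); lag 0 is at index 0
    max_shift = n // 2
    if max_tau is not None:                   # e.g. microphone spacing / speed of sound
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index i corresponds to lag i - max_shift.
    r12 = np.concatenate((r12[-max_shift:], r12[:max_shift + 1]))
    peak_lag = int(np.argmax(np.abs(r12))) - max_shift
    delay_samples = -peak_lag                 # R12 built from X1 X2* peaks at tau = -D
    return delay_samples / fs, delay_samples


rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
x2 = np.concatenate((np.zeros(5), s[:-5]))    # microphone 2 hears the sound 5 samples later
print(gcc_phat_delay(s, x2, fs=16000)[1])     # 5
```

The second return value is the delay in samples; dividing it by the sampling frequency gives the time difference, which is what the first return value holds.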
In the present invention, since the robot head is round, multiple microphones (generally no fewer than three) are evenly distributed around the head, forming a circular microphone array that measures the differences in the time at which the sound reaches the different microphones. The following description uses three microphones, named MIC1, MIC2 and MIC3, as an example. A chip captures the sound data of these three microphones and continuously updates a data buffer. Once a localization instruction is received from outside, the delay between the different audio channels, in samples, is computed from the previously buffered multi-channel speech data; dividing this number of delay samples by the sampling frequency gives the time difference between the channels. After the time delays have been estimated, the coordinates of the sound source are estimated. Each pair of microphones determines one hyperbola equation, so three microphones determine two hyperbola equations, and two hyperbolas determine a point in a two-dimensional plane. The intersection of the two hyperbolas gives the coordinates of the sound source.
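The hyperbola-intersection step can be sketched as follows. Each measured time difference τ1i between MIC1 and MICi constrains the source position p to the hyperbola |p - mi| - |p - m1| = c·τ1i, where c is the speed of sound (taken here as 343 m/s, an assumed value). Rather than solving the two equations in closed form, this minimal sketch approximates their intersection with a coarse polar grid search over bearing and range. The grid search, the 0.1 m array radius and all function names are assumptions for illustration, not the solver stated by the invention. With a small head-mounted array the bearing is well constrained while the range is much less so.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s; assumed room-temperature value
RANGE_STEP = 0.05        # m; grid spacing along each bearing (illustrative)


def circular_array(radius, count=3):
    """Evenly spaced microphones on a circle; MIC1 sits at 90 degrees."""
    return [
        (radius * math.cos(math.pi / 2 + 2 * math.pi * k / count),
         radius * math.sin(math.pi / 2 + 2 * math.pi * k / count))
        for k in range(count)
    ]


def range_differences(p, mics):
    """Left-hand sides |p - m_i| - |p - m_1| of the hyperbolas, for i = 2..N."""
    d1 = math.dist(p, mics[0])
    return [math.dist(p, m) - d1 for m in mics[1:]]


def locate_source(tdoas, mics, min_range=0.5, max_range=5.0):
    """Approximate the intersection of |p - m_i| - |p - m_1| = c * tau_1i by grid search.

    tdoas[i] is the arrival time at MIC(i+2) minus the arrival time at MIC1, in seconds.
    Returns (x, y, bearing_deg) of the grid point with the smallest squared residual.
    """
    targets = [SPEED_OF_SOUND * t for t in tdoas]
    best = None
    for deg in range(360):                                   # 1-degree bearing steps
        theta = math.radians(deg)
        for k in range(round(min_range / RANGE_STEP), round(max_range / RANGE_STEP) + 1):
            r = k * RANGE_STEP
            p = (r * math.cos(theta), r * math.sin(theta))
            residual = sum((a - b) ** 2
                           for a, b in zip(range_differences(p, mics), targets))
            if best is None or residual < best[0]:
                best = (residual, p, deg)
    _, (x, y), deg = best
    return x, y, deg


mics = circular_array(0.1)                    # MIC1, MIC2, MIC3 on a 10 cm radius
true_source = (1.0, 2.0)
tdoas = [d / SPEED_OF_SOUND for d in range_differences(true_source, mics)]
print(locate_source(tdoas, mics)[2])          # close to the true bearing of about 63.4 degrees
```

In practice the `tdoas` list would come from the per-pair delay estimates (delay samples divided by the sampling frequency) rather than from a simulated source.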
Step 3: turn the head camera toward the target person to apply face localization, and capture an image with the head camera. Face localization uses a computer-vision-based face detection algorithm to locate faces. The face detection algorithm works as follows: a digital image is captured by the camera, features are analyzed and extracted from the image data, a detection algorithm determines whether the image contains a face, and the position of each face (there may be several) is obtained.
Early face detection algorithms mainly used template matching, subspace methods, deformable template matching and similar techniques. The present invention uses the more advanced and mature data-driven, machine-learning approach that is now common. As the basis of the face detection algorithm, feature analysis is performed on a library containing a large number of generic portraits, covering for example the shape and position of the key facial features, the edges of the facial contour, and skin color. Statistics are computed over this large body of data and features, and a model is generated by machine learning. When the present invention applies face localization, features are extracted from the real-time images captured by the product's camera and compared with the model, and the face detection result is finally output.
Face detection generally outputs multiple results. For example, if several people are in front of the product, face detection detects multiple faces and their corresponding positions. In our application scenario the largest result is selected, because the largest face is the closest one and its detection accuracy is the highest. After selecting the largest result, the offset of the face center from the image center is calculated and converted into the angle between the face and the camera axis, using formula (4):
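$$\text{Deviation angle} = \frac{\lvert x_{\text{face}} - x_{\text{center}} \rvert \times \theta_{\text{lens}}}{W} \qquad (4)$$
where x_face is the x coordinate of the face center, x_center is the x coordinate of the image center, θ_lens is the wide angle (field of view) of the camera lens, and W is the image width in pixels.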
If photographic head is 140 ° of wide-angle lens, screen resolution is 1024*768, the owner's face detectedCentre coordinate is (400,380), then deviation angle is: (1024/2-400) * 140 °/1024=15.3 °.Show that face is in photographic head axis to the left 15.3.
Step 4: after the head camera has captured the image, perform more accurate localization based on face detection, and finally rotate the robot head again so that it is aimed at the target speaker.
Since an anthropomorphic household robot interacts with people, the present invention, after determining the approximate direction of the sound source, controls the head and camera to turn toward the target direction, then captures an image with the camera and locates the target person more accurately using face detection. Although face detection alone is highly precise, it is limited by the coverage of the camera lens: a face must first appear in the image or video before localization can begin.
The present invention allows a specific wake-up word to be customized, and the localization procedure starts only after that specific wake-up word is received. During localization, multi-microphone sound source localization is first used to determine the approximate direction of the target person's voice, and the robot head is turned toward that direction; an image is then captured by the head camera and the person is localized more accurately using face detection; finally, the robot head is rotated again to aim at the target speaker. The core technologies involved in the present invention are multi-microphone sound source localization and image-based face detection.
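The overall control flow can be summarized in a short Python sketch. The robot's wake-word detector, microphone-array localizer, head motor and face detector are represented by hypothetical callables, so the sequencing of Steps 1 to 4 is shown without assuming any particular hardware API. Treating turns as relative angles, and keeping the sound-based heading when no face is found, are assumptions, since the text does not specify either.

```python
from typing import Callable, Optional


def rotate_to_speaker(
    wake_word_heard: Callable[[], bool],
    sound_bearing: Callable[[], float],
    turn_head: Callable[[float], None],
    face_offset: Callable[[], Optional[float]],
) -> bool:
    """Run one localization cycle; return True if the heading was refined by face detection.

    All four callables are hypothetical hooks into the robot:
    sound_bearing() -> approximate speaker bearing in degrees, relative to the head;
    turn_head(angle) -> rotate the head by a relative angle in degrees;
    face_offset() -> signed deviation angle of the largest face, or None if no face is seen.
    """
    if not wake_word_heard():      # Step 1: only the specific wake-up word starts localization
        return False
    turn_head(sound_bearing())     # Step 2: coarse turn toward the sound source
    offset = face_offset()         # Step 3: capture an image and detect faces
    if offset is None:             # assumption: keep the sound-based heading if no face is seen
        return False
    turn_head(offset)              # Step 4: fine turn to aim at the speaker
    return True


turns = []
print(rotate_to_speaker(lambda: True, lambda: 60.0, turns.append, lambda: -15.3), turns)
# True [60.0, -15.3]
```

Injecting the hardware hooks keeps the two-stage (coarse sound, fine vision) logic testable in isolation.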
The specific embodiments described above further explain the technical problem solved by the present invention, its technical solution and its beneficial effects. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.