Summary of the invention
In view of the above technical problems, the invention discloses a guide robot with voice and image recognition functions. The robot can automatically identify the user type, provide tailored guide services for different user groups, and hold simple exchanges with the user.
To achieve the above object, the invention provides the following technical scheme:
A guide robot with voice and image recognition functions, comprising: a voice input unit, an image acquisition unit, a touch input unit, an audio processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit, and an output unit;
The voice input unit is configured to acquire voice information;
The image acquisition unit is configured to acquire image information, the image information including environment images and person images;
The touch input unit is configured to assist user input;
The audio processing unit is configured to receive the voice information acquired by the voice input unit, process the received voice information, and send the processing result to the decision unit;
The image processing unit is configured to receive the image information acquired by the image acquisition unit, process the received image information, and send the processing result to the decision unit;
The text processing unit is configured to receive the input from the touch input unit, process the received information, and send the processing result to the decision unit;
The decision unit is configured to receive the information sent by the audio processing unit, the image processing unit, and the text processing unit, combine this information to determine the robot's motion trajectory and the information to be output, and send these to the motion control unit and the output unit respectively;
The storage unit is configured to store scenic spot information in various languages, and sound and image templates for each language and each of four user groups (young children, primary and secondary school students, adults, and the elderly); the scenic spot information further includes a map of the scenic spot, simple questions about the scenic spot with their corresponding answers, and guide information for the scenic spot prepared for each language and each of the four user groups; the guide information further includes voice and image information;
The output unit includes a voice player and a display screen, and is configured to output scenic spot information.
The audio processing unit processes the voice information received from the voice input unit in the following specific steps:
The collected voice information is preprocessed, the preprocessing including determining the main sound source, filtering out noise, and speech enhancement;
Speech analysis and recognition are performed on the preprocessed voice data to determine the language and the user's age group, the age groups being young children, primary and secondary school students, adults, and the elderly.
The specific steps of speech analysis and recognition on the preprocessed voice data are as follows: the preprocessed voice information is divided into frames of 25 ms each, and a Hamming window is applied to the framed data; features are extracted from the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequency and MFCC coefficients of the sound templates of each type stored in the storage unit; and the language and user age group with the highest matching probability are selected as the final matching result.
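The framing and windowing step can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `frame_signal`, the 10 ms hop between frames, and the 16 kHz sample rate in the usage note are assumptions, since the text specifies only the 25 ms frame length and the Hamming window.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into 25 ms frames and apply a Hamming window to each.

    hop_ms (the shift between frame starts) is an assumed value; the
    source text gives only the frame length.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    # Standard Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

For one second of audio sampled at 16 kHz, each frame holds 400 samples, and the window tapers both ends of every frame towards 0.08 of the original amplitude.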
The image processing unit generates map information based on the environment images acquired by the image acquisition unit, and sends the map information to the decision unit.
The image processing unit determines the user's age group from the person images acquired by the image acquisition unit as follows: the user's height is determined from the collected person image, and the face region is extracted at the same time; the extracted face region image is preprocessed, the preprocessing including illumination compensation, greyscale transformation, histogram equalization, normalization, geometric correction, and filtering of the face image; features are extracted from the preprocessed face image, the extracted features including eye, nose, ear, mouth, and hairline features; the user's height and the eye, nose, ear, mouth, and hairline features extracted from the face image are compared with the image models pre-stored in the storage unit; and the user age group with the highest matching probability is selected as the final matching result.
The text processing unit is configured to receive the input from the touch input unit; the input information includes the language and/or the age group, and may also include questions related to the scenic area; the text processing unit performs text processing on the questions entered by the user and sends the result to the decision unit.
The decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning based on a preset path planning algorithm, and sends the planned path to the motion control unit.
The decision unit receives the final age group matching result M1 sent by the audio processing unit and the final age group matching result M2 sent by the image processing unit, and, according to the matching probability k1 of the audio processing unit and the matching probability k2 of the image processing unit, determines the confidence levels r1 and r2 of the matching results M1 and M2 [the expression for r1 and r2 is missing from the source text]. Based on the matching results and the confidence levels, the user's final age group is determined by the formula: Age = r1*M1 + r2*M2.
Based on the language sent by the audio processing unit and the finally determined age group of the user, the decision unit selects, from the voice data pre-stored in the storage unit, the voice information suited to that user type, and outputs it through the output unit.
The voice input unit is a microphone, the image acquisition unit is a camera, and the touch input unit is a touch-sensitive display screen.
Compared with prior art, the beneficial effects of the present invention are:
The guide robot with voice and image recognition functions can autonomously determine the user type from the collected voice information and image information of the user and from the auxiliary information entered by the user, and can select a suitable form of information output based on the user type, thereby providing tailored guide services for different user groups. The guide robot can also hold simple exchanges with the user in response to the user's questions.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
A guide robot with speech recognition and image recognition functions, comprising a voice input unit, an image acquisition unit, a touch input unit, an audio processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit, and an output unit;
The voice input unit is configured to acquire voice information;
The image acquisition unit is configured to acquire image information;
The image information acquired by the image acquisition unit includes environment images and person images;
The touch input unit is configured to assist user input;
The audio processing unit is configured to receive the voice information acquired by the voice input unit, process the received voice information, and send the processing result to the decision unit;
Because the scenic area environment is relatively noisy, the collected voice information is preprocessed, the preprocessing including determining the main sound source, filtering out noise, and speech enhancement;
Speech analysis and recognition are performed on the preprocessed voice data to determine the language and the user's age group, the languages including Chinese, English, French, and other common languages, and the age groups being young children, primary and secondary school students, adults, and the elderly;
The speech analysis and recognition process is as follows: the preprocessed voice information is divided into frames of 25 ms each, and a Hamming window is applied to the framed data. Features are extracted from the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequency and MFCC coefficients of the sound templates of each type stored in the storage unit; and the language and user age group with the highest matching probability are selected as the final matching result.
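The comparison against stored sound templates can be sketched as follows. The text does not say how the matching probability is computed, so this example assumes a Euclidean distance between feature vectors turned into probabilities with a softmax; the function name `match_template` and the feature values in the usage note are hypothetical.

```python
import math


def match_template(features, templates):
    """Return the best-matching (language, age_group) label and its probability.

    templates maps a (language, age_group) label to a stored feature vector,
    e.g. fundamental frequency followed by MFCC coefficients. The distance
    measure and the softmax scoring are assumptions made for illustration.
    """
    # A smaller distance means a better match, so score by negative distance.
    scores = {
        label: -math.dist(features, vector)
        for label, vector in templates.items()
    }
    peak = max(scores.values())  # subtract the peak for numerical stability
    weights = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(weights.values())
    probabilities = {label: w / total for label, w in weights.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]
```

With two stored templates such as `{("Chinese", "adult"): [120.0, 1.0], ("Chinese", "young child"): [300.0, 2.0]}`, an input of `[125.0, 1.1]` is assigned to the adult template, because it lies much closer to that vector.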
The fundamental frequency is the frequency at which the vocal cords vibrate periodically with the airflow, and it is also the lowest-frequency component of speech. The fundamental frequency differs considerably across genders and ages; for example, the fundamental frequency of a young child is higher than that of an adult. The mel-frequency cepstral coefficients (MFCC) reflect how the energy of the speech signal is distributed across frequency bands, and are parameters extracted on the basis of the auditory characteristics of the human ear. A listener can generally judge a speaker's approximate age from subjective experience; moreover, the MFCC contain some information beyond what is audible. The speaker's age can therefore be estimated from the MFCC.
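One common way to estimate the fundamental frequency of a voiced frame is the autocorrelation method, sketched below. The text does not name an estimation method, so the technique, the function name `estimate_f0`, and the 60 Hz to 500 Hz search range are assumptions made for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    The search range [f_min, f_max] is an assumed range for human speech.
    Returns 0.0 if no positive correlation peak is found.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        value = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0
```

The lag with the strongest self-similarity corresponds to one pitch period, so a higher-pitched voice, such as a young child's, yields a shorter lag and a higher estimate.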
The MFCC parameters are obtained as follows:
The acquired voice information is normalized to obtain data in matrix form;
An FFT is applied to the data in matrix form to obtain the short-time energy spectrum Xn(k);
A filter bank is constructed and applied to the short-time energy spectrum to obtain the coefficients m(i) [formula missing from the source text], where p is the number of filters and Hi(k) is the i-th filter [formula missing from the source text];
Here f[i] is the centre frequency of the i-th filter, chosen so that the start frequency of each filter equals the centre frequency of the adjacent filter;
The logarithmic energy of the filter outputs is computed, and a DCT is finally applied to obtain the MFCC parameters [formula missing from the source text].
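The steps above can be sketched with the textbook MFCC pipeline: power spectrum, triangular mel filter bank, logarithm, then DCT. Because the formulas of the source are not available, the filter shape, the mel-scale conversion, the default of p = 20 filters and 12 coefficients, and all function names below are assumptions taken from the standard formulation rather than from the invention. A direct DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Short-time energy spectrum |X(k)|^2 for k = 0 .. N/2 (direct DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(p, n_fft, sample_rate):
    """Build p triangular filters H_i(k) with centres equally spaced in mel.

    Each filter starts at the centre of its left neighbour and ends at the
    centre of its right neighbour.
    """
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(low + (high - low) * i / (p + 1)) for i in range(p + 2)]
    edges = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin index
    bank = []
    for i in range(1, p + 1):
        left, mid, right = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left <= k <= mid:
                row.append((k - left) / (mid - left))
            elif mid < k <= right:
                row.append((right - k) / (right - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, p=20, n_coeffs=12):
    """Return n_coeffs MFCC values for one windowed frame (c0 is skipped)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(p, len(frame), sample_rate)
    # Log energy at the output of each filter; the small offset avoids log(0).
    log_energy = [
        math.log(sum(h * x for h, x in zip(row, spectrum)) + 1e-10)
        for row in bank
    ]
    # DCT-II of the log filter-bank energies.
    return [
        sum(log_energy[i] * math.cos(math.pi * n * (i + 0.5) / p)
            for i in range(p))
        for n in range(1, n_coeffs + 1)
    ]
```

In practice an FFT routine would replace `power_spectrum`; the direct DFT is only suitable for short frames.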
The image processing unit is configured to receive the image information acquired by the image acquisition unit, process the received image information, and send the processing result to the decision unit;
The image processing unit generates map information based on the environment images acquired by the image acquisition unit, and sends the map information to the decision unit;
The image processing unit determines the user's age group based on the person images acquired by the image acquisition unit;
The specific method by which the image processing unit determines the user's age group from the person images is as follows:
The user's height is determined from the collected person image, and the face region is extracted at the same time; the extracted face region image is preprocessed, the preprocessing including illumination compensation, greyscale transformation, histogram equalization, normalization, geometric correction, and filtering of the face image.
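Of the preprocessing operations listed above, histogram equalization is sketched below for an 8-bit greyscale image stored as a list of rows. This is the standard cumulative-histogram formulation, shown for illustration only; the function name `equalize_histogram` and the list-of-lists image format are assumptions.

```python
def equalize_histogram(image):
    """Equalize an 8-bit greyscale image given as rows of ints in 0..255."""
    histogram = [0] * 256
    for row in image:
        for pixel in row:
            histogram[pixel] += 1

    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one grey level is present; nothing to spread out.
        return [row[:] for row in image]

    # Map each grey level so that the output levels span the full 0..255 range.
    lookup = [
        max(0, round((c - cdf_min) / (total - cdf_min) * 255))
        for c in cdf
    ]
    return [[lookup[pixel] for pixel in row] for row in image]
```

After equalization the darkest level present maps to 0 and the brightest to 255, which reduces the effect of uneven lighting before facial features are extracted.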
Features are extracted from the preprocessed face image, the extracted features including eye, nose, ear, mouth, and hairline features; the user's height and the eye, nose, ear, mouth, and hairline features extracted from the face image are compared with the image models pre-stored in the storage unit; and the user age group with the highest matching probability is selected as the final matching result.
The text processing unit is configured to receive the input from the touch input unit, process the received information, and send the processing result to the decision unit;
The user can enter the language and age group through the touch input unit, and can also enter questions related to the scenic area; the text processing unit performs text processing on the questions entered by the user and sends the result to the decision unit;
The decision unit is configured to receive the information sent by the audio processing unit, the image processing unit, and the text processing unit, combine the information from these units to determine the robot's motion trajectory and the information to be output, and send these to the motion control unit and the output unit respectively;
The decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning based on a preset path planning algorithm, and sends the planned path to the motion control unit;
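The text leaves the "preset path planning algorithm" unspecified. As one possible illustration, the sketch below assumes the matched map is available as an occupancy grid (0 for free cells, 1 for obstacles) and uses breadth-first search to find a shortest path; the grid representation, the choice of algorithm, and the function name `plan_path` are all assumptions.

```python
from collections import deque


def plan_path(grid, start, goal):
    """Return a shortest list of (row, col) cells from start to goal, or None.

    grid is a list of rows; 0 marks a free cell and 1 marks an obstacle.
    Movement is restricted to the four neighbouring cells.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in previous):
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

The returned cell sequence plays the role of the planned path that is sent to the motion control unit; a real system would more likely use a cost-aware planner such as A*.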
The decision unit receives the final age group matching result M1 sent by the audio processing unit and the final age group matching result M2 sent by the image processing unit, and, according to the matching probability k1 of the audio processing unit and the matching probability k2 of the image processing unit, determines the confidence levels r1 and r2 of the matching results M1 and M2 [the expression for r1 and r2 is missing from the source text]. Based on the matching results and the confidence levels, the user's final age group is determined by the formula: Age = r1*M1 + r2*M2.
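The fusion formula Age = r1*M1 + r2*M2 can be illustrated as follows. Two points are assumptions and not taken from the text: the age groups are encoded as the integers 0 to 3, and the confidence levels are obtained by normalizing the matching probabilities, r1 = k1/(k1+k2) and r2 = k2/(k1+k2), because the original expression for r1 and r2 is not available. The function name `fuse_age` is hypothetical.

```python
AGE_GROUPS = [
    "young child",
    "primary or secondary school student",
    "adult",
    "elderly",
]


def fuse_age(m1, k1, m2, k2):
    """Combine the voice result (m1, k1) and the image result (m2, k2).

    m1 and m2 are indices into AGE_GROUPS (an assumed numeric encoding).
    The normalization of k1 and k2 into r1 and r2 is an assumption.
    """
    total = k1 + k2
    if total <= 0:
        raise ValueError("matching probabilities must sum to a positive value")
    r1, r2 = k1 / total, k2 / total
    age = r1 * m1 + r2 * m2  # Age = r1*M1 + r2*M2
    index = min(max(int(round(age)), 0), len(AGE_GROUPS) - 1)
    return age, AGE_GROUPS[index]
```

For example, if the voice channel reports "adult" (index 2) with probability 0.9 and the image channel reports "elderly" (index 3) with probability 0.3, the weighted value is 2.25 and the nearest group is "adult".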
Based on the language sent by the audio processing unit and the finally determined age group of the user, the decision unit selects, from the voice data pre-stored in the storage unit, the voice information suited to that user type, and outputs it through the output unit.
The motion control unit is configured to receive the path information sent by the decision unit and to control the motion trajectory of the guide robot based on that information;
The storage unit is configured to store scenic spot information in various languages, and sound and image templates for each language and each of four user groups (young children, primary and secondary school students, adults, and the elderly); the scenic spot information further includes a map of the scenic spot, simple questions about the scenic spot with their corresponding answers, and guide information for the scenic spot prepared for each language and each of the four user groups; the guide information further includes voice and image information;
The decision unit can match a user question sent by the text processing unit or the audio processing unit against the simple questions about the scenic spot pre-stored in the storage unit, and send the corresponding answer to the output unit for output.
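Matching a user question against the stored simple questions can be sketched with a string-similarity score. The text does not describe the matching method, so the use of `difflib.SequenceMatcher`, the 0.6 threshold, the function name `answer_question`, and the sample questions in the usage note are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def answer_question(question, stored_answers, threshold=0.6):
    """Return the answer of the most similar stored question, or None.

    stored_answers maps each pre-stored question to its answer. The
    similarity measure and the threshold are assumed values.
    """
    best_question, best_score = None, 0.0
    for stored in stored_answers:
        score = SequenceMatcher(None, question.lower(), stored.lower()).ratio()
        if score > best_score:
            best_question, best_score = stored, score
    if best_question is None or best_score < threshold:
        return None  # no stored question is close enough
    return stored_answers[best_question]
```

With a hypothetical store such as `{"What time does the park close?": "6 pm"}`, the input "what time does the park close" returns "6 pm", while an unrelated input returns `None` so that the robot can fall back to a default response.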
The output unit includes a voice player and a display screen, and is configured to output scenic spot information.
The voice input unit is a microphone.
The image acquisition unit is a camera.
The touch input unit is a touch-sensitive display screen.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its protection scope; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.