CN110070865A

Info

Publication number: CN110070865A
Application number: CN201910264736.2A
Authority: CN
Inventors: 孙昌勋; 许志强
Original assignee: Beijing Ronglian Ets Information Technology Co Ltd
Current assignee: Beijing Ronglian Ets Information Technology Co Ltd
Priority date: 2019-04-03
Filing date: 2019-04-03
Publication date: 2019-07-30
Anticipated expiration: 2039-04-03
Also published as: CN110070865B

Abstract

The present invention relates to a guide robot with voice and image recognition functions, comprising: a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit and an output unit. The decision unit receives the information sent by the voice processing unit, the image processing unit and the text processing unit, combines this information to determine the robot's motion trajectory and the information to be output, and sends these to the motion control unit and the output unit respectively. By acquiring the user's voice information and image information together with auxiliary information entered by the user, the guide robot autonomously determines the user type and selects a suitable form of information output for that type, so that it can provide tailored guide services to different user groups. The guide robot can also hold simple exchanges with users in response to their questions.

Description

A guide robot with voice and image recognition functions

Technical field

The present invention relates to the field of robots, and in particular to a guide robot with voice and image recognition functions.

Background art

Tour guides have played a very important role in the rapid development of China's tourism industry. However, guide work at most tourist attractions is highly repetitive and involves little creativity. On the one hand, such highly repetitive labour wastes a large amount of human resources; on the other hand, doing the same guide work for a long time inevitably leads to fatigue and boredom, which lowers service quality. In addition, with the rise of various new modes of tourism, tourists usually want to obtain different information in a timely manner and also want the guiding itself to be more engaging. Existing explanation devices can provide commentary in different languages for people with different mother tongues, but the content and form of the commentary are uniform and fixed, and the devices offer no human-computer interaction. Traditional guide services and explanation devices are increasingly unable to meet these needs, which creates market demand for the creation and development of guide robots for the tourism industry.

In certain settings, such as squares, exhibition centres, museums, science and technology centres, shops and tourist areas, crowds are not dense and the working environment is fixed, so a robot can take over part of a guide's work and carry out simple, fixed guiding and explanation tasks. This not only reduces the number of attendants required but also adds an element of technology and fun, encouraging children and teenagers to take part.

However, existing guide robots still operate in a rather limited way: they can only deliver predetermined information to users in a fixed manner, and cannot autonomously provide targeted information for different user groups to meet the needs of different people.

Summary of the invention

In view of the above technical problems, the invention discloses a guide robot with voice and image recognition functions, which can automatically identify the user type, provide tailored guide services for different user groups, and hold simple exchanges with users.

To achieve the above object, the invention provides the following technical solution:

A guide robot with voice and image recognition functions, comprising: a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit and an output unit;

The voice input unit is used to acquire voice information;

The image acquisition unit is used to acquire image information, and the image information acquired by the image acquisition unit includes environment images and person images;

The touch input unit is used to assist user input;

The voice processing unit is used to receive the voice information acquired by the voice input unit, process the received voice information, and send the processing result to the decision unit;

The image processing unit is used to receive the image information acquired by the image acquisition unit, process the received image information, and send the processing result to the decision unit;

The text processing unit is used to receive the input from the touch input unit, process the received information, and send the processing result to the decision unit;

The decision unit is used to receive the information sent by the voice processing unit, the image processing unit and the text processing unit, combine this information to determine the robot's motion trajectory and the information to be output, and send these to the motion control unit and the output unit respectively;

The storage unit is used to store scenic-spot information in each language, together with voice and image templates for each language and for each of four user groups: young children, primary and secondary school students, adults and the elderly. The scenic-spot information further comprises a map of the scenic spot, simple questions about the scenic spot with their corresponding answers, and guide information about the scenic spot for each language and each of the four user groups; the guide information further comprises voice and image information;

The output unit includes a voice player and a display screen, and is used to output scenic-spot information.

The specific steps by which the voice processing unit receives the voice information acquired by the voice input unit and processes it are as follows:

The collected voice information is preprocessed; the preprocessing includes determining the main sound source, filtering out noise and speech enhancement;

Speech analysis and recognition are performed on the preprocessed voice data to determine the language and the user age group; the age groups are young children, primary and secondary school students, adults and the elderly.

The specific steps of speech analysis and recognition on the preprocessed voice data are as follows: the preprocessed voice information is divided into frames, each 25 ms long, and a Hamming window is applied to the framed data; features are extracted from the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequency and MFCC coefficients of each type of voice template stored in the storage unit, and the language and user age group with the highest matching probability are selected as the final matching result.

The image processing unit generates map information from the environment images acquired by the image acquisition unit and sends the map information to the decision unit.

The specific method by which the image processing unit determines the user's age group from the person image acquired by the image acquisition unit is as follows: the user's height is determined from the acquired person image and, at the same time, the face region is extracted; the extracted face-region image is preprocessed, the preprocessing comprising light compensation, grayscale transformation, histogram equalization, normalization, geometric correction and filtering of the face image; features are extracted from the preprocessed face image, the extracted features comprising eye, nose, ear, mouth and hairline features; the user's height and the eye, nose, ear, mouth and hairline features extracted from the face image are compared with the image models pre-stored in the storage unit, and the user age group with the highest matching probability is selected as the final matching result.

The text processing unit is used to receive the input from the touch input unit. The input information includes the language and/or the age group, and may also include questions about the scenic area; the text processing unit performs text processing on the questions entered by the user and sends the result to the decision unit.

The decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning using a preset path planning algorithm, and sends the planned path to the motion control unit.

The decision unit receives the final user age group matching result M1 sent by the voice processing unit and the final user age group matching result M2 sent by the image processing unit, and determines the confidences r1 and r2 of the matching results M1 and M2 from the matching probability k1 of the voice processing unit and the matching probability k2 of the image processing unit [formula for r1 and r2 missing from this text]. The user's final age group is then determined from the matching results and the confidences using the formula: Age = r1*M1 + r2*M2.

Based on the language sent by the voice processing unit and the finally determined user age group, the decision unit selects, from the voice data pre-stored in the storage unit, the voice information suited to that user type, and outputs it through the output unit.

The voice input unit is a microphone, the image acquisition unit is a camera, and the touch input unit is a touch display screen.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The guide robot with voice and image recognition functions can autonomously determine the user type from the user's voice information, image information and auxiliary information entered by the user, and select a suitable form of information output for that type, so that it can provide tailored guide services to different user groups. The guide robot can also hold simple exchanges with users in response to their questions.

Brief description of the drawings

Fig. 1 is a structural block diagram of a guide robot with voice recognition and image recognition functions according to an embodiment of the present invention;

Fig. 2 is a flow chart of a method for determining the user age group from voice according to an embodiment of the present invention;

Fig. 3 is a flow chart of a method for determining the user age group from images according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

A guide robot with voice recognition and image recognition functions comprises a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit and an output unit;

The voice input unit is used to acquire voice information;

The image acquisition unit is used to acquire image information;

The image information acquired by the image acquisition unit includes environment images and person images;

The touch input unit is used to assist user input;

Because the scenic-area environment is rather noisy, the collected voice information is preprocessed; the preprocessing includes determining the main sound source, filtering out noise and speech enhancement;

Speech analysis and recognition are performed on the preprocessed voice data to determine the language and the user age group. The languages include common languages such as Chinese, English and French; the age groups are young children, primary and secondary school students, adults and the elderly;

The speech analysis and recognition process is as follows: the preprocessed voice information is divided into frames, each 25 ms long, and a Hamming window is applied to the framed data. Features are extracted from the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequency and MFCC coefficients of each type of voice template stored in the storage unit, and the language and user age group with the highest matching probability are selected as the final matching result.
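The framing and windowing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the 25 ms frame length comes from the text, while the 10 ms frame shift, the 16 kHz sample rate and the function names `hamming_window` and `frame_signal` are assumptions introduced for the example.

```python
import math

def hamming_window(n):
    """Return an n-point Hamming window: 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split samples into Hamming-windowed frames of frame_ms milliseconds.

    hop_ms (the frame shift) is an assumption; the text only fixes the
    25 ms frame length. Trailing samples that do not fill a whole frame
    are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# One second of a 200 Hz test tone sampled at 16 kHz (assumed rate).
rate = 16000
tone = [math.sin(2.0 * math.pi * 200.0 * t / rate) for t in range(rate)]
frames = frame_signal(tone, rate)  # 98 frames of 400 samples each
```

Overlapping frames are a common choice because the window tapers each frame towards zero at its edges; a shift equal to the frame length would also satisfy the text.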

The fundamental frequency is the frequency at which the vocal cords vibrate periodically with the airflow; it is also the lowest-frequency component of natural speech. The fundamental frequency differs considerably between genders and ages; for example, a young child's fundamental frequency is higher than an adult's. Mel-frequency cepstral coefficients (MFCC) reflect how the energy of a speech signal is distributed across frequency bands, and are parameters extracted on the basis of the auditory characteristics of the human ear. A listener can usually judge a speaker's approximate age from subjective experience, and MFCC additionally contain some information beyond what is heard; the speaker's age can therefore be estimated from MFCC.
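The text does not say how the fundamental frequency is extracted. One common technique is autocorrelation peak picking, sketched below under that assumption; the function name `estimate_f0` and the 60 to 500 Hz search range are illustrative choices, not values from the source.

```python
import math

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    ASSUMPTION: autocorrelation is used here only as an example method;
    f_min and f_max bound the search and are assumed values.
    Returns the frequency in Hz of the lag with the strongest correlation.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# A 25 ms frame of a 200 Hz tone sampled at 8 kHz (assumed test signal).
test_rate = 8000
test_frame = [math.sin(2.0 * math.pi * 200.0 * t / test_rate) for t in range(200)]
f0 = estimate_f0(test_frame, test_rate)
```

A higher estimate would point towards a child's voice and a lower one towards an adult's, in line with the observation above; real speech would also need a voiced/unvoiced decision, which this sketch omits.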

The MFCC parameters are obtained as follows:

The acquired voice information is normalized to obtain data in matrix form;

An FFT is applied to the data in matrix form to obtain the short-time energy spectrum X_n(k);

A filter bank is constructed and applied to the short-time energy spectrum to obtain the coefficients m(i) [formula missing from this text], where p is the number of filters and H_i(k) is the i-th filter [formula missing from this text],

where f[i] is the centre frequency of the i-th filter, chosen so that the start frequency of each filter coincides with the centre frequency of the adjacent filter;

The logarithmic energy of the filter outputs is computed, and finally a DCT is applied to obtain the MFCC parameters [formula missing from this text].
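The expressions for m(i), H_i(k) and the final MFCC parameters are not available in this text. For orientation only, a widely used textbook formulation that fits the symbols and the filter-edge condition described above is given below. It is an assumption, not a reconstruction of the original formulas; N (the FFT length), C(j) and L (the number of cepstral coefficients kept) are symbols introduced here.

```latex
% m(i): output of the i-th filter applied to the short-time energy spectrum
m(i) = \sum_{k=0}^{N-1} X_n(k)\, H_i(k), \qquad 1 \le i \le p

% H_i(k): triangular filter whose edges are the neighbouring centre frequencies
H_i(k) =
\begin{cases}
0, & k < f[i-1] \\
\dfrac{k - f[i-1]}{f[i] - f[i-1]}, & f[i-1] \le k \le f[i] \\
\dfrac{f[i+1] - k}{f[i+1] - f[i]}, & f[i] < k \le f[i+1] \\
0, & k > f[i+1]
\end{cases}

% MFCC: DCT of the logarithmic filter-bank energies
C(j) = \sum_{i=1}^{p} \ln\bigl(m(i)\bigr)\,
       \cos\!\left(\frac{\pi j\,(i - 0.5)}{p}\right), \qquad j = 1, \dots, L
```

In this form the i-th filter starts at f[i-1] and ends at f[i+1], so each filter's start frequency equals the centre frequency of its neighbour, as the text requires.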

The image processing unit generates map information from the environment images acquired by the image acquisition unit and sends the map information to the decision unit;

The image processing unit determines the user's age group from the person image acquired by the image acquisition unit;

The specific method by which the image processing unit determines the user's age group from the person image acquired by the image acquisition unit is as follows:

The user's height is determined from the acquired person image and, at the same time, the face region is extracted. The extracted face-region image is preprocessed; the preprocessing comprises light compensation, grayscale transformation, histogram equalization, normalization, geometric correction and filtering of the face image.
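Of the preprocessing steps listed, histogram equalization is easy to show in isolation. The sketch below uses the classic cumulative-distribution remapping on a grayscale image held as a list of rows; the function name `equalize_histogram` and the 8-bit value range are assumptions, and the source does not specify which variant it uses.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a non-empty grayscale image (list of rows of ints).

    Uses the classic CDF remapping; pixel values must lie in
    [0, levels - 1]. Returns a new image and leaves the input untouched.
    """
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    # Cumulative count of pixels at or below each grey level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single grey level: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[px] for px in row] for row in image]

# A tiny low-contrast patch standing in for a face region (made-up values).
patch = [[50, 50], [100, 150]]
equalized = equalize_histogram(patch)
```

After equalization the darkest level present maps to 0 and the brightest to 255, which reduces the effect of lighting differences before features are extracted.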

Features are extracted from the preprocessed face image; the extracted features include eye, nose, ear, mouth and hairline features. The user's height and the eye, nose, ear, mouth and hairline features extracted from the face image are compared with the image models pre-stored in the storage unit, and the user age group with the highest matching probability is selected as the final matching result.
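The comparison against stored models is not specified further. As a rough sketch, the height and facial features can be packed into a numeric vector and scored against one reference vector per age group. Everything below is hypothetical: the function name `match_age_group`, the exp(-distance) similarity and the sample template values are illustrative assumptions, not the patented matching method.

```python
import math

def match_age_group(features, templates):
    """Return (best_group, matching_probability) for a feature vector.

    templates maps a group name to a reference vector of the same length.
    ASSUMPTION: similarity is exp(-Euclidean distance), normalized over
    all groups so that the scores sum to 1.
    """
    scores = {}
    for group, reference in templates.items():
        scores[group] = math.exp(-math.dist(features, reference))
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total

# Hypothetical templates: [height in metres, a face-proportion ratio].
templates = {
    "young child": [1.0, 0.30],
    "primary/secondary student": [1.4, 0.40],
    "adult": [1.7, 0.50],
    "elderly": [1.6, 0.55],
}
group, probability = match_age_group([1.68, 0.50], templates)
```

The returned probability plays the role of the matching probability that is later passed to the decision unit.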

The user can enter the language and age group through the touch input unit, and can also enter questions about the scenic area; the text processing unit performs text processing on the questions entered by the user and sends the result to the decision unit;

The decision unit is used to receive the information sent by the voice processing unit, the image processing unit and the text processing unit, combine the information from these units to determine the robot's motion trajectory and the information to be output, and send these to the motion control unit and the output unit respectively;

The decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning using a preset path planning algorithm, and sends the planned path to the motion control unit;
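The source does not name the preset path planning algorithm. Purely as an illustration, the sketch below plans a shortest path on an occupancy grid with breadth-first search; the grid representation, the function name `plan_path` and the choice of BFS are all assumptions.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid via breadth-first search.

    grid[r][c] == 0 marks a free cell and 1 a blocked cell. Returns the
    list of (row, col) cells from start to goal inclusive, or None if
    the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# A made-up 3x3 map with a wall across most of the middle row.
site_map = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
route = plan_path(site_map, (0, 0), (2, 0))
```

The resulting list of cells is the kind of path the decision unit would hand to the motion control unit; a weighted planner such as A* could be substituted without changing that interface.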

The decision unit receives the final user age group matching result M1 sent by the voice processing unit and the final user age group matching result M2 sent by the image processing unit, and determines the confidences r1 and r2 of the matching results M1 and M2 from the matching probability k1 of the voice processing unit and the matching probability k2 of the image processing unit [formula for r1 and r2 missing from this text]. The user's final age group is then determined from the matching results and the confidences using the formula: Age = r1*M1 + r2*M2.
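The expression for r1 and r2 is not available in this text, so the sketch below has to assume one. It normalizes the two matching probabilities so that the confidences sum to 1, encodes the four age groups as the indices 0 to 3, and rounds the weighted result to the nearest group. These three choices, and the name `fuse_age_group`, are assumptions made for illustration only.

```python
AGE_GROUPS = ["young child", "primary/secondary student", "adult", "elderly"]

def fuse_age_group(m1, k1, m2, k2):
    """Combine the voice result (m1, k1) with the image result (m2, k2).

    m1 and m2 are indices into AGE_GROUPS; k1 and k2 are the matching
    probabilities reported by the voice and image processing units.
    ASSUMPTION: r1 = k1 / (k1 + k2) and r2 = k2 / (k1 + k2).
    Returns the weighted value and the name of the nearest age group.
    """
    total = k1 + k2
    if total <= 0:
        raise ValueError("at least one matching probability must be positive")
    r1, r2 = k1 / total, k2 / total
    age = r1 * m1 + r2 * m2  # Age = r1*M1 + r2*M2
    index = min(len(AGE_GROUPS) - 1, max(0, int(age + 0.5)))
    return age, AGE_GROUPS[index]

# Voice says "adult" with probability 0.9; image says "elderly" with 0.3.
age_value, age_label = fuse_age_group(2, 0.9, 3, 0.3)
```

Under these assumptions the more confident unit dominates: the example resolves to "adult" because the voice match is three times as probable as the image match.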

Based on the language sent by the voice processing unit and the finally determined user age group, the decision unit selects, from the voice data pre-stored in the storage unit, the voice information suited to that user type, and outputs it through the output unit.

The motion control unit is used to receive the path information sent by the decision unit and to control the motion trajectory of the guide robot based on that information;

The decision unit can match a user question sent by the text processing unit or the voice processing unit against the simple questions about the scenic spot pre-stored in the storage unit, and send the corresponding answer to the output unit for output.
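How questions are matched is left open in the text. One simple possibility is approximate string matching against the stored questions, sketched here with Python's standard `difflib` module. The stored questions and answers, the 0.6 similarity cutoff and the function name `answer_question` are hypothetical placeholders.

```python
import difflib

# Hypothetical stored questions and answers for one scenic spot.
FAQ = {
    "what time does the park open": "The park opens at 8:00.",
    "where is the nearest restroom": "The nearest restroom is beside the main gate.",
}

def answer_question(question, faq, cutoff=0.6):
    """Return the stored answer whose question best matches, or None.

    The user question is lower-cased and stripped of surrounding
    punctuation, then compared with the stored questions; cutoff is the
    minimum similarity ratio (an assumed value) for accepting a match.
    """
    normalized = question.lower().strip(" ?!.")
    matches = difflib.get_close_matches(normalized, list(faq), n=1, cutoff=cutoff)
    return faq[matches[0]] if matches else None

reply = answer_question("What time does the park open?", FAQ)
no_reply = answer_question("xyzzy", FAQ)
```

Returning `None` for an unmatched question lets the caller fall back to a default prompt instead of giving an unrelated answer.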

The voice input unit is a microphone.

The image acquisition unit is a camera.

The touch input unit is a touch display screen.

Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its protection scope; any modification, equivalent substitution, improvement and the like made within the scope of the present invention shall be included in its protection scope.

Claims

1. A guide robot with voice and image recognition functions, comprising: a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit and an output unit;

The voice input unit is used to acquire voice information;

The image acquisition unit is used to acquire image information, and the image information acquired by the image acquisition unit includes environment images and person images;

The touch input unit is used to assist user input;

The voice processing unit is used to receive the voice information acquired by the voice input unit, process the received voice information, and send the processing result to the decision unit;

The image processing unit is used to receive the image information acquired by the image acquisition unit, process the received image information, and send the processing result to the decision unit;

The text processing unit is used to receive the input from the touch input unit, process the received information, and send the processing result to the decision unit;

The decision unit is used to receive the information sent by the voice processing unit, the image processing unit and the text processing unit, combine this information to determine the robot's motion trajectory and the information to be output, and send these to the motion control unit and the output unit respectively;

The storage unit is used to store scenic-spot information in each language, together with voice and image templates for each language and for each of four user groups: young children, primary and secondary school students, adults and the elderly; the scenic-spot information specifically comprises a map of the scenic spot, simple questions about the scenic spot with their corresponding answers, and guide information about the scenic spot for each language and each of the four user groups; the guide information specifically comprises voice and image information.

2. The guide robot with voice and image recognition functions according to claim 1, characterized in that the specific steps by which the voice processing unit receives the voice information acquired by the voice input unit and processes it are as follows:

The collected voice information is preprocessed; the preprocessing includes determining the main sound source, filtering out noise and speech enhancement;

Speech analysis and recognition are performed on the preprocessed voice data to determine the language and the user age group; the age groups are young children, primary and secondary school students, adults and the elderly.

3. The guide robot with voice and image recognition functions according to claim 2, characterized in that the specific steps of speech analysis and recognition on the preprocessed voice data are as follows: the preprocessed voice information is divided into frames, each 25 ms long, and a Hamming window is applied to the framed data; features are extracted from the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequency and MFCC coefficients of each type of voice template stored in the storage unit, and the language and user age group with the highest matching probability are selected as the final matching result.

4. The guide robot with voice and image recognition functions according to claim 1, characterized in that the image processing unit generates map information from the environment images acquired by the image acquisition unit and sends the map information to the decision unit.

5. The guide robot with voice and image recognition functions according to claim 1, characterized in that the specific method by which the image processing unit determines the user's age group from the person image acquired by the image acquisition unit is as follows: the user's height is determined from the acquired person image and, at the same time, the face region is extracted; the extracted face-region image is preprocessed, the preprocessing comprising light compensation, grayscale transformation, histogram equalization, normalization, geometric correction and filtering of the face image; features are extracted from the preprocessed face image, the extracted features comprising eye, nose, ear, mouth and hairline features; the user's height and the eye, nose, ear, mouth and hairline features extracted from the face image are compared with the image models pre-stored in the storage unit, and the user age group with the highest matching probability is selected as the final matching result.

6. The guide robot with voice and image recognition functions according to claim 1, characterized in that the text processing unit is used to receive the input from the touch input unit; the input information includes the language and/or the age group, and may also include questions about the scenic area; the text processing unit performs text processing on the questions entered by the user and sends the result to the decision unit.

7. The guide robot with voice and image recognition functions according to claim 1, characterized in that the decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning using a preset path planning algorithm, and sends the planned path to the motion control unit.

8. The guide robot with voice and image recognition functions according to claim 1, characterized in that the decision unit receives the final user age group matching result M1 sent by the voice processing unit and the final user age group matching result M2 sent by the image processing unit, and determines the confidences r1 and r2 of the matching results M1 and M2 from the matching probability k1 of the voice processing unit and the matching probability k2 of the image processing unit [formula for r1 and r2 missing from this text]; the user's final age group is then determined from the matching results and the confidences using the formula: Age = r1*M1 + r2*M2.

9. The guide robot with voice and image recognition functions according to claim 8, characterized in that, based on the language sent by the voice processing unit and the finally determined user age group, the decision unit selects, from the voice data pre-stored in the storage unit, the voice information suited to that user type, and outputs it through the output unit.

10. The guide robot with voice and image recognition functions according to claim 1, characterized in that the voice input unit is a microphone, the image acquisition unit is a camera, and the touch input unit is a touch display screen.