CN110070865A - Guide robot with voice and image recognition function - Google Patents

Guide robot with voice and image recognition function
Download PDF

Info

Publication number
CN110070865A
CN110070865A · CN201910264736.2A · CN201910264736A · CN 110070865 A
Authority
CN
China
Prior art keywords
image
voice
processing unit
information
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910264736.2A
Other languages
Chinese (zh)
Other versions
CN110070865B (en)
Inventor
孙昌勋
许志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ronglian Ets Information Technology Co Ltd
Original Assignee
Beijing Ronglian Ets Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ronglian Ets Information Technology Co Ltd
Priority to CN201910264736.2A
Publication of CN110070865A
Application granted
Publication of CN110070865B
Legal status: Active
Anticipated expiration

Links

Classifications

Landscapes

Abstract

The present invention relates to a guide robot with voice and image recognition functions, comprising: a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit, and an output unit. The decision unit receives the information sent by the voice processing unit, the image processing unit, and the text processing unit, determines from this combined information the robot's motion trajectory and the information to be output, and sends the results to the motion control unit and the output unit respectively. By acquiring the user's voice information, image information, and auxiliary touch input, the guide robot autonomously determines the user type and selects a suitable information output form based on that type, so that targeted guide services can be provided for different user groups. The guide robot can also hold simple exchanges with the user in response to the user's questions.

Description

Guide robot with voice and image recognition function
Technical field
The present invention relates to the field of robotics, and in particular to a guide robot with voice and image recognition functions.
Background art
The tour-guide profession has played a very important role during the rapid development of China's tourism industry. However, guide work at most tourist attractions is highly repetitive and offers little room for creativity. On the one hand, such highly repetitive labor wastes a large amount of human resources; on the other hand, performing the same guide work for a long time inevitably breeds boredom and lowers service quality. In addition, tourists usually wish to obtain a variety of information in a timely manner and, with the rise of various new forms of tourism, also hope that guide services will become more attractive. Existing audio-guide devices can provide explanations in different languages for visitors with different mother tongues, but the content and form of the explanation are single and fixed, and the devices have no human-computer interaction capability. Traditional guide services and audio-guide devices therefore gradually fail to satisfy these demands, which creates a market demand for the birth and development of guide robots for the tourism industry.
In some special settings, such as squares, exhibition centers, museums, science and technology centers, shops, and scenic areas, where crowds are not dense and the working environment is fixed, robots can take over part of the guide work and perform simple, fixed guiding and explanation tasks. This not only reduces the number of service staff required, but also adds technological appeal and interest, attracting the participation of children and teenagers.
However, the operating mode of existing guide robots is still rather limited: they can only transmit predetermined information to users in a fixed manner and cannot autonomously provide targeted information for different user groups to satisfy the needs of different crowds.
Summary of the invention
In view of the above technical problems, the invention discloses a guide robot with voice and image recognition functions, which can automatically identify the user type, provide targeted guide services for different user groups, and hold simple exchanges with the user.
To achieve the above object, the invention provides the following technical scheme:
A guide robot with voice and image recognition functions, comprising: a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit, and an output unit;
The voice input unit is used to acquire voice information;
The image acquisition unit is used to acquire image information; the image information acquired by the image acquisition unit includes environment images and person images;
The touch input unit is used to assist the user in entering input;
The voice processing unit is used to receive the voice information acquired by the voice input unit, process the received voice information, and send the processing result to the decision unit;
The image processing unit is used to receive the image information acquired by the image acquisition unit, process the received image information, and send the processing result to the decision unit;
The text processing unit is used to receive the input of the touch input unit, process the received information, and send the processing result to the decision unit;
The decision unit is used to receive the information sent by the voice processing unit, the image processing unit, and the text processing unit, determine from this combined information the robot's motion trajectory and the information to be output, and send the results to the motion control unit and the output unit respectively;
The storage unit is used to store attraction-related information in various languages, together with sound and image templates for the various languages and for four different user groups: children, primary and middle school students, adults, and the elderly. The attraction-related information further comprises a map of the attraction, simple questions about the attraction with their corresponding answers, and guide information about the attraction for the various languages and the four user groups; the guide information further comprises voice and image information;
The output unit includes a speech player and a display screen, and is used to output attraction information.
The specific steps by which the voice processing unit receives the voice information acquired by the voice input unit and processes it are as follows:
The acquired voice information is pre-processed; the pre-processing includes determining the sound-source subject, filtering out noise, and performing speech enhancement;
Speech analysis and recognition are then performed on the pre-processed voice data to determine the language and the user's age group; the age groups are children, primary and middle school students, adults, and the elderly.
The specific steps of speech analysis and recognition on the pre-processed voice data are as follows: the pre-processed voice information is divided into frames, each frame being 25 ms long, and a Hamming window is applied to the framed data; feature extraction is then performed on the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequencies and MFCC coefficients of all types of sound templates stored in the storage unit, and the language and user age group with the highest matching probability are selected as the final matching result.
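The framing and windowing step described above can be sketched in Python (a minimal illustration, not the patent's implementation; the 16 kHz sample rate and 10 ms hop are assumptions, only the 25 ms frame length comes from the text):

```python
import numpy as np

def frame_and_window(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D speech signal into 25 ms frames and apply a Hamming window."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)       # samples between frame starts
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)

# 1 second of a synthetic 200 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
frames = frame_and_window(np.sin(2 * np.pi * 200 * t))
print(frames.shape)  # each row is one windowed 25 ms frame of 400 samples
```

Feature extraction (fundamental frequency, MFCC) would then operate on each row of `frames`.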
The image processing unit generates map information based on the environment images acquired by the image acquisition unit and sends the map information to the decision unit.
The specific method by which the image processing unit determines the user's age group from the person images acquired by the image acquisition unit is as follows: the user's height is determined from the acquired person image, and at the same time the face region is extracted; the extracted face-region image is pre-processed, the pre-processing including light compensation, greyscale transformation, histogram equalization, normalization, geometric correction, and filtering of the face image; feature extraction is then performed on the pre-processed face image, the extracted features including the eyes, nose, ears, mouth, and hairline; the user's height together with the eye, nose, ear, mouth, and hairline features extracted from the face image are compared with the image templates pre-stored in the storage unit, and the user age group with the highest matching probability is selected as the final matching result.
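Of the face pre-processing steps listed above, histogram equalization is easy to illustrate; a minimal numpy-only sketch for an 8-bit greyscale image (illustrative only, not the patent's implementation):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit greyscale image (2-D uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first non-zero CDF value
    # map each grey level so the output CDF is approximately uniform
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# a low-contrast gradient confined to grey levels 100..131
img = np.tile(np.arange(100, 132, dtype=np.uint8), (32, 1))
out = equalize_histogram(img)
print(out.min(), out.max())  # contrast stretched toward the full 0..255 range
```

In practice an image library (e.g. OpenCV) would be used for this and for the other pre-processing steps; the point is only that equalization spreads a narrow grey-level range over the full dynamic range before feature extraction.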
The text processing unit is used to receive the input of the touch input unit; the input information includes the language and/or the age group, and questions related to the attraction may also be entered; the text processing unit performs text processing on the questions entered by the user and sends the result to the decision unit.
The decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning based on a preset path-planning algorithm, and sends the planned path to the motion control unit.
The decision unit receives the final user age-group matching result M1 sent by the voice processing unit and the final user age-group matching result M2 sent by the image processing unit, and determines the confidence levels r1 and r2 of the matching results M1 and M2 from the matching probability k1 of the voice processing unit and the matching probability k2 of the image processing unit, where r1 = k1/(k1 + k2) and r2 = k2/(k1 + k2). The final user age group is then determined from the matching results and confidence levels by the formula: Age = r1*M1 + r2*M2.
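The fusion of the two age-group estimates can be sketched as follows. This is a minimal illustration: the normalization r1 = k1/(k1 + k2), r2 = k2/(k1 + k2) is assumed (the original formula appears only as an image in the patent), the numeric encoding of the age groups is hypothetical, and rounding the weighted code back to the nearest group is an added assumption:

```python
def fuse_age_group(m1, k1, m2, k2):
    """Fuse the voice-based and image-based age-group estimates.

    m1, m2: numeric age-group codes (e.g. 1=child, 2=student, 3=adult, 4=elderly)
    k1, k2: matching probabilities from the voice and image processing units
    """
    r1 = k1 / (k1 + k2)   # confidence of the voice-based result
    r2 = k2 / (k1 + k2)   # confidence of the image-based result
    age = r1 * m1 + r2 * m2
    return round(age)     # snap the weighted code back to the nearest group

# voice says "adult" (3) with probability 0.9; image says "elderly" (4) with 0.3
print(fuse_age_group(3, 0.9, 4, 0.3))
```

The more confident modality dominates: with k1 = 0.9 against k2 = 0.3, the voice-based estimate carries three times the weight of the image-based one.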
Based on the language sent by the voice processing unit and the finally determined user age group, the decision unit selects voice information suited to that user type from the voice data pre-stored in the storage unit and outputs it through the output unit.
The voice input unit is a microphone, the image acquisition unit is a camera, and the touch input unit is a touch-sensitive display screen.
Compared with the prior art, the beneficial effects of the present invention are:
The guide robot with voice and image recognition functions can acquire the user's voice information and image information together with auxiliary information entered by the user, autonomously determine the user type, and select a suitable information output form based on that type, so that targeted guide services can be provided for different user groups. At the same time, the guide robot can hold simple exchanges with the user in response to the user's questions.
Brief description of the drawings
Fig. 1 is a structural block diagram of a guide robot with speech and image recognition functions according to an embodiment of the present invention;
Fig. 2 is a flow chart of a method for determining the user's age group by voice according to an embodiment of the present invention;
Fig. 3 is a flow chart of a method for determining the user's age group by image according to an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
A guide robot with speech and image recognition functions comprises a voice input unit, an image acquisition unit, a touch input unit, a voice processing unit, an image processing unit, a text processing unit, a decision unit, a motion control unit, a storage unit, and an output unit;
The voice input unit is used to acquire voice information;
The image acquisition unit is used to acquire image information;
The image information acquired by the image acquisition unit includes environment images and person images;
The touch input unit is used to assist the user in entering input;
The voice processing unit is used to receive the voice information acquired by the voice input unit, process the received voice information, and send the processing result to the decision unit;
Since the scenic-area environment is relatively noisy, the acquired voice information is first pre-processed; the pre-processing includes determining the sound-source subject, filtering out noise, and performing speech enhancement;
Speech analysis and recognition are then performed on the pre-processed voice data to determine the language and the user's age group; the languages include common languages such as Chinese, English, and French, and the age groups are children, primary and middle school students, adults, and the elderly;
The specific speech analysis and recognition process comprises dividing the pre-processed voice information into frames, each frame being 25 ms long, and applying a Hamming window to the framed data. Feature extraction is then performed on the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequencies and MFCC coefficients of all types of sound templates stored in the storage unit, and the language and user age group with the highest matching probability are selected as the final matching result.
The fundamental frequency is the frequency at which the vocal cords vibrate periodically with the airflow; it is also the lowest-frequency component of natural speech, and it differs considerably between genders and between ages; for example, the fundamental frequency of a child is higher than that of an adult. The Mel-frequency cepstrum coefficients (MFCC) reflect how the energy of the speech signal is distributed over different frequency bands; they are parameters extracted on the basis of the auditory characteristics of the human ear, from which the approximate age of a speaker can usually be judged by subjective experience. Moreover, the MFCC contain information beyond the purely auditory, so the speaker's age can be estimated from them.
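The patent does not specify how the fundamental frequency is extracted; autocorrelation over one voiced frame is one common possibility, sketched here as an illustration (the sample rate and search band are assumptions):

```python
import numpy as np

def estimate_f0(frame, sample_rate=16000, f_min=60, f_max=500):
    """Estimate the fundamental frequency of one voiced frame by autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)   # shortest plausible pitch period
    lag_max = int(sample_rate / f_min)   # longest plausible pitch period
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sample_rate / lag             # period in samples -> frequency in Hz

# a 100 ms, 200 Hz tone should yield an estimate close to 200 Hz
t = np.arange(1600) / 16000.0
f0 = estimate_f0(np.sin(2 * np.pi * 200 * t))
print(round(f0))
```

Restricting the lag search to the 60–500 Hz band keeps the estimate within the range of human pitch, which is what makes F0 usable as a child-versus-adult cue.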
The specific acquisition process of the MFCC parameters is as follows:
The acquired voice information is normalized to obtain data in matrix form;
An FFT is applied to the matrix-form data to obtain the short-time energy spectrum Xn(k);
A filter bank is constructed and applied to the short-time energy spectrum to obtain the coefficients m(i) = Σk Hi(k)·Xn(k), i = 1, …, p, where p is the number of filters and Hi(k) is the i-th filter;
f[i] is the center frequency of the i-th filter, and the filters satisfy the condition that the starting frequency of each filter coincides with the center frequency of the neighboring filter;
The logarithm of each filter output is taken, and finally a DCT is applied to obtain the MFCC parameters: c(n) = Σ(i=1..p) log m(i)·cos(πn(i − 0.5)/p).
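The FFT → Mel filterbank → log → DCT pipeline above can be sketched in numpy. This is a simplified illustration with assumed parameters (16 kHz audio, 26 filters, 13 coefficients, 512-point FFT), not the patent's exact implementation:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_coeffs=13, n_fft=512):
    """Compute MFCC for one windowed frame: FFT -> Mel filterbank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2   # short-time energy spectrum Xn(k)
    # triangular filters whose start coincides with the previous filter's center
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    m = fbank @ power                                 # filterbank coefficients m(i)
    log_m = np.log(m + 1e-10)                         # logarithm of each filter output
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (n + 0.5)) / n_filters)
    return dct @ log_m

frame = np.hamming(400) * np.sin(2 * np.pi * 200 * np.arange(400) / 16000.0)
coeffs = mfcc(frame)
print(coeffs.shape)  # one 13-dimensional MFCC vector per frame
```

The resulting per-frame vectors are what would be compared against the stored sound templates.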
The image processing unit is used to receive the image information acquired by the image acquisition unit, process the received image information, and send the processing result to the decision unit;
The image processing unit generates map information based on the environment images acquired by the image acquisition unit and sends the map information to the decision unit;
The image processing unit determines the user's age group based on the person images acquired by the image acquisition unit;
The specific method by which the image processing unit determines the user's age group from the acquired person images is as follows:
The user's height is determined from the acquired person image, and at the same time the face region is extracted; the extracted face-region image is pre-processed, the pre-processing including light compensation, greyscale transformation, histogram equalization, normalization, geometric correction, and filtering of the face image.
Feature extraction is then performed on the pre-processed face image, the extracted features including the eyes, nose, ears, mouth, and hairline; the user's height together with the eye, nose, ear, mouth, and hairline features extracted from the face image are compared with the image templates pre-stored in the storage unit, and the user age group with the highest matching probability is selected as the final matching result.
The text processing unit is used to receive the input of the touch input unit, process the received information, and send the processing result to the decision unit;
The user can enter the language and age group through the touch input unit, and may also enter questions related to the attraction; the text processing unit performs text processing on the questions entered by the user and sends them to the decision unit;
The decision unit is used to receive the information sent by the voice processing unit, the image processing unit, and the text processing unit, determine from this combined information the robot's motion trajectory and the information to be output, and send the results to the motion control unit and the output unit respectively;
The decision unit receives the map information sent by the image processing unit, matches it against the map information pre-stored in the storage unit, performs path planning based on a preset path-planning algorithm, and sends the planned path to the motion control unit;
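The patent does not name its path-planning algorithm; breadth-first search on an occupancy-grid map is one minimal possibility, sketched here (the grid encoding and all names are illustrative assumptions):

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest path on a 0/1 occupancy grid (0 = free, 1 = obstacle) via BFS."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:        # reconstruct the path by walking back to start
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None  # goal unreachable

grid = [
    [0, 0, 0],
    [1, 1, 0],   # wall forcing a detour through the right column
    [0, 0, 0],
]
print(plan_path(grid, (0, 0), (2, 0)))
```

In a real system the grid would come from the map generated out of the environment images, and a cost-aware planner (A*, Dijkstra) would typically replace plain BFS.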
The decision unit receives the final user age-group matching result M1 sent by the voice processing unit and the final user age-group matching result M2 sent by the image processing unit, and determines the confidence levels r1 and r2 of the matching results M1 and M2 from the matching probability k1 of the voice processing unit and the matching probability k2 of the image processing unit, where r1 = k1/(k1 + k2) and r2 = k2/(k1 + k2). The final user age group is then determined from the matching results and confidence levels by the formula: Age = r1*M1 + r2*M2.
Based on the language sent by the voice processing unit and the finally determined user age group, the decision unit selects voice information suited to that user type from the voice data pre-stored in the storage unit and outputs it through the output unit.
The motion control unit is used to receive the path information sent by the decision unit and to control the motion trajectory of the guide robot based on this information;
The storage unit is used to store attraction-related information in various languages, together with sound and image templates for the various languages and for four different user groups: children, primary and middle school students, adults, and the elderly. The attraction-related information further comprises a map of the attraction, simple questions about the attraction with their corresponding answers, and guide information about the attraction for the various languages and the four user groups; the guide information further comprises voice and image information.
The decision unit can match a user question sent by the text processing unit or the voice processing unit against the simple questions about the attraction pre-stored in the storage unit, and send the corresponding answer to the output unit for output.
The output unit includes a speech player and a display screen, and is used to output attraction information.
The voice input unit is a microphone.
The image acquisition unit is a camera.
The touch input unit is a touch-sensitive display screen.
Those skilled in the art should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or in software depends on the specific application and on the design constraints of the technical solution. A skilled professional may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further explain in detail the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its protection scope; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

3. The guide robot with voice and image recognition functions according to claim 2, characterized in that the specific steps of speech analysis and recognition on the pre-processed voice data are as follows: the pre-processed voice information is divided into frames, each frame being 25 ms long, and a Hamming window is applied to the framed data; feature extraction is performed on the processed voice data to determine the fundamental frequency and the MFCC coefficients; the extracted fundamental frequency and MFCC coefficients are compared with the fundamental frequencies and MFCC coefficients of all types of sound templates stored in the storage unit, and the language and user age group with the highest matching probability are selected as the final matching result.
5. The guide robot with voice and image recognition functions according to claim 1, characterized in that the specific method by which the image processing unit determines the user's age group from the person images acquired by the image acquisition unit is as follows: the user's height is determined from the acquired person image, and at the same time the face region is extracted; the extracted face-region image is pre-processed, the pre-processing including light compensation, greyscale transformation, histogram equalization, normalization, geometric correction, and filtering of the face image; feature extraction is performed on the pre-processed face image, the extracted features including the eyes, nose, ears, mouth, and hairline; the user's height together with the eye, nose, ear, mouth, and hairline features extracted from the face image are compared with the image templates pre-stored in the storage unit, and the user age group with the highest matching probability is selected as the final matching result.
CN201910264736.2A | 2019-04-03 | 2019-04-03 | Guide robot with voice and image recognition function | Active | CN110070865B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910264736.2A | 2019-04-03 | 2019-04-03 | Guide robot with voice and image recognition function

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910264736.2A | 2019-04-03 | 2019-04-03 | Guide robot with voice and image recognition function

Publications (2)

Publication Number | Publication Date
CN110070865A | 2019-07-30
CN110070865B (en) | 2021-07-13

Family

ID=67367006

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910264736.2A (Active, CN110070865B) | 2019-04-03 | 2019-04-03 | Guide robot with voice and image recognition function

Country Status (1)

Country | Link
CN | CN110070865B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110569726A (en)* | 2019-08-05 | 2019-12-13 | 北京云迹科技有限公司 | Interaction method and system for service robot
CN110569806A (en)* | 2019-09-11 | 2019-12-13 | 上海软中信息系统咨询有限公司 | Man-machine interaction system
CN110797034A (en)* | 2019-09-23 | 2020-02-14 | 重庆特斯联智慧科技股份有限公司 | Automatic voice and video recognition intercom system for caring for the elderly and patients
CN112287925A (en)* | 2020-10-19 | 2021-01-29 | 南京数件技术研究院有限公司 | Mathematics question-judging system based on real-time trajectory acquisition
CN112873201A (en)* | 2021-01-13 | 2021-06-01 | 北京方正数码有限公司 | Automatic flow robot
CN113075956A (en)* | 2021-02-23 | 2021-07-06 | 广州城市职业学院 | Map robot for rural tourism

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104915000A (en)* | 2015-05-27 | 2015-09-16 | 天津科技大学 | Multisensory biological recognition interaction method for naked eye 3D advertisement
CN108818569A (en)* | 2018-07-30 | 2018-11-16 | 浙江工业大学 | Intelligent robot system for public service scenes


Also Published As

Publication number | Publication date
CN110070865B (en) | 2021-07-13

Similar Documents

Publication | Title
CN110070865A (en) | Guide robot with voice and image recognition function
CN107993665B (en) | Method for determining the role of a speaker in a multi-person conversation scene, and intelligent conference method and system
US20190385480A1 | System to evaluate dimensions of pronunciation quality
CN105792752B (en) | Computational techniques for the diagnosis and treatment of language-related disorders
Babel | Evidence for phonetic and social selectivity in spontaneous phonetic imitation
Sroka et al. | Human and machine consonant recognition
CN103810994B (en) | Speech emotion inference method and system based on emotion context
WO2015171646A1 | Method and system for speech input
WO2022121155A1 | Meta-learning-based adaptive speech recognition method and apparatus, device and medium
Tran et al. | Improvement to a NAM-captured whisper-to-speech system
CN114254096B (en) | Multi-modal emotion prediction method and system based on interactive robot dialogue
WO2023185004A1 | Tone switching method and apparatus
CN116095357B (en) | Live broadcasting method, device and system for a virtual anchor
Přibil et al. | GMM-based speaker gender and age classification after voice conversion
CN112700520B (en) | Formant-based mouth shape expression animation generation method, device and storage medium
Johar | Paralinguistic profiling using speech recognition
CN113778226A | Infrared AI smart glasses controlling a smart home based on speech recognition technology
WO2006034569A1 | A speech training system and method for comparing utterances to baseline speech
CN116366872A | Live broadcast method, device and system based on humans and artificial intelligence
Senior et al. | Liu vs. Liu vs. Luke? Name influence on voice recall
Bernier et al. | Toddlers process common and infrequent childhood mispronunciations differently for child and adult speakers
de Menezes et al. | A method for lexical tone classification in audio-visual speech
Yang et al. | Intelligibility of word-initial obstruent consonants in Mandarin-speaking prelingually deafened children with cochlear implants
Zeng et al. | Research on speech enhancement translation and Mel-spectrogram mapping method for the deaf based on Pix2PixGANs
Zheng et al. | The extraction method of emotional feature based on children's spoken speech

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
