
CN102760431A - Intelligentized voice recognition system - Google Patents

Intelligentized voice recognition system

Info

Publication number
CN102760431A
Authority
CN
China
Prior art keywords
speech recognition
intelligentized
recognition system
user
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102408980A
Other languages
Chinese (zh)
Inventor
余金环
陈洪林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI YULIAN INFORMATION TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI YULIAN INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI YULIAN INFORMATION TECHNOLOGY Co Ltd
Priority to CN2012102408980A
Publication of CN102760431A
Legal status: Pending


Abstract

The invention discloses an intelligent speech recognition system in the field of electronic information technology, drawing on acoustics, linguistics, artificial intelligence, cloud computing, and related background technologies. Speech is the most convenient, fast, and natural means of human communication. Using natural speech as the means of human-computer interaction gives computers human-like abilities to listen, speak, and understand, and this is the basis for applying and developing intelligent speech technology. Building on many years of research and development of speech recognition systems, the invention introduces several innovations, concentrated mainly on the system architecture, specific recognition functions, and intelligent features, so that users can develop and deploy speech recognition services effectively and conveniently.

Description

Intelligentized speech recognition system
Technical field
The present invention is an intelligent speech recognition software system. It belongs to the field of electronic information technology and draws on several background technologies, including acoustics, linguistics, artificial intelligence, computer networking, and cloud computing.
Background technology
Speech is the most convenient, fast, and natural means of human communication. Using natural speech as the means of human-computer interaction gives computers human-like abilities to listen, speak, and understand, and this is the basis for applying and developing intelligent speech technology. Of the technologies this requires, speech recognition is the most challenging, and numerous foreign media outlets and experts have therefore named it one of the ten major scientific and technological advances expected to significantly affect the human way of life in the first decade of the 21st century.
Speech recognition technology is quite complex. It integrates acoustics, linguistics, digital signal processing, statistical modeling, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and other disciplines. Research requires a large investment of manpower and material resources and takes a relatively long time.
Speech recognition belongs to the fields of multidimensional pattern recognition and intelligent computer interfaces. The fundamental goal of speech recognition research is to develop a machine with an auditory function that directly accepts human voice commands, understands the speaker's intent, and responds accordingly. Enabling machines to understand human language has long been a human aspiration and has broad application demand. For example, a computer with a speech interface can change the way people currently operate computers and bring about a revolution in operating systems. Direct communication between two languages can be achieved through "speech recognition, machine translation, text-to-speech synthesis", which converts one language directly into another. Voice applications let users search a database directly by voice, much like voice search on an Internet search engine, to obtain the information they need, or dial a phone call by voice; this is especially important and convenient in specific environments such as driving a car.
These application demands stem from the basic characteristics of the speech signal. On the one hand, speech is the most natural and convenient communication tool for people: it requires no special training, and its response is especially fast, on the order of milliseconds. On the other hand, speech has no strict directional constraint and propagates in the dark, which makes it irreplaceable by visual or tactile information such as pictures, text, or buttons.
However, getting a computer to understand human language faces many difficulties, mainly in the following respects: 1. the acoustic features of a speech sound vary greatly depending on the sounds that precede and follow it, and there are no clear boundaries between speech units in continuous speech; 2. speech features vary greatly from speaker to speaker and with a speaker's psychological or physiological state; 3. differences in audio transmission equipment and interference from ambient noise directly affect the accurate extraction of speech features; 4. the meaning of an utterance depends on context, on the environment in which it is spoken, and on background factors, sentence structures vary widely, and contextual information is almost unusable by automatic speech recognition; 5. in practical applications, speech recognition cannot be a standalone recognition technology but must form a distributed system that serves a large number of concurrent recognition requests.
The present invention is an intelligent continuous speech recognition system. Beyond the speech recognition technology itself, it makes several innovations in the system architecture. The architecture offers high accuracy, ample room for expansion, and stable, reliable quality, and can be used to build high-quality speech recognition applications.
Summary of the invention
The present invention is an intelligent speech recognition system. Its main content is as follows:
Speech recognition system structure
The speech recognition system is based on a distributed architecture, which makes it flexible, reliable, and cost-efficient. Fig. 1 shows the system architecture. Each component of the system is described below.
Recognition client
The recognition client handles the interaction between the application and the speech recognition system. It processes audio input and output and supports limited telephony control. Audio input can optionally pass through echo cancellation and then endpointing (detection of where an utterance starts and ends). Audio output supports playback of prerecorded prompts and provides a framework for third-party text-to-speech (TTS) systems. In certain configurations, call control and prompt playback are handled by components outside the system. Finally, the recognition client passes the audio to the recognition server and returns events and results to the application.
Recognition server
The recognition server performs speech recognition and natural language understanding on the caller audio received from the recognition client. To recognize speech and return a natural-language interpretation of what was said, the recognition server needs a set of acoustic models and grammars. The acoustic models and grammars help the recognition server determine what was spoken, and the grammars are also used to interpret the meaning of the spoken words. The application specifies the acoustic models and grammars, which the recognition server loads from a recognition package.
Resource manager
The resource manager performs real-time load balancing so that recognition tasks are distributed evenly across the available recognition servers, which reduces hardware requirements and improves quality of service.
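The real-time load balancing described above could be sketched as follows. This is a minimal illustration that assumes a least-active-requests policy; the patent does not state which balancing policy is used, and the names `RecognitionServer`, `ResourceManager`, `acquire`, and `release` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecognitionServer:
    """Hypothetical record kept for each recognition server."""
    name: str
    active_requests: int = 0


class ResourceManager:
    """Assigns each request to the server with the fewest active requests."""

    def __init__(self, servers):
        self.servers = list(servers)

    def acquire(self):
        # Least-loaded selection; ties go to the server listed first.
        server = min(self.servers, key=lambda s: s.active_requests)
        server.active_requests += 1
        return server

    def release(self, server):
        # Called when a recognition finishes, freeing capacity on that server.
        server.active_requests -= 1


manager = ResourceManager([RecognitionServer("rec-1"), RecognitionServer("rec-2")])
assigned = [manager.acquire().name for _ in range(4)]
print(assigned)
```

With two idle servers and four requests that are all still in progress, the requests alternate between the servers, which is the even distribution the text describes.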
Database
The speech recognition system uses a database (text files, ODBC, and other relational database interfaces are supported) to store dynamic grammars and user data. Depending on the use case, some speech recognition applications may not need a database.
  
Speech recognition process
To understand the structure of the speech recognition system, it is most important to have a rough understanding of the recognition process, with emphasis on the client, the server, and the application. Fig. 2 and Fig. 3 show a schematic and the steps of the recognition process; each step is explained below.
The recognition process roughly comprises the following steps:
1. A call arrives at the recognition client, the recognition client notifies the application, and the system answers the call;
2. The system asks the recognition client to play the first prompt, and the caller responds. For a text-to-speech prompt, the recognition client sends the text to be synthesized to the TTS server over a socket and receives the audio samples it returns;
3. To recognize the caller's response, the recognition client sends a server request to the resource manager (buffering the audio data in the meantime), and the resource manager directs the recognition client to the most suitable recognition server;
4. The recognition client sends a recognition request to the recognition server. Each request consists of the audio stream and a grammar entry from the application. The grammar entry implies the acoustic model, because both are built into the recognition package that the recognition server loads;
5. On receiving the request, the recognition server performs the recognition task and returns the recognition result to the recognition client;
6. Meanwhile, the resource manager monitors the current load of each recognition server;
7. The recognition client sends the recognition result to the application;
8. The application responds accordingly, for example by running a database query or asking the recognition client to play another prompt as a response to the user;
9. The caller responds, and the recognition client sends the next recognition request (see step 4).
The above is a simple recognition process. For applications with a large recognition volume, multiple recognition servers can be deployed, with resource management distributing the recognition work among them sensibly.
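Steps 3 to 7 above can be traced in a small in-process simulation. This is only a sketch under stated assumptions: the class and method names (`pick_server`, `recognize`, `handle_utterance`) are hypothetical, the recognition server returns canned text instead of decoding audio, and the network hops between components are replaced by direct method calls.

```python
class RecognitionServer:
    """Stand-in for a recognition server; a real one would decode the audio."""

    def __init__(self, name, canned_results):
        self.name = name
        self.canned_results = canned_results  # grammar entry -> transcript

    def recognize(self, audio, grammar_entry):
        # Step 5: perform recognition and return the result to the client.
        return {"server": self.name, "text": self.canned_results[grammar_entry]}


class ResourceManager:
    def __init__(self, servers):
        self.servers = list(servers)
        self.active = {server.name: 0 for server in self.servers}

    def pick_server(self):
        # Step 3: direct the client to the least-loaded server (assumed policy).
        server = min(self.servers, key=lambda s: self.active[s.name])
        self.active[server.name] += 1  # step 6: track the current load
        return server

    def finished(self, server):
        self.active[server.name] -= 1


class RecognitionClient:
    def __init__(self, manager, on_result):
        self.manager = manager
        self.on_result = on_result  # callback into the application

    def handle_utterance(self, audio, grammar_entry):
        buffered = bytes(audio)  # step 3: buffer audio while a server is chosen
        server = self.manager.pick_server()
        try:
            # Step 4: the request carries the audio and a grammar entry.
            result = server.recognize(buffered, grammar_entry)
        finally:
            self.manager.finished(server)
        self.on_result(result)  # step 7: hand the result to the application
        return result


received = []
servers = [RecognitionServer("rec-1", {"employee_name": "Li Xiang"})]
client = RecognitionClient(ResourceManager(servers), received.append)
client.handle_utterance(b"\x00\x01", "employee_name")
print(received)
```

The application here is just a list that collects results; in the flow described above it would go on to query a database or request another prompt (step 8).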
Speech recognition results
After each recognition completes, the system passes the recognition result to the application, and the application responds according to the result. The recognition result contains rich information for the application to use, including:
1. The recognized transcript and its confidence
2. The natural language result: the value of each slot and its confidence score
3. The verification score
Fig. 4 is a schematic of a recognition result, including the recognized text, the confidence level, and the natural language interpretation.
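One way an application might hold such a result is shown below. The field names, the 0 to 100 confidence scale, and the helper `low_confidence_slots` are assumptions made for illustration; the patent lists the kinds of information in a result but does not define a data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RecognitionResult:
    transcript: str                     # recognized text
    confidence: int                     # overall confidence, assumed 0-100
    # Natural language result: slot name -> (value, confidence score).
    slots: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    verification_score: Optional[int] = None


def low_confidence_slots(result, threshold):
    """Return the names of slots whose confidence is below the threshold."""
    return [name for name, (_, score) in result.slots.items() if score < threshold]


result = RecognitionResult(
    transcript="call Li Xiang",
    confidence=91,
    slots={"action": ("call", 95), "name": ("Li Xiang", 62)},
)
print(low_confidence_slots(result, threshold=70))
```

Keeping a confidence score per slot lets the application ask the caller to repeat only the uncertain part, here the name, instead of the whole utterance.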
Similar-sound recognition
Similar-sounding words are encountered often, especially in Chinese. Take an automatic voice switchboard application as an example: a company has several employees whose names sound alike or nearly alike, such as a male employee named "Li Xiang" and a female employee whose name is also pronounced "Li Xiang", plus others such as Li Qiang. If the user asks for Li Xiang and the system finds that the recognition results for the two Li Xiangs are very close and both exceed the empirical threshold (for example, 85), the application flow cannot determine the user's choice from the result alone. It can instead prompt the user further: the male Li Xiang or the female Li Xiang? If the user says "the male Li Xiang", the system can easily settle on the recognition result and complete the user's operation, as shown in Fig. 5.
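The disambiguation logic in this example might look like the sketch below. The threshold of 85 comes from the text; the candidate scores, the `gender` attribute used as the distinguishing question, and the function names are hypothetical.

```python
def strong_candidates(candidates, threshold=85):
    """Keep only the candidates whose score exceeds the empirical threshold."""
    return [c for c in candidates if c["score"] > threshold]


def resolve(candidates, answer=None, threshold=85):
    """Return the chosen candidate, or None if a follow-up prompt is needed."""
    strong = strong_candidates(candidates, threshold)
    if len(strong) == 1:
        return strong[0]          # unambiguous: no follow-up needed
    if answer is None:
        return None               # ambiguous: the application must prompt
    matches = [c for c in strong if c["gender"] == answer]
    return matches[0] if len(matches) == 1 else None


# Hypothetical recognition scores for three similar-sounding names.
candidates = [
    {"name": "Li Xiang", "gender": "male", "score": 92},
    {"name": "Li Xiang", "gender": "female", "score": 90},
    {"name": "Li Qiang", "gender": "male", "score": 71},
]

first_pass = resolve(candidates)                    # two strong matches
second_pass = resolve(candidates, answer="female")  # after the follow-up prompt
print(first_pass, second_pass)
```

The first call returns `None` because two candidates exceed the threshold, which is the signal for the application to ask which Li Xiang the caller means; the second call uses the caller's answer to pick one.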
Fault-tolerant processing
In speech recognition applications, a recognition result is occasionally wrong, for example when the user's speech is slightly unclear or stressed differently, and this inconveniences the user. Take the voice phone book application shown in Fig. 5.
Two contacts with similar-sounding names, both romanized "Li Xiang", are stored in the user's phone book, and for speed and convenience the user has not set up similar-sound handling. If, while the call is being connected, the user hears a name that is not the one spoken, the user does not need to hang up; saying "back" or "wrong" makes the system return to the previous level automatically so the user can choose again. This both avoids a misdirected call and lets the user re-enter the request easily. This is a simple example; in applications such as voice search, this kind of fault tolerance is very valuable.
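Returning to the previous level on "back" or "wrong" can be modeled as a stack of dialog states. The sketch below is an assumption about how an application might implement this; the class `DialogSession` and its prompts are hypothetical and not taken from the patent.

```python
class DialogSession:
    """Keeps a stack of dialog levels so the caller can step back."""

    BACK_WORDS = {"back", "wrong"}

    def __init__(self, root_prompt):
        self.stack = [root_prompt]

    @property
    def current(self):
        return self.stack[-1]

    def handle(self, utterance):
        if utterance.lower() in self.BACK_WORDS:
            # Return to the previous level, but never pop the root prompt.
            if len(self.stack) > 1:
                self.stack.pop()
        else:
            # Treat any other utterance as a contact name to connect to.
            self.stack.append(f"Connecting to {utterance}")
        return self.current


session = DialogSession("Who would you like to call?")
state_after_name = session.handle("Li Xiang")
state_after_back = session.handle("wrong")
print(state_after_name, "|", state_after_back)
```

After the caller says "wrong", the session is back at the root prompt without the call having been dropped, so the name can simply be spoken again.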
Key characteristics of the speech recognition system
1. Cloud computing (distributed) architecture. The resource manager balances load across the recognition servers to ensure efficient use of hardware. CPU-intensive recognition can run on remote machines that do not run the application or the audio interface;
2. High-density interfaces. The light client-side processing is separated from the CPU-intensive server processing, which lets clients support a high density of interfaces while improving CPU utilization on the server side;
3. Fault tolerance and reliability. Even if an individual server fails, the system does not crash and no recognition request is lost. When a recognition server fails, the resource manager automatically stops sending requests to it, and automatically resumes sending requests when the server recovers;
4. Ease of maintenance. A recognition server can be shut down for maintenance with little or no effect on overall system performance. Some kinds of maintenance can even be carried out without shutting down the recognition server;
5. Scalability. As client recognition requests increase, instances of the recognition server, recognition client, and application can be added without stopping any running application or shutting down the recognition system;
6. Multi-channel requests. The system supports recognition service requests from different networks, such as the Internet (TCP/IP and SIP) and telephone networks (fixed-line and mobile);
7. Algorithm optimization. A single recognition server can process more than 300 concurrent recognitions (Intel Xeon E5 CPU, 8 GB RDIMM RAM, RAID 5), and a single recognition takes less than 0.1 second.
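The failover behavior in item 3 could be sketched as a server pool that skips servers marked as failed and picks them up again once they recover. How failures are detected is not described in the patent, so `mark_down` and `mark_up` are hypothetical hooks that a health check would call, and round-robin routing is an assumed policy.

```python
class ServerPool:
    """Round-robin routing that skips recognition servers marked as down."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}
        self._next = 0

    def mark_down(self, name):
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def route(self):
        names = list(self.healthy)
        # Try each server once, starting from the round-robin position.
        for offset in range(len(names)):
            index = (self._next + offset) % len(names)
            if self.healthy[names[index]]:
                self._next = (index + 1) % len(names)
                return names[index]
        raise RuntimeError("no recognition server available")


pool = ServerPool(["rec-1", "rec-2", "rec-3"])
pool.mark_down("rec-2")
routed_while_down = [pool.route() for _ in range(4)]
pool.mark_up("rec-2")
routed_after_recovery = [pool.route() for _ in range(3)]
print(routed_while_down, routed_after_recovery)
```

While `rec-2` is down, requests alternate between the two remaining servers; once it is marked up again it rejoins the rotation without any restart, which also covers the maintenance case in item 4.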
Main functions of the speech recognition system
1. Large-vocabulary, speaker-independent recognition
The speech recognition system reliably performs large-vocabulary recognition in multiple languages and provides a confidence level for each recognition result. The system delivers highly accurate recognition over large vocabularies. Applications developed with the speech recognition system have been tested at over 96% accuracy;
2. Built-in natural language understanding
Natural language understanding systems can be developed with the speech recognition system: given a sentence as input, it returns an interpretation of the sentence's meaning. The application can then take the appropriate action for the user's request. The system also provides class-based confidence scoring, which distinguishes more precisely which parts of a phrase were recognized accurately (or inaccurately). The application can then handle error checking or reprompting more naturally and efficiently;
3. Host-based client/server architecture
The speech recognition system is based on an open client/server architecture designed specifically for the stability and scalability that large applications require. The caller's speech is collected by the client, and the recognition load is distributed evenly across multiple separate servers on the network;
4. Single-word correction
Also called slot-level confidence scoring. If one word in a long sentence is not recognized, the application can prompt the user to repeat just that fragment rather than the whole sentence;
5. Hotword recognition
Hotword recognition lets the system monitor the speaker continuously, wait for a specific word or phrase, and then return control to the application. With this feature, the recognizer listens silently and interacts with the user only when the user makes a request by saying the specific phrase;
6. Intelligent endpointing
Endpointing is the process of determining where an utterance starts and stops in the incoming sample stream. After the start and end points of the utterance are found, the utterance region is extended forward and backward by a predetermined length. As soon as the start of an utterance is detected, samples begin streaming to the recognition server and continue until the end of the utterance is found. In this way the recognition server starts processing the speech while the user is still talking, without processing the unnecessary silence before and after it, which saves CPU time and network bandwidth;
7. Barge-in
Barge-in lets the user interrupt a prompt and respond without waiting for the prompt to finish. It makes the exchange between user and system faster and more natural, particularly for frequent users of the system;
8. N-best processing
Some applications need the recognition engine to produce a set of possible recognition results rather than a single best result. The system's N-best processing provides this: it returns a list of possible recognition results ordered from most to least likely;
9. Grammar probabilities
The system allows probabilities to be assigned in the grammar to particular words or phrases the caller may say. This is useful when the probability of a word or phrase can be estimated from actual usage. Adding probabilities to a grammar can improve recognition accuracy and speed;
10. Noise reduction
When incoming calls contain steady background noise, the system uses a mechanism that lets the recognition server recognize more accurately. The recognition server enhances the incoming speech to filter out noise such as tones, hum, whine, and hiss. The mechanism works especially well when a significant share of calls contain steady background noise, for example hands-free calls made from a car;
11. Prompt playback
The system can play both prerecorded prompts and prompts generated by a text-to-speech system. If the application uses several text-to-speech servers, the resource manager balances the conversion load across them to improve hardware efficiency;
12. SNMP support
The system provides Simple Network Management Protocol (SNMP) support for remote monitoring, and its visualization tools make configuration, management, and operation convenient.
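As an illustration of the endpointing function in item 6, the sketch below finds the start and end of an utterance in a list of per-frame energies and extends the region by a fixed padding. The energy-threshold detector is an assumption, since the patent does not say how the endpoints are found, and the function works on a complete recording rather than a live stream to keep the example short.

```python
def find_endpoints(frame_energies, threshold, padding_frames):
    """Return (start, end) frame indices of the utterance, end exclusive.

    Frames with energy at or above the threshold count as speech. The
    detected region is extended by padding_frames on both sides and
    clipped to the bounds of the recording. Returns None for pure silence.
    """
    voiced = [i for i, energy in enumerate(frame_energies) if energy >= threshold]
    if not voiced:
        return None
    start = max(0, voiced[0] - padding_frames)
    end = min(len(frame_energies), voiced[-1] + 1 + padding_frames)
    return start, end


# Hypothetical frame energies: silence, three speech frames, then silence.
energies = [0.1, 0.1, 0.2, 3.0, 4.5, 3.8, 0.2, 0.1, 0.1, 0.1]
endpoints = find_endpoints(energies, threshold=1.0, padding_frames=1)
print(endpoints)
```

Only frames 2 through 6 would be sent to the recognition server in this example, so the leading and trailing silence costs neither CPU time nor bandwidth.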
Voice application technology platform based on the speech recognition system
In practice, the speech recognition system is usually integrated with related communication, voice, network, and database components to form an intelligent voice application technology platform that provides application services with speech recognition at the core. Fig. 7 is the software architecture diagram of the cloud-computing-based intelligent voice application service platform that our company has built on the speech recognition system as its key foundation. A brief description follows, for reference only in specific applications:
1. Access layer
The access layer comprises a platform access module and an end-user access module. The platform access module supports the H.323 and SIP protocols; the end-user access module supports registration of SIP endpoints on the application platform;
2. Call control layer
The call control layer implements the call-related functions, such as incoming and outgoing calls, call status analysis, call forwarding, recording and playback, DTMF reception, and transfer to an agent, and communicates with the billing server for billing services;
3. Session layer
The session layer mainly implements the dialog between the user and the system, including media handling, voice sampling for speech recognition, and media output of synthesized speech, as well as the interfaces to and interaction with the speech recognition and text-to-speech services;
4. Flow parsing layer
The flow parsing layer mainly implements parsing of VoiceXML flow scripts and controls the user service flow according to service requests from the business flow control layer;
5. Business flow control layer
The business flow control layer receives service requests from the application server and, after analysis, hands each request to the flow parsing layer for processing;
6. External interface module
The external interface module mainly covers the application server (including the database server and Web server), billing server, speech recognition server, text-to-speech server, content server, operator agents, IP terminals, and administration and maintenance terminals.
Description of drawings
Fig. 1 is the structure diagram of the speech recognition system; Fig. 2 is a schematic of the recognition process; Fig. 3 is the recognition step diagram; Fig. 4 is the recognition result diagram; Fig. 5 is the similar-sound diagram; Fig. 6 is the fault-tolerant processing diagram; Fig. 7 is the software structure diagram of the cloud-computing-based intelligent voice application technology platform.

Claims (9)

CN2012102408980A | 2012-07-12 | 2012-07-12 | Intelligentized voice recognition system | Pending | CN102760431A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2012102408980A, CN102760431A (en) | 2012-07-12 | 2012-07-12 | Intelligentized voice recognition system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN2012102408980A, CN102760431A (en) | 2012-07-12 | 2012-07-12 | Intelligentized voice recognition system

Publications (1)

Publication Number | Publication Date
CN102760431A (en) | 2012-10-31

Family

ID=47054873

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN2012102408980A (Pending), CN102760431A (en) | Intelligentized voice recognition system | 2012-07-12 | 2012-07-12

Country Status (1)

Country | Link
CN (1) | CN102760431A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103325371A (en)* | 2013-06-05 | 2013-09-25 | 杭州网豆数字技术有限公司 | Voice recognition system and method based on cloud
CN103839549A (en)* | 2012-11-22 | 2014-06-04 | 腾讯科技(深圳)有限公司 | Voice instruction control method and system
WO2014117584A1 (en)* | 2013-02-01 | 2014-08-07 | Tencent Technology (Shenzhen) Company Limited | System and method for load balancing in a speech recognition system
CN105489219A (en)* | 2016-01-06 | 2016-04-13 | 广州零号软件科技有限公司 | Indoor space service robot distributed speech recognition system and product
CN105912128A (en)* | 2016-04-29 | 2016-08-31 | 北京光年无限科技有限公司 | Smart robot-oriented multimodal interactive data processing method and apparatus
CN105917405A (en)* | 2014-01-17 | 2016-08-31 | 微软技术许可有限责任公司 | Incorporation of exogenous large-vocabulary models into rule-based speech recognition
CN106062868A (en)* | 2014-07-25 | 2016-10-26 | 谷歌公司 | Providing pre-computed hotword models
CN107485843A (en)* | 2017-09-26 | 2017-12-19 | 北京球友圈网络科技有限责任公司 | Count the method and device of ball sports information for the game
CN108604449A (en)* | 2015-09-30 | 2018-09-28 | 苹果公司 | speaker identification
CN109819057A (en)* | 2019-04-08 | 2019-05-28 | 科大讯飞股份有限公司 | A kind of load-balancing method and system
CN111210821A (en)* | 2020-02-07 | 2020-05-29 | 普强时代(珠海横琴)信息技术有限公司 | Intelligent voice recognition system based on internet application
US10749989B2 (en) | 2014-04-01 | 2020-08-18 | Microsoft Technology Licensing Llc | Hybrid client/server architecture for parallel processing
US10885918B2 (en) | 2013-09-19 | 2021-01-05 | Microsoft Technology Licensing, Llc | Speech recognition using phoneme matching

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103839549A (en)* | 2012-11-22 | 2014-06-04 | 腾讯科技(深圳)有限公司 | Voice instruction control method and system
WO2014117584A1 (en)* | 2013-02-01 | 2014-08-07 | Tencent Technology (Shenzhen) Company Limited | System and method for load balancing in a speech recognition system
CN103325371A (en)* | 2013-06-05 | 2013-09-25 | 杭州网豆数字技术有限公司 | Voice recognition system and method based on cloud
US10885918B2 (en) | 2013-09-19 | 2021-01-05 | Microsoft Technology Licensing, Llc | Speech recognition using phoneme matching
US10311878B2 (en) | 2014-01-17 | 2019-06-04 | Microsoft Technology Licensing, Llc | Incorporating an exogenous large-vocabulary model into rule-based speech recognition
CN105917405A (en)* | 2014-01-17 | 2016-08-31 | 微软技术许可有限责任公司 | Incorporation of exogenous large-vocabulary models into rule-based speech recognition
CN105917405B (en)* | 2014-01-17 | 2019-11-05 | 微软技术许可有限责任公司 | Incorporation of exogenous large-vocabulary models into rule-based speech recognition
US10749989B2 (en) | 2014-04-01 | 2020-08-18 | Microsoft Technology Licensing Llc | Hybrid client/server architecture for parallel processing
CN106062868A (en)* | 2014-07-25 | 2016-10-26 | 谷歌公司 | Providing pre-computed hotword models
CN106062868B (en)* | 2014-07-25 | 2019-10-29 | 谷歌有限责任公司 | Provides precomputed hotword models
CN108604449A (en)* | 2015-09-30 | 2018-09-28 | 苹果公司 | speaker identification
CN108604449B (en)* | 2015-09-30 | 2023-11-14 | 苹果公司 | speaker identification
CN105489219A (en)* | 2016-01-06 | 2016-04-13 | 广州零号软件科技有限公司 | Indoor space service robot distributed speech recognition system and product
CN105912128A (en)* | 2016-04-29 | 2016-08-31 | 北京光年无限科技有限公司 | Smart robot-oriented multimodal interactive data processing method and apparatus
CN105912128B (en)* | 2016-04-29 | 2019-05-24 | 北京光年无限科技有限公司 | Multi-modal interaction data processing method and device towards intelligent robot
CN107485843B (en)* | 2017-09-26 | 2018-07-20 | 北京球友圈网络科技有限责任公司 | Count the method and device of ball sports information for the game
CN107485843A (en)* | 2017-09-26 | 2017-12-19 | 北京球友圈网络科技有限责任公司 | Count the method and device of ball sports information for the game
CN109819057A (en)* | 2019-04-08 | 2019-05-28 | 科大讯飞股份有限公司 | A kind of load-balancing method and system
CN111210821A (en)* | 2020-02-07 | 2020-05-29 | 普强时代(珠海横琴)信息技术有限公司 | Intelligent voice recognition system based on internet application

Similar Documents

Publication | Publication Date | Title
CN102760431A (en) | | Intelligentized voice recognition system
CN110769124B (en) | | Power Marketing Customer Communication System
CN111344780B (en) | | Context-based device arbitration
US10810997B2 (en) | | Automated recognition system for natural language understanding
US12354596B1 (en) | | Centralized feedback service for performance of virtual assistant
CN106201424B (en) | | A kind of information interacting method, device and electronic equipment
AU2013252518B2 (en) | | Embedded system for construction of small footprint speech recognition with user-definable constraints
US9558745B2 (en) | | Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same
KR101798828B1 (en) | | System and method for hybrid processing in a natural language voice services environment
JP6025785B2 (en) | | Automatic speech recognition proxy system for natural language understanding
CN109074806A (en) | | Distributed audio output is controlled to realize voice output
US20130279665A1 (en) | | Methods and apparatus for generating, updating and distributing speech recognition models
JP2018017936A (en) | | Voice dialogue device, server device, voice dialogue method, voice processing method and program
KR102594838B1 (en) | | Electronic device for performing task including call in response to user utterance and method for operation thereof
CN101341532A (en) | | Sharing voice application processing via markup
WO2018099000A1 (en) | | Voice input processing method, terminal and network server
CN111210821A (en) | | Intelligent voice recognition system based on internet application
CN110570847A (en) | | Man-machine interaction system and method for multi-person scene
CN111094924A (en) | | Data processing apparatus and method for performing voice-based human-machine interaction
JP6448950B2 (en) | | Spoken dialogue apparatus and electronic device
CN103824560A (en) | | Chinese speech recognition system
US20190066676A1 (en) | | Information processing apparatus
CN118942454A (en) | | Terminal device and voice command response method
CN110809796A (en) | | Speech recognition system and method with decoupled wake phrases
Gorli | | VOICE OVER CONTROL TO INTERNET OF THINGS WITH DEEP LEARNING

Legal Events

Date | Code | Title | Description
 | C06 | Publication |
 | PB01 | Publication |
 | C02 | Deemed withdrawal of patent application after publication (patent law 2001) |
 | WD01 | Invention patent application deemed withdrawn after publication |

Application publication date: 20121031

