CN106710587A - Speech recognition data pre-processing method - Google Patents

Speech recognition data pre-processing method
Info

Publication number
CN106710587A
Authority
CN
China
Prior art keywords
pronunciation
model
standard
dictionary
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611184565.5A
Other languages
Chinese (zh)
Inventor
朱崇俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Dongtian Digital Technology Co Ltd
Original Assignee
Guangdong Dongtian Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-20
Filing date
2016-12-20
Publication date
2017-05-24
Application filed by Guangdong Dongtian Digital Technology Co Ltd
Priority to CN201611184565.5A
Publication of CN106710587A
Legal status: Pending

Abstract

The invention provides a speech recognition data pre-processing method. The system comprises a standard audio file organization module, a standard text editing module, a pronunciation dictionary configuration module, a speech model generation module, and a standard pronunciation feature data recognition processing module. The method saves the final standard pronunciation data model to the file system; the application product then loads this pre-generated data model directly to recognize and score user recordings. This solves the low efficiency of existing methods, which must recognize the standard pronunciation and then recognize the user recording.

Description

A speech recognition data pre-processing method
Technical field
The present invention relates to the field of speech recognition, and more particularly to a speech recognition data pre-processing method.
Background
Speech recognition technology is widely used on user terminals such as mobile phones and PCs, for example in input methods, language learning systems, and search systems. The great majority of speech recognition application products collect the user's recording data on the user terminal and send it to a background server for recognition. For example, publication CN103137129A, a speech recognition method and electronic device, collects user-specific information from the user's usage of the electronic device, records the user's speech, and has a remote server produce a remote speech recognition result for the recorded speech. The drawback of this mode is that the background recognition system needs powerful hardware and is expensive to build: meeting the access requirements of a large user base can easily cost upwards of a hundred million, and the user can use the speech recognition application product only while connected to the network. Using the computing power of the user terminal would significantly reduce hardware cost, but user terminals are generally less powerful than server systems. A method is therefore needed that optimizes speech recognition computation on the user terminal and improves recognition efficiency.
Summary of the invention
The object of the invention is to provide a speech recognition data processing method that makes the best use of the user terminal's computing power to improve the efficiency of speech recognition scoring.
The technical solution comprises the following steps:
Step 1) Organize the standard audio files: collect the audio files for which a data model needs to be generated;
Step 2) Edit the standard text: reproduce the text that needs recognition scoring, such as passages, sentences, and words;
Step 3) Configure the pronunciation dictionary: configure the general pronunciation dictionary or special pronunciation dictionary that the passage needs;
Step 4) Generate the corresponding speech model: generate it from the output files of the preceding steps and save the speech model file;
Step 5) Using the generated speech model, call the speech recognition engine to recognize and process the standard pronunciation feature data, then generate and save the standard pronunciation data model;
Step 6) Load the pre-generated data model directly in the application product and use it to recognize and score user recordings.
Further, step 1) specifically comprises the following:
11) Because the CPU computing power of the user terminal is limited, the recognition target for speech recognition scoring must be confined to a fixed scope, for example the content of one text as the unit;
Further, the text in step 2) is reproduced as follows: create an XML configuration file and, for each sentence or word, create a node in the configuration file; each node contains the audio file reference path and the corresponding text;
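The configuration file described in step 2) can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the patent does not give a schema, so the element name `item`, the attribute name `audio`, the root name `passage`, and the file paths are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_passage_config(items):
    """Build an XML tree with one node per sentence or word.

    `items` is a list of (audio_path, text) pairs. The tag and attribute
    names used here are hypothetical; the patent only says each node holds
    an audio file reference path and the corresponding text.
    """
    root = ET.Element("passage")
    for audio_path, text in items:
        node = ET.SubElement(root, "item", audio=audio_path)
        node.text = text
    return root


# Hypothetical content for one lesson: one sentence and one word.
config = build_passage_config([
    ("audio/lesson1/s001.wav", "Good morning."),
    ("audio/lesson1/w001.wav", "morning"),
])
xml_text = ET.tostring(config, encoding="unicode")
```

Keeping the audio path and the text in the same node means later steps can walk the file once and always know which recording belongs to which text.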
Further, the pronunciation dictionary in step 3) is configured as follows: 31) after the words or sentences have been configured, configure the pronunciation from the pronunciation dictionary for the node of each word and associate them; 32) further, pronunciation dictionaries are divided into a general pronunciation dictionary and a special pronunciation dictionary. If every word in a passage is in the general pronunciation dictionary, no special pronunciation dictionary needs to be configured; otherwise, a special pronunciation dictionary must be created, and a pronunciation annotation added to it for each word missing from the general dictionary;
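The dictionary lookup in step 3) amounts to checking each word of the passage against the general dictionary first and falling back to the special one. The sketch below assumes dictionaries are plain Python mappings from a lowercase word to a phoneme string; the phoneme notation, the sample entries, and the function names are illustrative assumptions, not the patent's format.

```python
import re


def words_in(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def resolve_pronunciations(passage_texts, general_dict, special_dict):
    """Return (pronunciations, missing) for every word in the passage.

    Words found in the general dictionary use it; others fall back to the
    special dictionary; words found in neither are reported as missing so
    that an annotation can be added to the special dictionary.
    """
    pronunciations = {}
    missing = []
    for text in passage_texts:
        for word in words_in(text):
            if word in pronunciations or word in missing:
                continue
            if word in general_dict:
                pronunciations[word] = general_dict[word]
            elif word in special_dict:
                pronunciations[word] = special_dict[word]
            else:
                missing.append(word)
    return pronunciations, missing


# Hypothetical dictionary entries for illustration only.
general = {"good": "G UH D", "morning": "M AO R N IH NG"}
special = {"dongtian": "D OW NG T IY AE N"}
pron, missing = resolve_pronunciations(
    ["Good morning, Dongtian!", "Good zyx"], general, special
)
```

If `missing` comes back empty and nothing was taken from `special`, the passage needs only the general dictionary, which matches the shortcut the text describes.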
Further, the standard pronunciation feature data generation in step 4) proceeds as follows: using the standard audio and standard text configuration file edited in step 2) and the pronunciation dictionary edited in step 3), generate the passage speech model for the passage with the speech recognition engine tool. The passage speech model describes the user's pronunciation space: when recognizing user pronunciation, it lets the speech recognition engine prune quickly and efficiently under vocabulary constraints and so recognize the content of the user's pronunciation quickly;
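The patent does not say how the engine tool builds the passage speech model, so the following is a toy stand-in rather than the actual method. It shows the general idea of vocabulary-constrained pruning: record which words may follow which in the passage, then discard candidate words the passage does not allow. The function names and the `<s>`/`</s>` boundary markers are assumptions made for the example.

```python
from collections import defaultdict


def build_passage_model(sentences):
    """Map each word to the set of words that may follow it in the passage."""
    successors = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            successors[prev].add(nxt)
    return dict(successors)


def prune(candidates, prev_word, model):
    """Keep only the candidate words the passage allows after prev_word."""
    allowed = model.get(prev_word, set())
    return [w for w in candidates if w in allowed]


model = build_passage_model(["good morning", "good night"])
# Acoustically similar candidates that are not in the passage are dropped.
kept = prune(["morning", "mourning", "night", "knight"], "good", model)
```

Because the passage is small, the allowed set after each word is tiny, which is why restricting the search this way can cut the work a limited terminal CPU has to do.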
Further, the speech model generation module in step 5) proceeds as follows: call the speech recognition engine, passing in the acoustic model and the passage speech model generated in step 4); run speech recognition in turn on the word or sentence configured in each node of the configuration file generated in step 2); and save the recognition data that the engine returns for each node's audio file to a local text file. At this point, a feature data model of the text, pronunciation, rhythm, stress, and intonation of the standard pronunciation of each word or sentence has been obtained. When recognizing and scoring user pronunciation, only this data model needs to be passed to the recognition engine: after recognizing the user's pronunciation, the engine compares it directly with the standard pronunciation data model to produce a score, without having to recognize the standard pronunciation again to obtain the standard pronunciation data model.
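The pre-generation step can be sketched as "run recognition once per node, then persist the results". In the sketch below, `recognize_reference` is a hypothetical placeholder for the real engine call, which the patent does not name; the JSON layout, the field names, and the fixed duration values are likewise assumptions chosen only to make the example runnable.

```python
import json
import os
import tempfile


def recognize_reference(audio_path, text):
    """Placeholder for the speech recognition engine call (hypothetical).

    A real engine would return pronunciation, rhythm, stress, and intonation
    features for the reference audio; this stub returns fixed dummy values.
    """
    words = text.lower().split()
    return {"text": text, "words": words, "durations": [0.3] * len(words)}


def build_standard_model(nodes, out_path):
    """Recognize every configured node once and save the results to disk."""
    model = {audio: recognize_reference(audio, text) for audio, text in nodes}
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(model, f, ensure_ascii=False, indent=2)
    return model


# Hypothetical nodes, as they might be read from the step 2) configuration.
nodes = [("s001.wav", "good morning"), ("w001.wav", "morning")]
out_path = os.path.join(tempfile.mkdtemp(), "standard_model.json")
saved = build_standard_model(nodes, out_path)
```

The point of writing the file ahead of time is that the reference audio never has to be recognized on the user's device.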
The beneficial effects of the present invention are as follows: by implementing the steps of the invention, the recognition time of speech recognition applications such as spoken-language practice on the user terminal is halved. The improved recognition efficiency allows recognition to run on the user terminal's own computing power, with no server system to build and no network access required, so users get a better experience with a standalone speech recognition application.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawing:
Fig. 1 is a flow block diagram of the present application.
Detailed description of the embodiments
The present invention is explained in detail below through the preferred embodiment shown in the drawing, but the invention is not limited to this embodiment.
The steps shown in Fig. 1 are as follows; the first three steps are the resource preparation process:
1. Organize the standard audio files: collect the audio files for which a data model needs to be generated. Because the CPU computing power of the user terminal is limited, the recognition target for speech recognition scoring must be confined to a fixed scope, for example the text content of one lesson as the unit;
2. Edit the standard text: reproduce the text that needs recognition scoring, such as passages, sentences, and words. Create an XML configuration file and, for each sentence or word, create a node in the configuration file; each node contains the audio file reference path and the corresponding text;
3. Configure the pronunciation dictionary: configure the general pronunciation dictionary or special pronunciation dictionary that the passage needs. After the words or sentences have been configured, configure the pronunciation from the pronunciation dictionary for the node of each word and associate them. Further, pronunciation dictionaries are divided into a general pronunciation dictionary and a special pronunciation dictionary. If every word in a passage is in the general pronunciation dictionary, no special pronunciation dictionary needs to be configured; otherwise, a special pronunciation dictionary must be created, and a pronunciation annotation added to it for each word missing from the general dictionary;
Once the resources are ready, the speech model processing is carried out:
4. Generate the corresponding speech model: using the standard audio and standard text configuration file edited in step 2 and the pronunciation dictionary edited in step 3, generate the passage speech model for the passage with the speech recognition engine tool. The passage speech model describes the user's pronunciation space: when recognizing user pronunciation, it lets the speech recognition engine prune quickly and efficiently under vocabulary constraints and so recognize the content of the user's pronunciation quickly;
5. Using the generated speech model, call the speech recognition engine to recognize and process the standard pronunciation feature data, then generate and save the standard pronunciation data model. Call the speech recognition engine, passing in the acoustic model and the passage speech model generated in step 4; run speech recognition in turn on the word or sentence configured in each node of the configuration file generated in step 2; and save the recognition data that the engine returns for each node's audio file to a local text file. At this point, a feature data model of the text, pronunciation, rhythm, stress, and intonation of the standard pronunciation of each word or sentence has been obtained. When recognizing and scoring user pronunciation, only this data model needs to be passed to the recognition engine: after recognizing the user's pronunciation, the engine compares it directly with the standard pronunciation data model to produce a score, without having to recognize the standard pronunciation again to obtain the standard pronunciation data model;
6. Load the pre-generated data model directly in the application product and use it to recognize and score user recordings.
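Step 6 can be illustrated as "load the saved reference data, then compare the user's recognized result against it". The sketch below is a simplification under stated assumptions: it scores only the recognized word sequence with `difflib.SequenceMatcher` from the Python standard library, whereas the patent's data model also covers pronunciation, rhythm, stress, and intonation. The JSON layout and the 0 to 100 scale are hypothetical.

```python
import json
from difflib import SequenceMatcher


def load_standard_model(json_text):
    """Load a pre-generated standard pronunciation data model."""
    return json.loads(json_text)


def score(user_words, reference_words):
    """Return a 0-100 similarity between user and reference word sequences."""
    ratio = SequenceMatcher(None, user_words, reference_words).ratio()
    return round(100 * ratio)


# Hypothetical pre-generated model, as it might be read from the file system.
PREGENERATED = '{"s001.wav": {"words": ["good", "morning", "everyone"]}}'
model = load_standard_model(PREGENERATED)
reference = model["s001.wav"]["words"]

perfect = score(["good", "morning", "everyone"], reference)
partial = score(["good", "morning"], reference)
```

Only the user's recording goes through recognition at run time; the reference side is a dictionary lookup, which is where the claimed saving in recognition time would come from.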
The speech recognition data method of the invention comprises a standard audio file organization module, a standard text editing module, a pronunciation dictionary configuration module, a speech model generation module, and a standard pronunciation feature data recognition processing module. The final standard pronunciation data model is saved to the file system, and the application product loads the pre-generated data model directly to recognize and score user recordings. This solves the low efficiency that arises in practice when the standard pronunciation must be recognized first and only then the user recording.
The above specific embodiment only illustrates the technical solution of the invention and does not limit it. Although the invention has been described in detail with reference to examples, those of ordinary skill in the art will understand that the technical solution may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications fall within the scope of the claims of the invention.

Claims (6)

6. The speech recognition data pre-processing method according to claim 1, characterized in that the speech model generation module in step 5) proceeds as follows: call the speech recognition engine, passing in the acoustic model and the passage speech model generated in step 4); run speech recognition in turn on the word or sentence configured in each node of the configuration file generated in step 2); and save the recognition data that the engine returns for each node's audio file to a local text file. At this point, a feature data model of the text, pronunciation, rhythm, stress, and intonation of the standard pronunciation of each word or sentence has been obtained. When recognizing and scoring user pronunciation, only this data model needs to be passed to the recognition engine: after recognizing the user's pronunciation, the engine compares it directly with the standard pronunciation data model to produce a score, without having to recognize the standard pronunciation again to obtain the standard pronunciation data model.
CN201611184565.5A | 2016-12-20 | 2016-12-20 | Speech recognition data pre-processing method | Pending | CN106710587A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201611184565.5A (CN106710587A (en)) | 2016-12-20 | 2016-12-20 | Speech recognition data pre-processing method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201611184565.5A (CN106710587A (en)) | 2016-12-20 | 2016-12-20 | Speech recognition data pre-processing method

Publications (1)

Publication Number | Publication Date
CN106710587A (en) | 2017-05-24

Family

ID=58939302

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201611184565.5A (Pending, CN106710587A (en)) | Speech recognition data pre-processing method | 2016-12-20 | 2016-12-20

Country Status (1)

Country | Link
CN (1) | CN106710587A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101432801A (en) * | 2006-02-23 | 2009-05-13 | 日本电气株式会社 | Speech recognition dictionary making supporting system, speech recognition dictionary making supporting method, and speech recognition dictionary making supporting program
US20160133251A1 (en) * | 2013-05-31 | 2016-05-12 | Longsand Limited | Processing of audio data
CN103985392A (en) * | 2014-04-16 | 2014-08-13 | 柳超 | Phoneme-level low-power consumption spoken language assessment and defect diagnosis method
WO2016053531A1 (en) * | 2014-09-30 | 2016-04-07 | Apple Inc. | A caching apparatus for serving phonetic pronunciations

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107578778A (en) * | 2017-08-16 | 2018-01-12 | 南京高讯信息科技有限公司 | A kind of method of spoken scoring
CN109246214A (en) * | 2018-09-10 | 2019-01-18 | 北京奇艺世纪科技有限公司 | A kind of prompt tone acquisition methods, device, terminal and server
CN109246214B (en) * | 2018-09-10 | 2022-03-04 | 北京奇艺世纪科技有限公司 | Prompt tone obtaining method and device, terminal and server
US20220392447A1 (en) * | 2019-10-23 | 2022-12-08 | Carrier Corporation | A method and an apparatus for executing operation/s on device/s
US12300229B2 (en) * | 2019-10-23 | 2025-05-13 | Kidde Fire Protection, Llc | Method and an apparatus for executing operation/s on device/s
US20220301561A1 (en) * | 2019-12-10 | 2022-09-22 | Rovi Guides, Inc. | Systems and methods for local automated speech-to-text processing
US12205585B2 (en) * | 2019-12-10 | 2025-01-21 | Adeia Guides Inc. | Systems and methods for local automated speech-to-text processing
CN112837679A (en) * | 2020-12-31 | 2021-05-25 | 北京策腾教育科技集团有限公司 | Language learning method and system

Similar Documents

Publication | Title
CN108364632B (en) | Emotional Chinese text voice synthesis method
CN104239459B (en) | voice search method, device and system
CN109686361B (en) | Speech synthesis method, device, computing equipment and computer storage medium
CN106710587A (en) | Speech recognition data pre-processing method
CN105609107A (en) | Text processing method and device based on voice identification
CN101281518A (en) | Speech translation device and method
JP2018146715A (en) | Voice interactive device, processing method of the same and program
CN110852075B (en) | Voice transcription method and device capable of automatically adding punctuation marks and readable storage medium
CN111508466A (en) | Text processing method, device and equipment and computer readable storage medium
CN111489743B (en) | An operation management analysis system based on intelligent voice technology
JP2015049254A (en) | Speech data recognition system and speech data recognition method
CN112185341A (en) | Dubbing method, apparatus, device and storage medium based on speech synthesis
CN118571229B (en) | Voice labeling method and device for voice feature description
CN114333903A (en) | Voice conversion method and device, electronic equipment and storage medium
CN107221344A (en) | A kind of speech emotional moving method
EP3489951B1 (en) | Voice dialogue apparatus, voice dialogue method, and program
CN106710591A (en) | Voice customer service system for power terminal
US20160005421A1 (en) | Language analysis based on word-selection, and language analysis apparatus
US9218807B2 (en) | Calibration of a speech recognition engine using validated text
CN109104258A (en) | A kind of radio identification method based on keyword identification
CN115019787B (en) | Interactive homonym disambiguation method, system, electronic equipment and storage medium
CN115881119A (en) | Disambiguation method, system, refrigeration equipment and storage medium for fusion of prosodic features
Tsiakoulis et al. | Dialogue context sensitive HMM-based speech synthesis
CN118918878A (en) | Speech synthesis method, device, computer equipment and storage medium
CN118447841A (en) | Dialogue method and device based on voice recognition, terminal equipment and storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication

Application publication date: 2017-05-24

