CN107316638A - Poem recitation evaluation method and system, terminal and storage medium - Google Patents

Poem recitation evaluation method and system, terminal and storage medium

Info

Publication number
CN107316638A
CN107316638A
Authority
CN
China
Prior art keywords
poem
voice
recited
recites
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710504389.7A
Other languages
Chinese (zh)
Inventor
高强
吴凡
夏龙
阎鹏
邓澍军
郭常圳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ape force Education Technology Co., Ltd
Original Assignee
Beijing Chalk Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chalk Future Technology Co Ltd
Priority to CN201710504389.7A
Publication of CN107316638A
Legal status: Pending

Abstract

The application provides a poem recitation evaluation method and system, a terminal, and a storage medium. The method includes: the client determines the poem to be recited; the voice of the recited poem is acquired; the voice is pre-processed; the pre-processed voice is uploaded to a server; the voice is converted into text by a recognition model pre-established on the server; the text is compared with the poem to be recited, and a score for the text is calculated according to preset scoring rules; and the score is fed back to the client.

Description

Poem recitation evaluation method and system, terminal and storage medium
Technical field
The present invention relates to the field of speech recognition and evaluation technology, and in particular to a poem recitation evaluation method and system, a terminal, and a storage medium.
Background technology
The current poem-recitation user group consists mainly of primary and secondary school students. General-purpose speech recognition, tuned for other users, rarely achieves optimal results on the voice data of these students, so the recognition accuracy of recited poems is relatively low. Reaching higher accuracy would require a large amount of training data, most of which would have to be recorded with professional microphones, followed by a long and complicated training process. In the mobile Internet era, most users record their recitation with ordinary recording devices such as PC microphones or mobile-phone microphones and then run recognition on the result. Existing speech recognition techniques recognize poem recitation recorded by such non-professional microphones inaccurately, and the noise in the recitation audio has to be processed separately, which adds extra de-noising time, increases cost, and lengthens the response time.
The content of the invention
In view of this, the application provides a poem recitation evaluation method and system, a terminal, and a storage medium, so that recited content can be recognized quickly and accurately, and the recitation error rate can be detected, for a wide variety of user groups and in complex recitation environments.
In one aspect, the application provides a poem recitation evaluation method, including:
determining the poem to be recited;
acquiring the voice of the recited poem;
pre-processing the voice;
uploading the pre-processed voice to a server;
converting the voice into text with a recognition model pre-established on the server;
comparing the text with the poem to be recited and calculating a score for the text according to preset scoring rules;
feeding the score back to the client.
Optionally, pre-processing the voice includes:
slicing the voice into segments;
compressing the sliced voice.
Optionally, after the pre-processed voice is uploaded to the server, the method further includes:
decompressing the voice uploaded to the server;
performing silence detection on the decompressed voice.
Optionally, pre-establishing the recognition model includes:
building an initial speech recognition network;
the speech recognition network receiving voice training data;
converting one of the received voice training samples into text;
calculating the error between the converted text and the true text of the voice training sample;
if the error is greater than or equal to a preset threshold, adjusting the parameters of the speech recognition network according to the error, performing the step of converting one of the received voice training samples into text again, and continuing to train the speech recognition network;
if the error is less than the preset threshold, ending the training of the speech recognition network.
Optionally, constructing the voice training data includes:
obtaining original poem voice and using the original poem voice as training data;
mixing noise into the original poem voice to generate new training data.
Optionally, the original poem voice includes poem voice of different sound quality and/or different voice characteristics.
Optionally, constructing the voice training data further includes:
obtaining original non-poem voice and using the original non-poem voice as training data.
Optionally, the recognition model includes a hidden Markov model (Hidden Markov Model, HMM) and/or a deep neural network model (DNN).
Optionally, converting the voice into text with the recognition model pre-established on the server further includes: correcting errors in the converted text with a language model.
Optionally, after correcting errors in the converted text with the language model, the method further includes: aligning the converted text with the true text of the poem to be recited using a probabilistic model.
Optionally, the probabilistic model is used to judge whether the recited poem content was recited in order, recited again, or recited with omissions.
Optionally, aligning the converted text with the true text of the poem to be recited using the probabilistic model further includes:
if a recited poem line is correct, displaying it in a first color;
if a recited poem line is wrong, displaying it in a second color different from the first color;
if a poem line recited again is correct, changing the line already displayed in the second color to the first color;
if part of the poem is omitted and the recited part is correct, displaying the omitted lines in the second color and the correctly recited lines in the first color.
In another aspect, the application provides a poem recitation evaluation system, including a client and a server. The client includes:
a selection module, for determining the poem to be recited;
an acquisition module, for acquiring the voice of the recited poem;
a pre-processing module, for pre-processing the voice;
an upload module, for uploading the pre-processed voice to the server.
The server includes:
a conversion module, for converting the voice into text with the recognition model pre-established on the server;
a scoring module, for comparing the text with the poem to be recited and calculating a score for the text according to preset scoring rules;
a feedback module, for feeding the score back to the client.
Optionally, the pre-processing module includes:
a slicing module, for slicing the voice into segments;
a compression module, for compressing the sliced voice.
Optionally, the upload module further includes:
a decompression module, for decompressing the voice uploaded to the server;
a detection module, for performing silence detection on the decompressed voice.
Optionally, the recognition model includes:
a recognition network construction module, for building the initial speech recognition network;
a receiving module, for the speech recognition network to receive voice training data;
a recognition conversion module, for converting one of the received voice training samples into text;
a calculation module, for calculating the error between the converted text and the true text of the voice training sample;
a judgment module: if the error is greater than or equal to the preset threshold, the parameters of the speech recognition network are adjusted according to the error, the step of converting one of the received voice training samples into text is performed again, and training of the speech recognition network continues; if the error is less than the preset threshold, training of the speech recognition network ends.
Optionally, the server further includes:
a poem voice acquisition module, for obtaining original poem voice and using it as training data;
a noise poem module, for mixing noise into the original poem voice to generate new training data.
Optionally, the original poem voice includes poem voice of different sound quality and/or different voice characteristics.
Optionally, the server further includes:
a non-poem voice acquisition module, for obtaining original non-poem voice and using it as training data.
Optionally, the recognition model includes a hidden Markov model (Hidden Markov Model, HMM) and/or a deep neural network model (DNN).
Optionally, the conversion module includes: a correction module, for correcting errors in the converted text with a language model.
Optionally, the correction module further includes: an alignment module, for aligning the converted text with the true text of the poem to be recited using a probabilistic model.
Optionally, the probabilistic model is used to judge whether the recited poem content was recited in order, recited again, or recited with omissions.
Optionally, the alignment module includes:
an in-order recitation module: if a recited poem line is correct, it is displayed in the first color; if a recited poem line is wrong, it is displayed in a second color different from the first color;
a re-recitation module: if a poem line recited again is correct, the line already displayed in the second color is changed to the first color;
an omission recitation module: if part of the poem is omitted and the recited part is correct, the omitted lines are displayed in the second color and the correctly recited lines in the first color.
In another aspect, the application provides a terminal, including a processor and a memory. The memory stores computer instructions, and the processor invokes the computer instructions to perform the following steps:
the client determines the poem to be recited;
the voice of the recited poem is acquired;
the voice is pre-processed;
the pre-processed voice is uploaded to the server.
In another aspect, the application provides a storage medium storing computer instructions, where the computer instructions perform the following steps:
the client determines the poem to be recited;
the voice of the recited poem is acquired;
the voice is pre-processed;
the pre-processed voice is uploaded to the server.
The poem recitation evaluation method provided in this application pre-processes the recitation voice, which speeds up uploading; the voice is recognized and converted into text by the pre-established recognition model, and a score for the text is calculated according to preset scoring rules, so that the error rate of the recited poem voice can be detected. The recited poem voice can thus be recognized in real time while recognition accuracy is improved.
Brief description of the drawings
Fig. 1 is a flow chart of the poem recitation evaluation method provided by an embodiment of the application;
Fig. 2 is a flow chart of pre-processing the voice in the poem recitation evaluation method provided by an embodiment of the application;
Fig. 3 is a flow chart of processing the voice uploaded to the server in the poem recitation evaluation method provided by an embodiment of the application;
Fig. 4 is a flow chart of training the recognition model in the poem recitation evaluation method provided by an embodiment of the application;
Fig. 5 is a flow chart of training an HMM-DNN model in an embodiment of the application;
Fig. 6 is a structural diagram of a poem recitation evaluation system provided by an embodiment of the application;
Fig. 7 is a hardware structure diagram of the electronic device provided by an embodiment of the application;
Fig. 8 shows the interface presented to the user by the client after evaluation with the poem recitation evaluation method provided by an embodiment of the application.
Detailed description of the embodiments
In the poem recitation evaluation method and system, terminal, and storage medium provided by the embodiments of the present invention, the acquired voice of the poem to be recited is first pre-processed, and the voice is then converted into text by a recognition model, so that the evaluation of poem recitation becomes faster and more accurate. The embodiments and implementation of the present invention are described in detail below with reference to the accompanying drawings.
Existing speech recognition technology needs a large amount of training speech data and a long training time to achieve good recognition. It recognizes well only speech recorded by dedicated equipment, has low accuracy for different users, and requires separate noise processing of the speech to be recognized, which is costly and slow to respond.
In view of the deficiencies of prior-art poem recitation evaluation methods, the poem recitation evaluation method and system, terminal, and storage medium provided by the application can efficiently and accurately recognize the classical-poem voice recorded by different users with ordinary recording equipment.
Referring to Fig. 1, this embodiment provides a poem recitation evaluation method, including steps 101 to 107.
Step 101: determine the poem to be recited.
In this embodiment, the poem is classical Chinese poetry, a literary form unique to Chinese with special structure and metrical rules. By metre, poems are divided into old-style (pre-Tang) poetry and modern-style poetry (the classical poetry innovations of the Tang Dynasty marked by strict tonal patterns and rhyme schemes), a classification that took shape in the Tang Dynasty. By content, poems can be divided into epic, lyric, farewell poems, frontier poems, landscape poems, poems on history, mourning poems, object-ode poems, military poems, and so on. Old-style poetry includes The Book of Songs, The Songs of the South, Music Bureau poems, Han rhapsodies, folk songs of the Northern and Southern Dynasties, and so on; modern-style poetry typically comprises quatrains, regulated verse, and extended regulated verse. In the client application, the user selects the full text or a passage of a classical poem to recite and then begins reciting.
Step 102: acquire the voice of the recited poem.
In this embodiment, the voice of the user reciting the poem is acquired in real time.
Step 103: pre-process the voice.
In this embodiment, the voice is pre-processed so that it can be uploaded with the greatest possible real-time performance.
Step 104: upload the pre-processed voice to the server.
Step 105: convert the voice into text with the recognition model pre-established on the server.
In this embodiment, a trained recognition model is pre-established on the server, and the pre-processed voice is fed into the server's recognition model to be recognized as text.
Step 106: compare the text with the poem to be recited and calculate a score for the text according to preset scoring rules.
Step 107: feed the score back to the client.
The poem recitation evaluation method provided by this embodiment pre-processes the recitation voice, which speeds up uploading; the voice is recognized and converted into text by the pre-established recognition model, and a score for the text is calculated according to preset scoring rules, so that the error rate of the recited poem voice can be detected. The recited poem voice can thus be recognized in real time while recognition accuracy is improved.
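The scoring of step 106 is only described abstractly (compare the recognized text with the poem and apply preset scoring rules). The following minimal Python sketch shows one plausible way such a comparison could be done; the function name, the use of a character-level diff, and the 100-point scale are illustrative assumptions, not the applicant's actual scoring rules.

```python
import difflib

def score_recitation(recognized: str, reference: str) -> dict:
    """Compare recognized text with the reference poem and compute a score.

    Scoring rule (assumed, not from the patent): score = 100 * (1 - CER),
    where CER is a character error rate derived from a character-level diff.
    """
    matcher = difflib.SequenceMatcher(None, reference, recognized)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    errors = max(len(reference) - matched, 0)
    cer = errors / max(len(reference), 1)
    return {"char_error_rate": cer, "score": round(100 * (1 - cer), 1)}

# Example: one wrong character in a ten-character couplet lowers the score.
print(score_recitation("床前明月光疑是地上霜", "床前明月光疑是地上霜"))  # perfect recitation
print(score_recitation("床前明月光疑是地下霜", "床前明月光疑是地上霜"))  # one character wrong
```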
Referring to Fig. 2, the pre-processing of the voice in an embodiment of the application includes steps 201 to 202.
Step 201: slice the voice into segments.
In this embodiment, slicing, compression, and uploading of the voice are all invisible to the user: the user recites without interruption in the client application (for example, an app on a smartphone), and the only visible output is the real-time feedback result. The audio recorded by a smartphone is lossless, and uploading it directly to the server would consume considerable network traffic, so to reduce the smartphone's network consumption the audio is sliced and compressed before being uploaded to the server.
In this embodiment, slicing cuts the voice into segments of equal length. To achieve the best possible real-time performance without affecting recognition speed, the audio is sliced into segments as short as possible, keeping the real-time factor of recognition by the recognition model below 1; as long as this factor is below 1, good real-time behaviour is achieved.
Step 202: compress the sliced voice.
In this embodiment, the sliced voice is compressed by digital signal processing, also called compression coding, on the condition that useful information is not lost or the loss is negligible. This embodiment uses Advanced Audio Coding (AAC) to compress and encode the raw voice audio stream.
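The embodiment names equal-length slicing and AAC compression but gives no implementation. Below is a minimal sketch under the assumption that the pydub library (backed by ffmpeg) handles both steps; the 2-second segment length and the output file naming are arbitrary illustrative choices.

```python
from pydub import AudioSegment  # pydub relies on ffmpeg for AAC encoding

def slice_and_compress(wav_path: str, segment_ms: int = 2000) -> list:
    """Cut a lossless recording into equal-length segments and AAC-encode each one.

    The segment length and file names are illustrative; the patent only requires
    equal-length slices short enough to keep the recognition real-time factor below 1.
    """
    audio = AudioSegment.from_wav(wav_path)
    out_files = []
    for i, start in enumerate(range(0, len(audio), segment_ms)):
        segment = audio[start:start + segment_ms]
        out_path = f"segment_{i:03d}.aac"
        segment.export(out_path, format="adts")  # "adts" is the raw AAC stream container
        out_files.append(out_path)
    return out_files
```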
Referring to Fig. 3, in an embodiment of the application, after the pre-processed voice is uploaded to the server, the method further includes steps 301 to 302.
Step 301: decompress the voice uploaded to the server.
In this embodiment, after receiving the uploaded compressed voice file, the server must decompress the compressed voice audio; this embodiment decompresses the voice audio stream by the inverse transform of Advanced Audio Coding (AAC).
Step 302: perform silence detection on the decompressed voice.
In this embodiment, when reciting a classical poem a user may pause because of forgetting, or other unpredictable factors may leave gaps in the voice, which produces many silent (soundless) segments. These parts of the audio do not need to be processed, so to avoid unnecessary processing time silence detection is performed with a deep-learning classification model, which can quickly and accurately distinguish normal speech from silence; detected silent segments are not passed on for recognition.
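The embodiment states that silence detection is done with a deep-learning classification model but does not disclose its architecture. The sketch below substitutes a much simpler short-time-energy detector purely to make the control flow concrete: frames classified as silent would not be forwarded to the recognition model. The frame length and energy threshold are illustrative values only.

```python
import numpy as np

def detect_silence(samples: np.ndarray, sample_rate: int,
                   frame_ms: int = 30, energy_threshold: float = 1e-4) -> list:
    """Mark each frame as silent (True) or voiced (False) by short-time energy.

    The embodiment uses a deep-learning classifier for this step; the energy
    threshold used here is a much simpler stand-in so the control flow is visible.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len].astype(np.float64)
        flags.append(float(np.mean(frame ** 2)) < energy_threshold)
    return flags

# Frames flagged as silent would simply not be forwarded to the recognition model.
```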
Referring to Fig. 4, the training of the recognition model in an embodiment of the application is described. In practice the recognition model is trained with many voice training samples: one sample is input at a time, and after training on that sample finishes the next sample is input, until all voice training samples have been input or the recognition model reaches the preset goal, at which point training of the recognition model ends. The training process includes steps 401 to 408.
Step 401: build the initial speech recognition network.
In this embodiment, building the initial speech recognition network means initializing the parameters of the recognition model.
Step 402: the speech recognition network receives voice training data.
In this embodiment, the voice training samples are numbered to avoid inputting the same sample twice.
Step 403: convert one of the received voice training samples into text.
Step 404: calculate the error between the converted text and the true text of the voice training sample.
Step 405: judge whether the error is less than the preset threshold; if it is not, perform step 406; if it is, perform step 408.
Step 406: adjust the parameters of the speech recognition network according to the error.
Step 407: judge whether the current voice training sample is the last one; if it is, perform step 408; if not, perform step 403.
Step 408: training of the speech recognition network ends.
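Steps 401 to 408 describe a per-sample, threshold-controlled training loop without tying it to any framework. The sketch below mirrors that loop under an assumed model interface (transcribe, error, update); the interface names are illustrative, and whether step 403 re-converts the same sample or moves on to the next one after adjusting parameters is not fully specified in the text, so the sketch simply moves on.

```python
def train_recognition_network(model, samples, transcripts, error_threshold=0.05):
    """Mirror of the steps 401-408 loop.

    `model` is assumed to expose transcribe(audio), error(hypothesis, reference)
    and update(audio, reference) methods; this interface is an assumption made
    for illustration, the patent does not define one.
    """
    # Step 401 (initializing the network parameters) is assumed to have
    # happened when `model` was constructed.
    for audio, reference in zip(samples, transcripts):   # step 402: receive next sample
        hypothesis = model.transcribe(audio)             # step 403: speech -> text
        err = model.error(hypothesis, reference)         # step 404: compare with the true text
        if err < error_threshold:                        # step 405: preset goal reached
            break                                        # step 408: training ends
        model.update(audio, reference)                   # step 406: adjust parameters
        # step 407: the loop itself checks whether this was the last sample
    return model                                         # step 408
```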
In this embodiment, constructing the voice training data includes the following.
Original poem voice is obtained and used as training data; the original poem voice can be speech audio captured by devices such as smartphone microphones or non-smartphone microphones.
Noise is mixed into the original poem voice to generate new training data. So that recognition of the user's classical-poem recitation keeps a high accuracy rate even in noisy environments, the training data uses the original poem voice: no de-noising is applied to the captured speech audio, and the noisy voice audio is used for training directly, so that during recognition the recognition model distinguishes, by pattern recognition, which parts are noise and which are the sound of the recited poem.
In this embodiment, the original poem voice includes poem voice of different sound quality and/or different voice characteristics. Poem recitation evaluation is used mainly by primary and secondary school students, so to match their voice characteristics most of the voice training data is audio collected from real primary and secondary school users. However, different users have different speaking habits and vocal characteristics, so sound quality and voice features vary; to improve the generalization ability of the recognition model, the voice training data is expanded from the original poem voice by data augmentation, generating voice audio with different characteristics and enlarging the voice training data.
In this embodiment, constructing the voice training data further includes the following.
Original non-poem voice is obtained and used as training data. The training data additionally contains speech audio recorded by non-smartphone microphones, speech audio from users who are not primary or secondary school students, speech audio whose content is not classical poetry, and speech audio recorded in noise-free environments, which improves the generalization ability of the recognition model and makes it more robust.
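Noise mixing is described as the way new training data is generated, but no mixing procedure is given. The following sketch shows one standard way this could be done, scaling a noise clip to a target signal-to-noise ratio before adding it to the clean recording; the SNR values and the NumPy-array audio representation are assumptions for illustration.

```python
import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a noise recording into a clean poem recording at a target SNR (in dB)."""
    noise = np.resize(noise, clean.shape)  # loop or trim the noise to the clean length
    clean_power = np.mean(clean.astype(np.float64) ** 2) + 1e-12
    noise_power = np.mean(noise.astype(np.float64) ** 2) + 1e-12
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    return clean + noise * np.sqrt(target_noise_power / noise_power)

def augment(clean: np.ndarray, noises: list, snrs=(20, 10, 5)) -> list:
    """Generate several noisy variants of one clean recording (SNR values are illustrative)."""
    return [add_noise(clean, n, snr) for n in noises for snr in snrs]
```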
Referring to Fig. 5, an embodiment of the application illustrates, using a single voice training sample, training a model based on a hidden Markov model (Hidden Markov Model, HMM) and/or a deep neural network model (DNN); the process includes steps 501 to 508.
In this embodiment, the recognition model includes a hidden Markov model (Hidden Markov Model, HMM) and/or a deep neural network model (DNN).
Step 501: initialize the HMM-DNN model parameters.
Step 502: receive speech data from the training data.
Step 503: convert one received speech sample into text.
Step 504: calculate the error between the text produced by speech recognition and the true text.
Step 505: judge whether the error is less than the preset threshold; if it is not, perform step 506; if it is, perform step 508.
Step 506: adjust the parameters of the HMM-DNN model according to the error.
Step 507: judge whether the current speech sample is the last one; if it is, perform step 508; if not, perform step 503.
Step 508: training of the HMM-DNN model ends.
In this embodiment, the training data is expanded from seed voice data by data augmentation. The seed speech data has the following characteristics: 1) it is real speech data collected from the application; 2) to match the audio of smartphone microphone recordings, the main data is collected by smartphones (including but not limited to Android smartphones and iPhones); 3) to match the voice characteristics of primary and secondary school students, the main voice data is audio produced online by real primary and secondary school users; 4) the real recitation speech data contains various ambient noises; so that classical-poem recitation recognition also keeps a high accuracy rate in noisy environments, no de-noising is applied to the audio and the noisy voice data is used for training directly, with pattern recognition distinguishing noise from the sound of the recited poem. Because different users have different speaking habits and vocal characteristics, data augmentation is used to expand the seed data and generate sound with different characteristics.
To improve the generalization ability of the model, the training data also includes speech audio recorded by non-smartphone microphones, speech audio from users who are not primary or secondary school students, speech audio whose content is not classical poetry, and speech audio from noise-free environments, so that the HMM-DNN model is more robust.
In an embodiment of the application, after the voice is converted into text by the recognition model pre-established on the server, the method further includes: correcting errors in the converted text with a language model.
A language model is a model that can infer the probability distribution of the next word, i.e. what the next word is likely to be.
A practical speech recognition model has two possible problems: 1) the pronunciation it identifies may not be absolutely accurate, although even an inaccurate result is in most cases an approximate pronunciation; 2) the recognition result of a pure speech recognition model may not conform to correct grammar.
In this embodiment a natural language model makes adjustments, using context and grammatical rules to correct wrongly recognized pronunciations. The adjustment methods include, but are not limited to: training the language model on corpora merged from several scenarios, so that the pronunciation correction is more robust; and tuning the weight of the language model in the speech recognition process, so that the recognition result is more accurate.
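The embodiment describes the language model only as something that predicts the next word and re-weights recognition results. A toy character-bigram model with add-one smoothing is sketched below to show how candidate transcriptions could be rescored for fluency; the model type, the smoothing, and the choice of corpus are illustrative assumptions rather than the applicant's actual language model.

```python
import math
from collections import Counter

class BigramLM:
    """Tiny character-bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, corpus):
        self.bigrams = Counter()
        self.unigrams = Counter()
        for text in corpus:
            for a, b in zip(text, text[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
        self.vocab = {ch for text in corpus for ch in text}

    def log_prob(self, text: str) -> float:
        score = 0.0
        for a, b in zip(text, text[1:]):
            num = self.bigrams[(a, b)] + 1
            den = self.unigrams[a] + len(self.vocab)
            score += math.log(num / den)
        return score

def rescore(candidates, lm: BigramLM) -> str:
    """Pick the acoustic candidate the language model considers most fluent."""
    return max(candidates, key=lm.log_prob)

# Usage sketch: a corpus of reference poems is assumed as the training material.
# lm = BigramLM(["床前明月光疑是地上霜举头望明月低头思故乡"])
# best = rescore(["床前明月光", "床前名月光"], lm)  # prefers the in-corpus spelling
```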
In an embodiment of the application, after correcting errors in the converted text with the language model, the method further includes: aligning the converted text with the true text of the poem to be recited using a probabilistic model.
The speech recognition result is aligned word for word with the content the user was supposed to recite, to judge whether the user recited the content in order, recited it again, omitted part of it, or recited it wrongly; the recited poem content is then evaluated and scored according to the preset scoring rules.
In an embodiment of the application, the probabilistic model is used to judge whether the recited poem content was recited in order, recited again, or recited with omissions.
In real recitation, the three behaviours of reciting in order, reciting again, and reciting with omissions occur with different frequencies: most of the time recitation is in order, while re-recitation and omission occur less often. This embodiment uses a Bayesian probability model that incorporates human prior knowledge to predict these three behaviours, so that the alignment accuracy is optimal.
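The Bayesian alignment model itself is not disclosed. As a stand-in, the sketch below performs a plain dynamic-programming (edit-distance) alignment of the recognized text against the reference poem and labels each reference character as correct, wrong, or omitted; the prior knowledge about in-order, repeated, and omitted recitation is reduced here to fixed edit costs, which is a simplifying assumption.

```python
def align(reference: str, hypothesis: str, miss_cost=1.0, ins_cost=1.0, sub_cost=1.0):
    """Word-for-word alignment of recognized text against the reference poem.

    A plain edit-distance alignment standing in for the patent's Bayesian model.
    Returns one label per reference character: 'correct', 'wrong' or 'omitted'.
    """
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimal cost of aligning reference[:i] with hypothesis[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * miss_cost
    for j in range(1, m + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if reference[i - 1] == hypothesis[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j - 1] + match,
                           dp[i - 1][j] + miss_cost,
                           dp[i][j - 1] + ins_cost)
    # trace back to label each reference character
    labels, i, j = [], n, m
    while i > 0:
        match = 0.0 if j > 0 and reference[i - 1] == hypothesis[j - 1] else sub_cost
        if j > 0 and dp[i][j] == dp[i - 1][j - 1] + match:
            labels.append('correct' if match == 0.0 else 'wrong')
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + miss_cost:
            labels.append('omitted')
            i -= 1
        else:
            j -= 1  # extra recognized character, e.g. a repeated recitation
    return list(reversed(labels))
```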
In an embodiment of the application, aligning the converted text with the true text of the poem to be recited using the probabilistic model further includes:
if a recited poem line is correct, displaying it in a first color;
if a recited poem line is wrong, displaying it in a second color different from the first color;
if a poem line recited again is correct, changing the line already displayed in the second color to the first color;
if part of the poem is omitted and the recited part is correct, displaying the omitted lines in the second color and the correctly recited lines in the first color.
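The display rules above map each alignment outcome to one of two colors; a small helper such as the one below could apply them. The concrete color names are illustrative, since the patent only speaks of a first and a second color.

```python
def update_display(colors: dict, line_index: int, outcome: str,
                   first_color: str = "green", second_color: str = "red") -> dict:
    """Apply the display rules above to one poem line.

    `colors` maps line index -> currently displayed color. The color names
    are illustrative; the patent only distinguishes a first and a second color.
    """
    if outcome in ("wrong", "omitted"):
        colors[line_index] = second_color
    elif outcome in ("correct", "re-recited-correct"):
        # a correct recitation, or a correct re-recitation of a line previously
        # shown in the second color, is (re)displayed in the first color
        colors[line_index] = first_color
    return colors
```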
Referring to Fig. 6, the application provides a poem recitation evaluation system, including a client 601 and a server 602. The client 601 includes:
a selection module 611, for determining the poem to be recited;
an acquisition module 612, for acquiring the voice of the recited poem;
a pre-processing module 613, for pre-processing the voice;
an upload module 614, for uploading the pre-processed voice to the server.
The server 602 includes:
a conversion module 621, for converting the voice into text with the recognition model pre-established on the server;
a scoring module 622, for comparing the text with the poem to be recited and calculating a score for the text according to preset scoring rules;
a feedback module 623, for feeding the score back to the client.
Optionally, the pre-processing module 613 includes:
a slicing module, for slicing the voice into segments;
a compression module, for compressing the sliced voice.
Optionally, the upload module 614 further includes:
a decompression module, for decompressing the voice uploaded to the server;
a detection module, for performing silence detection on the decompressed voice.
Optionally, the recognition model includes:
a recognition network construction module, for building the initial speech recognition network;
a receiving module, for the speech recognition network to receive voice training data;
a recognition conversion module, for converting one of the received voice training samples into text;
a calculation module, for calculating the error between the converted text and the true text of the voice training sample;
a judgment module: if the error is greater than or equal to the preset threshold, the parameters of the speech recognition network are adjusted according to the error, the step of converting one of the received voice training samples into text is performed again, and training of the speech recognition network continues; if the error is less than the preset threshold, training of the speech recognition network ends.
Optionally, the server further includes:
a poem voice acquisition module, for obtaining original poem voice and using it as training data;
a noise poem module, for mixing noise into the original poem voice to generate new training data.
Optionally, the original poem voice includes poem voice of different sound quality and/or different voice characteristics.
Optionally, the server further includes:
a non-poem voice acquisition module, for obtaining original non-poem voice and using it as training data.
Optionally, the recognition model includes a hidden Markov model (Hidden Markov Model, HMM) and/or a deep neural network model (DNN).
Optionally, the conversion module includes: a correction module, for correcting errors in the converted text with a language model.
Optionally, the correction module further includes: an alignment module, for aligning the converted text with the true text of the poem to be recited using a probabilistic model.
Optionally, the probabilistic model is used to judge whether the recited poem content was recited in order, recited again, or recited with omissions.
Optionally, the alignment module includes:
an in-order recitation module: if a recited poem line is correct, it is displayed in the first color; if a recited poem line is wrong, it is displayed in a second color different from the first color;
a re-recitation module: if a poem line recited again is correct, the line already displayed in the second color is changed to the first color;
an omission recitation module: if part of the poem is omitted and the recited part is correct, the omitted lines are displayed in the second color and the correctly recited lines in the first color.
The DNN in the HMM-DNN model used by the recognition model in this application can be any of various deep learning networks; the HMM-DNN model can be replaced by a hidden Markov model with Gaussian mixture model (Hidden Markov Model and Gaussian Mixture Model, i.e. HMM-GMM), or by a pure deep learning model.
This embodiment provides a terminal, including a processor and a memory. The memory stores computer instructions, and the processor invokes the computer instructions to perform the following steps:
the client determines the poem to be recited;
the voice of the recited poem is acquired;
the voice is pre-processed;
the pre-processed voice is uploaded to the server.
The above is an exemplary scheme of the terminal of this embodiment. It should be noted that the technical scheme of the terminal and the technical scheme of the poem recitation evaluation method described above belong to the same concept; for details not described in the technical scheme of the terminal, refer to the description of the technical scheme of the poem recitation evaluation method above.
This embodiment provides a storage medium storing computer instructions, where the computer instructions perform the following steps:
the client determines the poem to be recited;
the voice of the recited poem is acquired;
the voice is pre-processed;
the pre-processed voice is uploaded to the server.
The above is an exemplary scheme of the storage medium of this embodiment. It should be noted that the technical scheme of the storage medium and the technical scheme of the poem recitation evaluation method described above belong to the same concept; for details not described in the technical scheme of the storage medium, refer to the description of the technical scheme of the poem recitation evaluation method above.
The poem recitation evaluation method and system, terminal, and storage medium provided by this embodiment have the following advantages:
1. Low cost: as a speech recognition technology dedicated to classical-poem recitation, the training method is relatively simple, the amount of training data needed is relatively small, and the training time is short.
2. Fast: classical-poem recitation is detected and given feedback in real time, with a speech recognition real-time factor below 1.
3. Effective: on various smartphone devices (including but not limited to various Android phones and iPhone devices) and in various noise environments (including but not limited to quiet environments, roads, the subway, coffee shops, and so on), the accuracy of classical-poem recitation detection exceeds 97%.
Fig. 7 is a hardware structure diagram of the electronic device for the poem recitation evaluation method provided by the embodiment of the application. As shown in Fig. 7, the electronic device includes:
one or more processors 710 and a memory 720; Fig. 7 takes one processor 710 as an example.
The device executing the poem recitation evaluation method may further include an input device 730 and an output device 740.
The processor 710, the memory 720, the input device 730, and the output device 740 can be connected by a bus or in other ways; in Fig. 7, connection by a bus 750 is taken as an example.
The memory 720, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the poem recitation evaluation method in the embodiment of the application (for example, the modules shown in Fig. 6). By running the non-volatile software programs, instructions, and modules stored in the memory 720, the processor 710 executes the various functional applications and data processing of the server, i.e. implements the poem recitation evaluation method of the above method embodiment.
The memory 720 can include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required for at least one function; the data storage area can store data created by the use of the poem recitation evaluation system, and so on. In addition, the memory 720 can include high-speed random access memory and can also include non-volatile memory, for example at least one magnetic disk memory, flash memory device, or other non-volatile solid-state memory. In some embodiments, the memory 720 optionally includes memory located remotely from the processor 710, and these remote memories can be connected to the poem recitation evaluation system through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 can receive input digital or character information and generate key signal input related to the user settings and function control of the poem recitation evaluation system. The output device 740 may include display devices such as a display screen.
The one or more modules are stored in the memory 720, and when executed by the one or more processors 710, they perform the poem recitation evaluation method in any of the above method embodiments.
The above product can perform the method provided by the embodiment of the application and has the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiment of the application.
The electronic device of the embodiment of the present invention exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices have mobile communication functions and take providing voice and data communication as their main goal. This type of terminal includes smartphones, multimedia phones, feature phones, low-end phones, and so on.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. This type of terminal includes PDA (Personal Digital Assistant), MID (Mobile Internet Device), and UMPC (Ultra-mobile Personal Computer) devices, and so on.
(3) Portable entertainment devices: such devices can display and play multimedia content. This type of device includes audio and video players, handheld devices, e-book readers, smart toys, and portable in-car navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
In the several embodiments provided in this application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the embodiment of the detection system described above is merely illustrative; for instance, the division into modules is only a division by logical function, and there may be other divisions in actual implementation, for example multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication links shown or discussed may be indirect couplings or communication links through interfaces or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional modules in each embodiment of the invention may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical scheme of the invention, in essence the part that contributes to the prior art, or all or part of the technical scheme, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions; however, those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be carried out in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments and that the actions and modules involved are not necessarily all required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the present invention disclosed above are intended only to help illustrate the present invention. The preferred embodiments do not describe all details, nor is the invention limited to the described embodiments. Obviously, many modifications and variations are possible in light of this specification. These embodiments were chosen and described in detail to better explain the principles and practical application of the present invention, so that those skilled in the art can understand and use the present invention well. The present invention is limited only by the claims and their full scope and equivalents.

Claims (26)

CN201710504389.7A, filed 2017-06-28 (priority date 2017-06-28): Poem recitation evaluation method and system, terminal and storage medium. Status: Pending. Published as CN107316638A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710504389.7A | 2017-06-28 | 2017-06-28 | Poem recitation evaluation method and system, terminal and storage medium (CN107316638A, en)


Publications (1)

Publication Number | Publication Date
CN107316638A | 2017-11-03

Family

ID=60181280

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710504389.7A | Poem recitation evaluation method and system, terminal and storage medium (Pending, CN107316638A, en) | 2017-06-28 | 2017-06-28

Country Status (1)

Country | Link
CN (1) | CN107316638A (en)



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
Address after: Units F01-03 and 05-10 on the 6th floor of No.1 Building, No.8 Courtyard, Guangshun South Street, Chaoyang District, Beijing
Applicant after: Beijing Ape Power Future Technology Co., Ltd.
Address before: Room A116, Floor 2, 88 Xiangshan Road, Haidian District, Beijing
Applicant before: Beijing Chalk Future Technology Co., Ltd.
TA01: Transfer of patent application right
Effective date of registration: 2020-05-06
Address after: 100102, unit F01, 5th floor and unit 04, F01, 6th floor, building 1, yard 8, Guangshun South Street, Chaoyang District, Beijing
Applicant after: Beijing Ape Force Education Technology Co., Ltd
Address before: Units F01-03 and 05-10 on the 6th floor of No.1 Building, No.8 Courtyard, Guangshun South Street, Chaoyang District, Beijing
Applicant before: Beijing Ape Power Future Technology Co., Ltd.
RJ01: Rejection of invention patent application after publication (application publication date: 2017-11-03)
