CN109308896B - Voice processing method and device, storage medium and processor - Google Patents

Voice processing method and device, storage medium and processor

Info

Publication number
CN109308896B
CN109308896B
Authority
CN
China
Prior art keywords
model
speech
vectors
processing
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710633042.2A
Other languages
Chinese (zh)
Other versions
CN109308896A (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Huitong Jinke Data Co ltd
Original Assignee
Jiangsu Huitong Jinke Data Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Huitong Jinke Data Co., Ltd.
Priority to CN201710633042.2A (CN109308896B)
Priority to PCT/CN2018/079848 (WO2019019667A1)
Publication of CN109308896A
Application granted
Publication of CN109308896B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a voice processing method and device, a storage medium and a processor. The method comprises the following steps: acquiring speech vectors at a plurality of moments within a preset time period; processing the speech vectors at the multiple moments by using a preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, wherein the preset speech model processes the speech vectors at the multiple moments on the basis of pre-stored parameter vectors at the multiple moments; and outputting the multiple pieces of text information. The invention solves the technical problem of low processing efficiency of speech processing methods in the prior art.

Description

Voice processing method and device, storage medium and processor
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for processing speech, a storage medium, and a processor.
Background
Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between users and computers in natural language, and it is a science integrating linguistics, computer science and mathematics. Research in this field therefore concerns natural language, i.e. the language people use every day, so it is closely related to linguistics, although with important differences.
Currently, common natural language processing methods include conditional random fields (CRF), hidden Markov models (HMM), recurrent neural network models (RNN), long short-term memory models (LSTM), and the like. However, improving processing accuracy requires increasing the model depth, which results in high processing complexity and low processing efficiency.
No effective solution has yet been proposed for the problem of low processing efficiency of speech processing methods in the prior art.
Disclosure of Invention
Embodiments of the present invention provide a speech processing method and apparatus, a storage medium, and a processor, so as to at least solve the technical problem of low processing efficiency of speech processing methods in the prior art.
According to an aspect of an embodiment of the present invention, there is provided a speech processing method, including: acquiring speech vectors at a plurality of moments within a preset time period; processing the speech vectors at the multiple moments by using a preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, wherein the preset speech model processes the speech vectors at the multiple moments on the basis of pre-stored parameter vectors at the multiple moments; and outputting the multiple pieces of text information.
Further, the preset speech model includes a speech processing model, which is configured to process the speech vectors at the multiple moments based on the parameter vectors at the multiple moments to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments.
Further, processing the speech vectors at the multiple moments by using the preset speech model to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments includes: acquiring first parameter vectors at the multiple moments from a parameter matrix through a read operation; modifying the speech processing model by using the first parameter vectors at the multiple moments to obtain a modified speech processing model; and processing the speech vectors at the multiple moments by using the modified speech processing model to obtain the multiple pieces of text information.
Further, while processing the speech vectors at the multiple moments by using the modified speech processing model to obtain the multiple pieces of text information, the method further includes: obtaining second parameter vectors at the multiple moments by using the modified speech processing model; and writing the second parameter vectors at the multiple moments into the parameter matrix through a write operation.
Further, obtaining the second parameter vectors at the multiple moments by using the modified speech processing model includes: updating the first parameter vectors at the multiple moments by using the modified speech processing model to obtain the second parameter vectors at the multiple moments.
Further, before processing the speech vectors at the multiple moments by using the preset speech model to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments, the method further includes: establishing an initial preset model, wherein the initial preset model includes a speech processing model and an initial parameter matrix; acquiring training data, wherein the training data includes a plurality of training speech vectors and the text information corresponding to each training speech vector; and training the initial preset model according to the training data to obtain the preset speech model.
Further, training the initial preset model according to the training data to obtain the preset speech model includes: inputting the training data into the speech processing model to obtain preset parameter vectors; and writing the preset parameter vectors into the initial parameter matrix through a write operation to obtain the parameter matrix.
Further, the speech processing model is an LSTM model, and the parameter matrix is a memory matrix.
Further, the preset time period is determined according to the processing capability of the preset speech model.
According to another aspect of the embodiments of the present invention, there is also provided a speech processing apparatus, including: a first acquisition module, configured to acquire speech vectors at a plurality of moments within a preset time period; a processing module, configured to process the speech vectors at the multiple moments by using a preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, wherein the preset speech model processes the speech vectors at the multiple moments based on pre-stored parameter vectors at the multiple moments; and an output module, configured to output the multiple pieces of text information.
Further, the preset speech model includes a speech processing model, which is configured to process the speech vectors at the multiple moments based on the parameter vectors at the multiple moments to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments.
Further, the processing module includes: an acquisition submodule, configured to acquire first parameter vectors at the multiple moments from the parameter matrix through a read operation; a modification submodule, configured to modify the speech processing model by using the first parameter vectors at the multiple moments to obtain a modified speech processing model; and a first processing submodule, configured to process the speech vectors at the multiple moments by using the modified speech processing model to obtain the multiple pieces of text information.
Further, the processing module further includes: a second processing submodule, configured to obtain second parameter vectors at the multiple moments by using the modified speech processing model; and a first storage submodule, configured to write the second parameter vectors at the multiple moments into the parameter matrix through a write operation.
Further, the second processing submodule is further configured to update the first parameter vectors at the multiple moments by using the modified speech processing model to obtain the second parameter vectors at the multiple moments.
Further, the above apparatus further includes: an establishing module, configured to establish an initial preset model, wherein the initial preset model includes a speech processing model and an initial parameter matrix; a second acquisition module, configured to acquire training data, wherein the training data includes a plurality of training speech vectors and the text information corresponding to each training speech vector; and a training module, configured to train the initial preset model according to the training data to obtain the preset speech model.
Further, the training module includes: a third processing submodule, configured to input the training data into the speech processing model to obtain preset parameter vectors; and a second storage submodule, configured to write the preset parameter vectors into the initial parameter matrix through a write operation to obtain the preset speech model.
Further, the speech processing model is an LSTM model, and the parameter matrix is a memory matrix.
Further, the above apparatus further includes: a determining module, configured to determine the preset time period according to the processing capability of the preset speech model.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein, when the program runs, a device in which the storage medium is located is controlled to execute the speech processing method in the above embodiments.
According to another aspect of the embodiments of the present invention, there is also provided a processor configured to run a program, wherein the program, when run, executes the speech processing method in the above embodiments.
In the embodiments of the present invention, speech vectors at multiple moments within a preset time period are acquired, the speech vectors at the multiple moments are processed by using a preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, and the multiple pieces of text information are output, thereby realizing natural language processing. It is easy to note that, because the speech vectors at multiple moments within the preset time period are acquired and the preset speech model processes them based on pre-stored parameter vectors at the multiple moments, the natural speech is processed by exploiting its temporal characteristics and combining the memory matrix of a neural Turing machine with an LSTM model, which solves the technical problem of low processing efficiency of speech processing methods in the prior art. Therefore, the scheme provided by the embodiments of the present invention can improve processing efficiency and processing accuracy while reducing processing complexity and processing time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a speech processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an optional preset speech model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an optional repeating module of a speech processing model according to an embodiment of the invention; and
FIG. 4 is a schematic diagram of a speech processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a speech processing method is provided. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system capable of executing a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one described here.
Fig. 1 is a flowchart of a speech processing method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
Step S102: acquiring speech vectors at a plurality of moments within a preset time period.
Optionally, in the foregoing embodiment of the present invention, the preset time period is determined according to the processing capability of the preset speech model.
Specifically, the preset time period may be set according to the processing capability of the actual speech processing model, and the multiple moments may be multiple sampling moments at equal intervals. For example, if the preset time period is 100 s and the sampling interval is 10 s, speech vectors at 10 moments can be acquired within the 100 s.
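The relationship between the preset time period, the sampling interval and the number of speech vectors can be illustrated with a minimal sketch. The patent does not specify how the speech vectors are computed, so the feature extraction used here (a pooled magnitude spectrum) and all function and parameter names are assumptions made only to keep the example self-contained.

```python
import numpy as np

def sample_speech_vectors(signal, sample_rate_hz, period_s=100, interval_s=10, dim=40):
    """Split a 1-D speech signal into equally spaced sampling moments and turn
    each moment into a fixed-length feature ("speech") vector.

    Illustrative sketch only: the pooled magnitude spectrum is a stand-in for
    whatever front end the preset speech model actually uses.
    """
    vectors = []
    for t in range(0, period_s, interval_s):              # 0 s, 10 s, ..., 90 s -> 10 moments
        start = t * sample_rate_hz
        frame = signal[start:start + interval_s * sample_rate_hz]
        spectrum = np.abs(np.fft.rfft(frame))              # crude spectral feature for this moment
        bins = np.array_split(spectrum, dim)               # pool into `dim` bins of equal count
        vectors.append(np.array([b.mean() if b.size else 0.0 for b in bins]))
    return np.stack(vectors)                                # shape: (10, dim) for a 100 s period

# Example: 100 s of audio at 16 kHz yields 10 speech vectors, one per sampling moment.
audio = np.random.randn(100 * 16000)
speech_vectors = sample_speech_vectors(audio, sample_rate_hz=16000)
print(speech_vectors.shape)  # (10, 40)
```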
Step S104: processing the speech vectors at the multiple moments by using a preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, wherein the preset speech model processes the speech vectors at the multiple moments based on pre-stored parameter vectors at the multiple moments.
Optionally, in the above embodiment of the present invention, the preset speech model includes a speech processing model, which is configured to process the speech vectors at the multiple moments based on the parameter vectors at the multiple moments to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments.
Optionally, in the above embodiment of the present invention, the speech processing model is an LSTM model, and the parameter matrix is a memory matrix.
Specifically, the preset speech model may be a neural Turing machine. As shown in fig. 2, the neural Turing machine includes two components: a controller (namely the speech processing model) and a memory matrix (namely the parameter matrix). The memory matrix is an external memory matrix that stores the parameter vectors required by the speech processing model for speech processing, and the controller can read and write the parameter vectors in the memory matrix. The speech processing model may be an LSTM model, which is a special type of RNN that can learn long-term dependency information; LSTM avoids the long-term dependency problem through a deliberate design. Specifically, like other RNNs, LSTM has a chain of repeating neural network modules, but unlike a single neural network layer, each repeating module has a different structure: as shown in fig. 3, it may be composed of an input gate, a forget gate and an output gate that interact in a particular way, thereby alleviating the vanishing-gradient and exploding-gradient problems of RNNs.
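A minimal numpy sketch of the two-component structure described above (a controller plus an external memory matrix accessed through read and write heads) is given below. The dense-layer controller, the dimensions and the uniform addressing weights are assumptions chosen for brevity, not the implementation described here, which uses an LSTM controller.

```python
import numpy as np

class MemoryMatrix:
    """External memory matrix (the parameter matrix): N slots of width M."""
    def __init__(self, n_slots=128, width=64):
        self.data = np.zeros((n_slots, width))

    def read(self, weights):
        # Read head: attention weights over slots -> one read vector of length `width`.
        return weights @ self.data

    def write(self, weights, erase, add):
        # Write head: soft erase followed by soft add, as in a neural Turing machine.
        self.data = self.data * (1 - np.outer(weights, erase)) + np.outer(weights, add)

class Controller:
    """Stand-in for the LSTM controller; a single dense layer keeps the sketch short."""
    def __init__(self, in_dim, width, rng=np.random.default_rng(0)):
        self.w = rng.standard_normal((in_dim + width, width)) * 0.1

    def step(self, x, read_vector):
        # Combine the current speech vector with the vector read from memory.
        return np.tanh(np.concatenate([x, read_vector]) @ self.w)

# One step: the controller reads a parameter vector from memory, combines it with the
# current speech vector, and writes an updated vector back through the write head.
memory = MemoryMatrix()
controller = Controller(in_dim=40, width=64)
read_weights = np.full(128, 1 / 128)                      # uniform addressing, illustration only
hidden = controller.step(np.zeros(40), memory.read(read_weights))
memory.write(read_weights, erase=np.full(64, 0.1), add=hidden)
```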
Step S106: outputting the multiple pieces of text information.
In an optional scheme, according to the temporal characteristics of natural speech, natural speech data at multiple sampling moments within the preset time period can be acquired to obtain speech vectors at the multiple moments; a pre-trained neural Turing machine is then obtained, the speech vectors at the multiple moments are recognized by the neural Turing machine to obtain the corresponding text information, and the recognized text information is output.
According to the embodiment of the invention, the speech vectors at multiple moments within the preset time period are acquired, the speech vectors at the multiple moments are processed by using the preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, and the multiple pieces of text information are output, thereby realizing natural language processing. It is easy to note that, because the speech vectors at multiple moments within the preset time period are acquired and the preset speech model processes them based on pre-stored parameter vectors at the multiple moments, the natural speech is processed by exploiting its temporal characteristics and combining the memory matrix of a neural Turing machine with an LSTM model, which solves the technical problem of low processing efficiency of speech processing methods in the prior art. Therefore, the scheme provided by the embodiment of the invention can improve processing efficiency and processing accuracy while reducing processing complexity and processing time.
Optionally, in the foregoing embodiment of the present invention, step S104, processing the speech vectors at the multiple moments by using the preset speech model to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments, includes:
Step S1040: acquiring first parameter vectors at the multiple moments from the parameter matrix through a read operation.
Specifically, as shown in fig. 2, the neural Turing machine may include a read head and a write head, wherein the read head can read the W parameters of the LSTM model from the memory matrix, and the write head can write new W parameters into the memory matrix.
Step S1042: modifying the speech processing model by using the first parameter vectors at the multiple moments to obtain a modified speech processing model.
Step S1044: processing the speech vectors at the multiple moments by using the modified speech processing model to obtain the multiple pieces of text information.
In an optional scheme, after the speech vectors at the multiple moments are acquired, for the natural speech processing at each moment the read head may read a W parameter vector from the memory matrix and feed it into the LSTM model, thereby modifying the LSTM model to obtain a modified LSTM model; the speech vector at that moment may then be input into the modified LSTM model as the input vector to obtain the output vector of the LSTM model, namely the text information of that speech vector. After all the speech vectors at the multiple moments have been processed, the multiple pieces of text information corresponding to the speech vectors at the multiple moments are obtained.
Optionally, in the foregoing embodiment of the present invention, in step S1044, while processing the speech vectors at the multiple moments by using the modified speech processing model to obtain the multiple pieces of text information, the method further includes:
Step S1046: obtaining second parameter vectors at the multiple moments by using the modified speech processing model.
Optionally, in the foregoing embodiment of the present invention, step S1046, obtaining the second parameter vectors at the multiple moments by using the modified speech processing model, includes:
Step S10462: updating the first parameter vectors at the multiple moments by using the modified speech processing model to obtain the second parameter vectors at the multiple moments.
Step S1048: writing the second parameter vectors at the multiple moments into the parameter matrix through a write operation.
In an optional scheme, for the natural speech processing at each moment, processing the speech vector with the LSTM model yields not only the text information of the speech vector but also a new W parameter vector, and the new W parameter vector is written into the memory matrix by the write head to serve as the W parameter vector for the next moment.
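Steps S1040 to S1048 amount to one read-process-write cycle per moment. The sketch below illustrates that cycle under assumed interfaces: read_head, write_head, lstm_step and decode are hypothetical callables standing in for the neural Turing machine components, not functions defined by the patent.

```python
import numpy as np

def process_sequence(speech_vectors, memory, read_head, write_head, lstm_step, decode):
    """Per-moment loop: read the first parameter vector W, use it to modify the
    LSTM step, process the speech vector to get its text information, and write
    the second (updated) parameter vector back for the next moment."""
    texts = []
    state = None
    for x_t in speech_vectors:
        w_first = read_head(memory)                               # read operation (S1040)
        output, state, w_second = lstm_step(x_t, state, w_first)  # modified model (S1042/S1044/S1046)
        write_head(memory, w_second)                              # write operation (S1048)
        texts.append(decode(output))                              # text information for this moment
    return texts

# Toy demo with stand-in components (assumptions, not the patent's model).
mem = {"W": np.zeros(8)}
texts = process_sequence(
    speech_vectors=np.random.randn(10, 4),
    memory=mem,
    read_head=lambda m: m["W"],
    write_head=lambda m, w: m.update(W=w),
    lstm_step=lambda x, s, w: (x.sum() + w.sum(), s, w + 0.1),
    decode=lambda o: f"token_{o:.2f}",
)
print(texts[:3])
```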
Optionally, in the foregoing embodiment of the present invention, before step S104, processing the speech vectors at the multiple moments by using the preset speech model to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments, the method further includes:
Step S108: establishing an initial preset model, wherein the initial preset model includes a speech processing model and an initial parameter matrix.
Step S110: acquiring training data, wherein the training data includes a plurality of training speech vectors and the text information corresponding to each training speech vector.
Step S112: training the initial preset model according to the training data to obtain the preset speech model.
In an optional scheme, the LSTM model in the neural Turing machine may be established in advance according to actual processing requirements, the W parameter vectors in the memory matrix may be set to initial values, and the pre-established neural Turing machine may then be trained with the training data to obtain a neural Turing machine with higher accuracy.
Optionally, in the foregoing embodiment of the present invention, step S112, training the initial preset model according to the training data to obtain the preset speech model, includes:
Step S1122: inputting the training data into the speech processing model to obtain preset parameter vectors.
Step S1124: writing the preset parameter vectors into the initial parameter matrix through a write operation to obtain the parameter matrix.
In an optional scheme, in order to obtain a neural Turing machine with higher accuracy, the plurality of training speech vectors in the training data may be used as input vectors and the text information corresponding to each training speech vector as the target output vector; these input-output pairs are fed into the LSTM model to obtain the preset W parameter vectors of the LSTM model, and the preset W parameter vectors are written into the memory matrix through the write head, so that a neural Turing machine with higher accuracy is obtained.
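The training flow of steps S108-S112 and S1122-S1124 can be sketched as follows. The gradient-descent update, the linear stand-in for the LSTM and the function names are assumptions made for illustration; the text above only states that the training data is fed into the speech processing model and the resulting preset parameter vectors are written into the initial parameter matrix.

```python
import numpy as np

def train_initial_model(train_vectors, train_texts, lstm_forward, lstm_backward,
                        init_memory, epochs=5, lr=0.01):
    """Illustrative training loop: training speech vectors are the inputs, their text
    information the targets, and the resulting preset W parameter vectors are written
    back into the initial parameter matrix to obtain the trained parameter matrix."""
    w = init_memory.copy()                       # initial parameter matrix (memory matrix)
    for _ in range(epochs):
        for x, y in zip(train_vectors, train_texts):
            pred = lstm_forward(x, w)            # forward pass with current W parameters
            grad = lstm_backward(x, y, pred, w)  # gradient of the loss w.r.t. W
            w = w - lr * grad                    # updated preset parameter vector
    return w                                     # written back as the trained parameter matrix

# Toy demo with a linear stand-in for the LSTM (assumption, for illustration only).
rng = np.random.default_rng(0)
xs, ys = rng.standard_normal((50, 3)), rng.standard_normal(50)
w_trained = train_initial_model(
    xs, ys,
    lstm_forward=lambda x, w: x @ w,
    lstm_backward=lambda x, y, p, w: 2 * (p - y) * x,
    init_memory=np.zeros(3),
)
print(w_trained)
```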
Example 2
According to an embodiment of the present invention, there is provided an embodiment of a speech processing apparatus.
Fig. 4 is a schematic diagram of a speech processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes:
The first acquisition module 41 is configured to acquire speech vectors at a plurality of moments within a preset time period.
Optionally, in the above embodiment of the present invention, the apparatus further includes: a determining module, configured to determine the preset time period according to the processing capability of the preset speech model.
Specifically, the preset time period may be set according to the processing capability of the model, and the multiple moments may be multiple sampling moments at equal intervals. For example, if the preset time period is 100 s and the sampling interval is 10 s, speech vectors at 10 moments can be acquired within the 100 s.
The processing module 43 is configured to process the speech vectors at the multiple moments by using a preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, wherein the preset speech model processes the speech vectors at the multiple moments based on pre-stored parameter vectors at the multiple moments.
Optionally, in the above embodiment of the present invention, the preset speech model includes a speech processing model, which is configured to process the speech vectors at the multiple moments based on the parameter vectors at the multiple moments to obtain the multiple pieces of text information corresponding to the speech vectors at the multiple moments.
Optionally, in the above embodiment of the present invention, the speech processing model is an LSTM model, and the parameter matrix is a memory matrix.
Specifically, the preset speech model may be a neural Turing machine. As shown in fig. 2, the neural Turing machine includes two components: a controller (namely the speech processing model) and a memory matrix (namely the parameter matrix). The memory matrix is an external memory matrix that stores the parameter vectors required by the speech processing model for speech processing, and the controller can read and write the parameter vectors in the memory matrix. The speech processing model may be an LSTM model, which is a special type of RNN that can learn long-term dependency information; LSTM avoids the long-term dependency problem through a deliberate design. Specifically, like other RNNs, LSTM has a chain of repeating neural network modules, but unlike a single neural network layer, each repeating module has a different structure: as shown in fig. 3, it may be composed of an input gate, a forget gate and an output gate that interact in a particular way, thereby alleviating the vanishing-gradient and exploding-gradient problems of RNNs.
The output module 45 is configured to output the multiple pieces of text information.
In an optional scheme, according to the temporal characteristics of natural speech, natural speech data at multiple sampling moments within the preset time period can be acquired to obtain speech vectors at the multiple moments; a pre-trained neural Turing machine is then obtained, the speech vectors at the multiple moments are recognized by the neural Turing machine to obtain the corresponding text information, and the recognized text information is output.
According to the embodiment of the invention, the speech vectors at multiple moments within the preset time period are acquired, the speech vectors at the multiple moments are processed by using the preset speech model to obtain multiple pieces of text information corresponding to the speech vectors at the multiple moments, and the multiple pieces of text information are output, thereby realizing natural language processing. It is easy to note that, because the speech vectors at multiple moments within the preset time period are acquired and the preset speech model processes them based on pre-stored parameter vectors at the multiple moments, the natural speech is processed by exploiting its temporal characteristics and combining the memory matrix of a neural Turing machine with an LSTM model, which solves the technical problem of low processing efficiency of speech processing methods in the prior art. Therefore, the scheme provided by the embodiment of the invention can improve processing efficiency and processing accuracy while reducing processing complexity and processing time.
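A brief sketch of how the three modules of the apparatus compose is given below. The class and field names mirror the description above, but the wiring and the stand-in callables are assumptions for illustration only, not the apparatus implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class SpeechProcessingApparatus:
    """Sketch of the apparatus structure: acquisition, processing and output modules."""
    first_acquisition_module: Callable[[], Sequence]      # speech vectors in the preset period
    processing_module: Callable[[Sequence], List[str]]    # preset speech model (NTM with LSTM controller)
    output_module: Callable[[List[str]], None]            # emits the pieces of text information

    def run(self) -> List[str]:
        vectors = self.first_acquisition_module()
        texts = self.processing_module(vectors)
        self.output_module(texts)
        return texts

# Wiring the modules together with trivial stand-ins.
apparatus = SpeechProcessingApparatus(
    first_acquisition_module=lambda: [[0.0] * 4 for _ in range(10)],
    processing_module=lambda vectors: [f"text_{i}" for i, _ in enumerate(vectors)],
    output_module=print,
)
apparatus.run()
```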
Optionally, in the above embodiment of the present invention, the processing module 43 includes:
An acquisition submodule, configured to acquire first parameter vectors at the multiple moments from the parameter matrix through a read operation.
Specifically, as shown in fig. 2, the neural Turing machine may include a read head and a write head, wherein the read head can read the W parameters of the LSTM model from the memory matrix, and the write head can write new W parameters into the memory matrix.
A modification submodule, configured to modify the speech processing model by using the first parameter vectors at the multiple moments to obtain a modified speech processing model.
A first processing submodule, configured to process the speech vectors at the multiple moments by using the modified speech processing model to obtain the multiple pieces of text information.
In an optional scheme, after the speech vectors at the multiple moments are acquired, for the natural speech processing at each moment the read head may read a W parameter vector from the memory matrix and feed it into the LSTM model, thereby modifying the LSTM model to obtain a modified LSTM model; the speech vector at that moment may then be input into the modified LSTM model as the input vector to obtain the output vector of the LSTM model, namely the text information of that speech vector. After all the speech vectors at the multiple moments have been processed, the multiple pieces of text information corresponding to the speech vectors at the multiple moments are obtained.
Optionally, in the above embodiment of the present invention, the processing module 43 further includes:
A second processing submodule, configured to obtain second parameter vectors at the multiple moments by using the modified speech processing model.
Optionally, in the foregoing embodiment of the present invention, the second processing submodule is further configured to update the first parameter vectors at the multiple moments by using the modified speech processing model to obtain the second parameter vectors at the multiple moments.
A first storage submodule, configured to write the second parameter vectors at the multiple moments into the parameter matrix through a write operation.
In an optional scheme, for the natural speech processing at each moment, processing the speech vector with the LSTM model yields not only the text information of the speech vector but also a new W parameter vector, and the new W parameter vector is written into the memory matrix by the write head to serve as the W parameter vector for the next moment.
Optionally, in the above embodiment of the present invention, the apparatus further includes:
An establishing module, configured to establish an initial preset model, wherein the initial preset model includes a speech processing model and an initial parameter matrix.
A second acquisition module, configured to acquire training data, wherein the training data includes a plurality of training speech vectors and the text information corresponding to each training speech vector.
A training module, configured to train the initial preset model according to the training data to obtain the preset speech model.
In an optional scheme, the LSTM model in the neural Turing machine may be established in advance according to actual processing requirements, the W parameter vectors in the memory matrix may be set to initial values, and the pre-established neural Turing machine may then be trained with the training data to obtain a neural Turing machine with higher accuracy.
Optionally, in the above embodiment of the present invention, the training module includes:
A third processing submodule, configured to input the training data into the speech processing model to obtain preset parameter vectors.
A second storage submodule, configured to write the preset parameter vectors into the initial parameter matrix through a write operation to obtain the parameter matrix.
In an optional scheme, in order to obtain a neural Turing machine with higher accuracy, the plurality of training speech vectors in the training data may be used as input vectors and the text information corresponding to each training speech vector as the target output vector; these input-output pairs are fed into the LSTM model to obtain the preset W parameter vectors of the LSTM model, and the preset W parameter vectors are written into the memory matrix through the write head, so that a neural Turing machine with higher accuracy is obtained.
Example 3
According to an embodiment of the present invention, there is provided a storage medium including a stored program, wherein, when the program runs, a device in which the storage medium is located is controlled to execute the speech processing method in embodiment 1 above.
Example 4
According to an embodiment of the present invention, there is provided a processor configured to run a program, wherein the program, when run, executes the speech processing method in embodiment 1 above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection according to some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, or a magnetic or optical disk.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (11)

CN201710633042.2A | Priority date 2017-07-28 | Filing date 2017-07-28 | Voice processing method and device, storage medium and processor | Active | Granted as CN109308896B (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN201710633042.2A (CN109308896B) | 2017-07-28 | 2017-07-28 | Voice processing method and device, storage medium and processor
PCT/CN2018/079848 (WO2019019667A1) | 2017-07-28 | 2018-03-21 | Speech processing method and apparatus, storage medium and processor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710633042.2A (CN109308896B) | 2017-07-28 | 2017-07-28 | Voice processing method and device, storage medium and processor

Publications (2)

Publication Number | Publication Date
CN109308896A (en) | 2019-02-05
CN109308896B (en) | 2022-04-15

Family

ID=65040955

Family Applications (1)

Application Number | Priority Date | Filing Date
CN201710633042.2A (Active; CN109308896B (en)) | 2017-07-28 | 2017-07-28

Country Status (2)

Country | Link
CN (1) | CN109308896B (en)
WO (1) | WO2019019667A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112489630A (en) * | 2019-09-12 | 2021-03-12 | 武汉Tcl集团工业研究院有限公司 | A kind of speech recognition method and device
CN113095559B (en) * | 2021-04-02 | 2024-04-09 | 京东科技信息技术有限公司 | Method, device, equipment and storage medium for predicting hatching time
CN113836270A (en) * | 2021-09-28 | 2021-12-24 | 深圳格隆汇信息科技有限公司 | Big data processing method and related product


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US6021403A (en)*1996-07-192000-02-01Microsoft CorporationIntelligent user assistance facility
CN1295672C (en)*2002-03-272007-01-17诺基亚有限公司Pattern recognition
JP2010204391A (en)*2009-03-032010-09-16Nippon Telegr & Teleph Corp <Ntt>Voice signal modeling method, signal recognition device and method, parameter learning device and method, and feature value generating device, method, and program
US9378729B1 (en)*2013-03-122016-06-28Amazon Technologies, Inc.Maximum likelihood channel normalization
CN105989839B (en)*2015-06-032019-12-13乐融致新电子科技(天津)有限公司Speech recognition method and device
DE102015211101B4 (en)*2015-06-172025-02-06Volkswagen Aktiengesellschaft Speech recognition system and method for operating a speech recognition system with a mobile unit and an external server

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101123090A (en) * | 2006-08-11 | 2008-02-13 | 哈曼贝克自动系统股份有限公司 | Speech recognition by statistical language using square-root discounting
WO2013054347A2 (en) * | 2011-07-20 | 2013-04-18 | Tata Consultancy Services Limited | A method and system for detecting boundary of coarticulated units from isolated speech
CN105070300A (en) * | 2015-08-12 | 2015-11-18 | 东南大学 | Voice emotion characteristic selection method based on speaker standardization change
CN106157950A (en) * | 2016-09-29 | 2016-11-23 | 合肥华凌股份有限公司 | Speech control system and awakening method, Rouser and household electrical appliances, coprocessor

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
《Deep Convolutional Neural Networks with Layer-wise》; D. Yu et al.; Proc. Interspeech; 2016-12-31; entire document *
《基于正交的神经网络在语音识别上的应用研究》 (Research on the application of orthogonality-based neural networks in speech recognition); 倪蔚民; 《电脑开发与应用》 (Computer Development & Applications); 1998-12-31; entire document *
《神经网络—CNN结构和语音识别应用》 (Neural networks: CNN structure and speech recognition applications); xumcas; CSDN; 2017-01-31; pages 3-4, Section 4, speech applications *
《面向自然语音处理的深度学习研究》 (Research on deep learning for natural speech processing); 奚雪峰 et al.; 《自动化学报》 (Acta Automatica Sinica); 2016-10-31; entire document *

Also Published As

Publication number | Publication date
CN109308896A (en) | 2019-02-05
WO2019019667A1 (en) | 2019-01-31

Similar Documents

Publication | Title
CN110457994B (en) | Face image generation method and device, storage medium and computer equipment
CN109002433B (en) | Text generation method and device
Engesser et al. | Meaningful call combinations and compositional processing in the southern pied babbler
CN111104799B (en) | Text information characterization method, system, computer equipment and storage medium
US9824681B2 (en) | Text-to-speech with emotional content
CN112837676B (en) | Statement generation method, statement generation device and intelligent device
US11010554B2 (en) | Method and device for identifying specific text information
CN109308896B (en) | Voice processing method and device, storage medium and processor
CN109559221A (en) | Collection method, apparatus and storage medium based on user data
CN108682420A (en) | A kind of voice and video telephone accent recognition method and terminal device
CN108346107B (en) | Social content risk identification method, device and equipment
CN107729314A (en) | A kind of Chinese time recognition methods, device and storage medium, program product
CN107229733B (en) | Extended question evaluation method and device
EP3101598A3 (en) | Augmented neural networks
CN104077930A (en) | Powerful memorizing and learning method based on mobile terminal and mobile terminal
CN107657313B (en) | System and method for transfer learning of natural language processing task based on field adaptation
CN110990556B (en) | Idiom recommendation method and device, training method and device of idiom recommendation model
CN106897265A (en) | Term vector training method and device
CN111949786B (en) | Intelligent question-answering model optimization method and device
KR102745272B1 (en) | Device and method for emotion transplantation
CN107577654A (en) | E-book color matching method, electronic equipment and storage medium based on front cover analysis
CN109255106A (en) | A kind of text handling method and terminal
CN113255357A (en) | Data processing method, target recognition model training method, target recognition method and device
CN110232116B (en) | Method and device for adding expressions in reply sentence
CN115795028B (en) | Intelligent document generation method and system

Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
2022-03-10 | TA01 | Transfer of patent application right | Effective date of registration: 2022-03-10. Address after: 215000 Floor 9, Building 5, Asia Pacific Plaza, No. 18 Zhaofeng Road, Huaqiao Town, Kunshan City, Suzhou City, Jiangsu Province. Applicant after: Jiangsu Huitong Jinke Data Co., Ltd. Address before: 518000 Guangdong, Shenzhen, Nanshan District, Nanhai Road, West Guangxi Temple Road North, Sunshine Huayi Building 1, 15D-02F. Applicant before: SHEN ZHEN KUANG-CHI HEZHONG TECHNOLOGY Ltd.; Shenzhen Guangqi Innovation Technology Co., Ltd.
| GR01 | Patent grant |
