CN106486130B - Noise elimination and voice recognition method and device - Google Patents

Noise elimination and voice recognition method and device

Info

Publication number
CN106486130B
Authority
CN
China
Prior art keywords
voiceprint
audio data
processed
parameter
original audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510524909.1A
Other languages
Chinese (zh)
Other versions
CN106486130A (en
Inventor
李士岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510524909.1A
Priority to PCT/CN2015/095364 (WO2017031846A1)
Publication of CN106486130A
Application granted
Publication of CN106486130B
Legal status: Active
Anticipated expiration

Abstract

Embodiments of the invention provide a noise elimination and voice recognition method and device. The noise elimination method performs voiceprint matching on the acquired original audio data to be processed based on specific voiceprint parameters, and obtains effective audio data from the original audio data to be processed according to the voiceprint matching result. No additional sound collection device is needed to collect other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source corresponding to the voice signal and the two microphones causes the voice signal to be suppressed to the same degree as the noise signal, so the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.

Description

Noise elimination and voice recognition method and device
[ technical field ]
The present invention relates to noise processing technologies, and in particular, to a method and an apparatus for noise cancellation and speech recognition.
[ background of the invention ]
With the rapid development of sound processing technology, terminals place increasingly high requirements on the quality of the sound to be processed, and noise reduction technology has developed accordingly. Current noise reduction technology mainly uses dual microphones for active noise reduction: a certain algorithm performs noise suppression processing based on the audio data collected by one microphone and the audio data collected by the other microphone (each corresponding to a noise signal and a speech signal with strong signal strength).
However, if the distance between the signal source (for example, human mouth) corresponding to the voice signal and the two microphones varies, the voice signal may be determined as noise, so that the voice signal is also suppressed to the same extent as the noise signal, the sound quality after noise reduction is seriously affected, and the reliability of noise reduction is reduced.
[ summary of the invention ]
Aspects of the present invention provide a noise cancellation and speech recognition method and apparatus for improving the reliability of noise reduction.
In one aspect of the present invention, a noise cancellation method is provided, including:
based on the specific voiceprint parameters, carrying out voiceprint matching on the acquired original audio data to be processed;
and obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.
The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a target user, and
The obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching comprises:
and acquiring audio data successfully matched with the voiceprint from the original audio data to be processed as the effective audio data.
The above-described aspect and any possible implementation manner further provide an implementation manner, before the voiceprint matching is performed on the obtained original audio data to be processed based on the specific voiceprint parameters, the method further includes:
acquiring a voice signal of the target user;
and acquiring the voiceprint parameters of the target user based on the acquired voice signal of the target user.
The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment, and
The obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching comprises:
and removing the audio data with successful voiceprint matching from the original audio data to be processed to serve as the effective audio data.
The above-described aspect and any possible implementation manner further provide an implementation manner, before the voiceprint matching is performed on the obtained original audio data to be processed based on the specific voiceprint parameters, the method further includes:
acquiring a noise signal of the target environment;
based on the acquired noise signal of the target environment, obtaining a voiceprint parameter of the noise signal.
In another aspect of the present invention, there is provided a noise removing apparatus including:
the voiceprint matching unit is used for carrying out voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters;
and the effective audio data acquisition unit is used for acquiring effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.
The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a target user, and
The effective audio data acquisition unit is used for acquiring audio data successfully matched with the voiceprint from the original audio data to be processed as the effective audio data.
The above-described aspect and any possible implementation further provide an implementation, where the noise cancellation apparatus further includes:
the voice signal acquisition unit is used for acquiring a voice signal of the target user;
a first voiceprint parameter obtaining unit, configured to obtain a voiceprint parameter of the target user based on the obtained voice signal of the target user.
The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment, and
The effective audio data acquisition unit is used for removing audio data successfully matched with the voiceprint from the original audio data to be processed to serve as the effective audio data.
The above-described aspect and any possible implementation further provide an implementation, where the noise cancellation apparatus further includes:
a noise signal acquisition unit for acquiring a noise signal of the target environment;
a second voiceprint parameter obtaining unit, configured to obtain a voiceprint parameter of the noise signal based on the obtained noise signal of the target environment.
In another aspect of the present invention, a speech recognition method is provided, including:
acquiring original audio data to be processed;
based on specific voiceprint parameters, carrying out voiceprint matching on the acquired original audio data to be processed;
obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching;
and carrying out voice recognition processing on the effective audio data.
The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a target user, and
The obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching comprises:
and acquiring audio data successfully matched with the voiceprint from the original audio data to be processed as the effective audio data.
The above-described aspect and any possible implementation manner further provide an implementation manner, before the voiceprint matching is performed on the acquired original audio data to be processed based on the specific voiceprint parameters, the method further includes:
acquiring a voice signal of the target user;
and acquiring the voiceprint parameters of the target user based on the acquired voice signal of the target user.
The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment, and
The obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching comprises:
and removing the audio data with successful voiceprint matching from the original audio data to be processed to serve as the effective audio data.
The above-described aspect and any possible implementation manner further provide an implementation manner, before the voiceprint matching is performed on the acquired original audio data to be processed based on the specific voiceprint parameters, the method further includes:
acquiring a noise signal of the target environment;
based on the acquired noise signal of the target environment, obtaining a voiceprint parameter of the noise signal.
In another aspect of the present invention, there is provided a speech recognition apparatus including:
the original audio data acquisition unit is used for acquiring original audio data to be processed;
the noise cancellation device as described above;
and the voice recognition unit is used for carrying out voice recognition processing on the effective audio data.
As can be seen from the foregoing technical solutions, on the one hand, the embodiments of the present invention perform voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameter, so that effective audio data can be obtained from the original audio data to be processed according to the voiceprint matching result. No additional sound collection device is required to collect other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source corresponding to the speech signal and the two microphones causes the speech signal to be suppressed to the same extent as the noise signal, so the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.
As can be seen from the foregoing technical solutions, on the other hand, in the embodiments of the present invention, original audio data to be processed is obtained, and then voiceprint matching is performed on the obtained original audio data to be processed based on a specific voiceprint parameter, so that valid audio data can be obtained from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching, and voice recognition processing is performed on the valid audio data.
In addition, by adopting the technical scheme provided by the invention, only one sound acquisition device is needed, and the cost can be effectively reduced.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the embodiments or the prior art descriptions will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without inventive labor.
Fig. 1 is a schematic flow chart of a noise cancellation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is the voiceprint parameter of the target user in the embodiment corresponding to FIG. 1;
FIG. 3 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment in the embodiment corresponding to FIG. 1;
FIG. 4 is a flowchart illustrating a speech recognition method according to another embodiment of the present invention;
fig. 5 is a schematic structural diagram of a noise cancellation apparatus according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the noise cancellation apparatus in the case where the specific voiceprint parameter is the voiceprint parameter of the target user in the embodiment corresponding to FIG. 5;
FIG. 7 is a schematic structural diagram of a noise cancellation apparatus in a case where the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment in the embodiment corresponding to FIG. 5;
fig. 8 is a schematic structural diagram of a speech recognition apparatus according to another embodiment of the present invention.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terminal according to the embodiment of the present invention may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), a Personal Computer (PC), an MP3 player, an MP4 player, a wearable device (e.g., smart glasses, smart watch, smart bracelet, etc.), and the like.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 is a schematic flow chart of a noise cancellation method according to an embodiment of the present invention, as shown in fig. 1.
101. Perform voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters.
102. Obtain effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.
It should be noted that part or all of the execution subjects of 101 to 102 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, which is not particularly limited in this embodiment.
It is to be understood that the application may be a native app (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, and this embodiment is not particularly limited thereto.
Thus, voiceprint matching is performed on the acquired original audio data to be processed based on the specific voiceprint parameters, so that effective audio data can be obtained from the original audio data to be processed according to the voiceprint matching result. No additional sound collection device is required to collect other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source corresponding to the voice signal and the two microphones causes the voice signal to be suppressed to the same degree as the noise signal, so the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.
In the present invention, the original audio data to be processed can be acquired by using a sound collection device. The sound collection device may be a microphone or the like that is built into or external to the terminal, which is not particularly limited in this embodiment.
Specifically, a sound collection device can be used to collect a sound signal including a speech signal to be processed by the terminal. Typically, the sound signal may be contaminated with noise signals. The collected sound signals may then be converted into raw audio data to be processed.
Specifically, the so-called raw audio data to be processed is a digital signal converted from an audio signal. For example, the sound signal may be specifically sampled, quantized, and encoded to obtain Pulse Code Modulation (PCM) data as original audio data to be processed.
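The sampling, quantization, and encoding steps can be illustrated with a minimal sketch. It assumes a 16 kHz sampling rate, 16-bit signed little-endian samples, and a synthetic 440 Hz tone standing in for the collected sound signal; none of these values come from the patent, and the helper name `to_pcm16` is hypothetical.

```python
import math
import struct

def to_pcm16(samples):
    """Quantize floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # uniform quantization to 16 bits
    # Encoding: pack the integers as little-endian 16-bit words.
    return struct.pack("<%dh" % len(ints), *ints)

# "Sampling": evaluate a 440 Hz tone at an assumed 16 kHz rate for 10 ms.
SAMPLE_RATE = 16000  # assumption, not specified in the source
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(160)]
pcm = to_pcm16(tone)  # 160 samples -> 320 bytes of PCM data
```

In a real terminal the samples would come from the microphone driver rather than a synthetic tone.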
In this embodiment, no additional sound collection device is needed to collect auxiliary audio data; a single sound collection device is sufficient to collect the original audio data to be processed, which effectively reduces cost.
Optionally, in a possible implementation manner of this embodiment, in 101, the original audio data to be processed may be subjected to framing processing to obtain at least one frame of data, and each frame of data in the at least one frame of data may then be subjected to audio analysis processing to obtain a voiceprint feature of each frame of data. The voiceprint features of the original audio data to be processed are then matched against the specific voiceprint parameters. If the two are consistent, the matching succeeds; if they are not consistent, the matching fails.
The term "match" may mean all match, i.e., complete match, or may mean partial match, and this embodiment is not particularly limited thereto.
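One way to make "complete" versus "partial" matching concrete is a similarity score with a threshold. The sketch below is an assumption for illustration only: the patent does not name a similarity measure, and cosine similarity, the threshold value 0.9, and the function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_a * norm_b)

def voiceprint_matches(feature, reference, threshold=0.9):
    """Return True when a frame's feature is close enough to the reference.

    A threshold near 1.0 approximates a complete match; a lower threshold
    accepts partial matches. The default 0.9 is an illustrative assumption.
    """
    return cosine_similarity(feature, reference) >= threshold
```

Lowering the threshold trades fewer missed frames for more false matches, so in practice it would have to be tuned on recorded data.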
Specifically, the raw audio data to be processed may be subjected to framing processing at preset time intervals, for example, 20ms, and there is partial data overlap between adjacent frames, for example, 50% data overlap, so that at least one frame of data of the raw audio data to be processed can be obtained.
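The framing step can be sketched as follows, using the 20 ms frame length and 50% overlap given as examples above. The 16 kHz sampling rate, the function name `split_into_frames`, and the choice to drop a trailing partial frame are assumptions made for the sketch.

```python
def split_into_frames(samples, sample_rate=16000, frame_ms=20, overlap=0.5):
    """Split a sample sequence into fixed-length, overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop = max(1, int(frame_len * (1.0 - overlap)))   # step between frame starts
    frames = []
    start = 0
    # Assumption: a trailing partial frame is dropped rather than padded.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames

# One second of audio at 16 kHz gives 320-sample frames every 160 samples.
frames = split_into_frames(list(range(16000)))
```

With 50% overlap, the second half of each frame is identical to the first half of the next one.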
The so-called voiceprint feature is a feature specific to audio data, and refers to a content-based digital signature that can represent important acoustic features of a piece of audio data, and the main purpose of the voiceprint feature is to establish an effective mechanism for comparing the perceptual auditory quality of two pieces of audio data. Note that here, rather than directly comparing the typically large audio data itself, their corresponding typically smaller voiceprint features are compared.
In one particular implementation, the voiceprint features can include, but are not limited to, acoustic features related to the anatomy of a human pronunciation mechanism, such as spectrum, cepstrum, formants, pitch, reflection coefficients, and the like.
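As an illustration of a spectrum-based feature, the sketch below computes a magnitude spectrum with a naive discrete Fourier transform and summarizes it as log energies in a few equal-width bands. The band layout, the number of bands, and the function names are assumptions; the patent lists spectrum among possible features but does not prescribe this particular computation.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 of a real-valued frame (naive O(n^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum

def band_log_energies(frame, num_bands=8):
    """Compact feature vector: log energy in equal-width frequency bands."""
    spec = magnitude_spectrum(frame)
    band_size = max(1, len(spec) // num_bands)
    feats = []
    for b in range(num_bands):
        band = spec[b * band_size:(b + 1) * band_size]
        energy = sum(m * m for m in band)
        feats.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return feats

# A pure tone at DFT bin 4 of a 64-sample frame concentrates its energy in one band.
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
feats = band_log_energies(frame)
```

A production system would normally use a fast Fourier transform instead of the naive loop; the result is the same, only the cost differs.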
Optionally, in a possible implementation manner of this embodiment, before 101, the specific voiceprint parameter may be further set to serve as a reference parameter for voiceprint matching. Specifically, the specific voiceprint parameter may be a voiceprint parameter of the target user, or may also be a voiceprint parameter of a noise signal of the target environment, which is not particularly limited in this embodiment. The following describes in detail the noise cancellation method provided in this embodiment when the two specific voiceprint parameters are the voiceprint parameter of the target user and the voiceprint parameter of the noise signal of the target environment, respectively.
Fig. 2 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is the voiceprint parameter of the target user in the embodiment corresponding to fig. 1, as shown in fig. 2.
201. Perform voiceprint matching on the acquired original audio data to be processed based on the voiceprint parameters of the target user.
Optionally, in a possible implementation manner of this embodiment, before 201, a voice signal of the target user may be further obtained, and then, based on the obtained voice signal of the target user, a voiceprint parameter of the target user may be obtained.
Specifically, the voice signal of the target user may be sampled, quantized, and encoded to obtain PCM data as user audio data. Then, the user audio data may be subjected to framing processing to obtain at least one frame of data, and further, each frame of data in the at least one frame of data is subjected to audio analysis processing to obtain a voiceprint parameter of each frame of data.
For example, the user audio data may be subjected to framing processing at a preset time interval, for example, 20ms, and there is a partial data overlap between adjacent frames, for example, 50% data overlap, so that at least one frame of data of the user audio data can be obtained.
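The text obtains a voiceprint parameter for each frame of the user audio data but does not say how the per-frame values are combined into a reference. One simple possibility, shown here purely as an assumption, is to average the per-frame feature vectors; the function name `enroll_voiceprint` is hypothetical.

```python
def enroll_voiceprint(frame_features):
    """Average per-frame feature vectors into a single reference vector.

    Averaging is an illustrative assumption; the source only states that a
    voiceprint parameter is obtained for each frame.
    """
    if not frame_features:
        raise ValueError("need at least one frame of enrollment audio")
    dim = len(frame_features[0])
    totals = [0.0] * dim
    for feat in frame_features:
        if len(feat) != dim:
            raise ValueError("all feature vectors must have the same length")
        for i, value in enumerate(feat):
            totals[i] += value
    return [t / len(frame_features) for t in totals]

# Two frames with two-dimensional features yield their element-wise mean.
reference = enroll_voiceprint([[1.0, 2.0], [3.0, 4.0]])
```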
202. Acquire the audio data whose voiceprint is successfully matched from the original audio data to be processed as the effective audio data.
In this implementation, the specific voiceprint parameter refers to a voiceprint parameter of the voice signal of the target user obtained according to the voice signal of the target user. Therefore, the voiceprint feature successfully matched can be considered as the voiceprint feature corresponding to the voice signal sent by the target user using the terminal.
Fig. 3 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment in the embodiment corresponding to fig. 1, as shown in fig. 3.
301. Perform voiceprint matching on the acquired original audio data to be processed based on the voiceprint parameters of the noise signal of the target environment.
Optionally, in a possible implementation manner of this embodiment, before 301, a noise signal of the target environment may be further obtained, and then, a voiceprint parameter of the noise signal may be obtained based on the obtained noise signal of the target environment.
Specifically, the noise signal of the target environment may be sampled, quantized, and encoded to obtain PCM data as the environmental audio data. Then, the environmental audio data may be subjected to framing processing to obtain at least one frame of data, and further, each frame of data in the at least one frame of data is subjected to audio analysis processing to obtain a voiceprint parameter of each frame of data.
For example, the environmental audio data may be subjected to framing processing at a preset time interval, for example, 20ms, and there is a partial data overlap between adjacent frames, for example, 50% data overlap, so that at least one frame of data of the environmental audio data can be obtained.
302. Remove the audio data whose voiceprint is successfully matched from the original audio data to be processed, and take the remaining audio data as the effective audio data.
In this implementation, the specific voiceprint parameter refers to a voiceprint parameter of a noise signal of a target environment obtained according to the noise signal of the target environment. Therefore, the successfully matched voiceprint feature can be considered as the voiceprint feature corresponding to the noise signal generated in the target environment where the terminal is located.
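The two variants of Fig. 2 and Fig. 3 differ only in what happens to frames whose voiceprint matches the reference: they are kept when the reference is the target user and removed when the reference is the noise of the target environment. A minimal sketch of that selection follows; the function name, the string labels, and the per-frame boolean match results are assumptions made for illustration.

```python
def select_effective_frames(frames, matched, reference_kind):
    """Pick effective frames given per-frame voiceprint match results.

    reference_kind is "target_user" (keep matched frames, as in 202) or
    "target_noise" (drop matched frames, as in 302). Labels are hypothetical.
    """
    if len(frames) != len(matched):
        raise ValueError("need exactly one match result per frame")
    if reference_kind == "target_user":
        return [f for f, m in zip(frames, matched) if m]
    if reference_kind == "target_noise":
        return [f for f, m in zip(frames, matched) if not m]
    raise ValueError("unknown reference kind: %r" % (reference_kind,))

frames = ["a", "b", "c"]          # placeholders standing in for audio frames
matched = [True, False, True]     # per-frame voiceprint match results
kept_for_user = select_effective_frames(frames, matched, "target_user")
kept_for_noise = select_effective_frames(frames, matched, "target_noise")
```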
It is to be understood that at least one empirical parameter may be used as the specific voiceprint parameter in addition to the two specific voiceprint parameters described above.
It should be noted that after obtaining the specific voiceprint parameters, the obtained specific voiceprint parameters need to be further processed by storage. Specifically, the obtained specific voiceprint parameters may be stored in a storage device of the terminal.
In a specific implementation process, the storage device of the terminal may be a slow storage device, specifically, a hard disk of a computer system, or may also be a non-operating Memory of a mobile phone, that is, a physical Memory, such as a Read-Only Memory (ROM), a Memory card, and the like, which is not limited in this embodiment.
In another specific implementation process, the storage device of the terminal may also be a fast storage device, specifically, a Memory of a computer system, or may also be a running Memory of a mobile phone, that is, a system Memory, for example, a Random Access Memory (RAM), and the like, which is not particularly limited in this embodiment.
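The slow and fast storage options can be sketched with a file on disk and an in-memory cache. The JSON file format, the dictionary cache, and the function names are assumptions for illustration; the patent does not specify a storage format.

```python
import json

def save_voiceprint(path, params):
    """Persist voiceprint parameters to slow storage (a file) as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"voiceprint": list(params)}, f)

def load_voiceprint(path, cache=None):
    """Load parameters, serving repeat requests from an optional in-memory cache."""
    if cache is not None and path in cache:
        return cache[path]          # fast path: already in memory
    with open(path, "r", encoding="utf-8") as f:
        params = json.load(f)["voiceprint"]
    if cache is not None:
        cache[path] = params        # remember for later lookups
    return params
```

Passing the same dictionary as `cache` on every call means the file is read only once per path.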
Optionally, in a possible implementation manner of this embodiment, after 102, speech recognition processing may be further performed on the valid audio data.
The effective audio data is the audio data extracted from the original audio data to be processed according to the specific voiceprint parameters, and the audio data can be regarded as voice signals of users using the terminal, so that the effective audio data does not contain noise signals any more, and the sound quality is effectively improved.
Furthermore, the effective audio data is subjected to voice recognition processing, and the obtained recognition result is high in accuracy.
In this embodiment, voiceprint matching is performed on the acquired original audio data to be processed based on the specific voiceprint parameter, so that effective audio data can be obtained from the original audio data to be processed according to the voiceprint matching result. No additional sound collection device is required to collect other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source corresponding to the speech signal and the two microphones causes the speech signal to be suppressed to the same degree as the noise signal, so the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.
In addition, by adopting the technical scheme provided by the invention, only one sound acquisition device is needed, and the cost can be effectively reduced.
Fig. 4 is a flowchart illustrating a speech recognition method according to another embodiment of the present invention, as shown in fig. 4.
401. Acquire original audio data to be processed.
402. Perform voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters.
403. Obtain effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.
404. Perform voice recognition processing on the effective audio data.
It should be noted that part or all of the execution subjects of 401 to 404 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, which is not particularly limited in this embodiment.
It is to be understood that the application may be a native app (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, and this embodiment is not particularly limited thereto.
In the present invention, details of 402 and 403 may refer to relevant contents in the embodiments corresponding to fig. 1 to fig. 3, and are not described herein again.
In this embodiment, by acquiring original audio data to be processed, and further performing voiceprint matching on the acquired original audio data to be processed based on a specific voiceprint parameter, effective audio data can be acquired from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching, and voice recognition processing is performed on the effective audio data.
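Steps 401 to 404 can be sketched as a small pipeline in which the matching rule and the recognizer are passed in as functions. Both callables, the function name `recognize_speech`, and the toy string frames are assumptions; the sketch shows the target-user variant, in which matched frames are kept, and the recognizer is a stand-in rather than a real speech recognition engine.

```python
def recognize_speech(frames, is_target_frame, recognizer):
    """Run the 401-404 flow on already framed audio.

    frames:          output of step 401 (acquired audio, split into frames)
    is_target_frame: hypothetical callable giving the per-frame match result
    recognizer:      hypothetical callable standing in for a recognition engine
    """
    # 402: voiceprint matching on each frame.
    results = [is_target_frame(frame) for frame in frames]
    # 403: keep the frames whose voiceprint matched (target-user variant).
    effective = [frame for frame, ok in zip(frames, results) if ok]
    # 404: hand the effective audio data to the recognizer.
    return recognizer(effective)

# Toy stand-ins: frames are labelled strings, "u" marks the target user's speech.
text = recognize_speech(
    ["u1", "n1", "u2"],
    is_target_frame=lambda frame: frame.startswith("u"),
    recognizer=lambda frames: " ".join(frames),
)
```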
Furthermore, the effective audio data is subjected to voice recognition processing, and the obtained recognition result is high in accuracy.
In addition, by adopting the technical scheme provided by the invention, only one sound acquisition device is needed, and the cost can be effectively reduced.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
Fig. 5 is a schematic structural diagram of a noise cancellation device according to another embodiment of the present invention, as shown in fig. 5. The noise cancellation device of the present embodiment may include a voiceprint matching unit 51 and an effective audio data acquiring unit 52. The voiceprint matching unit 51 is configured to perform voiceprint matching on the acquired original audio data to be processed based on a specific voiceprint parameter; and the effective audio data acquiring unit 52 is configured to obtain effective audio data from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching.
It should be noted that, part or all of the noise cancellation apparatus provided in this embodiment may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system on the network side, which is not particularly limited in this embodiment.
It is to be understood that the application may be a native application (nativeApp) installed on the terminal, or may be a web application (webApp) running in a browser on the terminal, and this embodiment is not particularly limited thereto.
Optionally, in a possible implementation manner of this embodiment, the specific voiceprint parameter is a voiceprint parameter of the target user; accordingly, the effective audio data acquiring unit 52 may be specifically configured to obtain, from the original audio data to be processed, the audio data whose voiceprint is successfully matched, as the effective audio data.
Optionally, in a possible implementation manner of this embodiment, as shown in fig. 6, the noise cancellation device provided in this embodiment may further include:
a voice signal acquiring unit 61, configured to acquire a voice signal of the target user; and
a first voiceprint parameter obtaining unit 62, configured to obtain a voiceprint parameter of the target user based on the obtained voice signal of the target user.
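How a voiceprint parameter is derived from an acquired signal is not specified in this embodiment. The following is a minimal sketch under stated assumptions: the signal is split into fixed-length frames, each frame is described by the normalized magnitudes of its first few non-DC DFT bins, and the voiceprint parameter is the average of those per-frame vectors. All function names, `frame_len` and `n_bins` are hypothetical, and real voiceprint features are considerably richer than this toy feature.

```python
import cmath
import math
from typing import List, Sequence


def frame_features(frame: Sequence[float], n_bins: int = 8) -> List[float]:
    """Toy feature: L2-normalized magnitudes of DFT bins 1..n_bins."""
    n = len(frame)
    mags = []
    for k in range(1, n_bins + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    norm = math.sqrt(sum(m * m for m in mags))
    return [m / norm for m in mags] if norm > 0 else mags


def split_frames(signal: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Cut the signal into consecutive non-overlapping frames, dropping the tail."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]


def derive_voiceprint(signal: Sequence[float], frame_len: int = 64, n_bins: int = 8) -> List[float]:
    """Average the per-frame feature vectors into one voiceprint parameter."""
    frames = split_frames(signal, frame_len)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    feats = [frame_features(f, n_bins) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]
```

The same averaging would apply to any enrollment recording, so the sketch does not depend on whether the recorded source is a speaker or some other sound.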
Optionally, in a possible implementation manner of this embodiment, the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment; accordingly, the effective audio data acquiring unit 52 may be specifically configured to remove, from the original audio data to be processed, the audio data whose voiceprint is successfully matched, and to use the remaining audio data as the effective audio data.
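The two selection policies of the effective audio data acquiring unit 52 differ only in what is done with the matched audio data: it is kept when the voiceprint parameter belongs to the target user, and removed when it belongs to the environmental noise. A minimal sketch, assuming the matching result is available as one boolean flag per frame (the names `VoiceprintKind` and `select_valid_audio` are hypothetical):

```python
from enum import Enum
from typing import List, Sequence, TypeVar

T = TypeVar("T")


class VoiceprintKind(Enum):
    """Which source the specific voiceprint parameter describes."""
    TARGET_USER = "target_user"
    ENVIRONMENT_NOISE = "environment_noise"


def select_valid_audio(
    frames: Sequence[T], matched: Sequence[bool], kind: VoiceprintKind
) -> List[T]:
    """Keep matched frames for a user voiceprint, unmatched ones for a noise voiceprint."""
    if len(frames) != len(matched):
        raise ValueError("frames and matched must have the same length")
    keep_if_matched = kind is VoiceprintKind.TARGET_USER
    return [f for f, m in zip(frames, matched) if bool(m) == keep_if_matched]
```

For example, with frames `["a", "b", "c", "d"]` and flags `[True, False, True, False]`, the target-user policy keeps `"a"` and `"c"`, while the noise policy keeps `"b"` and `"d"`.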
Optionally, in a possible implementation manner of this embodiment, as shown in fig. 7, the noise cancellation device provided in this embodiment may further include:
a noise signal acquisition unit 71 configured to acquire a noise signal of the target environment;
a second voiceprint parameter obtaining unit 72, configured to obtain a voiceprint parameter of the noise signal based on the obtained noise signal of the target environment.
It should be noted that the methods in the embodiments corresponding to fig. 1 to fig. 3 can be implemented by the noise cancellation device provided in this embodiment. For detailed description, reference may be made to relevant contents in the embodiments corresponding to fig. 1 to fig. 3, and details are not described here.
In this embodiment, the voiceprint matching unit performs voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameter, so that the effective audio data acquiring unit can obtain the effective audio data from the original audio data to be processed according to the voiceprint matching result, and no additional sound acquisition device is required to acquire other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source of the speech signal and the two microphones causes the speech signal to be suppressed to the same degree as the noise signal. The reliability of noise reduction is thereby improved, and the sound quality after noise reduction is effectively improved as well.
In addition, by adopting the technical scheme provided by the invention, only one sound acquisition device is needed, and the cost can be effectively reduced.
Fig. 8 is a schematic structural diagram of a speech recognition apparatus according to another embodiment of the present invention. As shown in fig. 8, the speech recognition apparatus of the present embodiment may include an original audio data acquisition unit 81, a noise cancellation device 82 as provided in the embodiment corresponding to any one of fig. 5 to fig. 7, and a speech recognition unit 83. The original audio data acquisition unit 81 is configured to acquire original audio data to be processed; and the speech recognition unit 83 is configured to perform speech recognition processing on the effective audio data.
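The composition of the three parts can be sketched as a simple pipeline. This is an assumed wiring, not the patent's implementation: each part is passed in as a callable so that any acquisition source, noise cancellation device and recognizer can be plugged in, and the class and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


class SpeechRecognitionApparatus:
    """Hypothetical wiring of units 81, 82 and 83 as three callables."""

    def __init__(
        self,
        acquire_raw: Callable[[], List[Frame]],                # role of unit 81
        cancel_noise: Callable[[List[Frame]], List[Frame]],    # role of device 82
        recognize: Callable[[List[Frame]], str],               # role of unit 83
    ) -> None:
        self.acquire_raw = acquire_raw
        self.cancel_noise = cancel_noise
        self.recognize = recognize

    def run(self) -> str:
        """Acquire the original audio, keep the effective audio, then recognize it."""
        raw = self.acquire_raw()
        effective = self.cancel_noise(raw)
        return self.recognize(effective)
```

Because recognition only ever sees the output of the noise cancellation step, the recognizer never has to handle the discarded audio data.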
For a detailed description of the noise cancellation device 82, reference may be made to the relevant content of the embodiments corresponding to fig. 5 to fig. 7, which is not repeated here.
It should be noted that, part or all of the voice recognition apparatus provided in this embodiment may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system on the network side, which is not particularly limited in this embodiment.
It is to be understood that the application may be a native application (nativeApp) installed on the terminal, or may be a web application (webApp) running in a browser on the terminal, and this embodiment is not particularly limited thereto.
It should be noted that the method in the embodiment corresponding to fig. 4 can be implemented by the speech recognition apparatus provided in this embodiment. For a detailed description, reference may be made to relevant contents in the embodiment corresponding to fig. 4, which are not described herein again.
In this embodiment, the original audio data acquisition unit acquires the original audio data to be processed, and the voiceprint matching unit then performs voiceprint matching on the acquired original audio data based on the specific voiceprint parameter, so that the effective audio data acquiring unit can obtain the effective audio data from the original audio data to be processed according to the voiceprint matching result, and the voice recognition unit performs voice recognition processing on the effective audio data.
Furthermore, because voice recognition processing is performed on the effective audio data rather than on the unfiltered original audio data, the recognition result obtained is highly accurate.
In addition, by adopting the technical scheme provided by the invention, only one sound acquisition device is needed, and the cost can be effectively reduced.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (16)

CN201510524909.1A | 2015-08-25 | 2015-08-25 | Noise elimination and voice recognition method and device | Active | CN106486130B (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN201510524909.1A, CN106486130B (en) | 2015-08-25 | 2015-08-25 | Noise elimination and voice recognition method and device
PCT/CN2015/095364, WO2017031846A1 (en) | 2015-08-25 | 2015-11-24 | Noise elimination and voice recognition method, apparatus and device, and non-volatile computer storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201510524909.1A, CN106486130B (en) | 2015-08-25 | 2015-08-25 | Noise elimination and voice recognition method and device

Publications (2)

Publication Number | Publication Date
CN106486130A (en) | 2017-03-08
CN106486130B (en) | 2020-03-31

Family

ID=58099552

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201510524909.1A (Active), CN106486130B (en) | Noise elimination and voice recognition method and device | 2015-08-25 | 2015-08-25

Country Status (2)

Country | Link
CN (1) | CN106486130B (en)
WO (1) | WO2017031846A1 (en)

Also Published As

Publication number | Publication date
CN106486130A (en) | 2017-03-08
WO2017031846A1 (en) | 2017-03-02

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant