CN106486130B

Info

Publication number: CN106486130B
Application number: CN201510524909.1A
Authority: CN
Inventors: 李士岩
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2015-08-25
Filing date: 2015-08-25
Publication date: 2020-03-31
Anticipated expiration: 2035-08-25
Also published as: CN106486130A; WO2017031846A1

Abstract

Embodiments of the invention provide a noise elimination and voice recognition method and device. The noise elimination method performs voiceprint matching on the acquired original audio data to be processed based on specific voiceprint parameters, so that effective audio data can be obtained from the original audio data to be processed according to the voiceprint matching result. No additional sound acquisition device is needed to acquire other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source of the voice signal and the two microphones causes the voice signal to be suppressed to the same degree as the noise signal, so the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.

Description

Noise elimination and voice recognition method and device

[ technical field ]

The present invention relates to noise processing technologies, and in particular, to a method and an apparatus for noise cancellation and speech recognition.

[ background of the invention ]

As sound processing technology develops ever faster, terminals place ever higher requirements on the quality of the sound to be processed, and noise reduction technology has developed accordingly. Current noise reduction technology mainly uses dual microphones for active noise reduction: a certain algorithm is applied to the audio data collected by one microphone (i.e. corresponding to a noise signal and a speech signal with strong signal strength) and the audio data collected by the other microphone (i.e. corresponding to the noise signal and the speech signal with strong signal strength) to perform noise suppression processing.

However, if the distance between the signal source (for example, human mouth) corresponding to the voice signal and the two microphones varies, the voice signal may be determined as noise, so that the voice signal is also suppressed to the same extent as the noise signal, the sound quality after noise reduction is seriously affected, and the reliability of noise reduction is reduced.

[ summary of the invention ]

Aspects of the present invention provide a noise cancellation and speech recognition method and apparatus for improving the reliability of noise reduction.

In one aspect of the present invention, a noise cancellation method is provided, including:

based on the specific voiceprint parameters, carrying out voiceprint matching on the acquired original audio data to be processed;

and obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.

The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a target user, and

the obtaining of effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching comprises:

acquiring, from the original audio data to be processed, the audio data whose voiceprint is successfully matched, as the effective audio data.

The above-described aspect and any possible implementation manner further provide an implementation manner, before the voiceprint matching is performed on the obtained original audio data to be processed based on the specific voiceprint parameters, the method further includes:

acquiring a voice signal of the target user;

and acquiring the voiceprint parameters of the target user based on the acquired voice signal of the target user.

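The above-described aspects and any possible implementations further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment, and

the obtaining of effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching comprises:

removing the audio data with successful voiceprint matching from the original audio data to be processed, and using the remaining audio data as the effective audio data.

The above-described aspect and any possible implementation manner further provide an implementation manner in which, before the voiceprint matching is performed on the obtained original audio data to be processed based on the specific voiceprint parameters, the method further includes:

acquiring a noise signal of the target environment;

and obtaining a voiceprint parameter of the noise signal based on the acquired noise signal of the target environment.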

In another aspect of the present invention, there is provided a noise removing apparatus including:

the voiceprint matching unit is used for carrying out voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters;

and the effective audio data acquisition unit is used for acquiring effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.

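The above-described aspect and any possible implementation further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a target user, and the effective audio data acquisition unit is specifically used for acquiring, from the original audio data to be processed, the audio data whose voiceprint is successfully matched, as the effective audio data.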

The above-described aspect and any possible implementation further provide an implementation, where the noise cancellation apparatus further includes:

the voice signal acquisition unit is used for acquiring a voice signal of the target user;

a first voiceprint parameter obtaining unit, configured to obtain a voiceprint parameter of the target user based on the obtained voice signal of the target user.

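The above-described aspect and any possible implementation further provide an implementation in which the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment, and the effective audio data acquisition unit is specifically used for removing the audio data with successful voiceprint matching from the original audio data to be processed and using the remaining audio data as the effective audio data.

The above-described aspect and any possible implementation further provide an implementation, where the noise cancellation apparatus further includes:

a noise signal acquisition unit for acquiring a noise signal of the target environment;

a second voiceprint parameter obtaining unit, configured to obtain a voiceprint parameter of the noise signal based on the obtained noise signal of the target environment.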

In another aspect of the present invention, a speech recognition method is provided, including:

acquiring original audio data to be processed;

based on specific voiceprint parameters, carrying out voiceprint matching on the acquired original audio data to be processed;

obtaining effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching;

and carrying out voice recognition processing on the effective audio data.

The above-described aspect and any possible implementation manner further provide an implementation manner, before the voiceprint matching is performed on the acquired original audio data to be processed based on the specific voiceprint parameters, the method further includes:

acquiring a voice signal of the target user;

acquiring a noise signal of the target environment;

In another aspect of the present invention, there is provided a speech recognition apparatus including:

the original audio data acquisition unit is used for acquiring original audio data to be processed;

the noise cancellation device as described above;

and the voice recognition unit is used for carrying out voice recognition processing on the effective audio data.

As can be seen from the foregoing technical solutions, on one hand, the embodiments of the present invention perform voiceprint matching on the obtained original audio data to be processed based on the specific voiceprint parameter, so that the valid audio data can be obtained from the original audio data to be processed according to the voiceprint matching result. No additional sound collection device is required to collect other sound signals such as a noise signal. This avoids the prior-art problem in which a change in the distance between the signal source of the speech signal and the two microphones causes the speech signal to be suppressed to the same extent as the noise signal, so that the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.

As can be seen from the foregoing technical solutions, on the other hand, in the embodiments of the present invention, original audio data to be processed is obtained, and then voiceprint matching is performed on the obtained original audio data to be processed based on a specific voiceprint parameter, so that valid audio data can be obtained from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching, and voice recognition processing is performed on the valid audio data.

In addition, by adopting the technical scheme provided by the invention, only one sound acquisition device is needed, and the cost can be effectively reduced.

[ description of the drawings ]

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the embodiments or the prior art descriptions will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without inventive labor.

Fig. 1 is a schematic flow chart of a noise cancellation method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is the voiceprint parameter of the target user in the embodiment corresponding to FIG. 1;

FIG. 3 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment in the embodiment corresponding to FIG. 1;

FIG. 4 is a flowchart illustrating a speech recognition method according to another embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a noise cancellation apparatus according to another embodiment of the present invention;

FIG. 6 is a schematic structural diagram of the noise cancellation apparatus in the case where the specific voiceprint parameter is the voiceprint parameter of the target user in the embodiment corresponding to FIG. 5;

FIG. 7 is a schematic structural diagram of a noise cancellation apparatus in a case where the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment in the embodiment corresponding to FIG. 5;

FIG. 8 is a schematic structural diagram of a speech recognition apparatus according to another embodiment of the present invention.

[ detailed description of the embodiments ]

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terminal according to the embodiment of the present invention may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), a Personal Computer (PC), an MP3 player, an MP4 player, a wearable device (e.g., smart glasses, smart watch, smart bracelet, etc.), and the like.

In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.

Fig. 1 is a schematic flow chart of a noise cancellation method according to an embodiment of the present invention, as shown in fig. 1.

101. Perform voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters.

102. Obtain effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.

It should be noted that part or all of the execution subjects of 101 to 102 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, which is not particularly limited in this embodiment.

It is to be understood that the application may be a native app (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, and this embodiment is not particularly limited thereto.

Therefore, voiceprint matching is performed on the acquired original audio data to be processed based on the specific voiceprint parameters, so that effective audio data can be obtained from the original audio data to be processed according to the voiceprint matching result. No additional sound acquisition device is needed to acquire other sound signals such as noise signals. This avoids the prior-art problem in which a change in the distance between the signal source of the voice signal and the two microphones causes the voice signal to be suppressed to the same degree as the noise signal, so the reliability of noise reduction is improved and the sound quality after noise reduction can be effectively improved.

In the invention, the original audio data to be processed can be acquired by using a sound acquisition device. The sound collection device may be a microphone or the like that is built in or outside the terminal, which is not particularly limited in this embodiment.

Specifically, a sound collection device can be used to collect a sound signal including a speech signal to be processed by the terminal. Typically, the sound signal may be contaminated with noise signals. The collected sound signals may then be converted into raw audio data to be processed.

Specifically, the so-called raw audio data to be processed is a digital signal converted from an audio signal. For example, the sound signal may be specifically sampled, quantized, and encoded to obtain Pulse Code Modulation (PCM) data as original audio data to be processed.
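As a minimal sketch of the quantization and encoding step described above, the following Python fragment converts already-sampled floating-point values into 16-bit PCM bytes. The 16 kHz sampling rate, the 16-bit little-endian sample format, and the function name `to_pcm16` are illustrative assumptions; the embodiment does not specify them.

```python
import math
import struct

SAMPLE_RATE = 16000  # assumed sampling rate in Hz (not specified by the embodiment)


def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))       # guard against overload
        quantized = int(round(clipped * 32767))     # quantize to 16 bits
        pcm += struct.pack("<h", quantized)         # encode as little-endian int16
    return bytes(pcm)


# Sample 10 ms of a 440 Hz tone, then quantize and encode it.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(160)]
pcm_bytes = to_pcm16(tone)
```

Each sample occupies two bytes here, so the 160 samples above yield 320 bytes of PCM data.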

In this embodiment, no additional sound collection device is needed to collect auxiliary audio data; a single sound collection device suffices to collect the original audio data to be processed, which effectively reduces cost.

Optionally, in a possible implementation manner of this embodiment, in 101, the original audio data to be processed may be subjected to framing processing to obtain at least one frame of data, and each frame of data may then be subjected to audio analysis processing to obtain the voiceprint feature of that frame. The voiceprint features of the original audio data to be processed are then matched against the specific voiceprint parameters. If the two are consistent, the matching succeeds; otherwise, the matching fails.

The term "match" may mean all match, i.e., complete match, or may mean partial match, and this embodiment is not particularly limited thereto.
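One way to make the distinction between a complete match and a partial match concrete is a similarity score compared against a threshold. The sketch below is only an illustration: the use of cosine similarity, the default threshold of 0.9, and the names `cosine_similarity` and `is_match` are assumptions, not something the embodiment prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(frame_feature, reference_parameter, threshold=0.9):
    """Report a successful match when the similarity reaches the threshold.

    A threshold close to 1.0 approximates a complete match; a lower
    threshold accepts partial matches.
    """
    return cosine_similarity(frame_feature, reference_parameter) >= threshold


same_shape = is_match([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # same direction
unrelated = is_match([1.0, 0.0], [0.0, 1.0])              # orthogonal
```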

Specifically, the raw audio data to be processed may be subjected to framing processing at preset time intervals, for example, 20ms, and there is partial data overlap between adjacent frames, for example, 50% data overlap, so that at least one frame of data of the raw audio data to be processed can be obtained.
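The framing described above can be sketched as follows, using the 20 ms interval and 50% overlap given as examples in the text. The 16 kHz sampling rate, the function name `split_into_frames`, and the choice to drop an incomplete trailing frame are assumptions made for illustration.

```python
def split_into_frames(samples, sample_rate=16000, frame_ms=20, overlap=0.5):
    """Split a sample sequence into fixed-length frames with partial overlap.

    Trailing samples that do not fill a whole frame are dropped (an
    assumption; padding the last frame would be an equally valid choice).
    """
    frame_len = int(sample_rate * frame_ms / 1000)      # samples per frame
    hop = max(1, int(frame_len * (1.0 - overlap)))      # step between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


samples = list(range(1600))            # stand-in for 100 ms of audio at 16 kHz
frames = split_into_frames(samples)    # 320-sample frames, 160-sample hop
```

With these values, the second half of each frame is identical to the first half of the next frame, which is the 50% data overlap mentioned above.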

The so-called voiceprint feature is a feature specific to audio data, and refers to a content-based digital signature that can represent important acoustic features of a piece of audio data, and the main purpose of the voiceprint feature is to establish an effective mechanism for comparing the perceptual auditory quality of two pieces of audio data. Note that here, rather than directly comparing the typically large audio data itself, their corresponding typically smaller voiceprint features are compared.

In one particular implementation, the voiceprint features can include, but are not limited to, acoustic features related to the anatomy of a human pronunciation mechanism, such as spectrum, cepstrum, formants, pitch, reflection coefficients, and the like.
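As an illustration of two of the features listed above, the spectrum and the cepstrum of a single frame can be computed as sketched below. The naive discrete Fourier transform is used only to keep the example dependency-free, and the function names and the logarithm floor are assumptions; a practical implementation would typically use an FFT library and a window function.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def magnitude_spectrum(frame):
    """Magnitude of each DFT bin of the frame."""
    return [abs(c) for c in dft(frame)]


def real_cepstrum(frame, floor=1e-10):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(max(m, floor)) for m in magnitude_spectrum(frame)]
    n = len(log_mag)
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# A 32-sample frame containing exactly 4 cycles of a sine wave.
frame = [math.sin(2 * math.pi * 4 * n / 32) for n in range(32)]
spectrum = magnitude_spectrum(frame)
cepstrum = real_cepstrum(frame)
```

For this test frame the spectrum peaks at bin 4, the bin that corresponds to the sine frequency.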

Optionally, in a possible implementation manner of this embodiment, before 101, the specific voiceprint parameter may be further set to serve as a reference parameter for voiceprint matching. Specifically, the specific voiceprint parameter may be a voiceprint parameter of the target user, or may also be a voiceprint parameter of a noise signal of the target environment, which is not particularly limited in this embodiment. The following describes in detail the noise cancellation method provided in this embodiment when the two specific voiceprint parameters are the voiceprint parameter of the target user and the voiceprint parameter of the noise signal of the target environment, respectively.

Fig. 2 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is the voiceprint parameter of the target user in the embodiment corresponding to fig. 1, as shown in fig. 2.

201. Perform voiceprint matching on the acquired original audio data to be processed based on the voiceprint parameters of the target user.

Optionally, in a possible implementation manner of this embodiment, before 201, a voice signal of the target user may be further obtained, and then, based on the obtained voice signal of the target user, a voiceprint parameter of the target user may be obtained.

Specifically, the voice signal of the target user may be sampled, quantized, and encoded to obtain PCM data as user audio data. Then, the user audio data may be subjected to framing processing to obtain at least one frame of data, and further, each frame of data in the at least one frame of data is subjected to audio analysis processing to obtain a voiceprint parameter of each frame of data.

For example, the user audio data may be subjected to framing processing at a preset time interval, for example, 20ms, and there is a partial data overlap between adjacent frames, for example, 50% data overlap, so that at least one frame of data of the user audio data can be obtained.
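The embodiment obtains a voiceprint parameter for each frame of the user audio data. One hypothetical way to condense those per-frame values into a single reference is to average them, as sketched below. The two toy features (short-time energy and zero-crossing rate) and the names `toy_feature` and `enroll` are placeholders chosen for brevity, not the features the embodiment requires.

```python
def toy_feature(frame):
    """Placeholder per-frame feature: [mean energy, zero-crossing rate]."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]


def enroll(frames, feature_fn=toy_feature):
    """Average the per-frame features into one reference voiceprint parameter."""
    features = [feature_fn(frame) for frame in frames]
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


# Two short stand-in frames of the target user's audio data.
user_frames = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
user_voiceprint = enroll(user_frames)
```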

202. Acquire, from the original audio data to be processed, the audio data whose voiceprint is successfully matched, as the effective audio data.

In this implementation, the specific voiceprint parameter refers to a voiceprint parameter of the voice signal of the target user obtained according to the voice signal of the target user. Therefore, the voiceprint feature successfully matched can be considered as the voiceprint feature corresponding to the voice signal sent by the target user using the terminal.

Fig. 3 is a flowchart illustrating a noise cancellation method in a case where the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment in the embodiment corresponding to fig. 1, as shown in fig. 3.

301. Perform voiceprint matching on the acquired original audio data to be processed based on the voiceprint parameters of the noise signal of the target environment.

Optionally, in a possible implementation manner of this embodiment, before 301, a noise signal of the target environment may be further obtained, and then, a voiceprint parameter of the noise signal may be obtained based on the obtained noise signal of the target environment.

Specifically, the noise signal of the target environment may be sampled, quantized, and encoded to obtain PCM data as the environmental audio data. Then, the environmental audio data may be subjected to framing processing to obtain at least one frame of data, and further, each frame of data in the at least one frame of data is subjected to audio analysis processing to obtain a voiceprint parameter of each frame of data.

For example, the environmental audio data may be subjected to framing processing at a preset time interval, for example, 20ms, and there is a partial data overlap between adjacent frames, for example, 50% data overlap, so that at least one frame of data of the environmental audio data can be obtained.

302. Remove the audio data with successful voiceprint matching from the original audio data to be processed, and use the remaining audio data as the effective audio data.

In this implementation, the specific voiceprint parameter refers to a voiceprint parameter of a noise signal of a target environment obtained according to the noise signal of the target environment. Therefore, the successfully matched voiceprint feature can be considered as the voiceprint feature corresponding to the noise signal generated in the target environment where the terminal is located.
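The two variants (keeping the frames that match the target user's voiceprint, or removing the frames that match the noise voiceprint) differ only in what is done with a successful match. The sketch below illustrates this with a single scalar feature; the zero-crossing-rate feature, the tolerance of 0.1, and the names `zero_crossing_rate` and `select_effective_frames` are assumptions made purely for illustration.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def select_effective_frames(frames, reference_zcr, keep_matched, tolerance=0.1):
    """Return the effective frames.

    keep_matched=True  keeps matched frames (reference is the target user).
    keep_matched=False removes matched frames (reference is the noise).
    """
    effective = []
    for frame in frames:
        matched = abs(zero_crossing_rate(frame) - reference_zcr) <= tolerance
        if matched == keep_matched:
            effective.append(frame)
    return effective


speech_like = [1, 1, 1, -1, -1, -1, 1, 1]      # 2 sign changes in 7 pairs
noise_like = [1, -1, 1, -1, 1, -1, 1, -1]      # 7 sign changes in 7 pairs
frames = [speech_like, noise_like, speech_like]

kept = select_effective_frames(frames, reference_zcr=2 / 7, keep_matched=True)
cleaned = select_effective_frames(frames, reference_zcr=1.0, keep_matched=False)
```

In this toy input both variants arrive at the same effective audio data: the two speech-like frames.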

It is to be understood that at least one empirical parameter may be used as the specific voiceprint parameter in addition to the two specific voiceprint parameters described above.

It should be noted that, after the specific voiceprint parameters are obtained, they need to be stored. Specifically, the obtained specific voiceprint parameters may be stored in a storage device of the terminal.

In a specific implementation process, the storage device of the terminal may be a slow storage device, specifically, a hard disk of a computer system, or may also be a non-operating Memory of a mobile phone, that is, a physical Memory, such as a Read-Only Memory (ROM), a Memory card, and the like, which is not limited in this embodiment.

In another specific implementation process, the storage device of the terminal may also be a fast storage device, specifically, a Memory of a computer system, or may also be a running Memory of a mobile phone, that is, a system Memory, for example, a Random Access Memory (RAM), and the like, which is not particularly limited in this embodiment.

Optionally, in a possible implementation manner of this embodiment, after 102, speech recognition processing may be further performed on the valid audio data.

The effective audio data is the audio data extracted from the original audio data to be processed according to the specific voiceprint parameters, and the audio data can be regarded as voice signals of users using the terminal, so that the effective audio data does not contain noise signals any more, and the sound quality is effectively improved.

Furthermore, the effective audio data is subjected to voice recognition processing, and the obtained recognition result is high in accuracy.

In this embodiment, the obtained original audio data to be processed is subjected to voiceprint matching based on the specific voiceprint parameter, so that valid audio data can be obtained from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching, an additional sound collection device is not required to collect other sound signals such as noise signals, and the problem that in the prior art, due to the fact that the distance between a signal source corresponding to a speech signal and two microphones changes, suppression of the speech signal is performed to the same degree as that of the noise signal can be avoided, and therefore reliability of noise reduction is improved, and meanwhile, sound quality after noise reduction can be effectively improved.

Fig. 4 is a flowchart illustrating a speech recognition method according to another embodiment of the present invention, as shown in fig. 4.

401. Acquire original audio data to be processed.

402. Perform voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters.

403. Obtain effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching.

404. Perform voice recognition processing on the effective audio data.

It should be noted that part or all of the execution subjects of 401 to 404 may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system located on the network side, which is not particularly limited in this embodiment.

In the present invention, details of 402 and 403 may refer to relevant contents in the embodiments corresponding to fig. 1 to fig. 3, and are not described herein again.

In this embodiment, by acquiring original audio data to be processed, and further performing voiceprint matching on the acquired original audio data to be processed based on a specific voiceprint parameter, effective audio data can be acquired from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching, and voice recognition processing is performed on the effective audio data.
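Steps 401 to 404 can be read as a simple pipeline, sketched below. The acquisition function, the matching predicate, and the recognizer are all hypothetical stand-ins supplied by the caller; in particular, `counting_recognizer` only counts frames and is not a real speech recognizer.

```python
def recognize_speech(raw_frames, matches_voiceprint, recognizer):
    """Run steps 402 to 404 on frames acquired in step 401."""
    match_results = [matches_voiceprint(frame) for frame in raw_frames]        # 402
    effective_frames = [frame for frame, matched
                        in zip(raw_frames, match_results) if matched]          # 403
    return recognizer(effective_frames)                                        # 404


def acquire_raw_frames():
    """Stand-in for step 401: returns three tiny frames of audio samples."""
    return [[0.9, 0.8], [0.01, -0.02], [0.7, 0.6]]


def loud_enough(frame):
    """Stand-in matching predicate; a real one would compare voiceprint features."""
    return max(abs(sample) for sample in frame) > 0.5


def counting_recognizer(frames):
    """Stand-in recognizer that only reports how many frames it received."""
    return "recognized %d frame(s)" % len(frames)


result = recognize_speech(acquire_raw_frames(), loud_enough, counting_recognizer)
```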

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

Fig. 5 is a schematic structural diagram of a noise cancellation device according to another embodiment of the present invention, as shown in fig. 5. The noise cancellation device of the present embodiment may include a voiceprint matching unit 51 and an effective audio data acquiring unit 52. The voiceprint matching unit 51 is configured to perform voiceprint matching on the acquired original audio data to be processed based on a specific voiceprint parameter; and the effective audio data acquiring unit 52 is configured to obtain effective audio data from the original audio data to be processed according to a voiceprint matching result of the voiceprint matching.
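The division into units 51 and 52 could be mirrored in software roughly as follows. This is a structural sketch only: the class names, the `matcher` callable, and the `keep_matched` flag are assumptions introduced for illustration, and, as noted elsewhere in this description, other divisions of the units are equally possible.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]


@dataclass
class VoiceprintMatchingUnit:
    """Counterpart of unit 51: produces one match result per frame."""
    matcher: Callable[[Frame], bool]   # hypothetical per-frame matching predicate

    def match(self, frames: List[Frame]) -> List[bool]:
        return [self.matcher(frame) for frame in frames]


@dataclass
class EffectiveAudioDataAcquiringUnit:
    """Counterpart of unit 52: selects frames according to the match results."""
    keep_matched: bool = True          # False when the reference is a noise voiceprint

    def acquire(self, frames: List[Frame], results: List[bool]) -> List[Frame]:
        return [frame for frame, matched in zip(frames, results)
                if matched == self.keep_matched]


@dataclass
class NoiseCancellationDevice:
    matching_unit: VoiceprintMatchingUnit
    acquiring_unit: EffectiveAudioDataAcquiringUnit

    def process(self, frames: List[Frame]) -> List[Frame]:
        results = self.matching_unit.match(frames)
        return self.acquiring_unit.acquire(frames, results)


device = NoiseCancellationDevice(
    VoiceprintMatchingUnit(matcher=lambda frame: sum(frame) > 0),
    EffectiveAudioDataAcquiringUnit(keep_matched=True),
)
effective = device.process([[1, 2], [-1, -2], [3, 0]])
```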

It should be noted that, part or all of the noise cancellation apparatus provided in this embodiment may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system on the network side, which is not particularly limited in this embodiment.

Optionally, in a possible implementation manner of this embodiment, the specific voiceprint parameter is a voiceprint parameter of the target user; accordingly, the effective audio data acquiring unit 52 may be specifically configured to obtain audio data with successfully matched voiceprints from the original audio data to be processed, as the effective audio data.

Optionally, in a possible implementation manner of this embodiment, as shown in fig. 6, the noise cancellation device provided in this embodiment may further include:

a voice signal acquiring unit 61 configured to acquire a voice signal of the target user;

a first voiceprint parameter obtaining unit 62, configured to obtain a voiceprint parameter of the target user based on the obtained voice signal of the target user.

Optionally, in a possible implementation manner of this embodiment, the specific voiceprint parameter is a voiceprint parameter of a noise signal of the target environment; accordingly, the effective audio data acquiring unit 52 may be specifically configured to remove audio data with successfully matched voiceprints from the original audio data to be processed, and to use the remaining audio data as the effective audio data.

Optionally, in a possible implementation manner of this embodiment, as shown in fig. 7, the noise cancellation device provided in this embodiment may further include:

a noise signal acquisition unit 71 configured to acquire a noise signal of the target environment;

a second voiceprint parameter obtaining unit 72, configured to obtain a voiceprint parameter of the noise signal based on the obtained noise signal of the target environment.

It should be noted that the methods in the embodiments corresponding to fig. 1 to fig. 3 can be implemented by the noise cancellation device provided in this embodiment. For detailed description, reference may be made to relevant contents in the embodiments corresponding to fig. 1 to fig. 3, and details are not described here.

In this embodiment, the voiceprint matching unit performs voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters, so that the effective audio data acquisition unit can acquire the effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching, and an additional sound acquisition device is not required to acquire other sound signals such as noise signals, which can avoid the problem that the distance between a signal source corresponding to a speech signal and two microphones changes to cause suppression of the speech signal to the same degree as the noise signal in the prior art, thereby improving the reliability of noise reduction, and simultaneously effectively improving the sound quality after noise reduction.

Fig. 8 is a schematic structural diagram of a speech recognition apparatus according to another embodiment of the present invention, as shown in fig. 8. The speech recognition apparatus of the present embodiment may include an original audio data acquisition unit 81, a noise cancellation apparatus 82 as provided in the embodiment corresponding to any one of fig. 5 to 7, and a speech recognition unit 83. The original audio data acquisition unit 81 is configured to acquire original audio data to be processed; the speech recognition unit 83 is configured to perform speech recognition processing on the valid audio data.

In the present invention, the detailed description of the noise cancellation device 82 can refer to the relevant contents in the embodiments corresponding to fig. 5 to fig. 7, and is not repeated here.

It should be noted that, part or all of the voice recognition apparatus provided in this embodiment may be an application located at the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application located at the local terminal, or may also be a processing engine located in a server on the network side, or may also be a distributed system on the network side, which is not particularly limited in this embodiment.

It should be noted that the method in the embodiment corresponding to fig. 4 can be implemented by the speech recognition apparatus provided in this embodiment. For a detailed description, reference may be made to relevant contents in the embodiment corresponding to fig. 4, which are not described herein again.

In this embodiment, the original audio data acquisition unit acquires the original audio data to be processed, and the voiceprint matching unit then performs voiceprint matching on the acquired original audio data based on the specific voiceprint parameters, so that the effective audio data acquisition unit can acquire the effective audio data from the original audio data to be processed according to the voiceprint matching result, and the speech recognition unit can perform speech recognition processing on the effective audio data.
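The flow through the three parts of the apparatus can be sketched as a chain of three stages. This is an illustrative sketch under stated assumptions, not the structure of the apparatus itself: the class `SpeechRecognitionApparatus`, the `Frame` type, and the three `demo_*` functions are hypothetical placeholders, and the recognizer is a stub that only counts frames.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]  # hypothetical: one feature vector per audio frame


@dataclass
class SpeechRecognitionApparatus:
    # Each field is a callable playing the role of one part shown in fig. 8.
    acquire_raw_audio: Callable[[], List[Frame]]        # role of unit 81
    cancel_noise: Callable[[List[Frame]], List[Frame]]  # role of apparatus 82
    recognize_speech: Callable[[List[Frame]], str]      # role of unit 83

    def run(self) -> str:
        raw = self.acquire_raw_audio()         # original audio data to be processed
        effective = self.cancel_noise(raw)     # effective audio data after matching
        return self.recognize_speech(effective)


# Placeholder stages, used only to make the sketch runnable.
def demo_acquire() -> List[Frame]:
    return [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]


def demo_cancel_noise(frames: List[Frame]) -> List[Frame]:
    # Stand-in for voiceprint matching: keep frames dominated by the first component.
    return [f for f in frames if f[0] > f[1]]


def demo_recognize(frames: List[Frame]) -> str:
    # Stub recognizer: reports how many frames it received.
    return f"recognized {len(frames)} frame(s)"


apparatus = SpeechRecognitionApparatus(demo_acquire, demo_cancel_noise, demo_recognize)
result = apparatus.run()
```

Passing the stages in as callables mirrors the statement that the apparatus may be a local application, an SDK component, or a server-side engine: any stage can be replaced without changing the order of the flow.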

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of noise cancellation, comprising:

based on the specific voiceprint parameters, carrying out voiceprint matching on the acquired original audio data to be processed; the particular voiceprint parameters comprise voiceprint parameters of a noise signal of a target environment;

and obtaining the voiceprint parameters of the noise signal of the target environment before carrying out voiceprint matching on the obtained original audio data to be processed based on the specific voiceprint parameters.

2. The noise cancellation method according to claim 1, wherein the specific voiceprint parameter is a voiceprint parameter of a target user, and

3. The method of claim 2, wherein before the voiceprint matching the obtained raw audio data to be processed based on the specific voiceprint parameters, the method further comprises:

acquiring a voice signal of the target user;

4. The noise cancellation method according to claim 1, characterized in that the specific voiceprint parameter is a voiceprint parameter of a noise signal of a target environment, and

5. The method according to claim 4, wherein the obtaining the voiceprint parameters of the noise signal of the target environment comprises:

acquiring a noise signal of the target environment;

6. A noise cancellation apparatus, characterized by comprising:

the voiceprint matching unit is used for carrying out voiceprint matching on the acquired original audio data to be processed based on the specific voiceprint parameters; the particular voiceprint parameters comprise voiceprint parameters of a noise signal of a target environment;

the effective audio data acquisition unit is used for acquiring effective audio data from the original audio data to be processed according to the voiceprint matching result of the voiceprint matching;

and the second voiceprint parameter obtaining unit is used for obtaining the voiceprint parameters of the noise signal of the target environment so as to enable the voiceprint matching unit to carry out voiceprint matching on the obtained original audio data to be processed.

7. The noise cancellation apparatus according to claim 6, wherein the specific voiceprint parameter is a voiceprint parameter of a target user, and

8. The noise cancellation device according to claim 7, further comprising:

9. The noise cancellation apparatus according to claim 6, wherein the specific voiceprint parameter is a voiceprint parameter of a noise signal of a target environment, and

10. The noise cancellation device according to claim 9, further comprising: a noise signal acquisition unit for acquiring a noise signal of the target environment;

the second voiceprint parameter obtaining unit is configured to obtain a voiceprint parameter of the noise signal based on the obtained noise signal of the target environment.

11. A speech recognition method, comprising:

acquiring original audio data to be processed;

based on specific voiceprint parameters, carrying out voiceprint matching on the acquired original audio data to be processed; the particular voiceprint parameters comprise voiceprint parameters of a noise signal of a target environment;

carrying out voice recognition processing on the effective audio data;

12. The speech recognition method of claim 11, wherein the particular voiceprint parameter is a voiceprint parameter of a target user, and

13. The speech recognition method according to claim 12, wherein before the voiceprint matching the acquired raw audio data to be processed based on the specific voiceprint parameters, the method further comprises:

acquiring a voice signal of the target user; and acquiring the voiceprint parameters of the target user based on the acquired voice signal of the target user.

14. The speech recognition method of claim 11, wherein the particular voiceprint parameter is a voiceprint parameter of a noise signal of a target environment, and

15. The speech recognition method of claim 14, wherein the obtaining the voiceprint parameters of the noise signal of the target environment comprises:

acquiring a noise signal of the target environment;

16. A speech recognition apparatus, comprising:

the noise cancellation device of any one of claims 6 to 10;