Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Example one
Fig. 1 is a schematic flowchart of a voice denoising method according to an embodiment of the present invention. The method may be applied to an electronic device with a voice receiving element, such as a mobile phone, a notebook computer, a tablet computer, a vehicle-mounted system, or a wearable electronic device. As shown in the figure, the method may include the following steps:
Step S101, identifying a speech signal and a background noise signal from a sound signal acquired in real time.
In this embodiment, a user may start the hands-free mode to talk on the mobile phone while driving. In the hands-free mode the phone is generally placed on a holder in the automobile, and because it is relatively far from the user, it picks up not only the user's voice signal but also background noise generated while the automobile runs, such as tire noise from friction between the tires and the road surface, air-conditioning noise from the air-conditioning fan, and wind noise from air passing over the gaps and corners of the vehicle body. This background noise is steady-state noise with small amplitude variation and high repetition frequency; it generally persists throughout the call, and performing AGC adjustment based on the voice signal alone can significantly degrade call quality.
In this embodiment, the sound signal first needs to be classified according to the signal characteristics of speech and background noise. Speech and background noise can be recognized by storing a human voice model and noise models in advance. Each model contains acoustic features such as frequency, zero-crossing rate, short-term average energy, and short-term average amplitude. For example, after the sound signal is sampled, it is matched against the human voice model; if the sound signal contains all the features of the model, a person is currently speaking. If the sound signal cannot be matched to the human voice model, the user's voice may be too quiet, or the background noise too loud, for speech to be recognized in the currently acquired sound; in this case the mobile phone may issue an error prompt to the user, such as the message "your voice cannot be acquired." Similarly, the acquired sound signal may be matched against pre-stored tire noise, air-conditioning noise, and wind noise models to determine the type of noise contained in the current sound signal.
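As an illustrative sketch only (the patent does not disclose an implementation), the feature extraction and model matching described above might look like the following; the feature names follow the text, while the model representation as per-feature (low, high) ranges is an assumption:

```python
import numpy as np

def frame_features(frame):
    """Compute the short-term features named in the text for one frame."""
    energy = float(np.mean(frame ** 2))        # short-term average energy
    amplitude = float(np.mean(np.abs(frame)))  # short-term average amplitude
    # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
    return {"energy": energy, "amplitude": amplitude, "zcr": zcr}

def matches_model(features, model):
    """A frame 'matches' a stored model when every feature falls inside the
    model's (low, high) range -- a hypothetical stand-in for the matching
    step described above."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in model.items())
```

A voiced tone has a much lower zero-crossing rate than broadband noise, which is what makes such a model discriminative:

```python
t = np.arange(1600) / 16000.0
voiced = np.sin(2 * np.pi * 200 * t)                        # speech-like tone
noise = np.random.default_rng(0).standard_normal(1600)      # broadband noise
voice_model = {"zcr": (0.0, 0.1)}                           # assumed range
print(matches_model(frame_features(voiced), voice_model))   # matches
print(matches_model(frame_features(noise), voice_model))    # does not match
```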
Step S102, obtaining the amplitude difference between the speech signal and the background noise signal.
In this embodiment, after step S101 recognizes that the sound signal contains both human voice and background noise, the voice and the noise can be extracted according to the features of the above models, and the average amplitude of the speech signal and the average amplitude of the background noise signal over a period of collected sound can be calculated from the recorded waveform.
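The averaging and differencing of step S102 can be sketched in a few lines (assuming the speech and noise components have already been separated as described above):

```python
import numpy as np

def average_amplitude(signal):
    """Average amplitude (mean absolute value) over the analysis window."""
    return float(np.mean(np.abs(signal)))

def amplitude_difference(speech, noise):
    """Amplitude difference S of step S102: positive when the user's
    voice is, on average, louder than the background noise."""
    return average_amplitude(speech) - average_amplitude(noise)
```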
Step S103, performing noise reduction processing on the sound signal based on the amplitude difference.
In the present embodiment, the absolute values of the speech-signal amplitude and the background-noise-signal amplitude obtained in step S102 are taken first, and the difference between the two absolute values is calculated. A preset processing method is then selected according to this difference, which makes the processing of the sound signal more targeted and avoids the low signal-to-noise ratio caused by processing the sound signal with AGC alone.
Optionally, before recognizing the speech signal and the background noise signal, the method includes: pre-filtering the sound signal.
In this embodiment, it is considered that the sound signal may be disturbed by random noise, such as Gaussian noise. If a sound signal carrying random noise is fed directly to recognition, the interference may cause recognition errors. Therefore, in the present embodiment, pre-filtering is performed before the sound signal is recognized. Since random noise in the environment is uncorrelated and appears as high-frequency content in the signal, the acquired sound signal can be low-pass filtered in the frequency domain, so that the high-frequency part that clearly belongs to noise is removed and the accuracy of the subsequent voice and noise identification is improved.
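A minimal frequency-domain low-pass pre-filter of the kind described could look like this (the cutoff frequency is an assumed parameter; a production filter would typically use a windowed design rather than hard bin truncation):

```python
import numpy as np

def prefilter_lowpass(signal, sample_rate, cutoff_hz):
    """Frequency-domain low-pass pre-filter: zero every FFT bin above
    the cutoff, keeping the band where speech energy lies."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))
```

For example, a 200 Hz speech-band tone passes through a 4 kHz cutoff unchanged while a 7 kHz component is removed.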
Example two
Based on the first embodiment, specifically, performing noise reduction processing on the sound signal based on the amplitude difference includes: if the amplitude difference is a positive value and greater than or equal to a first threshold, performing noise reduction processing on the sound signal, where the intensity of the noise reduction processing is proportional to the amplitude difference.
In this embodiment, when the amplitude of the user's voice signal is greater than the amplitude of the noise signal, the difference is a positive value. The difference is compared with the first threshold, and a corresponding noise reduction method is adopted according to the comparison result.
When the difference is greater than the first threshold, the amplitude of the noise signal is significantly smaller than that of the speech signal, which indicates that the current speech quality is good and not obviously affected by noise. In this case the sound signal can be denoised with a conventional algorithm, such as the amplitude spectral subtraction, harmonic enhancement, or noise cancellation methods commonly used in the field; the noise reduction algorithm is not limited here, and reference may be made to the prior art for its specific use. In the present embodiment, the intensity of the noise reduction processing is selected according to the magnitude of the difference. Generally speaking, the stronger the noise reduction, the better the noise removal, but the more severely the user's normal speech signal is distorted. Therefore, optionally, when the difference is larger, the speech signal is less affected by noise and the aliasing between noise and speech is not serious, so the intensity of the noise reduction processing can be increased; when the difference is smaller, the noise has a certain influence on the speech signal, i.e., a certain amount of aliasing exists between noise and speech, so the intensity of the noise reduction processing is correspondingly reduced. Dynamically adjusting the intensity of the noise reduction according to the difference between the speech-signal amplitude and the noise-signal amplitude improves the resulting sound quality.
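As a hedged sketch of this branch, the following pairs a difference-dependent strength with amplitude spectral subtraction, one of the conventional algorithms the text names; the particular strength mapping (`scale`, `max_strength`) is an assumed tuning choice, not taken from the patent:

```python
import numpy as np

def noise_reduction_strength(diff, max_strength=2.0, scale=0.5):
    """Map a larger amplitude difference to a stronger (but bounded)
    noise reduction, as the embodiment describes."""
    return min(max_strength, diff / scale)

def spectral_subtract(signal, noise_estimate, strength):
    """Magnitude spectral subtraction: remove `strength` times the
    estimated noise magnitude spectrum, keeping the original phase."""
    spec = np.fft.rfft(signal)
    noise_mag = np.abs(np.fft.rfft(noise_estimate))
    mag = np.maximum(np.abs(spec) - strength * noise_mag, 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(signal))
```

With a perfect noise estimate and `strength=1.0`, subtracting a noise frame's own spectrum drives it to silence; in practice the estimate is averaged over noise-only frames.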
Optionally, as shown in fig. 2, if the amplitude difference is a positive value, and the amplitude difference is smaller than or equal to the second threshold, and the amplitude of the speech signal is greater than the third threshold, the method includes:
step S201, amplifying the sound signal by a preset gain to obtain a first intermediate signal.
Step S202, noise reduction processing is carried out on the first intermediate signal to obtain a second intermediate signal.
Step S203, attenuating the second intermediate signal according to the preset gain to obtain the noise-reduced sound signal.
In this embodiment, when the difference S is a positive value but smaller than the second threshold T, the current noise condition is relatively serious; the second threshold T may be equal to, or smaller than, the first threshold. If noise reduction were performed directly on the current sound signal, the result would not be ideal. Therefore, in this embodiment, the sound signal is first amplified by a preset gain A; that is, both the speech signal and the noise signal in the sound signal are amplified by the gain A to obtain a first intermediate signal. The difference between the speech-signal amplitude and the noise-signal amplitude in the first intermediate signal is likewise amplified by A, so performing noise reduction after amplification significantly reduces the impact on the user's normal speech signal. The gain A can be calculated from the difference S and the second threshold T, specifically A ≥ T/S, so that the difference S' after gain adjustment is greater than or equal to the second threshold T. The first intermediate signal is then denoised to obtain a second intermediate signal, and the second intermediate signal is attenuated by the same gain A to obtain the noise-reduced sound signal. To ensure the noise reduction effect, this embodiment also requires the amplitude of the speech signal to be greater than the third threshold, which can be determined according to the performance of the mobile phone and the noise reduction algorithm used with it.
Correspondingly, the gain A cannot exceed a maximum threshold, so that the amplitude of the sound signal after amplification does not exceed the clipping point. If the amplified signal exceeded the clipping point it could not be recorded faithfully, and the original signal could not be recovered even after attenuation by the same gain.
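The amplify–denoise–attenuate pipeline of steps S201–S203, together with the constraint A ≥ T/S capped by the clipping point, can be sketched as follows; choosing the minimum of the two bounds is one possible policy, not the claimed one:

```python
import numpy as np

def choose_gain(diff, second_threshold, peak, clip_point=1.0):
    """Gain A aiming for A >= T/S so the amplified difference reaches T,
    capped so the amplified peak stays at or below the clipping point
    (otherwise attenuation could not recover the original signal).
    Note the cap may prevent A >= T/S from being satisfied."""
    wanted = second_threshold / diff   # A >= T/S
    allowed = clip_point / peak        # keep A*peak <= clip_point
    return min(wanted, allowed)

def amplify_denoise_attenuate(signal, gain, denoise):
    """Steps S201-S203: amplify, denoise, attenuate by the same gain."""
    first = signal * gain     # S201: first intermediate signal
    second = denoise(first)   # S202: second intermediate signal
    return second / gain      # S203: noise-reduced sound signal
```

With an identity `denoise`, the pipeline returns the input unchanged, confirming that the gain and attenuation cancel exactly when no clipping occurs.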
In this embodiment, when the amplitude difference is a positive value, smaller than or equal to the second threshold, and the amplitude of the speech signal is greater than the third threshold, the sound signal is amplified by a preset gain before noise reduction, and the denoised signal is then attenuated by the same gain. The signal level is thus left unchanged while the signal-to-noise ratio of the sound signal is improved, which improves the user experience.
Optionally, the human voice model includes a voice model of a specific user. During driving, passengers other than the user talking in the hands-free mode may also be speaking, so when recognizing the voice signal, the specific user's voice is first recognized according to that user's voice model, and the general human voice model is then used to determine whether voices other than the specific user's are present. If there are other voices and their signal amplitude is greater than that of the specific user's sound signal, the noise reduction method in this embodiment may fail to enhance the specific user's speech. In this case, a prompt may be sent to the user indicating that another person's speech is currently too loud and may affect the call quality.
Optionally, the noise reduction method disclosed above may be combined with conventional Automatic Gain Control (AGC). Specifically, after the sound signal is recognized to contain the user's voice signal and background noise, the acquired sound signal is first subjected to AGC processing so that its amplitude varies only within a small range; the amplitude of the voice signal and the amplitude of the background noise signal are then obtained, and the difference between them is calculated. When the difference is a positive value and greater than or equal to the first threshold, noise reduction processing is performed on the sound signal, with an intensity proportional to the difference; when the difference is a positive value and smaller than the first threshold, the sound signal is amplified by a preset gain, noise reduction processing is performed, and the processed signal is then attenuated according to the preset gain. Combining the conventional AGC method with the noise reduction method of this embodiment further improves the voice noise reduction effect.
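The branch logic of this combined flow (applied after the AGC stage has evened out the signal level) reduces to a small dispatcher; the strategy names here are illustrative labels, not terms from the patent:

```python
def choose_strategy(diff, first_threshold):
    """Select the processing branch from the amplitude difference,
    following the combined AGC + noise-reduction flow described above."""
    if diff <= 0:
        return "prompt_user"                 # non-positive difference: Example three
    if diff >= first_threshold:
        return "direct_denoise"              # proportional-strength denoising
    return "amplify_denoise_attenuate"       # steps S201-S203
```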
EXAMPLE III
Based on the first embodiment, specifically, performing noise reduction processing on the sound signal based on the amplitude difference includes: if the amplitude difference is not a positive value, outputting prompt information, where the prompt information is used to prompt the user to move closer to the microphone or to speak louder.
In this embodiment, when the difference S between the amplitude of the user's voice signal and the amplitude of the noise signal is 0, the two amplitudes are equal; when the difference S is negative, the amplitude of the speech signal is smaller than that of the noise signal. Since the background noise is additive, the user's speech is covered by the noise signal whenever the amplitude difference is non-positive, and performing noise reduction in this situation would also process the user's normal speech signal and severely distort it. Therefore, in this embodiment, when the difference is non-positive, a prompt may be sent to the user through the audio output unit of the mobile phone, for example prompting the user to speak closer to the phone or to speak louder.
Optionally, the content of the cell phone prompt may be adjusted by detecting the distance between the user and the cell phone.
In this embodiment, as in the first embodiment, the speech signal containing the human voice and the noise signal are identified according to the stored models, and the noise signal is further classified as tire noise, wind noise, or air-conditioning noise. If the difference between the speech-signal amplitude and the noise-signal amplitude is non-positive and the distance between the user and the mobile phone is smaller than a preset value, the user is already close to the phone, and speaking closer cannot solve the call-noise problem. In that case, the prompt can also name the source of the noise so that the user can address it directly: for example, if recognition shows that the air-conditioning noise amplitude is greater than the user's speech amplitude, the user is reminded that the air-conditioning noise is too loud and informed that the fan speed can be reduced; if the wind-noise amplitude is greater than the speech amplitude, the user is prompted that the wind noise is too loud and advised to reduce the vehicle speed. By identifying the noise source when moving closer to the phone cannot solve the problem, this embodiment reminds the user of the main source of the current noise so that a corresponding remedy can be adopted, helping the user reduce the call noise accurately and improving the user experience.
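The prompt selection described above can be sketched as a small decision function. The distance threshold, the tire-noise advice, and the exact message texts are illustrative assumptions; the patent only specifies the air-conditioning and wind-noise examples:

```python
def prompt_for_user(diff, distance_m, noise_amplitudes, near_threshold_m=0.5):
    """Pick the prompt for the non-positive-difference case.  When the
    user is already close to the phone, name the dominant noise source
    instead of asking them to move closer."""
    if diff > 0:
        return None  # voice louder than noise: no prompt needed
    if distance_m >= near_threshold_m:
        return "Please move closer to the microphone or speak louder."
    # User is already close: report the loudest identified noise source.
    dominant = max(noise_amplitudes, key=noise_amplitudes.get)
    advice = {
        "air_conditioning": "Air-conditioning noise is too loud; "
                            "try lowering the fan speed.",
        "wind": "Wind noise is too loud; try reducing the vehicle speed.",
        "tire": "Tire noise is too loud; try reducing the vehicle speed.",
    }
    return advice[dominant]
```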
Example four
Fig. 3 shows the structure of the speech signal output processing apparatus provided in this embodiment; for ease of description, only the parts related to this embodiment are shown.
In this embodiment, the apparatus is used to implement the method for processing the output of the voice signal in the embodiment of fig. 1, and may be a software unit, a hardware unit or a unit combining software and hardware that is built in the mobile terminal. The mobile terminal includes but is not limited to a smart phone, a tablet computer, a learning machine or a smart car device.
As shown in fig. 3, the speech signal output processing apparatus 3 includes:
an audio signal acquisition unit 301, configured to recognize a speech signal and a background noise signal from a sound signal acquired in real time;
an amplitude difference calculation unit 302, configured to obtain an amplitude difference between the speech signal and the background noise signal;
a processing unit 303, configured to perform noise reduction processing on the sound signal based on the amplitude difference.
Optionally, the apparatus for processing the output of the voice signal further includes:
a prompting unit, configured to output prompt information if the amplitude difference is a non-positive value, where the prompt information is used to prompt the user to move closer to the microphone or to speak louder.
Optionally, the processing unit further includes:
a first processing subunit, configured to: if the amplitude difference is a positive value, the amplitude difference is smaller than or equal to a second threshold, and the amplitude of the speech signal is greater than a third threshold, amplify the sound signal by a preset gain to obtain a first intermediate signal;
perform noise reduction processing on the first intermediate signal to obtain a second intermediate signal;
and attenuate the second intermediate signal according to the preset gain to obtain the noise-reduced sound signal.
Optionally, the processing unit further includes:
a second processing subunit, configured to perform noise reduction processing on the sound signal if the amplitude difference is a positive value and greater than or equal to a first threshold, where the intensity of the noise reduction processing is proportional to the amplitude difference.
Optionally, the apparatus for processing an output of a voice signal further includes:
a pre-processing unit for pre-filtering the sound signal before recognizing the speech signal and the background noise signal.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps in the above speech signal output processing method embodiments, such as steps 101 to 103 shown in fig. 1. Alternatively, when executing the computer program 42, the processor 40 implements the functions of the units in the above device embodiments, such as the functions of units 301 to 303 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 42 in the terminal device 4.
the terminal device 4 may be a computing device with a voice input/output function, such as a notebook, a palm computer, a mobile phone, a tablet computer, and a navigator. The terminal device may include, but is not limited to, aprocessor 40, amemory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 4 and does not constitute a limitation of terminal device 4 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or internal memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.