Howling suppression method, device and equipment
Technical Field
The embodiment of the invention relates to the technical field of audio processing, in particular to a howling suppression method, a howling suppression device and howling suppression equipment.
Background
At present, sound amplification circuits are widely used in people's work and life. For example, when someone speaks in a meeting venue or sings in a KTV room and the microphone and the loudspeaker are close to each other, the sound played by the loudspeaker is picked up by the microphone and then played again through the loudspeaker. The sound is superimposed repeatedly, forming a loop gain that destroys the stability of the loop system. A very harsh howling sound is then heard, which seriously degrades the user experience.
When this problem occurs, the simplest solution is to keep the microphone and the loudspeaker a certain distance apart or to point them in different directions. However, because of the limitations of the venue, howling still occurs frequently. How to suppress howling effectively has therefore become a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the invention provides a howling suppression method, a howling suppression device and howling suppression equipment, and aims to achieve the effect of effectively suppressing a howling part included in sound information.
In a first aspect, an embodiment of the present invention provides a method for suppressing howling, where the method includes:
acquiring sound input information;
performing framing processing on the sound input information to obtain at least one sound frame data;
performing frequency shift processing on the voice frame data to obtain frequency shift output information;
and carrying out self-adaptive filtering on the frequency shift output information and outputting howling suppression sound information.
Further, performing frequency shift processing on the voice frame data to obtain frequency shift output information, including:
acquiring single-sideband data of the sound frame data;
performing cosine modulation on the single-sideband data to obtain frequency shift output information;
the method for acquiring the single sideband data of the sound frame data comprises low-pass filtering and complex exponential modulation.
Further, adaptively filtering the frequency shifted output information comprises: and carrying out multi-subband self-adaptive filtering on the frequency shift output information.
Further, performing multi-subband adaptive filtering on the frequency shift output information, including:
performing down-sampling processing and sub-band division processing on the frequency shift output information, and performing adaptive filtering on at least two obtained sub-band data to obtain an adaptive filtering result;
and performing up-sampling processing and merging processing on the self-adaptive filtering result to obtain howling suppression sound information.
Further, down-sampling and dividing the frequency shift output information, and performing adaptive filtering on the obtained sub-band data to obtain an adaptive filtering result, including:
dividing the frequency shift output information into at least two sub-bands, performing down-sampling processing on the sub-bands, and performing adaptive filtering on the result of the down-sampling processing to obtain an adaptive filtering result; or,
and performing down-sampling processing on the frequency shift output information, dividing a down-sampling processing result into at least two sub-bands, and performing adaptive filtering on the sub-bands to obtain an adaptive filtering result.
Further, performing upsampling and merging on the adaptive filtering result to obtain howling suppression sound information, including:
performing up-sampling processing on the self-adaptive filtering result, synthesizing the obtained up-sampling processing result into howling suppression sound information and outputting the howling suppression sound information; or,
and synthesizing the self-adaptive filtering results of all sub-bands into information to be output, and performing up-sampling processing on the information to be output to obtain and output howling suppression sound information.
In a second aspect, an embodiment of the present invention further provides an apparatus for suppressing howling, where the apparatus includes:
the voice input information acquisition module is used for acquiring voice input information;
the framing processing module is used for framing the sound input information to obtain at least one sound frame data;
the frequency shift processing module is used for carrying out frequency shift processing on the sound frame data to obtain frequency shift output information;
and the self-adaptive filtering module is used for carrying out self-adaptive filtering on the frequency shift output information so as to inhibit the howling signal and output howling inhibition sound information.
Further, the frequency shift processing module includes:
a single sideband data acquisition unit for acquiring single sideband data of the sound frame data;
the frequency shift information output unit is used for carrying out cosine modulation on the single-sideband data to obtain frequency shift output information;
the method for acquiring the single sideband data of the sound frame data comprises low-pass filtering and complex exponential modulation.
Further, the adaptive filtering module is specifically configured to: and carrying out multi-subband self-adaptive filtering on the frequency shift output information.
The adaptive filtering module includes:
the adaptive filtering unit is used for carrying out downsampling processing and sub-band dividing processing on the frequency shift output information and carrying out adaptive filtering on at least two obtained sub-band data to obtain an adaptive filtering result;
and the howling suppression sound information obtaining unit is used for performing up-sampling processing and merging processing on the self-adaptive filtering result to obtain the howling suppression sound information.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned howling suppression method provided in the embodiment of the present invention.
In the embodiment of the invention, framing processing is performed on the sound input information to obtain one or more pieces of sound frame data, frequency shift processing is performed on the sound frame data, and the result is adaptively filtered to obtain and output howling suppression sound information. This solves the problem that howling, which occurs frequently in real life, gives users a very poor experience, and achieves effective suppression of the howling part of the sound information.
Drawings
Fig. 1 is a flowchart of a howling suppression method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a howling suppression method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a howling suppression device according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of a frame-by-frame frequency shift process according to a preferred embodiment of the present invention;
Fig. 6 is a schematic diagram of a multi-subband adaptive filtering process according to a preferred embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a howling suppression method according to a first embodiment of the present invention. This embodiment is applicable to situations in which howling may occur. The method may be performed by the howling suppression device provided by the embodiments of the present invention; the device may be implemented in software and/or hardware and may be integrated into a sound recording and playback device.
As shown in fig. 1, the howling suppression method includes:
S110, performing framing processing on the voice input information to obtain at least one piece of voice frame data.
The voice input information may be voice of a user speaking, or voice information recorded by a microphone or the like. The framing process may be framing for a fixed time period, and then encapsulating each frame of data to obtain at least one sound frame of data. Each frame of sound frame data may include the entire sound segment or a portion of the sound segment. In combination with an actual scene, the howling phenomenon may occur in a certain frame of voice frame data, and may also occur in a plurality of frames of voice frame data that are continuous or segmented.
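As an illustration of framing with a fixed frame length, the following is a minimal sketch. The function name `split_into_frames`, the parameter `frame_len`, and the zero-padding of a short final frame are assumptions made for this example; the text does not prescribe them.

```python
def split_into_frames(samples, frame_len):
    """Split samples into consecutive frames of frame_len samples each."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        # Assumption: a short final frame is zero-padded to full length.
        frame.extend([0.0] * (frame_len - len(frame)))
        frames.append(frame)
    return frames


# Five samples with a frame length of two give three frames.
frames = split_into_frames([0.1, 0.2, 0.3, 0.4, 0.5], frame_len=2)
print(frames)
```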
S120, performing frequency shift processing on the voice frame data to obtain frequency shift output information.
The frequency shift processing may be performed on all of the voice frame data or on only part of it, as required. For example, to increase the calculation speed, after the sound input information is divided into frames, the energy of each frame may be estimated to determine whether howling is likely in the current frame; frequency shift processing may then be performed only on the frames with a high probability of howling, or alternatively on all frames.
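The energy-based pre-selection of frames could be sketched as follows. The text does not say how the energy estimate is turned into a decision, so the fixed `threshold` and both function names are hypothetical choices for this example only.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def howling_candidates(frames, threshold):
    """Indices of frames whose energy exceeds the (hypothetical) threshold."""
    return [i for i, frame in enumerate(frames)
            if frame_energy(frame) > threshold]


frames = [[0.1, -0.1], [0.9, -0.9], [0.0, 0.0]]
# Only the loud middle frame is selected for frequency shifting.
print(howling_candidates(frames, threshold=0.5))
```

Frames that are not selected would simply bypass the frequency shift, which is where the saving in calculation comes from.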
Frequency shift processing means shifting, within a certain range, the frequencies at which howling is judged likely to exist, so as to suppress the howling preliminarily. In the embodiment of the present invention, the frequency shift is preferably a few hertz, so that the sound quality is not affected after the shift while a good howling suppression effect is still achieved; for example, the sound frequencies at which howling may occur are shifted by no more than 5 Hz.
In this embodiment of the present invention, preferably, the frequency shift processing performed on the voice frame data to obtain frequency shift output information includes: acquiring single-sideband data of the sound frame data; performing cosine modulation on the single-sideband data to obtain frequency shift output information; the method for acquiring the single sideband data of the sound frame data comprises low-pass filtering and complex exponential modulation.
The parameters of the low-pass filtering, the complex exponential modulation and the cosine modulation can be calculated in advance. Framing the sound input information in this way and then shifting the frequency frame by frame greatly reduces the amount of calculation compared with point-by-point frequency shifting, which increases the processing speed of the sound input information.
S130, performing adaptive filtering on the frequency shift output information and outputting howling suppression sound information.
After the frequency shift output information is obtained, adaptive filtering processing can be performed on the frequency shift output information, so that the effect of suppressing the howling phenomenon existing in the input sound information to the maximum extent is achieved.
In the embodiment of the present invention, preferably, adaptively filtering the frequency shift output information includes: performing multi-subband adaptive filtering on the frequency shift output information. The advantage is that adaptive filtering can be performed on the sound information in a plurality of sub-bands simultaneously, which increases the calculation speed, allows the sound to be output quickly, and reduces delay.
The sub-bands may be divided equally or unequally. The advantage of equal division is that the division function used during sub-band division can be unified and does not need to be set individually for each sub-band. The advantage of unequal division is that, according to the frequency range in which howling is likely to occur, the frequency bands of the whole sound bandwidth in which howling is unlikely can be widened or narrowed and then, as selected, either adaptively filtered or left unprocessed, which reduces the computational burden; meanwhile, the frequency band in which howling is likely can be divided into a plurality of sub-bands, which improves the precision of howling suppression.
In the embodiment of the invention, framing processing is performed on the sound input information to obtain one or more pieces of sound frame data, frequency shift processing is performed on the sound frame data, and the result is adaptively filtered to obtain and output howling suppression sound information. This solves the problem that howling, which occurs frequently in real life, gives users a very poor experience, and achieves effective suppression of the howling part of the sound information.
Example two
Fig. 2 is a flowchart of a howling suppression method according to a second embodiment of the present invention. On the basis of the above embodiment, this embodiment further optimizes the step of adaptively filtering the frequency shift output information and outputting howling suppression sound information; preferably, multi-subband adaptive filtering is performed on the frequency shift output information.
As shown in fig. 2, the howling suppression method includes:
S210, performing framing processing on the voice input information to obtain at least one piece of voice frame data.
S220, performing frequency shift processing on the voice frame data to obtain frequency shift output information.
S230, performing down-sampling processing and sub-band division processing on the frequency shift output information, and performing adaptive filtering on the at least two pieces of sub-band data obtained, to obtain an adaptive filtering result.
The down-sampling may be performed according to a preset parameter. For example, if the down-sampling factor is set to 4, one unit is retained as down-sampled data and the following three units are skipped. For example, 1 Hz of data is sampled every 3 Hz for further processing. The sub-band division may divide the data across the entire bandwidth of the frequency shift output information equally or unequally. For example, if the entire bandwidth is 8 kHz, it can be divided into one sub-band per 2 kHz, giving 4 sub-bands. The sub-band division can be implemented by an analysis filter bank.
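The arithmetic of this example (an 8 kHz bandwidth split into four equal 2 kHz sub-bands, and a down-sampling factor of 4) can be checked with a short sketch. The helper names are hypothetical, and the filtering itself is left out here.

```python
def equal_subband_edges(bandwidth_hz, num_bands):
    """(low, high) edge of each sub-band when the bandwidth is split equally."""
    width = bandwidth_hz / num_bands
    return [(i * width, (i + 1) * width) for i in range(num_bands)]


def kept_indices(num_samples, factor):
    """0-based indices of the samples retained by factor-fold decimation."""
    return list(range(0, num_samples, factor))


print(equal_subband_edges(8000, 4))   # four sub-bands, 2 kHz wide each
print(kept_indices(12, 4))            # one sample kept, three skipped
```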
And carrying out self-adaptive filtering on the obtained at least two sub-band data so as to obtain a self-adaptive filtering result.
In the embodiment of the present invention, preferably, the frequency shift output information is divided into at least two sub-bands, the sub-bands are down-sampled, and the result of the down-sampling is adaptively filtered to obtain a result of adaptive filtering; or, performing downsampling processing on the frequency shift output information, dividing a downsampling processing result into at least two sub-bands, and performing adaptive filtering on the sub-bands to obtain an adaptive filtering result. That is to say, according to the polyphase filter structure, the order of the two steps of sub-band division and downsampling processing can be reversed, and the calculation result is not affected.
S240, performing up-sampling processing and merging processing on the adaptive filtering result to obtain howling suppression sound information.
The upsampling process may be understood as an interpolation process, and the parameters thereof may correspond to the downsampling parameters, and preferably, after the interpolation, a low-pass filtering process may be performed to filter out the mirror image information. The combining process may be performed by a synthesis filter bank corresponding to the function of an analysis filter bank in the sub-band process, i.e., the processing results of a plurality of sub-bands are mutually synthesized into a sound signal of a full bandwidth.
In this embodiment of the present invention, preferably, performing up-sampling and merging on the adaptive filtering result to obtain howling suppression sound information includes: performing up-sampling processing on the adaptive filtering result, synthesizing the up-sampling results into howling suppression sound information, and outputting it; or synthesizing the adaptive filtering results of the sub-bands into information to be output, and performing up-sampling processing on the information to be output to obtain and output howling suppression sound information. The advantage of this arrangement is that the order of the up-sampling and merging steps is not restricted; according to the polyphase filter structure, if sub-band synthesis is performed first and up-sampling afterwards, the length of the synthesis sub-band filters can be shortened and the amount of calculation reduced.
Example three
Fig. 3 is a schematic structural diagram of a howling suppression device according to a third embodiment of the present invention. As shown in fig. 3, the howling suppression apparatus includes:
a framing processing module 310, configured to perform framing processing on the voice input information to obtain at least one voice frame data;
the frequency shift processing module 320 is configured to perform frequency shift processing on the voice frame data to obtain frequency shift output information;
the adaptive filtering module 330 is configured to perform adaptive filtering on the frequency shift output information to suppress the howling signal and output howling suppression sound information.
In the embodiment of the invention, framing processing is performed on the sound input information to obtain one or more pieces of sound frame data, frequency shift processing is performed on the sound frame data, and the result is adaptively filtered to obtain and output howling suppression sound information. This solves the problem that howling, which occurs frequently in real life, gives users a very poor experience, and achieves effective suppression of the howling part of the sound information.
On the basis of the foregoing embodiments, the frequency shift processing module 320 includes:
a single sideband data acquisition unit for acquiring single sideband data of the sound frame data;
the frequency shift information output unit is used for carrying out cosine modulation on the single-sideband data to obtain frequency shift output information;
the method for acquiring the single sideband data of the sound frame data comprises low-pass filtering and complex exponential modulation.
On the basis of the foregoing embodiments, the adaptive filtering module 330 is specifically configured to: and carrying out multi-subband self-adaptive filtering on the frequency shift output information.
The adaptive filtering module 330 includes:
the adaptive filtering unit is used for carrying out downsampling processing and sub-band dividing processing on the frequency shift output information and carrying out adaptive filtering on at least two obtained sub-band data to obtain an adaptive filtering result;
and the howling suppression sound information obtaining unit is used for performing up-sampling processing and merging processing on the self-adaptive filtering result to obtain the howling suppression sound information.
The product can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the present invention. The device 12 shown in fig. 4 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the present invention.
As shown in FIG. 4, device 12 is in the form of a general purpose computing device. The components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with device 12, and/or with any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown in FIG. 4, the network adapter 20 communicates with the other modules of the device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement the howling suppression method provided by the embodiment of the present invention.
PREFERRED EMBODIMENTS
In order to better explain the howling suppression method provided in the embodiment of the present invention and the specific implementation manner of each step, the preferred embodiment of the present invention is provided as an explanation, but the preferred embodiment of the present invention does not limit the specific implementation process.
S111, acquiring the sound input information and first performing framing processing on it. The specific manner of framing is conventional and is not described in detail here.
S112, frame-by-frame frequency shifting.
The common practice is to apply a Hilbert transform to the signal to convert it into a single-sideband signal, and then perform frequency-domain modulation of the signal with sine and cosine waves to realize the frequency shift. However, the Hilbert transform operates not on frame data but on the complete signal point by point, and it has to be recomputed for each new signal, which requires a large amount of calculation. Here, a filter-based approach is proposed to realize the frequency shift frame by frame with a small amount of calculation.
Frame-by-frame processing requires far less calculation than point-by-point processing. The single sideband of the frame data is obtained through pre-designed low-pass filtering and complex exponential modulation units, and the frequency-shifted signal is finally output through cosine modulation. The low-pass filtering unit, the complex exponential modulation unit and the cosine modulation unit can all be calculated in advance, which further reduces the amount of calculation considerably.
Fig. 5 is a schematic diagram of a frame-by-frame frequency shift process according to a preferred embodiment of the present invention, and as shown in fig. 5, the frame-by-frame frequency shift process includes:
S1121, acquiring the sound information frame data obtained by framing processing;
S1122, performing low-pass filtering processing on the frame data;
S1123, performing complex exponential modulation on the low-pass filtering result to obtain single-sideband data;
S1124, performing cosine modulation on the single-sideband data;
S1125, outputting the cosine-modulated signal.
Correspondingly, after the cosine-modulated signal is output, it serves as the input signal of the subsequent processing steps.
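One possible reading of the frame-by-frame frequency shift can be sketched as follows. This is an interpretation, not the patented implementation: it assumes that the low-pass prototype has a cut-off of a quarter of the sample rate, that the complex exponential modulation moves its passband to the positive frequencies so that filtering yields single-sideband data, and that the final modulation takes the real part of the shifted result. The names `one_sided_filter` and `FrameShifter`, the Hamming-windowed sinc design and the tap count are all assumptions of the example.

```python
import cmath
import math


def one_sided_filter(num_taps=101):
    """Low-pass prototype (cut-off fs/4) modulated by exp(j*pi*k/2).

    The modulation moves the passband to 0..fs/2, so filtering a real
    signal with these taps keeps only its positive-frequency sideband.
    """
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 0.5 if k == 0 else math.sin(math.pi * k / 2) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        # Factor 2 restores the amplitude lost by dropping one sideband.
        taps.append(2.0 * ideal * window * cmath.exp(1j * math.pi * k / 2))
    return taps


class FrameShifter:
    """Shifts successive frames by shift_hz, keeping state between frames."""

    def __init__(self, shift_hz, sample_rate, num_taps=101):
        self.taps = one_sided_filter(num_taps)   # computed once, in advance
        self.history = [0.0] * (num_taps - 1)    # filter memory across frames
        self.omega = 2 * math.pi * shift_hz / sample_rate
        self.index = 0                           # running sample count

    def process(self, frame):
        data = self.history + list(frame)
        last = len(self.taps) - 1
        out = []
        for i in range(len(frame)):
            # Single-sideband sample produced by the complex FIR filter.
            ssb = sum(tap * data[i + last - k]
                      for k, tap in enumerate(self.taps))
            # Real part of ssb * exp(j*omega*n), i.e.
            # Re(ssb) * cos(omega*n) - Im(ssb) * sin(omega*n).
            shifted = ssb * cmath.exp(1j * self.omega * self.index)
            out.append(shifted.real)
            self.index += 1
        self.history = data[len(frame):]
        return out


# Usage: shift a 200 Hz tone by 5 Hz in frames of 100 samples.
fs = 1000
tone = [math.cos(2 * math.pi * 200 * n / fs) for n in range(2000)]
shifter = FrameShifter(shift_hz=5.0, sample_rate=fs)
shifted_tone = []
for start in range(0, len(tone), 100):
    shifted_tone.extend(shifter.process(tone[start:start + 100]))
```

Because the filter taps are computed once and the filter memory and modulation phase are carried from frame to frame, the output is continuous across frame boundaries.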
S113, sub-band analysis filter bank.
The embodiment of the present invention preferably uses a QMF filter bank; decomposition into 4 sub-bands is taken as an example (decomposition into 2 sub-bands is also possible). The basic design idea is to first design a low-pass filter whose normalized cut-off frequency is the reciprocal of the number of sub-bands, and then modulate this low-pass filter to the different frequency bands in turn, thereby realizing the sub-band decomposition of the signal.
Assuming the prototype low-pass filter is $h_1[n]$, then based on the relationship among the sub-bands of the QMF filter bank, the filters of the sub-bands are, in order:
Sub-band 1 filter: $h_1[n]$
Subband 2 filter:
subband 3 filter:
subband 4 filter:
Assuming that the input signal is $x(n)$, each sub-band signal is:
$\mathrm{sub}(m,n) = x(n) * h_m[n]$
where $*$ denotes convolution and $n$ is the index of the sampling point, i.e. the sample at the n-th time.
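The relation sub(m,n) = x(n) * h_m[n] amounts to convolving the input with each sub-band filter. A minimal sketch is given below; the two filters passed in the usage lines are placeholders chosen only to make the arithmetic easy to follow and are not the QMF filters of the embodiment.

```python
def convolve_causal(x, h):
    """Causal convolution of x with h, truncated to the length of x."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * x[n - k]
        out.append(acc)
    return out


def analyze(x, filters):
    """Return one sub-band signal per filter: sub[m] = x * h_m."""
    return [convolve_causal(x, h) for h in filters]


# Placeholder filters: an identity and a two-tap average.
subbands = analyze([1.0, 2.0, 3.0], [[1.0], [0.5, 0.5]])
print(subbands)
```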
S114, down-sampling processing.
The m-th sub-band signal at time n is denoted $\mathrm{sub}(m,n)$.
Q-fold down-sampling is divided into two steps:
In the first step, the signal is passed through a low-pass filter to prevent aliasing of high-frequency components in the subsequent decimation. The normalized cut-off frequency of the low-pass filter is 1/Q, and the result is $\mathrm{sub}_f(m,n)$.
In the second step, one point is extracted and the following Q-1 points are skipped, so that the down-sampled signal $\mathrm{sub}_d(m,n)$ is:
$\mathrm{sub}_d(m,n) = \mathrm{sub}_f(m,\ 1 + Q(n-1))$.
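The two steps of Q-fold down-sampling might look like this in code. The sketch assumes that the cut-off 1/Q is normalized to the Nyquist frequency and uses a Hamming-windowed sinc filter with 31 taps; neither choice is stated in the text. In 0-based indexing, the 1-based relation sub_d(m,n) = sub_f(m, 1 + Q(n-1)) becomes "keep every Q-th filtered sample starting from the first".

```python
import math


def lowpass_taps(cutoff, num_taps=31):
    """Hamming-windowed sinc; cutoff is normalized so that 1.0 is Nyquist."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = cutoff if k == 0 else math.sin(math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at DC


def downsample(sub, q, num_taps=31):
    """Step 1: anti-alias low-pass filter. Step 2: keep every q-th sample."""
    taps = lowpass_taps(1.0 / q, num_taps)
    filtered = []
    for n in range(len(sub)):
        acc = 0.0
        for k in range(min(n + 1, num_taps)):
            acc += taps[k] * sub[n - k]
        filtered.append(acc)
    return filtered[::q]


sub_d = downsample([1.0] * 200, q=4)
```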
S115, adaptive filtering.
For the m-th sub-band, the down-sampled signal at time n is denoted $\mathrm{sub}_d(m,n)$, and the corresponding filtered sub-band signal is denoted $y(m,n)$. The filter is preferably an IIR filter, so the input vector $\mathrm{input}(m,n)$ is represented as:
$\mathrm{input}(m,n) = [\mathrm{sub}_d(m,n),\ \mathrm{sub}_d(m,n-1),\ \mathrm{sub}_d(m,n-2),\ y(m,n-1),\ y(m,n-2)]$
The filter coefficients are represented as:
$w = [1,\ b(m,n),\ 1,\ -a\,b(m,n),\ -a^2]^T$
The filtered sub-band output signal is then:
$y(m,n) = \mathrm{input}(m,n)\, w$
where the filter parameter $b(m,n)$ is updated as:
$b(m,n+1) = b(m,n) + \mu(m,n)\, y(m,n)\, g(m,n)$
$\mu(m,n) = 1 / (2\, y(m,n)^T y(m,n))$
$g(m,n) = [\mathrm{sub}_d(m,n-1),\ y(m,n-1)]\,[1,\ -a]^T$
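The recursion above can be transcribed almost literally for a single sub-band, as in the sketch below. The initial value `b0`, the example value of `a`, and the small constant `eps` that guards the division in the step size are assumptions added for the example; the sign and form of the update are taken as written, without any claim about convergence behaviour.

```python
def adaptive_subband_filter(sub_d, a, b0=0.0, eps=1e-8):
    """Second-order IIR filter with one adaptive coefficient b.

    y(n) = x(n) + b*x(n-1) + x(n-2) - a*b*y(n-1) - a^2*y(n-2)
    Returns the filtered samples and the final value of b.
    """
    b = b0
    x1 = x2 = y1 = y2 = 0.0          # delayed inputs and outputs
    out = []
    for xn in sub_d:
        y = xn + b * x1 + x2 - a * b * y1 - a * a * y2
        g = x1 - a * y1               # [x(n-1), y(n-1)] . [1, -a]^T
        # eps is an added safeguard against division by zero (assumption).
        mu = 1.0 / (2.0 * y * y + eps)
        b = b + mu * y * g            # update used for the next sample
        out.append(y)
        x2, x1 = x1, xn
        y2, y1 = y1, y
    return out, b


filtered, b_final = adaptive_subband_filter([1.0, 1.0], a=0.5)
```

With the two-sample input shown, both outputs equal 1.0 and the coefficient moves from 0 to approximately 0.25 after the second sample, which can be verified by hand from the equations.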
S116, up-sampling.
Let the signal to be up-sampled be $y(m,n)$, $n = 1, 2, \dots, M$.
The P-fold up-sampling process can also be divided into two steps:
In the first step, P-1 zeros are inserted between every two adjacent points of the signal, i.e.
for $n = 1, 2, \dots, M$,
$y_z(m,\ P(n-1)+1) = y(m,n)$
and at all other times $y_z(m,n) = 0$.
In the second step, low-pass filtering is performed to filter out the image components. $y_z(m,n)$ is convolved with a low-pass linear filter $h(n)$ to obtain $y_u(m,n) = y_z(m,n) * h(n)$, wherein
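A sketch of the two up-sampling steps follows. Because the definition of h(n) is not reproduced in this text, the example substitutes a Hamming-windowed sinc filter with a cut-off of 1/P of the Nyquist frequency and a DC gain of P; that filter design, the tap count and the function names are assumptions of the example.

```python
import math


def zero_stuff(y, p):
    """Insert p-1 zeros after each sample: y_z[p*i] = y[i] (0-based)."""
    out = [0.0] * (len(y) * p)
    out[::p] = y
    return out


def interpolation_taps(p, num_taps=63):
    """Assumed design: windowed sinc, cut-off 1/p of Nyquist, DC gain p."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 1.0 / p if k == 0 else math.sin(math.pi * k / p) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    scale = p / sum(taps)             # gain p compensates the inserted zeros
    return [t * scale for t in taps]


def upsample(y, p, num_taps=63):
    """Zero insertion followed by low-pass filtering of the images."""
    y_z = zero_stuff(y, p)
    h = interpolation_taps(p, num_taps)
    y_u = []
    for n in range(len(y_z)):
        acc = 0.0
        for k in range(min(n + 1, num_taps)):
            acc += h[k] * y_z[n - k]
        y_u.append(acc)
    return y_u


y_u = upsample([1.0] * 50, p=4)
```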
S117, merging processing.
From the relationship between the analysis filters and the synthesis filters, the synthesis filters are obtained:
$h_{s1}[n] = 4\,h_1[n]$
$h_{s2}[n] = -4\,h_2[n]$
$h_{s3}[n] = 4\,h_3[n]$
$h_{s4}[n] = -4\,h_4[n]$
The resultant output signal is then:
Fig. 6 is a schematic diagram of a multi-subband adaptive filtering process according to a preferred embodiment of the present invention, which may cover the data processing of S113 to S117. It should be noted that Fig. 6 shows only one multi-subband adaptive filtering process provided by an embodiment of the present invention and does not limit the multi-subband adaptive filtering process of the embodiments. After the data processing of S111 to S117, the howling suppression output signal is obtained.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.