Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The terms "first," "second," and the like in this specification are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable where appropriate, such that embodiments of the application may be practiced in sequences other than those illustrated and described herein, and that the objects distinguished by "first," "second," etc. are generally of one type.
Fig. 1 is a schematic flow chart of an audio signal separation method according to the present invention. As shown in Fig. 1, the method includes steps 110, 120, 130 and 140.
Step 110, obtaining frequency domain signals corresponding to audio signals of different channels of the audio device.
Specifically, frequency domain signals corresponding to audio signals of different channels of the audio device can be obtained. Here, the audio device may be a smart phone, a tablet computer, an area pickup, a surround pickup microphone, a three-dimensional sound pickup, or the like, which is not particularly limited in the embodiment of the present invention.
Accordingly, the audio signals of different channels of the audio device may be audio signals in stereo audio, multi-channel audio, three-dimensional audio, surround sound audio, virtual reality (Virtual Reality, VR) audio, or augmented reality (Augmented Reality, AR) audio, which is not particularly limited in the embodiment of the present invention.
It will be appreciated that, in the case where the audio corresponding to the audio device is stereo audio, the different channels of the audio device include two channels, i.e., a first channel and a second channel, for example, a left channel and a right channel. In the case where the audio corresponding to the audio device is multi-channel audio, the different channels of the audio device include a plurality of channels, which is not particularly limited in the embodiment of the present invention.
It should be noted that most audio signals in daily life are non-stationary signals, i.e., their frequency components may change with time. A time-domain representation of an audio signal cannot provide time and frequency information at the same time, and thus cannot effectively handle non-stationary signals. The short-time Fourier transform captures how the frequency characteristics of the audio signal change over time by dividing the audio signal into a plurality of short segments (windows) in the time domain and Fourier transforming each segment.
In the frequency domain, coherent acoustic signals and ambient acoustic signals typically have different spectral characteristics. For example, coherent acoustic signals may be concentrated in certain specific frequency ranges, while ambient acoustic signals may be distributed over a broader frequency range. By analyzing the frequency domain signal, these spectral differences can be exploited to separate the coherent acoustic signal from the ambient acoustic signal.
In summary, to better suit non-stationary signals and cases where a plurality of sound sources exist simultaneously, short-time Fourier transform (Short-Time Fourier Transform, STFT) processing is performed on the audio signal to obtain the frequency domain signal.
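The STFT processing described above can be illustrated with a minimal sketch. The frame length, hop size and Hann window below are assumptions chosen for illustration and are not specified by the embodiment; the function name `stft` is hypothetical, and a direct DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math

def stft(x, frame_len=8, hop=4):
    """Return a list of frames; each frame holds complex bins 0..frame_len//2."""
    # Periodic Hann window (an assumed, common choice of analysis window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + i] * window[i] for i in range(frame_len)]
        # Direct DFT of the windowed segment (O(N^2); an FFT would be used in practice).
        spectrum = [
            sum(segment[t] * cmath.exp(-2j * math.pi * l * t / frame_len)
                for t in range(frame_len))
            for l in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

# Toy input: a tone that falls exactly on frequency bin 2 of an 8-point frame.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
X = stft(tone)
# Index of the strongest bin in every frame.
peak_bins = [max(range(len(frame)), key=lambda l: abs(frame[l])) for frame in X]
```

Each entry `X[n][l]` plays the role of the frequency domain signal at time frame `n` and frequency bin `l`; the same routine would be applied to each channel separately.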
In the process of extracting the coherent acoustic signal and the ambient acoustic signal, the audio signal of each channel is generally represented as a superposition of the coherent acoustic signal and the ambient acoustic signal. Based on the characteristics of the coherent acoustic signal and the ambient acoustic signal, it is assumed that the coherent acoustic signals of the channels are completely correlated with each other, that the coherent acoustic signal is uncorrelated with the ambient acoustic signal of each channel, and that the ambient acoustic signals of different channels are uncorrelated with each other. Here, the case where the audio corresponding to the audio device is stereo audio is described, so that the different channels of the audio device include a first channel and a second channel, i.e., a left channel and a right channel. The audio signals $x_1(t)$ and $x_2(t)$ in the time domain are defined as follows:
$$x_1(t) = p(t) + a_1(t), \qquad x_2(t) = k\,p(t) + a_2(t) \tag{1}$$
Wherein, $x_1(t)$ represents the audio signal of the first channel of the audio device, $x_2(t)$ represents the audio signal of the second channel of the audio device, $p(t)$ represents the coherent acoustic signal, $a_1(t)$ represents the ambient sound signal of the first channel, $a_2(t)$ represents the ambient sound signal of the second channel, and $k$ represents the amplitude difference factor of the coherent acoustic signal between the first channel and the second channel.
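As a small illustration of the time-domain signal model, the sketch below builds the two channel signals from a coherent part and two ambient parts. The names `p`, `a1`, `a2`, `k` and `mix_stereo` are labels assumed here for the coherent signal, the two ambient signals, the amplitude difference factor and the helper function; the sample values are arbitrary.

```python
def mix_stereo(p, a1, a2, k):
    """Build the two channel signals from a coherent part p and ambient parts a1, a2."""
    x1 = [p_t + a_t for p_t, a_t in zip(p, a1)]      # first channel:  p + a1
    x2 = [k * p_t + a_t for p_t, a_t in zip(p, a2)]  # second channel: k*p + a2
    return x1, x2

# Arbitrary toy samples (hypothetical values, for illustration only).
p = [1.0, -1.0, 0.5]
a1 = [0.1, 0.0, -0.1]
a2 = [-0.1, 0.2, 0.0]
x1, x2 = mix_stereo(p, a1, a2, k=2.0)
```

The separation method works in the opposite direction: only `x1` and `x2` are observed, and the remaining quantities have to be estimated.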
Step 120, decomposing the frequency domain signal to obtain a coherent acoustic signal and an ambient acoustic signal of each channel.
Specifically, after the frequency domain signal is obtained, the frequency domain signal may be decomposed, and the coherent acoustic signal and the ambient acoustic signal of each channel may be obtained.
Here, the case where the audio corresponding to the audio device is stereo audio is described, so that the different channels of the audio device include a first channel and a second channel. The audio signal is then represented as a frequency domain signal in the Fourier transform domain, and the formula of the frequency domain signal is:
$$X_1(n,l) = P(n,l) + A_1(n,l), \qquad X_2(n,l) = k\,P(n,l) + A_2(n,l) \tag{2}$$
Wherein, $X_1(n,l)$ represents the frequency domain signal of the first channel, $X_2(n,l)$ represents the frequency domain signal of the second channel, $P(n,l)$ represents the coherent acoustic signal in the frequency domain, $A_1(n,l)$ represents the ambient sound signal of the first channel in the frequency domain, $A_2(n,l)$ represents the ambient sound signal of the second channel in the frequency domain, $n$ represents the time frame index, $l$ represents the frequency bin index, and $k$ represents the amplitude difference factor of the coherent acoustic signal between the first channel and the second channel. For brevity, the indices $(n,l)$ may be omitted where necessary.
It will be appreciated that stereo mainly comprises two components of different properties, one of which is a sound component with directionality, called coherent sound, and the other of which is a sound component with diffusivity, from which directions cannot be distinguished, called ambient sound. The coherent sound may be a character dialogue, solo of a musical instrument, etc., and the ambient sound may be a background sound effect, such as traffic sound, wind sound, rain sound, etc., which is not particularly limited in the embodiment of the present invention.
It should be noted that, when the audio corresponding to the audio device is multi-channel audio, the ambient sound signal mostly appears in the rear left channel and the rear right channel of a 5.1-channel audio system, or in the side left channel and the side right channel of a 7.1-channel audio system, while the coherent sound signal mostly appears in the front left channel, the front right channel and the center channel. The formulas of the corresponding audio signal and frequency domain signal are similar and are not repeated here.
In addition, in the case that the audio corresponding to the audio device is three-dimensional audio, surround sound audio, virtual reality audio and augmented reality audio, formulas of the corresponding audio signal and the frequency domain signal are similar, and are not repeated here.
Step 130, determining the target ambient sound signal of each channel according to the phase difference between the ambient sound signals of the different channels under the condition that the energy corresponding to the ambient sound signals of the different channels is the same.
Specifically, in the embodiment of the invention, it is first assumed that each of the different channels is composed of a coherent acoustic signal and an ambient acoustic signal, and that the energy of the ambient acoustic signals in the different channels is the same, so that the ambient acoustic signals of the different channels differ only in phase. Based on this, an amplitude expression of the ambient acoustic signal can be obtained. The key of the method is solving the phase of the ambient sound signal; the optimal ambient sound phase is solved by applying a sparsity constraint on the audio, so that the coherent sound signal and the ambient sound signal are separated. The method has a small amount of calculation and a good separation effect.
Accordingly, in the case where the audio corresponding to the audio device is stereo audio, it is first assumed that the first channel and the second channel are composed of coherent sound signals and ambient sound signals, respectively, and that the energies of the ambient sound signals in the first channel and the second channel are identical, so that only a difference in phase exists between the ambient sound signals of the first channel and the second channel, and thus an amplitude expression of the ambient sound signals can be found.
In summary, the general principle is to determine the target ambient sound signal of each channel according to the phase difference between the ambient sound signals of different channels under the condition that the energy corresponding to the ambient sound signals of different channels is the same. Here, the target ambient sound signal refers to an ambient sound signal that eventually needs to be subjected to audio signal separation.
Step 140, separating the target ambient sound signal of the corresponding channel from the audio signal of each channel, and acquiring the target coherent sound signal of the channel.
Specifically, after the target ambient sound signal of each channel is obtained, the target ambient sound signal of the corresponding channel may be separated from the audio signal of each channel to obtain the target coherent sound signal of that channel. For example, the target ambient sound signal of the corresponding channel may be subtracted from the audio signal of each channel to obtain the target coherent sound signal of that channel. Here, the target coherent acoustic signal is the final coherent acoustic signal.
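The subtraction mentioned as an example of step 140 can be sketched as follows, assuming that both the channel signal and its target ambient sound signal are available as frequency domain frames of equal shape. The function name `separate_coherent` and the toy values are hypothetical.

```python
def separate_coherent(X, A_target):
    """Subtract the target ambient signal from the channel signal, bin by bin.

    X and A_target are lists of frames; each frame is a list of complex bins.
    """
    return [
        [x_bin - a_bin for x_bin, a_bin in zip(x_frame, a_frame)]
        for x_frame, a_frame in zip(X, A_target)
    ]

# Toy first-channel spectrum and an assumed target ambient estimate.
X1 = [[1 + 1j, 2 + 0j], [0.5j, -1 + 0j]]
A1_target = [[0.25 + 1j, 0.5 + 0j], [0.5j, 0j]]
P1_target = separate_coherent(X1, A1_target)
```

The same call would be repeated for the second channel with its own target ambient sound signal.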
It can be understood that the separated target coherent acoustic signal and target ambient acoustic signal can undergo secondary processing by a sound mixer or a multi-channel algorithm to create an auditory atmosphere with a stronger sense of surround and presence, and achieve a better auditory effect.
Separating the coherent sound signal from the ambient sound signal has the following benefits:
1. Enhanced immersion: by accurately localizing the coherent sound source and finely adjusting the ambient sound, the user feels as if present in the scene.
2. Improved sound quality: the clarity and fidelity of the sound are improved.
3. Adaptation to personalized requirements: personalized customization is carried out according to the preferences of different users.
It can be appreciated that the method provided by the embodiment of the invention can be applied to multi-channel audio: the target coherent sound signal and the target ambient sound signal undergo secondary processing by a sound mixer or a multi-channel algorithm, so that a more immersive audio experience can be provided in a home theater or a movie theater. Applied to surround sound audio, the audio signal separation method can help improve the playback effect of surround sound. Applied to three-dimensional audio, it can further provide more accurate sound localization and spatial sense. Applied to virtual reality and augmented reality audio, it can help achieve a more dynamic and realistic audio experience, since these immersive technologies require highly realistic spatial audio effects.
According to the method provided by the embodiment of the invention, the frequency domain signals respectively corresponding to the audio signals of different channels of the audio device are obtained, and the frequency domain signals are decomposed to obtain the coherent sound signal and the ambient sound signal of each channel. Then, under the condition that the energy corresponding to the ambient sound signals of different channels is the same, the target ambient sound signal of each channel is determined according to the phase difference between the ambient sound signals of different channels. Finally, the target ambient sound signal of the corresponding channel is separated from the audio signal of each channel to obtain the target coherent sound signal of that channel. The separated target coherent sound signals and target ambient sound signals can then undergo secondary processing by a sound mixer or a multi-channel algorithm, so as to create an auditory atmosphere with a sense of surround and presence and improve the hearing experience of the user.
Based on the above embodiment, the different channels are a first channel and a second channel;
step 130 includes:
step 131, determining a first target ambient sound signal and a second target ambient sound signal according to a phase difference between the first ambient sound signal and the second ambient sound signal under the condition that the first short-time energy of the first ambient sound signal in the first channel is the same as the second short-time energy of the second ambient sound signal in the second channel.
Specifically, in the case where the audio corresponding to the audio device is stereo audio, the different channels of the audio device include a first channel, i.e., a left channel, and a second channel, i.e., a right channel.
And under the condition that the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same, determining the first target ambient sound signal and the second target ambient sound signal according to the phase difference between the first ambient sound signal and the second ambient sound signal.
Here, the first target ambient sound signal is the ambient sound signal finally determined in the first channel, and the second target ambient sound signal is the ambient sound signal finally determined in the second channel.
Assume that the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel in equation (2) are the same, both being denoted $E_A$. In connection with the relevant assumptions, equation (2) can be expressed as:
$$E_{X_1} = E_P + E_A, \qquad E_{X_2} = k^2 E_P + E_A \tag{3}$$
Wherein, $E_{X_1}$ represents the short-time energy of the frequency domain signal of the first channel, $E_{X_2}$ represents the short-time energy of the frequency domain signal of the second channel, the first short-time energy and the second short-time energy are both represented by $E_A$, $E_P$ represents the short-time energy of the coherent acoustic signal, and $k$ represents the amplitude difference factor.
Wherein, $E_{X_1} = \mathrm{E}\{|X_1|^2\}$, and $\mathrm{E}\{\cdot\}$ represents a short-time average. The short-time energy $E_{X_2}$ is obtained from the frequency domain signal of the second channel by the same method.
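A minimal sketch of the short-time energy computation follows. It assumes that the short-time average is a plain mean of the squared magnitude over the frames of a short block, taken separately for each frequency bin; the exact averaging used by the embodiment is not stated, and the function name `short_time_energy` is hypothetical.

```python
def short_time_energy(X):
    """Mean of |X(n, l)|^2 over the frames n, for every frequency bin l.

    X is a list of frames; each frame is a list of complex bins.
    """
    n_frames = len(X)
    n_bins = len(X[0])
    return [sum(abs(frame[l]) ** 2 for frame in X) / n_frames for l in range(n_bins)]

# Two toy frames with two bins each (hypothetical values).
X1 = [[1 + 1j, 2 + 0j], [1 - 1j, 0j]]
E_X1 = short_time_energy(X1)
```

Applying the same function to the second-channel frames gives the short-time energy of the second frequency domain signal.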
It can be appreciated that, in the case where the frequency domain signal $X_1$ of the first channel and the frequency domain signal $X_2$ of the second channel are known, the extraction of the coherent sound signal and the ambient sound signal can be completed by estimating the parameters $P$, $A_1$, $A_2$ and $k$.
In the embodiment of the invention, the constraint relation of the phases of the ambient sound signals is explored through the signal model and the assumption of the formula (1), and the phases of the ambient sound signals are estimated by utilizing the sparsity of the coherent sound signals, so that the extraction of the coherent sound components is completed.
Based on the above embodiment, in the case that the first short-time energy of the first ambient sound signal in the first channel and the second short-time energy of the second ambient sound signal in the second channel are the same in step 131, determining the first target ambient sound signal and the second target ambient sound signal according to the phase difference between the first ambient sound signal and the second ambient sound signal includes:
step 1311, determining the first target ambient sound signal and the second target ambient sound signal according to a phase angle difference between the first ambient sound signal and the second ambient sound signal, in case the first ambient sound signal and the second ambient sound signal have the same amplitude.
Specifically, in the case where the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, the first target ambient sound signal and the second target ambient sound signal are determined in accordance with the phase angle difference between the first ambient sound signal and the second ambient sound signal.
In particular, assuming that the ambient sound signal has the same energy in each channel, the amplitude $|A|$ of the ambient sound signal can be assumed to be the same in both channels:
$$A_1 = |A|\,W_1, \qquad A_2 = |A|\,W_2 \tag{4}$$
Wherein,
$$W_1 = e^{j\theta_1}, \qquad W_2 = e^{j\theta_2} \tag{5}$$
In the formula, $\theta_1$ represents the first phase angle of the first ambient sound signal in the first channel, and $\theta_2$ represents the second phase angle of the second ambient sound signal in the second channel.
Substituting formula (4) into formula (2), the coherent term cancels in $X_2 - kX_1 = A_2 - kA_1$, which gives:
$$|A| = \frac{X_2 - kX_1}{W_2 - kW_1} \tag{6}$$
Since $|A|$ is real and non-negative, the following constraint exists between the first phase angle and the second phase angle:
$$\sin(\theta_2 - \theta) = k\,\sin(\theta_1 - \theta), \qquad \cos(\theta_2 - \theta) - k\,\cos(\theta_1 - \theta) \ge 0 \tag{7}$$
Wherein, $\theta$ represents the phase angle corresponding to $X_2 - kX_1$, and $k$ represents the amplitude difference factor.
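The constraint between the two ambient phase angles can be checked numerically. The sketch below assumes the constraint of equation (7) takes the form sin(θ2 − θ) = k·sin(θ1 − θ) together with cos(θ2 − θ) − k·cos(θ1 − θ) ≥ 0, where θ is the phase angle of X2 − k·X1; this form follows from requiring the common ambient amplitude to be real and non-negative, and the function name and test values are hypothetical.

```python
import cmath
import math

def ambient_phase_residuals(amp, theta1, theta2, k, P):
    """Synthesize one time-frequency bin and evaluate both sides of the constraint."""
    X1 = P + amp * cmath.exp(1j * theta1)       # first channel bin
    X2 = k * P + amp * cmath.exp(1j * theta2)   # second channel bin
    theta = cmath.phase(X2 - k * X1)            # phase angle of X2 - k*X1
    # Imaginary-part condition: should vanish up to rounding error.
    sine_residual = math.sin(theta2 - theta) - k * math.sin(theta1 - theta)
    # Real-part condition: should be non-negative.
    real_part = math.cos(theta2 - theta) - k * math.cos(theta1 - theta)
    return sine_residual, real_part

# Arbitrary ambient amplitude, phases, panning factor and coherent component.
sine_residual, real_part = ambient_phase_residuals(
    amp=0.7, theta1=0.4, theta2=-1.1, k=1.8, P=0.5 - 0.2j
)
```

Whatever coherent component `P` is chosen, it cancels in X2 − k·X1, so the residual depends only on the two ambient phases and on `k`.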
It will be appreciated that, in the case where the first ambient sound signal and the second ambient sound signal have the same amplitude, once the first target phase angle $\theta_1^{*}$ is determined, the second target phase angle $\theta_2^{*}$ can be determined based on the phase angle difference between the first ambient sound signal and the second ambient sound signal. Once the first and second target phase angles are determined, the first and second target ambient sound signals can be determined based on the first and second target phase angles, respectively.
That is, in equation (2), the frequency domain signal $X_1$ of the first channel, the frequency domain signal $X_2$ of the second channel and the amplitude difference factor $k$ have all been determined, so that once $\theta_1$ and $\theta_2$ are solved, the extraction of the coherent acoustic signal and the ambient acoustic signal can be accomplished.
Based on the above embodiments, Fig. 2 is a schematic flow chart of the step of determining a phase angle difference according to the present invention. As shown in Fig. 2, the step of determining a phase angle difference includes:
step 210, obtaining a cross-correlation coefficient of a first frequency domain signal and a second frequency domain signal, wherein the first frequency domain signal is a frequency domain signal corresponding to an audio signal of the first channel, and the second frequency domain signal is a frequency domain signal corresponding to an audio signal of the second channel;
Step 220 of determining an amplitude difference factor based on the cross-correlation coefficient, the first short time energy of the first frequency domain signal and the second short time energy of the second frequency domain signal;
step 230, determining the phase angle difference based on the amplitude difference factor.
Specifically, first, cross-correlation coefficients of a first frequency-domain signal and a second frequency-domain signal are obtained, wherein the first frequency-domain signal is a frequency-domain signal corresponding to an audio signal of a first channel, and the second frequency-domain signal is a frequency-domain signal corresponding to an audio signal of a second channel.
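The formula of the cross-correlation coefficient is as follows:
$$r_{12} = \mathrm{E}\{X_1 X_2^{*}\} \tag{8}$$
Substituting formula (2) into formula (8) and combining the assumptions on the signal model yields:
$$r_{12} = k\,E_P \tag{9}$$
Since, in formula (3) and formula (9), $E_{X_1}$, $E_{X_2}$ and $r_{12}$ are functions of $E_P$, $E_A$ and $k$, it is possible to solve for $k$, $E_P$ and $E_A$:
$$k = c + \sqrt{c^2 + 1}, \qquad E_P = \frac{r_{12}}{k}, \qquad E_A = E_{X_1} - \frac{r_{12}}{k}$$
Wherein,
$$c = \frac{E_{X_2} - E_{X_1}}{2\,r_{12}}$$
It follows that the amplitude difference factor $k$ can be determined based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal and the second short-time energy of the second frequency domain signal.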
Based on equation (7), the phase angle difference can be determined based on the amplitude difference factor.
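The determination of the amplitude difference factor can be sketched as follows. The sketch assumes the energy model E_X1 = E_P + E_A, E_X2 = k²·E_P + E_A and a cross-correlation r12 = k·E_P, which lead to the quadratic k² − 2c·k − 1 = 0 with c = (E_X2 − E_X1) / (2·r12); the positive root is taken, which presumes a positive cross-correlation. The function name `estimate_amplitude_factor` is hypothetical.

```python
import math

def estimate_amplitude_factor(E_x1, E_x2, r12):
    """Solve the assumed energy model for k, E_P and E_A (requires r12 != 0)."""
    c = (E_x2 - E_x1) / (2.0 * r12)
    k = c + math.sqrt(c * c + 1.0)   # positive root of k^2 - 2*c*k - 1 = 0
    E_p = r12 / k                    # short-time energy of the coherent signal
    E_a = E_x1 - E_p                 # common short-time energy of the ambient signals
    return k, E_p, E_a

# Synthetic check: E_P = 2, E_A = 1 and k = 1.5 give these three statistics.
k_hat, E_p_hat, E_a_hat = estimate_amplitude_factor(E_x1=3.0, E_x2=5.5, r12=3.0)
```

With estimated statistics instead of exact ones, `E_a` can come out slightly negative, so a practical implementation would likely clamp it at zero.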
Based on the above embodiment, step 1311 includes:
Determining the first target ambient sound signal according to a first target phase angle of the first ambient sound signal, determining a second target phase angle of the second ambient sound signal based on the first target phase angle and the phase angle difference, and determining the second target ambient sound signal based on the second target phase angle;
or, determining the second target ambient sound signal based on a second target phase angle of the second ambient sound signal, and determining a first target phase angle of the first ambient sound signal based on the second target phase angle and the phase angle difference, and determining the first target ambient sound signal based on the first target phase angle.
Specifically, since in equation (2) the frequency domain signal $X_1$ of the first channel, the frequency domain signal $X_2$ of the second channel and the amplitude difference factor $k$ have all been determined, the extraction of the coherent acoustic signal and the ambient acoustic signal can be accomplished once $\theta_1$ and $\theta_2$ are solved.
That is, the first target ambient sound signal $A_1^{*}$ can be determined according to the first target phase angle $\theta_1^{*}$ of the first ambient sound signal, the second target phase angle $\theta_2^{*}$ of the second ambient sound signal can be determined based on the first target phase angle $\theta_1^{*}$, and the second target ambient sound signal $A_2^{*}$ can be determined based on the second target phase angle $\theta_2^{*}$.
Or, the second target ambient sound signal $A_2^{*}$ can be determined based on the second target phase angle $\theta_2^{*}$ of the second ambient sound signal, the first target phase angle $\theta_1^{*}$ of the first ambient sound signal can be determined based on the second target phase angle $\theta_2^{*}$, and the first target ambient sound signal $A_1^{*}$ can be determined based on the first target phase angle $\theta_1^{*}$.
Based on the above embodiment, Fig. 3 is a schematic flow chart of the step of determining the first target phase angle provided by the present invention. As shown in Fig. 3, the step of determining the first target phase angle includes:
Step 310, constructing a first target constraint between the first short-time energy and a first phase angle of the first ambient sound signal with the first short-time energy of the first ambient sound signal as a target;
Step 320, determining the first target phase angle based on the set of discrete values of the first phase angle and the first target constraint, the first target phase angle minimizing a component amplitude of a first coherent acoustic signal in the first frequency domain signal.
Specifically, coherent acoustic signals are sparse, a property that is widely exploited for many audio and music signals. Thus, the sparsity of the coherent acoustic signal may be used to determine the phase angle of one ambient acoustic signal. Namely, a first target constraint between the first short-time energy and the first phase angle of the first ambient sound signal is constructed by taking the first short-time energy of the first ambient sound signal as a target, and the first target constraint is given by the following formula:
(10)
Wherein, $\theta_1$ represents the first phase angle, and $E_A$ represents the first short-time energy.
The objective function in equation (10) is non-convex, so embodiments of the present invention use a discrete optimization method to determine $\theta_1$. Since $\theta_1$ lies within the range $(-\pi, \pi]$, the set of discrete values of the first phase angle is defined as:
$$\theta_1(d) = \left(\frac{2d}{D} - 1\right)\pi$$
Wherein, $d \in \{1, 2, \dots, D\}$, and $D$ is the total number of discrete phase values, which is a preset value in the embodiment of the invention. The first phase angle which minimizes the component amplitude of the first coherent acoustic signal in the first frequency domain signal is selected from the $D$ discrete phase values as the first target phase angle.
Based on the above embodiment, Fig. 4 is a schematic flow chart of the step of determining the second target phase angle provided by the present invention. As shown in Fig. 4, the step of determining the second target phase angle includes:
step 410, constructing a second target constraint between the second short-time energy and a second phase angle of the second ambient sound signal with the second short-time energy of the second ambient sound signal as a target;
Step 420, determining the second target phase angle based on the set of discrete values of the second phase angle and the second target constraint, wherein the second target phase angle minimizes the component amplitude of the second coherent acoustic signal in the second frequency domain signal.
Specifically, based on the above description, the sparsity of the coherent acoustic signal may be utilized to determine the phase angle of one ambient acoustic signal. Namely, a second target constraint between the second short-time energy and the second phase angle of the second ambient sound signal is constructed by taking the minimum second short-time energy of the second ambient sound signal as a target, and the second target constraint is given by the following formula:
(11)
Wherein, $\theta_2$ represents the second phase angle, and $E_A$ represents the second short-time energy.
The objective function in equation (11) is non-convex, so embodiments of the present invention use a discrete optimization method to determine $\theta_2$. Since $\theta_2$ lies within the range $(-\pi, \pi]$, the set of discrete values of the second phase angle is defined as:
$$\theta_2(d) = \left(\frac{2d}{D} - 1\right)\pi$$
Wherein, $d \in \{1, 2, \dots, D\}$, and $D$ is the total number of discrete phase values, which is a preset value in the embodiment of the invention. The second phase angle which minimizes the component amplitude of the second coherent acoustic signal in the second frequency domain signal is selected from the $D$ discrete phase values as the second target phase angle.
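The discrete search over the second phase angle can be sketched for a single time-frequency bin as follows. Several points are assumptions made for illustration and are not stated in the text: the candidate grid is uniform over (−π, π] with D = 64 values; the amplitude difference factor satisfies k ≥ 1, so that the phase constraint can be solved in closed form as θ1 = θ + π + arcsin(sin(θ − θ2) / k), with θ the phase angle of X2 − k·X1; and the candidate whose coherent component has the smallest magnitude is selected. The function name `search_ambient_phase` is hypothetical.

```python
import cmath
import math

def search_ambient_phase(X1, X2, k, D=64):
    """Return (P, A1, A2) for one time-frequency bin, assuming k >= 1."""
    diff = X2 - k * X1                   # equals A2 - k*A1 under the signal model
    theta = cmath.phase(diff)
    best = None
    for d in range(1, D + 1):
        theta2 = (2 * d / D - 1) * math.pi              # candidate second phase angle
        # First phase angle implied by the constraint (assumed closed form, k >= 1).
        theta1 = theta + math.pi + math.asin(math.sin(theta - theta2) / k)
        W1, W2 = cmath.exp(1j * theta1), cmath.exp(1j * theta2)
        denom = W2 - k * W1
        if abs(denom) < 1e-12:           # degenerate candidate (can occur when k == 1)
            continue
        P = (X1 * W2 - X2 * W1) / denom  # coherent component for this candidate
        if best is None or abs(P) < abs(best[0]):
            amp = abs(diff / denom)      # common ambient amplitude
            best = (P, amp * W1, amp * W2)
    return best

# Purely ambient toy bin whose second phase (pi/2) lies on the candidate grid.
X1_bin = cmath.exp(0.3j)
X2_bin = cmath.exp(0.5j * math.pi)
P_hat, A1_hat, A2_hat = search_ambient_phase(X1_bin, X2_bin, k=2.0)
```

Every candidate reproduces the observed bins exactly (X1 = P + A1 and X2 = k·P + A2), so the search only decides how the observed energy is split between the coherent and ambient parts. For k < 1 the roles of the two channels would be swapped.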
Based on any of the above embodiments, Fig. 5 is a second flow chart of an audio signal separation method according to the present invention. As shown in Fig. 5, the method includes:
First, audio signals of different channels of the audio device are obtained, and a short-time Fourier transform is performed on the audio signals of the different channels to obtain the frequency domain signals respectively corresponding to the audio signals of the different channels.
Second, the frequency domain signals are decomposed to obtain the coherent acoustic signal and the ambient acoustic signal of each channel.
Third, the different channels being a first channel and a second channel, the cross-correlation coefficient of a first frequency domain signal and a second frequency domain signal is obtained, wherein the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel.
Fourth, an amplitude difference factor is determined based on the cross-correlation coefficient, the first short-time energy of the first frequency domain signal, and the second short-time energy of the second frequency domain signal.
Fifth, the phase angle difference is determined based on the amplitude difference factor.
Sixth, in the case where the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, a first target constraint between the first short-time energy and a first phase angle of the first ambient sound signal is constructed by taking the first short-time energy of the first ambient sound signal as a target, and a first target phase angle is determined based on a discrete value set of the first phase angle and the first target constraint, wherein the first target phase angle minimizes the component amplitude of the first coherent sound signal in the first frequency domain signal.
Or, in the case where the amplitudes of the first ambient sound signal and the second ambient sound signal are the same, a second target constraint between the second short-time energy and a second phase angle of the second ambient sound signal is constructed by taking the minimum second short-time energy of the second ambient sound signal as a target, and a second target phase angle is determined based on a discrete value set of the second phase angle and the second target constraint, wherein the second target phase angle minimizes the component amplitude of the second coherent sound signal in the second frequency domain signal.
In summary, the phase of the ambient acoustic signal is determined by using the sparsity of the coherent acoustic signal, and the phase value that minimizes the component amplitude of the coherent acoustic signal is selected.
Seventh, the first target ambient sound signal is determined according to the first target phase angle of the first ambient sound signal, the second target phase angle of the second ambient sound signal is determined based on the first target phase angle and the phase angle difference, and the second target ambient sound signal is determined based on the second target phase angle;
Or, the second target ambient sound signal is determined based on the second target phase angle of the second ambient sound signal, the first target phase angle of the first ambient sound signal is determined based on the second target phase angle and the phase angle difference, and the first target ambient sound signal is determined based on the first target phase angle.
Eighth, the first target ambient sound signal is separated from the audio signal of the first channel to obtain the target coherent sound signal of the first channel, and the second target ambient sound signal is separated from the audio signal of the second channel to obtain the target coherent sound signal of the second channel.
The method provided by the embodiment of the invention first assumes that the first channel and the second channel are each composed of a coherent acoustic signal and an ambient acoustic signal, and that the energies of the ambient acoustic signals of the first channel and the second channel are the same, so that the first ambient acoustic signal and the second ambient acoustic signal differ only in phase. The key of the method is to solve the phase of the ambient acoustic signal, and the optimal ambient acoustic phase is solved by using a sparsity constraint on the audio. Because a constraint relationship exists between the phases of the ambient acoustic signals of the first channel and the second channel, only one of the phases needs to be solved, and the other can then be determined. The method has a small amount of calculation and a good separation effect.
The audio signal separation device provided by the invention is described below, and the audio signal separation device described below and the audio signal separation method described above can be referred to correspondingly.
Based on any of the above embodiments, the present invention provides an audio signal separation device. Fig. 6 is a schematic structural diagram of the audio signal separation device provided by the present invention. As shown in Fig. 6, the device includes:
a first obtaining unit 610, configured to obtain frequency domain signals corresponding to audio signals of different channels of an audio device, respectively;
a second obtaining unit 620, configured to decompose the frequency domain signals and obtain a coherent sound signal and an ambient sound signal of each channel;
a determining unit 630, configured to determine, when the energies corresponding to the ambient sound signals of the different channels are the same, a target ambient sound signal of each channel according to a phase difference between the ambient sound signals of the different channels; and
a separation unit 640, configured to separate the target ambient sound signal of each channel from the audio signal of the corresponding channel and obtain a target coherent sound signal of that channel.
The device provided by the embodiment of the invention obtains the frequency domain signals corresponding to the audio signals of different channels of the audio device, and decomposes the frequency domain signals to obtain the coherent sound signal and the ambient sound signal of each channel. Then, under the condition that the energies corresponding to the ambient sound signals of the different channels are the same, the target ambient sound signal of each channel is determined according to the phase difference between the ambient sound signals of the different channels. Finally, the target ambient sound signal of each channel is separated from the audio signal of the corresponding channel to obtain the target coherent sound signal of that channel. A mixer or a multi-channel algorithm can subsequently perform secondary processing on the separated target coherent sound signals and target ambient sound signals, creating a listening atmosphere with a sense of envelopment and presence and improving the listening experience of the user.
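The role of the first obtaining unit can be sketched as follows. The specification does not fix a particular transform in this passage, so the sketch assumes a short-time Fourier transform with a Hann window; the function names `frame_spectrum` and `channel_spectra`, the frame length and the hop size are all hypothetical.

```python
import cmath
import math


def frame_spectrum(frame):
    """Return the DFT of one Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    # Periodic Hann window, an assumed choice for this illustration.
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def channel_spectra(samples, frame_len=8, hop=4):
    """Split one channel into overlapping frames and transform each frame."""
    return [frame_spectrum(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


# Hypothetical two-channel input: each channel is transformed independently.
first_samples = [math.sin(2.0 * math.pi * i / 8) for i in range(16)]
second_samples = [math.cos(2.0 * math.pi * i / 8) for i in range(16)]
first_frequency_domain = channel_spectra(first_samples)
second_frequency_domain = channel_spectra(second_samples)
```

A practical implementation would normally use an FFT routine instead of the naive transform, which is kept here only so that the sketch has no external dependencies.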
Based on any of the above embodiments, the different channels are a first channel and a second channel;
the determining unit 630 specifically includes:
a determining subunit, configured to determine a first target ambient sound signal and a second target ambient sound signal according to the phase difference between the first ambient sound signal and the second ambient sound signal under the condition that the first short-time energy of the first ambient sound signal in the first channel is the same as the second short-time energy of the second ambient sound signal in the second channel.
Based on any of the above embodiments, the determining subunit is specifically configured to:
determine the first target ambient sound signal and the second target ambient sound signal according to the phase angle difference between the first ambient sound signal and the second ambient sound signal under the condition that the amplitudes of the first ambient sound signal and the second ambient sound signal are the same.
Based on any one of the above embodiments, the apparatus further includes a phase angle difference determining unit, specifically configured to:
obtain a cross-correlation coefficient of a first frequency domain signal and a second frequency domain signal, wherein the first frequency domain signal is the frequency domain signal corresponding to the audio signal of the first channel, and the second frequency domain signal is the frequency domain signal corresponding to the audio signal of the second channel;
determine an amplitude difference factor based on the cross-correlation coefficient, a first short-time energy of the first frequency domain signal, and a second short-time energy of the second frequency domain signal; and
determine the phase angle difference based on the amplitude difference factor.
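The inputs to the phase angle difference determining unit can be illustrated with the sketch below. It assumes that the short-time energy is the sum of squared magnitudes over the bins of one frame and that the cross-correlation is the sum of one spectrum multiplied by the conjugate of the other; these definitions and all function names are assumptions. The mapping from these quantities to the amplitude difference factor and the phase angle difference is not restated in this passage and is deliberately not reproduced here.

```python
import math


def short_time_energy(bins):
    """Sum of squared magnitudes over the bins of one frame (assumed definition)."""
    return sum(abs(x) ** 2 for x in bins)


def cross_correlation(first_bins, second_bins):
    """Sum of first-channel bins times conjugated second-channel bins."""
    return sum(x1 * x2.conjugate() for x1, x2 in zip(first_bins, second_bins))


def normalized_cross_correlation(first_bins, second_bins):
    """Cross-correlation magnitude normalized by both short-time energies."""
    e1 = short_time_energy(first_bins)
    e2 = short_time_energy(second_bins)
    if e1 == 0.0 or e2 == 0.0:
        return 0.0  # silent frame: treat the channels as uncorrelated
    return abs(cross_correlation(first_bins, second_bins)) / math.sqrt(e1 * e2)


# Hypothetical spectra for one frame of each channel.
first_bins = [1.0 + 1.0j, 2.0 + 0.0j]
second_bins = [1.0 + 1.0j, 2.0 + 0.0j]
coefficient = normalized_cross_correlation(first_bins, second_bins)
```

With these assumed definitions the coefficient lies between 0 and 1, approaching 1 when the two channels are dominated by a common coherent component.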
Based on any of the above embodiments, the determining subunit is specifically configured to:
determine the first target ambient sound signal based on a first target phase angle of the first ambient sound signal, determine a second target phase angle of the second ambient sound signal based on the first target phase angle and the phase angle difference, and determine the second target ambient sound signal based on the second target phase angle;
or determine the second target ambient sound signal based on a second target phase angle of the second ambient sound signal, determine a first target phase angle of the first ambient sound signal based on the second target phase angle and the phase angle difference, and determine the first target ambient sound signal based on the first target phase angle.
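The two alternatives handled by the determining subunit can be sketched as follows, assuming that both ambient sound signals share one amplitude per bin and that the phase angle difference is defined as the second phase angle minus the first. The sign convention and the function names are assumptions made only for this illustration.

```python
import cmath
import math


def ambient_pair_from_first(amplitude, first_phase, phase_diff):
    """Solve the first phase angle, then derive the second from the difference."""
    second_phase = first_phase + phase_diff  # assumed convention: diff = second - first
    return cmath.rect(amplitude, first_phase), cmath.rect(amplitude, second_phase)


def ambient_pair_from_second(amplitude, second_phase, phase_diff):
    """Solve the second phase angle, then derive the first from the difference."""
    first_phase = second_phase - phase_diff
    return cmath.rect(amplitude, first_phase), cmath.rect(amplitude, second_phase)


# Hypothetical values for a single frequency bin.
first_ambient, second_ambient = ambient_pair_from_first(2.0, 0.0, math.pi / 2)
```

Both routes produce the same pair of equal-magnitude ambient components, which is why only one of the two phase angles needs to be searched for.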
Based on any one of the above embodiments, the device further includes a first target phase angle determining unit, where the first target phase angle determining unit is specifically configured to:
construct a first target constraint between the first short-time energy and a first phase angle of the first ambient sound signal, with the first short-time energy of the first ambient sound signal as a target; and
determine the first target phase angle based on a set of discrete values of the first phase angle and the first target constraint, the first target phase angle minimizing the component amplitude of a first coherent sound signal in the first frequency domain signal.
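The search over discrete phase angle values can be sketched as a simple grid search. The target constraint itself is not restated in this passage, so the sketch takes it as a caller-supplied function `coherent_amplitude(phase)` that returns the coherent component amplitude for a candidate phase angle. That callable, the uniform grid and the toy single-bin example are all hypothetical.

```python
import cmath
import math


def discrete_phase_candidates(count):
    """Uniformly spaced candidate phase angles in [0, 2*pi)."""
    return [2.0 * math.pi * i / count for i in range(count)]


def find_target_phase(candidates, coherent_amplitude):
    """Return the candidate that minimizes the coherent component amplitude."""
    return min(candidates, key=coherent_amplitude)


# Toy single-bin example with assumed values: the coherent component is what
# remains after removing an ambient component of known amplitude.
observed_bin = cmath.rect(1.0, math.pi / 2)
ambient_amplitude = 0.4


def toy_coherent_amplitude(phase):
    return abs(observed_bin - cmath.rect(ambient_amplitude, phase))


target_phase = find_target_phase(discrete_phase_candidates(8), toy_coherent_amplitude)
```

A finer grid trades extra computation for phase resolution. The second target phase angle can be searched in the same way, or derived from the first one through the phase angle difference.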
Based on any one of the above embodiments, the device further includes a second target phase angle determining unit, where the second target phase angle determining unit is specifically configured to:
construct a second target constraint between the second short-time energy and a second phase angle of the second ambient sound signal, with the second short-time energy of the second ambient sound signal as a target; and
determine the second target phase angle based on a set of discrete values of the second phase angle and the second target constraint, the second target phase angle minimizing the component amplitude of a second coherent sound signal in the second frequency domain signal.
Fig. 7 is a schematic structural diagram of an electronic device provided by the present invention. As shown in Fig. 7, the electronic device may include a processor 710, a communications interface 720, a memory 730 and a communication bus 740, where the processor 710, the communications interface 720 and the memory 730 communicate with each other through the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to execute an audio signal separation method, the method including: obtaining frequency domain signals corresponding to audio signals of different channels of an audio device; decomposing the frequency domain signals to obtain a coherent sound signal and an ambient sound signal of each channel; determining a target ambient sound signal of each channel according to a phase difference between the ambient sound signals of the different channels under the condition that the energies corresponding to the ambient sound signals of the different channels are the same; and separating the target ambient sound signal of each channel from the audio signal of the corresponding channel to obtain the target coherent sound signal of that channel.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can perform the audio signal separation method provided by the above embodiments, the method including: obtaining frequency domain signals corresponding to audio signals of different channels of an audio device; decomposing the frequency domain signals to obtain a coherent sound signal and an ambient sound signal of each channel; determining a target ambient sound signal of each channel according to a phase difference between the ambient sound signals of the different channels under the condition that the energies corresponding to the ambient sound signals of the different channels are the same; and separating the target ambient sound signal of each channel from the audio signal of the corresponding channel to obtain the target coherent sound signal of that channel.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the audio signal separation method provided by the above embodiments, the method including: obtaining frequency domain signals corresponding to audio signals of different channels of an audio device; decomposing the frequency domain signals to obtain a coherent sound signal and an ambient sound signal of each channel; determining a target ambient sound signal of each channel according to a phase difference between the ambient sound signals of the different channels under the condition that the energies corresponding to the ambient sound signals of the different channels are the same; and separating the target ambient sound signal of each channel from the audio signal of the corresponding channel to obtain the target coherent sound signal of that channel.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.