Embodiment
The present invention subtracts the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, thereby improving the purity of the separated singing voice while keeping the separation efficient to execute.
To describe the technical content, structural features, objects, and effects of the present invention in detail, the invention is explained below with reference to the embodiments and the accompanying drawings.
Referring to Fig. 1, a flow chart of the method for removing accompaniment from a song according to the present embodiment is shown. The method comprises the steps of:
S1, acquiring an accompaniment audio signal and a song audio signal;
S2, preprocessing the song audio signal and the accompaniment audio signal respectively and performing an FFT on each to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;
S3, enhancing the accompaniment audio signal amplitude spectrum;
S4, subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, and performing an inverse FFT in combination with the phase of the song audio signal to obtain the audio signal with the accompaniment removed.
The present invention enhances the accompaniment audio signal amplitude spectrum, subtracts the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, and performs an inverse FFT in combination with the phase of the song audio signal to obtain the audio signal with the accompaniment removed. This helps improve the purity of the separated singing voice, and the algorithm of the present embodiment is simple and efficient to execute.
In the present embodiment, step S1, "acquiring an accompaniment audio signal and a song audio signal", is carried out by "obtaining the accompaniment audio signal and the song audio signal from stereo song audio", specifically:
The left-channel signal of the stereo song audio is phase-inverted to obtain a left inverted signal;
The left inverted signal is added to the right-channel signal to obtain the accompaniment audio signal;
And the right-channel signal of the stereo song audio is taken as the song audio signal from which the accompaniment is to be removed.
The stereo song audio includes a left-channel signal and a right-channel signal; the left-channel signal is a mixture of the voice and the left-channel accompaniment, and the right-channel signal is a mixture of the voice and the right-channel accompaniment.
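The channel arithmetic of step S1 can be sketched in a few lines of NumPy. This is only an illustration: the function name `split_song_and_accompaniment` and the `(num_samples, 2)` array layout are assumptions, and the sketch further assumes that the voice is identical in both channels so that it cancels when the inverted left channel is added to the right channel.

```python
import numpy as np

def split_song_and_accompaniment(stereo):
    """Hypothetical helper. stereo: float array, shape (num_samples, 2), columns = (left, right)."""
    left = stereo[:, 0]
    right = stereo[:, 1]
    left_inverted = -left                   # phase inversion of the left channel
    accompaniment = left_inverted + right   # a voice common to both channels cancels here
    song = right                            # right channel is the signal to be cleaned
    return song, accompaniment

# Toy data: the same voice in both channels, different accompaniment per channel.
voice = np.array([0.5, -0.25, 0.125])
acc_left = np.array([0.1, 0.2, 0.3])
acc_right = np.array([0.3, 0.1, -0.2])
stereo = np.stack([voice + acc_left, voice + acc_right], axis=1)

song, accompaniment = split_song_and_accompaniment(stereo)
```

Note that under these assumptions the resulting accompaniment signal is the difference between the right-channel and left-channel accompaniments, not the accompaniment of either channel alone.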
In another embodiment, step S1, "acquiring an accompaniment audio signal and a song audio signal", is carried out by "obtaining the accompaniment audio signal and the song audio signal from stereo song audio", specifically:
The right-channel signal of the stereo song audio is phase-inverted to obtain a right inverted signal;
The right inverted signal is added to the left-channel signal to obtain the accompaniment audio signal;
And the left-channel signal of the stereo song audio is taken as the song audio signal from which the accompaniment is to be removed.
In another embodiment, step S1 may also be realized by other methods. As long as a song audio signal and an accompaniment audio signal corresponding to that song audio signal are available, the subsequent processing can be carried out.
In the present embodiment, to facilitate processing of the song audio signal, the "preprocessing the song audio signal and the accompaniment audio signal respectively" of step S2 is implemented as follows:
Step S20, the song audio signal and the accompaniment audio signal are normalized respectively, wherein the normalization is: the maximum value of the song audio signal and the maximum value of the accompaniment audio signal are found respectively, and the song audio signal and the accompaniment audio signal are each divided by their corresponding maximum value;
The normalized song audio signal and accompaniment audio signal are each divided into N frames, where N is a positive integer; each song frame and each accompaniment frame contains 1024 sampling points, and every two adjacent song frames or accompaniment frames share 512 overlapping sampling points.
Through the normalization, the amplitudes of the song audio signal and the accompaniment audio signal are limited to between -1 and +1, which facilitates subsequent processing. The song audio signal and the accompaniment audio signal are divided into frames, with 512 overlapping sampling points between every two adjacent song frames or accompaniment frames, so that the transition from frame to frame is smooth.
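The normalization and framing described above might look as follows. This is a minimal sketch: the helper names `normalize` and `split_frames` are hypothetical, "maximum value" is taken to mean the maximum absolute value (which is what bounds the result to [-1, +1]), and trailing samples that do not fill a whole frame are simply dropped, which the text does not specify.

```python
import numpy as np

FRAME_LEN = 1024  # sampling points per frame, as stated in the text
HOP = 512         # adjacent frames therefore overlap by 512 points

def normalize(signal):
    """Divide by the peak absolute value so the result lies in [-1, +1] (assumed reading)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal.copy()

def split_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return an (N, frame_len) array of overlapping frames; the incomplete tail is dropped."""
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[n * hop : n * hop + frame_len] for n in range(n_frames)])
```

For a 2048-sample signal this yields N = 3 frames, and the second half of each frame equals the first half of the next one.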
In the present embodiment, to reduce the spectral leakage caused by the subsequent transform to the frequency domain, step S2 further includes applying a Hanning window to each song frame and each accompaniment frame before the "performing an FFT on the song audio signal and the accompaniment audio signal respectively".
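As an illustration of the windowing and FFT of step S2, the sketch below applies a Hanning window to every frame and returns the amplitude spectrum and phase. The function name `frame_spectra` is hypothetical, and using `np.fft.rfft` (which returns only points 0 to 512 of a 1024-point FFT) is a convenience chosen here, not something the text prescribes.

```python
import numpy as np

def frame_spectra(frames):
    """frames: (N, 1024) real array. Returns amplitude, phase and complex spectrum, each (N, 513)."""
    window = np.hanning(frames.shape[1])              # Hanning window, one value per sampling point
    spectrum = np.fft.rfft(frames * window, axis=1)   # points 0..512 of the 1024-point FFT
    amplitude = np.abs(spectrum)                      # amplitude spectrum
    phase = np.angle(spectrum)                        # phase, kept for the later inverse FFT
    return amplitude, phase, spectrum
```

The same function would be applied once to the song frames and once to the accompaniment frames; only the phase of the song frames is needed afterwards.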
In the present embodiment, the "enhancing the accompaniment audio signal amplitude spectrum" of step S3 is implemented as follows:
Step S30, all frames of the accompaniment audio signal amplitude spectrum $M_n(i)$ ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$) are traversed; for each current frame, the maximum at each amplitude spectrum point over the 2m+1 frames consisting of the current frame, its preceding m frames, and its following m frames is found, and that maximum is taken as the new value of the corresponding point of the current frame, where m is a preset positive integer.
For example, m is set to 2 in one embodiment. All frames of the accompaniment audio signal amplitude spectrum are traversed (except the first 2 frames and the last 2 frames); for each current frame, its preceding 2 frames and its following 2 frames are found, compared, and used for assignment. For example, if the current frame is frame 2, its preceding 2 frames are frames 0 and 1 and its following 2 frames are frames 3 and 4. These 5 frames are traversed point by point from point 0 to point 512; at each point the maximum over the 5 frames is found and assigned to the corresponding point of the current frame. For example, if the maximum at point 0 among the 5 frames is the value in frame 3, the value of frame 3 is assigned to point 0 of the current frame, i.e. frame 2. The values at points 1 to 512 of the 5 frames are then compared in turn and assigned to the corresponding points of the current frame. Next, frame 3 becomes the current frame: its preceding 2 frames are frames 1 and 2 and its following 2 frames are frames 4 and 5, and comparison and assignment proceed in the same way. The formula is $M_n(i) = \max\big(MM_{n-2}(i), MM_{n-1}(i), MM_n(i), MM_{n+1}(i), MM_{n+2}(i)\big)$, $i = 0, 1, 2, \ldots, 512$, $n = 2, 3, 4, \ldots, N-3$, where $MM_n(i) = M_n(i)$, $i = 0, 1, 2, \ldots, 512$, $n = 0, 1, 2, \ldots, N-1$, is a cached copy of the accompaniment audio signal amplitude spectrum. In other embodiments, m may be set to a positive integer other than 2, such as 1, 3, or 4.
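The enhancement of step S30 amounts to a sliding maximum along the frame axis. The following is a minimal sketch, assuming the amplitude spectrum is held in an `(N, 513)` NumPy array; the function name `enhance_accompaniment` is hypothetical. As in the formula, the maxima are read from a cached copy `MM`, and the first m and last m frames are left unchanged.

```python
import numpy as np

def enhance_accompaniment(M, m=2):
    """M: (N, 513) accompaniment amplitude spectrum. Returns the enhanced spectrum."""
    MM = np.array(M, dtype=float)   # cached copy, so earlier assignments never feed later maxima
    enhanced = MM.copy()
    N = MM.shape[0]
    for n in range(m, N - m):       # n = m, ..., N-m-1 (n = 2, ..., N-3 when m = 2)
        # Point-wise maximum over the current frame, its m preceding and m following frames.
        enhanced[n] = MM[n - m : n + m + 1].max(axis=0)
    return enhanced
```

Because each maximum includes the current frame itself, the enhanced spectrum is never smaller than the original at any point.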
The "enhancing the accompaniment audio signal amplitude spectrum" step strengthens the amplitude spectrum of the accompaniment audio signal, so that the spectral subtraction and inverse FFT steps can remove a large part of the accompaniment component from the song audio signal.
In the present embodiment, step S4 specifically includes:
S41, according to the formula ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$),
all song audio frames are traversed, and each frame is in turn traversed from point 0 to point 512; the corresponding amplitude spectrum of the enhanced accompaniment audio frame is subtracted from the amplitude spectrum of the song audio frame, giving the amplitude spectra of all frames of the accompaniment-removed audio. Here $S_n(i)$ is the song audio signal amplitude spectrum, $M_n(i)$ is the enhanced accompaniment audio signal amplitude spectrum, $Y_n(i)$ is the amplitude spectrum of the accompaniment-removed audio signal, a is the signal-to-noise ratio adjustment factor, and b is the accompaniment adjustment factor;
In the present embodiment a is 2 and b is 4; in other embodiments a and b may be set to other values. Increasing a improves the signal-to-noise ratio of the accompaniment-removed audio signal, and increasing b increases the amount of accompaniment removed.
S42, according to the formula $k_n(i) = Y_n(i) / S_n(i)$ ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$), the amplitude spectrum of each accompaniment-removed audio frame is divided by the corresponding amplitude spectrum of the song audio frame to obtain the proportionality coefficient $k_n(i)$;
The FFT real parts and imaginary parts of all song audio frames are each multiplied by the corresponding proportionality coefficient $k_n(i)$, giving the FFT real and imaginary parts of the accompaniment-removed audio frames for points 0 to 512. By the symmetry of the FFT, the sample values of the two symmetric halves are complex conjugates of each other, i.e. their real parts are equal and their imaginary parts are opposite, so the FFT real and imaginary parts for points 513 to 1023 can be obtained; a 1024-point inverse FFT is then performed;
The frames obtained from the inverse FFT are spliced together (taking the inter-frame overlap into account) to obtain the accompaniment-removed audio signal.
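Step S42 and the splicing can be sketched as below. The function name `resynthesize` is hypothetical, and two choices are assumptions rather than statements of the text: points where $S_n(i)$ is zero get a coefficient of 0, and the overlapping frames are summed (overlap-add) when spliced. `np.fft.irfft` supplies the conjugate-symmetric points 513 to 1023 implicitly.

```python
import numpy as np

def resynthesize(song_spectrum, Y, frame_len=1024, hop=512):
    """song_spectrum: (N, 513) complex FFT of the song frames; Y: (N, 513) target amplitudes."""
    Y = np.asarray(Y, dtype=float)
    S = np.abs(song_spectrum)
    # k_n(i) = Y_n(i) / S_n(i); points with S == 0 get k = 0 (assumption).
    k = np.divide(Y, S, out=np.zeros_like(Y), where=S > 0)
    # Scaling real and imaginary parts by the same k keeps the phase of the song signal.
    scaled = song_spectrum * k
    # 1024-point inverse FFT; irfft rebuilds points 513..1023 from conjugate symmetry.
    frames = np.fft.irfft(scaled, n=frame_len, axis=1)
    # Splice by overlap-add (assumption): frame n starts at sample n * hop.
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for n, frame in enumerate(frames):
        out[n * hop : n * hop + frame_len] += frame
    return out
```

With Hanning-windowed frames at 50% overlap the window contributions sum to approximately 1 in the interior of the signal, so passing `Y` equal to the song amplitude spectrum returns nearly the original samples there.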
Referring to Fig. 2, the present invention also provides a device for removing accompaniment from a song, shown as a functional block diagram. The device includes an accompaniment audio signal and song audio signal acquisition module 10, a preprocessing and FFT module 20, an accompaniment amplitude spectrum enhancement module 30, and a spectral subtraction and inverse FFT module 40;
The accompaniment audio signal and song audio signal acquisition module is used to acquire the accompaniment audio signal and the song audio signal;
The preprocessing and FFT module is used to preprocess the song audio signal and the accompaniment audio signal respectively and perform an FFT on each to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;
The accompaniment amplitude spectrum enhancement module is used to enhance the amplitude spectrum of the accompaniment audio signal;
The spectral subtraction and inverse FFT module is used to subtract the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, and to perform an inverse FFT in combination with the phase of the song audio signal to obtain the audio signal with the accompaniment removed.
In the present invention, the accompaniment amplitude spectrum enhancement module enhances the amplitude spectrum of the accompaniment audio signal, so that the spectral subtraction and inverse FFT module can remove a large part of the accompaniment component from the song audio, thereby improving the purity of the separated singing voice.
In the present embodiment, the accompaniment audio signal and song audio signal acquisition module includes an accompaniment audio signal acquisition unit and a song audio signal acquisition unit.
The accompaniment audio signal acquisition unit is used to phase-invert the left-channel signal of the stereo song audio to obtain a left inverted signal, and to add the left inverted signal to the right-channel signal to obtain the accompaniment audio signal.
The song audio signal acquisition unit is used to take the right-channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.
The stereo song audio includes a left-channel signal and a right-channel signal; the left-channel signal is a mixture of the voice and the left-channel accompaniment, and the right-channel signal is a mixture of the voice and the right-channel accompaniment.
In another embodiment, the accompaniment audio signal and song audio signal acquisition module includes an accompaniment audio signal acquisition unit and a song audio signal acquisition unit.
The accompaniment audio signal acquisition unit is used to phase-invert the right-channel signal of the stereo song audio to obtain a right inverted signal, and to add the right inverted signal to the left-channel signal to obtain the accompaniment audio signal.
The song audio signal acquisition unit is used to take the left-channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.
In another embodiment, the accompaniment audio signal and song audio signal acquisition module may also be realized by other methods. As long as a song audio signal and an accompaniment audio signal corresponding to that song audio signal are available, the subsequent processing can be carried out.
In the above embodiments, the preprocessing and FFT module further includes a normalization unit, a framing unit, and a windowing unit;
The normalization unit is used to normalize the song audio signal and the accompaniment audio signal respectively, wherein the normalization is: the maximum value of the song audio signal and the maximum value of the accompaniment audio signal are found respectively, and the song audio signal and the accompaniment audio signal are each divided by their corresponding maximum value;
The framing unit is used to divide the normalized song audio signal and accompaniment audio signal into N frames each, where N is a positive integer; each song frame and each accompaniment frame contains 1024 sampling points, and every two adjacent song frames or accompaniment frames share 512 overlapping sampling points.
The windowing unit is used to apply a Hanning window to each song frame and each accompaniment frame.
In the above embodiments, the accompaniment amplitude spectrum enhancement module is used to traverse all frames of the accompaniment audio signal amplitude spectrum, find the maximum at each amplitude spectrum point over the 2m+1 frames consisting of the current frame, its preceding m frames, and its following m frames, and take that maximum as the new value of the corresponding point of the current frame, where m is a preset positive integer.
The spectral subtraction and inverse FFT module includes a spectral subtraction unit, an inverse FFT unit, and a splicing unit;
The spectral subtraction unit is used to subtract the enhanced accompaniment audio signal amplitude spectrum from the amplitude spectrum of the song audio signal to obtain the amplitude spectrum of the accompaniment-removed audio signal, according to the formula ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$), where $S_n(i)$ is the song audio signal amplitude spectrum, $M_n(i)$ is the enhanced accompaniment audio signal amplitude spectrum, $Y_n(i)$ is the amplitude spectrum of the accompaniment-removed audio signal, a is the signal-to-noise ratio adjustment factor, and b is the accompaniment adjustment factor;
The inverse FFT unit is used to perform an inverse FFT on the amplitude spectrum of the accompaniment-removed audio signal. Specifically, according to the formula $k_n(i) = Y_n(i) / S_n(i)$ ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$), the amplitude spectrum of the accompaniment-removed audio signal is divided by the song audio signal amplitude spectrum to obtain the proportionality coefficient $k_n(i)$; the FFT real part and imaginary part of the song audio signal are then each multiplied by the proportionality coefficient $k_n(i)$, and a 1024-point inverse FFT is performed;
The splicing unit is used to splice together the frames obtained from the inverse FFT to obtain the accompaniment-removed audio signal.
In summary, the method and device of the present invention for removing accompaniment from a song enhance the accompaniment audio signal amplitude spectrum, subtract the enhanced accompaniment audio signal amplitude spectrum from the amplitude spectrum of the song audio signal, and perform an inverse FFT in combination with the phase of the song audio signal to obtain the accompaniment-removed audio. This helps improve the purity of the separated singing voice, and the algorithm of the present embodiment is simple and efficient to execute.
Example
The present invention is illustrated below with a specific example of removing the accompaniment from a song.
The song is "Meet" by Sun Yanzi; the audio format is two-channel stereo.
The left channel of the stereo song "Meet" is phase-inverted to obtain an inverted signal; the inverted signal is added to the right-channel signal of the stereo song audio to obtain the accompaniment audio signal of "Meet"; and the right-channel signal of the stereo song audio is taken as the song audio signal of "Meet".
Two audio files are obtained through the above steps: Meet_original_singer.wav and Meet_accompaniment.wav.
The audio data of Meet_original_singer.wav and Meet_accompaniment.wav are read and preprocessed, and a 1024-point FFT is performed to obtain the song audio signal amplitude spectrum of Meet_original_singer and the accompaniment audio signal amplitude spectrum of Meet_accompaniment. The accompaniment audio signal amplitude spectrum of Meet_accompaniment is then enhanced according to the following formula:
$M_n(i) = \max\big(MM_{n-2}(i), MM_{n-1}(i), MM_n(i), MM_{n+1}(i), MM_{n+2}(i)\big)$, $i = 0, 1, 2, \ldots, 512$, $n = 2, 3, 4, \ldots, N-3$, where $MM_n(i) = M_n(i)$, $i = 0, 1, 2, \ldots, 512$, $n = 0, 1, 2, \ldots, N-1$, denotes the cached copy of the accompaniment audio signal amplitude spectrum, and N denotes the number of frames.
According to the formula ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$), the enhanced accompaniment audio signal amplitude spectrum of Meet_accompaniment is subtracted from the song audio signal amplitude spectrum of Meet_original_singer to obtain the amplitude spectrum of the accompaniment-removed audio signal, where a is 2 and b is 4.
According to the formula $k_n(i) = Y_n(i) / S_n(i)$ ($i = 0, 1, 2, \ldots, 512$; $n = 0, 1, 2, \ldots, N-1$), the amplitude spectrum of the accompaniment-removed audio signal is divided by the song audio signal amplitude spectrum of Meet_original_singer to obtain the proportionality coefficient $k_n(i)$;
The FFT real part and imaginary part of the song audio signal of Meet_original_singer are each multiplied by the proportionality coefficient $k_n(i)$, and a 1024-point inverse FFT is performed;
The frames obtained from the inverse FFT are spliced together to obtain the accompaniment-removed audio Meet_voice.wav.
Referring to Figs. 3 to 5, time-domain waveform diagrams of the song audio, the accompaniment audio, and the accompaniment-removed audio of the song "Meet" are shown respectively. When Meet_voice.wav is played in an audio player, the accompaniment can be heard to have been largely removed; although the amplitude of the voice is somewhat reduced, its sound quality is close to that of the voice in the original audio.
The foregoing describes only embodiments of the present invention and does not limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.