




CN102194457A - Audio encoding and decoding method, system and noise level estimation method - Google Patents

Audio encoding and decoding method, system and noise level estimation method

Info

Publication number
CN102194457A
Authority
CN
China
Prior art keywords
band
frequency
sub
frequency domain
Legal status
Granted
Application number
CN2010191850619A
Other languages
Chinese (zh)
Other versions
CN102194457B (en)
Inventor
江东平
袁浩
彭科
陈国明
黎家力
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Application filed by ZTE Corp
Priority to CN2010191850619A
Publication of CN102194457A
Application granted
Publication of CN102194457B
Legal status: Active


Abstract

The invention relates to an audio encoding and decoding method and system, and a noise level estimation method. The noise level estimation method comprises: estimating the power spectrum of the audio signal to be encoded from its frequency-domain coefficients; and estimating, from the calculated power spectrum, the noise level of the audio signal in zero-bit coding subbands. The noise level is used during decoding to control the ratio of noise-filling energy to band-replication energy, and a zero-bit coding subband is a coding subband to which zero bits are allocated. The method allows frequency-domain coefficients that were not encoded to be reconstructed well.

Description

Audio coding and decoding method and system and noise level estimation method
Technical Field
The present invention relates to audio encoding and decoding technologies, and in particular to an audio encoding and decoding method and system that perform spectrum reconstruction on uncoded coding subbands, and to a noise level estimation method.
Background
Audio coding is a core technology in multimedia applications such as digital audio broadcasting, Internet music distribution, and audio communications, all of which benefit greatly from better compression performance. Perceptual audio coders, a class of lossy transform-domain coders, are the mainstream audio coders today. To recover the spectral components of uncoded subbands, existing audio codecs usually reconstruct them by noise filling or band replication: G.722.1C uses noise filling, HE-AAC v1 uses spectral band replication, and G.719 combines noise filling with simple spectral band replication. Noise filling does not restore the spectral envelope of uncoded subbands, or the tonal and noise components within them, well. The band replication of HE-AAC v1 requires spectral analysis of the audio signal before encoding, tonality and noise estimation of the high-frequency components, and parameter extraction, after which the down-sampled signal is encoded with an AAC encoder. This has high computational complexity, requires more parameter information to be transmitted to the decoder, consumes more coding bits, and increases coding delay. The replication scheme of G.719 is too simple to recover the spectral envelope of uncoded subbands and their internal tonal and noise components well.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an audio encoding and decoding method and system, and a noise level estimation method, so that frequency-domain coefficients that were not encoded can be reconstructed well.
To solve the above technical problem, the present invention provides a noise level estimation method, including:
estimating the power spectrum of the audio signal to be encoded from its frequency-domain coefficients;
estimating, from the calculated power spectrum, the noise level of the audio signal in zero-bit coding subbands, the noise level being used during decoding to control the ratio of noise-filling energy to band-replication energy; here, a zero-bit coding subband is a coding subband to which zero bits are allocated.
Further, the noise level of the audio signal in the zero-bit coding subbands is the ratio of the noise-component power estimated in the zero-bit coding subbands to the tonal-component power estimated in the zero-bit coding subbands.
Further, the power spectrum of the audio signal to be encoded is estimated from its MDCT frequency-domain coefficients, the power of frequency bin k in frame i being calculated as:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and λ is the coefficient of a single-pole smoothing filter.
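The recursion above is a first-order (single-pole) IIR smoother applied to each frequency bin independently across frames. A minimal Python sketch, assuming frames are supplied as equal-length lists of MDCT coefficients; the function name is hypothetical, and λ is left to the caller because the text does not fix its value.

```python
def estimate_power_spectrum(mdct_frames, lam):
    """Single-pole power estimate P_i(k) for every frame i and bin k.

    mdct_frames: list of frames, each a list of MDCT coefficients X_i(k).
    lam: smoothing coefficient lambda (value not specified by the text).
    Returns a list of per-frame power lists.
    """
    power = []
    prev = [0.0] * len(mdct_frames[0])  # P_{i-1}(k) = 0 when i == 0
    for frame in mdct_frames:
        # P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)^2, per bin
        prev = [lam * p + (1.0 - lam) * x * x for p, x in zip(prev, frame)]
        power.append(prev)
    return power
```

With λ close to 1 the estimate reacts slowly to frame-to-frame changes; with λ = 0 it reduces to the instantaneous power $X_i(k)^2$.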
Further, the frequency-domain coefficients of the audio signal to be encoded are divided into one or more noise-filling subbands, and the noise level of a given effective noise-filling subband j is calculated from the estimated power spectrum as follows:
calculating the mean power of all frequency-domain coefficients in all or part of the zero-bit coding subbands of the effective noise-filling subband, giving the average power P_aveg(j);
calculating the mean power of those coefficients, in all or part of the zero-bit coding subbands of the effective noise-filling subband, whose power $P_i(k)$ is greater than P_aveg(j), giving the average tonal-component power P_signal_aveg(j) of the zero-bit coding subbands in the effective noise-filling subband;
calculating the mean power of those coefficients, in all or part of the zero-bit coding subbands of the effective noise-filling subband, whose power $P_i(k)$ is less than or equal to P_aveg(j), giving the average noise-component power P_noise_aveg(j) of the zero-bit coding subbands in the effective noise-filling subband;
calculating the ratio P_noise_rate(j) = P_noise_aveg(j) / P_signal_aveg(j), which is the noise level of the effective noise-filling subband.
Here, an effective noise-filling subband is a noise-filling subband that contains zero-bit coding subbands.
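The four steps can be sketched in Python for one effective noise-filling subband of one frame. The caller is assumed to pass the bin indices of the zero-bit coding subbands that take part in the estimate (a hypothetical input, since the text allows "all or part" of them). The case where every power equals the mean, leaving no tonal bins, is not covered by the text; returning 1.0 there is an assumption of this sketch.

```python
def noise_level(power, zero_bit_bins):
    """Return P_noise_rate(j) for one effective noise-filling subband j.

    power: P_i(k) for the current frame, indexed by bin k.
    zero_bit_bins: hypothetical list of bin indices k from the zero-bit
        coding subbands of subband j used in the estimate.
    """
    values = [power[k] for k in zero_bit_bins]
    p_aveg = sum(values) / len(values)            # P_aveg(j)
    tonal = [p for p in values if p > p_aveg]     # tonal-component bins
    noise = [p for p in values if p <= p_aveg]    # noise-component bins
    if not tonal:
        # Flat power: no bin exceeds the mean. Assumed convention.
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)       # P_signal_aveg(j)
    p_noise_aveg = sum(noise) / len(noise)        # P_noise_aveg(j)
    return p_noise_aveg / p_signal_aveg           # P_noise_rate(j)
```

Because every noise bin lies at or below the mean and every tonal bin lies above it, the ratio is in [0, 1) whenever tonal bins exist, which keeps the term 1 - P_noise_rate(j) used at the decoder non-negative.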
In order to solve the above technical problem, the present invention further provides an audio encoding method, including:
A. dividing MDCT frequency domain coefficients of an audio signal to be coded into a plurality of coding sub-bands, and carrying out quantization coding on amplitude envelope values of the coding sub-bands to obtain amplitude envelope coding bits;
B. bit allocation is carried out on each coding sub-band, and quantization coding is carried out on non-zero bit coding sub-bands, so that MDCT frequency domain coefficient coding bits are obtained;
C. estimating the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients, then estimating the noise level of the audio signal in zero-bit coding subbands and quantizing and encoding it to obtain noise-level coded bits; the noise level is used during decoding to control the ratio of noise-filling energy to band-replication energy, and a zero-bit coding subband is a coding subband to which zero bits are allocated;
D. multiplexing and packing the amplitude-envelope coded bits of each coding subband, the frequency-domain coefficient coded bits and the noise-level coded bits, and transmitting the result to the decoding end.
Further, in step C, the noise level of the audio signal in the zero-bit coding subbands is the ratio of the noise-component power estimated in the zero-bit coding subbands to the tonal-component power estimated in the zero-bit coding subbands.
Further, the power spectrum of the audio signal to be encoded is estimated from its MDCT frequency-domain coefficients, the power of frequency bin k in frame i being estimated as:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and λ is the coefficient of a single-pole smoothing filter.
Further, in step B, the frequency-domain coefficients of the audio signal to be encoded are divided into one or more noise-filling subbands, and after bits have been allocated to each coding subband, bits are allocated to the effective noise-filling subbands; in step C, the noise level of a given effective noise-filling subband j is calculated from the estimated power spectrum as follows:
calculating the mean power of all frequency-domain coefficients in all or part of the zero-bit coding subbands of the effective noise-filling subband, giving the average power P_aveg(j);
calculating the mean power of those coefficients, in all or part of the zero-bit coding subbands of the effective noise-filling subband, whose power $P_i(k)$ is greater than P_aveg(j), giving the average tonal-component power P_signal_aveg(j) of the zero-bit coding subbands in the effective noise-filling subband;
calculating the mean power of those coefficients, in all or part of the zero-bit coding subbands of the effective noise-filling subband, whose power $P_i(k)$ is less than or equal to P_aveg(j), giving the average noise-component power P_noise_aveg(j) of the zero-bit coding subbands in the effective noise-filling subband;
calculating the ratio P_noise_rate(j) = P_noise_aveg(j) / P_signal_aveg(j), which is the noise level of the effective noise-filling subband.
Here, an effective noise-filling subband is a noise-filling subband that contains zero-bit coding subbands.
Further, the noise-filling subbands are divided either uniformly or non-uniformly according to the auditory characteristics of the human ear, and each noise-filling subband contains one or more coding subbands.
Further, in step B, bits are either allocated to all effective noise-filling subbands, or one or more low-frequency effective noise-filling subbands are skipped and bits are allocated to the remaining higher-frequency effective noise-filling subbands; in step C, the noise level is calculated for the effective noise-filling subbands that were allocated bits; and in step D, the noise-level coded bits are multiplexed and packed using the allocated bits.
Further, each effective noise-filling subband is allocated either the same number of bits, or a different number of bits according to auditory characteristics.
In order to solve the above technical problem, the present invention further provides an audio decoding method, including:
a2, decoding and inversely quantizing each amplitude envelope coded bit in the bit stream to be decoded to obtain the amplitude envelope of each coded sub-band;
b2, carrying out bit allocation on each coding sub-band, carrying out decoding and inverse quantization on the noise level coding bits to obtain the noise level of a zero-bit coding sub-band, and carrying out decoding and inverse quantization on the frequency domain coefficient coding bits to obtain the frequency domain coefficient of a non-zero-bit coding sub-band;
c2, performing band replication on the zero-bit coding subbands, controlling the overall filling energy of each zero-bit coding subband according to its amplitude envelope, and controlling the ratio of noise-filling energy to band-replication energy according to the noise level of the zero-bit coding subband, to obtain the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
d2, performing Inverse Modified Discrete Cosine Transform (IMDCT) on the frequency domain coefficient of the non-zero bit coding sub-band and the frequency domain coefficient of the reconstructed zero bit coding sub-band to obtain a final audio signal.
Further, in step C2, during band replication, the position of a tone of the audio signal is searched for in the MDCT frequency-domain coefficients. The band from bin 0 to the tone bin is used as the band-replication period, and the band from bin 0 to the bin copyband_offset bins above the tone bin is used as the source band, from which band replication is performed into the zero-bit coding subbands. If the highest frequency within a zero-bit coding subband is lower than the frequency of the detected tone, that subband is reconstructed by noise filling only.
Further, in step C2,
taking an absolute value or a square value of the frequency domain coefficient of the first frequency band and carrying out smooth filtering;
and searching the position of the maximum extreme value of the filtering output value of the first frequency band according to the result of the smooth filtering, and taking the position of the maximum extreme value as the position of a certain tone.
Further, an operation formula for performing smooth filtering on the absolute value of the frequency domain coefficient of the first frequency band is as follows:
$$\mathrm{X\_amp}_i(k) = \mu\,\mathrm{X\_amp}_{i-1}(k) + (1-\mu)\,\lvert \bar{X}_i(k) \rvert$$
or, the formula for smoothing the squared frequency-domain coefficients of the first frequency band is:
$$\mathrm{X\_amp}_i(k) = \mu\,\mathrm{X\_amp}_{i-1}(k) + (1-\mu)\,\bar{X}_i(k)^2$$
where μ is the smoothing filter coefficient, $\mathrm{X\_amp}_i(k)$ is the filter output for the k-th frequency bin of frame i, $\bar{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of frame i, and $\mathrm{X\_amp}_{i-1}(k) = 0$ when i = 0.
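One frame of this inter-frame smoothing can be sketched in Python as follows. The function name and list-based interface are hypothetical; the caller keeps the previous frame's output and passes all zeros for the first frame, matching the initial condition above.

```python
def smooth_magnitudes(prev_amp, coeffs, mu, use_square=False):
    """Compute X_amp_i(k) for one frame over the first frequency band.

    prev_amp: X_amp_{i-1}(k) per bin (all zeros for the first frame).
    coeffs: decoded MDCT coefficients of the first frequency band, frame i.
    mu: smoothing filter coefficient.
    use_square: smooth |X(k)|^2 instead of |X(k)|.
    """
    out = []
    for p, x in zip(prev_amp, coeffs):
        v = x * x if use_square else abs(x)   # absolute or squared value
        out.append(mu * p + (1.0 - mu) * v)   # single-pole smoothing
    return out
```

The bin holding the largest output, for example `max(range(len(amp)), key=amp.__getitem__)`, is then a candidate tone position.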
Further, the first frequency band is a low-frequency band in which energy is relatively concentrated, determined from the statistical characteristics of the spectrum; here, low frequencies are spectral components below half of the total signal bandwidth.
Further, the maximum extreme value of the filter output is determined as follows: an initial maximum is searched for directly in the filter outputs of the frequency-domain coefficients of the first frequency band and is taken as the maximum extreme value of the filter output of the first frequency band.
Further, the maximum extreme value of the filter output value is determined by the following method:
taking a sub-range of the first frequency band as a second frequency band, searching for an initial maximum in the filter outputs of the frequency-domain coefficients of the second frequency band, and proceeding according to the position of the coefficient that holds the initial maximum:
a. if the initial maximum is the filter output of the lowest-frequency coefficient of the second frequency band, compare it with the filter output of the next lower-frequency coefficient in the first frequency band and keep moving toward lower frequencies, one coefficient at a time, until the filter output of the current coefficient is larger than that of its lower-frequency neighbour; that filter output is the finally determined maximum extreme value. If instead the lowest-frequency coefficient of the first frequency band is reached and its filter output is larger than that of the coefficient above it, its filter output is the finally determined maximum extreme value;
b. if the initial maximum is the filter output of the highest-frequency coefficient of the second frequency band, compare it with the filter output of the next higher-frequency coefficient in the first frequency band and keep moving toward higher frequencies, one coefficient at a time, until the filter output of the current coefficient is larger than that of its higher-frequency neighbour; that filter output is the finally determined maximum extreme value. If instead the highest-frequency coefficient of the first frequency band is reached and its filter output is larger than that of the coefficient below it, its filter output is the finally determined maximum extreme value;
c. if the initial maximum value is the filtering output value of the frequency domain coefficient between the lowest frequency and the highest frequency of the second frequency band, the frequency domain coefficient corresponding to the initial maximum value is the position of the tone, that is, the initial maximum value is the finally determined maximum extreme value.
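Cases a to c amount to a local hill climb that starts from the best bin of the second band and only moves outward when that bin sits on the band edge. A Python sketch, assuming the filter outputs of the whole first band are available as a list and the second band is given by inclusive indices `lo` and `hi` into that list (hypothetical names). Stopping on ties is an assumption; the text does not address equal neighbours.

```python
def find_tone_position(amp, lo, hi):
    """Return the index in amp taken as the tone position (cases a to c).

    amp: filter outputs X_amp_i(k) over the first frequency band.
    lo, hi: inclusive bounds of the second frequency band within amp.
    """
    # Initial maximum inside the second band (first index wins on ties).
    k = max(range(lo, hi + 1), key=lambda n: amp[n])
    if k == lo:
        # Case a: move to lower bins while the lower neighbour is larger.
        while k > 0 and amp[k - 1] > amp[k]:
            k -= 1
    elif k == hi:
        # Case b: move to higher bins while the higher neighbour is larger.
        while k < len(amp) - 1 and amp[k + 1] > amp[k]:
            k += 1
    # Case c: an interior maximum of the second band is already the peak.
    return k
```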
Further, in step C2, when band replication is performed on a zero-bit coding subband, the source-band copy start index for that subband is first calculated from the source band and the start index of the zero-bit coding subband to be replicated; the frequency-domain coefficients of the source band are then copied periodically into the zero-bit coding subband, starting at the source-band copy start index, with the band-replication period as the period.
Further, the method for calculating the source frequency band replication start sequence number of the zero-bit encoded sub-band in step C2 includes:
the index of the frequency bin of the first MDCT coefficient of the zero-bit coding subband whose coefficients are to be reconstructed is denoted fillband_start_freq, and the index of the bin corresponding to the tone is denoted Tonal_pos; Tonal_pos plus 1 gives the copy period copy_period, and the band-copy offset is denoted copyband_offset. copy_period is subtracted repeatedly from fillband_start_freq until the value falls within the index range of the source band; the resulting value is the source-band copy start index, denoted copy_pos_mod.
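The start-index calculation is a fold-back by whole periods. A Python sketch, assuming the source band spans bins 0 to Tonal_pos + copyband_offset as described for step C2 and that the zero-bit subband starts above it; the function name is hypothetical.

```python
def copy_start_index(fillband_start_freq, tonal_pos, copyband_offset):
    """Return copy_pos_mod, the first source-band bin copied into the subband."""
    copy_period = tonal_pos + 1                 # band-replication period
    source_end = tonal_pos + copyband_offset    # last bin of the source band
    pos = fillband_start_freq
    while pos > source_end:
        pos -= copy_period                      # fold back by one period
    return pos
```

For example, with Tonal_pos = 9 and copyband_offset = 2 the period is 10 and the source band ends at bin 11, so a subband starting at bin 40 folds back to bin 10. For any start above the source band, the result lies in bins copyband_offset to Tonal_pos + copyband_offset, which is exactly one period wide.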
Further, in step C2, the method for periodically copying the frequency domain coefficients of the source frequency band to the zero-bit encoded sub-band starting from the copy start sequence number of the source frequency band with the frequency band copy period as the period is as follows:
the frequency-domain coefficients of the source band, starting at the copy start index, are copied in order into the zero-bit coding subband starting at fillband_start_freq; when the source bin reaches bin Tonal_pos + copyband_offset, copying continues from bin copyband_offset of the source band, and so on, until all frequency-domain coefficients of the current zero-bit coding subband have been filled by band replication.
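The periodic copy with wrap-around can be sketched in Python as below. The coefficient list is assumed to already hold the decoded source-band bins, the subband length `fill_len` is a hypothetical parameter, and the source bin Tonal_pos + copyband_offset is treated as copied before wrapping back to bin copyband_offset.

```python
def replicate_band(coeffs, fill_start, fill_len, copy_pos_mod,
                   tonal_pos, copyband_offset):
    """Fill coeffs[fill_start:fill_start + fill_len] by periodic copying.

    coeffs: list of frequency-domain coefficients, modified in place.
    copy_pos_mod: source-band copy start index for this subband.
    """
    source_end = tonal_pos + copyband_offset
    src = copy_pos_mod
    for dst in range(fill_start, fill_start + fill_len):
        coeffs[dst] = coeffs[src]
        src += 1
        if src > source_end:
            src = copyband_offset   # wrap: restart at bin copyband_offset
    return coeffs
```

Each wrap returns to bin copyband_offset, so the copied pattern repeats every Tonal_pos + 1 bins, the band-replication period.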
Further, in step C2, the energy of the frequency domain coefficient obtained after the zero-bit encoded subband is copied is adjusted by the following method:
calculating the amplitude envelope of the frequency-domain coefficients obtained by band replication of the zero-bit coding subband, denoted sbr_rms(r);
the formula for adjusting the energy of the frequency domain coefficient obtained after copying is as follows:
$$\overline{\mathrm{X\_sbr}}(r) = \mathrm{X\_sbr}(r)\cdot \mathrm{sbr\_lev\_scale}(r)\cdot \frac{\mathrm{rms}(r)}{\mathrm{sbr\_rms}(r)}$$
where $\overline{\mathrm{X\_sbr}}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; X_sbr(r) denotes the frequency-domain coefficients obtained by band replication for zero-bit coding subband r; sbr_rms(r) is the amplitude envelope of X_sbr(r); rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude-envelope quantization index; and sbr_lev_scale(r) is the band-replication energy control scale factor of zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing r, as follows:
$$\mathrm{sbr\_lev\_scale}(r) = \sqrt{\left(1 - \overline{\mathrm{P\_noise\_rate}}(j)\right)\cdot \mathrm{fill\_energy\_scalefactor}}$$
where fill_energy_scalefactor is a filling-energy scale factor that adjusts the gain of the overall filling energy and takes values in (0, 1), and $\overline{\mathrm{P\_noise\_rate}}(j)$ is the decoded noise level of noise-filling subband j obtained by inverse quantization, j being the index of the noise-filling subband that contains zero-bit coding subband r.
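A Python sketch of this energy adjustment for one zero-bit subband. It assumes the amplitude envelope is the root-mean-square of the subband's coefficients (suggested by the names rms and sbr_rms, but not stated explicitly), and it returns zeros when the replicated coefficients are all zero, a case the text does not cover. The function name is hypothetical.

```python
import math

def adjust_replicated_energy(x_sbr, rms_r, noise_rate_j, fill_energy_scalefactor):
    """Return the energy-adjusted coefficients for zero-bit subband r.

    x_sbr: X_sbr(r), coefficients obtained by band replication.
    rms_r: decoded amplitude envelope rms(r) of the subband.
    noise_rate_j: decoded noise level of the enclosing noise-filling subband.
    """
    # sbr_rms(r), assumed here to be the RMS of the replicated coefficients
    sbr_rms = math.sqrt(sum(v * v for v in x_sbr) / len(x_sbr))
    if sbr_rms == 0.0:
        return [0.0] * len(x_sbr)   # nothing to scale (assumed convention)
    sbr_lev_scale = math.sqrt((1.0 - noise_rate_j) * fill_energy_scalefactor)
    gain = sbr_lev_scale * rms_r / sbr_rms
    return [v * gain for v in x_sbr]
```

After this scaling the replicated part has RMS sbr_lev_scale(r) · rms(r), so its share of the target energy shrinks as the decoded noise level rises.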
Further, in step C2, the energy-adjusted frequency domain coefficients are noise-filled according to the following formula:
$$\bar{X}(r) = \overline{\mathrm{X\_sbr}}(r) + \mathrm{rms}(r)\cdot \mathrm{noise\_lev\_scale}(r)\cdot \mathrm{random}()$$
where $\bar{X}(r)$ denotes the reconstructed frequency-domain coefficients of zero-bit coding subband r; $\overline{\mathrm{X\_sbr}}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude-envelope quantization index; random() is a random phase generator that returns +1 or -1; and noise_lev_scale(r) is the noise-level control scale factor of zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing r, as follows:
$$\mathrm{noise\_lev\_scale}(r) = \sqrt{\overline{\mathrm{P\_noise\_rate}}(j)\cdot \mathrm{fill\_energy\_scalefactor}}$$
where fill_energy_scalefactor is a filling-energy scale factor that adjusts the gain of the overall filling energy and takes values in (0, 1), and $\overline{\mathrm{P\_noise\_rate}}(j)$ is the decoded noise level of noise-filling subband j obtained by inverse quantization, j being the index of the noise-filling subband that contains zero-bit coding subband r.
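The noise-filling step adds a random-sign term of fixed magnitude to each energy-adjusted coefficient. A Python sketch with a hypothetical function name; the random source is injectable so results can be reproduced, and Python's `random` module stands in for the unspecified random phase generator.

```python
import math
import random

def noise_fill(x_sbr_adj, rms_r, noise_rate_j, fill_energy_scalefactor, rng=random):
    """Return reconstructed coefficients for zero-bit subband r.

    x_sbr_adj: energy-adjusted replicated coefficients of subband r.
    rms_r: decoded amplitude envelope rms(r).
    noise_rate_j: decoded noise level of the enclosing noise-filling subband.
    rng: object with a choice() method, e.g. random.Random(seed).
    """
    noise_lev_scale = math.sqrt(noise_rate_j * fill_energy_scalefactor)
    amp = rms_r * noise_lev_scale
    # random() in the formula returns +1 or -1 for each coefficient
    return [v + amp * rng.choice((1.0, -1.0)) for v in x_sbr_adj]
```

Because the signs are random, the cross term between the replicated and noise parts averages to zero, so the expected mean energy per coefficient is (1 - ρ) · f · rms(r)² + ρ · f · rms(r)² = f · rms(r)², where ρ is the decoded noise level and f is fill_energy_scalefactor. The noise level therefore shifts energy between replication and noise without changing the overall filling level.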
Further, in step B2, after bits have been allocated to each coding subband, the coding subbands are grouped into a number of noise-filling subbands and bits are allocated to the effective noise-filling subbands. In step C2, band replication is performed on the zero-bit coding subbands within the effective noise-filling subbands that were allocated bits, and both the energy level of the replicated frequency-domain coefficients and the noise-filling energy level are controlled; the zero-bit coding subbands within effective noise-filling subbands that were not allocated bits are reconstructed by noise filling. Here, an effective noise-filling subband is a noise-filling subband that contains zero-bit coding subbands.
To solve the above technical problem, the present invention further provides an audio encoding system, the system comprising a Modified Discrete Cosine Transform (MDCT) unit, an amplitude envelope calculation unit, an amplitude envelope quantization and encoding unit, a bit distribution unit, a frequency domain coefficient encoding unit, and a bitstream Multiplexer (MUX), the system further comprising a noise level estimation unit, wherein:
the MDCT unit is used to perform a modified discrete cosine transform on the audio signal to generate frequency-domain coefficients;
the amplitude envelope calculation unit is connected with the MDCT unit and used for dividing the frequency domain coefficient generated by the MDCT into a plurality of coding sub-bands and calculating the amplitude envelope value of each coding sub-band;
the amplitude envelope quantization and coding unit is connected with the amplitude envelope calculation unit and is used for quantizing and coding the amplitude envelope value of each coding sub-band to generate coding bits of the amplitude envelope of each coding sub-band;
the bit distribution unit is connected with the amplitude envelope quantization and coding unit and is used for distributing bits to each coding sub-band;
the frequency domain coefficient quantization coding unit is connected with the MDCT unit, the bit distribution unit and the amplitude envelope quantization and coding unit and is used for carrying out normalization, quantization and coding processing on all frequency domain coefficients in each coding sub-band to generate frequency domain coefficient coding bits;
the noise level estimation unit is connected with the MDCT unit and the bit distribution unit, and is used to estimate the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients, then estimate the noise level of the audio signal in zero-bit coding subbands and quantize and encode it to obtain noise-level coded bits; the noise level is used during decoding to control the ratio of noise-filling energy to band-replication energy;
and the bitstream multiplexer (MUX) is connected with the amplitude envelope quantization and encoding unit, the frequency domain coefficient encoding unit and the noise level estimation unit, and is used to multiplex the amplitude-envelope coded bits of each coding subband, the frequency-domain coefficient coded bits and the noise-level coded bits, and transmit them to the decoding end.
Further, the noise level of the audio signal in the zero-bit coding subbands is the ratio of the noise-component power estimated in the zero-bit coding subbands to the tonal-component power estimated in the zero-bit coding subbands.
Further, the noise level estimation unit specifically includes:
the power spectrum estimation module is used for estimating the power spectrum of the audio signal to be coded according to the MDCT frequency domain coefficient of the audio signal to be coded;
the noise level calculation module is connected with the power spectrum estimation module and used for estimating the noise level of the zero-bit coding sub-band audio signal according to the power spectrum estimated by the power spectrum estimation module;
and the noise level coding module is connected with the noise level calculation module and is used for carrying out quantization coding on the noise level calculated by the noise level calculation module to obtain a noise level coding bit.
Further, the power spectrum estimation module estimates the power of the frequency point k of the ith frame by using the following formula:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\, X_i(k)^2$$
where $P_i(k)$ is the power estimated for the k-th frequency point of the i-th frame, $X_i(k)$ is the MDCT coefficient of the k-th frequency point of the i-th frame, λ is the coefficient of a single-pole smoothing filter, and $P_{i-1}(k) = 0$ when i = 0.
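The recursion above is a per-frequency-point single-pole (exponential) smoother applied frame by frame. A minimal Python sketch, assuming each frame's MDCT coefficients arrive as a list of floats; the function name `update_power_spectrum` and the example value λ = 0.5 are illustrative assumptions, since the text does not fix λ:

```python
def update_power_spectrum(prev_power, mdct_coeffs, lam):
    """One frame of P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)^2.

    prev_power=None stands for the first frame (i == 0), where P_{-1}(k) = 0.
    """
    if prev_power is None:
        prev_power = [0.0] * len(mdct_coeffs)
    return [lam * p + (1.0 - lam) * x * x for p, x in zip(prev_power, mdct_coeffs)]


# Usage: feed consecutive frames and keep the running estimate.
frames = [[2.0, -1.0], [0.0, 1.0]]
P = None
for frame in frames:
    P = update_power_spectrum(P, frame, lam=0.5)
# P == [1.0, 0.75]
```

A larger λ gives a smoother but slower-reacting estimate of the power spectrum.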
Further, the frequency domain coefficients of the audio signal to be encoded are divided into one or more noise-filling sub-bands, and the noise level calculation module specifically:
a. calculates the average power P_aveg(j) of all frequency domain coefficients in all or part of the zero-bit coded sub-bands within effective noise-filling sub-band j;
b. over the same coefficients, averages the powers $P_i(k)$ that are greater than P_aveg(j), giving the tone component average power P_signal_aveg(j) of the zero-bit coded sub-bands in the effective noise-filling sub-band;
c. over the same coefficients, averages the powers $P_i(k)$ that are less than or equal to P_aveg(j), giving the noise component average power P_noise_aveg(j) of the zero-bit coded sub-bands in the effective noise-filling sub-band;
d. takes the ratio of P_noise_aveg(j) to P_signal_aveg(j) as the noise level of the effective noise-filling sub-band;
where an effective noise-filling sub-band is a noise-filling sub-band that contains at least one zero-bit coded sub-band.
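The four steps above can be sketched as follows. This is an illustrative Python version, assuming the powers $P_i(k)$ of the relevant zero-bit coded frequency points have already been collected into one list per effective noise-filling sub-band; the name `noise_level` and the handling of a perfectly flat spectrum (returned as 1.0) are assumptions not stated in the text:

```python
def noise_level(powers):
    """Noise level of one effective noise-filling sub-band j.

    powers: P_i(k) for every frequency point in all (or part) of the
    zero-bit coded sub-bands lying inside noise-filling sub-band j.
    """
    if not powers:
        raise ValueError("an effective noise-filling sub-band needs at least one point")
    p_aveg = sum(powers) / len(powers)            # P_aveg(j)
    tone = [p for p in powers if p > p_aveg]      # tone component points
    noise = [p for p in powers if p <= p_aveg]    # noise component points
    if not tone:
        # Flat power spectrum: not covered by the text; assumed to be pure noise.
        return 1.0
    p_signal_aveg = sum(tone) / len(tone)         # P_signal_aveg(j)
    p_noise_aveg = sum(noise) / len(noise)        # P_noise_aveg(j), never empty here
    return p_noise_aveg / p_signal_aveg
```

Because P_noise_aveg(j) ≤ P_aveg(j) < P_signal_aveg(j) whenever tone points exist, the result lies in [0, 1), which keeps the decoder's factor 1 - P_noise_rate(j) non-negative.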
Furthermore, the noise level estimation unit further comprises a bit allocation module, connected with the noise level calculation module and the noise level coding module, which allocates bits either to all effective noise-filling sub-bands or, skipping one or more low-frequency effective noise-filling sub-bands, to the subsequent higher-frequency effective noise-filling sub-bands, and notifies the noise level calculation module and the noise level coding module accordingly; the noise level calculation module calculates noise levels only for the noise-filling sub-bands that are allocated bits, and the noise level coding module quantizes and codes those noise levels using the bits allocated by the bit allocation module.
In order to solve the above technical problem, the present invention further provides an audio decoding system comprising a bitstream demultiplexer (DeMUX), a coded sub-band amplitude envelope decoding unit, a bit allocation unit, a frequency domain coefficient decoding unit, a noise level decoding unit, a spectrum reconstruction unit and an Inverse Modified Discrete Cosine Transform (IMDCT) unit, wherein:
the DeMUX is used for separating amplitude envelope coded bits, frequency domain coefficient coded bits and noise level coded bits from a bit stream to be decoded;
the amplitude envelope decoding unit is connected with the DeMUX and used for decoding the amplitude envelope coded bits output by the bit stream demultiplexer to obtain the amplitude envelope quantization index of each coded sub-band;
the bit allocation unit is connected with the amplitude envelope decoding unit and is used for performing bit allocation to obtain the number of coding bits allocated to each frequency domain coefficient in each coded sub-band;
the frequency domain coefficient decoding unit is connected with the amplitude envelope decoding unit and the bit allocation unit, and is used for decoding, inverse quantizing and inverse normalizing the coded sub-bands to obtain frequency domain coefficients;
the noise level decoding unit is connected with the bitstream demultiplexer and the bit allocation unit, and is used for decoding and inverse quantizing the noise level coded bits to obtain the noise levels;
the spectrum reconstruction unit is connected with the noise level decoding unit, the frequency domain coefficient decoding unit, the amplitude envelope decoding unit and the bit allocation unit, and is used for performing band replication on the zero-bit coded sub-bands, controlling the overall fill energy of each such sub-band according to the amplitude envelope output by the amplitude envelope decoding unit, and controlling the ratio of noise-filling energy to band-replication energy according to the noise level output by the noise level decoding unit, thereby obtaining the reconstructed frequency domain coefficients of the zero-bit coded sub-bands;
and the IMDCT unit is connected with the frequency spectrum reconstruction unit and is used for carrying out IMDCT on the frequency domain coefficient after the frequency spectrum reconstruction of the zero-bit coding sub-band is completed to obtain the audio signal.
Further, the spectrum reconstruction unit includes a band replication subunit, an energy adjustment subunit, and a noise filling subunit, which are connected in sequence, wherein:
the band replication subunit is used for performing band replication on the zero-bit coded sub-bands;
the energy adjustment subunit is used for calculating the amplitude envelope, denoted sbr_rms(r), of the frequency domain coefficients obtained by band replication of zero-bit coded sub-band r, and for adjusting the energy of the replicated coefficients according to the noise level output by the noise level decoding unit, using the following formula:
$$\overline{X\_sbr}(r) = X\_sbr(r) \cdot \mathrm{sbr\_lev\_scale}(r) \cdot \frac{\mathrm{rms}(r)}{\mathrm{sbr\_rms}(r)}$$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency domain coefficients of zero-bit coded sub-band r; X_sbr(r) denotes the frequency domain coefficients obtained by band replication for sub-band r; sbr_rms(r) is the amplitude envelope of X_sbr(r); rms(r) is the amplitude envelope of the frequency domain coefficients of sub-band r before coding, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the band-replication energy control scale factor of sub-band r, whose value is determined by the noise level of the noise-filling sub-band containing r as follows:
$$\mathrm{sbr\_lev\_scale}(r) = \sqrt{\left(1 - \overline{P\_noise\_rate}(j)\right) \cdot \mathrm{fill\_energy\_saclefactor}}$$
where fill_energy_saclefactor is a fill energy scale factor used to adjust the gain of the overall fill energy, with values in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling sub-band j obtained by decoding and inverse quantization; and j is the index of the noise-filling sub-band in which zero-bit coded sub-band r lies;
a noise filling subunit, configured to perform noise filling on the energy-adjusted frequency domain coefficient according to the noise level output by the noise level decoding unit, where the noise filling formula is as follows:
$$\overline{X}(r) = \overline{X\_sbr}(r) + \mathrm{rms}(r) \cdot \mathrm{noise\_lev\_scale}(r) \cdot \mathrm{random}()$$
where $\overline{X}(r)$ denotes the reconstructed frequency domain coefficients of zero-bit coded sub-band r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted replicated frequency domain coefficients of sub-band r; rms(r) is the amplitude envelope of the frequency domain coefficients of sub-band r before coding, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator returning +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of sub-band r, whose value is determined by the noise level of the noise-filling sub-band containing r as follows:
$$\mathrm{noise\_lev\_scale}(r) = \sqrt{\overline{P\_noise\_rate}(j) \cdot \mathrm{fill\_energy\_saclefactor}}$$
where fill_energy_saclefactor is a fill energy scale factor used to adjust the gain of the overall fill energy, with values in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling sub-band j obtained by decoding and inverse quantization; and j is the index of the noise-filling sub-band in which zero-bit coded sub-band r lies.
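The energy adjustment and noise filling of one zero-bit coded sub-band r can be sketched in Python as below. It assumes sbr_rms(r) is computed as an RMS value in the same way as the encoder's amplitude envelope, that a replicated block with zero energy is left at zero, and that random() draws +1 or -1 with equal probability; the helper names `envelope`, `energy_adjust` and `noise_fill` are hypothetical:

```python
import math
import random


def envelope(coeffs):
    """RMS amplitude envelope of a block of coefficients."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))


def energy_adjust(x_sbr, rms_r, p_noise_rate_j, fill_energy_saclefactor):
    """Scale the band-replicated coefficients X_sbr(r) of zero-bit sub-band r."""
    sbr_rms = envelope(x_sbr)
    if sbr_rms == 0.0:
        return [0.0] * len(x_sbr)  # nothing to scale (assumed handling)
    sbr_lev_scale = math.sqrt((1.0 - p_noise_rate_j) * fill_energy_saclefactor)
    gain = sbr_lev_scale * rms_r / sbr_rms
    return [c * gain for c in x_sbr]


def noise_fill(x_sbr_adj, rms_r, p_noise_rate_j, fill_energy_saclefactor, rng=random):
    """Add rms(r) * noise_lev_scale(r) * (+1 or -1) to every coefficient."""
    noise_lev_scale = math.sqrt(p_noise_rate_j * fill_energy_saclefactor)
    amp = rms_r * noise_lev_scale
    return [c + amp * rng.choice((1.0, -1.0)) for c in x_sbr_adj]


# Usage for one sub-band with rms(r) = 5 and a decoded noise level of 0.36.
adj = energy_adjust([3.0, 4.0, 0.0, 0.0], 5.0, 0.36, 1.0)   # envelope(adj) is about 4.0
rec = noise_fill(adj, 5.0, 0.36, 1.0, random.Random(0))
```

Since sbr_lev_scale(r)² + noise_lev_scale(r)² = fill_energy_saclefactor, the expected energy per coefficient is about fill_energy_saclefactor · rms(r)² when the replicated and noise parts are uncorrelated; the noise level only shifts the split between the two.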
Further, the band replication subunit includes a tone position search module, a period and source band calculation module, a source band copy start index calculation module and a band copy module, connected in sequence, wherein:
the tone position search module is used for searching the MDCT frequency domain coefficients for the position of a tone of the audio signal;
the period and source band calculation module is used for determining, from the tone position, the band replication period and the source band to be copied: the replication period is the bandwidth from frequency point 0 to the frequency point at the tone position, and the source band is the band from frequency point 0 to the tone position shifted upward by the frequency point offset copyband_offset, i.e. frequency points copyband_offset through Tonal_pos + copyband_offset;
the source band copy start index calculation module is used for calculating, from the source band and the start index of the zero-bit coded sub-band requiring band replication, the index in the source band at which copying for that sub-band starts;
the band copy module is used for periodically copying the frequency domain coefficients of the source band into the zero-bit coded sub-band, starting from the copy start index in the source band and using the band replication period as the period; if the highest frequency in a zero-bit coded sub-band is below the frequency of the found tone, that sub-band is reconstructed by noise filling only.
Further, the tone position search module finds the tone position as follows: the absolute values or squares of the MDCT frequency domain coefficients of the first frequency band are smoothed by filtering, and the position of the maximum extremum of the filtered output over the first frequency band is taken as the tone position.
Further, the tone position search module smooths the absolute values of the MDCT frequency domain coefficients of the first frequency band as follows:
$$\mathrm{X\_amp}_i(k) = \mu\,\mathrm{X\_amp}_{i-1}(k) + (1-\mu)\left|\overline{X}_i(k)\right|$$
or the squares of the frequency domain coefficients of the first frequency band are smoothed as follows:
$$\mathrm{X\_amp}_i(k) = \mu\,\mathrm{X\_amp}_{i-1}(k) + (1-\mu)\,\overline{X}_i(k)^2$$
where μ is the smoothing filter coefficient, $\mathrm{X\_amp}_i(k)$ is the filtered output value of the k-th frequency point of the i-th frame, $\overline{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency point of the i-th frame, and $\mathrm{X\_amp}_{i-1}(k) = 0$ when i = 0.
Further, the first frequency band is a low-frequency band in which the energy is relatively concentrated, determined from the statistical characteristics of the spectrum; here low frequencies are the spectral components below half of the total signal bandwidth.
Further, the tone position search module may directly search the filtered output values of the frequency domain coefficients of the first frequency band for an initial maximum and take it as the maximum extremum of the filtered output over the first frequency band.
Further, when determining the maximum extremum of the filtered output values, the tone position search module may take a segment of the first frequency band as a second frequency band, search the filtered output values of the second frequency band for an initial maximum, and then proceed according to the position of the frequency domain coefficient corresponding to that initial maximum:
a. if the initial maximum is the filtered output value of the lowest-frequency coefficient of the second frequency band, it is compared with the filtered output value of the next lower-frequency coefficient in the first frequency band, and the comparison proceeds toward lower frequencies one coefficient at a time until the filtered output value of the current coefficient is larger than that of the next lower coefficient, in which case the current value is the finally determined maximum extremum; or until the lowest-frequency coefficient of the first frequency band is reached with a filtered output value larger than that of the next higher coefficient, in which case that value is the finally determined maximum extremum;
b. if the initial maximum is the filtered output value of the highest-frequency coefficient of the second frequency band, it is compared with the filtered output value of the next higher-frequency coefficient in the first frequency band, and the comparison proceeds toward higher frequencies one coefficient at a time until the filtered output value of the current coefficient is larger than that of the next higher coefficient, in which case the current value is the finally determined maximum extremum; or until the highest-frequency coefficient of the first frequency band is reached with a filtered output value larger than that of the preceding coefficient, in which case that value is the finally determined maximum extremum;
c. if the initial maximum value is the filtering output value of the frequency domain coefficient between the lowest frequency and the highest frequency of the second frequency band, the frequency domain coefficient corresponding to the initial maximum value is the position of the tone, that is, the initial maximum value is the finally determined maximum extreme value.
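The smoothing and cases a to c can be sketched compactly as below, assuming the filtered values are held in a Python list indexed by frequency point and that both bands are given as inclusive (low, high) index pairs. Ties are resolved by continuing the walk, which the text does not specify, and the function names are hypothetical:

```python
def smooth_amplitude(prev_amp, coeffs, mu):
    """X_amp_i(k) = mu * X_amp_{i-1}(k) + (1 - mu) * |X_i(k)|; prev_amp=None means i == 0."""
    if prev_amp is None:
        prev_amp = [0.0] * len(coeffs)
    return [mu * a + (1.0 - mu) * abs(x) for a, x in zip(prev_amp, coeffs)]


def find_tone(amp, band1, band2):
    """Return the index of the maximum extremum (the tone position).

    band1 = (lo1, hi1) is the first frequency band,
    band2 = (lo2, hi2) is a segment of band1 (the second frequency band).
    """
    lo1, hi1 = band1
    lo2, hi2 = band2
    k = max(range(lo2, hi2 + 1), key=lambda i: amp[i])  # initial maximum
    if k == lo2:
        # case a: walk down while the lower neighbour is not smaller
        while k > lo1 and amp[k - 1] >= amp[k]:
            k -= 1
    elif k == hi2:
        # case b: walk up while the higher neighbour is not smaller
        while k < hi1 and amp[k + 1] >= amp[k]:
            k += 1
    # case c: an interior maximum is already the tone position
    return k
```

For example, with filtered values [0, 1, 5, 3, 2, 4, 1, 0], a second band of points 3 to 4 starts its initial maximum at its lowest point, and the downward walk stops at point 2.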
Further, the source band copy start index calculation module calculates the copy start index in the source band for a zero-bit coded sub-band requiring band replication as follows: let fillband_start_freq be the index of the starting frequency point of the zero-bit coded sub-band whose frequency domain coefficients are currently being reconstructed, Tonal_pos the index of the frequency point corresponding to the tone, copy_period = Tonal_pos + 1 the copy period, and copyband_offset the start index of the source band; copy_period is repeatedly subtracted from fillband_start_freq until the result falls within the index range of the source band, and that result is the copy start index in the source band, denoted copy_pos_mod.
Further, when performing band replication, the band copy module copies the frequency domain coefficients of the source band, starting at its copy start index and proceeding toward higher frequencies, into the zero-bit coded sub-band starting at fillband_start_freq; when the copied source frequency point reaches Tonal_pos + copyband_offset, copying restarts from frequency point copyband_offset, and so on until all frequency domain coefficients of the current zero-bit coded sub-band have been filled.
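The copy start computation and the periodic copy can be sketched as follows, using the variable names from the text (fillband_start_freq, Tonal_pos, copyband_offset). The sketch assumes fillband_start_freq ≥ copyband_offset, copies from the decoded coefficients, and uses hypothetical function names:

```python
def copy_start(fillband_start_freq, tonal_pos, copyband_offset):
    """copy_pos_mod: subtract copy_period until the index lies in the source band."""
    if fillband_start_freq < copyband_offset:
        raise ValueError("sketch assumes the fill band starts at or above copyband_offset")
    copy_period = tonal_pos + 1
    src_end = tonal_pos + copyband_offset   # last frequency point of the source band
    pos = fillband_start_freq
    while pos > src_end:
        pos -= copy_period
    return pos


def replicate(coeffs, fill_start, fill_end, tonal_pos, copyband_offset):
    """Fill points fill_start..fill_end (inclusive) by periodic copying of the source band."""
    out = list(coeffs)
    src_end = tonal_pos + copyband_offset
    src = copy_start(fill_start, tonal_pos, copyband_offset)
    for k in range(fill_start, fill_end + 1):
        out[k] = coeffs[src]
        src += 1
        if src > src_end:       # wrap around to the start of the source band
            src = copyband_offset
    return out


# Source band = points 1..3 (Tonal_pos = 2, copyband_offset = 1), fill points 4..9.
spectrum = [10.0, 11.0, 12.0, 13.0] + [0.0] * 6
filled = replicate(spectrum, 4, 9, 2, 1)
# filled == [10.0, 11.0, 12.0, 13.0, 11.0, 12.0, 13.0, 11.0, 12.0, 13.0]
```

Because the source band spans exactly copy_period points, the loop in copy_start is equivalent to copyband_offset + (fillband_start_freq - copyband_offset) mod copy_period.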
Further, the bit allocation unit is also used for allocating bits either to all effective noise-filling sub-bands or, skipping one or more low-frequency effective noise-filling sub-bands, to the subsequent higher-frequency effective noise-filling sub-bands; the energy adjustment subunit adjusts the energy of the frequency domain coefficients obtained by band replication, and the noise filling subunit performs noise filling both on the energy-adjusted frequency domain coefficients and on the zero-bit coded sub-bands lying in noise-filling sub-bands that were allocated no bits.
At the encoding end, the invention estimates the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients, estimates from that power spectrum the noise level of the zero-bit coded sub-band audio signal, and codes and transmits the noise level information to the decoding end, where it controls the ratio of noise-filling energy to band-replication energy. After the decoding end has decoded the coded MDCT frequency domain coefficients, it reconstructs the frequency domain coefficients of the uncoded sub-bands by band replication and noise filling, with the ratio of noise-filling energy to band-replication energy controlled by the noise level coded bits sent by the encoding end. The method recovers both the spectral envelope of the uncoded sub-bands and their internal tone and noise components, giving better subjective listening quality.
Drawings
FIG. 1 is a schematic diagram of an audio encoding method according to the present invention.
FIG. 2 is a flow chart illustrating the process of obtaining the noise level coded bits for the zero-bit coded sub-bands within a noise-filling sub-band in accordance with the present invention.
Fig. 3 is a flow chart illustrating the calculation of the noise level according to the present invention.
FIG. 4 is a schematic diagram of an audio decoding method according to the present invention.
Fig. 5 is a schematic flow chart of the spectral reconstruction of the present invention.
Fig. 6 is a schematic diagram of the structure of the audio coding system of the present invention.
Fig. 7 is a block diagram of a noise level estimation unit according to the present invention.
Fig. 8 is a schematic structural diagram of an audio decoding system of the present invention.
Fig. 9 is a schematic block diagram of a spectral reconstruction unit according to the present invention.
Fig. 10 is a diagram of the bitstream composition according to an embodiment of the present invention.
Detailed Description
The core idea of the invention is as follows: at the encoding end, the power spectrum of the audio signal to be coded is estimated from its MDCT frequency domain coefficients, the noise level of the zero-bit coded sub-band audio signal is estimated from that power spectrum, and the noise level information is coded and transmitted to the decoding end to control the ratio of noise-filling energy to band-replication energy there. After the decoding end has decoded the coded MDCT frequency domain coefficients, it reconstructs the frequency domain coefficients of the uncoded sub-bands by band replication and noise filling, with the ratio of noise-filling energy to band-replication energy controlled by the noise level coded bits sent by the encoding end. The method recovers both the spectral envelope of the uncoded sub-bands and their internal tone and noise components, giving better subjective listening quality.
The frequency domain coefficients referred to in the present invention are MDCT frequency domain coefficients.
The present invention is explained in detail by the following four parts of encoding method, decoding method, encoding system, and decoding system:
Coding method
The audio coding method of the present invention comprises the steps of:
A. dividing MDCT frequency domain coefficients of an audio signal to be coded into a plurality of coding sub-bands, and carrying out quantization coding on amplitude envelope values of the coding sub-bands to obtain coding bits of the amplitude envelopes;
when the coding sub-bands are divided, the frequency domain coefficients after MDCT transformation are divided into a plurality of coding sub-bands with equal intervals, or are divided into a plurality of non-uniform coding sub-bands according to the auditory perception characteristics.
B. Bit allocation is carried out on each coding sub-band, and quantization coding is carried out on the non-zero bit coding sub-band, so as to obtain coding bits of MDCT frequency domain coefficients;
after bit allocation is performed on each encoded subband, if the number of bits allocated to a certain encoded subband is zero, quantization encoding is not performed on the encoded subband, the encoded subband is referred to as a zero-bit encoded subband or an uncoded encoded subband herein, and other encoded subbands are referred to as non-zero-bit encoded subbands.
The method of normalization, quantization and encoding for each encoded subband is not the focus of the present invention.
C. Estimating the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients, estimating from it the noise level of the zero-bit coded sub-band audio signal, and quantizing and coding the noise level to obtain noise level coded bits; the noise level coded bits are used during decoding to control the ratio of noise-filling energy to band-replication energy;
the noise level of the zero-bit coding sub-band audio signal refers to the ratio of the noise component power estimated in the zero-bit coding sub-band to the tone component power estimated in the zero-bit coding sub-band.
D. Multiplexing and packing the amplitude envelope coded bits of each coded sub-band, the frequency domain coefficient coded bits and the noise level coded bits, and transmitting the result to the decoding end.
The audio encoding method of the present invention will be described in detail below with reference to the accompanying drawings:
Embodiment 1: coding method
Fig. 1 is a schematic structural diagram of an audio encoding method according to an embodiment of the present invention. In this embodiment, an audio stream with a frame length of 20ms and a sampling rate of 32kHz is taken as an example to specifically describe the audio encoding method of the present invention. The method of the present invention is equally applicable under other frame lengths and sampling rates. As shown in fig. 1, the method includes:
101: performing Modified Discrete Cosine Transform (MDCT) on an audio stream to be encoded to obtain frequency domain coefficients on N frequency domain sampling points;
the specific implementation manner of this step may be:
the N-point time domain sampled signal x(n) of the current frame and the N-point time domain sampled signal $x_{old}(n)$ of the previous frame are combined into a 2N-point time domain sampled signal $\overline{x}(n)$, which can be represented by the following equation:
$$\overline{x}(n) = \begin{cases} x_{old}(n), & n = 0, 1, \ldots, N-1 \\ x(n-N), & n = N, N+1, \ldots, 2N-1 \end{cases} \qquad (1)$$
Applying the MDCT to $\overline{x}(n)$ yields the following frequency domain coefficients:
$$X(k) = \sum_{n=0}^{2N-1} \overline{x}(n)\, w(n) \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1 \qquad (2)$$
where w(n) is the sine window function:
$$w(n) = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right], \quad n = 0, \ldots, 2N-1 \qquad (3)$$
With a frame length of 20 ms and a sampling rate of 32 kHz, N = 640 frequency domain coefficients are obtained; for other frame lengths and sampling rates, N is calculated correspondingly.
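Formulas (1) to (3) can be checked with a direct O(N²) evaluation. This sketch is for illustration only (a practical encoder would normally use an FFT-based fast MDCT), and the function names are hypothetical:

```python
import math


def frame_size(sample_rate_hz, frame_ms):
    """Number of samples per frame, i.e. N."""
    return sample_rate_hz * frame_ms // 1000


def sine_window(N):
    """w(n) = sin(pi / (2N) * (n + 1/2)), n = 0..2N-1 (formula (3))."""
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]


def mdct_frame(x_old, x):
    """Direct evaluation of formulas (1) and (2) for one frame."""
    N = len(x)
    if len(x_old) != N:
        raise ValueError("previous and current frames must have the same length")
    xb = list(x_old) + list(x)   # formula (1): 2N-point block
    w = sine_window(N)
    return [
        sum(xb[n] * w[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]
```

frame_size(32000, 20) gives the N = 640 used in this embodiment. The sine window also satisfies w(n)² + w(n+N)² = 1, the Princen-Bradley condition for time-domain aliasing cancellation.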
102: dividing the N frequency domain coefficients into a plurality of coding sub-bands, and calculating the amplitude envelopes of the coding sub-bands;
in this embodiment, non-uniform sub-band division is adopted, and frequency domain amplitude envelopes (amplitude envelopes for short) of the sub-bands are calculated.
This step can be implemented by the following substeps:
102a: dividing the frequency domain coefficients within the frequency band to be processed into L sub-bands (which may be referred to as coded sub-bands);
in this embodiment, the frequency band range to be processed is 0 to 13.6kHz, non-uniform sub-band division can be performed according to human ear perception characteristics, and a specific division mode is given in table 1.
In Table 1, the frequency domain coefficients in the range 0 to 13.6 kHz are divided into 28 coded sub-bands, i.e. L = 28, and the frequency domain coefficients above 13.6 kHz are set to 0.
102b: the amplitude envelope of each coded sub-band is calculated according to the following formula:
$$Th(j) = \sqrt{\frac{1}{HIndex(j) - LIndex(j) + 1} \sum_{k=LIndex(j)}^{HIndex(j)} X(k)\,X(k)}, \quad j = 0, 1, \ldots, L-1 \qquad (4)$$
where LIndex(j) and HIndex(j) are the start and end frequency points of the j-th coded sub-band; their values are given in Table 1.
TABLE 1 Example of non-uniform frequency domain sub-band division
Sub-band No.   Start bin (LIndex)   End bin (HIndex)   Sub-band width (BandWidth)
0 0 7 8
1 8 15 8
2 16 23 8
3 24 31 8
4 32 47 16
5 48 63 16
6 64 79 16
7 80 95 16
8 96 111 16
9 112 127 16
10 128 143 16
11 144 159 16
12 160 183 24
13 184 207 24
14 208 231 24
15 232 255 24
16 256 279 24
17 280 303 24
18 304 327 24
19 328 351 24
20 352 375 24
21 376 399 24
22 400 423 24
23 424 447 24
24 448 471 24
25 472 495 24
26 496 519 24
27 520 543 24
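As an illustration of formula (4) applied to the Table 1 partition, here is a minimal Python sketch. It derives LIndex(j) and HIndex(j) from the BandWidth column by cumulative sums. The names `BAND_WIDTH`, `band_edges` and `amplitude_envelopes` are illustrative, not from the source. The input `X` is assumed to hold at least 544 MDCT coefficients; bins above index 543 (above 13.6 kHz) are simply not read, which matches their being set to 0.

```python
import math

# BandWidth column of Table 1: 4 bands of 8, 8 bands of 16, 16 bands of 24.
BAND_WIDTH = [8] * 4 + [16] * 8 + [24] * 16


def band_edges(widths):
    """Return (LIndex(j), HIndex(j)) for each coding sub-band."""
    edges = []
    start = 0
    for w in widths:
        edges.append((start, start + w - 1))
        start += w
    return edges


def amplitude_envelopes(X, widths=BAND_WIDTH):
    """Formula (4): RMS value of the coefficients in each coding sub-band."""
    env = []
    for lo, hi in band_edges(widths):
        energy = sum(X[k] * X[k] for k in range(lo, hi + 1))
        env.append(math.sqrt(energy / (hi - lo + 1)))
    return env
```

For example, `band_edges(BAND_WIDTH)[27]` yields `(520, 543)`, which matches the last row of Table 1.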
103: quantizing and encoding the amplitude envelopes of the encoded sub-bands to obtain quantization indexes of the amplitude envelopes and quantization index encoding bits of the amplitude envelopes (namely encoding bits of the amplitude envelopes);
Each coding sub-band amplitude envelope calculated by formula (4) is quantized with formula (5) below, giving the quantization index of each coding sub-band amplitude envelope:
[Formula (5): equation image not available in this text]
where ⌊·⌋ denotes rounding down. The amplitude envelope quantization index of the first coding sub-band, Th_q(0), is limited to the range [-5, 34]: if Th_q(0) < -5, set Th_q(0) = -5; if Th_q(0) > 34, set Th_q(0) = 34.
The quantized amplitude envelope reconstructed from the quantization index is
[Formula: equation image not available in this text]
The amplitude envelope quantization index of the first encoded subband is encoded using 6 bits, i.e. 6 bits are consumed.
The differences between the amplitude envelope quantization indexes of adjacent coding sub-bands are calculated as follows:
$$\Delta Th_q(j)=Th_q(j+1)-Th_q(j),\quad j=0,\ldots,L-2 \qquad (6)$$
To keep ΔTh_q(j) within [-15, 16], the amplitude envelope quantization indexes may be modified as follows:
If ΔTh_q(j) < -15, set ΔTh_q(j) = -15 and Th_q(j) = Th_q(j+1) + 15, for j = L-2, …, 0;
If ΔTh_q(j) > 16, set ΔTh_q(j) = 16 and Th_q(j+1) = Th_q(j) + 16, for j = 0, …, L-2.
ΔTh_q(j), j = 0, …, L-2, is Huffman-coded, and the number of bits consumed (the Huffman-coded bits) is counted. If the number of Huffman-coded bits is greater than or equal to the fixed bit allocation ((L-1) × 5 bits in this embodiment), ΔTh_q(j), j = 0, …, L-2, is coded with natural coding and the amplitude envelope Huffman coding flag Flag_huff_rms is set to 0. Otherwise ΔTh_q(j), j = 0, …, L-2, is Huffman-coded and Flag_huff_rms is set to 1. The coded bits of the amplitude envelope quantization indexes (i.e. the coded bits of the envelope differences) and the flag bit Flag_huff_rms are sent to the MUX.
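The index adjustment and the choice between natural and Huffman coding can be sketched as follows. This is an illustrative sketch, not the reference implementation. The Huffman table is not given in the text, so `huffman_length` is a hypothetical caller-supplied function that returns the code length of one difference value. Natural coding is assumed to spend exactly 5 bits per difference, which covers the 32 values in [-15, 16]. The 6 bits spent on Th_q(0) are not counted here.

```python
def clamp_differences(th_q):
    """Adjust indexes so that the differences of formula (6) lie in [-15, 16]."""
    th = list(th_q)
    L = len(th)
    # Backward pass: a drop larger than 15 lowers Th_q(j) to Th_q(j+1) + 15.
    for j in range(L - 2, -1, -1):
        if th[j + 1] - th[j] < -15:
            th[j] = th[j + 1] + 15
    # Forward pass: a rise larger than 16 lowers Th_q(j+1) to Th_q(j) + 16.
    for j in range(L - 1):
        if th[j + 1] - th[j] > 16:
            th[j + 1] = th[j] + 16
    delta = [th[j + 1] - th[j] for j in range(L - 1)]
    return th, delta


def choose_envelope_coding(delta, huffman_length):
    """Return (Flag_huff_rms, bits spent on the differences)."""
    fixed_bits = 5 * len(delta)  # (L-1) x 5 bits of natural coding
    huff_bits = sum(huffman_length(d) for d in delta)
    if huff_bits >= fixed_bits:
        return 0, fixed_bits     # natural coding
    return 1, huff_bits          # Huffman coding
```

Both passes only ever lower indexes. A correction in one pass therefore cannot push an already-checked difference back out of range.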
104: carrying out bit allocation on each coding sub-band according to the importance of each coding sub-band;
Calculate the initial importance of each coding sub-band from rate-distortion theory and the amplitude envelope information of the coding sub-bands, then allocate bits to each sub-band according to its importance. This step can be implemented by the following substeps:
104a: Calculate the average bit consumption per frequency domain coefficient;
From the total number of bits available for a 20 ms frame, bits_available, subtract three quantities: the bits consumed by the side information, bit_sides; the bits consumed by the coding sub-band amplitude envelopes, bits_Th; and the bits reserved for the noise level information of the noise-filling sub-bands, bits_noiseband. The result is the number of bits remaining for frequency domain coefficient coding, bits_left:
bits_left = bits_available - bit_sides - bits_Th - bits_noiseband    (7)
After bits have been allocated to the noise-filling sub-bands, any reserved noise level bits that remain are used for bit allocation correction.
The side information consists of the amplitude envelope Huffman coding flag Flag_huff_rms, the frequency domain coefficient Huffman coding flag Flag_huff_plvq, and the iteration count count. Flag_huff_rms indicates whether the sub-band amplitude envelopes are Huffman-coded. Flag_huff_plvq indicates whether Huffman coding is used when the frequency domain coefficients are vector-quantized and coded. count gives the number of iterations in the bit allocation correction (described in the subsequent steps).
104b: Calculate the initial importance of each coding sub-band for bit allocation:
The importance of the j-th coding sub-band for bit allocation is denoted rk(j).
104c: Allocate bits to the coding sub-bands according to their importance;
The procedure is as follows:
First, find the coding sub-band with the largest rk(j) and denote its index j_k. Increase the number of coding bits of each frequency domain coefficient in that sub-band, reduce its importance, and compute the total number of bits it consumes, bit_band_used(j_k). Then compute the total number of bits consumed by all coding sub-bands, the sum of bit_band_used(j) over j = 0, …, L-1. Repeat this process until the total number of consumed bits reaches the maximum permitted by the available bits.
The bit allocation number refers to the number of bits to which a single frequency domain coefficient is allocated in a coded sub-band. The number of bits consumed by a coding sub-band is the number of bits allocated to a single frequency domain coefficient in the coding sub-band multiplied by the number of frequency domain coefficients included in the coding sub-band.
In this embodiment, the allocation steps are as follows. A coding sub-band whose bit allocation is 0 receives 1 bit per coefficient, and its importance is then reduced by 1. A coding sub-band whose bit allocation is greater than 0 and less than the threshold 5 receives an additional 0.5 bit per coefficient, and its importance is then reduced by 0.5. A coding sub-band whose bit allocation is greater than or equal to the threshold 5 receives an additional 1 bit per coefficient, and its importance is then reduced by 1.
The bit allocation method in this step may be represented by the following pseudo code:
Let region_bit(j) = 0, j = 0, 1, …, L-1;
Repeat:
{
    Find j_k = argmax_{j = 0, …, L-1} rk(j);
    If region_bit(j_k) < 5
    {
        If region_bit(j_k) = 0
            Let region_bit(j_k) = region_bit(j_k) + 1;
            Calculate bit_band_used(j_k) = region_bit(j_k) × BandWidth(j_k);
            Let rk(j_k) = rk(j_k) - 1;
        Else if region_bit(j_k) >= 1
            Let region_bit(j_k) = region_bit(j_k) + 0.5;
            Calculate bit_band_used(j_k) = region_bit(j_k) × BandWidth(j_k);
            Let rk(j_k) = rk(j_k) - 0.5;
    }
    Else if region_bit(j_k) >= 5
    {
        Let region_bit(j_k) = region_bit(j_k) + 1;
        Let rk(j_k) = rk(j_k) - 1 if region_bit(j_k) < MaxBit, otherwise rk(j_k) = -100;
        Calculate bit_band_used(j_k) = region_bit(j_k) × BandWidth(j_k);
    }
    Calculate bit_used_all = sum of bit_band_used(j), j = 0, 1, …, L-1;
    If bit_used_all < bits_left - 24, return to the top of the loop, search the coding sub-bands for j_k again and continue allocating; here 24 is the maximum coding sub-band width.
    Otherwise, exit the loop and output the current bit allocation.
}
Finally, the remaining bits (fewer than 24) are distributed to eligible coding sub-bands in order of importance, according to the following rule. Preferably, each frequency domain coefficient in a coding sub-band whose bit allocation is 1 receives an additional 0.5 bit, and that sub-band's importance is reduced by 0.5. Otherwise, each frequency domain coefficient in a sub-band whose bit allocation is 0 receives 1 bit, and that sub-band's importance is reduced by 1. This continues until bits_left - bit_used_all is less than 4, at which point bit allocation ends.
Here MaxBit is the maximum number of coding bits that can be allocated to a single frequency domain coefficient in a coding sub-band; in this embodiment MaxBit = 9, and the value can be adjusted to suit the coding rate of the codec. region_bit(j) is the number of bits allocated to a single frequency domain coefficient in the j-th coding sub-band.
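The allocation loop of step 104c and the final distribution of leftover bits can be sketched in Python as follows. This sketch rests on several stated assumptions:

- The initial importances `rk` are taken as given, since their formula is not reproduced here.
- A sub-band's consumed bits are computed as region_bit(j) × BandWidth(j), as defined above.
- The constant 24 is taken as `max(band_width)`, which equals 24 for Table 1.
- The residual stage picks the eligible sub-band with the highest importance.
- The two `break` guards (all sub-bands saturated; nothing fits in the leftover bits) are added to guarantee termination.

```python
def allocate_bits(rk, band_width, bits_left, max_bit=9):
    """Greedy bit allocation of step 104c plus the residual stage (sketch).

    rk         -- initial importance of each coding sub-band (given)
    band_width -- BandWidth(j), number of coefficients in sub-band j
    bits_left  -- bits available for frequency domain coefficient coding
    Returns (region_bit, rk): bits per coefficient and final importance.
    """
    L = len(rk)
    rk = list(rk)
    region_bit = [0.0] * L
    max_width = max(band_width)  # 24 for the Table 1 partition

    def bits_used_all():
        # Sum of bit_band_used(j) = region_bit(j) * BandWidth(j).
        return sum(region_bit[j] * band_width[j] for j in range(L))

    # One step costs at most max_width bits, so the total never exceeds bits_left.
    while bits_used_all() < bits_left - max_width:
        jk = max(range(L), key=lambda j: rk[j])  # argmax of rk(j)
        if rk[jk] <= -100:  # guard (assumption): every sub-band saturated
            break
        if region_bit[jk] == 0:
            region_bit[jk] += 1
            rk[jk] -= 1
        elif region_bit[jk] < 5:
            region_bit[jk] += 0.5
            rk[jk] -= 0.5
        else:
            region_bit[jk] += 1
            rk[jk] = rk[jk] - 1 if region_bit[jk] < max_bit else -100

    # Residual stage: spend the remaining bits while at least 4 are left.
    while bits_left - bits_used_all() >= 4:
        remaining = bits_left - bits_used_all()
        # Prefer +0.5 bit per coefficient for sub-bands currently at 1 bit.
        step = 0.5
        cands = [j for j in range(L)
                 if region_bit[j] == 1 and 0.5 * band_width[j] <= remaining]
        if not cands:
            # Otherwise give 1 bit per coefficient to a sub-band at 0 bits.
            step = 1
            cands = [j for j in range(L)
                     if region_bit[j] == 0 and band_width[j] <= remaining]
        if not cands:  # guard (assumption): nothing fits the leftover bits
            break
        jk = max(cands, key=lambda j: rk[j])
        region_bit[jk] += step
        rk[jk] -= step
    return region_bit, rk
```

For instance, `allocate_bits([3, 2, 1], [8, 8, 16], 100)` gives 4, 3 and 2 bits per coefficient and consumes 88 of the 100 bits. No sub-band is left at 0 or 1 bit, so the residual stage has nothing to do.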
105: According to the bit allocation result of step 104, allocate bits to the effective noise-filling sub-bands, i.e. those that contain zero-bit coding sub-bands. Estimate the power spectrum of the audio signal from the MDCT frequency domain coefficients, and from it estimate the noise level of the effective noise-filling sub-bands. Quantize and code the noise level information to obtain the noise level coding bits of the noise-filling sub-bands;
The N MDCT frequency domain coefficients may be treated as a single noise-filling sub-band, or divided into several noise-filling sub-bands, either uniformly or according to the auditory properties of the human ear. A noise-filling sub-band contains one or more coding sub-bands.
In the present invention, a noise-filling sub-band that contains at least one zero-bit coding sub-band is called an effective noise-filling sub-band.
When bits are allocated to the noise-filling sub-bands, all effective noise-filling sub-bands may receive bits. Alternatively, one or more low-frequency effective noise-filling sub-bands may be skipped, and bits allocated only to the subsequent higher-frequency effective noise-filling sub-bands. Correspondingly, at decoding, the zero-bit coding sub-bands inside the low-frequency effective noise-filling sub-bands that received no bits are spectrally reconstructed by white noise filling.
Each effective noise-filling sub-band may be allocated the same number of bits, or different numbers of bits according to the auditory characteristics of the human ear. Once the noise level coding bits of the effective noise-filling sub-bands are obtained, they are multiplexed and packed.
106: Vector-quantize and code the frequency domain coefficients of the non-zero-bit coding sub-bands to obtain the frequency domain coefficient coding bits;
107: Construct the coded bit stream.
Fig. 10 shows the composition of the code stream according to an embodiment of the present invention. First, the side information is written into the bit stream multiplexer (MUX) in the order Flag_huff_rms, Flag_huff_plvq, count. Next come the coded bits of the coding sub-band amplitude envelopes, then the noise level coding bits, and then the frequency domain coefficient coding bits. Finally, the code stream written in this order is transmitted to the decoding end.
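The field order of Fig. 10 can be illustrated with a small sketch that concatenates bit strings. The field widths are assumptions, not stated in this section. Each flag is taken to be 1 bit, and the width of `count` is passed in as the hypothetical parameter `count_bits`. The already-coded envelope, noise level and coefficient bits are supplied as strings of '0'/'1'.

```python
def build_frame(flag_huff_rms, flag_huff_plvq, count, count_bits,
                envelope_bits, noise_level_bits, coeff_bits):
    """Concatenate one frame's fields in the order of Fig. 10 (sketch)."""
    if not 0 <= count < (1 << count_bits):
        raise ValueError("count does not fit in count_bits bits")
    fields = [
        format(flag_huff_rms, "01b"),              # Flag_huff_rms (1 bit assumed)
        format(flag_huff_plvq, "01b"),             # Flag_huff_plvq (1 bit assumed)
        format(count, "0{}b".format(count_bits)),  # iteration count
        envelope_bits,                             # amplitude envelope bits
        noise_level_bits,                          # noise level coding bits
        coeff_bits,                                # frequency domain coefficient bits
    ]
    return "".join(fields)
```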
Step 105 is described in detail below with reference to the figures. The example divides the N MDCT frequency domain coefficients into several noise-filling sub-bands and allocates bits starting from the second noise-filling sub-band.
As shown in Fig. 2, obtaining the noise level coding bits of the zero-bit coding sub-bands inside the noise-filling sub-bands specifically includes:
201: Divide the coding sub-bands into several noise-filling sub-bands, and allocate bits to the effective noise-filling sub-bands according to the bit allocation result of the coding sub-bands;
The frequency domain coefficients in the frequency band to be processed are divided non-uniformly, according to the auditory properties of the human ear, into several sub-bands called noise-filling sub-bands; each noise-filling sub-band contains one or more coding sub-bands;
an example of a specific division manner is shown in table 2:
TABLE 2 example of non-uniform subband partitioning for noise-filled subbands
Noise-filling sub-band No.   Start coding sub-band (NLIndex)   End coding sub-band (NHIndex)   Number of coding sub-bands (SubBandNum)
0 0 11 12
1 12 13 2
2 14 16 3
3 17 20 4
4 21 28 8
In table 2 above, the noise-padded subbands are arranged in order from lower to higher frequencies of the encoded subbands.
Assume that two noise level bits are reserved for each noise-filling sub-band except sub-band 0. The total number of reserved bits, bits_noiseband, is then 2 × (number of noise-filling sub-bands - 1), i.e. 8 bits for the division in Table 2.
Noise-filling sub-band 0 is allocated no bits and occupies no coded bits. Correspondingly, at decoding, if noise-filling sub-band 0 contains zero-bit coding sub-bands, their frequency domain coefficients are spectrally reconstructed by white noise filling (see step 504). Starting from noise-filling sub-band 1, each noise-filling sub-band is checked for zero-bit coding sub-bands. If it contains one, it is allocated 2 bits to represent the noise level information of its zero-bit coding sub-bands, and the reserved noise level bits bits_noiseband are decreased by 2. After all noise-filling sub-bands have been processed, the remaining reserved bits bits_noiseband are used for bit allocation correction.
The noise-filling subband bit allocation method in this step can be represented by the following pseudo code:
Nregion_bitflag(j-1) is the bit allocation flag of noise-filling sub-band j: 1 means bits are allocated; 0 means no bits are allocated.
Let Nregion_bitflag(j-1) = 0, j = 1, 2, …, L_noise-1;
Let the residual bits of the noise-filling sub-band allocation, noiseband_remain_bits, be 0;
For noise-filling sub-band j = 1, 2, …, L_noise-1:
{   Let region = NLIndex(j), NLIndex(j)+1, …, NHIndex(j);
    For each region:
    {
        If region_bit(region) = 0
        {
            Let Nregion_bitflag(j-1) = 1;
            bits_noiseband = bits_noiseband - 2;
            Break out of the current (inner) loop;
        }
    }
}
noiseband_remain_bits = bits_noiseband;
The bits allocated to the noise-filling sub-bands, arranged in sequence, are called the noise level coding bits.
The above describes one way of allocating bits to the noise-filling sub-bands. Alternatively, a fixed number of bits (e.g. 2 bits) may simply be reserved for each noise-filling sub-band.
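The noise-filling sub-band bit allocation pseudocode translates almost line for line into Python. In this sketch, `nl_index` and `nh_index` hold NLIndex/NHIndex for every noise-filling sub-band, including sub-band 0, which is skipped as in the text. `region_bit` is the per-coding-sub-band result of step 104. The parameter `bits_per_band = 2` reflects this embodiment. The function and parameter names are illustrative.

```python
def allocate_noise_band_bits(region_bit, nl_index, nh_index, bits_noiseband,
                             bits_per_band=2):
    """Return (Nregion_bitflag list, noiseband_remain_bits)."""
    l_noise = len(nl_index)
    flags = [0] * (l_noise - 1)  # Nregion_bitflag(j-1) for j = 1..L_noise-1
    for j in range(1, l_noise):  # noise-filling sub-band 0 gets no bits
        for region in range(nl_index[j], nh_index[j] + 1):
            if region_bit[region] == 0:  # contains a zero-bit coding sub-band
                flags[j - 1] = 1
                bits_noiseband -= bits_per_band
                break
    noiseband_remain_bits = bits_noiseband
    return flags, noiseband_remain_bits
```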
202: Based on the noise-filling sub-band division of Table 2, estimate the signal power spectrum of the four noise-filling sub-bands numbered 1, 2, 3 and 4 from the MDCT coefficients;
The power at frequency bin k of the i-th frame is estimated by formula (13):
$$P_i(k)=\lambda P_{i-1}(k)+(1-\lambda)X_i(k)^2 \qquad (13)$$
where P_{i-1}(k) = 0 when i = 0. P_i(k) is the estimated power at frequency bin k of the i-th frame, and X_i(k) is the MDCT coefficient at frequency bin k of the i-th frame. λ = 0.875 is the coefficient of the single-pole smoothing filter.
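Formula (13) is a first-order recursive smoother applied independently to each frequency bin. A minimal sketch follows; it assumes `frames` is a list of equal-length lists of MDCT coefficients X_i(k), and `estimate_power` is an illustrative name.

```python
def estimate_power(frames, lam=0.875):
    """Formula (13): P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)^2."""
    prev = [0.0] * len(frames[0])  # P_{-1}(k) = 0 before the first frame
    out = []
    for X in frames:
        cur = [lam * p + (1.0 - lam) * x * x for p, x in zip(prev, X)]
        out.append(cur)
        prev = cur
    return out
```

For example, a single coefficient of 2.0 in frame 0 gives P_0 = 0.125 × 4 = 0.5. If that coefficient is 0 in frame 1, the estimate decays to 0.875 × 0.5 = 0.4375.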
The principle of estimating the power spectrum from the MDCT is derived as follows:
The discrete-time Fourier transform (DTFT) of a signal x of length 2M at angular frequency ω is:
$$X_{DTFT}(\omega)=\sum_{n=0}^{2M-1}x(n)\,e^{-j\omega n} \qquad (14)$$
Sampling the DTFT at 2M evenly spaced frequencies between 0 and 2π gives the discrete Fourier transform (DFT):
$$X_{DFT}(k)=X_{DTFT}\left(\frac{2\pi k}{2M}\right)=\sum_{n=0}^{2M-1}x(n)\,e^{-j\frac{2\pi kn}{2M}} \qquad (15)$$
Sampling the DTFT with an offset of half a frequency bin gives the shifted discrete Fourier transform (SDFT):
$$X_{SDFT}(k)=X_{DTFT}\left(\frac{2\pi(k+1/2)}{2M}\right)=\sum_{n=0}^{2M-1}x(n)\,e^{-j\frac{2\pi(k+1/2)n}{2M}} \qquad (16)$$
With a window w(n) applied to the signal x(n), the SDFT becomes:
$$X_{SDFT}(k)=\sum_{n=0}^{2M-1}w(n)\,x(n)\,e^{-j\frac{2\pi(k+1/2)n}{2M}} \qquad (16)$$
Denote the MDCT frequency domain coefficient X(k) of formula (2) by X_MDCT(k) and let M = N; formula (2) can then be rewritten as:
$$X_{MDCT}(k)=\sum_{n=0}^{2M-1}\bar{x}(n)\,w(n)\cos\left(\frac{\pi}{M}\left(n+\frac{1}{2}+\frac{M}{2}\right)\left(k+\frac{1}{2}\right)\right),\quad k=0,\ldots,M-1 \qquad (17)$$
Assuming the SDFT and the MDCT use the same window, let $x(n)=\bar{x}(n)$.
The relationship between the MDCT and SDFT of the actual signal x (n) can be expressed by the following formula:
$$X_{MDCT}(k)=\left|X_{SDFT}(k)\right|\cos\left(\angle X_{SDFT}(k)-\frac{\pi}{M}\left(\frac{1}{2}+\frac{M}{2}\right)\left(k+\frac{1}{2}\right)\right) \qquad (18)$$
That is, the MDCT equals the magnitude of the SDFT modulated by a cosine whose argument depends on the phase angle of the SDFT.
The power spectrum of the signal can be estimated from the SDFTs of successive overlapping windowed blocks of the audio signal. With a transform length of 2M, the short-time shifted discrete Fourier transform (STSDFT) of x at frequency bin k and block t is:
$$X_{STSDFT}(k,t)=\sum_{n=0}^{2M-1}w(n)\,x(n+Ht)\,e^{-j\frac{2\pi(k+1/2)n}{2M}} \qquad (19)$$
where H is the hop length between blocks. With H = M, the STSDFT has the same hop length as the MDCT.
Using the STSDFT, the power spectrum of the signal can be estimated by averaging the squared magnitude of X_STSDFT(k,t) over many blocks t. A moving average over T blocks gives a time-varying estimate of the power spectrum:
$$P_{STSDFT}(k,t)=\frac{1}{T}\sum_{\eta=0}^{T-1}\left|X_{STSDFT}(k,t-\eta)\right|^2 \qquad (20)$$
Given the relationship between the MDCT and the SDFT, P_STSDFT(k,t) can, under certain assumptions, be approximated from X_MDCT(k,t). Define:
$$P_{MDCT}(k,t)=\frac{1}{T}\sum_{\eta=0}^{T-1}\left|X_{MDCT}(k,t-\eta)\right|^2 \qquad (21)$$
From equation (18):
$$P_{MDCT}(k,t)=\frac{1}{T}\sum_{\eta=0}^{T-1}\left|X_{STSDFT}(k,t-\eta)\right|^2\cos^2\left(\angle X_{STSDFT}(k,t-\eta)-\frac{\pi}{M}\left(\frac{1}{2}+\frac{M}{2}\right)\left(k+\frac{1}{2}\right)\right) \qquad (22)$$
If |X_STSDFT(k,t-η)| and ∠X_STSDFT(k,t-η) are assumed to vary relatively independently across blocks (an assumption that holds for most audio signals), then:
$$P_{MDCT}(k,t)\cong\left(\frac{1}{T}\sum_{\eta=0}^{T-1}\left|X_{STSDFT}(k,t-\eta)\right|^2\right)\left(\frac{1}{T}\sum_{\eta=0}^{T-1}\cos^2\left(\angle X_{STSDFT}(k,t-\eta)-\frac{\pi}{M}\left(\frac{1}{2}+\frac{M}{2}\right)\left(k+\frac{1}{2}\right)\right)\right) \qquad (23)$$
Assume further that ∠X_STSDFT(k,t-η) is roughly uniformly distributed between 0 and 2π over the T blocks, and that T is relatively large. The expected value of the squared cosine of a uniformly distributed phase angle is one half, so:
$$P_{MDCT}(k,t)\cong\frac{1}{2}\left(\frac{1}{T}\sum_{\eta=0}^{T-1}\left|X_{STSDFT}(k,t-\eta)\right|^2\right)=\frac{1}{2}P_{STSDFT}(k,t) \qquad (24)$$
Thus the power spectrum estimated from the MDCT is approximately half the power spectrum estimated from the STSDFT.
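The factor of one half in formula (24) can be checked numerically with a small Monte Carlo sketch. The sketch draws STSDFT magnitudes and phases independently, with the phase uniform on [0, 2π) as the derivation assumes, and forms the MDCT value with formula (18). The magnitude distribution is arbitrary. The fixed offset (π/M)(1/2 + M/2)(k + 1/2) is replaced by the arbitrary constant `theta`. Both choices are illustrative assumptions, not values from the text.

```python
import math
import random


def mdct_to_sdft_power_ratio(num_blocks=20000, theta=0.3, seed=1):
    """Estimate P_MDCT / P_STSDFT under the assumptions of formula (24)."""
    rng = random.Random(seed)
    p_stsdft = 0.0
    p_mdct = 0.0
    for _ in range(num_blocks):
        mag = rng.uniform(0.5, 2.0)              # |X_STSDFT(k, t-eta)|
        phase = rng.uniform(0.0, 2.0 * math.pi)  # phase, independent of mag
        x_mdct = mag * math.cos(phase - theta)   # formula (18)
        p_stsdft += mag * mag
        p_mdct += x_mdct * x_mdct
    return p_mdct / p_stsdft  # the 1/T factors cancel in the ratio
```

With these settings the returned ratio is close to 0.5, which is consistent with formula (24).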
Because the coding must have low delay, a single-pole smoothing filter is used for power spectrum estimation instead of a T-block moving average. The block index t of P_MDCT(k,t) is written as the subscript i, so P_MDCT(k,t) becomes P_i(k). If the block length is one frame of the audio signal, i is the frame number and the final estimate is given by formula (13), the power spectrum estimation algorithm of the present invention.
203: Using the power spectrum estimated by formula (13), calculate the noise level of the zero-bit coding sub-bands within each noise-filling sub-band that has been allocated bits.
As shown in Fig. 3, the noise level is calculated as follows:
Step 301: Average the power of all frequency domain coefficients in all or some of the zero-bit coding sub-bands of the noise-filling sub-band to obtain the average power P_aveg(j);
Step 302: Treat the frequency domain coefficients in these zero-bit coding sub-bands whose power is greater than P_aveg(j) as the tonal components of the noise-filling sub-band. Average their power to obtain P_signal_aveg(j), the tonal component average power of the zero-bit coding sub-bands in the effective noise-filling sub-band;
Step 303: Treat the frequency domain coefficients in these zero-bit coding sub-bands whose power is less than or equal to P_aveg(j) as the noise components of the noise-filling sub-band. Average their power to obtain P_noise_aveg(j), the noise component average power of the zero-bit coding sub-bands in the effective noise-filling sub-band;
Step 304: Calculate the ratio P_noise_rate(j) = P_noise_aveg(j) / P_signal_aveg(j); this value is the noise level of the effective noise-filling sub-band.
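Steps 301 to 304 amount to splitting the coefficient powers at their mean and taking the ratio of the two group averages. In this sketch, `powers` is assumed to hold the P_i(k) values of formula (13) for the zero-bit coding sub-band coefficients of one effective noise-filling sub-band. The text does not cover a perfectly flat spectrum, in which no coefficient exceeds the mean; returning 1.0 in that case is an assumption.

```python
def noise_level(powers):
    """Steps 301-304: P_noise_rate(j) = P_noise_aveg(j) / P_signal_aveg(j)."""
    p_aveg = sum(powers) / len(powers)         # step 301: P_aveg(j)
    tonal = [p for p in powers if p > p_aveg]  # step 302: tonal components
    noise = [p for p in powers if p <= p_aveg] # step 303: noise components
    if not tonal:  # assumption: flat spectrum is treated as pure noise
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)
    p_noise_aveg = sum(noise) / len(noise)     # never empty: min <= mean
    return p_noise_aveg / p_signal_aveg        # step 304
```

For powers 1, 1, 1 and 5, the mean is 2, the tonal average is 5 and the noise average is 1, so the noise level is 0.2.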
The noise level is then quantized and coded to obtain the noise level coding bits:
P_noise_rate(j) is quantized and coded to obtain P_noise_rate_bits(j). After quantization coding, the noise level coding bits of the noise-filling sub-bands that were allocated bits are arranged in increasing order of sub-band sequence number. This gives the noise level coding bits of all effective noise-filling sub-bands.
An example using non-uniform quantization is shown in Table 3:
TABLE 3 noise signal ratio non-uniform quantization example
P_noise_rate(j) range   P_noise_rate_bits(j)
[0,0.04) 00
[0.04,0.08) 01
[0.08,0.16) 10
[0.16,1) 11
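Table 3 can be applied with a threshold search over the interval lower bounds. In this sketch, `bisect_right` maps each P_noise_rate(j) to the index of its half-open interval. The names `THRESHOLDS`, `CODES`, `quantize_noise_level` and `pack_noise_levels` are illustrative. Values at or above 1 are mapped to '11'; this is an assumption, since the text stops at [0.16, 1), although the ratio of steps 301 to 304 is below 1 whenever tonal components exist.

```python
from bisect import bisect_right

THRESHOLDS = [0.04, 0.08, 0.16]     # lower bounds of the upper three intervals
CODES = ["00", "01", "10", "11"]    # P_noise_rate_bits(j) from Table 3


def quantize_noise_level(p_noise_rate):
    """Return the 2-bit code of Table 3 for one noise level."""
    return CODES[bisect_right(THRESHOLDS, p_noise_rate)]


def pack_noise_levels(rates):
    """Concatenate codes in increasing order of noise-filling sub-band number."""
    return "".join(quantize_noise_level(r) for r in rates)
```

Because `bisect_right` places a value equal to a threshold in the upper interval, 0.04 quantizes to '01', matching the half-open interval [0.04, 0.08).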
The noise level of an effective noise-filling sub-band is also the noise level of the zero-bit coding sub-bands inside it. Besides P_noise_rate(j), it may be represented by the ratio of the tonal component average power P_signal_aveg(j) to the noise component average power P_noise_aveg(j).
Second, decoding method
The audio decoding method of the present invention is the inverse process of the encoding method, and includes:
A. decoding each amplitude envelope coded bit in the bit stream to be decoded to obtain an amplitude envelope quantization index of each coded sub-band;
B. carrying out bit allocation on each coding sub-band, carrying out decoding and inverse quantization on the noise level coding bits to obtain the noise level of a zero-bit coding sub-band, and carrying out decoding and inverse quantization on the frequency domain coefficient coding bits to obtain the frequency domain coefficient of a non-zero-bit coding sub-band;
C. performing band replication on the zero-bit coding sub-bands, controlling the overall fill energy level of each zero-bit coding sub-band according to its amplitude envelope in the bit stream to be decoded, and controlling the ratio of noise filling energy to band replication energy according to the noise level of the zero-bit coding sub-band, to obtain the reconstructed frequency domain coefficients of the zero-bit coding sub-bands;
D. performing an Inverse Modified Discrete Cosine Transform (IMDCT) on the frequency domain coefficients of the non-zero-bit coding sub-bands and the reconstructed frequency domain coefficients of the zero-bit coding sub-bands to obtain the final audio signal.
Fig. 4 is a schematic structural diagram of an audio decoding method according to an embodiment of the present invention. As shown in fig. 4, the method includes:
401: decoding each amplitude envelope coded bit to obtain an amplitude envelope quantization index of each coded sub-band;
Extract the coded bits of one frame from the coded bit stream sent by the encoder (that is, from the bit stream demultiplexer, DeMUX). After extraction, first decode the side information; then, depending on the value of the amplitude envelope Huffman coding flag Huff_rms, either Huffman-decode or directly decode each amplitude envelope coded bit in the frame to obtain the amplitude envelope quantization index Th_q(j), j = 0, …, L-1, of each coding sub-band.
402: allocate bits to each coding sub-band and to the effective noise-filling sub-bands;
Calculate the initial importance of each coding sub-band from its amplitude envelope quantization index, and allocate bits to the coding sub-bands according to their importance to obtain the number of bits allocated to each coding sub-band. The decoder uses exactly the same bit allocation method as the encoder. During bit allocation, both the allocation step size and the amount by which a coding sub-band's importance is reduced after each allocation are variable.
After the bit allocation process is completed, count rounds of bit allocation correction are applied to the coding sub-bands according to the importance of each coding sub-band, where count is the bit allocation correction iteration count from the encoder; this completes the whole bit allocation process.
During bit allocation and bit allocation correction: when bits are allocated to a coding sub-band whose current allocation is 0, the allocation (and correction) step is 1 bit and the sub-band's importance is then reduced by 1; when additional bits are allocated to a coding sub-band whose allocation is greater than 0 but less than a certain threshold, the step is 0.5 bit and the importance is reduced by 0.5; when additional bits are allocated to a coding sub-band whose allocation is greater than or equal to the threshold, the step is 1 bit and the importance is reduced by 1;
Divide the coding sub-bands into several noise-filling sub-bands and allocate bits to the effective noise-filling sub-bands according to the bit allocation result of the coding sub-bands; the methods for dividing the noise-filling sub-bands and for allocating their bits are the same as at the encoder and are not repeated here.
403: decoding and inversely quantizing the noise level coded bits to obtain the noise level of zero bit coded sub-bands, and decoding and inversely quantizing the frequency domain coefficient coded bits to obtain MDCT frequency domain coefficients;
404: performing frequency band replication on the zero-bit coding sub-band, controlling the overall filling energy level of the coding sub-band according to the amplitude envelope of the zero-bit coding sub-band, and controlling the proportion of the frequency band replication and the noise filling energy of each zero-bit coding sub-band according to the noise level of the noise filling sub-band where the coding sub-band is located to obtain the frequency domain coefficient of the reconstructed zero-bit coding sub-band;
the detailed process of this step is illustrated in fig. 5 below.
Band replication is applied to the zero-bit coding sub-bands in effective noise-filling sub-bands that have been allocated bits, with control over both the energy level of the replicated frequency domain coefficients and the noise filling energy level; the zero-bit coding sub-bands in effective noise-filling sub-bands that have not been allocated bits are reconstructed by noise filling.
405: perform an IMDCT (Inverse Modified Discrete Cosine Transform) on the frequency domain coefficients after spectrum reconstruction to obtain the final audio output signal.
Step 404 is described in detail below in conjunction with FIG. 5:
as shown in fig. 5, step 404 specifically includes:
step 501: performing band replication on a zero-bit coding sub-band of the effective noise filling sub-band;
Search for the position of a tone of the audio signal in the MDCT frequency domain coefficients. Take the bandwidth from frequency bin 0 to the tone's frequency bin as the band replication period, and take the band from bin 0 to the tone bin, shifted upward by copyband_offset bins, as the source band for band replication of the zero-bit coding sub-bands. If the highest frequency of a zero-bit coding sub-band requiring band replication is lower than the frequency of the detected tone, its bins are reconstructed by noise filling only.
The frequency domain coefficients are arranged in order of increasing frequency, so a backward shift means a shift toward higher frequencies.
The band replication method is explained in detail below:
a. searching the position of a certain tone of the audio signal in the MDCT frequency domain coefficient;
The tone position is preferably found by smoothing the MDCT frequency domain coefficients: take the absolute values or squares of the MDCT coefficients in a specific low-frequency band and smooth them; then locate the maximum extremum of the smoothed output and take its position as the tone position;
In the present invention, the tone of an audio signal refers to its fundamental or to a harmonic of the fundamental.
The specific frequency band may be a band with relatively concentrated energy, determined from the spectral characteristics; it is referred to as the first frequency band. Here, low frequency refers to spectral components below one-half of the total signal bandwidth.
Here, the frequency domain coefficients are MDCT frequency domain coefficients decoded in step 403, and the frequencies are arranged from low to high.
The operation formula for performing smooth filtering on the frequency domain coefficient absolute value of the first frequency band is as follows:
$$\mathrm{X\_amp}_i(k) = \mu\,\mathrm{X\_amp}_{i-1}(k) + (1-\mu)\,\lvert \bar{X}_i(k) \rvert$$
or, the formula for smoothing the squared values of the frequency domain coefficients of the first frequency band is:
$$\mathrm{X\_amp}_i(k) = \mu\,\mathrm{X\_amp}_{i-1}(k) + (1-\mu)\,\bar{X}_i(k)^2$$
where μ is the smoothing filter coefficient, with value range (0, 1), for example 0.125; X_amp_i(k) is the filtered output value of the k-th frequency bin of the i-th frame; $\bar{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of the i-th frame; and X_amp_{i-1}(k) = 0 when i = 0.
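The one-pole smoothing above can be sketched as follows. The function name and the list-based representation of one frame are hypothetical; μ defaults to the example value 0.125, and `use_square` switches between the absolute-value and squared-value variants.

```python
def smooth_magnitudes(coeffs, prev_out=None, mu=0.125, use_square=False):
    """One-pole smoothing of |X_i(k)| (or X_i(k)^2) across frames.

    coeffs:   decoded MDCT coefficients of the first frequency band, frame i
    prev_out: X_amp_{i-1}(k) for the same bins, or None for frame 0
    """
    if prev_out is None:
        prev_out = [0.0] * len(coeffs)  # X_amp_{i-1}(k) = 0 when i = 0
    out = []
    for prev, x in zip(prev_out, coeffs):
        value = x * x if use_square else abs(x)
        out.append(mu * prev + (1.0 - mu) * value)
    return out


frame0 = smooth_magnitudes([-2.0, 4.0])         # [1.75, 3.5]
frame1 = smooth_magnitudes([0.0, 0.0], frame0)  # [0.21875, 0.4375]
```

With a small μ the output follows the current frame closely while still damping isolated spikes from frame to frame.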
The following two methods are available for searching the position of the maximum extreme value of the first frequency band filtering output value:
(1) searching an initial maximum value directly from the filtering output value of the frequency domain coefficient corresponding to the first frequency band, taking the maximum value as a maximum extreme value of the filtering output value of the first frequency band, and taking the sequence number of the corresponding frequency point as the position of the maximum extreme value (namely tone);
(2) when searching the maximum extreme value, one section of the first frequency band is taken as a second frequency band, an initial maximum value is searched from the filtering output values of the frequency domain coefficients corresponding to the second frequency band, the initial maximum value is taken as the maximum extreme value of the filtering output value of the first frequency band, and the serial number of the corresponding frequency point is taken as the position of the maximum extreme value (namely, the tone).
The starting point of the second frequency band is greater than the starting point of the first frequency band, the ending point of the second frequency band is less than the ending point of the first frequency band, and preferably, the number of frequency coefficients in the first frequency band and the second frequency band is not less than 8.
Because the coefficient at the initial maximum may not be the actual position of the tone, the tone search looks for the initial maximum among the filtered output values of the second frequency band and then proceeds according to where the corresponding coefficient lies:
(a) if the initial maximum is the filtered output of the lowest-frequency coefficient of the second band, compare it with the filtered output of the next lower-frequency coefficient in the first band, and keep comparing toward lower frequencies until the filtered output of the current coefficient is larger than that of the coefficient just below it; the current coefficient is then taken as the tone position, i.e., its filtered output is the final maximum extremum. If instead the lowest-frequency coefficient of the first band is reached and its filtered output is greater than that of the next higher coefficient, that coefficient is taken as the tone position and its filtered output is the final maximum extremum;
(b) if the initial maximum is the filtered output of the highest-frequency coefficient of the second band, compare it with the filtered output of the next higher-frequency coefficient in the first band, and keep comparing toward higher frequencies until the filtered output of the current coefficient is larger than that of the coefficient just above it; the current coefficient is then taken as the tone position, i.e., its filtered output is the final maximum extremum. If instead the highest-frequency coefficient of the first band is reached and its filtered output is greater than that of the previous coefficient, that coefficient is taken as the tone position and its filtered output is the final maximum extremum;
(c) if the initial maximum is the filtered output of a coefficient strictly between the lowest and highest frequencies of the second band, that coefficient is the tone position, i.e., the initial maximum is the final maximum extremum.
The following example determines the tone position with the first frequency band consisting of the 24th to 64th MDCT frequency domain coefficients and the second frequency band consisting of the 33rd to 56th MDCT frequency domain coefficients:
Search for the maximum among the filtered output values of the 33rd to 56th MDCT coefficients. If the maximum corresponds to the 33rd coefficient, check whether the filtered output of the 32nd coefficient is larger than that of the 33rd; if so, check whether the 31st is larger than the 32nd, and continue comparing toward lower frequencies in this way until the filtered output of the current coefficient is larger than that of the coefficient just below it, or until the 24th coefficient is reached and its filtered output is larger than that of the 25th; the current coefficient, or the 24th coefficient, respectively, is the tone position;
if the maximum corresponds to the 56th coefficient, search toward higher frequencies in the same way until the filtered output of the current coefficient is larger than that of the next one, in which case the current coefficient is the tone position, or until the 64th coefficient is reached and its filtered output is larger than that of the 63rd, in which case the 64th coefficient is the tone position;
if the maximum corresponds to a coefficient strictly between the 33rd and 56th, that coefficient is the tone position.
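The search rules (a) to (c) can be sketched as below. This assumes 0-based bin indices, treats the band limits (24, 64) and (33, 56) as inclusive, and uses a hypothetical function name; `amp` must cover at least the bins of the first band.

```python
def find_tone_position(amp, band1=(24, 64), band2=(33, 56)):
    """Return Tonal_pos, the bin index of the maximum extremum of amp.

    amp:   smoothed outputs X_amp_i(k), indexed by bin k
    band1: (first, last) bin of the first frequency band, inclusive
    band2: (first, last) bin of the second band, lying inside band1
    """
    lo1, hi1 = band1
    lo2, hi2 = band2
    # Initial maximum within the second band (lowest bin wins on ties).
    pos = max(range(lo2, hi2 + 1), key=lambda k: amp[k])
    if pos == lo2:
        # Case (a): walk toward lower bins while the lower neighbour is
        # larger, but never below the first bin of the first band.
        while pos > lo1 and amp[pos - 1] > amp[pos]:
            pos -= 1
    elif pos == hi2:
        # Case (b): walk toward higher bins while the upper neighbour is
        # larger, but never above the last bin of the first band.
        while pos < hi1 and amp[pos + 1] > amp[pos]:
            pos += 1
    # Case (c): an interior maximum is already the tone position.
    return pos


amp = [0.0] * 70
amp[40] = 1.0
print(find_tone_position(amp))  # 40
```

Restricting the initial search to the narrower second band and only extending it when the maximum sits on an edge avoids mistaking the slope of a peak just outside the band for the peak itself.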
This position is recorded as Tonal_pos, i.e., the index of the frequency bin corresponding to the maximum extremum.
b. Taking the bandwidth from frequency bin 0 to the tone bin as the period, and the band from bin 0 to the tone bin shifted upward by copyband_offset bins as the source band, perform band replication on the zero-bit coding sub-bands;
that is, the source band starts at bin copyband_offset and ends at bin copyband_offset + Tonal_pos.
In the invention, the band replication offset copyband_offset is preset and satisfies copyband_offset ≥ 0. When copyband_offset is 0, the source band spans from bin 0 to the tone bin. To reduce spectral discontinuities in the replicated band, copyband_offset can be set greater than zero; the source band is then the MDCT frequency domain coefficients from bin copyband_offset to bin Tonal_pos + copyband_offset, i.e., the band from bin 0 to the maximum-extremum bin shifted upward by the same small offset. The spectra of the zero-bit coding sub-bands above a certain frequency inside the effective noise-filling sub-bands (for example, those with indices 1, 2, 3 and 4) are filled by copying from this source band.
Corresponding to the process of Fig. 2, the zero-bit coding sub-bands of the first noise-filling sub-band (index 0) are reconstructed by random noise filling, while those of noise-filling sub-bands 1, 2, 3 and 4 are reconstructed by frequency domain coefficient replication combined with noise filling;
During band replication, the source band copy start index of a zero-bit coding sub-band is computed from the source band and the start index of the zero-bit coding sub-band to be filled; the frequency domain coefficients of the source band are then copied periodically into the zero-bit coding sub-band, starting from that index, with the band replication period as the period.
The source band copy start index is determined as follows:
First, starting from the first zero-bit coding sub-band to be replicated, obtain the bin index of the first MDCT coefficient of the zero-bit coding sub-band to be reconstructed, denoted fillband_start_freq; denote the tone bin index Tonal_pos and the band replication period copy_period, where copy_period = Tonal_pos + 1. If the highest frequency of a zero-bit coding sub-band requiring band replication is lower than the frequency of the detected tone, its bins are reconstructed by noise filling only, without band replication. With the band replication offset denoted copyband_offset, repeatedly subtract copy_period from fillband_start_freq until the result falls within the index range of the source band; this result is the source band copy start index, denoted copy_pos_mod.
The source band copy start index copy_pos_mod can be obtained with the following pseudocode:
copy_pos_mod = fillband_start_freq;
while (copy_pos_mod > Tonal_pos + copyband_offset)
{
    copy_pos_mod = copy_pos_mod - copy_period;
}
After the loop completes, copy_pos_mod is the source band copy start index.
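A runnable Python version of this pseudocode might look like the following. The function name is hypothetical, and it assumes fillband_start_freq ≥ copyband_offset, which holds when the zero-bit coding sub-band lies above the source band.

```python
def source_copy_start(fillband_start_freq, tonal_pos, copyband_offset):
    """Return copy_pos_mod, the source bin mapped to fillband_start_freq."""
    copy_period = tonal_pos + 1
    copy_pos_mod = fillband_start_freq
    # Step back one replication period at a time until the index lands
    # inside the source band [copyband_offset, tonal_pos + copyband_offset].
    while copy_pos_mod > tonal_pos + copyband_offset:
        copy_pos_mod -= copy_period
    return copy_pos_mod


print(source_copy_start(160, tonal_pos=40, copyband_offset=10))  # 37
```

Here 160 is reduced by the period 41 three times (160, 119, 78, 37), stopping at the first value not above 50, which lies in the source band [10, 50].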
During copying, the frequency domain coefficients starting at the source band copy start index are copied in sequence into the zero-bit coding sub-band starting at fillband_start_freq; when the source position reaches bin Tonal_pos + copyband_offset, copying resumes from bin copyband_offset, and so on, until all frequency domain coefficients of the current zero-bit coding sub-band have been filled.
When the band replication offset copyband_offset is set to 10, the band starting at copy_pos_mod is copied, in order of increasing frequency, into the zero-bit coding sub-band starting at fillband_start_freq; after bin Tonal_pos + 10, copying restarts from the 10th frequency domain coefficient, and so on, until the whole zero-bit coding sub-band is filled. The coefficients from bin 10 to bin Tonal_pos + 10 form the source band for band replication.
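The periodic copy can be sketched as below, combining the copy start computation with the wrap-around at the end of the source band. The function name and the half-open sub-band range `[fill_start, fill_end)` are assumptions for illustration; only the source band bins of `coeffs` are read.

```python
def replicate_band(coeffs, fill_start, fill_end, tonal_pos, copyband_offset):
    """Return X_sbr for bins fill_start..fill_end-1 by periodic copying.

    coeffs holds the decoded MDCT coefficients; only the source band
    [copyband_offset, tonal_pos + copyband_offset] is read.
    """
    src_lo = copyband_offset
    src_hi = tonal_pos + copyband_offset  # last bin of the source band
    copy_period = tonal_pos + 1
    pos = fill_start
    while pos > src_hi:  # same loop as computing copy_pos_mod
        pos -= copy_period
    out = []
    for _ in range(fill_start, fill_end):
        out.append(coeffs[pos])
        # After the last source bin, wrap back to the first one.
        pos = pos + 1 if pos < src_hi else src_lo
    return out


print(replicate_band(list(range(20)), 10, 16, tonal_pos=3, copyband_offset=2))
# [2, 3, 4, 5, 2, 3]
```

Since the period equals the source band width, every target bin copies from a source bin whose index differs from it by a whole number of periods, so the harmonic spacing of the source band is preserved in the filled sub-band.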
With the above method, the spectra of all zero-bit coding sub-bands in noise-filling sub-bands 1, 2, 3 and 4 are reconstructed by replication.
In addition to the above band replication methods, other band replication methods are also applicable to the present invention, and do not affect the implementation of the present invention.
Step 502: according to the noise level obtained by decoding, carrying out energy adjustment on a frequency domain coefficient obtained after copying a zero bit coding sub-band in each noise filling sub-band;
Calculate the amplitude envelope of the frequency domain coefficients obtained by replication in the zero-bit coding sub-band, denoted sbr_rms(r). The energy-adjusted frequency domain coefficients are calculated as:
$$\overline{\mathrm{X\_sbr}}(r) = \mathrm{X\_sbr}(r) \cdot \mathrm{sbr\_lev\_scale}(r) \cdot \mathrm{rms}(r) / \mathrm{sbr\_rms}(r)$$
where $\overline{\mathrm{X\_sbr}}(r)$ denotes the energy-adjusted frequency domain coefficients of zero-bit coding sub-band r; X_sbr(r) denotes the frequency domain coefficients obtained by replication for zero-bit coding sub-band r; sbr_rms(r) is the amplitude envelope (i.e., root mean square) of X_sbr(r); rms(r) is the amplitude envelope of the frequency domain coefficients of zero-bit coding sub-band r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the band replication energy control scale factor of zero-bit coding sub-band r, whose value is determined by the noise level of the noise-filling sub-band containing r as follows:
$$\mathrm{sbr\_lev\_scale}(r) = \sqrt{\bigl(1 - \overline{\mathrm{P\_noise\_rate}}(j)\bigr) \cdot \mathrm{fill\_energy\_saclefactor}}$$
where fill_energy_saclefactor is a fill energy scaling factor that adjusts the gain of the overall fill energy; its value range is (0, 1), and in this example it is 0.2;
$\overline{\mathrm{P\_noise\_rate}}(j)$ is the decoded, inverse-quantized noise level of noise-filling sub-band j. Its inverse-quantized value can be derived from the quantization ranges in Table 3 based on the noise level coding bits; an example implementation is shown in Table 10:
TABLE 10 inverse quantization values of noise level
Where j is the sequence number of the noise-filled subband where the zero-bit encoded subband r is located.
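The energy adjustment of step 502 can be sketched as follows. The function name is hypothetical, the decoded noise level is passed in directly as a float, and the zero-RMS guard is an assumption added to avoid division by zero (the text does not cover that case).

```python
import math


def energy_adjust(x_sbr, rms_r, p_noise_rate_hat, fill_energy_saclefactor=0.2):
    """Scale the replicated coefficients X_sbr(r) of zero-bit sub-band r."""
    # sbr_rms(r): root mean square of the replicated coefficients
    sbr_rms = math.sqrt(sum(x * x for x in x_sbr) / len(x_sbr))
    if sbr_rms == 0.0:
        return [0.0] * len(x_sbr)  # assumption: nothing to scale
    sbr_lev_scale = math.sqrt((1.0 - p_noise_rate_hat) * fill_energy_saclefactor)
    gain = sbr_lev_scale * rms_r / sbr_rms
    return [x * gain for x in x_sbr]


print(energy_adjust([1.0, -1.0, 1.0, -1.0], rms_r=2.0,
                    p_noise_rate_hat=0.0, fill_energy_saclefactor=1.0))
# [2.0, -2.0, 2.0, -2.0]
```

Dividing by sbr_rms(r) first normalizes the copied spectrum to unit RMS, so the result has RMS sbr_lev_scale(r) · rms(r) regardless of how loud the source band was.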
Step 503: superimpose white noise on the energy-adjusted frequency domain coefficients to form the final reconstructed frequency domain coefficients.
After the energy adjustment of the replicated frequency domain coefficients is finished, white noise is superimposed on them to form the final reconstructed frequency domain coefficients $\bar{X}(r)$:
$$\bar{X}(r) = \overline{\mathrm{X\_sbr}}(r) + \mathrm{rms}(r) \cdot \mathrm{noise\_lev\_scale}(r) \cdot \mathrm{random}()$$
where $\bar{X}(r)$ denotes the reconstructed frequency domain coefficients of zero-bit coding sub-band r; $\overline{\mathrm{X\_sbr}}(r)$ denotes the energy-adjusted frequency domain coefficients of zero-bit coding sub-band r; rms(r) is the amplitude envelope of the frequency domain coefficients of zero-bit coding sub-band r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator that returns +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding sub-band r, whose value is determined by the noise level of the noise-filling sub-band containing r. It is calculated as:
$$\mathrm{noise\_lev\_scale}(r) = \sqrt{\overline{\mathrm{P\_noise\_rate}}(j) \cdot \mathrm{fill\_energy\_saclefactor}}$$
where fill_energy_saclefactor is the fill energy scaling factor used to adjust the gain of the overall fill energy; its value range is (0, 1), and in this example it is 0.2; $\overline{\mathrm{P\_noise\_rate}}(j)$ is the decoded, inverse-quantized noise level of noise-filling sub-band j, and j is the index of the noise-filling sub-band containing zero-bit coding sub-band r.
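The noise superposition of step 503 can be sketched as below. The function name is hypothetical, and it assumes random() is drawn independently for each coefficient; an optional `random.Random` instance makes the output reproducible.

```python
import math
import random


def add_noise(x_sbr_adj, rms_r, p_noise_rate_hat,
              fill_energy_saclefactor=0.2, rng=None):
    """Superimpose random-sign noise on energy-adjusted coefficients."""
    rng = rng if rng is not None else random.Random()
    noise_lev_scale = math.sqrt(p_noise_rate_hat * fill_energy_saclefactor)
    amplitude = rms_r * noise_lev_scale
    # random() in the formula returns +1 or -1; drawn here per coefficient
    return [x + amplitude * rng.choice((1.0, -1.0)) for x in x_sbr_adj]


out = add_noise([0.0] * 8, rms_r=1.0, p_noise_rate_hat=0.5,
                rng=random.Random(0))
# every |out[k]| equals sqrt(0.5 * 0.2) = sqrt(0.1)
```

Since the random signs are uncorrelated with the replicated coefficients, the two parts add in energy on average: (1 - P)·f·rms(r)² from replication plus P·f·rms(r)² from noise gives an expected fill energy of f·rms(r)², where f is fill_energy_saclefactor and P the decoded noise level. The noise level therefore only shifts the balance between the two sources.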
The zero-bit coding sub-bands in effective noise-filling sub-bands that have not been allocated bits (e.g., noise-filling sub-band 0) are reconstructed by white noise filling only; this is not described further here.
The invention also provides a noise level estimation method, which comprises the following steps:
estimating a power spectrum of the audio signal to be encoded according to the frequency domain coefficient of the audio signal to be encoded;
estimating the noise level of the audio signal in the zero-bit coding sub-bands from the estimated power spectrum, the noise level being used during decoding to control the ratio of noise filling energy to band replication energy; a zero-bit coding sub-band is a coding sub-band that has been allocated zero bits. A noise level may be calculated for each zero-bit coding sub-band, or a common noise level may be calculated for several zero-bit coding sub-bands.
Further, the noise level of the audio signal in the zero-bit coding sub-bands is the ratio of the noise component power to the tonal component power, both estimated from the zero-bit coding sub-bands.
Further, the power spectrum of the audio signal to be encoded is estimated from its MDCT frequency domain coefficients; the power of frequency bin k in frame i is estimated as:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where P_{i-1}(k) = 0 when i = 0; P_i(k) is the estimated power of bin k in frame i; X_i(k) is the MDCT coefficient of bin k in frame i; and λ is the coefficient of the single-pole smoothing filter.
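A one-frame update of this power spectrum estimate can be sketched as follows. The function name is hypothetical, and λ = 0.5 in the example is an arbitrary illustrative value, since the text does not fix λ.

```python
def update_power_spectrum(mdct, prev_power, lam):
    """P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)^2 for every bin k."""
    if prev_power is None:
        prev_power = [0.0] * len(mdct)  # P_{i-1}(k) = 0 when i = 0
    return [lam * p + (1.0 - lam) * x * x for p, x in zip(prev_power, mdct)]


p0 = update_power_spectrum([2.0, -1.0], None, lam=0.5)  # [2.0, 0.5]
p1 = update_power_spectrum([0.0, 0.0], p0, lam=0.5)     # [1.0, 0.25]
```

Only the previous frame's estimate needs to be stored, so the encoder keeps one value per frequency bin between frames.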
Further, the frequency domain coefficients of the audio signal to be encoded are divided into one or more noise-filling sub-bands, and the noise level of an effective noise-filling sub-band is calculated from the estimated power spectrum as follows:
calculating the average power of all frequency domain coefficients in all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band to obtain the average power P_aveg(j);
calculating the average power of all frequency domain coefficients whose power P_i(k) is greater than the average power P_aveg(j), in all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, to obtain the average tonal component power P_signal_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band;
calculating the average power P_i(k) of all frequency domain coefficients whose power P_i(k) is less than or equal to the average power P_aveg(j), in all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, to obtain the average noise component power P_noise_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band;
and calculating the ratio P _ noise _ rate (j) of the average power P _ noise _ aveg (j) of the noise component and the average power P _ signal _ aveg (j) of the tone component to obtain the noise level of the effective noise filling sub-band.
Wherein an active noise-filled subband refers to a noise-filled subband that contains zero-bit encoded subbands.
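The four steps above can be sketched for a single effective noise-filling sub-band as follows. The function name is hypothetical, and so is the handling of a perfectly flat spectrum (no coefficient above the average, so no tonal component): the text does not cover that case, and here it is treated as pure noise with ratio 1.0.

```python
def noise_level(power):
    """Noise level P_noise_rate(j) of one effective noise-filling sub-band.

    power: estimated powers P_i(k) of the coefficients in the zero-bit
           coding sub-bands of that noise-filling sub-band (non-empty).
    """
    p_aveg = sum(power) / len(power)           # P_aveg(j)
    tonal = [p for p in power if p > p_aveg]   # tonal components
    noise = [p for p in power if p <= p_aveg]  # noise components
    if not tonal:
        # Hypothetical convention: a flat spectrum has no tonal part,
        # so treat it as pure noise (ratio 1.0).
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)    # P_signal_aveg(j)
    p_noise_aveg = sum(noise) / len(noise)     # P_noise_aveg(j)
    return p_noise_aveg / p_signal_aveg        # P_noise_rate(j)


print(noise_level([1.0, 1.0, 1.0, 9.0]))  # 1/9, about 0.111
```

Because every noise component is at or below the average and every tonal component is above it, the ratio is always less than 1 when a tonal component exists, which is why Table 3 only needs to cover [0, 1).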
Coding system
In order to implement the above encoding method, the present invention further provides an audio encoding system, as shown in fig. 6, the system includes a Modified Discrete Cosine Transform (MDCT) unit, an amplitude envelope calculation unit, an amplitude envelope quantization and encoding unit, a bit allocation unit, a frequency domain coefficient encoding unit, a noise level estimation unit, and a bit stream Multiplexer (MUX); wherein:
the MDCT unit performs a modified discrete cosine transform on the audio signal to generate frequency domain coefficients;
the amplitude envelope calculation unit is connected to the MDCT unit; it divides the frequency domain coefficients generated by the MDCT into a plurality of coding sub-bands and calculates the amplitude envelope of each coding sub-band;
when dividing the coding sub-bands, the amplitude envelope calculation unit divides the MDCT frequency domain coefficients either into a plurality of equal-width coding sub-bands or into a plurality of non-uniform coding sub-bands according to auditory perception characteristics.
The amplitude envelope quantization and encoding unit is connected to the amplitude envelope calculation unit; it quantizes and encodes the amplitude envelope value of each coding sub-band to generate the amplitude envelope coded bits of each coding sub-band;
the bit allocation unit is connected to the amplitude envelope quantization and encoding unit; it performs bit allocation to obtain the number of coded bits allocated to each frequency domain coefficient in each coding sub-band;
specifically, the bit allocation unit includes an importance calculation module and a bit allocation module connected to each other, wherein:
the importance calculation module calculates the initial importance value of each coding sub-band from its amplitude envelope value;
the bit allocation module allocates bits to each frequency domain coefficient in the coding sub-bands according to the importance of each coding sub-band; during bit allocation, the bit allocation step size and the importance reduction step size applied after each allocation are variable.
The initial importance value is calculated either from the optimal bit value under the condition of maximum quantization signal-to-noise ratio gain together with a scale factor matching the perceptual characteristics of the human ear, or as the quantization index Th_q(j) of the amplitude envelope of each coding sub-band, or by a further formula (given only as an image in the original and not reproduced here), where μ > 0 and μ and ν are both real numbers.
When calculating the initial importance values, the importance calculation module first calculates the average number of bits consumed per frequency domain coefficient; it then calculates, according to rate-distortion theory, the optimal bit value under the condition of maximum quantization signal-to-noise ratio gain; finally, it calculates the initial importance value of each coding sub-band for bit allocation from the average bit consumption and the optimal bit value;
the bit allocation module allocates bits to the coding sub-bands according to their importance: it increases the number of coded bits of each frequency domain coefficient in the coding sub-band with the highest importance and then reduces that sub-band's importance, repeating until the total number of bits consumed by all coding sub-bands reaches the maximum permitted by the bit budget.
When the bit allocation module allocates bits, the bit allocation step size and the post-allocation importance reduction step size for low-bit coding sub-bands are smaller than those for zero-bit and high-bit coding sub-bands. For example, both step sizes are 0.5 for low-bit coding sub-bands and 1 for zero-bit and high-bit coding sub-bands.
The frequency domain coefficient encoding unit is connected to the MDCT unit, the bit allocation unit, and the amplitude envelope quantization and encoding unit; it normalizes, quantizes, and encodes all frequency domain coefficients in each coding sub-band to generate the frequency domain coefficient coded bits;
the noise level estimation unit is connected to the MDCT unit and the bit allocation unit; it estimates the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients, then estimates the noise level of the audio signal in the zero-bit coding sub-bands, and quantizes and encodes it to obtain the noise level coded bits. The noise level is used at decoding time to control the ratio of noise-filling energy to band-replication energy; see Fig. 7 for details;
the noise level of the audio signal in a zero-bit coding sub-band is the ratio of the noise component power estimated in that zero-bit coding sub-band to the tonal component power estimated in it.
The bitstream multiplexer (MUX) is connected to the amplitude envelope quantization and encoding unit, the frequency domain coefficient encoding unit, and the noise level estimation unit; it multiplexes the amplitude envelope coded bits of each coding sub-band, the frequency domain coefficient coded bits, and the noise level coded bits, and sends the result to the decoder.
The bitstream multiplexer packs the coded bits in the following order: amplitude envelope Huffman coding flag, frequency domain coefficient Huffman coding flag, number of bit allocation correction iterations, amplitude envelope coded bits, frequency domain coefficient coded bits, and noise level coded bits.
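The greedy, importance-driven allocation performed by the bit allocation module can be sketched as follows. This is a sketch under stated assumptions, not the patent's exact procedure: the initial importance values are taken as given; bits are counted per coefficient, so one step in a sub-band of width w costs step × w bits; a "low-bit" sub-band is assumed to be one currently holding more than 0 and fewer than `low_bit_max` bits per coefficient (the threshold and its default of 2 are hypothetical, since the text does not define it); and allocation simply stops when the next step for the most important sub-band would exceed the budget.

```python
def allocate_bits(importance, widths, budget, low_bit_max=2.0):
    """Greedy bit allocation with step sizes of 0.5 (low-bit) or 1 (zero/high-bit).

    importance: initial importance value of each coding sub-band.
    widths: number of frequency domain coefficients in each sub-band (> 0).
    budget: maximum total number of bits available.
    Returns (bits per coefficient for each sub-band, total bits used).
    """
    imp = [float(v) for v in importance]
    bits = [0.0] * len(imp)
    used = 0.0
    while True:
        j = max(range(len(imp)), key=lambda b: imp[b])  # most important sub-band
        is_low_bit = 0.0 < bits[j] < low_bit_max        # hypothetical definition
        step = 0.5 if is_low_bit else 1.0               # same step for bits and importance
        cost = step * widths[j]
        if used + cost > budget:
            break                                       # bit budget reached
        bits[j] += step
        used += cost
        imp[j] -= step
    return bits, used
```

Using the same value for the bit step and the importance reduction step mirrors the example in the text (0.5 and 0.5, or 1 and 1): a sub-band in the low-bit range gains bits in finer increments and loses importance more slowly, so it is revisited more often.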
As shown in Fig. 7, the noise level estimation unit specifically includes:
a power spectrum estimation module, which estimates the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients;
the power spectrum estimation module estimates the power at frequency bin k of the i-th frame using the following formula:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power at the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient at the k-th frequency bin of the i-th frame; and λ is the coefficient of the single-pole smoothing filter.
a noise level calculation module, connected to the power spectrum estimation module, which estimates the noise level of the audio signal in the noise filling sub-bands to which bits are allocated, from the power spectrum estimated by the power spectrum estimation module;
and a noise level encoding module, connected to the noise level calculation module, which quantizes and encodes the noise level calculated by the noise level calculation module to obtain the noise level coded bits.
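The recursive power estimate $P_i(k)$ is a single-pole (exponential) smoother applied independently to each frequency bin across frames. A minimal NumPy sketch, assuming the MDCT coefficients of all frames are available as a 2-D array (a real encoder would instead carry $P_{i-1}(k)$ over from one frame to the next); λ is left as a parameter because the text gives no value for it, and the function name is hypothetical.

```python
import numpy as np

def estimate_power_spectrum(frames, lam):
    """Return P with P[i, k] = lam * P[i-1, k] + (1 - lam) * frames[i, k]**2.

    frames: 2-D array, frames[i, k] = MDCT coefficient X_i(k).
    lam: single-pole smoothing filter coefficient, 0 <= lam < 1.
    """
    frames = np.asarray(frames, dtype=float)
    P = np.empty_like(frames)
    prev = np.zeros(frames.shape[1])  # P_{-1}(k) = 0 for the first frame
    for i, X in enumerate(frames):
        prev = lam * prev + (1.0 - lam) * X ** 2
        P[i] = prev
    return P
```

A larger λ gives a smoother but slower-reacting estimate of each bin's power; λ = 0 reduces to the instantaneous power $X_i(k)^2$.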
Further, the frequency domain coefficients of the audio signal to be encoded are divided into one or more noise filling sub-bands, and the noise level calculation module specifically: calculates the average power of all frequency domain coefficients in all or part of the zero-bit coding sub-bands of an effective noise filling sub-band to obtain the average power P_aveg(j); calculates the average of the powers P_i(k) of those coefficients whose power is greater than P_aveg(j) to obtain the tonal-component average power P_signal_aveg(j) of the zero-bit coding sub-bands in the effective noise filling sub-band; calculates the average of the powers P_i(k) of those coefficients whose power is less than or equal to P_aveg(j) to obtain the noise-component average power P_noise_aveg(j) of the zero-bit coding sub-bands in the effective noise filling sub-band; and calculates the ratio of P_noise_aveg(j) to P_signal_aveg(j) to obtain the noise level of the effective noise filling sub-band;
where an effective noise filling sub-band is a noise filling sub-band that contains zero-bit coding sub-bands.
The noise level estimation unit further includes a bit allocation module, connected to the noise level calculation module and the noise level encoding module, which either allocates bits to all effective noise filling sub-bands or skips one or more low-frequency effective noise filling sub-bands and allocates bits to the subsequent higher-frequency ones, and notifies the noise level calculation module and the noise level encoding module of the result; the noise level calculation module calculates noise levels only for the noise filling sub-bands to which bits are allocated, and the noise level encoding module quantizes and encodes the noise levels using the bits allocated by the bit allocation module.
Four. Decoding system
To implement the above decoding method, the present invention further provides an audio decoding system. As shown in Fig. 8, the system includes a bitstream demultiplexer (DeMUX), a coding sub-band amplitude envelope decoding unit, a bit allocation unit, a frequency domain coefficient decoding unit, a noise level decoding unit, a spectrum reconstruction unit, and an Inverse Modified Discrete Cosine Transform (IMDCT) unit, wherein:
the bitstream demultiplexer (DeMUX) separates the amplitude envelope coded bits, the frequency domain coefficient coded bits, and the noise level coded bits from the bitstream to be decoded;
the amplitude envelope decoding unit is connected to the bitstream demultiplexer and decodes the amplitude envelope coded bits output by the demultiplexer to obtain the amplitude envelope quantization index of each coding sub-band;
the bit allocation unit is connected to the amplitude envelope decoding unit; it allocates bits to each coding sub-band and to the noise filling sub-bands that contain zero-bit coding sub-bands;
the bit allocation unit includes an importance calculation module, a bit allocation module, and a bit allocation correction module, wherein:
the importance calculation module calculates the initial importance value of each coding sub-band from its amplitude envelope value;
the bit allocation module allocates bits to each frequency domain coefficient in the coding sub-bands according to the initial importance value of each coding sub-band; during bit allocation, the bit allocation step size and the importance reduction step size applied after each allocation are variable;
and the bit allocation correction module performs bit allocation correction on the coding sub-bands for the number of iterations given by the bit allocation correction iteration count received from the encoder, according to the importance of each coding sub-band after bit allocation.
When the bit allocation module allocates bits, the bit allocation step size and the post-allocation importance reduction step size for low-bit coding sub-bands are smaller than those for zero-bit and high-bit coding sub-bands.
When the bit allocation correction module performs bit correction, the bit correction step size and the post-correction importance reduction step size for low-bit coding sub-bands are smaller than those for zero-bit and high-bit coding sub-bands.
When allocating bits to the noise filling sub-bands, the bit allocation unit follows the same method as the encoder: it either allocates bits to all effective noise filling sub-bands or skips one or more low-frequency effective noise filling sub-bands and allocates bits to the subsequent higher-frequency ones.
The frequency domain coefficient decoding unit is connected to the amplitude envelope decoding unit and the bit allocation unit; it decodes, inverse-quantizes, and inverse-normalizes the coding sub-bands to obtain the frequency domain coefficients;
the noise level decoding unit is connected to the bitstream demultiplexer and the bit allocation unit; it decodes and inverse-quantizes the noise level coded bits to obtain the noise levels;
the spectrum reconstruction unit is connected to the noise level decoding unit, the frequency domain coefficient decoding unit, the amplitude envelope decoding unit, and the bit allocation unit; it performs band replication on the zero-bit coding sub-bands, controls the overall fill energy level of each coding sub-band according to the amplitude envelope output by the amplitude envelope decoding unit, and controls the ratio of noise-filling energy to band-replication energy according to the noise level output by the noise level decoding unit, to obtain the reconstructed frequency domain coefficients of the zero-bit coding sub-bands;
and the Inverse Modified Discrete Cosine Transform (IMDCT) unit is connected to the spectrum reconstruction unit and applies the IMDCT to the frequency domain coefficients once spectrum reconstruction of the zero-bit coding sub-bands is complete, to obtain the audio signal.
As shown in Fig. 9, the spectrum reconstruction unit specifically includes a band replication subunit, an energy adjustment subunit, and a noise filling subunit connected in sequence, wherein:
the band replication subunit performs band replication on the zero-bit coding sub-bands;
the energy adjustment subunit calculates the amplitude envelope of the frequency domain coefficients obtained by band replication of a zero-bit coding sub-band, denoted sbr_rms(r), and adjusts the energy of the replicated frequency domain coefficients according to the noise level output by the noise level decoding unit. The energy-adjusted frequency domain coefficients are:
$$\overline{X\_sbr}(r) = X\_sbr(r)\cdot \mathrm{sbr\_lev\_scale}(r)\cdot \mathrm{rms}(r)\,/\,\mathrm{sbr\_rms}(r)$$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency domain coefficients of zero-bit coding sub-band r; X_sbr(r) denotes the frequency domain coefficients obtained by band replication for zero-bit coding sub-band r; sbr_rms(r) is the amplitude envelope of X_sbr(r); rms(r) is the amplitude envelope of the original (pre-encoding) frequency domain coefficients of zero-bit coding sub-band r, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the band-replication energy control scale factor of zero-bit coding sub-band r, whose value is determined by the noise level of the noise filling sub-band containing r as follows:
$$\mathrm{sbr\_lev\_scale}(r) = \sqrt{\left(1 - \overline{P\_noise\_rate}(j)\right)\cdot \mathrm{fill\_energy\_scalefactor}}$$
where fill_energy_scalefactor is a fill energy scaling factor that adjusts the gain of the overall fill energy, with a value in the range (0, 1), and $\overline{P\_noise\_rate}(j)$ is the decoded (inverse-quantized) noise level of noise filling sub-band j, j being the index of the noise filling sub-band that contains zero-bit coding sub-band r.
the noise filling subunit performs noise filling on the energy-adjusted frequency domain coefficients according to the noise level output by the noise level decoding unit, using the following formula:
$$\overline{X}(r) = \overline{X\_sbr}(r) + \mathrm{rms}(r)\cdot \mathrm{noise\_lev\_scale}(r)\cdot \mathrm{random}()$$
where $\overline{X}(r)$ denotes the reconstructed frequency domain coefficients of zero-bit coding sub-band r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency domain coefficients of zero-bit coding sub-band r; rms(r) is the amplitude envelope of the original frequency domain coefficients of zero-bit coding sub-band r, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator that returns +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding sub-band r, whose value is determined by the noise level of the noise filling sub-band containing r as follows:
$$\mathrm{noise\_lev\_scale}(r) = \sqrt{\overline{P\_noise\_rate}(j)\cdot \mathrm{fill\_energy\_scalefactor}}$$
where fill_energy_scalefactor is the fill energy scaling factor that adjusts the gain of the overall fill energy, with a value in the range (0, 1) (0.2 in this example), and $\overline{P\_noise\_rate}(j)$ is the decoded (inverse-quantized) noise level of noise filling sub-band j, j being the index of the noise filling sub-band that contains zero-bit coding sub-band r.
The band replication subunit performs band replication, according to the bit allocation result of the bit allocation unit, on the zero-bit coding sub-bands within noise filling sub-bands to which bits are allocated; the energy adjustment subunit adjusts the energy of the frequency domain coefficients obtained by band replication; and the noise filling subunit performs noise filling both on the energy-adjusted frequency domain coefficients and on the zero-bit coding sub-bands within noise filling sub-bands to which no bits are allocated.
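The energy adjustment and noise filling formulas for one zero-bit coding sub-band r can be sketched in NumPy as follows. The function names are hypothetical; the sketch assumes the amplitude envelope sbr_rms(r) is the root-mean-square of the replicated coefficients (consistent with the name, but not stated explicitly in the text), takes the decoded noise level as already inverse-quantized, uses the example value 0.2 for fill_energy_scalefactor, and models random() as an independent ±1 per coefficient.

```python
import numpy as np

def adjust_energy(x_sbr, rms_r, noise_rate_j, fill_energy_scalefactor=0.2):
    """Scale replicated coefficients X_sbr(r) to the energy-adjusted coefficients."""
    x_sbr = np.asarray(x_sbr, dtype=float)
    sbr_rms = np.sqrt(np.mean(x_sbr ** 2))  # assumption: amplitude envelope = RMS
    if sbr_rms == 0.0:
        return np.zeros_like(x_sbr)         # nothing was replicated into this sub-band
    sbr_lev_scale = np.sqrt((1.0 - noise_rate_j) * fill_energy_scalefactor)
    return x_sbr * sbr_lev_scale * rms_r / sbr_rms

def noise_fill(x_adj, rms_r, noise_rate_j, fill_energy_scalefactor=0.2, rng=None):
    """Add rms(r) * noise_lev_scale(r) * random(), with random() in {+1, -1}."""
    rng = np.random.default_rng() if rng is None else rng
    x_adj = np.asarray(x_adj, dtype=float)
    noise_lev_scale = np.sqrt(noise_rate_j * fill_energy_scalefactor)
    signs = rng.choice([-1.0, 1.0], size=x_adj.shape)  # random phase per coefficient
    return x_adj + rms_r * noise_lev_scale * signs
```

Since sbr_lev_scale(r)² + noise_lev_scale(r)² = fill_energy_scalefactor, the noise level only shifts energy between the replicated part (RMS rms(r)·sbr_lev_scale(r)) and the noise part (RMS rms(r)·noise_lev_scale(r)), while fill_energy_scalefactor sets the overall fill level relative to rms(r).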
Further, as shown in Fig. 9, the band replication subunit includes a tone position search module, a period and source band calculation module, a source band replication start index calculation module, and a band replication module connected in sequence, wherein:
the tone position search module searches for the position of a tone of the audio signal among the MDCT frequency domain coefficients, specifically: it takes the absolute values or squares of the MDCT frequency domain coefficients of the first frequency band and applies smoothing filtering, then searches the filter output for the position of the maximum extremum within the first frequency band; that position is the tone position;
the period and source band calculation module determines the band replication period and the source band for replication from the tone position: the band replication period is the bandwidth from frequency bin 0 to the tone bin, and the source band is the band from bin 0 to the tone bin shifted upward by the bin offset copyband_offset;
if the index of the tone bin is denoted Tonal_pos and the preset spectral band offset is denoted copyband_offset, the frequency domain coefficients of the source band start at index copyband_offset and end at index copyband_offset + Tonal_pos.
The source band replication start index calculation module calculates the source band replication start index for a zero-bit coding sub-band from the source band and the start index of the zero-bit coding sub-band that requires band replication.
The band replication module copies the frequency domain coefficients of the source band periodically into the zero-bit coding sub-band, starting from the source band replication start index and using the band replication period as the period;
if the highest frequency in a zero-bit coding sub-band requiring band replication is lower than the frequency of the detected tone, the spectrum at those bins is reconstructed by noise filling only, without band replication.
Further, the smoothing filter applied by the tone position search module to the absolute values of the MDCT frequency domain coefficients of the first frequency band is:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\left|\bar{X}_i(k)\right|$$
or, the smoothing filter applied to the squares of the frequency domain coefficients of the first frequency band is:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\bar{X}_i(k)^2$$
where μ is the smoothing filter coefficient, set to 0.125; $X\_amp_i(k)$ is the filter output at the k-th frequency bin of the i-th frame; $\bar{X}_i(k)$ is the decoded MDCT coefficient at the k-th frequency bin of the i-th frame; and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
Further, the first frequency band is a low-frequency band in which energy is relatively concentrated, determined from the statistical characteristics of the spectrum; here, low frequencies are spectral components below half of the total signal bandwidth.
Further, the tone position search module directly searches the filter outputs of the frequency domain coefficients in the first frequency band for the initial maximum and takes it as the maximum extremum of the filter output of the first frequency band.
Further, when determining the maximum extremum of the filter output, the tone position search module takes a segment of the first frequency band as a second frequency band, searches the filter outputs of the frequency domain coefficients in the second frequency band for an initial maximum, and then proceeds according to the position of the coefficient corresponding to that initial maximum:
a. If the initial maximum is the filter output of the lowest-frequency coefficient of the second frequency band, compare it with the filter output of the adjacent lower-frequency coefficient in the first frequency band, and continue comparing toward lower frequencies one coefficient at a time. When the filter output of the current coefficient is larger than that of the next lower-frequency coefficient, the filter output of the current coefficient is the final maximum extremum; if the lowest-frequency coefficient of the first frequency band is reached and its filter output is larger than that of the coefficient above it, that output is the final maximum extremum;
b. If the initial maximum is the filter output of the highest-frequency coefficient of the second frequency band, compare it with the filter output of the adjacent higher-frequency coefficient in the first frequency band, and continue comparing toward higher frequencies one coefficient at a time. When the filter output of the current coefficient is larger than that of the next higher-frequency coefficient, the filter output of the current coefficient is the final maximum extremum; if the highest-frequency coefficient of the first frequency band is reached and its filter output is larger than that of the coefficient below it, that output is the final maximum extremum;
c. If the initial maximum is the filter output of a coefficient strictly between the lowest and highest frequencies of the second frequency band, the position of that coefficient is the tone position, i.e., the initial maximum is the final maximum extremum.
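The per-frame smoothing filter and the three-case extremum search can be sketched in NumPy as follows. The function names and the `(lo, hi)` inclusive bin-index representation of the first and second frequency bands are hypothetical. Ties are an assumption too: `np.argmax` returns the lowest index among equal maxima, and the climb continues across equal neighbouring values, stopping only when the current output is strictly larger than the next one, as cases a and b describe.

```python
import numpy as np

def smooth_step(prev_amp, x_dec, mu=0.125, squared=False):
    """One frame of X_amp_i(k) = mu*X_amp_{i-1}(k) + (1-mu)*|X_i(k)| (or X_i(k)**2)."""
    x_dec = np.asarray(x_dec, dtype=float)
    mag = x_dec ** 2 if squared else np.abs(x_dec)
    return mu * np.asarray(prev_amp, dtype=float) + (1.0 - mu) * mag

def find_tone(x_amp, band1, band2):
    """Return the bin index of the tone (maximum extremum of the filter output).

    band1 = (lo1, hi1): first frequency band, inclusive bin indices.
    band2 = (lo2, hi2): second frequency band, a segment inside band1.
    """
    lo1, hi1 = band1
    lo2, hi2 = band2
    k = lo2 + int(np.argmax(x_amp[lo2:hi2 + 1]))  # initial maximum in band 2
    if k == lo2:
        # Case a: move toward lower frequencies until a local maximum or band-1 edge.
        while k > lo1 and x_amp[k - 1] >= x_amp[k]:
            k -= 1
    elif k == hi2:
        # Case b: move toward higher frequencies until a local maximum or band-1 edge.
        while k < hi1 and x_amp[k + 1] >= x_amp[k]:
            k += 1
    # Case c: an interior maximum of band 2 is already the tone position.
    return k
```

For the first frame, `prev_amp` is an all-zero array, matching $X\_amp_{i-1}(k) = 0$ when $i = 0$; the returned index is what the later steps call Tonal_pos.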
Further, the source band replication start index calculation module calculates the source band replication start index of a zero-bit coding sub-band requiring band replication as follows: the index of the start bin of the zero-bit coding sub-band whose frequency domain coefficients are currently being reconstructed is denoted fillband_start_freq; the index of the tone bin is denoted Tonal_pos, and adding 1 to Tonal_pos gives the replication period copy_period; the start index of the source band is denoted copyband_offset; copy_period is repeatedly subtracted from fillband_start_freq until the result falls within the index range of the source band, and that result, denoted copy_pos_mod, is the source band replication start index.
Further, band replication by the band replication module specifically comprises:
copying the frequency domain coefficients sequentially, starting at the source band replication start index, into the zero-bit coding sub-band that starts at fillband_start_freq, until the source bin being copied reaches bin Tonal_pos + copyband_offset; then resuming the copy from bin copyband_offset into the zero-bit coding sub-band; and so on, until all frequency domain coefficients of the current zero-bit coding sub-band have been filled.
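The start-index calculation and the periodic copy can be sketched in NumPy as follows. The function names, the inclusive end index `fillband_end_freq` of the zero-bit coding sub-band, and the error raised when the sub-band starts below the source band are assumptions made for the sketch; reading from an unmodified copy of the input keeps the source band intact even if it overlaps the destination.

```python
import numpy as np

def copy_start_index(fillband_start_freq, tonal_pos, copyband_offset):
    """copy_pos_mod: subtract copy_period until the index lies in the source band."""
    if fillband_start_freq < copyband_offset:
        raise ValueError("zero-bit sub-band starts below the source band")
    copy_period = tonal_pos + 1
    pos = fillband_start_freq
    while pos > copyband_offset + tonal_pos:
        pos -= copy_period
    return pos

def replicate_band(coeffs, fillband_start_freq, fillband_end_freq,
                   tonal_pos, copyband_offset):
    """Fill bins fillband_start_freq..fillband_end_freq (inclusive) by periodic copying."""
    src_coeffs = np.asarray(coeffs, dtype=float)
    out = src_coeffs.copy()
    src_end = copyband_offset + tonal_pos  # last bin of the source band
    src = copy_start_index(fillband_start_freq, tonal_pos, copyband_offset)
    for dst in range(fillband_start_freq, fillband_end_freq + 1):
        out[dst] = src_coeffs[src]
        # After the last source bin, wrap back to copyband_offset.
        src = copyband_offset if src == src_end else src + 1
    return out
```

For example, with Tonal_pos = 3 and copyband_offset = 0 the period is 4, so a zero-bit sub-band starting at bin 10 begins copying from bin 10 - 4 - 4 = 2 and receives the source bins 2, 3, 0, 1, 2, 3, ... This is equivalent to mapping each destination bin d to copyband_offset + (d - copyband_offset) mod copy_period.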

Claims (41)

1. A method of noise level estimation, the method comprising:
estimating a power spectrum of the audio signal to be encoded according to the frequency domain coefficient of the audio signal to be encoded;
estimating a noise level of the audio signal in the zero-bit coding sub-bands from the estimated power spectrum, the noise level being used at decoding time to control the ratio of noise-filling energy to band-replication energy; wherein a zero-bit coding sub-band is a coding sub-band to which zero bits are allocated.
2. The method of claim 1, wherein the noise level of the audio signal in a zero-bit coding sub-band is the ratio of the noise component power estimated in that zero-bit coding sub-band to the tonal component power estimated in it.
3. The method of claim 1, wherein:
the power spectrum of the audio signal to be encoded is estimated from its MDCT frequency domain coefficients, the power at frequency bin k of the i-th frame being calculated as:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power at the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient at the k-th frequency bin of the i-th frame; and λ is the coefficient of the single-pole smoothing filter.
4. The method of claim 1, wherein:
dividing the frequency domain coefficients of the audio signal to be encoded into one or more noise-filling sub-bands and calculating the noise level of an effective noise-filling sub-band from the estimated power spectrum specifically comprises:
calculating the mean power of all frequency domain coefficients in all or some of the zero-bit coding sub-bands within the effective noise-filling sub-band, to obtain the average power P_aveg(j);
calculating the mean power of those frequency domain coefficients, within all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, whose power P_i(k) is greater than P_aveg(j), to obtain the average tonal component power P_signal_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band;
calculating the mean power of those frequency domain coefficients, within all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, whose power P_i(k) is less than or equal to P_aveg(j), to obtain the average noise component power P_noise_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band;
and calculating the ratio P_noise_rate(j) of the average noise component power P_noise_aveg(j) to the average tonal component power P_signal_aveg(j), which is the noise level of the effective noise-filling sub-band.
Here, an effective noise-filling sub-band is a noise-filling sub-band that contains one or more zero-bit coding sub-bands.
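The four steps of claim 4 can be sketched for a single frame and a single effective noise-filling sub-band as below. The boolean mask selecting bins that belong to the chosen zero-bit coding sub-bands is a hypothetical input format, and the fallback for a flat spectrum (where no bin is strictly above, or, after rounding, none is at or below the mean) is an assumption the claims do not address.

```python
import numpy as np


def noise_level(power, zero_bit_mask):
    """Noise level P_noise_rate(j) of one effective noise-filling sub-band j.

    power: estimated powers P_i(k) for the bins of sub-band j in the current frame.
    zero_bit_mask: True for bins that lie in the zero-bit coding sub-bands used.
    """
    p = np.asarray(power, dtype=float)[np.asarray(zero_bit_mask, dtype=bool)]
    if p.size == 0:
        raise ValueError("no zero-bit bins: not an effective noise-filling sub-band")
    p_aveg = p.mean()              # P_aveg(j)
    tonal = p[p > p_aveg]          # bins counted as tonal components
    noise = p[p <= p_aveg]         # bins counted as noise components
    if tonal.size == 0 or noise.size == 0:
        return 1.0                 # assumption: a flat spectrum is treated as pure noise
    return noise.mean() / tonal.mean()  # P_noise_aveg(j) / P_signal_aveg(j)
```

Because every noise-class bin is at most P_aveg(j) and every tonal-class bin exceeds it, the ratio computed this way always lies in [0, 1), which keeps the decoder-side term 1 - P_noise_rate(j) non-negative.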
5. An audio encoding method, characterized in that the method comprises:
A. dividing the MDCT frequency domain coefficients of the audio signal to be encoded into a plurality of coding sub-bands, and quantizing and encoding the amplitude envelope value of each coding sub-band to obtain amplitude envelope coded bits;
B. allocating bits to each coding sub-band, and quantizing and encoding the non-zero-bit coding sub-bands to obtain MDCT frequency domain coefficient coded bits;
C. estimating the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients, then estimating the noise level of the audio signal in the zero-bit coding sub-bands and quantizing and encoding it to obtain noise level coded bits; wherein the noise level is used during decoding to control the ratio of noise-filling energy to band-replication energy, and a zero-bit coding sub-band is a coding sub-band to which zero bits are allocated;
D. multiplexing and packing the amplitude envelope coded bits of each coding sub-band, the frequency domain coefficient coded bits and the noise level coded bits, and transmitting the result to the decoding end.
6. The method of claim 5, wherein in step C the noise level of the audio signal in a zero-bit coding sub-band is the ratio of the noise component power estimated in the zero-bit coding sub-band to the tonal component power estimated in the zero-bit coding sub-band.
7. The method of claim 5, wherein:
the power spectrum of the audio signal to be encoded is estimated from its MDCT frequency domain coefficients, the power of frequency bin k in frame i being estimated as:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and λ is the coefficient of the single-pole smoothing filter.
8. The method of claim 5, wherein in step B the frequency domain coefficients of the audio signal to be encoded are divided into one or more noise-filling sub-bands and, after bits are allocated to each coding sub-band, bits are also allocated to the effective noise-filling sub-bands; and in step C, calculating the noise level of an effective noise-filling sub-band from the estimated power spectrum of the audio signal to be encoded specifically comprises:
calculating the mean power of all frequency domain coefficients in all or some of the zero-bit coding sub-bands within the effective noise-filling sub-band, to obtain the average power P_aveg(j);
calculating the mean power of those frequency domain coefficients, within all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, whose power P_i(k) is greater than P_aveg(j), to obtain the average tonal component power P_signal_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band;
calculating the mean power of those frequency domain coefficients, within all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, whose power P_i(k) is less than or equal to P_aveg(j), to obtain the average noise component power P_noise_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band;
and calculating the ratio P_noise_rate(j) of the average noise component power P_noise_aveg(j) to the average tonal component power P_signal_aveg(j), which is the noise level of the effective noise-filling sub-band.
Here, an effective noise-filling sub-band is a noise-filling sub-band that contains one or more zero-bit coding sub-bands.
9. The method of claim 8, wherein:
the noise-filling sub-bands are divided either uniformly or non-uniformly according to the auditory characteristics of the human ear, and each noise-filling sub-band contains one or more coding sub-bands.
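A sketch of the grouping in claim 9 together with the "effective" test used in claims 4 and 8, assuming the per-coding-sub-band bit allocation is already known and the grouping is supplied as a list of group sizes (a hypothetical input format). Equal sizes give a uniform division, while sizes that grow with frequency mimic an auditory, non-uniform division.

```python
def effective_noise_fill_subbands(bits_per_coding_subband, group_sizes):
    """Return indices j of noise-filling sub-bands that contain a zero-bit coding sub-band.

    bits_per_coding_subband: bits allocated to each coding sub-band, low to high frequency.
    group_sizes: number of coding sub-bands in each noise-filling sub-band.
    """
    if sum(group_sizes) != len(bits_per_coding_subband):
        raise ValueError("group sizes must cover every coding sub-band exactly once")
    effective, start = [], 0
    for j, size in enumerate(group_sizes):
        # A noise-filling sub-band is effective if any coding sub-band in it got 0 bits.
        if any(b == 0 for b in bits_per_coding_subband[start:start + size]):
            effective.append(j)
        start += size
    return effective
```

For instance, allocations [5, 4, 0, 3, 2, 2, 0, 0] grouped as [2, 2, 4] leave noise-filling sub-band 0 fully coded, so only sub-bands 1 and 2 are effective.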
10. The method of claim 8, wherein in step B bits are allocated to all effective noise-filling sub-bands, or one or more low-frequency effective noise-filling sub-bands are skipped and bits are allocated to the remaining higher-frequency effective noise-filling sub-bands; in step C the noise level is calculated for the effective noise-filling sub-bands to which bits are allocated; and in step D the noise level coded bits are multiplexed and packed using the allocated bits.
11. The method of claim 8, wherein each effective noise-filling sub-band is allocated the same number of bits, or a different number of bits according to auditory characteristics.
12. A method for audio decoding, the method comprising:
A2. decoding and inverse quantizing the amplitude envelope coded bits in the bitstream to be decoded to obtain the amplitude envelope of each coding sub-band;
B2. allocating bits to each coding sub-band, decoding and inverse quantizing the noise level coded bits to obtain the noise levels of the zero-bit coding sub-bands, and decoding and inverse quantizing the frequency domain coefficient coded bits to obtain the frequency domain coefficients of the non-zero-bit coding sub-bands;
C2. performing band replication on the zero-bit coding sub-bands, controlling the overall fill energy level of each zero-bit coding sub-band according to its amplitude envelope, and controlling the ratio of noise-filling energy to band-replication energy according to the noise level of the zero-bit coding sub-band, to obtain the reconstructed frequency domain coefficients of the zero-bit coding sub-bands;
D2. performing an inverse modified discrete cosine transform (IMDCT) on the frequency domain coefficients of the non-zero-bit coding sub-bands together with the reconstructed frequency domain coefficients of the zero-bit coding sub-bands, to obtain the final audio signal.
13. The method of claim 12, wherein in step C2, for band replication, the position of a tone of the audio signal is searched for in the MDCT frequency domain coefficients; the band replication period is set to the bandwidth from frequency bin 0 to the frequency bin at the tone position; the band from frequency bin copyband_offset to the frequency bin at the tone position plus copyband_offset (that is, bins 0 through the tone position, shifted upward by copyband_offset) is used as the source band; and band replication is performed on the zero-bit coding sub-band, except that if the highest frequency within the zero-bit coding sub-band is lower than the frequency of the tone found, the spectrum of that sub-band is reconstructed by noise filling only.
14. The method of claim 12, wherein in step C2 the position of the tone is found by:
taking the absolute value or the square of each frequency domain coefficient of a first frequency band and applying smoothing filtering;
and, from the smoothing filter output, searching for the position of the maximum extremum of the filter output in the first frequency band and taking that position as the position of the tone.
15. The method of claim 14, wherein:
the smoothing filter applied to the absolute values of the frequency domain coefficients of the first frequency band is:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\bigl|\bar{X}_i(k)\bigr|$$
or, the smoothing filter applied to the squares of the frequency domain coefficients of the first frequency band is:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\bar{X}_i(k)^2$$
where μ is the smoothing filter coefficient; $X\_amp_i(k)$ is the filter output for the k-th frequency bin of the i-th frame; $\bar{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of the i-th frame; and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
16. The method of claim 14, wherein the first frequency band is a low-frequency band in which spectral energy is relatively concentrated, determined from the statistical properties of the spectrum, low frequency here meaning spectral components below one half of the total signal bandwidth.
17. The method of claim 14, wherein the maximum extremum of the filter output is determined by searching the filter outputs of the frequency domain coefficients in the first frequency band directly for their maximum, and taking that maximum as the maximum extremum of the filter output of the first frequency band.
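A Python sketch of the smoothing filter in claim 15 followed by the direct maximum search of claim 17. It assumes the decoded MDCT coefficients of the first frequency band are given frame by frame as rows of a 2-D array, that the filter state is carried per bin k across frames (the previous frame's output at the same bin k in both variants), and that the tone is taken from the latest frame; μ = 0.5 and the function names are illustrative choices.

```python
import numpy as np


def smoothed_amplitude(first_band_frames, mu=0.5, squared=False):
    """Inter-frame smoothing of |X_i(k)| (or X_i(k)^2) over the first frequency band.

    first_band_frames: (num_frames, num_bins) decoded MDCT coefficients of the first band.
    mu: smoothing coefficient (0.5 is illustrative, not from the source).
    """
    frames = np.asarray(first_band_frames, dtype=float)
    out = np.empty_like(frames)
    prev = np.zeros(frames.shape[1])  # X_amp_{i-1}(k) = 0 for the first frame
    for i, x in enumerate(frames):
        feature = x ** 2 if squared else np.abs(x)
        prev = mu * prev + (1.0 - mu) * feature
        out[i] = prev
    return out


def tone_position_direct(first_band_frames, mu=0.5, squared=False):
    """Claim-17 style search: bin of the largest smoothed value in the latest frame."""
    return int(np.argmax(smoothed_amplitude(first_band_frames, mu, squared)[-1]))
```

With frames [[0, 3, -1], [0, -3, 5]] the smoothed magnitudes of the second frame are [0, 2.25, 2.75], so bin 2 is reported even though bin 1 was louder in the first frame.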
18. The method of claim 14, wherein the maximum extremum of the filter output is determined by:
taking a section of the first frequency band as a second frequency band, searching the filter outputs of the frequency domain coefficients in the second frequency band for an initial maximum, and proceeding according to the position of the frequency domain coefficient at which the initial maximum occurs:
a. if the initial maximum is the filter output of the lowest-frequency coefficient of the second frequency band, it is compared with the filter output of the adjacent lower-frequency coefficient in the first frequency band, and comparison continues toward lower frequencies until either the filter output of the current coefficient is greater than that of the adjacent lower-frequency coefficient, in which case the filter output of the current coefficient is the finally determined maximum extremum, or the lowest-frequency coefficient of the first frequency band is reached with a filter output greater than that of the next coefficient, in which case the filter output of the lowest-frequency coefficient of the first frequency band is the finally determined maximum extremum;
b. if the initial maximum is the filter output of the highest-frequency coefficient of the second frequency band, it is compared with the filter output of the adjacent higher-frequency coefficient in the first frequency band, and comparison continues toward higher frequencies until either the filter output of the current coefficient is greater than that of the next coefficient, in which case the filter output of the current coefficient is the finally determined maximum extremum, or the highest-frequency coefficient of the first frequency band is reached with a filter output greater than that of the preceding coefficient, in which case the filter output of the highest-frequency coefficient of the first frequency band is the finally determined maximum extremum;
c. if the initial maximum is the filter output of a coefficient strictly between the lowest and highest frequencies of the second frequency band, the frequency domain coefficient at which the initial maximum occurs is the position of the tone, that is, the initial maximum is the finally determined maximum extremum.
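The three cases of claim 18 amount to a local hill-climb that only starts when the initial maximum sits on an edge of the second band. The sketch below assumes the smoothed values of the whole first band are already available (for instance, the last row of the claim-15 filter output), that the second band is given as an inclusive index range with lo < hi, and that equal neighbouring values let the walk continue; all of these are assumptions rather than details from the claims.

```python
import numpy as np


def tone_position_refined(amp, lo, hi):
    """Claim-18 style search for the maximum extremum of the smoothed output.

    amp: smoothed values X_amp_i(k) across the whole first band (index 0 = lowest bin).
    lo, hi: inclusive bin range of the second band, 0 <= lo < hi < len(amp).
    Returns the bin index taken as the tone position.
    """
    amp = np.asarray(amp, dtype=float)
    if not 0 <= lo < hi < len(amp):
        raise ValueError("second band must be a proper sub-range of the first band")
    k = lo + int(np.argmax(amp[lo:hi + 1]))  # initial maximum inside the second band
    if k == lo:
        # Case a: walk toward lower bins until the current value exceeds its lower
        # neighbour, or the lowest bin of the first band is reached.
        while k > 0 and amp[k - 1] >= amp[k]:
            k -= 1
    elif k == hi:
        # Case b: walk toward higher bins until the current value exceeds its upper
        # neighbour, or the highest bin of the first band is reached.
        while k < len(amp) - 1 and amp[k + 1] >= amp[k]:
            k += 1
    # Case c: an interior initial maximum is already the tone position.
    return k
```

With amp = [1, 5, 4, 3, 2, 6, 7, 8, 2], a second band of bins 2 to 4 returns bin 1 (case a), bins 4 to 6 return bin 7 (case b), and bins 0 to 3 return bin 1 directly (case c).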
19. The method of claim 13, wherein in step C2, when band replication is performed on a zero-bit coding sub-band, a source band copy start index for that sub-band is first calculated from the source band and the start index of the zero-bit coding sub-band to be replicated, and the frequency domain coefficients of the source band are then copied periodically into the zero-bit coding sub-band, with the band replication period as the period, starting from the source band copy start index.
20. The method of claim 19, wherein in step C2 the source band copy start index of the zero-bit coding sub-band is calculated as follows:
denote by fillband_start_freq the frequency bin index of the first MDCT frequency domain coefficient of the zero-bit coding sub-band whose coefficients are to be reconstructed, by Tonal_pos the frequency bin index of the tone, and by copyband_offset the band replication offset; the replication period is copy_period = Tonal_pos + 1; copy_period is repeatedly subtracted from fillband_start_freq until the result falls within the index range of the source band, and that result is the source band copy start index, denoted copy_pos_mod.
21. The method of claim 19, wherein in step C2 the frequency domain coefficients of the source band are copied periodically into the zero-bit coding sub-band, with the band replication period as the period and starting from the source band copy start index, as follows:
the frequency domain coefficients of the source band, starting at copy_pos_mod, are copied in order into the zero-bit coding sub-band starting at fillband_start_freq until the source bin reaches bin Tonal_pos + copyband_offset; copying then resumes from bin copyband_offset of the source band, and so on, until band replication has been completed for all frequency domain coefficients of the current zero-bit coding sub-band.
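Claims 19 to 21 can be sketched as below, assuming NumPy, a decoded spectrum in which the zero-bit bins are still zero, and a half-open target range [fill_start, fill_end). The function names are hypothetical, and the error raised when a sub-band starts below the source band is an assumption: the claims do not cover that case, although claim 13 already routes sub-bands lying entirely below the tone to pure noise filling.

```python
import numpy as np


def copy_start_index(fillband_start_freq, tonal_pos, copyband_offset):
    """copy_pos_mod: fold the target start bin back into the source band."""
    copy_period = tonal_pos + 1
    src_lo, src_hi = copyband_offset, tonal_pos + copyband_offset
    pos = fillband_start_freq
    while pos > src_hi:
        pos -= copy_period
    if pos < src_lo:
        raise ValueError("sub-band starts below the source band")  # assumption
    return pos


def band_replicate(spectrum, fill_start, fill_end, tonal_pos, copyband_offset):
    """Copy source-band coefficients periodically into bins [fill_start, fill_end)."""
    src = np.asarray(spectrum, dtype=float)
    src_lo, src_hi = copyband_offset, tonal_pos + copyband_offset
    if src_hi >= len(src):
        raise ValueError("source band exceeds the spectrum")
    pos = copy_start_index(fill_start, tonal_pos, copyband_offset)
    out = np.empty(fill_end - fill_start)
    for n in range(fill_end - fill_start):
        out[n] = src[pos]
        pos += 1
        if pos > src_hi:  # bin Tonal_pos + copyband_offset was just copied: wrap around
            pos = src_lo
    return out
```

With spectrum values equal to their bin indices, Tonal_pos = 3 and copyband_offset = 1, the source band is bins 1 to 4 and copy_period = 4; a target starting at bin 10 folds back to copy_pos_mod = 2, and bins 10 to 15 receive [2, 3, 4, 1, 2, 3]. Each target bin n copies the source bin congruent to n modulo copy_period, which is the alignment the repeated subtraction in claim 20 preserves.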
22. The method of claim 12, wherein in step C2 the energy of the frequency domain coefficients obtained by band replication into a zero-bit coding sub-band is adjusted as follows:
calculating the amplitude envelope of the frequency domain coefficients obtained by band replication into the zero-bit coding sub-band, denoted sbr_rms(r);
adjusting the energy of the copied frequency domain coefficients according to:
$$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot rms(r) \,/\, sbr\_rms(r)$$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency domain coefficients of zero-bit coding sub-band r; X_sbr(r) denotes the frequency domain coefficients obtained by band replication into sub-band r; sbr_rms(r) is the amplitude envelope of X_sbr(r); rms(r) is the amplitude envelope of the frequency domain coefficients of sub-band r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the band replication energy control scale factor of sub-band r, whose value is determined by the noise level of the noise-filling sub-band containing sub-band r:
$$sbr\_lev\_scale(r) = \sqrt{\bigl(1 - \overline{P\_noise\_rate}(j)\bigr) \cdot fill\_energy\_scalefactor}$$
where fill_energy_scalefactor is a fill energy scale factor that adjusts the gain of the overall fill energy, with value range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling sub-band j obtained by decoding and inverse quantization; and j is the index of the noise-filling sub-band containing zero-bit coding sub-band r.
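A sketch of the claim-22 energy adjustment for one zero-bit coding sub-band, assuming NumPy and assuming that an amplitude envelope is the RMS of the sub-band's coefficients (the claims call it an amplitude envelope without defining it here). The default fill_energy_scalefactor of 0.8 and the all-zero guard are illustrative assumptions.

```python
import numpy as np


def adjust_sbr_energy(x_sbr, rms_r, p_noise_rate_j, fill_energy_scalefactor=0.8):
    """Energy adjustment of band-replicated coefficients for zero-bit sub-band r.

    x_sbr: coefficients X_sbr(r) copied into the sub-band.
    rms_r: decoded amplitude envelope rms(r).
    p_noise_rate_j: decoded noise level of the enclosing noise-filling sub-band j.
    fill_energy_scalefactor: overall fill gain in (0, 1); 0.8 is illustrative only.
    """
    x_sbr = np.asarray(x_sbr, dtype=float)
    sbr_rms = np.sqrt(np.mean(x_sbr ** 2))  # assumption: amplitude envelope = RMS
    if sbr_rms == 0.0:
        return np.zeros_like(x_sbr)  # hypothetical guard: nothing to rescale
    sbr_lev_scale = np.sqrt((1.0 - p_noise_rate_j) * fill_energy_scalefactor)
    return x_sbr * sbr_lev_scale * rms_r / sbr_rms
```

Dividing by sbr_rms(r) normalizes the copied coefficients, so the adjusted sub-band has RMS sbr_lev_scale(r) * rms(r) regardless of how loud the source band was; for example, copying [3, -4] with rms(r) = 2, a noise level of 0.5 and fill_energy_scalefactor = 0.5 gives coefficients with RMS 1.0.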
23. The method of claim 12, wherein in step C2 the energy-adjusted frequency domain coefficients are noise-filled according to:
$$\bar{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$$
where $\bar{X}(r)$ denotes the reconstructed frequency domain coefficients of zero-bit coding sub-band r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency domain coefficients of sub-band r; rms(r) is the amplitude envelope of the frequency domain coefficients of sub-band r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator whose return value is +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of sub-band r, whose value is determined by the noise level of the noise-filling sub-band containing sub-band r:
$$noise\_lev\_scale(r) = \sqrt{\overline{P\_noise\_rate}(j) \cdot fill\_energy\_scalefactor}$$
where fill_energy_scalefactor is a fill energy scale factor that adjusts the gain of the overall fill energy, with value range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling sub-band j obtained by decoding and inverse quantization; and j is the index of the noise-filling sub-band containing zero-bit coding sub-band r.
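A matching sketch of the claim-23 noise filling, assuming NumPy's Generator for random() and the same RMS interpretation of rms(r); the function name and default scale factor are illustrative. Since sbr_lev_scale(r)^2 + noise_lev_scale(r)^2 = fill_energy_scalefactor, and the random signs are independent of the copied coefficients, the expected mean square of the reconstructed sub-band is fill_energy_scalefactor * rms(r)^2, with the decoded noise level deciding only how that energy is split between the copied part and the noise part.

```python
import numpy as np


def noise_fill(x_sbr_adj, rms_r, p_noise_rate_j, fill_energy_scalefactor=0.8, rng=None):
    """Add random-sign noise to the energy-adjusted copied coefficients of sub-band r."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x_sbr_adj, dtype=float)
    noise_lev_scale = np.sqrt(p_noise_rate_j * fill_energy_scalefactor)
    signs = rng.choice([-1.0, 1.0], size=x.shape)  # random(): +1 or -1 per bin
    return x + rms_r * noise_lev_scale * signs
```

Passing a seeded generator, for example np.random.default_rng(0), makes the output reproducible for testing.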
24. The method of claim 13, wherein in step B2, after bits are allocated to each coding sub-band, the coding sub-bands are grouped into a plurality of noise-filling sub-bands and bits are allocated to the effective noise-filling sub-bands; in step C2, band replication is performed on the zero-bit coding sub-bands in the effective noise-filling sub-bands to which bits are allocated, with both the energy level of the copied frequency domain coefficients and the energy level of the noise filling controlled, while the zero-bit coding sub-bands in effective noise-filling sub-bands to which no bits are allocated are reconstructed by noise filling; an effective noise-filling sub-band being a noise-filling sub-band that contains a zero-bit coding sub-band.
25. An audio encoding system comprising a Modified Discrete Cosine Transform (MDCT) unit, an amplitude envelope calculation unit, an amplitude envelope quantization and encoding unit, a bit allocation unit, a frequency domain coefficient encoding unit and a bitstream Multiplexer (MUX), characterized in that the system further comprises a noise level estimation unit, wherein:
the MDCT unit is used for performing a modified discrete cosine transform on the audio signal to generate frequency domain coefficients;
the amplitude envelope calculation unit is connected with the MDCT unit and is used for dividing the frequency domain coefficients generated by the MDCT into a plurality of coding sub-bands and calculating the amplitude envelope value of each coding sub-band;
the amplitude envelope quantization and encoding unit is connected with the amplitude envelope calculation unit and is used for quantizing and encoding the amplitude envelope value of each coding sub-band to generate the amplitude envelope coded bits of each coding sub-band;
the bit allocation unit is connected with the amplitude envelope quantization and encoding unit and is used for allocating bits to each coding sub-band;
the frequency domain coefficient encoding unit is connected with the MDCT unit, the bit allocation unit and the amplitude envelope quantization and encoding unit, and is used for normalizing, quantizing and encoding all frequency domain coefficients in each coding sub-band to generate frequency domain coefficient coded bits;
the noise level estimation unit is connected with the MDCT unit and the bit allocation unit, and is used for estimating the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients, then estimating the noise level of the audio signal in the zero-bit coding sub-bands and quantizing and encoding it to obtain noise level coded bits; wherein the noise level is used during decoding to control the ratio of noise-filling energy to band-replication energy;
and the bitstream multiplexer (MUX) is connected with the amplitude envelope quantization and encoding unit, the frequency domain coefficient encoding unit and the noise level estimation unit, and is used for multiplexing the amplitude envelope coded bits of each coding sub-band, the frequency domain coefficient coded bits and the noise level coded bits and transmitting them to the decoding end.
26. The system of claim 25, wherein the noise level of the audio signal in a zero-bit coding sub-band is the ratio of the noise component power estimated in the zero-bit coding sub-band to the tonal component power estimated in the zero-bit coding sub-band.
27. The system according to claim 25, wherein the noise level estimation unit specifically comprises:
a power spectrum estimation module, used for estimating the power spectrum of the audio signal to be encoded from its MDCT frequency domain coefficients;
a noise level calculation module, connected with the power spectrum estimation module and used for estimating the noise level of the audio signal in the zero-bit coding sub-bands from the power spectrum estimated by the power spectrum estimation module;
and a noise level coding module, connected with the noise level calculation module and used for quantizing and encoding the noise level calculated by the noise level calculation module to obtain noise level coded bits.
28. The system of claim 27, wherein the power spectrum estimation module estimates the power of frequency bin k in frame i as:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and λ is the coefficient of the single-pole smoothing filter.
29. The system of claim 27, wherein:
the frequency domain coefficients of the audio signal to be encoded are divided into one or more noise-filling sub-bands, and the noise level calculation module is specifically used for: calculating the mean power of all frequency domain coefficients in all or some of the zero-bit coding sub-bands within an effective noise-filling sub-band, to obtain the average power P_aveg(j); calculating the mean power of those frequency domain coefficients, within all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, whose power P_i(k) is greater than P_aveg(j), to obtain the average tonal component power P_signal_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band; calculating the mean power of those frequency domain coefficients, within all or some of the zero-bit coding sub-bands of the effective noise-filling sub-band, whose power P_i(k) is less than or equal to P_aveg(j), to obtain the average noise component power P_noise_aveg(j) of the zero-bit coding sub-bands in the effective noise-filling sub-band; and calculating the ratio of the average noise component power P_noise_aveg(j) to the average tonal component power P_signal_aveg(j) to obtain the noise level of the effective noise-filling sub-band;
wherein an effective noise-filling sub-band is a noise-filling sub-band that contains one or more zero-bit coding sub-bands.
30. The system of claim 27, wherein the noise level estimation unit further comprises a bit allocation module, connected with the noise level calculation module and the noise level coding module, used for allocating bits to all effective noise-filling sub-bands, or skipping one or more low-frequency effective noise-filling sub-bands and allocating bits to the remaining higher-frequency effective noise-filling sub-bands, and for notifying the noise level calculation module and the noise level coding module accordingly; the noise level calculation module calculates noise levels only for the noise-filling sub-bands to which bits are allocated; and the noise level coding module quantizes and encodes the noise levels using the bits allocated by the bit allocation module.
31. An audio decoding system comprising a bitstream demultiplexer (DeMUX), a coding sub-band amplitude envelope decoding unit, a bit allocation unit, a frequency domain coefficient decoding unit, a spectral reconstruction unit and an inverse modified discrete cosine transform (IMDCT) unit, characterized in that the system further comprises a noise level decoding unit, wherein:
the DeMUX is used for separating the amplitude envelope coded bits, the frequency domain coefficient coded bits and the noise level coded bits from the bitstream to be decoded;
the amplitude envelope decoding unit is connected with the DeMUX and is used for decoding the amplitude envelope coded bits output by the bitstream demultiplexer to obtain the amplitude envelope quantization index of each coding sub-band;
the bit allocation unit is connected with the amplitude envelope decoding unit and is used for performing bit allocation to obtain the number of coding bits allocated to each frequency domain coefficient in each coding sub-band;
the frequency domain coefficient decoding unit is connected with the amplitude envelope decoding unit and the bit allocation unit and is used for decoding, inverse quantizing and inverse normalizing the coding sub-bands to obtain frequency domain coefficients;
the noise level decoding unit is connected with the bitstream demultiplexer and the bit allocation unit and is used for decoding and inverse quantizing the noise level coded bits to obtain the noise levels;
the spectral reconstruction unit is connected with the noise level decoding unit, the frequency domain coefficient decoding unit, the amplitude envelope decoding unit and the bit allocation unit, and is used for performing band replication on the zero-bit coding sub-bands, controlling the overall fill energy level of each coding sub-band according to the amplitude envelope output by the amplitude envelope decoding unit, and controlling the ratio of noise-filling energy to band-replication energy according to the noise level output by the noise level decoding unit, to obtain the reconstructed frequency domain coefficients of the zero-bit coding sub-bands;
and the IMDCT unit is connected with the spectral reconstruction unit and is used for performing IMDCT on the frequency domain coefficients once spectral reconstruction of the zero-bit coding sub-bands is complete, to obtain the audio signal.
32. The system of claim 31, wherein:
the spectral reconstruction unit comprises a band replication subunit, an energy adjustment subunit and a noise filling subunit connected in sequence, wherein:
the band replication subunit is used for performing band replication on the zero-bit coding sub-bands;
the energy adjustment subunit is used for calculating the amplitude envelope of the frequency domain coefficients obtained by band replication into the zero-bit coding sub-band, denoted sbr_rms(r), and for adjusting the energy of the copied frequency domain coefficients according to the noise level output by the noise level decoding unit, using:
<math><mrow><mover><mrow><mi>X</mi><mo>_</mo><mi>sbr</mi></mrow><mo>&OverBar;</mo></mover><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow><mo>=</mo><mi>X</mi><mo>_</mo><mi>sbr</mi><msup><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow><mo>*</mo></msup><mi>sbr</mi><mo>_</mo><mi>lev</mi><mo>_</mo><mi>scale</mi><msup><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow><mo>*</mo></msup><mi>rms</mi><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow><mo>/</mo><mi>sbr</mi><mo>_</mo><mi>rms</mi><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow><mo>;</mo></mrow></math>
wherein,
Figure FSA00000030054000112
the energy-adjusted frequency domain coefficient of the zero-bit coding sub-band r is represented, X _ sbr (r) represents the frequency domain coefficient obtained after the zero-bit coding sub-band r is copied, sbr _ rms (r) is the amplitude envelope of the frequency domain coefficient X _ sbr (r) obtained after the zero-bit coding sub-band r is copied, rms (r) is the amplitude envelope of the frequency domain coefficient before coding of the zero-bit coding sub-band r and is obtained by inverse quantization of an amplitude envelope quantization index, sbr _ lev _ scale (r) is an energy control scale factor copied by the frequency band of the zero-bit coding sub-band r, and the value of the energy control scale factor is determined by the noise level of a noise filling sub-band where the zero-bit coding sub-band r is located, and the specific calculation formula is as follows:
$$sbr\_lev\_scale(r) = \sqrt{\left(1 - \overline{P\_noise\_rate}(j)\right) \cdot fill\_energy\_scalefactor}$$
where fill_energy_scalefactor is a filling energy scale factor used to adjust the gain of the overall filling energy, with a value in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise filling sub-band j obtained by decoding and inverse quantization; and j is the index of the noise filling sub-band in which zero-bit coding sub-band r is located;
the noise filling subunit is used to perform noise filling on the energy-adjusted frequency domain coefficients according to the noise level output by the noise level decoding unit, using the following noise filling formula:
$$\overline{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$$
where $\overline{X}(r)$ denotes the reconstructed frequency domain coefficients of zero-bit coding sub-band r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted replicated frequency domain coefficients of zero-bit coding sub-band r; rms(r) is the amplitude envelope of the pre-coding frequency domain coefficients of zero-bit coding sub-band r, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator that returns either +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding sub-band r, whose value is determined by the noise level of the noise filling sub-band in which zero-bit coding sub-band r is located, as follows:
$$noise\_lev\_scale(r) = \sqrt{\overline{P\_noise\_rate}(j) \cdot fill\_energy\_scalefactor}$$
where fill_energy_scalefactor is the filling energy scale factor used to adjust the gain of the overall filling energy, with a value in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise filling sub-band j obtained by decoding and inverse quantization; and j is the index of the noise filling sub-band in which zero-bit coding sub-band r is located.
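The energy adjustment and noise filling formulas of claim 32 can be sketched together as follows. This is a minimal Python sketch, not the patent's implementation: it assumes the "amplitude envelope" is the root-mean-square value of the coefficients in the sub-band, that the decoded noise level lies in [0, 1], and that random() is an unbiased +1/-1 sign generator. The helper names `rms_envelope` and `reconstruct_zero_bit_subband` are hypothetical.

```python
import math
import random


def rms_envelope(coeffs):
    """Root-mean-square amplitude of a coefficient list (assumed meaning of 'amplitude envelope')."""
    if not coeffs:
        return 0.0
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))


def reconstruct_zero_bit_subband(x_sbr, rms_r, p_noise_rate_j,
                                 fill_energy_scalefactor, rng=None):
    """Energy-adjust replicated coefficients of sub-band r, then add random-sign noise.

    x_sbr: X_sbr(r), coefficients copied into zero-bit sub-band r
    rms_r: rms(r), dequantized amplitude envelope of sub-band r
    p_noise_rate_j: decoded noise level of noise filling sub-band j (assumed in [0, 1])
    fill_energy_scalefactor: overall filling gain, must lie in (0, 1)
    """
    if not 0.0 < fill_energy_scalefactor < 1.0:
        raise ValueError("fill_energy_scalefactor must lie in (0, 1)")
    if not 0.0 <= p_noise_rate_j <= 1.0:
        raise ValueError("noise level is assumed to lie in [0, 1]")
    if rng is None:
        rng = random.Random()

    # Energy adjustment: X_sbr_bar(r) = X_sbr(r) * sbr_lev_scale(r) * rms(r) / sbr_rms(r)
    sbr_lev_scale = math.sqrt((1.0 - p_noise_rate_j) * fill_energy_scalefactor)
    sbr_rms = rms_envelope(x_sbr)
    if sbr_rms > 0.0:
        gain = sbr_lev_scale * rms_r / sbr_rms
        adjusted = [c * gain for c in x_sbr]
    else:
        # All-zero copy: nothing to scale (this guard is an addition, not from the claims).
        adjusted = [0.0] * len(x_sbr)

    # Noise filling: X_bar(r) = X_sbr_bar(r) + rms(r) * noise_lev_scale(r) * random()
    noise_lev_scale = math.sqrt(p_noise_rate_j * fill_energy_scalefactor)
    amp = rms_r * noise_lev_scale
    return [c + amp * rng.choice((1.0, -1.0)) for c in adjusted]
```

Because sbr_lev_scale(r)^2 + noise_lev_scale(r)^2 = fill_energy_scalefactor, the decoded noise level only redistributes energy between the replicated part and the noise part; in expectation, the total filled energy of the sub-band is fill_energy_scalefactor * rms(r)^2.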
33. The system of claim 31, wherein the frequency band replication subunit comprises a tone position searching module, a period and source frequency band calculation module, a source frequency band copy start index calculation module and a frequency band copying module, connected in sequence, wherein:
the tone position searching module is used to search the MDCT frequency domain coefficients for the position of a tone of the audio signal;
the period and source frequency band calculation module is used to determine, according to the tone position, the frequency band copy period and the source frequency band to be copied, where the copy period is the bandwidth from frequency point 0 up to the frequency point at the tone position, and the source frequency band is the band from frequency point 0 to the frequency point at the tone position, shifted upward by an offset of copyband_offset frequency points;
the source frequency band copy start index calculation module is used to calculate the source frequency band copy start index for a zero-bit coding sub-band that requires frequency band replication, according to the source frequency band and the start index of that zero-bit coding sub-band;
the frequency band copying module is used to copy the frequency domain coefficients of the source frequency band periodically into the zero-bit coding sub-band, starting from the source frequency band copy start index and using the frequency band copy period as the period; if the highest frequency in a zero-bit coding sub-band is lower than the frequency of the detected tone, the frequency points of that sub-band are reconstructed by noise filling only.
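The quantities named in claim 33 can be illustrated with a small Python helper. It is a sketch under stated assumptions: frequency points are 0-based indices, the source band is taken as the inclusive range from copyband_offset to copyband_offset plus the tone index (consistent with claims 39 and 40), and zero-bit sub-bands are given as inclusive (start, end) index pairs. The function name `plan_band_replication` and the mode labels are hypothetical.

```python
def plan_band_replication(tonal_pos, copyband_offset, zero_bit_subbands):
    """Derive the copy period, source band and per-sub-band reconstruction mode.

    tonal_pos: frequency-point index of the detected tone (0-based, assumed)
    copyband_offset: offset applied to the source band
    zero_bit_subbands: list of (start, end) inclusive frequency-point indices
    """
    # Copy period: bandwidth from frequency point 0 up to and including the tone.
    copy_period = tonal_pos + 1
    # Source band: points 0..tonal_pos shifted upward by copyband_offset.
    source_band = (copyband_offset, copyband_offset + tonal_pos)
    modes = []
    for start, end in zero_bit_subbands:
        if start > end:
            raise ValueError("sub-band start must not exceed its end")
        # Sub-bands whose highest frequency lies below the tone use noise filling only.
        modes.append("noise_only" if end < tonal_pos else "replicate")
    return copy_period, source_band, modes
```

For example, with the tone at index 9 and copyband_offset = 2, the copy period is 10 and the source band spans indices 2 to 11.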
34. The system of claim 33, wherein the tone position searching module searches for the tone position as follows: take the absolute value or the square of the MDCT frequency domain coefficients of a first frequency band and apply smoothing filtering; then, from the smoothed result, search for the position of the maximum extreme value of the filtered output values in the first frequency band, this position being the tone position.
35. The system of claim 33, wherein:
the tone position searching module applies smoothing filtering to the absolute values of the MDCT frequency domain coefficients of the first frequency band as follows:
$$X\_amp_i(k) = \mu \, X\_amp_{i-1}(k) + (1 - \mu) \left| \overline{X}_i(k) \right|$$
or applies smoothing filtering to the squared frequency domain coefficients of the first frequency band as follows:
$$X\_amp_i(k) = \mu \, X\_amp_{i-1}(k) + (1 - \mu) \, \overline{X}_i(k)^2$$
where μ is the smoothing filter coefficient, X_amp_i(k) is the filtered output value of the k-th frequency point of the i-th frame, and $\overline{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency point of the i-th frame; when i = 0, X_amp_{i-1}(k) = 0.
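A possible Python rendering of the inter-frame smoothing in claim 35 keeps one filter state per frequency point. The class name `ToneSmoother` is hypothetical, the state is assumed to start at zero before the first frame, and μ is assumed to lie in [0, 1) because this excerpt does not give its value.

```python
class ToneSmoother:
    """First-order recursive smoothing of |X_i(k)| or X_i(k)^2 across frames."""

    def __init__(self, num_bins, mu, use_square=False):
        if not 0.0 <= mu < 1.0:
            raise ValueError("mu is assumed to lie in [0, 1)")
        self.mu = mu
        self.use_square = use_square
        # X_amp_{i-1}(k), initialized to zero before the first frame.
        self.prev = [0.0] * num_bins

    def update(self, frame_coeffs):
        """Filter one frame of decoded MDCT coefficients; returns X_amp_i(k) for all k."""
        if len(frame_coeffs) != len(self.prev):
            raise ValueError("frame length does not match num_bins")
        out = []
        for prev_k, x in zip(self.prev, frame_coeffs):
            magnitude = x * x if self.use_square else abs(x)
            out.append(self.mu * prev_k + (1.0 - self.mu) * magnitude)
        self.prev = out
        return out
```

A larger μ weights past frames more heavily, so a stable tone stands out while short-lived spectral peaks are attenuated.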
36. The system of claim 33, wherein the first frequency band is a low-frequency band in which energy is relatively concentrated, determined according to the statistical properties of the spectrum, where low frequencies are spectral components below one half of the total signal bandwidth.
37. The system of claim 33, wherein the tone position searching module directly searches the filtered output values of the frequency domain coefficients of the first frequency band for an initial maximum, and takes that maximum as the maximum extreme value of the filtered output values of the first frequency band.
38. The system of claim 33, wherein, when determining the maximum extreme value of the filtered output values, the tone position searching module takes a segment of the first frequency band as a second frequency band, searches the filtered output values of the frequency domain coefficients of the second frequency band for an initial maximum, and then proceeds according to the position of the frequency domain coefficient corresponding to the initial maximum:
a. if the initial maximum is the filtered output value of the lowest-frequency coefficient of the second frequency band, compare it with the filtered output value of the adjacent lower-frequency coefficient in the first frequency band, and continue comparing toward lower frequencies until either the filtered output value of the current coefficient is larger than that of the coefficient immediately below it, in which case the filtered output value of the current coefficient is the finally determined maximum extreme value, or the lowest-frequency coefficient of the first frequency band is reached with a filtered output value larger than that of the next coefficient, in which case the filtered output value of the lowest-frequency coefficient of the first frequency band is the finally determined maximum extreme value;
b. if the initial maximum is the filtered output value of the highest-frequency coefficient of the second frequency band, compare it with the filtered output value of the adjacent higher-frequency coefficient in the first frequency band, and continue comparing toward higher frequencies until either the filtered output value of the current coefficient is larger than that of the next coefficient, in which case the filtered output value of the current coefficient is the finally determined maximum extreme value, or the highest-frequency coefficient of the first frequency band is reached with a filtered output value larger than that of the previous coefficient, in which case the filtered output value of the highest-frequency coefficient of the first frequency band is the finally determined maximum extreme value;
c. if the initial maximum is the filtered output value of a coefficient lying between the lowest and highest frequencies of the second frequency band, the frequency domain coefficient corresponding to the initial maximum is the tone position; that is, the initial maximum is the finally determined maximum extreme value.
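The three cases of claim 38 amount to a bounded hill climb from the initial maximum. The Python sketch below assumes the filtered outputs are stored in a list indexed by frequency point, that both bands are given as inclusive (low, high) index pairs with the second band inside the first, and that equal values keep the walk going (the claim only stops on a strictly larger value). `find_tone_position` is a hypothetical name.

```python
def find_tone_position(amp, first_band, second_band):
    """Return the frequency-point index of the maximum extreme value (tone position).

    amp: filtered output values X_amp_i(k), indexed by frequency point
    first_band: (low, high) inclusive indices of the first frequency band
    second_band: (low, high) inclusive indices of the second band, inside the first
    """
    lo1, hi1 = first_band
    lo2, hi2 = second_band
    if not (0 <= lo1 <= lo2 <= hi2 <= hi1 < len(amp)):
        raise ValueError("second band must lie inside the first band")

    # Initial maximum inside the second band (the first index wins on ties).
    k = max(range(lo2, hi2 + 1), key=lambda n: amp[n])

    if k == lo2:
        # Case a: move toward lower frequencies until the current value exceeds
        # its lower neighbour or the bottom of the first band is reached.
        while k > lo1 and amp[k - 1] >= amp[k]:
            k -= 1
    elif k == hi2:
        # Case b: move toward higher frequencies in the same way.
        while k < hi1 and amp[k + 1] >= amp[k]:
            k += 1
    # Case c: an interior maximum is already the final maximum extreme value.
    return k
```

Restricting the initial search to the second band keeps the search cheap, while the walk in cases a and b prevents a band edge from being mistaken for a peak.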
39. The system of claim 33, wherein:
the source frequency band copy start index calculation module calculates the source frequency band copy start index for a zero-bit coding sub-band requiring frequency band replication as follows: take the index of the start frequency point of the zero-bit coding sub-band whose frequency domain coefficients are currently being reconstructed, denoted fillband_start_freq; denote the index of the frequency point corresponding to the tone as Tonal_pos, and add 1 to Tonal_pos to obtain the copy period copy_period; denote the start index of the source frequency band as copyband_offset; then repeatedly subtract copy_period from fillband_start_freq until the value falls within the index range of the source frequency band. The resulting value is the source frequency band copy start index, denoted copy_pos_mod.
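The repeated subtraction in claim 39 can be written as a short loop. This Python sketch assumes the source band covers indices copyband_offset through copyband_offset + Tonal_pos inclusive, so that its width equals copy_period, and that fillband_start_freq is not below copyband_offset. The function name `copy_start_index` is hypothetical.

```python
def copy_start_index(fillband_start_freq, tonal_pos, copyband_offset):
    """Return copy_pos_mod, the source-band index at which copying starts."""
    copy_period = tonal_pos + 1
    source_end = copyband_offset + tonal_pos  # last source-band index (inclusive)
    if fillband_start_freq < copyband_offset:
        raise ValueError("zero-bit sub-band starts below the source band")
    copy_pos_mod = fillband_start_freq
    # Subtract whole copy periods until the index falls inside the source band.
    while copy_pos_mod > source_end:
        copy_pos_mod -= copy_period
    return copy_pos_mod
```

Under those assumptions the loop returns the same value as copyband_offset + (fillband_start_freq - copyband_offset) mod copy_period; for instance, with Tonal_pos = 9, copyband_offset = 2 and fillband_start_freq = 25, both give 5.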
40. The system of claim 33, wherein, when performing frequency band copying, the frequency band copying module copies frequency domain coefficients, starting from the source frequency band copy start index, in order of increasing frequency into the zero-bit coding sub-band that begins at fillband_start_freq; when the copied source frequency point reaches frequency point Tonal_pos + copyband_offset, copying resumes from frequency point copyband_offset, and so on, until all frequency domain coefficients of the current zero-bit coding sub-band have been filled.
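The periodic copy of claim 40 can be sketched as below. The Python function is illustrative only: it reads source coefficients from the decoded spectrum as it was before reconstruction, computes the copy start index with the modulo form that is equivalent to the subtraction loop of claim 39, and treats sub-band boundaries as inclusive indices. `replicate_band` and its parameters are hypothetical names based on the claim's symbols.

```python
def replicate_band(spectrum, fillband_start_freq, fillband_end_freq,
                   tonal_pos, copyband_offset):
    """Fill spectrum[fillband_start_freq..fillband_end_freq] by periodic copying.

    Returns a new list; the input spectrum is left unchanged.
    """
    copy_period = tonal_pos + 1
    source_end = copyband_offset + tonal_pos  # last index of the source band (inclusive)
    if fillband_start_freq < copyband_offset:
        raise ValueError("zero-bit sub-band starts below the source band")
    if fillband_end_freq >= len(spectrum) or source_end >= len(spectrum):
        raise ValueError("index outside the spectrum")

    # Copy start index copy_pos_mod (modulo form of the claim 39 subtraction loop).
    src = copyband_offset + (fillband_start_freq - copyband_offset) % copy_period
    out = list(spectrum)
    for k in range(fillband_start_freq, fillband_end_freq + 1):
        out[k] = spectrum[src]
        src += 1
        if src > source_end:
            # Reached Tonal_pos + copyband_offset: restart from copyband_offset.
            src = copyband_offset
    return out
```

Starting from copy_pos_mod rather than from copyband_offset keeps the copied pattern phase-aligned with the tone period, so adjacent zero-bit sub-bands continue the same periodic structure.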
41. The system of claim 31, wherein:
the bit allocation unit is further configured to allocate bits to all valid noise filling sub-bands, or to skip one or more low-frequency valid noise filling sub-bands and allocate bits to the subsequent higher-frequency valid noise filling sub-bands; the energy adjusting subunit performs energy adjustment on the frequency domain coefficients obtained after frequency band replication; and the noise filling subunit performs noise filling on the energy-adjusted frequency domain coefficients and on the zero-bit coding sub-bands within noise filling sub-bands that were allocated no bits.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2010191850619A (CN102194457B) | 2010-03-02 | 2010-03-02 | Audio encoding and decoding method, system and noise level estimation method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN2010191850619A (CN102194457B) | 2010-03-02 | 2010-03-02 | Audio encoding and decoding method, system and noise level estimation method

Publications (2)

Publication Number | Publication Date
CN102194457A | 2011-09-21
CN102194457B | 2013-02-27

Family ID: 44602411

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN2010191850619A (CN102194457B) | Audio encoding and decoding method, system and noise level estimation method | 2010-03-02 | 2010-03-02 | Active

Country Status (1)

Country | Link
CN (1) | CN102194457B

Also Published As

Publication Number | Publication Date
CN102194457B | 2013-02-27

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant
