Detailed Description of Preferred Embodiments
The embodiments described below are merely illustrative of the principles of the present invention for time warped transform coding of audio signals. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
In the following, basic ideas and concepts of warping and block transforms are briefly reviewed to motivate the inventive concept, which will be discussed in more detail below, making reference to the enclosed figures.
Generally, the specifics of the time-warped transform are easiest to derive in the domain of continuous-time signals. The following paragraphs describe the general theory, which will then be subsequently specialized and converted to its inventive application to discrete-time signals. The main step in this conversion is to replace the change of coordinates performed on continuous-time signals with non-uniform resampling of discrete-time signals in such a way that the mean sample density is preserved, i.e. that the duration of the audio signal is not altered.
Let $s = \psi(t)$ describe a change of time coordinate given by a continuously differentiable, strictly increasing function $\psi$, mapping the $t$-axis interval $I$ onto the $s$-axis interval $J$.
$\psi(t)$ is therefore a function that can be used to transform the time axis of a time-dependent quantity, which is equivalent to a resampling in the time-discrete case. It should be noted that in the following discussion, the $t$-axis interval $I$ is an interval in the normal time domain and the $s$-axis interval $J$ is an interval in the warped time domain.
Given an orthonormal basis $\{v_\alpha\}$ for signals of finite energy on the interval $J$, one obtains an orthonormal basis $\{u_\alpha\}$ for signals of finite energy on the interval $I$ by the rule
$$u_\alpha(t) = \left(\psi'(t)\right)^{1/2} v_\alpha(\psi(t)). \tag{1}$$
Given an infinite time interval $I$, local specification of time warp can be achieved by segmenting $I$ and then constructing $\psi$ by gluing together rescaled pieces of normalized warp maps.
A normalized warp map is a continuously differentiable and strictly increasing function which maps the unit interval [0,1] onto itself. Starting from a sequence of segmentation points $t = t_k$, where $t_{k+1} > t_k$, and a corresponding sequence of normalized warp maps $\psi_k$, one constructs
$$\psi(t) = d_k\,\psi_k\!\left(\frac{t - t_k}{t_{k+1} - t_k}\right) + s_k, \qquad t_k \le t \le t_{k+1}, \tag{2}$$
where $d_k = s_{k+1} - s_k$ and the sequence $d_k$ is adjusted such that $\psi(t)$ becomes continuously differentiable. This defines $\psi(t)$ from the sequence of normalized warp maps $\psi_k$ up to an affine change of scale of the type $A\psi(t) + B$.
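The gluing of normalized warp maps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each segment is taken to be mapped by $\psi(t) = d_k\,\psi_k((t - t_k)/(t_{k+1} - t_k)) + s_k$, the helper names `quadratic_map` and `glue_warp_maps` are hypothetical, the quadratic map is just one convenient normalized warp map, and the initialization $s_0 = 0$, $d_0 = 1$ resolves the affine ambiguity arbitrarily.

```python
def quadratic_map(a):
    """Normalized warp map on [0, 1] and its derivative (assumed prototype, |a| < 2)."""
    def psi(t):
        return t + 0.5 * a * (t * t - t)

    def dpsi(t):
        return 1.0 + 0.5 * a * (2.0 * t - 1.0)

    return psi, dpsi


def glue_warp_maps(t_pts, maps):
    """Glue len(t_pts) - 1 normalized (map, derivative) pairs into one map psi(t)."""
    n = len(maps)
    d = [1.0]  # arbitrary initialization d_0 = 1
    for k in range(n - 1):
        h_k = t_pts[k + 1] - t_pts[k]
        h_next = t_pts[k + 2] - t_pts[k + 1]
        # Equate the one-sided derivatives of psi at t_{k+1}:
        # d_k * psi_k'(1) / h_k == d_{k+1} * psi_{k+1}'(0) / h_next
        d.append(d[k] * maps[k][1](1.0) * h_next / (maps[k + 1][1](0.0) * h_k))
    s = [0.0]  # arbitrary initialization s_0 = 0
    for d_k in d:
        s.append(s[-1] + d_k)  # s_{k+1} = s_k + d_k

    def psi(t):
        k = n - 1  # default: last segment (covers t equal to the last point)
        for i in range(n):
            if t < t_pts[i + 1]:
                k = i
                break
        u = (t - t_pts[k]) / (t_pts[k + 1] - t_pts[k])
        return d[k] * maps[k][0](u) + s[k]

    return psi, s


# Three unit-length segments with different (hypothetical) warp parameters.
t_pts = [0.0, 1.0, 2.0, 3.0]
maps = [quadratic_map(0.4), quadratic_map(0.2), quadratic_map(-0.3)]
psi, s = glue_warp_maps(t_pts, maps)
```

Any other choice of $s_0$ and $d_0 > 0$ yields a map of the form $A\psi(t) + B$, which is the affine freedom mentioned in the text.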
Let $\{v_{k,n}\}$ be an orthonormal basis for signals of finite energy on the interval $J$, adapted to the segmentation $s_k = \psi(t_k)$, in the sense that there is an integer $K$, the overlap factor, such that $v_{k,n}(s) = 0$ if $s < s_k$ or $s > s_{k+K}$.
The present invention focuses on cases K ≥ 2, since the case K = 1 corresponds to the prior art methods without overlap. It should be noted that not many constructions are presently known for K ≥ 3. A particular example for the inventive concept will be developed for the case K = 2 below, including local trigonometric bases that are also used in modified discrete cosine transforms (MDCT) and other discrete time lapped transforms.
Let the construction of $\{v_{k,n}\}$ from the segmentation be local, in the sense that there is an integer $p$ such that $v_{k,n}(s)$ does not depend on $s_l$ for $l < k - p$ or $l > k + K + p$. Finally, let the construction be such that an affine change of segmentation to $A s_k + B$ results in a change of basis to $A^{-1/2} v_{k,n}((s - B)/A)$. Then
$$u_{k,n}(t) = \left(\psi'(t)\right)^{1/2} v_{k,n}(\psi(t)) \tag{3}$$
is a time-warped orthonormal basis for signals of finite energy on the interval $I$, which is well defined from the segmentation points $t_k$ and the sequence of normalized warp maps $\psi_k$, independent of the initialization of the parameter sequences $s_k$ and $d_k$ in (2). It is adapted to the given segmentation in the sense that $u_{k,n}(t) = 0$ if $t < t_k$ or $t > t_{k+K}$, and it is locally defined in the sense that $u_{k,n}(t)$ depends neither on $t_l$ for $l < k - p$ or $l > k + K + p$, nor on the normalized warp maps $\psi_l$ for $l < k - p$ or $l \ge k + K + p$.
The synthesis waveforms (3) are continuous but not necessarily differentiable, due to the Jacobian factor $(\psi'(t))^{1/2}$. For this reason, and for reduction of the computational load in the discrete-time case, a derived biorthogonal system can be constructed as well. Assume that there are constants $0 < C_1 < C_2$ such that
$$C_1\,\eta_k \le \psi'(t) \le C_2\,\eta_k, \qquad t_k \le t \le t_{k+K}, \tag{4}$$
for a sequence $\eta_k > 0$. Then
$$f_{k,n}(t) = \eta_k^{1/2}\, v_{k,n}(\psi(t)), \qquad g_{k,n}(t) = \psi'(t)\,\eta_k^{-1/2}\, v_{k,n}(\psi(t)) \tag{5}$$
defines a biorthogonal pair of Riesz bases for the space of signals of finite energy on the interval $I$.
Thus, $f_{k,n}(t)$ as well as $g_{k,n}(t)$ may be used for analysis, whereas it is particularly advantageous to use $f_{k,n}(t)$ as synthesis waveforms and $g_{k,n}(t)$ as analysis waveforms.
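The biorthogonality can be checked by a change of variables. The short derivation below assumes the pair has the form $f_{k,n}(t) = \eta_k^{1/2} v_{k,n}(\psi(t))$ and $g_{k,n}(t) = \psi'(t)\,\eta_k^{-1/2} v_{k,n}(\psi(t))$, with $\{v_{k,n}\}$ orthonormal on $J$; it is a sketch of the argument, not a full proof of the Riesz basis property.

$$\int_I f_{k,n}(t)\, g_{k',n'}(t)\, dt = \left(\frac{\eta_k}{\eta_{k'}}\right)^{1/2} \int_I v_{k,n}(\psi(t))\, v_{k',n'}(\psi(t))\, \psi'(t)\, dt = \left(\frac{\eta_k}{\eta_{k'}}\right)^{1/2} \int_J v_{k,n}(s)\, v_{k',n'}(s)\, ds = \delta_{k,k'}\,\delta_{n,n'}.$$

The whole Jacobian factor $\psi'(t)$ sits in the analysis waveform, so the synthesis waveform is simply a time-warped copy of $v_{k,n}$ and is as smooth as the warp map allows.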
Based on the general considerations above, an example for the inventive concept will be derived in the subsequent paragraphs for the case of uniform segmentation $t_k = k$ and overlap factor $K = 2$, by using a local cosine basis adapted to the resulting segmentation on the $s$-axis.
It should be noted that the modifications necessary to deal with non-uniform segmentations are obvious, such that the inventive concept is equally applicable to such non-uniform segmentations. As for example proposed by M.W. Wickerhauser, "Adapted wavelet analysis from theory to software", A. K. Peters, 1994, Chapter 4, a starting point for building a local cosine basis is a rising cutoff function $\rho$ such that $\rho(r) = 0$ for $r < -1$, $\rho(r) = 1$ for $r > 1$, and $\rho(r)^2 + \rho(-r)^2 = 1$ in the active region $-1 \le r \le 1$.
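Given a segmentation $s_k$, a window on each interval $s_k \le s \le s_{k+2}$ can then be constructed according to
$$w_k(s) = \rho\!\left(\frac{s - c_k}{\varepsilon_k}\right) \rho\!\left(\frac{c_{k+1} - s}{\varepsilon_{k+1}}\right) \tag{6}$$
with cutoff midpoints $c_k = (s_k + s_{k+1})/2$ and cutoff radii $\varepsilon_k = (s_{k+1} - s_k)/2$. This corresponds to the middle point construction of Wickerhauser.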
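With $l_k = c_{k+1} - c_k = \varepsilon_k + \varepsilon_{k+1}$, an orthonormal basis results from
$$v_{k,n}(s) = \sqrt{\frac{2}{l_k}}\; w_k(s) \cos\!\left[\frac{\pi\left(n + \tfrac{1}{2}\right)}{l_k}\,(s - c_k)\right], \tag{7}$$
where the frequency index $n = 0, 1, 2, \ldots$ It is easy to verify that this construction obeys the condition of locality with $p = 0$ and the affine invariance described above. The resulting warped basis (3) on the $t$-axis can in this case be rewritten in the form
$$u_{k,n}(t) = \sqrt{2\,\varphi_k'(t - k)}\; b_k(\varphi_k(t - k)) \cos\!\left[\pi\left(n + \tfrac{1}{2}\right)\left(\varphi_k(t - k) - m_k\right)\right] \tag{8}$$
for $k \le t \le k + 2$, where $\varphi_k$ is defined by gluing together $\psi_k$ and $\psi_{k+1}$ to form a continuously differentiable map of the interval [0,2] onto itself,
$$\varphi_k(t) = \begin{cases} 2 m_k\, \psi_k(t), & 0 \le t \le 1, \\ 2(1 - m_k)\, \psi_{k+1}(t - 1) + 2 m_k, & 1 \le t \le 2. \end{cases} \tag{9}$$
This is obtained by putting
$$m_k = \frac{\psi_{k+1}'(0)}{\psi_k'(1) + \psi_{k+1}'(0)}. \tag{10}$$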
The construction of $\psi_k$ is illustrated in Fig. 1, showing the normalized time on the x-axis and the warped time on the y-axis. Fig. 1 shall be particularly discussed for the case $k = 0$, that is, for building $\varphi_0(t)$ and therefore deriving a warp function for a first frame 10, lasting from normalized time 0 to normalized time 1, and for a second frame 12, lasting from normalized time 1 to normalized time 2. It is furthermore assumed that first frame 10 has a warp function 14 and second frame 12 has a warp function 16, derived with the aim of achieving equal pitch within the individual frames when the time axis is transformed as indicated by warp functions 14 and 16. It should be noted that warp function 14 corresponds to $\psi_0$ and warp function 16 corresponds to $\psi_1$. According to equation 9, a combined warp function $\varphi_0(t)$ 18 is constructed by gluing together the warp maps 14 and 16 to form a continuously differentiable map of the interval [0,2] onto itself. As a result, the point (1,1) is transformed into (1,a), wherein a corresponds to $2m_k$ in equation 9.
As the inventive concept is directed to the application of time warping in an overlap and add scenario, the example of building the next combined warp function for frame 12 and the following frame 20 is also given in Fig. 1. It should be noted that, following the overlap and add principle, full reconstruction of frame 12 requires knowledge of both warp functions 18 and 22.
It should be further noted that gluing together two independently derived warp functions is not necessarily the only way of deriving a suitable combined warp function $\varphi$ (18, 22), as $\varphi$ may very well also be derived by directly fitting a suitable warp function to two consecutive frames. It is preferred to have affine consistency of the two warp functions on the overlap of their definition domains.
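The gluing of two normalized warp maps into one combined map on [0,2] can be sketched as follows. This is only an illustration: the function names are hypothetical, the quadratic maps stand in for whatever normalized warp maps are actually used, and the combined map is assumed to be $2m_k\psi_k(t)$ on [0,1] and $2(1-m_k)\psi_{k+1}(t-1) + 2m_k$ on [1,2], with $m_k = \psi_{k+1}'(0) / (\psi_k'(1) + \psi_{k+1}'(0))$ chosen so that the derivative is continuous at $t = 1$.

```python
def combined_warp_map(psi_k, dpsi_k, psi_k1, dpsi_k1):
    """Glue two normalized warp maps into phi_k on [0, 2]; return (phi, dphi, m_k)."""
    # m_k balances the slopes so that phi_k is continuously differentiable at t = 1.
    m = dpsi_k1(0.0) / (dpsi_k(1.0) + dpsi_k1(0.0))

    def phi(t):
        if t <= 1.0:
            return 2.0 * m * psi_k(t)
        return 2.0 * (1.0 - m) * psi_k1(t - 1.0) + 2.0 * m

    def dphi(t):
        if t <= 1.0:
            return 2.0 * m * dpsi_k(t)
        return 2.0 * (1.0 - m) * dpsi_k1(t - 1.0)

    return phi, dphi, m


def quad(a):
    """Hypothetical normalized quadratic warp map and its derivative (|a| < 2)."""
    return (lambda t: t + 0.5 * a * (t * t - t),
            lambda t: 1.0 + 0.5 * a * (2.0 * t - 1.0))


psi0, dpsi0 = quad(0.4)
psi1, dpsi1 = quad(0.2)
phi0, dphi0, m0 = combined_warp_map(psi0, dpsi0, psi1, dpsi1)
# phi0(1.0) equals 2 * m0, i.e. the ordinate "a" of the transformed point (1, a).
```

With two trivial maps $\psi(t) = t$ the same routine gives $m_k = 1/2$ and $\varphi_k(t) = t$, the unwarped case.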
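According to equation 6, the window function in equation 8 is defined by
$$b_k(r) = \rho\!\left(\frac{r - m_k}{m_k}\right) \rho\!\left(\frac{1 + m_k - r}{1 - m_k}\right), \tag{11}$$
which increases from zero to one in the interval $[0, 2m_k]$ and decreases from one to zero in the interval $[2m_k, 2]$.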
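A concrete window of this kind can be sketched as follows. The text leaves the rising cutoff function $\rho$ open, so the sine-shaped $\rho$ used here is an assumption (one of the simplest functions satisfying $\rho(r)^2 + \rho(-r)^2 = 1$), and the window is assumed to have the form $b_k(r) = \rho((r - m_k)/m_k)\,\rho((1 + m_k - r)/(1 - m_k))$. The names `rho` and `window_b` are illustrative.

```python
import math


def rho(r):
    """Rising cutoff: 0 for r <= -1, 1 for r >= 1, rho(r)^2 + rho(-r)^2 = 1 in between."""
    if r <= -1.0:
        return 0.0
    if r >= 1.0:
        return 1.0
    return math.sin(0.25 * math.pi * (1.0 + r))


def window_b(r, m):
    """Warped-domain window on [0, 2] with cutoff midpoints m and m + 1 (0 < m < 1)."""
    return rho((r - m) / m) * rho((1.0 + m - r) / (1.0 - m))
```

For $m_k = 1/2$ (no warp) this choice of $\rho$ reduces `window_b` to the familiar sine window $\sin(\pi r / 2)$ on [0,2]; for other values of $m_k$ the peak moves to $r = 2m_k$.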
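A biorthogonal version of (8) can also be derived if there are constants $0 < C_1 < C_2$ such that
$$C_1 \le \varphi_k'(t) \le C_2, \qquad 0 \le t \le 2,$$
for all $k$. Choosing $\eta_k = l_k$ in (4) leads to the specialization of (5) to
$$f_{k,n}(t) = \sqrt{2}\; b_k(\varphi_k(t - k)) \cos\!\left[\pi\left(n + \tfrac{1}{2}\right)\left(\varphi_k(t - k) - m_k\right)\right], \qquad g_{k,n}(t) = \varphi_k'(t - k)\, f_{k,n}(t). \tag{12}$$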
Thus, for the continuous time case, synthesis and analysis functions (equation 12) are derived, being dependent on the combined warped function. This dependency allows for time warping within an overlap and add scenario without loss of information on the original signal, i.e. allowing for a perfect reconstruction of the signal.
It may be noted that for implementation purposes, the operations performed within equation 12 can be decomposed into a sequence of consecutive individual process steps. A particularly attractive way of doing so is to first perform a windowing of the signal, followed by a resampling of the windowed signal and finally by a transformation.
As audio signals are usually stored and transmitted digitally as discrete sample values sampled with a given sample frequency, the given example for the implementation of the inventive concept shall in the following be further developed for application in the discrete case.
The time-warped modified discrete cosine transform (TWMDCT) can be obtained from a time-warped local cosine basis by discretizing analysis integrals and synthesis waveforms. The following description is based on the biorthogonal basis (see equation 12). The changes required to deal with the orthogonal case (8) consist of an additional time domain weighting by the Jacobian factor. In the special case where no warp is applied, both constructions reduce to the ordinary MDCT. Let $L$ be the transform size and assume that the signal $x(t)$ to be analyzed is band limited by $q\pi L$ (rad/s) for some $q < 1$. This allows the signal to be described by its samples at sampling period $1/L$.
The analysis coefficients are given by
$$c_{k,n} = \int_k^{k+2} x(t)\, g_{k,n}(t)\, dt. \tag{13}$$
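Defining the windowed signal portion $x_k(\tau) = x(\tau + k)\, b_k(\varphi_k(\tau))$ and performing the substitutions $\tau = t - k$ and $r = \varphi_k(\tau)$ in the integral (13) leads to
$$c_{k,n} = \sqrt{2} \int_0^2 x_k\!\left(\varphi_k^{-1}(r)\right) \cos\!\left[\pi\left(n + \tfrac{1}{2}\right)(r - m_k)\right] dr. \tag{14}$$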
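A particularly attractive way of discretizing this integral taught by the current invention is to choose the sample points
$$r = r_\nu = m_k + \frac{\nu + \tfrac{1}{2}}{L},$$
where $\nu$ is integer valued. Assuming mild warp and the band limitation described above, this gives the approximation
$$c_{k,n} \approx \frac{\sqrt{2}}{L} \sum_{\nu} X_k(\nu) \cos\!\left[\frac{\pi}{L}\left(n + \tfrac{1}{2}\right)\left(\nu + \tfrac{1}{2}\right)\right], \tag{15}$$
where
$$X_k(\nu) = x_k\!\left(\varphi_k^{-1}(r_\nu)\right). \tag{16}$$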
The summation interval in (15) is defined by $0 \le r_\nu < 2$. It includes $\nu = 0, 1, \ldots, L - 1$ and extends beyond this interval at each end such that the total number of points is $2L$. Note that due to the windowing, the result is insensitive to the treatment of the edge cases, which can occur if $m_k = (\nu_0 + \tfrac{1}{2})/L$ for some integer $\nu_0$.
As it is well known that the sum (equation 15) can be computed by elementary folding operations followed by a DCT of type IV, it may be appropriate to decompose the operations of equation 15 into a series of subsequent operations and transformations to make use of already existing efficient hardware and software implementations, particularly of the DCT (discrete cosine transform). According to the discretized integral, a given discrete time signal can be interpreted as the equidistant samples of $x(t)$ at sampling period $1/L$. A first step of windowing would thus lead to
$$x_k\!\left(\frac{p + \tfrac{1}{2}}{L}\right) = x\!\left(\frac{p + \tfrac{1}{2}}{L} + k\right) b_k\!\left(\varphi_k\!\left(\frac{p + \tfrac{1}{2}}{L}\right)\right) \tag{17}$$
for $p = 0, 1, 2, \ldots, 2L - 1$. Prior to the block transformation as described by equation 15 (introducing an additional offset depending on $m_k$), a resampling is required, mapping
$$x_k\!\left(\frac{p + \tfrac{1}{2}}{L}\right) \mapsto x_k\!\left(\varphi_k^{-1}\!\left(m_k + \frac{\nu + \tfrac{1}{2}}{L}\right)\right). \tag{18}$$
The resampling operation can be performed by any suitable method for non-equidistant resampling.
Summarizing, the inventive time-warped MDCT can be decomposed into a windowing operation, a resampling and a block transform.
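The three steps can be sketched for a single block as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the combined warp map `phi` and its inverse `phi_inv` are supplied by the caller, the sine-shaped cutoff `rho` is an assumed choice, linear interpolation stands in for "any suitable" resampling method, and the transform is written as a direct sum rather than a fast DCT-IV. All function names are hypothetical.

```python
import math


def rho(r):
    """Assumed sine-shaped rising cutoff function."""
    if r <= -1.0:
        return 0.0
    if r >= 1.0:
        return 1.0
    return math.sin(0.25 * math.pi * (1.0 + r))


def window_b(r, m):
    """Warped-domain window with cutoff midpoints m and m + 1."""
    return rho((r - m) / m) * rho((1.0 + m - r) / (1.0 - m))


def twmdct_block(x_seg, phi, phi_inv, m, L):
    """Analyze 2L samples x((p + 1/2)/L + k), p = 0..2L-1, into L coefficients."""
    # 1) Windowing on the equidistant grid.
    xk = [x_seg[p] * window_b(phi((p + 0.5) / L), m) for p in range(2 * L)]
    # 2) Resampling at phi_inv(r_nu), r_nu = m + (nu + 1/2)/L with 0 <= r_nu < 2.
    nu_min = math.ceil(-m * L - 0.5)
    resampled = []
    for i in range(2 * L):
        tau = phi_inv(m + (nu_min + i + 0.5) / L)
        pos = min(max(tau * L - 0.5, 0.0), 2.0 * L - 1.0)  # fractional sample index
        j = min(int(pos), 2 * L - 2)
        frac = pos - j
        # Linear interpolation (an assumption; splines would be more accurate).
        resampled.append((1.0 - frac) * xk[j] + frac * xk[j + 1])
    # 3) Block transform with the offset implied by m.
    return [math.sqrt(2.0) / L * sum(
                resampled[i] * math.cos(math.pi / L * (n + 0.5) * (nu_min + i + 0.5))
                for i in range(2 * L))
            for n in range(L)]
```

With `phi` and `phi_inv` both equal to the identity and `m = 0.5`, the resampling step leaves the windowed block unchanged and the routine computes a plain sine-windowed lapped cosine transform.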
The individual steps shall in the following be briefly described with reference to Figs. 2 to 3b. Figs. 2 to 3b show the steps of time warped MDCT encoding considering only two windowed signal blocks of a synthetically generated pitched signal. Each individual frame comprises 1024 samples, such that each of the two considered combined frames 24 and 26 (original frames 30 and 32 and original frames 32 and 34) consists of 2048 samples and the two windowed combined frames have an overlap of 1024 samples. Figs. 2 to 2b show on the x-axis the normalized time of the three frames to be processed. First frame 30 ranges from 0 to 1, second frame 32 ranges from 1 to 2, and third frame 34 ranges from 2 to 3 on the time axis. Thus, in the normalized time domain, each time unit corresponds to one complete frame having 1024 signal samples. The normalized analysis windows span the normalized time intervals [0,2] and [1,3]. The aim of the following considerations is to recover the middle frame 32 of the signal. As the reconstruction of the outer signal frames (30, 34) requires data from adjacent windowed signal segments, this reconstruction is not considered here. It may be noted that the combined warp maps shown in Fig. 1 are warp maps derived from the signal of Fig. 2, illustrating the inventive combination of three subsequent normalized warp maps (dotted curves) into two overlapping warp maps (solid curves). As explained above, inventive combined warp maps 18 and 22 are derived for the signal analysis. Furthermore, it may be noted that due to the affine invariance of warping, this curve represents a warp map with the same warp as in the original two segments.
Fig. 2 illustrates the original signal by a solid graph. Its stylized pulse train has a pitch that grows linearly with time; hence, it has positive and decreasing warp, considering that warp is defined to be the logarithmic derivative of the pitch. In Fig. 2, the inventive analysis windows as derived using equation 17 are superimposed as dotted curves. It should be noted that the deviation from standard symmetric windows (as for example in MDCT) is largest where the warp is largest, that is, in the first segment [0,1]. The mathematical definition of the windows alone is given by resampling the windows of equation 11, the resampling being implemented as expressed by the second factor of the right hand side of equation 17.
Figs. 2a and 2b illustrate the result of the inventive windowing, applying the windows of Fig. 2 to the individual signal segments.
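The statement about the sign and trend of the warp follows directly from the definition. As a brief worked example, assume a pitch contour $p(t) = p_0 + ct$ with $p_0 > 0$ and $c > 0$ (the symbols $p_0$ and $c$ are introduced here only for illustration). Then

$$a(t) = \frac{d}{dt} \ln p(t) = \frac{p'(t)}{p(t)} = \frac{c}{p_0 + ct} > 0, \qquad a'(t) = -\frac{c^2}{(p_0 + ct)^2} < 0,$$

so the warp is positive and decreases with time, and it is largest at the start of the signal.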
Figs. 3a and 3b illustrate the result of the warp parameter dependent resampling of the windowed signal blocks of Figs. 2a and 2b, the resampling being performed as indicated by the warp maps given by the solid curves of Fig. 1. Normalized time interval [0,1] is mapped to the warped time interval [0,a], which is equivalent to a compression of the left half of the windowed signal block. Consequently, an expansion of the right half of the windowed signal block is performed, mapping the interval [1,2] to [a,2]. Since the warp map is derived from the signal with the aim of obtaining a warped signal with constant pitch, the result of the warping (resampling according to equation 18) is a windowed signal block having constant pitch. It should be noted that a mismatch between the warp map and the signal would lead to a signal block with still varying pitch at this point, which would not disturb the final reconstruction.
The offset of the following block transform is marked by circles such that the interval [m, m+1] corresponds to the discrete samples $\nu = 0, 1, \ldots, L-1$ with $L = 1024$ in formula 15. Equivalently, the modulating waveforms of the block transform share a point of even symmetry at m and a point of odd symmetry at m+1. It is furthermore important to note that a equals 2m, such that m is the midpoint between 0 and a, and m+1 is the midpoint between a and 2. Summarizing, Figs. 3a and 3b describe the situation after the inventive resampling described by equation 18, which, of course, depends on the warp parameters.
The time-warped transform domain samples of the signals of Figs. 3a and 3b are then quantized and coded and may be transmitted together with warp side information describing normalized warp maps $\psi_k$ to a decoder. As quantization is a commonly known technique, quantization using a specific quantization rule is not illustrated in the following figures, which focus on the reconstruction of the signal on the decoder side.
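In one embodiment of the present invention, the decoder receives the warp map sequence together with decoded time-warped transform domain samples $d_{k,n}$, where $d_{k,n} = 0$ for $n \ge L$ can be assumed due to the assumed band limitation of the signal. As on the encoder side, the starting point for achieving discrete time synthesis shall be to consider continuous time reconstruction using the synthesis waveforms of equation 12:
$$y(t) = \sum_{n,k} d_{k,n}\, f_{k,n}(t) = \sum_k y_k(t - k), \tag{19}$$
where
$$y_k(u) = z_k(\varphi_k(u)) \tag{20}$$
and with
$$z_k(r) = \sqrt{2}\; b_k(r) \sum_{n=0}^{L-1} d_{k,n} \cos\!\left[\pi\left(n + \tfrac{1}{2}\right)(r - m_k)\right]. \tag{21}$$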
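Equation (19) is the usual overlap and add procedure of a windowed transform synthesis. As in the analysis stage, it is advantageous to sample equation (21) at the points $r = r_\nu = m_k + (\nu + \tfrac{1}{2})/L$, giving rise to
$$z_k(r_\nu) = \sqrt{2}\; b_k(r_\nu) \sum_{n=0}^{L-1} d_{k,n} \cos\!\left[\frac{\pi}{L}\left(n + \tfrac{1}{2}\right)\left(\nu + \tfrac{1}{2}\right)\right], \tag{22}$$
which is easily computed by the following steps: First, a DCT of type IV is performed, followed by extension into $2L$ samples depending on the offset parameter $m_k$ according to the rule $0 \le r_\nu < 2$. Next, a windowing with the window $b_k(r_\nu)$ is performed. Once $z_k(r_\nu)$ is found, the resampling
$$z_k\!\left(m_k + \frac{\nu + \tfrac{1}{2}}{L}\right) \mapsto z_k\!\left(\varphi_k\!\left(\frac{p + \tfrac{1}{2}}{L}\right)\right) \tag{23}$$
gives the signal segment $y_k$ at equidistant sample points $(p + \tfrac{1}{2})/L$, ready for the overlap and add operation described in formula (19).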
The resampling method can again be chosen quite freely and does not have to be the same as in the encoder. In one embodiment of the present invention spline interpolation based methods are used, where the order of the spline functions can be adjusted as a function of a band limitation parameter q so as to achieve a compromise between the computational complexity and the quality of reconstruction. A common value of parameter q is q = 1/3, a case in which quadratic splines will often suffice.
The decoding shall in the following be illustrated by Figs. 4a to 7 for the signal shown in Figs. 3a and 3b. It shall again be emphasized that the block transform and the transmission of the transform parameters are not described here, as these are commonly known techniques. As a start for the decoding process, Figs. 4a and 4b show a configuration where the inverse block transform has already been performed, resulting in the signals shown in Figs. 4a and 4b. One important feature of the inverse block transform is the addition of signal components not present in the original signal of Figs. 3a and 3b, which is due to the symmetry properties of the synthesis functions already explained above. In particular, the synthesis function has even symmetry with respect to m and odd symmetry with respect to m+1. Therefore, in the interval [0,a], positive signal components are added in the inverse block transform, whereas in the interval [a,2], negative signal components are added. Additionally, the inventive window function used for the synthesis windowing operation is superimposed as a dotted curve in Figs. 4a and 4b.
The mathematical definition of this synthesis window in the warped time domain is given by equation 11. Figs. 5a and 5b show the signal, still in the warped time domain, after application of the inventive windowing.
Figs. 6a and 6b finally show the result of the warp parameter-dependent resampling of the signals of Figs. 5a and 5b.
Finally, Fig. 7 shows the result of the overlap-and-add operation, which is the final step in the synthesis of the signal (see equation 19). The overlap-and-add operation is a superposition of the waveforms of Figs. 6a and 6b. As already mentioned above, the only frame to be fully reconstructed is the middle frame 32, and a comparison with the original situation of Fig. 2 shows that the middle frame 32 is reconstructed with high fidelity. The precise cancellation of the disturbing additional signal components introduced during the inverse block transform is only possible because it is a crucial property of the present invention that the two combined warp maps 18 and 22 in Fig. 1 differ only by an affine map within the overlapping normalized time interval [1,2]. A consequence of this is that there is a correspondence between signal portions and windows on the warped time segments [a,2] and [1,b]. When considering Figs. 4a and 4b, a linear stretching of segment [1,b] into [a,2] will therefore make the signal graphs and window halves describe the well known principle of time domain aliasing cancellation of standard MDCT. The signal, already being alias-cancelled, can then simply be mapped onto the normalized time interval [1,2] by a common inverse warp map.
It may be noted that, according to a further embodiment of the present invention, additional reduction of computational complexity can be achieved by application of a pre-filtering step in the frequency domain. This can be implemented by simple pre-weighting of the transmitted sample values $d_{k,n}$. Such a pre-filtering is for example described in M. Unser, A. Aldroubi, and M. Eden, "B-spline signal processing part II-efficient design and applications". An implementation requires B-spline resampling to be applied to the output of the inverse block transform prior to the windowing operation. Within this embodiment, the resampling operates on a signal as derived by equation 22 having modified $d_{k,n}$, and the window function $b_k(r_\nu)$ is not applied at that stage. Therefore, at each end of the signal segment, the resampling must take care of the edge conditions in terms of periodicities and symmetries induced by the choice of the block transform. The required windowing is then performed after the resampling using the window
$$b_k\!\left(\varphi_k\!\left(\frac{p + \tfrac{1}{2}}{L}\right)\right). \tag{24}$$
Summarizing, according to a first embodiment of an inventive decoder, inverse time-warped MDCT comprises, when decomposed into individual steps:
- Inverse transform
- Windowing
- Resampling
- Overlap and add.
According to a second embodiment of the present invention inverse time-warped MDCT comprises:
- Spectral weighting
- Inverse transform
- Resampling
- Windowing
- Overlap and add.
It may be noted that in the case where no warp is applied, that is, the case where all normalized warp maps are trivial ($\psi_k(t) = t$), the embodiment of the present invention as detailed above coincides exactly with the usual MDCT.
Further embodiments of the present invention incorporating the above-mentioned features shall now be described with reference to Figs. 8 to 15.
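The decoder steps of the first embodiment (inverse transform, windowing, resampling, overlap and add) can be sketched as follows. As before, this is an illustration under assumptions rather than the reference implementation: the sine-shaped cutoff `rho`, the linear interpolation, the direct-sum inverse transform and all function names are choices made here, and the combined warp map `phi` is supplied by the caller.

```python
import math


def rho(r):
    """Assumed sine-shaped rising cutoff function."""
    if r <= -1.0:
        return 0.0
    if r >= 1.0:
        return 1.0
    return math.sin(0.25 * math.pi * (1.0 + r))


def window_b(r, m):
    """Warped-domain window with cutoff midpoints m and m + 1."""
    return rho((r - m) / m) * rho((1.0 + m - r) / (1.0 - m))


def inverse_twmdct_block(d, phi, m, L):
    """Turn L coefficients into 2L equidistant time samples of one block."""
    nu_min = math.ceil(-m * L - 0.5)
    z = []
    for i in range(2 * L):
        nu = nu_min + i
        # 1) Inverse transform; the cosine itself supplies the even/odd extension.
        acc = sum(d[n] * math.cos(math.pi / L * (n + 0.5) * (nu + 0.5))
                  for n in range(L))
        # 2) Windowing in the warped domain at r_nu = m + (nu + 1/2)/L.
        z.append(math.sqrt(2.0) * window_b(m + (nu + 0.5) / L, m) * acc)
    y = []
    for p in range(2 * L):
        # 3) Resampling: evaluate z at r = phi((p + 1/2)/L) by linear interpolation.
        pos = (phi((p + 0.5) / L) - m) * L - 0.5 - nu_min
        pos = min(max(pos, 0.0), 2.0 * L - 1.0)
        i = min(int(pos), 2 * L - 2)
        frac = pos - i
        y.append((1.0 - frac) * z[i] + frac * z[i + 1])
    return y


def overlap_add(blocks, L):
    """4) Overlap and add: block k contributes to samples k*L .. k*L + 2L - 1."""
    out = [0.0] * (L * (len(blocks) + 1))
    for k, y in enumerate(blocks):
        for p, value in enumerate(y):
            out[k * L + p] += value
    return out
```

In the unwarped case (`phi` the identity, `m = 0.5`) the resampling step is exact, and feeding the routine with sine-windowed lapped cosine coefficients scaled by $\sqrt{2}/L$ reconstructs the frame covered by two consecutive blocks, which is the time domain aliasing cancellation of the ordinary MDCT.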
Fig. 8 shows an example of an inventive audio encoder receiving a digital audio signal 100 as input and generating a bit stream to be transmitted to a decoder incorporating the inventive time-warped transform coding concept. The digital audio input signal 100 can either be a natural audio signal or a preprocessed audio signal, where for instance the preprocessing could be a whitening operation to whiten the spectrum of the input signal. The inventive encoder incorporates a warp parameter extractor 101, a warp transformer 102, a perceptual model calculator 103, a warp coder 104, an encoder 105, and a multiplexer 106. The warp parameter extractor 101 estimates a warp parameter sequence, which is input into the warp transformer 102 and into the warp coder 104. The warp transformer 102 derives a time-warped spectral representation of the digital audio input signal 100. The time-warped spectral representation is input into the encoder 105 for quantization and possibly other coding, as for example differential coding. The encoder 105 is additionally controlled by the perceptual model calculator 103. Thus, for example, the coarseness of quantization may be increased when signal components are to be encoded that are mainly masked by other signal components. The warp coder 104 encodes the warp parameter sequence to reduce its size during transmission within the bit stream. This could for example comprise quantization of the parameters or, for example, differential encoding or entropy-coding techniques as well as arithmetic coding schemes.
The multiplexer 106 receives the encoded warp parameter sequence from the warp coder 104 and an encoded time-warped spectral representation of the digital audio input signal 100 to multiplex both data into the bit stream output by the encoder.
Fig. 9 illustrates an example of a time-warped transform decoder receiving a compatible bit stream 200 for deriving a reconstructed audio signal as output. The decoder comprises a de-multiplexer 201, a warp decoder 202, a decoder 203, and an inverse warp transformer 204. The de-multiplexer de-multiplexes the bit stream into the encoded warp parameter sequence, which is input into the warp decoder 202. The de-multiplexer further de-multiplexes the encoded representation of the time-warped spectral representation of the audio signal, which is input into the decoder 203, being the inverse of the corresponding encoder 105 of the audio encoder of Fig. 8. Warp decoder 202 derives a reconstruction of the warp parameter sequence and decoder 203 derives a time-warped spectral representation of the original audio signal. The representation of the warp parameter sequence as well as the time-warped spectral representation are input into the inverse warp transformer 204, which derives a digital audio output signal implementing the inventive concept of time-warped overlapped transform coding of audio signals.
Fig. 10 shows a further embodiment, not encompassed by the wording of the claims, of a time-warped transform decoder in which the warp parameter sequence is derived in the decoder itself. The alternative embodiment shown in Fig. 10 comprises a decoder 203, a warp estimator 301, and an inverse warp transformer 204. The decoder 203 and the inverse warp transformer 204 share the same functionalities as the corresponding devices of the previous embodiment, and therefore the description of these devices within different embodiments is fully interchangeable. Warp estimator 301 derives the actual warp of the time-warped spectral representation output by decoder 203 by combining earlier frequency domain pitch estimates with a current frequency domain pitch estimate. Thus, the warp parameter sequence is signalled implicitly, which has the great advantage that further bit rate can be saved since no additional warp parameter information has to be transmitted in the bit stream input into the decoder. However, the implicit signalling of warp data is limited by the time resolution of the transform.
Fig. 11 illustrates the backwards compatibility of the inventive concept, when prior art decoders not capable of the inventive concept of time-warped decoding are used. Such a decoder would neglect the additional warp parameter information, thus decoding the bit stream into a frequency domain signal fed into an inverse transformer 401 not implementing any warping. Since the frequency analysis performed by time-warped transformation in inventive encoders is well aligned with the transform that does not include any time warping, a decoder ignoring warp data will still produce a meaningful audio output. This is done at the cost of degraded audio quality due to the time warping, which is not reversed within prior art decoders.
Fig. 12 shows a block diagram of the inventive method of time-warped transformation. The inventive time-warp transforming comprises windowing 501, resampling 502, and a block transformation 503. First, the input signal is windowed with an overlapping window sequence depending on the warp parameter sequence serving as additional input to each of the individual encoding steps 501 to 503. Each windowed input signal segment is subsequently resampled in the resampling step 502, wherein the resampling is performed as indicated by the warp parameter sequence.
Within the block transformation step 503, a block transform is derived typically using a well-known discrete trigonometric transform. The transform is thus performed on the windowed and resampled signal segment. It is to be noted that the block transform does also depend on an offset value, which is derived from the warp parameter sequence. Thus, the output consists of a sequence of transform domain frames.
Fig. 13 shows a flow chart of an inverse time-warped transform method. The method comprises the steps of inverse block transformation 601, windowing 602, resampling 603, and overlapping and adding 604. Each frame of a transform domain signal is converted into a time domain signal by the inverse block transformation 601. Corresponding to the encoding step, the block transform depends on an offset value derived from the received parameter sequence serving as additional input to the inverse block transforming 601, the windowing 602, and the resampling 603. The signal segment derived by the block transform 601 is subsequently windowed in the windowing step 602 and resampled in the resampling 603 using the warp parameter sequence. Finally, in overlapping and adding 604, the windowed and resampled segment is added to the previously inversely transformed segments in a usual overlap and add operation, resulting in a reconstruction of the time domain output signal.
Fig. 14 shows an alternative embodiment of an inventive inverse time-warp transformer, which is implemented to additionally reduce the computational complexity. The decoder partly shares the same functionalities with the decoder of Fig. 13. Therefore, the descriptions of the same functional blocks in both embodiments are fully interchangeable. The alternative embodiment differs from the embodiment of Fig. 13 in that it implements a spectral pre-weighting 701 before the inverse block transformation 601. This fixed spectral pre-weighting is equivalent to a time domain filtering with periodicities and symmetries induced by the choice of the block transform. Such a filtering operation is part of certain spline based resampling methods, allowing for a reduction of the computational complexity of the subsequent modified resampling 702. Such resampling is now to be performed in a signal domain with periodicities and symmetries induced by the choice of the block transform. Therefore, a modified windowing step 703 is performed after resampling 702. Finally, in overlapping and adding 604, the windowed and resampled segment is added to the previously inverse-transformed segment in a usual overlap and add procedure, giving the reconstructed time domain output signal.
Figs. 15a and 15b show the strength of the inventive concept of time-warped coding, showing spectral representations of the same signal with and without time warping applied. Fig. 15a illustrates a frame of spectral lines originating from a modified discrete cosine transform of transform size 1024 of a male speech signal segment sampled at 16 kHz. The resulting frequency resolution is 7.8 Hz and only the first 600 lines are plotted for this illustration, corresponding to a bandwidth of 4.7 kHz. As can be seen from the fundamental frequency and the plot, the segment is a voiced sound with a mean pitch of approximately 155 Hz. As can furthermore be seen from Fig. 15a, the first few harmonics of the pitch frequency are clearly distinguishable, but towards high frequencies, the analysis becomes increasingly dense and scrambled. This is due to the variation of the pitch within the length of the signal segment to be analyzed. Therefore, the coding of the mid to high frequency ranges requires a substantial amount of bits in order not to introduce audible artefacts upon decoding. Conversely, when fixing the bit rate, a substantial amount of distortion will inevitably result from the demand of increasing the coarseness of quantization.
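The figures quoted above follow from simple arithmetic, which can be reproduced with the short sketch below. The variable names are illustrative; the numerical inputs (16 kHz sampling rate, transform size 1024, 600 plotted lines, 155 Hz mean pitch) are taken from the text.

```python
fs = 16000.0   # sampling rate in Hz
L = 1024       # transform size, i.e. number of spectral lines per frame
n_lines = 600  # number of plotted lines
pitch = 155.0  # approximate mean pitch in Hz

resolution = (fs / 2.0) / L              # Hz per spectral line
bandwidth = n_lines * resolution         # Hz covered by the plotted lines
lines_per_harmonic = pitch / resolution  # spacing of pitch harmonics in lines

print(resolution, bandwidth, lines_per_harmonic)
```

The harmonics are thus spaced roughly 20 lines apart, so a pitch drift of only a few percent within the frame is enough to smear the higher harmonics into one another.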
Fig. 15b illustrates a frame of spectral lines originating from a time-warped modified discrete cosine transform according to the present invention. The same original male speech signal has been used as in Fig. 15a. The transform parameters are the same as for Fig. 15a, but the use of a time-warped transform adapted to the signal has a visibly dramatic effect on the spectral representation. The sparse and organized character of the signal in the time-warped transform domain yields a coding with much better rate distortion performance, even when the cost of coding the additional warp data is taken into account.
As already mentioned, transmission of warp parameters instead of transmission of pitch or speed information has the great advantage of decreasing the additional required bit rate dramatically. Therefore, in the following paragraphs, several inventive schemes of transmitting the required warp parameter information are detailed.
For a signal with warp a(t) at time t, the optimal choice of normalized warp map sequence ψk for the local cosine bases (see (8), (12)) is obtained by solving
However, the amount of information required to describe this warp map sequence is too large, and the definition and measurement of pointwise values of a(t) is difficult. For practical purposes, a warp update interval Δt is decided upon and each warp map ψk is described by N = 1/Δt parameters. A warp update interval of around 10-20 ms is typically sufficient for speech signals. Similarly to the construction in (9) of ϕk from ψk and ψk+1, a continuously differentiable normalized warp map can be pieced together from N normalized warp maps via suitable affine re-scaling operations. Prototype examples of normalized warp maps include a quadratic map, an exponential map and a rational Moebius map (25), where a is a warp parameter. Defining the warp of a map h(t) by h''/h', all three maps achieve warp equal to a at t = 1/2. The exponential map has constant warp in the whole interval 0 ≤ t ≤ 1, and for small values of a, the other two maps exhibit very small deviation from this constant value. For a given warp map applied in the decoder for the resampling (23), its inverse is required in the encoder for the resampling (18). A principal part of the effort for inversion originates from the inversion of the normalized warp maps. The inversion of a quadratic map requires square root operations, the inversion of an exponential map requires a logarithm, and the inverse of the rational Moebius map is a Moebius map with negated warp parameter. Since exponential functions and divisions are comparatively expensive, a focus on maximum ease of computation in the decoder leads to the preferred choice of a piecewise quadratic warp map sequence ψk.
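The stated properties of the three prototype maps can be checked numerically. The sketch below is not a reproduction of the original formulas. It assumes that a normalized warp map sends the interval [0, 1] onto itself with h(0) = 0 and h(1) = 1, and it uses the quadratic, exponential and Moebius forms that follow from this assumption together with the requirement that the warp h''/h' equals a at t = 1/2. All function names are hypothetical, and the parameter range |a| < 2 is assumed so that the quadratic map stays strictly increasing.

```python
import math

def quadratic_map(t, a):
    """Assumed form h(t) = t + (a/2)(t^2 - t); increasing on [0, 1] for |a| < 2."""
    return t + 0.5 * a * (t * t - t)

def exponential_map(t, a):
    """Assumed form h(t) = (exp(a t) - 1) / (exp(a) - 1); constant warp a."""
    return t if a == 0.0 else math.expm1(a * t) / math.expm1(a)

def moebius_map(t, a):
    """Assumed form h(t) = t / (1 + c (1 - t)) with c = 2a / (4 - a)."""
    c = 2.0 * a / (4.0 - a)
    return t / (1.0 + c * (1.0 - t))

def quadratic_inverse(s, a):
    """Inverse of quadratic_map: needs a square root."""
    if a == 0.0:
        return s
    b = 1.0 - 0.5 * a
    return (-b + math.sqrt(b * b + 2.0 * a * s)) / a

def exponential_inverse(s, a):
    """Inverse of exponential_map: needs a logarithm."""
    return s if a == 0.0 else math.log1p(s * math.expm1(a)) / a

def moebius_inverse(s, a):
    """Inverse of moebius_map: the same map with negated warp parameter."""
    return moebius_map(s, -a)

def warp(h, t, a, eps=1e-4):
    """Numerical estimate of h''/h' at t using central differences."""
    first = (h(t + eps, a) - h(t - eps, a)) / (2.0 * eps)
    second = (h(t + eps, a) - 2.0 * h(t, a) + h(t - eps, a)) / (eps * eps)
    return second / first

if __name__ == "__main__":
    a = 0.3   # hypothetical warp parameter
    for h in (quadratic_map, exponential_map, moebius_map):
        print(h.__name__, warp(h, 0.5, a), warp(h, 0.9, a))
```

Running the script prints, for each map, the estimated warp at t = 1/2 and at t = 0.9. Under the stated assumptions the first value is close to a for all three maps, and only the exponential map keeps that value away from the midpoint.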
The normalized warp map ψk is then fully defined by N warp parameters ak(0), ak(1), ..., ak(N-1) by the requirements that it
- is a normalized warp map;
- is pieced together by rescaled copies of one of the smooth prototype warp maps (25);
- is continuously differentiable;
- satisfies
The present invention teaches that the warp parameters can be linearly quantized, typically to a step size of around 0.5 Hz. The resulting integer values are then coded. Alternatively, the derivative of the warp map ψk can be interpreted as a normalized pitch curve whose values are quantized to a fixed step size, typically 0.005. In this case, the resulting integer values are further difference coded, sequentially or in a hierarchical manner. In both cases, the resulting side information bitrate is typically a few hundred bits per second, which is only a fraction of the rate required to describe pitch data in a speech codec.
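The second variant, uniform quantization followed by sequential difference coding, can be sketched as follows. This is an illustration under assumptions: the sample values of the normalized pitch curve are invented, the rounding rule (round half up) is one possible choice, and the subsequent coding of the integers into bits is not shown. The function names are hypothetical.

```python
import math

def quantize(values, step):
    """Uniform quantization to integer indices (round half up)."""
    return [math.floor(v / step + 0.5) for v in values]

def dequantize(indices, step):
    """Reconstruct values from integer indices."""
    return [i * step for i in indices]

def difference_encode(indices):
    """Sequential difference coding: first index, then successive differences."""
    deltas, previous = [], 0
    for index in indices:
        deltas.append(index - previous)
        previous = index
    return deltas

def difference_decode(deltas):
    """Inverse of difference_encode (running sum)."""
    indices, total = [], 0
    for delta in deltas:
        total += delta
        indices.append(total)
    return indices

STEP = 0.005                                      # step size quoted in the text
pitch_curve = [1.000, 1.004, 1.011, 1.019, 1.024] # hypothetical normalized pitch values

indices = quantize(pitch_curve, STEP)             # [200, 201, 202, 204, 205]
deltas = difference_encode(indices)               # [200, 1, 1, 2, 1]
decoded = dequantize(difference_decode(deltas), STEP)
```

For a slowly varying pitch curve the differences stay close to zero, so they can be represented with far fewer bits than the indices themselves. The reconstruction error of each value is bounded by half the step size.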
An encoder with large computational resources can determine the warp data sequence that optimally reduces the coding cost or maximizes a measure of sparsity of spectral lines. A less expensive procedure is to use well known methods for pitch tracking, resulting in a measured pitch function p(t), and to approximate the pitch curve with a piecewise linear function p0(t) in those intervals where the pitch track exists and does not exhibit large jumps in the pitch values. The estimated warp sequence is then given by formula (28) inside the pitch tracking intervals. Outside those intervals the warp is set to zero. Note that a systematic error in the pitch estimates, such as pitch period doubling, has very little effect on warp estimates.
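A minimal sketch of such a pitch-based warp estimate is given below. It rests on an assumption that is not confirmed by the text at hand: the warp is taken to be the relative slope p0'(t)/p0(t) of the piecewise linear pitch curve. This choice matches the definition of warp as h''/h' when h' is proportional to the pitch, and it is consistent with the remark that pitch period doubling barely affects the estimate. The function name, the use of `None` for untracked breakpoints, the jump threshold and the data are hypothetical. Time is measured in seconds here, not in the normalized units of the warp maps.

```python
def warp_from_pitch(times, pitches, max_relative_jump=0.2):
    """Per-segment warp estimate from a piecewise linear pitch curve.

    times:   breakpoint times in seconds (strictly increasing)
    pitches: pitch in Hz at each breakpoint, or None where no pitch was tracked
    Returns one value per segment: slope divided by mean pitch (assumed formula),
    or 0.0 where the track is missing or jumps by more than max_relative_jump.
    """
    warps = []
    for i in range(len(times) - 1):
        left, right = pitches[i], pitches[i + 1]
        if left is None or right is None:
            warps.append(0.0)            # outside the pitch tracking intervals
            continue
        if abs(right - left) > max_relative_jump * min(left, right):
            warps.append(0.0)            # large jump: treated as untracked
            continue
        slope = (right - left) / (times[i + 1] - times[i])
        warps.append(slope / (0.5 * (left + right)))
    return warps

times = [0.00, 0.02, 0.04, 0.06]               # hypothetical 20 ms update interval
pitch_track = [150.0, 153.0, 157.0, None]      # hypothetical pitch estimates in Hz
halved_track = [75.0, 76.5, 78.5, None]        # same track with period doubling

warps = warp_from_pitch(times, pitch_track)
warps_halved = warp_from_pitch(times, halved_track)
```

Both tracks give the same warp values, because scaling the pitch by a constant factor scales the slope and the mean by the same factor. This is the sense in which an octave error in the pitch tracker leaves the warp estimate unaffected under the assumed formula.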
As illustrated in Fig. 10, in an alternative embodiment not encompassed by the wording of the claims, the warp parameter sequence may be derived from the decoded transform domain data by a warp estimator. The principle is to compute a frequency domain pitch estimate for each frame of transform data or from pitches of subsequent decoded signal blocks. The warp information is then derived from a formula similar to formula 28.
The application of the inventive concept has mainly been described by applying the inventive time warping in a single audio channel scenario. The inventive concept is of course in no way limited to use within such a monophonic scenario. It may furthermore be extremely advantageous to use the high coding gain achievable by the inventive concept within multi-channel coding applications, where the single channel or the multiple channels to be transmitted may be coded using the inventive concept. Furthermore, warping could generally be defined as a transformation of the x-axis of an arbitrary function depending on x. Therefore, the inventive concept may also be applied to scenarios where functions or representations of signals are warped that do not explicitly depend on time. For example, warping of a frequency representation of a signal may also be implemented.
Furthermore, the inventive concept can also be advantageously applied to signals that are segmented with arbitrary segment length and not with equal length as described in the preceding paragraphs.
The use of the base functions and the discretization presented in the preceding paragraphs is furthermore to be understood as one advantageous example of applying the inventive concept. For other applications, different base functions as well as different discretizations may also be used.
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk, DVD or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine-readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
The scope of the present invention is defined by the appended claims.