This application is a divisional application of the invention patent application filed on September 20, 2004, Chinese application No. 200480035333.2 (international application No. PCT/US2004/031024), entitled "Method for making a window type decision based on MDCT data in audio encoding".
Detailed Description
In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings, in which like references indicate similar elements, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical, functional and other changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
Beginning with an overview of the operation of the invention, Fig. 1 illustrates one embodiment of an encoding system 100. The encoding system 100 conforms to the MPEG audio coding standards (e.g., the MPEG-2 AAC standard, the MPEG-4 AAC standard, etc.), collectively referred to herein as the MPEG standard. The encoding system 100 includes a filterbank module 102, coding tools 104, a psychoacoustic modeler 106, a quantization module 110, and a Huffman encoding module 114.
The filterbank module 102 receives an audio signal and performs a modified discrete cosine transform (MDCT) operation to map the audio signal into the frequency domain. The mapping is performed using either a long transform domain (defined by a long window), in which the signal to be analyzed is expanded in time for improved frequency resolution, or a short transform domain (defined by a short window), in which the signal to be analyzed is shortened in time for improved temporal resolution. The long window type is used when only a stationary signal is present, and the short window type is used when the signal changes rapidly. By choosing between these two types of operation according to the characteristics of the signal to be analyzed, it is possible to prevent the unpleasant noise called pre-echo, which would otherwise result from insufficient temporal resolution.
As will be discussed in more detail below, the filterbank module 102 is responsible for determining which window type to use and for generating the MDCT coefficients using the determined window type. In one embodiment, the filterbank module 102 may also be responsible for performing grouping when the short window type is used to generate the MDCT coefficients. Grouping reduces the amount of side information associated with short windows. Each group includes one or more successive short windows that share the same scale factors.
The coding tools 104 include a set of optional tools for spectral processing. For example, the coding tools may include a temporal noise shaping (TNS) tool and a prediction tool to perform predictive coding, as well as an intensity/coupling tool and a mid/side (M/S) stereo tool to perform stereo correlation coding.
The psychoacoustic modeler 106 analyzes the samples to determine an auditory masking curve. The auditory masking curve indicates the maximum amount of noise that can be injected into each respective sample without becoming audible. What is audible in this respect is based on psychoacoustic models of human hearing. The auditory masking curve serves as an estimate of the desired noise spectrum.
The quantization module 110 is responsible for selecting optimal scale factors for the frequency spectrum data. The scale factor selection process is based on the allowed distortion computed from the masking curve and on the allowable number of bits computed from the bit rate specified for the encoding. Once the optimal scale factors are selected, the quantization module 110 uses them to quantize the frequency spectrum data. The resulting quantized spectral coefficients are grouped into scale factor bands (SFBs). Each SFB includes coefficients that resulted from the use of the same scale factor.
The Huffman encoding module 114 is responsible for selecting an optimal Huffman codebook for each group of quantized spectral coefficients and for performing the Huffman encoding operation using the optimal Huffman codebook. The resulting variable-length codes (VLCs), the data identifying the codebook used in the encoding, the scale factors selected by the quantization module 110, and some other information are subsequently assembled into a bitstream.
In one embodiment, the filterbank module 102 includes a window type determiner 108, an MDCT coefficient calculator 112, and a short window grouping determiner 116. The window type determiner 108 is responsible for determining the window type to be used for the MDCT operation. In one embodiment, as will be discussed in more detail below, the determination is made using a window type decision method that favors the use of long windows.
The MDCT coefficient calculator 112 is responsible for computing the MDCT coefficients using the determined window type. In one embodiment, the MDCT coefficient calculator 112 first computes preliminary MDCT coefficients assuming the long window type. Then, if the window type determiner 108 determines that the window type to be used is not the long window type, the MDCT coefficient calculator 112 recomputes the MDCT coefficients using the determined window type. Otherwise, the preliminary MDCT coefficients do not need to be recomputed.
The short window grouping determiner 116 operates when the short window type is used and is responsible for defining how the short windows are grouped. In one embodiment, the short window grouping determiner 116 performs a preliminary grouping of the short windows into two groups based on the energy associated with each short window. As will be discussed in more detail below, if either of the two preliminary groups is too large, the large group is further partitioned into two or more groups.
Figs. 2-9 are flow diagrams of processes that may be performed by the filterbank module 102 of Fig. 1, according to various embodiments of the present invention. The processes may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine), or a combination of both. For software-implemented processes, the description of a flow diagram enables one skilled in the art to develop programs including instructions to carry out the processes on suitably configured computers (the processor of the computer executing the instructions from computer-readable media, including memory). The computer-executable instructions may be written in a computer programming language or may be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and can interface with a variety of operating systems. In addition, the embodiments of the present invention are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings described herein. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, process, application, module, logic, ...), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computer causes the processor of the computer to perform an action or produce a result. It will be appreciated that more or fewer operations may be incorporated into the processes illustrated in Figs. 2-9 without departing from the scope of the invention, and that no particular order is implied by the arrangement of blocks shown and described herein.
Fig. 2 is a flow diagram of one embodiment of a process 200 for performing MDCT on a frame of spectral data.
Referring to Fig. 2, processing logic begins by computing a set of preliminary MDCT coefficients for a current frame and a set of preliminary MDCT coefficients for a next frame (processing block 202). The computations are performed under the assumption that the window type of both the current frame and the next frame is the long window type. The computed preliminary MDCT coefficients of the current frame and the next frame are stored in a buffer. In one embodiment, the current frame and the next frame are two adjacent frames in a sequence of frames (also known as blocks) of samples produced along the time axis such that adjacent frames overlap each other (e.g., by 50%). The overlap suppresses distortion that would otherwise arise at the boundary between adjacent frames.
At processing block 204, processing logic uses the preliminary MDCT coefficients of the current frame and the preliminary MDCT coefficients of the next frame to determine the window type of the current frame. The window type determination is made using a window type decision method that favors the use of long windows. One embodiment of such a method is discussed in more detail below in conjunction with Fig. 3.
At decision box 206, processing logic determines whether the decided window type of the current frame is the long window type. If not, processing logic computes a final set of MDCT coefficients for the current frame using the decided window type (processing block 208). If so, processing logic treats the preliminary MDCT coefficients of the current frame as the final set of coefficients (processing block 210).
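The control flow of process 200 can be sketched in Python as follows. This is only an illustrative sketch: the function name `mdct_with_window_decision` and the two callables `mdct` and `decide` are hypothetical stand-ins for the MDCT coefficient calculator and the window type decision, which are supplied by the caller rather than implemented here.

```python
def mdct_with_window_decision(current, nxt, prev_type, mdct, decide):
    """Return (window_type, final_coefficients) for the current frame.

    mdct(frame, window_type) and decide(cur_coefs, next_coefs, prev_type)
    are hypothetical callables provided by the caller.
    """
    # Block 202: preliminary coefficients, assuming long windows for both frames.
    prelim_current = mdct(current, "long")
    prelim_next = mdct(nxt, "long")

    # Block 204: decide the window type from the preliminary MDCT data.
    window_type = decide(prelim_current, prelim_next, prev_type)

    # Decision box 206: reuse the preliminary result if a long window was chosen.
    if window_type == "long":
        return window_type, prelim_current      # block 210
    return window_type, mdct(current, window_type)  # block 208
```

The point of this arrangement is that the transform is recomputed only when the decision departs from the long window assumption.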
Fig. 3 is a flow diagram of one embodiment of a window type decision process 300.
Referring to Fig. 3, processing logic begins by determining whether there is an indication of a transition from a stationary signal to a transient signal in the next frame (decision box 302). In one embodiment, this determination is made by comparing the energy associated with the current frame with the energy associated with the next frame. One embodiment of a process for detecting a transition from a stationary signal to a transient signal in a frame is discussed in more detail below in conjunction with Fig. 4.
If the determination made at decision box 302 is positive, processing logic decides that the preliminary window type of the next frame is the short window type (processing block 304). Otherwise, processing logic decides that the preliminary window type of the next frame is the long window type (processing block 306).
Next, processing logic determines the window type of the current frame based on the preliminary window type of the next frame and the window type of the previous frame (processing block 308). The determination of the window type of the current frame favors the long window type. In one embodiment, in which each distinct window type can be followed by two transitional window types as defined by the MPEG standard, processing logic selects the window type that minimizes the use of short windows in the current frame and in subsequent frames. That is, the MPEG standard provides two transitional window types from each distinct window type, with one transitional window type allowing the use of short windows in the current frame or the next frame, and the other allowing the use of long windows in the current frame or the next frame. Specifically, the MPEG standard allows the following transitions:
a. from the long window type to either the long window type or the long-short window type;
b. from the long-short window type to either the short window type or the short-long window type;
c. from the short-long window type to either the long window type or the long-short window type; and
d. from the short window type to either the short window type or the short-long window type.
Hence, if, for example, the window type of the previous frame is the short-long window type and the preliminary window type of the next frame is the long window type, processing logic selects the long window type for the current frame rather than the other option, the long-short window type, which would facilitate the use of short windows in the next frame.
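The four permitted transitions listed above can be captured in a small lookup table. The following Python sketch is illustrative only; the string labels and the names `ALLOWED_NEXT` and `is_valid_sequence` are hypothetical and do not come from the MPEG standard.

```python
# Window type that may follow each window type, per transitions a-d above.
ALLOWED_NEXT = {
    "long": {"long", "long_short"},
    "long_short": {"short", "short_long"},
    "short_long": {"long", "long_short"},
    "short": {"short", "short_long"},
}


def is_valid_sequence(window_types):
    """Check that every consecutive pair of window types is a permitted transition."""
    return all(b in ALLOWED_NEXT[a] for a, b in zip(window_types, window_types[1:]))
```

For instance, a long frame cannot be followed directly by a short frame; a long-short transitional frame is needed in between.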
One embodiment of a process for determining the window type of the current frame based on the preliminary window type of the next frame and the window type of the previous frame is discussed in more detail below in conjunction with Fig. 5.
The window type decision method described above is combined with the MDCT computation: it operates directly on the MDCT data and requires neither a fast Fourier transform (FFT) operation nor a perceptual entropy computation. In addition, the method favors long windows, thereby minimizing the use of short windows. It uses short windows only when an indication of a transition from a stationary signal to a transient signal is detected.
Fig. 4 is a flow diagram of one embodiment of a process 400 for detecting an indication of a transition from a stationary signal to a transient signal in a frame.
Referring to Fig. 4, processing logic begins by computing a set of preliminary MDCT coefficients for the current frame and a set of preliminary MDCT coefficients for the next frame (processing block 402). Processing logic then stores the computed sets of MDCT coefficients in a buffer.
At processing block 404, processing logic computes the total energy of the current frame using the computed preliminary MDCT coefficients of the current frame. In one embodiment, the total energy of the current frame is computed as follows:
current_total_energy = sum(current_coef[i] * current_coef[i] / C) for i = 0 to 1023,
where current_coef[i] is the value of the i-th MDCT coefficient in the current frame, and C is a constant used to prevent the summation from overflowing (e.g., C = 32767 for a 16-bit register).
At processing block 406, processing logic computes the total energy of the next frame using the computed preliminary MDCT coefficients of the next frame. Similarly, the total energy of the next frame is computed as follows:
next_total_energy = sum(next_coef[i] * next_coef[i] / C) for i = 0 to 1023,
where next_coef[i] is the value of the i-th MDCT coefficient in the next frame, and C is a constant used to prevent the summation from overflowing.
At processing block 408, processing logic scales the total energy of the current frame and the total energy of the next frame logarithmically. In one embodiment, the scaling is done as follows:
c_pow = log(current_total_energy) and n_pow = log(next_total_energy).
At processing block 410, processing logic computes the gradient energy by subtracting the scaled total energy of the current frame from the scaled total energy of the next frame.
At decision box 412, processing logic determines whether the gradient energy exceeds a threshold value (e.g., 1). In one embodiment, the threshold value is defined experimentally. If the determination made at decision box 412 is positive, processing logic decides that a transition to a transient signal is likely to occur in the next frame (processing block 414).
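The energy comparison of process 400 can be sketched in Python as follows. The sketch makes several assumptions that are not in the description: the logarithm is taken to be the natural logarithm, a small `floor` value guards against taking the logarithm of zero for a silent frame, and the names `total_energy` and `transition_likely` are hypothetical.

```python
import math

C = 32767        # overflow-guard constant; the 16-bit example value from the text
THRESHOLD = 1.0  # example threshold from the text; defined experimentally in practice


def total_energy(coefs, c=C):
    """Sum of squared MDCT coefficients, each term scaled down by c."""
    return sum(x * x / c for x in coefs)


def transition_likely(current_coefs, next_coefs, threshold=THRESHOLD, floor=1e-12):
    """Return True if the log-energy gradient suggests a transient in the next frame."""
    # floor is an assumption of this sketch: it avoids log(0) on silent frames.
    c_pow = math.log(max(total_energy(current_coefs), floor))
    n_pow = math.log(max(total_energy(next_coefs), floor))
    gradient_energy = n_pow - c_pow
    return gradient_energy > threshold
```

Because the comparison is made on logarithms, the test responds to the ratio of the two frame energies rather than their absolute difference.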
Fig. 5 is a flow diagram of one embodiment of a process 500 for determining the window type of the current frame based on the preliminary window type of the next frame and the window type of the previous frame.
Referring to Fig. 5, processing logic begins by determining whether the preliminary window type of the next frame is the long window type (decision box 502). If so, processing logic further determines whether the window type of the previous frame is either the long window type or the short-long window type (decision box 504). If so, processing logic decides that the window type of the current frame is the long window type (processing block 506). If not, processing logic decides that the window type of the current frame is the short-long window type (processing block 508).
If the determination made at decision box 502 is negative, i.e., the preliminary window type of the next frame is the short window type, processing logic further determines whether the window type of the previous frame is either the long window type or the short-long window type (decision box 510). If so, processing logic decides that the window type of the current frame is the long-short window type (processing block 512). If not, processing logic decides that the window type of the current frame is the short window type (processing block 514).
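The decision tree of process 500 reduces to two tests, which the following Python sketch makes explicit. The string labels and the function name `current_window_type` are hypothetical and are used only for illustration.

```python
LONG, LONG_SHORT, SHORT, SHORT_LONG = "long", "long_short", "short", "short_long"


def current_window_type(next_preliminary, previous):
    """Pick the current frame's window type (decision boxes 502, 504 and 510)."""
    # Both the long and the short-long types leave the encoder in a "long" state.
    previous_ends_long = previous in (LONG, SHORT_LONG)
    if next_preliminary == LONG:
        return LONG if previous_ends_long else SHORT_LONG   # blocks 506 / 508
    return LONG_SHORT if previous_ends_long else SHORT      # blocks 512 / 514
```

Each of the four outcomes is one of the two transitions permitted after the previous frame's window type, so the sequence of decided window types stays within the transitions listed for process 300.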
In one embodiment, if it is decided that a frame is to use the short window type, short window grouping is used to reduce the amount of side information associated with short windows. Each group includes one or more successive short windows that share the same scale factors. In one embodiment, the information about grouping is contained in a designated bitstream element. In one embodiment, the information about grouping includes the number of groups within a frame and the number of short windows in each frame.
Fig. 6 is a flow diagram of one embodiment of a process 600 for grouping the short windows within a frame.
Referring to Fig. 6, processing logic begins by identifying short windows of a first type and short windows of a second type within the frame (processing block 602). The type of a short window is determined based on the energy associated with that window. One embodiment of a process for determining the type of a short window is discussed in more detail below in conjunction with Fig. 7.
At processing block 604, processing logic adjusts the type of any short window whose classification is likely to be incorrect. In one embodiment, the classification of a short window is likely to be incorrect if its type does not match the type of its adjacent windows while the adjacent windows are of the same type. In one embodiment, in which the number of short windows in a frame is equal to 8, the adjustment process can be expressed as follows:
for win_index = 1 to 6
if (candidate[win_index-1] == candidate[win_index+1])
candidate[win_index] = candidate[win_index-1];
where win_index denotes the number of a short window within the frame, and candidate[win_index], candidate[win_index-1] and candidate[win_index+1] denote the types of the current window, the previous window and the next window, respectively.
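The adjustment loop can be written as a short runnable Python function. This is a sketch: the function name `smooth_types` is hypothetical, window types are represented as 0 and 1 for illustration, and the function returns a copy instead of modifying its argument. As in the pseudocode, the pass works in place from left to right, so a corrected window influences the test applied to the windows after it.

```python
def smooth_types(candidate):
    """Overwrite an interior window's type when both neighbours agree with each other."""
    out = list(candidate)  # work on a copy so the caller's list is left untouched
    # For 8 short windows this visits indices 1 to 6, as in the description.
    for win_index in range(1, len(out) - 1):
        if out[win_index - 1] == out[win_index + 1]:
            out[win_index] = out[win_index - 1]
    return out
```

For example, `smooth_types([1, 1, 0, 1, 1, 0, 0, 0])` returns `[1, 1, 1, 1, 1, 0, 0, 0]`: the isolated window at index 2 takes the type of its two neighbours.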
At processing block 606, processing logic groups the short windows within the frame into two preliminary groups based on their types. One embodiment of a process for creating two preliminary groups of short windows is discussed in more detail below in conjunction with Fig. 8.
At decision box 608, processing logic determines whether the number of short windows in either preliminary group exceeds a threshold number. In one embodiment, the threshold number is a constant determined experimentally. Depending on the threshold number, neither group may be too large, or one or both preliminary groups may be too large. In another embodiment, the threshold number is the number of short windows in the other preliminary group, and processing logic decides that a preliminary group exceeds the threshold if it contains more short windows than the other preliminary group. When this comparison is used, either no preliminary group is too large or exactly one preliminary group is too large. When a group is too large, it is likely to combine short windows with different characteristics. The use of a common scale factor for such a group may then degrade sound quality.
If processing logic determines at decision box 608 that either of the two preliminary groups is too large, processing logic further partitions the large preliminary group into two or more final groups (processing block 610). The final grouping is done in such a way that the number of groups balances coding efficiency against sound quality. One embodiment of a process for performing the final grouping of short windows is described in more detail below in conjunction with Fig. 9.
At processing block 612, processing logic determines the number of groups within the frame and the number of short windows in each group based on the final grouping.
Fig. 7 is a flow diagram of one embodiment of a process 700 for determining the type of a short window.
Referring to Fig. 7, processing logic begins by computing the energy of each short window within the frame (processing block 702). In one embodiment, the energy of each short window is computed as follows:
win_energy[win_index] = log[sum(coef[i] * coef[i]) + 0.5], where win_index identifies the number of the current short window within the frame, win_energy is the resulting energy, and coef[i] is the i-th spectral coefficient within the short window.
Next, processing logic finds the short window with the minimum energy (processing block 704) and computes an offset energy value for each short window within the frame (processing block 706). In one embodiment, an offset energy value is computed by subtracting the minimum energy from the energy of the corresponding short window.
At processing block 708, processing logic computes the mean offset energy value of the frame by dividing the sum of all the offset energy values within the frame by the number of short windows in the frame.
At decision box 710, processing logic determines, for the first short window, whether its offset energy value exceeds the mean offset energy value. If so, processing logic decides that the short window is of the first type (processing block 712). If not, processing logic decides that the short window is of the second type (processing block 714).
Next, processing logic determines whether there are more unprocessed windows in the frame (decision box 715). If so, processing logic moves to the next short window (processing block 716) and proceeds to decision box 710. If not, process 700 ends.
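The classification of process 700 can be sketched in Python as follows. The sketch assumes the natural logarithm and labels the first type as 1 and the second type as 0; these labels and the function name `classify_short_windows` are hypothetical.

```python
import math


def classify_short_windows(windows):
    """Return 1 (first type) or 0 (second type) for each short window.

    windows is a list of coefficient lists, one per short window in the frame.
    """
    # Block 702: log energy of each window; the 0.5 offset keeps log() defined.
    win_energy = [math.log(sum(c * c for c in coefs) + 0.5) for coefs in windows]
    # Blocks 704-706: offset of each window's energy from the frame minimum.
    min_energy = min(win_energy)
    offset = [e - min_energy for e in win_energy]
    # Block 708: mean offset energy over all short windows in the frame.
    mean_offset = sum(offset) / len(offset)
    # Decision box 710: the first type is strictly above the mean offset.
    return [1 if o > mean_offset else 0 for o in offset]
```

Since the comparison is strict, a frame whose short windows all have the same energy is classified entirely as the second type.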
Fig. 8 is a flow diagram of one embodiment of a process 800 for creating two preliminary groups of short windows.
Referring to Fig. 8, processing logic begins by initializing a set of variables (processing block 802). For example, processing logic may set the value of a previous window type variable to the type of the first short window, the value of a preliminary number of groups variable to 1, and the value of a first preliminary group length variable to 1.
Next, processing logic starts processing the short windows, beginning with the second short window in the frame. Specifically, processing logic determines whether the type of the current short window is the same as the type of the first short window (decision box 804). If so, processing logic increments the length of the first preliminary group by 1 (processing block 806) and checks whether more short windows remain unprocessed (decision box 808). If more short windows remain unprocessed, processing logic moves to the next short window (processing block 810) and returns to decision box 804. If no unprocessed short windows remain, process 800 ends.
If processing logic determines at decision box 804 that the type of the current short window differs from the type of the first short window, processing logic sets the preliminary number of groups to 2 (processing block 812) and computes the length of the second preliminary group by subtracting the length of the first preliminary group from the total number of short windows (processing block 814).
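Process 800 amounts to measuring the leading run of windows that share the first window's type. A minimal Python sketch follows; the function name `preliminary_groups` and the list-of-lengths return value are hypothetical conveniences.

```python
def preliminary_groups(types):
    """Return the preliminary group lengths: one entry, or two if the type changes."""
    first_length = 1  # block 802: the first window opens the first group
    for window_type in types[1:]:
        if window_type != types[0]:
            # Blocks 812-814: everything from the first mismatch onward forms
            # the second group, whatever the types of the later windows are.
            return [first_length, len(types) - first_length]
        first_length += 1  # block 806
    return [first_length]  # every window matched the first one
```

Note that the second preliminary group may contain windows of both types, as it does for the types 1, 1, 1, 0, 0, 0, 1, 1, which yield lengths 3 and 5.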
Fig. 9 is a flow diagram of one embodiment of a process 900 for performing the final grouping of short windows. Process 900 operates in accordance with the MPEG standard, under which the number of short windows in a frame is equal to 8.
Referring to Fig. 9, processing logic begins by determining whether the length of the first preliminary group exceeds a threshold (e.g., 4) (decision box 902). If so, processing logic further determines whether the length of the first preliminary group is equal to 8 (decision box 904). If so, processing logic sets the number of final groups to 2, sets the length of the first final group to the length of the first preliminary group, and sets the length of the second final group to the length of the second preliminary group (processing block 906). If not, processing logic sets the number of final groups to 3 (processing block 908), sets the length of the third final group to the length of the second preliminary group (processing block 910), computes the length of the second final group by dividing the length of the second preliminary group by two (this computation can be expressed as window_group_length[1]>>1) (processing block 912), and computes the length of the first final group by subtracting the length of the second final group from the length of the first preliminary group (processing block 914).
If processing logic determines at decision box 902 that the length of the first preliminary group does not exceed the threshold, it further determines whether the length of the first preliminary group is less than the threshold (decision box 916). If so, processing logic sets the number of final groups to 3 (processing block 917), computes the length of the third final group by dividing the length of the second preliminary group by two (this computation can be expressed as window_group_length[2]>>1) (processing block 918), computes the length of the second final group by subtracting the length of the third final group from the length of the second preliminary group (processing block 920), and sets the length of the first final group to the length of the first preliminary group (processing block 922).
If processing logic determines at decision box 916 that the length of the first preliminary group is not less than the threshold, it sets the number of groups to 2, sets the length of the first final group to the length of the first preliminary group, and sets the length of the second final group to the length of the second preliminary group (processing block 924).
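The final grouping of Fig. 9 can be sketched in Python as follows. This is an illustrative sketch, not the claimed implementation, and the function name `final_groups` and its parameters are hypothetical. One assumption deserves emphasis: for processing block 912 the sketch halves the first preliminary group, reading window_group_length[1] as the first group by analogy with window_group_length[2] denoting the second group in block 918. Halving the second preliminary group at that step could produce an empty group (for example, with lengths 7 and 1).

```python
def final_groups(first_len, second_len, threshold=4, total=8):
    """Return the final group lengths for two preliminary group lengths."""
    if first_len > threshold:                      # decision box 902
        if first_len == total:                     # decision box 904
            # Block 906, transcribed literally: second_len is 0 in this case.
            return [first_len, second_len]
        # Blocks 908-914. ASSUMPTION: the first preliminary group is the one
        # that is halved (see the note above).
        second_final = first_len >> 1
        return [first_len - second_final, second_final, second_len]
    if first_len < threshold:                      # decision box 916
        third_final = second_len >> 1              # block 918
        return [first_len, second_len - third_final, third_final]  # blocks 920-922
    return [first_len, second_len]                 # block 924
```

With preliminary lengths 3 and 5 and a threshold of 4, the sketch returns 3, 3 and 2, so the oversized second group is split roughly in half.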
Fig. 10 illustrates an example of the grouping of the short windows of a frame.
Referring to Fig. 10, the types of the short windows being grouped are shown as grouping_bits "11100011". The types of the short windows may be determined by process 700 of Fig. 7. Based on these types, the short windows may first be grouped into two preliminary groups by process 800 of Fig. 8, creating a first preliminary group with 3 short windows and a second preliminary group with 5 short windows. Next, process 900 of Fig. 9 may be performed with a threshold number of 4 to further partition the second preliminary group into two groups. As a result, three final groups are created, in which the first final group has 3 short windows, the second final group has 3 short windows, and the third final group has 2 short windows.
The following description of Fig. 11 is intended to provide an overview of computer hardware and other operating components suitable for implementing the invention, but is not intended to limit the applicable environments. Fig. 11 illustrates one embodiment of a computer system suitable for use as the encoding system 100 or as just the filterbank module 102 of Fig. 1.
The computer system 1140 includes a processor 1150, memory 1155 and input/output capability 1160 coupled to a system bus 1165. The memory 1155 is configured to store instructions which, when executed by the processor 1150, perform the methods described herein. Input/output 1160 also encompasses various types of computer-readable media, including any type of storage device that is accessible by the processor 1150. One of skill in the art will immediately recognize that the term "computer-readable medium/media" further encompasses a carrier wave that encodes a data signal. It will also be appreciated that the system 1140 is controlled by operating system software executing in memory 1155. Input/output and related media 1160 store the computer-executable instructions for the operating system and for the methods of the present invention. The filterbank module 102 shown in Fig. 1 may be a separate component coupled to the processor 1150, or may be embodied in computer-executable instructions executed by the processor 1150. In one embodiment, the computer system 1140 may be part of, or coupled to, an ISP (Internet Service Provider) through input/output 1160 to transmit or receive image data over the Internet. It is readily apparent that the present invention is not limited to Internet access and Internet web-based sites; directly coupled and private networks are also contemplated.
It will be appreciated that the computer system 1140 is one example of many possible computer systems that have different architectures. A typical computer system will usually include at least a processor, memory, and a bus coupling the memory to the processor. One of skill in the art will immediately appreciate that the invention can be practiced with other computer system configurations, including multiprocessor systems, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
Various aspects of making a window type decision in audio encoding have been described. Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art will appreciate that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiments shown. This application is intended to cover any adaptations or variations of the present invention.