HK1123624B

Info

Publication number: HK1123624B
Application number: HK09101393.2A
Authority: HK
Inventors: 房熙锡; 吴贤午; 金东秀; 林宰显; 郑亮源
Original assignee: LG Electronics Inc. (LG电子株式会社)
Priority date: 2005-06-30
Filing date: 2006-06-30
Publication date: 2013-10-11

Description

Method and apparatus for encoding and decoding audio signal

Technical Field

The present invention relates to audio signal processing, and more particularly, to an apparatus for encoding and decoding an audio signal and method thereof.

Background

In general, an audio signal encoding apparatus does not compress each channel of a multi-channel audio signal separately; instead, it compresses the audio signal into a mono or stereo downmix signal. The audio signal encoding apparatus transfers the compressed downmix signal and the spatial information signal (or ancillary data signal) to the decoding apparatus, or stores them in a storage medium.

In this case, a spatial information signal extracted in a process of downmixing a multi-channel audio signal is used to restore an original multi-channel audio signal from a compressed downmix signal.

The spatial information signal includes a header and spatial information. The header includes configuration information, that is, information needed to interpret the spatial information.

The audio signal decoding apparatus decodes the spatial information using the configuration information contained in the header. The configuration information contained in the header is transferred to a decoding apparatus together with the spatial information or stored in a storage medium.

The audio signal encoding apparatus multiplexes the encoded downmix signal and the spatial information signal into one bitstream and then delivers the multiplexed signal to the decoding apparatus. Since the configuration information is generally invariant, a header containing the configuration information is inserted into the bitstream only once. Because the configuration information is transmitted only once, at the beginning of the audio signal, the audio signal decoding apparatus cannot decode the spatial information when the audio signal is reproduced from a random point in time, since the configuration information is not available there. That is, in broadcasting, VOD (video on demand), and the like, an audio signal is reproduced from a specific time point requested by a user rather than from its beginning, so the configuration information delivered at the start of the audio signal cannot be used, and the spatial information therefore cannot be decoded.

Disclosure of Invention

An object of the present invention is to provide a method and apparatus for encoding and decoding an audio signal, which allows the audio signal to be decoded by selectively including a header in one frame of a spatial information signal.

Another object of the present invention is to provide a method and apparatus for encoding and decoding an audio signal, which can decode the audio signal even if the audio signal is reproduced from a random point by an audio signal decoding apparatus by including a plurality of headers in a spatial information signal.

To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described herein, a method of decoding an audio signal according to the present invention includes: receiving an audio signal including a downmix signal and a spatial information signal; if a header is included in the spatial information signal, extracting configuration information from the header; extracting spatial information included in the spatial information signal; and converting the downmix signal into a multi-channel signal using the configuration information and the spatial information.

Drawings

Fig. 1 is a block diagram of an audio signal according to an embodiment of the present invention.

Fig. 2 is a block diagram of an audio signal according to another embodiment of the present invention.

Fig. 3 is a block diagram of an apparatus for decoding an audio signal according to an embodiment of the present invention.

Fig. 4 is a block diagram of an apparatus for decoding an audio signal according to another embodiment of the present invention.

Fig. 5 is a flowchart of a method of decoding an audio signal according to an embodiment of the present invention.

Fig. 6 is a flowchart of a method of decoding an audio signal according to another embodiment of the present invention.

Fig. 7 is a flowchart of a method of decoding an audio signal according to still another embodiment of the present invention.

Fig. 8 is a flowchart of a method of obtaining the position information characterizing quantity of a slot, that is, the number of bits allocated to its position information, according to an embodiment of the present invention.

Fig. 9 is a flowchart of a method of decoding an audio signal according to still another embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

For the understanding of the present invention, an apparatus and method for encoding an audio signal will be described before explaining an apparatus and method for decoding an audio signal. However, the decoding apparatus and method according to the present invention are not limited to the encoding apparatus and method hereinafter. Also, the present invention is applicable to an audio coding scheme for generating multi-channels using spatial information as well as MP3(MPEG 1/2-layer III) and AAC (advanced audio coding).

Fig. 1 is a block diagram of an audio signal transferred from an audio signal encoding apparatus to an audio signal decoding apparatus according to an embodiment of the present invention.

Referring to fig. 1, an audio signal includes an audio descriptor 101, a downmix signal 103, and a spatial information signal 105.

In the case of using an encoding scheme for reproducing an audio signal of a broadcast or the like, the audio signal includes auxiliary data in addition to the audio descriptor 101 and the downmix signal 103. In the present invention, the spatial information signal 105 is included as the auxiliary data. So that the audio signal decoding apparatus can learn the basic information of the audio codec without analyzing the audio signal, the audio signal optionally includes an audio descriptor 101. The audio descriptor 101 is composed of a small amount of basic information necessary for audio decoding, such as the transmission rate of the transmitted audio signal, the number of channels, the sampling frequency of the compressed data, an identifier indicating the currently used codec, and the like.

The audio signal decoding apparatus can know a type of codec used by the audio signal using the audio descriptor 101. Specifically, using the audio descriptor 101, the audio signal decoding apparatus can know whether the received audio signal is a signal for restoring multi-channels using the spatial information signal 105 and the downmix signal 103. In this case, the multi-channels include virtual three-dimensional surround in addition to the actual multi-channels. An audio signal having the spatial information signal 105 and the downmix signal 103 combined together can be heard through one or two channels by a virtual three-dimensional surround technology.

The location of the audio descriptor 101 is independent of the downmix signal 103 or the spatial information signal 105 included in the audio signal. For example, the audio descriptor 101 is located in a separate field describing the audio signal.

In case that a header is not provided to the downmix signal 103, the audio signal decoding apparatus can decode the downmix signal 103 using the audio descriptor 101.

The downmix signal 103 is a signal generated by downmixing multi-channels. The downmix signal 103 may be generated from a downmix unit (not shown in the drawings) included in an audio signal encoding apparatus (not shown in the drawings) or artificially.

The downmix signal 103 can be divided into a case of including a header and a case of not including a header.

In the case where the downmix signal 103 includes a header, the header is contained in each frame by a frame unit. In the case where the downmix signal 103 does not include a header, as mentioned in the foregoing description, the audio signal decoding apparatus decodes the downmix signal 103 using the audio descriptor 101. The downmix signal 103 may take the form of a header included in each frame or a form of not including a header. In addition, the downmix signal 103 is included in the audio signal in the same manner until the end of the content.

The spatial information signal 105 is also divided into a case of containing a header and spatial information and a case of containing only the spatial information without the header. The header of the spatial information signal 105 differs from the header of the downmix signal 103 in that it does not need to be inserted identically into each frame. Specifically, the spatial information signal 105 can use both frames including a header and frames not including a header. Most of the information contained in the header of the spatial information signal 105 is configuration information, which is used to interpret and decode the spatial information.

Fig. 2 is a block diagram of an audio signal transferred from an audio signal encoding apparatus to an audio signal decoding apparatus according to another embodiment of the present invention.

Referring to fig. 2, an audio signal includes a downmix signal 103 and a spatial information signal 105. Also, the audio signal exists in an ES (elementary stream) form in which frames are arranged.

The downmix signal 103 and the spatial information signal 105 are in some cases delivered to the audio signal decoding apparatus as independent ESs. Alternatively, as shown in fig. 2, the downmix signal 103 and the spatial information signal 105 are combined into one ES to be delivered to the audio signal decoding apparatus.

In the case where the downmix signal 103 and the spatial information signal 105 combined into one ES are transmitted to the audio signal decoding apparatus, the spatial information signal 105 is contained in the position of the ancillary data or the extension data of the downmix signal 103.

And, the audio signal may include signal identification information indicating whether the spatial information signal 105 is combined with the downmix signal 103.

One frame of the spatial information signal 105 is divided into a case of containing the header 201 and the spatial information 203 and a case of containing only the spatial information 203. Specifically, the spatial information signal 105 can use a frame including the header 201 together with a frame not including the header 201.

In the present invention, the header 201 is inserted into the spatial information signal 105 at least once. Specifically, the audio signal encoding apparatus may insert the header 201 into each frame in the spatial information signal 105, periodically insert the header 201 into each frame of a fixed interval in the spatial information signal 105, or insert the header 201 into each frame of a random interval in the spatial information signal 105 without being periodically inserted.
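
The three insertion strategies described above (every frame, a fixed interval, or a non-periodic random interval) can be illustrated with a minimal sketch. The function name `header_flags`, the policy strings, and the choice to always place a header in the first frame are assumptions made for illustration; the description only requires that the header 201 be inserted at least once.

```python
import random
from typing import List, Optional


def header_flags(num_frames: int, policy: str, interval: int = 4,
                 seed: Optional[int] = None) -> List[bool]:
    """Return one flag per frame: True if a header is inserted in that frame.

    policy is a hypothetical selector: "every", "periodic", or "random".
    """
    if num_frames <= 0:
        return []
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if policy == "every":
        flags = [True] * num_frames
    elif policy == "periodic":
        # A header in every `interval`-th frame, starting with frame 0.
        flags = [n % interval == 0 for n in range(num_frames)]
    elif policy == "random":
        # Non-periodic insertion; on average one header per `interval` frames.
        rng = random.Random(seed)
        flags = [rng.random() < 1.0 / interval for _ in range(num_frames)]
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Assumption: guarantee "at least once" by always marking the first frame.
    flags[0] = True
    return flags
```

A shorter interval lets a decoder that joins mid-stream recover the configuration sooner, at the cost of more header bits.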

The audio signal may include information indicating whether the header 201 is included in a frame (hereinafter, referred to as "header identification information").

In the case where the header 201 is contained in the spatial information signal 105, the audio signal decoding apparatus extracts the configuration information 205 from the header 201 and then decodes the spatial information 203 transmitted after the header 201 according to the configuration information 205. Since the header 201 carries the information needed to interpret and decode the spatial information 203, the header 201 is transmitted at an early stage of transmitting the audio signal.

In the case where the header 201 is not included in the spatial information signal 105, the audio signal decoding apparatus decodes the spatial information 203 using the header 201 transferred in the previous stage.

In the case where the header 201 is lost while the audio signal is transferred from the audio signal encoding apparatus to the audio signal decoding apparatus, or in the case where an audio signal transferred in a streaming format is decoded from its middle portion, as in broadcasting, the previously transmitted header 201 cannot be used. In this case, the audio signal decoding apparatus extracts the configuration information 205 from another header 201 inserted in the audio signal, rather than from the header 201 first inserted in the audio signal, and is then able to decode the audio signal using the extracted configuration information 205. The configuration information 205 extracted from this header 201 may be the same as or different from the configuration information 205 extracted from the header 201 transmitted at the previous stage.
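
The behavior described above, reusing the most recent header for frames without one and waiting for the next header when decoding starts mid-stream, can be sketched as follows. The `Frame` class and `decodable_frames` function are hypothetical illustrations, not part of the described apparatus, and the configuration is modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Frame:
    spatial: str                   # stand-in for the encoded spatial information 203
    config: Optional[dict] = None  # configuration information 205, if a header 201 is present


def decodable_frames(frames: Iterable[Frame]) -> Iterator[Tuple[dict, str]]:
    """Yield (configuration, spatial information) for each frame that can be decoded."""
    current: Optional[dict] = None
    for frame in frames:
        if frame.config is not None:
            # A header is present: use its configuration from this frame on.
            current = frame.config
        if current is None:
            # No header seen yet (for example, playback started mid-stream):
            # this frame's spatial information cannot be interpreted.
            continue
        yield current, frame.spatial
```

For example, with headers in frames 0 and 3 of a five-frame stream, starting at frame 1 skips frames 1 and 2 and resumes decoding at frame 3.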

If the header 201 is variable (that is, its contents are allowed to change during the stream), the configuration information 205 is extracted from the new header 201, the extracted configuration information 205 is decoded, and then the spatial information 203 transmitted after the header 201 is decoded. If the header 201 is invariable, it is determined whether the new header 201 is identical to the old header 201 that was previously transmitted. If the two headers 201 differ from each other, it is detected that an error has occurred in the audio signal on the audio signal transmission path.

The configuration information 205 extracted from the header 201 of the spatial information signal 105 is information that interprets the spatial information 203.

The spatial information signal 105 may include information (hereinafter, referred to as "time alignment information") for discriminating a delay difference between the two signals in generating multi-channels using the downmix signal 103 and the spatial information signal 105 by the audio signal decoding apparatus.

An audio signal transmitted from an audio signal encoding apparatus to an audio signal decoding apparatus is parsed by a demultiplexing unit (not shown in the drawing) and then separated into a downmix signal 103 and a spatial information signal 105.

The downmix signal 103 separated by the demultiplexing unit is decoded. The decoded downmix signal 103 is used together with the spatial information signal 105 to generate multi-channels. In combining the downmix signal 103 and the spatial information signal 105 to generate multi-channels, the audio signal decoding apparatus can adjust the synchronization between the two signals, the position of the start point for combining the two signals, and the like, using the time alignment information (not shown in the drawing) included in the configuration information 205 extracted from the header 201 of the spatial information signal 105.

The spatial information 203 contained in the spatial information signal 105 includes the position information 207 of the slot to which a parameter is applied. The spatial parameters (spatial cues) include CLD (channel level difference), indicating an energy difference between audio signals; ICC (inter-channel correlation), indicating closeness or similarity between audio signals; and CPC (channel prediction coefficient), indicating a coefficient for predicting an audio signal value from other signals. Hereinafter, each spatial cue or a bundle of spatial cues is referred to as a "parameter".

In the case where N parameters exist in one frame included in the spatial information signal 105, the N parameters are respectively applied to specific slot positions of the frames. If the information indicating to which slot of slots contained in one frame a parameter is applied is referred to as the position information 207 of the slot, the audio signal decoding apparatus decodes the spatial information 203 using the position information 207 of the slot to which the parameter is applied. In this case, the parameters are contained in the spatial information 203.

Fig. 3 is a block schematic diagram of an apparatus for decoding an audio signal according to an embodiment of the present invention.

Referring to fig. 3, an apparatus for decoding an audio signal according to an embodiment of the present invention includes a receiving unit 301 and an extracting unit 303.

The receiving unit 301 of the audio signal decoding apparatus receives the audio signal transmitted in ES form by the audio signal encoding apparatus via the input terminal IN1.

The audio signal received by the audio signal decoding apparatus includes an audio descriptor 101 and a downmix signal 103 and may further include a spatial information signal 105 as ancillary data or extension data.

The extracting unit 303 of the audio signal decoding apparatus extracts the configuration information 205 from the header 201 contained in the received audio signal and then outputs the extracted configuration information 205 via the output terminal OUT1.

The audio signal may include header identification information identifying whether the header 201 is included in one frame.

The audio signal decoding apparatus discriminates whether or not the header 201 is included in the frame using the header identification information included in the audio signal. If the header 201 is contained, the audio signal decoding apparatus extracts the configuration information 205 from the header 201. In the present invention, the spatial information signal 105 includes at least one header 201.

Referring to fig. 4, an apparatus for decoding an audio signal according to another embodiment of the present invention includes a receiving unit 301, a demultiplexing unit 401, a core decoding unit 403, a multi-channel generating unit 405, a spatial information decoding unit 407, and an extracting unit 303.

The receiving unit 301 of the audio signal decoding apparatus receives the audio signal transmitted in the form of a bitstream from the audio signal encoding apparatus via the input terminal IN2. Also, the receiving unit 301 sends the received audio signal to the demultiplexing unit 401.

The demultiplexing unit 401 separates the audio signal transmitted through the receiving unit 301 into an encoded downmix signal 103 and an encoded spatial information signal 105. The demultiplexing unit 401 transfers the encoded downmix signal 103 separated from the bitstream to the core decoding unit 403 and transfers the encoded spatial information signal 105 separated from the bitstream to the extracting unit 303.

The encoded downmix signal 103 is decoded by the core decoding unit 403 and then supplied to the multi-channel generating unit 405. The encoded spatial information signal 105 includes a header 201 and spatial information 203.

The extracting unit 303 extracts the configuration information 205 from the header 201 if the header 201 is included in the encoded spatial information signal 105. The extracting unit 303 can recognize the presence of the header 201 using the header identification information contained in the audio signal. Specifically, the header identification information indicates whether the header 201 is included in one frame included in the spatial information signal 105. The header identification information may indicate a frame order or a bit sequence of the audio signal, and if the header 201 is included in the frame, the audio signal includes the configuration information extracted from the header 201.

In the case where it is determined via the header identification information that the header 201 is contained in the frame, the extracting unit 303 extracts the configuration information 205 from the header 201 contained in the frame. The extracted configuration information 205 is then decoded.

The spatial information decoding unit 407 decodes the spatial information 203 included in the frame according to the decoded configuration information 205.

The multi-channel generating unit 405 generates a multi-channel signal using the decoded downmix signal 103 and the decoded spatial information 203 and then outputs the generated multi-channel signal via the output terminal OUT2.
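
The data flow through the units of fig. 4 can be summarized in a short sketch. Everything here is a hypothetical stand-in: the frame is modeled as a dictionary, and `core_decode`, `decode_spatial`, and `upmix` are placeholder callables for the core decoding unit 403, the spatial information decoding unit 407, and the multi-channel generating unit 405. It illustrates only the order of operations, not an actual codec.

```python
from typing import Any, Callable, Dict


def decode_frame(frame: Dict[str, Any], state: Dict[str, Any],
                 core_decode: Callable[[Any], Any],
                 decode_spatial: Callable[[Any, Any], Any],
                 upmix: Callable[[Any, Any], Any]) -> Any:
    """Run one frame through the fig. 4 units in order (illustrative only)."""
    # Demultiplexing unit 401: split the frame into its two encoded parts.
    encoded_downmix = frame["downmix"]
    spatial_signal = frame["spatial_signal"]

    # Core decoding unit 403: decode the downmix signal 103.
    downmix = core_decode(encoded_downmix)

    # Extracting unit 303: take configuration information 205 from the
    # header 201 if this frame carries one; otherwise keep the stored one.
    header = spatial_signal.get("header")
    if header is not None:
        state["config"] = header["config"]
    if state.get("config") is None:
        raise ValueError("no configuration information available yet")

    # Spatial information decoding unit 407.
    spatial = decode_spatial(spatial_signal["spatial"], state["config"])

    # Multi-channel generating unit 405.
    return upmix(downmix, spatial)
```

The `state` dictionary carries the most recently extracted configuration across calls, so a frame without a header can still be decoded.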

Referring to fig. 5, the audio signal decoding apparatus receives the spatial information signal 105 transmitted in the form of a bitstream by the audio signal encoding apparatus (S501).

As mentioned in the foregoing description, the spatial information signal 105 is divided into a case of being transmitted as an ES separate from the downmix signal 103 and a case of being transmitted in combination with the downmix signal 103.

The demultiplexing unit 401 of the audio signal decoding apparatus separates the received audio signal into an encoded downmix signal 103 and an encoded spatial information signal 105. The encoded spatial information signal 105 includes a header 201 and spatial information 203. If the header 201 is included in one frame of the spatial information signal 105, the audio signal decoding apparatus recognizes the header 201 (S503).

The audio signal decoding apparatus extracts the configuration information 205 from the header 201 (S505).

Also, the audio signal decoding apparatus decodes the spatial information 203 using the extracted configuration information 205 (S507).

Referring to fig. 6, the audio signal decoding apparatus receives the spatial information signal 105 transmitted in the form of a bitstream by the audio signal encoding apparatus (S501).

As mentioned in the foregoing description, the spatial information signal 105 is divided into a case of being transmitted as an ES separate from the downmix signal 103 and a case of being transmitted as being included in the ancillary data or the extension data of the downmix signal 103.

The demultiplexing unit 401 of the audio signal decoding apparatus separates the received audio signal into the encoded downmix signal 103 and the encoded spatial information signal 105. The encoded spatial information signal 105 includes a header 201 and spatial information 203. The audio signal decoding apparatus determines whether or not the header 201 is included in one frame (S601).

If the header 201 is contained in the frame, the audio signal decoding apparatus recognizes the header 201 (S503).

The audio signal decoding apparatus then extracts the configuration information 205 from the header 201 (S505).

The audio signal decoding apparatus judges whether the configuration information 205 extracted from the header 201 is the configuration information 205 extracted from the first header 201 included in the spatial information signal 105 (S603).

If the configuration information 205 is extracted from the header 201 first extracted from the audio signal, the audio signal decoding apparatus decodes the configuration information 205 (S611) and decodes the spatial information 203 transmitted after the configuration information 205 according to the decoded configuration information 205.

If the header 201 extracted from the audio signal is not the header 201 first extracted from the spatial information signal 105, the audio signal decoding apparatus determines whether the configuration information 205 extracted from the header 201 is the same as the configuration information extracted from the first header 201 (S605).

If the configuration information 205 is the same as the configuration information 205 extracted from the first header 201, the audio signal decoding apparatus decodes the spatial information 203 using the decoded configuration information 205 extracted from the first header 201.

If the extracted configuration information 205 is different from the configuration information 205 extracted from the first header 201, the audio signal decoding apparatus determines whether an error occurs in the audio signal on the transmission path from the audio signal encoding apparatus to the audio signal decoding apparatus (S607).

If the configuration information 205 is variable (that is, allowed to change), no error has occurred even if the configuration information 205 is different from the configuration information 205 extracted from the first header 201. Accordingly, the audio signal decoding apparatus updates the header 201 to the new header 201 (S609). The audio signal decoding apparatus then decodes the configuration information 205 extracted from the updated header 201 (S611).

The audio signal decoding apparatus decodes the spatial information 203 transmitted after the configuration information 205 according to the decoded configuration information 205.

If the configuration information 205 is invariable and yet differs from the configuration information extracted from the first header 201, this means that an error has occurred on the audio signal transmission path. Accordingly, the audio signal decoding apparatus removes the spatial information 203 contained in the frame including the erroneous configuration information 205 or corrects the error of the spatial information 203 (S613).
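
The decision flow of fig. 6 (steps S603 to S613) can be condensed into a small sketch. The names `Action` and `process_header` and the `variable` flag are assumptions introduced for illustration; in particular, how a decoder knows whether the configuration is allowed to change is not specified here, and the comparison is made against the currently stored configuration.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    DECODE_FIRST = "decode_first"              # first header seen (S603 -> S611)
    REUSE = "reuse"                            # same configuration as stored (S605)
    UPDATE = "update"                          # variable configuration changed (S609, S611)
    DISCARD_OR_CORRECT = "discard_or_correct"  # transmission error assumed (S613)


def process_header(stored: Optional[dict], incoming: dict,
                   variable: bool) -> Tuple[dict, Action]:
    """Return the configuration to use from now on and the action taken."""
    if stored is None:
        return incoming, Action.DECODE_FIRST
    if incoming == stored:
        return stored, Action.REUSE
    if variable:
        # A differing header is legitimate when the configuration may change.
        return incoming, Action.UPDATE
    # Invariable configuration that differs: treat as a transmission error
    # and keep the stored configuration.
    return stored, Action.DISCARD_OR_CORRECT
```

Keeping the stored configuration in the error case lets the decoder go on decoding later frames once an intact header arrives.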

Referring to fig. 7, the audio signal decoding apparatus receives the spatial information signal 105 transmitted in the form of a bitstream by the audio signal encoding apparatus (S501).

The demultiplexing unit 401 of the audio signal decoding apparatus separates the received audio signal into the encoded downmix signal 103 and the encoded spatial information signal 105. In this case, the position information 207 of the slot to which the parameter is applied is contained in the spatial information signal 105.

The audio signal decoding apparatus extracts the position information 207 of the time slot from the spatial information 203 (S701).

The audio signal decoding apparatus applies the parameter to the corresponding slot by adjusting the position of the slot to which the parameter is applied using the extracted position information of the slot (S703).
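
As a minimal sketch of step S703, the following maps each parameter to the slot named by its position information. The function name is hypothetical, slot positions are assumed to be 1-based, and slots that receive no parameter are simply left as `None`; any interpolation between parameter positions is outside what is described here.

```python
from typing import List, Optional, Sequence


def apply_parameters_to_slots(num_slots: int, positions: Sequence[int],
                              parameters: Sequence[float]) -> List[Optional[float]]:
    """Place each parameter at its 1-based slot position within one frame."""
    if len(positions) != len(parameters):
        raise ValueError("one position is required per parameter")
    slots: List[Optional[float]] = [None] * num_slots
    for pos, param in zip(positions, parameters):
        if not 1 <= pos <= num_slots:
            raise ValueError(f"slot position {pos} is outside the frame")
        slots[pos - 1] = param
    return slots
```

For a frame of 8 slots with parameters at positions 1, 4, and 8, only those three slots are filled.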

Fig. 8 is a flowchart of a method of obtaining the position information characterizing quantity of a time slot according to an embodiment of the invention. The position information characterizing quantity of a time slot is the number of bits allocated to represent the position information 207 of the time slot.

The position information characterizing quantity of the time slot to which the first parameter is applied can be found by subtracting the number of parameters from the number of time slots, adding 1 to the result, taking the base-2 logarithm of the sum, and applying the ceil function to the logarithm. Specifically, the position information characterizing quantity of the time slot to which the first parameter is applied may be defined as $\lceil \log_2(k - i + 1) \rceil$, where "k" and "i" are the number of slots and the number of parameters, respectively.

Assuming that "N" is a natural number, the position information characterizing quantity of the time slot to which the (N+1)th parameter is applied is expressed in terms of the position information 207 of the time slot to which the Nth parameter is applied. In this case, the position information 207 of the slot to which the Nth parameter is applied may be obtained by adding the number of slots existing between the slot to which the Nth parameter is applied and the slot to which the (N-1)th parameter is applied to the position information of the slot to which the (N-1)th parameter is applied, and then adding 1 to the sum (S801). Specifically, the position information of the slot to which the (N+1)th parameter is applied is given by $j(N+1) = j(N) + r(N+1) + 1$, where $j(N)$ is the position information of the slot to which the Nth parameter is applied and $r(N+1)$ denotes the number of slots existing between the slot to which the (N+1)th parameter is applied and the slot to which the Nth parameter is applied.
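
The recurrence $j(N+1) = j(N) + r(N+1) + 1$ can be checked with a short sketch that rebuilds all slot positions from the first position and the gaps between consecutive parameters. The function name and the 1-based position convention are assumptions for illustration.

```python
from typing import List, Sequence


def slot_positions(first_position: int, gaps: Sequence[int]) -> List[int]:
    """Apply j(N+1) = j(N) + r(N+1) + 1 for each gap r in turn."""
    positions = [first_position]
    for r in gaps:
        if r < 0:
            raise ValueError("the number of slots between parameters cannot be negative")
        positions.append(positions[-1] + r + 1)
    return positions
```

For instance, a first parameter at slot 1 followed by gaps of 2 and 3 slots gives positions 1, 4, and 8.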

Once the position information of the time slot to which the Nth parameter is applied is found, the position information characterizing quantity of the time slot to which the (N+1)th parameter is applied can be obtained. Specifically, the number of parameters applied to one frame and the position information of the slot to which the Nth parameter is applied are subtracted from the number of slots, and (N+1) is added to the result (S803). The position information characterizing quantity of the time slot to which the (N+1)th parameter is applied is then $\lceil \log_2(k - i + N + 1 - j(N)) \rceil$, where "k", "i", and "j(N)" are the number of time slots, the number of parameters, and the position information 207 of the time slot to which the Nth parameter is applied, respectively.

In the case where the position information characterizing quantity of the time slot is obtained in the above-described manner, the number of bits allocated for the time slot to which the (N+1)th parameter is applied varies inversely with "N": because j(N) grows by at least 1 each time N grows by 1, the number of bits never increases as N increases. That is, the position information characterizing quantity of the slot to which the parameter is applied is a variable value depending on "N".
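
The bit allocation $\lceil \log_2(k - i + N + 1 - j(N)) \rceil$ can be evaluated for every parameter of a frame with the sketch below. It assumes 1-based, strictly increasing slot positions and the convention $j(0) = 0$, under which the general formula reduces to $\lceil \log_2(k - i + 1) \rceil$ for the first parameter; the function name is hypothetical.

```python
from typing import List, Sequence


def position_bits(num_slots: int, positions: Sequence[int]) -> List[int]:
    """Number of bits for each parameter's slot position within one frame.

    positions holds the 1-based slot position j(1), j(2), ... of each parameter.
    """
    num_params = len(positions)
    bits = []
    prev = 0  # assumed convention: j(0) = 0
    for n, pos in enumerate(positions):  # n parameters are already placed
        # Candidate slots for the next parameter: k - i + n + 1 - j(n).
        choices = num_slots - num_params + n + 1 - prev
        if pos <= prev or pos > prev + choices:
            raise ValueError("positions must be increasing and leave room for later parameters")
        # (choices - 1).bit_length() equals ceil(log2(choices)) for choices >= 1
        # and avoids floating-point rounding.
        bits.append((choices - 1).bit_length())
        prev = pos
    return bits
```

With 8 slots and parameters at positions 1, 4, and 8, the numbers of candidate slots are 6, 6, and 4, so the positions take 3, 3, and 2 bits.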

Referring to fig. 9, the audio signal decoding apparatus receives an audio signal from the audio signal encoding apparatus (S901). The audio signal includes an audio descriptor 101, a downmix signal 103 and a spatial information signal 105.

The audio signal decoding apparatus extracts the audio descriptor 101 included in the audio signal (S903). The audio descriptor 101 includes an identifier indicating an audio codec.

The audio signal decoding apparatus recognizes that the audio signal includes the downmix signal 103 and the spatial information signal 105 using the audio descriptor 101. Specifically, the audio signal decoding apparatus can recognize that the transmitted audio signal is a signal for forming multi-channels using the spatial information signal 105 (S905).

In addition, the audio signal decoding apparatus converts the downmix signal 103 into a multi-channel signal using the spatial information signal 105. As mentioned in the foregoing description, the headers 201 are each contained in the spatial information signal 105 at predetermined intervals.

Industrial applications

As mentioned in the foregoing description, the method and apparatus for encoding and decoding an audio signal according to the present invention can selectively include a header in a spatial information signal.

In addition, in the case where a plurality of headers are included in the spatial information signal, the method and apparatus for encoding and decoding an audio signal according to the present invention can decode the spatial information even if the audio signal decoding apparatus reproduces the audio signal from a random point.

While the invention has been illustrated and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

1. A method of decoding an audio signal, comprising:

receiving an audio signal including a downmix signal and a spatial information signal, the audio signal further including header identification information;

acquiring the header identification information indicating whether a frame of the spatial information signal contains a header;

extracting configuration information from the header when the header identification information indicates that the frame of the spatial information signal includes the header, and extracting configuration information from a header transmitted in a previous stage when the header identification information indicates that the frame of the spatial information signal does not include the header;

extracting spatial information from the spatial information signal;

generating a multi-channel signal from the downmix signal based on the configuration information and the spatial information, and

wherein the configuration information includes time alignment information,

the time alignment information identifies a time delay between the spatial information signal and the downmix signal when the spatial information signal is embedded in the downmix signal.

2. The method of claim 1, further comprising:

applying the parameters contained in the spatial information signal to corresponding slots using position information, contained in the spatial information signal, indicating to which slot in a frame each parameter is to be applied.

3. The method of claim 1, wherein the audio signal includes signal identification information indicating whether the spatial information signal is combined with a downmix signal.

4. The method of claim 1, further comprising:

identifying the start position of the frame of the auxiliary signal using the time alignment information.

5. An apparatus for decoding an audio signal, comprising:

a receiving unit receiving an audio signal including a downmix signal and a spatial information signal, the audio signal further including header identification information indicating whether a frame of the spatial information signal has a corresponding header;

an extracting unit, wherein when the header identification information indicates that the frame of the spatial information signal includes a header, configuration information is extracted from the header, and when the header identification information indicates that the frame of the spatial information signal does not include a header, configuration information is extracted from a header transmitted in a previous stage;

a spatial information decoding unit extracting spatial information from the spatial information signal;

a multi-channel generating unit generating a multi-channel signal from the downmix signal based on the configuration information and the spatial information, and

wherein the configuration information includes time alignment information identifying a time delay between the spatial information signal and the downmix signal when the spatial information signal is embedded in the downmix signal.