Silence compression

From Wikipedia, the free encyclopedia

Computer technology

Silence compression is an audio processing technique used to efficiently encode silent intervals, reducing the amount of storage or bandwidth needed to store or transmit audio recordings.

Overview

Silence can be defined as audio segments with negligible sound. Examples of silence are pauses between words or sentences in speech and pauses between notes in music. Compressing the silent intervals makes audio files smaller and easier to handle, store, and send, ideally without degrading the audible portions of the recording. Techniques vary, but silence compression generally involves two steps: detecting the silent intervals and then compressing them. Applications of silence compression include telecommunications, audio streaming, voice recognition, audio archiving, and media production.^[1]

Techniques

1. Trimming

Trimming is a method of silence compression in which the silent intervals are removed altogether. It works by identifying audio intervals whose amplitude falls below a certain threshold, which indicates silence, and removing those intervals from the audio. A drawback of trimming is that it permanently changes the original audio and can cause noticeable artifacts when the audio is played back.^[1]

a. Amplitude Threshold Trimming

Amplitude threshold trimming removes silence by setting an amplitude threshold. Any audio segments that fall below this threshold are considered silent and are either truncated or removed entirely. Common amplitude threshold trimming approaches include:^[citation needed]

Fixed Threshold: In a fixed threshold approach, a static amplitude level is selected, and any audio segments that fall below it are removed. A drawback of this approach is that choosing an appropriate fixed threshold can be difficult because recording conditions and audio sources differ.^[citation needed]
Dynamic Threshold: In a dynamic threshold approach, an algorithm adjusts the threshold based on the characteristics of the audio. For example, the threshold can be set to a fraction of the average amplitude within a given window. This approach adapts better to varying audio sources but requires more processing.^[citation needed]
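The two approaches above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation, and rests on several assumptions. Samples are floats in [-1, 1]. Silence is judged per fixed-length frame by the frame's peak absolute amplitude. The function names and parameters (`frame_len`, `window_frames`, `fraction`, `floor`) are hypothetical and were chosen for illustration.

```python
from typing import List


def trim_fixed(samples: List[float], threshold: float, frame_len: int) -> List[float]:
    """Fixed threshold: drop frames whose peak amplitude is below a static level."""
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)
    return kept


def trim_dynamic(samples: List[float], frame_len: int, window_frames: int,
                 fraction: float, floor: float = 1e-4) -> List[float]:
    """Dynamic threshold: a fraction of the mean absolute amplitude in a window
    reaching window_frames frames on either side of the current frame.
    The (hypothetical) floor stops an all-silent window from giving a zero threshold."""
    kept: List[float] = []
    n_frames = (len(samples) + frame_len - 1) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        lo = max(0, (i - window_frames) * frame_len)
        hi = min(len(samples), (i + window_frames + 1) * frame_len)
        window = samples[lo:hi]
        threshold = max(floor, fraction * sum(abs(s) for s in window) / len(window))
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)
    return kept


signal = [0.0] * 4 + [0.5, -0.6, 0.4, -0.5] + [0.01, -0.01, 0.0, 0.02]
print(trim_fixed(signal, threshold=0.1, frame_len=4))                    # [0.5, -0.6, 0.4, -0.5]
print(trim_dynamic(signal, frame_len=4, window_frames=1, fraction=0.5))  # [0.5, -0.6, 0.4, -0.5]
```

The fixed version needs a hand-picked level. The dynamic version recomputes a threshold for every frame, which is the extra processing cost the text mentions.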

b. Energy-Based Trimming

Energy-based trimming works by analyzing the energy levels of an audio signal. The energy of an audio signal measures its magnitude over a short time interval. A common formula for the energy is $E=\sum_{k=1}^{N} x(k)^{2}$, where $E$ is the energy of the signal over the interval, $N$ is the number of samples in the interval, and $x(k)$ is the amplitude of the $k$th sample. Once the energy levels are calculated, a threshold is set: all intervals whose energy falls below it are considered silent and removed. Energy-based trimming can detect silence more accurately than amplitude-based trimming. It considers the overall power of the audio over an interval, not just the amplitude of the sound wave. Energy-based trimming is often used for voice and speech files, where only the portions that contain sound need to be stored and transmitted. Popular methods include Short-Time Energy (STE), often combined with the Zero Crossing Rate (ZCR), which measures how often the signal changes sign.^[2] These measures are also used in voice activity detection (VAD) to detect speech activity.^[1]^[3]
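Below is a minimal sketch of frame-wise short-time energy trimming using the formula above. It assumes float samples and non-overlapping frames of $N$ samples. The function names and the threshold value are illustrative and do not come from any particular library. A zero-crossing-rate helper is included because the two measures are often computed together. The trimming decision here uses energy alone.

```python
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """E = sum of x(k)^2 over the N samples of the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_trim(samples: List[float], frame_len: int, energy_threshold: float) -> List[float]:
    """Keep only frames whose short-time energy reaches the threshold."""
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if short_time_energy(frame) >= energy_threshold:
            kept.extend(frame)
    return kept


signal = [0.0] * 4 + [0.5, -0.6, 0.4, -0.5] + [0.01, -0.01, 0.0, 0.02]
print(energy_trim(signal, frame_len=4, energy_threshold=0.01))  # [0.5, -0.6, 0.4, -0.5]
```

Because $E$ grows with $N$, a suitable threshold depends on the frame length. Dividing by $N$ to obtain mean power is a common way to decouple the two.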

2. Silence Suppression

Silence suppression is a technique used in Voice over IP (VoIP) and audio streaming to optimize the rate of data transfer. By temporarily reducing the data sent during silent intervals, audio can be transmitted over the internet in real time more efficiently.^[1]^[3]

a. Discontinuous Transmission (DTX)

DTX optimizes bandwidth usage during real-time telecommunications by detecting silent intervals and suspending transmission during them. DTX algorithms continuously monitor the audio signal and detect silence based on predefined criteria. When silence is detected, the transmitter signals the receiver and stops sending audio data. When speech or sound resumes, audio transmission is reactivated. This keeps communication uninterrupted while using network resources very efficiently.^[1]^[3]
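The sender and receiver behavior described above can be simulated with a short sketch. It is a simplification under stated assumptions and is not the G.729 Annex B algorithm. Frames are lists of floats, and voice activity is decided by a hypothetical energy threshold. At the start of each silent run the sender emits one packet labeled `"SID"` (a hypothetical silence-descriptor label) and then sends nothing. The receiver fills untransmitted frames with zeros, where a real decoder would typically synthesize comfort noise.

```python
from typing import List, Tuple

Packet = Tuple[int, str, object]  # (frame index, kind, payload)


def dtx_send(frames: List[List[float]], energy_threshold: float) -> List[Packet]:
    """Transmit active frames; send one 'SID' notice per silent run, then nothing."""
    packets: List[Packet] = []
    in_silence = False
    for i, frame in enumerate(frames):
        energy = sum(x * x for x in frame)
        if energy >= energy_threshold:
            packets.append((i, "AUDIO", frame))
            in_silence = False
        elif not in_silence:
            # Tell the receiver that transmission is being suspended.
            packets.append((i, "SID", energy))
            in_silence = True
        # Otherwise the signal is still silent, so nothing is transmitted.
    return packets


def dtx_receive(packets: List[Packet], n_frames: int, frame_len: int) -> List[float]:
    """Rebuild the stream, filling untransmitted frames with silence (zeros)."""
    audio = {i: payload for i, kind, payload in packets if kind == "AUDIO"}
    out: List[float] = []
    for i in range(n_frames):
        out.extend(audio.get(i, [0.0] * frame_len))
    return out


frames = [[0.5, -0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.4, -0.4]]
packets = dtx_send(frames, energy_threshold=0.01)
print([kind for _, kind, _ in packets])  # ['AUDIO', 'SID', 'AUDIO']
```

Five frames become three packets here. The saving grows with the length of the silent run, because only its first frame costs a packet.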

3. Silence Encoding

Silence encoding represents silent intervals efficiently without removing the silence altogether. This minimizes the data needed to encode and transmit silence while preserving the integrity of the audio signal.^[4]^[5]^[6] Several encoding methods are used for this purpose:

a. Run-Length Encoding (RLE)

RLE detects runs of repeated identical samples in the audio and encodes them more compactly. Rather than storing each identical sample individually, RLE stores a single sample along with a count of how many times it repeats. RLE works well for encoding silence because silent intervals often consist of long runs of identical samples, such as digital zeros. Storing fewer samples in turn reduces the size of the audio data.^[4]^[5]
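Here is a minimal run-length encoder and decoder for integer PCM samples, written as an illustrative sketch with hypothetical function names. Runs are stored as `(value, count)` pairs, so a stretch of digital silence collapses to a single pair.

```python
from typing import List, Tuple


def rle_encode(samples: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of identical samples into (value, count) pairs."""
    runs: List[List[int]] = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([s, 1])       # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) pairs back into the original samples."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


pcm = [3, 0, 0, 0, 0, 0, -2, 0, 0, 0]
encoded = rle_encode(pcm)
print(encoded)  # [(3, 1), (0, 5), (-2, 1), (0, 3)]
```

The scheme is lossless, but isolated samples cost a full pair each. Recorded silence usually contains low-level noise, so consecutive samples are rarely identical. RLE therefore pays off mainly on digital silence, or after near-silent samples have been quantized to zero.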

b. Huffman Coding

Huffman coding is an entropy encoding method and variable-length code algorithm. It assigns shorter binary codes, which require fewer bits to store, to more common values. In silence compression, Huffman coding assigns shorter codes to frequently occurring silence patterns, reducing data size.^[5]^[6]
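The sketch below builds a Huffman code with Python's standard `heapq` module. It treats individual sample values as symbols. That is an assumption made for illustration; practical codecs may code other symbols, such as run lengths or silence patterns. In a mostly silent passage the value 0 dominates, so it receives the shortest code.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, Iterable


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Return a prefix-free bit-string code; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()                       # keeps heap comparisons off the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


samples = [0] * 8 + [5, -5, 5, 3]            # a mostly silent stretch
code = huffman_code(samples)
bits = "".join(code[s] for s in samples)
print(len(bits))  # 18
```

The 12 samples take 18 bits. A fixed-length code would need 2 bits for each of the four distinct values, or 24 bits in total.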

4. Differential Encoding

Differential encoding exploits the similarity between consecutive audio samples during silent intervals by storing only the difference between samples. It encodes the transitions between sound and silence efficiently and is useful for audio in which silence is interspersed with active sound.^[7]^[8]^[9] Differential encoding algorithms include:

a. Delta Modulation

Delta modulation quantizes and encodes the differences between consecutive audio samples, in effect encoding the derivative of the signal's amplitude. Because it stores how the audio signal changes over time rather than the samples themselves, the transition from silence to sound can be captured efficiently. Delta modulation typically uses one-bit quantization, where 1 indicates an increase in amplitude and 0 indicates a decrease. While this makes efficient use of bandwidth or storage, it cannot provide high-fidelity encoding of low-amplitude signals.^[8]
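The following is a minimal one-bit delta modulator and demodulator, sketched under simple assumptions. It uses a fixed step size (a hypothetical `step` parameter) and a running estimate that starts at zero. A 1 means "step up" and a 0 means "step down". Feeding it digital silence shows the low-amplitude limitation: the estimate cannot stay still, so the bits toggle 1, 0, 1, 0. The decoded signal then hovers one step above zero, an effect often called granular noise.

```python
from typing import List


def dm_encode(samples: List[float], step: float) -> List[int]:
    """Emit 1 if the input is at or above the running estimate, else 0."""
    bits: List[int] = []
    estimate = 0.0
    for x in samples:
        if x >= estimate:
            bits.append(1)
            estimate += step
        else:
            bits.append(0)
            estimate -= step
    return bits


def dm_decode(bits: List[int], step: float) -> List[float]:
    """Rebuild the staircase approximation by integrating +/- step."""
    out: List[float] = []
    estimate = 0.0
    for b in bits:
        estimate += step if b else -step
        out.append(estimate)
    return out


silence = [0.0] * 6
print(dm_encode(silence, step=0.1))                       # [1, 0, 1, 0, 1, 0]
print(dm_decode(dm_encode(silence, step=0.1), step=0.1))  # [0.1, 0.0, 0.1, 0.0, 0.1, 0.0]
```

A rising input produces a run of 1s instead. A larger `step` tracks steep transitions from silence to sound better but makes the idle oscillation larger, which is the trade-off inherent in a single fixed step.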

b. Delta-Sigma Modulation

Delta-sigma modulation is a more advanced variant of delta modulation that allows high-fidelity encoding of low-amplitude signals. It quantizes at a high oversampling rate and integrates the error between the input and the quantized output, which allows slight changes in the audio signal to be encoded precisely. Delta-sigma modulation is used where maintaining high audio fidelity is a priority.^[9]
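Below is a sketch of a first-order delta-sigma modulator. It assumes the input samples are already oversampled and scaled to [-1, 1]. The integrator accumulates the error between the input and the fed-back one-bit output, so the density of 1s tracks the input level. The `decimate` helper is hypothetical: a plain block average standing in for the proper low-pass decimation filter a real converter would use.

```python
from typing import List


def delta_sigma_modulate(samples: List[float]) -> List[int]:
    """First-order delta-sigma: integrate (input - feedback), quantize to one bit."""
    bits: List[int] = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0   # one-bit DAC in the feedback loop
        bits.append(bit)
    return bits


def decimate(bits: List[int], factor: int) -> List[float]:
    """Crude reconstruction: average +/-1 levels over blocks of `factor` bits."""
    levels = [1.0 if b else -1.0 for b in bits]
    return [sum(levels[i:i + factor]) / factor
            for i in range(0, len(levels) - factor + 1, factor)]


oversampled = [0.5] * 1000                  # a constant low-level input
bits = delta_sigma_modulate(oversampled)
print(decimate(bits, factor=1000))          # close to [0.5]
```

Because the integrator stays bounded, the average of the output levels converges on the input as the block grows. Digital silence (0.0) yields an alternating 1, 0 pattern whose average is exactly zero.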

Applications

The reduction of audio size from silence compression has uses in numerous applications:

Telecommunications: The reduction of silent transmissions in telecommunication systems such as VoIP allows for more efficient bandwidth use and reduced data costs.
Audio Streaming: silence compression minimizes data usage during audio streaming, allowing for high-quality audio to be broadcast efficiently over the internet.
Audio Archiving: silence compression helps to conserve space needed to store audio while maintaining audio fidelity.

References

1. Benyassine, A.; Shlomot, E.; Su, H.-Y.; Massaloux, D.; Lamblin, C.; Petit, J.-P. (1997). "ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications". IEEE Communications Magazine. 35 (9): 64–73. doi:10.1109/35.620527. Retrieved 2023-11-09.
2. Sahin, Arda; Unlu, Mehmet Zubeyir (2021-01-20). "Speech file compression by eliminating unvoiced/silence components". Sustainable Engineering and Innovation. 3 (1): 11–14. doi:10.37868/sei.v3i1.119. ISSN 2712-0562. S2CID 234125634.
3. "On the ITU-T G.729.1 silence compression scheme". IEEE. Retrieved 2023-11-09.
4. Elsayed, Hend A. (2014). "Burrows-Wheeler Transform and combination of Move-to-Front coding and Run Length Encoding for lossless audio coding". 2014 9th International Conference on Computer Engineering & Systems (ICCES). pp. 354–359. doi:10.1109/ICCES.2014.7030985. ISBN 978-1-4799-6594-6. S2CID 15743605. Retrieved 2023-11-09.
5. Patil, Rupali B.; Kulat, K. D. (2017). "Audio compression using dynamic Huffman and RLE coding". 2017 2nd International Conference on Communication and Electronics Systems (ICCES). pp. 160–162. doi:10.1109/CESYS.2017.8321256. ISBN 978-1-5090-5013-0. S2CID 4122679. Retrieved 2023-11-09.
6. Firmansah, Luthfi; Setiawan, Erwin Budi (2016). "Data audio compression lossless FLAC format to lossy audio MP3 format with Huffman Shift Coding algorithm". 2016 4th International Conference on Information and Communication Technology (ICoICT). pp. 1–5. doi:10.1109/ICoICT.2016.7571951. ISBN 978-1-4673-9879-4. S2CID 18754681. Retrieved 2023-11-09.
7. Jensen, J.; Heusdens, R. (2003). "A comparison of differential schemes for low-rate sinusoidal audio coding". 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684). pp. 205–208. doi:10.1109/ASPAA.2003.1285867. ISBN 0-7803-7850-4. S2CID 58213603. Retrieved 2023-11-09.
8. Zhu, Y.S.; Leung, S.W.; Wong, C.M. (1996). "A digital audio processing system based on nonuniform sampling delta modulation". IEEE Transactions on Consumer Electronics. 42: 80–86. doi:10.1109/30.485464. Retrieved 2023-11-09.
9. "Sigma-delta modulation for audio DSP". IEEE. Retrieved 2023-11-09.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Silence_compression&oldid=1237722097"

Category: Data compression
