
Run-length encoding


From Wikipedia, the free encyclopedia

Form of lossless data compression

Not to be confused with Run-length limited.

Run-length encoding (RLE) is a form of lossless data compression in which runs of data (consecutive occurrences of the same data value) are stored as a single occurrence of that data value and a count of its consecutive occurrences, rather than as the original run. For example, a sequence of "green green green green green" in an image built up from colored dots could be shortened to "green x 5".

Run-length encoding is most efficient on data that contains many such runs, for example simple graphic images such as icons, line drawings, games and animations. For files that do not have many runs, RLE can increase the file size.
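As a rough illustration, the following minimal sketch assumes a simple text format (a decimal count followed by the character) and compares the encoded and original lengths for a string with long runs and for one with no runs at all. The function name rle_text is hypothetical.

```python
from itertools import groupby


def rle_text(data: str) -> str:
    """Encode data as a decimal count followed by the character, e.g. 'AAAB' -> '3A1B'."""
    # groupby() yields (character, group) for each run of equal consecutive characters.
    return "".join(f"{sum(1 for _ in group)}{ch}" for ch, group in groupby(data))


for sample in ["A" * 10 + "B" * 10, "ABCDEFGHIJ"]:
    encoded = rle_text(sample)
    print(f"{sample!r}: {len(sample)} -> {len(encoded)} characters ({encoded})")
```

The run-heavy 20-character string shrinks to 6 characters ("10A10B"), while the 10-character string without runs doubles to 20 characters, because every character now also carries a count.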

RLE may also refer to particular image formats that use the encoding. RLE was an early graphics file format supported by CompuServe for compressing black-and-white images, which was widely supplanted by its later Graphics Interchange Format (GIF). It is also the name of a little-used image format in Windows 3.x, saved with the file extension rle and consisting of a run-length encoded bitmap; it was used as the format for the Windows 3.x startup screen.

History and applications


Run-length encoding (RLE) schemes were employed in the transmission of analog television signals as far back as 1967.^[1] In 1983, run-length encoding was patented by Hitachi.^[2]^[3]^[4] RLE is particularly well suited to palette-based bitmap images (which use relatively few colours) such as computer icons, and was a popular image compression method on early online services such as CompuServe before the advent of more sophisticated formats such as GIF.^[5] It does not work well on continuous-tone images (which use very many colours) such as photographs, although JPEG uses it on the coefficients that remain after transforming and quantizing image blocks.

Common formats for run-length encoded data include Truevision TGA, PackBits (by Apple, used in MacPaint), PCX and ILBM. The International Telecommunication Union also describes a standard to encode run-length colour for fax machines, known as T.45.^[6] That fax colour coding standard, which along with other techniques is incorporated into Modified Huffman coding,^[citation needed] is relatively efficient because most faxed documents are primarily white space, with occasional interruptions of black.
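For bilevel (black-and-white) data such as a fax scan line, the two colours simply alternate, so only the run lengths need to be stored. The following minimal sketch of that idea assumes that a line is a string of 'W' and 'B' pixels and that each line starts with a white run (possibly of length zero). The function name bilevel_runs is hypothetical, and the variable-length codes that fax standards then assign to these lengths are not shown.

```python
def bilevel_runs(line: str) -> list[int]:
    """Return alternating white/black run lengths for a line of 'W'/'B' pixels.

    Assumed convention: the first run is always white, so a line that starts
    with black begins with a zero-length white run.
    """
    runs = []
    current = "W"  # colour of the run being counted; lines start with white
    count = 0
    for pixel in line:
        if pixel == current:
            count += 1
        else:
            # Colour changed: close the current run and start one of the other colour.
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)  # close the final run
    return runs


print(bilevel_runs("W" * 20 + "BB" + "W" * 30))
```

A mostly white line of 20 white pixels, 2 black pixels and 30 white pixels reduces to just three numbers, [20, 2, 30].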

Algorithm


RLE has a space complexity of $O(n)$, where $n$ is the size of the input data.
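One way to see the linear bound, assuming the decimal count-then-symbol text format used in the examples below: if an input of length $n$ splits into $r$ runs of lengths $k_1, \dots, k_r$ (so $\sum_{i=1}^{r} k_i = n$), and $d(k)$ denotes the number of decimal digits of $k$, then the length of the encoded output $E$ satisfies

$$|E| = \sum_{i=1}^{r} \bigl(d(k_i) + 1\bigr) \le \sum_{i=1}^{r} 2k_i = 2n,$$

because $d(k) \le k$, and hence $d(k) + 1 \le 2k$, for every $k \ge 1$. The bound $2n$ is reached exactly when every run has length 1, which is the worst case noted above for data without runs.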

Encoding algorithm


Run-length encoding compresses data by reducing the physical size of a repeating string of characters. This process involves converting the input data into a compressed format by identifying and counting consecutive occurrences of each character. The steps are as follows:

Traverse the input data.
Count the number of consecutive repeating characters (run length).
Store the character and its run length.
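The three steps above can also be written as an explicit loop. This minimal sketch is not taken from the cited source; the name rle_pairs is hypothetical, and it returns (value, run length) pairs rather than a formatted string.

```python
def rle_pairs(data):
    """Return a list of (value, run_length) pairs for the consecutive runs in data."""
    pairs = []
    for value in data:  # step 1: traverse the input
        if pairs and pairs[-1][0] == value:
            # Step 2: same value as the current run, so extend its count.
            pairs[-1] = (value, pairs[-1][1] + 1)
        else:
            # Step 3: a new value starts a new run, stored with length 1.
            pairs.append((value, 1))
    return pairs


print(rle_pairs("AAAABBBCCDAA"))
```

For "AAAABBBCCDAA" this gives [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 2)]. Because it only compares each item with the previous run, it works on any sequence of comparable values, not just strings.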

Python implementation


Imports and helper functions

from itertools import repeat, compress, groupby


def ilen(iterable):
    """
    Return the number of items in iterable.

    >>> ilen(x for x in range(1000000) if x % 3 == 0)
    333334
    """
    # using zip() to wrap the input with 1-tuples which compress() reads as true values.
    return sum(compress(repeat(1), zip(iterable)))

def rle_encode(iterable, *, length_first=True):
    """
    >>> "".join(rle_encode("AAAABBBCCDAA"))
    '4A3B2C1D2A'
    >>> "".join(rle_encode("AAAABBBCCDAA", length_first=False))
    'A4B3C2D1A2'
    """
    return (
        f"{ilen(g)}{k}" if length_first else f"{k}{ilen(g)}"  # ilen(g): length of iterable g
        for k, g in groupby(iterable)
    )

^[7]

Decoding algorithm


The decoding process involves reconstructing the original data from the encoded format by repeating characters according to their counts. The steps are as follows:

Traverse the encoded data.
For each count-character pair, repeat the character count times.
Append these characters to the result string.
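The decoding steps mirror the encoding ones: for each pair, repeat the value count times and append the result. This minimal sketch works on already-parsed (value, run length) pairs; the name rle_from_pairs is hypothetical.

```python
def rle_from_pairs(pairs):
    """Rebuild the original sequence from (value, run_length) pairs."""
    result = []
    for value, count in pairs:  # traverse the encoded pairs
        result.extend([value] * count)  # repeat the value count times and append
    return result


encoded = [("A", 4), ("B", 3), ("C", 2), ("D", 1), ("A", 2)]
print("".join(rle_from_pairs(encoded)))
```

Joining the result for these pairs gives back "AAAABBBCCDAA". Returning a list keeps the function usable for non-character data such as pixel values.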

Python implementation


Imports

from itertools import chain, repeat, batched  # batched() requires Python 3.12 or later

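def rle_decode(iterable, *, length_first=True):
    """
    >>> "".join(rle_decode("4A3B2C1D2A"))
    'AAAABBBCCDAA'
    >>> "".join(rle_decode("A4B3C2D1A2", length_first=False))
    'AAAABBBCCDAA'
    """
    # batched() yields the input two characters at a time, so each run length
    # must be a single digit (1-9); longer runs such as "12W" are not supported.
    return chain.from_iterable(
        repeat(b, int(a)) if length_first else repeat(a, int(b))
        for a, b in batched(iterable, 2)
    )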

^[7]
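rle_decode reads its input two characters at a time with batched(), so every count must be a single digit; it cannot decode output such as "12W" that rle_encode produces for runs of ten or more. The following minimal sketch of a decoder that accepts multi-digit counts assumes the count-first text format and data symbols that are never decimal digits. The name rle_decode_multidigit is hypothetical.

```python
import re


def rle_decode_multidigit(encoded: str) -> str:
    """Decode count-first RLE text such as '12W1B', where counts may have several digits."""
    # Each token is one or more decimal digits followed by a single non-digit symbol.
    tokens = re.findall(r"(\d+)(\D)", encoded)
    # findall() silently skips text that does not match, so reject malformed input.
    if "".join(count + symbol for count, symbol in tokens) != encoded:
        raise ValueError(f"malformed RLE input: {encoded!r}")
    return "".join(symbol * int(count) for count, symbol in tokens)


print(rle_decode_multidigit("12W1B12W3B24W1B14W"))
```

Using a regular expression lets the decoder find where each count ends by looking for the first non-digit, which is why the format needs data symbols that are never digits.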

Example


Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:

WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

Applying run-length encoding to this hypothetical scan line gives the following:

12W1B12W3B24W1B14W

This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, etc., and represents the original 67 characters in only 18. While the actual format used for the storage of images is generally binary rather than ASCII characters like this, the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).
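A byte-oriented variant can be sketched as follows. It assumes a hypothetical format, not any specific file format mentioned above, in which each run is stored as a count byte followed by a value byte; because a count byte holds at most 255, longer runs are split. The names rle_encode_bytes and rle_decode_bytes are hypothetical.

```python
def rle_encode_bytes(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; runs longer than 255 are split."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the byte repeats and the count still fits in one byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode_bytes(encoded: bytes) -> bytes:
    """Expand (count, value) byte pairs back into the original bytes."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * count
    return bytes(out)


padded = b"\x00" * 300 + b"\xff"
print(rle_encode_bytes(padded).hex(" "))
```

For 300 zero bytes followed by one 0xFF byte, the output is the six bytes ff 00 2d 00 01 ff: a run of 255 zeros, a run of 45 zeros, then a single 0xFF. Like the text format, this doubles the size of data with no runs.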

Run-length encoding can be expressed in multiple ways to accommodate data properties as well as additional compression algorithms. For instance, one popular method encodes run lengths for runs of two or more characters only, using an "escape" symbol to identify runs, or using the character itself as the escape, so that any time a character appears twice it denotes a run. On the previous example, this would give the following:

WW12BWW12BB3WW24BWW14

This would be interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs, etc. In data where runs are less frequent, this can significantly improve the compression rate.
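A minimal sketch of the doubled-character scheme, assuming that the data never contains decimal digits (so counts cannot be confused with data) and that each count is written in decimal right after the doubled character. The names encode_doubled and decode_doubled are hypothetical.

```python
from itertools import groupby


def encode_doubled(text: str) -> str:
    """Write single characters as-is and runs of 2+ as the character twice plus the count."""
    out = []
    for ch, group in groupby(text):
        n = sum(1 for _ in group)
        out.append(ch if n == 1 else f"{ch}{ch}{n}")
    return "".join(out)


def decode_doubled(encoded: str) -> str:
    """Invert encode_doubled(): a doubled character is followed by its decimal run length."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if i + 1 < len(encoded) and encoded[i + 1] == ch:
            # Doubled character acts as the escape: read the digits that follow.
            j = i + 2
            while j < len(encoded) and encoded[j].isdigit():
                j += 1
            out.append(ch * int(encoded[i + 2:j]))
            i = j
        else:
            out.append(ch)  # a lone character stands for itself
            i += 1
    return "".join(out)


line = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
print(encode_doubled(line))
```

Applied to the scan line from the example, encode_doubled produces WW12BWW12BB3WW24BWW14. A run of exactly two characters costs three symbols (for example AA2); it cannot be left as plain AA, because a doubled character always signals that a count follows.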

Another consideration is the application of additional compression algorithms. Even with the runs extracted, the frequencies of different characters may be very uneven, allowing further compression; however, if the run lengths are written in the file at the locations where the runs occurred, these numbers interrupt the normal flow of data and make it harder to compress. To overcome this, some run-length encoders separate the data and escape symbols from the run lengths, so that the two can be handled independently. For the example data, this would result in two outputs, the string "WWBWWBBWWBWW" and the numbers (12,12,3,24,14).
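Separating the two streams can be sketched as follows, using the doubled-character convention described above: a doubled symbol marks a run whose length is stored in a separate list. This is an illustration under that assumption; split_streams and join_streams are hypothetical names.

```python
from itertools import groupby


def split_streams(text):
    """Return (symbols, lengths): doubled symbols mark runs whose lengths go in a separate list."""
    symbols = []
    lengths = []
    for ch, group in groupby(text):
        n = sum(1 for _ in group)
        if n == 1:
            symbols.append(ch)
        else:
            symbols.append(ch * 2)  # escape: the symbol written twice
            lengths.append(n)  # its run length goes to the second stream
    return "".join(symbols), lengths


def join_streams(symbols, lengths):
    """Recombine the two streams produced by split_streams()."""
    out = []
    counts = iter(lengths)
    i = 0
    while i < len(symbols):
        ch = symbols[i]
        if i + 1 < len(symbols) and symbols[i + 1] == ch:
            out.append(ch * next(counts))  # doubled symbol: take the next stored length
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


line = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
print(split_streams(line))
```

For the example scan line this returns the string "WWBWWBBWWBWW" and the lengths [12, 12, 3, 24, 14]. Each stream can then be passed to a different back-end compressor.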

Variants


Sequential RLE: This method processes data one line at a time, scanning from left to right. It is commonly employed in image compression. Other variations of this technique include scanning the data vertically, diagonally, or in blocks.
Lossy RLE: In this variation, some bits are intentionally discarded during compression (often by setting one or two of the least significant bits of each pixel to 0). This leads to higher compression rates while minimally impacting the visual quality of the image.
Adaptive RLE: Uses different encoding schemes depending on the length of runs to optimize compression ratios. For example, short runs might use a different encoding format than long runs.
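The lossy and sequential variants can be combined in a short sketch, assuming 8-bit grayscale pixels stored as rows of integers: each row is processed left to right, and the two least significant bits of each pixel are cleared before runs are grouped. The function lossy_rle_row and its dropped_bits parameter are hypothetical.

```python
from itertools import groupby


def lossy_rle_row(row, dropped_bits=2):
    """Run-length encode one row of 8-bit pixel values after clearing low bits.

    Clearing the `dropped_bits` least significant bits makes nearly equal
    neighbours identical, which lengthens runs at the cost of exactness.
    """
    mask = 0xFF & ~((1 << dropped_bits) - 1)  # e.g. 0xFC when dropped_bits == 2
    quantized = (pixel & mask for pixel in row)
    return [(value, sum(1 for _ in group)) for value, group in groupby(quantized)]


image = [
    [200, 201, 203, 202, 50, 51],
    [199, 198, 197, 196, 52, 55],
]
# Sequential processing: each row (scan line) is encoded independently, left to right.
encoded_rows = [lossy_rle_row(row) for row in image]
```

With two bits dropped, the first row becomes [(200, 4), (48, 2)], two runs instead of the six it would need without quantization; each decoded pixel differs from the original by at most 3.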

References


^ Robinson, A. H.; Cherry, C. (1967). "Results of a prototype television bandwidth compression scheme". Proceedings of the IEEE. 55 (3). IEEE: 356–364. doi:10.1109/PROC.1967.5493.
^ "Run Length Encoding Patents". Internet FAQ Consortium. 21 March 1996. Retrieved 14 July 2019.
^ "Method and system for data compression and restoration". Google Patents. 7 August 1984. Retrieved 14 July 2019.
^ "Data recording method". Google Patents. 8 August 1983. Retrieved 14 July 2019.
^ Dunn, Christopher (1987). "Smile! You're on RLE!" (PDF). The Transactor. 7 (6). Transactor Publishing: 16–18. Retrieved 6 December 2015.
^ Recommendation T.45 (02/00): Run-length colour encoding. International Telecommunication Union. 2000. Retrieved 6 December 2015.
^ a b "more-itertools 10.4.0 documentation". August 2024.

External links


Run-length encoding implemented in different programming languages (on Rosetta Code)
Single Header Run-Length Encoding Library: smallest possible implementation (about 20 SLoC) in ANSI C. FOSS, compatible with Truevision TGA, supports 8-, 16-, 24- and 32-bit elements.
