Movatterモバイル変換


[0]ホーム

URL:


Stephen Wolfram's A New Kind of Science | Online

Notes

Chapter 10:Processes of Perception and Analysis

Section 5:Data Compression


Pointer-based encoding

One can encode a list of datad by generating pointers to the longest and most recent copies of each subsequence of length at leastb using

PEncode[d_, b_ : 4] := Module[{i, a, u, v},i = 2; a = {First[d]}; While[i Length[d], {u, v} =Last[Sort[Table[{MatchLength[d, i, j], j}, {j, i - 1}]]];If[u b, AppendTo[a, p[i - v, u]]; i += u,AppendTo[a, di]; i++]]; a]

MatchLength[d_, i_, j_] := With[{m = Length[d] - i}, Catch[Do[If[di + k =!= dj + k, Throw[k]], {k, 0, m}]; m + 1]]

The process of encoding can be made considerably faster by keeping a dictionary of previously encountered subsequences. One can reproduce the original data using

PDecode[a_] := Module[{d = Flatten[a /. p[j_, r_] Table[p[j], {r}]]}, Flatten[MapIndexed[If[Head[#1] === p, d#2 = d#2 - First[#1] ,#1] &, d]]]

To get a representation purely in terms of 0 and 1, one can use a self-delimiting representation for each integer that appears. (Knowing the explicit representation one could then determine whether each block would be shorter if encoded literally or using a pointer.) The encoded version of a purely repetitive sequence of lengthn has a length that grows likeLog[n]. The encoded version of a purely nested sequence grows likeLog[n]2. The encoded version of a sufficiently random sequence grows liken (with the specific encoding used in the text, the length is about2n). Note that any sequence of 0's and 1's corresponds to the beginning of the encoding for some sequence or another.

It is possible to construct sequences whose encoded versions grow roughly like fractional powers ofn. An example is the sequenceTable[Append[Table[0, {r}], 1], {r, s}] whose encoded version grows liken Log[n]. Cyclic tag systems often seem to produce sequences whose encoded versions grow like fractional powers ofn. Sequences produced by concatenation sequences are not typically compressed by pointer encoding.

With completely random input, the probability that the lengthb subsequence which begins at elementn is a repeat of a previous subsequence is roughly1 - (1 - 2-b)n - 1. The overall fraction of a lengthn input that consists of repeats of length at leastb is greater than1 - 2b/n and is essentially

1 - Sum[(1 - 2-b)i Product[1 + (1 - 2-b)j - (1 - 2-b - 1)j,{j, i - b + 1, i - 1}], {i, b, n - b}]/(n - 2b + 1)


[8]ページ先頭

©2009-2025 Movatter.jp