Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

Sardinas–Patterson algorithm

From Wikipedia, the free encyclopedia

Incoding theory, theSardinas–Patterson algorithm is a classical algorithm for determining inpolynomial time whether a givenvariable-length code is uniquely decodable, named after August Albert Sardinas and George W. Patterson, who published it in 1953.[1] The algorithm carries out a systematic search for a string which admits two different decompositions into codewords. AsKnuth reports, the algorithm was rediscovered about ten years later in 1963 byFloyd, despite the fact that it was at the time already well known in coding theory.[2]

Idea of the algorithm

[edit]

Consider the code{a1,b011,c01110,d1110,e10011}{\displaystyle \{\,{\texttt {a}}\mapsto {\texttt {1}},{\texttt {b}}\mapsto {\texttt {011}},{\texttt {c}}\mapsto {\texttt {01110}},{\texttt {d}}\mapsto {\texttt {1110}},{\texttt {e}}\mapsto {\texttt {10011}}\,\}}. This code, which is based on an example by Berstel,[3] is an example of a code which is not uniquely decodable, since the string

011101110011

can be interpreted as the sequence of codewords

01110 – 1110 – 011,

but also as the sequence of codewords

011 – 1 – 011 – 10011.

Two possible decodings of this encoded string are thus given bycdb andbabe.

In general, a codeword can be found by the following idea: In the first round, we choose two codewordsx1{\displaystyle x_{1}} andy1{\displaystyle y_{1}} such thatx1{\displaystyle x_{1}} is aprefix ofy1{\displaystyle y_{1}}, that is,x1w=y1{\displaystyle x_{1}w=y_{1}} for some "dangling suffix"w{\displaystyle w}. If one tries firstx1=011{\displaystyle x_{1}={\texttt {011}}} andy1=01110{\displaystyle y_{1}={\texttt {01110}}}, the danglingsuffix isw=10{\displaystyle {\texttt {w}}={\texttt {10}}}. If we manage to find two sequencesx2,,xp{\displaystyle x_{2},\ldots ,x_{p}} andy2,,yq{\displaystyle y_{2},\ldots ,y_{q}} of codewords such thatx2xp=wy2yq{\displaystyle x_{2}\cdots x_{p}=wy_{2}\cdots y_{q}}, then we are finished: For then the stringx=x1x2xp{\displaystyle x=x_{1}x_{2}\cdots x_{p}} can alternatively be decomposed asy1y2yq{\displaystyle y_{1}y_{2}\cdots y_{q}}, and we have found the desired string having at least two different decompositions into codewords.

In the second round, we try out two different approaches: the first trial is to look for a codeword that hasw as prefix. Then we obtain a new dangling suffixw, with which we can continue our search. If we eventually encounter a dangling suffix that is itself a codeword (or theempty word), then the search will terminate, as we know there exists a string with two decompositions. The second trial is to seek for a codeword that is itself a prefix ofw. In our example, we havew=10{\displaystyle w={\texttt {10}}}, and the sequence1 is a codeword. We can thus also continue withw=0{\displaystyle w={\texttt {0}}} as the new dangling suffix.

Precise description of the algorithm

[edit]
Ambiguity from result
1110011
db
aFbsince d = aF
aaHbsince F = aH
aaesince e = Hb
1110011
Example run
C{\displaystyle C}:
a1
b011
c01110
d1110
e10011
S1{\displaystyle S_{1}}:
F=a-1d110
G=a-1e0011
H=b-1c10
S2{\displaystyle S_{2}}:
H=a-1F10
I=a-1H0
b=H-1e011
J=I-1b11
K=I-1c1110

The algorithm is described most conveniently usingquotients offormal languages. In general, for two sets of stringsD andN, the (left) quotientN1D{\displaystyle N^{-1}D} is defined as the residual words obtained fromD by removing some prefix inN. Formally,N1D={yxyD and xN}{\displaystyle N^{-1}D=\{\,y\mid xy\in D~{\textrm {and}}~x\in N\,\}}. Now letC{\displaystyle C} denote the (finite) set of codewords in the given code.

The algorithm proceeds in rounds, where we maintain in each round not only one dangling suffix as described above, but the (finite) set of all potential dangling suffixes. Starting with roundi=1{\displaystyle i=1}, the set of potential dangling suffixes will be denoted bySi{\displaystyle S_{i}}. The setsSi{\displaystyle S_{i}} are definedinductively as follows:

S1=C1C{ε}{\displaystyle S_{1}=C^{-1}C\setminus \{\varepsilon \}}. Here, the symbolε{\displaystyle \varepsilon } denotes theempty word.

Si+1=C1SiSi1C{\displaystyle S_{i+1}=C^{-1}S_{i}\cup S_{i}^{-1}C}, for alli1{\displaystyle i\geq 1}.

The algorithm computes the setsSi{\displaystyle S_{i}} in increasing order ofi{\displaystyle i}. As soon as one of theSi{\displaystyle S_{i}} contains a word fromC or the empty word, then the algorithm terminates and answers that the given code is not uniquely decodable. Otherwise, once a setSi{\displaystyle S_{i}}equals a previously encountered setSj{\displaystyle S_{j}} withj<i{\displaystyle j<i}, then the algorithm would enter in principle an endless loop. Instead of continuing endlessly, it answers that the given code is uniquely decodable.

See the left box for an example run of the algorithm on the given code; lower and upper case letters denote code and "dangling sugffix" strings, respectively. During the construction ofS2{\displaystyle S_{2}}, the code wordb is encountered (shown in red), and the algorithm stops. The right box indicates how the example string 1110011 can be shown to have multiple encodings (db,aae), using the equations that were collected during the algorithm run.

Termination and correctness of the algorithm

[edit]

Since all setsSi{\displaystyle S_{i}} are sets of suffixes of a finite set of codewords, there are only finitely many different candidates forSi{\displaystyle S_{i}}. Since visiting one of the sets for the second time will cause the algorithm to stop, the algorithm cannot continue endlessly and thus must alwaysterminate. More precisely, the total number of dangling suffixes that the algorithm considers is at most equal to the total of the lengths of the codewords in the input, so the algorithm runs inpolynomial time as a function of this input length. By using asuffix tree to speed the comparison between each dangling suffix and the codewords, the time for the algorithm can be bounded by O(nk), wheren is the total length of the codewords andk is the number of codewords.[4] The algorithm can be implemented using apattern matching machine.[5] The algorithm can also be implemented to run on anondeterministic Turing machine that uses onlylogarithmic space; the problem of testing unique decipherability isNL-complete, so this space bound is optimal.[6]

A proof that the algorithm iscorrect, i.e. that it always gives the correct answer, is found in the textbooks bySalomaa[7] and by Berstel et al.[8]

See also

[edit]

Notes

[edit]
  1. ^Sardinas & Patterson (1953).
  2. ^Knuth (2003), p. 2
  3. ^Berstel et al. (2009), Example 2.3.1 p. 63
  4. ^Rodeh (1982).
  5. ^Apostolico & Giancarlo (1984).
  6. ^Rytter (1986) proves that the complementary problem, of testing for the existence of a string with two decodings, is NL-complete, and therefore that unique decipherability is co-NL-complete. The equivalence of NL-completeness and co-NL-completeness follows from theImmerman–Szelepcsényi theorem.
  7. ^Salomaa (1981)
  8. ^Berstel et al. (2009), Chapter 2.3

References

[edit]
Further reading
Retrieved from "https://en.wikipedia.org/w/index.php?title=Sardinas–Patterson_algorithm&oldid=1333680834"
Categories:

[8]ページ先頭

©2009-2026 Movatter.jp