Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

String operations

From Wikipedia, the free encyclopedia

Incomputer science, in the area offormal language theory, frequent use is made of a variety ofstring functions; however, the notation used is different from that used forcomputer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

Strings and languages

[edit]

A string is a finite sequence of characters.Theempty string is denoted byε{\displaystyle \varepsilon }.The concatenation of two strings{\displaystyle s} andt{\displaystyle t} is denoted byst{\displaystyle s\cdot t}, or shorter byst{\displaystyle st}.Concatenating with the empty string makes no difference:sε=s=εs{\displaystyle s\cdot \varepsilon =s=\varepsilon \cdot s}.Concatenation of strings isassociative:s(tu)=(st)u{\displaystyle s\cdot (t\cdot u)=(s\cdot t)\cdot u}.

For example,(bl)(εah)=blah=blah{\displaystyle (\langle b\rangle \cdot \langle l\rangle )\cdot (\varepsilon \cdot \langle ah\rangle )=\langle bl\rangle \cdot \langle ah\rangle =\langle blah\rangle }.

Alanguage is a finite or infinite set of strings.Besides the usual set operations like union, intersection etc., concatenation can be applied to languages:if bothS{\displaystyle S} andT{\displaystyle T} are languages, their concatenationST{\displaystyle S\cdot T} is defined as the set of concatenations of any string fromS{\displaystyle S} and any string fromT{\displaystyle T}, formallyST={stsStT}{\displaystyle S\cdot T=\{s\cdot t\mid s\in S\land t\in T\}}.Again, the concatenation dot{\displaystyle \cdot } is often omitted for brevity.

The language{ε}{\displaystyle \{\varepsilon \}} consisting of just the empty string is to be distinguished from the empty language{}{\displaystyle \{\}}.Concatenating any language with the former doesn't make any change:S{ε}=S={ε}S{\displaystyle S\cdot \{\varepsilon \}=S=\{\varepsilon \}\cdot S},while concatenating with the latter always yields the empty language:S{}={}={}S{\displaystyle S\cdot \{\}=\{\}=\{\}\cdot S}.Concatenation of languages is associative:S(TU)=(ST)U{\displaystyle S\cdot (T\cdot U)=(S\cdot T)\cdot U}.

For example, abbreviatingD={0,1,2,3,4,5,6,7,8,9}{\displaystyle D=\{\langle 0\rangle ,\langle 1\rangle ,\langle 2\rangle ,\langle 3\rangle ,\langle 4\rangle ,\langle 5\rangle ,\langle 6\rangle ,\langle 7\rangle ,\langle 8\rangle ,\langle 9\rangle \}}, the set of all three-digit decimal numbers is obtained asDDD{\displaystyle D\cdot D\cdot D}. The set of all decimal numbers of arbitrary length is an example for an infinite language.

Alphabet of a string

[edit]

Thealphabet of a string is the set of all of the characters that occur in a particular string. Ifs is a string, itsalphabet is denoted by

Alph(s){\displaystyle \operatorname {Alph} (s)}

Thealphabet of a languageS{\displaystyle S} is the set of all characters that occur in any string ofS{\displaystyle S}, formally:Alph(S)=sSAlph(s){\displaystyle \operatorname {Alph} (S)=\bigcup _{s\in S}\operatorname {Alph} (s)}.

For example, the set{a,c,o}{\displaystyle \{\langle a\rangle ,\langle c\rangle ,\langle o\rangle \}} is the alphabet of the stringcacao{\displaystyle \langle cacao\rangle },and theaboveD{\displaystyle D} is the alphabet of theabove languageDDD{\displaystyle D\cdot D\cdot D} as well as of the language of all decimal numbers.

String substitution

[edit]

LetL be alanguage, and let Σ be its alphabet. Astring substitution or simply asubstitution is a mappingf that maps characters in Σ to languages (possibly in a different alphabet). Thus, for example, given a charactera ∈ Σ, one hasf(a)=La whereLa ⊆ Δ* is some language whose alphabet is Δ. This mapping may be extended to strings as

f(ε)=ε

for theempty string ε, and

f(sa)=f(s)f(a)

for stringsL and charactera ∈ Σ. String substitutions may be extended to entire languages as[1]

f(L)=sLf(s){\displaystyle f(L)=\bigcup _{s\in L}f(s)}

Regular languages are closed under string substitution. That is, if each character in the alphabet of a regular language is substituted by another regular language, the result is still a regular language.[2]Similarly,context-free languages are closed under string substitution.[3][note 1]

A simple example is the conversionfuc(.) to uppercase, which may be defined e.g. as follows:

charactermapped to languageremark
xfuc(x)
a{ ‹A› }map lowercase char to corresponding uppercase char
A{ ‹A› }map uppercase char to itself
ß{ ‹SS› }no uppercase char available, map to two-char string
‹0›{ ε }map digit to empty string
‹!›{ }forbid punctuation, map to empty language
...similar for other chars

For the extension offuc to strings, we have e.g.

  • fuc(‹Straße›) = {‹S›} ⋅ {‹T›} ⋅ {‹R›} ⋅ {‹A›} ⋅ {‹SS›} ⋅ {‹E›} = {‹STRASSE›},
  • fuc(‹u2›) = {‹U›} ⋅ {ε} = {‹U›}, and
  • fuc(‹Go!›) = {‹G›} ⋅ {‹O›} ⋅ {} = {}.

For the extension offuc to languages, we have e.g.

  • fuc({ ‹Straße›, ‹u2›, ‹Go!› }) = { ‹STRASSE› } ∪ { ‹U› } ∪ { } = { ‹STRASSE›, ‹U› }.

String homomorphism

[edit]

Astring homomorphism (often referred to simply as ahomomorphism informal language theory) is a string substitution such that each character is replaced by a single string. That is,f(a)=s{\displaystyle f(a)=s}, wheres{\displaystyle s} is a string, for each charactera{\displaystyle a}.[note 2][4]

String homomorphisms aremonoid morphisms on thefree monoid, preserving the empty string and thebinary operation ofstring concatenation. Given a languageL{\displaystyle L}, the setf(L){\displaystyle f(L)} is called thehomomorphic image ofL{\displaystyle L}. Theinverse homomorphic image of a strings{\displaystyle s} is defined as

f1(s)={wf(w)=s}{\displaystyle f^{-1}(s)=\{w\mid f(w)=s\}}

while the inverse homomorphic image of a languageL{\displaystyle L} is defined as

f1(L)={sf(s)L}{\displaystyle f^{-1}(L)=\{s\mid f(s)\in L\}}

In general,f(f1(L))L{\displaystyle f(f^{-1}(L))\neq L}, while one does have

f(f1(L))L{\displaystyle f(f^{-1}(L))\subseteq L}

and

Lf1(f(L)){\displaystyle L\subseteq f^{-1}(f(L))}

for any languageL{\displaystyle L}.

The class of regular languages is closed under homomorphisms and inverse homomorphisms.[5] Similarly, the context-free languages are closed under homomorphisms[note 3] and inverse homomorphisms.[6]

A string homomorphism is said to be ε-free (or e-free) iff(a)ε{\displaystyle f(a)\neq \varepsilon } for alla in the alphabetΣ{\displaystyle \Sigma }. Simple single-lettersubstitution ciphers are examples of (ε-free) string homomorphisms.

An example string homomorphismguc can also be obtained by defining similar to theabove substitution:guc(‹a›) = ‹A›, ...,guc(‹0›) = ε, but lettingguc be undefined on punctuation chars. Examples for inverse homomorphic images are

  • guc−1({ ‹SSS› }) = { ‹sss›, ‹sß›, ‹ßs› }, sinceguc(‹sss›) =guc(‹sß›) =guc(‹ßs›) = ‹SSS›, and
  • guc−1({ ‹A›, ‹bb› }) = { ‹a› }, sinceguc(‹a›) = ‹A›, while ‹bb› cannot be reached byguc.

For the latter language,guc(guc−1({ ‹A›, ‹bb› })) =guc({ ‹a› }) = { ‹A› } ≠ { ‹A›, ‹bb› }.The homomorphismguc is not ε-free, since it maps e.g. ‹0› to ε.

A very simple string homomorphism example that maps each character to just a character is the conversion of anEBCDIC-encoded string toASCII.

String projection

[edit]

Ifs is a string, andΣ{\displaystyle \Sigma } is an alphabet, thestring projection ofs is the string that results by removing all characters that are not inΣ{\displaystyle \Sigma }. It is written asπΣ(s){\displaystyle \pi _{\Sigma }(s)\,}. It is formally defined by removal of characters from the right hand side:

πΣ(s)={εif s=ε the empty stringπΣ(t)if s=ta and aΣπΣ(t)aif s=ta and aΣ{\displaystyle \pi _{\Sigma }(s)={\begin{cases}\varepsilon &{\mbox{if }}s=\varepsilon {\mbox{ the empty string}}\\\pi _{\Sigma }(t)&{\mbox{if }}s=ta{\mbox{ and }}a\notin \Sigma \\\pi _{\Sigma }(t)a&{\mbox{if }}s=ta{\mbox{ and }}a\in \Sigma \end{cases}}}

Hereε{\displaystyle \varepsilon } denotes theempty string. The projection of a string is essentially the same as aprojection in relational algebra.

String projection may be promoted to theprojection of a language. Given aformal languageL, its projection is given by

πΣ(L)={πΣ(s) | sL}{\displaystyle \pi _{\Sigma }(L)=\{\pi _{\Sigma }(s)\ \vert \ s\in L\}}[citation needed]

Right and left quotient

[edit]

Theright quotient of a charactera from a strings is the truncation of the charactera in the strings, from the right hand side. It is denoted ass/a{\displaystyle s/a}. If the string does not havea on the right hand side, the result is the empty string. Thus:

(sa)/b={sif a=bεif ab{\displaystyle (sa)/b={\begin{cases}s&{\mbox{if }}a=b\\\varepsilon &{\mbox{if }}a\neq b\end{cases}}}

The quotient of the empty string may be taken:

ε/a=ε{\displaystyle \varepsilon /a=\varepsilon }

Similarly, given a subsetSM{\displaystyle S\subset M} of a monoidM{\displaystyle M}, one may define the quotient subset as

S/a={sM | saS}{\displaystyle S/a=\{s\in M\ \vert \ sa\in S\}}

Left quotients may be defined similarly, with operations taking place on the left of a string.[citation needed]

Hopcroft and Ullman (1979) define the quotientL1/L2 of the languagesL1 andL2 over the same alphabet asL1/L2 = {s | ∃tL2.stL1 }.[7]This is not a generalization of the above definition, since, for a strings and distinct charactersa,b, Hopcroft's and Ullman's definition implies yielding {}, rather than { ε }.

The left quotient (when defined similar to Hopcroft and Ullman 1979) of a singleton languageL1 and an arbitrary languageL2 is known asBrzozowski derivative; ifL2 is represented by aregular expression, so can be the left quotient.[8]

Syntactic relation

[edit]

The right quotient of a subsetSM{\displaystyle S\subset M} of a monoidM{\displaystyle M} defines anequivalence relation, called therightsyntactic relation ofS. It is given by

S={(s,t)M×M | S/s=S/t}{\displaystyle \sim _{S}\;\,=\,\{(s,t)\in M\times M\ \vert \ S/s=S/t\}}

The relation is clearly of finite index (has a finite number of equivalence classes) if and only if the family right quotients is finite; that is, if

{S/m | mM}{\displaystyle \{S/m\ \vert \ m\in M\}}

is finite. In the case thatM is the monoid of words over some alphabet,S is then aregular language, that is, a language that can be recognized by afinite-state automaton. This is discussed in greater detail in the article onsyntactic monoids.[citation needed]

Right cancellation

[edit]

Theright cancellation of a charactera from a strings is the removal of the first occurrence of the charactera in the strings, starting from the right hand side. It is denoted ass÷a{\displaystyle s\div a} and is recursively defined as

(sa)÷b={sif a=b(s÷b)aif ab{\displaystyle (sa)\div b={\begin{cases}s&{\mbox{if }}a=b\\(s\div b)a&{\mbox{if }}a\neq b\end{cases}}}

The empty string is always cancellable:

ε÷a=ε{\displaystyle \varepsilon \div a=\varepsilon }

Clearly, right cancellation and projectioncommute:

πΣ(s)÷a=πΣ(s÷a){\displaystyle \pi _{\Sigma }(s)\div a=\pi _{\Sigma }(s\div a)}[citation needed]

Prefixes

[edit]

Theprefixes of a string is the set of allprefixes to a string, with respect to a given language:

PrefL(s)={t | s=tu for t,uAlph(L)}{\displaystyle \operatorname {Pref} _{L}(s)=\{t\ \vert \ s=tu{\mbox{ for }}t,u\in \operatorname {Alph} (L)^{*}\}}

wheresL{\displaystyle s\in L}.

Theprefix closure of a language is

Pref(L)=sLPrefL(s)={t | s=tu;sL;t,uAlph(L)}{\displaystyle \operatorname {Pref} (L)=\bigcup _{s\in L}\operatorname {Pref} _{L}(s)=\left\{t\ \vert \ s=tu;s\in L;t,u\in \operatorname {Alph} (L)^{*}\right\}}

Example:
L={abc} then Pref(L)={ε,a,ab,abc}{\displaystyle L=\left\{abc\right\}{\mbox{ then }}\operatorname {Pref} (L)=\left\{\varepsilon ,a,ab,abc\right\}}

A language is calledprefix closed ifPref(L)=L{\displaystyle \operatorname {Pref} (L)=L}.

The prefix closure operator isidempotent:

Pref(Pref(L))=Pref(L){\displaystyle \operatorname {Pref} (\operatorname {Pref} (L))=\operatorname {Pref} (L)}

Theprefix relation is abinary relation{\displaystyle \sqsubseteq } such thatst{\displaystyle s\sqsubseteq t} if and only ifsPrefL(t){\displaystyle s\in \operatorname {Pref} _{L}(t)}. This relation is a particular example of aprefix order.[citation needed]

See also

[edit]

Notes

[edit]
  1. ^Although every regular language is also context-free, the previous theorem is not implied by the current one, since the former yields a shaper result for regular languages.
  2. ^Strictly formally, a homomorphism yields a language consisting of just one string, i.e.f(a)={s}{\displaystyle f(a)=\{s\}}.
  3. ^This follows from theabove-mentioned closure under arbitrary substitutions.

References

[edit]
  1. ^Hopcroft, Ullman (1979), Sect.3.2, p.60
  2. ^Hopcroft, Ullman (1979), Sect.3.2, Theorem 3.4, p.60
  3. ^Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.2, p.131
  4. ^Hopcroft, Ullman (1979), Sect.3.2, p.60-61
  5. ^Hopcroft, Ullman (1979), Sect.3.2, Theorem 3.5, p.61
  6. ^Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.3, p.132
  7. ^Hopcroft, Ullman (1979), Sect.3.2, p.62
  8. ^Janusz A. Brzozowski (1964)."Derivatives of Regular Expressions".J ACM.11 (4):481–494.doi:10.1145/321239.321249.S2CID 14126942.
String metric
String-searching algorithm
Multiple string searching
Regular expression
Sequence alignment
Data structure
Other
Retrieved from "https://en.wikipedia.org/w/index.php?title=String_operations&oldid=1264117027"
Categories:
Hidden categories:

[8]ページ先頭

©2009-2025 Movatter.jp