Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

Range concatenation grammar

From Wikipedia, the free encyclopedia
(Redirected fromRange concatenation grammars)

Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier[1] in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and Germanword order scrambling, which are outside the bounds of themildly context-sensitive languages.[2]

From a theoretical point of view, any language that can be parsed inpolynomial time belongs to the subset of RCG called positive range concatenation grammars, and reciprocally.[4]

Though intended as a variant on Groenink'sliteral movement grammars (LMGs), RCGs treat the grammatical process more as a proof than as a production. Whereas LMGs produce a terminal string from a start predicate, RCGs aim to reduce a start predicate (which predicates of a terminal string) to the empty string, which constitutes a proof of the terminal strings membership in the language.

Description

[edit]

Formal definition

[edit]

APositive Range Concatenation Grammar (PRCG) is a tupleG=(N, T, V, S, P){\displaystyle G=(N,~T,~V,~S,~P)}, where:

ANegative Range Concatenation Grammar (NRCG) is defined like a PRCG, but with the addition that some predicates occurring in the right-hand side of a clause can have the formAi(α1,,αdim(Ai))¯{\displaystyle {\overline {A_{i}(\alpha _{1},\ldots ,\alpha _{\dim(A_{i})})}}}. Such predicates are callednegative predicates.

ARange Concatenation Grammar is a positive or a negative one. Although PRCGs are technically NRCGs, the terms are used to highlight the absence (PRCG) or presence (NRCG) of negative predicates.

Arange in a wordwT{\displaystyle w\in T^{\star }} is a couplel,rw{\displaystyle \langle l,r\rangle _{w}}, with0lrn{\displaystyle 0\leq l\leq r\leq n}, wheren{\displaystyle n} is the length ofw{\displaystyle w}. Variables bind to ranges, not to arbitrary strings of nonterminals. Two rangesl1,r1w{\displaystyle \langle l_{1},r_{1}\rangle _{w}} andl2,r2w{\displaystyle \langle l_{2},r_{2}\rangle _{w}} can beconcatenated iffr1=l2{\displaystyle r_{1}=l_{2}}, and we then have:l1,r1wl2,r2w=l1,r2w{\displaystyle \langle l_{1},r_{1}\rangle _{w}\cdot \langle l_{2},r_{2}\rangle _{w}=\langle l_{1},r_{2}\rangle _{w}}. When instantiating a clause, where an argument consists of multiple elements fromTV{\displaystyle T\cup V}, their ranges must concatenate.

For a wordw=w1w2wn{\displaystyle w=w_{1}w_{2}\ldots w_{n}}, withwiT{\displaystyle w_{i}\in T}, thedotted notation for ranges is:l,rw=w1wl1wlwr1wrwn{\displaystyle \langle l,r\rangle _{w}=w_{1}\ldots w_{l-1}\bullet w_{l}\ldots w_{r-1}\bullet w_{r}\ldots w_{n}}.

Recognition of strings

[edit]

The strings of predicates being rewritten represent constraints that the string being tested has to satisfy (if positive), or in the case of negative predicates not satisfy. The order of predicates is irrelevant. Rewrite steps amount to replacing one constraint by zero or more simpler constraints.

Like LMGs, RCG clauses have the general schemaA(x1,...,xn)α{\displaystyle A(x_{1},...,x_{n})\to \alpha }, where in an RCG,α{\displaystyle \alpha } is either the empty string or a string of predicates. The argumentsxi{\displaystyle x_{i}} consist of strings of terminal symbols and/or variable symbols, which pattern match against actual argument values like in LMG. Adjacent variables constitute a family of matches against partitions, so that the argumentxy{\displaystyle xy}, with two variables, matches the literal stringab{\displaystyle ab} in three different ways:x=ϵ, y=ab; x=a, y=b; x=ab, y=ϵ{\displaystyle x=\epsilon ,\ y=ab;\ x=a,\ y=b;\ x=ab,\ y=\epsilon }. These would give rise to three different instantiations of the clause containing that argumentxy{\displaystyle xy}.

Predicate terms come in two forms, positive (which produce the empty string on success), and negative (which produce the empty string on failure/if the positive term doesnot produce the empty string). Negative terms are denoted the same as positive terms, with an overbar, as inA(x1,...,xn)¯{\displaystyle {\overline {A(x_{1},...,x_{n})}}}.

The rewrite semantics for RCGs is rather simple, identical to the corresponding semantics of LMGs. Given a predicate stringA(α1,...,αn){\displaystyle A(\alpha _{1},...,\alpha _{n})}, where the symbolsαi{\displaystyle \alpha _{i}} are terminal strings, if there is a ruleA(x1,...,xn)β{\displaystyle A(x_{1},...,x_{n})\to \beta } in the grammar that the predicate string matches, the predicate string is replaced byβ{\displaystyle \beta }, substituting for the matched variables in eachxi{\displaystyle x_{i}}.

For example, given the ruleA(x,ayb)B(axb,y){\displaystyle A(x,ayb)\to B(axb,y)}, wherex{\displaystyle x} andy{\displaystyle y} are variable symbols anda{\displaystyle a} andb{\displaystyle b} are terminal symbols, the predicate stringA(a,abb){\displaystyle A(a,abb)} can be rewritten asB(aab,b){\displaystyle B(aab,b)}, becauseA(a,abb){\displaystyle A(a,abb)} matchesA(x,ayb){\displaystyle A(x,ayb)} whenx=a, y=b{\displaystyle x=a,\ y=b}. Similarly, if there were a ruleA(x,ayb)A(x,x) A(y,y){\displaystyle A(x,ayb)\to A(x,x)\ A(y,y)},A(a,abb){\displaystyle A(a,abb)} could be rewritten asA(a,a) A(b,b){\displaystyle A(a,a)\ A(b,b)}.

A proof/recognition of a stringα{\displaystyle \alpha } is done by showing thatS(α){\displaystyle S(\alpha )} produces the empty string. For the individual rewrite steps, when multiple alternative variable matches are possible, any rewrite which could lead the whole proof to succeed is considered. Thus, if there is at least one way to produce the empty string from the initial stringS(α){\displaystyle S(\alpha )}, the proof is considered a success, regardless of how many other ways to fail exist.

Example

[edit]

RCGs are capable of recognizing the non-linear index language{www:w{a,b}}{\displaystyle \{www:w\in \{a,b\}^{*}\}} as follows:

Letting x, y, and z be variable symbols:S(xyz)A(x,y,z)A(ax,ay,az)A(x,y,z)A(bx,by,bz)A(x,y,z)A(ϵ,ϵ,ϵ)ϵ{\displaystyle {\begin{aligned}S(xyz)&\to A(x,y,z)\\A(ax,ay,az)&\to A(x,y,z)\\A(bx,by,bz)&\to A(x,y,z)\\A(\epsilon ,\epsilon ,\epsilon )&\to \epsilon \end{aligned}}}The proof forabbabbabb is then

S(abbabbabb)A(abb,abb,abb)A(bb,bb,bb)A(b,b,b)A(ϵ,ϵ,ϵ)ϵ{\displaystyle S(abbabbabb)\Rightarrow A(abb,abb,abb)\Rightarrow A(bb,bb,bb)\Rightarrow A(b,b,b)\Rightarrow A(\epsilon ,\epsilon ,\epsilon )\Rightarrow \epsilon }

Or, using the more correct dotted notation for ranges:

S(abbabbabb)A(abbabbabb,abbabbabb,abbabbabb)A(abbabbabb,abbabbabb,abbabbabb){\displaystyle S(\bullet {}abbabbabb\bullet {})\Rightarrow A(\bullet {}abb\bullet {}abbabb,abb\bullet {}abb\bullet {}abb,abbabb\bullet {}abb\bullet {})\Rightarrow A(a\bullet {}bb\bullet {}abbabb,abba\bullet {}bb\bullet {}abb,abbabba\bullet {}bb\bullet {})}A(abbabbabb,abbabbabb,abbabbabb)A(ϵ,ϵ,ϵ)ϵ{\displaystyle \Rightarrow A(ab\bullet {}b\bullet {}abbabb,abbab\bullet {}b\bullet {}abb,abbabbab\bullet {}b\bullet {})\Rightarrow A(\epsilon ,\epsilon ,\epsilon )\Rightarrow \epsilon }

For a string of3n{\displaystyle 3n} letters, there are(3n+22)=(3n+2)(3n+1)2{\displaystyle {\binom {3n+2}{2}}={\frac {(3n+2)(3n+1)}{2}}} different instantiations of that first clause, but only the one which makesx,y,z{\displaystyle x,y,z} alln{\displaystyle n} letters each allows the derivation to reachϵ{\displaystyle \epsilon }.

Properties

[edit]

Everycontext-free grammar (CFG) can be converted into a range concatenation grammar:

The intersection and union of two range concatenation languages are trivially range concatenation languages:

Possibly negative range concatenation languages are also closed under set complement.

A consequence of the above is that it isundecidable whether a (positive) range concatenation language is nonempty, because it is undecidable whether the intersection of two context-free languages is nonempty. Hence range concatenation grammars are not generative.

References

[edit]
  1. ^Boullier, Pierre (Jan 1998).Proposal for a Natural Language Processing Syntactic Backbone(PDF) (Technical report). Vol. 3342. INRIA Rocquencourt (France).
  2. ^Pierre Boullier (1999)."Chinese Numbers, MIX, Scrambling, and Range Concatenation Grammars"(PDF).Proc. EACL. pp. 53–60. Archived fromthe original(PDF) on 2003-05-15.
  3. ^Eberhard Bertsch and Mark-Jan Nederhof (Oct 2001)."On the complexity of some extensions of RCG parsing"(PDF).Proceedings of the Seventh International Workshop on Parsing Technologies (Beijing). pp. 66–77.
  4. ^Laura Kallmeyer (2010).Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 37.ISBN 978-3-642-14846-0. citing Bertsch, Nederhof (2001)[3]
Each category of languages, except those marked by a*, is aproper subset of the category directly above it.Any language in each category is generated by a grammar and by an automaton in the category in the same line.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Range_concatenation_grammar&oldid=1199074837"
Categories:

[8]ページ先頭

©2009-2025 Movatter.jp