Movatterモバイル変換

Tree-adjoining grammar

From Wikipedia, the free encyclopedia

Grammar formalism

Tree-adjoining grammar (TAG) is agrammar formalism defined byAravind Joshi. Tree-adjoining grammars are somewhat similar tocontext-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (seetree (graph theory) andtree (data structure)).

History

[edit]

TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG),^[1] the "string grammar" ofZellig Harris.^[2] AGs handleexocentric properties of language in a natural and effective way, but do not have a good characterization ofendocentric constructions; the converse is true ofrewrite grammars, orphrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from theChomsky-Schützenberger hierarchy but intersects it in interesting and linguistically relevant ways.^[3] The center strings and adjunct strings can also be generated by adependency grammar, avoiding the limitations of rewrite systems entirely.^[4]^[5]

Description

[edit]

Schematic illustration of the adjunction operation: the tree $\alpha$ combines with an auxiliary tree $\beta$ at a node labelled with non-terminal symbol $X {\displaystyle X}$ , which must also be the root and foot node of the auxiliary tree. The resulting tree is deeper than the original one.

{\displaystyle \alpha } — Schematic illustration of the adjunction operation: the tree $\alpha$ combines with an auxiliary tree $\beta$ at a node labelled with non-terminal symbol $X {\displaystyle X}$ , which must also be the root and foot node of the auxiliary tree. The resulting tree is deeper than the original one.

Schematic illustration of the substitution operation: two trees ( $\alpha$ and $\beta$ ; note that these need not beelementary trees) are joined on a node labelled with non-terminal $X {\displaystyle X}$ ; this node is one of the leaf nodes marked for substitution in $\alpha$ , and the root of $\beta$ is a node with the same non-terminal.

A TAG can be defined as a 5-tuple $\langle \Sigma ,NT,I,A,S\rangle$ with:^[6]

$\Sigma$ as thefinite set ofterminal symbols;
$N T {\displaystyle NT}$ as the finite set of non-terminal symbols, disjunct from $\Sigma$ ;
$I {\displaystyle I}$ as a finite set of finite trees calledinitial trees;
- Initial trees have non-terminals as inner nodes. The frontier can consist of terminals and non-terminals. Non-terminals at the frontier are marked for substitution (typically by adding the symbol ' $\downarrow$ ' after the non-terminal symbol). Nodes marked for substitution cannot be adjoined to.
$A {\displaystyle A}$ as a finite set of finite trees calledauxiliary trees;
- Auxiliary trees have a special leaf node known as thefoot node (typically marked by ' $\ast$ ') which needs to have the same non-terminal symbol as the root of the tree. Foot nodes cannot be substituted; all other non-terminal nodes at the frontier are marked for substitution. As in initial trees, inner nodes have non-terminal symbols.
$S {\displaystyle S}$ as the special start symbol, belonging to the set of non-terminals.

Additionally, TAGs with adjunction constraints on nodes have been introduced. An adjunction constraint on a node can: completely disallow adjunction (NA, fornull adjunction); make itobligatory (OA); or only allowselected auxiliary trees to adjoin (SA).^[6]

The two types of basic tree in TAG—initial trees (often denoted by ' $\alpha$ ') and auxiliary trees (' $\beta$ ')—are together calledelementary trees. Initial trees represent basic valency relations, while auxiliary trees allow for recursion.^[7]

A derivation starts with an initial tree, which is combined with further trees via eithersubstitution oradjunction. Substitution replaces a frontier node with an initial tree whose root node has the same label as the leaf for which it is substituted. Adjunction inserts an auxiliary tree—at either a frontieror an internal node—whose root and foot labels both match the label of the node whereat it adjoins. Adjunction can thus have the effect of inserting an auxiliary tree into the center of another tree, which operation may be applied recursively.^[4]

Complexity and application

[edit]

For everycontext-free grammar, a tree-adjoining grammar can be generated which accepts the same string-language. Thus, TAGs can generate allcontext-free languages;^[8] they can generate, as well, some—but not all—context-sensitive languages.

Two examples of context-sensitive/non-context-free languages that TAGs (with adjunction constraints) can generate are:^[8]

Thecopy language (i.e. thelanguage of squares), in which an arbitrary string is repeated: $\left\{ww\mid w\in \Sigma ^{*}\right\}$
The three elementary trees necessary to generate the copy language with the alphabet containing only lettersa andb.
Thecount-4 language: $\{a^{n}b^{n}c^{n}d^{n}|1\leq n\}$
Trees necessary to generate the count-4 language including theempty word $\epsilon$ , formally defined as ${\text{Count}}_{4}=\left\{a^{n}b^{n}c^{n}d^{n}\mid n\geq 0\right\}$ .

Tree-adjoining grammars are more powerful (in terms ofweak generative capacity) than context-free grammars, but less powerful thanlinear context-free rewriting systems,^[9]indexed,^{[note 1]} orcontext-sensitive grammars.

Two examples of context-sensitive languages that TAGs cannot generate are:^[8]

Thelanguage of triplicated strings (i.e. thelanguage of cubes): $\left\{www\mid w\in \Sigma ^{*}\right\}$
Languages with more than four distinct character-strings of equal length; e.g., thecount-5 language: $\{a^{n}b^{n}c^{n}d^{n}f^{n}|1\leq n\}$

The processing of languages that TAGs can generate may be represented by anembedded pushdown automaton.

Tree-adjoining grammars are often described asmildly context-sensitive.These grammar classes are conjectured to be powerful enough to modelnatural languages while remaining efficientlyparsable in the general case.^[8]

Equivalences

[edit]

Vijay-Shanker and Weir (1994)^[10] demonstrated thatlinear indexed grammars,combinatory categorial grammar, tree-adjoining grammars, andhead grammars areweakly equivalent formalisms, in that they all define the same string languages.

Variants

[edit]

Lexicalized tree-adjoining grammars (LTAG) are a variant of TAG in which each elementary tree (initial or auxiliary) is associated with a lexical item. Each tree has at least one terminal as a leaf node, which is then called the (lexical) anchor of the tree. A lexicalized grammar for English has been developed by the XTAG Research Group of the Institute for Research in Cognitive Science at the University of Pennsylvania.^[5]

Other variants of TAG allowmulti-component trees, trees with multiple foot nodes, and other extensions.

Notes

[edit]

^This, since—for each tree-adjoining grammar—alinear indexed grammar can be found that produces the same language (seebelow); and, for the latter, a weakly equivalent (proper) indexed grammar can be found in turn (see:Indexed grammar#Computational Power).

References

[edit]

^Joshi, Aravind; S. R. Kosaraju; H. Yamada (1969). "String Adjunct Grammars" (Document). Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada.Joshi, Aravind K.; Kosaraju, S. Rao; Yamada, H. M. (1972), "String Adjunct Grammars: I. Local and Distributed Adjunction",Information and Control,21 (2):93–116,doi:10.1016/S0019-9958(72)90051-4Joshi, Aravind K.; Kosaraju, S. Rao; Yamada, H. M. (1972), "String Adjunct Grammars: II. Equational Representation, Null Symbols, and Linguistic Relevance",Information and Control,21 (3):235–260,doi:10.1016/S0019-9958(72)80005-6
^Harris, Zellig S. (1962).String analysis of sentence structure. Papers on Formal Linguistics. Vol. 1. The Hague: Mouton & Co.
^Joshi, Aravind (1969). "Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance" (Document). Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden.
^^a ^bJoshi, Aravind; Owen Rambow (2003)."A Formalism for Dependency Grammar Based on Tree Adjoining Grammar"(PDF).Proceedings of the Conference on Meaning-Text Theory.
^^a ^b"A Lexicalized Tree Adjoining Grammar for English".
^^a ^bJoshi, Aravind K. Joshi; Shabes, Yves (March 1991).Tree-adjoning grammars and lexicalized grammars.MS-CIS-91-22 (Technical report). Department of Computer and Information Science, University of Pennsylvania.
^Jurafsky, Daniel; James H. Martin (2000).Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. p. 354.
^^a ^b ^c ^dJoshi, Aravind (1985). "How much context-sensitivity is necessary for characterizing structural descriptions". In D. Dowty; L. Karttunen; A. Zwicky (eds.).Natural Language Processing: Theoretical, Computational, and Psychological Perspectives. New York, NY: Cambridge University Press. pp. 206–250.ISBN 9780521262033.
^Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer. Here: p.215-216
^Vijay-Shanker, K. and Weir, David J. 1994.The Equivalence of Four Extensions of Context-Free Grammars. Mathematical Systems Theory 27(6): 511–546.

External links

[edit]

The XTAG project, which uses a TAG for natural language processing.
A tutorial on TAG
SemConst Documentation A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
The TuLiPa project The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly formulti-component tree adjoining grammars withtree tuples
The Metagrammar Toolkit which provides several tools to edit and compileMetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
LLP2 Alexicalized tree adjoining grammar parser which provides an easy to use graphical environment (page in French)

Automata theory:formal languages andformal grammars

Chomsky hierarchy	Grammars	Languages	Abstract machines
Type-0 — Type-1 — — — — — Type-2 — — Type-3 — —	Unrestricted (no common name) Context-sensitive Positiverange concatenation Indexed — Linear context-free rewriting systems Tree-adjoining Context-free Deterministic context-free Visibly pushdown Regular — Non-recursive	Recursively enumerable Decidable Context-sensitive Positiverange concatenation^* Indexed^* — Linear context-free rewriting language Tree-adjoining Context-free Deterministic context-free Visibly pushdown Regular Star-free Finite	Turing machine Decider Linear-bounded PTIME Turing Machine Nested stack Thread automaton restrictedTree stack automaton Embedded pushdown Nondeterministic pushdown Deterministic pushdown Visibly pushdown Finite Counter-free (with aperiodic finite monoid) Acyclic finite

Each category of languages, except those marked by a^*, is aproper subset of the category directly above it.Any language in each category is generated by a grammar and by an automaton in the category in the same line.