Coreference

From Wikipedia, the free encyclopedia

Two or more expressions in a text with the same referent

In linguistics, coreference, sometimes written co-reference, occurs when two or more expressions refer to the same person or thing; they have the same referent. For example, in "Bill said Alice would arrive soon, and she did", the words "Alice" and "she" refer to the same person.^[1]

Coreference is often non-trivial to determine. For example, in "Bill said he would come", the word "he" may or may not refer to Bill. Determining which expressions are coreferential is an important part of analyzing or understanding the meaning of a text. It often requires information from the context and real-world knowledge, such as the tendency of some names to be associated with particular species ("Rover") or kinds of artifacts ("Titanic"), grammatical gender, or other properties.

Linguists commonly use indices to notate coreference, as in "Bill_i said he_i would come". Such expressions are said to be coindexed, indicating that they should be interpreted as coreferential.
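The coindexing notation can be read mechanically. The following is a minimal sketch, assuming the plain-text convention "word_i" used in this article, single-word expressions, and a hypothetical helper name `coindexed_groups`; it is not a standard tool.

```python
import re
from collections import defaultdict

# Assumed convention: an expression is one alphabetic word followed by
# an underscore and an index (e.g. "Bill_i"). Multi-word expressions
# such as "the music_i" would only contribute their last word.
COINDEXED = re.compile(r"\b([A-Za-z]+)_([A-Za-z0-9]+)\b")

def coindexed_groups(text):
    """Group coindexed expressions by their index, in order of occurrence."""
    groups = defaultdict(list)
    for expression, index in COINDEXED.findall(text):
        groups[index].append(expression)
    return dict(groups)

print(coindexed_groups("Bill_i said he_i would come"))
```

Expressions that share an index end up in the same list, which is the set of expressions the notation marks as coreferential (or, as discussed later, as variable and binder).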

When expressions are coreferential, the first to occur is often a full or descriptive form (for example, an entire personal name, perhaps with a title and role), while later occurrences use shorter forms (for example, just a given name, surname, or pronoun). The earlier occurrence is known as the antecedent and the other is called a proform, anaphor, or reference. However, pronouns can sometimes refer forward, as in "When she arrived home, Alice went to sleep." In such cases, the coreference is called cataphoric rather than anaphoric.

Coreference is important for binding phenomena in the field of syntax. The theory of binding explores the syntactic relationship that exists between coreferential expressions in sentences and texts.

Types

When exploring coreference, numerous distinctions can be made, e.g. anaphora, cataphora, split antecedents, coreferring noun phrases, etc.^[2] Several of these more specific phenomena are illustrated here:

Anaphora:
a. The music_i was so loud that it_i couldn't be enjoyed. – The anaphor "it" follows the expression to which it refers (its antecedent).
b. Our neighbors_i dislike the music. If they_i are angry, the cops will show up soon. – The anaphor "they" follows the expression to which it refers (its antecedent).

Cataphora:
a. If they_i are angry about the music, the neighbors_i will call the cops. – The cataphor "they" precedes the expression to which it refers (its postcedent).
b. Despite her_i difficulty, Wilma_i came to understand the point. – The cataphor "her" precedes the expression to which it refers (its postcedent).

Split antecedents:
a. Carol_i told Bob_j to attend the party. They_i+j arrived together. – The anaphor "they" has a split antecedent, referring to both Carol and Bob.
b. When Carol_i helps Bob_j and Bob_j helps Carol_i, they_i+j can accomplish any task. – The anaphor "they" has a split antecedent, referring to both Carol and Bob.

Coreferring noun phrases:
a. The project leader_i is refusing to help. The jerk_i thinks only of himself_i. – Coreferring noun phrases, whereby the second noun phrase is a predication over the first.
b. Some of our colleagues_i are going to be supportive. These kinds of people_i will earn our gratitude. – Coreferring noun phrases, whereby the second noun phrase is a predication over the first.

Relation to bound variables

Semanticists and logicians sometimes draw a distinction between coreference and what is known as a bound variable.^[3] Bound variables occur when the antecedent to the proform is an indefinite quantified expression, e.g.^[4]

Every student_i has received his_i grade. – The pronoun "his" is an example of a bound variable.
No student_i was upset with his_i grade. – The pronoun "his" is an example of a bound variable.

Quantified expressions such as "every student" and "no student" are not considered referential. These expressions are grammatically singular but do not pick out single referents in the discourse or real world. Thus, the antecedents to "his" in these examples are not properly referential, and neither is "his". Instead, it is considered a variable that is bound by its antecedent. Its reference varies based upon which of the students in the discourse world is thought of. The existence of bound variables is perhaps more apparent with the following example:

Only Jack_i likes his_i grade. – The pronoun "his" can be a bound variable.

This sentence is ambiguous. It can mean that Jack likes his grade but everyone else dislikes Jack's grade; or that no one likes their own grade except Jack. In the first meaning, "his" is coreferential; in the second, it is a bound variable because its reference varies over the set of all students.
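The two readings can be written out as a rough first-order sketch. The predicate likes, the function grade, and the constant j (for Jack) are notational assumptions introduced here for illustration and are not taken from the cited sources. The background claim that Jack likes his own grade is left aside.

```latex
% Coreferential reading: "his" is fixed to Jack; only the liker x varies.
\forall x\,\bigl(\mathrm{likes}(x, \mathrm{grade}(j)) \rightarrow x = j\bigr)

% Bound-variable reading: "his" co-varies with the liker x.
\forall x\,\bigl(\mathrm{likes}(x, \mathrm{grade}(x)) \rightarrow x = j\bigr)
```

The only difference is the argument of grade: a constant in the first formula and the quantified variable in the second.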

Coindex notation is commonly used for both cases. That is, when two or more expressions are coindexed, it does not signal whether one is dealing with coreference or a bound variable (or as in the last example, whether it depends on interpretation).

Coreference resolution

In computational linguistics, coreference resolution is a well-studied problem in discourse. To derive the correct interpretation of a text, or even to estimate the relative importance of the various subjects mentioned, pronouns and other referring expressions must be connected to the right individuals. Algorithms intended to resolve coreferences commonly look first for the nearest preceding individual that is compatible with the referring expression. For example, "she" might attach to a preceding expression such as "the woman" or "Anne", but is much less likely to attach to "Bill". Pronouns such as "himself" have much stricter constraints. As with many linguistic tasks, there is a tradeoff between precision and recall. Cluster-quality metrics commonly used to evaluate coreference resolution algorithms include the Rand index, the adjusted Rand index, and various mutual information-based methods.
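The "nearest compatible preceding individual" heuristic can be sketched in a few lines. This is a toy illustration, not a real resolver: the `Mention` class, the hand-assigned gender and number features, and the small pronoun table are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Mention:
    text: str
    gender: str  # "f", "m", or "n"; hand-assigned toy feature
    number: str  # "sg" or "pl"

# Toy pronoun lexicon (an assumption for this sketch).
# A gender of None means any gender is acceptable.
PRONOUNS = {
    "she": ("f", "sg"),
    "he": ("m", "sg"),
    "it": ("n", "sg"),
    "they": (None, "pl"),
}

def compatible(pronoun: str, candidate: Mention) -> bool:
    """Check gender and number agreement between pronoun and candidate."""
    gender, number = PRONOUNS[pronoun]
    gender_ok = gender is None or gender == candidate.gender
    return gender_ok and number == candidate.number

def nearest_antecedent(pronoun: str, preceding: List[Mention]) -> Optional[Mention]:
    """Return the closest preceding mention that agrees with the pronoun."""
    for candidate in reversed(preceding):  # nearest mention first
        if compatible(pronoun, candidate):
            return candidate
    return None

# Mentions in the order they occur before the pronoun.
mentions = [Mention("Anne", "f", "sg"), Mention("Bill", "m", "sg")]
print(nearest_antecedent("she", mentions))
```

Here "she" skips over the nearer mention "Bill" because the gender feature does not match, and attaches to "Anne". Practical systems treat such agreement as evidence to be weighed rather than as a hard filter.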

A particular problem for coreference resolution in English is the pronoun "it", which has many uses. "It" can refer much like "he" and "she", except that it generally refers to inanimate objects (the rules are actually more complex: animals may be any of "it", "he", or "she"; ships are traditionally "she"; hurricanes are usually "it" despite having gendered names). "It" can also refer to abstractions rather than beings, e.g. "He was paid minimum wage, but didn't seem to mind it." Finally, "it" also has pleonastic uses, which do not refer to anything specific:

It's raining.
It's really a shame.
It takes a lot of work to succeed.
Sometimes it's the loudest who have the most influence.

Pleonastic uses are not considered referential, and so are not part of coreference.^[5]
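A resolver therefore needs some way to set pleonastic "it" aside before linking mentions. The following is a deliberately crude pattern-based sketch covering only the four example sentences above. The patterns and the function name `looks_pleonastic` are assumptions made for illustration, and this is not the web-based method of Li et al. (2009). It will both miss many pleonastic uses and misfire on some referential ones.

```python
import re

# Hand-written toy patterns, one per example type (all assumptions).
PLEONASTIC_PATTERNS = [
    # Weather verbs: "It's raining."
    re.compile(r"\bit(?:'s| is| was) (?:raining|snowing)\b", re.I),
    # Evaluative nouns: "It's really a shame."
    re.compile(r"\bit(?:'s| is| was) (?:\w+ )?a (?:shame|pity)\b", re.I),
    # Extraposed infinitive: "It takes a lot of work to succeed."
    re.compile(r"\bit takes\b.*\bto \w+", re.I),
    # Cleft: "Sometimes it's the loudest who have the most influence."
    re.compile(r"\bit(?:'s| is| was)\b.*\b(?:who|that)\b", re.I),
]

def looks_pleonastic(sentence: str) -> bool:
    """Return True if any toy pattern suggests a non-referential 'it'."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)

for s in ["It's raining.", "He didn't seem to mind it."]:
    print(s, looks_pleonastic(s))
```

A sentence flagged by this filter would simply contribute no mention for "it" to the later linking step.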

Approaches to coreference resolution can broadly be separated into mention-pair, mention-ranking, and entity-based algorithms. Mention-pair algorithms make a binary decision about whether a given pair of mentions belongs to the same entity. Entity-wide constraints such as gender are not considered, which leads to error propagation. For example, the pronouns "he" and "she" can each have a high probability of coreference with "the teacher", but cannot be coreferent with each other. Mention-ranking algorithms expand on this idea but stipulate that a mention can be coreferent with only one (previous) mention. As a result, each previous mention must be given a score, and the highest-scoring mention (or no mention) is linked. Finally, in entity-based methods, mentions are linked based on information about the whole coreference chain rather than individual mentions. Representing a variable-width chain is more complex and computationally expensive than mention-based methods, which has led to these algorithms being mostly based on neural network architectures.
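The difference between mention-pair and mention-ranking decisions can be illustrated with hand-picked scores. Everything in this sketch is hypothetical: the mentions, the score values, the 0.5 threshold, and the function names. A real system would obtain the scores from a trained model.

```python
from typing import Dict, List, Optional, Tuple

# Mentions in textual order, and made-up scores for each
# (earlier mention, later mention) pair.
MENTIONS: List[str] = ["Anne", "the teacher", "she"]
SCORES: Dict[Tuple[str, str], float] = {
    ("Anne", "the teacher"): 0.1,
    ("Anne", "she"): 0.6,
    ("the teacher", "she"): 0.8,
}

def mention_pair_links(threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Mention-pair: decide every pair independently of the others."""
    return [pair for pair, score in SCORES.items() if score > threshold]

def mention_ranking_links(no_link_score: float = 0.5) -> Dict[str, Optional[str]]:
    """Mention-ranking: link each mention to its single best-scoring
    earlier mention, or to nothing if no score beats no_link_score."""
    links: Dict[str, Optional[str]] = {}
    for j, mention in enumerate(MENTIONS):
        best, best_score = None, no_link_score
        for earlier in MENTIONS[:j]:
            score = SCORES[(earlier, mention)]
            if score > best_score:
                best, best_score = earlier, score
        links[mention] = best
    return links

print(mention_pair_links())
print(mention_ranking_links())
```

With these numbers the mention-pair decisions link "she" to both "Anne" and "the teacher", which implicitly merges two mentions whose own pair score is only 0.1. The ranking variant links "she" to "the teacher" alone. Neither variant looks at the whole chain, which is the gap entity-based methods address.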

Notes

^[1] For definitions of coreference, see for instance Crystal (1997:94) and Radford (2004:332).
^[2] These distinctions (anaphora, cataphora, split antecedents, coreferring noun phrases, etc.) are discussed in Jurafsky and Martin (2000:669ff).
^[3] For discussions of bound variables, see for instance Portner (2005:102ff.).
^[4] See Jurafsky and Martin (2000:701) for an example of a bound variable like the ones given here.
^[5] Li et al. (2009) have demonstrated high accuracy in sorting out pleonastic "it", and this success promises to improve the accuracy of coreference resolution overall.

References

Crystal, D. 1997. A dictionary of linguistics and phonetics. 4th edition. Cambridge, MA: Blackwell Publishing.
Jurafsky, D. and J. H. Martin 2000. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. New Delhi, India: Pearson Education.
Portner, P. 2005. What is semantics?: Fundamentals of formal semantics. Malden, MA: Blackwell Publishing.
Radford, A. 2004. English syntax: An introduction. Cambridge, UK: Cambridge University Press.
Li, Y., P. Musilek, M. Reformat, and L. Wyard-Scott 2009. Identification of pleonastic it using the web. Journal of Artificial Intelligence Research 34, 339–389.
