Wikipedia

Collocation

This article is about the corpus linguistics notion. For other uses, see Colocation (disambiguation).

In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated.

There are about seven main types of collocations: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase (phrasal verbs), and verb + adverb.

Collocation extraction is a computational technique that finds collocations in a document or corpus, using various computational linguistics techniques that resemble data mining.
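As a rough illustration of the first step of such an extraction, the sketch below counts ordered word pairs that occur within a small window of each other and keeps those seen at least a minimum number of times. The function name `candidate_pairs`, the window size, the frequency threshold, and the toy sentence are all assumptions made for this example; real systems typically add tokenization, stop-word filtering, and an association measure on top of raw counts.

```python
from collections import Counter

def candidate_pairs(tokens, window=2, min_count=2):
    """Count ordered word pairs that co-occur within `window` tokens.

    Hypothetical helper: returns (pair, count) tuples, most frequent first,
    keeping only pairs seen at least `min_count` times.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with each of the next `window` tokens.
        for right in tokens[i + 1 : i + 1 + window]:
            counts[(left, right)] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

# Toy corpus (made up for illustration).
text = "we make a decision and then we make a decision again"
pairs = candidate_pairs(text.split(), window=2, min_count=2)
for pair, n in pairs:
    print(pair, n)
```

Raw frequency alone tends to favour pairs of very common words, which is why candidates are usually re-ranked with a measure of association.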

Expanded definition

Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Such terms as crystal clear, middle management, nuclear family, and cosmetic surgery are examples of collocated pairs of words.

Collocations can be in a syntactic relation (such as verb–object: make and decision), lexical relation (such as antonymy), or they can be in no linguistically defined relation. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation a common focus for language teaching.

Corpus linguists specify a key word in context (KWIC) and identify the words immediately surrounding it, to illustrate the way words are used in practice.
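A key-word-in-context listing can be sketched in a few lines. The function `kwic`, its `width` parameter, and the sample sentence below are assumptions for illustration only; concordance tools used in corpus linguistics offer far richer options such as sorting and lemma-based search.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper: `width` is the number of tokens shown on each side.
    Matching is case-insensitive on whole tokens.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Made-up example sentence, pre-split on whitespace.
tokens = "She had to make a decision quickly , but the decision was hard".split()
for left, kw, right in kwic(tokens, "decision", width=3):
    # Right-align the left context so the key word lines up in a column.
    print(f"{left:>20} | {kw} | {right}")
```

Aligning the key word in a central column is what lets recurring neighbours, such as "make a" before "decision", stand out visually.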

The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether the co-occurrence is purely by chance or statistically significant. Due to the non-random nature of language, most collocations are classed as significant, and the association scores are simply used to rank the results. Commonly used measures of association include mutual information, t scores, and log-likelihood.[1][2]
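Two of these measures can be sketched from four counts: the bigram frequency, the frequency of each word, and the corpus size. The code below assumes that "mutual information" is meant in its pointwise form, as is usual in collocation work, and that log-likelihood refers to the G² statistic over a 2×2 contingency table. The function names and all counts are hypothetical.

```python
import math

def pmi(c12, c1, c2, n):
    """Pointwise mutual information (base 2) of a bigram.

    c12: bigram count, c1 and c2: counts of each word, n: corpus size.
    """
    return math.log2((c12 * n) / (c1 * c2))

def log_likelihood(c12, c1, c2, n):
    """G^2 statistic over the 2x2 contingency table of a bigram."""
    # Observed cells: (w1, w2), (w1, not w2), (not w1, w2), (neither).
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    # Expected cells under independence: row total * column total / n.
    expected = [
        c1 * c2 / n,
        c1 * (n - c2) / n,
        (n - c1) * c2 / n,
        (n - c1) * (n - c2) / n,
    ]
    # Empty observed cells contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts: two words seen 10 times each, 8 times together,
# in a corpus of 1000 tokens.
print(pmi(8, 10, 10, 1000))
print(log_likelihood(8, 10, 10, 1000))
```

When the observed bigram count equals the count expected under independence, both scores are zero; larger values indicate stronger association, which is what makes them usable for ranking.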

Rather than select a single definition, Gledhill[3] proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates;[4][5][6] construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern,[7] or as a relation between a base and its collocative partners;[8] and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form.[9][10] These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally speaking, collocation is explained in terms of all three perspectives at once, in a continuum:

Free combination ↔ bound collocation ↔ frozen idiom

In dictionaries

In 1933, Harold Palmer's Second Interim Report on English Collocations highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a foreign language.[11] Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became "less word-centred and more phrase-centred",[12] more attention was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text corpora and intelligent corpus-querying software, making it possible to provide a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the Macmillan English Dictionary and the Longman Dictionary of Contemporary English included boxes or panels with lists of frequent collocations.[13]

There are also a number of specialized dictionaries devoted to describing the frequent collocations in a language.[14] These include (for Spanish) Redes: Diccionario combinatorio del español contemporaneo (2004), (for French) Le Robert: Dictionnaire des combinaisons de mots (2007), and (for English) the LTP Dictionary of Selected Collocations (1997) and the Macmillan Collocations Dictionary (2010).[15]

Statistically significant collocation

Student's t-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant.[16] For a bigram $w_1 w_2$, let $P(w_1) = \frac{\#w_1}{N}$ be the unconditional probability of occurrence of $w_1$ in a corpus of size $N$, and let $P(w_2) = \frac{\#w_2}{N}$ be the unconditional probability of occurrence of $w_2$ in the corpus. The t-score for the bigram $w_1 w_2$ is calculated as:

$$t = \frac{\bar{x} - \mu}{\sqrt{\frac{s^2}{N}}},$$

where $\bar{x} = \frac{\#w_1 w_2}{N}$ is the sample mean of the occurrence of $w_1 w_2$, $\#w_1 w_2$ is the number of occurrences of $w_1 w_2$, $\mu = P(w_1)P(w_2)$ is the probability of $w_1 w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in the text, and $s^2 = \bar{x}(1 - \bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the t-test is equivalent to a Z-test.
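The t-score formula can be checked numerically with a short sketch. The function name `t_score` and the counts used below are illustrative assumptions, not values taken from this article; the critical value 2.576 is the usual threshold for a confidence level of 0.005 and is shown only as one possible choice.

```python
import math

def t_score(c12, c1, c2, n):
    """t-score of a bigram from raw counts.

    c12: occurrences of the bigram, c1 and c2: occurrences of each word,
    n: corpus size (all values here are hypothetical inputs).
    """
    x_bar = c12 / n                # sample mean: observed bigram probability
    mu = (c1 / n) * (c2 / n)       # expected probability under independence
    s2 = x_bar * (1 - x_bar)       # sample variance, approximately x_bar
    return (x_bar - mu) / math.sqrt(s2 / n)

# Illustrative counts: a bigram seen 8 times whose words occur
# 15828 and 4675 times in a corpus of 14307668 tokens.
t = t_score(8, 15828, 4675, 14307668)
print(f"t = {t:.4f}")
print("reject independence" if t > 2.576 else "cannot reject independence")
```

With these counts the score comes out close to 1, well below the chosen critical value, so the pair would not be treated as a collocation even though it occurs several times.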


References

  1. Dunning, Ted (1993): "Accurate methods for the statistics of surprise and coincidence" (archived 2012-08-05 at the Wayback Machine). Computational Linguistics 19, 1 (Mar. 1993), 61–74.
  2. Dunning, Ted (2008-03-21). "Surprise and Coincidence". blogspot.com. Archived from the original on 2012-01-20. Retrieved 2012-04-09.
  3. Gledhill C. (2000): Collocations in Science Writing (archived 2023-06-29 at the Wayback Machine), Narr, Tübingen.
  4. Firth J.R. (1957): Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
  5. Sinclair J. (1996): "The Search for Units of Meaning", in Textus, IX, 75–106.
  6. Smadja F. A. & McKeown, K. R. (1990): "Automatically extracting and representing collocations for language generation" (archived 2015-09-06 at the Wayback Machine), Proceedings of ACL'90, 252–259, Pittsburgh, Pennsylvania.
  7. Hunston S. & Francis G. (2000): Pattern Grammar — A Corpus-Driven Approach to the Lexical Grammar of English (archived 2023-06-29 at the Wayback Machine), Amsterdam, John Benjamins.
  8. Hausmann F. J. (1989): Le dictionnaire de collocations. In Hausmann F.J., Reichmann O., Wiegand H.E., Zgusta L. (eds), Wörterbücher: ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New-York: De Gruyter. 1010–1019.
  9. Moon R. (1998): Fixed Expressions and Idioms, a Corpus-Based Approach. Oxford, Oxford University Press.
  10. Frath P. & Gledhill C. (2005): "Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units", in Recherches anglaises et Nord-américaines, vol. 38: 25–43.
  11. Cowie, A.P., English Dictionaries for Foreign Learners, Oxford University Press 1999: 54–56.
  12. Bejoint, H., The Lexicography of English, Oxford University Press 2010: 318.
  13. "MED Second Edition – Key features – Macmillan". macmillandictionaries.com. Archived from the original on 2020-09-28. Retrieved 2011-08-24.
  14. Herbst, T. and Klotz, M. 'Syntagmatic and Phraseological Dictionaries' in Cowie, A.P. (Ed.) The Oxford History of English Lexicography, 2009: part 2, 234–243.
  15. "Macmillan Collocation Dictionary – How it was written - Macmillan". macmillandictionaries.com. Archived from the original on 2018-12-21. Retrieved 2011-08-24.
  16. Manning, Chris; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 163–166. ISBN 0262133601.

External links

Look up collocation in Wiktionary, the free dictionary.
