In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts and may be completely unrelated to them.
There are about seven main types of collocations: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase (phrasal verbs), and verb + adverb.
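Most of these types are defined by the parts of speech of two adjacent words, so candidate pairs of each type can be picked out of part-of-speech-tagged text by matching tag patterns. The following is a minimal sketch, assuming the text has already been tagged with Universal POS labels (ADJ, NOUN, VERB, ADV); the function name classify_pairs and the tag labels are illustrative assumptions. The verb + prepositional phrase type spans more than two words and is left out.

```python
# Two-word collocation types, keyed by (tag of first word, tag of second word).
# Assumption: tokens carry Universal POS labels; other tagsets would need mapping.
PATTERNS = {
    ("ADJ", "NOUN"): "adjective + noun",
    ("NOUN", "NOUN"): "noun + noun",
    ("NOUN", "VERB"): "noun + verb",
    ("VERB", "NOUN"): "verb + noun",
    ("ADV", "ADJ"): "adverb + adjective",
    ("VERB", "ADV"): "verb + adverb",
}

def classify_pairs(tagged):
    """Return (word1, word2, type) for each adjacent pair whose tags match a pattern."""
    found = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        label = PATTERNS.get((t1, t2))
        if label is not None:
            found.append((w1, w2, label))
    return found

tagged = [("the", "DET"), ("nuclear", "ADJ"), ("family", "NOUN"),
          ("grew", "VERB"), ("quickly", "ADV")]
for pair in classify_pairs(tagged):
    print(pair)
```

A pattern match only shows that a pair has the right shape for a given type; whether it is a collocation still depends on how often the words co-occur.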
Collocation extraction is a computational technique that finds collocations in a document or corpus, using methods from computational linguistics that resemble data mining.
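A common first step in extraction is counting how often word pairs co-occur. The sketch below counts ordered pairs in which the second word follows the first within a fixed window of tokens, and keeps those that reach a minimum frequency. The function name, the `window` and `min_count` parameters, and the whitespace tokenization in the usage lines are illustrative assumptions, not the interface of any particular tool.

```python
from collections import Counter

def cooccurrence_candidates(tokens, window=1, min_count=2):
    """Count ordered pairs (w1, w2) where w2 occurs within `window` tokens after w1.

    window=1 counts adjacent bigrams only. Pairs seen fewer than `min_count`
    times are dropped as too rare to assess.
    """
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        # Look at the next `window` tokens, stopping at the end of the text.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs[(w1, tokens[j])] += 1
    return Counter({pair: n for pair, n in pairs.items() if n >= min_count})

tokens = "we made a decision and they made a decision".split()
print(cooccurrence_candidates(tokens))            # adjacent pairs only
print(cooccurrence_candidates(tokens, window=2))  # also catches "made ... decision"
```

The surviving counts are only candidates; they would then be scored with a measure of association.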
Expanded definition
Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Terms such as crystal clear, middle management, nuclear family, and cosmetic surgery are examples of collocated pairs of words.
Collocations can be in a syntactic relation (such as verb–object, as in make a decision), a lexical relation (such as antonymy), or no linguistically defined relation at all. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation a common focus for language teaching.
Corpus linguists often study collocations through key word in context (KWIC) displays: they specify a key word and list each of its occurrences together with the words immediately surrounding it, to show how the word is used in practice.
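A basic KWIC display is simple to produce from a tokenized text. This minimal sketch assumes whitespace-tokenized input and a fixed number of context tokens on each side; the function name kwic and its `width` parameter are illustrative, and dedicated concordancing tools typically offer more, such as sorting lines by their left or right context.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword.

    Matching is case-insensitive; `width` is the number of tokens shown on each side.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

tokens = "the patient needed cosmetic surgery after the accident and surgery went well".split()
for left, key, right in kwic(tokens, "surgery", width=2):
    # Right-align the left context so the keyword forms a central column.
    print(f"{left:>20}  {key}  {right}")
```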
The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether a co-occurrence is purely due to chance or statistically significant. Because language is not random, most candidate word pairs turn out to be statistically significant, so association scores are mainly used to rank the results rather than to accept or reject them. Commonly used measures of association include mutual information, t-scores, and log-likelihood.[1][2]
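Two of these measures can be computed from four counts: the frequency of the pair, the frequency of each word, and the corpus size. The sketch below computes pointwise mutual information (the per-pair form of mutual information used for ranking) and Dunning's log-likelihood ratio G² from a 2×2 contingency table. As a simplifying assumption, the unigram counts of the two words stand in for the table's marginals; the function names and argument order are illustrative, and the counts in the usage lines are made up.

```python
import math

def contingency(c12, c1, c2, n):
    """Observed 2x2 table for a pair (w1, w2) in a corpus of n positions."""
    o11 = c12                 # w1 followed by w2
    o12 = c1 - c12            # w1 followed by some other word
    o21 = c2 - c12            # some other word followed by w2
    o22 = n - c1 - c2 + c12   # neither w1 nor w2 in the pair
    return o11, o12, o21, o22

def pmi(c12, c1, c2, n):
    """Pointwise mutual information in bits, log2 P(w1 w2) / (P(w1) P(w2)); needs c12 > 0."""
    return math.log2(c12 * n / (c1 * c2))

def log_likelihood(c12, c1, c2, n):
    """Dunning's G^2 = 2 * sum(O * ln(O / E)) over the four cells of the table."""
    o11, o12, o21, o22 = contingency(c12, c1, c2, n)
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    g2 = 0.0
    for o, r, c in ((o11, row1, col1), (o12, row1, col2),
                    (o21, row2, col1), (o22, row2, col2)):
        if o > 0:  # an empty cell contributes nothing to the sum
            e = r * c / n  # expected count under independence
            g2 += o * math.log(o / e)
    return 2 * g2

# Made-up counts: pair seen 30 times, words 200 and 150 times, 100,000 tokens.
print(f"PMI: {pmi(30, 200, 150, 100_000):.2f} bits")
print(f"G2:  {log_likelihood(30, 200, 150, 100_000):.2f}")
```

PMI tends to give very high scores to rare pairs, which is one reason log-likelihood is often preferred when ranking candidates of mixed frequency.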
Rather than select a single definition, Gledhill[3] proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates;[4][5][6] construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern,[7] or as a relation between a base and its collocative partners;[8] and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form.[9][10] These perspectives contrast with the usual way of presenting collocation in phraseological studies, where collocation is traditionally explained in terms of all three perspectives at once, along a continuum:
- Free combination ↔ bound collocation ↔ frozen idiom
In dictionaries
In 1933, Harold Palmer's Second Interim Report on English Collocations highlighted the importance of collocation as a key to producing natural-sounding language for anyone learning a foreign language.[11] Thus, from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became "less word-centred and more phrase-centred",[12] more attention was paid to collocation. From the beginning of the 21st century, this trend was supported by the availability of large text corpora and intelligent corpus-querying software, which made it possible to give a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the Macmillan English Dictionary and the Longman Dictionary of Contemporary English began to include boxes or panels listing frequent collocations.[13]
There are also a number of specialized dictionaries devoted to describing the frequent collocations in a language.[14] These include Redes: Diccionario combinatorio del español contemporáneo (2004) for Spanish, Le Robert: Dictionnaire des combinaisons de mots (2007) for French, and the LTP Dictionary of Selected Collocations (1997) and the Macmillan Collocations Dictionary (2010) for English.[15]
Statistically significant collocation
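Student's t-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant.[16] For a bigram $w_1 w_2$, let $P(w_1) = \#w_1 / N$ be the unconditional probability of occurrence of $w_1$ in a corpus with size $N$, where $\#w_1$ is the number of occurrences of $w_1$, and let $P(w_2) = \#w_2 / N$ be the unconditional probability of occurrence of $w_2$ in the corpus. The t-score for the bigram $w_1 w_2$ is calculated as:

$$t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}} \approx \frac{\#(w_1 w_2)/N - P(w_1)\,P(w_2)}{\sqrt{\#(w_1 w_2)/N^2}}$$

where $\bar{x} = \#(w_1 w_2)/N$ is the sample mean of the occurrence of $w_1 w_2$, $\#(w_1 w_2)$ is the number of occurrences of $w_1 w_2$, $\mu = P(w_1)\,P(w_2)$ is the probability of $w_1 w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in the text, and $s^2 = \bar{x}(1 - \bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the t-test is equivalent to a Z-test.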
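The t-score can be sketched directly from the counts it uses: the bigram count, the count of each word, and the corpus size N. This sketch keeps the exact Bernoulli variance x̄(1 − x̄), where x̄ is the bigram's relative frequency, rather than approximating the variance by x̄; the difference is negligible for rare bigrams. The function name is an illustrative assumption, and the counts in the usage lines are made up.

```python
import math

def t_score(c12, c1, c2, n):
    """t-score of bigram w1 w2 against the null hypothesis of independence.

    c12: occurrences of the bigram (must be > 0); c1, c2: occurrences of
    w1 and w2; n: corpus size in tokens.
    """
    x_bar = c12 / n              # sample mean: relative frequency of the bigram
    mu = (c1 / n) * (c2 / n)     # mean expected if w1 and w2 were independent
    s2 = x_bar * (1 - x_bar)     # Bernoulli sample variance (close to x_bar)
    return (x_bar - mu) / math.sqrt(s2 / n)

# Made-up counts: bigram 20 times, words 100 and 200 times, 100,000 tokens.
t = t_score(20, 100, 200, 100_000)
# 2.576 is the standard normal critical value for a one-sided test at alpha = 0.005.
print(f"t = {t:.3f}, significant: {t > 2.576}")
```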
See also
- English collocations
- Agreement (linguistics)
- Cliché
- Collocational restriction
- Collostructional analysis
- Compound noun, adjective and verb
- Government (linguistics)
- Idiom (language structure)
- Irreversible binomial
- Isocolon
- Lexical item
- N-gram
- Phrasal verb
- Phraseology
- Phraseme
- Sketch Engine
- Statistically improbable phrase
- Word sketch
References
- Dunning, T. (1993). "Accurate methods for the statistics of surprise and coincidence". Computational Linguistics 19, 1 (Mar. 1993), 61–74.
- Dunning, T. (2008-03-21). "Surprise and Coincidence". blogspot.com. Retrieved 2012-04-09.
- Gledhill, C. (2000). Collocations in Science Writing. Tübingen: Narr.
- Firth, J. R. (1957). Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
- Sinclair, J. (1996). "The Search for Units of Meaning". Textus IX, 75–106.
- Smadja, F. A. & McKeown, K. R. (1990). "Automatically extracting and representing collocations for language generation". Proceedings of ACL'90, 252–259, Pittsburgh, Pennsylvania.
- Hunston, S. & Francis, G. (2000). Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam: John Benjamins.
- Hausmann, F. J. (1989). "Le dictionnaire de collocations". In Hausmann, F. J., Reichmann, O., Wiegand, H. E. & Zgusta, L. (eds), Wörterbücher: ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New York: De Gruyter, 1010–1019.
- Moon, R. (1998). Fixed Expressions and Idioms: A Corpus-Based Approach. Oxford: Oxford University Press.
- Frath, P. & Gledhill, C. (2005). "Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units". Recherches anglaises et nord-américaines 38, 25–43.
- Cowie, A. P. (1999). English Dictionaries for Foreign Learners. Oxford: Oxford University Press, 54–56.
- Béjoint, H. (2010). The Lexicography of English. Oxford: Oxford University Press, 318.
- "MED Second Edition – Key features – Macmillan". macmillandictionaries.com. Retrieved 2011-08-24.
- Herbst, T. & Klotz, M. (2009). "Syntagmatic and Phraseological Dictionaries". In Cowie, A. P. (ed.), The Oxford History of English Lexicography, part 2, 234–243.
- "Macmillan Collocation Dictionary – How it was written – Macmillan". macmillandictionaries.com. Retrieved 2011-08-24.
- Manning, C. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 163–166. ISBN 0262133601.
External links
- Ozdic Collocation Dictionary
- A Small System Storing Spanish Collocations (Igor A. Bolshakov & Sabino Miranda-Jiménez)
- Morphological characterization of collocations and semantic relationships in Spanish (Sabino Miranda-Jiménez & Igor A. Bolshakov)
- Example of collocations for the word "Surgery" at wordassociations.net