The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University Press' language research programme. It is the largest corpus of its kind, containing nearly 2.1 billion words.[1] It includes language from the UK, the United States, Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa.[2] The text is mainly collected from web pages; some printed texts, such as academic journals, have been collected to supplement particular subject areas.[2] The sources include writing of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of blogs, emails, and social media".[2] This breadth contrasts with similar databases that sample only one kind of writing. The corpus is generally available only to researchers at Oxford University Press, but other researchers who can demonstrate a strong need may apply for access.[2][3]
The digital version of the Oxford English Corpus is formatted in XML and is usually analysed with Sketch Engine software.[4] By April 27, 2006, the corpus had reached 1 billion words.[5]
Each document in the OEC is accompanied by metadata including: