
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12205)

Included in the following conference series:

International Conference on Human-Computer Interaction


Abstract

Corpus linguistics (CL) is one of the most dynamic and rapidly developing areas of modern linguistics. It affects all areas of linguistics, including methodology of teaching foreign languages, translation and other linguistic disciplines. The reviews of publications on this subject include only the works published in English and do not reflect the contribution of Russian researchers. This article fills this gap by presenting an overview of the disparate publications of Russian authors by examining the dynamics in the growth of the publication number, the geographic distribution, publication outlets, citations, focusing on the studies on the application of CL methods in education in Russia since 2011 till the first half of 2019. Methods of finding relevant publications and processing the resulting body of the data in order to obtain answers to the research questions are described. The discussion indicates that most of the work contains guidelines on the CL use in teaching various aspects of the Russian and foreign languages with the involvement of large general-purpose corpora that are freely available on-line. Only a small number of studies present the results of pedagogical experiments. It is noted that the selected indicators make it possible to assess the degree of CL approach influence on education. Drawing on the analysis, we propose the ways of expanding the implementation of CL methods in language teaching by increasing their use in different disciplines of the pre-service teacher training program and in-service training of contemporary foreign language teachers.



1 Introduction

Corpus linguistics is a new direction in linguistics that has existed for a little over half a century and is concerned with the creation and use of corpora for solving various linguistic and non-linguistic problems.

Is it right, though, to call it a direction? This probably seemed right at the beginning, when Henry Kučera and Nelson Francis, at Brown University (USA), created the first corpus, known as the Brown University Standard Corpus of Present-Day American English. Nowadays, corpus linguistics is something more than that. Today, corpora have become an integral part of linguistics, one of its cornerstones, like vocabulary and grammar. After the appearance of corpora, linguistic science as a whole changed, and we can say that all of linguistics became corpus linguistics, because corpora are used in various fields of linguistics either as a method or as a data source.

The number of corpora in the world is constantly growing, and currently there are thousands of corpora [1]. Modern corpora are multilingual (both independent and parallel) and multifunctional. Examples include the Wortschatz portal at the University of Leipzig (393 corpora representing 252 languages), the Sketch Engine corpus system (588 corpora, 94 languages) [2], the Aranea corpora at Comenius University in Bratislava (66 corpora, 31 languages), and a number of others.

A special role is played by national corpora. Such projects are supported at the national level, and leading corpus linguists of the country take part in their creation. Countless studies are carried out on the basis of corpora, which serve as a source of information on how a language functions. Thanks to corpora, we can quickly and effectively check how competent authors use an unfamiliar word or grammatical form. Thus, a national corpus is useful to everyone who, by virtue of their profession, out of necessity or out of simple curiosity, is looking for answers to questions about the structure and functioning of the language, that is, in fact, to the majority of educated speakers of the language and to all those who study it as a foreign language. The national corpus of the Russian language, available online since April 2004, enjoys well-deserved authority among specialists [3].

Corpus linguistics has had a direct impact on teaching foreign languages since the advent of large online corpora. In 1991, Professor Tim Johns from the University of Birmingham introduced the concept of Data-Driven Learning (DDL). He was an ardent supporter of the use of corpora by students to test their hypotheses about word compatibility, grammatical constructions, usage contexts, etc., explaining that “research is too important to be left to the researchers”. Boulton and Cobb’s meta-analysis of studies (from 1989 to 2014) on the use of corpora in foreign language teaching [4] and Pérez-Paredes’ systematic analysis of the literature on the uses and spread of corpora and data-driven learning in CALL research [5] help to understand better what has been done in DDL in recent decades. Boulton and Cobb’s meta-analysis concludes that DDL is effective and efficient in different learning environments with different categories of learners pursuing different language goals. However, neither review includes papers published in languages other than English, and research articles by Russian authors on the influence of CL on foreign language teaching were not considered. For example, in the annotated list of references to all 205 studies which Boulton and Cobb selected for their meta-analysis^Footnote1 there is not a single paper either by Russian authors or devoted to teaching the Russian language using corpora. Nor do such papers appear among the 32 articles selected by Pérez-Paredes for his analysis.

We decided to fill this gap and analyze the impact of corpus linguistics on the education system in Russia.

1.1 Criteria for Assessing the Impact of Corpus Linguistics on Education in Russia

The authors of this work agreed that the following indicators can be chosen as criteria for the influence of CL on the sphere of education in the Russian Federation:

1. The total number of publications on this topic written by researchers/with the participation of researchers from Russia registered in recognized databases and their growth dynamics over a certain period of time.
2. The number and geographic distribution of Russian universities conducting research in this area.
3. Publication outlets (the number of publications of various types that publish materials presenting research interest to us).
4. Citations. Despite several shortcomings of this indicator described in literature (e.g., not distinguishing between positive and negative citations) they are helpful in identifying papers that, for one reason or another, are popular [6].
5. The goals of applying corpus linguistics approaches in the educational process, which the authors pursued.

These criteria were formulated as research questions (RQ), to which the authors decided to find the answers in the course of the undertaken study.

2 Research Questions

RQ1: How many corpus linguistics studies related to the sphere of education in Russia (CLR-RE) have been published? How are they distributed by year of publication?
RQ2: How is CLR-RE research geographically distributed?
RQ3: What are the preferred publication outlets for this research: journals or conference proceedings? In which journals and conference proceedings is CLR-RE research currently being published?
RQ4: What CLR-RE studies in Russia are most cited?
RQ5: What goals of applying CL approaches do researchers in Russia’s educational settings pursue?

3 Methods

In this section we describe the approaches we took to answer the research questions. We describe the methods we used to gather literature (data collection) and the analytic methods we used to examine the literature on corpora we gathered (data classification and analysis).

3.1 Data Collection

As the issue of author affiliation with Russian universities was of fundamental importance to us, we began by searching international databases (Scopus, Web of Science), which allow the country/university of the author of a publication to be selected, and the Russian platform Russian Science Citation Index (RSCI).

To search for articles in international databases, standard queries were used: different combinations of corpus/corpora/corpus linguistics and data-driven learning, DDL, and concordance/concordancer/concordancing with language learning/teaching, higher education, restricted to authors affiliated with the Russian Federation (in the Scopus database) and with all identified Russian universities (in the Web of Science database). The search returned several dozen items. After reviewing the article titles and abstracts, we found that the use of corpus approaches in teaching is addressed in 16 papers by authors from Russia, with 7 of them being journal articles, while the rest were conference proceedings published in 2014–2018. Thus, the number of publications by Russian authors on the subject registered in international bibliographic databases is small.
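The query construction described above can be sketched as a cross product of corpus-related terms and education-related terms. The term lists are taken from the text; the function name `build_queries` and the AND-joined query notation are assumptions made for illustration and do not reproduce the actual Scopus or Web of Science query syntax.

```python
from itertools import product

# Term groups taken from the description of the search queries.
CORPUS_TERMS = [
    "corpus", "corpora", "corpus linguistics",
    "data-driven learning", "DDL",
    "concordance", "concordancer", "concordancing",
]
EDUCATION_TERMS = ["language learning", "language teaching", "higher education"]


def build_queries(corpus_terms, education_terms):
    """Pair every corpus term with every education term.

    The '"a" AND "b"' format is a hypothetical, database-neutral notation.
    """
    return [f'"{c}" AND "{e}"' for c, e in product(corpus_terms, education_terms)]


queries = build_queries(CORPUS_TERMS, EDUCATION_TERMS)
print(len(queries))  # 8 corpus terms x 3 education terms
print(queries[0])
```

Generating the combinations programmatically makes it easier to keep the same set of queries across databases and search sessions.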

The main source of publications on this topic was the scientific electronic library eLIBRARY.RU, the world’s leading electronic library of scientific periodicals in Russian, which includes the Russian Science Citation Index (hereinafter RSCI). The number of registered users is 1.7 million. The RSCI is a national information and analytical system that accumulates more than 11 million publications of Russian authors, as well as information on citations of these publications from more than 6,000 Russian journals. The project started in 2005 [7]. The system is based on a bibliographic abstract database, which indexes articles in Russian scientific journals. In recent years, other types of scientific publications have also begun to be included in the RSCI: conference proceedings, monographs, study guides, patents, PhD and doctoral dissertations. The database contains information on the output data, authors of publications, their places of work, keywords and subject areas, as well as abstracts and cited references. The integration of the RSCI with the scientific electronic library (e-library) allows users, in most cases, to access the full text of the desired publication. The chronological coverage of the system is from 2005 to the present day; for many sources the archives cover a longer time period. More than one and a half million publications of Russian scientists are added to the RSCI annually. The RSCI is a non-profit project and is in the public domain, which allows all Russian scientists to use this powerful analytical tool without any restrictions.

Advanced Search offers an opportunity to search for information by type of publication, keywords, abstracts, title, author, subject, journal. We searched for publications containing combinations of expressions/phrases typical for a given topic in Russian (query category “What to look for”), with the following search filters (category “Where to search”): in the title, in the abstract, in the keywords, in the full text of the publication. The search was carried out for the following types of publications: articles in journals, books, conference proceedings, dissertations (category “Type of publication”). We also used the option of taking morphology into account (category “Parameters”), which is very important for inflectional languages like Russian. Search settings were kept the same in all sessions and for all queries. The variable was the “What to search” field, into which different combinations of keywords were entered. There were no restrictions on publication dates.

In response to the query, the search engine returned the search results in the form of a table containing the title of the article, the names of the authors, the name of the journal/conference materials in which the paper was published, the year of publication and the number of references to this publication. A hyperlink leads to a page with a description of the publication.

The search results obtained in the form of a table were saved and analyzed. At the first stage of the analysis, the works whose title and/or abstract did not contain information on the CL use in the educational process were excluded from the lists obtained.

In addition, to find further relevant works, we analyzed the lists of publications citing the highly cited works in our original “trawl”. We treated as highly cited the works that were referenced more than 10 times. These works are given in Table 1. Figure 1 illustrates how the citation hyperlink was used for forward citation search.

Table 1. Highly cited works used as sources to replenish the database of relevant publications


Another source of relevant articles was the materials of the international conference Magic INNO, which is held every 2 years by MGIMO University^Footnote2. At each of the past conferences, one of the plenary speakers was a specialist in the field of corpus linguistics. For example, the first conference, held in 2013, featured Geoffrey Leech, Professor at Lancaster University^Footnote3. The search for relevant publications was carried out on PDF versions of the conference materials. All articles containing the word korpus were scanned.

In addition, we studied the eLIBRARY.RU accounts of individual promising authors, which contain lists of their publications. We also considered the eLIBRARY.RU accounts of all the authors whose works were found in the international databases in order to obtain a more complete picture of the research they conduct in Russia on the subject of our interest.

Duplicates were excluded from the lists obtained. The remaining works were supplemented with abstracts available on the publication page in the electronic library and information on the availability of the full text. If the full text was available, it was downloaded for later analysis. If there was no full text, and the abstract was short and uninformative, the work was excluded from the database. This situation is typical for conference proceedings before 2010–2011. Therefore, it was decided to conduct further analysis on publications issued from 2011 to the first half of 2019. An exception was made for the materials of the conference “National Corpus of the Russian Language and Problems of Humanitarian Education”, held in 2007, the materials of which are available on the RNC website^Footnote4. An explanation of this decision will be given in the Discussion section.
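The screening steps described above (deduplication, the publication-period restriction, and exclusion of records with neither a full text nor an informative abstract) can be sketched as a simple filter. This is a minimal sketch, not the procedure actually used: the `Record` fields, the `MIN_ABSTRACT_WORDS` cutoff for a "short and uninformative" abstract, and the sample titles are all hypothetical, and neither the half-year cutoff in 2019 nor the exception for the 2007 conference is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    abstract: str = ""
    has_full_text: bool = False


MIN_ABSTRACT_WORDS = 30  # hypothetical cutoff for an "informative" abstract


def screen(records, first_year=2011, last_year=2019):
    """Drop duplicates, out-of-period records, and records with no usable content."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec.title.strip().lower(), rec.year)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        if not first_year <= rec.year <= last_year:
            continue  # outside the period under consideration
        informative = len(rec.abstract.split()) >= MIN_ABSTRACT_WORDS
        if not rec.has_full_text and not informative:
            continue  # neither full text nor an informative abstract
        kept.append(rec)
    return kept


# Invented sample records, for illustration only.
sample = [
    Record("Corpora in ESP teaching", 2015, has_full_text=True),
    Record("corpora in ESP teaching ", 2015, has_full_text=True),  # duplicate
    Record("Early conference note", 2009, has_full_text=True),     # outside period
    Record("Short abstract only", 2016, abstract="Corpus use."),   # thin abstract
]
print([r.title for r in screen(sample)])
```

Matching duplicates on a normalized title plus year is only one possible heuristic; records found through several sources may also differ in transliteration, which this sketch does not handle.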

Thus, the body of publications used for further analysis and for answering the research questions consisted of the materials presented in Table 2.

Table 2. Data Collection Methods, Results, and Dates


4 Results and Discussion

4.1 The Number of Publications and Their Distribution by Year of Publication

The final number of published papers that constituted the corpus of this study was 152. Of these, 77 works were published in conference proceedings and 64 in journals (7 of them in journals registered in the Scopus database); the corpus also includes 2 textbooks, 6 PhD dissertations, and 3 book chapters. By year of publication (2011 to the first half of 2019), the materials were distributed as shown in Fig. 2.
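As a quick consistency check, the counts by publication type given above (77, 64, 2, 6, and 3) can be summed and converted to shares of the corpus. The percentages are derived here from the reported counts and are not figures stated in the text; the variable names are illustrative.

```python
# Counts by publication type, as reported in the paragraph above.
counts = {
    "conference proceedings": 77,
    "journal articles": 64,
    "textbooks": 2,
    "PhD dissertations": 6,
    "book chapters": 3,
}

total = sum(counts.values())
# Share of each type in the corpus, in percent, rounded to one decimal place.
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

print(total)
for kind, share in shares.items():
    print(f"{kind}: {share}%")
```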

4.2 Geographic Distribution and Publication Outlets (RQ2 and RQ3)

The collected articles were classified and analyzed using qualitative and quantitative methods. To determine the geographical location of the universities where the CLR-RE studies are conducted, a list of the universities where the authors of the articles work was compiled. As a rule, the name of the university allows us to determine its location immediately. However, in some cases, we used the Google search engine to find out the location of the university. To determine the publication type of each item in our corpus (article, conference materials, chapter of a collective monograph, dissertation), we calculated how many times each type appeared in the corpus. The list of journals in which the articles were published was further analyzed in order to find out which outlets were preferred by the researchers of this subject.

The only journal that regularly publishes articles on the use of corpus linguistics in teaching foreign languages is Tambov University Review, Series: Humanities (11 works from our corpus were published in it). Second place is shared by Filologicheskie nauki. Voprosy teorii i praktiki [J. of Philological Sciences. Issues of Theory and Practice], which is also published in Tambov (Publishing House Gramota Publishers), and Jazyk i Kul’tura [Language and Culture], with 4 publications each. Four more journals published 2 articles each: Industrija perevoda [Translation Industry], Aktual’nye voprosy obrazovanija [Topical Educational Issues], Uchenye zapiski Krymskogo federal’nogo universiteta imeni V.I. Vernadskogo [Scientific Notes of the Crimean Federal University named after V.I. Vernadsky], and European Social Science Journal = European Journal of Social Sciences, published in Moscow (publisher: Autonomous Non-Profit Organization International Research Institute). Each of the remaining 37 journals published one article from our database over the period under consideration. No special issues of journals focusing on the topic have been discovered.
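The journal figures in the paragraph above can be cross-checked against the 64 journal articles reported for the corpus. The grouping below is one reading of that paragraph (in particular, that Filologicheskie nauki and Jazyk i Kul’tura account for 4 articles each); the resulting number of distinct journals is derived here and is not a figure stated in the text.

```python
# (number of journals, articles per journal), as read from the paragraph above.
journal_groups = [
    (1, 11),  # Tambov University Review, Series: Humanities
    (2, 4),   # Filologicheskie nauki; Jazyk i Kul'tura
    (4, 2),   # four journals with 2 articles each
    (37, 1),  # remaining journals with one article each
]

n_journals = sum(journals for journals, _ in journal_groups)
n_articles = sum(journals * per_journal for journals, per_journal in journal_groups)

print(n_journals, n_articles)
```

Under this reading the article total matches the number of journal articles in the corpus, which supports interpreting the "4 publications each" as applying to the two second-placed journals.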

Conferences at which the research results from our corpus were presented are characterized by a variety of subjects, with a predominance of interdisciplinary conferences devoted to innovative methods of teaching foreign languages and translation, intercultural communication, translation discourse, problems of modern education in general, etc. Only one conference brings together specialists in the area of corpus linguistics. This is the International Conference of Corpus Linguistics, which is held every 2 years at the Department of Mathematical Linguistics of St. Petersburg State University. However, papers on the use of corpus linguistics approaches in teaching are an exception at this conference, because its main subjects are corpora creation, corpus markup, and corpus data analysis.

The authors of the publications work in 59 universities in different regions of Russia. As expected, most of the research is carried out in the large Russian scientific and educational centers, Moscow and St. Petersburg. However, studies of CL in the sphere of education are also conducted in regional universities. Table 3 shows the universities that published more than 4 works from those collected in our database.

Table 3. Universities conducting research on the use of CL in the educational process


In some cases, publication activity is associated with preparation for the defense of a candidate dissertation. Not all authors of PhD theses (6 in our database) continue research in this area. It is all the more important that some of them continue active research and publication on this topic after receiving the Ph.D. title (candidate of sciences) [8,9]. In this regard, we note that at least 10 works in our corpus were written by students independently or in collaboration with their scientific advisers, which was reflected in the metadata of the publications. Three of them are registered in the international Scopus database [10,11,12]. There may be more such works: the candidates are those authors who do not have an account in the electronic library eLIBRARY.RU. As registration in the e-library has in recent years been a mandatory requirement for university teachers in Russia, it can be assumed that authors who are affiliated with a specific university but not registered in eLIBRARY.RU are students. However, this issue requires further investigation.

4.3 Most Cited Papers (RQ4)

The search for an answer to RQ4 (Which CLR-RE studies in Russia are cited the most?) led to the following results. At the time of writing, 93 articles had not been cited, and 38 articles had been cited 1 to 4 times, most of them (20) only once. 14 articles had been cited from 5 to 11 times. The most highly cited publications from the corpus are presented in Table 1. Of these, 7 publications fall within the period under consideration (2011 to July 2019): 2 editions of the textbook on CL, 2 Ph.D. dissertations, and 3 articles published in peer-reviewed journals. The lack of references to many articles selected for analysis in a systematic review of the literature has been noted by researchers even in such an intensively developing field as empirical MOOC studies. For example, Veletsianos and Shepherdson found that out of 183 works in their corpus, 87 (47.5%) had not been cited at all [6, p. 210]. The lack of references to most of the works in our corpus may indicate a lack of interest among the professors of Russian universities in the use of CL in teaching or in other people’s experience of it, and/or a lack of distinction of a particular author in the professional environment.
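To make the comparison with the MOOC literature concrete, the share of uncited works can be computed for both corpora from the counts given above. The helper name `uncited_share` is illustrative; the share for the present corpus is derived here from the reported counts (93 of 152) and is not a figure stated in the text.

```python
def uncited_share(uncited: int, total: int) -> float:
    """Return the percentage of uncited works, rounded to one decimal place."""
    return round(100 * uncited / total, 1)


this_study = uncited_share(93, 152)   # counts reported for the present corpus
mooc_review = uncited_share(87, 183)  # Veletsianos and Shepherdson [6]

print(this_study, mooc_review)
```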

On the other hand, over half of the works in our corpus are conference papers, which as a rule are cited less often than articles in journals or monographs. Until recently, in Russia there were restrictions on the number of references allowed in conference materials, and in some cases references were excluded completely. Another aspect of the citation problem is the low informativeness of titles, abstracts and keywords in the works of Russian authors in the humanities. This conclusion was drawn from a dedicated study which used corpus analysis of articles by Russian authors to show that many authors of humanities articles were unable to formulate the novelty of their research clearly or to present information in scientific articles so that it could be easily extracted and processed [13,14].

4.4 Goals of Applying Corpus Linguistics Approach in Language Teaching (RQ5)

In search of an answer to RQ5 on the research objectives pursued by the authors, we wanted to find out which corpora they used, who the members of the experimental group were, and which aspects of language study or translation the research addressed.

The authors of the publications investigated the use of CL approaches for teaching English (77% of all publications), German (6%), and Russian (17%). The following corpora were considered:

When teaching English, the researchers used BNC (35%), COCA (20%), and LOB (Lancaster-Oslo-Bergen Corpus) (10%). Also mentioned were English Grammar Corpus, Cambridge Learner Corpus, DANTE, BAWE, RNC, Russian annotated Learner Corpus (REALEC^Footnote5), TED talks, Web corpus, NOW corpus, the ERUK (house-made) corpus for training students focusing on regional studies, and some other house-made corpora for ESP classes.

For teaching German, the DeReKo, DWDS, and COSMAS corpora are recommended, as well as DGD (Datenbank für gesprochenes Deutsch), with the possibility of accessing colloquial archives: das “Forschungs- und Lehrkorpus gesprochenes Deutsch” (FOLK); das Korpus “Deutsche Mundarten” (Zwirner-Korpus); das Korpus “Deutsche Umgangssprachen” (Pfeffer-Korpus). Kod.ING (Korpus der Ingenieurwissenschaften), a house-made corpus based on the texts of Ph.D. dissertations defended at Leibniz University Hannover [12], Leipzig Corpora, the corpora of the German Language Institute, LIMAS-Korpus, NEGRA, and RNC were also used.

When learning the Russian language, the RNC and a sound corpus “One speech day” compiled at SPbSU [15] were used.

When teaching translation, the following corpora proved useful: COCA, RNC, the Russian Learner Translator Corpus (RusLTC)^Footnote6, and an electronic specialized corpus of English-language texts on military-technical topics created by the author of the study.

From the above information we can draw a preliminary conclusion that non-specialized general language corpora freely available on-line are more commonly used than specialized ones.

The main aspects proposed to be studied using the approaches of corpus linguistics are the following: vocabulary (18.3%), grammar, vocabulary-grammar (lexicogrammar) (15.6%), translation (20%). Some works are devoted to teaching written discourse, phonetics, upgrading foreign language textbooks, working out test papers and exam materials. A number of articles are aimed at familiarizing readers with corpus resources, showing their possible applications (25.6%). For example, corpora may be helpful in organizing students’ research work, involving them in project activities, and developing their cognitive skills.

Most authors consider the possibilities of using CL in tertiary education (in an ESP course for students of different areas of study (17.6%), for linguists (45.8%), and in postgraduate foreign language courses (3.2%)), which corresponds to global trends, because most studies on the use of CL approaches in teaching have been carried out in a university setting. However, the authors of 4 works (two on teaching German, two on teaching Russian) discuss the possibility of applying CL approaches in secondary school. Interest in introducing data-driven learning in secondary school (for younger learners/pre-tertiary learners) for studying foreign and native languages is evident in different countries of the world, as shown by a recently published collective monograph. Its authors consider the use of DDL at school to be very promising, noting that today’s young learners are comfortable with computers and are happy to use them to explore new learning resources, such as linguistic corpora [16].

The authors of only 14 articles described their experiment in detail, with a description of the students’ characteristics in the experimental group. In all cases, positive results were achieved, and a positive attitude of students to new ways of learning a foreign language was shown.

It is surprising that the authors of Ph.D. theses, whose articles constitute approximately 15% of the works in our corpus, do not describe the stages and the results of the experiment in the articles, confining themselves only to the general conclusions. For example, they conclude that the results of the experiment “showed that the mastery of future translators’ work with a linguistic corpus algorithm provides an adequate expression of the meanings embedded in the idiomatic speech units, expands their translation horizons, promotes the development of linguistic thinking, contributes to building the skills of learners’ independent work with information and communication resources” [17]. In the dissertations as such, the experiment is described in sufficient detail, while in the articles there is no description of the pedagogical experiment and the processing of its results.

Most authors focus on discussing the didactic advantages of using CL in teaching certain aspects of a foreign language or translation and in developing a number of competencies provided for by educational standards (information and communication (ICT) competence, computer literacy, information culture, research competence). They acquaint readers with available corpus resources, offer relevant questions from their own pedagogical practice that corpora helped to answer, and analyze possible algorithms/scenarios of teaching with the help of corpora. They also give examples of corpus use in students’ project activities and corpus-based research projects, along with recommendations for organizing students’ independent work with corpora, which leads to an increase in their autonomy and motivation and to the development of cognitive and meta-cognitive skills and strategies (for instance, contextualization, resourcing, translation, inferencing, self-monitoring, self-management, directed attention, selective attention). It is obvious that the criteria for evaluating CL approaches in the development of many of these competencies or skills have yet to be developed. Pérez-Paredes believes that the criterion for the successful development of cognitive skills is the use of concordance programs by students, but none of the 32 articles of 2011–2015 considered in his systematic analysis investigated this subject [5, p. 16].

As most works contain no description of an experiment, the degree of students’ awareness of corpora and the ways they use them in the study of foreign languages remain unclear. In some works where the authors described the experiment, it is noted that students did not know about corpora and had not used them before the start of the experimental training. This is noted in the works describing experimental training for both linguists [18] and non-linguists [12,19]. A special study conducted by a student of IvSU among her fellow students of the Faculty of Romano-Germanic Philology showed that out of 52 respondents, 75% knew about the existence of corpora thanks to the disciplines of Analytical Reading and IT. However, only 43% were actively using corpora [20].

Knowing the situation in the higher education system in Russia, we can assume that most students majoring in linguistics and philology become familiar with corpora and the approaches of corpus linguistics in the process of studying at university. However, the question remains whether familiarity with corpus technologies leads young specialists to use them in their own teaching practice; it requires a separate serious study, as indicated by Boulton and Tyne [21]. The few available studies on this problem worldwide give somewhat mixed results. On the one hand, Charles found that more than half of the postgraduate students from different disciplines used their do-it-yourself corpora on a regular basis a year after completing the experimental writing course [22]. On the other hand, Lenko-Szymanska showed that one specially designed course for future teachers of foreign languages is not enough to build sustainable technical, corpus-linguistic, and pedagogical skills [23]. She believes that students should be exposed to corpora in their language and linguistics classes and be familiarized with DDL methods and techniques in general teacher training courses, which will enhance DDL methodology acquisition and contribute to its sustainability [23].

In an article [24], we proposed our own solution: updating and supplementing various disciplines of the program for foreign language teachers majoring in computer-assisted language teaching at Peter the Great St. Petersburg Polytechnic University with tasks for accessing corpora and analyzing corpus data.

The low “corpus literacy” [25, p. 148] of teachers prevents the widespread use of corpora in the educational process. This also applies to the teaching of the Russian language and the use of the RNC. One of the reasons seems to be insufficient integration of two areas: corpus linguistics and methods of teaching foreign languages. Linguistic corpora are created mainly by linguists who are not teachers of a foreign language, while teachers themselves are characterized by low corpus awareness and a lack of understanding of innovative methods of language teaching, which require new competencies and skills in handling complex computer programs. The increasing sophistication of corpus linguistics tools, which combine programming, linguistic and mathematical means, makes these intelligent programs much superior to conventional concordancers [26]. There is an obvious tension between CL researchers’ interests and ordinary language learners’ and teachers’ needs in terms of the complexity and functionality of special software. Researchers are eager to master state-of-the-art software for intelligent processing of corpus data, which provides a wide range of linguistic functions such as forming lexical-semantic fields, detecting and visualizing collocations, producing keyword lists, and selecting sets of relevant study examples. Conventional concordancers are essentially information storage, search and retrieval systems with elaborate query and sorting functions, yet even these are complex enough that ordinary users need more training than the syllabus provides to become confident users and take advantage of them in their independent teaching and research.

The main directions of RNC use in teaching are discussed in the following section.

4.5 RNC Use in Learning and Teaching

The RNC is the largest national-scale reference corpus of the Russian language. It is available online free of charge and can be used without registration. Its present size is 288,727,494 word tokens drawn from spoken genres, fiction, and written media (including academic and non-academic texts) in Russian from the mid-18th century to the present. The Russian National Corpus currently uses four types of annotation: metatextual, morphological, accentual and semantic; the introduction of syntactic annotation is planned for the near future. The system of annotation is constantly being improved, which allows for quite complex syntactic and morphological queries.

The RNC has a number of subcorpora of different types, including a set of bidirectional parallel text subcorpora. In these, a Russian text is complemented by its translation into another language, and vice versa. The units of the original and the translated texts (usually sentences) are matched through an alignment procedure. At the time of publication, 19 bidirectional parallel text corpora are available, including English/Russian, German/Russian and French/Russian parallel corpora.
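A sentence-aligned parallel subcorpus can be thought of as a list of matched sentence pairs that can be searched from either side. The sketch below is a hypothetical in-memory illustration of that idea; it does not reflect the RNC's actual data format or search interface, and the `AlignedPair` type, the `search_parallel` function, and the two sample pairs are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlignedPair:
    source: str  # sentence in the original language
    target: str  # its aligned translation


def search_parallel(pairs: List[AlignedPair], query: str) -> List[AlignedPair]:
    """Return pairs whose source or target contains the query (case-insensitive)."""
    q = query.lower()
    return [p for p in pairs if q in p.source.lower() or q in p.target.lower()]


# Hypothetical aligned sentences, invented for this illustration.
corpus = [
    AlignedPair("Я читаю книгу.", "I am reading a book."),
    AlignedPair("Она пишет письмо.", "She is writing a letter."),
]

for pair in search_parallel(corpus, "book"):
    print(pair.source, "=>", pair.target)
```

Because each pair keeps both sides together, a query in either language returns the sentence along with its counterpart, which is what makes such subcorpora useful for translation exercises.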

All the works on the use of the RNC are of great methodological value, as they contain detailed descriptions of the tasks developed for working with the corpora, for example, for studying the diminutives typical of the Russian language [27]. The work of Olkhovskaya and Paramonova [28] describes 8 different types of tasks for the RNC, covering lexical ambiguity, neologisms in Russian and their graphic variability, the time of a word’s emergence in Russian (using the diachronic corpus), the study of functional styles, work with the literary images of different writers, intertextuality assignments, etc. The authors also note that Russian teachers are familiar with the corpus as a linguistic resource but practically do not use it as a teaching tool. At the same time, while characterizing the ongoing educational work on the pedagogical potential of corpora as “hyperactive”, the authors refer to well-known works of their colleagues published more than 10 years ago. Many of those works were published in the proceedings of the only specialized international conference, “National Corpus of the Russian Language and Humanitarian Education Issues”, organized in 2007 by the Higher School of Economics (HSE) in Moscow [29].

In our opinion, the problem of familiarizing practicing teachers of foreign languages and of Russian with the basics of corpus linguistics can be solved within the framework of continuing education courses that take the teachers’ needs into account, which is the cornerstone of the effectiveness of such courses [for details see 30]. Another important condition for the effectiveness of such in-service training courses is the use of modern resources and training formats. First of all, technologies such as blended learning and the flipped classroom were included in the curriculum, together with the available MOOCs on corpus linguistics. We mean the MOOC Corpus Linguistics: Method, Analysis and Interpretation on the FutureLearn platform, created by Lancaster University teachers and offered under the direction of T. McEnery (see Note 7), and Introduction to Corpus Linguistics on the Open Education platform, developed at the Higher School of Economics National Research University under the direction of A. Levinson (see Note 8).

5 Conclusions

The criteria selected for the research were the number of publications, their dynamics, the number of citations, the types of publications, the geographical distribution of universities conducting research in the field of CL, and the overall aims of the research authors. Together, these factors made it possible to get an idea of the influence of CL methods on the higher education system in Russia in 2011–2019. The available data allow us to conclude that CL methods are used by teachers at many universities and that most students majoring in linguistics are familiar with corpora.

The question of whether young teachers who studied CL at university use this knowledge in their teaching practice requires further study.

In Russia, as in other countries, the use of CL approaches in teaching a foreign language, the mother tongue, and translation is one of the innovative methods in linguodidactics, but it does not belong to the main methods of teaching these disciplines. Researchers focus on building students’ linguistic competence with the help of corpora, and the available publications contain a large number of recommendations and examples of corpus use for this purpose. Less than 10% of the works describe the results of experimental training.

One of the existing problems is the low awareness among foreign language teachers of how CL can be used in teaching students majoring in different areas. It is evident, though, that CL has a certain potential for building the competencies that graduates are required to develop under the Federal educational standards. One possible solution is specially developed continuing education programs that use available modern resources, such as MOOCs on CL, and innovative training formats.

It is possible that a dedicated scientific journal on the achievements of CL use in pedagogical practice, together with dedicated scientific conferences, would contribute to the wider dissemination of this promising teaching method and to its sustainable use in modern pedagogical practice.

Notes

1. The full list is available in the Supporting Information folder at the publisher’s website: https://onlinelibrary.wiley.com/doi/pdf/10.1111/lang.12224.
2. The materials of the past conferences are freely available on the university website: https://inno-conf.mgimo.ru/materials.
3. Video recording of Geoffrey Leech’s plenary talk at Magic INNO 2013: https://www.youtube.com/watch?v=CdpOSAIzIfc&feature=youtu.be.
4. https://studiorum-ruscorpora.ru/education/articles/.
5. Homepage of REALEC: http://web-corpora.net/RLC.
6. Homepage of RusLTC: http://rus-ltc.org/search.
7. Corpus Linguistics: Method, Analysis and Interpretation MOOC course URL: https://www.futurelearn.com/courses/corpus-linguistics.
8. Introduction to Corpus Linguistics MOOC course URL: https://openedu.ru/course/hse/CORPUS/.

References

1. Kopotev, M.V.: Introduction to Corpus Linguistics. Animedia Company, Praha (2014). (in Russian)
2. Kilgarriff, A., et al.: The sketch engine: ten years on. Lexicography 1(1), 7–36 (2014). https://doi.org/10.1007/s40607-014-0009-9
3. Plugnjan, V.A.: Why do we need the National Corpus of the Russian Language? Informal introduction. In: The National Corpus of the Russian Language: 2003–2005, pp. 6–20. Indrik, Moscow (2005). (in Russian)
4. Boulton, A., Cobb, T.: Corpus use in language learning: a meta-analysis. Lang. Learn. 67(2), 348–393 (2017). https://doi.org/10.1111/lang.12224
5. Pérez-Paredes, P.: A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Comput. Assist. Lang. Learn. (2019). https://doi.org/10.1080/09588221.2019.1667832. Published online: 26 Sept 2019
6. Veletsianos, G., Shepherdson, P.A.: Systematic analysis and synthesis of the empirical MOOC literature published in 2013–2015. IRRODL 17(2), 198–221 (2016). https://doi.org/10.19173/irrodl.v17i2.2448
7. Zimina, T.: Citation index in Russian: Dr. Eugene Garfield approved Russian Science Citation Index. Nauka i Zhizn [Science & Life] 19 September 2006. http://www.nkj.ru/news/6301/. (in Russian)
8. Osipova, E.S.: Corpus linguistics technology in teaching English as a foreign language. In: Chernyavskaya, V., Kuße, H. (eds.) The European Proceedings of Social & Behavioural Sciences EpSBS, vol. 51, pp. 273–283. Future Academy, London (2018). https://doi.org/10.15405/epsbs.2018.12.02.30
9. Gorina, O.G.: Corpus research tools in L2 teaching. Vestnik Tomskogo gosudarstvennogo universiteta [Tomsk State Univ. Bull.] 435, 187–194 (2018). https://doi.org/10.17223/15617793/435/24. (in Russian)
10. Fenogenova, A., Kuzmenko, E.: Automatic generation of lexical exercises. In: Chernyak, E., Ilvovsky, D., Skorinkin, D., Vybornova, A. (eds.) CEUR Workshop Proceedings, vol. 1886, pp. 20–27 (2016). http://ceur-ws.org/Vol-1886/
11. Kuzminykh, I., Khoroshilova, S.P.: Investigating the impact of corpus-based classroom activities in English phonetics classes on students’ academic progress. Novosibirsk State Pedagogical Univ. Bull. 7(4), 40–51 (2017). https://doi.org/10.15293/2226-3365.1704.03
12. Kogan, M., Yaroshevich, A., Ni, O.: Corpus-based teaching of German compound nouns and lexical bundles for improving academic writing skills. Lidil 58 (2018). http://journals.openedition.org/lidil/5438. https://doi.org/10.4000/lidil.5438
13. Belyaeva, L.N., Chernyavskaya, V.E.: Scientific and technical texts in the framework of information 4.0: content analysis and text synthesis. St. Petersburg State Polytech. Univ. J. Hum. Soc. Sci. 10(2), 53–63 (2019). https://doi.org/10.18721/JHSS.10205
14. Beliaeva, L., Chernyavskaya, V.: Technical writer in the framework of modern natural language processing tasks. J. Sib. Fed. Univ. Humanit. Soc. Sci. 12(1), 20–31 (2019). https://doi.org/10.17516/1997-1370-0377
15. Bogdanova-Beglarian, N., Martynenko, G., Sherstinova, T.: The one day of speech corpus: phonetic and syntactic studies of everyday spoken Russian. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS (LNAI), vol. 9319, pp. 429–437. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23132-7_53
16. Boulton, A.: Foreword. Data-driven learning for younger learners: obstacles and optimism. In: Crosthwaite, P. (ed.) Data-Driven Learning for the Next Generation: Corpora and DDL for Pre-tertiary Learners, pp. 2–9. Routledge, London (2020). https://doi.org/10.4324/9780429425899
17. Tarnaeva, L.P., Osipova, E.S.: Corpus linguistics resources use in training translators in the sphere of professional communication. Voprosy Teorii i Praktiki 63(9), 205–209 (2016). (in Russian)
18. Oveshkova, A.N.: Work with English corpora as a means of promoting learner autonomy. Obrazovanie i Nauka [Educ. Sci.] 20(8), 66–87 (2018). https://doi.org/10.17853/1994-5639-2018-8-66-87
19. Almazova, N., Kogan, M.: Computer assisted individual approach to acquiring foreign vocabulary of students major. In: Zaphiris, P., Ioannou, A. (eds.) LCT 2014. LNCS, vol. 8524, pp. 248–257. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07485-6_25
20. Shamova, N.A.: Corpus technologies in the academic process from students’ perspective. In: All-Russia Student Convent “Innovation” Proceedings, pp. 472–476. Ivanovo State University Press, Ivanovo (2016). (in Russian)
21. Boulton, A., Tyne, H.: Corpus-based study of language and teacher education. In: Bigelow, M., Ennser-Kananen, J. (eds.) The Routledge Handbook of Educational Linguistics, pp. 301–312. Routledge, New York (2015). https://doi.org/10.4324/9781315797748
22. Charles, M.: Getting the corpus habit: EAP students’ long-term use of personal corpora. Engl. Specif. Purp. 35(1), 30–40 (2014). https://doi.org/10.1016/j.esp.2013.11.004
23. Leńko-Szymańska, A.: Training teachers in data-driven learning: tackling the challenge. Lang. Learn. Technol. 21(3), 217–241 (2017). https://doi.org/10125/44628
24. Dmitrijev, A.V., Kogan, M.S.: The potential of corpus linguistics in training foreign language teachers majoring in computer assisted language teaching. St. Petersburg State Polytech. Univ. J. Hum. Soc. Sci. 10(4), 69–85 (2019). https://doi.org/10.18721/JHSS.10407. (in Russian)
25. Dikareva, S.S., Chernobrivetc, S.G.: Corpus technologies in Russian syntax studying. In: Orekhova, V.V., Titarenko, E.Ja. (eds.) Dialogue of Cultures. Theory and Practice of Teaching Languages and Literature. VI International Scientific Conference Proceedings, pp. 146–149. Arial Publishing House, Simferopol (2018). (in Russian)
26. Zakharov, V.P.: Corpus linguistics tools functionality. In: Nikolaev, I.S. (ed.) Strukturnaja i prikladnaja lingvistika. Mezhvuzovskij sbornik [Structural and Applied Linguistics], pp. 81–95. Saint Petersburg State University Press, St. Petersburg (2019). (in Russian)
27. Rezanova, Z.I., Shilyaev, K.S.: Russian National Corpus in teaching the use of Russian diminutives. Mezhdunarodnyj Zhurnal Prikladnyh i Fundamental’nyh Issledovanij [Int. J. Appl. Basic Res.] 5–4, 634–639 (2015). (in Russian)
28. Paramonova, M., Olkhovskaya, A.: Corpus in teaching Russian language and literature. Rhema 1, 93–123 (2019). https://doi.org/10.31862/2500-2953-2019-1-93-123
29. Dobrushina, N.R. (ed.): Nacional’nyj korpus russkogo jazyka i problemy gumanitarnogo obrazovanija [The National Corpus of the Russian Language and Issues of Humanitarian Education]. SU-HSE Press, Moscow (2007)
30. Kabanova, N., Kogan, M.: Needs analysis as a cornerstone in formation of ICT competence in language teachers through specially tailored in-service training course. In: Zaphiris, P., Ioannou, A. (eds.) LCT 2017. LNCS, vol. 10295, pp. 110–123. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58509-3_11

Acknowledgements

The authors would like to acknowledge Inga Kuznetsova, English language instructor at ITMO University (St. Petersburg) and Master’s student at the Higher School of Language Teaching and Translation (SPbSPU), for her help in preparing this paper.

Author information

Authors and Affiliations

Higher School of Language Teaching and Translation, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
Marina Kogan, Nina Popova & Nadezhda Almazova
Department of Mathematical Linguistics, Saint Petersburg State University, St. Petersburg, Russia
Victor Zakharov

Corresponding author

Correspondence to Marina Kogan.

Editor information

Editors and Affiliations

Cyprus University of Technology, Limassol, Cyprus
Panayiotis Zaphiris
Cyprus University of Technology, Limassol, Cyprus
Andri Ioannou

Cite this paper

Kogan, M., Zakharov, V., Popova, N., Almazova, N. (2020). The Impact of Corpus Linguistics on Language Teaching in Russia’s Educational Context: Systematic Literature Review. In: Zaphiris, P., Ioannou, A. (eds) Learning and Collaboration Technologies. Designing, Developing and Deploying Learning Experiences. HCII 2020. Lecture Notes in Computer Science, vol 12205. Springer, Cham. https://doi.org/10.1007/978-3-030-50513-4_26

DOI: https://doi.org/10.1007/978-3-030-50513-4_26
Published: 10 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50512-7
Online ISBN: 978-3-030-50513-4
eBook Packages: Computer Science, Computer Science (R0)
