Corpus Linguistics, Frequency and Explanatory Dictionaries: Interaction Vectors

Authors

  • Nina Mechkovskaya, Faculty of Philology, Belarusian State University

DOI:

https://doi.org/10.57589/srl.v73i3.4171

Keywords:

frequency dictionaries, synthesis of explanatory and frequency dictionaries in the Macmillan dictionary, semantic component analysis of a 100,000-word dictionary, high-frequency words as semantic multipliers

Abstract

As corpora grow in number, volume, and diversity, they are becoming increasingly specialized according to the content they target. First-generation electronic corpora (approximately 100 million word tokens), which are referred to or perceived as “national” or “state” corpora, keep a relatively balanced structure of subcorpora and serve a broad audience in the social sciences and humanities. As later corpora grow larger, their specialization develops along two vectors: 1) content-oriented monitor megacorpora (continuously updated) of newspaper and magazine texts, whose target users include sociologists, political scientists, economists, demographers, journalists, and others; 2) thematically unrestricted (non-selective) corpora that accumulate digitized printed and electronic texts and are used in computer science as raw material for natural language processing: the pre-training of neural networks and the creation of statistical algorithms that link words into the coherent textual responses of artificial intelligence.

The article identifies two of the most significant innovations in corpus lexicography: 1) the synthesis of explanatory and frequency dictionaries in the Macmillan dictionaries (2007), later adopted by Collins and Longman; 2) componential semantic analysis of a 100,000-word lexicon that uses the 2,500 most frequent lexemes in Macmillan (2007) as semantic components. The capabilities of corpora will soon bring major advances in diachronic linguistics.

References

Collins Online English Dictionary. Glasgow: HarperCollins Publishers. Online.

Collins Concise Dictionary, 2011. Glasgow: HarperCollins Publishers.

Нина Р. Добрушина, Михаил А. Даниэль (ред.), 2016: Два века в двадцати словах. Москва: Издательский дом Высшей школы экономики.

[Nina R. Dobrušina, Mihail A. Danièl (red.), 2016: Dva veka v dvadcati slovah. Moskva: Izdatel’skij dom Vysšej školy èkonomiki.]

Любовь Н. Засорина (ред.), 1977: Частотный словарь русского языка. Около 40 тысяч слов. Москва: Русский язык.

[Ljubovʼ N. Zasorina (red.), 1977: Častotnyj slovarʼ russkogo jazyka. Okolo 40 tysjač slov. Moskva: Russkij jazyk.]

Лидия Н. Иорданская, Игорь А. Мельчук, 2007: Смысл и сочетаемость в словаре. Москва: Языки славянских культур.

[Lidija N. Iordanskaja, Igorʼ A. Melʼčuk, 2007: Smysl i sočetaemostʼ v slovare. Moskva: Jazyki slavjanskih kulʼtur.]

Geoffrey Leech, Paul Rayson, Andrew Wilson, 2001: Word Frequencies in Written and Spoken English, based on the British National Corpus. Online.

Lennart Lönngren et al., 1993: Частотный словарь современного русского языка. Uppsala: Acta Univ. Ups. (Studia Slavica Upsaliensia).

Ольга Н. Ляшевская, Сергей А. Шаров, 2009: Новый частотный словарь русской лексики. Москва: Азбуковник. Online.

[Olʼga N. Ljaševskaja, Sergej A. Šarov, 2009: Novyj častotnyj slovarʼ russkoj leksiki. Moskva: Azbukovnik. Online.]

Ольга Н. Ляшевская, Сергей А. Шаров, 2015: Частотный словарь современного русского языка на материалах Национального корпуса русского языка. Москва: Словари.ру.

[Olʼga N. Ljaševskaja, Sergej A. Šarov, 2015: Častotnyj slovarʼ sovremennogo russkogo jazyka na materialah Nacionalʼnogo korpusa russkogo jazyka. Moskva: Slovari.ru.]

Macmillan English Dictionary for Advanced Learners, 2002. Oxford: Macmillan ELT.

Macmillan English Dictionary for Advanced Learners, 2007. Oxford: Macmillan ELT.

Macmillan English Dictionary. Oxford: Macmillan ELT.

Надзея С. Мажэйка, Адам Я. Супрун, 1976–1992: Частотны слоўнік беларускай мовы. Мінск: Выдавецтва БДУ.

[Nadzeja S. Mažèjka, Adam Ja. Suprun, 1976–1992: Častotny sloŭnik belaruskaj movy. Minsk: Vydavectva BDU.]

Игорь А. Мельчук, 1960: О терминах “устойчивость” и “идиоматичность”. Вопросы языкознания 4. 73–80.

[Igorʼ A. Melʼčuk, 1960: O terminah “ustojčivostʼ” i “idiomatičnostʼ”. Voprosy jazykoznanija 4. 73–80.]

Published

2025-11-19

How to Cite

Mechkovskaya, N. (2025) “Corpus Linguistics, Frequency and Explanatory Dictionaries: Interaction Vectors”, Slavistična revija, 73(3), pp. 435–450. doi: 10.57589/srl.v73i3.4171.

Section

ARTICLES