Corpus Linguistics and Lexical Descriptions of the Slovenian Language

Vojko GORJANC

Keywords:

Slovenian language, lexicology, corpora (linguistics), semantics, lexical semantics

Abstract

In the last decade, corpus linguistics has finally established itself as a separate, strictly empirical research starting point, and in the last few years it has gained this status in Slovenia as well. Corpora are, of course, a necessary prerequisite for this development, so corpus building marked the second half of the 1990s. In this process, the corpora compiled within the framework of the MULTEXT-EAST project played a pioneering role. Today two monolingual corpora are available for the Slovenian language: the FIDA Corpus, a 100-million-word reference corpus of Slovenian, and Nova beseda, a larger non-reference corpus of just over 160 million words. At the same time, FidaPLUS, a very large 300-million-word reference corpus, is being built. In addition, parallel corpora have been created, so far only combining Slovenian and English. These corpora have served as the starting point for a series of corpus-based linguistic studies carried out in the last few years. Just as the pre-computer Survey of English Usage was a turning point in the linguistic description of English, the collection of materials compiled for the Slovar slovenskega knjižnega jezika (Dictionary of the Standard Slovenian Language, 1970-1991) was a turning point for Slovenian lexicosemantic descriptions, since it enabled a thorough description of Slovenian based on data drawn from real texts. In the 1960s, when the concept of the new monolingual dictionary had fully taken shape, lexical descriptions were designed on the basis of materials collected for that purpose; these descriptions rejected accounts of linguistic elements not grounded in real language use and went beyond a purely normative approach to language description. However, no computer-assisted processing of language data was initiated within the framework of Slovenian studies, even though this had been one of its explicitly stated goals.
As a result, Slovenian language studies began to focus on language technologies only in the second half of the 1990s, but from then on their involvement was very active. The impact of corpus linguistics in Slovenia has been clearly noticeable in the last decade, above all after 2000, when the first full-fledged corpus-linguistic studies appeared. In Slovenian studies, corpora have, on the one hand, become an independent starting point for linguistic analyses and, on the other hand, an indispensable source of material for analysis in various types of language studies. Corpus data are practically limitless, and analysing them is an ongoing challenge, especially when the data go beyond what is expected and defy our intuitive perception of linguistic reality. The results of corpus analyses of the Slovenian language are exciting because they reveal the exceptional creativity and vitality of the Slovenian discourse community.