Part-of-Speech Tagging of Slovene Texts: How Far Did We Get?
Keywords:
slovenščina, korpusno jezikoslovje, TreeTagger, Nova Beseda, lematizacija, Slovene language, corpus linguistics, lemmatization
Abstract
The article deals with part-of-speech tagging and lemmatization of Slovene texts. The first section explains how these procedures are performed. The second section presents the results of experiments in automated tagging of Slovene texts, using a pre-tagged training corpus of one million words. TreeTagger, a statistical tagger, was trained for Slovene and achieved a precision of about 85%. It tagged and lemmatized 100 million running words of the Slovene corpus Nova Beseda.
Published
2005-02-15
How to Cite
LÖNNEKER, B. (2005) “Part-of-Speech Tagging of Slovene Texts: How Far Did We Get?”, Slavistična revija, 53(2), pp. 193–210. Available at: https://srl.si/ojs/srl/article/view/COBISS_ID-30090594 (Accessed: 23 November 2024).
Section
ARTICLES
License
Slavistična revija (http://www.srl.si) is distributed under the Creative Commons Attribution 4.0 International license.
Slavistična revija is a fully open access journal, which means that all articles are available on the internet to all users immediately upon publication. Non-commercial use and distribution in any medium is permitted, provided the author and the journal are properly credited.