Strojno označevanje slovenskih besedil: Kako daleč smo?

Birte LÖNNEKER

Part-of-Speech Tagging of Slovene Texts: How Far Did We Get?

Keywords:

slovenščina, korpusno jezikoslovje, TreeTagger, Nova Beseda, lematizacija, Slovene language, corpus linguistics, lemmatization

Abstract

The article deals with part-of-speech tagging and lemmatization of Slovene texts. The first section explains how these procedures are performed. The second section presents the results of experiments in automated tagging of Slovene texts, using a pre-tagged training corpus of one million words. TreeTagger, a statistical tagger, was trained for Slovene and achieved a precision of about 85%. It was then used to tag and lemmatize the 100 million running words of the Slovene corpus Nova Beseda.

Published

2005-02-15

How to Cite

LÖNNEKER, B. (2005) “Part-of-Speech Tagging of Slovene Texts: How Far Did We Get?”, Slavistična revija, 53(2), pp. 193–210. Available at: https://srl.si/ojs/srl/article/view/COBISS_ID-30090594 (Accessed: 29 June 2026).

Issue

Vol. 53 No. 2 (2005)

Section

ARTICLES

License

Slavistična revija (http://www.srl.si) is distributed under the Creative Commons Attribution 4.0 International license.

Slavistična revija is a fully open access journal, which means that all articles are available on the internet to all users immediately upon publication. Non-commercial use and distribution in any medium is permitted, provided the author and the journal are properly credited.