Part-of-Speech Tagging of Slovene Texts: How Far Did We Get?

Authors

  • Birte LÖNNEKER

Keywords:

Slovene language, corpus linguistics, TreeTagger, Nova Beseda, lemmatization

Abstract

The article deals with part-of-speech tagging and lemmatization of Slovene texts. The first section explains how these procedures are performed. The second section presents the results of experiments in automated tagging of Slovene texts, using a pre-tagged training corpus of one million words. TreeTagger, a statistical tagger, was trained for Slovene and achieved a precision of about 85%. It tagged and lemmatized 100 million running words of the Slovene corpus Nova Beseda.

Published

2005-02-15

How to Cite

LÖNNEKER, B. (2005) “Part-of-Speech Tagging of Slovene Texts: How Far Did We Get?”, Slavistična revija, 53(2), pp. 193–210. Available at: https://srl.si/ojs/srl/article/view/COBISS_ID-30090594 (Accessed: 23 November 2024).

Section

ARTICLES