Compiling a word list for a crowdsourced study of the prevalence of Slovenian words
DOI:
https://doi.org/10.57589/srl.v73i1.4231
Keywords:
vocabulary, prevalence, crowdsourced study, corpus, frequency, Slovenian
Abstract
This article presents the methodology used to compile a word list for a crowdsourced study of the prevalence of Slovenian words. The list draws on the headword lists of three explanatory dictionaries of Slovenian: the second edition of Slovar slovenskega knjižnega jezika, eSSKJ, and Sprotni slovar slovenskega jezika. The selection was narrowed by a set of criteria, including word length and corpus frequency, and by excluding proper names. The final list contains 79,413 words and covers contemporary common vocabulary. The list is used in a vocabulary test that will yield data on word prevalence, i.e. the share of Slovenian speakers who know a given word. The results will contribute to a better understanding of the mental lexicon of Slovenian speakers.
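The selection procedure summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract names the criteria (word length, corpus frequency, exclusion of proper names) but not their values, so the thresholds `MIN_LEN`, `MAX_LEN`, and `MIN_CORPUS_FREQ`, the `Headword` record, and both function names are hypothetical. The `prevalence` helper only restates the definition given above: the share of participants who know a word.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the abstract names the criteria but not their values.
MIN_LEN = 3
MAX_LEN = 20
MIN_CORPUS_FREQ = 1


@dataclass(frozen=True)
class Headword:
    lemma: str
    corpus_freq: int      # assumed: absolute frequency in a reference corpus
    is_proper_noun: bool  # assumed: flag supplied with the dictionary data


def build_word_list(*headword_lists):
    """Merge headword lists and keep lemmas that pass all selection criteria.

    Duplicates are resolved by lemma; the first occurrence decides.
    """
    seen = set()
    selected = []
    for headwords in headword_lists:
        for hw in headwords:
            if hw.lemma in seen:
                continue
            seen.add(hw.lemma)
            if hw.is_proper_noun:
                continue  # proper names are excluded
            if not (MIN_LEN <= len(hw.lemma) <= MAX_LEN):
                continue  # length criterion
            if hw.corpus_freq < MIN_CORPUS_FREQ:
                continue  # corpus-frequency criterion
            selected.append(hw.lemma)
    return selected


def prevalence(responses):
    """Share of participants who reported knowing a word (True = known)."""
    if not responses:
        raise ValueError("no responses for this word")
    return sum(responses) / len(responses)
```

With three small input lists standing in for the three dictionaries, `build_word_list(list_a, list_b, list_c)` returns the merged, filtered lemmas in their order of first appearance, and `prevalence([True, True, False, True])` gives the proportion of "known" answers for one word.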
References
Domen Krvina, 2014–: Sprotni slovar slovenskega jezika. Online.
ePravopis: Slovar slovenskega pravopisa. 2014–. Online.
eSSKJ: Slovar slovenskega knjižnega jezika. 2016–. Online.
Slovar slovenskega knjižnega jezika, second, supplemented and partly revised edition. 2014. Also online.
Jože Toporišič (ed.), 2001: Slovenski pravopis. Also online.
Jose Armando Aguasvivas, Manuel Carreiras, Marc Brysbaert, Paweł Mandera, Emmanuel Keuleers, Jon Andoni Duñabeitia, 2018: SPALEX: A Spanish Lexical Decision Database From a Massive Online Data Collection. Frontiers in Psychology 9. 2156. https://doi.org/10.3389/fpsyg.2018.02156.
Kozma Ahačič, Nina Ledinek, Andrej Perdih, 2015: Portal Fran – nastanek in trenutno stanje. Slovnica in slovar – aktualni jezikovni opis. Ed. Mojca Smolej. Ljubljana: Znanstvena založba Filozofske fakultete (Obdobja 34). 57–66.
Špela Arhar Holdt, Senja Pollak, Marko Robnik Šikonja, Simon Krek, 2020: Referenčni seznam pogostih splošnih besed za slovenščino. Jezikovne tehnologije in digitalna humanistika: zbornik konference. Eds. Darja Fišer, Tomaž Erjavec. Ljubljana. 10–5.
R.H. Baayen, L.B. Feldman, R. Schreuder, 2006: Morphological influences on the recognition of monosyllabic monomorphemic words. Journal of Memory and Language 55/2. 290–313. https://doi.org/10.1016/j.jml.2006.03.008.
David A. Balota, Michael J. Cortese, Susan D. Sergent-Marshall, Daniel H. Spieler, Melvin J. Yap, 2004: Visual Word Recognition of Single-Syllable Words. Journal of Experimental Psychology: General 133/2. 283–316. https://doi.org/10.1037/0096-3445.133.2.283.
David A. Balota, Melvin J. Yap, Keith A. Hutchison, Michael J. Cortese, Brett Kessler, Bjorn Loftis, James H. Neely, Douglas L. Nelson, Greg B. Simpson, Rebecca Treiman, 2007: The English Lexicon Project. Behavior Research Methods 39. 445–59. https://doi.org/10.3758/BF03193014.
Rebekah George Benjamin, 2012: Reconstructing Readability: Recent Developments and Recommendations in the Analysis of Text Difficulty. Educational Psychology Review 24. 63–88. https://doi.org/10.1007/s10648-011-9181-8.
Helen Bird, Sue Franklin, David Howard, 2001: Age of acquisition and imageability ratings for a large set of words, including verbs and function words. Behavior Research Methods, Instruments, & Computers 33. 73–9. https://doi.org/10.3758/BF03195349.
Marc Brysbaert, Paweł Mandera, Samantha F. McCormick, Emmanuel Keuleers, 2019: Word prevalence norms for 62,000 English lemmas. Behavior Research Methods 51. 467–79. https://doi.org/10.3758/s13428-018-1077-9.
Marc Brysbaert, Boris New, 2009: Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41. 977–90. https://doi.org/10.3758/BRM.41.4.977.
Marc Brysbaert, Michaël Stevens, Paweł Mandera, Emmanuel Keuleers, 2016a: The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance 42/3. 441–58. https://doi.org/10.1037/xhp0000159.
Marc Brysbaert, Michaël Stevens, Paweł Mandera, Emmanuel Keuleers, 2016b: How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age. Frontiers in Psychology 7. https://doi.org/10.3389/fpsyg.2016.01116.
Orphée De Clercq, Véronique Hoste, 2016: All Mixed Up? Finding the Optimal Feature Set for General Readability Prediction and Its Application to English and Dutch. Computational Linguistics 42/3. 457–90. https://doi.org/10.1162/COLI_a_00255.
Pasquale A. Della Rosa, Eleonora Catricalà, Gabriella Vigliocco, Stefano F. Cappa, 2010: Beyond the abstract—concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior Research Methods 42. 1042–8. https://doi.org/10.3758/BRM.42.4.1042.
Alain Desrochers, Glenn L. Thompson, 2009: Subjective frequency and imageability ratings for 3,600 French nouns. Behavior Research Methods 41. 546–57. https://doi.org/10.3758/BRM.41.2.546.
Andrew Duchon, Manuel Perea, Nuria Sebastián-Gallés, Antonia Martí, Manuel Carreiras, 2013: EsPal: One-stop shopping for Spanish word properties. Behavior Research Methods 45. 1246–58. https://doi.org/10.3758/s13428-013-0326-1.
Charles M. Eddington, Natasha Tokowicz, 2015: How meaning similarity influences ambiguous word processing: the current state of the literature. Psychonomic Bulletin & Review 22. 13–37. https://doi.org/10.3758/s13423-014-0665-7.
Eva M. Fernández, Helen Smith Cairns (eds.), 2018: The handbook of psycholinguistics. John Wiley & Sons. https://doi.org/10.1002/9781118829516.
Ludovic Ferrand, Boris New, Marc Brysbaert, Emmanuel Keuleers, Patrick Bonin, Alain Méot, Maria Augustinova, Christophe Pallier, 2010: The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods 42. 488–96. https://doi.org/10.3758/BRM.42.2.488.
John Field, 2004: Psycholinguistics: the key concepts. London, New York: Routledge.
Kenneth I. Forster, 2000: The potential for experimenter bias effects in word recognition experiments. Memory & Cognition 28/7. 1109–15. https://doi.org/10.3758/BF03211812.
Nataša Gliha Komac, Nataša Jakop, Janoš Ježovnik, Simona Klemenčič, Domen Krvina, Nina Ledinek, Mija Michelizza, Matej Meterc, Tanja Mirtič, Andrej Perdih, Špela Petric, Marko Snoj, Andreja Žele, 2016: Novi slovar slovenskega knjižnega jezika – predstavitev temeljnih konceptualnih izhodišč. Škrabčevi dnevi 9. Zbornik prispevkov s simpozija 2015. Ed. Franc Marušič et al. Nova Gorica: Založba Univerze v Novi Gorici. 19–33.
Marc Guasch, Roger Boada, Jon Andoni Duñabeitia, Pilar Ferré, 2022: Prevalence norms for 40,777 Catalan words: An online megastudy of vocabulary size. Behavior Research Methods 55. 3198–217. https://doi.org/10.3758/s13428-022-01959-5.
Marc Guasch, Pilar Ferré, Isabel Fraga, 2016: Spanish norms for affective and lexicosemantic variables for 1,400 words. Behavior Research Methods 48. 1358–69. https://doi.org/10.3758/s13428-015-0684-y.
Julia Hancke, Sowmya Vajjala, Detmar Meurers, 2012: Readability Classification for German using Lexical, Syntactic, and Morphological Features. Proceedings of COLING 2012. Eds. Martin Kay, Christian Boitet. Mumbai: The COLING 2012 Organizing Committee. 1063–80.
Kamil K. Imbir, 2016: Affective Norms for 4900 Polish Words Reload (ANPW_R): Assessments for Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability and, Age of Acquisition. Frontiers in Psychology 7. 1081. https://doi.org/10.3389/fpsyg.2016.01081.
Emmanuel Keuleers, Marc Brysbaert, 2010: Wuggy: A multilingual pseudoword generator. Behavior Research Methods 42/3. 627–33. https://doi.org/10.3758/BRM.42.3.627.
Emmanuel Keuleers, Marc Brysbaert, 2011: Detecting inherent bias in lexical decision experiments with the LD1NN algorithm. The Mental Lexicon 6/1. 34–52. https://doi.org/10.1075/ml.6.1.02keu.
Emmanuel Keuleers, Michaël Stevens, Paweł Mandera, Marc Brysbaert, 2015: Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Quarterly Journal of Experimental Psychology 68/8. 1665–92. https://doi.org/10.1080/17470218.2015.1022560.
Matej Klemen, Špela Arhar Holdt, Senja Pollak, Iztok Kosem, Eva Pori, Polona Gantar, Mihaela Knez, 2023: Building a CEFR-labeled core vocabulary and developing a lexical resource for Slovenian as a second and foreign language. Proceedings of the eLex 2023 conference. Ed. Marek Medveď et al. Brno: Lexical Computing CZ. 654–68.
Matej Klemen, 2024: Test poznavanja splošnih besed v slovenščini med udeleženci Mladinske poletne šole slovenščine. Jezikovne tehnologije in digitalna humanistika: zbornik konference. Eds. Špela Arhar Holdt, Tomaž Erjavec. Ljubljana. 604–20. https://dx.doi.org/10.5281/zenodo.13936445.
Simon Krek, Špela Arhar Holdt, Tomaž Erjavec, Jaka Čibej, Andraž Repar, Polona Gantar, Nikola Ljubešić, Iztok Kosem, Kaja Dobrovoljc, 2020: Gigafida 2.0: the reference corpus of written standard Slovene. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). Ed. Nicoletta Calzolari. ELRA - European Language Resources Association. 3340–5.
Domen Krvina, 2024: Sprotni slovar slovenskega jezika: 2014–2023. Pleteršnikova dediščina: ob stoletnici smrti Maksa Pleteršnika. Ed. Marko Jesenšek. Maribor: Univerza v Mariboru, Univerzitetna založba (ZORA 154). 136–51. https://doi.org/10.18690/um.ff.3.2024.9.
Domen Krvina, Špela Petric Žižić, 2024: The Relation Between the Composition of Corpora (Genre Balance and Representativeness) and Their Reliability in Compiling General Explanatory Dictionary. Slovenski jezik / Slovene Linguistic Studies 16. 149–76. https://doi.org/10.3986/16.1.07.
Victor Kuperman, Julie A. Van Dyke, 2013: Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance 39/3. 802–23. https://doi.org/10.1037/a0030859.
Nina Ledinek, Mateja Jemec Tomazin, Mitja Trojar, Andrej Perdih, Janoš Ježovnik, Miro Romih, Tomaž Erjavec, 2022: Korpus šolskih besedil slovenskega jezika: zasnova in gradnja. Jezikoslovni zapiski 28/1. 122–37. https://doi.org/10.3986/JZ.28.1.07.
Kristin Lemhöfer, Mirjam Broersma, 2012: Introducing LexTALE: A quick and valid Lexical Test for Advanced Learners of English. Behavior Research Methods 44. 325–43. https://doi.org/10.3758/s13428-011-0146-0.
Michael B. Lewis, Matei Vladeanu, 2006: Short Article: What do we know about Psycholinguistic Effects?. Quarterly Journal of Experimental Psychology 59/6. 977–86. https://doi.org/10.1080/17470210600638076.
Nataša Logar, Vojko Gorjanc, Špela Arhar Holdt, 2023: Korpus Gigafida 2.0: Mnenje uporabnikov. Jezik in slovstvo 68/2. 75–91. https://doi.org/10.4312/jis.68.2.75-91.
Matej Meterc, 2017: Paremiološki optimum. Ljubljana: Založba ZRC, ZRC SAZU. https://doi.org/10.3986/9789610504153.
Maria Montefinese, David Vinson, Gabriella Vigliocco, Ettore Ambrosini, 2019: Italian Age of Acquisition Norms for a Large Set of Words (ItAoA). Frontiers in Psychology 10. 278. https://doi.org/10.3389/fpsyg.2019.00278.
Lynda Mugglestone, 2015: Description and Prescription in Dictionaries. The Oxford Handbook of Lexicography. Ed. Philip Durkin. Oxford University Press. 546–60. https://doi.org/10.1093/oxfordhb/9780199691630.013.39.
Paul Nation, David Beglar, 2007: A vocabulary size test. The Language Teacher 31. 9–13.
Petra Obrul, Tamara Vidakovič, Adela Lang, Barbara Vogrinčič, Tina Pogorelčnik, Matic Pavlič, 2022: Ocena govorno-jezikovnih sposobnosti odrasle osebe z afazijo po ishemični možganski kapi z uporabo slovenske različice Baterije testov za hitro prepoznavanje afazije (QAB-SI; angl. the Quick Aphasia Battery – QAB). Zbornik prispevkov VI. Kongresa logopedov Slovenije. Ed. Tanja Kocjančič Antolík. Moravske Toplice: Društvo logopedov Slovenije. 63–71.
Allan Paivio, John C. Yuille, Stephen A. Madigan, 1968: Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology 76/1. 1–25. https://doi.org/10.1037/h0025327.
Andrej Perdih, 2020: Portal Fran: od začetkov do danes. Rasprave Instituta za hrvatski jezik i jezikoslovlje 46/2. 997–1018. https://doi.org/10.31724/rihjj.46.2.28.
Andrej Perdih, Marko Snoj, 2015: SSKJ2. Slavia Centralis 8/1. 5–15.
Jennifer Rodd, 2018: Lexical Ambiguity. The Oxford Handbook of Psycholinguistics. Ed. Shirley-Ann Rueschemeyer et al. Oxford: Oxford University Press. 95–117. https://doi.org/10.1093/oxfordhb/9780198786825.013.5.
Ana Paula Soares, Ana Santos Costa, João Machado, Montserrat Comesaña, Helena Mendes Oliveira, 2017: The Minho Word Pool: Norms for imageability, concreteness, and subjective frequency for 3,800 Portuguese words. Behavior Research Methods 49. 1065–81. https://doi.org/10.3758/s13428-016-0767-4.
Raymond Stubbe, 2012: Do pseudoword false alarm rates and overestimation rates in Yes/No vocabulary tests change with Japanese university students’ English ability levels?. Language Testing 29/4. 471–88. https://doi.org/10.1177/0265532211433033.
Wei Ping Sze, Melvin J. Yap, Susan J. Rickard Liow, 2015: The role of lexical variables in the visual recognition of Chinese characters: A megastudy analysis. Quarterly Journal of Experimental Psychology 68/8. 1541–70. https://doi.org/10.1080/17470218.2014.985234.
Matthew J. Traxler, Morton A. Gernsbacher (eds.), 2006: Handbook of psycholinguistics. Elsevier. https://doi.org/10.1016/B978-0-12-369374-7.X5000-7.
Walter J. B. Van Heuven, Paweł Mandera, Emmanuel Keuleers, Marc Brysbaert, 2014: Subtlex-UK: A New and Improved Word Frequency Database for British English. Quarterly Journal of Experimental Psychology 67/6. 1176–90. https://doi.org/10.1080/17470218.2013.850521.
Barbara Vogrinčič, Matic Pavlič, Tina Pogorelčnik, Blaž Koritnik, Elke De Witte, Djaina Satoer, 2024: Slovenian adaptation of Diagnostic Instrument for Mild Aphasia (DIMA-SI): a pilot study in a digital and pen-and-paper version. Science of Aphasia 2024 – Book of abstracts. Geneva. 86–7.
Barbara Vogrinčič, Tina Pogorelčnik, Matic Pavlič, David Gosar, 2023: Slovenski test iskanja besed. Ljubljana: Center za psihodiagnostična sredstva.
Melvin J. Yap, David A. Balota, Daragh E. Sibley, Roger Ratcliff, 2012: Individual differences in visual word recognition: Insights from the English Lexicon Project. Journal of Experimental Psychology: Human Perception and Performance 38/1. 53–79. https://doi.org/10.1037/a0024177.
Melvin J. Yap, Susan J. Rickard Liow, Sajlia Binte Jalil, Siti Syuhada Binte Faizal, 2010: The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods 42/4. 992–1003. https://doi.org/10.3758/BRM.42.4.992.
License
Copyright (c) 2025 Andrej Perdih, Dejan Gabrovšek, Matic Pavlič

This work is licensed under a Creative Commons Attribution 4.0 International License.
Slavistična revija (http://www.srl.si) is distributed under Creative Commons, attribution 4.0 international.
Slavistična revija publishes fully open access journals, which means that all articles are available on the internet to all users immediately upon publication. Non-commercial use and distribution in any medium is permitted, provided the author and the journal are properly credited.