Compiling a word list for a crowdsourced study of Slovenian word prevalence

Authors

DOI:

https://doi.org/10.57589/srl.v73i1.4231

Keywords:

vocabulary, word prevalence, crowdsourced study, corpus, frequency, Slovenian

Abstract

This article presents the methodology used to compile a word list for a crowdsourced study of Slovenian word prevalence. The list draws on the headword lists of three explanatory dictionaries of Slovenian: the second edition of Slovar slovenskega knjižnega jezika, eSSKJ, and Sprotni slovar slovenskega jezika. The selection was restricted by a set of criteria, including word length and corpus frequency, and proper nouns were excluded. The final list contains 79,413 words and covers contemporary common vocabulary. The list is used in a vocabulary test that will yield data on word prevalence, i.e. the share of Slovenian speakers who know a given word. The results will contribute to a better understanding of the mental lexicon of Slovenian speakers.

References

Domen Krvina, 2014–: Sprotni slovar slovenskega jezika. Online.

ePravopis: Slovar slovenskega pravopisa. 2014–. Online.

eSSKJ: Slovar slovenskega knjižnega jezika. 2016–. Online.

Slovar slovenskega knjižnega jezika, druga, dopolnjena in deloma prenovljena izdaja. 2014. Also online.

Jože Toporišič (ed.), 2001: Slovenski pravopis. Also online.

Jose Armando Aguasvivas, Manuel Carreiras, Marc Brysbaert, Paweł Mandera, Emmanuel Keuleers, Jon Andoni Duñabeitia, 2018: SPALEX: A Spanish Lexical Decision Database From a Massive Online Data Collection. Frontiers in Psychology 9. 2156. https://doi.org/10.3389/fpsyg.2018.02156.

Kozma Ahačič, Nina Ledinek, Andrej Perdih, 2015: Portal Fran – nastanek in trenutno stanje. Slovnica in slovar – aktualni jezikovni opis. Ed. Mojca Smolej. Ljubljana: Znanstvena založba Filozofske fakultete (Obdobja 34). 57–66.

Špela Arhar Holdt, Senja Pollak, Marko Robnik Šikonja, Simon Krek, 2020: Referenčni seznam pogostih splošnih besed za slovenščino. Jezikovne tehnologije in digitalna humanistika: zbornik konference. Ed. Darja Fišer, Tomaž Erjavec. Ljubljana. 10–5.

R.H. Baayen, L.B. Feldman, R. Schreuder, 2006: Morphological influences on the recognition of monosyllabic monomorphemic words. Journal of Memory and Language 55/2. 290–313. https://doi.org/10.1016/j.jml.2006.03.008.

David A. Balota, Michael J. Cortese, Susan D. Sergent-Marshall, Daniel H. Spieler, Melvin J. Yap, 2004: Visual Word Recognition of Single-Syllable Words. Journal of Experimental Psychology: General 133/2. 283–316. https://doi.org/10.1037/0096-3445.133.2.283.

David A. Balota, Melvin J. Yap, Keith A. Hutchison, Michael J. Cortese, Brett Kessler, Bjorn Loftis, James H. Neely, Douglas L. Nelson, Greg B. Simpson, Rebecca Treiman, 2007: The English Lexicon Project. Behavior Research Methods 39. 445–59. https://doi.org/10.3758/BF03193014.

Rebekah George Benjamin, 2012: Reconstructing Readability: Recent Developments and Recommendations in the Analysis of Text Difficulty. Educational Psychology Review 24. 63–88. https://doi.org/10.1007/s10648-011-9181-8.

Helen Bird, Sue Franklin, David Howard, 2001: Age of acquisition and imageability ratings for a large set of words, including verbs and function words. Behavior Research Methods, Instruments, & Computers 33. 73–9. https://doi.org/10.3758/BF03195349.

Marc Brysbaert, Paweł Mandera, Samantha F. McCormick, Emmanuel Keuleers, 2019: Word prevalence norms for 62,000 English lemmas. Behavior Research Methods 51. 467–79. https://doi.org/10.3758/s13428-018-1077-9.

Marc Brysbaert, Boris New, 2009: Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41. 977–90. https://doi.org/10.3758/BRM.41.4.977.

Marc Brysbaert, Michaël Stevens, Paweł Mandera, Emmanuel Keuleers, 2016a: The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance 42/3. 441–58. https://doi.org/10.1037/xhp0000159.

Marc Brysbaert, Michaël Stevens, Paweł Mandera, Emmanuel Keuleers, 2016b: How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age. Frontiers in Psychology 7. https://doi.org/10.3389/fpsyg.2016.01116.

Orphée De Clercq, Véronique Hoste, 2016: All Mixed Up? Finding the Optimal Feature Set for General Readability Prediction and Its Application to English and Dutch. Computational Linguistics 42/3. 457–90. https://doi.org/10.1162/COLI_a_00255.

Pasquale A. Della Rosa, Eleonora Catricalà, Gabriella Vigliocco, Stefano F. Cappa, 2010: Beyond the abstract—concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior Research Methods 42. 1042–8. https://doi.org/10.3758/BRM.42.4.1042.

Alain Desrochers, Glenn L. Thompson, 2009: Subjective frequency and imageability ratings for 3,600 French nouns. Behavior Research Methods 41. 546–57. https://doi.org/10.3758/BRM.41.2.546.

Andrew Duchon, Manuel Perea, Nuria Sebastián-Gallés, Antonia Martí, Manuel Carreiras, 2013: EsPal: One-stop shopping for Spanish word properties. Behavior Research Methods 45. 1246–58. https://doi.org/10.3758/s13428-013-0326-1.

Charles M. Eddington, Natasha Tokowicz, 2015: How meaning similarity influences ambiguous word processing: the current state of the literature. Psychonomic Bulletin & Review 22. 13–37. https://doi.org/10.3758/s13423-014-0665-7.

Eva M. Fernández, Helen Smith Cairns (eds.), 2018: The handbook of psycholinguistics. John Wiley & Sons. https://doi.org/10.1002/9781118829516.

Ludovic Ferrand, Boris New, Marc Brysbaert, Emmanuel Keuleers, Patrick Bonin, Alain Méot, Maria Augustinova, Christophe Pallier, 2010: The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods 42. 488–96. https://doi.org/10.3758/BRM.42.2.488.

John Field, 2004: Psycholinguistics: the key concepts. London, New York: Routledge.

Kenneth I. Forster, 2000: The potential for experimenter bias effects in word recognition experiments. Memory & Cognition 28/7. 1109–15. https://doi.org/10.3758/BF03211812.

Nataša Gliha Komac, Nataša Jakop, Janoš Ježovnik, Simona Klemenčič, Domen Krvina, Nina Ledinek, Mija Michelizza, Matej Meterc, Tanja Mirtič, Andrej Perdih, Špela Petric, Marko Snoj, Andreja Žele, 2016: Novi slovar slovenskega knjižnega jezika – predstavitev temeljnih konceptualnih izhodišč. Škrabčevi dnevi 9. Zbornik prispevkov s simpozija 2015. Ed. Franc Marušič et al. Nova Gorica: Založba Univerze v Novi Gorici. 19–33.

Marc Guasch, Roger Boada, Jon Andoni Duñabeitia, Pilar Ferré, 2022: Prevalence norms for 40,777 Catalan words: An online megastudy of vocabulary size. Behavior Research Methods 55. 3198–217. https://doi.org/10.3758/s13428-022-01959-5.

Marc Guasch, Pilar Ferré, Isabel Fraga, 2016: Spanish norms for affective and lexicosemantic variables for 1,400 words. Behavior Research Methods 48. 1358–69. https://doi.org/10.3758/s13428-015-0684-y.

Julia Hancke, Sowmya Vajjala, Detmar Meurers, 2012: Readability Classification for German using Lexical, Syntactic, and Morphological Features. Proceedings of COLING 2012. Ed. Martin Kay, Christian Boitet. Mumbai: The COLING 2012 Organizing Committee. 1063–80.

Kamil K. Imbir, 2016: Affective Norms for 4900 Polish Words Reload (ANPW_R): Assessments for Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability and, Age of Acquisition. Frontiers in Psychology 7. 1081. https://doi.org/10.3389/fpsyg.2016.01081.

Emmanuel Keuleers, Marc Brysbaert, 2010: Wuggy: A multilingual pseudoword generator. Behavior Research Methods 42/3. 627–33. https://doi.org/10.3758/BRM.42.3.627.

Emmanuel Keuleers, Marc Brysbaert, 2011: Detecting inherent bias in lexical decision experiments with the LD1NN algorithm. The Mental Lexicon 6/1. 34–52. https://doi.org/10.1075/ml.6.1.02keu.

Emmanuel Keuleers, Michaël Stevens, Paweł Mandera, Marc Brysbaert, 2015: Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Quarterly Journal of Experimental Psychology 68/8. 1665–92. https://doi.org/10.1080/17470218.2015.1022560.

Matej Klemen, Špela Arhar Holdt, Senja Pollak, Iztok Kosem, Eva Pori, Polona Gantar, Mihaela Knez, 2023: Building a CEFR-labeled core vocabulary and developing a lexical resource for Slovenian as a second and foreign language. Proceedings of the eLex 2023 conference. Ed. Marek Medveď et al. Brno: Lexical Computing CZ. 654–68.

Matej Klemen, 2024: Test poznavanja splošnih besed v slovenščini med udeleženci Mladinske poletne šole slovenščine. Jezikovne tehnologije in digitalna humanistika: zbornik konference. Ed. Špela Arhar Holdt, Tomaž Erjavec. Ljubljana. 604–20. https://dx.doi.org/10.5281/zenodo.13936445.

Simon Krek, Špela Arhar Holdt, Tomaž Erjavec, Jaka Čibej, Andraž Repar, Polona Gantar, Nikola Ljubešić, Iztok Kosem, Kaja Dobrovoljc, 2020: Gigafida 2.0: the reference corpus of written standard Slovene. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). Ed. Nicoletta Calzolari. ELRA - European Language Resources Association. 3340–5.

Domen Krvina, 2024: Sprotni slovar slovenskega jezika: 2014–2023. Pleteršnikova dediščina: ob stoletnici smrti Maksa Pleteršnika. Ed. Marko Jesenšek. Maribor: Univerza v Mariboru, Univerzitetna založba (ZORA 154). 136–51. https://doi.org/10.18690/um.ff.3.2024.9.

Domen Krvina, Špela Petric Žižić, 2024: The Relation Between the Composition of Corpora (Genre Balance and Representativeness) and Their Reliability in Compiling General Explanatory Dictionary. Slovenski jezik / Slovene Linguistic Studies 16. 149–76. https://doi.org/10.3986/16.1.07.

Victor Kuperman, Julie A. Van Dyke, 2013: Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance 39/3. 802–23. https://doi.org/10.1037/a0030859.

Nina Ledinek, Mateja Jemec Tomazin, Mitja Trojar, Andrej Perdih, Janoš Ježovnik, Miro Romih, Tomaž Erjavec, 2022: Korpus šolskih besedil slovenskega jezika: zasnova in gradnja. Jezikoslovni zapiski 28/1. 122–37. https://doi.org/10.3986/JZ.28.1.07.

Kristin Lemhöfer, Mirjam Broersma, 2012: Introducing LexTALE: A quick and valid Lexical Test for Advanced Learners of English. Behavior Research Methods 44. 325–43. https://doi.org/10.3758/s13428-011-0146-0.

Michael B. Lewis, Matei Vladeanu, 2006: Short Article: What do we know about Psycholinguistic Effects?. Quarterly Journal of Experimental Psychology 59/6. 977–86. https://doi.org/10.1080/17470210600638076.

Nataša Logar, Vojko Gorjanc, Špela Arhar Holdt, 2023: Korpus Gigafida 2.0: Mnenje uporabnikov. Jezik in slovstvo 68/2. 75–91. https://doi.org/10.4312/jis.68.2.75-91.

Matej Meterc, 2017: Paremiološki optimum. Ljubljana: Založba ZRC, ZRC SAZU. https://doi.org/10.3986/9789610504153.

Maria Montefinese, David Vinson, Gabriella Vigliocco, Ettore Ambrosini, 2019: Italian Age of Acquisition Norms for a Large Set of Words (ItAoA). Frontiers in Psychology 10. 278. https://doi.org/10.3389/fpsyg.2019.00278.

Lynda Mugglestone, 2015: Description and Prescription in Dictionaries. The Oxford Handbook of Lexicography. Ed. Philip Durkin. Oxford University Press. 546–60. https://doi.org/10.1093/oxfordhb/9780199691630.013.39.

Paul Nation, David Beglar, 2007: A vocabulary size test. The Language Teacher 31. 9–13.

Petra Obrul, Tamara Vidakovič, Adela Lang, Barbara Vogrinčič, Tina Pogorelčnik, Matic Pavlič, 2022: Ocena govorno-jezikovnih sposobnosti odrasle osebe z afazijo po ishemični možganski kapi z uporabo slovenske različice Baterije testov za hitro prepoznavanje afazije (QAB-SI; angl. the Quick Aphasia Battery – QAB). Zbornik prispevkov VI. Kongresa logopedov Slovenije. Ed. Tanja Kocjančič Antolík. Moravske Toplice: Društvo logopedov Slovenije. 63–71.

Allan Paivio, John C. Yuille, Stephen A. Madigan, 1968: Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology 76/1. 1–25. https://doi.org/10.1037/h0025327.

Andrej Perdih, 2020: Portal Fran: od začetkov do danes. Rasprave Instituta za hrvatski jezik i jezikoslovlje 46/2. 997–1018. https://doi.org/10.31724/rihjj.46.2.28.

Andrej Perdih, Marko Snoj, 2015: SSKJ2. Slavia Centralis 8/1. 5–15.

Jennifer Rodd, 2018: Lexical Ambiguity. The Oxford Handbook of Psycholinguistics. Ed. Shirley-Ann Rueschemeyer et al. Oxford: Oxford University Press. 95–117. https://doi.org/10.1093/oxfordhb/9780198786825.013.5.

Ana Paula Soares, Ana Santos Costa, João Machado, Montserrat Comesaña, Helena Mendes Oliveira, 2017: The Minho Word Pool: Norms for imageability, concreteness, and subjective frequency for 3,800 Portuguese words. Behavior Research Methods 49. 1065–81. https://doi.org/10.3758/s13428-016-0767-4.

Raymond Stubbe, 2012: Do pseudoword false alarm rates and overestimation rates in Yes/No vocabulary tests change with Japanese university students’ English ability levels?. Language Testing 29/4. 471–88. https://doi.org/10.1177/0265532211433033.

Wei Ping Sze, Melvin J. Yap, Susan J. Rickard Liow, 2015: The role of lexical variables in the visual recognition of Chinese characters: A megastudy analysis. Quarterly Journal of Experimental Psychology 68/8. 1541–70. https://doi.org/10.1080/17470218.2014.985234.

Matthew J. Traxler, Morton A. Gernsbacher (eds.), 2006: Handbook of psycholinguistics. Elsevier. https://doi.org/10.1016/B978-0-12-369374-7.X5000-7.

Walter J. B. Van Heuven, Pawel Mandera, Emmanuel Keuleers, Marc Brysbaert, 2014: Subtlex-UK: A New and Improved Word Frequency Database for British English. Quarterly Journal of Experimental Psychology 67/6. 1176–90. https://doi.org/10.1080/17470218.2013.850521.

Barbara Vogrinčič, Matic Pavlič, Tina Pogorelčnik, Blaž Koritnik, Elke De Witte, Djaina Satoer, 2024: Slovenian adaptation of Diagnostic Instrument for Mild Aphasia (DIMA-SI): a pilot study in a digital and pen-and-paper version. Science of Aphasia 2024 – Book of abstracts. Geneva. 86–7.

Barbara Vogrinčič, Tina Pogorelčnik, Matic Pavlič, David Gosar, 2023: Slovenski test iskanja besed. Ljubljana: Center za psihodiagnostična sredstva.

Melvin J. Yap, David A. Balota, Daragh E. Sibley, Roger Ratcliff, 2012: Individual differences in visual word recognition: Insights from the English Lexicon Project. Journal of Experimental Psychology: Human Perception and Performance 38/1. 53–79. https://doi.org/10.1037/a0024177.

Melvin J. Yap, Susan J. Rickard Liow, Sajlia Binte Jalil, Siti Syuhada Binte Faizal, 2010: The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods 42/4. 992–1003. https://doi.org/10.3758/BRM.42.4.992.

Published

2025-03-10

How to cite

Perdih, A., Gabrovšek, D. and Pavlič, M. (2025) "Izdelava seznama besed za množično raziskavo razširjenosti slovenskih besed", Slavistična revija, 73(1), pp. 121–138. doi: 10.57589/srl.v73i1.4231.

Section

ARTICLES