Robnik Šikonja, M. (2025, April 16-17). Veliki jezikovni modeli za slovenščino in prevajanje [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
Arhar Holdt, Š. (2025, April 16-17). Lektoriranje v času umetne inteligence: Kdo bo postavljal piko na UI? [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
Robnik Šikonja, M. (2025, June 10). Projekt PoVeJMo, Gravitacija in ERA Chair projekt AI4DH [Conference presentation]. 4. Nacionalna konferenca Umetna inteligenca - nove smeri razvoja in izzivi za Slovenijo. Mengeš, Slovenia. https://dogodki.vlada.si/umetna-inteligenca-digitalna-preobrazba-prijava
Robnik Šikonja, M. (2025, June 13). Large Language Models for Analysis of Complex Phenomena [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
Arčon, T., Robnik Šikonja, M. and Tratnik, P. (2025, June 13). Motif Detection Using Large Language Models: The Cinderella Case Study [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
Horvat, M., Koražija, J. and Tratnik, P. (2025, June 13). Modeling Deliberative Values in Narrative Culture Using LLMs [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
Babnik, J. and Tratnik, P. (2025, June 13). The Dragon-Slayer’s Narrative: Structural Kinship and Discursive Divergence [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
Robnik Šikonja, M. (2025, June 17). The importance of language data for the development of LT solutions - future steps [Conference presentation]. EU LDS Country Workshop. Ljubljana, Slovenia. https://language-data-space.ec.europa.eu/events/lds-country-workshop-slovenia-2025-06-17_en
Hüll, N. and Dobrovoljc, K. (2025). Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders. Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025). Ljubljana, Slovenia.
Terčon, L. and Dobrovoljc, K. (2025). ComparaTree: A Multi-Level Comparative Treebank Analysis Tool. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025). Ljubljana, Slovenia.
Krsnik, L. and Dobrovoljc, K. (2025). STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025). Ljubljana, Slovenia.
Munda, T. and Arhar Holdt, Š. (2025). First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0. Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025). Ljubljana, Slovenia.
Arčon, T., Kosem, I. and Arhar Holdt, Š. (2025, September 24). Using large language models to generate distractors for language games [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
Klemen, M., Dobrovoljc, K., Terčon, L., Hüll, N., Arčon, T., and Robnik Šikonja, M. (2025, September 24). Agentic Large Language Models for Grammatical Analysis of Multilingual Corpora [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
Jelovčan, G., Robnik Šikonja, M., Arhar Holdt, Š., and Vreš, D. (2025, September 24). Attempt to Create Synthetic Dataset for Grammar Error Correction in Slovenian Language [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
Pretnar Žagar, A. (2025, September 24). Evaluating LLMs on Value Annotation Task [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
Arčon, T., Robnik Šikonja, M., and Tratnik, P. (2025, September 24). Automatic detection of folkloristic motifs with large language models: the Cinderella tale [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
Robnik Šikonja, M. (2025, September 17). Trends and challenges in artificial intelligence [Conference presentation]. SNC’25 Sinapsa neuroscience conference 2025, Ljubljana, Slovenia. https://www.sinapsa.org/SNC25/programme
Robnik Šikonja, M. (2025, November 18). Large language models for lexicography [Invited keynote speech at the conference]. eLex 2025: Electronic lexicography in the 21st century: Intelligent Lexicography, Bled, Slovenia. https://elex.link/elex2025/keynote-speakers/
Kosem, I. and Arhar Holdt, Š. (2025). Using Large Language Models to Generate Distractors for Language Games [Conference presentation]. eLex 2025: Electronic lexicography in the 21st century: Intelligent Lexicography, Bled, Slovenia. https://elex.link/elex2025/wp-content/uploads/elex2025_book_of_abstracts.pdf
Robnik Šikonja, M. (2025). What are open LLMs and how do we build them? / Kaj so odprti LLMs in kako jih gradimo? [Conference presentation]. ERA Knowledge Rights 21 Conference, Ljubljana, Slovenia. https://www.odipi.si/era-kr21-konferenca-slovenija-2025/program-era-kr21-konference-2025/
Publications in conference proceedings / workshops:
Arhar Holdt, Š., Lukan, T., Dobrovoljc, K., Doucet, A., Krek, S., Pretnar Žagar, A., Tratnik, P., Vobič, I., Žitnik, S., & Robnik Šikonja, M. (2025). Advancing interdisciplinary research: The European Centre of Excellence in Artificial Intelligence for Digital Humanities (CoE AI4DH). In: AI in Science Summit 2025: 3–4 November 2025, Copenhagen, Denmark. https://cdn.prod.website-files.com/68a7113a28bc36a9033775bf/6903613514553a574fad3d4c_6.pdf
Delaunay, J. et al. (2026). Multidisciplinary End-to-End Document-Level Relation Extraction from Scientific Literature. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16026. Springer, Cham. https://doi.org/10.1007/978-3-032-04627-7_15
Estève, L. & Dobrovoljc, K. (2026). DELTA: A Toolkit for Measuring Linguistic Diversity in Dependency-Parsed Corpora. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), pp. 75–85, Rabat, Morocco. Association for Computational Linguistics. https://doi.org/10.18653/v1/2026.eacl-demo.6
Klemen, M., Arčon, T., Terčon, L., Robnik-Šikonja, M., & Dobrovoljc, K. (2025). Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis. arXiv preprint arXiv:2512.00214.
P. P. Mai Chau, S. Bakkali and A. Doucet, "DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization," in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Tucson, AZ, USA, 2025, pp. 1213-1222, doi: 10.1109/WACVW65960.2025.00144
Nguyen, N.N., Hamdi, A., Doucet, A., Jatowt, A., Coustaty, M. (2026). Rethinking OCR Evaluation for Information Extraction in Business Documents. In: Oh, S., Doucet, A., Buranarach, M., Buenrostro-Cabbab, I., Liu, Y., Olgado, B.S. (eds) Intelligence and Equity: Shaping the Future of Knowledge. ICADL 2025. Lecture Notes in Computer Science, vol 16242. Springer, Singapore. https://doi.org/10.1007/978-981-95-4861-3_21
Pham, TC., Coustaty, M., Joseph, A., Deloin, G., Poulain d’Andecy, V., Doucet, A. (2026). Few-Shot Document Classification in Real Applications: Boosting Precision with Novelty Detection. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16025. Springer, Cham. https://doi.org/10.1007/978-3-032-04624-6_5
Piryani, B., Mozafari, J., Abdallah, A., Doucet, A., & Jatowt, A. (2025). MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts. https://arxiv.org/abs/2502.16781
Sharma Kafle, D., Talhi, E., Coustaty, M., Doucet, A. (2026). Expertise Finding: Domain Extraction from Documents Using Fuzzy Clustering. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16023. Springer, Cham. https://doi.org/10.1007/978-3-032-04614-7_23
Sun, W., Girdhar, N., Tran, H.T.H., González-Gallardo, CE., Coustaty, M., Doucet, A. (2026). Ar-Q-Former: Historical Newspaper Article Separation Based on Multimodal Transformer Structure. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16025. Springer, Cham. https://doi.org/10.1007/978-3-032-04624-6_28
Telnoff, Q., Baitu, B., Coustaty, M., Crohas, F., Doucet, A. (2025). VisHubGAT: Visible Connectivity and Hub Nodes for Multimodal Entity Extraction. In: Brun, L., Carletti, V., Bougleux, S., Gaüzère, B. (eds) Graph-Based Representations in Pattern Recognition. GbRPR 2025. Lecture Notes in Computer Science, vol 15727. Springer, Cham. https://doi.org/10.1007/978-3-031-94139-9_25
Vajda, D., Vreš, D., & Robnik-Šikonja, M. (2025, October). Improving LLMs for Machine Translation Using Synthetic Preference Data. In Proceedings of the 2nd LUHME Workshop (pp. 67-73). https://doi.org/10.48550/arXiv.2508.14951
Žagar, A. P., & Tekavčič, K. P. de M. (2026). Carniolan Provincial Assembly: Corpus Improvements and Enhancements. Digital Humanities in the Nordic and Baltic Countries Publications, 8(1). https://doi.org/10.5617/dhnbpub.13202
Publications
Girdhar, N., Raj, A., Sharma, D., Singh, V., Doucet, A., & Renz, M. (2025). A comprehensive review of frugal artificial intelligence: challenges, applications, and the road to sustainable AI. Soft Computing, 29(13), 4823-4856.
Girdhar, N., Coustaty, M., & Doucet, A. (2026). STRAS: a semantic textual-cues leveraged rule-based approach for article separation in historical newspapers. International Journal on Digital Libraries, 27(1), 2.
Ulčar, M., Žagar, A., Armendariz, C. S., Repar, A., Pollak, S., Purver, M., and Robnik Šikonja, M. (2026). Mono- and cross-lingual evaluation of representation language models on less-resourced languages. Computer Speech & Language, 95, 101852. https://doi.org/10.1016/j.csl.2025.101852
Ivačič, N., Škrlj, B., Koloski, B., Pollak, S., Lavrač, N., & Purver, M. (2025). Extreme Multi-Label Text Classification for Less-Represented Languages and Low-Resource Environments: Advances and Lessons Learned. Machine Learning and Knowledge Extraction, 7(4), 142. https://doi.org/10.3390/make7040142
Klemen, M., Božič, M., Holdt, Š. A., & Robnik-Šikonja, M. (2025). Grammatical error correction of Slovenian school essays using large language models. Journal of Contemporary Educational Studies/Sodobna Pedagogika, 76(3).
Vobič, I., Robnik Šikonja, M., Žagar, A., & Mance, B. (2025). Watchdog or Copycat? Examining News Diversity in Slovenian Journalism System. Medijska istraživanja, 31(2), 5-34. https://doi.org/10.22572/mi.30.2.1
Pavletič, K., & Pretnar Žagar, A. (2025). Uporaba strojnega učenja za napovedovanje spola na poznoantičnem grobišču Lajh v Kranju. Arheo: arheološka obvestila, 25-40.
Pham, T. C., Coustaty, M., Doucet, A., Joseph, A., & Poulain d’Andecy, V. (2025). Deep metric learning for end-to-end document classification. Neurocomputing, 131241.
Pham, T. C., Coustaty, M., Joseph, A., Deloin, G., Poulain d’Andecy, V., & Doucet, A. (2025). Exemplar sampling algorithm for instance incremental learning on imbalanced document datasets. International Journal on Document Analysis and Recognition (IJDAR), 1-12.
Pretnar Žagar, A. (2025). Computational Analysis of Slovenian Historical Newspapers (1771–1914): Linguistic, Thematic, and Nation-Building Insights. Contributions to the Contemporary History, 65(3), 42-66. https://doi.org/10.51663/pnz.65.3.02
Tran, H. T. H., Martinc, M., Caporusso, J., Delaunay, J., Doucet, A., & Pollak, S. (2026). Recent Advances in Automatic Term Extraction: A Comprehensive Survey. ACM Computing Surveys, 58(9), 1-35.
Tratnik, P. (2025). Saint George, the Dragon Slayer: Sacralized Violence and the Allegorical Union of Sacred and Secular Power. ACTA HISTRIAE, 33(4), 617–668. https://doi.org/10.19233/AH.2025.23
Tratnik, P. (2025). Flusser on Artificial Intelligence. Flusser studies, 40, 1-8. https://repozitorij.uni-lj.si/IzpisGradiva.php?id=178705&lang=slv
Research datasets
Žagar, A., Dobrovoljc, K., Munda, T., Brglez, M., and Robnik Šikonja, M. (2024). Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0, Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1988.