Open Science

Conference Presentations

  • Robnik Šikonja, M. (2025, April 16-17). Veliki jezikovni modeli za slovenščino in prevajanje [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
  • Arhar Holdt, Š. (2025, April 16-17). Lektoriranje v času umetne inteligence: Kdo bo postavljal piko na UI? [Conference presentation]. Proofreading and Translation Conference 2025: The Impact of Digital Transformation on Translation, Ljubljana, Slovenia. https://lektornica.si/delavnice/jezikovne/translation-conference-2025-the-impact-of-digital-transformation-in-translation/
  • Robnik Šikonja, M. (2025, June 10). Projekt PoVeJMo, Gravitacija in ERA Chair projekt AI4DH [Conference presentation]. 4. Nacionalna konferenca Umetna inteligenca - nove smeri razvoja in izzivi za Slovenijo. Mengeš, Slovenia. https://dogodki.vlada.si/umetna-inteligenca-digitalna-preobrazba-prijava
  • Robnik Šikonja, M. (2025, June 13). Large Language Models for Analysis of Complex Phenomena [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
  • Arčon, T., Robnik Šikonja, M. and Tratnik, P. (2025, June 13). Motif Detection Using Large Language Models: The Cinderella Case Study [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
  • Horvat, M., Koražija, J. and Tratnik, P. (2025, June 13). Modeling Deliberative Values in Narrative Culture Using LLMs [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
  • Babnik, J. and Tratnik, P. (2025, June 13). The Dragon-Slayer’s Narrative: Structural Kinship and Discursive Divergence [Conference presentation]. AI Methods for Research of Folkloristic Narratives, Ljubljana, Slovenia. https://cjvt.si/llm4dh/en/blog/workshop-ai-methods-for-research-of-folkloristic-narratives/
  • Robnik Šikonja, M. (2025, June 17). The importance of language data for the development of LT solutions - future steps [Conference presentation]. EU LDS Country Workshop. Ljubljana, Slovenia. https://language-data-space.ec.europa.eu/events/lds-country-workshop-slovenia-2025-06-17_en
  • Hüll, N. and Dobrovoljc, K. (2025). Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders. Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025). Ljubljana, Slovenia.
  • Terčon, L. and Dobrovoljc, K. (2025). ComparaTree: A Multi-Level Comparative Treebank Analysis Tool. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025). Ljubljana, Slovenia.
  • Krsnik, L. and Dobrovoljc, K. (2025). STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025). Ljubljana, Slovenia.
  • Munda, T. and Arhar Holdt, Š. (2025). First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0. Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025). Ljubljana, Slovenia.
  • Arčon, T., Kosem, I. and Arhar Holdt, Š. (2025, September 24). Using large language models to generate distractors for language games [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
  • Klemen, M., Dobrovoljc, K., Terčon, L., Hüll, N., Arčon, T., and Robnik Šikonja, M. (2025, September 24). Agentic Large Language Models for Grammatical Analysis of Multilingual Corpora [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
  • Jelovčan, G., Robnik Šikonja, M., Arhar Holdt, Š., and Vreš, D. (2025, September 24). Attempt to Create Synthetic Dataset for Grammar Error Correction in Slovenian Language [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
  • Pretnar Žagar, A. (2025, September 24). Evaluating LLMs on Value Annotation Task [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
  • Arčon, T., Robnik Šikonja, M., and Tratnik, P. (2025, September 24). Automatic detection of folkloristic motifs with large language models: the Cinderella tale [Conference presentation]. 28th International Conference, Discovery Science AI 4 Science Conference, Ljubljana, Slovenia. https://ds2025.ijs.si/assets/files/978-3-032-05461-6_Book_OnlinePDF.pdf
  • Robnik Šikonja, M. (2025, September 17). Trends and challenges in artificial intelligence [Conference presentation]. SNC’25 Sinapsa neuroscience conference 2025, Ljubljana, Slovenia. https://www.sinapsa.org/SNC25/programme
  • Robnik Šikonja, M. (2025, November 18). Large language models for lexicography [Invited keynote speech at the conference]. eLex 2025: Electronic lexicography in the 21st century: Intelligent Lexicography, Bled, Slovenia. https://elex.link/elex2025/keynote-speakers/
  • Kosem, I. and Arhar Holdt, Š. (2025). Using Large Language Models to Generate Distractors for Language Games [Conference presentation]. eLex 2025: Electronic lexicography in the 21st century: Intelligent Lexicography, Bled, Slovenia. https://elex.link/elex2025/wp-content/uploads/elex2025_book_of_abstracts.pdf
  • Robnik Šikonja, M. (2025). What are open LLMs and how do we build them? / Kaj so odprti LLMs in kako jih gradimo? [Conference presentation]. ERA Knowledge Rights 21 Conference, Ljubljana, Slovenia. https://www.odipi.si/era-kr21-konferenca-slovenija-2025/program-era-kr21-konference-2025/

Publications in conference proceedings / workshops

  • Arhar Holdt, Š., Lukan, T., Dobrovoljc, K., Doucet, A., Krek, S., Pretnar Žagar, A., Tratnik, P., Vobič, I., Žitnik, S., & Robnik Šikonja, M. (2025). Advancing interdisciplinary research: The European Centre of Excellence in Artificial Intelligence for Digital Humanities (CoE AI4DH). In: AI in Science Summit 2025: 3–4 November 2025, Copenhagen, Denmark. https://cdn.prod.website-files.com/68a7113a28bc36a9033775bf/6903613514553a574fad3d4c_6.pdf
  • Delaunay, J. et al. (2026). Multidisciplinary End-to-End Document-Level Relation Extraction from Scientific Literature. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16026. Springer, Cham. https://doi.org/10.1007/978-3-032-04627-7_15
  • Estève, L. & Dobrovoljc, K. (2026). DELTA: A Toolkit for Measuring Linguistic Diversity in Dependency-Parsed Corpora. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), pp. 75–85, Rabat, Morocco. Association for Computational Linguistics. https://doi.org/10.18653/v1/2026.eacl-demo.6
  • Klemen, M., Arčon, T., Terčon, L., Robnik-Šikonja, M., & Dobrovoljc, K. (2025). Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis. arXiv preprint arXiv:2512.00214.
  • Mai Chau, P. P., Bakkali, S., & Doucet, A. (2025). DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pp. 1213-1222, Tucson, AZ, USA. https://doi.org/10.1109/WACVW65960.2025.00144
  • Nguyen, N.N., Hamdi, A., Doucet, A., Jatowt, A., Coustaty, M. (2026). Rethinking OCR Evaluation for Information Extraction in Business Documents. In: Oh, S., Doucet, A., Buranarach, M., Buenrostro-Cabbab, I., Liu, Y., Olgado, B.S. (eds) Intelligence and Equity: Shaping the Future of Knowledge. ICADL 2025. Lecture Notes in Computer Science, vol 16242. Springer, Singapore. https://doi.org/10.1007/978-981-95-4861-3_21
  • Pham, TC., Coustaty, M., Joseph, A., Deloin, G., Poulain d’Andecy, V., Doucet, A. (2026). Few-Shot Document Classification in Real Applications: Boosting Precision with Novelty Detection. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16025. Springer, Cham. https://doi.org/10.1007/978-3-032-04624-6_5
  • Piryani, B., Mozafari, J., Abdallah, A., Doucet, A., & Jatowt, A. (2025). MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts. https://arxiv.org/abs/2502.16781
  • Sharma Kafle, D., Talhi, E., Coustaty, M., Doucet, A. (2026). Expertise Finding: Domain Extraction from Documents Using Fuzzy Clustering. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16023. Springer, Cham. https://doi.org/10.1007/978-3-032-04614-7_23
  • Sun, W., Girdhar, N., Tran, H.T.H., González-Gallardo, CE., Coustaty, M., Doucet, A. (2026). Ar-Q-Former: Historical Newspaper Article Separation Based on Multimodal Transformer Structure. In: Yin, XC., Karatzas, D., Lopresti, D. (eds) Document Analysis and Recognition – ICDAR 2025. ICDAR 2025. Lecture Notes in Computer Science, vol 16025. Springer, Cham. https://doi.org/10.1007/978-3-032-04624-6_28
  • Telnoff, Q., Baitu, B., Coustaty, M., Crohas, F., Doucet, A. (2025). VisHubGAT: Visible Connectivity and Hub Nodes for Multimodal Entity Extraction. In: Brun, L., Carletti, V., Bougleux, S., Gaüzère, B. (eds) Graph-Based Representations in Pattern Recognition. GbRPR 2025. Lecture Notes in Computer Science, vol 15727. Springer, Cham. https://doi.org/10.1007/978-3-031-94139-9_25
  • Vajda, D., Vreš, D., & Robnik-Šikonja, M. (2025, October). Improving LLMs for Machine Translation Using Synthetic Preference Data. In Proceedings of the 2nd LUHME Workshop (pp. 67-73). https://doi.org/10.48550/arXiv.2508.14951
  • Žagar, A. P., & Tekavčič, K. P. de M. (2026). Carniolan Provincial Assembly: Corpus Improvements and Enhancements. Digital Humanities in the Nordic and Baltic Countries Publications, 8(1). https://doi.org/10.5617/dhnbpub.13202

Publications

  • Girdhar, N., Raj, A., Sharma, D., Singh, V., Doucet, A., & Renz, M. (2025). A comprehensive review of frugal artificial intelligence: challenges, applications, and the road to sustainable AI. Soft Computing, 29(13), 4823-4856.
  • Girdhar, N., Coustaty, M., & Doucet, A. (2026). STRAS: a semantic textual-cues leveraged rule-based approach for article separation in historical newspapers. International Journal on Digital Libraries, 27(1), 2.
  • Ulčar, M., Žagar, A., Armendariz, C. S., Repar, A., Pollak, S., Purver, M., & Robnik Šikonja, M. (2026). Mono- and cross-lingual evaluation of representation language models on less-resourced languages. Computer Speech & Language, 95, 101852. https://doi.org/10.1016/j.csl.2025.101852
  • Ivačič, N., Škrlj, B., Koloski, B., Pollak, S., Lavrač, N., & Purver, M. (2025). Extreme Multi-Label Text Classification for Less-Represented Languages and Low-Resource Environments: Advances and Lessons Learned. Machine Learning and Knowledge Extraction, 7(4), 142. https://doi.org/10.3390/make7040142
  • Klemen, M., Božič, M., Holdt, Š. A., & Robnik-Šikonja, M. (2025). Grammatical error correction of Slovenian school essays using large language models. Journal of Contemporary Educational Studies/Sodobna Pedagogika, 76(3).
  • Vobič, I., Robnik Šikonja, M., Žagar, A. & Mance, B. (2025). Watchdog or Copycat? Examining News Diversity in Slovenian Journalism System. Medijska istraživanja, 31(2), 5-34. https://doi.org/10.22572/mi.30.2.1
  • Pavletič, K., & Pretnar Žagar, A. (2025). Uporaba strojnega učenja za napovedovanje spola na poznoantičnem grobišču Lajh v Kranju. Arheo: arheološka obvestila, 25-40.
  • Pham, T. C., Coustaty, M., Doucet, A., Joseph, A., & Poulain d’Andecy, V. (2025). Deep metric learning for end-to-end document classification. Neurocomputing, 131241.
  • Pham, T. C., Coustaty, M., Joseph, A., Deloin, G., Poulain d’Andecy, V., & Doucet, A. (2025). Exemplar sampling algorithm for instance incremental learning on imbalanced document datasets. International Journal on Document Analysis and Recognition (IJDAR), 1-12.
  • Pretnar Žagar, A. (2025). Computational Analysis of Slovenian Historical Newspapers (1771–1914): Linguistic, Thematic, and Nation-Building Insights. Contributions to the Contemporary History, 65(3), 42-66. https://doi.org/10.51663/pnz.65.3.02
  • Tran, H. T. H., Martinc, M., Caporusso, J., Delaunay, J., Doucet, A., & Pollak, S. (2026). Recent Advances in Automatic Term Extraction: A Comprehensive Survey. ACM Computing Surveys, 58(9), 1-35.
  • Tratnik, P. (2025). Saint George, the Dragon Slayer: Sacralized Violence and the Allegorical Union of Sacred and Secular Power. Acta Histriae, 33(4), 617–668. https://doi.org/10.19233/AH.2025.23
  • Tratnik, P. (2025). Flusser on Artificial Intelligence. Flusser Studies, 40, 1-8. https://repozitorij.uni-lj.si/IzpisGradiva.php?id=178705&lang=slv

Research datasets

  • Žagar, A., Dobrovoljc, K., Munda, T., Brglez, M., and Robnik Šikonja, M. (2024). Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0, Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1988.

Software repositories

Public deliverables