Keynote speech by Dr Marko Robnik-Šikonja at the eLex 2025 conference

Dr Marko Robnik-Šikonja delivered the keynote address at the eLex 2025 conference on electronic lexicography. His talk explored a central question: how are large language models (LLMs) transforming the field of lexicography? He examined their benefits and limitations, and also considered what the future may bring.

Large language models are transforming lexicography in several ways. They enhance both monolingual and bilingual support, enable the creation of richer dictionaries, and streamline many lexicographic tasks. LLMs can draft definitions, distinguish word senses, generate illustrative example sentences, and align translations for bilingual dictionaries. However, despite these advances, human involvement remains indispensable. As Marko Robnik-Šikonja emphasises,

“Human intuition, guidance, and knowledge are still essential when working with LLMs in lexicography.”
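To make the drafting workflow concrete, here is a minimal sketch of how definition drafting might be framed as a templated prompt to any instruction-tuned LLM. The function name, prompt wording, and entry fields are illustrative assumptions, not part of the keynote; the point is that corpus evidence anchors the draft, which a human lexicographer then reviews.

```python
def build_definition_prompt(headword: str, pos: str, examples: list[str]) -> str:
    """Assemble a drafting prompt for a dictionary definition.

    Grounding the model in real corpus sentences helps limit
    hallucinated senses; the output is a draft for human review,
    not a finished entry.
    """
    evidence = "\n".join(f"- {sentence}" for sentence in examples)
    return (
        f"Write a concise dictionary definition for the {pos} "
        f"'{headword}', based only on these corpus sentences:\n"
        f"{evidence}\n"
        "Definition:"
    )

# Hypothetical usage with a neologism and one corpus citation.
prompt = build_definition_prompt(
    "doomscroll", "verb",
    ["I doomscrolled through the news until 2 a.m."],
)
```

The same template idea extends to the other tasks mentioned above (sense discrimination, example generation, translation alignment) by swapping the instruction text.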

LLMs for Lexicography

The speech also highlighted the benefits and limitations of large language models in lexicography. LLMs enable faster and more cost-effective drafting, while expanding coverage to include a wider range of words, such as rare terms, slang, and newly emerging scientific vocabulary. They help ensure a consistent style and structure across dictionary entries. Drawing on his work with the Slovene language, Dr Robnik-Šikonja emphasised that LLMs hold particular promise for advancing research and resources in less-resourced and low-resourced languages.

There are also significant limitations in applying large language models to lexicography. LLMs are prone to generating inaccurate or “hallucinated” content, and their outputs often reflect underlying biases. They tend to favour dominant languages, which can disadvantage less-resourced ones. Further challenges include maintaining consistent formatting and the ongoing need for human review, both of which require considerable time and resources.

Lexicography for LLMs

The relationship can also work in the opposite direction: lexicography can enhance large language models. Lexical resources provide reliable, high-quality, human-verified data that is invaluable for developing LLMs. This data can be used to pretrain models specifically designed for lexicographic purposes (LexLLMs) and to create instruction-following tasks tailored to those models.
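As a rough illustration of the second step described above, human-verified dictionary entries could be converted into instruction-following training records for a LexLLM. The field names, instruction wording, and JSONL output format below are assumptions for the sketch, not details from the talk:

```python
import json

def entry_to_instruction(entry: dict) -> dict:
    """Turn one verified dictionary entry into an
    instruction/response pair for fine-tuning."""
    return {
        "instruction": (
            f"Define the word '{entry['headword']}' "
            "and give one example sentence."
        ),
        "response": f"{entry['definition']} Example: {entry['example']}",
    }

# A single hypothetical, human-verified entry.
entries = [{
    "headword": "lexeme",
    "definition": "A basic unit of meaning in a language's vocabulary.",
    "example": "'Run', 'runs', and 'ran' belong to the same lexeme.",
}]

# One JSON object per line (JSONL), a common fine-tuning input format.
jsonl = "\n".join(json.dumps(entry_to_instruction(e)) for e in entries)
```

Because the source data is human-verified, such records carry far less noise than web-scraped text, which is precisely what makes lexical resources valuable for model training.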

It is important to note that working at the intersection of lexicography and large language models requires significant computing power. In particular, access to Graphics Processing Units (GPUs) is essential. If you are interested in pursuing this kind of work, please contact us. We would be glad to collaborate and help you gain access to the necessary computing infrastructure.

What does the future hold?

The future of lexicography will involve an increasing number of large language models designed specifically for the field. We can expect the emergence of real-time dictionary assistants and broader multilingual coverage. LLM agents will help reduce the workload of human lexicographers and facilitate evaluation efforts. There will also be growing attention to non-standard language, specialised domains, dialects, and low-resource languages. LexLLMs are already available, and more are forthcoming, bringing new methodological challenges as well as fresh scientific paradigms and approaches. A recent article titled The AI Scientist described how researchers simulated and fully automated the research process in computer science, producing complete scientific papers with AI. Could a similar level of automation be achieved in lexicography?

Explore the conference papers at this link.