23. June 2025

AI Methods for Research of Folkloristic Narratives

How is AI revolutionising folkloristics? AI methods can help researchers recognise latent patterns, structures and discursive shifts that would remain hidden using traditional methods. At an international workshop held on Friday 13 June (the date may or may not have been intentional), we brought together researchers from computer science and the humanities to present and discuss their interdisciplinary research. The aim of the workshop was to develop interdisciplinary methods for analysing folkloristic corpora with AI-based tools that combine qualitative insight with quantitative rigour. In this short article, we summarise the topics, methods and results presented at the workshop.

In the plenary lecture, Tim Tangherlini presented the challenges he has faced in projects to digitise folklore archives, including the Berkeley Folklore Archive. He demonstrated how LLMs can be used to infer a data schema for poorly labelled data and how these models can help with optical character recognition (OCR) correction and with searching a multilingual corpus. He also showed how he has used computational methods to uncover the latent meanings of fairy tales. Danish fairy tales about witches, for example, revolve around the theft of milk, and witches disappeared from Danish folklore once milk was no longer scarce. The workshop continued with three thematic panels.

Group of scholars who participated in the workshop

In the first panel, scholars presented how they use LLMs to classify folktales by tale type and to recognise motifs. Sándor Darányi and Thomas van Erven presented an open-source Python workflow that combines machine-learning-based text categorisation with an analysis of sentiment versus social importance. Tjaša Arčon presented a methodology for recognising motifs in a large collection of Cinderella variants and found that LLMs can recognise complex interactions in fairy tales, which makes it possible to analyse far larger text collections computationally than before. Judith Veld evaluated publicly available LLMs for classifying folktale motifs into a more generalised hierarchy. Her results showed moderate accuracy, along with difficulties in handling abstraction and hierarchical depth and signs of gender and cultural bias. She highlighted the limitations of current classification systems and emphasised the need for closer collaboration between expert annotators and LLM researchers.

In the second panel, researchers focused on using LLMs to analyse values and deep structures in narratives. Jasmina Rejec analysed moral values in around 60 Slovenian folk tales using computer-assisted methods. Her study included an expert review to assess how well machine-generated moral labels aligned with human-assigned ones. Marjan Horvat, Jure Koražija and Polona Tratnik applied large language models to analyse three Slovenian versions of three folktales. They identified value-based cognitive matrices that characterise inclusion, mutual respect, responsibility and the common good. Their method revealed diachronic shifts and cultural specificities in collective decision-making. Jan Babnik and Polona Tratnik presented their research on dragon-slayer narratives, which share a similar internal structure but differ in their discourse. By comparing legends, folk tales, chivalric romances and novels, they analysed how these structurally related narratives serve different discursive functions.

The final panel addressed the use of LLMs to analyse complex phenomena. Darko Darovec presented a study, conducted with ChatGPT, of customary conflict resolution in the earliest accounts of the Americas. The results show that vengeance is a universal, ritualised form of conflict resolution that leads to peace, rather than a violent trait of so-called primitive cultures. Jan Babnik and Matej Martinc presented their work on developing a vision-language model for Slovene: how to train the model to answer questions about visual content, and how to reduce hallucinations and improve its reasoning in a low-resource language with scarce training data.

Finally, Antoine Doucet closed the workshop with a presentation of the Centre of Excellence in Artificial Intelligence for Digital Humanities, introducing the support that the CoE offers to researchers in the humanities and social sciences.

A book of abstracts is available at this link.