Estonian Literary Museum
Trace patterns in Estonian/Finnic oral poetry computationally
Runosongs represent one of the most remarkable shared poetic traditions of the Finnic peoples, spanning centuries, regions, and dialects. Collected extensively in the nineteenth and twentieth centuries, these texts now form one of the largest digitized corpora of oral poetry in Europe. In Estonia and Finland, the material has been preserved in central folklore archives and systematically datafied over the past 25 years. Since 2020, these data have been organized into a joint database within the framework of the Finnish–Estonian FILTER project.
For this hackathon, participants can work either with the Estonian corpus (approximately 100,000 texts) or with the full Finnic dataset (approximately 250,000 texts). Each record is accompanied by metadata, including archival reference, place of origin, collector, and, in many later recordings, information about the performer. Although the material has been partially organized typologically, much of its internal structure remains to be discovered. The FILTER project also developed a methodology for calculating similarity between verses and songs. The resulting similarity data have been added to the dataset and can be explored in the web environment runoregi.fi and on a visualisations page.
The corpus invites computational investigation of patterns in language, content, and poetic form. Its primary challenge lies in the substantial linguistic variation of the archaic runosong idiom, a challenge that makes it especially suitable for embedding methods, network analysis, and visualization approaches. Possible hackathon outcomes include predicting song origin from textual features, building embedding-based explorers, mapping formulaic structures, or experimenting with generative models trained on runosong style.
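As a starting point for dealing with spelling and dialect variation, verses can be compared on overlapping character n-grams rather than on whole words. The sketch below is a generic baseline and is not the FILTER project's similarity methodology. The function names and the trigram size are assumptions made for illustration, and the example strings are toy input rather than corpus records: two are the opening lines of the Kalevala, and the third is a made-up spelling variant.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased verse.

    Whitespace is normalised and the verse is padded with one space on
    each side so that word-initial and word-final n-grams are captured.
    """
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one verse was too short to yield n-grams
    return dot / (norm_a * norm_b)


def verse_similarity(v1: str, v2: str, n: int = 3) -> float:
    """Similarity in [0, 1] that tolerates small orthographic differences."""
    return cosine_similarity(char_ngrams(v1, n), char_ngrams(v2, n))


if __name__ == "__main__":
    base = "mieleni minun tekevi"      # Kalevala, opening line
    variant = "mielen minun tekkeepi"  # made-up spelling variant (assumption)
    other = "aivoni ajattelevi"        # Kalevala, second line
    print(verse_similarity(base, variant))
    print(verse_similarity(base, other))
```

Because the comparison works below the word level, a verse and its dialectal respelling still share most of their n-grams, which is why the variant scores higher than the unrelated line. The same count vectors could feed a nearest-neighbour search or a similarity network, and they offer a simple reference point against which learned embeddings can be compared.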

