Tallinn University, Archaeological Research Collection
You have 2,400 photographs, some data, and one weekend – can you solve a glass puzzle with ~3,000 shiny pieces?
In 2018, a late 15th/early 16th-century landfill was discovered in the suburb of Kalamaja in Tallinn. Around 40,000 artefacts were collected, making this the largest archaeological collection in Estonia. Due to the variety and number of rare artefacts, the site has been dubbed a treasure trove for Estonian archaeology. The dataset at hand consists of tabulated artefact data (.XLSX), metadata (.TXT) and 2,400 working photographs (.JPG) of the 3,000 glass fragments from the landfill and around 80 fragments from later habitation periods. The author took the photographs between March 2019 and August 2024 at Tallinn University’s archaeological research collection using various cameras (Canon EOS 1200D, Huawei Mate 10 Pro, Canon EOS 250D, iPhone 14 Pro Max). The files have been renamed to include the artefact label. The quality of the unedited working photos varies greatly. Most photographs include a scale bar or ruler, or use graph paper as a background. Where necessary, both sides or multiple angles of the artefact are shown. Some photos were taken through minigrip bags; in such cases the artefact label is on the bag or visible through it. Although the photographs were used in completing a PhD thesis, most have not been published before. As the artefacts are packed away for moving, the working photographs are currently the only way to study them.
Ideas:
Object reconstruction – using the Excel sheet, select fragments that are known to belong to the same artefact, together with their photographs, and build 2D digital reconstructions of the objects.
Identify and cross-reference find labels, rename photographs – as the find labels are written on artefacts or bags by hand, the main challenges are legibility and human error, i.e. the wrong artefact number written on the object or bag. The Excel sheet with artefact data can be used to double-check whether the list compiled using machine vision is correct. Since all photographs are currently renamed by hand, an automated system would greatly reduce the time it takes to sort and organise them. Such a system could either identify artefact labels and rename the photographs accordingly, or identify artefact labels and compile a list with the corresponding file names. Please note that some artefacts may lack physical labels, but all files have the label in the file name. Identifying such artefacts is also important, as labelling can then be suggested to the head of collections: once this data is pulled, the head of collections can see where the finds are kept and write the numbers on the artefacts if needed. Without the numbers on the bags or the artefacts themselves, the finds risk being misplaced or lost.
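The cross-referencing step described above could be sketched roughly as follows. This is a minimal illustration in Python, not a description of any existing workflow: it assumes the labels have already been extracted from the file names and from the Excel sheet, and that a separate machine-vision step (not shown) returns the label it read from each photograph, or None when nothing legible was found. The function names, the normalisation rule and the example labels are all hypothetical.

```python
from dataclasses import dataclass, field


def normalise(label: str) -> str:
    """Collapse whitespace and case so 'x  1' and 'X 1' compare equal (assumed rule)."""
    return " ".join(label.split()).upper()


@dataclass
class CrossCheck:
    # file -> (label in file name, label read from the photograph)
    mismatched: dict = field(default_factory=dict)
    # files where no label could be read (possibly unlabelled artefacts)
    unreadable: list = field(default_factory=list)
    # files whose file-name label does not occur in the spreadsheet
    not_in_catalogue: list = field(default_factory=list)
    # spreadsheet labels for which no photograph exists
    unphotographed: list = field(default_factory=list)


def cross_check(filename_labels, detected_labels, catalogue_labels):
    """Compare file-name labels, machine-read labels and spreadsheet labels."""
    catalogue = {normalise(label) for label in catalogue_labels}
    result = CrossCheck()
    seen = set()
    for file, raw in sorted(filename_labels.items()):
        expected = normalise(raw)
        seen.add(expected)
        if expected not in catalogue:
            result.not_in_catalogue.append(file)
        detected = detected_labels.get(file)
        if detected is None:
            result.unreadable.append(file)
        elif normalise(detected) != expected:
            result.mismatched[file] = (expected, normalise(detected))
    result.unphotographed = sorted(catalogue - seen)
    return result


if __name__ == "__main__":
    # Made-up labels and file names, purely for illustration.
    report = cross_check(
        {"a.jpg": "x 1", "b.jpg": "X 2", "c.jpg": "X 9"},
        {"a.jpg": "X 1", "b.jpg": "X 7", "c.jpg": None},
        ["X 1", "X 2", "X 3"],
    )
    print(report)
```

In this sketch the `unreadable` list is the starting point for the labelling suggestions mentioned above, while `mismatched` flags either a misread label or a wrongly written number; both would still need checking by a person.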
Identify fragments of the same artefacts – the notes field in the Excel sheet can be used to test whether machine vision accurately identifies fragments that belong to the same object. Since a human made these identifications, they are likely incomplete but should provide enough of a basis to test the model. This task would make it possible to estimate the number of individual artefacts from the site and to show, based on the find context of the fragments, whether and how fragments of the same artefact are scattered across the landfill. Knowing which fragments go together can also assist in object typology, e.g. by helping to determine the sizes of window panes, stained glass patterns, the diameter of beakers etc.
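One possible way to score a model against the human identifications is a pairwise comparison, sketched below in Python. It assumes the notes field has already been parsed into groups of fragment labels (that parsing is not shown and depends on how the notes are written), and that the model outputs groups in the same form. Because the human identifications are likely incomplete, the precision figure can only be read as a lower bound: a predicted pair that is missing from the notes may still be correct. All names and example labels are hypothetical.

```python
from itertools import combinations


def same_artefact_pairs(groups):
    """Return the set of unordered fragment pairs that share a group."""
    pairs = set()
    for members in groups:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(known_groups, predicted_groups):
    """Score predicted groupings against the (incomplete) human groupings."""
    known = same_artefact_pairs(known_groups)
    predicted = same_artefact_pairs(predicted_groups)
    confirmed = known & predicted
    return {
        # share of human-identified pairs that the model also found
        "recall": len(confirmed) / len(known) if known else 0.0,
        # lower bound only: unconfirmed pairs are not necessarily wrong
        "precision_lower_bound": len(confirmed) / len(predicted) if predicted else 0.0,
        # pairs proposed by the model that a researcher may want to inspect
        "new_candidates": sorted(predicted - known),
    }


if __name__ == "__main__":
    # Made-up fragment labels, purely for illustration.
    scores = pairwise_scores(
        known_groups=[["f1", "f2", "f3"]],
        predicted_groups=[["f1", "f2"], ["f3", "f4"]],
    )
    print(scores)
```

The `new_candidates` list is arguably the most useful output for a researcher, since it points to possible joins that the human identification missed.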
Identify typological similarities – in addition to fragments of the same artefacts, the site has numerous fragments from objects of the same type. The published object typology was compiled by eye, and not all fragments could be positively identified (Type – Unknown). This task would test whether it is possible to group the images by object type with input from a researcher, and whether the pictured objects correspond to the artefact descriptions.
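A simple check of whether image-based groupings correspond to the artefact descriptions could look like the Python sketch below. It assumes two mappings from artefact label to type, one taken from the Excel sheet and one produced by a model or by a researcher-guided grouping (neither step is shown). Fragments recorded as "Unknown" are excluded from the agreement figure and returned separately as suggestions for a researcher to review. The function name, the exact spelling of the "Unknown" value and the example type names are assumptions.

```python
from collections import Counter, defaultdict


def compare_types(catalogue_types, predicted_types, unknown="Unknown"):
    """Compare recorded types with predicted types, label by label."""
    confusion = defaultdict(Counter)  # recorded type -> Counter of predicted types
    suggestions = {}                  # labels recorded as unknown -> predicted type
    for label, recorded in catalogue_types.items():
        predicted = predicted_types.get(label)
        if predicted is None:
            continue  # no prediction available for this artefact
        if recorded == unknown:
            suggestions[label] = predicted
        else:
            confusion[recorded][predicted] += 1
    total = sum(sum(row.values()) for row in confusion.values())
    agreeing = sum(row[recorded] for recorded, row in confusion.items())
    agreement = agreeing / total if total else 0.0
    return agreement, {t: dict(row) for t, row in confusion.items()}, suggestions


if __name__ == "__main__":
    # Made-up labels and type names, purely for illustration.
    agreement, confusion, suggestions = compare_types(
        {"1": "beaker", "2": "beaker", "3": "window pane", "4": "Unknown"},
        {"1": "beaker", "2": "window pane", "3": "window pane", "4": "beaker"},
    )
    print(agreement, confusion, suggestions)
```

Disagreements in the confusion table do not by themselves show which side is wrong; they only mark the fragments where the photograph and the description are worth comparing again.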

