Several months ago I started a part-time internship with the Library and Information Services group at the University of Edinburgh. I am now in my final week, and I feel I should write up the fantastic experience I have had working on this project. Others might simply find it interesting to read, but I also want to make things easier for anyone else looking to do something similar.

My initial challenge was to create a website to gamify the transcription and translation of the University Collections.

To give context: the University of Edinburgh was founded a long time ago and has been building up an incredible set of collections of all sorts of things ever since. There are far too many items for us to have any real idea of what much of it actually is. In this digital age we have lots of great tools to help us catalogue, and slowly this mass of documents is being digitised and made available online.

However, the scale of the collections means it is simply not feasible to pay staff to write up the transcriptions, that is, the digital text versions of the written contents of the pages. When our wonderful staff digitise documents they add whatever metadata they can. Often the documents being processed are those requested by academic staff, who can provide some of the information themselves.

This system still leaves problems, though. It would be ideal if academic staff could search through the collections digitally first and then request digitisation, rather than searching manually through the papers. It would also be easier for those researching subjects elsewhere in the world to find the relevant documents if the metadata were complete.

A little while back, people at Edinburgh experimented and made the Library Labs Metadata Tags game, in which you add tags to the images presented in the game, and these are used to add data to the collections. Investigations were under way into developing something similar to provide the larger metadata of the documents’ transcriptions. Translations came along afterwards, since it is even harder for collections staff to make documents accessible when they have no idea what the documents say.

Initial experiments looked into Zooniverse and a couple of other options, but it quickly became apparent that this was a slightly different sort of crowdsourcing challenge because of the variations in page format, language and interpretations of reading, and also because of the translation. This is where I was brought in, hired to try to develop a website for the task. At the time of writing I am still doing last-minute scrabbling to get everything up and working before it goes online, but I absolutely promise to update this page with the links when it is all up!

For now, though, I am going to write up everything that has been done and the lessons learned across the project in a series of posts, and leave the initial GitHub repo link at the bottom. Enjoy!

The main GitHub repo (for now): https://github.com/BluePigeons/Polyglot








2 thoughts on “PolyAnno – Adventures in Annotation”

  1. Great work on this. Did you ever consider putting your annotation triples in a dedicated triple store like Sesame? We are currently using Mirador and simple annotation server to do something similar, but I’m a big fan of Leaflet.


    1. I’ve never encountered Sesame before, but PolyAnno was using JSON-LD rather than conventional triples because of the flexibility and ease of use of the JSON format, so we were simply using MongoDB for storage.

