This is part of my series of posts about the PolyAnno project – more here
The Images
So the first consideration I had with the transcription design was not the texts but the images themselves. Digital copies of collections like Edinburgh’s are taken at incredibly high resolutions – every possible pixel is recorded and preserved for analysis. That sounds like a good idea right up until you consider loading these images into web pages. As I mention in the images section of my Beginner’s Guide to Hacking web design page, there are several different ways to store images, but for photos it is generally a compromise between keeping as much information as possible and not producing files so large that the page cannot reasonably load them.
However, solutions to this problem exist, as I am certainly not the first (nor the last) to have come across it! Rather than loading the whole image at once in one big (even when compressed) file, images of that size can be loaded in sections at whatever scale you want to view them – so you load the whole image at a small size with less information, and when you zoom in you reload just that zoomed-in section at a higher resolution.
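To make the idea concrete, here is a rough sketch of how a tiled viewer decides what to fetch – the names are mine for illustration, not PolyAnno or library code. The image is cut into a pyramid of fixed-size tiles, and only the tiles intersecting the visible area at the current zoom level get requested:

    // Sketch only: invented names, not PolyAnno or any library's API.
    // `view` is the visible area in full-resolution image coordinates.
    function visibleTiles(view, zoom, maxZoom, tileSize) {
      // At this zoom, one tile spans tileSize * 2^(maxZoom - zoom)
      // pixels of the full-resolution image.
      const span = tileSize * Math.pow(2, maxZoom - zoom);
      const tiles = [];
      for (let ty = Math.floor(view.top / span); ty * span < view.top + view.height; ty++) {
        for (let tx = Math.floor(view.left / span); tx * span < view.left + view.width; tx++) {
          tiles.push({ x: tx, y: ty, zoom });
        }
      }
      return tiles; // each tile maps to one small image request
    }

Zoomed right out, a handful of low-detail tiles cover the whole image; zoomed in, the viewer fetches only the few high-detail tiles under the viewport.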
Of course, code to make this happen has been developed all over the place, and over the years different people have found different ways of doing it. And of course all those differences in encoding created difficulties when trying to exchange images with each other. This became an especially important problem in the digital humanities, where institutions discovered that different pages of the same book could be stored in different parts of the world, and although they all had photos of the pages, no one could reassemble the book in one piece online because the code behind each page was so different.
The IIIF APIs
So people got together and agreed upon standards to use so everyone could share images – the International Image Interoperability Framework (IIIF). The University of Edinburgh had been upping its involvement with the IIIF folks not long before I started my placement, and so was eager to see how it all fitted into this project.
You can see an example of IIIF with the British Library in action here – http://sanddragon.bl.uk/mirador/
The formats agreed upon by IIIF for their image standards, or APIs, are all variations of JSON Linked Data (JSON-LD) files. I admit that prior to this project I had only ever been vaguely aware of JSON-LD, as I had never worked on anything long-term like this before, where the additional work for the Linked Data was actually worth it. So, having brushed up on all the buzzwords and rules for IIIF and JSON-LD, I was ready to display the images that were to be transcribed.
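As a quick illustration (the server and image identifier below are invented), the IIIF Image API packs the section and scale you want straight into the URL, and a JSON-LD info.json file alongside describes the full image and its available tiles:

    // IIIF Image API URL pattern:
    // {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    const base = 'https://images.example.org/iiif/ms-page-1';
    // The 400x300 pixel region at (125,80), scaled down to 200px wide:
    const url = `${base}/125,80,400,300/200,/0/default.jpg`;
    // The JSON-LD description of the whole image and its tiling:
    fetch(`${base}/info.json`)
      .then((res) => res.json())
      .then((info) => console.log(info.width, info.height, info.tiles));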
Image Sections and Annotations
Then of course I had to address the problem of text within the images. Those lovely people creating the thousands of weird and wonderful documents in our library collection did not have standards to comply with for their creations and so text can be written in literally any shape or arrangement possible on the page.
In what order do you write up the text on such a page so that everyone can understand which section you have transcribed? How do you make sure everyone who loads the website can link the text you have written to the correct bit of the page? What if you only want to transcribe the word you have spotted faded in the corner that no one else has noticed – but it is a quote, and you don’t want it to get confused with the transcription of the printed text?
Although IIIF defines how the files are encoded, it does not, of course, limit you to one “viewer” – the client-side code that interprets and loads the IIIF images within the website. So I needed to decide which viewer to load the images in, and how to create a connection for the user between the text and the section of the image.
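For the IIIF side of that connection, the rough shape (with invented URIs) is an annotation whose target is not the whole page but an x,y,width,height fragment of it, following the Open Annotation model that the IIIF Presentation API uses:

    // Sketch of an annotation targeting one region of a page; the URIs are
    // invented, but the structure follows the IIIF Presentation 2.0 examples.
    const annotation = {
      '@context': 'http://iiif.io/api/presentation/2/context.json',
      '@type': 'oa:Annotation',
      motivation: 'sc:painting',
      resource: {
        '@type': 'cnt:ContentAsText',
        format: 'text/plain',
        chars: 'the faded word in the corner',
      },
      // the canvas URI plus an xywh fragment: x, y, width, height in pixels
      on: 'https://example.org/iiif/book/canvas/p1#xywh=125,80,400,300',
    };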
After reviewing the growing number of possible methods, it came down to a choice between Leaflet JS (with Leaflet-IIIF as the viewer and Leaflet Draw to select sections) and Mirador (as the viewer, with a few different third-party packages to select sections).
Mirador with the Harvard CATCHA annotation software here – http://oculus-dev.harvardx.harvard.edu/demo/
Leaflet IIIF & Draw example here – http://bl.ocks.org/mejackreed/462e89092ce71ae7dd09e6074d60f2e0
The Mirador annotation UI is considerably more “clunky” than the Leaflet-IIIF/Draw one, and the code for the Leaflet packages is considerably smaller and more modular. Leaflet JS itself needs the extra IIIF package to work as a viewer because, unlike Mirador, it was created primarily for maps – a very similar problem of loading images in sections rather than downloading one whole file at once. There is a considerably larger community surrounding Leaflet JS because of its much wider range of applications beyond IIIF images, and even the Leaflet Draw library has a large community in its own right. Mirador’s community, by contrast, is relatively new to annotation and client-side section selecting, so the support for this functionality is more limited. However, Leaflet was fairly new to the field of IIIF support, whereas Mirador was designed for it.
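For a sense of what the Leaflet route looks like in practice, here is a minimal sketch, assuming the leaflet, leaflet-iiif and leaflet-draw packages are loaded, and with an invented info.json URL:

    // Leaflet with a simple (non-geographic) coordinate system:
    const map = L.map('viewer', { center: [0, 0], crs: L.CRS.Simple, zoom: 0 });
    // Leaflet-IIIF turns a IIIF info.json into a tiled image layer:
    L.tileLayer.iiif('https://images.example.org/iiif/ms-page-1/info.json').addTo(map);
    // Leaflet Draw provides the toolbar for selecting rectangular sections:
    const selections = new L.FeatureGroup().addTo(map);
    map.addControl(new L.Control.Draw({
      draw: { rectangle: true, polygon: false, polyline: false, circle: false, marker: false },
      edit: { featureGroup: selections },
    }));
    // Each drawn rectangle could then be linked to its transcription text:
    map.on(L.Draw.Event.CREATED, (e) => selections.addLayer(e.layer));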
In the end I decided to go with Leaflet, and although I am now at the end of the project (well, my official paid involvement at least), I would love to hear any feedback or opinions on this decision – and I am sure the IIIF fans in the office would love to hear it too!
Next up: PolyAnno: Annotations
This is part of my series of posts about the PolyAnno project – more here