PolyAnno: All The Unicode

This is part of my series of posts about the PolyAnno project – more here

As discussed earlier on the Transcription page, one of the design problems the project faced was how to let users type into the textboxes in any language or script needed to transcribe the manuscripts displayed on the PolyAnno website. The code written to allow this eventually became worth separating out into the All The Unicode package.


The GitHub repository can be found here – https://github.com/BluePigeons/alltheunicode – along with the main usage documentation. This post is more about how the package was developed as part of the PolyAnno project.

Alternative Keyboards

The file alltheunicode.js starts by defining the variable udataArray, taken from the official Unicode website – this is a single string holding the complete Unicode character data. I tried to avoid simply copying and pasting this from the official website into the code, because of how often it is updated (mainly emoji but whatever…), but because of increasing restrictions on the handling of async requests and the lack of a nice simple REST API to access the information, I settled for copying for now. However, this needs to be improved.

var udataArray = [`0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;
0003;<control>;Cc;0;BN;;;;;N;END OF TEXT;;;;
0004;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION;;;;
`]; // excerpt only: in alltheunicode.js the string continues through the rest of the data

This is then processed using code based upon that used at https://www.cs.tut.fi/~jkorpela/fui.html8 to generate the full Unicode character set keyboard.

Firstly the generic HTML for the keyboard is saved primarily in the two variables:

  • atu_handlebar_HTML
  • atu_main_HTML – by default this is completed with the Latin alphabet options

Then the udataArray string is converted into a more useful array:

var udata = String.raw({raw: udataArray});
var uD = {};  // declared here so the excerpt stands alone: code point -> record of fields
var loadUnicodeData = function() {
  var lines = udata.split('\n');
  // The last element is skipped: it is the empty string after the final newline.
  for(var i=0; i < (lines.length-1); i++) {
        var entry = lines[i].split(';');
        entry[14] = entry[14].replace('\n', '');
        var code = parseInt(entry[0],16);
        // Store the record under its code point, which is how fixRange looks it up.
        uD[code] = {
         na: entry[1],
         gc: entry[2],
         cc: entry[3],
         bc: entry[4],
         dt: entry[5],
         nv1: entry[6],
         nv: entry[7],
         nv3: entry[8],
         bi: entry[9],
         na1: entry[10],
         is: entry[11],
         suc: entry[12],
         sl: entry[13],
         stc: entry[14]
        };
  }
  // Fill in the blocks that the data lists only as a "First" and "Last" pair.
  fixRange(0x3400, 0x4DB5);
  fixRange(0x4E00, 0x9FCC);
  fixRange(0xAC00, 0xD7A3);
  fixRange(0xD800, 0xDB7F);
  fixRange(0xDB80, 0xDBFF);
  fixRange(0xDC00, 0xDFFF);
  fixRange(0xE000, 0xF8FF);
  fixRange(0x20000, 0x2A6D6);
  fixRange(0x2A700, 0x2B734);
  fixRange(0x2B740, 0x2B81D);
  fixRange(0xF0000, 0xFFFFD);
  fixRange(0x100000, 0x10FFFD);
};
function fixRange(first, last) {
  var i;
  var desc = uD[first].na.replace(/, First/, '');
  var cat = uD[first].gc;
  for(i = first; i <= last; i++) {
      uD[i] = {
        na: desc,
        gc: cat
      };
  }
}

var cgName = {
  Cc: 'Other, control',
  Cf: 'Other, format',
  Cs: 'Other, surrogate',
  Co: 'Other, private use',
  Cn: 'Other, not assigned'
};

var hexMax = 0x10FFFF;
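
To make the parsing step concrete, here is a minimal standalone sketch of the same idea: split semicolon-separated lines into a table keyed by code point, then copy a "First" entry across its whole range. The names sampleData, parseUnicodeData and fillRange are made up for this illustration and are not part of alltheunicode.js, and only the name and category fields are kept.

```javascript
// Hypothetical sample: two ordinary entries and one First/Last range pair,
// in the same semicolon-separated format as udataArray.
var sampleData = [
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
  '0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041',
  '3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;',
  '4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;'
].join('\n') + '\n';

// Parse the text into a table keyed by numeric code point.
function parseUnicodeData(text) {
  var table = {};
  text.split('\n').forEach(function (line) {
    if (line === '') { return; }  // skip the empty string after the final newline
    var entry = line.split(';');
    table[parseInt(entry[0], 16)] = { na: entry[1], gc: entry[2] };
  });
  return table;
}

// Copy the name and category of a "First" entry to every code point in the range.
function fillRange(table, first, last) {
  var desc = table[first].na.replace(/, First/, '');
  var cat = table[first].gc;
  for (var i = first; i <= last; i++) {
    table[i] = { na: desc, gc: cat };
  }
}

var table = parseUnicodeData(sampleData);
fillRange(table, 0x3400, 0x4DB5);
console.log(table[0x41].na);    // LATIN CAPITAL LETTER A
console.log(table[0x3401].na);  // <CJK Ideograph Extension A>
```

Passing the table in as an argument, rather than using a shared variable like uD, is only a choice made to keep the sketch self-contained.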

Then comes the code that determines the result of 'typing' (selecting a button on the keyboard). The first three functions insert the character into the currently selected textbox on the page when a key is clicked:

function clicked(elem, code) { /* body not shown in this excerpt */ }
function add(code) { /* body not shown in this excerpt */ }
function addstr(addition) {
  if (!isUseless(atu_the_input)) {
    atu_the_input.value += addition;
  }
}

// Builds the string for a code point, using a surrogate pair above U+FFFF.
function fixedFromCharCode (codePt) {
    if (codePt > 0xFFFF) {
        codePt -= 0x10000;
        return String.fromCharCode(0xD800 + (codePt >> 10), 0xDC00 +
(codePt & 0x3FF));
    }
    else {
        return String.fromCharCode(codePt);
    }
}
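
As a quick illustration of why the surrogate arithmetic is needed, the sketch below repeats the same calculation in a standalone function (toUtf16 is a name invented for this example) and checks it against the built-in String.fromCodePoint, which does the same job in ES2015 and later.

```javascript
// Same surrogate-pair arithmetic as fixedFromCharCode, written out standalone.
function toUtf16(codePt) {
  if (codePt > 0xFFFF) {
    var offset = codePt - 0x10000;
    // High surrogate carries the top 10 bits, low surrogate the bottom 10 bits.
    return String.fromCharCode(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
  }
  return String.fromCharCode(codePt);
}

var gothicAhsa = toUtf16(0x10330);  // U+10330 GOTHIC LETTER AHSA
console.log(gothicAhsa.length);                              // 2
console.log(gothicAhsa.charCodeAt(0).toString(16));          // d800
console.log(gothicAhsa.charCodeAt(1).toString(16));          // df30
console.log(gothicAhsa === String.fromCodePoint(0x10330));   // true
```

Characters outside the Basic Multilingual Plane therefore take up two UTF-16 code units in a textbox value, which matters for anything that counts characters or moves the cursor.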

Below these come the functions that collectively load and change the Unicode characters shown on each of the HTML elements in the keyboard, followed by the search function and then the back and forth buttons.

Then the function atu_initialise_setup builds the keyboard for the first time, and sets up all the event triggers to listen.

var atu_has_setup_initialised = false;

var atu_initialise_setup = function() {

  atu_the_code = document.getElementById('atu_the_code');
  buildMap( new_atu_map_body_id , '0000');
  var initialBlocks = document.getElementsByClassName('atu-languageChoice');
  // getElementsByClassName returns a collection, so the index is set on each menu in it.
  var blockMenus = document.getElementsByClassName('atu-blockMenu');
  for (var i = 0; i < blockMenus.length; i++) { blockMenus[i].selectedIndex = 1; }

  ///event listeners setup here
  atu_has_setup_initialised = true;
};


All the event listeners are attached to the elements with the "atu-keyboard-parent" class and listen for events from the objects inside them.


I implemented the Dragon Drop package with this, but additionally ensured that the keys within the keyboard itself adjust to overflow into new Bootstrap rows if the size drops below a width that would allow a reasonable font size.

  $( ".atu-keyboard-parent" ).on( "resizestop", ".keyboardPopup", function( event, ui ) {
    var gridwidth = Math.round($("#ViewerBox1").width() / 12 );
    var newWidth = ui.size.width;
    var colwidth = Math.round(newWidth/gridwidth);
    var newName = "col-md-"+colwidth;
    var theClasses = $(".keyboardPopup").attr("class").toString();
    var theStartIndex = theClasses.indexOf("col-md-");
    var theEndIndex;
    var spaceIndex = theClasses.indexOf(" ", theStartIndex);
    var finishingIndex = theClasses.length;
    if (spaceIndex == -1) {  theEndIndex = finishingIndex;  }
    else {  theEndIndex = spaceIndex;  };
    var theClassName = theClasses.substring(theStartIndex, theEndIndex);
    if ((theStartIndex != -1) && (theClassName != newName)) {
      $(".keyboardPopup").removeClass(theClassName).addClass(newName+" ");
    }
  } );
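
The column arithmetic in that handler can be tried out without jQuery. The following is a rough sketch with two helper names invented for the illustration (columnsForWidth and swapColumnClass). It differs from the handler above in a few small ways that are my own assumptions: the grid width is not rounded first, the result is clamped to the 1 to 12 range that Bootstrap supports, and the new class is always added even if no col-md class was present.

```javascript
// Number of Bootstrap columns (1 to 12) that best fits a popup of the given width.
function columnsForWidth(popupWidth, containerWidth) {
  var gridWidth = containerWidth / 12;
  var cols = Math.round(popupWidth / gridWidth);
  return Math.min(12, Math.max(1, cols));
}

// Replace any existing col-md-N token in a class string with the new one.
function swapColumnClass(classString, cols) {
  var kept = classString.split(/\s+/).filter(function (name) {
    return name !== '' && !/^col-md-\d+$/.test(name);
  });
  kept.push('col-md-' + cols);
  return kept.join(' ');
}

console.log(columnsForWidth(300, 1200));  // 3
console.log(swapColumnClass('keyboardPopup col-md-6 ui-resizable', 3));
// keyboardPopup ui-resizable col-md-3
```

Matching the class with a regular expression avoids the index bookkeeping (theStartIndex, theEndIndex) needed when cutting the name out of the class string by hand.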

Then the actual addKeyboard function uses the Dragon Drop package to allow multiple keyboards on screen.

var addKeyboard = function(the_drag_options, initialise_dd) {
  var newKeyboardIDwithHash = add_dragondrop_pop("keyboardPopup", atu_main_HTML, $(".atu-keyboard-parent").attr("id"), the_drag_options.minimise,  atu_handlebar_HTML);
  new_atu_map_body_id = Math.random().toString().substring(2);
  $(newKeyboardIDwithHash).find(".atu-mapPopupBody").attr("id", new_atu_map_body_id );
  if (!atu_has_setup_initialised) { atu_initialise_setup(); };
  if (initialise_dd) { initialise_dragondrop( $(".atu-keyboard-parent").attr("id"), the_drag_options ); };
};


Alternative Input Method Editors

The basic HTML is saved in the variable atu_IME_HTML.

After investigating many different IME libraries optimised for individual languages, I chose to build on the multi-language Wikimedia jQuery IME library. The main functions are atu_setup_IME_area1 and atu_setup_IME_area2, which use code from https://github.com/wikimedia/jquery.ime.

var atu_setup_IME_area2 = function(thisArea, theCurrentIME) {
  // Fill the language dropdown with one option per language the IME knows about.
  theCurrentIME.getLanguageCodes().forEach( function ( lang ) {
      $( '<option/>' ).attr( 'value', lang ).text( theCurrentIME.getAutonym( lang ) )
        .appendTo( $langSelector );  // restored: the excerpt creates the option without attaching it
  } );
  $langSelector.on( 'change', function () {
    var lang = $langSelector.find( 'option:selected' ).val() || null;
    theCurrentIME.setLanguage( lang );
  } );
  thisArea.on( 'imeLanguageChange', function () {
    listInputMethods( theCurrentIME.getLanguage() );
  } );

  // List the input methods available for the chosen language.
  function listInputMethods( lang ) {
    theCurrentIME.getInputMethods( lang ).forEach( function ( inputMethod ) {
        $( '<option/>' ).attr( 'value', inputMethod.id ).text( inputMethod.name )
          .appendTo( $imeSelector );  // restored, as above
    } );
    $imeSelector.trigger( 'change' );
  }

  // Load the chosen input method and switch to it once it is available.
  $imeSelector.on( 'change', function () {
    var inputMethodId = $imeSelector.find( 'option:selected' ).val();
    theCurrentIME.load( inputMethodId ).done( function () {
      theCurrentIME.setIM( inputMethodId );
    } );
  } );
};


var atu_setup_IME_area1 = function(thisArea) {
  thisArea.ime( {
    showSelector: false
  } );
  var theCurrentIME = thisArea.data( 'ime' );
  if (!isUseless($langSelector)) {
    atu_setup_IME_area2(thisArea, theCurrentIME);
  }
};

The atu_initialise_IMEs function checks that there is a textbox currently assigned to the variable atu_the_input before allowing the change of input method.

var atu_initialise_IMEs = function() {
  if (!isUseless(atu_the_input)) { /* ... setup not shown in this excerpt ... */ }
};

The atu_load_scripts function is a generic function for adding further JavaScript libraries to the page: it creates a script element for the given source and appends it to the document head. This allows the slow font and IME packages to be loaded dynamically, only when necessary.

var atu_load_scripts = function(the_src, callback_func) {
  var atu_script = document.createElement( 'script' );
  atu_script.type = 'text/javascript';
  atu_script.setAttribute( 'src', the_src );
  // Run the callback once the script has loaded, so dependent scripts can be chained.
  atu_script.onload = function() {
    callback_func();
  };
  document.head.appendChild( atu_script );
};


Because rangy-core and the rules folder are stored in different locations, the versions of these files in All The Unicode are slightly different to the Wikimedia ones.

The separate callback functions in the addIMEs function are necessary because scripts added to the page this way load asynchronously - without the chaining they load in the wrong order.

 var addIMEs = function(by_button, initialise_options, active_load) {

  var setupIMElisteners = function() {

    if ( by_button ) {
      $(".polyanno-enable-IME").css("display", "none");
      $(".polyanno-add-ime").on("click", function(event){
        if ($(this).hasClass("polyanno-IME-options-open")) {
          $(".polyanno-enable-IME").css("display", "none");
        }
        else {
          $(".polyanno-enable-IME").css("display", "inline-block");
          $langSelector = $( 'select#polyanno-lang-selector' );
          $imeSelector = $( 'select#polyanno-ime-selector' );
          if (initialise_options) { atu_initialise_IMEs(); };
        }
      });
    }
    else if ( !by_button ) {
      $langSelector = $( 'select#polyanno-lang-selector' );
      $imeSelector = $( 'select#polyanno-ime-selector' );
      if (initialise_options) { atu_initialise_IMEs(); };
    }
  };

  // Each loader passes the next one as its callback, so the scripts load strictly in order.
  var loadScript2 = function() {  atu_load_scripts("https://rawgit.com/BluePigeons/alltheunicode/master/libs/jquery.ime.js", loadScript3);  };
  var loadScript3 = function() {  atu_load_scripts("https://rawgit.com/BluePigeons/alltheunicode/master/libs/jquery.ime.selector.js", loadScript4);  };
  var loadScript4 = function() {  atu_load_scripts("https://rawgit.com/BluePigeons/alltheunicode/master/libs/jquery.ime.preferences.js", loadScript5);  };
  var loadScript5 = function() {  atu_load_scripts("https://rawgit.com/BluePigeons/alltheunicode/master/libs/jquery.ime.inputmethods.js", setupIMElisteners);  };

  if (active_load) {
    atu_load_scripts("https://rawgit.com/BluePigeons/alltheunicode/master/libs/rangy-core.js", loadScript2);
  }
  else {
    // (this branch is not shown in the excerpt)
  }
};
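
The chain of loadScript2 to loadScript5 could also be written as one small helper that walks a list. This is only a sketch of that alternative, not what addIMEs does: loadInOrder and fakeLoader are names invented here, and the fake loader stands in for atu_load_scripts so the example runs outside a browser.

```javascript
// Load each URL in order; loadOne(url, done) is any callback-style loader,
// for example atu_load_scripts in the browser.
function loadInOrder(urls, loadOne, whenAllLoaded) {
  var next = function (index) {
    if (index >= urls.length) { whenAllLoaded(); return; }
    // Only request the next script once the current one reports that it has loaded.
    loadOne(urls[index], function () { next(index + 1); });
  };
  next(0);
}

// Stand-in loader for demonstration only: it records the order of requests
// and completes asynchronously, as a real script load would.
var loadedOrder = [];
var fakeLoader = function (url, done) {
  loadedOrder.push(url);
  setTimeout(done, 0);
};

loadInOrder(
  ['rangy-core.js', 'jquery.ime.js', 'jquery.ime.selector.js'],
  fakeLoader,
  function () { console.log(loadedOrder.join(' -> ')); }
);
```

In the page itself the call would look something like loadInOrder(listOfLibraryUrls, atu_load_scripts, setupIMElisteners), which keeps the load order in one array instead of spread across several functions.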




I found that the largest Unicode font coverage is offered by Google's Noto fonts, but this still isn't complete. I was investigating better use of fallback fonts for more complete coverage, but somewhere along the line I found that not all of the Google fonts were loading, and loading fonts is a very slow process...

So many notofus! || Image by Erin Nolan 2016 || pigeonsblue.com

In the long term I would like to use the Promises of FontFaceObserver for loading fonts; see here and here for more information.
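
As a rough idea of what that could look like, the sketch below waits for several font families and reports which ones arrived. It assumes the Font Face Observer library, whose observers have a load(text, timeout) method that returns a Promise. The helper name waitForFonts, the family names and the timeout are made up for the illustration, and the constructor is passed in as an argument so the helper does not rely on a global.

```javascript
// Wait for several font families. Observer is the FontFaceObserver constructor,
// passed in so that the helper does not depend on a global being present.
function waitForFonts(families, Observer, timeoutMs) {
  return Promise.all(families.map(function (family) {
    return new Observer(family).load(null, timeoutMs).then(
      function () { return { family: family, loaded: true }; },
      function () { return { family: family, loaded: false }; }  // timed out or failed
    );
  }));
}

// In the browser, with the Font Face Observer script included, usage might be:
//
//   waitForFonts(['Noto Sans', 'Noto Sans Gothic'], FontFaceObserver, 5000)
//     .then(function (results) {
//       results.forEach(function (r) {
//         if (!r.loaded) { console.log(r.family + ' did not load in time'); }
//       });
//     });
```

Turning each failure into a result rather than a rejection means one missing font does not stop the keyboard from being shown with the fonts that did load.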

Next: Further Product Development 2
