A Complete Beginner’s Guide to Hacking: Data Design

This post is part of the series A Complete Beginner’s Guide To Hacking

So the main website languages (HTML, CSS and JavaScript) define files that are sent off to be interpreted “client side”, that is, by the browser of the user trying to view the page. Client-side languages are really good for doing the things we want the user’s computer and browser to do, but the bigger the files, or the more work the user’s computer has to do, the slower the website will run for them. We therefore want to minimise this when developing an application, which we can do by using the language best suited to each job.

Despite their popularity, the three main client-side web languages are just three of the thousands of higher-level languages for making a computer do things, as discussed in an earlier chapter of this series. Other languages are better at many tasks, such as complex mathematical functions or creating and searching databases, and they can run in the background, or “back end”, of the application, doing the calculations that users do not interact with directly.

The most common languages for writing back-end code for web applications are object-oriented programming languages, such as Python, Ruby, PHP and JavaScript running on Node.js (a runtime that lets JavaScript run outside the browser).

They are all very similar, so once you have learned one it is not so difficult to learn the others. For beginner projects you will often see very little difference between them, but as projects get larger and more complex you will find one or two preferable to the others. One consideration is the web templates available for each language.

Web Templates & Frameworks

These other languages are generally server-side languages, i.e. languages designed to run outside the browser, interpreted by software installed on the server’s computer. Libraries and new languages are constantly being developed to let client-side and server-side code communicate with each other, and these are generally known as web templates. Web templates can be either client-side or server-side, depending on whether they move the work into the browser on the client side, or keep it on the server and let the client side interact with it via an API (see the APIs section for more info).

Client-side web templates minimise the number of exchanges between the user’s computer and the server via web APIs. The user has to wait while the request for the page is sent from their computer to the URL, and while the website files are sent back and loaded onto their computer; after that, the site should run with few, if any, further exchanges over the internet connection. Websites built with these templates are often called “single page” websites. However, because work that would otherwise be done on the server is now done in the browser, the files to load are often much larger than with server-side templates, so the initial load is slower, and the time each function takes to run depends on the processing power of the user’s computer.

One common type is embedded web templates: these let you write an HTML document and place server-side code inside it, much as you can place JavaScript inside an HTML page. Unlike JavaScript, though, the embedded code is run by the server, which then sends the resulting plain HTML to the browser. Some languages, like PHP, have embedded templating built in, so you can do this with just plain PHP.

With other languages, like Ruby or JavaScript on Node.js, you will need an extra templating language to mix the normal server-side code into a HTML document, and the templating engine then turns the result into HTML the browser can understand. These are known as embedded templating languages, as they let you embed, or write in, a server-side language inside a HTML document, e.g. eRuby for Ruby or EJS for Node.js.
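Python has no PHP-style embedded templates built into the language; Python web frameworks usually rely on a separate template engine instead. As a rough, standard-library-only sketch of the underlying idea, the server fills placeholders in an HTML document before anything is sent to the browser. The function name `render_profile_page` and the page layout are hypothetical, chosen only for illustration.

```python
from html import escape
from string import Template

# Hypothetical HTML page with $-placeholders that the server fills in.
PAGE = Template("""<html>
  <body>
    <h1>Hello, $name!</h1>
    <ul>
$items
    </ul>
  </body>
</html>""")


def render_profile_page(name, interests):
    """Fill the template on the server; the browser only ever receives plain HTML."""
    # escape() stops user-supplied text from being interpreted as HTML tags.
    items = "\n".join(f"      <li>{escape(i)}</li>" for i in interests)
    return PAGE.substitute(name=escape(name), items=items)


if __name__ == "__main__":
    print(render_profile_page("Erin", ["pigeons", "<b>databases</b>"]))
```

Real template engines add loops, conditionals and inheritance inside the template itself, but the principle is the same: the code runs on the server and only HTML travels to the browser.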

Server-side web templates minimise the processing done on the client’s computer and can take advantage of servers that have been designed to run these functions efficiently. However, to get the results of that processing, the client-side website has to keep exchanging information with the server over the internet connection.

Server-side web templates that keep most of the processing on the server are sometimes described as “thin client” setups, and include frameworks like Express or Django, which let Node.js or Python code interact with the client side through features like REST APIs.
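As a minimal sketch of the thin-client idea, assuming Python’s built-in `http.server` module rather than Django or Express, the server below keeps the data and answers requests such as `GET /users/1` with JSON. The `USERS` data, the `/users/<id>` route and the names `handle_get` and `run` are hypothetical, and `http.server` is meant for experiments, not production sites.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical in-memory data standing in for a real database.
USERS = {"1": {"id": "1", "name": "Erin Nolan"}}


def handle_get(path):
    """Return (status code, JSON-serialisable body) for a GET request path."""
    parts = urlsplit(path).path.strip("/").split("/")  # ignore any ?query part
    if len(parts) == 2 and parts[0] == "users":
        user = USERS.get(parts[1])
        if user is not None:
            return 200, user
    return 404, {"error": "not found"}


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle_get(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run(port=8000):
    """Serve the API on localhost until interrupted (Ctrl+C)."""
    with HTTPServer(("localhost", port), ApiHandler) as server:
        server.serve_forever()
```

Calling `run()` and then visiting `http://localhost:8000/users/1` in a browser should return the user as JSON. Keeping the routing logic in a plain function (`handle_get`) makes it easy to test without starting a server.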

Databases

So how should we store data? While an OOP application is running you can store values in variables, but keeping a value that way means keeping a whole instance of the application running, using processing power, even while the user is offline. Instead, there are other languages designed specifically to store data more efficiently.

Generally, these databases are divided into two main types: relational, sometimes known as SQL after the most common language most of them are built on, and non-relational, or NoSQL. There are other types, such as graph databases, but as they are generally only used in niche applications I have not discussed them further here.

Relational

Relational databases store data according to how it relates to other data; in other words, they store values in tables. The tables then of course
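As a minimal sketch of tables relating to each other, here is an in-memory SQLite database (via Python’s built-in `sqlite3` module) with two hypothetical tables, `users` and `posts`. Each post stores the `id` of the user who wrote it, and a `JOIN` follows that link to combine the two tables.

```python
import sqlite3

# In-memory SQLite database; the tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- links each post to a user
    title   TEXT NOT NULL
);
INSERT INTO users (id, name) VALUES (1, 'Erin'), (2, 'Sam');
INSERT INTO posts (user_id, title) VALUES
    (1, 'Data Design'),
    (1, 'Hardware'),
    (2, 'Hello');
""")

# JOIN follows the relationship: match each post's user_id to a user's id.
rows = conn.execute("""
    SELECT users.name, posts.title
    FROM posts
    JOIN users ON users.id = posts.user_id
    ORDER BY posts.id
""").fetchall()

for name, title in rows:
    print(name, "wrote", title)
```

Storing the user’s name only once, in `users`, means a name change has to be made in just one place; that is a large part of why data is split across related tables.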

There are lots of variations of this, such as MySQL and PostgreSQL, and because it is the most popular way of interacting with databases there are lots of short courses out there to teach you all the little differences between them. Although you can easily learn the plain SQL they are all built upon, handling data with hand-written SQL raises many security issues (SQL injection being the best known), so it is generally preferable for beginners to use an existing framework to interact with the database rather than writing SQL themselves.
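The short sketch below shows the best-known of those security issues, SQL injection, using Python’s built-in `sqlite3` module and a hypothetical `users` table. Pasting user input straight into the SQL text lets that input rewrite the query, while a `?` placeholder passes it to the database purely as data. Frameworks handle this for you, which is one reason they are the safer choice for beginners.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("erin", "hash-1"))

# Imagine this string arrived from a login form, typed by an attacker.
malicious = "nobody' OR '1'='1"

# UNSAFE: pasting input into the SQL text lets it change the query itself.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFER: a ? placeholder sends the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print("unsafe:", unsafe_rows)  # every user is returned
print("safe:", safe_rows)      # no user has that literal name
```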

Non-relational

For lots of jobs a relational database is just unnecessary work: you only need to store values under keys, or keep a long list that you can push values onto at one end and pop them off at the other. For these applications, key-value stores like Redis are what you need.
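To show the shape of that kind of storage, here is an in-memory stand-in written in plain Python. It is not Redis and does not talk to a Redis server; the class name `TinyKeyValueStore` is hypothetical, and its method names are only modelled on the Redis commands SET, GET, LPUSH and RPOP. Pushing on the left and popping on the right turns a list into a first-in, first-out queue.

```python
from collections import deque


class TinyKeyValueStore:
    """In-memory stand-in whose methods loosely mirror Redis SET, GET, LPUSH and RPOP."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None if the key does not exist

    def lpush(self, key, value):
        # Push onto the left-hand end of the list stored at key.
        self._data.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        # Pop from the right-hand end; None if the list is empty or missing.
        items = self._data.get(key)
        return items.pop() if items else None


store = TinyKeyValueStore()
store.set("session:42", "erin")
for job in ["resize-image", "send-email"]:
    store.lpush("jobs", job)

print(store.get("session:42"))  # erin
print(store.rpop("jobs"))       # resize-image (first in, first out)
```

A real Redis server also keeps this data outside your application process, so several programs can share it and it survives your application restarting.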

But there may be other reasons to use a non-relational database: for example, MongoDB is great at storing, searching and handling data kept as JSON-like documents, and there are plenty of reasons why you might prefer working with JSON data.
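As a rough illustration of searching JSON documents, the sketch below loads some hypothetical documents and filters them with a query that is itself a JSON-style object, loosely modelled on the equality filters MongoDB’s `find` accepts (for example `{"city": "London"}`). The `find` function here is plain Python written for this example, not the MongoDB driver.

```python
import json

# Hypothetical JSON documents, as they might be stored in a document database.
raw = """[
  {"name": "Erin", "type": "Person", "city": "London"},
  {"name": "Sam", "type": "Person", "city": "Leeds"},
  {"name": "Fridge", "type": "Device", "city": "London"}
]"""
documents = json.loads(raw)


def find(docs, query):
    """Return documents whose fields equal every key/value pair in query.

    Only exact top-level equality is supported; real document databases
    offer far richer queries (ranges, nested fields, indexes and more).
    """
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


print(find(documents, {"type": "Person", "city": "London"}))
```

Note that the documents do not need a fixed table structure: a new document could add extra fields without changing the others.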

The Internet of Things, Linked Data and other buzzwords…

One reason for the rising popularity of non-relational databases is the current trend of “the internet of things”, or IoT. This is the idea of “putting everything online”; it is often overhyped and has so far resulted in things like fridges with iPads stuck to the front of them just to give them an internet connection.

It is not actually supposed to mean just anything with an internet connection stuck somewhere on it. The official definitions being implemented state that a ‘Web Thing’ (their name for whatever you are connecting) must have a valid HTTP URL “that acts as the entry point for the Web Thing and enables the interaction with it”, and also that a “Web Thing must support JSON as default representation”. In other words, if the internet connection isn’t actually doing anything then it isn’t officially IoT, and interaction with the thing should be possible using JSON as the format. If you are handling JSON, then often (though not always) a non-relational database is the better choice.

Similarly, “linked data”, a lesser-known trend often confused with IoT, is also having an impact on database choice. It was supposed to take off a while ago, when the W3C standards body started trying to introduce linked data standards for the “semantic web”. The idea is that if everything referring to the same thing links to the same URL, and defines what the relationship between the two things is, then it becomes easier for a computer to understand how everything fits together.

Web pages already do this to some extent: everyone who wants to reference the Google.com home page links to the same http://www.google.com address, usually in an HTML anchor tag, so the relationship is a hyperlink from their page to that one. The idea, however, is to do this for things other than web pages as well. The things can be physical objects (which can then use their URL to join the IoT, or not) or even abstract concepts (I do not know if love has an official URL). Other relationships can be defined too, as long as the terms used to describe them are consistent with the standards, so that all software can interpret them in the same way. For example, Facebook could record in code that my user account “knows” another.
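Linked data is commonly modelled as (subject, relationship, object) “triples”, where each part is a URL. The sketch below stores a few triples in a plain Python list and asks who a given person “knows”. The example.com person URLs are hypothetical; `http://xmlns.com/foaf/0.1/knows` is the real term from the FOAF (Friend of a Friend) vocabulary.

```python
# Linked data as (subject, predicate, object) triples, each identified by a URL.
# The example.com URLs are hypothetical; the FOAF "knows" URL is the real term.
KNOWS = "http://xmlns.com/foaf/0.1/knows"
ERIN = "http://example.com/people/erin"
SAM = "http://example.com/people/sam"
ALEX = "http://example.com/people/alex"

triples = [
    (ERIN, KNOWS, SAM),
    (ERIN, KNOWS, ALEX),
    (SAM, KNOWS, ALEX),
]


def objects_of(subject, predicate, graph):
    """Everything that `subject` is linked to by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(objects_of(ERIN, KNOWS, triples))
```

Because every program uses the same URL for “knows”, data from different sources can be merged into one list of triples and still mean the same thing.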

However useful this could be for helping computers understand things, it was a lot of extra work for people who were just building databases that didn’t need to do it, as the formats originally suggested for writing this code were quite messy. So although people have been trying to make this work for a while, it didn’t have as much success at first as hoped. But in 2010 work began on JSON-LD (JSON for Linking Data), which allows the same extra information to be added with just a few extra fields in your JSON file.

JSON-LD

The agreed format adds a field “@id” that gives the URL for the thing, “@type” which tells the computer what type of thing it is, e.g. “Person”, and then “@context” which tells the computer where to look for definitions of the relationships that each field in the JSON describes. For example, suppose I had a JSON document describing me, like the one below:

{
  "@id": "http://www.pigeonsblue.com/34973598265",
  "@type": "Person",
  "name": "Erin Nolan"
}

Then my “@context” would contain a field called “name” whose value is a link to a vocabulary (here FOAF, the Friend of a Friend vocabulary) that explains to another computer what it means for a “Person” to have a “name”, and another field that explains what a “Person” type is. For example:

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://www.pigeonsblue.com/34973598265",
  "@type": "Person",
  "name": "Erin Nolan"
}
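To see what “@context” buys you, the sketch below parses the JSON-LD example (assuming the “@id” is written as a full http:// URL) and swaps each short term for the full URL its context points to. This is a deliberately simplified, hypothetical `expand_terms` function: the real JSON-LD expansion algorithm, defined in the W3C JSON-LD processing specification, also handles nested objects, arrays, value objects and much more, so use a proper JSON-LD library for real work.

```python
import json

doc = json.loads("""{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://www.pigeonsblue.com/34973598265",
  "@type": "Person",
  "name": "Erin Nolan"
}""")


def expand_terms(document):
    """Replace short terms with the full URLs given in @context.

    A simplified sketch only: real JSON-LD expansion also handles nesting,
    arrays, @vocab, @base, remote contexts and value objects.
    """
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue  # the context is used up, not copied into the output
        if key == "@type":
            value = context.get(value, value)  # type names map to URLs too
        expanded[context.get(key, key)] = value
    return expanded


print(json.dumps(expand_terms(doc), indent=2))
```

After expansion, any program that understands the FOAF vocabulary can tell that this document describes a person with a name, even if it has never seen the field name “name” used this way before.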

This may still look like a lot of effort to code, but because it is easier to integrate than the alternatives, and is better than having no way of using or producing linked data in the future, it is encouraging more people to use JSON as a data format, so that JSON-LD is an easy extension if they want it.

Next: Hardware and Embedded Processing