As it stands for JavaScript Object Notation, JSON quickly became a buzzword in the Internet and cloud-based software world due to its simplicity and human-readable text form. The notation is a natural evolution of the JavaScript (JS) scripting/programming language, whose Object-Oriented Programming paradigm places the object as the basic building block of software code. An object is an encapsulated container where relevant data and the functions that handle that data are kept in the same logical “code-spot”.

Objects, code, and serialisation

Objects consist of data (attributes) and methods (functions), and the data in an object can be hidden (private) or exposed (public) to the rest of the code. An attribute is not limited to a simple data type such as an integer, Boolean, string or floating-point value; it can also be a collection (array/list), a structure (name/value pairs) or another object type. It is straightforward to see how a relatively complex object is represented as a hierarchical structure of data (a linked list or data tree). When kept in memory, that data “tree” is not in a plainly visible format: the simple values are just bits in memory, and the elements of a collection are accessed via memory “pointers” (references) that are only meaningful to the compiled code. If the object’s data needs to be represented in a machine- or human-readable format, the process is called serialisation, and this is where data-representation formats like JSON come in very handy.
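Such an object hierarchy can be sketched in JavaScript as follows (the `Person`/`Address` names and fields are purely illustrative, not from any library):

```javascript
// A hypothetical Person object: simple values, a nested object and a collection.
class Address {
  constructor(street, city) {
    this.street = street; // simple string attribute
    this.city = city;
  }
}

class Person {
  #ssn; // private attribute, hidden from the rest of the code
  constructor(firstName, ssn, address) {
    this.firstName = firstName; // public attribute
    this.#ssn = ssn;
    this.address = address;     // nested object: the data forms a tree
    this.hobbies = [];          // collection attribute
  }
  addHobby(h) { this.hobbies.push(h); } // method operating on the object's data
}

const p = new Person("Peter", "000-00-0000",
                     new Address("47 Sun Str", "San Jose"));
p.addHobby("flying");
```

In memory `p.address` is just a reference (pointer) to another object; only serialisation turns the whole tree into visible text.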

The need for objects to be serialised (and de-serialised, i.e. reconstructed back into memory) comes from the fact that a software program keeps only some of its data in computer memory. Most of the accessible data is actually stored in a database or some online cloud storage. When data is communicated between two software systems, both implemented in programming languages supporting OOP (object-oriented programming), it is a straightforward process to write all of an object’s data as text, send this text over the network as the payload of an HTTP message, and decode (de-serialise) it back into an object in the other software system.
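In JavaScript this round trip is built in via `JSON.stringify` and `JSON.parse`. A minimal sketch (in a real system the string would travel as an HTTP payload; here it is just handed over locally):

```javascript
// An in-memory object on the sending side.
const original = { firstName: "Peter", lastName: "Pan", scores: [7, 9] };

// Serialisation: object -> text, ready to be sent over the network.
const payload = JSON.stringify(original);

// De-serialisation on the receiving side: text -> a fresh object.
const restored = JSON.parse(payload);

// restored carries the same data but is a brand-new object in memory.
```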

Data markup

For the object data to be easily encoded and decoded in text form, all data elements (values, arrays, objects) need to be enclosed in special symbols. Those symbols are usually relatively rare combinations of characters (such as </>, {} or []). If we have to create a data structure (object type) to store one person’s essential information, the object’s JSON data will look something like:

{
  "person": {
    "firstName": "Peter",
    "lastName": "Pan",
    "addr": "47 Sun Str, San Jose, CA"
  }
}
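Decoding that document in JavaScript is a one-liner; the parsed result is a plain object tree whose values can be accessed by name:

```javascript
// The JSON document from above, held as a plain string.
const text = `{
  "person": {
    "firstName": "Peter",
    "lastName": "Pan",
    "addr": "47 Sun Str, San Jose, CA"
  }
}`;

// Decode the text into an object and pull values out of the tree.
const doc = JSON.parse(text);
const fullName = `${doc.person.firstName} ${doc.person.lastName}`;
```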

The special symbols used to “enclose” meaningful data are called “markup”, hence the paradigm of the markup languages. One of the most notable markup languages is XML (the eXtensible Markup Language), a close relative of the widespread HTML; both descend from the same SGML lineage.

As we can see, this form is much easier to understand than, for example, a raw byte stream (hex format):

0x40 0x45 0x55 0x32 0x44 0x45 0x89

This natural text form of object data is far easier for software developers to understand and review. It helps to troubleshoot or examine the database of a software system. It is also very straightforward to create objects that store the application configuration and deploy those as JSON files.

One extensive use of the JSON notation is to exchange object data between software systems or applications connected via local or wide-area networks (the Internet). These are, for example, server applications that handle incoming API or data requests, or microservice processes running on a company’s internal server systems. The exchange of command messages between different software applications is called inter-process communication. Before the widespread adoption of the XML and JSON data markup formats, there were several standards for object data exchange in a binary format: CORBA (language-neutral, popular in the Java world), COM/DCOM (Windows) and DDE (Windows).

JSON — compared to XML

As already mentioned, one widely used JSON predecessor was the eXtensible Markup Language, or XML. This was an early attempt to let software applications exchange data in a structured textual manner while the data elements themselves are named. The basic building block of XML is the tag, expressed with opening (<tag>) and closing (</tag>) markup characters.

The Person’s object data would be presented as follows:

<Person>
  <FirstName>Peter</FirstName>
  <LastName>Pan</LastName>
  <Addr>47 Sun Str, San Jose, CA</Addr>
</Person>

XML notation is meant not only to transport the data but also to describe the data elements. When the data is described in such a way, any application (or person) can extract only what it requires, thus allowing for more flexibility. If some of the data is not required, or is only optional, future services might be developed to use it as an extension. XML originated from SGML (the Standard Generalised Markup Language), which was the first attempt to put the markup (or “tagged”) data ideology into a standard. Two additional standards and a set of tools made XML very handy: XSLT (XML data transformation based on a set of rules) and XPath (a structured textual “pointer” to any part of the XML data contained in a file).

But by far the most widespread markup-data evolution was HTML, the language of browsers and web pages. It was compact, flexible and allowed for a “loose” interpretation (the browser would not complain if some of the tags were not perfectly in order). HTML is an SGML derivation with well-defined tags that carry semantic meaning for the browser, telling it how to render a visual web page (its stricter XML-based reformulation is known as XHTML).

JSON simplicity

One main problem with markup languages was the textual overhead they placed over the exchanged data. In many implementations, the data (the values of attributes, lists and structures) is no more than 20 to 25% of the exchanged byte stream. Such inefficiency was becoming a problem for large-scale data processing, especially for Internet and cloud-based software systems. JSON came in as a natural way to “strip down” the unnecessary “data weight” and push the data-to-markup ratio towards 90–95%. From an efficiency standpoint, it was a huge improvement: such a representation allowed 3–4 times more real data to be transported in the same raw byte stream over the networks. JSON strikes a fine balance: the data is still tagged/marked with special characters, but these are easy to understand by JSON’s intended target audience, the software developers. All the special characters used in JSON ([], {}, :, "") have been used for decades in the C-notation-based programming languages (such as C++, C#, Java and JavaScript).
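The overhead difference can be made concrete by encoding the same tiny record both ways and measuring it. The exact ratios depend on the data, so treat the numbers below as indicative only:

```javascript
// The same two values encoded as XML and as JSON.
const xml =
  "<Person><FirstName>Peter</FirstName><LastName>Pan</LastName></Person>";
const json = JSON.stringify({ FirstName: "Peter", LastName: "Pan" });

// The actual payload is just the two values themselves.
const dataBytes = "Peter".length + "Pan".length;

const xmlRatio = dataBytes / xml.length;   // data share of the XML stream
const jsonRatio = dataBytes / json.length; // data share of the JSON stream
// jsonRatio comes out noticeably higher: less markup per byte of data.
```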

The hierarchical structure of JSON-represented data is very suitable for “document-based” database systems. Unlike traditional Relational Database Management Systems (RDBMS), where all data is stored in tables with a fixed number of columns and linked between the tables with special columns called “keys” (just referential numbers), document-based databases can store arbitrary (not always identically shaped) tree-like data documents. When tuned properly, such database systems can fill in missing attribute values or transform parts of the original JSON-represented data according to the schema rules of the database. By storing the JSON format directly, those DB systems avoid the algorithmic “nightmare” of decomposing all the tree-like hierarchical data into sets of tables (columns and rows, just like in Excel) when the data is stored, and then reconstructing it back into JSON or XML text afterwards.
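The core idea of a document store can be sketched in a few lines: documents are kept as JSON text under an id, with no decomposition into tables. This is a toy in-memory sketch with a made-up API, not any real database:

```javascript
// Minimal "document store" sketch: id -> JSON text, whole trees in and out.
class DocumentStore {
  constructor() { this.docs = new Map(); }
  put(id, obj) { this.docs.set(id, JSON.stringify(obj)); } // store as JSON text
  get(id) {
    const text = this.docs.get(id);
    return text === undefined ? undefined : JSON.parse(text); // rebuild the tree
  }
}

const store = new DocumentStore();
// Two documents with different shapes: fine for a document store,
// awkward for a fixed-column relational table.
store.put("p1", { firstName: "Peter", addr: { city: "San Jose" } });
store.put("p2", { firstName: "Wendy", hobbies: ["reading"] });
```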

Serialisation and Deserialisation

When data has to be put into a textual format and back, more specifically an object’s data (all properties, lists and nested objects), the process is called object serialisation (and deserialisation, respectively). This is the transformation of all in-memory object data into a binary or text form that can be written to computer storage or transmitted as a protocol message to another application. The idea is that the object can be stored and then restored back into computer memory without any loss of the object’s data. Such an implementation is very handy in transaction-processing systems (FinTech digital providers, for example) that can record the state of the software system and restore it to working order at any moment.

The main challenge of object data serialisation is purely algorithmic. It stems from the hierarchical nature of the object’s in-memory representation and the way object data can be nested (one object can hold, as an attribute, a pointer to another object, and so on). In order to avoid data redundancy (copying the same data when one object is referred to by several others), the software algorithms must support the so-called “deep” (copy all nested objects’ data) and “shallow” (copy nested data only to a certain extent) copying. When the data is de-serialised back into objects, the problem is the lack of pointer references in the JSON data file, so the algorithm has to fetch and analyse the entire contents of the data file in order to “fill in” all the nested objects’ data. If not properly implemented, such algorithms can cause severe performance issues during deserialisation.
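The "no pointer references" limitation is directly observable in JavaScript: `JSON.stringify` throws on a circular object graph, while for acyclic data a JSON round trip doubles as a blunt deep copy. A small demonstration:

```javascript
// Circular graphs cannot be represented in plain JSON.
const a = { name: "a" };
const b = { name: "b", friend: a };
a.friend = b; // cycle: a -> b -> a

let failed = false;
try {
  JSON.stringify(a);
} catch (e) {
  failed = true; // TypeError: converting circular structure to JSON
}

// For acyclic data, a JSON round trip is a common (if blunt) deep copy:
const src = { nested: { value: 1 } };
const deep = JSON.parse(JSON.stringify(src));
deep.nested.value = 2; // does not affect src

// A shallow copy, by contrast, shares the nested object with the original.
const shallow = Object.assign({}, src);
```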

To wrap it up

For effective data exchange in a textual form, it is quite important to have standards in place to regulate the representation of numbers (with decimal precision), scientific numeric notation, special characters, date-time and time-span information. Such a standard approach guarantees that data can be exchanged between systems based in different geographic locales with no loss of data or chance of misinterpretation.
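Dates are a good example: JSON has no native date type, so `JSON.stringify` emits `Date` objects as ISO 8601 text, and a "reviver" callback can turn them back into `Date` objects on parse (the `at`/`ratio` field names here are just illustrative):

```javascript
// A record with a date and a fractional number.
const record = { at: new Date(Date.UTC(2020, 0, 2, 3, 4, 5)), ratio: 0.25 };

// The Date is serialised as a locale-independent ISO 8601 string.
const text = JSON.stringify(record);

// A reviver restores the "at" field to a real Date object on parse.
const revived = JSON.parse(text, (key, value) =>
  key === "at" ? new Date(value) : value
);
```

Because the ISO string carries an explicit UTC timezone, the value survives the round trip regardless of the locale of either system.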

The currently adopted ECMA-404 standard defines the JSON syntax, so that JSON data can be consistently validated and interpreted in its textual format. JSON is here to stay, with an ever-growing community, tools and libraries supporting it in virtually every software development environment.