If you have been coding in the web development industry, you are most likely pretty familiar with JSON. It is the all-encompassing de facto standard that is never challenged. It is used everywhere, and you have become accustomed to it. All your REST calls transfer data via JSON. You know the format's limitations, and you accept them.
Or do you have to?
(Note: all links to packages and code are in the Links section of the article)
Brief history
My background is heavily in the Java and JavaScript/TypeScript world, so I have learned how to deal with their own quirks. And many years ago, I began a hobby web project (TypeScript/Node) that had a problem which JSON could not solve well.
I wanted to break free from the RESTful mindset to a more relaxed, message-based transport between browser and server. And for that, I really wanted to utilize the JavaScript type system to differentiate messages from each other. You know, one would have classes like AddDocument, GetUsers, GiveMeAllYourMoney, etc. And instead of having many HTTP endpoints, I would have just one, and the messages would flow from browser to server and back in a more ad hoc manner.
But I did not really have an elegant solution for my needs, because JSON destroys all type information during serialization. Of course, I could use some dedicated property to transfer types, but that would require custom processing, and I just felt that it was not the route I wanted to go. I just wanted a protocol that would take my object as is and serialize it in such a way that it would be exactly the same when deserialized. It would retain all type information, and that's it. So I needed an alternative.
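To make the type loss concrete, here is a minimal sketch of a JSON round trip. The class name AddDocument comes from the example above; its `title` field and the `roundTrip` helper are made up purely for illustration.

```typescript
// Hypothetical message class; only the class name comes from the text above.
class AddDocument {
  constructor(public title: string) {}
}

// Serialize to JSON and parse back, as a typical REST call would do.
function roundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

const sent = new AddDocument("Meeting notes");
const received = roundTrip(sent);

console.log(sent instanceof AddDocument);     // true
console.log(received instanceof AddDocument); // false: the prototype is gone
```

The data survives the trip, but the receiver can no longer tell an AddDocument from any other object with a `title` property.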
One could argue that there are plenty of alternatives to JSON, like Protocol Buffers or MessagePack. But these alternatives are all binary protocols. And even when I searched for information on where they are used, web development was totally absent from the scene. And I personally didn't feel that any of them would satisfy my requirements.
Thus, I began the challenge of creating a better JSON for myself.
Introducing the Cbot (Character Based Object Transport) protocol
About five years ago, I began this journey by creating the first version of the protocol. It didn't have a name then; I just referred to it as EJS (Enhanced JSON).
Through gradual improvements, I developed the second iteration. And now, with the third evolution, which I have named Cbot, I finally feel it is mature enough to introduce to others who might be interested.
What are the main features of this protocol and why are they important?
As I mentioned before, the original spark for the project was the capability to retain types during the serialization process. But I quickly realized that I could then also embed even more information that JSON could not.
JSON has a bad habit of not guaranteeing anything. You can put whatever you like, and you normally have to just trust that your name property actually contains a string and not an array of booleans. Or you check everything at runtime to make sure. Of course, there are libraries to check the model. But again, I asked myself, why can't the actual protocol implementation already do it so that I could trust that what was deserialized was the thing I wanted?
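As a small illustration of that trust problem, the sketch below parses a payload whose name is an array of booleans. The `User` interface and the `isUser` guard are hypothetical names used only for this example; they are not part of Cbot.

```typescript
// Hypothetical model; only the `name` property comes from the text above.
interface User {
  name: string;
}

// A type assertion satisfies the compiler but checks nothing at runtime.
const parsed = JSON.parse('{"name": [true, false]}') as User;
console.log(typeof parsed.name); // object, although the type says string

// A hand-written guard is the usual workaround.
function isUser(value: unknown): value is User {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { name?: unknown }).name === "string"
  );
}

console.log(isUser(parsed));           // false
console.log(isUser({ name: "John" })); // true
```

Every model needs a guard like this, or a validation library, because the parser itself promises nothing about the shape of the result.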
Also, the native types of JSON are quite limited. For instance, JSON has an array for collections. But JavaScript already has sets and maps. Could I add them too? And what about dates? There have been numerous times when I have struggled with date formats. Maybe you have a date with a timezone, or maybe you don't. Was it even intended to have one? You never know, because JSON does not really tell you anything.
So in essence, I wanted to rectify these kinds of shortcomings in some way.
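A quick sketch of what plain JSON does to these types. The property names are invented for the example; the behavior shown is that of the standard `JSON.stringify` and `JSON.parse`.

```typescript
// Illustrative object; the property names are made up for this example.
const original = {
  tags: new Set(["a", "b"]),
  scores: new Map([["alice", 1]]),
  created: new Date("2024-09-16T12:13:00Z"),
};

const copy = JSON.parse(JSON.stringify(original));

console.log(JSON.stringify(original.tags));   // {}
console.log(JSON.stringify(original.scores)); // {}
console.log(typeof copy.created);             // string
console.log(copy.created);                    // 2024-09-16T12:13:00.000Z
```

The set and the map silently lose their contents, and the date comes back as a string that the receiver has to recognize and convert by convention.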
Why not a binary protocol?
This is a good question. The first reason is that there is already a plethora of binary protocols out there. So why create another one? The second reason is a question in itself: if there are better alternatives, why haven't they overtaken JSON years ago? There must be good reasons for it.
My guess is simply that working with binary data is not that easy in JavaScript. It is easier to work with strings. JSON is also easy for humans to read and understand. And of course, browsers have native JSON support.
Because Cbot is targeted at the browser environment, a character-based protocol was the clearer choice.
What does it look like?
This article is not meant to be a tutorial for Cbot because such a tutorial already exists. However, because you are most likely a developer/engineer, you need at least some sort of understanding of what is happening. So I formulated an example for this purpose. In the example, I am using Cbot as a simple JSON replacement. Using more advanced features requires the use of a metamodel, which is also discussed in the actual tutorial.
But anyway, here is the object:
{
  name: "John Smith",
  age: 41,
  address: {
    street: "Second Avenue",
    postalCode: "1356-A",
    city: "Yorkistan"
  },
  isNiceGuy: true,
  hobbies: [
    "Playing cards",
    "Shopping",
    "Asking odd questions"
  ],
  favouritePoem: {
    title: "Digital Dreams",
    created: new Date("2024-09-16T12:13:00"),
    content: "In the code, we drift and weave,\n"
      + "A dance of data we perceive.\n"
      + "With each keypress, a world unfolds,\n"
      + "Infinite stories, yet untold."
  }
}
When this object is converted to a Cbot-message, it looks like this:
112345abb
E
A name
B JKJohn Smith
A !age
B !Id41
A "address
B "E
A #street
B #JKSecond Avenue
A $postalCode
B $JK1356-A
A %city
B %JKYorkistan
F
A &isNiceGuy
B &Iet
A 'hobbies
B 'C
JKPlaying cards
JKShopping
JKAsking odd questions
D
A (favouritePoem
B (E
A )title
B )JKDigital Dreams
A *created
B *Ih2024-09-16T12:13:00.000+03:00
A +content
B +JL
OIn the code, we drift and weave,
OA dance of data we perceive.
OWith each keypress, a world unfolds,
NInfinite stories, yet untold.
M
F
F
The Cbot format is designed to be primarily machine-readable. It has a predictable and straightforward syntax, and it could be seen as a kind of small assembly language. Each command is separated by a newline, and each line begins with an opcode that explains how objects are to be constructed.
Because this format is meant to be read programmatically, it is not really meant to be read by humans as is. However, it can be visualized in disassembly format, which explains the content much better:
MCSM 12345abb
OBJB (plain)
DEFN 0 name
ASGV 0 (name) STRN SSTR John Smith
DEFN 1 age
ASGV 1 (age) NATV FLOAT64 41
DEFN 2 address
ASGV 2 (address) OBJB (plain)
DEFN 3 street
ASGV 3 (street) STRN SSTR Second Avenue
DEFN 4 postalCode
ASGV 4 (postalCode) STRN SSTR 1356-A
DEFN 5 city
ASGV 5 (city) STRN SSTR Yorkistan
OBJE
DEFN 6 isNiceGuy
ASGV 6 (isNiceGuy) NATV BOOLEAN TRUE
DEFN 7 hobbies
ASGV 7 (hobbies) ARRB
STRN SSTR Playing cards
STRN SSTR Shopping
STRN SSTR Asking odd questions
ARRE
DEFN 8 favouritePoem
ASGV 8 (favouritePoem) OBJB (plain)
DEFN 9 title
ASGV 9 (title) STRN SSTR Digital Dreams
DEFN 10 created
ASGV 10 (created) NATV ZONED_DATETIME 2024-09-16T12:13:00.000+03:00
DEFN 11 content
ASGV 11 (content) STRN STBG
STNL In the code, we drift and weave,
STNL A dance of data we perceive.
STNL With each keypress, a world unfolds,
STPA Infinite stories, yet untold.
STEN
OBJE
OBJE
In the disassembly, one can see a number of commands, some explanations, and data. Here is a brief summary of the opcodes:
- MCSM is a Model Checksum, which is used as a sanity check that the message is understood by both parties.
- OBJB / OBJE denote the beginning and the end of an object.
- The DEFN / ASGV pair means that first an index is assigned to a property name, and then ASGV uses that index to assign a value to an object. Therefore, if the same property name is encountered again within the message, it does not have to be repeated.
- STRN SSTR denotes a simple ordinary string.
- NATV FLOAT64 denotes a native value for a 64-bit float.
- NATV BOOLEAN TRUE, well, you guessed it already.
- The ARRB / ARRE pair denotes the beginning and the end of an array.
- NATV ZONED_DATETIME denotes a zoned datetime, which is the default for a JavaScript Date.
- STRN, STBG, STNL, STPA, and STEN are a set of instructions that define a string builder. Because strings may contain newlines and they can be indefinitely long, a string builder pattern is used to split the string into more manageable pieces.
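The DEFN / ASGV idea can be sketched as a small name table that hands out an index the first time a property name is seen and reuses it afterwards. This is only an illustration under my own assumptions: `NameTable` and `emitRecords` are hypothetical names, the output uses the disassembly mnemonics rather than the real wire format, and it is not how the actual Cbot implementation is written.

```typescript
// Illustrative only: emits disassembly-style lines, not real Cbot output.
class NameTable {
  private readonly indices = new Map<string, number>();

  // Returns the index for a name and whether it was defined just now.
  lookup(name: string): { index: number; isNew: boolean } {
    const existing = this.indices.get(name);
    if (existing !== undefined) {
      return { index: existing, isNew: false };
    }
    const index = this.indices.size;
    this.indices.set(name, index);
    return { index, isNew: true };
  }
}

// Hypothetical emitter for an array of flat objects with string values.
function emitRecords(records: Array<Record<string, string>>): string[] {
  const table = new NameTable();
  const out: string[] = ["ARRB"];
  for (const record of records) {
    out.push("OBJB");
    for (const key of Object.keys(record)) {
      const { index, isNew } = table.lookup(key);
      // DEFN is emitted only the first time a property name appears.
      if (isNew) out.push(`DEFN ${index} ${key}`);
      out.push(`ASGV ${index} (${key}) STRN SSTR ${record[key]}`);
    }
    out.push("OBJE");
  }
  out.push("ARRE");
  return out;
}

const output = emitRecords([{ city: "Yorkistan" }, { city: "Springfield" }]);
console.log(output.join("\n"));
```

In this sketch the second object produces only an ASGV line for city, because index 0 was already defined by the first object. That is the saving the DEFN / ASGV pair is meant to give when the same property names repeat within a message.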
Is this a TypeScript-only thing?
No, it is not.
Due to my background and the use case, the implementation naturally started from the JavaScript side. But because Cbot is language-agnostic, it can be extended to other languages as well. In fact, there is already a working Java implementation that supports basically everything that the protocol is capable of doing.
Is there a specification somewhere?
Kind of. I found that creating a proper specification was actually really hard to do. I did try to use some sort of EBNF format to create one, but my first problem was that there is no single specification for such a format (the irony). Just a bunch of interpretations of it. Also, even if I had used one of the versions, I wouldn't have had any means to actually validate the specification's correctness.
So instead, I decided to create a TypeScript file that contains the validation logic as types and classes. And I used that spec file to validate my tests. Thus, it became the validating specification. That spec file is then the master specification that other implementations must use as the source of truth.
What is the status of the project right now?
As of this writing, I feel that it is basically feature-complete for most use cases. There are some functionalities that need more research, for instance enums, binary type support, and non-nullable property support in the metamodel.
However, what I actually need is feedback. I get it that for some, making a JSON replacement is utter nonsense and TypeScript smells like fart. But from those who actually feel that Cbot may solve a use case, I would like to know how it fares and what support is considered important.
In essence, the next step is just to get some constructive feedback to make sure that the protocol can be stabilized to a first actual version.
Links
- Contact: sisujs@sisujs.fi
- Git: https://gitlab.com/sisujs/sisujs/
Repositories
- NPM: https://www.npmjs.com/package/@sisujs/meta-cbot
- MVN: https://mvnrepository.com/artifact/fi.sisujs/cbot
Documentation
- TypeScript Tutorial: https://gitlab.com/sisujs/sisujs/-/blob/main/docs/cbot/tutorial_ts.md
- Java Tutorial: https://gitlab.com/sisujs/sisujs/-/blob/main/docs/cbot/tutorial_java.md
- Typedoc: https://gitlab.com/sisujs/sisujs/-/blob/main/js/meta-cbot/typedoc/README.md
- Javadoc: https://www.javadoc.io/doc/fi.sisujs/cbot/latest/index.html