Introduction
Some time ago, I made a joke encoding called Base8192. It looks like this:
Input: What is base8192?
Base8192: 卶晡啂幩唲幢吗慥冃弹儣洀等
Base64: V2hhdCBpcyBiYXNlODE5Mj8=
Base8192 utilises Chinese characters and needs fewer characters than Base64. In the example above, Base8192 uses only 13 characters, whereas Base64 needs 24.
If you're curious how it works, here is the link to the GitHub repo; the README explains the details.
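The character counts above can be checked with a few lines of JavaScript. This is only a sanity check on the example strings: it uses the built-in `btoa` for Base64 and does not implement Base8192 itself.

```javascript
// Example strings copied from the comparison above.
const input = "What is base8192?";
const base8192Text = "卶晡啂幩唲幢吗慥冃弹儣洀等";

// btoa is fine here because the input is plain ASCII.
const base64Text = btoa(input);

// Spread the string to count code points rather than UTF-16 code units.
const countChars = (s) => [...s].length;

console.log(base64Text);
console.log(countChars(base64Text));   // 24
console.log(countChars(base8192Text)); // 13
```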
https://github.com/karintomania/Base8192
So, I got an idea, implemented it and confirmed it works as expected.
But now what? What if I wanted to encode something more than just user-input text?
Self-Hosting Dream
I’ve heard that, in the world of compilers, compiling a compiler with itself is a big milestone.
This is called self-hosting.
I decided to do something similar with my encoding: encoding an encoder with the encoder.
(Image:https://unsplash.com/photos/red-blue-and-yellow-ceramic-figurine-PB80D_B4g7c)
How it works
My original implementation of Base8192 is written in JavaScript. I chose it because I wanted to make an interactive demo website. You can try it out at the link below:
https://karintomania.github.io/Base8192/
I want my new self-encoded encoder to be available on the website too, but it doesn't make sense to encode JavaScript code for the browser, where you can run it directly.
However, there is another runtime in browsers: WASM.
There are a few ways to include a WASM binary in your code, and encoding the binary as text and decoding it in JS is perfect for my case.
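As a rough sketch of that idea, the snippet below embeds a tiny hand-written WASM module as text, decodes it back to bytes and instantiates it. Base64 (`btoa`/`atob`) stands in for Base8192 here, and the small `add` module is a made-up example, not the Base8192 encoder.

```javascript
// A minimal hand-written module exporting add(a, b) -> a + b (illustrative only).
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // function body
]);

// "Build time": turn the binary into a string that can live inside a .js file.
const embedded = btoa(String.fromCharCode(...moduleBytes));

// "Run time": decode the string back into bytes...
const decoded = Uint8Array.from(atob(embedded), (ch) => ch.charCodeAt(0));

// ...and instantiate it. The synchronous API is fine for a module this small.
const instance = new WebAssembly.Instance(new WebAssembly.Module(decoded));
console.log(instance.exports.add(2, 3)); // 5
```

The same shape applies with any binary-to-text encoding: only the encode and decode steps change.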
How to make a WASM binary
Until I started this challenge, I didn’t really know what WASM was. Apparently, various languages can be compiled to WASM (even JS!).
My language of choice was Zig, simply because it is the language I've been playing with and its WASM support looks good.
Another choice could have been MoonBit, which is quite an interesting language. But at this point I was already learning too many languages, so I resisted the urge to learn something shiny 😅
So, the mental model of this project looks somewhat like the diagram below:
Rewrite the encoder in Zig
Writing the same code in JavaScript and Zig was a fun experience! These two languages are wildly different: JS is a high-level, dynamically typed scripting language with garbage collection, whereas Zig is a low-level, statically typed, compiled language with manual memory management.
To my surprise, the amount of code is not too different. The core logic of the JS version (base8192.js) is 288 lines and the Zig version (base8192.zig) is 258 lines (excluding tests in the same file).
I expected the Zig code to have more lines, given that Zig is quite a low-level language! This is partly because my Zig implementation is a bit better, as it’s my second iteration. But I think Zig’s simple syntax also helped to reduce the code volume.
Call WASM from JS
Now, let’s call the encode/decode functions defined in the WASM from JS!
One thing you need to be careful about is that WASM functions can only return primitive numeric types (integers and floats) to JS.
My encode function wants to return a string, and my decode function wants to return a struct. Neither is supported. What do I do?
To return a string, we first store it as an array of 8-bit unsigned integers (u8) in WASM memory. Then we somehow need to tell JS the address where the string starts and its length, so that JS knows exactly where in the WASM memory to read.
Since a function can only return one value, this is a bit tricky (I had to remind myself a few times that I can’t return an object like {"address": xxx, "length": yyy} in WASM 😅).
One straightforward solution is to create two functions, one returning the pointer and the other the length. For Base8192 encoding, the length of the encoded string is determined by the input’s size, so this is feasible.
fn getEncodeResultLen(inputLength: u32) u32 {
    // rest of the logic
    // ...
    return length;
}

fn encode(input_ptr: [*]const u8, length: usize) ?[*]u8 {
    // encoding logic
    // ...
    return result; // return pointer
}
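On the JS side, the two-function approach might be used roughly as follows. To keep the sketch runnable without the real binary, `fakeExports` is a hypothetical JS stand-in for the Zig exports: it upper-cases the input instead of doing Base8192 encoding, and the fixed memory offsets are arbitrary.

```javascript
// Stand-in for instance.exports.memory (one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical stand-ins for the Zig exports; the real ones live in the WASM binary.
const fakeExports = {
  memory,
  getEncodeResultLen: (inputLength) => inputLength,
  encode: (inputPtr, length) => {
    const bytes = new Uint8Array(memory.buffer);
    const resultPtr = 1024; // arbitrary output location for this sketch
    for (let i = 0; i < length; i++) {
      const b = bytes[inputPtr + i];
      bytes[resultPtr + i] = b >= 0x61 && b <= 0x7a ? b - 32 : b; // a-z -> A-Z
    }
    return resultPtr;
  },
};

function callEncode(exports, text) {
  const input = new TextEncoder().encode(text);
  const inputPtr = 0; // assumption: offset 0 is free; a real module would allocate
  new Uint8Array(exports.memory.buffer).set(input, inputPtr);

  // Two calls: one for the length, one for the pointer.
  const len = exports.getEncodeResultLen(input.length);
  const ptr = exports.encode(inputPtr, input.length);

  // View the result bytes in WASM memory and turn them into a JS string.
  return new TextDecoder().decode(new Uint8Array(exports.memory.buffer, ptr, len));
}

console.log(callEncode(fakeExports, "hello")); // HELLO
```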
But this doesn’t work well for decoding, as we only know the length of the result after the decoding is done. When Base8192 detects invalid sequences, it skips them until it reaches valid data again, so the length is unknown until decoding finishes.
What I did was to prefix the string with its length. It looks like this:
The function returns a pointer to the prefixed string. JS reads the first 4 bytes (32 bits), which hold the length of the output. JS can then read the string, starting from the fifth byte, for the given length.
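Reading such a length-prefixed string from JS could look like the sketch below. It assumes the 4-byte length is stored little-endian (WASM memory is little-endian) and that the string is UTF-8; `writePrefixed` is a hypothetical helper that only plays the role of the WASM side.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical stand-in for the WASM side: writes [u32 length][bytes] at ptr.
function writePrefixed(memory, ptr, text) {
  const bytes = new TextEncoder().encode(text);
  new DataView(memory.buffer).setUint32(ptr, bytes.length, true); // little-endian
  new Uint8Array(memory.buffer).set(bytes, ptr + 4);
  return ptr;
}

// JS side: read the length from the first 4 bytes, then the string after them.
function readPrefixed(memory, ptr) {
  const len = new DataView(memory.buffer).getUint32(ptr, true);
  const bytes = new Uint8Array(memory.buffer, ptr + 4, len);
  return new TextDecoder().decode(bytes);
}

const ptr = writePrefixed(memory, 16, "Hello World!!");
console.log(readPrefixed(memory, ptr)); // Hello World!!
```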
For the decode function, I wanted to return the result string and the indices of the detected errors (Base8192 is self-synchronising, meaning it can detect errors and still decode the rest of the input). It would look like this:
{
  "result": "Hello World!!",
  "errors": [2, 4]
}
I could have taken a similar approach, prefixing the result with the lengths of the result and the errors array. But I got lazy 😅 and decided to just return JSON as a string and prefix it with its length.
This comes with some performance penalty on the JS side for running JSON.parse(), but I can accept that trade-off for ease of implementation.
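With a length-prefixed JSON string, the JS side only needs to read the prefix and call `JSON.parse`. A minimal sketch, assuming a little-endian 4-byte length and a UTF-8 payload; the payload is written from JS here purely to simulate what the decoder would leave in memory, and `readDecodeResult` is a hypothetical name.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });

// Simulate the WASM side: a length-prefixed JSON string at an arbitrary offset.
const json = JSON.stringify({ result: "Hello World!!", errors: [2, 4] });
const payload = new TextEncoder().encode(json);
const ptr = 64;
new DataView(memory.buffer).setUint32(ptr, payload.length, true);
new Uint8Array(memory.buffer).set(payload, ptr + 4);

// JS side: read the prefix, decode the bytes, parse the JSON.
function readDecodeResult(memory, ptr) {
  const len = new DataView(memory.buffer).getUint32(ptr, true);
  const text = new TextDecoder().decode(new Uint8Array(memory.buffer, ptr + 4, len));
  return JSON.parse(text);
}

const decoded = readDecodeResult(memory, ptr);
console.log(decoded.result); // Hello World!!
console.log(decoded.errors); // [ 2, 4 ]
```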
Encode the encoder in the encoding
Finally, we can self-encode the encoder.
The code below reads the encoder file and stores it in sourceBinary. sourceBinary is then used both to create a WASM instance and as the input of the encode_w_binary function.
import fs from "node:fs";

async function encode_self() {
  // Read the compiled encoder binary
  const sourceBinary = fs.readFileSync("./zig-out/bin/base8192.wasm");
  const typedArray = new Uint8Array(sourceBinary);
  // Create a WASM instance from the binary
  const WASM = await WebAssembly.instantiate(typedArray, { env: {} });
  const WASMInstance = WASM.instance;
  const WASMMemory = WASMInstance.exports.memory;
  // Encode the binary with the WASM instance
  const result = await encode_w_binary(sourceBinary, WASMInstance, WASMMemory);
  // Write the encoded text to encoder.js as an exported string
  fs.writeFileSync("./encoder.js", `export const encoded = "${result}";`);
}
And here it is. The result is stored in encoder.js like below (I managed to fit 10,000+ characters into one screenshot):
Beautiful 😊
You can see the actual file on GitHub too: https://raw.githubusercontent.com/karintomania/Base8192/refs/heads/main/encoder.js
Benchmarking
Although performance is not the reason I used WASM, I expected some improvement with the WASM version of the encoder.
I created a 2 MB CSV file and measured how long it takes to encode it.
Here are the results:
| | Time (ms) | Memory Usage (MB) |
|---|---|---|
| JS | 220 | 187 |
| WASM | 658 | 142 |
Memory usage is taken from the time command’s maximum resident set size.
Surprisingly, the JS version is faster! I ran the test multiple times but the result didn't change. I assume encoding is the kind of task where the JIT can do a great job.
Memory usage is lower in the WASM version as expected.
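For reference, a timing measurement along these lines can be sketched with `performance.now()`. The `timeEncode` helper, the run count and the dummy encoder below are all placeholders, not the script used for the numbers above.

```javascript
// Time an encoder over several runs and report the median, in milliseconds.
function timeEncode(encodeFn, input, runs = 5) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    encodeFn(input);
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

// Placeholder workload: 2 MB of bytes and a trivial "encoder" that sums them.
const input = new Uint8Array(2 * 1024 * 1024).fill(65);
const dummyEncode = (bytes) => bytes.reduce((sum, b) => sum + b, 0);

console.log(`${timeEncode(dummyEncode, input).toFixed(1)} ms`);
```

Taking the median over a few runs is one way to reduce the effect of JIT warm-up on a single measurement.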
Optimising the WASM
I couldn’t accept the fact that my WASM was slower than JS 😅 and I remembered that Zig has multiple optimisation options.
Zig has a build option for optimisation; by specifying it, you get a binary optimised for a particular goal.
The standard options are:
- Debug (better debug info)
- ReleaseFast (optimised for speed)
- ReleaseSmall (optimised for binary size)
- ReleaseSafe (optimised, with runtime safety checks enabled)
The option I had used was ReleaseSmall, to keep the binary small so that the demo site loads quickly. But for the benchmark, why not use the fastest option?
After switching to the ReleaseFast option, this is the result:
| | Time (ms) | Memory Usage (MB) |
|---|---|---|
| JS | 220 | 187 |
| WASM (small) | 658 | 142 |
| WASM (fast) | 183 | 140 |
This is about 3.6 times as fast (658 → 183 ms), and WASM is now faster than JS! I’m really happy with this result.
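The speed-up can be worked out directly from the table; the helper name below is just for illustration.

```javascript
// Timings in milliseconds, taken from the table above.
const timings = { js: 220, wasmSmall: 658, wasmFast: 183 };

// How many times faster `after` is compared with `before`.
const speedup = (before, after) => before / after;

console.log(speedup(timings.wasmSmall, timings.wasmFast).toFixed(2)); // 3.60
console.log(speedup(timings.js, timings.wasmFast).toFixed(2));        // 1.20
```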
Conclusion
If you are interested in Base8192, here is the interactive demo site, which runs the WASM binary:
https://karintomania.github.io/Base8192/
You might wonder what the point of this article is.
Will Base8192 replace Base64? No.
But by creating my own encoding and self-encoding the encoder, I learned more about encoding, WASM and a lot of low-level stuff through Zig!
I hope you enjoyed this article.




