DEV Community: Smersh

ESPB: WASM-like bytecode interpreter for ESP32 with seamless FreeRTOS integration. Part 2: The JIT Compiler

Smersh — Sun, 22 Feb 2026 13:00:50 +0000

Hi.

Exactly 3 months have passed since the first publication.
During this time, I've shaken things up significantly: I added a full-fledged JIT for Xtensa and RISC-V and implemented a lot of optimizations in the translator. I tested everything on ESP32, ESP32-C3, and ESP32-C6 chips. The C6 got only limited testing: I ran just the main test there, while primary debugging was done on the first two.

Here are the main innovations.

1. Fast Symbols: Killing strcmp in Linking

Among the new features are Fast Symbols, which complement the standard symbol tables. There are two of them: one for system (ESP-IDF) functions and a custom one for your own functions.
The core idea is to remove function name strings from the binary and keep only the pointers. This requires strict coordination between the table and the translator, so that the translator knows exactly which index to emit when a function is called via libffi. It reduces the space used in flash memory and eliminates the slow strcmp during module loading.
At runtime, linking turns into an instant address lookup in a flat array.
But what about Kconfig?
In ESP-IDF, components can be disabled in menuconfig (for example, GPIO support can be cut out). If we simply removed a function from the array, all subsequent indices would shift, and the wrong function would be called. This issue is solved with a macro:

// idf_fast.sym
ESPB_SYM("printf", (const void*)&printf)
ESPB_SYM("vTaskDelay", (const void*)&vTaskDelay)
// If GPIO is disabled in menuconfig, the macro substitutes NULL but keeps the index!
ESPB_SYM_OPT(CONFIG_ESPB_IDF_GPIO, "gpio_set_level", (const void*)&gpio_set_level)

The array size and the order of indices remain absolutely stable regardless of the firmware configuration.

2. JIT Compiler

The second feature is the JIT. I decided that the best approach is to let the developer manually mark the specific functions that should be translated into machine code.
ESPB was designed from the start as a register machine (up to 256 virtual registers). All the heavy algorithmic work (graph coloring, register allocation) is done by the C# translator on the PC. The ESPB runtime on the microcontroller is left with the simplest task: generating Xtensa or RISC-V instructions.
How it works:

In the C/C++ script code, the developer marks a heavy function with the JIT_HOT macro.
The translator sees this and sets the ESPB_FUNC_FLAG_HOT flag in the function header within the .espb file.
When instantiating the module, the runtime allocates memory via heap_caps_malloc(size, MALLOC_CAP_EXEC) (memory where execution is permitted).
The JIT engine generates the binary code and places the pointer in the table.
Cold code (e.g., one-time initialization) remains in Flash memory as bytecode, saving expensive IRAM.

P.S. Surprisingly, implementing JIT for the Xtensa architecture was the hardest part due to its register window ABI and literal pools.

3. Moment of Truth: ESPB vs WAMR

I went to the trouble of preparing a wasm-micro-runtime (WAMR) project from Espressif that runs a Fibonacci(85) test identical to the one I use for ESPB.

Tests were conducted on an ESP32-C3 chip (160 MHz):

The pure ESPB interpreter is currently slower than the WAMR interpreter. My efforts here weren't enough, and there is room for growth. ESPB still lacks super-opcodes (several actions baked into a single instruction), and both the .espb translator and the interpreter can be optimized much further.
The good news is that hot code works. JIT-compiled, it runs more than twice as fast as WAMR's best mode, judging by this single test, of course. Note that WAMR on ESP32 has no JIT compiler at all, only the Classic and Fast interpreter modes.
The WAMR developers have clearly optimized their interpreters meticulously, which commands respect. For now, the interpreter comparison does not favor the creation of a suffering indie coder.
I am not considering WAMR's AOT mode, since the main idea is a single bytecode that runs on all systems.
Another direction for development (besides optimization) is emerging here, which I think of as "AOT on Device". The idea is to compile all the code with the JIT, store the result in a flash partition, and then execute it from there via XIP (Execute In Place). All of this needs to be generously seasoned with a GOT (Global Offset Table), so that the AOT image keeps working after the main firmware is updated via OTA. I need to run experiments first, but I think this direction is viable. If it works out, I'm actually considering making it the main mode.

4. Battle for Memory

I compiled five firmware variants for ESP32-C3: from an empty "Hello World" to "full option."

Figures from the build report (idf.py size):

What we see:

Smallest Engine: The pure ESPB interpreter (No JIT) takes up less space in Flash memory than even the most basic WAMR Classic (~2.5 KB less).
Note: WAMR was built with the Lib pthread, Libc builtin, and Libc WASI options and the normal loader mode.
Cost of JIT: Enabling the JIT compiler in ESPB increases the firmware size (Flash) by approximately 56 KB. Static RAM consumption (DRAM) does not change.
DRAM: All runtimes add about 11–12 KB to RAM consumption relative to an empty project.

Script Sizes:

Size of the .wasm file: 1277 bytes (integers use the compact variable-length LEB128 encoding).
Size of the .espb file: 1511 bytes (fixed-size fields for speed).
The JIT code generated for the two test functions occupied 2494 bytes of IRAM.

5. FFI: Death to "Glue Code"

Simple functions are easy to call everywhere. But the real pain begins when you need to use callbacks. Imagine a task: create a software FreeRTOS timer (xTimerCreate) that calls a function inside your script when triggered.
Let's see how this is solved in WAMR and ESPB.
WAMR: Architectural Pain
WASM is isolated from the microcontroller's memory. You cannot simply pass a pointer to a function into FreeRTOS because the native code doesn't know where to look for this function inside the virtual machine.

Step 1. Write the script (Guest side).
We cannot pass the function directly. We have to pass its index in the table.

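// WASM (Guest)
#include <stdint.h>

typedef void (*timer_cb_t)(uint32_t, uint32_t);

// Import implemented by the host bridge from Step 2
uint32_t xTimerCreate_native(const char *name, uint32_t period, uint32_t reload,
                             uint32_t id, uint32_t cb_func_idx);

static void test_timer_cb(uint32_t wasm_handle, uint32_t timer_id) { /* ... */ }

void start_timer(void) {
    // Get function index (in wasm32 this is not an address, but an index!)
    timer_cb_t cb_ptr = test_timer_cb;
    uint32_t cb_func_idx = (uint32_t)(uintptr_t)cb_ptr;
    // Call custom wrapper, passing index instead of pointer
    xTimerCreate_native("tmr", 2000, 1, 0, cb_func_idx);
}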

Step 2. Write the "Bridge" in firmware (Host side).
This is the scary part. We need to create a context structure, write a native wrapper for timer creation, and a native callback adapter.

// Host (Firmware)

// 1. Context to pass arguments through
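typedef struct {
    wasm_exec_env_t cb_exec_env;
    uint32_t        cb_func_idx;
    uint32_t        wasm_handle;   // passed back to the script as argv[0]
    uint32_t        timer_id;      // passed back to the script as argv[1]
    // ... more fields (e.g. the module instance)
} wasm_timer_ctx_t;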

// 2. Native callback adapter
static void native_timer_callback(TimerHandle_t xTimer) {
    wasm_timer_ctx_t *ctx = (wasm_timer_ctx_t *)pvTimerGetTimerID(xTimer);
    uint32_t argv[2] = { ctx->wasm_handle, ctx->timer_id };

    // Manual interpreter call
    wasm_runtime_call_indirect(ctx->cb_exec_env, ctx->cb_func_idx, 2, argv);
}

// 3. Wrapper over xTimerCreate
static uint32_t native_xTimerCreate(wasm_exec_env_t exec_env, 
                                    const char *name, uint32_t period, 
                                    uint32_t reload, uint32_t id, 
                                    uint32_t cb_idx) {
    // ... need to allocate context, save env, create timer ...
    // ... pass native_timer_callback instead of real callback ...
    return (uint32_t)handle;
}

// 4. Registration with scary signatures
static NativeSymbol native_symbols[] = {
    { "xTimerCreate_native", native_xTimerCreate, "($iiii)i", NULL }
};

Result: ~100 lines of code just to start one timer.

ESPB: Zero Glue Code
In ESPB, I solved this problem at the system level. The translator knows that xTimerCreate accepts a callback. At runtime, libffi generates a trampoline in IRAM on the fly, which FreeRTOS sees as an ordinary C function.

Step 1. Write the script.
It's just standard C code. We pass the test_timer_cb function as is.

// ESPB (Script)
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/timers.h"

static void test_timer_cb(TimerHandle_t xTimer) {
    printf("Timer tick!\n");
}

void app_main(void) {
    // Call standard FreeRTOS API
    TimerHandle_t t = xTimerCreate("tcb", pdMS_TO_TICKS(2000), 
                                   pdTRUE, NULL, test_timer_cb);
    if (t) xTimerStart(t, 0);
}

Step 2. Add to firmware.
We don't need wrappers. We simply export 3 functions: timer creation, getting the timer ID (for context), and the generic command function (since xTimerStart is a macro over xTimerGenericCommand).

// Host (Firmware) - Symbol Table
ESPB_SYM("xTimerCreate", (const void*)&xTimerCreate)
ESPB_SYM("pvTimerGetTimerID", (const void*)&pvTimerGetTimerID)
ESPB_SYM("xTimerGenericCommand", (const void*)&xTimerGenericCommand)

Result: 0 lines of glue code (only symbol registration). You write the script as if it were part of the firmware. The runtime itself understands that a function pointer was passed and creates a native closure for it.

I uploaded the WAMR project to GitHub:
https://github.com/smersh1307n2/wamr

6. Developer Experience: No "Header Hell"

Usually, development for custom VMs is painful. The IDE doesn't see system headers (FreeRTOS), autocompletion doesn't work, and to compile a script, you have to manually specify hundreds of paths to include directories.
I solved this problem radically, by using the standard ESP-IDF build system.
You write script code in a normal C/C++ project inside VS Code. IntelliSense, code navigation, and error highlighting work because the project is configured as a genuine ESP-IDF application.
For bytecode compilation, I wrote a PowerShell script, get-ir-cmake.ps1, that does the magic:

Pulls actual build flags directly from your project's CMake.
Compiles script files using Clang into LLVM Bitcode (.bc).
Links the result (llvm-link) into a single .bc file, ready to be sent to the translator.

You only have to write code and mark critical sections with JIT_HOT.

7. Translation (Desktop Client)
I created the ESPB Desktop Client. This is a lightweight utility that works in conjunction with a cloud translator.
You simply feed the client the required files.
The client sends them to the cloud, where the server performs register optimization, calculates metadata for FFI, and replaces string function names with indices (Fast Symbols). The finished .espb file is then saved to the location specified in the ESPB Desktop Client.

On that note, allow me to take my leave.

Online Translator:
http://espb.runasp.net/
Interpreter Repository:
https://github.com/smersh1307n2/ESPB
Project for preparing LLVM IR: https://github.com/smersh1307n2/ESP32_PRJ_TO_LLVM
ESPB_Desktop_Client:
https://github.com/smersh1307n2/ESPB_Desktop_Client
I also recorded a video supplement:
https://www.youtube.com/watch?v=UbcuU-mabLs

ESPB: WASM-like bytecode interpreter for ESP32 with seamless FreeRTOS integration.

Smersh — Wed, 26 Nov 2025 05:16:57 +0000

Hi.

I want to present a project born from a long search for a way to dynamically load code onto a running ESP32 device. I think many have researched this direction.
It all started with a practical task and, once again, a "can I do it?" challenge. In short, I built a device that switches pumps based on their operating hours and manages the make-up system of an individual heating unit. It connects to a phone via Bluetooth for monitoring and configuration. At some point, I wanted to extend its logic with new control schemes directly from the phone, without recompiling or re-flashing the main firmware. And so it began...

The Agony of Choice: Why Not WASM, Lua, or Something Else?

I considered standard solutions but rejected them for various reasons. In the end, the concepts of an ELF Loader and WASM caught my attention.
ELF Loader: This allows loading native code and executing it at maximum speed, with only a table of function pointers (let's call it a symbol table) on the firmware side. However, the resulting ELF file is tightly coupled to the architecture: code compiled for an ESP32-S3 (Xtensa) will not run on an ESP32-C3 (RISC-V). I wanted universality: "one binary for the entire lineup."
WebAssembly (WASM): A sufficiently fast and interesting technology whose bytecode is not tied to a specific architecture. However, anyone who has tried to call a native function like xTaskCreate from WASM and pass a callback to it knows what a pain it is. It requires writing a huge amount of "glue code" and manually registering imports/exports. I wanted to write standard C code using the standard ESP-IDF APIs and have it "just work." This is how the idea for ESPB (ESP Bytecode) was born.

What is ESPB?

It's an ecosystem consisting of a Translator (which turns your C, and possibly C++, code into bytecode) and an Interpreter (a virtual machine running on the microcontroller).
The main feature of the project is its seamless integration with the native API. By using a symbol table and a custom implementation of libffi, ESPB allows calling FreeRTOS functions (timers, tasks) directly from a loaded module without writing any wrappers.

How It Works Under the Hood:

Translator (based on LLVM)
I didn't write my own compiler from scratch; instead, I used LLVM. The process looks like this:
You write code in a standard ESP-IDF project.
You compile it with clang into LLVM Intermediate Representation (.bc file).
The Translator analyzes this .bc file and generates .espb bytecode as output.
The magic happens in the third step. The Translator performs complex work: it conducts a deep static analysis of the IR to understand the semantics of native function calls.
For example, it sees a call to xTaskCreate and understands that:
the first argument is a pointer to a function that will become the task body;
the last argument is a pointer to a TaskHandle_t, meaning it's an output (OUT) parameter.
Based on this analysis, the translator automatically generates special metadata:
cbmeta section: Information for the interpreter on how to correctly create a "trampoline" for the callback (my_task).
immeta section: Instructions for the interpreter on how to marshal OUT parameters—that is, how to safely copy the task handle from native memory back into the virtual machine's memory after the call.
It is this automatic analysis that eliminates the need to write tons of "glue code" manually.
What If the Automation Fails? The .hints Files
I aimed to make the translator as "smart" as possible. As mentioned, it performs a deep static analysis of LLVM IR, trying to automatically determine the semantics of calls: which pointer is an output (OUT), where the callback function is, and where its user data is.
However, automatic analysis is not omnipotent. There will always be non-standard APIs or complex cases where the heuristics fail.
This is precisely why I introduced .hints files. These are simple text files that can be "fed" to the translator along with the .bc file. They allow you to manually "hint" to the translator how to correctly handle a particular function.

How does it work?

Suppose you have a native function my_complex_api(char* out_buffer, int size, my_callback_t cb). If the translator couldn't automatically determine that out_buffer is an output parameter, you can simply add one line to a .hints file:

File my_project.hints

my_complex_api: out 0, cb 2

This entry tells the translator:
"For the function my_complex_api...
...the parameter at index 0 is an output (out).
...and the parameter at index 2 is a callback function (cb)."

This way, you get full control over the generation of FFI metadata, correcting any inaccuracies of the automatic analysis without changing a single line in the translator's source code.
Interpreter (on the device)
This is a virtual machine that executes the .espb file. It was designed from the ground up specifically for the ESP32 series of microcontrollers.

Key implementation features:

Custom libffi with "trampolines" placed in IRAM: I took the libffi library as a basis and adapted it to support the Xtensa and RISC-V architectures for this VM. A key feature of my adaptation is a special allocator that places the executable code of closures ("trampolines" for callbacks) in fast IRAM (Instruction RAM). This is critically important: the trampoline must live in executable memory so that native code (for example, a FreeRTOS timer) can invoke the callback like an ordinary C function.
Register-based machine with a shadow stack: Unlike stack-based VMs, ESPB uses a register-based model. This is closer to the architecture of real processors and allows for the generation of more efficient bytecode. For maximum compactness, register indices in instructions are encoded in just one byte. All operations with the call stack and local variables of functions occur in a special "shadow stack"—a dedicated buffer in RAM, which ensures isolation and predictability.

Memory Isolation:

Isolated Linear Memory and a Private Heap: For each ESPB module, the interpreter allocates a contiguous block of RAM—linear memory. This block becomes the full address space for the executed bytecode. This is where:
Static data is copied: All global variables, string literals, and constant arrays from your C code are placed in this memory when the module is loaded.
A private heap operates: When your ESPB code calls malloc, calloc, or free, it is actually accessing a memory manager (espb_heap_manager) that manages memory allocation within this same isolated block. This prevents fragmentation of the global ESP-IDF heap and increases system stability.
Stack variables are placed: The alloca instruction also allocates memory in this area, emulating the behavior of a native stack.
Vibecoding and the Role of Neural Networks
This project is the result of so-called "vibecoding." It has been a long and rather difficult journey since May. Neural networks helped implement it, and only this symbiosis allowed me to take on system programming at this level.

How to Try It?

I tried to make the process as similar as possible to standard ESP32 development. There is a template project, ESP32_PRJ_TO_LLVM. This is effectively a standard ESP-IDF project. You write your code in it, include libraries, and debug. The get-ir-cmake.ps1 script extracts the .bc file from the build system. This file is fed into the online translator (link below), which outputs a ready-to-use .espb file. The .espb file is placed in the firmware (or uploaded via Wi-Fi/UART) and executed by the interpreter. So far, I have only tested hard-coding the .espb file along with the .bin. To obtain the .bc, I used the clang version included with ESP-IDF 5.4. It's worth noting that the translator supports clang versions no higher than 20.1.2.
Example of What Works "Out of the Box"
The most interesting part is that you can write the following code, compile it into bytecode, and it will work:

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include <stdbool.h>
#include <stdio.h>

// Task body: FreeRTOS calls it through a native trampoline created by the runtime
void my_task(void* pvParam) {
    while(1) {
        printf("Hello from dynamic code!\n");
        vTaskDelay(1000 / portTICK_PERIOD_MS);
    }
}


void app_main(int argc, char* argv[], char* envp[])
{
    // Pass the script function directly, no wrapper needed
    xTaskCreate(my_task, "dyn_task", 4048, NULL, 5, NULL); 

    while (true)
    {
        vTaskDelay(1000 / portTICK_PERIOD_MS);
    }
}

Example symbol table in the interpreter

static const EspbSymbol cpp_symbols[] = {
    { "printf", (const void*)&printf },         
    { "puts", (const void*)&puts },
    { "vTaskDelay", (const void*)&vTaskDelay },
    { "xTaskCreatePinnedToCore", (const void*)&xTaskCreatePinnedToCore },
    { "xTimerCreate", (const void*)&xTimerCreate },
    { "pvTimerGetTimerID", (const void*)&pvTimerGetTimerID },
    { "xTimerGenericCommand", (const void*)&xTimerGenericCommand },
    { "xTaskGetTickCount", (const void*)&xTaskGetTickCount },
    { "vTaskDelete", (const void*)&vTaskDelete },
    // ... and other necessary functions
    ESP_ELFSYM_END
};

Tested on Hardware

I didn't limit myself to simulators. The entire system was tested and debugged on real devices to ensure cross-architecture compatibility:
ESP32 (dual-core Xtensa LX6)
ESP32-C3 (single-core RISC-V)
ESP32-C6 (single-core RISC-V)
The same .espb file successfully launched and ran on all these platforms, confirming the main idea—the universality of the executable code.

Project Status and Links

The current implementation of the interpreter does not yet support JIT or AOT—it is a pure interpreter. The project is in an active Proof of Concept (PoC) stage but can already execute quite complex logic. Future plans include polishing, bug fixing, and optimization.

Online Translator: http://espb.runasp.net/
Interpreter Repository: https://github.com/smersh1307n2/ESPB
Project for preparing LLVM IR: https://github.com/smersh1307n2/ESP32_PRJ_TO_LLVM

For the Online Translator, I need to add translation statistics output to make it clear how the cbmeta and immeta sections are formed. The site is also in its infancy: for now, it essentially just translates your file to .espb and contains an auto-generated description.
I would be glad to receive any criticism and advice.