Traditional microservices pay a massive tax on serialization and network overhead. WASM microservices eliminate this toll completely — inter-service calls in nanoseconds instead of milliseconds. But to understand how we got here, let's start at the beginning: your deployment pipeline has layers. Too many of them.
Your code lives inside a runtime (JVM, Node, Python), which runs inside a container (Docker), managed by an orchestrator (Kubernetes), hosted on a VM, which finally runs on actual hardware somewhere in Virginia. Each layer was added to solve a real problem. But together, they add weight, cold start times, and more moving parts that can break.
However, there is a trend quietly dismantling this complexity. It starts with something surprisingly simple: a single file. And it ends with something that could change how we think about microservices forever: WASM microservices.
Part 1 — The Single Binary
Rust compiles your code into a single executable. If you compile against musl instead of glibc, the resulting binary has exactly zero system dependencies. Everything your application needs is packed into that file. There is no JVM. No node_modules. It doesn't even need the C standard library installed on the target machine. You can drop it into a FROM scratch Docker image — literally an empty filesystem with nothing but your executable. You copy it to a server and run it. That's the deployment.
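To make this concrete, here is a minimal two-stage build, assuming a standard Cargo project whose binary is named `app` (the binary name and paths are illustrative; the musl target name is the standard Rust one):

```dockerfile
# Build stage: compile a fully static Linux binary against musl.
FROM rust:1-alpine AS build
WORKDIR /src
COPY . .
RUN rustup target add x86_64-unknown-linux-musl && \
    cargo build --release --target x86_64-unknown-linux-musl

# Final stage: an empty filesystem containing nothing but the executable.
FROM scratch
COPY --from=build /src/target/x86_64-unknown-linux-musl/release/app /app
ENTRYPOINT ["/app"]
```

The final image is the size of the binary itself — there is no OS layer to patch, scan, or pull.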
Languages like Go do something very similar (even packing their own garbage collector into the static file), and the payoff is the same: no heavy base image.
The size difference is hard to ignore:
| Stack | Artifact Size | Runtime Dependencies | Cold Start |
|---|---|---|---|
| Rust (musl static) | ~5–10 MB | None | < 10 ms |
| Go (static) | ~10–20 MB | None | < 10 ms |
| Java (Spring Boot) | ~50–200 MB | JVM (~200 MB) | Seconds |
| Node.js (Next.js) | ~200–500 MB | Node Runtime (~100 MB) | Seconds |
| Python (Django) | ~100–300 MB | Python + C libs | Seconds |
This isn't an artificial benchmark. It's what happens when you remove layers between your code and the machine. No interpreter, no classloading, no dependency resolution at startup.
It's worth noting that the tools infrastructure engineers build for themselves are almost always single binaries: Kubernetes, Docker, Terraform, Prometheus, CockroachDB, Caddy, Hugo, ripgrep. These are people who deal with deployment complexity every single day. They chose not to inflict it upon themselves.
The single binary is a real, proven win. Less to deploy, less to break, faster to start, cheaper to run.
But it's not the end of the story. It's the beginning.
Because no matter how much you optimize each service individually, in a microservices architecture, the bulk of the overhead isn't inside the services. It's between them: serialization, HTTP over TLS, deserialization, and starting all over again at the next hop. And yes, this still applies if you use gRPC with Protobuf instead of JSON — binary serialization is faster, but you still pay the physical toll of the network: TCP, TLS, latency, service mesh sidecars if you have them. The network is still the bottleneck. Multiply that by every jump in the chain, and you have a system where communication can cost more than computation.
Microservices exist for good reasons — independent deployment, team autonomy, fault isolation. You can't just merge everything back into a monolith. What you need is the isolation of separate services with the speed of a function call.
That is exactly what WASM microservices are. And to understand them, we first need to talk about WebAssembly.
Part 2 — WebAssembly, Fast
Before we get to the interesting part, let's make it clear what WebAssembly (WASM) is, without the hype.
WebAssembly is a bytecode format. You write code in Rust, Go, C, Python, or other languages, compile it into a .wasm file, and a WebAssembly runtime executes it. Think of it like Java's .class files or .NET's IL, but designed to be universal rather than tied to a single-language ecosystem.
Three properties matter for our story:
- It's portable: The exact same .wasm file runs on Linux, macOS, Windows, in a browser, on a server, or on a Raspberry Pi. Compile once, run anywhere.
- It's sandboxed: A WASM module cannot do anything by default. It cannot read files, it cannot open network connections, it cannot access memory outside its own sandbox. You have to explicitly grant it permissions. It is the exact opposite of a normal process, which can do everything unless you restrict it.
- It's fast: WASM runs at near-native speed. It's not "fast after warming up the JIT for 1000 calls." It is consistently close to the performance of native C/Rust code.
Now, a WASM module that only knows how to do math operations in its own isolated memory isn't very useful. That's why WASI exists — WebAssembly System Interface. WASI gives WASM modules controlled access to system capabilities: reading files, opening sockets, getting the current time. It is the standard library that WASM lacks on its own.
With WASI, you can compile a real application — an HTTP server, a CLI tool, a data pipeline — to WASM and run it on any platform that has a WASM runtime. That is already incredibly useful. But it's not the reason we are here.
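As a sketch of how little the code has to change, here is an ordinary Rust program that compiles unmodified for `wasm32-wasip1` (the standard Rust WASI target); when run under a WASI runtime, the clock access below is routed through a WASI call the host must grant:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Plain std code: compiled with `cargo build --target wasm32-wasip1`,
// the system-clock access becomes a WASI capability that the host
// runtime explicitly provides -- no code changes required.
fn greeting() -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_secs();
    format!("Hello from WASI! Unix time: {secs}")
}

fn main() {
    // Runs natively, or under any WASI runtime (e.g. `wasmtime hello.wasm`).
    println!("{}", greeting());
}
```

The same .wasm artifact then runs under Wasmtime on a laptop, a server, or an edge node.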
We are here for what WASI 0.2 introduced in 2024: the Component Model.
Part 3 — WASM Microservices: Services Calling Each Other Like Functions
Here we reach the core concept. A WASM microservice is a WebAssembly component that acts like a classic microservice — it has its own responsibility, its own isolation, it deploys independently — but it communicates with other WASM microservices via typed function calls in shared memory. Not via HTTP. Not over the network. Functions.
The piece that makes this possible is the Component Model, introduced with WASI 0.2. It defines a standardized way for WASM modules to declare what they offer and what they need, using a small interface description language called WIT (WebAssembly Interface Types):
```wit
// A CSV parser component declares what it exports
package myapp:parser;

interface csv-parser {
  record row {
    fields: list<string>,
  }

  parse: func(raw-data: list<u8>) -> list<row>;
}

world parser-component {
  export csv-parser;
}
```

```wit
// A prediction component declares what it needs and what it offers
package myapp:ml;

interface predictor {
  use myapp:parser/csv-parser.{row};

  record prediction {
    label: string,
    confidence: f64,
  }

  predict: func(rows: list<row>) -> list<prediction>;
}

world ml-component {
  import myapp:parser/csv-parser; // "I need a parser"
  export predictor;               // "I offer predictions"
}
```
This looks like an interface definition (like OpenAPI or Protobuf), and it is. But there's a fundamental difference: these components don't communicate over the network. They are linked at runtime.
When Component A calls a function in Component B, what actually happens is:
- Component A places data into a shared memory region.
- The runtime invokes Component B's function.
- Component B reads the data, processes it, and places the result back.
- Component A reads the result.
No JSON serialization. No building an HTTP request. No TLS handshake. No TCP socket. No network. Just a function call with data that is already in memory.
The cost of that call is measured in nanoseconds. The cost of an HTTP call between microservices is measured in milliseconds. That's a difference of several orders of magnitude.
But doesn't this break isolation?
No. And this is the part that makes it all work.
Each WASM Component has its own isolated linear memory space. Component A cannot read or write to Component B's internal memory under any circumstances. The only way to interact is through the explicit interfaces defined in WIT. The runtime mediates and secures every single call between components.
You get the security boundary of traditional microservices — no component can corrupt another's state — with the performance of an in-process function call. That is a WASM microservice: the isolation of a microservice, the cost of a function call. It's like two people passing documents through a secure teller window: they can exchange data through a well-defined opening, but neither can enter the other's office.
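The teller-window analogy can be sketched in plain Rust. This is a toy model, not a real WASM runtime: the two component structs, the `linked_call` mediator, and the fixed 0.9 confidence score are all invented for illustration. The one property it mirrors faithfully is that data crosses the boundary by value — never as a pointer into the other side's memory:

```rust
// Toy analogy of the Component Model's call semantics: each "component"
// owns a private buffer no other component can touch, and all data moves
// between them by copy, mediated by a "runtime" function.

struct ParserComponent {
    private_memory: Vec<String>, // invisible to other components
}

struct MlComponent {
    private_memory: Vec<String>, // likewise private
}

impl ParserComponent {
    fn parse(&mut self, raw: &[u8]) -> Vec<String> {
        let text = String::from_utf8_lossy(raw);
        let rows: Vec<String> = text.lines().map(|l| l.to_string()).collect();
        self.private_memory.push(format!("parsed {} rows", rows.len()));
        rows // returned BY VALUE: the caller receives a copy, not a pointer
    }
}

impl MlComponent {
    fn predict(&mut self, rows: Vec<String>) -> Vec<(String, f64)> {
        self.private_memory.push(format!("scored {} rows", rows.len()));
        rows.into_iter().map(|r| (r, 0.9)).collect()
    }
}

// The "runtime" mediates the call: arguments and results are copied
// across the boundary, so neither side can reach into the other's state.
fn linked_call(
    parser: &mut ParserComponent,
    ml: &mut MlComponent,
    input: &[u8],
) -> Vec<(String, f64)> {
    let rows = parser.parse(input);
    ml.predict(rows)
}

fn main() {
    let mut parser = ParserComponent { private_memory: vec![] };
    let mut ml = MlComponent { private_memory: vec![] };
    let preds = linked_call(&mut parser, &mut ml, b"a\nb\nc");
    println!("{preds:?}");
}
```

In the real Component Model the runtime performs these copies through the canonical ABI, but the shape of the interaction is the same: typed values in, typed values out, private memory untouched.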
Composition: Multiple WASM microservices in a single file
This is where the technology unleashes its full potential. You can take WASM microservices written in different languages and compose them into a single binary at build time:
```shell
# parser.wasm   (compiled from Rust)
# ml-model.wasm (compiled from Python)
# reporter.wasm (compiled from Go)
$ wasm-tools compose parser.wasm ml-model.wasm reporter.wasm -o pipeline.wasm
$ ls -lh pipeline.wasm
-rw-r--r--  1 rafa  staff  2.1M pipeline.wasm
```
Three "services", written in three different languages, with strict typed contracts between them, composed into a single 2 MB file. You can deploy that file on a server, on an edge node, or in a browser. No orchestrator, no service mesh, no network between them.
What if you need to replace the ML model? You recompile just that component, recompose the binary, and redeploy. The parser and reporter remain unchanged. You have independent deployability at the component level, not at the network service level.
WASM microservices in practice: Fermyon Spin
Spin is probably the most mature framework for building WASM microservices today. It defines itself as a "framework for building and running event-driven microservice applications with WebAssembly components." Spin was accepted into the CNCF Sandbox (the same foundation that hosts Kubernetes), and following Fermyon's acquisition by Akamai in December 2025, it is backed by one of the largest edge networks in the world.
Here is a Spin application in Rust:
```rust
use spin_sdk::http::{IntoResponse, Request, Response};
use spin_sdk::http_component;

#[http_component]
fn handle_request(req: Request) -> anyhow::Result<impl IntoResponse> {
    let body = format!("Hello from a WASM component! Path: {}", req.path());
    Ok(Response::builder()
        .status(200)
        .header("content-type", "text/plain")
        .body(body)
        .build())
}
```
```shell
$ spin build
$ spin up
Serving http://127.0.0.1:3000
```
That component weighs kilobytes and boots in under a millisecond. A Spin application can have dozens of WASM microservices, each handling different routes, written in different languages, and composed into a single deployment unit.
Spin 3.0 also added selective deployments: platform engineers can repackage the exact same WASM microservices into different deployment topologies without touching a single line of component code. Need the parser and the ML model bundled together on one node, but the reporter separated on another? Reconfigure, recompose, done. This is structurally impossible with traditional containers without rewriting your code.
Part 4 — Where Are WASM Microservices Today?
This is not just a W3C specification gathering dust. WASM is already running at massive scale in production across several patterns, each of which exploits a different property of the technology.
Plugin systems: Running third-party code without risk
This is probably the most mature use case. Shopify Functions allows developers in its ecosystem to inject custom logic into Shopify's backend (discounts, shipping rules, checkout validations). Each function is a WASM module running in a strict sandbox within Shopify's infrastructure. The partner has no access to the OS, the network, or the memory of other functions. They only receive input data, process it, and return a result.
Why WASM and not containers? Because Shopify needs to execute code from thousands of third parties on critical paths like checkout, where every millisecond of latency means lost revenue. One container per function doesn't scale. A WASM module that boots in microseconds and runs at near-native speed does. (Shopify is also a Bytecode Alliance member and created Javy, the toolchain that compiles JavaScript to WASM, now widely used across the industry).
Edge computing and serverless: Zero cold starts
Fastly Compute runs WASM in over 79 global datacenters with instantiation times measured in microseconds — not milliseconds, microseconds. Every request creates an isolated WASM instance, executes the logic, and destroys it. No connection pools to maintain, no warm containers eating up idle memory.
Akamai acquired Fermyon (the creators of Spin) in December 2025 to integrate WASM microservices into its network of over 4,000 global edge locations. Before the acquisition, they were already handling 75 million requests per second in production with fractional-millisecond cold starts. When a CDN of that scale buys a WASM company, the technology is no longer experimental.
Cloudflare Workers has been running logic at the edge for years using V8 Isolates (not pure WASM), but the ecosystem's trajectory is the same: push compute as close to the user as possible, using instances that start instantly and weigh almost nothing. The "one container per function" model has unacceptable overhead here. WASM eliminates it.
Heavy embedded compute: Google Sheets
A non-microservice case that perfectly illustrates WASM's potential to redesign heavy systems: Google migrated the Google Sheets calculation engine from JavaScript to Java compiled to WasmGC, achieving a 2x performance improvement. WasmGC allows garbage-collected languages to compile to WASM without shipping their own GC, drastically reducing binary size. When you move a calculation engine used by billions to WASM, the numbers justify it.
IoT and industrial edge
MachineMetrics, an industrial IoT company, uses wasmCloud to move WASM microservices between edge devices and cloud environments with dynamic fault tolerance. If a factory node goes down, components migrate to another node or to the AWS cloud automatically. Try doing that seamlessly with Docker containers in milliseconds. WASM's true portability (the exact same binary runs on an industrial ARM and an x86 cloud server) makes this possible.
Enterprise FaaS
American Express built an internal FaaS platform using wasmCloud. Their primary motivation: pack more functions into the same physical infrastructure while maintaining strict security boundaries, support multiple languages without maintaining dozens of Docker base images, and slash cold starts.
The common thread
None of these companies adopted WASM because of the hype. The common denominator is that they all hit a bottleneck where the traditional model (containers, heavy runtimes, network hops) simply couldn't scale:
- Shopify needed to run third-party code securely and blazingly fast.
- Fastly and Akamai needed distributed compute without container bloat.
- MachineMetrics needed true binary portability across CPU architectures.
- American Express needed extreme density and isolation.
- Google needed pure performance in the browser.
WASM gave them the way out.
Part 5 — The Honest State of WASM Microservices
I've been painting an optimistic picture, so let me be blunt about what is not mature just yet:
- The transition to Async I/O (WASI 0.3): The previous version (WASI 0.2) only supported synchronous I/O, meaning that reading from a socket blocked the entire instance. The arrival of WASI 0.3 in early 2026 has finally brought native asynchronous I/O to the Component Model, which is absolutely critical for high-performance network services. The standard is here, but libraries and languages are still in the process of digesting and adopting this new paradigm.
- The library ecosystem is young: If you need an OAuth2 client, a native PostgreSQL driver, or an image processing library compiled as a WASM Component, you might find it, or you might not. The situation is improving fast (especially in Rust and Go), but it's nowhere near the vastness of npm, crates.io, or Maven Central.
- Tooling is maturing: Debugging a WASM Component isn't as seamless as debugging a regular application. Profiling tools are limited, and IDE support exists but isn't first-class everywhere just yet.
- Not all languages are created equal: In theory, you can mix Rust, Go, Python, and TypeScript. In practice, Rust and Go are first-class citizens. Python and TypeScript work, but they often do so by bundling an entire interpreter inside the WASM module (using tools like Javy or ComponentizeJS), which bloats the binary size and hurts performance. Today, the sweet spot for true high performance is Rust or Go.
- Not everything should be a WASM microservice: If your service is I/O bound (waiting on slow database queries or calling third-party APIs), the inter-service network overhead is not your real bottleneck. WASM microservices shine when services do actual processing and call each other at very high frequencies. For a simple CRUD app that just talks to PostgreSQL, a normal Go single binary is still a fantastic choice.
- You don't have to choose between worlds: wasmCloud can run standalone or on top of Kubernetes clusters. Spin can be deployed alongside your existing container infrastructure thanks to tools like SpinKube, which allows you to run WASM microservices directly on your K8s nodes exactly like normal pods. This isn't a "rip and replace" technology. You just get a new, ultra-lightweight workload type running right next to your legacy containers.
Where Is This Going?
The trajectory is clear.
Containers solved the "it works on my machine" problem in 2013 and became the default deployment unit of the cloud. But they carry the burden of virtualizing an entire operating system for every single service — an abstraction tax we've accepted as normal simply because there was no better alternative.
WASM microservices offer that alternative. They point to a world where the unit of deployment is a sandboxed, portable, composable module measured in kilobytes instead of gigabytes. Where services communicate via typed function calls instead of heavy network protocols. Where you can compose business logic written in multiple languages into a single file that runs instantly anywhere.
We aren't there yet for all workloads. But the path from single binaries (which are already the standard in infra tooling) to WASM microservices (which are production-ready for key use cases) is a straight line. Every step removes a layer of abstraction between your code and the bare metal.
And as we've been exploring throughout this series: every layer you remove is pure performance you get back.



