A Raspberry Pi displays a smart mirror. Clock in the top corner. Weather below. Notifications slide in from the bottom. Each panel is a separate process — if weather crashes, the clock keeps ticking.
This is gogpu/compose: a Pure Go library that lets independent processes render UI into offscreen buffers and ship pixels to a single compositor over Unix sockets.
go get github.com/gogpu/compose
The Problem
Desktop applications are monolithic. One process, one crash domain. A bug in the notification panel takes down the entire display.
Wayland solved this at the OS level — every application is a separate client, and the compositor (KWin, Mutter, Sway) combines their surfaces. Android SurfaceFlinger does the same thing for mobile apps.
But what if you need this pattern inside your application? A dashboard where each data source is an independent process. A kiosk where third-party modules render into slots. A plugin host where untrusted code runs in its own crash domain.
There was no Go library for this. You had to roll your own socket protocol, frame encoding, connection management, flow control. Every project reinvented the same plumbing.
What We Built
compose is a two-sided library: a Server (compositor) accepts connections, a Client (module) publishes frames. One import, one socket path.
Module side
client, err := compose.Dial("/tmp/compose.sock",
	compose.WithName("clock"),
	compose.WithFrameSize(400, 120),
	compose.WithFPS(1),
)
if err != nil {
	log.Fatalf("dial compositor: %v", err)
}
defer client.Close()
// Render only when the compositor asks for a frame.
client.OnFrameRequest(func() {
	dc := gg.NewContext(400, 120)
	dc.SetRGB(1, 1, 1)
	dc.DrawString(time.Now().Format("15:04:05"), 20, 60)
	client.PublishFrame(compose.Frame{
		Pixels: dc.ImageRGBA().Pix,
		Width:  400,
		Height: 120,
	})
})
Compositor side
srv, err := compose.Listen("/tmp/compose.sock",
	compose.WithMaxModules(8),
	compose.WithCompression("lz4"),
)
if err != nil {
	log.Fatalf("listen: %v", err)
}
// layout stands for the application's own placement code.
srv.OnFrame(func(f compose.Frame) {
	layout.Place(f.Name, f.Pixels)
})
srv.OnConnect(func(id uint64, name string) {
	log.Printf("module connected: %s", name)
})
srv.OnDisconnect(func(id uint64, name string) {
	log.Printf("module gone: %s", name)
})
That's it. The module renders into a pixel buffer, publishes it. The compositor receives frames, positions them on screen. Hot-plug is automatic — modules come and go without restarting the compositor.
The Wire Protocol
Every frame travels as a 64-byte header followed by pixel data.
┌──────────────────────────────────────────────────────────┐
│ Magic "COMP" │ Version │ MsgType │ Flags │
│ ModuleID │ Sequence │ Timestamp │
│ Width │ Height │ Stride │ DirtyRect (x,y,w,h) │
│ PixelFormat │ Compression │ PayloadSize │ Uncompressed │
└──────────────────────────────────────────────────────────┘
64 bytes, little-endian, cache-line aligned
The header is fixed-size — one io.ReadFull call parses it. No variable-length fields, no schema negotiation mid-stream. Encode and decode take under 10 ns and under 25 ns respectively, with zero allocations.
The header carries a dirty rectangle (the bounds of the region that changed), compression flags, and monotonic sequence numbers for frame ordering.
Handshake
When a module connects, it sends a 128-byte HelloMsg (name, dimensions, preferred FPS). The compositor replies with a 128-byte WelcomeMsg (assigned module ID, granted transport). Fixed-size messages — no JSON parsing, no protobuf dependency, no allocation.
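A fixed-size handshake message can be packed the same way. The sketch below assumes a 64-byte NUL-padded name followed by three 32-bit fields, with the rest of the 128 bytes left as zero padding; the hello type, its field order, and the name width are illustrative guesses, not compose's actual HelloMsg layout.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	helloSize = 128 // message size stated in the article
	nameSize  = 64  // assumed width of the NUL-padded name field
)

// hello carries the fields the article lists for HelloMsg.
type hello struct {
	Name          string
	Width, Height uint32
	FPS           uint32
}

var errNameTooLong = errors.New("hello: name exceeds 64 bytes")

// marshal packs h into a fixed 128-byte array; unused bytes stay zero.
func (h hello) marshal() ([helloSize]byte, error) {
	var buf [helloSize]byte
	if len(h.Name) > nameSize {
		return buf, errNameTooLong
	}
	copy(buf[:nameSize], h.Name)
	le := binary.LittleEndian
	le.PutUint32(buf[64:68], h.Width)
	le.PutUint32(buf[68:72], h.Height)
	le.PutUint32(buf[72:76], h.FPS)
	return buf, nil
}

// unmarshalHello reverses marshal, trimming the name at the first NUL.
func unmarshalHello(buf [helloSize]byte) hello {
	name := buf[:nameSize]
	if i := bytes.IndexByte(name, 0); i >= 0 {
		name = name[:i]
	}
	le := binary.LittleEndian
	return hello{
		Name:   string(name),
		Width:  le.Uint32(buf[64:68]),
		Height: le.Uint32(buf[68:72]),
		FPS:    le.Uint32(buf[72:76]),
	}
}

func main() {
	msg, err := hello{Name: "clock", Width: 400, Height: 120, FPS: 1}.marshal()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(msg), unmarshalHello(msg))
}
```

Note that converting the name back to a Go string does allocate in this sketch; a decoder that wants to avoid that would keep the name as a byte array.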
LZ4 Compression
GUI pixels compress exceptionally well. Buttons, backgrounds, text labels — large flat-color regions that LZ4 handles at near-memcpy speed.
| Frame Size | Encode throughput | Decode throughput | Size reduction |
|---|---|---|---|
| 400×120 (small module) | 2,753 MB/s | 1,175 MB/s | 99.6% |
| 1920×1080 (full HD) | 2,728 MB/s | 1,446 MB/s | varies |
A 192 KB module frame (400×120 RGBA) compresses to ~810 bytes. At 60 FPS, that's roughly 48 KB/s over the socket, trivial for any transport.
Compression is optional. Static modules (clock, weather) benefit most. Animated modules with random pixel patterns can disable it with a flag in the frame header.
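The numbers in this section can be checked with a few lines of arithmetic. The helper names below are made up for the example, and the 810-byte compressed size is the approximate figure quoted above, not a value measured here.

```go
package main

import "fmt"

// rawFrameBytes returns the size of an uncompressed frame.
func rawFrameBytes(width, height, bytesPerPixel int) int {
	return width * height * bytesPerPixel
}

// savings returns the fraction of bytes removed by compression.
func savings(raw, compressed int) float64 {
	return 1 - float64(compressed)/float64(raw)
}

// bytesPerSecond returns the bandwidth needed at a given frame rate.
func bytesPerSecond(frameBytes, fps int) int {
	return frameBytes * fps
}

func main() {
	raw := rawFrameBytes(400, 120, 4) // RGBA, 4 bytes per pixel
	const compressed = 810            // approximate figure from the article
	bps := bytesPerSecond(compressed, 60)

	fmt.Printf("raw frame: %d bytes\n", raw)
	fmt.Printf("savings:   %.1f%%\n", 100*savings(raw, compressed))
	fmt.Printf("bandwidth: %d B/s (%.1f KiB/s)\n", bps, float64(bps)/1024)
}
```

This prints a raw size of 192000 bytes, savings of 99.6%, and 48600 B/s, which is about 47.5 KiB/s.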
Pull-Based Flow Control
We studied how Wayland handles frame pacing. The compositor sends a frame callback — "I'm ready, give me a frame." The client renders only when asked.
compose follows this pattern:
- Compositor → Module: FrameRequest
- Module renders → sends Frame
- Compositor processes → sends next FrameRequest
If the module is slow, the compositor displays the last frame. If the compositor is slow, the module blocks — no wasted work. After 3 consecutive missed responses, the flow controller halves the request rate (adaptive backoff).
This is the opposite of the naive "push frames as fast as possible" model, which floods slow compositors and wastes CPU on frames nobody will ever see.
Architecture: Enterprise internal/
The public API has fewer than 15 exported declarations — 5 types, 2 constructors, 5 option functions, and a handful of sentinel errors. Everything else is hidden behind internal/ packages:
compose/ # Public: Listen, Dial, Frame, options
├── internal/protocol/ # Wire format (100% coverage)
├── internal/codec/ # Raw + LZ4 (97% coverage)
├── internal/conn/ # Module lifecycle (98.9% coverage)
├── internal/flow/ # Pull-based pacing (100% coverage)
└── internal/transport/socket/ # Unix sockets (95.1% coverage)
The dependency graph is a clean DAG: protocol is the leaf (imported by all, imports nothing), codec, conn, and flow are independent of one another, and transport/socket imports only protocol. This layering rules out cycles by construction.
This isn't an accident. We designed compose the way database/sql is designed: public Server/Client wrap internal implementations. Users never import sub-packages. Adding a shared memory transport (Phase 3) won't change the public API: it's a new internal/transport/shm/ behind an existing WithSharedMemory() option.
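The pattern of a public type wrapping a private transport, selected by a functional option, looks roughly like this in a single file. Everything here is a toy stand-in that borrows the article's vocabulary (dial, withName, withSharedMemory, and the transport interface are all hypothetical); it is not compose's implementation.

```go
package main

import "fmt"

// transport stands in for what an internal/transport package might expose.
type transport interface {
	name() string
	send(pixels []byte) error
}

type socketTransport struct{}

func (socketTransport) name() string      { return "socket" }
func (socketTransport) send([]byte) error { return nil } // would write to a Unix socket

type shmTransport struct{}

func (shmTransport) name() string      { return "shm" }
func (shmTransport) send([]byte) error { return nil } // would copy into shared memory

type config struct {
	moduleName string
	sharedMem  bool
}

// option mutates the config: the usual functional-options pattern.
type option func(*config)

func withName(n string) option { return func(c *config) { c.moduleName = n } }
func withSharedMemory() option { return func(c *config) { c.sharedMem = true } }

// client is the only type callers see; the transport stays private.
type client struct {
	cfg config
	tr  transport
}

func dial(opts ...option) *client {
	cfg := config{moduleName: "module"}
	for _, o := range opts {
		o(&cfg)
	}
	var tr transport = socketTransport{}
	if cfg.sharedMem {
		tr = shmTransport{}
	}
	return &client{cfg: cfg, tr: tr}
}

func (c *client) publishFrame(pixels []byte) error { return c.tr.send(pixels) }

func main() {
	a := dial(withName("clock"))
	b := dial(withName("video"), withSharedMemory())
	fmt.Println(a.tr.name(), b.tr.name()) // socket shm
	_ = a.publishFrame(make([]byte, 4))
}
```

The caller's code is identical for both transports apart from one extra option, which is the property the paragraph above describes.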
By the Numbers
| Metric | Value |
|---|---|
| Source code | 2,897 LOC |
| Test code | 5,619 LOC |
| Total | 8,516 LOC |
| Test cases | 171 |
| Packages | 6 |
| Files | 38 |
| Coverage (internal avg) | 98% |
| Header encode | < 10 ns, 0 allocs |
| Header decode | < 25 ns, 0 allocs |
| LZ4 encode throughput | 2.7+ GB/s |
| Socket throughput | 5.9+ GB/s |
| Dependencies | 1 (pierrec/lz4/v4) |
| CGO | Zero |
| CI | Ubuntu + macOS + Windows |
| Lint (30+ rules) | 0 issues |
Nearly twice as much test code as source code (5,619 vs. 2,897 LOC). Every internal package is independently testable with its own benchmark suite.
Who Needs This
Smart mirrors and kiosks. Independent modules (time, weather, transit, notifications) render into slots. A crash in one module doesn't affect the display.
Modular dashboards. Each data source is its own process, possibly developed by different teams. The compositor arranges them on screen.
Plugin hosts. Third-party plugins run in separate processes. The host application composites their output without trusting their code. f4 by @unxed — a full Go reimplementation of Far Manager — already uses --gui=gogpu for GPU-accelerated rendering and has an out-of-process plugin architecture (MessagePack RPC supporting Go, Python, Rust, Node.js, C++, Lua). compose could serve as the pixel transport between f4's plugin host and its GUI modules.
Cross-language UIs. The wire protocol is language-agnostic. A Rust module or a Python script can participate — anything that writes RGBA to a Unix socket.
KiGo by @AgentNemo00 — a modular Go application — is already using offscreen rendering with multi-process composition and is evaluating compose v0.1.0.
These aren't hypothetical users waiting for a stable release. They're building real software on the GoGPU stack right now — and finding real bugs that make the ecosystem better. Every project that adopts GoGPU accelerates the path to enterprise grade. If you have a desktop app, a dashboard, a kiosk, a file manager, an IDE panel — there's never been a better time to try it in Pure Go.
What's Next
Phase 2: Reference examples. Three separate binaries — compositor, clock module, notification module — demonstrating the full multi-process workflow.
Phase 3: Shared memory transport. Triple-buffer ring in mmap'd memory with atomic slot states. Zero-copy pixel transfer for 60 FPS at 1080p. The public API stays the same — you just pass WithSharedMemory().
Phase 4: Delta frames. Send only the dirty rectangle's pixels, not the full frame. The protocol already carries dirty rects in the header — we just need the compositor-side texture cache.
Try It
go get github.com/gogpu/compose
- GitHub: github.com/gogpu/compose
- Architecture: docs/ARCHITECTURE.md
- GoDoc: pkg.go.dev/github.com/gogpu/compose
- Discussion: RFC #177
Part of the GoGPU Ecosystem
compose is the newest member of GoGPU — 800K+ lines of Pure Go GPU computing:
| Library | Purpose |
|---|---|
| wgpu | Pure Go WebGPU (Vulkan/Metal/DX12/GLES) |
| naga | Shader compiler (WGSL → SPIR-V/MSL/GLSL/HLSL/DXIL) |
| gg | 2D graphics with GPU acceleration |
| ui | Enterprise GUI toolkit (22+ widgets, 4 themes) |
| gogpu | Application framework, windowing |
| compose | Multi-process composition (this library) |
| systray | System tray (Win32/macOS/Linux) |
| audio | Pure Go audio engine |
Help Us Get to Enterprise Grade
The ecosystem grows faster with every pair of eyes on it. Here's how you can help:
- Test it. Run go get github.com/gogpu/compose, build a two-process prototype, tell us what breaks. Edge cases on your OS, your hardware, your use case: that's what we can't find ourselves.
- Validate the API. Does Listen/Dial/PublishFrame feel right? Is the wire protocol missing a field you need? The API is still v0.x, so now is the time to reshape it, before it freezes.
- Propose and discuss. Open an issue or join the compose RFC discussion. Real use cases drive the roadmap, not hypotheticals.
- Spread the word. Star the repo, share this article, mention it in your Go meetup. The more developers use the ecosystem, the faster it reaches enterprise grade.
- Contribute. Every PR matters — from typo fixes to new transport implementations. See CONTRIBUTING.md.
The entire GoGPU ecosystem — 800K+ lines of Pure Go — is built by a small team. But the libraries are production-ready enough that people are building real products on them: a Far Manager rewrite with GPU rendering, an ML framework fully migrated to our WebGPU stack, a Quake 1 engine port running on gogpu/wgpu Vulkan (demo), modular app platforms. The more developers who start building on GoGPU — whether it's a new project or porting an existing one — the faster the entire ecosystem matures. Your project's edge cases become our test suite. Your feature requests become our roadmap.
Start building. We'll make sure the foundation holds.
Building what Go "can't do." One library at a time.