Baalateja Kataru

Originally published at bkataru.bearblog.dev

Building AI Infrastructure for a Post-Framework World

2025 was the year I stopped treating AI infrastructure like a prototype problem. The field matured fast—we went from barely having tools to drowning in incompatible ones. Turns out trading "not enough options" for "too many fragmented options" isn't actually an upgrade.

The Fragmentation Problem

Every major AI framework built its own protocol stack. Anthropic's MCP, OpenAI's function calling, LangChain's abstractions—each one solves similar problems while creating vendor lock-in, protocol silos, and guaranteed rewrites down the line.

What I learned building these projects: we need infrastructure that outlives any single framework. Better primitives, not more abstractions.

Technical Achievements: A Year in Review

1. PocketFlow-Zig: Flow-Based Agent Framework

Repository: PocketFlow-Zig

Started as a side project, ended up becoming the official Zig implementation of PocketFlow—a minimalist framework for building LLM-powered workflows. The design goal was explicit: build just enough framework to let agents construct other agents.

What I Built:

  • Zero dependencies, pure Zig
  • Flow-based programming (think Unix pipes for LLM workflows)
  • Comptime-generated agent templates (zero runtime cost)
  • Cross-compilation for embedded targets

PocketFlow now runs across multiple platforms as a base layer for agentic workflows.
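
Here's a minimal sketch of the comptime idea (hypothetical node names, not PocketFlow-Zig's actual API): nodes are plain types exposing a run function, and Chain composes them at compile time, so the pipeline itself carries no runtime dispatch cost.

    const std = @import("std");

    // Hypothetical sketch, not PocketFlow-Zig's actual API: nodes are
    // plain types exposing `run`; Chain glues two of them together at
    // comptime, so the composition costs nothing at runtime.
    fn Chain(comptime A: type, comptime B: type) type {
        return struct {
            pub fn run(input: []const u8) []const u8 {
                return B.run(A.run(input));
            }
        };
    }

    const Summarize = struct {
        // A real node would call an LLM here; the sketch passes through.
        pub fn run(input: []const u8) []const u8 {
            return input;
        }
    };

    const Classify = struct {
        pub fn run(input: []const u8) []const u8 {
            return input;
        }
    };

    pub fn main() void {
        const Pipeline = Chain(Summarize, Classify);
        std.debug.print("{s}\n", .{Pipeline.run("draft text")});
    }

Because nodes are ordinary types, larger agent graphs compose the same way.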

2. TOON-Zig: Efficient Token Serialization

Repository: toon-zig

TOON (Token Oriented Object Notation) is a data serialization format optimized for LLM consumption. Built the Zig implementation from scratch:

What I Built:

  • 340 official TOON 2.0 spec tests (100% passing)
  • 196 decoding fixtures + 144 encoding fixtures
  • 15-30% better token efficiency than JSON for typical LLM payloads
  • Zero dependencies, pure Zig

Getting spec compliance while keeping performance high took some careful engineering.
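
Most of the savings come from TOON's tabular encoding of uniform arrays: field names appear once in a header instead of once per object. An illustrative payload (my own example, not taken from the fixture suite):

    JSON (field names repeat per element):

    {"users":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"}]}

    TOON (header declares the fields once):

    users[2]{id,name}:
      1,Ada
      2,Lin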

3. HF-Hub-Zig: HuggingFace Integration Layer

Repository: hf-hub-zig

Zig needed native HuggingFace Hub API support, especially for GGUF model management. So I built it.

What It Does:

  • GGUF model discovery and search
  • Efficient model downloading with resume capability (resume step sketched below)
  • Local model registry management
  • Cross-platform CLI interface

How It Works:

  • Zero dependencies, pure Zig
  • Efficient streaming downloads with progress tracking
  • Robust error handling for network failures
  • Memory-mapped file access for large models
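
The resume step is the interesting bit: stat whatever is already on disk, then ask the server for only the remaining bytes via an HTTP Range header. A minimal sketch of that piece (hypothetical helper, not hf-hub-zig's actual code):

    const std = @import("std");

    // Hypothetical sketch of the resume step, not hf-hub-zig's actual
    // code: given how many bytes are already on disk, build the HTTP
    // Range header that requests only the remainder.
    fn rangeHeader(allocator: std.mem.Allocator, downloaded: u64) ![]u8 {
        return std.fmt.allocPrint(allocator, "bytes={d}-", .{downloaded});
    }

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        const allocator = gpa.allocator();

        // In the real flow this would come from stat()-ing the partial file.
        const already_on_disk: u64 = 1_048_576;

        const header = try rangeHeader(allocator, already_on_disk);
        defer allocator.free(header);
        std.debug.print("Range: {s}\n", .{header}); // Range: bytes=1048576-
    }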

4. Zenmap: Cross-Platform Memory Mapping

Repository: zenmap

A single-file Zig library for memory mapping large files, built specifically for GGUF model handling.

The Challenge:

  • Unified POSIX (Linux, macOS, BSD) and Windows memory mapping APIs (dispatch pattern sketched below)
  • Zero-copy file access for 70B+ parameter models
  • Efficient handling of sparse files and partial loads
  • Cross-platform abstraction with minimal overhead
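
The cross-platform piece reduces to one comptime branch per OS family, so the non-matching backend never even gets compiled for the target. A stripped-down sketch of the dispatch pattern (not zenmap's actual API):

    const std = @import("std");
    const builtin = @import("builtin");

    // Sketch of the dispatch pattern, not zenmap's actual code. POSIX
    // targets would call mmap(2) behind this branch; Windows would go
    // through CreateFileMappingW + MapViewOfFile. The switch resolves at
    // compile time, so only one backend exists in the binary.
    const backend_name: []const u8 = switch (builtin.os.tag) {
        .windows => "CreateFileMappingW/MapViewOfFile",
        else => "mmap/munmap",
    };

    pub fn main() void {
        std.debug.print("memory-mapping backend: {s}\n", .{backend_name});
    }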

5. Igllama: Zig-Based Ollama Alternative

Repository: igllama

A Zig-based alternative to Ollama that uses the Zig build system's embedded Clang toolchain to compile and run llama.cpp.

What Makes It Different:

  • Zig build system integration with llama.cpp's CMake build (see the build.zig sketch below)
  • Docker-like CLI experience for GGUF management
  • Transparent llama.cpp updates (no lagging behind upstream)
  • Cross-platform binary distribution
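
Igllama drives llama.cpp's CMake build, but the capability underneath is that zig build ships a full Clang. As a simpler illustration of that idea, a hypothetical build.zig (illustrative paths and flags, not igllama's actual build script) can compile vendored C/C++ sources straight into the binary with no external toolchain:

    // Hypothetical build.zig sketch, not igllama's actual build script.
    const std = @import("std");

    pub fn build(b: *std.Build) void {
        const target = b.standardTargetOptions(.{});
        const optimize = b.standardOptimizeOption(.{});

        const exe = b.addExecutable(.{
            .name = "igllama",
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        });

        // Compile vendored llama.cpp sources with Zig's embedded Clang.
        exe.addCSourceFiles(.{
            .files = &.{"vendor/llama.cpp/ggml.c"}, // illustrative path
            .flags = &.{"-O3"},
        });
        exe.linkLibC();
        exe.linkLibCpp();

        b.installArtifact(exe);
    }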

Production Work at Dirmacs

I've been working with Suprabhat Rapolu and the team at Dirmacs Global Services to ship production AI infrastructure in Rust:

A.R.E.S: Production Agent Framework

Repository: ares

A production agentic chatbot library built in Rust with:

  • Multi-provider LLM support
  • Tool calling and ReAct loops
  • RAG knowledge bases
  • MCP protocol integration
  • Leptos-based embedded web UI

Current Status: Actively dogfooding this in pilot projects. Design goals keep evolving.

Daedra: Web Research MCP Server

Repository: daedra

A high-performance DuckDuckGo-powered web search MCP server written in Rust. It gives AI assistants web search and page fetching as tools.

Features:

  • No API keys needed (uses DuckDuckGo's public API)
  • Rust rewrite of an old TypeScript version
  • Ships production-ready: cargo install daedra
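
Under MCP, an assistant discovers and invokes those tools over JSON-RPC 2.0. A call looks roughly like this on the wire (hypothetical tool name and arguments; daedra's docs define the real schema):

    {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "tools/call",
      "params": {
        "name": "web_search",
        "arguments": { "query": "zig memory mapping", "max_results": 5 }
      }
    }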

Lancor: LLaMA.cpp Client Library

Repository: lancor

A Rust client library for llama.cpp's OpenAI-compatible API server.

Goal: Simple, straightforward integration with existing Rust AI workflows.
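
"OpenAI-compatible" here means llama.cpp's server speaks the familiar chat-completions shape, so the client's job is mostly producing requests like this (illustrative values):

    POST /v1/chat/completions

    {
      "model": "local-gguf",
      "messages": [
        { "role": "user", "content": "Summarize this repo in one line." }
      ]
    }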

What Ties This Together

All these projects share a common thread: explicit control over system boundaries. In a world where frameworks come and go, infrastructure needs to:

  1. Outlive any single vendor - Protocol-level interop
  2. Show explicit costs - No magic, no hidden abstractions
  3. Stay fast - Zero-cost abstractions where possible
  4. Run everywhere - Build once, deploy anywhere

What's Next for 2026

Foundation's built. Now comes production hardening:

  • Protocol work: Add gRPC and custom transports to zig-utcp
  • Production features: Streaming, error handling, resource management
  • Distributed systems: Get agents coordinating across networks
  • Real-world testing: Pilot deployments, feedback loops

The goal stays the same: build infrastructure that lets developers construct robust AI systems without getting locked into any framework.


All projects are MIT licensed and open source. Pull requests and technical discussions welcome.


