Baalateja Kataru

Originally published at bkataru.bearblog.dev

Building AI Infrastructure for a Post-Framework World

2025 was the year I stopped treating AI infrastructure like a prototype problem. The field matured fast—we went from barely having tools to drowning in incompatible ones. Turns out trading "not enough options" for "too many fragmented options" isn't actually an upgrade.

The Fragmentation Problem

Every major AI framework built its own protocol stack. Anthropic's MCP, OpenAI's function calling, LangChain's abstractions—each one solves similar problems while creating vendor lock-in, protocol silos, and guaranteed rewrites down the line.

What I learned building these projects: we need infrastructure that outlives any single framework. Better primitives, not more abstractions.

Technical Achievements: A Year in Review

1. PocketFlow-Zig: Flow-Based Agent Framework

Repository: PocketFlow-Zig

Started as a side project, ended up becoming the official Zig implementation of PocketFlow—a minimalist framework for building LLM-powered workflows. The design goal was explicit: build just enough framework to let agents construct other agents.

What I Built:

  • Zero dependencies, pure Zig
  • Flow-based programming (think Unix pipes for LLM workflows)
  • Comptime-generated agent templates (zero runtime cost)
  • Cross-compilation for embedded targets

PocketFlow now runs across multiple platforms as a base layer for agentic workflows.
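
Here's a minimal sketch of the comptime idea (hypothetical node names, not PocketFlow-Zig's actual API): nodes are plain types exposing a run function, and Chain composes them at compile time, so the pipeline itself carries no runtime dispatch cost.

    const std = @import("std");

    // Hypothetical sketch, not PocketFlow-Zig's actual API: nodes are
    // plain types exposing `run`; Chain glues two of them together at
    // comptime, so the composition costs nothing at runtime.
    fn Chain(comptime A: type, comptime B: type) type {
        return struct {
            pub fn run(input: []const u8) []const u8 {
                return B.run(A.run(input));
            }
        };
    }

    const Summarize = struct {
        // A real node would call an LLM here; the sketch passes through.
        pub fn run(input: []const u8) []const u8 {
            return input;
        }
    };

    const Classify = struct {
        pub fn run(input: []const u8) []const u8 {
            return input;
        }
    };

    pub fn main() void {
        const Pipeline = Chain(Summarize, Classify);
        std.debug.print("{s}\n", .{Pipeline.run("draft text")});
    }

Because nodes are ordinary types, larger agent graphs compose the same way.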

2. TOON-Zig: Efficient Token Serialization

Repository: toon-zig

TOON (Token Oriented Object Notation) is a data serialization format optimized for LLM consumption. Built the Zig implementation from scratch:

What I Built:

  • 340 official TOON 2.0 spec tests (100% passing)
  • 196 decoding fixtures + 144 encoding fixtures
  • 15-30% better token efficiency than JSON for typical LLM payloads
  • Zero dependencies, pure Zig

Getting spec compliance while keeping performance high took some careful engineering.
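
Most of the savings come from TOON's tabular encoding of uniform arrays: field names appear once in a header instead of once per object. An illustrative payload (my own example, not taken from the fixture suite):

    JSON (field names repeat per element):

    {"users":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"}]}

    TOON (header declares the fields once):

    users[2]{id,name}:
      1,Ada
      2,Lin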

3. HF-Hub-Zig: HuggingFace Integration Layer

Repository: hf-hub-zig

Zig needed native HuggingFace Hub API support, especially for GGUF model management. So I built it.

What It Does:

  • GGUF model discovery and search
  • Efficient model downloading with resume capability (resume step sketched below)
  • Local model registry management
  • Cross-platform CLI interface

How It Works:

  • Zero dependencies, pure Zig
  • Efficient streaming downloads with progress tracking
  • Robust error handling for network failures
  • Memory-mapped file access for large models
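
The resume step is the interesting bit: stat whatever is already on disk, then ask the server for only the remaining bytes via an HTTP Range header. A minimal sketch of that piece (hypothetical helper, not hf-hub-zig's actual code):

    const std = @import("std");

    // Hypothetical sketch of the resume step, not hf-hub-zig's actual
    // code: given how many bytes are already on disk, build the HTTP
    // Range header that requests only the remainder.
    fn rangeHeader(allocator: std.mem.Allocator, downloaded: u64) ![]u8 {
        return std.fmt.allocPrint(allocator, "bytes={d}-", .{downloaded});
    }

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        const allocator = gpa.allocator();

        // In the real flow this would come from stat()-ing the partial file.
        const already_on_disk: u64 = 1_048_576;

        const header = try rangeHeader(allocator, already_on_disk);
        defer allocator.free(header);
        std.debug.print("Range: {s}\n", .{header}); // Range: bytes=1048576-
    }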

4. Zenmap: Cross-Platform Memory Mapping

Repository: zenmap

A single-file Zig library for memory mapping large files, built specifically for GGUF model handling.

The Challenge:

  • Unified POSIX (Linux, macOS, BSD) and Windows memory mapping APIs (dispatch pattern sketched below)
  • Zero-copy file access for 70B+ parameter models
  • Efficient handling of sparse files and partial loads
  • Cross-platform abstraction with minimal overhead
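
The cross-platform piece reduces to one comptime branch per OS family, so the non-matching backend never even gets compiled for the target. A stripped-down sketch of the dispatch pattern (not zenmap's actual API):

    const std = @import("std");
    const builtin = @import("builtin");

    // Sketch of the dispatch pattern, not zenmap's actual code. POSIX
    // targets would call mmap(2) behind this branch; Windows would go
    // through CreateFileMappingW + MapViewOfFile. The switch resolves at
    // compile time, so only one backend exists in the binary.
    const backend_name: []const u8 = switch (builtin.os.tag) {
        .windows => "CreateFileMappingW/MapViewOfFile",
        else => "mmap/munmap",
    };

    pub fn main() void {
        std.debug.print("memory-mapping backend: {s}\n", .{backend_name});
    }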

5. Igllama: Zig-Based Ollama Alternative

Repository: igllama

A Zig-based alternative to Ollama that uses the Zig build system's embedded Clang toolchain to compile and run llama.cpp.

What Makes It Different:

  • Zig build system integration with llama.cpp's CMake build (see the build.zig sketch below)
  • Docker-like CLI experience for GGUF management
  • Transparent llama.cpp updates (no lagging behind upstream)
  • Cross-platform binary distribution
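
Igllama drives llama.cpp's CMake build, but the capability underneath is that zig build ships a full Clang. As a simpler illustration of that idea, a hypothetical build.zig (illustrative paths and flags, not igllama's actual build script) can compile vendored C/C++ sources straight into the binary with no external toolchain:

    // Hypothetical build.zig sketch, not igllama's actual build script.
    const std = @import("std");

    pub fn build(b: *std.Build) void {
        const target = b.standardTargetOptions(.{});
        const optimize = b.standardOptimizeOption(.{});

        const exe = b.addExecutable(.{
            .name = "igllama",
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        });

        // Compile vendored llama.cpp sources with Zig's embedded Clang.
        exe.addCSourceFiles(.{
            .files = &.{"vendor/llama.cpp/ggml.c"}, // illustrative path
            .flags = &.{"-O3"},
        });
        exe.linkLibC();
        exe.linkLibCpp();

        b.installArtifact(exe);
    }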

Production Work at Dirmacs

I've been working with Suprabhat Rapolu and the team at Dirmacs Global Services to ship production AI infrastructure in Rust:

A.R.E.S: Production Agent Framework

Repository: ares

A production agentic chatbot library built in Rust with:

  • Multi-provider LLM support
  • Tool calling and ReAct loops
  • RAG knowledge bases
  • MCP protocol integration
  • Leptos-based embedded web UI

Current Status: Actively dogfooding this in pilot projects. Design goals keep evolving.

Daedra: Web Research MCP Server

Repository: daedra

A high-performance DuckDuckGo-powered web search MCP server written in Rust. It gives AI assistants web search and page fetching as tools.

Features:

  • No API keys needed (uses DuckDuckGo's public API)
  • Rust rewrite of an old TypeScript version
  • Ships production-ready: cargo install daedra
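
Under MCP, an assistant discovers and invokes those tools over JSON-RPC 2.0. A call looks roughly like this on the wire (hypothetical tool name and arguments; daedra's docs define the real schema):

    {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "tools/call",
      "params": {
        "name": "web_search",
        "arguments": { "query": "zig memory mapping", "max_results": 5 }
      }
    }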

Lancor: LLaMA.cpp Client Library

Repository: lancor

A Rust client library for llama.cpp's OpenAI-compatible API server.

Goal: Simple, straightforward integration with existing Rust AI workflows.
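
"OpenAI-compatible" here means llama.cpp's server speaks the familiar chat-completions shape, so the client's job is mostly producing requests like this (illustrative values):

    POST /v1/chat/completions

    {
      "model": "local-gguf",
      "messages": [
        { "role": "user", "content": "Summarize this repo in one line." }
      ]
    }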

What Ties This Together

All these projects share a common thread: explicit control over system boundaries. In a world where frameworks come and go, infrastructure needs to:

  1. Outlive any single vendor - Protocol-level interop
  2. Show explicit costs - No magic, no hidden abstractions
  3. Stay fast - Zero-cost abstractions where possible
  4. Run everywhere - Build once, deploy anywhere

What's Next for 2026

Foundation's built. Now comes production hardening:

  • Protocol work: Add gRPC and custom transports to zig-utcp
  • Production features: Streaming, error handling, resource management
  • Distributed systems: Get agents coordinating across networks
  • Real-world testing: Pilot deployments, feedback loops

The goal stays the same: build infrastructure that lets developers construct robust AI systems without getting locked into any framework.


All projects are MIT licensed and open source. Pull requests and technical discussions welcome.


