Agentium: Build Production-Grade AI Agents in TypeScript Without the Bloat

Akash Sengar — Sun, 24 May 2026 07:14:38 +0000

description: A deep dive into Agentium — a TypeScript-first AI agent framework with a layered architecture, built-in memory, tool calling, voice/browser agents, and benchmark results that beat LangChain on cost and tool calling speed.

If you've built anything with LangChain in Node.js, you've probably felt the friction — verbose setup, heavy abstractions, and enough boilerplate to make you question your life choices. Agentium is a TypeScript-first agent framework that tries to fix all of that. It's lean, layered, and surprisingly fast out of the box.

Let's go from zero to a streaming, tool-calling agent — and then dig into what's actually happening under the hood.

Getting Started in 3 Steps

Installation

```bash
npm install @agentium/core openai
export OPENAI_API_KEY=your-key
```

Agentium supports OpenAI, Anthropic, Google, Ollama, and Vertex out of the box. To switch providers, swap in the corresponding provider package.

Step 1: Your First Agent

```typescript
import { Agent, openai } from "@agentium/core";

const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  instructions: "You are a helpful assistant.",
});

// Top-level await needs an ES module context (e.g. "type": "module" in package.json).
const result = await agent.run("What is TypeScript?");
console.log(result.text);
```

That's it. No chains, no pipelines, no ceremony.

Step 2: Add Tools (With Type Safety)

Agentium uses Zod schemas for tool parameters, so everything is fully typed:

```typescript
import { Agent, openai, defineTool } from "@agentium/core";
import { z } from "zod"; // installed separately: npm install zod

const weatherTool = defineTool({
  name: "get_weather",
  description: "Get current weather for a city",
  parameters: z.object({
    city: z.string().describe("City name"),
  }),
  // `city` is typed from the Zod schema; this stub returns canned data.
  execute: async ({ city }) => `Weather in ${city}: 72°F, sunny`,
});

const agent = new Agent({
  name: "weather-bot",
  model: openai("gpt-4o"),
  instructions: "You help users check the weather.",
  tools: [weatherTool],
});

const result = await agent.run("What's the weather in Tokyo?");
console.log(result.text);
```

The agent calls get_weather automatically when needed — no wiring required.
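Under the hood, any tool-calling runtime has to map the tool name the model requests to a handler, validate the arguments, and run it. Below is a minimal, framework-free sketch of that dispatch step. It assumes the model returns each tool call as a name plus a JSON argument string. The `ToolCall`/`ToolDef` shapes and `executeToolCalls` are hypothetical names, not Agentium's types, and a hand-written `parse` function stands in for Zod to keep the example dependency-free.

```typescript
// Hypothetical shapes for illustration; not Agentium's actual types.
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded, as most chat APIs return tool arguments
}

interface ToolDef {
  name: string;
  description: string;
  // Stand-in for a Zod schema: returns parsed args or throws on bad input.
  parse: (raw: unknown) => Record<string, unknown>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

const getWeather: ToolDef = {
  name: "get_weather",
  description: "Get current weather for a city",
  parse: (raw) => {
    const obj = raw as { city?: unknown } | null;
    if (typeof obj?.city !== "string") throw new Error("city must be a string");
    return { city: obj.city };
  },
  execute: async ({ city }) => `Weather in ${city}: 72°F, sunny`,
};

// Look up each requested tool by name, validate its arguments, and run it.
async function executeToolCalls(
  calls: ToolCall[],
  tools: ToolDef[],
): Promise<{ id: string; output: string }[]> {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  return Promise.all(
    calls.map(async (call) => {
      const tool = byName.get(call.name);
      if (!tool) return { id: call.id, output: `Error: unknown tool ${call.name}` };
      try {
        const args = tool.parse(JSON.parse(call.arguments));
        return { id: call.id, output: await tool.execute(args) };
      } catch (err) {
        // Errors go back to the model as text so it can correct itself.
        return { id: call.id, output: `Error: ${(err as Error).message}` };
      }
    }),
  );
}
```

Returning errors as tool output, instead of throwing, is one common design choice: it lets the model retry with corrected arguments rather than aborting the whole run.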

Step 3: Streaming Responses

```typescript
// Reuses the `agent` defined in Step 2.
for await (const chunk of agent.stream("Tell me a joke")) {
  if (chunk.type === "text") {
    process.stdout.write(chunk.text);
  }
}
```

Streaming works for both text and tool-call chunks. Handle chunk.type to branch as needed.
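Branching on `chunk.type` is a discriminated union in TypeScript terms. Here is a self-contained sketch with a stand-in async generator in place of `agent.stream()`. Only the `text` variant appears in the article, so the `tool_call` variant and its `name`/`args` fields are assumptions for illustration.

```typescript
// Hypothetical chunk union; only the "text" variant is confirmed by the article.
type Chunk =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> };

// Stand-in for agent.stream(): one tool call, then text in pieces.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { type: "tool_call", name: "get_weather", args: { city: "Tokyo" } };
  yield { type: "text", text: "It's 72°F " };
  yield { type: "text", text: "and sunny in Tokyo." };
}

// Accumulate text and record tool calls, narrowing on chunk.type.
async function consume(
  stream: AsyncIterable<Chunk>,
): Promise<{ text: string; tools: string[] }> {
  let text = "";
  const tools: string[] = [];
  for await (const chunk of stream) {
    switch (chunk.type) {
      case "text":
        text += chunk.text; // narrowed: chunk.text exists here
        break;
      case "tool_call":
        tools.push(chunk.name); // narrowed: chunk.name exists here
        break;
    }
  }
  return { text, tools };
}
```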

Architecture: It's Actually Well Thought Out

This is where Agentium gets interesting. It's a monorepo with four focused packages:

| Package | What it does |
| --- | --- |
| `@agentium/core` | Agents, tools, memory, voice, browser, MCP/A2A |
| `@agentium/transport` | REST API, Socket.IO, Voice/Browser gateways |
| `@agentium/queue` | BullMQ background job processing |
| `@agentium/browser` | Vision-based browser automation via Playwright |

Use only what you need — each package is independently installable.

The Layered Model

Agentium's internals stack cleanly:

- SDK Layer: Agent, Team, Workflow, VoiceAgent, BrowserAgent
- Engine Layer: LLM loop, tool executor, memory manager (sessions, summaries, user facts, profiles, entities)
- Safety Layer: sandboxed subprocess execution, human-in-the-loop approval gates, guardrails
- Model Abstraction: unified interface across OpenAI, Anthropic, Google, Ollama, Vertex
- Protocol Integration: MCP client (consume external tools), A2A client (call remote agents)
- Infrastructure: pluggable storage (in-memory, SQLite, PostgreSQL, MongoDB)
- Registry & Auto-Discovery: every agent/team/workflow auto-registers on construction; transport layers pick them up dynamically
- Transport (optional): Express REST, Socket.IO WebSocket, Voice Gateway, Browser Gateway
- Queue (optional): BullMQ workers for async processing

How a Request Actually Flows

Here's the complete path a text request takes:

```text
User Input
    │
Agent.run() / Agent.stream()
    │
buildMessages (history + system instructions + memory context + skill instructions)
    │
LLM Loop (with automatic retry on 429/5xx)
    │
ModelProvider (OpenAI / Anthropic / Google / Ollama / Vertex)
    │
Tool Executor (if tool calls present)
  ├── Approval check (if requiresApproval is set)
  ├── Sandbox execution (if sandbox is enabled)
  ├── Local tools
  ├── MCP tools (external servers)
  └── A2A tools (remote agents)
    │
MemoryManager.appendMessages() → auto-summarize overflow
    │
MemoryManager.afterRun() → fire-and-forget extraction
  (user facts, profile, entities, learnings)
    │
Output to caller
```

The memory extraction at the end — user facts, profile updates, entity relationships, learned patterns — all runs in the background and doesn't block your response.
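Set aside providers, retries, and memory, and the core of that flow is a loop: call the model, run any tools it requests, append their results to the history, and repeat until the model answers in plain text. The sketch below works under that assumption. `ModelFn`, `Message`, `runLoop`, and the `maxSteps` cap are hypothetical names, not Agentium's API. `afterRun` is started without `await` to mirror the fire-and-forget extraction described above.

```typescript
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; name: string; content: string };

// A model turn either answers with text or asks for tool calls.
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tools"; calls: { name: string; args: string }[] };

type ModelFn = (messages: Message[]) => Promise<ModelTurn>;
type ToolFn = (args: string) => Promise<string>;

async function runLoop(
  model: ModelFn,
  tools: Record<string, ToolFn>,
  input: string,
  afterRun: (history: Message[]) => Promise<void>,
  maxSteps = 5, // guard against a model that never stops calling tools
): Promise<string> {
  const history: Message[] = [{ role: "user", content: input }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === "text") {
      history.push({ role: "assistant", content: turn.text });
      // Fire-and-forget: background extraction must not delay the reply.
      void afterRun(history).catch(() => {});
      return turn.text;
    }
    // Run each requested tool and feed its output back to the model.
    for (const call of turn.calls) {
      const tool = tools[call.name];
      const output = tool ? await tool(call.args) : `unknown tool ${call.name}`;
      history.push({ role: "tool", name: call.name, content: output });
    }
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}
```

With a mock model that requests `get_weather` once and then answers, `runLoop` returns the text answer, and `afterRun` has already been invoked without delaying the return.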

Memory: Seven Levels Deep

The MemoryManager is one of the most interesting parts. It supports seven distinct memory stores, all sharing a single StorageDriver:

| Store | Scope | Enabled by default | What it captures |
| --- | --- | --- | --- |
| Sessions | Per-session | ✅ | Message history, auto-trimmed |
| Summaries | Per-session | ✅ | LLM-generated summaries of overflowed messages |
| User Facts | Per-user, cross-session | ❌ | "Prefers dark mode", "lives in Mumbai" |
| User Profile | Per-user, cross-session | ❌ | Name, role, company, timezone |
| Entity Memory | Global/namespaced | ❌ | Companies, people, projects with relationships |
| Decision Log | Per-agent | ❌ | Audit trail of decisions |
| Learned Knowledge | Global (vector-backed) | ❌ | Reusable insights from past conversations |

Enable what you need. All extraction is non-blocking.
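The article doesn't show the StorageDriver interface, so what follows is a hypothetical sketch of the general idea only. Several logical stores share one key-value driver and stay separate by namespacing keys with the store name and a scope id (session or user). `StorageDriver`, `InMemoryDriver`, `ScopedStore`, and the `namespace:scopeId` key format are all assumptions for illustration.

```typescript
// Hypothetical driver interface; a real one might wrap SQLite, Postgres, or MongoDB.
interface StorageDriver {
  get<T>(key: string): Promise<T | undefined>;
  set<T>(key: string, value: T): Promise<void>;
}

class InMemoryDriver implements StorageDriver {
  private readonly data = new Map<string, unknown>();
  async get<T>(key: string): Promise<T | undefined> {
    return this.data.get(key) as T | undefined;
  }
  async set<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }
}

// Each store owns a namespace; the scope id (session or user) is part of the key.
class ScopedStore<T> {
  private readonly driver: StorageDriver;
  private readonly namespace: string;

  constructor(driver: StorageDriver, namespace: string) {
    this.driver = driver;
    this.namespace = namespace;
  }

  private key(scopeId: string): string {
    return `${this.namespace}:${scopeId}`;
  }

  async append(scopeId: string, item: T): Promise<void> {
    const items = (await this.driver.get<T[]>(this.key(scopeId))) ?? [];
    await this.driver.set(this.key(scopeId), [...items, item]);
  }

  async list(scopeId: string): Promise<T[]> {
    return (await this.driver.get<T[]>(this.key(scopeId))) ?? [];
  }
}

// Two stores, one driver: per-session history and per-user facts.
const driver = new InMemoryDriver();
const sessions = new ScopedStore<string>(driver, "session");
const userFacts = new ScopedStore<string>(driver, "userFacts");
```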

Voice and Browser Agents

Voice Agent

```text
Audio Input (WebSocket / Socket.IO)
    │
VoiceAgent.connect()
    │
RealtimeProvider (OpenAI Realtime / Google Live)
    │
Bidirectional audio stream ↔ Tool calls ↔ MemoryManager
    │
Audio Output → Client
```

Sessions persist across reconnects. Memory extraction works on voice transcripts.

Browser Agent

Agentium's browser automation is vision-based — it takes screenshots, passes them to a vision model, and decides what to click/type/scroll next. Key features:

- Stealth mode: patches navigator.webdriver, WebGL, and plugins
- Humanize mode: random delays, mouse-movement curves, typing variation
- Credential vault: secrets are never sent to the LLM; only {{placeholders}} appear in prompts
- Video recording: native Playwright session recording
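The credential-vault idea can be illustrated without Playwright. The prompt only ever contains `{{name}}` tokens, and real values are substituted at the moment an action runs. The `Vault` class and its `add`/`resolve`/`redact` methods are hypothetical. This sketch covers only the substitution step, not secure storage of the secrets themselves.

```typescript
// Hypothetical vault: real values stay here; prompts only see {{placeholders}}.
class Vault {
  private readonly secrets = new Map<string, string>();

  // Store a secret and return the token that is safe to show the model.
  add(name: string, value: string): string {
    this.secrets.set(name, value);
    return `{{${name}}}`;
  }

  // Called right before typing into the page, never when building prompts.
  resolve(text: string): string {
    return text.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
      this.secrets.get(name) ?? match, // leave unknown tokens untouched
    );
  }

  // Defensive pass: swap any raw secret that leaked into outgoing text for its token.
  redact(text: string): string {
    let out = text;
    for (const [name, value] of this.secrets) {
      if (!value) continue; // an empty secret would match everywhere
      out = out.split(value).join(`{{${name}}}`);
    }
    return out;
  }
}
```

For example, the model might emit an action like `type {{password}} into #pw`. The browser layer calls `resolve` on that action string just before executing it, so the literal password never appears in any prompt.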

Performance: The Numbers

Benchmarks against LangChain (Node.js) and Agno (Python), using gpt-4o-mini, 5 runs per scenario:

Startup Time

Agentium: 171ms vs LangChain: 301ms vs Agno: 2730ms

Tool Calling

| Metric | Agentium | LangChain | Agno |
| --- | --- | --- | --- |
| Avg response time | 1617 ms | 1678 ms | 3064 ms |
| Prompt tokens | 167 | 167 | 173 |
| Total tokens | 196 | 196 | 202 |

Multi-turn Memory

| Metric | Agentium | LangChain | Agno |
| --- | --- | --- | --- |
| Prompt tokens | 189 | 309 | 94 |
| Cost per run | $0.000046 | $0.000081 | $0.000054 |

Agentium uses 39% fewer prompt tokens and costs 43% less than LangChain on multi-turn conversations. The gap comes from LangChain injecting heavier system prompts and more history-formatting overhead.
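The two percentages follow directly from the multi-turn table. A quick check of the arithmetic, computing relative reduction as 1 − Agentium/LangChain:

```typescript
// Relative reduction of Agentium vs LangChain, using the multi-turn table figures.
function reductionPercent(ours: number, theirs: number): number {
  return (1 - ours / theirs) * 100;
}

const tokenSavings = reductionPercent(189, 309); // prompt tokens
const costSavings = reductionPercent(0.000046, 0.000081); // cost per run

console.log(`${tokenSavings.toFixed(1)}% fewer prompt tokens`); // 38.8%
console.log(`${costSavings.toFixed(1)}% lower cost per run`); // 43.2%
```

Rounded, these give the 39% and 43% quoted above.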

How Agentium Keeps Token Count Low

A few concrete optimizations:

1. Tool schema caching — Zod-to-JSON Schema conversion happens once at construction, not on every LLM call.

2. Minimal schema serialization — Strips $schema, additionalProperties, and other verbose JSON Schema fields that add tokens without adding meaning.
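Optimizations 1 and 2 can be sketched together. The sketch assumes tool parameter schemas arrive as plain JSON Schema objects, the form a Zod-to-JSON-Schema converter produces. `minimizeSchema`, `STRIP_KEYS`, and `CachedTool` are illustrative names, not Agentium's code. The stripping is deliberately naive: it removes these keys at every depth.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [k: string]: JsonValue };

// Keys that cost prompt tokens but rarely help the model choose arguments.
// Naive: a property literally named "additionalProperties" would also be dropped.
const STRIP_KEYS = new Set(["$schema", "additionalProperties"]);

// Recursively copy the schema, dropping the verbose keys.
function minimizeSchema(node: JsonValue): JsonValue {
  if (Array.isArray(node)) return node.map(minimizeSchema);
  if (node === null || typeof node !== "object") return node;
  const out: { [k: string]: JsonValue } = {};
  for (const [key, value] of Object.entries(node)) {
    if (!STRIP_KEYS.has(key)) out[key] = minimizeSchema(value);
  }
  return out;
}

// Conversion happens once, in the constructor; every LLM call reuses `schema`.
class CachedTool {
  readonly name: string;
  readonly schema: JsonValue;

  constructor(name: string, rawSchema: JsonValue) {
    this.name = name;
    this.schema = minimizeSchema(rawSchema);
  }
}
```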

3. Token-based history trimming — Set maxContextTokens and oldest messages are automatically dropped to stay within budget.

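```typescript
import { Agent, openai } from "@agentium/core";
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  // Oldest messages are dropped once the history would exceed this budget.
  maxContextTokens: 8000,
});
```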
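Token-based trimming amounts to dropping the oldest messages until an estimated token count fits the budget. The sketch below uses a crude four-characters-per-token estimate, which is an assumption; a real implementation would count with the model's tokenizer. `trimToBudget`, its rule of pinning system messages, and its rule of always keeping the latest message are hypothetical choices.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic: about 4 characters per token for English text.
const estimateTokens = (m: ChatMessage): number => Math.ceil(m.content.length / 4);

function trimToBudget(messages: ChatMessage[], maxContextTokens: number): ChatMessage[] {
  // System messages are pinned to the front and never dropped.
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let total = [...system, ...rest].reduce((sum, m) => sum + estimateTokens(m), 0);
  // Drop oldest non-system messages first, always keeping the latest one.
  while (total > maxContextTokens && rest.length > 1) {
    total -= estimateTokens(rest.shift()!);
  }
  return [...system, ...rest];
}
```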

4. Non-blocking memory extraction — Fact extraction runs in the background, saving 500–1000ms per request.

5. Smart context deduplication — If you register userMemory.asTool(), user facts are fetched on demand via tool call and not pre-injected into the system prompt. Saves tokens when facts aren't always needed.

6. Automatic retry with backoff — Configurable retry on 429/5xx so you're not writing that yourself:

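```typescript
import { Agent, openai } from "@agentium/core";

const agent = new Agent({
  name: "reliable-bot",
  model: openai("gpt-4o"),
  // Retries 429 and 5xx responses with backoff between attempts.
  retry: {
    maxRetries: 5, // give up after 5 retries
    initialDelayMs: 1000, // first wait: 1 s
    maxDelayMs: 30000, // no single wait exceeds 30 s
  },
});
```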
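These three options map naturally onto exponential backoff capped at `maxDelayMs`. Below is a self-contained sketch of that policy. The article names only the options, so several details are assumptions: the doubling factor, the "full jitter" randomization, the `status` field read from errors, and the injectable `sleep` parameter.

```typescript
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
}

// Upper bound on the wait before retry `attempt` (0-based): doubles, then caps.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.initialDelayMs * 2 ** attempt, opts.maxDelayMs);
}

// Retry only on rate limits (429) and server errors (5xx).
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: number } | null)?.status;
  return status === 429 || (status !== undefined && status >= 500 && status < 600);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Out of retries, or a non-transient error such as 400: surface it.
      if (attempt >= opts.maxRetries || !isRetryable(err)) throw err;
      // Full jitter: wait a random time up to the capped exponential delay.
      await sleep(Math.random() * backoffDelay(attempt, opts));
    }
  }
}
```

Jitter spreads retries from many clients over time, so they don't all hit a rate-limited API at the same instant.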

The Registry: Auto-Discovery Without Config

One of the small-but-great quality-of-life features is the global Registry. Every agent, team, and workflow registers itself on construction:

```typescript
import { Agent, openai, registry } from "@agentium/core";

new Agent({ name: "bot", model: openai("gpt-4o") });

registry.list();
// { agents: ["bot"], teams: [], workflows: [] }
```

The transport layer reads from this registry at request time. Spin up a new agent after the server starts? It's immediately available over HTTP and WebSocket — no restart, no rewiring.
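The pattern is easy to reproduce in isolation. A module-level map is written to by constructors and read by request handlers on every call, so anything created later is found without a restart. This is a hypothetical sketch: `Registry`, `SimpleAgent`, and `handleRequest` are illustrative names, not Agentium exports. Rejecting duplicate names is this sketch's own choice.

```typescript
class Registry {
  private readonly agents = new Map<string, SimpleAgent>();

  register(agent: SimpleAgent): void {
    // This sketch rejects duplicate names rather than silently replacing.
    if (this.agents.has(agent.name)) throw new Error(`duplicate agent name: ${agent.name}`);
    this.agents.set(agent.name, agent);
  }

  get(name: string): SimpleAgent | undefined {
    return this.agents.get(name);
  }

  list(): { agents: string[] } {
    return { agents: [...this.agents.keys()] };
  }
}

const registry = new Registry();

class SimpleAgent {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
    registry.register(this); // self-registration on construction
  }

  async run(input: string): Promise<string> {
    return `[${this.name}] ${input}`;
  }
}

// A transport handler resolves the agent by name on each request,
// so agents created after startup are reachable immediately.
async function handleRequest(agentName: string, input: string): Promise<string> {
  const agent = registry.get(agentName);
  if (!agent) return `404: no agent named ${agentName}`;
  return agent.run(input);
}
```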

Design Principles Worth Calling Out

- Zero meta-framework dependency: works with any Node.js server or headless script
- Optional peer dependencies: only bundle the providers you actually use
- Pluggable everything: storage, models, vector stores, and transport are all swappable
- Safety built in: sandboxed subprocess execution and human-in-the-loop approval are opt-in per tool
- Open protocol support: MCP for tool integration, A2A for agent-to-agent interoperability (no vendor lock-in)

Should You Use It?

If you're building AI agents in Node.js/TypeScript and you want:

- Less boilerplate than LangChain
- Real multi-layer memory without building it yourself
- Voice and browser automation in the same framework
- Lower token costs at scale (multi-turn conversations especially)
- Production-grade retry, sandboxing, and approval flows

...then Agentium is worth a serious look.

The docs are at docs.agentium.in and the quickstart genuinely takes under five minutes.

GitHub: github.com/agentiumOs/agentium

Have you tried Agentium or another TypeScript agent framework? What's your experience been? Drop it in the comments.

DEV Community: Akash Sengar