description: A deep dive into Agentium — a TypeScript-first AI agent framework with a layered architecture, built-in memory, tool calling, voice/browser agents, and benchmark results that beat LangChain on cost and tool calling speed.
If you've built anything with LangChain in Node.js, you've probably felt the friction — verbose setup, heavy abstractions, and enough boilerplate to make you question your life choices. Agentium is a TypeScript-first agent framework that tries to fix all of that. It's lean, layered, and surprisingly fast out of the box.
Let's go from zero to a streaming, tool-calling agent — and then dig into what's actually happening under the hood.
Getting Started in 3 Steps
Installation
npm install @agentium/core openai
export OPENAI_API_KEY=your-key
Agentium supports OpenAI, Anthropic, Google, Ollama, and Vertex out of the box — just swap the provider package.
Step 1: Your First Agent
import { Agent, openai } from "@agentium/core";
const agent = new Agent({
name: "assistant",
model: openai("gpt-4o"),
instructions: "You are a helpful assistant.",
});
const result = await agent.run("What is TypeScript?");
console.log(result.text);
That's it. No chains, no pipelines, no ceremony.
Step 2: Add Tools (With Type Safety)
Agentium uses Zod schemas for tool parameters, so everything is fully typed:
import { Agent, openai, defineTool } from "@agentium/core";
import { z } from "zod";
const weatherTool = defineTool({
name: "get_weather",
description: "Get current weather for a city",
parameters: z.object({
city: z.string().describe("City name"),
}),
execute: async ({ city }) => `Weather in ${city}: 72°F, sunny`,
});
const agent = new Agent({
name: "weather-bot",
model: openai("gpt-4o"),
instructions: "You help users check the weather.",
tools: [weatherTool],
});
const result = await agent.run("What's the weather in Tokyo?");
console.log(result.text);
The agent calls get_weather automatically when needed — no wiring required.
Step 3: Streaming Responses
for await (const chunk of agent.stream("Tell me a joke")) {
if (chunk.type === "text") {
process.stdout.write(chunk.text);
}
}
Streaming works for both text and tool-call chunks. Handle chunk.type to branch as needed.
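To make the branching concrete, here is a minimal, self-contained sketch that does not import Agentium at all. The post only confirms a "text" chunk with a text field; the "tool-call" variant, its fields, and the fakeStream and consume helpers are assumptions made up for illustration.

```typescript
// Hypothetical chunk shapes. Only the "text" variant appears in the post;
// the "tool-call" variant and its fields are assumptions for illustration.
type Chunk =
  | { type: "text"; text: string }
  | { type: "tool-call"; name: string; args: Record<string, unknown> };

// Stand-in for agent.stream(): yields a fixed sequence of chunks.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { type: "tool-call", name: "get_weather", args: { city: "Tokyo" } };
  yield { type: "text", text: "It is " };
  yield { type: "text", text: "sunny in Tokyo." };
}

// Collects text and tool-call names separately by switching on chunk.type.
async function consume(stream: AsyncIterable<Chunk>) {
  let text = "";
  const toolCalls: string[] = [];
  for await (const chunk of stream) {
    switch (chunk.type) {
      case "text":
        text += chunk.text;
        break;
      case "tool-call":
        toolCalls.push(chunk.name);
        break;
    }
  }
  return { text, toolCalls };
}
```

Modelling chunks as a discriminated union means the compiler narrows each branch, so chunk.text is only reachable where the chunk really is text.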
Architecture: It's Actually Well Thought Out
This is where Agentium gets interesting. It's a monorepo with four focused packages:
| Package | What it does |
|---|---|
| @agentium/core | Agents, tools, memory, voice, browser, MCP/A2A |
| @agentium/transport | REST API, Socket.IO, Voice/Browser gateways |
| @agentium/queue | BullMQ background job processing |
| @agentium/browser | Vision-based browser automation via Playwright |
Use only what you need — each package is independently installable.
The Layered Model
Agentium's internals stack cleanly:
- SDK Layer: Agent, Team, Workflow, VoiceAgent, BrowserAgent
- Engine Layer: LLM loop, tool executor, memory manager (sessions, summaries, user facts, profiles, entities)
- Safety Layer: Sandboxed subprocess execution, human-in-the-loop approval gates, guardrails
- Model Abstraction: Unified interface across OpenAI, Anthropic, Google, Ollama, Vertex
- Protocol Integration: MCP client (consume external tools), A2A client (call remote agents)
- Infrastructure: Pluggable storage: in-memory, SQLite, PostgreSQL, MongoDB
- Registry & Auto-Discovery: Every agent/team/workflow auto-registers on construction; transport layers pick them up dynamically
- Transport (optional): Express REST, Socket.IO WebSocket, Voice Gateway, Browser Gateway
- Queue (optional): BullMQ workers for async processing
How a Request Actually Flows
Here's the complete path a text request takes:
User Input
│
Agent.run() / Agent.stream()
│
buildMessages (history + system instructions + memory context + skill instructions)
│
LLM Loop (with automatic retry on 429/5xx)
│
ModelProvider (OpenAI / Anthropic / Google / Ollama / Vertex)
│
Tool Executor (if tool calls present)
├── Approval check (if requiresApproval is set)
├── Sandbox execution (if sandbox is enabled)
├── Local tools
├── MCP tools (external servers)
└── A2A tools (remote agents)
│
MemoryManager.appendMessages() → auto-summarize overflow
│
MemoryManager.afterRun() → fire-and-forget extraction
(user facts, profile, entities, learnings)
│
Output to caller
The memory extraction at the end — user facts, profile updates, entity relationships, learned patterns — all runs in the background and doesn't block your response.
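The fire-and-forget pattern itself is easy to show in isolation. This is a sketch of the general technique, not Agentium's code: extractFacts, respond, and the extracted array are hypothetical stand-ins, and the 50ms timer simulates a slow extraction call.

```typescript
// Hypothetical stand-in for a slow extraction step (e.g. an extra LLM call).
const extracted: string[] = [];
async function extractFacts(message: string): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 50));
  extracted.push(`fact from: ${message}`);
}

// Builds the reply without awaiting extraction. The background promise is
// returned only so callers (or tests) can observe when extraction finishes.
function respond(message: string): { reply: string; background: Promise<void> } {
  const reply = `echo: ${message}`;
  // Fire-and-forget: the catch handler keeps a failed extraction from
  // becoming an unhandled rejection or affecting the reply.
  const background = extractFacts(message).catch((err) => {
    console.error("extraction failed", err);
  });
  return { reply, background };
}
```

The caller has the reply in hand before extractFacts has stored anything, which is why this kind of work adds no latency to the response.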
Memory: Seven Levels Deep
The MemoryManager is one of the most interesting parts. It supports seven distinct memory stores, all sharing a single StorageDriver:
| Store | Scope | Default | What it captures |
|---|---|---|---|
| Sessions | Per-session | ✅ | Message history, auto-trimmed |
| Summaries | Per-session | ✅ | LLM-generated summaries of overflowed messages |
| User Facts | Per-user, cross-session | ❌ | "Prefers dark mode", "lives in Mumbai" |
| User Profile | Per-user, cross-session | ❌ | Name, role, company, timezone |
| Entity Memory | Global/namespaced | ❌ | Companies, people, projects with relationships |
| Decision Log | Per-agent | ❌ | Audit trail of decisions |
| Learned Knowledge | Global (vector-backed) | ❌ | Reusable insights from past conversations |
Enable what you need. All extraction is non-blocking.
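The Sessions and Summaries rows work as a pair: whatever gets trimmed from the session history is the overflow that gets summarized. Here is a rough sketch of that split under a token budget. It is an illustration only; the Message shape, the trimHistory helper, and the 4-characters-per-token estimate are all assumptions, and a real implementation would use a proper tokenizer.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough token estimate (about 4 characters per token). This is an
// assumption for the sketch, not how any particular model counts tokens.
function estimateTokens(message: Message): number {
  return Math.ceil(message.content.length / 4);
}

// Keeps the newest messages that fit within maxTokens and returns the
// older ones as overflow, which is what a summarizer would receive.
function trimHistory(history: Message[], maxTokens: number) {
  const kept: Message[] = [];
  let used = 0;
  let i = history.length - 1;
  for (; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (used + cost > maxTokens) break;
    kept.unshift(history[i]);
    used += cost;
  }
  // Everything at or before index i did not fit in the budget.
  const overflow = history.slice(0, i + 1);
  return { kept, overflow, used };
}
```

Walking from newest to oldest keeps the most recent turns intact, so the conversation stays coherent even when early context is dropped.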
Voice and Browser Agents
Voice Agent
Audio Input (WebSocket / Socket.IO)
│
VoiceAgent.connect()
│
RealtimeProvider (OpenAI Realtime / Google Live)
│
Bidirectional audio stream ↔ Tool calls ↔ MemoryManager
│
Audio Output → Client
Sessions persist across reconnects. Memory extraction works on voice transcripts.
Browser Agent
Agentium's browser automation is vision-based — it takes screenshots, passes them to a vision model, and decides what to click/type/scroll next. Key features:
- Stealth mode: patches navigator.webdriver, WebGL, and plugins
- Humanize mode: random delays, mouse movement curves, typing variation
- Credential vault: secrets are never sent to the LLM; only {{placeholders}} appear in prompts
- Video recording: native Playwright session recording
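The credential vault idea can be sketched in a few lines. This is one way the placeholder substitution could work, not Agentium's implementation: the vault map, the resolveSecrets helper, and the action shape are hypothetical.

```typescript
// Hypothetical vault: maps placeholder names to real secrets.
const vault = new Map<string, string>([
  ["email", "dev@example.com"],
  ["password", "s3cret!"],
]);

// Replaces {{name}} placeholders with vault values. Unknown names throw
// rather than being typed into the page verbatim.
function resolveSecrets(template: string, secrets: Map<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const value = secrets.get(name);
    if (value === undefined) throw new Error(`No secret named "${name}"`);
    return value;
  });
}

// What the model sees and emits: placeholders only.
const actionFromModel = { type: "type", selector: "#password", text: "{{password}}" };

// What the browser driver receives: resolved just before execution.
const actionForBrowser = {
  ...actionFromModel,
  text: resolveSecrets(actionFromModel.text, vault),
};
```

Because substitution happens after the model has produced its action, the real secret never enters a prompt or a model response.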
Performance: The Numbers
Benchmarks against LangChain (Node.js) and Agno (Python), using gpt-4o-mini, 5 runs per scenario:
Startup Time
Agentium: 171ms vs LangChain: 301ms vs Agno: 2730ms
Tool Calling
| Metric | Agentium | LangChain | Agno |
|---|---|---|---|
| Avg Response | 1617ms | 1678ms | 3064ms |
| Prompt Tokens | 167 | 167 | 173 |
| Total Tokens | 196 | 196 | 202 |
Multi-turn Memory
| Metric | Agentium | LangChain | Agno |
|---|---|---|---|
| Prompt Tokens | 189 | 309 | 94 |
| Cost / Run | $0.000046 | $0.000081 | $0.000054 |
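On multi-turn conversations, Agentium uses 39% fewer prompt tokens than LangChain (189 vs 309) and costs 43% less per run ($0.000046 vs $0.000081). LangChain injects heavier system prompts and history formatting overhead. Agno sends the fewest prompt tokens of the three (94) but still costs more per run than Agentium.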
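If you want to check the percentages yourself, the arithmetic is short. The figures below are copied from the benchmark tables above; percentLower is just a helper name for this sketch.

```typescript
// Figures copied from the benchmark tables above.
const promptTokens = { agentium: 189, langchain: 309 };
const costPerRun = { agentium: 0.000046, langchain: 0.000081 };
const toolCallMs = { agentium: 1617, langchain: 1678 };

// Relative reduction of `ours` versus `baseline`, as a percentage
// rounded to one decimal place.
function percentLower(ours: number, baseline: number): number {
  return Math.round(((baseline - ours) / baseline) * 1000) / 10;
}

console.log(percentLower(promptTokens.agentium, promptTokens.langchain)); // 38.8
console.log(percentLower(costPerRun.agentium, costPerRun.langchain)); // 43.2
console.log(percentLower(toolCallMs.agentium, toolCallMs.langchain)); // 3.6
```

The token and cost gaps are large. The tool-calling gap is about 3.6%, which with 5 runs per scenario may sit within run-to-run variation.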
How Agentium Keeps Token Count Low
A few concrete optimizations. Items 2, 3, and 5 cut tokens directly; items 1 and 4 cut latency, and item 6 is about reliability:
1. Tool schema caching — Zod-to-JSON Schema conversion happens once at construction, not on every LLM call.
2. Minimal schema serialization — Strips $schema, additionalProperties, and other verbose JSON Schema fields that add tokens without adding meaning.
3. Token-based history trimming — Set maxContextTokens and oldest messages are automatically dropped to stay within budget.
const agent = new Agent({
name: "bot",
model: openai("gpt-4o"),
maxContextTokens: 8000,
});
4. Non-blocking memory extraction — Fact extraction runs in the background, saving 500–1000ms per request.
5. Smart context deduplication — If you register userMemory.asTool(), user facts are fetched on demand via tool call and not pre-injected into the system prompt. Saves tokens when facts aren't always needed.
6. Automatic retry with backoff — Configurable retry on 429/5xx so you're not writing that yourself:
const agent = new Agent({
name: "reliable-bot",
model: openai("gpt-4o"),
retry: {
maxRetries: 5,
initialDelayMs: 1000,
maxDelayMs: 30000,
},
});
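For a feel of what those three retry options control, here is a generic retry-with-backoff sketch. It reuses the option names from the config above, but the rest is an assumption: the post does not say whether Agentium doubles the delay, adds jitter, or how it detects status codes, so withRetry, backoffDelay, and statusOf are illustrative only.

```typescript
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
}

// Reads a numeric `status` off an unknown error, if there is one.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Retry only on rate limits (429) and server errors (5xx).
function isRetryable(err: unknown): boolean {
  const status = statusOf(err);
  return status !== undefined && (status === 429 || (status >= 500 && status < 600));
}

// Delay doubles with each attempt and is capped at maxDelayMs.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.initialDelayMs * 2 ** attempt, opts.maxDelayMs);
}

// Calls fn, retrying retryable failures up to maxRetries times.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt >= opts.maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, opts)));
    }
  }
}
```

With the values from the config above, this doubling scheme would wait 1s, 2s, 4s, 8s, and 16s across the five retries, so the 30s cap would only matter at a higher retry count.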
The Registry: Auto-Discovery Without Config
One of the small-but-great quality-of-life features is the global Registry. Every agent, team, and workflow registers itself on construction:
import { Agent, openai, registry } from "@agentium/core";
new Agent({ name: "bot", model: openai("gpt-4o") });
registry.list();
// { agents: ["bot"], teams: [], workflows: [] }
The transport layer reads from this registry at request time. Spin up a new agent after the server starts? It's immediately available over HTTP and WebSocket — no restart, no rewiring.
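The pattern behind this is small enough to sketch without the library. Everything below is hypothetical (SketchRegistry, SketchAgent, handleRequest) and only illustrates the idea of self-registration plus lookup at request time; it is not how Agentium's registry is written.

```typescript
// Minimal registry: a name-to-instance map shared by the whole process.
class SketchRegistry {
  private agents = new Map<string, SketchAgent>();

  register(agent: SketchAgent): void {
    this.agents.set(agent.name, agent);
  }

  get(name: string): SketchAgent | undefined {
    return this.agents.get(name);
  }

  list(): { agents: string[] } {
    return { agents: Array.from(this.agents.keys()) };
  }
}

const sketchRegistry = new SketchRegistry();

class SketchAgent {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
    // Self-register on construction, so no separate wiring step is needed.
    sketchRegistry.register(this);
  }

  run(input: string): string {
    return `${this.name} received: ${input}`;
  }
}

// Stand-in for a transport handler: resolves the agent by name on every
// request instead of capturing a fixed list at server start.
function handleRequest(agentName: string, input: string): string {
  const agent = sketchRegistry.get(agentName);
  if (!agent) return `404: no agent named "${agentName}"`;
  return agent.run(input);
}
```

The lookup inside handleRequest is what makes late-created agents reachable: the handler never holds a snapshot that could go stale.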
Design Principles Worth Calling Out
- Zero meta-framework dependency — works with any Node.js server or headless script
- Optional peer dependencies — only bundle the providers you actually use
- Pluggable everything — storage, models, vector stores, transport are all swappable
- Safety as a first-class option: sandboxed subprocess execution and human-in-the-loop approval are built in, and you opt in per tool
- Open protocol support — MCP for tool integration, A2A for agent-to-agent interoperability (no vendor lock-in)
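To show what per-tool, opt-in approval can look like, here is a small sketch of an approval gate. The requiresApproval flag name comes from the request-flow diagram earlier in the post; the GatedTool and Approver types and the executeWithApproval helper are hypothetical, and the real API may differ.

```typescript
interface GatedTool {
  name: string;
  requiresApproval?: boolean; // flag name taken from the request-flow diagram
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Hypothetical approver: in practice this could prompt a human in a UI.
type Approver = (toolName: string, args: Record<string, unknown>) => Promise<boolean>;

// Runs the tool, asking for approval first only when the tool opts in.
async function executeWithApproval(
  tool: GatedTool,
  args: Record<string, unknown>,
  approve: Approver,
): Promise<string> {
  if (tool.requiresApproval) {
    const approved = await approve(tool.name, args);
    if (!approved) return `Tool "${tool.name}" was not approved; nothing was executed.`;
  }
  return tool.execute(args);
}
```

Returning a refusal message instead of throwing is a design choice for this sketch: it lets the model see the denial as a tool result and explain it to the user.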
Should You Use It?
If you're building AI agents in Node.js/TypeScript and you want:
- Less boilerplate than LangChain
- Real multi-layer memory without building it yourself
- Voice and browser automation in the same framework
- Lower token costs at scale (multi-turn conversations especially)
- Production-grade retry, sandboxing, and approval flows
...then Agentium is worth a serious look.
The docs are at docs.agentium.in and the quickstart genuinely takes under five minutes.
GitHub: github.com/agentiumOs/agentium
Have you tried Agentium or another TypeScript agent framework? What's your experience been? Drop it in the comments.