Akash Sengar

Agentium: Build Production-Grade AI Agents in TypeScript Without the Bloat

A deep dive into Agentium, a TypeScript-first AI agent framework with a layered architecture, built-in memory, tool calling, voice/browser agents, and benchmark results that beat LangChain on cost and tool-calling speed.

If you've built anything with LangChain in Node.js, you've probably felt the friction — verbose setup, heavy abstractions, and enough boilerplate to make you question your life choices. Agentium is a TypeScript-first agent framework that tries to fix all of that. It's lean, layered, and surprisingly fast out of the box.

Let's go from zero to a streaming, tool-calling agent — and then dig into what's actually happening under the hood.


Getting Started in 3 Steps

Installation

npm install @agentium/core openai
export OPENAI_API_KEY=your-key

Agentium supports OpenAI, Anthropic, Google, Ollama, and Vertex out of the box — just swap the provider package.


Step 1: Your First Agent

import { Agent, openai } from "@agentium/core";

const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  instructions: "You are a helpful assistant.",
});

const result = await agent.run("What is TypeScript?");
console.log(result.text);

That's it. No chains, no pipelines, no ceremony.


Step 2: Add Tools (With Type Safety)

Agentium uses Zod schemas for tool parameters, so everything is fully typed:

import { Agent, openai, defineTool } from "@agentium/core";
import { z } from "zod";

const weatherTool = defineTool({
  name: "get_weather",
  description: "Get current weather for a city",
  parameters: z.object({
    city: z.string().describe("City name"),
  }),
  execute: async ({ city }) => `Weather in ${city}: 72°F, sunny`,
});

const agent = new Agent({
  name: "weather-bot",
  model: openai("gpt-4o"),
  instructions: "You help users check the weather.",
  tools: [weatherTool],
});

const result = await agent.run("What's the weather in Tokyo?");
console.log(result.text);

The agent calls get_weather automatically when needed — no wiring required.
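
To make the "no wiring" part less magical, here is a minimal sketch of what a tool dispatcher could look like. It is not Agentium's actual implementation: `SimpleTool`, `ToolCall`, and `dispatchToolCall` are hypothetical names, the sketch assumes the model returns a tool name plus JSON-encoded arguments, and a hand-written `parse` function stands in for the Zod schema so the snippet has no dependencies.

```typescript
// Hypothetical shapes for illustration; not part of @agentium/core.
interface SimpleTool<A> {
  name: string;
  // Validates JSON-decoded input and returns typed arguments (stand-in for a Zod schema).
  parse: (raw: unknown) => A;
  execute: (args: A) => string;
}

interface ToolCall {
  name: string;
  arguments: string; // JSON-encoded arguments (assumed wire format)
}

const getWeather: SimpleTool<{ city: string }> = {
  name: "get_weather",
  parse: (raw) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("arguments must be an object");
    }
    const city = (raw as Record<string, unknown>).city;
    if (typeof city !== "string") throw new Error("city must be a string");
    return { city };
  },
  execute: ({ city }) => `Weather in ${city}: 72°F, sunny`,
};

// Look the tool up by name, validate the arguments, then run it.
function dispatchToolCall(tools: SimpleTool<any>[], call: ToolCall): string {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) return `Error: unknown tool "${call.name}"`;
  try {
    return tool.execute(tool.parse(JSON.parse(call.arguments)));
  } catch (err) {
    // Returning the error as text lets the model see it and try again.
    return `Error: ${(err as Error).message}`;
  }
}

const toolResult = dispatchToolCall([getWeather], {
  name: "get_weather",
  arguments: '{"city":"Tokyo"}',
});
console.log(toolResult);
```

Validation failures are returned as strings rather than thrown, so in a loop like this the model would receive the error message as the tool result and could correct its arguments on the next turn.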


Step 3: Streaming Responses

for await (const chunk of agent.stream("Tell me a joke")) {
  if (chunk.type === "text") {
    process.stdout.write(chunk.text);
  }
}

Streaming works for both text and tool-call chunks. Handle chunk.type to branch as needed.
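
Branching on chunk.type works nicely with a discriminated union. The sketch below is an assumption about the chunk shapes: only `type: "text"` with a `text` field appears in the snippet above, and the `"tool-call"` variant with its `toolName` field is hypothetical. A plain array stands in for the stream so the example runs without an API key; a real stream would be consumed with `for await`.

```typescript
// Assumed chunk shapes; only the text variant is shown in the article.
type StreamChunk =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string }; // hypothetical variant

// Stand-in for the chunks an agent stream might yield.
const fakeChunks: StreamChunk[] = [
  { type: "text", text: "Checking the weather" },
  { type: "tool-call", toolName: "get_weather" },
  { type: "text", text: "... it is sunny." },
];

function collect(chunks: StreamChunk[]): { text: string; toolCalls: string[] } {
  let text = "";
  const toolCalls: string[] = [];
  for (const chunk of chunks) {
    switch (chunk.type) {
      case "text":
        text += chunk.text; // narrowed to the text variant
        break;
      case "tool-call":
        toolCalls.push(chunk.toolName); // narrowed to the tool-call variant
        break;
    }
  }
  return { text, toolCalls };
}

const collected = collect(fakeChunks);
console.log(collected.text, collected.toolCalls);
```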


Architecture: It's Actually Well Thought Out

This is where Agentium gets interesting. It's a monorepo with four focused packages:

| Package | What it does |
| --- | --- |
| @agentium/core | Agents, tools, memory, voice, browser, MCP/A2A |
| @agentium/transport | REST API, Socket.IO, Voice/Browser gateways |
| @agentium/queue | BullMQ background job processing |
| @agentium/browser | Vision-based browser automation via Playwright |

Use only what you need — each package is independently installable.

The Layered Model

Agentium's internals stack cleanly:

  1. SDK Layer: Agent, Team, Workflow, VoiceAgent, BrowserAgent
  2. Engine Layer: LLM loop, tool executor, memory manager (sessions, summaries, user facts, profiles, entities)
  3. Safety Layer: Sandboxed subprocess execution, human-in-the-loop approval gates, guardrails
  4. Model Abstraction: Unified interface across OpenAI, Anthropic, Google, Ollama, Vertex
  5. Protocol Integration: MCP client (consume external tools), A2A client (call remote agents)
  6. Infrastructure: Pluggable storage: in-memory, SQLite, PostgreSQL, MongoDB
  7. Registry & Auto-Discovery: Every agent/team/workflow auto-registers on construction; transport layers pick them up dynamically
  8. Transport (optional): Express REST, Socket.IO WebSocket, Voice Gateway, Browser Gateway
  9. Queue (optional): BullMQ workers for async processing

How a Request Actually Flows

Here's the complete path a text request takes:

User Input
    │
Agent.run() / Agent.stream()
    │
buildMessages (history + system instructions + memory context + skill instructions)
    │
LLM Loop (with automatic retry on 429/5xx)
    │
ModelProvider (OpenAI / Anthropic / Google / Ollama / Vertex)
    │
Tool Executor (if tool calls present)
  ├── Approval check (if requiresApproval is set)
  ├── Sandbox execution (if sandbox is enabled)
  ├── Local tools
  ├── MCP tools (external servers)
  └── A2A tools (remote agents)
    │
MemoryManager.appendMessages() → auto-summarize overflow
    │
MemoryManager.afterRun() → fire-and-forget extraction
  (user facts, profile, entities, learnings)
    │
Output to caller

The memory extraction at the end — user facts, profile updates, entity relationships, learned patterns — all runs in the background and doesn't block your response.
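
The "respond first, extract later" idea can be sketched in a few lines. This is a hypothetical illustration, not Agentium's MemoryManager: `handleMessage` and `extractFacts` are made-up names, and a regex stands in for what would really be another LLM call.

```typescript
// Hypothetical sketch of fire-and-forget extraction.
const extractedFacts: string[] = [];

// Pretend extraction step; a real system would call an LLM here.
async function extractFacts(message: string): Promise<void> {
  const match = /I live in (\w+)/.exec(message);
  if (match) extractedFacts.push(`lives in ${match[1]}`);
}

function handleMessage(message: string): string {
  const reply = `You said: ${message}`;
  // Fire-and-forget: schedule the work, do not await it, and swallow errors
  // so a failed extraction can never break or delay the reply.
  void Promise.resolve()
    .then(() => extractFacts(message))
    .catch(() => undefined);
  return reply;
}

const immediateReply = handleMessage("I live in Mumbai");
// The reply is already available, but extraction has not run yet.
const factsAtReturn = extractedFacts.length;
console.log(immediateReply, factsAtReturn);
```

The trade-off of this pattern is that a fact mentioned in one message may not be stored by the time the very next request arrives, so anything that must be visible immediately should not rely on background extraction alone.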


Memory: Seven Levels Deep

The MemoryManager is one of the most interesting parts. It supports seven distinct memory stores, all sharing a single StorageDriver:

| Store | Scope | What it captures |
| --- | --- | --- |
| Sessions | Per-session | Message history, auto-trimmed |
| Summaries | Per-session | LLM-generated summaries of overflowed messages |
| User Facts | Per-user, cross-session | "Prefers dark mode", "lives in Mumbai" |
| User Profile | Per-user, cross-session | Name, role, company, timezone |
| Entity Memory | Global/namespaced | Companies, people, projects with relationships |
| Decision Log | Per-agent | Audit trail of decisions |
| Learned Knowledge | Global (vector-backed) | Reusable insights from past conversations |

Enable what you need. All extraction is non-blocking.
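
One way several stores can share a single driver is key namespacing. The sketch below is an assumption about the general pattern only: the article names StorageDriver but does not show its methods, so `KeyValueDriver`, `InMemoryDriver`, and `NamespacedStore` are hypothetical.

```typescript
// Hypothetical interface; the real StorageDriver's methods are not shown in the article.
interface KeyValueDriver {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

class InMemoryDriver implements KeyValueDriver {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Each store prefixes its keys, so several stores can share one driver without colliding.
class NamespacedStore {
  private driver: KeyValueDriver;
  private namespace: string;

  constructor(driver: KeyValueDriver, namespace: string) {
    this.driver = driver;
    this.namespace = namespace;
  }

  read(id: string): string[] {
    const raw = this.driver.get(`${this.namespace}:${id}`);
    return raw ? (JSON.parse(raw) as string[]) : [];
  }

  append(id: string, entry: string): void {
    const next = [...this.read(id), entry];
    this.driver.set(`${this.namespace}:${id}`, JSON.stringify(next));
  }
}

const sharedDriver = new InMemoryDriver();
const sessionStore = new NamespacedStore(sharedDriver, "session"); // per-session scope
const userFactStore = new NamespacedStore(sharedDriver, "userFacts"); // per-user scope

sessionStore.append("s1", "user: hi");
userFactStore.append("u1", "Prefers dark mode");
userFactStore.append("u1", "lives in Mumbai");
console.log(userFactStore.read("u1"));
```

Swapping `InMemoryDriver` for a SQLite- or PostgreSQL-backed class with the same two methods would leave the stores untouched, which is the point of putting the driver behind an interface.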


Voice and Browser Agents

Voice Agent

Audio Input (WebSocket / Socket.IO)
    │
VoiceAgent.connect()
    │
RealtimeProvider (OpenAI Realtime / Google Live)
    │
Bidirectional audio stream ↔ Tool calls ↔ MemoryManager
    │
Audio Output → Client

Sessions persist across reconnects. Memory extraction works on voice transcripts.

Browser Agent

Agentium's browser automation is vision-based — it takes screenshots, passes them to a vision model, and decides what to click/type/scroll next. Key features:

  • Stealth mode — patches navigator.webdriver, WebGL, and plugins
  • Humanize mode — random delays, mouse movement curves, typing variation
  • Credential vault — secrets are never sent to the LLM; only {{placeholders}} appear in prompts
  • Video recording — native Playwright session recording
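
The credential vault idea is easy to picture in code. The following is a rough sketch under stated assumptions: the article only says that {{placeholders}} appear in prompts, so the vault shape, the action object, and `resolvePlaceholders` are all hypothetical.

```typescript
// Hypothetical vault; secrets live here and never enter a prompt.
const vault: Record<string, string> = { SITE_PASSWORD: "s3cret!" };

// Replace {{NAME}} with the stored secret. Unknown names are left untouched.
function resolvePlaceholders(text: string, secrets: Record<string, string>): string {
  return text.replace(/\{\{(\w+)\}\}/g, (whole: string, name: string) =>
    Object.prototype.hasOwnProperty.call(secrets, name) ? secrets[name] : whole
  );
}

// What the model sees and emits: placeholders only.
const modelAction = { kind: "type", selector: "#password", text: "{{SITE_PASSWORD}}" };

// What the browser driver receives, resolved just before execution.
const browserAction = {
  ...modelAction,
  text: resolvePlaceholders(modelAction.text, vault),
};

console.log(JSON.stringify(modelAction)); // safe to log or send to the LLM
```

Resolution happens after the model has produced its action, so the secret value exists only on the execution side of the boundary.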

Performance: The Numbers

Benchmarks against LangChain (Node.js) and Agno (Python), using gpt-4o-mini, 5 runs per scenario:

Startup Time

Agentium: 171ms vs LangChain: 301ms vs Agno: 2730ms

Tool Calling

| Metric | Agentium | LangChain | Agno |
| --- | --- | --- | --- |
| Avg Response | 1617ms | 1678ms | 3064ms |
| Prompt Tokens | 167 | 167 | 173 |
| Total Tokens | 196 | 196 | 202 |

Multi-turn Memory

| Metric | Agentium | LangChain | Agno |
| --- | --- | --- | --- |
| Prompt Tokens | 189 | 309 | 94 |
| Cost / Run | $0.000046 | $0.000081 | $0.000054 |

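On multi-turn conversations, Agentium uses 39% fewer prompt tokens (189 vs 309) and costs 43% less per run than LangChain, which injects heavier system prompts and history-formatting overhead. Agno sends the fewest prompt tokens (94) but still costs more per run than Agentium.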
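
For anyone who wants to check the percentages, the arithmetic is a plain relative reduction. The helper name `percentLess` is just for illustration; the figures are copied from the multi-turn table.

```typescript
// Figures copied from the multi-turn benchmark table.
const promptTokens = { agentium: 189, langchain: 309 };
const costPerRun = { agentium: 0.000046, langchain: 0.000081 };

// Relative reduction of `ours` versus `baseline`, as a percentage.
function percentLess(ours: number, baseline: number): number {
  return ((baseline - ours) / baseline) * 100;
}

const tokenSaving = percentLess(promptTokens.agentium, promptTokens.langchain);
const costSaving = percentLess(costPerRun.agentium, costPerRun.langchain);

// Roughly 38.8% and 43.2%, which round to the 39% and 43% quoted.
console.log(`${tokenSaving.toFixed(1)}% fewer prompt tokens, ${costSaving.toFixed(1)}% lower cost`);
```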

How Agentium Keeps Token Count Low

A few concrete optimizations:

1. Tool schema caching — Zod-to-JSON Schema conversion happens once at construction, not on every LLM call.

2. Minimal schema serialization — Strips $schema, additionalProperties, and other verbose JSON Schema fields that add tokens without adding meaning.

3. Token-based history trimming — Set maxContextTokens and oldest messages are automatically dropped to stay within budget.

const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  maxContextTokens: 8000,
});

4. Non-blocking memory extraction — Fact extraction runs in the background, saving 500–1000ms per request.

5. Smart context deduplication — If you register userMemory.asTool(), user facts are fetched on demand via tool call and not pre-injected into the system prompt. Saves tokens when facts aren't always needed.

6. Automatic retry with backoff — Configurable retry on 429/5xx so you're not writing that yourself:

const agent = new Agent({
  name: "reliable-bot",
  model: openai("gpt-4o"),
  retry: {
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 30000,
  },
});
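
Two of the items above, token-based history trimming (3) and retry with backoff (6), are simple enough to sketch. Treat this as an illustration under assumptions, not Agentium's code: the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, the delay is assumed to double on each attempt (the article only says "backoff"), and jitter is left out. `trimHistory`, `estimateTokens`, and `backoffDelay` are hypothetical names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude token estimate (about 4 characters per token); a real tokenizer would differ.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

// Drop the oldest messages until the remainder fits the budget.
function trimHistory(messages: ChatMessage[], maxContextTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest so the most recent turns survive.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > maxContextTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

// Delay before retry number `attempt` (0-based): doubles each time, capped at maxDelayMs.
function backoffDelay(attempt: number, initialDelayMs: number, maxDelayMs: number): number {
  return Math.min(initialDelayMs * Math.pow(2, attempt), maxDelayMs);
}

const chatHistory: ChatMessage[] = [
  { role: "user", content: "a".repeat(40) }, // estimated 10 tokens
  { role: "assistant", content: "b".repeat(40) }, // estimated 10 tokens
  { role: "user", content: "c".repeat(20) }, // estimated 5 tokens
];

// With a budget of 16 tokens, only the last two messages (15 tokens) fit.
const trimmed = trimHistory(chatHistory, 16);

// Delays for five retries with initialDelayMs 1000 and maxDelayMs 30000.
const delays = [0, 1, 2, 3, 4].map((n) => backoffDelay(n, 1000, 30000));
console.log(trimmed.length, delays);
```

With the retry settings shown earlier, the cap only matters from the sixth attempt on, where the uncapped delay would be 32000ms.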

The Registry: Auto-Discovery Without Config

One of the small-but-great quality-of-life features is the global Registry. Every agent, team, and workflow registers itself on construction:

import { Agent, openai, registry } from "@agentium/core";

new Agent({ name: "bot", model: openai("gpt-4o") });

registry.list();
// { agents: ["bot"], teams: [], workflows: [] }

The transport layer reads from this registry at request time. Spin up a new agent after the server starts? It's immediately available over HTTP and WebSocket — no restart, no rewiring.
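
The self-registration pattern behind this is small. Here is a hypothetical miniature of it, with `MiniRegistry`, `MiniAgent`, and `handleRequest` invented for the example; the real registry also tracks teams and workflows, and the real transport is an HTTP/WebSocket server rather than a function.

```typescript
// Hypothetical minimal registry keyed by agent name.
class MiniRegistry {
  private agents = new Map<string, MiniAgent>();
  register(agent: MiniAgent): void {
    this.agents.set(agent.name, agent);
  }
  get(name: string): MiniAgent | undefined {
    return this.agents.get(name);
  }
  list(): string[] {
    return Array.from(this.agents.keys());
  }
}

const miniRegistry = new MiniRegistry();

class MiniAgent {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
    miniRegistry.register(this); // self-registration on construction
  }
  run(input: string): string {
    return `[${this.name}] ${input}`;
  }
}

// Stand-in for a transport handler: it resolves the agent by name on every
// request, so agents created after "server start" are found without a restart.
function handleRequest(agentName: string, input: string): string {
  const agent = miniRegistry.get(agentName);
  return agent ? agent.run(input) : `404: no agent named "${agentName}"`;
}

const responseBefore = handleRequest("late-bot", "hello"); // agent does not exist yet
new MiniAgent("late-bot");
const responseAfter = handleRequest("late-bot", "hello"); // found via the registry
console.log(responseBefore, responseAfter);
```

The key design choice is the lookup at request time: a handler that captured its agent list at startup would miss anything constructed later.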


Design Principles Worth Calling Out

  • Zero meta-framework dependency — works with any Node.js server or headless script
  • Optional peer dependencies — only bundle the providers you actually use
  • Pluggable everything — storage, models, vector stores, transport are all swappable
  • Safety built in: sandboxed subprocess execution and human-in-the-loop approval are available as opt-in settings per tool
  • Open protocol support: MCP for tool integration, A2A for agent-to-agent interoperability (no vendor lock-in)

Should You Use It?

If you're building AI agents in Node.js/TypeScript and you want:

  • Less boilerplate than LangChain
  • Real multi-layer memory without building it yourself
  • Voice and browser automation in the same framework
  • Lower token costs at scale (multi-turn conversations especially)
  • Production-grade retry, sandboxing, and approval flows

...then Agentium is worth a serious look.

The docs are at docs.agentium.in and the quickstart genuinely takes under five minutes.

GitHub: github.com/agentiumOs/agentium


Have you tried Agentium or another TypeScript agent framework? What's your experience been? Drop it in the comments.
