1. What is Helium Agent?
Helium Agent is a lightweight, terminal-focused AI agent written in Python.
Think Codex or OpenCode, but without the overhead. It's published on PyPI (
pip install helium-agent ), runs as helium . from any directory, and works
with any OpenAI-compatible LLM endpoint — cloud or local.
Core capabilities:
• General-purpose chat with tool calling
• Long-running coding tasks via an agentic loop
• Deep research with multi-source evidence collection and citation
• RAG (Retrieval-Augmented Generation) for file-based Q&A
• Persistent memory across sessions
• Hierarchical subagent delegation
• A plugin system via SKILL.md files
The design philosophy is minimum viable complexity — every feature is
implemented in the simplest way that works, with no speculative
abstractions.
2. Architecture Overview
The architecture is deliberately flat. There's no framework, no dependency
injection container, no event bus. Modules import each other directly (with
lazy imports where circular dependencies would otherwise occur). This is a
feature, not a bug — the codebase is navigable in a single sitting.
3. Design Choices
3.1 Prompt-Based Tool Calling
Decision: Helium does NOT use OpenAI's function calling API. Instead, the
LLM is instructed (via the system prompt) to output tool calls as
<action>{"tool": "...", "args": {...}}</action> XML tags.
Why:
• Model agnostic. Works with any LLM that can follow instructions — local
llama.cpp models, OpenRouter free tiers, fine-tuned models. No need for the
provider to support a specific tool-calling schema.
• Full control. The tool prompt is a plain string, editable without touching
code. Adding a new tool means adding a function and a description — no
schema generation, no API negotiation.
• Simpler debugging. The raw LLM output is human-readable. You can see
exactly what the model tried to do.
Trade-off: JSON extraction becomes fragile. The model might output slightly
malformed JSON, wrap it in markdown code blocks, or include extra text. This
led to extract_json() in utils/parser.py — a 75-line cascade of
fallbacks:
This works in practice but is a maintenance liability. Every new edge case from a new model means another fallback branch.
3.2 Dependency Injection in the Agentic Loop
Decision: AgenticLoop accepts two callables — ask_model and execute_tool_call — rather than importing the LLM and tool modules directly.
class AgenticLoop:
def __init__(self, ask_model, execute_tool_call, max_turns=6):
self.ask_model = ask_model
self.execute_tool_call = execute_tool_call
self.max_turns = max_turns
Why: This single decision enables the entire system's composability:
- The general chat loop uses the standard LLM and tool execution.
- The coding workflow (
/code) creates anAgenticLoopwith auto-approved tools and max_turns=30. -
Subagents create their own
AgenticLoopwith a filtered tool set (only the tools the parent allowed). -
Skills inject their SKILL.md body into the system prompt and run a fresh
AgenticLoop.
No subclasses, no strategy pattern, no configuration objects. Just two callables.
3.3 Global Mutable State (by design)
Decision: Several modules use module-level singletons:
-
conversation_history(list) incore/llm.py -
_managerintools/memory_ops.py -
_todo_listintools/todo_tools.py -
_managerintools/subagent_tools.py
Why: Helium is a single-user, single-threaded terminal agent. There is exactly one conversation, one memory store, one todo list at any time. Global state is the simplest representation of this reality.
Trade-off: This rules out concurrency. You can't run two subagents in parallel because they'd share the same conversation history and tool state. This is acceptable today but is the first thing that would need to change for async support.
3.4 Three-Tier Permission Model
- safe — auto-execute. Reads, searches, memory lookups, todo queries.
- risky — requires user confirmation. File writes, bash, app launching.
-
conditional (bash only) —
is_command_safe()inspects the command string.ls,cat,pwdare safe.rm,mv,chmodare risky.
The --nuclear / --auto-approve flag bypasses all checks. Useful for CI, dangerous for production.
Why this matters: The LLM can hallucinate tool calls. Without permission gates, a confused model could delete files or run arbitrary commands. The three-tier system lets harmless operations flow freely while blocking anything that could cause damage.
3.5 No SDK — Raw HTTP Only
Decision: All LLM communication uses requests.post() with manual SSE parsing. No OpenAI SDK, no httpx, no abstraction layer.
Why:
-
Fewer dependencies. The
requirements.txtstays small. Each dependency is a potential breakage point. - Full transparency. You can see exactly what's being sent to the API and what's coming back.
- Provider flexibility. Any endpoint that accepts OpenAI-format chat completions works. No SDK version pinning, no API compatibility matrix.
Trade-off: Manual SSE parsing is fiddly. stream_openrouter_response() in utils/check_llm_api.py handles chunked transfer encoding, data: [DONE] markers, and error responses. This is code that a well-tested SDK would handle for you.
3.6 SKILL.md Plugin System
Decision: Skills are markdown files with YAML frontmatter, discovered from ~/.config/helium-agent/skills/ and .helium/skills/.
---
name: caveman
trigger: /caveman
type: slash
description: "Respond like a caveman"
---
You are a caveman. Respond to everything in caveman speak.
Use grunts and simple words. No modern language.
Why:
- Zero code required. Anyone can create a skill by writing a markdown file.
-
Two types: Slash commands (triggered by
/name) and contextual skills (injected into the system prompt when relevant). -
Skill-scoped tools. A skill can declare
allowed_toolsto restrict what the LLM can do within that skill's context.
This is the simplest possible plugin system. No Python entry points, no registration APIs, no configuration files beyond the markdown itself.
4. Orchestration: How It All Fits Together
4.1 The Agentic Loop
The core of Helium is a turn-based loop that alternates between the LLM and the tool system:
Key parameters:
-
General chat:
max_turns = 6. Enough for a few tool calls without runaway loops. -
Coding workflow:
max_turns = 30. Long enough for multi-file edits with verification. -
Temperature:
0.3. Low enough for consistency, high enough for natural language. -
History:
MAX_HISTORY = 10. The last 10 messages are kept. Older ones are dropped.
The loop terminates when:
- The LLM responds without an
<action>tag (final answer). -
max_turnsis reached (timeout). - A tool call is invalid and the LLM can't recover.
4.2 The Research Pipeline
Deep research is the most complex orchestration in Helium. It's a multi-stage pipeline with iteration:
The dual-provider search (DuckDuckGo + SearxNG) is a redundancy measure. If one provider is down or rate-limited, the other fills the gap. The SourcePolicy scores URLs by source type (official docs > blogs > forums) to prioritize high-quality evidence.
4.3 The Subagent System
Added in the June 15 session, the subagent system enables hierarchical delegation:
Critical design decisions:
- Reuses AgenticLoop. No new execution engine. A subagent runs the same loop as the main agent, just with a filtered tool set.
-
Tool filtering at the manager layer.
_wrap_execute_tool()intercepts tool calls and rejects anything not in the subagent'sallowed_tools. The LLM doesn't know it's restricted. - IDs are unique, names aren't. You can have five "researcher" subagents with different IDs. This allows multiple instances of the same role.
-
Lazy imports.
tools/registry.pyimports subagent tools lazily to avoid a circular dependency chain (registry → subagent_tools → subagent_manager → llm → registry).
4.4 The Memory System
The three-layer approach means:
- Flat store handles keyword search well ("What's my preferred language?")
- Knowledge graph handles relationship queries ("What do I prefer?") via SPO triplets
- Conversation store provides session context without polluting long-term memory
All three share a single SQLite connection. The memory is "persistent" within a project directory (stored as memory.db), but not shared across projects.
5. Mistakes and How to Avoid Them
5.1 Circular Import Hell
What happened: When adding the subagent system, the import chain tools/registry.py → tools/subagent_tools.py → core/subagent_manager.py → core/llm.py → tools/registry.py created a circular dependency that crashed on startup.
The fix: Lazy imports in tools/registry.py:
def _create_subagent_lazy(*args, **kwargs):
from tools.subagent_tools import create_subagent
return create_subagent(*args, **kwargs)
How to avoid it:
- Draw the import graph before adding new modules.
- If A → B → C → A is unavoidable, break the cycle at the point with the fewest dependents (usually the leaf module).
- Consider a registry/observer pattern instead of direct imports for tool systems.
5.2 JSON Extraction Fragility
What happened: Different LLMs produce tool calls in subtly different formats. Some wrap JSON in markdown code blocks. Some include trailing commas. Some add explanatory text before or after the JSON. Each new model exposed a new edge case in extract_json().
The current state: A 75-line cascade of regex fallbacks. It works, but every new model is a potential breakage.
How to avoid it:
- If your LLM provider supports structured function calling, use it. Prompt-based tool calling is a fallback for models that don't support it.
- If you must use prompt-based calling, enforce a strict output format in the system prompt with examples, and reject anything that doesn't match.
- Consider a two-pass approach: first try to extract JSON, then if it fails, ask the LLM to reformat its response.
6. Key Takeaways
Simplicity scales. Global state, raw HTTP, prompt-based tool calling — these are "wrong" by enterprise standards but right for a single-user terminal agent. Choose the simplest solution that works for your actual use case.
Dependency injection is cheap composability. Two callables (
ask_model,execute_tool_call) gave Helium four distinct execution modes (chat, coding, subagent, skill) without a single subclass.The tool prompt IS the API. In prompt-based agents, the system prompt that describes available tools is as important as the code that implements them. Treat it as a first-class artifact.
Lazy imports are a design signal. If you need lazy imports to avoid circular dependencies, your module boundaries are wrong. Fix the architecture, not the import strategy.







Top comments (0)