DEV Community: Prashant Maurya

Hermes Agent's Kanban System Is the Most Underrated Feature in Open Source AI Agents

Prashant Maurya — Mon, 01 Jun 2026 04:29:32 +0000

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

When people talk about Hermes Agent, they talk about the Skills System and the persistent memory. Those are genuinely impressive. But there's a feature in the v0.12 "Tenacity Release" that I think deserves more attention: the Kanban multi-agent system.

This post is about what it actually does, why it matters, and why most agent frameworks haven't solved the problem it's solving.

The Problem: Agents That Don't Finish

Here's a pattern that anyone who's used AI agents on long tasks will recognize:

You give the agent a complex, multi-step task. It starts well. Then, somewhere in the middle, something breaks: a tool call fails, a subprocess hangs, the context window fills, or the model gets confused about state. The agent loops, produces garbage, or just stops. You come back an hour later to find it stuck, or finished with something completely wrong.

This isn't a model intelligence problem. It's a state management and fault tolerance problem. The agent has no durable record of what it's done, what's pending, and what failed. When something goes wrong, there's no recovery path.

Hermes's Kanban system is a direct answer to this.

What the Kanban System Is

The Kanban ships as a durable multi-agent task board — a structured queue of tasks with explicit state transitions, built-in fault tolerance, and automatic recovery.

Tasks on the board have states: todo, in_progress, blocked, done, failed. The board persists across restarts. Agents working on tasks emit heartbeats. If a heartbeat stops, the task is automatically reclaimed and either retried or escalated.

The key components:

Heartbeat monitoring — Every active task has a heartbeat timer. If an agent working on a task misses its heartbeat window (it crashed, hung, or the process died), the system detects this automatically.

Zombie detection — A "zombie" is an agent that stopped responding but didn't cleanly exit. The system detects zombie agents and reclaims their tasks rather than leaving them stuck in in_progress forever.

Auto-block on incomplete exit — If a task's assigned agent exits without marking the task done or failed, the board automatically moves the task to blocked state. Nothing silently falls through.

Per-task retries — Failed tasks can be configured to automatically retry up to N times before escalating. You set retry policy per task or per board.

Hallucination recovery — This one is subtle. When an agent produces output that contradicts its own task log (claims it completed a step it never ran), the board detects the inconsistency and flags it for review rather than silently marking the task done.
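To make the heartbeat, reclaim, and retry ideas concrete, here is a minimal Python sketch. It is not Hermes's implementation: the Board and Task classes, the 30-second window, and the rule that an exhausted retry budget moves a task to blocked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(str, Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    name: str
    max_retries: int = 2  # hypothetical per-task retry budget
    state: State = State.TODO
    attempts: int = 0
    last_heartbeat: float = 0.0


class Board:
    """Hypothetical in-memory board; a real one would persist its state."""

    def __init__(self, heartbeat_window: float = 30.0):
        self.heartbeat_window = heartbeat_window
        self.tasks: dict[str, Task] = {}

    def add(self, task: Task) -> None:
        self.tasks[task.name] = task

    def claim(self, name: str, now: float) -> None:
        task = self.tasks[name]
        task.state = State.IN_PROGRESS
        task.attempts += 1
        task.last_heartbeat = now

    def heartbeat(self, name: str, now: float) -> None:
        self.tasks[name].last_heartbeat = now

    def reclaim_stale(self, now: float) -> list[str]:
        """Reclaim in-progress tasks whose heartbeat window has lapsed."""
        reclaimed = []
        for task in self.tasks.values():
            if task.state is not State.IN_PROGRESS:
                continue
            if now - task.last_heartbeat <= self.heartbeat_window:
                continue
            # Requeue while retries remain; otherwise escalate by blocking.
            if task.attempts <= task.max_retries:
                task.state = State.TODO
            else:
                task.state = State.BLOCKED
            reclaimed.append(task.name)
        return reclaimed


board = Board(heartbeat_window=30.0)
board.add(Task("write-tests", max_retries=1))
board.claim("write-tests", now=0.0)
board.heartbeat("write-tests", now=20.0)
first = board.reclaim_stale(now=40.0)    # last heartbeat was 20s ago: still alive
second = board.reclaim_stale(now=100.0)  # 80s of silence: treated as a zombie
```

Time is passed in explicitly rather than read from the clock, which keeps the reclaim rule easy to test.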

The `/goal` Command: Staying on Target

Alongside Kanban, the v0.12 release added /goal — what the docs call the "Ralph loop."

/goal Ship the auth module with tests and a PR by end of session

This keeps the agent locked on a target across turns. Instead of each message being independently interpreted, every subsequent action is evaluated against the declared goal. The agent won't drift — if a sub-task would take it away from the goal, it recognizes this and gets back on track.

Combined with Kanban, this means:

1. You declare a goal
2. Hermes decomposes it into a Kanban board of tasks
3. Subagents pick up tasks and work on them in parallel
4. Failed tasks get retried; zombie agents get reclaimed; blocked tasks get escalated
5. The agent tracks progress against the original goal and knows when it's actually done

This is what "the agent finishes what it starts" looks like in practice.

Subagent Delegation: The Parallelism Layer

The Kanban system is most powerful when combined with Hermes's subagent delegation via the delegate_task tool.

A parent agent with a complex task can spawn up to 3 child agents by default (configurable), each with:

Isolated context (the subagent knows only what it needs to)
Restricted toolsets (it can only use the tools relevant to its task)
Its own terminal session (no file-state collisions between agents)

The parent agent coordinates — it doesn't do the work directly. It delegates, monitors progress via the Kanban board, handles escalations, and synthesizes results.

In practice, this looks like:

Parent: "Build a REST API with authentication, tests, and documentation"

→ Subagent 1: Implements the core API endpoints
→ Subagent 2: Writes integration tests
→ Subagent 3: Drafts API documentation

Parent: Monitors all three, handles merge conflicts, synthesizes final output

Without durable state management, parallel subagents are fragile — if one fails, you don't know which one, and recovery is manual. The Kanban board makes parallel execution safe by making task state explicit and recoverable.
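The point about knowing which subagent failed can be sketched in a few lines of Python. This is an illustration, not the delegate_task tool: run_subagent is a hypothetical stand-in that fails on purpose, and threads stand in for real child agents. Only the limit of three comes from the default mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CHILDREN = 3  # the default child limit described in this post


def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a subagent; raises to simulate a crash."""
    if "tests" in task:
        raise RuntimeError("test runner crashed")
    return f"finished: {task}"


def delegate(tasks: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Run tasks in parallel and record every outcome under its task name."""
    done: dict[str, str] = {}
    failed: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=MAX_CHILDREN) as pool:
        futures = {pool.submit(run_subagent, task): task for task in tasks}
        for future in as_completed(futures):
            name = futures[future]
            try:
                done[name] = future.result()
            except Exception as exc:
                # The failure stays attached to the task that caused it.
                failed[name] = str(exc)
    return done, failed


done, failed = delegate([
    "implement API endpoints",
    "write integration tests",
    "draft API documentation",
])
```

Because each result is keyed by task name, one failing child does not hide the other two results, and the parent knows exactly what to retry.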

Checkpoints v2: The Safety Net

Running parallel agents doing real work means real risk. A subagent making file changes can go wrong.

Hermes's Checkpoints v2 (also part of the Tenacity Release) handles this. Before any file mutation, the system automatically snapshots the working directory. The checkpoint_manager tracks these snapshots with real pruning — old checkpoints get cleaned up, not accumulated indefinitely.

If something goes wrong:

/rollback

That's it. You're back to before the last file-mutating operation. Combined with the Kanban board's task state, this means a failed multi-agent run doesn't leave you with a partially-mutated codebase in an unknown state.
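For intuition, the snapshot-before-mutation pattern with pruning can be sketched as follows. The CheckpointManager class here is a hypothetical stand-in written for this post, not Hermes's checkpoint_manager. Full directory copies and a keep-the-newest-N retention rule are assumptions; a real implementation may snapshot far more efficiently.

```python
import shutil
import tempfile
from pathlib import Path


class CheckpointManager:
    """Hypothetical sketch: copy the working directory before each write."""

    def __init__(self, workdir: Path, store: Path, keep: int = 3):
        self.workdir = workdir
        self.store = store  # must live outside workdir
        self.keep = keep
        self.snapshots: list[Path] = []
        self._counter = 0

    def snapshot(self) -> Path:
        self._counter += 1
        target = self.store / f"ckpt-{self._counter:04d}"
        shutil.copytree(self.workdir, target)
        self.snapshots.append(target)
        # Prune: delete the oldest snapshots beyond the retention limit.
        while len(self.snapshots) > self.keep:
            shutil.rmtree(self.snapshots.pop(0))
        return target

    def write(self, relpath: str, text: str) -> None:
        """Snapshot first, then mutate."""
        self.snapshot()
        (self.workdir / relpath).write_text(text)

    def rollback(self) -> None:
        """Restore the state from before the last mutating operation."""
        latest = self.snapshots.pop()
        shutil.rmtree(self.workdir)
        shutil.copytree(latest, self.workdir)
        shutil.rmtree(latest)


root = Path(tempfile.mkdtemp())
workdir, store = root / "work", root / "ckpts"
workdir.mkdir()
store.mkdir()
(workdir / "app.py").write_text("v1")

mgr = CheckpointManager(workdir, store, keep=2)
mgr.write("app.py", "v2 (broken)")
mgr.rollback()
restored = (workdir / "app.py").read_text()
```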

Gateway Auto-Resume: Surviving Restarts

One more piece of the reliability picture: gateway auto-resume.

In previous versions, if the Hermes gateway process restarted (server reboot, OOM kill, network drop), all in-progress agent sessions were lost. You'd have to restart tasks manually.

With the Tenacity Release, the gateway automatically resumes interrupted sessions after restart. The Kanban board state is persisted, in-progress tasks get reclaimed, and the agent picks up roughly where it left off.

This matters more than it sounds for anyone running Hermes on a VPS or in a container. Process crashes happen. An agent system that survives them gracefully is a different category of tool than one that needs babysitting.
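The core of restart recovery is simple to sketch: keep task state on disk, and on startup requeue anything that was still marked in_progress. The example below uses Python's sqlite3 module. The table layout and the open_board, set_state, and resume helpers are assumptions for illustration, not the gateway's actual schema or API.

```python
import os
import sqlite3
import tempfile


def open_board(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks "
        "(name TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def set_state(conn: sqlite3.Connection, name: str, state: str) -> None:
    with conn:  # commits on success, so the state survives a crash
        conn.execute(
            "INSERT OR REPLACE INTO tasks (name, state) VALUES (?, ?)",
            (name, state),
        )


def resume(conn: sqlite3.Connection) -> list[str]:
    """After a restart, requeue tasks whose worker died mid-run."""
    with conn:
        rows = conn.execute(
            "SELECT name FROM tasks WHERE state = 'in_progress' ORDER BY name"
        ).fetchall()
        conn.execute(
            "UPDATE tasks SET state = 'todo' WHERE state = 'in_progress'"
        )
    return [name for (name,) in rows]


path = os.path.join(tempfile.mkdtemp(), "board.db")
conn = open_board(path)
set_state(conn, "build-api", "done")
set_state(conn, "write-docs", "in_progress")
conn.close()  # simulate the gateway process dying

conn = open_board(path)  # simulate the restart
requeued = resume(conn)
```

Finished work is left alone; only the task that was mid-flight when the process died goes back into the queue.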

Why This Architecture Is Rare

Most agent frameworks don't have an equivalent answer to durable multi-agent task management. Here's why:

The research community optimizes for single-agent performance. Benchmarks are almost all single-agent: can the agent solve this coding problem, answer this question, complete this task. Multi-agent coordination with fault tolerance is an engineering problem, not a benchmark problem.

Durable state is hard. Most frameworks store task state in memory or simple files. Real durability — heartbeat monitoring, zombie detection, restart recovery — requires more infrastructure investment than most open source projects make.

The failure modes are subtle. An agent that fails loudly is easy to fix. An agent that succeeds incorrectly — marks a task done when it hallucinated the last step — is hard to detect without explicit verification. Most frameworks don't have hallucination recovery in their task management layer.

Hermes is, to my knowledge, the only open source agent framework that ships all of these in a single installable package.

When to Use the Kanban System

Kanban plus subagent delegation is overkill for simple tasks. Use it when:

The task takes more than 20–30 minutes to complete
The task has multiple independent subtasks that can run in parallel
You're running unattended (scheduled cron, overnight batch)
The cost of partial completion and unknown state is high (production deployments, large codebases)
You need a clear audit trail of what happened

For conversational tasks, quick lookups, or one-off automations, just use regular Hermes chat. The Kanban is for the serious workloads.

Putting It Together

# Start a multi-agent project
/goal Build a complete user authentication module: JWT, refresh tokens, tests, docs

# Hermes decomposes into Kanban tasks, spawns subagents, monitors progress
# You can check status at any point
/kanban status

# If something fails, check what happened
/kanban log

# Roll back if needed
/rollback

The v0.12 "Tenacity Release" shipped 864 commits, 588 merged PRs, and closed 282 issues (including 13 P0s and 36 P1s). The Kanban system is the centerpiece, but the security wave (WhatsApp rejecting strangers by default, Discord role-allowlists, redaction on by default) and Google Chat as the 20th platform are also worth noting.

The name "Tenacity" is accurate. This release is about making the agent finish what it starts, survive what it can't prevent, and be honest about what went wrong.

That's a harder problem than raw capability — and it's the one that actually matters for production use.

Get started:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Docs: Subagent Delegation · GitHub Release Notes

Hermes Agent vs. The Rest — An Honest Comparison of Open Agentic Frameworks in 2026

Prashant Maurya — Mon, 01 Jun 2026 04:28:19 +0000

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

The agent framework space has exploded. AutoGen, CrewAI, LangGraph, OpenAI Agents SDK, Google ADK — each week brings something new. It's genuinely hard to know what to actually use.

This post compares Hermes Agent against the most popular alternatives across five dimensions that actually matter for developers building real things: infrastructure flexibility, memory/learning, tool ecosystem, messaging/deployment, and openness. No fluff — just an honest breakdown.

The Frameworks

Framework	Creator	License	Primary Model
Hermes Agent	Nous Research	MIT	Any (OpenAI-compatible)
AutoGen	Microsoft	MIT	Azure/OpenAI preferred
CrewAI	CrewAI Inc.	MIT	OpenAI preferred
LangGraph	LangChain Inc.	MIT	Any (LangChain integrations)
Google ADK	Google	Apache 2.0	Gemini preferred
OpenAI Agents SDK	OpenAI	MIT	GPT-4o/o-series

1. Infrastructure Flexibility

Where does the agent actually run, and how much does it cost you?

Hermes Agent offers six terminal backends: local, Docker, SSH, Daytona, Singularity (HPC clusters), and Modal (serverless). The SSH backend means you can run it on any remote machine you already have. The Modal backend means near-zero cost when idle. It runs on Linux, macOS, and WSL2 with zero prerequisites; the installer handles everything.

AutoGen is primarily a Python library. You run it wherever you run Python. No native packaging, no single-command setup, no serverless consideration built in. Flexible but manual.

CrewAI is similar — a Python framework. CrewAI+ (their cloud) manages deployment for you, but that's a paid managed service, not open infrastructure.

LangGraph has LangGraph Cloud for managed hosting (paid) and self-hosted options, but the self-hosting story involves more moving parts than you'd want for a quick project.

Google ADK is built for Cloud Run. If you're already in GCP, this is seamless. If you're not, the path to deployment involves more ceremony than it should.

OpenAI Agents SDK is designed to run in your existing Python environment. No particular infrastructure story — you bring your own.

Verdict: Hermes wins on infrastructure flexibility, especially for developers who want serverless-or-VPS without vendor commitment. Google ADK wins within GCP. Others require more DIY deployment work.

2. Memory and Learning Over Time

This is the dimension where frameworks differ most dramatically.

Hermes Agent has three memory layers working together: a Skills System (procedural memory in inspectable markdown files), persistent cross-session memory (FTS5 search + LLM summarization), and Honcho dialectic user modeling. The Autonomous Curator runs on a 7-day cycle to consolidate, prune, and update the skill library automatically. The agent creates its own skills after complex tasks without prompting.

AutoGen has ConversableAgent with basic message history. There's no native cross-session memory — you manage persistence yourself. Community extensions exist but aren't core.

CrewAI added long-term memory via LongTermMemory, short-term via ShortTermMemory, and entity memory. It's the most structured memory system among the Python frameworks, but it's still session-bound by default and doesn't self-improve procedurally.

LangGraph supports memory through LangMem and persistence layers. The developer controls what's stored and recalled. Flexible but requires explicit engineering work to get compound learning.

Google ADK has session state and memory tools. Designed for stateful multi-turn conversations within a session. Cross-session persistence requires connecting to Firestore or another backend yourself.

OpenAI Agents SDK ships with a basic memory tool and context objects. No autonomous learning or self-improvement.

Verdict: Hermes has the most sophisticated and autonomous memory/learning system. CrewAI has the most structured memory among Python frameworks. Others require significant manual engineering to achieve comparable results.

3. Tool Ecosystem

How easily can the agent do things — and what things can it do?

Hermes Agent ships with a broad built-in tool registry: web search, browser automation (Browserbase, Browser Use, local Chrome), terminal execution, file editing, memory operations, subagent delegation, code execution (sandboxed Python RPC), image generation (9 models), voice/TTS, Home Assistant, X/Twitter search, computer use, and vision analysis. MCP servers add any tool from the MCP ecosystem. The Skills Hub adds 200+ site-specific browser automation skills from browse.sh. Channel-level skill bindings let you configure which tools are available per platform.

AutoGen has a solid function-calling framework. You define tools as Python functions and register them. No built-in tool registry — you build what you need.

CrewAI has a @tool decorator pattern and a growing library of built-in tools (web search, file operations, code execution). More turnkey than AutoGen.

LangGraph inherits LangChain's enormous tool ecosystem. If a tool exists in LangChain, it works in LangGraph. The breadth is unmatched — but so is the complexity of managing all the integrations.

Google ADK has deep Google service integration (Search, Maps, Drive, Calendar, Gmail via MCP) and good built-in tool primitives. Non-Google integrations require more work.

OpenAI Agents SDK has function tools, hosted tools (web search, code interpreter, file search via OpenAI's own infrastructure), and handoffs. Clean but tightly coupled to OpenAI's platform.

Verdict: LangChain's ecosystem via LangGraph is the broadest in terms of raw number of integrations. Hermes wins on built-in breadth without configuration — everything from browser automation to image generation is ready without extra packages. Google ADK wins within the Google ecosystem. OpenAI Agents SDK is cleanest but most closed.

4. Messaging and Deployment

Can your agent talk to you where you actually are?

Hermes Agent supports 20 messaging platforms via a gateway: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Teams, Email, SMS, Mattermost, DingTalk, Feishu, Google Chat, and more. The gateway is a plugin host — new platform adapters can be dropped in. Everything runs from a single gateway process. Voice memos, cross-platform conversation continuity, slash commands on every platform. Built-in cron scheduler delivers results to any platform on a schedule.

AutoGen has no native messaging integration. You build whatever delivery mechanism you want.

CrewAI has no native messaging platform support. CrewAI+ exposes an API you can call from anywhere, but the "talk to your agent from Telegram" story is DIY.

LangGraph is in the same position: no native messaging. You'd build this yourself using LangChain's integrations.

Google ADK integrates with Google Chat and has Vertex AI deployment. Within Google Workspace, this is excellent. Outside it, less so.

OpenAI Agents SDK has no native messaging integration. OpenAI's products (ChatGPT, etc.) handle this separately.

Verdict: Hermes is by far the strongest here. 20 supported platforms, single gateway process, cron + delivery built in. If you want your agent reachable from your phone without building infrastructure, Hermes is the only framework where this is a first-class feature.

5. Openness and Portability

Can you actually own and move your agent?

Hermes Agent is MIT. No model lock-in — works with any OpenAI-compatible endpoint. Skills are markdown files in ~/.hermes/skills/ — portable, inspectable, version-controllable. Memory is local SQLite. The agentskills.io open standard means skills work across compatible agents. No telemetry, no tracking, all data local.

AutoGen is MIT and model-agnostic. Your code is yours. No proprietary data formats.

CrewAI has MIT core, but CrewAI+ (cloud features) is commercial. Skills/crews are Python code — portable but not as readable as markdown.

LangGraph is MIT. LangSmith (tracing/evaluation) and LangGraph Cloud are commercial. Framework is portable; the ecosystem increasingly nudges toward their paid products.

Google ADK is Apache 2.0. Model preference is clearly Gemini. Cloud Run deployment creates GCP coupling if you're not careful.

OpenAI Agents SDK is MIT, but practically everything interesting (hosted tools, traces, evals) requires OpenAI's platform. Most locked-in of the group.

Verdict: AutoGen and Hermes are most open in practice. OpenAI Agents SDK is most closed. Others sit somewhere in between, with commercial upsell pressure present to varying degrees.

Summary Table

Dimension	Hermes	AutoGen	CrewAI	LangGraph	Google ADK	OpenAI SDK
Infrastructure flexibility	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Memory & learning	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐
Tool ecosystem	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Messaging & deployment	⭐⭐⭐⭐⭐	⭐	⭐	⭐	⭐⭐⭐	⭐
Openness & portability	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐

Who Should Use What

Use Hermes if: You want a general-purpose agent you fully control, that improves over time, that you can reach from your phone, and that runs on infrastructure you own. Best for individual developers and small teams building personal or project-level automation.

Use AutoGen if: You're doing research or building multi-agent systems where you want maximum programmatic control over agent interaction patterns. Better for academic or experimental work.

Use CrewAI if: You want a structured role-based multi-agent system and the crew metaphor maps naturally to your problem. Good for pipelines where agents have clear, distinct jobs.

Use LangGraph if: You need the breadth of the LangChain ecosystem and want graph-based control flow for complex stateful workflows. Best when you need a specific integration that only LangChain has.

Use Google ADK if: You're building on GCP and want deep Google service integration. The deployment story is excellent within that ecosystem.

Use OpenAI Agents SDK if: You're already invested in OpenAI's platform and want the cleanest, most polished developer experience within that ecosystem. Accept the lock-in as a tradeoff.

The Real Differentiator

Most frameworks solve "can the agent do the task." Hermes solves "will the agent still be useful six months from now without you constantly re-explaining your context."

That's a different problem, and it matters more the longer you use an agent. The skill library, the Curator, the persistent memory — these compound. The other frameworks generally don't have an equivalent answer to this question.

Whether that matters to you depends on whether you're building a one-off demo or a long-running workflow. For the latter, Hermes's architecture is genuinely ahead.

Try Hermes:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Documentation · GitHub

Hermes Agent's Brain: How Its Skills & Memory System Actually Works

Prashant Maurya — Sun, 31 May 2026 12:24:55 +0000

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

Most AI agents have a dirty secret: they forget everything the moment the session ends.

You explain your project once. Then again next time. And again. The agent never gets better at your workflow — it just stays a general-purpose tool that happens to be smart.

Hermes Agent is built differently. It ships with two systems that together form something closer to a genuine long-term memory: a Skills System and a Persistent Memory layer. This post digs into how they actually work — not the marketing summary, but the mechanics.

The Problem With Stateless Agents

Before getting into Hermes, it's worth understanding what problem this solves.

Standard LLM-based agents operate inside a context window. Everything the agent knows during a session lives in that window. When the session ends, it's gone. The next time you open a conversation, you're talking to an agent with no memory of you, your codebase, your preferences, or the workflows you've developed together.

Some tools patch this with naive "memory" — they dump a text blob of past conversations into the system prompt. This works up to a point, but it's not selective, it gets expensive as context grows, and it doesn't help the agent get better at tasks — just recall facts.

Hermes takes a different approach with two distinct systems serving different purposes.

System 1: The Skills System (Procedural Memory)

Skills in Hermes aren't plugins you install. They're on-demand knowledge documents — markdown files the agent loads when it needs them, and more importantly, creates on its own when it discovers something worth remembering.

The SKILL.md Format

Every skill is a structured markdown file with a YAML frontmatter header:

---
name: deploy-runbook
description: Our deployment runbook — services, rollback, Slack channels
version: 1.0.0
metadata:
  hermes:
    tags: [deployment, runbook, internal]
    requires_toolsets: [terminal]
---

# Deploy Runbook

## When to Use
Trigger conditions for this skill.

## Procedure
1. Step one
2. Step two

## Pitfalls
- Known failure modes and fixes

## Verification
How to confirm it worked.

The structure is deliberate. It teaches the agent when to use the skill, how to execute it, what can go wrong, and how to verify success. That's not documentation — that's an executable procedure.

Progressive Disclosure: How Skills Load Efficiently

Here's where it gets clever. Skills don't get dumped into the context window all at once. They use a progressive disclosure pattern across three levels:

Level 0: skills_list()          → [{name, description, category}, ...]   (~3k tokens)
Level 1: skill_view(name)       → Full content + metadata                 (varies)
Level 2: skill_view(name, path) → Specific reference file                 (varies)

The agent first sees just names and descriptions — a light index. It only loads the full skill content when it actually needs to. And within a skill, supporting reference files are only fetched at level 2 when specifically required.

This means you can have 50+ skills installed and the token overhead is minimal — only what's relevant gets loaded per task.
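Here is a rough Python sketch of the three levels, to show why the index stays cheap. It assumes one directory per skill containing a SKILL.md plus optional reference files. That layout, the read_frontmatter helper, and the function bodies are illustrative guesses rather than Hermes's code. The index here also omits the category field.

```python
import tempfile
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse top-level `key: value` pairs between the first two '---' lines."""
    _, header, _ = text.split("---", 2)
    meta = {}
    for line in header.splitlines():
        if line and not line.startswith(" ") and ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


def skills_list(root: Path) -> list[dict[str, str]]:
    """Level 0: a light index of names and descriptions only."""
    index = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text())
        index.append({"name": meta["name"], "description": meta["description"]})
    return index


def skill_view(root: Path, name: str, path: str = "SKILL.md") -> str:
    """Level 1 (default path) or level 2 (a specific reference file)."""
    return (root / name / path).read_text()


root = Path(tempfile.mkdtemp())
skill = root / "deploy-runbook"
(skill / "references").mkdir(parents=True)
(skill / "SKILL.md").write_text(
    "---\n"
    "name: deploy-runbook\n"
    "description: Deployment steps and rollback\n"
    "---\n"
    "# Deploy Runbook\n"
)
(skill / "references" / "rollback.md").write_text("Rollback notes")

index = skills_list(root)                                           # level 0
full = skill_view(root, "deploy-runbook")                           # level 1
ref = skill_view(root, "deploy-runbook", "references/rollback.md")  # level 2
```

Only the level 0 call touches every skill, and it returns a few short strings per skill. The full body and its reference files are read only when asked for by name.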

The Agent Creates Its Own Skills

This is the most underrated part. Hermes automatically creates new skills through the skill_manage tool when:

It completes a complex task (5+ tool calls) successfully
It had to recover from errors and found the working path
You corrected its approach mid-task
It discovers a non-trivial multi-step workflow
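The first three triggers are mechanical enough to write down as a predicate. The sketch below is my own reading of them, with a hypothetical TaskOutcome record. The fourth trigger is a judgment call the model makes, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    """Hypothetical summary of a finished task."""
    tool_calls: int
    succeeded: bool
    recovered_from_error: bool = False
    user_corrected: bool = False


def should_create_skill(outcome: TaskOutcome, min_tool_calls: int = 5) -> bool:
    """True if any of the first three triggers listed above applies."""
    complex_success = outcome.succeeded and outcome.tool_calls >= min_tool_calls
    return complex_success or outcome.recovered_from_error or outcome.user_corrected


quick_lookup = should_create_skill(TaskOutcome(tool_calls=1, succeeded=True))
long_deploy = should_create_skill(TaskOutcome(tool_calls=8, succeeded=True))
corrected = should_create_skill(
    TaskOutcome(tool_calls=2, succeeded=True, user_corrected=True)
)
```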

The skill_manage tool has targeted actions:

Action	Use
`create`	New skill from scratch
`patch`	Targeted fix (preferred — more token-efficient)
`edit`	Major structural rewrite
`delete`	Remove entirely
`write_file`	Add supporting reference files

In practice, this means the agent gets better at your specific environment over time. If you have a particular deployment process, a quirky internal API, or a custom build system — after you walk through it once, the agent writes that down as a skill. Next time, it just uses the skill.

Slash Commands: Skills as First-Class UX

Every installed skill is automatically available as a slash command:

/deploy-runbook                    # loads the skill and asks what you need
/github-pr-workflow create a PR for the auth refactor
/plan design a rollout for migrating our auth provider

You can invoke them from the CLI, Telegram, Discord — any platform Hermes is connected to.

The Skills Hub: A Community Ecosystem

Beyond agent-created skills, there's a whole ecosystem of installable skills from multiple sources:

hermes skills search kubernetes
hermes skills install openai/skills/k8s
hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
hermes skills install https://sharethis.chat/SKILL.md   # direct URL

Supported sources include skills.sh (Vercel's directory), well-known endpoints, GitHub repositories, ClawHub, LobeHub, and browse.sh (200+ site-specific browser automation skills for Airbnb, Amazon, arXiv, etc.).

All hub-installed skills go through a security scanner checking for data exfiltration, prompt injection, and destructive commands before installation.

Skill Bundles: Task Profiles

When you always need the same set of skills together, bundle them:

hermes bundles create backend-dev \
  --skill github-code-review \
  --skill test-driven-development \
  --skill github-pr-workflow \
  -d "Backend feature work — review, test, PR workflow"

Then /backend-dev refactor the auth middleware loads all three skills at once. You can ship team-wide task profiles by checking the bundle YAML into a shared dotfiles repo.

System 2: Persistent Memory (Episodic + Semantic Memory)

Where the Skills System handles how to do things, the Memory System handles what it knows about you and your context.

Hermes uses FTS5 full-text search with LLM summarization for cross-session recall. But the more interesting part is how it decides what to remember.
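If you have not used FTS5 before, it is SQLite's built-in full-text index. The snippet below shows the general mechanism with Python's sqlite3 module, assuming your SQLite build includes FTS5 (most do). The memories table, its columns, the sample rows, and the recall helper are invented for illustration. Hermes's real schema may look nothing like this, and the LLM summarization step is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per remembered note, indexed for full-text search.
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(session, content)")
conn.executemany(
    "INSERT INTO memories (session, content) VALUES (?, ?)",
    [
        ("2026-05-01", "Staging deploys go through the blue-green script"),
        ("2026-05-03", "User prefers pytest over unittest for new tests"),
        ("2026-05-09", "Rollback requires draining the staging queue first"),
    ],
)


def recall(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return matching memories, ordered by FTS5's built-in relevance rank."""
    return conn.execute(
        "SELECT session, content FROM memories "
        "WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


hits = recall("staging")
```

The query matches both notes that mention staging, regardless of case, and skips the unrelated one. That selectivity is what separates this from pasting a blob of past conversations into the prompt.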

The Curator: Selective Memory Formation

Not every conversation detail is worth persisting. Hermes has a "Curator" component that periodically reviews recent interactions and decides what's actually worth storing long-term. This is closer to how human memory consolidation works — important, repeated, or explicitly notable information gets retained; noise gets discarded.

The agent also nudges itself to persist knowledge — meaning it's not purely passive. When it recognizes it's learned something worth keeping, it actively writes a memory entry rather than waiting for the Curator's next pass.

Honcho: Dialectic User Modeling

The third piece of the memory architecture is integration with Honcho, which Hermes uses for what the docs call "dialectic user modeling."

Rather than just storing facts about you as a flat list, Honcho builds a structured model of who you are — your working style, your preferences, the kind of errors you typically make, what you care about. This model updates through interaction, not just through explicit "remember this" commands.

The practical result: the agent's responses adapt to you over time without you having to constantly re-explain your context.

Why This Architecture Matters

Here's what separates this from "we added memory" marketing:

1. Skills are inspectable and editable. They're markdown files in ~/.hermes/skills/. You can read them, edit them, delete them. There's no black box — you can see exactly what procedural knowledge the agent has built up.

2. The agent improves on the right signal. It creates skills after complex multi-step tasks, after errors, after corrections — not after every conversation. This keeps the skill library focused on non-trivial knowledge.

3. Memory and skills serve different purposes. Skills are for procedures and workflows. Memory is for facts, preferences, and context. Mixing them up is a common mistake in agent design. Hermes keeps them separate.

4. The ecosystem is open. The agentskills.io standard means skills are portable across compatible agents. Publishing a tap is just pushing to a GitHub repo. No lock-in.

The Real Test: Does It Actually Get Better?

The honest answer is: it depends on how you use it.

If you just ask one-off questions, you won't see much difference from a stateless agent. The memory and skills systems only compound value over repeated, complex interactions where procedures are worth encoding.

But if you're using Hermes for a real project — deploying code, managing a workflow, running research pipelines — the self-improving loop starts to show up. After a week of use, the agent's skills directory fills with your actual workflows, not generic templates.

That's the bet Nous Research is making with Hermes: that the future of useful AI agents isn't a smarter model, it's an agent that gets smarter at your specific context over time.

Whether that bet pays off depends on how much the skills and memory systems can actually automate the knowledge capture process — reducing the burden on you to explicitly teach the agent things. Based on the architecture, the foundation is sound. The proof is in extended use.

Getting Started

# Install (Linux / macOS / WSL2)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Setup with Nous Portal (covers model + web search + image gen + TTS)
hermes setup --portal

# Browse available skills
hermes skills browse

# Start a session and ask Hermes to teach itself your workflow
hermes chat

Full documentation: hermes-agent.nousresearch.com/docs

If you found this breakdown useful, the Skills Hub is worth exploring — the community ecosystem is growing fast.

SparshAI: I Built an Offline AI Tutor for Students Using Gemma 4 — Here's What Happened

Prashant Maurya — Sun, 24 May 2026 18:28:40 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

There is a district in Uttar Pradesh called Sonbhadra.

It sits in the southernmost corner of the state, surrounded by forests and hills.
It is one of India's most tribal, most remote, and most underserved districts.
Mobile signals disappear between villages. Internet is not something you plan
around — it is something you hope for.

I am a student at IIT Jodhpur. Sonbhadra is where I come from.

Every time I go back home, I carry two things with me — the education I am
getting at one of India's top institutions, and the quiet guilt of knowing
that most kids from my area will never have access to what I have.

This time, I decided to try and do something about it.

The Problem Nobody Talks About

People talk about the digital divide all the time. But the conversation usually
focuses on devices — "give students smartphones" or "build more computer labs."

That misses the deeper problem.

In Sonbhadra, even when a student has a device, consistent internet is not
available. 4G signal is weak and patchy. Broadband does not exist in most
villages. Mobile data runs out. And even when the internet works, it works
in bursts — five minutes here, ten minutes there.

Cloud-based AI tools like ChatGPT are simply not an option in this reality.
You cannot have a tutoring session that depends on a connection that might
disappear mid-sentence.

The other problem is language. Most educational AI tools respond only in
English. The students I grew up with are smart and curious, but they think
in Hindi. An AI that cannot meet them in their own language is an AI that
cannot help them.

These two problems — internet dependency and language barrier — are what
SparshAI was built to solve.

What Is SparshAI?

SparshAI is a local AI tutoring system that runs entirely on a single laptop,
with no internet connection required after the initial setup.

The name comes from the Hindi word "Sparsh" — which means touch, or connection.
That is exactly what this project is about: creating a connection between
students who have been left behind and the knowledge they deserve access to.

The idea is simple. One laptop sits in a school or community center. Students
gather around it, or connect to it over a basic local WiFi network. They type
their questions — in Hindi, in English, or in a mix of both. SparshAI answers
them, patiently, clearly, in whatever language they used.
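
One way to make "answer in whatever language they used" explicit is to look
at the script of the question before it reaches the model. The sketch below is
an illustration, not SparshAI's actual code: the function names, the
thresholds, and the instruction strings are all assumptions. It counts how many
letters fall in the Unicode Devanagari block (U+0900 to U+097F) and picks a
reply-language hint from that share.

```python
DEVANAGARI_START, DEVANAGARI_END = "\u0900", "\u097f"


def devanagari_ratio(text: str) -> float:
    """Return the share of letters in `text` that are Devanagari."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hindi = sum(DEVANAGARI_START <= ch <= DEVANAGARI_END for ch in letters)
    return hindi / len(letters)


def reply_language(text: str, high: float = 0.7, low: float = 0.1) -> str:
    """Classify a question as 'hindi', 'english', or 'mixed'.

    The thresholds are hypothetical and would need tuning on real questions.
    """
    ratio = devanagari_ratio(text)
    if ratio >= high:
        return "hindi"
    if ratio <= low:
        return "english"
    return "mixed"


# Hypothetical instruction strings that could be prepended to the model prompt.
INSTRUCTIONS = {
    "hindi": "Answer in simple Hindi written in Devanagari.",
    "english": "Answer in simple English.",
    "mixed": "Answer in the same Hindi-English mix the student used.",
}

if __name__ == "__main__":
    questions = [
        "प्रकाश संश्लेषण क्या है?",
        "What is photosynthesis?",
        "photosynthesis क्या है?",
    ]
    for q in questions:
        lang = reply_language(q)
        print(lang, "->", INSTRUCTIONS[lang])
```

A script check like this only sees Devanagari. Hindi typed in Latin letters
would be classified as English, so a real deployment would need something
more than this for romanized input.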

No internet. No monthly fees. No cloud. No data leaving the room.

Why Gemma 4 Made This Possible

I had thought about building something like this before. The problem was always
the model. Local AI models that were capable enough for real tutoring were too
large to run on affordable hardware. Models small enough to run locally were
too weak to give useful explanations.

Gemma 4 changed that equation completely.

Google's Gemma 4 is an open model family — meaning anyone can download and run
it locally, for free. But what makes it genuinely special is the range of sizes
it comes in, and how capable even the smaller models are.

The Gemma 4 family comes in three main tiers:

The E2B and E4B models are built for edge devices — phones, low-RAM laptops,
even a Raspberry Pi. They are small, efficient, and designed to run without a GPU.

The 31B Dense model is a full-power model for high-end machines — great
quality, but needs serious hardware.

The 27B MoE model is built for speed and reasoning, best suited for GPU setups.

For SparshAI, I chose the E4B model — the 4 billion parameter variant.
This was not a default choice. It was a deliberate one.

Here is my reasoning: the schools and community centers in Sonbhadra that
could realistically host a setup like this would have access to a basic
second-hand laptop — something with 8GB of RAM and no dedicated graphics card.
That is the hardware reality on the ground.

The E2B model, while even smaller, does not give deep enough explanations for
real academic concepts. I tested both. E2B answers are often too surface-level
for a student genuinely trying to understand something.

The 31B model gives richer answers, but it needs hardware that costs three to
four times more. That puts it out of reach for the use case I was designing for.

E4B sits exactly in the middle. Capable enough to explain photosynthesis,
Newton's laws, fractions, grammar concepts, and historical events in meaningful
depth. Small enough to run smoothly on an ₹18,000 second-hand laptop with no GPU.
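
The hardware argument can be checked with rough arithmetic. The sketch below
assumes, hypothetically, that the model weights are stored in a quantized
format and that about 3 GB of the laptop's RAM is reserved for the operating
system and the model's working memory. Neither number comes from the SparshAI
setup. It also takes the 4 billion and 31 billion parameter counts at face
value.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion: float, bits_per_weight: int,
                ram_gb: float = 8.0, reserved_gb: float = 3.0) -> bool:
    """True if the weights fit in what is left after the assumed reserve."""
    budget = ram_gb - reserved_gb
    return weight_memory_gb(params_billion, bits_per_weight) <= budget


if __name__ == "__main__":
    for name, size in [("E4B", 4.0), ("31B", 31.0)]:
        for bits in (4, 8, 16):
            gb = weight_memory_gb(size, bits)
            ok = fits_in_ram(size, bits)
            print(f"{name} at {bits}-bit: {gb:.1f} GB of weights, fits: {ok}")
```

Under these assumptions, a 4 billion parameter model at 4 bits per weight
needs about 2 GB for its weights, while a 31 billion parameter model needs
about 15.5 GB at the same precision, which is more than the laptop's entire
memory.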

That is intentional model selection. Not picking what sounds most impressive —
picking what actually works for the people you are building for.

The LENTERA Inspiration

While researching how others had approached this problem, I came across a project
called LENTERA, which was built during the Gemma 3n Impact Challenge for remote
schools in Indonesia.

Their core insight stopped me in my tracks.

LENTERA found that in educational settings, students tend to ask the same
questions repeatedly. "What is photosynthesis?" gets asked by a new student
every single day. If you make the AI regenerate that answer from scratch every
time, you waste time and processing power unnecessarily.

Their solution was intelligent caching — storing answers to common questions
locally so that repeat queries get instant responses, and the model only works
hard on genuinely new questions. This reduced their response time from 90
seconds down to under 1 second for common queries.

I built this same principle into SparshAI. The result is that the most
frequently asked questions (basic science concepts, grammar rules, math
fundamentals) are answered almost instantly. The system gets faster the more
it is used, because it builds up a local library of answers that are relevant
to that specific school's students.
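
The caching idea can be sketched in a few lines. This is a minimal
illustration under stated assumptions, not LENTERA's or SparshAI's actual
implementation: the `AnswerCache` class, the `normalize` helper, and the JSON
file layout are all hypothetical, and `generate` stands in for whatever
function calls the local model.

```python
import json
import unicodedata
from pathlib import Path
from typing import Callable


def normalize(question: str) -> str:
    """Fold case, drop punctuation, and collapse whitespace into a cache key."""
    text = unicodedata.normalize("NFKC", question).casefold()
    # Replace punctuation (including the Devanagari danda) with spaces.
    # Unicode categories are used so that Hindi vowel signs are left intact.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(text.split())


class AnswerCache:
    """Exact-match answer cache stored as a JSON file on the local disk."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, str] = {}
        if path.exists():
            self.entries = json.loads(path.read_text(encoding="utf-8"))

    def answer(self, question: str,
               generate: Callable[[str], str]) -> tuple[str, bool]:
        """Return (answer, was_cached); call `generate` only on a cache miss."""
        key = normalize(question)
        if key in self.entries:
            return self.entries[key], True
        result = generate(question)  # slow path: run the local model
        self.entries[key] = result
        self.path.write_text(
            json.dumps(self.entries, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        return result, False
```

A lookup like this only hits when two questions normalize to the same text.
"What is photosynthesis?" and "Explain photosynthesis" would still be treated
as different questions, so a fuller system would need some form of similarity
matching on top.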

This felt right for Sonbhadra specifically. The NCERT curriculum is standardized
across India. Class 8 students in Sonbhadra ask the same questions as Class 8
students anywhere else. A cached answer to "What is the water cycle?" is just
as useful the hundredth time as the first.

What I Actually Tested

I brought a working version of SparshAI back to Sonbhadra during my last visit.
I set it up in a room with five students between the ages of 12 and 16.

I want to be honest about what this was. It was not a formal study. It was not
a controlled experiment. It was five curious kids, a laptop, and an afternoon.

But what happened in that afternoon told me everything I needed to know.

The language thing worked better than I expected.

The first student typed her question entirely in Hindi. SparshAI responded in
Hindi. Her face when she saw that — the small surprise of being answered in her
own language by a machine — is something I will not forget quickly.

She asked a follow-up question. Then another. Within twenty minutes she had
gone deeper into the topic of plant biology than her textbook had taken her
in an entire chapter.

The patience factor is real.

One of the boys asked the same question three different ways because he did not
understand the first two answers. A tired teacher with 50 students would not
have the bandwidth for that. SparshAI answered each time without any indication
of frustration. On the third explanation, something clicked for him. He nodded
and moved on.

That patience is not a small thing. For students who feel embarrassed asking
their teacher to repeat something, having a system that will explain the same
concept ten different ways without judgment is genuinely significant.

The offline test was the most important one.

Midway through the session, I turned off the WiFi router deliberately — without
telling the students. Nothing changed. SparshAI kept working exactly as before
because everything was running locally on the laptop. No internet. No
interruption. No awareness on their part that anything had changed.

That is the whole point. A tool that works only when the internet works is not
a tool for Sonbhadra. A tool that keeps working regardless of connectivity —
that is something real.

What SparshAI Is Not

I want to be clear about the limitations because honesty matters more than
hype, especially when you are talking about something that affects students
who already have limited options.

SparshAI is not a replacement for a good teacher. A good teacher brings
energy, relationship, observation, and human judgment that no AI can replicate.
What SparshAI can do is fill the hours when no teacher is available — evenings,
weekends, exam seasons, the long gaps between school hours and the next day.

The Hindi support is good, but not perfect. Complex questions with regional
dialect mixing sometimes produce answers that are technically correct but
slightly awkward in phrasing. This is an area that needs improvement.

Response speed on very old hardware can be slow for complex questions —
sometimes 15 to 20 seconds. For a student used to waiting, this is acceptable.
For someone expecting ChatGPT speed, it would feel frustrating. Setting the
right expectations matters.

What Gemma 4 Unlocked That Nothing Else Could

I want to step back and say this directly, because I think it gets lost in
technical discussions.

Before Gemma 4, building something like SparshAI was not practically possible
for the specific constraints of rural India. The models capable of real
educational dialogue required cloud infrastructure. The models small enough
to run locally were not capable enough to be genuinely useful.

Gemma 4 E4B sits at an intersection that did not exist before — capable enough
to teach, small enough to run on affordable hardware, open enough to deploy
without ongoing costs.

For a student from Sonbhadra trying to build something for Sonbhadra, that
intersection is everything.

Where SparshAI Goes Next

This is still early. What I have right now is a working proof of concept that
I have tested with five students over one afternoon.

But I know what the next steps look like.

The most important one is fine-tuning on NCERT content. The entire Class 6
through Class 10 NCERT curriculum is publicly available. A version of Gemma 4
fine-tuned specifically on this content would be dramatically more useful for
Indian school students than the base model. The answers would be more aligned
with what students are actually studying, the examples would be culturally
relevant, and the Hindi quality would improve.

The second step is voice input. Typing is a barrier for younger students and
for students who are less comfortable with keyboards. Adding offline
speech-to-text — so a student can simply speak their question — would open
SparshAI up to a much wider age range.

The third step is scale. One laptop per school, shared over a basic local
network, can serve an entire student body. The hardware cost is a one-time
investment. After that, the running cost is zero. Those economics make
SparshAI potentially replicable across hundreds of schools in districts
like Sonbhadra without requiring ongoing funding.
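
To put a rough number on that, the one-time cost can be spread over the
students who share the laptop. In the sketch below, only the ₹18,000 laptop
price comes from this post. The student count and the years of use are
hypothetical placeholders, and electricity and repairs are left out.

```python
def cost_per_student_per_year(hardware_cost_inr: float,
                              students: int, years: int) -> float:
    """Spread a one-time hardware cost evenly over students and years of use."""
    if students <= 0 or years <= 0:
        raise ValueError("students and years must be positive")
    return hardware_cost_inr / (students * years)


if __name__ == "__main__":
    # 18,000 is the laptop price quoted in this post.
    # 300 students and 3 years of use are hypothetical values.
    print(cost_per_student_per_year(18_000, students=300, years=3))
```

With those placeholder values, the hardware works out to ₹20 per student per
year.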

A Final Thought

I got into IIT Jodhpur. That happened because I had access to things —
preparation resources, guidance, a support system — that most students from
my district simply do not have.

I have thought about that gap for a long time. It always felt too large,
too structural, too deeply embedded in inequality to be addressed by a
single person building a single thing.

SparshAI has not changed my mind about the scale of that gap. It is still
enormous. But it has changed my mind about whether technology can be part
of bridging it.

Gemma 4 running locally on an ₹18,000 laptop, answering a 13-year-old
girl's question about plant biology in Hindi, with no internet connection,
for free — that is not a small thing.

That is a door opening.

And sometimes, a door is enough to start with.

Student at IIT Jodhpur | From Sonbhadra, Uttar Pradesh
Project: SparshAI — Local offline AI tutor for rural students
Model used: Gemma 4 E4B | Hardware: 8GB RAM laptop, no GPU
Inspired by: LENTERA (Gemma 3n Impact Challenge)
Tags: #devchallenge #gemmachallenge #gemma

The `/goal` Command: Staying on Target