DEV Community: Prashant Maurya

I Let Hermes Agent Run My Workflow for a Week — Here's What Actually Happened

Prashant Maurya — Mon, 01 Jun 2026 04:18:21 +0000

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

I'll be honest: I've seen a lot of "AI agent" tools that impress in demos and disappoint in daily use. The demo runs perfectly. Then you try it on your actual messy workflow and it falls apart.

So when I set up Hermes Agent, I didn't benchmark it. I just used it — for real tasks, over a week. This is what happened.

Day 0: Setup (Took 4 Minutes)

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
hermes setup --portal
hermes chat

That's it. No Docker config, no Python environment juggling, no API key hunting across five different services. The --portal flag sets up Nous Portal which bundles web search, image generation, TTS, and browser automation under one subscription — no separate keys needed.

One thing worth knowing: Hermes auto-detects your OS and installs prerequisites. On Ubuntu it grabbed uv, ripgrep, and fd automatically. You can also opt into docker backend if you want isolated execution:

# ~/.hermes/config.yaml
terminal:
  backend: docker
  docker_image: python:3.11-slim

I kept local for the first week. Docker later when I needed isolation for untrusted scripts.

Day 1: The First Real Test — A Research Task

I had to summarize three recent papers on multi-agent coordination and write a one-page briefing. Normally: 45 minutes of reading, note-taking, drafting.

I typed: "Research recent papers on multi-agent LLM coordination from 2025–2026, summarize the key approaches, and write a structured briefing I can share with my team. Save it as multi-agent-briefing.md."

It ran web searches, fetched abstracts, organized findings by theme, and wrote the briefing. Took about 6 minutes. The output needed minor edits — some citations were redundant — but the structure was solid and I saved 35 minutes.

More importantly: after finishing, Hermes created a skill.

✓ Created skill: research-briefing
  Procedure for structured literature research and briefing generation.
  Saved to ~/.hermes/skills/research-briefing.md

I didn't ask for this. It just did it because the task involved 7+ tool calls and a non-trivial workflow. That skill is now reusable — /research-briefing next time instead of re-explaining the whole thing.

Day 2: Connecting Telegram

I spend time away from my laptop. I wanted to be able to kick off tasks from my phone.

hermes gateway telegram

It walked me through the BotFather setup, gave me a QR code to scan, and asked which phone number to whitelist. Five minutes later I was sending tasks from Telegram and getting results in the same chat.

What this actually unlocks: I can start a long-running task from Telegram in the morning, have it run while I'm in class or commuting, and get the result waiting for me. The agent doesn't need me babysitting it.

The gateway supports 20+ platforms now — Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Teams, Email, SMS, and more. I stuck with Telegram because it's what I already have on my phone.

Day 3: Subagents — Parallelizing Work

This was the feature that surprised me most. Hermes can spawn child agent instances and run them in parallel:

/delegate Run these three tasks in parallel:
1. Summarize the Q1 changelog for our project
2. Write unit tests for the auth module
3. Search for any known CVEs in our dependencies

It spawned three subagents with isolated contexts and tool access. All three ran concurrently. Results came back in one consolidated reply.

The default is 3 concurrent subagents. You can bump it up in config. Each subagent gets its own terminal session, so they don't step on each other's file state.

This is the kind of thing that sounds like a toy until you realize you just compressed three sequential tasks into one parallel run.

Day 4: The Curator Ran Overnight

I woke up to a notification:

Hermes Curator finished a review cycle.
- Consolidated 3 skills into 1 (research-briefing, research-summary, lit-review → research-workflow)
- Pruned 1 outdated skill (old deploy config — superseded by deploy-runbook)
- Updated 2 skills with corrections from recent task recoveries

The Autonomous Curator is a background agent that runs on a 7-day cycle (configurable). It grades your skill library, consolidates related skills, prunes dead ones, and rewrites skills that have been corrected during use.

I had four days of skills by this point and it had already found redundancies I didn't notice. The pruned deploy skill was genuinely outdated — I'd updated the config and the old skill would have led the agent astray.

This is the self-maintenance loop in practice. The skill library doesn't just grow — it stays accurate.

Day 5: Scheduled Tasks (Cron)

hermes cron add "Every Monday at 8am, search for new arXiv papers on LLM agents, summarize top 3, and send to Telegram" \
  --skill research-workflow

It parsed the natural language, confirmed the schedule (0 8 * * 1), attached the research-workflow skill, and registered the job. Now every Monday morning I have a paper digest in Telegram without touching anything.

You can also use proper cron expressions if you prefer them. Jobs support pause/resume/edit, and results get delivered to whichever platform you specify.

Day 6: The API Server

This one I didn't expect to be useful, but it was. Hermes exposes an OpenAI-compatible endpoint:

hermes proxy start
# Listening on localhost:8080

This means any tool that talks to OpenAI's API — Aider, Cline, VS Code Continue, Codex — can now route through Hermes. You get Hermes's memory, skills, and tool access through whatever interface you already use.

I pointed my VS Code Continue extension at localhost:8080 and immediately had access to all my project skills inside the editor. No context re-explaining. The agent already knew my project structure from previous sessions.

Day 7: What I Actually Think

After a week, a few things are clear:

What works really well:

The install-and-run experience is genuinely smooth
Telegram gateway makes it actually portable
Skill auto-creation means the agent gets better at your specific tasks without you managing it
The API proxy is a quiet force-multiplier — existing tools suddenly get memory
Parallel subagents save real time on decomposable work

What requires adjustment:

Cold starts with many skills loaded can feel slow (v0.12 cut this by ~57%, still noticeable)
The skill auto-creation is aggressive — you'll want to review the library after the first week and delete anything that's too narrow or task-specific
Browser automation occasionally needs a retry on JS-heavy sites

What I underestimated:
The compounding effect. Day 1, it's just a capable agent. By day 7, it has a library of skills tuned to my actual workflow, scheduled tasks running without my input, and memory of my project context. The gap between day 1 and day 7 is larger than I expected.

The Setup That Makes Sense for Most Developers

If you want to reproduce a useful configuration fast:

# 1. Install
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# 2. Setup with portal (handles model + tools in one)
hermes setup --portal

# 3. Connect Telegram (or Discord/Slack)
hermes gateway telegram

# 4. Add a cron task you'll actually want
hermes cron add "Every morning at 9am summarize my GitHub notifications and send to Telegram"

# 5. Let it run for a week before judging it
hermes chat

The last step is the real one. Hermes rewards sustained use more than most tools.

Docs: hermes-agent.nousresearch.com/docs
GitHub: NousResearch/hermes-agent

The Agent That Lives on a $5 VPS — Why Hermes Changes the Open Source AI Story

Prashant Maurya — Sun, 31 May 2026 09:01:50 +0000

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

There's a version of AI agent development that most people are building toward: a capable agent you fully control, that runs on infrastructure you own, that gets better the longer it runs, and that you can actually understand from the inside.

For a while, that version felt like it was always six months away.

Hermes Agent by Nous Research makes a serious case that we're there now.

The "Not Tied to Your Laptop" Insight

Most AI agent tools are built around the assumption that you're sitting at a computer, running the agent locally, in an IDE or a browser tab. The interaction model is: you prompt, it responds, you wait, you prompt again.

Hermes is designed around a different assumption: the agent should be running somewhere, doing things, while you get on with your life.

The infrastructure flexibility here is real, not marketing. It supports six terminal backends:

Local — your machine, direct execution
Docker — containerized, isolated
SSH — remote server you already have
Daytona — cloud dev environments
Singularity — HPC/research clusters
Modal — serverless, costs nearly nothing when idle

That last two matter a lot. Serverless persistence means your environment hibernates when idle. A pipeline that runs twice a day doesn't need to cost you $50/month on a dedicated VM. You spin it up on Modal, it runs when triggered, it sleeps the rest of the time.

And the messaging gateway makes the "while you get on with your life" part literal. Hermes connects to 20+ platforms: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Teams, Email, SMS — and more. You kick off a research task from Telegram while commuting. The agent runs on a cloud VM. You get the results in the same Telegram chat when it's done.

This is not a trivial architecture decision. It's a fundamentally different relationship between you and the agent.

Open Source Without the Asterisks

"Open source AI" has a credibility problem right now. It often means: open weights for the model, closed system around it, rate-limited API you're dependent on, no way to actually inspect what's happening.

Hermes is MIT licensed. It works with any OpenAI-compatible endpoint — Nous Portal, OpenRouter, OpenAI, your own local model. The skills system is based on the agentskills.io open standard, meaning skills are portable across compatible agents.

More importantly: the agent's knowledge is inspectable. Every skill it creates, every procedure it encodes, lives in markdown files under ~/.hermes/skills/. You can read them. Edit them. Delete them. Version control them. The agent's growing competence isn't locked in a proprietary database somewhere — it's files on a filesystem.

This is what open infrastructure for AI agents should look like. Not just open weights, but open runtime behavior.

The Self-Improvement Loop: Genuine or Hype?

The phrase "self-improving AI" triggers justified skepticism. Let me be specific about what Hermes actually does and what it doesn't claim to do.

What it does:

After completing a complex multi-step task, Hermes writes a skill — a structured markdown document encoding the procedure it followed, the pitfalls it encountered, and how to verify success. When it hits errors and finds the working path, it updates or creates skills. When you correct its approach, it captures the correction.

Next time a similar task comes up, it loads the relevant skill and works from encoded experience rather than reasoning from scratch.

This is not model fine-tuning. The underlying model weights don't change. What changes is the agent's procedural knowledge about your specific environment, workflows, and preferences.

What it doesn't do:

It doesn't make the model fundamentally smarter. It doesn't guarantee skills are always correct. The quality of auto-generated skills depends on the quality of the underlying task execution.

What it does well:

For anyone running repetitive agentic workflows — deployment pipelines, research tasks, code review processes — the compound effect over weeks of use is real. The skill library fills with your actual procedures. The agent stops reinventing your wheel every session.

The Skills Hub: A Bet on Community Infrastructure

Beyond self-created skills, Hermes integrates with a growing ecosystem of community skills:

skills.sh — Vercel's public skills directory
Well-known endpoints — sites publishing /.well-known/skills/index.json (Mintlify does this)
GitHub taps — install from any public repo with a skills/ directory
browse.sh — 200+ site-specific browser automation skills for Amazon, arXiv, Airbnb, and more
ClawHub, LobeHub — community marketplaces

Publishing your own skills is as simple as pushing to a GitHub repo. Other Hermes users add it as a tap with one command.

This is a bet that skills are a valuable unit of shareable knowledge for AI agents — that the right abstraction isn't sharing prompts or sharing models, but sharing encoded workflows that agents can discover, install, inspect, and build on.

That bet makes sense to me. A well-written skill is higher-signal than a prompt, more portable than a fine-tuned model, and more readable than a code library. If the ecosystem grows, there's real compounding value here.

What This Means for Developers

If you're building AI-powered workflows right now, Hermes offers something most options don't: a path from prototype to production that doesn't require you to bet on a single vendor's API staying available and affordable.

Run it on Daytona or Modal when you need serverless scale. Run it on a $5 VPS when you need persistent, always-on access. Switch model providers when prices change. Keep your skills and memory — they're just files.

The messaging gateway means you can interact with your agent from wherever you actually are, not just from a laptop with the right tab open.

And the open architecture means you can inspect, audit, and control the agent's behavior in ways that closed systems simply don't allow.

The Honest Caveats

Hermes is not the finished version of this vision — it's a serious, working implementation of the early version. Some things to know:

Setup has friction. The one-liner installer is smooth, but configuring SSH backends, Telegram gateways, and custom skill directories takes time and comfort with terminal tooling. This is not a consumer product yet.

Native Windows support is early beta. The docs are honest about this. Linux and macOS are the primary targets.

The skills system compounds slowly. You won't see dramatic value on day one. The payoff comes from sustained use on real tasks, not from kicking the tires once.

Model quality still matters. Hermes is the runtime, not the model. The quality of its reasoning depends on which model you point it at.

The Larger Point

We're at an interesting inflection point in AI agent development. The question is no longer "can agents do complex tasks?" They can. The question is "who controls the agent infrastructure?"

Hermes is a clear answer: you do. The agent runs where you put it, talks through the platforms you choose, learns procedures in formats you can read and edit, and doesn't require a subscription to a specific cloud to keep working.

For developers who care about infrastructure ownership and long-term capability building, that matters. Not because the alternatives are bad — some are excellent — but because having a capable open option shapes the entire ecosystem.

That's what Hermes Agent actually is. Not just another capable agent, but a demonstration that capable agents and open infrastructure aren't in tension.

Try it yourself:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
hermes setup --portal
hermes chat

Documentation · GitHub · Discord

Gemma 4 Made Me Rethink Local AI: Not Just Text, But Images Too

Prashant Maurya — Mon, 25 May 2026 05:27:52 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Most people (including me, initially) think of "local AI" as a text‑only chatbot running on a laptop.

Gemma 4 completely broke that mental model for me.

When I started experimenting with it, I realised it is not just a smaller, cheaper alternative to cloud models — it is a multimodal engine that can understand both text and images, and still run on normal hardware if you choose the right variant.

In this post I want to share how that changed the way I think about building AI tools as a student developer.

What makes Gemma 4 different for me

Gemma 4 is Google's latest open‑weight model family, built to be highly capable per parameter and still practical to run locally.

Instead of giving you just one "take it or leave it" model, it comes in multiple sizes that target different devices and budgets.

Small models like E2B and E4B are designed specifically for edge devices and laptops, while the larger 26B/31B variants push quality and long‑context reasoning on stronger machines.

The moment I understood this design, I stopped thinking "can I run AI locally?" and started thinking "which Gemma 4 variant is the right match for this idea and this hardware?"

The moment I noticed this is not just a chatbot

The real surprise came when I realised that all Gemma 4 models are multimodal: they can take image input as well as text, and still generate text output.

On some setups, the small models can even accept audio, which means spoken language can become a first‑class input too.

This changes the kind of tools you can imagine building locally:

you are no longer limited to "ask a question, get a paragraph." You can show the model a screenshot, a chart, a photo of handwritten notes, or a diagram, and let it reason about that.

For me as a student, that means AI can sit closer to my real workflow: messy notebooks, saved PDFs, and random screenshots from class, instead of only clean text prompts.

A simple mental model for choosing Gemma 4 variants

One thing I like about Gemma 4 is that the family feels intentional.

Here's the way I now think about the main variants when planning a project, based on the official docs and model cards.

E2B – When I care most about portability. Tiny edge‑style model for ultra‑limited devices, quick prototypes, or when RAM is really tight.
E4B – When I want a balanced local model for a regular 8–16 GB laptop or desktop, still with multimodal support and long context.
26B / 31B – When quality and long, complex reasoning matter more than strict resource limits, like desktop workstations or servers.

This "fit the model to the hardware and use‑case" mindset is very different from simply asking "what is the biggest model I can download?"

For the challenge, I think judges care a lot about this kind of intentional model selection.

How I used Gemma 4's multimodality in a small local concept

To explore multimodal behaviour without building a huge app, I tried a simple concept:

"Can Gemma 4 act as a local study helper that understands both my text questions and the images I already have on my laptop?"

I focused on three small but realistic tasks:

Explaining diagrams

I used saved images of textbook diagrams (like physics setups and biology charts) and asked Gemma 4 to explain them in plain language. The multimodal support made it possible to ask things like "Explain this circuit in simple words and tell me what each component does."
Summarising handwritten notes

I took pictures or scans of handwritten pages and asked the model to summarise the main points, or turn them into cleaner bullet points for revision. Again, this was image in, text out — all processed locally.
Checking small UI mockups

I showed it screenshots of rough UI sketches and asked basic questions like "What do you think this screen is trying to do?" and "What could confuse a user here?" For a local model, the feedback was surprisingly coherent.

I was not trying to build a production system here; I just wanted to see if the multimodal behaviour felt "real" enough to be useful. After a few sessions, my answer was yes.

What impressed me about running it locally

Running Gemma 4 locally with multimodal input changed my expectations in a few ways.

First, it felt very different to send personal screenshots and notes to a model that never leaves my machine.

The open‑weight nature of Gemma 4 plus the ability to host it myself means I can keep sensitive material (like class slides, project diagrams, or drafts) inside my own environment.

Second, the long context window on Gemma 4 means it can keep track of more information than typical small local models. The smaller variants support around 128K tokens of context, while the larger ones go up to 256K.

In practice, that allowed me to combine multiple prompts, screenshots, and follow‑up questions in one session without the conversation falling apart.

Third, because the family is designed for efficient local execution, the experience stayed "good enough" even without a GPU — which is important if you are working on a regular student machine instead of a high‑end workstation.

How this changes the way I think about future projects

Before Gemma 4, my default architecture for any serious AI idea looked like this:

client → cloud model → response back

Now I find myself sketching a different default:

local app → Gemma 4 running on my own hardware → optional cloud only when truly needed

Knowing that a model can read both text and images, handle long context, and still run reasonably well on a laptop changes what "small project" even means.

Even something as simple as "help me understand my notes and diagrams offline" becomes a realistic weekend project instead of a full infrastructure job.

It also lines up nicely with the official intended‑use guidance around education, analysis of documents, and privacy‑sensitive workloads.

For students and indie developers, that combination of flexibility and control is powerful.

Final thoughts

Gemma 4 is described as "byte for byte, the most capable open models," but what stood out to me in practice was not a benchmark number.

It was the feeling that, for the first time, a multimodal model that understands both text and images can actually live on my own machine instead of only existing behind an API.

As a student developer, that shifts AI from something I call to something I can own and shape.

And that, for me, is the most exciting part of Gemma 4.