Hermes Quant: Zero-Cost Autonomous Equity Research Agent Powered by Hermes 3

Atharva Atal — Sun, 31 May 2026 14:30:57 +0000

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

🛠️ What I Built

Navigating the noise of the stock market requires serious algorithmic filtering and rapid data synthesis. Hermes Quant is a fully autonomous, localized AI financial analyst designed to eliminate information overload for retail investors and algorithmic traders.

Instead of manually checking multiple screeners, technical charts, and news feeds, traders can deploy Hermes Quant to run deep, multi-step quantitative analyses on any global or domestic ticker (from AAPL to RELIANCE). The application instantly aggregates live technical indicators, quarterly fundamentals, and breaking news sentiment into a cohesive, actionable investment thesis.

Best of all? It runs entirely locally. 🔒 By leveraging Hermes 3, the platform provides institutional-grade AI equity research with zero API costs and total data privacy.

🎥 Demo

📸 The Agent in Action:

Generated Pdf

💻 Code

AtharvaAtal / Hermes_quant

⚙️ My Tech Stack

🧠 Core Agent: Hermes 3 (running locally via Ollama for zero-cost inference)
🔗 Agent Orchestration: Python, LangChain / LangGraph
🖥️ Frontend: React / Tailwind CSS
📊 Market Data APIs: yfinance / AlphaVantage / NewsAPI

🤖 How I Used Hermes Agent

Hermes 3 is not just a text generator in this project; it is the core reasoning and orchestration engine driving the entire application. I leaned heavily into its exceptional tool-calling and ReAct (Reasoning and Acting) capabilities.

When a user requests an analysis, Hermes enters an autonomous, iterative execution loop:

🔍 Observation & Tool Calling: It decides which specific financial tools to call based on the asset. For example, it autonomously triggers get_market_sentiment(AAPL) to parse headlines or fetches RSI/MACD indicators.
🧠 Live Reasoning: It evaluates the returned JSON data in real-time, matching overbought/oversold signals against market sentiment to detect hidden risks.
📝 Synthesis: It stops the loop once it has sufficient data, formatting the complex financial metrics into a clean, easy-to-read "Risk Divergence Matrix" and Executive Summary.

Hermes was the perfect fit for this use case because financial analysis requires strict adherence to system prompts and highly reliable, deterministic JSON tool-calling without hallucinating metrics—capabilities where Hermes absolutely excels.

📊 Example Walkthrough: Analyzing AAPL

To see Hermes Quant's agentic capabilities in action, here is how the agent autonomously evaluated Apple Inc. (AAPL):

📉 Step 1 (Technicals): The agent fetched core indicators, identifying that while the price was above the SMA50, the RSI was sitting at 79.0, flagging a highly overbought condition.
📰 Step 2 (Sentiment Extraction): Hermes called the sentiment tool, parsed recent macro headlines, and computed a neutral aggregate sentiment score of 0.0928.
⚠️ Step 3 (Divergence Detection): The agent cross-referenced these data points, catching a critical macroeconomic mismatch: highly bullish medium-term price action versus completely neutral, mixed market sentiment.
📑 The Final Artifact: It successfully generated a comprehensive report advising a contrarian strategy, entirely on its own.

Built by Atharva Atal

Your Agent Is Only as Smart as Its Toolbox — Hermes Agent Challenge Wants to Change That

Atharva Atal — Fri, 29 May 2026 10:51:29 +0000

Most production agents are built around a fixed set of tools. You write the functions, wire them up, and the LLM picks from a menu you designed ahead of time. This works — until the task needs something you didn't anticipate.

Nous Research's Hermes Agent Challenge (HAC) flips this entirely. Instead of giving the agent a toolbox, it gives the agent the ability to build tools. At runtime. Mid-task. By writing Python.

The Core Idea: Skill Memory Over Static Tools

Traditional agent frameworks (LangChain, AutoGPT, Semantic Kernel) follow the same formula:

Traditional: Agent = LLM + predefined tools

HAC: Agent = LLM + sandbox + skill memory

The agent starts with almost nothing — a Python executor and file access. If it needs to parse a CSV, it writes a parser. If it needs a moving average, it codes one, tests it, and keeps it in a persistent skill library for later reuse.

This is not just a clever trick. Capability is a dynamic output of the agent, not a fixed input from the developer.

The GEPA Loop: Goal → Environment → Plan → Act

Most agents run a flavor of ReAct: Thought → Action → Observation. GEPA is more deliberate, and the separation matters.

Letter	Phase	What it does
G	Goal	Fixed natural language target. Doesn't change across the episode.
E	Environment	Files, outputs, and crucially — the agent's current skill library.
P	Plan	Multi-step strategy: "write new skill" or "call existing skill."
A	Act	Execute one step, update environment, loop back.

💡 Key insight: The agent sees its own skill library as part of the environment. So planning can explicitly say "use parse_csv" — just like a developer checks imports before writing new code. This is impossible in a naive ReAct loop.

Skill Creation: Generate → Test → Store

When no existing skill handles a sub-problem, the agent writes a Python function, sandbox-tests it, and stores it only if tests pass.

Step 1 — Generate: LLM writes a def solve(...): function from a natural language description.

Step 2 — Sandbox test: Run the function in a subprocess with a test input. Capture stdout, check return code.

Step 3 — Store (if pass): Add the callable to skill_library. Failures are discarded; the agent retries with a revised description.

def create_skill(description, context, skill_lib):
    # Step 1: LLM generates the function
    code = llm_generate_code(
        f"Write a Python function named solve that {description}."
    )

    # Step 2: Run in a subprocess sandbox
    result = subprocess.run(
        ['python', tmpname], capture_output=True, text=True, timeout=5
    )

    # Step 3: Store only if tests pass
    if result.returncode == 0:
        exec(code, {}, local_ns := {})
        skill_lib[description] = local_ns['solve']
        print(f"✓ Skill saved: {description}")
    else:
        print(f"✗ Skill failed, not saved")

    return skill_lib

This generate → test → store pattern turns the agent into a self-extending system. Over an episode, it accumulates a personal standard library tuned to the task at hand.

A Real Example: CSV Smoothing Task

Goal: "Read sales.csv, compute a 7-day moving average of the revenue column, save to smoothed.csv."

Starting with zero skills, the agent's first plan looks like this:

WRITE_SKILL: parse a CSV file and return a list of dictionaries
WRITE_SKILL: calculate a moving average for a list of numbers given a window size
CALL_SKILL:  parse_csv, "sales.csv"
CALL_SKILL:  moving_average, revenue_data, 7
WRITE_FILE:  smoothed.csv, result

After the first two steps, those skills exist permanently. If the agent gets a similar task tomorrow — different CSV, different column — it plans around skills it already has. No new code generation needed.

The Full GEPA Agent

Here's a minimal but complete implementation you can wire to any LLM:

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Environment:
    file_system: dict[str, str] = field(default_factory=dict)
    last_action_result: str = ""
    skill_library: dict[str, Callable] = field(default_factory=dict)

class GEPAAgent:
    def __init__(self, model_call: Callable[[str], str]):
        self.model = model_call
        self.goal = ""
        self.env = Environment()

    def perceive(self) -> str:
        return (
            f"Goal: {self.goal}\n"
            f"Files: {list(self.env.file_system.keys())}\n"
            f"Last result: {self.env.last_action_result}\n"
            f"Available skills: {list(self.env.skill_library.keys())}\n"
        )

    def plan_next(self) -> list[str]:
        prompt = (
            "You are an autonomous agent using the GEPA loop.\n"
            f"{self.perceive()}\n"
            "Produce a step-by-step plan. Use WRITE_SKILL: to create new "
            "functions, CALL_SKILL: to use existing ones. One step per line."
        )
        response = self.model(prompt)
        return [line.strip() for line in response.strip().split("\n") if line.strip()]

    def act(self, step: str) -> str:
        if step.startswith("WRITE_SKILL:"):
            desc = step.split("WRITE_SKILL:", 1)[-1].strip()
            self.env.skill_library = create_skill(
                desc, self.perceive(), self.env.skill_library
            )
            return f"Skill created: {desc}"
        elif step.startswith("CALL_SKILL:"):
            parts = step.split(":", 1)[-1].strip().split(",", 1)
            name = parts[0].strip()
            args = eval(parts[1]) if len(parts) > 1 else []
            skill = self.env.skill_library.get(name)
            return str(skill(*args)) if skill else f"Skill not found: {name}"
        return "Unknown command."

    def run(self, goal: str, max_steps: int = 10):
        self.goal = goal
        for _ in range(max_steps):
            steps = self.plan_next()
            if not steps:
                break
            step = steps[0]
            self.env.last_action_result = self.act(step)
            print(f"→ {step}\n  {self.env.last_action_result}\n")

Wire model_call to any LLM API and you have a working GEPA agent.

How This Compares to Existing Frameworks

Aspect	LangChain / AutoGPT	Hermes Agent Challenge
Tool origin	❌ Human-coded, fixed at deploy	✅ Generated by agent at runtime
Capability growth	❌ None during a run	✅ Accumulates across episodes
Failure recovery	❌ Retry with same limited set	✅ Write a better tool, retry
Task generality	❌ Bounded by predefined tools	✅ Limited only by sandbox safety
Control flow	ReAct	GEPA: explicit plan before acting

LangChain gives you a Tool object with a name and a function. You're on the hook for every capability the agent will ever need. HAC makes capability itself a dynamic output of the agent.

What This Means for Practitioners

The mental shift required:

From: "What tools does my agent need?"
To: "What tool-building process should my agent have?"

🔒 Sandboxing is non-negotiable. LLM-generated code executing on your host is a serious security risk. HAC uses Docker or gVisor. Don't skip this.

🧠 Skill quality depends on your model. Hermes models are fine-tuned specifically for this loop. Let the agent generate its own test cases too — it catches edge cases you wouldn't think to write.

📈 Skill persistence is where value compounds. A skill library saved across runs means the agent genuinely gets better over time — not just within a single session. That's the real payoff.

Getting Started

The minimal GEPA loop above is enough to experiment with. For the full sandbox setup, evaluation harness, and Hermes model integrations, the official Hermes Agent Challenge repository is the right starting point.

Try it on a task your current agent can't handle without custom tool work. You'll quickly see where the leverage is.

Have you experimented with dynamic skill creation in your agents? Drop your approach in the comments.

DEV Community: Atharva Atal