This is a submission for the Hermes Agent Challenge.
Our goal was not to build another AI wrapper, but to explore how Hermes Agent behaves as a persistent orchestration layer coordinating specialized autonomous workers inside real engineering governance workflows.
Most AI systems today are still fundamentally single-threaded assistants wrapped inside nicer interfaces.
You type a prompt, the model responds, and the workflow ends there.
But our problem was different.
Over the last few years we worked closely with alumni groups, business operators, SaaS platforms, and community engineering teams. One recurring issue appeared everywhere:
People did not simply want AI-generated text.
They wanted workflow intelligence.
They wanted systems capable of:
- coordinating technical tasks,
- evaluating operational risks,
- planning execution flows,
- synthesizing structured engineering decisions,
- and operating reliably across multiple autonomous workers.
That realization eventually led us toward Hermes Agent.
Not because we wanted another chatbot.
But because we wanted to explore orchestration.
The Core Idea
We started asking ourselves a simple question:
What happens when Hermes stops behaving like a conversational assistant and starts behaving like a managerial orchestration layer?
That question became the foundation of our experiment.
The result was Gotihub Hermes Crew.
The name itself carries the philosophy behind the project.
Gotihub is derived from the Bengali word Goti (গতি), meaning Speed.
We wanted to explore whether autonomous engineering workers could coordinate quickly, reliably, and structurally inside real governance workflows.
It grew into a multi-agent engineering orchestration system, built with speed as a goal, that analyzes GitHub repositories through specialized autonomous workers coordinated by Hermes.
Project Links
Live Demo
GitHub Repository
https://github.com/apurba-labs/gotihub-hermes-crew
Why We Didn’t Want a Single Monolithic Agent
One massive prompt window handling:
- security analysis,
- architecture auditing,
- roadmap planning,
- and executive synthesis
quickly becomes expensive, unstable, and difficult to govern.
So instead of forcing one model to think about everything simultaneously, we separated:
Execution from Governance
Execution Layer
Specialized Gemma workers execute focused engineering tasks independently.
Governance Layer
Hermes coordinates, synthesizes, and manages the outputs generated by those workers.
That separation became the most important architectural decision in the project.
The Multi-Agent Architecture
Our orchestration pipeline runs four agents across three stages:
- SecurityAgent performs repository security analysis.
- ArchitectureAgent evaluates structural and maintainability health.
- PlanningAgent generates engineering roadmap recommendations.
- Hermes Master synthesizes everything into a structured managerial report.
The important detail is that the first stage runs SecurityAgent and ArchitectureAgent concurrently, while planning and synthesis follow in sequence.
We intentionally used Python’s native asynchronous execution model instead of sequential blocking pipelines.
Stage 1 Concurrency with asyncio.gather
The first orchestration layer launches multiple specialized workers simultaneously:
- SecurityAgent
- ArchitectureAgent
Both execute inside an asyncio.gather() orchestration block.
This allowed us to explore:
- concurrent repository analysis,
- isolated engineering responsibilities,
- and structured task specialization.
Instead of treating AI as a single giant context window, we treated it like a coordinated engineering crew.
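A minimal sketch of what a Stage 1 block built on asyncio.gather() could look like. The function names, the shape of the returned dictionaries, and the simulated latency are assumptions for illustration; the real workers would call the model instead of sleeping.

```python
import asyncio


async def run_security_agent(files: dict[str, str]) -> dict:
    """Hypothetical stand-in for the SecurityAgent's model call."""
    await asyncio.sleep(0.05)  # simulates inference latency
    return {"agent": "SecurityAgent", "files_seen": len(files), "issues": []}


async def run_architecture_agent(files: dict[str, str]) -> dict:
    """Hypothetical stand-in for the ArchitectureAgent's model call."""
    await asyncio.sleep(0.05)  # simulates inference latency
    return {"agent": "ArchitectureAgent", "files_seen": len(files), "issues": []}


async def run_stage_one(files: dict[str, str]) -> dict:
    # gather() schedules both coroutines at once and returns their
    # results in the same order the coroutines were passed in.
    security, architecture = await asyncio.gather(
        run_security_agent(files),
        run_architecture_agent(files),
    )
    return {"security": security, "architecture": architecture}
```

Calling `asyncio.run(run_stage_one(files))` from synchronous code runs both workers and returns once the slower of the two has finished. Note that concurrency only shortens wall-clock time if the inference backend can actually serve both requests in parallel.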
System Workflow Architecture
Here is the orchestration workflow powering the system:
The workflow is intentionally separated into:
- concurrent execution,
- planning synthesis,
- and executive orchestration.
This structure allowed us to keep responsibilities isolated while still producing a consolidated engineering report.
Hermes as the Orchestrator
This is where Hermes became genuinely interesting.
Hermes does not directly parse raw repositories in our architecture.
Instead, Hermes behaves like a managerial synthesis layer.
The worker agents generate:
- summaries,
- issue reports,
- confidence scores,
- engineering recommendations.
Hermes then:
- resolves overlap,
- synthesizes cross-agent conclusions,
- generates executive summaries,
- and produces structured JSON outputs.
In other words:
The workers execute.
Hermes governs.
That orchestration philosophy changed how we approached agent systems entirely.
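To make the hand-off concrete, here is a sketch of the kind of payload the workers might pass to the manager. This is plain Python, not Hermes Agent's API: the `WorkerReport` fields mirror the worker outputs listed above, and the exact-match de-duplication is only a naive stand-in for the overlap resolution that Hermes performs.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkerReport:
    """Hypothetical shape of one worker's output."""
    agent: str
    summary: str
    confidence: float
    issues: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)


def merge_reports(reports: list[WorkerReport]) -> str:
    """Collapse duplicate issues and emit a structured JSON payload."""
    reported_by: dict[str, list[str]] = {}
    for report in reports:
        for issue in report.issues:
            # Normalize so trivially different phrasings collapse together.
            key = issue.strip().lower()
            reported_by.setdefault(key, []).append(report.agent)

    mean_confidence = (
        round(sum(r.confidence for r in reports) / len(reports), 2)
        if reports
        else None
    )
    payload = {
        "agents": [r.agent for r in reports],
        "issues": [
            {"issue": issue, "reported_by": agents}
            for issue, agents in reported_by.items()
        ],
        "mean_confidence": mean_confidence,
    }
    return json.dumps(payload, indent=2)
```

Keeping the `reported_by` list makes it visible when two workers flag the same problem, which is the kind of cross-agent signal a synthesis layer can weigh.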
Multi-Subdomain Infrastructure Design
As the system evolved, we realized orchestration architecture alone was not enough.
We also needed infrastructure separation.
So we deployed the ecosystem using multiple subdomains and isolated routing layers:
- gotihub.com → corporate site
- agl.gotihub.com → SaaS engine
- crew.gotihub.com → Hermes orchestration platform
Behind the scenes:
- FastAPI handled orchestration,
- Docker managed runtime isolation,
- Nginx routed ingress traffic,
- Ollama powered local inference,
- and Hermes coordinated the synthesis layer.
Most importantly:
The inference backbone was never exposed directly to the public internet.
Internal AI Backbone Architecture
The deployment topology evolved into something closer to a lightweight orchestration mesh:
This allowed multiple services to share:
- one centralized inference core,
- isolated application routing,
- and internal-only AI communication.
Real Engineering Problems We Hit
This project was not smooth.
And honestly, that’s where most of the learning happened.
The Local Compute Bottleneck
Our earliest orchestration runs were extremely slow.
One real telemetry session looked like this:
[TELEMETRY] GitHubLoader fetched 8 files in 5.91 seconds.
[Orchestrator] Starting Full Pipeline...
[TELEMETRY] Stage 1 took 218.68 seconds.
[TELEMETRY] Stage 2 took 72.19 seconds.
[TELEMETRY] Stage 3 took 120.18 seconds.
[TELEMETRY] Pipeline Complete! Total Runtime: 411.05 seconds.
The bottleneck was not orchestration.
It was:
- oversized repository context,
- local inference latency,
- verbose prompt chains,
- and massive token generation overhead.
That distinction mattered.
It meant the orchestration architecture itself could scale, while the inference strategy needed optimization.
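The three stage timings in the log sum to the reported total (218.68 + 72.19 + 120.18 = 411.05 seconds), so the repository fetch appears to be timed separately. Per-stage numbers like these can be collected with a small context manager. The sketch below is an assumption about how such telemetry might be gathered, not the project's actual instrumentation; the `timed` helper and the `sink` dictionary are hypothetical names.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, sink: dict[str, float]):
    """Record the wall-clock duration of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the stage raises, so failed stages are still timed.
        elapsed = time.perf_counter() - start
        sink[label] = elapsed
        print(f"[TELEMETRY] {label} took {elapsed:.2f} seconds.")
```

Wrapping each stage in `with timed("Stage 1", timings): ...` yields one log line per stage and leaves the raw durations in `timings` for later comparison between runs.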
What We Optimized
We eventually began improving runtime by:
- reducing repository context size,
- prioritizing critical engineering files,
- limiting unnecessary token generation,
- shrinking synthesis payloads,
- and improving async orchestration boundaries.
The system became dramatically more stable once we stopped treating every file equally.
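One way to stop treating every file equally is to rank files and cut the context off at a character budget. The sketch below illustrates that idea only; the tiers, the file names in `MANIFESTS`, and the default budget are assumptions, not the rules the project uses.

```python
from pathlib import PurePosixPath

# Hypothetical priority tiers: manifests first, then source, then the rest.
MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "Dockerfile"}
SOURCE_SUFFIXES = {".py", ".js", ".ts"}


def rank(path: str) -> tuple[int, str]:
    """Lower tuples sort first; the path itself breaks ties deterministically."""
    p = PurePosixPath(path)
    if p.name in MANIFESTS:
        tier = 0
    elif p.suffix in SOURCE_SUFFIXES:
        tier = 1
    else:
        tier = 2
    return (tier, path)


def select_context(files: dict[str, str], max_chars: int = 12_000) -> dict[str, str]:
    """Keep the highest-priority files until the character budget is spent."""
    selected: dict[str, str] = {}
    budget = max_chars
    for path in sorted(files, key=rank):
        if budget <= 0:
            break
        snippet = files[path][:budget]  # truncate the last file that fits
        selected[path] = snippet
        budget -= len(snippet)
    return selected
```

A character budget is a rough proxy for tokens; swapping in a tokenizer-based count would tighten the estimate at the cost of an extra dependency.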
Defensive Failure Engineering
One of the most important lessons came from structured output failures.
Large orchestration chains occasionally returned:
- malformed JSON,
- partial synthesis blocks,
- or incomplete manager responses.
Instead of allowing pipeline collapse, we added:
- fallback execution paths,
- JSON cleanup layers,
- defensive parsing,
- and structured failure recovery.
That forced us to think less like prompt engineers and more like systems engineers.
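A sketch of what defensive parsing with a structured fallback can look like. The function name and the fields of the fallback dictionary are hypothetical; the point is that the caller always receives a dictionary and never an exception.

```python
import json


def parse_manager_output(raw: str) -> dict:
    """Return the model's JSON object, or a structured failure record."""
    text = raw.strip()

    # Attempt 1: the response is already clean JSON.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # Attempt 2: cleanup. Keep only the outermost {...} span, which drops
    # surrounding prose or markdown fences the model may have added.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            parsed = json.loads(text[start : end + 1])
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass

    # Fallback: report the failure in a shape downstream code can handle.
    return {
        "status": "degraded",
        "error": "unparseable_manager_output",
        "raw_excerpt": text[:200],
    }
```

Truncated responses (for example, output cut off mid-object) land in the fallback branch, where the pipeline can retry or return a partial report instead of collapsing.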
Why Hermes Actually Worked Well
Frameworks like CrewAI are excellent for rapidly assembling conversational agent pipelines.
But our exploration focused on something slightly different:
- persistent orchestration,
- structured engineering outputs,
- governance-oriented workflows,
- and isolated worker responsibilities.
We wanted Hermes to operate less like a conversational assistant and more like an engineering coordination layer.
That distinction became the entire philosophy behind the project.
What Fascinated Us Most
The most interesting part was not whether AI could generate text.
It was whether autonomous workers could coordinate reliably inside real operational systems.
That changes the conversation entirely.
Instead of asking:
“Can AI answer questions?”
We started asking:
“Can AI workers collaborate responsibly inside engineering governance workflows?”
Hermes gave us a practical way to explore that future.
And honestly, that exploration became far more valuable than simply building another AI wrapper.
Built With
- Hermes Agent
- FastAPI
- Python AsyncIO
- Ollama
- Gemma 3
- Docker
- Nginx
- SQLite
- Next.js
Final Thoughts
This project is still evolving.
We are actively optimizing:
- orchestration runtime,
- inference efficiency,
- streaming telemetry,
- structured synthesis,
- and governance reliability.
But the biggest thing we learned was this:
Autonomous systems become genuinely interesting when they stop behaving like isolated chatbots and start behaving like coordinated engineering workers.
That is the future we wanted to explore with Hermes.
And we are excited to continue building toward it.

