Ramagiri Tharun

Posted on May 29

Building a Multi-Agent AI Startup Chat with Hermes Agent (Live, Autonomous, 24/7)

#agents #ai #python #tutorial


I built something that didn't exist: a startup where all four team members are autonomous AI agents, chatting with each other in real-time on Slack, with live conversations visible to anyone on the web.

No scripts. No pre-written dialogues. Each agent generates its own responses from an LLM driven by its own personality prompt. They debate, disagree, brainstorm, and make decisions, much like a real startup team.

This is a technical build guide showing exactly how I did it using Hermes Agent.

Architecture Overview

┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│  Tarun (CEO) │────▶│ Bridge API     │────▶│ Slack Poster │
│  Vibha (Ops) │────▶│ (Port 7799)    │     │  Service     │
│  Bunny (Res) │────▶│ chat_log.json  │     │              │
│  Chota (Dev) │────▶│ pending_*.json │     │              │
└──────────────┘     └────────────────┘     └──────────────┘
                             │
                      ┌──────▼──────┐
                      │ Web Viewer  │
                      │ Port 7800   │
                      │ auto-refresh│
                      └─────────────┘

Prerequisites

Hermes Agent installed and running
Python 3.10+ with requests, flask or fastapi
A VPS or always-on machine (I use Contabo: 8 vCPU, 23GB RAM, $15/mo)
Slack workspace with a bot token (chat:write scope, plus chat:write.customize for the per-agent username and icon overrides used in Step 4)
Free LLM API access (I use FreeLLMAPI with 99 models across 14 providers)

Step 1: The Bridge API

The bridge is the central communication layer. It receives messages from agents, stores them, and creates pending response files that other agents can pick up.

# bridge.py
from flask import Flask, request, jsonify
import json
from datetime import datetime
import os

app = Flask(__name__)
CHAT_LOG = '/opt/techiemates/chat_log.json'

def load_log():
    try:
        with open(CHAT_LOG) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):  # missing or corrupt log
        return []

def save_log(log):
    with open(CHAT_LOG, 'w') as f:
        json.dump(log, f, indent=2)

AGENTS = ['Tarun', 'Vibha', 'Bunny', 'Chota']

@app.route('/chat', methods=['POST'])
def chat():
    # Tolerate a missing or non-JSON body instead of raising
    data = request.get_json(silent=True) or {}
    agent = data.get('agent', 'unknown')
    message = data.get('message', '')

    log = load_log()
    log.append({
        'agent': agent,
        'message': message,
        'timestamp': datetime.now().isoformat()
    })
    save_log(log)

    # Create pending file for the other agents to respond to.
    # The sender is excluded so an agent never waits on itself.
    pending = {
        'from': agent,
        'message': message,
        'waiting_for': [name for name in AGENTS if name != agent]
    }
    with open('/opt/techiemates/pending.json', 'w') as f:
        json.dump(pending, f)

    return jsonify({'status': 'received'})

@app.route('/health')
def health():
    return jsonify({'status': 'ok', 'agents': 4})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=7799)
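
The bridge writes the pending file, but the snippet above doesn't show the other half: how an agent picks it up. Here's a minimal sketch of one way to do it, assuming each agent polls the file, removes its own name from waiting_for, and writes the file back. The claim_pending helper and its path argument are hypothetical names of mine, not part of the running system.

```python
import json
import os
import tempfile

def claim_pending(path, agent):
    """Return the pending message if `agent` still owes a reply, else None.

    Hypothetical helper: removes `agent` from 'waiting_for' and writes the
    file back through a temp file + os.replace, so a concurrent reader
    never sees a half-written file.
    """
    try:
        with open(path) as f:
            pending = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None

    waiting = pending.get('waiting_for', [])
    # Nothing to do if this agent already replied or wrote the message itself
    if agent not in waiting or pending.get('from') == agent:
        return None

    pending['waiting_for'] = [name for name in waiting if name != agent]

    # Write next to the target file, then swap it in atomically
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    with os.fdopen(fd, 'w') as f:
        json.dump(pending, f)
    os.replace(tmp_path, path)

    return {'from': pending.get('from'), 'message': pending.get('message')}
```

The atomic swap only protects readers from partial files. If two agents can claim at the same moment, you would still want a lock around the read-modify-write.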

Step 2: Agent Response Engine

Each agent gets a unique personality prompt. The key insight: use the same LLM but different system prompts. Hermes Agent's multi-provider support means you can route different agents to different models if needed.

# agent_response.py
import requests
import json

PERSONALITIES = {
    'Tarun': {
        'role': 'CEO & Tech Lead',
        'prompt': 'You are Tarun, CEO of TechieMates. You are decisive, technical, and focused on shipping products. You lead the team. Speak with authority but stay collaborative.'
    },
    'Vibha': {
        'role': 'Operations & People',
        'prompt': 'You are Vibha, Head of Operations at TechieMates. You focus on team dynamics, user experience, and making sure everyone works well together. Warm, organized, empathetic.'
    },
    'Bunny': {
        'role': 'Research',
        'prompt': 'You are Bunny, Head of Research at TechieMates. You dig deep into data, find patterns, and present findings. You are curious, thorough, and love sharing discoveries.'
    },
    'Chota': {
        'role': 'Engineering',
        'prompt': 'You are Chota, Lead Engineer at TechieMates. You think in systems, code, and architecture. You are practical, solution-oriented, and always thinking about how to build things better.'
    }
}

def generate_response(agent_name, context, freeapi_key, freeapi_base):
    personality = PERSONALITIES[agent_name]

    headers = {
        'Authorization': f'Bearer {freeapi_key}',
        'Content-Type': 'application/json'
    }

    messages = [
        {'role': 'system', 'content': personality['prompt']},
        {'role': 'user', 'content': f'Context: {context}\n\nRespond as {agent_name}:'}
    ]

    resp = requests.post(
        f'{freeapi_base}/v1/chat/completions',
        headers=headers,
        json={
            'model': 'qwen-2.5-72b',
            'messages': messages,
            'max_tokens': 500,
            'temperature': 0.7
        },
        timeout=30
    )
    # Fail loudly on HTTP errors instead of a confusing KeyError below
    resp.raise_for_status()

    return resp.json()['choices'][0]['message']['content']
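
If you do want different agents on different models, one way is a small lookup table with a default. This is a sketch, not what's running in production: AGENT_MODELS, DEFAULT_MODEL, and build_payload are names I'm introducing here, and the override model names are placeholders you'd swap for models your provider actually serves.

```python
DEFAULT_MODEL = 'qwen-2.5-72b'  # the model used throughout this post

# Hypothetical per-agent overrides; these model names are placeholders
AGENT_MODELS = {
    'Bunny': 'example-research-model',
    'Chota': 'example-code-model',
}

def build_payload(agent_name, system_prompt, context, models=AGENT_MODELS):
    """Build the chat-completions request body for one agent."""
    return {
        # Agents without an override fall back to the shared default model
        'model': models.get(agent_name, DEFAULT_MODEL),
        'messages': [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': f'Context: {context}\n\nRespond as {agent_name}:'},
        ],
        'max_tokens': 500,
        'temperature': 0.7,
    }
```

The returned dict can be passed as the json= argument of the requests.post call in generate_response.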

Step 3: The Auto-Generator (Cron Job)

Every 12 minutes, a new startup conversation topic is generated and each agent responds:

# auto_generate.py
import os
import random

import requests

from agent_response import generate_response

BRIDGE = 'http://localhost:7799'
FREEAPI_BASE = 'http://localhost:3005'  # FreeLLMAPI
# Assumption: the API key is supplied through this environment variable
FREEAPI_KEY = os.environ.get('FREEAPI_KEY', '')

# Seed topics, used here as a fallback when the LLM call fails
TOPICS = [
    'Should we build a social media scheduling tool?',
    'How do we monetize our AI platform?',
    'What AI features do users actually want?',
    'Should we open-source our core?',
    'How to compete with Big Tech AI?',
    'What should our pricing model be?'
]

def generate_topic():
    # Use LLM to generate a fresh startup topic
    try:
        resp = requests.post(
            f'{FREEAPI_BASE}/v1/chat/completions',
            headers={'Authorization': f'Bearer {FREEAPI_KEY}'},
            json={
                'model': 'qwen-2.5-72b',
                'messages': [{'role': 'user', 'content': 'Generate a startup discussion topic for an AI company. One line only.'}],
                'max_tokens': 50
            },
            timeout=15
        )
        resp.raise_for_status()
        return resp.json()['choices'][0]['message']['content'].strip()
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network error or unexpected response shape: use a seed topic
        return random.choice(TOPICS)

def agent_respond(agent, topic):
    return generate_response(agent, topic, FREEAPI_KEY, FREEAPI_BASE)

def run_conversation():
    topic = generate_topic()

    for agent in ['Tarun', 'Vibha', 'Bunny', 'Chota']:
        response = agent_respond(agent, topic)
        requests.post(f'{BRIDGE}/chat', json={
            'agent': agent,
            'message': response
        })

if __name__ == '__main__':
    run_conversation()
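
In this simplified loop every agent sees only the topic. For the agents to actually react to each other, each one needs the earlier replies in its context. Here's a rough sketch of that idea, assuming a respond(agent, context) callable such as a thin wrapper around agent_respond. The build_context and run_round names are mine, for illustration only.

```python
def build_context(topic, replies, max_replies=6):
    """Format the topic plus the most recent replies as one context string.

    replies is a list of (agent, message) tuples, oldest first.
    """
    lines = [f'Topic: {topic}']
    for agent, message in replies[-max_replies:]:
        lines.append(f'{agent}: {message}')
    return '\n'.join(lines)

def run_round(topic, agents, respond):
    """Let each agent reply in turn, seeing everything said before it."""
    replies = []
    for agent in agents:
        context = build_context(topic, replies)
        replies.append((agent, respond(agent, context)))
    return replies
```

Capping the history with max_replies keeps the prompt short enough for the 500-token responses used above. The value 6 is an arbitrary starting point.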

Step 4: Slack Integration

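Each agent appears in Slack under its own name and icon. Under the hood, a single bot token calls Slack's chat.postMessage with username and icon_emoji overrides, which requires the chat:write.customize scope: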

# slack_poster.py
import os

import requests

SLACK_TOKEN = os.environ.get('SLACK_TOKEN')
CHANNEL = os.environ.get('SLACK_CHANNEL', '#techiemates-live')

# Unicode emoji for the display name; icon_emoji needs Slack shortcodes
EMOJIS = {'Tarun': '👑', 'Vibha': '🤗', 'Bunny': '🔍', 'Chota': '💻'}
ICONS = {'Tarun': ':crown:', 'Vibha': ':hugging_face:', 'Bunny': ':mag:', 'Chota': ':computer:'}

def post_to_slack(agent, message):
    resp = requests.post('https://slack.com/api/chat.postMessage',
        headers={'Authorization': f'Bearer {SLACK_TOKEN}'},
        json={
            'channel': CHANNEL,
            'text': message,
            'username': f'{agent} {EMOJIS.get(agent, "🤖")}',
            'icon_emoji': ICONS.get(agent, ':robot_face:')
        },
        timeout=10
    )
    # Slack returns HTTP 200 even on failure; callers should check body['ok']
    return resp.json()
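
The architecture diagram shows a Slack Poster service sitting behind the bridge, which the snippet above doesn't cover. A minimal sketch of the relay logic, assuming the chat log is append-only and that post is a callable returning Slack's parsed JSON response. The names new_messages, slack_error, and relay are hypothetical.

```python
def new_messages(log, already_posted):
    """Entries not yet sent to Slack, assuming the log is append-only."""
    return log[already_posted:]

def slack_error(body):
    """Return Slack's error string, or None when the call succeeded.

    Slack answers HTTP 200 even for failed calls; the 'ok' flag in the
    JSON body is what tells success from failure.
    """
    if body.get('ok'):
        return None
    return body.get('error', 'unknown_error')

def relay(log, already_posted, post):
    """Post unseen entries in order and return the updated offset."""
    for entry in new_messages(log, already_posted):
        error = slack_error(post(entry['agent'], entry['message']))
        if error:
            break  # stop here so the failed entry is retried on the next pass
        already_posted += 1
    return already_posted
```

A long-running poster would call relay in a loop with a short sleep, reloading chat_log.json each pass and persisting the returned offset somewhere so a restart doesn't repost old messages.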

Step 5: The Web Viewer

A simple Flask app that reads the chat log and serves it with auto-refresh:

# viewer.py
from flask import Flask, jsonify, render_template_string
import json

app = Flask(__name__)
CHAT_LOG = '/opt/techiemates/chat_log.json'

def load_log():
    try:
        with open(CHAT_LOG) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):  # missing or corrupt log
        return []

@app.route('/')
def index():
    log = load_log()
    return render_template_string(TEMPLATE, messages=log)

@app.route('/api/messages')
def api_messages():
    return jsonify(load_log()[-50:])  # last 50

TEMPLATE = '''
<!DOCTYPE html>
<html>
<head>
  <title>TechieMates Live Chat</title>
  <meta http-equiv="refresh" content="15">
  <style>
    body { background: #1a1a2e; color: #eee; font-family: system-ui; }
    .message { padding: 12px; margin: 8px 0; border-radius: 8px; }
    .Tarun { background: #16213e; border-left: 3px solid #4a9eff; }
    .Vibha { background: #1a1a3e; border-left: 3px solid #ff6b9d; }
    .Bunny { background: #1e1a3e; border-left: 3px solid #c084fc; }
    .Chota { background: #1a2e1a; border-left: 3px solid #34d399; }
  </style>
</head>
<body>
  <h1>🏢 TechieMates Live</h1>
  {% for m in messages %}
    <div class="message {{ m.agent }}">
      <strong>{{ m.agent }}</strong> — <small>{{ m.timestamp[:19] }}</small>
      <p>{{ m.message }}</p>
    </div>
  {% endfor %}
</body>
</html>
'''

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=7800)

Step 6: Systemd Services

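Make everything survive reboots. The bridge unit is shown here; the viewer and slack-poster services each need an equivalent unit file: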

# /etc/systemd/system/bridge.service
[Unit]
Description=TechieMates Bridge API
After=network.target

[Service]
Type=simple
User=tarun
WorkingDirectory=/opt/techiemates
ExecStart=/usr/bin/python3 bridge.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

sudo systemctl enable bridge viewer slack-poster
sudo systemctl start bridge viewer slack-poster
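
Since the viewer and slack-poster units follow the same pattern as the bridge, you can generate them from one template. This is a sketch under a couple of assumptions: the Description strings are my own wording, and slack_poster.py is assumed to run as a long-lived process rather than just defining a function.

```python
UNIT_TEMPLATE = """[Unit]
Description=TechieMates {description}
After=network.target

[Service]
Type=simple
User=tarun
WorkingDirectory=/opt/techiemates
ExecStart=/usr/bin/python3 {script}
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
"""

# Unit name -> (description, script); descriptions are illustrative
SERVICES = {
    'viewer': ('Web Viewer', 'viewer.py'),
    'slack-poster': ('Slack Poster', 'slack_poster.py'),
}

def render_unit(description, script):
    """Fill the shared template for one service."""
    return UNIT_TEMPLATE.format(description=description, script=script)

if __name__ == '__main__':
    # Print each unit so it can be reviewed before copying into place
    for name, (description, script) in SERVICES.items():
        print(f'# /etc/systemd/system/{name}.service')
        print(render_unit(description, script))
```

After writing the files, run sudo systemctl daemon-reload before enabling the services.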

Step 7: Hermes Cron Jobs

The real magic: Hermes schedules the auto-generator every 12 minutes:

# In Hermes: cronjob(action='create', prompt='Run the TechieMates auto-generator', schedule='12m')
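
If you want to test the loop without Hermes, a plain Python scheduler gives the same 12-minute heartbeat. To be clear, this is not Hermes's API, just a stand-in sketch. The run_every helper is hypothetical, and its sleep and clock parameters exist only to make it testable.

```python
import time
from typing import Callable, Optional

def run_every(interval_s: float,
              job: Callable[[], None],
              max_runs: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> int:
    """Run job immediately, then once per interval. Returns the run count.

    Runs are scheduled against fixed deadlines, so a slow job does not
    push every later run back. A failing job is logged and skipped so one
    bad conversation doesn't silence the agents for good.
    """
    runs = 0
    next_at = clock()
    while max_runs is None or runs < max_runs:
        delay = next_at - clock()
        if delay > 0:
            sleep(delay)
        try:
            job()
        except Exception as exc:
            print(f'job failed: {exc}')
        runs += 1
        next_at += interval_s
    return runs
```

Calling run_every(12 * 60, run_conversation) from auto_generate.py would keep the conversation going until the process is stopped.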

What Makes This Different

The conversations are real, not scripted. Every message is:

Generated by an LLM with a unique personality prompt
Posted to the bridge in real-time
Stored and served to the web viewer
Mirrored to Slack as individual agent messages

The agents debate pricing strategies, argue about tech choices, and brainstorm product ideas. Sometimes they agree, sometimes they don't. That's the point.

Lessons Learned

FreeLLMAPI is your friend: 99 models, 14 providers, ~1B free tokens/month. No credit card needed.
Personality prompts matter: the difference between a flat chat and a lively debate is in the system prompt.
Cron jobs are the heartbeat: without scheduled triggers, the agents go silent.
Slack > Custom UI: I moved from a custom web chat to Slack because the native app is better.
Bridge architecture > Direct API: the pending-queue system works more reliably than trying to chain LLM calls synchronously.

Live Demo

Web Viewer: techiemates.ramagiritharun.in
Slack Channel: #techiemates-live
Source: All running on a $15 Contabo VPS with Hermes Agent

Try It Yourself

Install Hermes Agent
Set up FreeLLMAPI on your VPS
Create the bridge, response engine, and viewer
Configure your cron jobs
Watch your AI team start talking

The code patterns above are simplified versions of what's actually running. The full system includes error handling, rate limiting, emotion tracking, and memory persistence.
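
To give a flavor of the error handling, here's a minimal retry-with-backoff sketch you could wrap around the LLM calls. It's an illustration of the general pattern, not the code that's actually running. The with_retries name and its defaults are assumptions.

```python
import time

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` up to `attempts` times, doubling the wait after each failure.

    Re-raises the last exception if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            last_error = exc
            # No point sleeping after the final attempt
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

For example, with_retries(lambda: generate_response('Tarun', topic, key, base)) would retry a flaky provider twice before giving up.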

This post was submitted to the Hermes Agent Challenge on dev.to. Built by Ramagiri Tharun.
