
Phú
Hermes agent: Introduction

Concept

"Hermes Agent" is an open-source AI agent developed by Nous Research. It is described as an agent that runs continuously on your server, has long-term memory, can learn over time, connects to many chat platforms/tools, and supports browser automation. The project is released under the MIT license.

This agent is meant to work as a real assistant: it can do more than just answer messages or pull information from third-party services. It can manage your machine and even help deploy a web app, for example. It also has long-term memory to remember your habits and notes. In parallel, it can learn on its own:

  • If you repeat the same kind of task, it can proactively create a reusable skill so you do not need to manually re-derive the same steps every time.
  • This reduces token usage for similar or repetitive work.
  • It supports multiple gateways to mobile-friendly chat apps like Telegram and Discord, so you can operate your AI agent and machine from your phone.
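The skill idea above can be pictured as a cache of derived procedures. This is only an illustrative sketch of the concept, not Hermes Agent's actual implementation; the class and function names are made up:

```python
# Illustrative sketch: a "skill" as a cached, reusable procedure, so a
# repeated task does not pay the expensive derivation step (an LLM call
# in a real agent) every time.
class SkillStore:
    def __init__(self):
        self.skills = {}

    def run(self, task, derive_steps):
        # Derive the steps once, then replay the stored skill for free.
        if task not in self.skills:
            self.skills[task] = derive_steps(task)
        return self.skills[task]

store = SkillStore()
calls = []  # track how often the expensive derivation actually runs

def derive(task):
    calls.append(task)
    return ["open browser", "fetch content", "summarize"]

store.run("summarize video", derive)
store.run("summarize video", derive)
# derive ran only once; the second run reused the stored skill.
```

This is exactly why repeated tasks consume fewer tokens: the derivation happens once and the result is replayed.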

Installation

Run:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Then refresh your terminal:

source ~/.bashrc    # reload shell (or: source ~/.zshrc)

Now the hermes command is available in your terminal. Next, run hermes setup to configure an LLM for Hermes. You can choose either quick setup or full setup; here I choose quick setup to configure the LLM only.

Setup

I am using Minimax global, so I choose this option.

Choose provider

Then it asks for my API key:

Pass api key

After that, choose a model.

Choose models

After that, it asks whether to set up a messaging platform. Skip this step for now; I will walk through gateway setup later in this post.

Then test by running hermes. If you see the expected prompt, the agent is installed successfully.

Hermes Agent installed

Try asking what it is.

Prompt asking identity

SOUL.md

When using AI, we usually define a system prompt so the model knows who it is and what it should do. This helps it stay focused on a specific type of work. For example, without a system prompt, if you ask for a simple "Hello World" program, the model may choose a language it is most familiar with. If you want it to always respond in a specific language, it may not comply consistently.

A system prompt is a way to standardize this behavior across sessions. If you want it to act as a PHP expert, you can include:

You are an expert in PHP

Then every prompt will be interpreted in that context, and the model will try to answer with PHP-oriented decisions and knowledge.
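In chat-style LLM APIs, this is typically done by putting the system prompt first in the message list so it frames every later turn. A minimal sketch in the generic OpenAI-style message format (the helper name here is illustrative, not part of any library):

```python
# Sketch: a system prompt is just the first message in the conversation,
# so every user turn is interpreted in its context.
def build_messages(system_prompt, user_prompt, history=None):
    messages = [{"role": "system", "content": system_prompt}]
    messages += history or []                          # earlier turns, if any
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages("You are an expert in PHP",
                      "Write a Hello World program")
# msgs[0] is the system message, so the model answers with PHP in mind.
```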

System Prompt

In Hermes Agent, there is SOUL.md, which plays a role similar to a system prompt. To optimize usage, set up SOUL.md for the agent. By default it is located at:

~/.hermes/SOUL.md

Update SOUL.md

I changed it to:

Result
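My exact SOUL.md content is only in the screenshot above, so as an illustrative sketch (not my actual file), a PHP-focused SOUL.md could look like:

```
You are an expert in PHP.
Prefer PHP for all code examples, and explain decisions
from the perspective of an experienced PHP developer.
```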

After that, when I ask questions in that session, it behaves as a PHP expert.
Workspace session behavior

Session

At this point, I asked myself: why make it this complex? If I keep one session and ask questions from start to finish, it remembers previous context. Why bother with system prompts? Also, nobody has time to create a new session all the time.

That can work initially: if you define the role in the first prompt, by prompt 100 it may still remember. But over time, behavior can degrade. When you cover too many topics in one long session, the model may become confused.

Example: you work in workspace1, ask it to save output to result, then later switch to workspace2. If you later ask again to save to result, it may default to workspace2/result because it inferred the active context changed. If you use two separate sessions, session 1 for workspace1 and session 2 for workspace2, the same command will correctly map to each workspace.
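The workspace mix-up above comes down to ambient context: the same relative command resolves against whichever workspace the session currently has in mind. A tiny sketch of the idea (class and method names are illustrative):

```python
# Sketch: each session carries its own working context, so the same
# relative command ("save to result") resolves unambiguously per session.
class Session:
    def __init__(self, workspace):
        self.workspace = workspace

    def resolve(self, relative_path):
        return f"{self.workspace}/{relative_path}"

s1 = Session("workspace1")
s2 = Session("workspace2")
s1.resolve("result")   # workspace1/result
s2.resolve("result")   # workspace2/result
```

With one long shared session, there is effectively a single mutable workspace, and the model may silently switch it; with two sessions, nothing is shared.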

When working longer, the model also has a finite context window:
Context window

The status bar shows the model is minimax 2.7 highspeed with a context window of 204.8k tokens. Every user prompt and model reply consumes part of that budget; here, one question-and-answer pair consumed about 11.7k tokens.

As history grows beyond the window limit, the model compresses earlier context. For example, at around 200k tokens it may summarize the history down to about 10k tokens. You can then continue on top of the summary, but the compressed memory loses detail: some earlier specifics may no longer be present.
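Using the numbers observed above (a 204.8k window and roughly 11.7k tokens per exchange), you can estimate how many exchanges fit before compression kicks in:

```python
# Back-of-the-envelope estimate using the numbers from the status bar.
CONTEXT_WINDOW = 204_800   # tokens (minimax 2.7 highspeed)
PER_EXCHANGE = 11_700      # tokens observed for one question-and-answer pair

def exchanges_before_compression(window=CONTEXT_WINDOW,
                                 per_exchange=PER_EXCHANGE):
    return window // per_exchange

print(exchanges_before_compression())  # → 17
```

So a session of this shape holds roughly 17 exchanges of full-detail history; real numbers vary a lot with prompt length and tool output.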

Context window compress

That is why it is useful to keep a system prompt and open a fresh session for different tasks. You can start a new session using:

/new

Prompt

This is the prompt area where you communicate with the AI and issue commands or requests.

In this example I asked it to summarize a YouTube video. Hermes Agent shows each step it performs, such as invoking a skill and opening the browser.

And this is the result:

Gateway

If you want to use your AI agent through chat apps (for example Telegram or Discord), either for team use or for personal use, set up a gateway.

I used Telegram as an example. First, use Telegram BotFather to create a bot. It gives you:

  • a bot access handle (to open your bot)
  • the bot token

To configure:

hermes gateway setup

Then choose the Telegram gateway, paste the bot token, and set the allowed user ID (you can find yours with @userinfobot).


In this step, choose open access if anyone on the channel (for example, your whole team) should be able to chat with the agent. Otherwise, choose DM pairing, which lets you manually approve which users the bot will answer.

For gateway mode:

  • Option 1: manual mode. You run hermes gateway start each time.
  • Option 2: service mode. The gateway runs automatically as a background service.

Test it by running the same video-summary task, this time via Telegram.

Result:

Conclusion

That’s a brief first look. The next post can cover more topics in depth. If you need help, leave a comment.
