When using AI agents for web browsing, CAPTCHAs often stand as the most significant hurdle. These security measures can block agents, prevent form submissions, and halt automated tasks until a human steps in.
Hermes Agent, developed by Nous Research, is a versatile, self-improving AI agent capable of running on everything from a basic $5 VPS to a powerful GPU cluster. It connects with you through familiar platforms like Telegram, Discord, Slack, WhatsApp, Signal, and email. While it can navigate websites, interact with buttons, and extract data, it still faces the common challenge of getting stuck on CAPTCHAs.
CapSolver provides a seamless solution to this problem. By integrating the CapSolver Chrome extension into the browser used by Hermes, CAPTCHAs are resolved automatically and silently in the background. This setup requires no extra code, no manual API calls, and no complex prompt engineering.
The best part? You don't even have to mention CAPTCHAs to your agent. Simply instruct it to pause for a moment before submitting a form—by the time it proceeds, the CAPTCHA is already handled.
What is Hermes Agent?
Hermes Agent is an open-source autonomous tool from Nous Research. It operates on three core pillars: persistent memory (retaining project details across sessions), autonomous skill development (learning and repeating procedures from experience), and infrastructure flexibility (deployable via VPS, Docker, serverless sandboxes, or local GPU setups).
Key Features
- Unified Gateway: Access your agent through Telegram, Discord, Slack, WhatsApp, Signal, email, or a terminal interface.
- Flexible Model Support: Use `hermes model` to switch between 200+ models via OpenRouter, Nous Portal, NVIDIA NIM, or your own endpoints.
- Long-term Memory: Utilizes FTS5 session search and LLM summarization to remember past interactions.
- Skill Repository: An evolving procedural memory system that follows the agentskills.io standard.
- Diverse Backends: Supports seven terminal environments, including Local, Docker, SSH, and Vercel Sandbox.
- Integrated Browser: Controls Chromium through Playwright and the Chrome DevTools Protocol.
The Browser Tool
Hermes utilizes a Chromium browser for tasks like navigation, DOM reading, and data scraping. Its browser implementation is unique because it offers five interchangeable providers:
| Provider | Type | Extension Support? |
|---|---|---|
| Browserbase | Cloud | ✗ |
| Browser Use | Cloud | ✗ |
| Firecrawl | Cloud | ✗ |
| Camoufox | Local (Stealth Firefox) | ✗ |
| CDP attach | Local (Any Chromium) | ✓ |
Cloud-based providers typically don't allow for custom extensions, and Camoufox is built on Firefox, making it incompatible with Chrome extensions. The ideal solution is the CDP attach method, where Hermes connects to a Chromium instance you've already launched. This is where CapSolver excels.
Unlike tools such as OpenClaw or Crawlee, which launch and manage their own browser, Hermes lets you supply a Chrome instance that already has the extension loaded and then connects to it over the DevTools protocol.
What is CapSolver?
CapSolver is a premier CAPTCHA-solving platform that uses AI to bypass modern security challenges. It supports all major CAPTCHA types and offers rapid response times, making it easy to integrate into automated systems—whether through direct API calls or by running its Chrome extension within an agent's browser session.
Why This Integration is Different
Most CAPTCHA solutions involve writing code to handle API requests and token injections. This is the standard approach for tools like Puppeteer or Playwright.
The Hermes + CapSolver approach works differently:
| Traditional Method (Code-Heavy) | Hermes Method (Natural Language) |
|---|---|
| Create a CapSolverService class | Start Chrome with `--load-extension=...` |
| Manage createTask() and getTaskResult() | Simply chat with your agent |
| Manually inject tokens via script | The extension automates the process |
| Write logic for errors and retries | Tell the agent to "wait a minute, then submit" |
| Specific code needed for each CAPTCHA | Works across the CAPTCHA types the extension supports |
The Core Advantage: The CapSolver extension operates within the browser Hermes is controlling. When the agent reaches a CAPTCHA, the extension detects it, contacts the CapSolver API, and solves it in the background. By the time the agent is ready to submit the form, the token is already there.
All you need to do is provide time. Instead of explaining CAPTCHAs to the agent, just say:
"Navigate to the page, wait 60 seconds, and then click Submit."
The agent remains completely unaware of the technical process happening behind the scenes.
Prerequisites
To set up this integration, ensure you have:
- Hermes Agent installed with the gateway active (see installation guide).
- A CapSolver account and an API key (register here).
- Chromium or Chrome for Testing (see the note below regarding standard Chrome).
Important: Use Chromium, Not Branded Google Chrome
As of mid-2025, Google Chrome 137+ has disabled the `--load-extension` flag in branded versions. This means extensions cannot be loaded during automated sessions in standard Chrome or Edge.
You must use one of the following instead:
| Browser Choice | Extension Support | Recommended? |
|---|---|---|
| Google Chrome 137+ | No | No |
| Microsoft Edge | No | No |
| Chrome for Testing | Yes | Yes |
| Chromium (standalone) | Yes | Yes |
| Playwright Chromium | Yes | Yes |
How to install Chrome for Testing:
# Recommended: Install via Playwright
npx playwright install chromium
# Note the path to the binary:
# Linux: ~/.cache/ms-playwright/chromium-XXXX/chrome-linux64/chrome
# macOS: ~/Library/Caches/ms-playwright/chromium-XXXX/chrome-mac/Chromium.app/Contents/MacOS/Chromium
Alternatively, download it directly from the Chrome for Testing portal.
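Because the build number in the Playwright cache path changes between releases, it can help to resolve the binary with a glob rather than hard-coding it. The following is a minimal sketch for Linux, assuming the directory layout shown above. `find_playwright_chromium` is a hypothetical helper name, not part of Playwright or Hermes.

```shell
# Sketch: find a Playwright-managed Chromium binary (Linux layout only).
# The cache root defaults to Playwright's usual Linux location and can be
# overridden with the first argument.
find_playwright_chromium() {
  local cache_root="${1:-$HOME/.cache/ms-playwright}"
  local candidate
  local last=""
  for candidate in "$cache_root"/chromium-*/chrome-linux64/chrome; do
    # Skip the unexpanded glob pattern and anything that is not executable.
    [ -x "$candidate" ] || continue
    last="$candidate"
  done
  [ -n "$last" ] || return 1
  printf '%s\n' "$last"
}
```

Calling `CHROME_BIN="$(find_playwright_chromium)"` stores the path for later use. When several builds are installed, the function returns the last match in the shell's sort order, which is usually but not necessarily the newest build.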
Step-by-Step Setup
This setup involves two main parts:
- Running a Chrome process with the CapSolver extension and CDP enabled (on port `9222`).
- Updating Hermes' `config.yaml` to connect to this existing browser.
Step 1: Download the CapSolver Extension
Get the extension and extract it to a known directory:
- Visit the CapSolver GitHub releases.
- Download the latest Chrome extension zip file.
- Extract it:
mkdir -p ~/.hermes/capsolver-extension
unzip CapSolver.Browser.Extension-chrome-v*.zip -d ~/.hermes/capsolver-extension/
Confirm the manifest.json file is present in that folder.
Note on Paths: Always use absolute paths for the `--load-extension` flag to avoid issues with service worker registration in some Chromium builds.
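The manifest check and the absolute-path requirement can be combined into one small helper. This is a sketch only: `check_extension_dir` is a hypothetical name, and the default directory is the one used in this guide.

```shell
# Sketch: confirm a directory looks like an unpacked Chrome extension and
# print its absolute path for use with --load-extension.
check_extension_dir() {
  local ext_dir="${1:-$HOME/.hermes/capsolver-extension}"
  if [ ! -f "$ext_dir/manifest.json" ]; then
    echo "manifest.json not found in $ext_dir" >&2
    return 1
  fi
  # cd + pwd resolves a relative path to an absolute one.
  (cd "$ext_dir" && pwd)
}
```

If the check fails right after unzipping, the archive may have been extracted into a nested folder; in that case, point the helper (and the flag) at the folder that actually contains `manifest.json`.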
Step 2: Configure Your API Key
Update the extension's configuration file at ~/.hermes/capsolver-extension/assets/config.js with your key:
export const defaultConfig = {
apiKey: 'YOUR_CAPSOLVER_API_KEY', // Insert your key here
useCapsolver: true,
enabledForRecaptcha: true,
enabledForRecaptchaV3: true,
// ... other settings
};
Your key is available on your CapSolver dashboard.
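If you would rather not edit the file by hand, the key can be written with `sed`. The sketch below assumes the `apiKey: '...'` line uses single quotes as in the snippet above, and that keys contain only letters, digits, hyphens, and underscores (an assumption about the key format, used here to keep the `sed` expression safe). `set_capsolver_key` is a hypothetical helper name.

```shell
# Sketch: write an API key into the extension's config.js.
set_capsolver_key() {
  local config_file="$1"
  local api_key="$2"
  local tmp_file
  local status
  [ -f "$config_file" ] || { echo "missing $config_file" >&2; return 1; }
  # Reject empty keys and characters that would break the sed expression.
  case "$api_key" in
    ""|*[!A-Za-z0-9_-]*) echo "unexpected characters in key" >&2; return 1 ;;
  esac
  tmp_file="$(mktemp)" || return 1
  # Replace whatever sits between the single quotes after "apiKey:".
  if sed "s|apiKey:[[:space:]]*'[^']*'|apiKey: '$api_key'|" "$config_file" > "$tmp_file"; then
    # Write back with cat so the original file keeps its permissions.
    cat "$tmp_file" > "$config_file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp_file"
  return "$status"
}
```

A call such as `set_capsolver_key ~/.hermes/capsolver-extension/assets/config.js "$CAPSOLVER_API_KEY"` keeps the key out of your shell history, assuming you have exported it under that (hypothetical) variable name.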
Step 3: Launch Chrome with Extension and CDP
Start Chrome separately with these essential flags:
- `--remote-debugging-port=9222`: Enables Hermes to connect.
- `--load-extension=...`: Loads the CapSolver tool.
- `--user-data-dir=...`: Keeps the agent's profile separate.
Option A: Manual Launch (for testing)
/path/to/chrome-for-testing/chrome \
--remote-debugging-port=9222 \
--remote-debugging-address=127.0.0.1 \
--user-data-dir="$HOME/.hermes/chrome-debug" \
--load-extension="$HOME/.hermes/capsolver-extension" \
--disable-extensions-except="$HOME/.hermes/capsolver-extension" \
--no-first-run \
--no-default-browser-check \
--no-sandbox
Option B: Background Script (for continuous use)
Create a script at ~/.hermes/chrome-debug.sh:
#!/usr/bin/env bash
CHROME_BIN="$HOME/.cache/ms-playwright/chromium-1200/chrome-linux64/chrome"
EXT_DIR="$HOME/.hermes/capsolver-extension"
USER_DATA_DIR="$HOME/.hermes/chrome-debug"
export DISPLAY=:99 # Headless servers: assumes a virtual display (e.g. Xvfb) is running on :99
exec "$CHROME_BIN" \
--remote-debugging-port=9222 \
--remote-debugging-address=127.0.0.1 \
--user-data-dir="$USER_DATA_DIR" \
--load-extension="$EXT_DIR" \
--disable-extensions-except="$EXT_DIR" \
--no-first-run \
--no-default-browser-check \
--no-sandbox \
--disable-dev-shm-usage \
--disable-features=Translate
Run it in the background using nohup or manage it with a tool like systemd.
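For the `nohup` route, a small wrapper can start the launcher and record its PID so you can stop it later. This is a sketch: `start_chrome_debug` and the log and PID file locations are hypothetical choices, not something Hermes or CapSolver defines.

```shell
# Sketch: start the launcher script in the background and record its PID.
start_chrome_debug() {
  local script="${1:-$HOME/.hermes/chrome-debug.sh}"
  local log_file="${2:-$HOME/.hermes/chrome-debug.log}"
  local pid_file="${3:-$HOME/.hermes/chrome-debug.pid}"
  [ -x "$script" ] || { echo "not executable: $script" >&2; return 1; }
  # nohup keeps the process alive after the terminal closes.
  nohup "$script" > "$log_file" 2>&1 &
  echo "$!" > "$pid_file"
}
```

Because the launcher script ends with `exec`, the recorded PID belongs to the Chrome process itself, so `kill "$(cat ~/.hermes/chrome-debug.pid)"` stops the browser. Remember to `chmod +x ~/.hermes/chrome-debug.sh` first.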
Step 4: Configure Hermes to Use CDP
Modify ~/.hermes/config.yaml to include the cdp_url:
browser:
inactivity_timeout: 120
cdp_url: http://127.0.0.1:9222
This tells Hermes to route all browser actions through your pre-configured Chrome instance.
Step 5: Restart the Hermes Gateway
Apply the changes by restarting Hermes:
hermes gateway run
Step 6: Verify the Integration
Run the diagnostic tool:
hermes doctor
Look for browser-cdp under Tool Availability. If it's there, your setup is active. You can also verify the CDP endpoint directly:
curl -s http://127.0.0.1:9222/json/version
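A healthy endpoint returns a small JSON document that includes a `webSocketDebuggerUrl` field. The sketch below extracts that field with `sed`; `extract_ws_url` is a hypothetical helper, and a pattern match like this is not a real JSON parser, so prefer a tool such as `jq` if you have one installed.

```shell
# Sketch: pull webSocketDebuggerUrl out of a /json/version response.
# Reads JSON on stdin; prints the URL, or returns 1 if the key is absent.
extract_ws_url() {
  local url
  url="$(sed -n 's/.*"webSocketDebuggerUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)"
  [ -n "$url" ] || return 1
  printf '%s\n' "$url"
}
```

Usage: `curl -s http://127.0.0.1:9222/json/version | extract_ws_url`. An empty result with a non-zero exit status suggests Chrome is not running or was started without `--remote-debugging-port`.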
Troubleshooting
browser-cdp is missing in hermes doctor
This usually indicates a configuration error in config.yaml. Ensure cdp_url is correctly nested under the browser: section.
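A quick way to catch the most common mistake (a `cdp_url` line that has slipped out to the top level) is a rough `awk` check like the sketch below. `check_cdp_url_nesting` is a hypothetical helper; it only looks at indentation and does not validate the YAML or the URL itself.

```shell
# Sketch: succeed only if an indented cdp_url line appears inside the
# top-level "browser:" block of the config file.
check_cdp_url_nesting() {
  local config_file="${1:-$HOME/.hermes/config.yaml}"
  awk '
    # Any non-indented, non-comment line starts a new top-level block.
    /^[^[:space:]#]/ { in_browser = ($0 ~ /^browser:[[:space:]]*(#.*)?$/) }
    # An indented cdp_url with a value counts only while inside browser:.
    in_browser && /^[[:space:]]+cdp_url:[[:space:]]*[^[:space:]#]/ { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$config_file"
}
```

Run `check_cdp_url_nesting && echo ok` after editing the file; a non-zero exit status means the key is missing, empty, or not indented under `browser:`.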
Extension fails to solve CAPTCHAs
Check whether you are using branded Google Chrome 137+, which ignores the `--load-extension` flag. If so, switch to Chrome for Testing or Chromium. Also make sure your CapSolver balance is sufficient.
Browser timeouts on startup
The first connection might take longer. If it fails, try the command again or increase the inactivity_timeout in your configuration.
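Rather than retrying by hand, you can poll the CDP endpoint until Chrome is ready before starting Hermes. The following is a minimal sketch; `wait_for_cdp` is a hypothetical helper, and the default of 10 attempts two seconds apart is an arbitrary choice.

```shell
# Sketch: poll the CDP HTTP endpoint until it answers or attempts run out.
wait_for_cdp() {
  local base_url="${1:-http://127.0.0.1:9222}"
  local attempts="${2:-10}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each try.
    if curl -sf --max-time 5 "$base_url/json/version" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_for_cdp && hermes gateway run` starts the gateway only once the browser is reachable.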
Chrome crashes after version updates
If you change Chrome versions, the existing user data directory might be incompatible. Delete ~/.hermes/chrome-debug and restart Chrome to generate a fresh profile.
Best Practices
- Allow Ample Time: Set a wait time of 30–60 seconds to ensure the CAPTCHA has time to be solved and the token injected.
- Use Natural Language: Instruct the agent to "wait a minute before submitting" rather than using technical terms about CAPTCHAs.
- Monitor Credits: Regularly check your CapSolver dashboard to keep your balance topped up.
- Isolate Browser Data: Always use a dedicated `--user-data-dir` to keep the agent's environment separate from your personal data.
- Security First: Ensure `--remote-debugging-address` is set to `127.0.0.1` to prevent unauthorized remote access to your browser.
- Headless Servers: Use `Xvfb` on Linux servers without a GUI to provide the necessary display context for extensions.
- Cost Efficiency: Since the extension handles the hard work, you can use more affordable models (like those from OpenRouter) for navigation and interaction tasks.
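For the headless-server case, the virtual display has to exist before Chrome starts. The sketch below starts Xvfb on display `:99` (matching the `DISPLAY` value in the launcher script) if no X server lock file is present. `ensure_xvfb` is a hypothetical helper, and the 1280x1024x24 screen size is an arbitrary choice.

```shell
# Sketch: make sure a virtual X display is available, then export DISPLAY.
ensure_xvfb() {
  local display="${1:-:99}"
  command -v Xvfb > /dev/null 2>&1 || { echo "Xvfb is not installed" >&2; return 1; }
  # X servers create /tmp/.X<number>-lock while they own a display.
  if [ ! -e "/tmp/.X${display#:}-lock" ]; then
    Xvfb "$display" -screen 0 1280x1024x24 > /dev/null 2>&1 &
    sleep 1  # give the server a moment to create its socket
  fi
  export DISPLAY="$display"
}
```

Call `ensure_xvfb` before launching Chrome. A stale lock file left behind by a crashed X server would make this check skip the start, so remove it manually if Chrome reports that it cannot open the display.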
Conclusion
The combination of Hermes Agent and CapSolver offers a zero-code approach to handling CAPTCHAs. By following this guide, you can:
- Launch a customized Chrome instance with the CapSolver extension.
- Connect Hermes via CDP with a simple configuration change.
- Interact with your agent naturally, letting the background processes handle security hurdles.
This setup transforms CAPTCHA solving into an invisible, automated process, allowing your AI agent to operate without interruption.
Ready to enhance your agent? Sign up for CapSolver today and use the code `herme` for a special bonus on your first deposit!
FAQ
Do I need to explain CapSolver to the agent?
No. The extension works independently. Just give the agent enough time (e.g., "wait 60 seconds") to allow the solve to complete.
Why is branded Chrome not working?
Recent updates to Google Chrome (v137+) removed the ability to load extensions via command-line flags in automated sessions. Chrome for Testing or Chromium are the required alternatives.
Can I use cloud-based browsers?
No, cloud providers like Browserbase don't allow for the custom extension loading required for this specific integration.
What CAPTCHA types are supported?
The extension handles reCAPTCHA (v2/v3), hCaptcha, FunCaptcha, and AWS WAF CAPTCHA automatically. Note that Cloudflare Turnstile requires a different approach via the CapSolver API.
Is Hermes Agent free?
Yes, it is open-source. You only pay for the AI model usage (via providers like OpenRouter) and the CAPTCHA solving credits from CapSolver.


