DEV Community: Avinash Seethalam

Running Hermes Agent with NVIDIA-Hosted Models and Local Ollama

Avinash Seethalam — Sat, 09 May 2026 11:59:12 +0000

I spent about a week migrating my agent setup off OpenClaw and onto Hermes Agent, with a hybrid backend. NVIDIA-hosted inference for the heavy stuff, a local Ollama daemon for everything I didn't want leaving the box. This is what I ended up with after two false starts and one evening of yelling at WSL networking.

If you're already happy with your current loop, skip this. If you've been hitting the same things I have (flaky routing, agents that lose the plot on multi-file edits, surprise bills), this might be useful.

References I actually opened while writing:

Hermes repo: https://github.com/NousResearch/hermes-agent
Hermes docs: https://hermes-agent.nousresearch.com/docs/
OpenClaw repo: https://github.com/openclaw/openclaw
OpenClaw docs: https://docs.openclaw.ai
NVIDIA NIM / build catalog: https://build.nvidia.com
Ollama repo: https://github.com/ollama/ollama
OpenRouter coding category: https://openrouter.ai/apps/category/coding

Note on framing. OpenClaw and Hermes overlap but they are not the same shape of tool. OpenClaw is a personal-assistant gateway whose main surface is messaging channels (WhatsApp, Telegram, Discord, iMessage). Hermes ships a terminal TUI plus a messaging gateway plus a skills/memory loop, and includes a hermes claw migrate command that imports OpenClaw configs directly. So the comparison below is based on how I was using OpenClaw in practice (terminal-first), not its actual elevator pitch. If you came to OpenClaw for the WhatsApp bot, your mileage will differ.

Why I bothered

The OpenRouter coding category keeps growing. Hermes started showing up in those threads, and Nous shipping an explicit OpenClaw migration path made "try it for a week" cheap.

My OpenClaw setup had drifted into a state I didn't trust. Three things in particular:

Long sessions silently lost context after compaction. Asked it to recall a migration plan we'd sketched two hours earlier and got back a confidently wrong summary that mixed in details from a totally different repo.
Provider routing was opaque. I'd ask for a specific model and it'd quietly fall back to something cheaper. Only noticed because the latency dropped.
Multi-file refactors needed too much hand-holding. Edit file A correctly, edit B as if A's edit hadn't happened, loop.

Not OpenClaw-specific. General failure mode of agents that conflate "context window" with "memory." But it added up.

Why Hermes Over an OpenClaw-Style Workflow

What got better, in roughly a week of use:

Skills + persistent memory as first-class concepts. Hermes has a built-in skill loop and FTS5 session search. OpenClaw has a skills system too (ClawHub) but cross-session recall in Hermes felt tighter. Asking "how did we set up the OpenRouter pinning last week" actually returned the snippet.
A real terminal UI. hermes drops into a TUI with multiline editing, slash-command autocomplete, conversation history, streaming tool output. OpenClaw's chat surface is fine. Hermes' is just better suited to how I work.
Config is YAML. Everything in ~/.hermes/config.yaml, secrets in ~/.hermes/.env. You can diff it. You can copy it.
hermes model for switching providers. Or hermes config set model openrouter/google/gemini-2.5-flash directly. No restart dance.

Where OpenClaw is still the better pick:

Messaging-first workflows. OpenClaw's channel coverage is broader (WeChat, Matrix, Feishu, LINE, Nostr, the long tail). If your bot lives on WhatsApp, stay there.
Live Canvas and Voice Wake are nice if you're building a voice assistant rather than a coding agent. Hermes has voice memo transcription, not the same thing.
If you're on Node-only infra, npm install -g openclaw@latest is one line. Hermes pulls in uv, Python 3.11, Node, ripgrep, ffmpeg.

The thing that mattered most to me architecturally: Hermes treats the provider as configuration, not code. The same model.base_url field handles NVIDIA NIM, Ollama (local or Cloud), OpenRouter, anything OpenAI-compatible. One CLI command flips between them. OpenClaw can do this too. Hermes' YAML-first version is just faster to reason about when something breaks at 11pm.

Installation

macOS daily, Ubuntu workstation, WSL2 on a Windows laptop I travel with. Same one-liner everywhere.

macOS

The official installer handles uv, Python 3.11, Node, ripgrep, ffmpeg:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Output is illustrative, lines vary by version:

==> Installing uv
==> Installing Python 3.11
==> Installing Node.js
==> Cloning hermes-agent
==> Symlinking ~/.local/bin/hermes
✓ Hermes installed. Run: source ~/.zshrc && hermes

Then:

source ~/.zshrc
hermes --version
hermes doctor

hermes doctor is the most useful command during setup. Checks PATH, config location, provider reachability. Run it before anything else.

One macOS thing that cost me twenty minutes: if you have an older Homebrew Python on PATH, the installer prefers its own uv-managed Python (correct), but python3 in your shell is now a different interpreter from the one Hermes uses. Mostly fine, occasionally surprising when you're debugging. If you're hacking on Hermes itself, prefer the dev path:

git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
./setup-hermes.sh
./hermes

Ubuntu / Linux

Same one-liner.

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

If you're on a minimal server image without curl, install it first (sudo apt install -y curl). Installer pulls into ~/.hermes/ and symlinks ~/.local/bin/hermes. Make sure that's on PATH:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
which hermes
hermes doctor

If which hermes is empty, your rc is overriding PATH late. zsh+oh-my-zsh does this. Grep your .zshrc for export PATH= lines that come after the installer's edits.
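
To check both halves of that, here's a rough sketch as two shell functions. The names path_has_dir and path_assignments are made up for this post, not part of Hermes or its installer; the only thing carried over is that the symlink lives in ~/.local/bin.

```shell
# Return 0 if directory $2 appears as an entry in the PATH-style string $1.
path_has_dir() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# List every line in an rc file that assigns PATH, with line numbers,
# so you can see which assignment runs after the installer's edit.
path_assignments() {
  grep -nE '^[[:space:]]*(export[[:space:]]+)?PATH=' "$1" || true
}
```

Usage: path_has_dir "$PATH" "$HOME/.local/bin" || path_assignments ~/.zshrc. A late PATH= line that doesn't include $PATH is the usual culprit.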

Windows + WSL2

The Hermes README has a native Windows PowerShell installer flagged as early beta:

irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex

I tried it. It works, but I went back to WSL2 within a day. Inside WSL2 Ubuntu the Linux one-liner is fine:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

WSL was annoying as hell, mostly for reasons that aren't Hermes' fault.

Don't put your project on /mnt/c/.... The 9P translation layer makes file watches and large reads slow enough that tool calls visibly lag. Workspaces on the WSL native filesystem (~/work/...) only.
If you install Ollama on Windows and Hermes inside WSL, you have to reach the Windows host from WSL. Host IP is in /etc/resolv.conf as nameserver. On some configurations it changes between reboots. I gave up. Installed Ollama inside WSL.
Ctrl+C in some Windows terminals doesn't propagate cleanly to long tool calls. Windows Terminal is better than the legacy console here. Don't use the legacy one.
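
Two of those checks are easy to script. A minimal sketch, assuming WSL2's default NAT networking (where the nameserver in /etc/resolv.conf is the Windows host) and Windows drives mounted under /mnt/<letter>/. The function names are hypothetical.

```shell
# Print the first nameserver from a resolv.conf-style file.
# Under WSL2's default NAT networking this is the Windows host's address.
wsl_host_ip() {
  awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Return 0 if a path sits on a Windows drive mounted under /mnt/<letter>/.
on_windows_mount() {
  case "$1" in
    /mnt/[a-z]/*|/mnt/[a-z]) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: on_windows_mount "$PWD" && echo "move this project under ~/". For the Windows-side Ollama case, "http://$(wsl_host_ip):11434/v1" is the URL to try, assuming the Windows daemon listens on a non-loopback address and the firewall lets WSL through. Neither is a given.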

The Hermes browser-based dashboard chat pane requires WSL2 specifically (it uses a POSIX PTY). Classic CLI and gateway run natively. So if you only need the terminal, the PowerShell install is technically fine. I just didn't trust it enough.

First-Run Setup

hermes setup

Wizard. Walks you through provider selection, key entry, writes the config. If you have an existing ~/.openclaw it offers to migrate skills, memories, command allowlists, API keys. From the README:

hermes claw migrate              # Interactive migration
hermes claw migrate --dry-run    # Preview what would be migrated
hermes claw migrate --preset user-data   # Migrate without secrets
hermes claw migrate --overwrite  # Overwrite existing conflicts

Run --dry-run first. It prints exactly what would be copied where. Useful, and the kind of thing that suggests someone actually thought about the migration UX. I imported user-data only and re-pasted my keys by hand because the OpenClaw config had three stale keys I'd forgotten about.

After setup:

~/.hermes/
├── config.yaml     # Settings (model, terminal, TTS, compression, etc.)
├── .env            # API keys and secrets
├── auth.json       # OAuth provider credentials
├── SOUL.md         # Primary agent identity
├── memories/       # Persistent memory
├── skills/         # Agent-created and imported skills
├── cron/           # Scheduled jobs
├── sessions/       # Gateway sessions
└── logs/           # Error and gateway logs

HERMES_HOME overrides the location if you want parallel installations.
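
A sketch of how I'd use that for a throwaway second install. It assumes the default location is ~/.hermes, as in the tree above; hermes_home and with_hermes_home are hypothetical helper names, not Hermes commands.

```shell
# Directory Hermes is expected to use: HERMES_HOME when set, else ~/.hermes.
hermes_home() {
  printf '%s\n' "${HERMES_HOME:-$HOME/.hermes}"
}

# Run any command against a separate Hermes home without touching the default.
with_hermes_home() {
  local dir="$1"
  shift
  HERMES_HOME="$dir" "$@"
}
```

Usage: with_hermes_home "$HOME/.hermes-scratch" hermes setup, then the same prefix in front of hermes config to confirm which directory it resolved.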

hermes config after setup gives you a one-screen view of where everything resolves from. Useful to verify the model is actually pointed where you think.

NVIDIA Model Configuration

NVIDIA's hosted endpoints (build.nvidia.com, the NIM-style ones) are OpenAI-compatible. Hermes already speaks OpenAI-compatible. So plugging them in is base URL plus key.

Why I leaned on them:

Latency was good and stayed good. I expected hosted endpoints to be uneven. Over a week they weren't, with one exception (more on that below).
Llama variants, Qwen-Coder variants, DeepSeek-Coder, Nemotron, all reachable from one provider with one key. No juggling four credentials.
A 70B-class model running in the cloud is, from my workstation's perspective, free RAM.

The exception: on April 18 the integrate.api.nvidia.com endpoint started throwing 5xx for about twenty minutes around midday Pacific. Hermes retried with backoff but the session was effectively frozen until I noticed and flipped to local Ollama. Not a big deal. Worth knowing the failure mode.

Getting a key

Keys are issued from your account on https://build.nvidia.com and start with nvapi-.

Wiring it into Hermes

Fastest path is the env-var route. Hermes reads provider keys and base URLs from ~/.hermes/.env:

# ~/.hermes/.env
NVIDIA_API_KEY=nvapi-...
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
HERMES_INFERENCE_PROVIDER=nvidia

NVIDIA_BASE_URL defaults to https://integrate.api.nvidia.com/v1 per the Hermes environment variables reference, so you can omit it unless you're hitting a self-hosted NIM. HERMES_INFERENCE_PROVIDER accepts values like nvidia, openrouter, anthropic, ollama-cloud (from the same reference). It's the global "which provider is the default" switch.
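
A small check that the variables are actually in the file, without echoing the key. This is a sketch that assumes plain KEY=value lines like the ones above (no export prefix, no quoting); env_has_key and check_hermes_env are names I made up.

```shell
# Return 0 if KEY ($2) has a non-empty value in dotenv file ($1).
# Only the exit status is reported, so the secret never hits the terminal.
env_has_key() {
  grep -Eq "^$2=.+" "$1"
}

# Report which of the variables used in this post are present.
check_hermes_env() {
  local file="${1:-$HOME/.hermes/.env}" key
  for key in NVIDIA_API_KEY HERMES_INFERENCE_PROVIDER; do
    if env_has_key "$file" "$key"; then
      echo "$key: set"
    else
      echo "$key: missing"
    fi
  done
}
```

Run check_hermes_env with no arguments to inspect ~/.hermes/.env.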

Pick a model:

hermes model
# or directly
hermes config set model nvidia/meta/llama-3.1-70b-instruct

Model identifier depends on what NVIDIA exposes in the catalog at the time. The catalog moves. Verify the slug before pasting. Common ones I've used:

meta/llama-3.1-70b-instruct
qwen/qwen2.5-coder-32b-instruct
deepseek-ai/deepseek-coder-...
Various nemotron variants

If the slug doesn't resolve, Hermes tells you on first call rather than at config time. Mildly annoying. Fine once you know.
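
You can get that signal earlier by asking the endpoint for its model list yourself. A rough sketch, assuming the OpenAI-style /models listing (the same call the smoke test in Appendix B makes) and that each entry carries an "id" field. The function names are hypothetical, and the text match is a stand-in for a real JSON parser.

```shell
# Read an OpenAI-style /models JSON response on stdin; succeed if it lists $1.
models_json_has() {
  tr -d '[:space:]' | grep -F "\"id\":\"$1\"" > /dev/null
}

# Ask the endpoint for its model list and check for a slug.
nvidia_has_model() {
  curl -fsS "${NVIDIA_BASE_URL:-https://integrate.api.nvidia.com/v1}/models" \
    -H "Authorization: Bearer ${NVIDIA_API_KEY:?set NVIDIA_API_KEY first}" |
    models_json_has "$1"
}
```

Usage: nvidia_has_model meta/llama-3.1-70b-instruct && echo "slug ok". Pass the slug without the nvidia/ prefix, since that is how it appears in the catalog list above.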

YAML equivalent

# ~/.hermes/config.yaml
model:
  default: meta/llama-3.1-70b-instruct
  provider: nvidia
  base_url: ""        # leave empty to use NVIDIA_BASE_URL from .env
  context_length: 32768

The Hermes config docs are explicit about base_url: when set, Hermes ignores the provider and calls that endpoint directly. Useful for self-hosted NIMs. Footgun if you forget about a stale URL from an experiment three weeks ago. Empty string is the safe default.
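
Since the stale URL is silent, a quick scan of the config helps. This is a plain text scan, not a YAML parser, and it assumes base_url sits on its own line as in the snippet above; nonempty_base_urls is a made-up name.

```shell
# Print "line: value" for every non-empty base_url in a YAML file.
nonempty_base_urls() {
  awk '
    /^[ \t]*base_url:/ {
      v = $0
      sub(/^[ \t]*base_url:[ \t]*/, "", v)   # drop the key
      sub(/(^|[ \t]+)#.*$/, "", v)           # drop a trailing comment
      gsub(/"/, "", v)                       # drop double quotes
      gsub("\047", "", v)                    # drop single quotes
      if (v != "") print NR ": " v
    }' "$1"
}
```

Usage: nonempty_base_urls ~/.hermes/config.yaml. No output means the provider setting is what's in effect; any line printed is an endpoint that will be called instead.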

Operational notes

Rate limits exist and I haven't found a definitive published cap. In practice agent loops hit limits well before chat sessions do, because every tool result is going back into the context.

Free-tier quotas are real. I'd planned to do bulk repo analysis on hosted models. Switched to local once I realized how fast the quota burns. Reserve the hosted ones for the parts that benefit from a 70B-class model.

Advertised context windows and the windows that actually behave well are not the same. Past ~32K tokens on some models the recall got noticeably worse. I cap context_length at 32768 even on models that claim more. (There's a separate question about whether the model is "using" the long context or just paying its memory cost. I haven't dug in.)

Default timeouts were fine for chat, occasionally too short for long tool-augmented planning. Bump if you're seeing premature aborts.
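
To put rough numbers on why agent loops hit limits before chat does: a toy model, assuming every request resends the full history and each turn adds a fixed number of tokens. The figures are invented for illustration, not measured from Hermes or NVIDIA.

```shell
# Total prompt tokens sent over a session where request k (k = 1..turns)
# carries base + k * per_turn tokens:
#   total = turns * base + per_turn * turns * (turns + 1) / 2
session_prompt_tokens() {
  local base="$1" per_turn="$2" turns="$3"
  echo $(( turns * base + per_turn * turns * (turns + 1) / 2 ))
}
```

With a 2000-token base prompt over 20 turns, session_prompt_tokens 2000 1500 20 (tool output going back in each turn) prints 355000, while session_prompt_tokens 2000 200 20 (short chat turns) prints 82000.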

Ollama Configuration

NVIDIA-hosted is great until you're on a plane, on a hotspot, or working on something you don't want leaving the machine.

Install

macOS:

brew install ollama
brew services start ollama

Linux:

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama

Default listen: 127.0.0.1:11434. Need it reachable on a LAN, set OLLAMA_HOST=0.0.0.0:11434 before starting. Heads up: there's no auth on the Ollama API. Don't expose it on a public network. Don't bind it to 0.0.0.0 on a coffee-shop wifi.

Pulling models

ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:14b      # if you have the VRAM
ollama pull deepseek-coder-v2:16b   # MoE, surprisingly fast for its size
ollama pull llama3.1:8b

pulling manifest
pulling abc123... 100% ▕████████████████▏ 4.7 GB
verifying sha256 digest
writing manifest
success

Verify it runs before wiring it into Hermes:

ollama run qwen2.5-coder:7b "write a python function that reverses a string"

If that hangs >30s on first invocation, the model is loading. Subsequent calls are fast. Hangs forever, you're probably on CPU fallback because the GPU couldn't initialize. Check ollama ps and nvidia-smi (or Activity Monitor on Mac).
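
The ollama ps check can be turned into a one-liner filter. A sketch only: it assumes the output has a header row, the model name in the first column, and a PROCESSOR value such as "100% CPU" or "100% GPU". The exact column layout may differ between Ollama versions. models_on_cpu is a hypothetical name.

```shell
# Read `ollama ps` output on stdin; print the names of loaded models whose
# row mentions CPU (fully or partially running on the CPU).
models_on_cpu() {
  awk 'NR > 1 && /CPU/ { print $1 }'
}
```

Usage: ollama ps | models_on_cpu. Empty output while a model is loaded means it is sitting on the GPU.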

Wiring local Ollama into Hermes

This is the gotcha that cost me an hour. Hermes' default for OLLAMA_BASE_URL is https://ollama.com/v1, which is Ollama Cloud, not your local daemon. Want local? Override it:

# ~/.hermes/.env
OLLAMA_API_KEY=ollama                       # any non-empty string; local Ollama ignores it
OLLAMA_BASE_URL=http://localhost:11434/v1   # local daemon, NOT Ollama Cloud

Doc-verified path uses the YAML provider: custom form, which bypasses provider-name routing and calls base_url directly:

# ~/.hermes/config.yaml
model:
  default: qwen2.5-coder:14b
  provider: custom
  base_url: "http://localhost:11434/v1"

Or from the CLI:

hermes config set model.provider custom
hermes config set model.base_url http://localhost:11434/v1
hermes config set model.default qwen2.5-coder:14b

For Ollama Cloud, leave OLLAMA_BASE_URL at default and set HERMES_INFERENCE_PROVIDER=ollama-cloud. The env-vars reference lists ollama-cloud explicitly. A bare ollama provider isn't documented there at the time of writing, so I stuck with provider: custom for local rather than guess.
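
If you do go the env-var route, a guard for the Cloud-by-default mistake is cheap. A minimal sketch, assuming plain KEY=value lines in .env and the https://ollama.com/v1 default described above; both function names are hypothetical.

```shell
# Print the OLLAMA_BASE_URL value from a dotenv file, or the Ollama Cloud
# default when the variable is absent.
ollama_base_url() {
  local url
  url=$(sed -n 's/^OLLAMA_BASE_URL=\([^[:space:]#]*\).*/\1/p' "$1" | tail -n 1)
  printf '%s\n' "${url:-https://ollama.com/v1}"
}

# Classify a base URL as local daemon, Ollama Cloud, or something else.
base_url_target() {
  case "$1" in
    http://localhost:*|http://127.0.0.1:*) echo local ;;
    https://ollama.com|https://ollama.com/*) echo cloud ;;
    *) echo other ;;
  esac
}
```

Usage: base_url_target "$(ollama_base_url ~/.hermes/.env)". Anything other than local means prompts are leaving the machine.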

Sanity check:

hermes doctor
hermes config

hermes doctor will tell you if the configured base URL is unreachable. Faster signal than waiting for the first chat turn to fail.

Operational notes

VRAM is the constraint. A 14b Q4 quant runs comfortably on a 16 GB GPU. A 32b does not. On my M2 Pro 16 GB Mac, 14b is the practical ceiling and I notice the memory pressure with a browser open.

Quantization matters more than I expected going in. q4_K_M is the sweet spot for coding tasks. q8_0 is noticeably better on nuanced refactors but the memory cost is real and you'll feel it.

CPU fallback is unusable for interactive work. A 7b on pure CPU can take 30+ seconds per response. Fine for batch, painful for an agent loop.

Ollama default context is 2048 tokens on some models. Trips people up constantly. Set num_ctx via the model's Modelfile or pass it through Hermes; verify with ollama show <model>. I lost an evening to this before realizing the model wasn't dumb, it was just blind past the first 2K tokens.
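
The Modelfile route, sketched. FROM and PARAMETER num_ctx are standard Modelfile directives; the 8192 value and the derived model name qwen2.5-coder-14b-8k are just examples I picked, and a larger context costs more memory, so size it to your machine.

```shell
# Write a Modelfile that derives from base model $1 with num_ctx set to $2.
write_ctx_modelfile() {
  local base="$1" ctx="$2" out="${3:-Modelfile}"
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base" "$ctx" > "$out"
}

# Example (commented out so pasting the block has no side effects):
# write_ctx_modelfile qwen2.5-coder:14b 8192 Modelfile
# ollama create qwen2.5-coder-14b-8k -f Modelfile
# ollama show qwen2.5-coder-14b-8k
```

After that, point model.default at the derived name instead of the stock tag.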

Recommended Model Setup

Qualitative, week of real use, no benchmarks.

Model	Provider	Coding Quality	Latency	VRAM	Cost	Best For
Llama 3.1 70B Instruct	NVIDIA	Strong	Medium	n/a (hosted)	Free-tier OK	Planning, long-context reasoning
Qwen2.5-Coder 32B	NVIDIA	Very strong	Medium	n/a (hosted)	Free-tier OK	Multi-file refactors, code review
DeepSeek-Coder (large)	NVIDIA	Strong	Medium	n/a (hosted)	Free-tier OK	Algorithmic / DSA-style tasks
Nemotron family	NVIDIA	Variable	Variable	n/a (hosted)	Free-tier OK	Worth A/B-testing on your domain
Qwen2.5-Coder 14B (q4_K_M)	Ollama	Solid	Fast	~10–12 GB	Local only	Daily driver, offline work
Qwen2.5-Coder 7B (q4_K_M)	Ollama	OK	Very fast	~5–6 GB	Local only	Quick edits, autocomplete-style use
DeepSeek-Coder-V2 16B (MoE)	Ollama	Strong	Fast	~10–12 GB	Local only	Surprisingly capable for its footprint
Llama 3.1 8B	Ollama	OK	Very fast	~5–6 GB	Local only	Lightweight planning / chat

Day-to-day I use Qwen2.5-Coder 32B on NVIDIA for serious work, Qwen2.5-Coder 14B locally for everything else, Llama 3.1 70B on NVIDIA when I need long-context planning. Tried the rest, rotated them out. A coworker on an M3 Max says the local 32B is usable for him; on my 16 GB Pro it isn't, so don't take VRAM numbers above as the floor for everyone.

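Switching between them is one line for hosted, three for local because of the base_url switch. After a local session, empty model.base_url again before going back to hosted; otherwise the leftover localhost URL keeps overriding the provider. The commands: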

# hosted
hermes config set model nvidia/qwen/qwen2.5-coder-32b-instruct

# local
hermes config set model.provider custom
hermes config set model.base_url http://localhost:11434/v1
hermes config set model.default qwen2.5-coder:14b

hermes model (the interactive picker) does the same thing in fewer keystrokes once you've used it twice.

Real Workflow Improvements

Concrete things that got better:

Repository analysis. Pointing at a 200-file Python repo and asking "where does the auth flow start" used to be a coin flip. With Hermes routing the analysis pass to a 32B-class hosted model and edits to a local 14B, I get useful answers in under a minute, with file paths I can actually open.

Multi-file refactors. Renaming a domain concept across a service used to require me to micromanage every file. Hermes' tool-call sequencing handles "edit A, re-read A, then edit B based on A's new state" without me nudging it each step. Not magic, it still gets confused on circular imports, but the baseline is better.

Long-context exploration works. Pasting a stack trace plus three relevant files into context and asking for a hypothesis is reliable on the 70B hosted model. Local 14B handles shorter cases.

Cross-session recall is the feature I miss most when I temporarily switch back to anything else. "How did I configure the NVIDIA timeout last week" returns the actual config snippet, not a guess. Different in kind.

Skills. I haven't gone deep here yet. The bundled openclaw-migration skill walked me through the import with dry-run previews and that alone saved a chunk of time. The autonomous skill creation after complex tasks is the part I want to evaluate over a longer horizon. Ask me in a month.

What didn't change:

Tool calls run my tests fine. Interpreting flaky test output is still on me.

Frontend work. All current models are mediocre at non-trivial CSS. Hermes doesn't fix that.

Truly novel architectural decisions. The agent produces something plausible, which is worse than producing nothing if you're not careful.

Failure Modes and Rough Edges

The section that made me want to write the post.

OLLAMA_BASE_URL defaults to Ollama Cloud, not local. Most common silent failure I've seen. Override to http://localhost:11434/v1 for local.
API key not picked up. Hermes reads ~/.hermes/.env at startup. If you edit it while a session is running, restart the session or run hermes config check. (I keep meaning to file an issue about a hermes reload command. Haven't.)
OpenRouter routing inconsistencies. The underlying provider OpenRouter selects can change between requests. Pin a provider preference if reproducibility matters.
Ollama context default of 2048 on some models. Your model isn't dumb. Set num_ctx, verify with ollama show <model>.
WSL filesystem. File watch events on /mnt/c/... are unreliable. Workspaces on the WSL native filesystem only.
model.base_url overrides model.provider silently per the docs. A stale base_url from an earlier experiment will quietly route everything to the wrong endpoint. I did this to myself twice.
Free-tier throttling. NVIDIA's free tier will throttle. Hermes retries on 429s. You'll see a session pause for 5–30s with no obvious indicator unless you're tailing ~/.hermes/logs/.
Reasoning-heavy variants. Some Nemotron-family reasoning models produce great output 90% of the time and absolute nonsense the other 10%. Worth keeping in your config, don't make them the default.
Token cost surprises. Long agent loops consume an order of magnitude more tokens than chat sessions because every tool call result goes back in. Watch the dashboard the first few days.
Migration imports more than you might want. Default preset brings API keys over. Use --preset user-data to skip.
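
For scripts of your own that call the hosted endpoint outside Hermes, the same retry-on-throttle idea looks roughly like this. It is a generic exponential backoff sketch, not Hermes' actual retry code, and retry_backoff is a name I made up.

```shell
# Run a command up to $1 times, doubling the wait (starting at $2 seconds)
# after each failure. Returns 0 on the first success, else the last status.
retry_backoff() {
  local attempts="$1" delay="$2" n=1 status=0
  shift 2
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    status=$?
    [ "$n" -lt "$attempts" ] && sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
  return "$status"
}
```

Usage: retry_backoff 4 5 curl -fsS ... makes up to four attempts, waiting 5, 10, then 20 seconds between them. With curl -f this retries on any HTTP error, not only 429s.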

The defaults aren't great. Not wrong, exactly. Just the combination of "Ollama base URL pointing at Cloud, plus 2048 context, plus free-tier quota" produces a setup that works for twenty minutes and then mysteriously degrades, and you spend an evening figuring out which knob.

Troubleshooting

Quick reference for things I've actually hit:

hermes: command not found. ~/.local/bin not on PATH. Add it.
PermissionError on config write. Set HERMES_HOME to a writable path.
401 Unauthorized from NVIDIA. Key not in ~/.hermes/.env, or rotated. Check with grep NVIDIA ~/.hermes/.env (careful, this prints the key).
connection refused to Ollama. Daemon not running. ollama serve, or brew services start ollama, or systemctl start ollama.
Hermes calls https://ollama.com/v1 instead of localhost. OLLAMA_BASE_URL not overridden.
Ollama model "doesn't follow instructions". Almost always the 2048-context default.
Tool calls hang forever. Provider timeout too short, or the model is in a tool-call loop. Inspect ~/.hermes/logs/.
Hermes "loses" the workspace. You're on WSL with the project on /mnt/c/....
Different answers from the same prompt. Provider-side cache, or a routing layer selecting a different upstream. Pin the provider, disable cache while debugging.
Migration wizard doesn't see OpenClaw. Wizard looks at ~/.openclaw. Symlink if elsewhere.
Sudden latency spike. Check the provider's status page. NVIDIA's hosted endpoints have been stable, mostly, but they're not magic and April 18 happened.

When in doubt, hermes doctor first. Catches more first-line problems than you'd expect.

Final Thoughts

Hermes is best for engineers who already have an opinion about how their tooling should work and want an agent that exposes its config rather than hiding it. If you want defaults that just work with no thought, OpenClaw and the more polished alternatives are friendlier on day one. And OpenClaw is genuinely the better tool if your primary surface is messaging channels rather than a terminal.

Where Hermes still needs work, in my opinion:

OLLAMA_BASE_URL defaulting to Cloud is a usability footgun. A clearer default or a louder warning on first call would help.
Configuration documentation lags the schema in places. I read source more than once.
Error messages on misconfig are sometimes cryptic. I'd take slower startup for better diagnostics.

Hybrid setups make sense right now because hosted inference is fast and capable but unreliable in ways out of your control (rate limits, quotas, occasional regressions on newly-deployed models). Local inference is reliable but capacity-constrained. Running both, routing deliberately, gives you a setup that degrades gracefully. Not a revolution. Just how the production-engineering side of any "use a service" problem has always worked. The fact that we're now doing it for inference is the new part.

I'll keep using this. If the NVIDIA endpoints change, or the Hermes config schema churns again, I'll update the post. Probably.

Appendix A — Suggested image directory layout

images/
├── hermes-doctor.png               # 'hermes doctor' diagnostic output
├── hermes-config.png               # 'hermes config' resolved settings
├── nvidia-dashboard.png            # build.nvidia.com API key management
├── nvidia-key-creation.png         # NVIDIA API key creation (optional)
├── ollama-running.png              # ollama ps / loaded model
├── ollama-pull.png                 # 'ollama pull' progress bar (optional)
├── workflow-multifile-refactor.gif # multi-file refactor session (optional)
├── workflow-repo-analysis.png      # repo analysis output (optional)
└── failure-mode-429.png            # 429 retry-with-backoff (optional)

Appendix B — `assets/commands.sh`

Snippets I keep around as quick-reference. Adjust paths and model identifiers for your setup.

#!/usr/bin/env bash
set -euo pipefail

# --- Hermes install (Linux/macOS/WSL2) ---
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Put the installer's symlink dir on PATH for the rest of this script.
# (Sourcing ~/.zshrc from bash under 'set -u' can abort the script.)
export PATH="$HOME/.local/bin:$PATH"

# --- First-run setup (also offers OpenClaw migration if ~/.openclaw exists) ---
hermes setup

# --- Optional: explicit OpenClaw migration ---
hermes claw migrate --dry-run
# hermes claw migrate --preset user-data
# hermes claw migrate --overwrite

# --- Provider env: append to ~/.hermes/.env (run once; re-running appends duplicates) ---
ENV_FILE="$HOME/.hermes/.env"
mkdir -p "$HOME/.hermes"
{
  echo "NVIDIA_API_KEY=${NVIDIA_API_KEY:-replace-me}"
  echo "NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1"
  echo "OLLAMA_API_KEY=ollama"                       # any non-empty value
  echo "OLLAMA_BASE_URL=http://localhost:11434/v1"   # local Ollama, NOT Ollama Cloud
  echo "HERMES_INFERENCE_PROVIDER=nvidia"
} >> "$ENV_FILE"
chmod 600 "$ENV_FILE"   # the file holds API keys; keep it readable by you only

# --- Ollama install + local models ---
# macOS: brew install ollama && brew services start ollama
# Linux: curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl enable --now ollama

ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:14b
ollama pull deepseek-coder-v2:16b
ollama pull llama3.1:8b

# --- Sanity checks ---
hermes --version
hermes doctor || true
hermes config
ollama list
ollama ps

# --- Smoke tests ---
ollama run qwen2.5-coder:7b "print('hello from local')" || true
curl -sS "${NVIDIA_BASE_URL:-https://integrate.api.nvidia.com/v1}/models" \
  -H "Authorization: Bearer ${NVIDIA_API_KEY:-replace-me}" | head -c 400 || true

# --- Switch models from CLI ---
# hermes config set model nvidia/meta/llama-3.1-70b-instruct
# hermes config set model nvidia/qwen/qwen2.5-coder-32b-instruct
#
# Local Ollama (provider=custom + base_url; bare 'ollama/...' isn't documented):
# hermes config set model.provider custom
# hermes config set model.base_url http://localhost:11434/v1
# hermes config set model.default qwen2.5-coder:14b

That's the setup.

Building a Complete Developer Terminal Setup for Claude Code — Part 6: Dotfiles and Wrap-up

Avinash Seethalam — Sun, 26 Apr 2026 11:55:31 +0000

A setup you can't reproduce is a setup you'll eventually lose. Hard drives fail, Macs get replaced, and without a dotfiles repo everything built in this series disappears with them. This final part covers packaging everything into a maintainable dotfiles repo and wrapping up the series.

The Dotfiles Repo

A dotfiles repo is a version-controlled collection of your configuration files. The goal is simple: clone the repo on a new machine, follow the checklist, and have a fully configured environment in under an hour.

mkdir ~/dotfiles
cd ~/dotfiles
git init

Copy all configuration files in:

# Claude Code
mkdir -p .claude/hooks
cp ~/.claude/statusline.sh .claude/statusline.sh
cp ~/.claude/hooks/notify-stop.sh .claude/hooks/notify-stop.sh
cp ~/.claude/hooks/notify-permission.sh .claude/hooks/notify-permission.sh
cp ~/.claude/settings.json .claude/settings.json

# Terminal
cp ~/.tmux.conf .tmux.conf
cp ~/.zshrc .zshrc

# Starship
mkdir -p .config
cp ~/.config/starship.toml .config/starship.toml

Commit and push:

git add .
git commit -m "Initial dotfiles — Claude Code, tmux, starship, zsh"
gh repo create dotfiles --private --source=. --push

File Structure

dotfiles/
├── README.md
├── .tmux.conf
├── .zshrc
├── .config/
│   └── starship.toml
└── .claude/
    ├── settings.json
    ├── statusline.sh
    └── hooks/
        ├── notify-stop.sh
        └── notify-permission.sh

Fresh Machine Checklist

The README in the repo contains the full step-by-step setup guide. The checklist at the end covers every action in sequence:

[ ] Install Homebrew
[ ] Install core dependencies: brew install git jq node fzf
[ ] Install iTerm2 and set JetBrains Mono Nerd Font
[ ] Set terminal type to xterm-256color
[ ] Import tokyo-night color theme
[ ] Install zsh-autosuggestions and zsh-syntax-highlighting
[ ] Install fzf and run key bindings setup
[ ] Install starship and apply tokyo-night preset
[ ] Install tmux and clone tpm
[ ] Copy .tmux.conf and install plugins with Ctrl+B I
[ ] Install Claude Code
[ ] Copy .claude/settings.json, statusline.sh, and hook scripts
[ ] chmod +x the statusline and hook scripts
[ ] Add plugin marketplaces in Claude Code
[ ] Install all 9 plugins and reload
[ ] Verify plugin counts: 9 plugins · 35 skills · 18 agents · 10 hooks · 2 plugin MCP servers · 1 plugin LSP server
[ ] Verify claude-mem worker at http://localhost:37777
[ ] Test sound notifications with afplay
[ ] Test statusline with mock JSON input
[ ] Create tmux sessions and save layout with Ctrl+B Ctrl+S
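
The copy and chmod steps from the checklist can be scripted. A minimal sketch, assuming the repo layout shown under File Structure; install_dotfiles is a hypothetical name, and it overwrites existing files without backing them up.

```shell
# Copy the tracked dotfiles from repo dir $1 into home dir $2 (default $HOME),
# then mark the Claude Code scripts executable.
install_dotfiles() {
  local repo="$1" home="${2:-$HOME}" f
  mkdir -p "$home/.claude/hooks" "$home/.config"
  for f in .tmux.conf .zshrc .config/starship.toml \
           .claude/settings.json .claude/statusline.sh \
           .claude/hooks/notify-stop.sh .claude/hooks/notify-permission.sh; do
    cp "$repo/$f" "$home/$f"
  done
  chmod +x "$home/.claude/statusline.sh" "$home/.claude/hooks/"*.sh
}
```

Usage: install_dotfiles ~/dotfiles. Passing a second argument points it at a scratch directory, which is a safe way to try it before letting it touch the real home.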

What I'd Do Differently

If I were starting this setup from scratch with the knowledge I have now, I'd install claude-mem and pyright-lsp on day one. They have the highest ongoing return of anything in the stack — persistent memory and real-time type checking compound in value over time in a way that one-time tools don't.

I'd also build the statusline earlier. Flying blind on token usage and rate limits for the first weeks of Pro plan use cost me more than the hour it took to write the script.

The one thing I'd skip entirely is trying to get visual notification banners working. osascript and terminal-notifier are both unreliable on macOS Sequoia. afplay is the right answer and I should have gone there first.

The Full Stack

Layer	Tool	Purpose
Terminal	iTerm2 + tokyo-night	True color, Nerd Font support
Multiplexer	tmux + resurrect	Persistent sessions, 3-pane layout
Prompt	Starship tokyo-night	Git, Python, time at a glance
Shell	zsh + autosuggestions + fzf	Faster command entry
AI IDE	Claude Code	Primary development tool
Statusline	Custom bash script	Real-time token/cost/rate limit visibility
Hooks	`afplay` sound notifications	Async task completion awareness
Plugins	9 curated plugins	Code review, memory, type checking, workflow
Config	Private dotfiles repo	Reproducible setup in under an hour

The Series

Part 1 — The Problem and Overview
Part 2 — Custom Statusline
Part 3 — Sound Notification Hooks
Part 4 — Plugin Stack
Part 5 — Terminal Environment
Part 6 — Dotfiles and Wrap-up (this article)

All scripts and configuration files are at https://github.com/ai-with-avinash/claude-code-best-setup.

If you build on this setup or find improvements, I'd genuinely like to know — leave a comment or open a PR on the dotfiles repo.

Building a Complete Developer Terminal Setup for Claude Code — Part 5: Terminal Environment

Avinash Seethalam — Sun, 26 Apr 2026 11:48:59 +0000

The terminal environment around Claude Code matters as much as Claude Code itself. You spend hours in this environment — the less friction it has, the more thinking you can direct at the actual work.

This part covers everything outside Claude Code: iTerm2, tmux, starship, fzf, and zsh plugins.

iTerm2

Replace macOS Terminal with iTerm2. The reasons that matter for this setup specifically are true 24-bit color rendering (ANSI color gradients in the statusline render correctly), better escape code support (the blinking compaction warning needs this), and mouse support in tmux.

brew install --cask iterm2

Two settings to configure immediately after installing:

Font — install JetBrains Mono Nerd Font for starship icons and powerline segments:

brew install --cask font-jetbrains-mono-nerd-font

Then in iTerm2: Settings → Profiles → Text → Font → set to JetBrainsMono Nerd Font, size 13.

Terminal type — in iTerm2: Settings → Profiles → Terminal → Report Terminal Type → set to xterm-256color.

Color theme — download and import the tokyo-night theme:

curl -L -o ~/Downloads/tokyo-night.itermcolors "https://raw.githubusercontent.com/folke/tokyonight.nvim/main/extras/iterm/tokyonight_night.itermcolors"

Then Settings → Profiles → Colors → Color Presets → Import → select the file → apply.

tmux

tmux is a terminal multiplexer. For Claude Code development the value is twofold: persistent sessions that survive terminal restarts, and multiple panes visible simultaneously without switching tabs.

brew install tmux

My standard 3-pane layout:

┌─────────────────────┬──────────────┐
│                     │  Logs/Watch  │
│    Claude Code      ├──────────────┤
│                     │     Git      │
└─────────────────────┴──────────────┘

Claude Code runs in the large left pane. Test output or log watching runs top right. Git and manual commands run bottom right. Everything visible at once — no tab switching mid-flow.
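The layout above can be built with three tmux commands. This is a minimal sketch rather than the exact script from my dotfiles: the function name claude_layout, the TMUX_BIN override, and the 35% width for the right column are illustrative assumptions, and the `-l 35%` size syntax needs tmux 3.1 or newer. Run it from inside an existing tmux session.

```bash
#!/usr/bin/env bash
# Sketch: build the 3-pane layout from inside a running tmux session.
# claude_layout and TMUX_BIN are illustrative names, not part of the original setup.

claude_layout() {
  local tmux_bin="${TMUX_BIN:-tmux}"
  # Split off a right-hand column taking 35% of the width (assumed ratio).
  "$tmux_bin" split-window -h -l 35%
  # Split that right-hand column into top (logs/watch) and bottom (git).
  "$tmux_bin" split-window -v
  # Move focus back to the large left pane, where Claude Code runs.
  "$tmux_bin" select-pane -L
}
```

Call claude_layout once in a fresh session, then start Claude Code in the left pane. Setting TMUX_BIN=echo prints the commands instead of running them, which is handy for checking the sequence first.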

Essential config — add to ~/.tmux.conf:

set -g mouse on              # trackpad scrolling
set -g history-limit 50000   # large scrollback buffer
set -sg escape-time 0        # no delay for escape key — important for Claude Code

Session persistence — install tmux-resurrect and tmux-continuum:

git clone https://github.com/tmux-plugins/tpm ~/.tmux/plugins/tpm

Add to ~/.tmux.conf:

set -g @plugin 'tmux-plugins/tpm'
set -g @plugin 'tmux-plugins/tmux-resurrect'
set -g @plugin 'tmux-plugins/tmux-continuum'
set -g @continuum-restore 'on'
set -g @resurrect-capture-pane-contents 'on'
run '~/.tmux/plugins/tpm/tpm'

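Press Ctrl+B then Shift+I (capital I) inside tmux to install the plugins. Save the layout with Ctrl+B then Ctrl+S; tmux-continuum also saves it automatically every 15 minutes by default. After this, restarting the tmux server (for example after a reboot) restores your layout: windows, pane positions, working directories, and pane contents. Running processes are not restored in general. tmux-resurrect only relaunches a small default set of programs such as vim, less, and tail unless you extend the list with the @resurrect-processes option.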

Starship Prompt

The default macOS prompt tells you almost nothing. Starship shows git branch, git status, Python version, and time directly in your prompt:

ocr-eval-framework on  main [x!?] via 🐍 v3.12.9  18:43

The git status indicators, using Starship's default symbols: ✘ (shown as x above) = deleted files, ! = unstaged modifications, ? = untracked files. Staged changes appear as +. At a glance you know your exact repo state without running git status.

brew install starship
starship preset tokyo-night -o ~/.config/starship.toml
echo 'eval "$(starship init zsh)"' >> ~/.zshrc
source ~/.zshrc

fzf

fzf replaces linear Ctrl+R command history search with an interactive fuzzy finder. Type any fragment of a previous command and it filters in real time across your entire history. For long evaluation commands with specific flags you ran three sessions ago, this is invaluable.

brew install fzf
$(brew --prefix)/opt/fzf/install  # say y to all three prompts
source ~/.zshrc

Three shortcuts now available:

Ctrl+R — fuzzy search command history
Ctrl+T — fuzzy search files, paste path to prompt
Alt+C — fuzzy search directories and cd into selected

zsh Plugins

Two additions that change daily terminal use immediately:

zsh-autosuggestions shows previous commands in grey as you type based on history. Right arrow to accept the suggestion. After a day of use your muscle memory adapts and you stop retyping long paths from scratch.

zsh-syntax-highlighting colors commands green (valid) or red (invalid) as you type. Catches typos before you press Enter.

brew install zsh-autosuggestions zsh-syntax-highlighting
echo 'source /opt/homebrew/share/zsh-autosuggestions/zsh-autosuggestions.zsh' >> ~/.zshrc
echo 'source /opt/homebrew/share/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh' >> ~/.zshrc
source ~/.zshrc

The Full Visual Stack

With everything configured, your terminal looks like this:

Dark navy tokyo-night background in iTerm2
JetBrains Mono Nerd Font rendering icons cleanly
tmux status bar at the bottom showing session name and time
Starship prompt showing directory, git branch, git status, Python version
Claude Code statusline above the prompt showing tokens, cost, rate limits
Grey autosuggestions completing commands as you type
Green/red syntax highlighting on every command

Every layer of information has a deliberate place. Nothing is decorative.

All configuration files are at https://github.com/ai-with-avinash/claude-code-best-setup.

← Back to Part 4 | Continue to Part 6 → Dotfiles and Wrap-up

Building a Complete Developer Terminal Setup for Claude Code — Part 4: Plugin Stack

Avinash Seethalam — Sun, 26 Apr 2026 11:46:21 +0000

By Avinash, GenAI Practice Lead | Part 1 | Part 2 | Part 3 | Part 4 of 6

Claude Code has a growing plugin ecosystem. The temptation is to install everything — more agents, more skills, more coverage. This is the wrong approach, especially on a Pro plan.

Every plugin injects instructions into your session context at startup. A plugin with 38 agents and 156 skills adds a meaningful token overhead to every single session, whether you use those skills or not. On Pro, where every input token counts against your usage limits, a bloated plugin stack is a recurring tax on every conversation.

I installed, evaluated, and removed several plugins before settling on 9 that earn their place. Here's the final stack and the reasoning behind each.

The 9 Plugins

caveman — strips filler from Claude's responses. No pleasantries, no hedging, just signal. The benchmarks show ~65% output token reduction on coding tasks. In practice the savings are real and the response style actually improves for tight coding loops — you get code and decisions, not explanations you didn't ask for. Activate with /caveman, deactivate with "normal mode" before switching to documentation or client-facing writing.

claude-mem — persistent memory across sessions. This is the most impactful plugin in the stack. Without it every Claude Code session starts completely blank — no knowledge of what you built yesterday, no context on architectural decisions made last week. With it, the session opens with a compressed summary of relevant past work injected automatically. For multi-week projects this eliminates the re-establishment overhead that quietly consumes 10-15 minutes of every session. It runs a background worker on port 37777 with a web viewer for your observation history.

code-review — 5 parallel Sonnet agents reviewing your code before pushes. Covers CLAUDE.md compliance, bug detection, historical context, PR history, and code comments simultaneously. Trigger with /code-review after any meaningful change.

pr-review-toolkit — deeper review covering tests, error handling, type design, code quality, and simplification. Use this before anything that goes into a production codebase or published whitepaper. Run with /pr-review-toolkit:review-pr all.

feature-dev — three-agent workflow for new features: explore codebase → architect solution → review quality. Reserve it for moments where you'd naturally step back and think about design before writing code. Overkill for bug fixes, well-suited for new model wrappers or evaluation dimensions.

commit-commands — auto-generates meaningful commit messages from staged changes. Replaces the cognitive overhead of writing commit messages at the end of a long session when you're tired and just want to push.

context7 — pulls current SDK documentation into context automatically when you're working with external libraries. When your code references a LiteLLM function or a Boto3 call, context7 fetches the current docs rather than relying on Claude's training data which may be stale on specific SDK versions.

taches-cc-resources — a collection of workflow commands worth knowing: /create-plans for structured project planning with PLAN.md, /debug-like-expert for systematic debugging with evidence gathering, and /ask-me-questions for requirement clarification before starting a large ambiguous task.

pyright-lsp — Python language server via Pyright. Gives Claude real-time type errors, import resolution, and go-to-definition across your codebase. Without it Claude reads files reactively when something breaks. With it, it sees problems as they exist in your code continuously.

What I Removed

everything-claude-code — 38 agents, 156 skills, 72 legacy command shims. The context footprint is enormous and the coverage largely duplicates what the targeted plugins above already handle. Removed.

Installing the Stack

Add the marketplaces first — these commands run inside a Claude Code session, not in your terminal:

/plugin marketplace add anthropics/claude-code
/plugin marketplace add anthropics/claude-plugins-official
/plugin marketplace add glittercowboy/taches-cc-resources
/plugin marketplace add thedotmack/claude-mem
/plugin marketplace add JuliusBrussee/caveman

Then install:

/plugin install code-review@claude-code-plugins
/plugin install pr-review-toolkit@claude-code-plugins
/plugin install feature-dev@claude-code-plugins
/plugin install commit-commands@claude-code-plugins
/plugin install context7@claude-plugins-official
/plugin install pyright-lsp@claude-plugins-official
/plugin install taches-cc-resources@taches-cc-resources
/plugin install claude-mem@thedotmack
/plugin install caveman@caveman

Reload with /reload-plugins. Expected output: 9 plugins · 35 skills · 18 agents · 10 hooks · 2 plugin MCP servers · 1 plugin LSP server.

Quick Reference

Plugin	Trigger	When to use
caveman	`/caveman`	All coding sessions
claude-mem	Auto	Always on
code-review	`/code-review`	Before any meaningful push
pr-review-toolkit	`/pr-review-toolkit:review-pr`	Before production or published code
feature-dev	`/feature-dev`	New features requiring design thinking
commit-commands	`/commit-commands:commit`	Every commit
context7	Auto	Working with external SDKs
taches-cc-resources	`/create-plans`, `/debug-like-expert`	Planning and complex debugging
pyright-lsp	Auto	Always on for Python projects

← Back to Part 3 | Continue to Part 5 → Terminal Environment

Building a Complete Developer Terminal Setup for Claude Code — Part 3: Sound Notification Hooks

Avinash Seethalam — Sun, 26 Apr 2026 11:40:26 +0000

By Avinash, GenAI Practice Lead | Part 1 | Part 2 | Part 3 of 6

The problem is simple: Claude Code finishes a task while you're reading documentation, reviewing a PR, or staring out the window. You have no idea it's done. You check back 5 minutes later to find it waiting. Multiply this across a full workday and the lost time adds up significantly.

The solution is a sound notification. Hear the sound, look at the screen. Simple.

Getting there was less simple than I expected.

What I Tried First

osascript is the standard macOS approach for sending notifications from bash:

osascript -e 'display notification "Task completed" with title "Claude Code ✅" sound name "Hero"'

On macOS Sequoia this runs silently and does nothing. No error, no notification. Apple tightened notification sandboxing in recent versions and osascript notifications now require Script Editor to be registered in System Settings → Notifications — and on many machines it never appears there at all.

terminal-notifier is the community-standard alternative:

brew install terminal-notifier
terminal-notifier -title "Claude Code ✅" -message "Task completed" -sound "Hero"

On Apple Silicon Macs running Sequoia, terminal-notifier 2.0.0 sends no notification and produces no error. After removing Gatekeeper quarantine flags with sudo xattr -dr com.apple.quarantine, trying the .app bundle path directly, and verifying the binary location, I still got nothing. In my testing, the package is effectively broken on this macOS version.

What Actually Works

afplay is a macOS command-line audio player. It ships with every Mac, requires zero setup, and plays system sounds reliably:

afplay /System/Library/Sounds/Hero.aiff

No notification banner. Just sound. And for the actual use case — knowing when Claude is done without watching the screen — sound alone is sufficient. You're already in the terminal when you care about the visual output.

The Three Hooks

Claude Code fires hooks on specific events. I set up two hook scripts covering three scenarios:

Task complete (Stop event) → Hero sound
The most important hook. Fires when Claude finishes responding. Deep tone, clearly distinct from system sounds.

Permission needed (permission_prompt) → Glass sound
Fires when Claude needs your approval before proceeding. Higher pitch, slightly urgent. You hear this and know you need to look at the screen and make a decision.

Awaiting input (idle_prompt) → Ping sound
Fires when Claude is waiting for your next message. Softer, lower priority.

The notify-permission.sh script reads the notification_type field from the JSON payload using jq to distinguish between permission_prompt and idle_prompt and play the appropriate sound.
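A minimal sketch of what such a script could look like follows. It is not the version from the repo: the function names sound_for and notify_from_stdin are illustrative, and the only payload field it relies on is notification_type, as described above. It assumes jq is installed.

```bash
#!/usr/bin/env bash
# Sketch of ~/.claude/hooks/notify-permission.sh (illustrative, not the repo version).
# Claude Code passes the hook a JSON payload on stdin.

SOUND_DIR="/System/Library/Sounds"

# Map a notification type to a system sound; print nothing for unknown types.
sound_for() {
  case "$1" in
    permission_prompt) printf '%s\n' "$SOUND_DIR/Glass.aiff" ;;
    idle_prompt)       printf '%s\n' "$SOUND_DIR/Ping.aiff" ;;
  esac
}

# Read the payload from stdin, pick the sound, and play it if afplay is available.
notify_from_stdin() {
  local type sound
  type=$(jq -r '.notification_type // empty')
  sound=$(sound_for "$type")
  if [ -n "$sound" ] && command -v afplay >/dev/null 2>&1; then
    afplay "$sound"
  fi
  return 0
}

# The real script would end with a call to the entry point:
#   notify_from_stdin
```

Keeping the mapping in its own function means you can check it without triggering a real notification, for example with sound_for idle_prompt. Unknown notification types fall through silently, so the hook never blocks Claude Code.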

Wiring Up the Hooks

In ~/.claude/settings.json:

"hooks": {
  "Stop": [
    {
      "matcher": "",
      "hooks": [{"type": "command", "command": "bash ~/.claude/hooks/notify-stop.sh"}]
    }
  ],
  "Notification": [
    {
      "matcher": "permission_prompt|idle_prompt",
      "hooks": [{"type": "command", "command": "bash ~/.claude/hooks/notify-permission.sh"}]
    }
  ]
}

Also add the hook scripts to the permissions allow list so Claude Code doesn't prompt for approval every time they run:

"permissions": {
  "allow": [
    "Bash(bash ~/.claude/hooks/notify-stop.sh)",
    "Bash(bash ~/.claude/hooks/notify-permission.sh)"
  ]
}

Hooks only activate for new sessions — restart Claude Code after updating settings.json.

Testing the Sounds

Before wiring up the hooks, verify all three sounds play on your machine:

afplay /System/Library/Sounds/Hero.aiff
afplay /System/Library/Sounds/Glass.aiff
afplay /System/Library/Sounds/Ping.aiff

If any of these don't play, check System Settings → Sound → Output volume. afplay respects system volume but is not affected by Do Not Disturb.

Both hook scripts are at https://github.com/ai-with-avinash/claude-code-best-setup.

← Back to Part 2 | Continue to Part 4 → Plugin Stack

Building a Complete Developer Terminal Setup for Claude Code — Part 2: Custom Statusline

Avinash Seethalam — Sun, 26 Apr 2026 10:31:31 +0000

By Avinash, GenAI Practice Lead | Part 1 | Part 2 of 6

⚠️ Note: This setup is macOS-specific. All tools, commands, and configurations in this series are tested on macOS (Apple Silicon). Linux and Windows users will need to adapt certain steps, particularly around afplay, brew, iTerm2, and system font installation.

Claude Code supports a statusLine configuration that pipes a live JSON object to a bash script on every update. Most developers ignore this. I spent time building it out properly and it's now the most-glanced piece of my development environment.

Here's what my statusline shows in a real session:

Claude Sonnet 4.6 [Pro] | ⎇ feature/docling-eval | in:42k out:8k | ctx 61% | $0.0284 | +142/-38 | 5h 31% ⏱ 3h12m | 7d 18% ⏱ 4d6h

Each segment is deliberate. Let me walk through them.

What Each Segment Shows

Model name — Claude Sonnet 4.6 [Pro]. Useful when switching between Sonnet and Opus mid-project. You always know what you're paying for.

Git branch — ⎇ feature/docling-eval. The script uses workspace.current_dir from the JSON payload for accuracy in worktree setups, not the shell's current directory.

Token usage — in:42k out:8k. Input and output tokens in compact k format. Abbreviated above 1000, raw below — so early in a session you see in:340 out:89 and it flips to in:1k naturally as it grows.

Context % — ctx 61% with color thresholds tuned for Pro plan. Green below 50%, yellow at 50%, orange at 60%, red at 75%. These are tighter than defaults because on Pro every token in a compacted context gets re-billed.

Cost — $0.0284 to 4 decimal places. Three decimal places rounds sub-cent sessions in a way that loses signal. At 4dp you can see the actual cost of a session clearly.

Lines changed — +142/-38. Only appears when there are actual file edits. Stays clean during pure Q&A work.

5-hour rate limit — 5h 31% ⏱ 3h12m. Usage percentage and countdown to reset. On Pro this is what tells you whether to push through a task or wrap up cleanly.

7-day rate limit — 7d 18% ⏱ 4d6h. The weekly ceiling is the one that bites during multi-day project sprints. Knowing you're at 18% on Wednesday is actionable in a way that daily usage alone isn't.

Compaction warning — at 75% context the display shows a blinking ⚠ COMPACT. This fires earlier than the default because on Pro, hitting auto-compaction mid-task reruns your entire context through the API.
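The token abbreviation and the context thresholds described above can be sketched as small bash helpers. The function names (fmt_tokens, ctx_level, colorize) and the specific ANSI codes are my illustrative choices here, not necessarily what the full script uses. Truncating to whole thousands is also an assumption.

```bash
#!/usr/bin/env bash
# Sketch of two statusline helpers (function names are illustrative).

# Compact token count: raw below 1000, whole thousands with a "k" suffix above.
fmt_tokens() {
  local n="${1:-0}"
  if [ "$n" -lt 1000 ]; then
    printf '%s\n' "$n"
  else
    printf '%sk\n' "$(( n / 1000 ))"
  fi
}

# Context level: green below 50, yellow from 50, orange from 60, red from 75.
ctx_level() {
  local pct="${1%.*}"   # drop any fractional part so integer tests work
  pct="${pct:-0}"
  if   [ "$pct" -ge 75 ]; then echo red
  elif [ "$pct" -ge 60 ]; then echo orange
  elif [ "$pct" -ge 50 ]; then echo yellow
  else                         echo green
  fi
}

# Wrap text in an ANSI color for the given level (256-color code 208 for orange).
colorize() {
  local code
  case "$1" in
    red)    code='31' ;;
    orange) code='38;5;208' ;;
    yellow) code='33' ;;
    *)      code='32' ;;
  esac
  printf '\033[%sm%s\033[0m' "$code" "$2"
}
```

A call such as colorize "$(ctx_level 61)" "ctx 61%" produces the orange segment from the example statusline. Separating the level decision from the color code keeps the thresholds easy to retune.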

The Implementation

The script is pure bash with jq as the only dependency. jq is a command-line JSON parser — install it with brew install jq.

Wire it up in ~/.claude/settings.json:

{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline.sh"
  }
}

The script reads from stdin (input=$(cat)), extracts fields with jq, applies color logic with ANSI escape codes, and outputs a single line. The JSON payload contains everything — model info, context window state, cost, rate limits, and workspace path.
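To make that flow concrete, here is a stripped-down skeleton covering only three segments. It is a sketch under assumptions, not the full script: render_statusline is an illustrative name, and the field names follow the mock payload in the testing section of this article.

```bash
#!/usr/bin/env bash
# Minimal statusline skeleton (illustrative; the full script shows more segments).
# Field names follow the mock payload used for testing in this article.

render_statusline() {
  local input model ctx cost
  input=$(cat)   # Claude Code pipes the JSON payload to stdin
  model=$(printf '%s' "$input" | jq -r '.model.display_name // "unknown"')
  ctx=$(printf '%s' "$input" | jq -r '.context_window.used_percentage // 0')
  cost=$(printf '%s' "$input" | jq -r '.cost.total_cost_usd // 0')
  # LC_ALL=C keeps the decimal point stable regardless of the user's locale.
  LC_ALL=C printf '%s | ctx %s%% | $%.4f\n' "$model" "$ctx" "$cost"
}

# A real ~/.claude/statusline.sh would end by calling: render_statusline
```

The `// "unknown"` and `// 0` fallbacks in the jq filters keep the line rendering even when a field is missing from the payload, which matters because a statusline that errors out simply disappears.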

Testing Without a Live Session

You don't need to start a Claude Code session to test the script. Pipe mock JSON directly:

echo '{"model":{"display_name":"Claude Sonnet 4.6"},"workspace":{"current_dir":"/your/project"},"context_window":{"total_input_tokens":12000,"total_output_tokens":3000,"used_percentage":45},"cost":{"total_cost_usd":0.0084,"total_lines_added":42,"total_lines_removed":7},"rate_limits":{"five_hour":{"used_percentage":38,"resets_at":'"$(( $(date +%s) + 7200 ))"'},"seven_day":{"used_percentage":22,"resets_at":'"$(( $(date +%s) + 345600 ))"'}}}' | ~/.claude/statusline.sh

The reset timestamps are computed from now + seconds so the countdown shows real numbers.
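The countdown itself is just the difference between resets_at and the current epoch time, formatted compactly. A minimal sketch, assuming the display rules implied by the examples (days and hours above one day, hours and minutes above one hour); fmt_countdown is an illustrative name and the optional second argument exists only to make the function testable with a fixed clock.

```bash
#!/usr/bin/env bash
# Sketch of a countdown formatter (fmt_countdown is an illustrative name).
# Usage: fmt_countdown <resets_at_epoch> [now_epoch]

fmt_countdown() {
  local resets_at="$1"
  local now="${2:-$(date +%s)}"
  local left=$(( resets_at - now ))
  if [ "$left" -le 0 ]; then
    printf '0m\n'
  elif [ "$left" -ge 86400 ]; then
    # A day or more: days and hours, e.g. 4d6h
    printf '%dd%dh\n' "$(( left / 86400 ))" "$(( (left % 86400) / 3600 ))"
  elif [ "$left" -ge 3600 ]; then
    # An hour or more: hours and minutes, e.g. 3h12m
    printf '%dh%dm\n' "$(( left / 3600 ))" "$(( (left % 3600) / 60 ))"
  else
    printf '%dm\n' "$(( left / 60 ))"
  fi
}
```

With the mock payload above, the five-hour reset is 7200 seconds away, so this formatter would show 2h0m for that segment.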

One Tradeoff Worth Knowing

The used_percentage field is calculated from input tokens only — it does not include output tokens. So the context % reflects input-side pressure, which is the more meaningful signal for context window exhaustion, but it may differ slightly from what /context reports. The script displays raw input and output token counts separately precisely for this reason.

The full script is at https://github.com/ai-with-avinash/claude-code-best-setup.

← Back to Part 1 | Continue to Part 3 → Sound Notification Hooks

Building a Complete Developer Terminal Setup for Claude Code — Part 1: The Problem

Avinash Seethalam — Sun, 26 Apr 2026 10:12:40 +0000

By Avinash, GenAI Practice Lead

⚠️ Note: This setup is macOS-specific. All tools, commands, and configurations in this series are tested on macOS (Apple Silicon). Linux and Windows users will need to adapt certain steps, particularly around afplay, brew, iTerm2, and system font installation.

I lead a GenAI practice and manage a broad portfolio of active AI projects. A significant part of my day is spent inside Claude Code — building model evaluation frameworks, debugging pipelines, writing architecture documents, and reviewing code with my team. Claude Code is genuinely powerful, but after weeks of daily use I kept running into the same friction points.

Sessions started completely blind every time. No memory of what we built yesterday, no context on decisions made last week. I'd spend the first 10 minutes of every session re-establishing context that Claude had already processed the day before.

There was no visibility into token usage or cost while working. I'd hit a rate limit mid-task with no warning, losing momentum at the worst possible moment. On a Pro plan where every token has a cost, flying blind is expensive.

Claude finishing a task while I was context-switched elsewhere meant I'd come back 5 minutes later to find it waiting. No notification, no signal — just a blinking cursor.

And the terminal itself was a plain, low-information environment. No git context, no Python version, no time — just a prompt.

These aren't complaints about Claude Code. They're gaps in the surrounding environment that any developer can close with the right setup. So I spent a day closing them.

What I Built

Over a focused session I assembled a complete terminal environment optimised specifically for Claude Code development on macOS. Here's the full stack:

Claude Code layer:

Custom bash statusline showing model, git branch, token usage, cost, context %, and rate limit countdowns
Sound notification hooks using macOS afplay for task completion, permission requests, and idle state
9 curated plugins covering code review, persistent memory, Python type checking, and workflow automation

Terminal layer:

iTerm2 with tokyo-night color theme and JetBrains Mono Nerd Font
tmux with a 3-pane layout and session persistence across restarts
Starship prompt with tokyo-night preset showing git status and Python version
fzf for fuzzy command history search
zsh-autosuggestions and zsh-syntax-highlighting

Everything is committed to a dotfiles repo at https://github.com/ai-with-avinash/claude-code-best-setup with a fresh machine checklist so the entire setup can be reproduced on a new Mac in under an hour.

The Series

This is Part 1 of 6. Each subsequent article covers one layer of the setup in detail:

Part 2 — Custom Statusline: real-time token, cost, and rate limit visibility
Part 3 — Sound Notification Hooks: knowing when Claude is done without watching the screen
Part 4 — Plugin Stack: 9 plugins that earn their place and what I removed
Part 5 — Terminal Environment: iTerm2, tmux, starship, fzf, and zsh
Part 6 — Dotfiles Repo: packaging everything for reproducibility

Each part is self-contained — you can read them in any order depending on what's most relevant to your setup. But if you're starting fresh, the sequence matters and Part 2 is where I'd begin.

Continue to Part 2 → Custom Statusline
