DEV Community: Elliott

Protect Yourself, Mesh Yourself

Elliott — Fri, 03 Jul 2026 13:45:00 +0000

In my last post, my SSH keys moved off disk and into 1Password. This one is about the network those keys travel over, and the tool underneath it: Tailscale.

Here's the before, and it wasn't pretty. A homelab box behind my router. A VPS somewhere I'm still paying for. My laptop on some coffee shop WiFi, my phone on cellular, and a couple of machines stuck behind CGNAT with no real public IP at all. Getting any of them to talk to each other meant port forwarding, firewall rules, a dynamic DNS record I'd always forget to update, and a spiral-bound notebook listing every IP address in my house. Reaching my homelab from outside the house was a small side quest every single time.

Tailscale deleted all of it. Every device I own now sits on one flat private network, addressable by name and reachable from anywhere, from my closet to the far side of the planet.

What it actually is

Tailscale is a mesh VPN built on WireGuard. You install the client on each device, sign in with an identity provider (Google, GitHub, whatever you already use), and the device joins your private network, your tailnet. It picks up a stable 100.x address and a name through MagicDNS, so instead of memorizing IP addresses I type the hostname like a civilized person.

The part that still feels like magic is that it punches through NAT on its own. No port forwarding, no holes poked in a firewall. Tailscale's coordination server brokers the key exchange, then your devices connect directly where they can and fall back to an encrypted relay when they can't. Traffic is end-to-end encrypted the whole way. For personal use, it's free, and not in a stingy way.
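Those 100.x addresses come from the 100.64.0.0/10 block, which is the shared range carrier-grade NAT uses. Tailscale's IPv6 addresses come from its fd7a:115c:a1e0::/48 prefix. Below is a small Python sketch that checks whether an address falls in those ranges. The helper name is my own and isn't a Tailscale API. Treat a match as a hint rather than proof, because an ISP running CGNAT can hand out addresses from the same IPv4 block.

```python
import ipaddress

# Tailscale assigns IPv4 tailnet addresses from the shared CGNAT block
# and IPv6 addresses from its own ULA prefix.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")
TAILNET_V6 = ipaddress.ip_network("fd7a:115c:a1e0::/48")


def looks_like_tailnet_address(addr: str) -> bool:
    """Return True if addr sits inside Tailscale's address ranges.

    Hypothetical helper: ISPs doing CGNAT use the same IPv4 block,
    so a True result is a hint, not proof.
    """
    ip = ipaddress.ip_address(addr)
    network = TAILNET_V4 if ip.version == 4 else TAILNET_V6
    return ip in network


for addr in ["100.101.102.103", "192.168.1.20", "100.128.0.1", "fd7a:115c:a1e0::1"]:
    print(f"{addr:>20} -> {looks_like_tailnet_address(addr)}")
```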

SSH without managing keys

The first post was all about hoarding SSH keys in 1Password. On my own Tailscale, I mostly don't need them.

Flip on Tailscale SSH for a machine:

tailscale up --ssh

Now ssh homelab works, and Tailscale handles the auth using the same identity that got the box onto the network in the first place. No public key to copy into authorized_keys, nothing to rotate when I add a machine, no key file to lose. Who's allowed to SSH where is a few lines in the Tailscale access policy instead of a graveyard of ~/.ssh/config entries. For anything I want to treat as sensitive, check mode forces a re-auth before the session opens.

So the two tools split the job cleanly. Inside the tailnet, Tailscale is my SSH auth. Anywhere outside it, the 1Password agent still holds the keys. No overlap, no turf war.
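To show what "a few lines" of policy looks like, here's a sketch of the ssh section of a tailnet policy. It's built as a Python dict and printed as JSON, which works because the policy file is HuJSON and HuJSON accepts plain JSON. The action, src, dst, and users fields and the autogroup:* names come from Tailscale's policy syntax. The tag:homelab tag is hypothetical. I've also left out the network ACLs that a real policy carries alongside this section.

```python
import json

# Hypothetical tag for the homelab box; tags must be declared under tagOwners.
HOMELAB_TAG = "tag:homelab"


def ssh_rule(action, src, dst, users):
    """Build one entry for the policy's "ssh" section."""
    if action not in ("accept", "check"):
        raise ValueError(f"unsupported SSH action: {action}")
    return {"action": action, "src": src, "dst": dst, "users": users}


policy = {
    "tagOwners": {HOMELAB_TAG: ["autogroup:admin"]},
    "ssh": [
        # SSH into my own untagged devices as a non-root user, no re-auth.
        ssh_rule("accept", ["autogroup:member"], ["autogroup:self"], ["autogroup:nonroot"]),
        # The homelab counts as sensitive: "check" forces a fresh login first.
        ssh_rule("check", ["autogroup:admin"], [HOMELAB_TAG], ["root", "autogroup:nonroot"]),
    ],
}

print(json.dumps(policy, indent=2))
```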

Serving apps to an audience of one

A big chunk of my homelab is little self-hosted web apps and dashboards I want to reach from my phone or laptop wherever I happen to be, with precisely nobody else able to load them.

tailscale serve does that in one line:

tailscale serve 3000

That grabs whatever's running on localhost:3000 and publishes it at a stable HTTPS URL on my tailnet, something like notes.my-tailnet.ts.net, real TLS cert and all. Only my devices can open it. Nothing faces the public internet, so there's no login page to bolt on and no attack surface pointed at the world. Add --bg and it survives a reboot.

The only concern is that turning on HTTPS certificates publishes your machine names, along with your tailnet's name, to the public Certificate Transparency logs. I just kept the random tailnet name Tailscale gave me.

If I ever want something out on the open internet, tailscale funnel is the same trick pointed outward. I reach for it about once a year, but it's there.

Private egress with Mullvad exit nodes

Tailscale also has a Mullvad add-on, and it's the tidiest way I've found to push my traffic through a real privacy VPN. Five bucks a month gets you Mullvad's whole server fleet as exit nodes you pick right from the Tailscale client. Select one and my outbound traffic leaves through Mullvad instead of my home connection. No second Mullvad app running in the tray, and it lands on the same Tailscale bill.

HOWEVER: This is not the anonymity you'd get walking cash up to Mullvad with a random account number. Tailscale is identity-aware by design (that's part of the whole trick that lets your devices recognize each other), so it knows exactly who you are, even though Mullvad doesn't. The traffic to Mullvad is end-to-end encrypted, but if you require real anonymity, buy Mullvad the anonymous way. For my everyday "don't hand my browsing to my ISP or the coffee shop," it's great.

What this lets me build

The real reason I lean on Tailscale is what it lets me stack on top of it. I run a little gateway called Aperture that all my apps and agents point at instead of hitting model providers directly. It listens only on the tailnet, pulls its keys from 1Password, and sends its traffic out through Mullvad. One private endpoint, one place to rotate keys and watch spend.

It also routes every agent call through one shared Mem0 memory layer, so my agents on my phone, laptop, desktop, and Hermes instance all work from the same context.

It also lets me run isolated sandbox environments without opening port 22! (Not that huge of a brag, but I still wanted to share.)

That's the whole next post, so I'll leave it there. The point for now: none of it works without a private network that every device can reach, which is the boring job Tailscale does without me ever having to think about it.

The catch

Fair warning: Tailscale's coordination server is a hosted service you don't run. Your traffic is end-to-end encrypted and never touches it, but the thing that brokers connections and holds your network policy is Tailscale's, not yours. Headscale is an open-source implementation of that control server that you can self-host, and the normal clients talk to it happily. Migrating to Headscale is on my to-do list, but I keep adding more projects ahead of it.

Keep yourself secure

Install it on two machines and run tailscale up --ssh on one. Then ssh into it by name from the other, from any network, no keys and no port forwarding. That's the moment it clicks, and everything else here is built on that one trick.

Set it up once, and you stop thinking about your network, which is about the highest praise I can give a piece of infrastructure.

The most useful tool in my dev setup is a password manager

Elliott — Thu, 02 Jul 2026 22:26:58 +0000

I still remember sending API keys to my friend across Discord during hackathons. For years, everything sensitive I owned lived in a file. SSH keys in ~/.ssh, API tokens in .env, and the odd password in Notepad. That works right up until you set up a new laptop, or open a project and can't tell which .env file holds the live key.

1Password is what replaced all of that, and it's the tool from this series I'd least want to give up. The passwords are almost beside the point. What changed how I work is that a developer-first password manager fixes problems in your workflow that you never noticed you had.

This is the first post in a short series on the tools I lean on, and one idea runs through all of them: stay fluid across a pile of machines and operating systems, and keep it all under my control.

What "developer-first" actually means

A normal password manager stores logins, autofills them in your browser, and ends there. A developer-first one handles everything beyond that: SSH keys, API tokens, database URLs, and .env contents. It gives you a CLI to inject those into your shell at runtime, and it can act as an SSH agent so your keys never sit on disk at all.

Here's how I actually use it.

SSH keys live in the vault, not on disk

You generate or import an SSH key as a vault item. 1Password runs an SSH agent, and you point your SSH client at its socket. On my Arch laptop, that's one block in ~/.ssh/config:

Host *
    IdentityAgent ~/.1password/agent.sock

After that, ssh homelab or git push just works whenever I unlock the vault. The private key is never a file in ~/.ssh. Set up a new machine, and there's no key to copy over: sign in, and the agent already has every key you own.

Unlocking is as simple as Face ID or a master password. Either way, nothing on the machine holds the keys at rest.
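If you keep dotfiles in sync across machines, a small script can add the Host * block above only when it's missing. Here's a Python sketch of that. The ensure_identity_agent helper is hypothetical, and it assumes the Linux socket path shown above. It appends the block to the end of the file on purpose: ssh uses the first value it finds for each option, so any host-specific IdentityAgent lines earlier in the file still win.

```python
from pathlib import Path

SOCKET = "~/.1password/agent.sock"  # 1Password's agent socket on Linux
AGENT_BLOCK = f"Host *\n    IdentityAgent {SOCKET}\n"


def ensure_identity_agent(config_path: str = "~/.ssh/config") -> bool:
    """Append the 1Password agent block unless the socket is already referenced.

    Returns True if the file was changed. The block goes last because ssh
    keeps the first value it sees for each option.
    """
    path = Path(config_path).expanduser()
    text = path.read_text() if path.exists() else ""
    if SOCKET in text:
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    if text:
        text += "\n"  # blank line between existing entries and the new block
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    path.write_text(text + AGENT_BLOCK)
    return True
```

Running it a second time is a no-op, so it's safe to call from a bootstrap script on every new machine.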

Secrets get injected at runtime with `op run`

The problem this solves is the plaintext .env file. Instead of real values, my env files hold references that point into the vault:

# .env
DATABASE_URL=op://Homelab/postgres/connection_string
STRIPE_KEY=op://Dev/stripe/test_key
DISCORD_TOKEN=op://Dev/mybot/token

Then I launch the app through op run:

op run --env-file=.env -- go run ./cmd/server

op resolves each op://vault/item/field reference at launch, sets the real values as environment variables for that subprocess only, and they vanish when the process exits. The references are just pointers, so the file is safe to commit, although between you and me, I might still gitignore it. Whether that's out of habit or fear, I don't know.
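Here's a rough Python model of what op run does with that file, to make runtime injection concrete. It parses the KEY=VALUE lines, swaps each op:// reference for a real value, and hands the result to a single child process's environment. The lookup function is a made-up stand-in for the vault; the real op CLI does the resolving itself, along with a lot more. The reference pattern follows the op://vault/item/[section/]field shape.

```python
import os
import re
import subprocess
import sys

# op://vault/item/field, or op://vault/item/section/field
OP_REF = re.compile(r"^op://([^/]+)/([^/]+)/(?:([^/]+)/)?([^/]+)$")


def parse_env_file(text: str) -> dict:
    """Read KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def resolve(env: dict, lookup) -> dict:
    """Replace each op:// reference with lookup(vault, item, section, field)."""
    resolved = {}
    for key, value in env.items():
        match = OP_REF.match(value)
        resolved[key] = lookup(*match.groups()) if match else value
    return resolved


def run_with_secrets(cmd: list, env_text: str, lookup) -> subprocess.CompletedProcess:
    """Run cmd with resolved values in the child's environment only."""
    child_env = {**os.environ, **resolve(parse_env_file(env_text), lookup)}
    return subprocess.run(cmd, env=child_env, capture_output=True, text=True, check=True)


# Stand-in for the vault; the real op CLI asks 1Password instead.
FAKE_VAULT = {("Dev", "stripe", None, "test_key"): "sk_test_123"}


def fake_lookup(vault, item, section, field):
    return FAKE_VAULT[(vault, item, section, field)]


result = run_with_secrets(
    [sys.executable, "-c", "import os; print(os.environ['STRIPE_KEY'])"],
    "# .env\nSTRIPE_KEY=op://Dev/stripe/test_key\n",
    fake_lookup,
)
print(result.stdout.strip())  # sk_test_123
```

The secret only ever exists in child_env, so the parent's os.environ never sees it. That's the property that makes this safer than exporting the values into your shell.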

Signed commits without babysitting a GPG key

Since the SSH key already lives in the vault, you can point Git at it for SSH commit signing:

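git config --global gpg.format ssh
git config --global user.signingkey "ssh-ed25519 AAAA...you@host"
git config --global gpg.ssh.program /opt/1Password/op-ssh-sign
git config --global commit.gpgsign true

Every signed commit prompts for verification, and there's no GPG key to manage or lose. The gpg.ssh.program line hands signing to 1Password's op-ssh-sign helper (that's its Linux path). It's needed because Git's default ssh-keygen signer finds an agent through SSH_AUTH_SOCK, not through the IdentityAgent line in ~/.ssh/config. If you've never turned on commit signing because it's a hassle, this turns it into four shell commands.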

Keys off disk is a real security win

There's a security payoff on top of the convenience. Several of the npm supply chain compromises over the past year worked the same way: a malicious package runs a post-install script that greps your home directory for credentials in clear text (~/.ssh, ~/.aws, ~/.config) and exfiltrates whatever it finds.

If your private keys and cloud creds live in an encrypted vault behind a biometric unlock, that script finds nothing. It doesn't make you immune, of course: a running unlocked agent can still be asked to sign things, and malware inside your processes is still bad news. But it deletes the easiest vector of attack.
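You can run the same grep on yourself before a malicious postinstall script does. This Python sketch walks ~/.ssh, ~/.aws, and ~/.config and flags any file that contains one of a few common plaintext credential markers. The marker list is my own short and incomplete guess at what such a script would look for. A clean result means "nothing obvious", not "nothing there".

```python
from pathlib import Path

# The directories a credential-stealing script typically greps.
CANDIDATE_DIRS = (".ssh", ".aws", ".config")

# A few plaintext markers for common credential formats (not exhaustive).
MARKERS = (
    "-----BEGIN OPENSSH PRIVATE KEY-----",
    "-----BEGIN RSA PRIVATE KEY-----",
    "aws_secret_access_key",
)


def find_plaintext_credentials(home, max_bytes: int = 64_000):
    """Return sorted paths under home whose first max_bytes contain a marker."""
    hits = []
    for name in CANDIDATE_DIRS:
        root = Path(home) / name
        if not root.is_dir():
            continue
        for path in root.rglob("*"):
            if not path.is_file():
                continue
            try:
                with path.open("r", errors="ignore") as fh:
                    head = fh.read(max_bytes)
            except OSError:
                continue  # unreadable file: skip rather than crash
            if any(marker in head for marker in MARKERS):
                hits.append(path)
    return sorted(hits)
```

Run it as find_plaintext_credentials(Path.home()). Once your keys live in the 1Password agent, ~/.ssh should only hold config and public keys, and neither of those matches.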

There's a newer version of the same problem, and the same fix covers it. AI coding agents cat your .env constantly, usually just to understand the project, and every one of those reads is a chance for a live key to end up in a prompt, a log, or a model provider's servers. Because my .env holds op:// references instead of real values, the worst an agent finds there is a pointer. It's useless without the vault, so the file can leak, and the keys stay put. The secret it went looking for was never in the file.
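One way to keep that guarantee honest is a pre-commit style check that fails when a .env line holds anything other than an op:// reference. Here's a minimal Python sketch, assuming simple KEY=VALUE lines. The allow set for harmless values like PORT is my own addition and isn't a 1Password feature.

```python
import re

OP_REF = re.compile(r"^op://[^/]+/[^/]+/(?:[^/]+/)?[^/]+$")


def plaintext_values(env_text: str, allow=frozenset()):
    """Return (line_number, key) for values that aren't op:// references."""
    offenders = []
    for lineno, raw in enumerate(env_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("\"'")
        if sep and value and key not in allow and not OP_REF.match(value):
            offenders.append((lineno, key))
    return offenders


def main(paths) -> int:
    """Report offenders in each file; return 1 if any were found, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, key in plaintext_values(fh.read(), allow={"PORT", "LOG_LEVEL"}):
                print(f"{path}:{lineno}: {key} is not an op:// reference")
                status = 1
    return status
```

To use it as a hook, call main with the staged .env paths and exit with its return value.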

Should you use 1Password? What about Bitwarden?

Now, I'm not trying to float 1Password as the holy grail. I use it every day, and honestly, Bitwarden might be the better call for you. For a lot of people, it is.

Both now cover the core developer workflow. That's newer than you might expect. Bitwarden only shipped a built-in SSH agent in early 2025. Before that, storing SSH keys in Bitwarden meant a community helper script. Now it's a first-class feature, it works well, and it even runs on self-hosted Vaultwarden with a feature flag. If you'd written Bitwarden off for missing things like this, that's outdated.

What keeps me on 1Password

op run and op:// references are part of the base product. Bitwarden's runtime secret-injection equivalent lives in Bitwarden Secrets Manager, a separate product with its own pricing. Pulling a project's secrets into a local dev run is one less thing to think about, and I'm not paying for or running a second product to do it.
The developer tooling is older and more finished: official SDKs for Go, Python, and JavaScript, service accounts for CI, and commit signing that works on the first try.
Biometric unlock everywhere, and it's quick to set up across Windows, Linux, iOS, and supposedly macOS. (Anyone want to donate their M5 Pro?)

Where Bitwarden is the better pick

It's open source, and you can self-host the whole stack (Vaultwarden is a lightweight variant that speaks the same protocol). If "I want to own the box my secrets sit on" is a hard requirement, this is basically the answer you've got.
The free tier is very usable, and premium runs about $10/year. 1Password has no free tier and is subscription-only, though students can get a free year through the GitHub Student Developer Pack. For a non-student hobbyist, the price difference is real.
Its SSH agent is free. 1Password's sits behind the paywall.

Why I didn't self-host it

I run a homelab and will yell into the void about avoiding vendor lock-in, so the purist move here is obvious: self-host. Bitwarden's server is open source, or you can run Vaultwarden and own the whole thing. I looked at that and passed, for two reasons that both come back to the core of this post.

The first is uptime. The vault is the one service I can't afford to have go down, because it's the thing I need to log into everything else. My homelab is solid right up until a bad update or a power outage, and then I can't get into anything. Having the vault be someone else's uptime problem is worth every penny I spend.

The second is attack surface. Moving it to a VPS just swaps one box for another: now I've got a public-facing server to patch and lock down, which is the opposite of what a post about getting secrets off disk should recommend. Every service you self-host is a service you have to defend.

The cost argument that usually points people at Bitwarden doesn't apply to me right now either, because 1Password is free for a year through the GitHub Student Pack. So the math was simple: free and hosted by people whose whole job is keeping it up, versus a box I'd have to babysit.

The lock-in worry is still there, just bounded. The vault exports, and the developer-relevant bits (op:// references) are plain text. If 1Password stops being worth it, or my free year runs out and I want to swap, moving off is a couple of grep commands. If your risk tolerance runs differently, Bitwarden is a completely reasonable place to land. The brand matters less than the habit: get keys and secrets out of files.

If you do one thing

If you do one thing, make it this: move your keys into either manager and turn on the agent. It's the change you notice immediately, the first time you sign into a fresh machine and everything's already there. The op run workflow is the bigger win once you're set up, but the keys are what sell it. Give it a week and see if you'd go back.

Next up is the network all of this runs over. Getting my SSH keys onto a new machine is only half the trick. Tailscale is the other half.

Your AI slop bores me

Elliott — Thu, 04 Jun 2026 17:51:28 +0000

Using AI is fine; I use it daily. Posting the raw output without reading it first is the tell.

The em dashes in every other sentence. The emoji bullets in a README. A stock-feeling header image that gestures at your topic without saying anything specific about it. That opener about today's fast-paced landscape. Wikipedia maintains a public list of these patterns ("Signs of AI writing"), and a solid chunk of my feed is speedrunning it.

Here's what those patterns tell me. You saw a generated draft and posted it. You read 'leverage synergies' and hit publish. The writing isn't yours, and it looks like you never noticed.

If that's how you treat a post, I have to assume it's how you treat a pull request.

You skipped the whole job. The model hands you a draft. Your value is what you add on top: catching the bug it was confident about, throwing out the approach that looks clean but is quietly wrong.

Relay the output as is, and you've made yourself a hackathon GPT wrapper.

When Claude hands me a function, I rename its variables and hunt the edge case it missed. Then I cut half its comments.

You're the editor between the draft and the publish button.
Put the effort back in.

Anyway, enjoy the sunrise I drew in Microsoft Paint (no AI).

Drop the worst AI tell you've seen in the comments; I'm collecting them for a Claude skill.

Skills are Prompts. Here's how Hermes Apprentice turns them into weights

Elliott — Fri, 29 May 2026 21:29:54 +0000

This is a submission for the Hermes Agent Challenge

What I Built

It's 2 AM and Telegram lights up:

[gc-7f3a] Graduation candidate: "SKU extraction" — 14 examples, agreement 91%.
Reply train gc-7f3a to start training, skip gc-7f3a to dismiss.

You're half-asleep. You reply train gc-7f3a and put the phone down. Forty
minutes later you check Grafana, and the orange line (upstream tokens per
hour for this Hermes agent) has bent downward. A green line marked
specialist routed requests has stepped into its place. The next ten
thousand SKU-extraction requests cost nothing.

Skills are prompts. Apprentice turns some of them into weights.

Hermes Agent already ships an answer for the first ten patterns you want
it to handle: a Markdown skill file with YAML frontmatter, dropped into
~/.hermes/skills/<name>/SKILL.md. The agent's LLM-judged selector picks
the right skill per request, and you're done. This works for the first
ten patterns. It strains at twenty. By thirty you spend more time editing
SKILL.md files than writing features, and the model is still paying full
upstream cost on tasks it has now seen a thousand times.

The obvious answer is to fine-tune. The unobvious cost is the
infrastructure: pair extraction from session history, PII redaction, a
baseline runner, a promotion gate, a versioned registry, a router that
decides per request whether to use the specialist or fall back to the
big model, a canary that rolls the new specialist out safely, and some
way to find out when any of this breaks. Most teams won't build it.

Apprentice is that infrastructure, packaged as a tool you install
alongside Hermes. It observes Hermes' SQLite session database, clusters
recurring patterns with a small embedding model, fires a Telegram
graduation message when a pattern matures, kicks off an Unsloth QLoRA
training run on your local GPU (or on RunPod), and runs the trained
model through a held-out validation gate. If the specialist beats a
baseline, the proxy starts serving it. Future matching requests get
routed to that local specialist. Misses fall through to OpenRouter.
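The routing decision is what makes this cheap, so here's a sketch of it. The real proxy is written in Go; this is a Python model built on a few assumptions of mine. Embeddings and centroids are assumed to be L2-normalized already, as the post says the BGE-small embeddings are, which makes cosine similarity a plain dot product. The 0.85 threshold and the pattern names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left as-is if it's all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def route(embedding, centroids, threshold=0.85):
    """Return the best-matching pattern id, or None to fall through upstream.

    Inputs are unit vectors, so the dot product is the cosine similarity.
    """
    best_id, best_score = None, -1.0
    for pattern_id, centroid in centroids.items():
        score = sum(a * b for a, b in zip(embedding, centroid))
        if score > best_score:
            best_id, best_score = pattern_id, score
    return best_id if best_score >= threshold else None


centroids = {
    "sku-extraction": l2_normalize([1.0, 0.2, 0.0]),
    "invoice-summary": l2_normalize([0.0, 1.0, 0.3]),
}
print(route(l2_normalize([0.9, 0.25, 0.05]), centroids))  # sku-extraction
print(route(l2_normalize([0.1, 0.1, 1.0]), centroids))    # None (goes to OpenRouter)
```

A None result is the "miss" path: the request goes to the upstream model unchanged.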

The v0.2 surface organizes into two groups. On the rollout side, new
specialists begin at 5% traffic and auto-advance through 15, 25, 50, and
100 percent as their shadow-comparison agreement with the upstream model
stays above threshold; a drop triggers auto-demote and quarantine. The
trainer accepts three base models out of the box (Qwen2.5-1.5B as
default, Qwen2.5-3B, Llama-3.2-3B), chosen per pattern from a
user-editable trainer/supported_models.yaml. Two related specialists
can be collapsed into one via an MCP-proposed merge that requires
Telegram approval and survives a regression gate against both parent
baselines.
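The ramp reads like a small state machine, so here's one in Python. It uses the post's stages (5, 15, 25, 50, and 100 percent) and the 80% agreement bar from the walkthrough later in the post. Two parts are my own assumptions and may not match how Apprentice's proxy works: each shadow-comparison window counts as one ramp step, and requests are bucketed by a CRC32 hash so a given request id always lands on the same side.

```python
import zlib

RAMP = (5, 15, 25, 50, 100)  # percent of matching traffic sent to the specialist


class Canary:
    """Hypothetical canary controller: advance on good windows, demote on a bad one."""

    def __init__(self, threshold: float = 0.80):
        self.threshold = threshold
        self.step = 0
        self.broken = False  # once demoted, stays quarantined

    @property
    def percent(self) -> int:
        return 0 if self.broken else RAMP[self.step]

    def record_window(self, agreement: float) -> int:
        """Feed one window's shadow-comparison agreement (0..1); return the new percent."""
        if self.broken:
            return 0
        if agreement < self.threshold:
            self.broken = True
        elif self.step < len(RAMP) - 1:
            self.step += 1
        return self.percent

    def routes_to_specialist(self, request_id: str) -> bool:
        """Stable bucketing: the same request id always gets the same answer."""
        return zlib.crc32(request_id.encode()) % 100 < self.percent


good = Canary()
print([good.record_window(a) for a in (0.91, 0.88, 0.86, 0.90)])  # [15, 25, 50, 100]

bad = Canary()
print([bad.record_window(a) for a in (0.90, 0.70, 0.95)])  # [15, 0, 0]
```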

On the operations side, the proxy authenticates per request via
X-Apprentice-Tenant plus an API key header, applies a per-tenant
token-bucket rate limit, and tracks quotas; global patterns remain
visible to every tenant. A monthly budget posts Telegram alerts at 80%,
95%, and 100%, with a budget increase 10 reply as the recovery path. When the
local GPU is busy, the orchestrator spills training to RunPod A100,
A6000, or L40S spot instances, gated by the same budget. Grafana shows
eight panels covering request rate, latency p50/p95/p99, error rate,
cost saved, top patterns, specialist-vs-upstream latency, status, and
24-hour counters. OpenRouter handles upstream traffic, with Fireworks,
MiniMax, and Together as fallback tiers.

These all live in real, named modules of the repo. They aren't roadmap
promises in a slide deck.
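Two of those operational limits are easy to model in a few lines: a per-tenant token bucket and the 80/95/100% budget alerts. Here's a rough Python sketch. The rate, capacity, and injectable clock are assumptions for illustration. The real proxy is Go and identifies tenants from the X-Apprentice-Tenant header; in this sketch a tenant is just a string key.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`; each request spends one."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """One bucket per tenant id, created lazily on the tenant's first request."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, tenant: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(tenant)
        if bucket is None:
            bucket = self.buckets[tenant] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)


BUDGET_ALERTS = (0.80, 0.95, 1.00)


def due_budget_alerts(spent: float, budget: float, already_sent: set) -> list:
    """Thresholds the monthly spend has crossed that haven't been alerted yet."""
    ratio = spent / budget
    return [t for t in BUDGET_ALERTS if ratio >= t and t not in already_sent]
```

Passing a fake clock makes the limiter deterministic to test, and keeping one bucket per tenant means a noisy tenant can't drain anyone else's allowance.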

Demo

One command brings the whole loop up against a seeded fixture:

bash scripts/demo-run.sh

That script seeds a synthetic Hermes session log, runs the detector,
graduates a pattern, executes the full pipeline (dataset-builder,
trainer, merge, validate, promote), starts the serving and proxy, sends
a test request that matches the new specialist, and prints the Grafana
dashboard URL with a summary table at the end. The whole thing finishes
in well under an hour on a 2080 Ti.

The Grafana view that matters most during the demo is the cost-saved
panel. The orange "upstream tokens" line goes down while the green
"specialist routed requests" line goes up. The latency panel shows
specialist inference settling around 38 ms p50 and 85 ms p95, well
under the multi-second upstream round-trip.

Code

Repo: github.com/eschmechel/hermes-apprentice
License: Apache 2.0

The project splits into small modules. Go handles the hot path (observer,
detector, dataset-builder, proxy, registry, burst). Python handles the ML
and orchestration (trainer, validator, serving, orchestrator, telegram,
installer):

hermes-apprentice/
├── observer/             — Go    Tails ~/.hermes/state.db, normalises pairs
├── detector/             — Go    BGE-small ONNX → HDBSCAN → candidate patterns
├── dataset-builder/      — Go    Fetches pairs, redacts PII, splits 80/10/10
├── trainer/              — Py    Unsloth QLoRA + manifest signer + multi-base-model
├── validator/            — Py    Baseline runner + promotion gate + registry
├── serving/              — Py    vLLM HTTP server + residency control plane
├── proxy/                — Go    OpenAI-compat router with canary/tenants/aliases
├── registry-service/     — Go    Read-only HTTP over ~/.apprentice/registry/
├── orchestrator/         — Py    Autonomous pipeline driver + MCP tools + budget
├── telegram/             — Py    Templates + outbox + getUpdates reply poller
├── installer/            — Py    Interactive setup: detect host, build venvs + Go
├── burst/                — Go    RunPod A100 spot dispatcher (signed jobs)
└── deploy/               — YAML  Docker compose, Grafana dashboards, Prometheus

The interactive installer is the intended entry point. It detects your
host's GPU, KVM, Docker, and uv state, recommends an isolation profile,
walks you through Telegram and OpenRouter credentials, picks a base
model, sets a monthly cloud budget, and emits cron lines for your
scheduler:

apprentice-setup --apply
apprentice-setup --apply --profile docker     # if you'd rather not run Firecracker
bash scripts/demo-run.sh                       # end-to-end smoke test

All settings persist in ~/.apprentice/.env. Re-running the installer
only updates what you provide.

My Tech Stack

Languages: Go 1.26 (proxy, observer, detector, dataset-builder, registry, burst), Python 3.10+ (trainer, validator, serving, orchestrator, telegram, installer).
Base model: Qwen2.5-1.5B-Instruct (Apache 2.0). Fits in 11 GB of VRAM for QLoRA training and fp16 serving on the same card. Qwen2.5-3B and Llama-3.2-3B are configured alternates.
Training: Unsloth QLoRA, 4-bit base plus LoRA rank 16, sized per GPU via trainer/profiles/profile_*.yaml.
Serving: vLLM 0.21 with --enable-lora --max-loras 4. Multiple specialists share one warm base model; adapters hot-swap in for about 18 MB of extra VRAM each.
Routing: BGE-small (Apache 2.0) via ONNX runtime, 384-dimensional L2-normalized embeddings, cosine match against per-pattern centroids.
Privacy: Microsoft Presidio sidecar for PII redaction in dataset-builder. Secrets scanner runs pre-train. Per-pattern data cards capture provenance.
Observability: Prometheus scrape against the proxy's /metrics, Grafana dashboards in deploy/docker/compose.monitoring.yml.
Cloud burst: RunPod A100 spot instances, dispatched by signed jobs from burst/. Budget-gated.
Upstream: OpenRouter primary, multi-provider fallback chain (Fireworks, MiniMax, Together).
Isolation: Firecracker microVM is the default for the Hermes process; Docker Compose is the portable alternative.
Control plane: MCP server in the orchestrator exposes dispatch_training, propose_merge, cost_summary, roi, demote, and the budget tools.
Operator UX: Telegram for graduation approval, merge approval, and budget increases. Apprentice rides Hermes' own Telegram adapter rather than running a separate bot process on the host.

How I Used Hermes Agent

Apprentice would not work the same way against any other agent runtime.
That's the point of building it for this challenge. Hermes' substrate is
what we built on: a SQLite session database, a Markdown skill registry,
no_agent cron jobs, and an existing Telegram adapter.

The session DB is the input

Hermes writes every chat to ~/.hermes/state.db, a SQLite database in
WAL mode. The schema is straightforward: a sessions table with id,
source, model, system prompt, and token counts; a messages table with
role, content, tool_calls, and timestamps; an FTS5 virtual table that
makes full-text search a single query away. Apprentice's observer
(Go) tails this database and normalises each session into clean
(user-input, big-model-output) pairs. There are no Hermes patches
required, no schema migrations, and no fork of the agent. The observer
reads what Hermes already writes.

The detector (Go, BGE-small via ONNX) ingests the pair stream from the
observer, computes 384-dimensional embeddings on the user side of each
pair, and clusters them with HDBSCAN. When a cluster crosses a sample
threshold and shows consistent upstream-response shape, it becomes a
graduation candidate, a row in the orchestrator's job state.
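Here's a rough Python picture of that last step, which runs after HDBSCAN has labeled each pair. HDBSCAN implementations conventionally report noise as -1. The 14-sample minimum echoes the walkthrough's example. Both the 0.8 bar and the "shape" signature (the sorted top-level keys of a JSON response) are my hypothetical stand-ins for "consistent upstream-response shape".

```python
import json
from collections import Counter, defaultdict

MIN_SAMPLES = 14       # the walkthrough graduates SKU extraction at its 14th pair
MIN_SHAPE_SHARE = 0.8  # hypothetical bar for "consistent response shape"


def response_shape(output: str) -> str:
    """Crude shape signature: sorted top-level JSON keys, else the value's kind."""
    try:
        parsed = json.loads(output)
    except ValueError:
        return "text"
    if isinstance(parsed, dict):
        return ",".join(sorted(parsed))
    return type(parsed).__name__


def graduation_candidates(pairs, labels):
    """pairs: [(user_input, upstream_output)]; labels: cluster id per pair, -1 = noise."""
    shapes_by_cluster = defaultdict(list)
    for (_, output), label in zip(pairs, labels):
        if label != -1:
            shapes_by_cluster[label].append(response_shape(output))

    candidates = []
    for label, shapes in sorted(shapes_by_cluster.items()):
        if len(shapes) < MIN_SAMPLES:
            continue
        shape, count = Counter(shapes).most_common(1)[0]
        share = count / len(shapes)
        if share >= MIN_SHAPE_SHARE:
            candidates.append(
                {"cluster": label, "samples": len(shapes), "shape": shape, "shape_share": share}
            )
    return candidates
```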

Hermes skills are the output

When a specialist passes the promotion gate, the validator writes a
Markdown skill file to ~/.hermes/skills/<pattern-id>/SKILL.md. Under
the Firecracker profile, that file is scp'd into the microVM, and Hermes
picks it up in its skill registry on the next /reload-skills or
session start. This does two things at once. It tells Hermes'
LLM-judged selector that the pattern exists (so it shows up in
hermes skills list), and it points the proxy at the right adapter
via the pattern id stored in the skill's frontmatter. Routing itself
happens deterministically in the proxy, via cosine match on the
embedding, not in the LLM selector. The SKILL.md exists for ecosystem
visibility; the centroid exists for correctness.
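Here's a sketch of that write step in Python. The name and description fields are what SKILL.md frontmatter conventionally carries. The metadata.apprentice block holding the pattern id and adapter is a hypothetical layout for illustration, not necessarily what the validator emits. The YAML is built by hand, with values quoted via json.dumps, because a JSON string is also a valid YAML double-quoted scalar.

```python
import json
from pathlib import Path


def write_skill(skills_dir, pattern_id: str, name: str, description: str, adapter: str) -> Path:
    """Write <skills_dir>/<pattern_id>/SKILL.md with YAML frontmatter."""
    q = json.dumps  # JSON strings double as YAML double-quoted scalars
    body = "\n".join([
        "---",
        f"name: {q(name)}",
        f"description: {q(description)}",
        "metadata:",
        "  apprentice:",              # hypothetical namespace for Apprentice's keys
        f"    pattern_id: {q(pattern_id)}",
        f"    adapter: {q(adapter)}",
        "---",
        "",
        f"# {name}",
        "",
        "Requests matching this pattern are served by a local fine-tuned specialist.",
        "",
    ])
    path = Path(skills_dir) / pattern_id / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return path
```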

`hermes cron no_agent` jobs are the heartbeat

The autonomous side of Apprentice runs as hermes cron --no-agent jobs
registered inside the Hermes microVM:

ssh root@GUEST 'hermes cron create --name apprentice-telegram --no-agent \
    --script apprentice-telegram-dispatch.sh --deliver telegram "every 5m"'
ssh root@GUEST 'hermes cron create --name apprentice-poll-replies --no-agent \
    --script apprentice-telegram-poll.sh "every 1m"'

no_agent mode matters: we do not want Hermes' LLM to interpret these
crons. They are shell scripts that run on a schedule and exit. The
dispatch script flushes the outbox of graduation notifications, merge
proposals, and budget alerts. The poll script reads Telegram replies
through Hermes' getUpdates adapter and turns train gc-7f3a into a
structured job request for the orchestrator. The orchestrator's
watcher.tick is a third cron job that reads pending requests and runs
the pipeline.
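The poll script's parsing job is small enough to sketch. This Python version turns the reply formats the post mentions (train gc-7f3a, skip gc-7f3a, budget increase 10) into a structured request. The JobRequest shape and the exact regexes are assumptions. Merge approvals are left out because the post doesn't show their syntax.

```python
import re
from dataclasses import dataclass
from typing import Optional

TRAIN_OR_SKIP = re.compile(r"^\s*(train|skip)\s+(gc-[0-9a-f]+)\s*$", re.IGNORECASE)
BUDGET_INCREASE = re.compile(r"^\s*budget\s+increase\s+(\d+(?:\.\d+)?)\s*$", re.IGNORECASE)


@dataclass(frozen=True)
class JobRequest:
    action: str                     # "train", "skip", or "budget_increase"
    candidate: Optional[str] = None
    amount: Optional[float] = None


def parse_reply(text: str) -> Optional[JobRequest]:
    """Map an operator's Telegram reply to a job request, or None if unrecognized."""
    m = TRAIN_OR_SKIP.match(text)
    if m:
        return JobRequest(action=m.group(1).lower(), candidate=m.group(2).lower())
    m = BUDGET_INCREASE.match(text)
    if m:
        return JobRequest(action="budget_increase", amount=float(m.group(1)))
    return None
```

Returning None for anything unrecognized keeps a stray chat message from ever turning into a training job.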

This kept the Apprentice process model small. We did not have to run
python-telegram-bot on the host or stand up a separate webhook server;
every operator-facing piece of the loop rides infrastructure Hermes
already exposes.

A graduation, end to end

Concrete paths for one full loop:

~/.hermes/state.db. Hermes writes a new session for a prompt that asked, in essence, "extract SKU and quantity from this email."
observer. Tails the DB, normalises the chat into a pair, ships it to the detector.
detector. Embeds the user side (BGE-small ONNX, about 2 ms), clusters via HDBSCAN. After the 14th pair, the cluster crosses threshold.
orchestrator. Creates a graduation candidate and enqueues a Telegram notification with id gc-7f3a.
Telegram, via Hermes cron. Your phone buzzes at 2 AM. You reply train gc-7f3a. The poll cron picks the reply up within the minute.
dataset-builder. Fetches all 14 pairs, runs them through Presidio for PII redaction, applies quality filters and fuzzy dedup, and splits 80/10/10. Roughly 30 seconds for about a thousand records on the demo profile.
apprentice-trainer. Unsloth QLoRA on Qwen2.5-1.5B-Instruct, rank 16. About 25 minutes on a 2080 Ti, or about 45 minutes on a RunPod A100 spot instance. Output is an 18 MB adapter.
Merge to fp16. save_pretrained_merged is the right path here. It avoids the Unsloth and vLLM tokenizer drift that bites adapter hot-swap.
apprentice-validate. Runs the merged model against the held-out 10% test set and a baseline model on the same prompts. The promotion gate requires the specialist beat baseline by at least 10 percentage points on F1; anything less is a failure report rather than a promotion.
Registry promote. The manifest is signed with the Ed25519 trainer key, and the SKILL.md is pushed to the microVM.
Canary ramp. The proxy starts routing 5% of matching requests to the new specialist while shadow-comparing every routed turn against the upstream model. Above 80% agreement the ramp auto-advances; below, it auto-demotes to "broken" and alerts you.
Live. Once at 100%, all matching requests stay local. The specialist serves at roughly 38 ms p50 and 85 ms p95.
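The dataset-builder step's dedup and 80/10/10 split can be sketched as follows. The real dataset-builder is Go and does fuzzy dedup plus Presidio redaction. This Python version makes two simplifications of my own: exact dedup on whitespace- and case-normalized prompts, and a seeded shuffle before the split.

```python
import random


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants collide."""
    return " ".join(text.lower().split())


def dedup(pairs):
    """Keep the first pair for each normalized prompt (exact, not fuzzy, dedup)."""
    seen, unique = set(), []
    for prompt, completion in pairs:
        key = normalize(prompt)
        if key not in seen:
            seen.add(key)
            unique.append((prompt, completion))
    return unique


def split_80_10_10(pairs, seed: int = 0):
    """Seeded shuffle, then an 80/10/10 train/validation/test split."""
    items = list(pairs)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10  # integer math avoids float rounding surprises
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train_set, val_set, test_set = split_80_10_10(dedup([(f"email {i}", f"SKU-{i}") for i in range(14)]))
print(len(train_set), len(val_set), len(test_set))  # 11 1 2
```

With only 14 pairs the held-out test set is just two examples. That's one reason the promotion gate matters more as a pattern accumulates data.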

The numbers

All measured on a single 2080 Ti, per docs/benchmarks.md:

Stage	Latency / size
Embedding (BGE-small ONNX)	~2 ms
Cosine match against 100 centroids	<0.1 ms
Specialist inference (Qwen2.5-1.5B fp16)	p50 ~38 ms, p95 ~85 ms, p99 ~150 ms
LoRA adapter on disk	~18 MB
Adapter VRAM cost on a warm base	+18 MB
Training (60 steps, QLoRA r=16)	~25 minutes
Throughput	~120 tokens/sec

An end-to-end routed turn lands at roughly 40 ms p50 (2 ms embed plus
38 ms inference), and around 166 ms at p99 when the long tail of
inference hits. The upstream OpenRouter round-trip for the same prompt
is multiple seconds, and it costs real money per token.

The promotion gate's design floor is a 10-point F1 delta versus
baseline. Specialists that fail to clear the un-tuned base model never
leave the validator. The ROI ledger tracks training cost in
GPU-seconds (plus any teacher tokens) against the cumulative dollars
saved by routing matched requests locally instead of upstream.
Break-even arrives when the saved side passes the spent side,
typically within hours for any high-volume pattern.
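Both rules in that paragraph are simple arithmetic, so here they are as a Python sketch. F1 scores are taken on a 0 to 1 scale, and the gate compares the gap in percentage points. The GPU hourly rate in the ledger is an assumed input that the post doesn't specify; for a card you already own, it might be close to zero.

```python
from dataclasses import dataclass


def passes_gate(specialist_f1: float, baseline_f1: float, min_delta_points: float = 10.0) -> bool:
    """Promote only if the specialist beats baseline by min_delta_points of F1."""
    delta_points = round((specialist_f1 - baseline_f1) * 100, 6)  # tame float noise
    return delta_points >= min_delta_points


@dataclass
class RoiLedger:
    gpu_seconds: float = 0.0
    gpu_usd_per_hour: float = 0.0   # assumed rate for the training GPU
    teacher_usd: float = 0.0        # any teacher-token spend
    saved_usd: float = 0.0          # upstream cost avoided by local routing

    @property
    def spent_usd(self) -> float:
        return self.gpu_seconds / 3600 * self.gpu_usd_per_hour + self.teacher_usd

    def record_routed_request(self, upstream_usd_avoided: float) -> None:
        self.saved_usd += upstream_usd_avoided

    @property
    def broke_even(self) -> bool:
        return self.saved_usd >= self.spent_usd


print(passes_gate(0.85, 0.70))  # True  (15-point gain)
print(passes_gate(0.78, 0.72))  # False (6-point gain)
```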

Why this only works because of Hermes

In principle we could have pointed Apprentice at any
OpenAI-compatible upstream, but each piece of the loop borrows
something specific from Hermes. The session DB is what makes pair
extraction free. The skill registry is the deployment surface the rest
of the Hermes ecosystem already understands. no_agent cron is the
heartbeat we did not have to invent. The Telegram adapter is the
operator UX we did not have to stand up. From the outside, Apprentice
ends up looking like a feature of Hermes, because that's how it was
built.

The v0.3 work continues in the same direction: multimodal pattern
detection against vision skills, federated training across tenants on
a shared registry, and a deeper canary with full %-ramp and A/B
multi-LoRA comparison. All of it sits on top of Hermes rather than
next to it.

Building my Portfolio Website: Lessons Learned

Elliott — Thu, 19 Jun 2025 17:28:42 +0000

Welcome to my first blog post! In this article, I’ll share my journey building this portfolio website, the challenges I faced, and the tools I used along the way.

As someone with minimal interest in web development, grudgingly powering through my college’s Intro to Web Programming course, I knew I needed to create a portfolio sooner rather than later. With just one week of HTML experience from a high school marketing course, I decided to go all in.

Part 1: Planning

I spent days researching the different formats and libraries for building your own website, and the comparisons were conflicting, from people praising React to those dismissing Angular. I tried to get a better understanding by watching YouTube how-tos but quickly felt like I was back in tutorial hell.

I was sitting in my Web Programming lecture when a friend leaned over and showed me a website he had found: motherfuckingwebsite.com. Despite its satirical tone, it raised some valid points.

1. “You. Are. Over-designing.”

I had lost sight of one of the key principles of prototyping: the Minimum Viable Product (MVP). In my hurry to create an enticing portfolio, I had overcomplicated my original design.

In game development, there is often a discussion about generalists versus specialists. Typically, you want someone who is a mixture of both: good at their specialty, but knowledgeable enough about the other disciplines to stay flexible. However, in trying to showcase my desire for growth and my love for Computer Science, I instead forced myself into being a generalist across all of CS. As a C++ developer, I didn’t need to showcase front-end skills, since front-end isn’t something I plan to work with.

2. “All the problems we have with websites are ones we create ourselves.”

I didn’t need a website built with React or various libraries that, in the worst-case scenario, I wouldn’t be able to debug. I needed a bare-bones website that conveyed information and character but was simplistic enough for me to edit and change whenever required.

Any problem that arose from designing the website was either due to over-complication or my need for a refresher on how to code effectively.

I didn't realize the website was a huge thing in the web-dev community and had even spawned copycat websites. I read many of the copycats' equally satirical articles, which held troves of solid advice. My favourite came from perfectmotherfuckingwebsite.com.

It again focused on the MVP of websites, but emphasized reliability and accessibility. Its point was that if it takes less than 5 minutes to make the website more accessible, it's worth it. If I spend hours creating a simple banner to fly across the screen for style, but then it isn't accessible, it's a waste of time. Regardless of whether anyone used any of the accessibility features, if I didn't learn best practices, then I wasn't truly learning; I was regurgitating.

Similarly, you may notice an MIT license at the bottom of my website or blog. Honestly, I don't believe my website is worth copying or deriving work from, nor do I believe anyone will ever do so; however, the point remains that I wanted to create a website that was accessible and had the proper practices in mind.

I rediscovered the passion I originally had for creating my own portfolio. It wasn't because I wanted the flashiest or most high-tech webpage. I wanted something that I could be proud of, and that would meet my needs.

“Comparison is the thief of joy”
- Theodore Roosevelt

For anyone else planning to create their own website or portfolio, it's essential not only to create something you want but also something that's uniquely yours and makes you proud.