<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Elliott</title>
    <description>The latest articles on DEV Community by Elliott (@eschmechel).</description>
    <link>https://dev.to/eschmechel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3277387%2F14fb7cff-81a1-4675-985c-08fb111da3bf.png</url>
      <title>DEV Community: Elliott</title>
      <link>https://dev.to/eschmechel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eschmechel"/>
    <language>en</language>
    <item>
      <title>Your AI slop bores me</title>
      <dc:creator>Elliott</dc:creator>
      <pubDate>Thu, 04 Jun 2026 17:51:28 +0000</pubDate>
      <link>https://dev.to/eschmechel/your-ai-slop-bores-me-4g7k</link>
      <guid>https://dev.to/eschmechel/your-ai-slop-bores-me-4g7k</guid>
      <description>&lt;p&gt;Using AI is fine; I use it daily. Posting the raw output without reading it first is the tell.&lt;/p&gt;

&lt;p&gt;The em dashes in every other sentence. The emoji bullets in a README. A stock-feeling header image that gestures at your topic without saying anything specific about it. That opener about today's fast-paced landscape. Wikipedia maintains a public list of these patterns ("Signs of AI writing"), and a solid chunk of my feed is speedrunning it.&lt;/p&gt;

&lt;p&gt;Here's what those patterns tell me. You saw a generated draft and posted it. You read 'leverage synergies' and hit publish. The writing isn't yours, and it looks like you never noticed.&lt;/p&gt;

&lt;p&gt;If that's how you treat a post, I have to assume it's how you treat a pull request.&lt;/p&gt;

&lt;p&gt;You skipped the whole job. The model hands you a draft. Your value is what you add on top: catching the bug it was confident about, throwing out the approach that looks clean but is quietly wrong.&lt;/p&gt;

&lt;p&gt;Relay the output as is, and you've made yourself a hackathon GPT wrapper.&lt;/p&gt;

&lt;p&gt;When Claude hands me a function, I rename its variables and hunt the edge case it missed. Then I cut half its comments.&lt;/p&gt;

&lt;p&gt;You're the editor between the draft and the publish button.&lt;br&gt;
Put the effort back in.&lt;/p&gt;

&lt;p&gt;Anyway, enjoy the sunrise I did in Microsoft Paint (no AI).&lt;/p&gt;

&lt;p&gt;Drop the worst AI tell you've seen in the comments; I'm collecting them for a Claude skill.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Skills are Prompts. Here's how Hermes Apprentice turns them into weights</title>
      <dc:creator>Elliott</dc:creator>
      <pubDate>Fri, 29 May 2026 21:29:54 +0000</pubDate>
      <link>https://dev.to/eschmechel/skills-are-prompts-heres-how-hermes-apprentice-turns-them-into-weights-59eh</link>
      <guid>https://dev.to/eschmechel/skills-are-prompts-heres-how-hermes-apprentice-turns-them-into-weights-59eh</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;It's 2 AM and Telegram lights up:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;[gc-7f3a] Graduation candidate: "SKU extraction" — 14 examples, agreement 91%.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Reply train gc-7f3a to start training, skip gc-7f3a to dismiss.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You're half-asleep. You reply &lt;code&gt;train gc-7f3a&lt;/code&gt; and put the phone down. Forty&lt;br&gt;
minutes later you check Grafana, and the orange line (upstream tokens per&lt;br&gt;
hour for this Hermes agent) has bent downward. A green line marked&lt;br&gt;
&lt;em&gt;specialist routed requests&lt;/em&gt; has stepped into its place. The next ten&lt;br&gt;
thousand SKU-extraction requests cost nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills are prompts. Apprentice turns some of them into weights.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hermes Agent already ships an answer for the first ten patterns you want&lt;br&gt;
it to handle: a Markdown skill file with YAML frontmatter, dropped into&lt;br&gt;
&lt;code&gt;~/.hermes/skills/&amp;lt;name&amp;gt;/SKILL.md&lt;/code&gt;. The agent's LLM-judged selector picks&lt;br&gt;
the right skill per request, and you're done. This works for the first&lt;br&gt;
ten patterns. It strains at twenty. By thirty you spend more time editing&lt;br&gt;
SKILL.md files than writing features, and the model is still paying full&lt;br&gt;
upstream cost on tasks it has now seen a thousand times.&lt;/p&gt;

&lt;p&gt;The obvious answer is to fine-tune. The unobvious cost is the&lt;br&gt;
infrastructure: pair extraction from session history, PII redaction, a&lt;br&gt;
baseline runner, a promotion gate, a versioned registry, a router that&lt;br&gt;
decides per request whether to use the specialist or fall back to the&lt;br&gt;
big model, a canary that rolls the new specialist out safely, and some&lt;br&gt;
way to find out when any of this breaks. Most teams won't build it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apprentice is that infrastructure, packaged as a tool you install&lt;br&gt;
alongside Hermes.&lt;/strong&gt; It observes Hermes' SQLite session database, clusters&lt;br&gt;
recurring patterns with a small embedding model, fires a Telegram&lt;br&gt;
graduation message when a pattern matures, kicks off an Unsloth QLoRA&lt;br&gt;
training run on your local GPU (or on RunPod), and runs the trained&lt;br&gt;
model through a held-out validation gate. If the specialist beats a&lt;br&gt;
baseline, the proxy starts serving it. Future matching requests get&lt;br&gt;
routed to that local specialist. Misses fall through to OpenRouter.&lt;/p&gt;

&lt;p&gt;The v0.2 surface organizes into two groups. On the rollout side, new&lt;br&gt;
specialists begin at 5% traffic and auto-advance through 15, 25, 50, and&lt;br&gt;
100 percent as their shadow-comparison agreement with the upstream model&lt;br&gt;
stays above threshold; a drop triggers auto-demote and quarantine. The&lt;br&gt;
trainer accepts three base models out of the box (Qwen2.5-1.5B as&lt;br&gt;
default, Qwen2.5-3B, Llama-3.2-3B), chosen per pattern from a&lt;br&gt;
user-editable &lt;code&gt;trainer/supported_models.yaml&lt;/code&gt;. Two related specialists&lt;br&gt;
can be collapsed into one via an MCP-proposed merge that requires&lt;br&gt;
Telegram approval and survives a regression gate against both parent&lt;br&gt;
baselines.&lt;/p&gt;
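
&lt;p&gt;The ramp logic above can be sketched in a few lines of Python. This is an illustration, not the proxy's actual code (the proxy is written in Go): &lt;code&gt;next_stage&lt;/code&gt; and the status strings are hypothetical names, and the 0.80 threshold is an assumption borrowed from the 80% agreement figure in the walkthrough further down.&lt;/p&gt;

```python
# Hypothetical sketch of the canary ramp; names and threshold are assumptions.
RAMP_STAGES = [5, 15, 25, 50, 100]  # percent of matching traffic, per the post
AGREEMENT_THRESHOLD = 0.80          # assumed; the real value is configurable


def next_stage(current_percent, agreement):
    """Return (new_percent, status) after one shadow-comparison window."""
    if AGREEMENT_THRESHOLD > agreement:
        # Agreement dropped below threshold: pull the specialist out of rotation.
        return 0, "quarantined"
    index = RAMP_STAGES.index(current_percent)
    if index == len(RAMP_STAGES) - 1:
        # Already at the last stage: stay fully live.
        return current_percent, "live"
    return RAMP_STAGES[index + 1], "ramping"
```

&lt;p&gt;With these assumptions, a specialist at 5% with 91% agreement moves to 15%, and one that falls to 50% agreement at any stage drops to zero traffic.&lt;/p&gt;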

&lt;p&gt;On the operations side, the proxy authenticates per request via&lt;br&gt;
&lt;code&gt;X-Apprentice-Tenant&lt;/code&gt; plus an API key header, applies a per-tenant&lt;br&gt;
token-bucket rate limit, and tracks quotas; global patterns remain&lt;br&gt;
visible to every tenant. A monthly budget posts Telegram alerts at 80%,&lt;br&gt;
95%, and 100%, with &lt;code&gt;budget increase 10&lt;/code&gt; as the recovery path. When the&lt;br&gt;
local GPU is busy, the orchestrator spills training to RunPod A100,&lt;br&gt;
A6000, or L40S spot instances, gated by the same budget. Grafana shows&lt;br&gt;
eight panels covering request rate, latency p50/p95/p99, error rate,&lt;br&gt;
cost saved, top patterns, specialist-vs-upstream latency, status, and&lt;br&gt;
24-hour counters. OpenRouter handles upstream traffic, with Fireworks,&lt;br&gt;
MiniMax, and Together as fallback tiers.&lt;/p&gt;
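
&lt;p&gt;For readers who have not met a token bucket before, here is a minimal Python sketch of a per-tenant limiter. It is not taken from the Go proxy: the class, the &lt;code&gt;allow_request&lt;/code&gt; helper, and the rate and capacity values are all assumptions made for illustration.&lt;/p&gt;

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Top the bucket up for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


buckets = {}  # tenant id (e.g. the X-Apprentice-Tenant value) -> TokenBucket


def allow_request(tenant, now=None):
    """Hypothetical helper: one bucket per tenant, created on first use."""
    if tenant not in buckets:
        # Illustrative limits: 1 request/sec sustained, bursts of 2.
        buckets[tenant] = TokenBucket(rate=1.0, capacity=2, now=now)
    return buckets[tenant].allow(now=now)
```

&lt;p&gt;The &lt;code&gt;now&lt;/code&gt; parameter exists only so the behaviour can be exercised with a fake clock; in normal use it is left out and the monotonic clock is used.&lt;/p&gt;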

&lt;p&gt;These all live in real, named modules of the repo. They aren't roadmap&lt;br&gt;
promises in a slide deck.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjuxrzlnbch2wvlntolo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjuxrzlnbch2wvlntolo.png" alt="Demo tool output" width="800" height="744"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One command brings the whole loop up against a seeded fixture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash scripts/demo-run.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That script seeds a synthetic Hermes session log, runs the detector,&lt;br&gt;
graduates a pattern, executes the full pipeline (dataset-builder,&lt;br&gt;
trainer, merge, validate, promote), starts the serving layer and the proxy, sends&lt;br&gt;
a test request that matches the new specialist, and prints the Grafana&lt;br&gt;
dashboard URL with a summary table at the end. The whole thing finishes&lt;br&gt;
in well under an hour on a 2080 Ti.&lt;/p&gt;

&lt;p&gt;The Grafana view that matters most during the demo is the &lt;strong&gt;cost-saved&lt;/strong&gt;&lt;br&gt;
panel. The orange "upstream tokens" line goes down while the green&lt;br&gt;
"specialist routed requests" line goes up. The latency panel shows&lt;br&gt;
specialist inference settling around 38 ms p50 and 85 ms p95.&lt;/p&gt;
&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/eschmechel/hermes-apprentice" rel="noopener noreferrer"&gt;github.com/eschmechel/hermes-apprentice&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; Apache 2.0&lt;/p&gt;

&lt;p&gt;The project splits into small modules. Go handles the hot path (observer,&lt;br&gt;
detector, dataset-builder, proxy, registry, burst). Python handles the ML&lt;br&gt;
and orchestration (trainer, validator, serving, orchestrator, telegram,&lt;br&gt;
installer):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hermes-apprentice/
├── observer/             — Go    Tails ~/.hermes/state.db, normalises pairs
├── detector/             — Go    BGE-small ONNX → HDBSCAN → candidate patterns
├── dataset-builder/      — Go    Fetches pairs, redacts PII, splits 80/10/10
├── trainer/              — Py    Unsloth QLoRA + manifest signer + multi-base-model
├── validator/            — Py    Baseline runner + promotion gate + registry
├── serving/              — Py    vLLM HTTP server + residency control plane
├── proxy/                — Go    OpenAI-compat router with canary/tenants/aliases
├── registry-service/     — Go    Read-only HTTP over ~/.apprentice/registry/
├── orchestrator/         — Py    Autonomous pipeline driver + MCP tools + budget
├── telegram/             — Py    Templates + outbox + getUpdates reply poller
├── installer/            — Py    Interactive setup: detect host, build venvs + Go
├── burst/                — Go    RunPod A100 spot dispatcher (signed jobs)
└── deploy/               — YAML  Docker compose, Grafana dashboards, Prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The interactive installer is the intended entry point. It detects your&lt;br&gt;
host's GPU, KVM, Docker, and &lt;code&gt;uv&lt;/code&gt; state, recommends an isolation profile,&lt;br&gt;
walks you through Telegram and OpenRouter credentials, picks a base&lt;br&gt;
model, sets a monthly cloud budget, and emits cron lines for your&lt;br&gt;
scheduler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apprentice-setup &lt;span class="nt"&gt;--apply&lt;/span&gt;
apprentice-setup &lt;span class="nt"&gt;--apply&lt;/span&gt; &lt;span class="nt"&gt;--profile&lt;/span&gt; docker     &lt;span class="c"&gt;# if you'd rather not run Firecracker&lt;/span&gt;
bash scripts/demo-run.sh                       &lt;span class="c"&gt;# end-to-end smoke test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All settings persist in &lt;code&gt;~/.apprentice/.env&lt;/code&gt;. Re-running the installer&lt;br&gt;
only updates what you provide.&lt;/p&gt;
&lt;h3&gt;
  
  
  My Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Languages: Go 1.26 (proxy, observer, detector, dataset-builder,
registry, burst), Python 3.10+ (trainer, validator, serving,
orchestrator, telegram, installer).&lt;/li&gt;
&lt;li&gt;Base model: Qwen2.5-1.5B-Instruct (Apache 2.0). Fits 11 GB of VRAM for
QLoRA training and fp16 serving on the same card. Qwen2.5-3B and
Llama-3.2-3B are configured alternates.&lt;/li&gt;
&lt;li&gt;Training: Unsloth QLoRA, 4-bit base plus LoRA rank 16, sized per GPU
via &lt;code&gt;trainer/profiles/profile_*.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Serving: vLLM 0.21 with &lt;code&gt;--enable-lora --max-loras 4&lt;/code&gt;. Multiple
specialists share one warm base model; adapters hot-swap in for about
18 MB of extra VRAM each.&lt;/li&gt;
&lt;li&gt;Routing: BGE-small (Apache 2.0) via ONNX runtime, 384-dimensional
L2-normalized embeddings, cosine match against per-pattern centroids.&lt;/li&gt;
&lt;li&gt;Privacy: Microsoft Presidio sidecar for PII redaction in
&lt;code&gt;dataset-builder&lt;/code&gt;. Secrets scanner runs pre-train. Per-pattern data
cards capture provenance.&lt;/li&gt;
&lt;li&gt;Observability: Prometheus scrape against the proxy's &lt;code&gt;/metrics&lt;/code&gt;,
Grafana dashboards in &lt;code&gt;deploy/docker/compose.monitoring.yml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Cloud burst: RunPod A100 spot instances, dispatched by signed jobs
from &lt;code&gt;burst/&lt;/code&gt;. Budget-gated.&lt;/li&gt;
&lt;li&gt;Upstream: OpenRouter primary, multi-provider fallback chain
(Fireworks, MiniMax, Together).&lt;/li&gt;
&lt;li&gt;Isolation: Firecracker microVM is the default for the Hermes process;
Docker Compose is the portable alternative.&lt;/li&gt;
&lt;li&gt;Control plane: MCP server in the orchestrator exposes
&lt;code&gt;dispatch_training&lt;/code&gt;, &lt;code&gt;propose_merge&lt;/code&gt;, &lt;code&gt;cost_summary&lt;/code&gt;, &lt;code&gt;roi&lt;/code&gt;,
&lt;code&gt;demote&lt;/code&gt;, and the budget tools.&lt;/li&gt;
&lt;li&gt;Operator UX: Telegram for graduation approval, merge approval, and
budget increases. Apprentice rides Hermes' own Telegram adapter rather
than running a separate bot process on the host.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How I Used Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Apprentice would not work the same way against any other agent runtime.&lt;br&gt;
That's the point of building it for this challenge. Hermes' substrate is&lt;br&gt;
what we built on: a SQLite session database, a Markdown skill registry,&lt;br&gt;
&lt;code&gt;no_agent&lt;/code&gt; cron jobs, and an existing Telegram adapter.&lt;/p&gt;
&lt;h3&gt;
  
  
  The session DB is the input
&lt;/h3&gt;

&lt;p&gt;Hermes writes every chat to &lt;code&gt;~/.hermes/state.db&lt;/code&gt;, a SQLite database in&lt;br&gt;
WAL mode. The schema is straightforward: a &lt;code&gt;sessions&lt;/code&gt; table with id,&lt;br&gt;
source, model, system prompt, and token counts; a &lt;code&gt;messages&lt;/code&gt; table with&lt;br&gt;
role, content, tool_calls, and timestamps; an FTS5 virtual table that&lt;br&gt;
makes full-text search a single query away. Apprentice's &lt;code&gt;observer&lt;/code&gt;&lt;br&gt;
(Go) tails this database and normalises each session into clean&lt;br&gt;
&lt;code&gt;(user-input, big-model-output)&lt;/code&gt; pairs. There are no Hermes patches&lt;br&gt;
required, no schema migrations, and no fork of the agent. The observer&lt;br&gt;
reads what Hermes already writes.&lt;/p&gt;
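
&lt;p&gt;To make the pairing step concrete, here is a rough Python sketch. The real observer is written in Go, and the column names used here (&lt;code&gt;session_id&lt;/code&gt;, &lt;code&gt;role&lt;/code&gt;, &lt;code&gt;content&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;) are assumptions based on the schema description above, not the actual Hermes schema. It runs against a tiny in-memory stand-in rather than the real database.&lt;/p&gt;

```python
import sqlite3


def extract_pairs(conn):
    """Pair each user message with the assistant message that follows it."""
    rows = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "ORDER BY session_id, timestamp"
    ).fetchall()
    pairs = []
    pending = {}  # session id -> last unanswered user message
    for session_id, role, content in rows:
        if role == "user":
            pending[session_id] = content
        elif role == "assistant" and session_id in pending:
            pairs.append((pending.pop(session_id), content))
    return pairs


# In-memory stand-in for ~/.hermes/state.db (assumed columns, fake data).
# Against the real file you would likely open it read-only, e.g.
# sqlite3.connect("file:/path/to/state.db?mode=ro", uri=True).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT, timestamp REAL)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("s1", "user", "extract SKU and quantity from this email", 1.0),
        ("s1", "assistant", '{"sku": "A-100", "qty": 3}', 2.0),
        ("s1", "user", "thanks", 3.0),  # no reply yet, so no pair is emitted
    ],
)
pairs = extract_pairs(conn)
```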

&lt;p&gt;The detector (Go, BGE-small via ONNX) ingests the pair stream from the&lt;br&gt;
observer, computes 384-dimensional embeddings on the user side of each&lt;br&gt;
pair, and clusters them with HDBSCAN. When a cluster crosses a sample&lt;br&gt;
threshold and shows consistent upstream-response shape, it becomes a&lt;br&gt;
&lt;em&gt;graduation candidate&lt;/em&gt;, a row in the orchestrator's job state.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hermes skills are the output
&lt;/h3&gt;

&lt;p&gt;When a specialist passes the promotion gate, the validator writes a&lt;br&gt;
Markdown skill file to &lt;code&gt;~/.hermes/skills/&amp;lt;pattern-id&amp;gt;/SKILL.md&lt;/code&gt;. Under&lt;br&gt;
the Firecracker profile, that file is scp'd into the microVM, and Hermes&lt;br&gt;
picks it up in its skill registry on the next &lt;code&gt;/reload-skills&lt;/code&gt; or&lt;br&gt;
session start. This does two things at once. It tells Hermes'&lt;br&gt;
LLM-judged selector that the pattern exists (so it shows up in&lt;br&gt;
&lt;code&gt;hermes skills list&lt;/code&gt;), and it points the proxy at the right adapter&lt;br&gt;
via the pattern id stored in the skill's frontmatter. Routing itself&lt;br&gt;
happens deterministically in the proxy, via cosine match on the&lt;br&gt;
embedding, not in the LLM selector. The SKILL.md exists for ecosystem&lt;br&gt;
visibility; the centroid exists for correctness.&lt;/p&gt;
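
&lt;p&gt;The deterministic match can be sketched as follows. This is a plain-Python illustration rather than the proxy's Go code; &lt;code&gt;route&lt;/code&gt; is a hypothetical name and the 0.85 similarity threshold is an assumed value. Real embeddings would be 384-dimensional; three dimensions keep the example readable.&lt;/p&gt;

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def route(embedding, centroids, threshold=0.85):
    """Return the best-matching pattern id, or None to fall through upstream."""
    query = l2_normalize(embedding)
    best_id, best_score = None, threshold
    for pattern_id, centroid in centroids.items():
        # For unit vectors, cosine similarity is just the dot product.
        score = sum(q * c for q, c in zip(query, l2_normalize(centroid)))
        if score >= best_score:
            best_id, best_score = pattern_id, score
    return best_id


# Toy centroids with made-up pattern ids.
centroids = {
    "sku-extraction": [1.0, 0.0, 0.0],
    "date-parsing": [0.0, 1.0, 0.0],
}
```

&lt;p&gt;A request whose embedding sits close to one centroid routes to that specialist; one that sits between patterns returns &lt;code&gt;None&lt;/code&gt; and goes to the upstream model.&lt;/p&gt;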
&lt;h3&gt;
  
  
  &lt;code&gt;hermes cron no_agent&lt;/code&gt; jobs are the heartbeat
&lt;/h3&gt;

&lt;p&gt;The autonomous side of Apprentice runs as &lt;code&gt;hermes cron --no-agent&lt;/code&gt; jobs&lt;br&gt;
registered inside the Hermes microVM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@GUEST &lt;span class="s1"&gt;'hermes cron create --name apprentice-telegram --no-agent \
    --script apprentice-telegram-dispatch.sh --deliver telegram "every 5m"'&lt;/span&gt;
ssh root@GUEST &lt;span class="s1"&gt;'hermes cron create --name apprentice-poll-replies --no-agent \
    --script apprentice-telegram-poll.sh "every 1m"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;no_agent&lt;/code&gt; mode matters: we do not want Hermes' LLM to interpret these&lt;br&gt;
crons. They are shell scripts that run on a schedule and exit. The&lt;br&gt;
dispatch script flushes the outbox of graduation notifications, merge&lt;br&gt;
proposals, and budget alerts. The poll script reads Telegram replies&lt;br&gt;
through Hermes' &lt;code&gt;getUpdates&lt;/code&gt; adapter and turns &lt;code&gt;train gc-7f3a&lt;/code&gt; into a&lt;br&gt;
structured job request for the orchestrator. The orchestrator's&lt;br&gt;
&lt;code&gt;watcher.tick&lt;/code&gt; is a third cron job that reads pending requests and runs&lt;br&gt;
the pipeline.&lt;/p&gt;
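
&lt;p&gt;Turning a reply into a job request is mostly a parsing problem. A minimal sketch, assuming the reply grammar is exactly the &lt;code&gt;train&lt;/code&gt; / &lt;code&gt;skip&lt;/code&gt; form shown in the graduation message and that candidate ids are &lt;code&gt;gc-&lt;/code&gt; plus four hex digits (both are guesses from the examples; &lt;code&gt;parse_reply&lt;/code&gt; and the dict keys are hypothetical):&lt;/p&gt;

```python
import re

# Assumed grammar, inferred from "train gc-7f3a" and "skip gc-7f3a".
REPLY_PATTERN = re.compile(r"^\s*(train|skip)\s+(gc-[0-9a-f]{4})\s*$", re.IGNORECASE)


def parse_reply(text):
    """Return a job request dict, or None if the reply is not a command."""
    match = REPLY_PATTERN.match(text)
    if match is None:
        return None
    return {
        "action": match.group(1).lower(),
        "candidate_id": match.group(2).lower(),
    }
```

&lt;p&gt;Anything that does not match is ignored rather than guessed at, which matters when the operator is replying half-asleep.&lt;/p&gt;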

&lt;p&gt;This kept the Apprentice process model small. We did not have to run&lt;br&gt;
&lt;code&gt;python-telegram-bot&lt;/code&gt; on the host or stand up a separate webhook server;&lt;br&gt;
every operator-facing piece of the loop rides infrastructure Hermes&lt;br&gt;
already exposes.&lt;/p&gt;

&lt;h3&gt;
  
  
  A graduation, end to end
&lt;/h3&gt;

&lt;p&gt;Concrete paths for one full loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;~/.hermes/state.db&lt;/code&gt;. Hermes writes a new session for a prompt that
asked, in essence, "extract SKU and quantity from this email."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;observer&lt;/code&gt;. Tails the DB, normalises the chat into a pair, ships it
to the detector.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;detector&lt;/code&gt;. Embeds the user side (BGE-small ONNX, about 2 ms),
clusters via HDBSCAN. After the 14th pair, the cluster crosses
threshold.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orchestrator&lt;/code&gt;. Creates a graduation candidate and enqueues a
Telegram notification with id &lt;code&gt;gc-7f3a&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Telegram, via Hermes cron. Your phone buzzes at 2 AM. You reply
&lt;code&gt;train gc-7f3a&lt;/code&gt;. The poll cron picks the reply up within the minute.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dataset-builder&lt;/code&gt;. Fetches all 14 pairs, runs them through Presidio
for PII redaction, applies quality filters and fuzzy dedup, and
splits 80/10/10. Roughly 30 seconds for about a thousand records on
the demo profile.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;apprentice-trainer&lt;/code&gt;. Unsloth QLoRA on Qwen2.5-1.5B-Instruct, rank
16. About 25 minutes on a 2080 Ti, or about 45 minutes on a RunPod
A100 spot instance. Output is an 18 MB adapter.&lt;/li&gt;
&lt;li&gt;Merge to fp16. &lt;code&gt;save_pretrained_merged&lt;/code&gt; is the right path here. It
avoids the Unsloth and vLLM tokenizer drift that bites adapter
hot-swap.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;apprentice-validate&lt;/code&gt;. Runs the merged model against the held-out
10% test set and a baseline model on the same prompts. The promotion
gate requires the specialist beat baseline by at least 10 percentage
points on F1; anything less is a failure report rather than a
promotion.&lt;/li&gt;
&lt;li&gt;Registry promote. The manifest is signed with the Ed25519 trainer
key, and the SKILL.md is pushed to the microVM.&lt;/li&gt;
&lt;li&gt;Canary ramp. The proxy starts routing 5% of matching requests to
the new specialist while shadow-comparing every routed turn against
the upstream model. Above 80% agreement the ramp auto-advances;
below, it auto-demotes to "broken" and alerts you.&lt;/li&gt;
&lt;li&gt;Live. Once at 100%, all matching requests stay local. The specialist
serves at roughly 38 ms p50 and 85 ms p95.&lt;/li&gt;
&lt;/ol&gt;
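
&lt;p&gt;The 80/10/10 split in the dataset-builder step is easy to get subtly wrong with floating-point sizes, so here is one way to do it with integer arithmetic. The real dataset-builder is written in Go; this Python sketch, the &lt;code&gt;split_pairs&lt;/code&gt; name, and the fixed seed are assumptions for illustration only.&lt;/p&gt;

```python
import random


def split_pairs(pairs, seed=0):
    """Shuffle deterministically, then split 80/10/10 into train/val/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids float rounding surprises on the boundaries.
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the held-out test set
    return train, val, test


train, val, test = split_pairs(range(1000))
```

&lt;p&gt;Any remainder lands in the test set, so no record is silently dropped.&lt;/p&gt;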

&lt;h3&gt;
  
  
  The numbers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsls0dg2844k6c5p0cmdb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsls0dg2844k6c5p0cmdb.png" alt="Local Run Benchmarks" width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All measured on a single 2080 Ti, per &lt;code&gt;docs/benchmarks.md&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Latency / size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding (BGE-small ONNX)&lt;/td&gt;
&lt;td&gt;~2 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cosine match against 100 centroids&lt;/td&gt;
&lt;td&gt;&amp;lt;0.1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialist inference (Qwen2.5-1.5B fp16)&lt;/td&gt;
&lt;td&gt;p50 ~38 ms, p95 ~85 ms, p99 ~150 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA adapter on disk&lt;/td&gt;
&lt;td&gt;~18 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adapter VRAM cost on a warm base&lt;/td&gt;
&lt;td&gt;+18 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training (60 steps, QLoRA r=16)&lt;/td&gt;
&lt;td&gt;~25 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;~120 tokens/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An end-to-end routed turn lands at roughly 40 ms p50 (2 ms embed plus&lt;br&gt;
38 ms inference), and around 166 ms at p99 when the long tail of&lt;br&gt;
inference hits. The upstream OpenRouter round-trip for the same prompt&lt;br&gt;
is multiple seconds, and it costs real money per token.&lt;/p&gt;

&lt;p&gt;The promotion gate's design floor is a 10-point F1 delta versus&lt;br&gt;
baseline. Specialists that fail to clear the un-tuned base model never&lt;br&gt;
leave the validator. The ROI ledger tracks training cost in&lt;br&gt;
GPU-seconds (plus any teacher tokens) against the cumulative dollars&lt;br&gt;
saved by routing matched requests locally instead of upstream.&lt;br&gt;
Break-even arrives when the saved side passes the spent side,&lt;br&gt;
typically within hours for any high-volume pattern.&lt;/p&gt;
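
&lt;p&gt;The gate itself reduces to a small comparison. A hedged sketch, assuming F1 is computed from true-positive, false-positive, and false-negative counts on the held-out set; &lt;code&gt;f1_score&lt;/code&gt;, &lt;code&gt;should_promote&lt;/code&gt;, and &lt;code&gt;MIN_F1_DELTA&lt;/code&gt; are hypothetical names, not the validator's API:&lt;/p&gt;

```python
MIN_F1_DELTA = 0.10  # the 10-point design floor described above


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when nothing was matched."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def should_promote(specialist_f1, baseline_f1):
    """Promote only if the specialist beats baseline by at least the floor."""
    # Round the delta so float noise (0.85 - 0.75) does not fail a true 10 points.
    return round(specialist_f1 - baseline_f1, 6) >= MIN_F1_DELTA
```

&lt;p&gt;Under these assumptions a specialist at 0.91 against a 0.72 baseline is promoted, while 0.80 against 0.75 produces a failure report.&lt;/p&gt;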

&lt;h3&gt;
  
  
  Why this only works because of Hermes
&lt;/h3&gt;

&lt;p&gt;In principle we could have pointed Apprentice at any&lt;br&gt;
OpenAI-compatible upstream, but each piece of the loop borrows&lt;br&gt;
something specific from Hermes. The session DB is what makes pair&lt;br&gt;
extraction free. The skill registry is the deployment surface the rest&lt;br&gt;
of the Hermes ecosystem already understands. &lt;code&gt;no_agent&lt;/code&gt; cron is the&lt;br&gt;
heartbeat we did not have to invent. The Telegram adapter is the&lt;br&gt;
operator UX we did not have to stand up. From the outside, Apprentice&lt;br&gt;
ends up looking like a feature of Hermes, because that's how it was&lt;br&gt;
built.&lt;/p&gt;

&lt;p&gt;The v0.3 work continues in the same direction: multimodal pattern&lt;br&gt;
detection against vision skills, federated training across tenants on&lt;br&gt;
a shared registry, and a deeper canary with full %-ramp and A/B&lt;br&gt;
multi-LoRA comparison. All of it sits on top of Hermes rather than&lt;br&gt;
next to it.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building my Portfolio Website: Lessons Learned</title>
      <dc:creator>Elliott</dc:creator>
      <pubDate>Thu, 19 Jun 2025 17:28:42 +0000</pubDate>
      <link>https://dev.to/eschmechel/building-my-portfolio-website-lessons-learned-52o4</link>
      <guid>https://dev.to/eschmechel/building-my-portfolio-website-lessons-learned-52o4</guid>
      <description>&lt;p&gt;&lt;em&gt;Welcome to my first blog post! In this article, I’ll share my journey building this portfolio website, the challenges I faced, and the tools I used along the way.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckitgeo32acjwqkrfihg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckitgeo32acjwqkrfihg.png" alt="Blog Source Code" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As someone with minimal interest in web development who was grudgingly powering through my college’s &lt;em&gt;Intro to Web Programming&lt;/em&gt; course, I knew I would need a portfolio sooner rather than later. With just one week of HTML experience from a high school marketing course, I decided to go all in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Planning
&lt;/h2&gt;

&lt;p&gt;I spent days researching different formats and libraries for building a website, wading through contradictory comparisons that ranged from people praising React to those dismissing Angular. I tried to get a better understanding by watching YouTube &lt;strong&gt;how-tos&lt;/strong&gt;, but quickly felt like I was back in &lt;a href="https://www.wbscodingschool.com/blog/what-is-tutorial-hell-how-to-get-out" rel="noopener noreferrer"&gt;tutorial hell&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I was sitting in my Web Programming lecture when a friend leaned over and showed me a website he had found: &lt;a href="https://motherfuckingwebsite.com" rel="noopener noreferrer"&gt;motherfuckingwebsite.com&lt;/a&gt;. Despite its overall satirical tone, it raised some valid points.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. &lt;blockquote&gt; “You. Are. Over-designing.”&lt;/blockquote&gt;
&lt;/h2&gt;

&lt;p&gt;I had lost sight of one of the key ideas in early prototyping: the Minimum Viable Product (MVP). In all the hurry to create an enticing portfolio, I had overcomplicated my original design.&lt;/p&gt;

&lt;p&gt;In game development, there is often a discussion about generalists versus specialists. Typically, you want someone who is a mixture of both: good at their specialty, but knowledgeable enough about the other processes to remain flexible. However, in my effort to showcase my desire for growth and love for Computer Science, I had instead forced myself into being a generalist across all of CS. As a C++ developer, I didn’t need to showcase front-end skills, as front-end isn’t something I would ever work with.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;blockquote&gt; “All the problems we have with websites are ones we create ourselves.”&lt;/blockquote&gt;
&lt;/h2&gt;

&lt;p&gt;I didn’t need a website built with React or various libraries that, in the worst-case scenario, I wouldn’t be able to debug. I needed a bare-bones website that conveyed information and character but was simple enough for me to edit and change whenever required.&lt;/p&gt;

&lt;p&gt;Any problem that arose from designing the website was either due to over-complication or my need for a refresher on how to code effectively.&lt;/p&gt;





&lt;p&gt;I didn't realize the website was a huge thing in the web-dev community and had even inspired copycat websites. I read many of the copycats' similarly satirical articles, which held troves of genuinely solid advice. My favourite came from &lt;a href="https://perfectmotherfuckingwebsite.com/" rel="noopener noreferrer"&gt;perfectmotherfuckingwebsite.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It again centred on this MVP idea for websites, but with an emphasis on &lt;strong&gt;reliability and accessibility&lt;/strong&gt;. The point stood that &lt;u&gt;if it takes less than 5 minutes for me to make the website more accessible, it's worth it&lt;/u&gt;. If I spend hours creating a simple banner to fly across the screen for style, but then &lt;strong&gt;it isn't accessible, it's a waste of time&lt;/strong&gt;. Regardless of whether anyone used any of the accessibility features, if I didn't learn best practices, then I wasn't truly learning; I was regurgitating.&lt;/p&gt;

&lt;p&gt;Similarly, you may notice an MIT license at the bottom of my website or blog. Honestly, I don't believe my website is worth copying or deriving work from, nor do I believe anyone will ever do so; however, the point remains that I wanted to create a website that was accessible and had the proper practices in mind.&lt;/p&gt;

&lt;p&gt;I rediscovered the passion I originally had for creating my own portfolio. It wasn't because I wanted the flashiest or most high-tech webpage. I wanted something that I could be proud of, and that would meet my needs.&lt;/p&gt;

&lt;blockquote&gt;“Comparison is the thief of joy” &lt;br&gt;- Theodore Roosevelt
&lt;/blockquote&gt;

&lt;p&gt;For anyone else planning to create their own website or portfolio, it's essential not only to create something you want but also something that's uniquely yours and makes you proud.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
