DEV Community: gtapps

The 4 Plugins I Run on 9 Always-On Claude Code Agents

gtapps — Mon, 29 Jun 2026 22:18:47 +0000

I run 9 Always-On agents with Claude Code and claude-code-hermit plugin. These are the 4 official Anthropic plugins I run on all of them.

claude-code-setup

I run it as a weekly routine

Analyzes the repo (stack, structure, package.json, layout) and recommends the highest-value automations to add: skills, hooks, MCP servers, subagents, slash commands.

Skill:
/claude-code-setup:claude-automation-recommender

claude-code-setup on GitHub →

claude-md-management

Run the audit weekly or the revise-md on session-close or on a reflection schedule

Audit and improve CLAUDE.md files in repositories. Scans for all CLAUDE.md files, evaluates quality against templates, outputs quality report, then makes targeted updates. There's also a revise-claude-md skill where the session context is used for revision which is really useful in reflection routines in always-on agents.

Skills:

/claude-md-management:claude-md-improver # general audit on your CLAUDE.md files

/claude-md-management:revise-claude-md # revise with session context

claude-md-management on GitHub →

skill-creator

Whenever I want to create, edit, evaluate, benchmark, and improve skills

Use when you want to create a skill from scratch, edit, or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

Skill:
/skill-creator:skill-creator [describe the skill you want to create, modify or improve]

skill-creator on GitHub →

feature-dev

For the changes big enough to deserve a plan instead of a one-shot edit.

The other three recommend, maintain, and package. feature-dev builds. It runs a seven-phase workflow (discovery, exploration, clarifying questions, architecture, implementation, review, summary) backed by three read-only Sonnet agents: code-architect blueprints the files, code-explorer traces execution paths, code-reviewer flags only high-confidence issues. Crucially, it gates on your approval before writing any code and offers architecture options first, a checkpoint that matters more, not less, when the agent runs with some independence. Not for one-line fixes or hotfixes; for the work that deserves a plan without entering Plan mode.

Skill:
/feature-dev:feature-dev [optional feature description]

feature-dev on GitHub →

Why these four

Recommend, maintain, skill, build. That's the loop.

Claude Code PushNotification tool: what it does and how to use it

gtapps — Fri, 05 Jun 2026 22:21:02 +0000

PushNotification is a native Claude Code tool you can call to send PushNotification to your phone or Claude app.

The obvious use case is long-running tasks:

"Run the full test suite, fix what you can, and notify me using the PushNotification tool when the suite is green or blocked.""

But the more interesting cases are background workflows:

Useful examples:

notify when a build finishes
notify when a monitor sees a production-like error in logs
notify when a scheduled task finds something
notify when Claude is blocked on a human decision
notify when a remote/headless session needs attention
notify when a channel integration failed and the agent could not DM you
notify when a morning brief or weekly review is ready

This is especially useful with Monitor Tool, /loop, Crons.

How it works

You enable mobile push in /config with "Push when Claude decides."
Activate remote-control for mobile push through the Claude app.
Ask for it directly in a prompt or set a rule in CLAUDE.md, for example: "run the tests and notify me with the PushNotification tool when they finish."

How I wire it

On my Claude Code Always-on Agents (claude-code-hermit plugin), the notification policy is:

if a real two-way channel is reachable, send the message there first
if no channel is configured, use PushNotification
if a configured channel is broken, use PushNotification as fallback
never rely on push as the only record
keep push messages short, one-line, actionable, and low-frequency

That keeps the signal clean. Push should not duplicate every Discord/Telegram message. It should cover the cases where the normal path is absent, broken, or too slow.

Best practices

Keep the message short:

CI failed on api-tests: 3 auth regressions. Open session for details.

Put the actionable bit first:

Decision needed: choose migration strategy before Claude edits schema.

Do not push for routine progress. Push for state changes: done, blocked, failed, needs decision.

If you need to answer Claude, use Remote Control or a two-way channel. Push only gets your attention.

Links

Tools reference: https://code.claude.com/docs/en/tools-reference

Remote Control and mobile push: https://code.claude.com/docs/en/remote-control
Channels reference: https://code.claude.com/docs/en/channels-reference
Hermit example policy: https://github.com/gtapps/claude-code-hermit/blob/main/plugins/claude-code-hermit/state-templates/CLAUDE-APPEND.md

My take: PushNotification is a small tool, but it changes the workflow. Claude Code no longer has to sit silently in a terminal while you periodically check whether it finished. It can now tap you when the work is done, blocked, or worth opening again.

Anthropic's June 15th pricing reframes Claude Personal AI Assistants

gtapps — Sun, 17 May 2026 22:26:04 +0000

Anthropic's June 15th pricing change is not just a billing update. It changes where Claude personal assistants should live.

If you want a Claude-powered assistant that runs beyond a single chat session, you now have two serious paths.

Managed Agents: hosted, scalable, API-native, and priced like infrastructure.
Local always-on Claude Code instance: close to your files, tools, repos, permissions, memory, and personal context.

Tools like Hermes Agent, OpenClaw-style assistants, T3 Code, Zed's Claude Agents, or OpenCode wrappers when they route unattended Claude work through subscription-authenticated Agent SDK usage or claude -p, now live inside the new Agent SDK credit model. That can mean cost increases of 12x to 25x typically, with extremes near 175x.

What changes on June 15th

Starting June 15th, 2026, Claude Agent SDK usage and claude -p no longer draw from the same subscription bucket as interactive Claude usage.

They move to a separate monthly Agent SDK credit.

That credit covers:

Claude Agent SDK usage in your own projects
claude -p / claude --print
Claude Code GitHub Actions
Third-party apps authenticated through the Agent SDK

It does not cover:

Interactive Claude Code
Claude chat
Claude Cowork

Interactive Claude stays subscription-shaped. Unattended programmatic Claude becomes credit-shaped, then extra-usage/API-priced once the credit is gone.

Reported effective cost increases land in the 12x–25x range for typical programmatic workloads, with extreme cases (heavy harness usage on a Pro subscription) quoted as high as 175x. The right framing is not "the bill went up." It is "the subsidy went away."

Plan	Monthly Agent SDK credit
Pro	$20
Max 5x	$100
Max 20x	$200
Team Standard	$20
Team Premium	$100
Enterprise usage-based	$20
Enterprise seat-based Premium	$200

Monthly Agent SDK credit by Claude plan tier, effective June 15th, 2026.

The local alternative: always-on Claude Code instance

An always-on local Claude Code instance keeps the assistant alive locally, with durable state in files and inbound channels or terminal sessions. That is the future. Run it under Docker so it survives reboots and stays available between sessions.

Claude Code already provides much of the substrate a personal assistant needs:

Authentication and model access without a separate API key
A permission-gated tool layer (Read, Write, Bash, Grep, Edit)
Hooks at session start, tool pre/post, and session end
Subagents with isolated context windows, so heavy reads do not poison the main session
MCP server integration, skills, and a plugin system with marketplace
CLAUDE.md as canonical project policy, read on every session
Auto-memory keyed to the project working directory, persistent across sessions
Session persistence, resume, and fork

A plugin loaded into that instance adds the operating layer on top: scheduled routines, background watches, a proposal pipeline that gates mutation, and channel-based notification. That is what turns the substrate into the scenarios below.

What that looks like in practice:

An assistant that wakes at 7am, summarizes overnight Slack, and posts the digest to a Discord / Telegram DM
A heartbeat that checks CI every twenty minutes, stays silent when green, and pings the operator when a job fails with the failing test pre-quoted
A routine that picks gh issues when idle and tackles them

Disclosure: I maintain claude-code-hermit, an open-source Claude Code plugin that turns CC into an always-on personal AI assistant. The patterns described here are not theoretical for me.

Sources

Anthropic Agent SDK credit: https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan
Claude Code programmatic usage: https://code.claude.com/docs/en/headless
Claude Code CLI reference: https://code.claude.com/docs/en/cli-reference
Claude API pricing: https://platform.claude.com/docs/en/about-claude/pricing
Hermes Agent (NousResearch): https://github.com/NousResearch/hermes-agent
VentureBeat on the OpenClaw reinstatement and credit policy: https://venturebeat.com/technology/anthropic-reinstates-openclaw-and-third-party-agent-usage-on-claude-subscriptions-with-a-catch
Effective price-increase math (MagnaCapax gist): https://gist.github.com/MagnaCapax/d9177e35b355853f03c730dfcaa693ef

How I turned Claude Code into multiple 24/7 personal AI assistants

gtapps — Sat, 09 May 2026 15:47:07 +0000

I use Claude Code everyday, lately Anthropic started shipping some tools that are really relevant for always-on autonomous agent, and I decided to create the glue of all these tools & features to create personal AI assistants that are just a CC instance. My goal was simple, to be able to run multiple always-on personal AI assistants on my laptop with a single Claude Subscription.

The result is claude-code-hermit, an open-source Claude Code Plugin, a glue layer that turn Claude Code into a 24/7 personal assistant, with a heartbeat, routines, reflection, it learns from its work, and runs on your hardware. One Claude subscription, multiple hermits, one laptop. The compute lives in the cloud at Anthropic; the state, the credentials, and the agency live with you.

This is the story of what I built and what 30 days of running 8 of them actually looks like.

Why Claude Code was already 90% of the way there

/loop for heartbeat. CronCreate does idle-gated cron routines. Channels handles Discord, Telegram. Auto-memory. Native Tasks for plan tracking. Auto mode or bypassPermissions inside a Docker container without anxiety for non prompted loop.

The compute lives at Anthropic. I don't host a model. I don't pay per runtime hour. My laptop is the orchestrator and the policy boundary, not the compute. That's the trick. Claude Code is already doing the expensive part; the only question is whether you've wired the rest into a 24/7 loop.

So the first version of hermit was small: a hatch wizard, an OPERATOR.md template, a proposal pipeline. The rest is integration.

What hermit added

Hermit boils down to four things on top of the native stack.

A wizard that adapts to whatever folder you point it at. /hatch scans the folder. It asks four or five questions: agent identity, what should it focus on, who approves what, how should it notify you etc. It writes an OPERATOR.md. Every session loads that file first. Works on an existing codebase, an empty directory, or a folder I created just to host the hermit.

A self-learning loop gated by proposals. Hermit is token usage aware, it reflects on a cron, between sessions.
New routines, skills, subagents, claude.md changes etc, anything relevant, it drafts self-improvement proposals from repeated patterns, and surfaces them in Discord / Telegram for me to accept, defer, or dismiss.

A raw/compiled knowledge model. Borrowed from Andrej Karpathy's llm-wiki: the hermit writes to raw/ unedited source material it collects (snapshots, audit logs, source dumps, etc). And to compiled/ the short, curated artifacts the agent distills from raw (briefings, reviews, assessments, etc). At the start of every new session, recent compiled notes get loaded back into context, so the agent picks up where it left off without dragging the entire history along. The result is a small, growing wiki you can read and edit, instead of a giant prompt you can't.

Docker is the always-on substrate. Running a 24/7 agent on a host means solving four problems at once: defensible bypassPermissions if you are not using Auto mode, automatic recovery from crashes and host reboots, isolated credentials, and a reproducible environment. /docker-setup ships them in one wizard: kernel-enforced hardening (no-new-privileges, cap_drop: ALL, pids_limit), restart: unless-stopped so crashes and reboots self-heal, named volumes that persist OAuth across container rebuilds, and a SIGTERM trap that closes the session cleanly before exit. An opt-in /docker-security layer adds nftables LAN containment, CPU/memory bounds, and a plugin-install audit log - each toggle with honest disclosure of what it doesn't catch.

Setup

claude plugin marketplace add gtapps/claude-code-hermit adds the marketplace.
claude plugin install claude-code-hermit@claude-code-hermit --scope project installs the CC plugin.
/claude-code-hermit:hatch runs the setup wizard — agent identity, language, timezone, four-or-five behavior questions, channel preference, optional plugin recommendations. Generates OPERATOR.md from a fresh project scan plus your answers. Quick path is ~3 minutes.
/claude-code-hermit:docker-setup writes the Docker scaffolding, builds the image, runs OAuth login, walks through channel pairing, and verifies the container is healthy.

If you want the always-on flow without Docker, .claude-code-hermit/bin/hermit-start boots a tmux session instead.

How the self-learning loop works

The naive way to run a "self-improving" agent is to ask it, every session, "how could you do better?" and let it write whatever comes to mind. In the beginning of the project it might be useful, but overtime that produces noise. Half the suggestions are local minima, half are inventions of patterns that haven't actually recurred, and the operator gets fatigue-buried within a week. What hermit does instead is gate every step.

/reflect (cron, idle, manual)
        |
        v
precheck.js: 5 phases
   |
   +-- EMPTY (~94%) ---> no LLM, exit
   |
   +-- RUN|phases
        |
        v
reflect (LLM): three-condition rule
        |
        v
reflection-judge (Sonnet)
   |
   +-- SUPPRESS ---> memory only
   |   codes: no-evidence,
   |          no-sessions,
   |          covered-by-memory
   |
   +-- DOWNGRADE:N ---> lower tier
   |
   +-- ACCEPT
        |
        v
proposal-triage (Haiku)
   |
   +-- SUPPRESS ---> memory only
   |   codes: weak-recurrence,
   |          weak-consequence,
   |          not-actionable,
   |          covered-by-memory
   |
   +-- DUPLICATE:PROP-NNN ---> link
   |
   +-- CREATE ---> PROP-NNN.md
                   -> Discord / Telegram

Every arrow that doesn't end in CREATE is a gate that fired before tokens were spent on the next step. Verdict strings (ACCEPT, DOWNGRADE:N, SUPPRESS — <code>, DUPLICATE:PROP-NNN, CREATE) are the literal codes the two subagents return; the 94% precheck-EMPTY rate is the fleet measurement reported later in §Cost & local-first. On the noisiest hermit it's still 87%, on the quietest a research project that ran a full month without a single LLM-backed reflection, it's 100%.

The precheck

Before reflect even runs, a Node script (reflect-precheck.js) decides whether any of five phases are actually due:

compute — has there been new session activity since the last reflect?
resolution_check — are any accepted proposals due for a "did this fix the pattern?" verification?
cost_spike — is today's spend more than 2× the 7-day median?
digest — has it been 7+ days since the last juvenile-phase digest?
newborn — is this hermit less than 3 days old (different surfacing rules apply)?

If none of those fire, the precheck writes a single line to the session log and exits — no LLM call. On a typical idle day, the precheck handles most reflect invocations for free. Only when something is actually due does the script return RUN|<phases-json>, and only then does Claude read it and proceed.

This is the load-bearing reason hermit can run 24/7 without tearing through tokens. The expensive thing, letting Claude reflect on session journals and spot patterns only happens when the cheap thing has already established that there's something to reflect on.

Two subagents stand between reflect and your inbox

When reflect does run and produces a candidate, the candidate doesn't go straight to a proposal file. Two subagents run on it first.

reflection-judge verifies the evidence. The candidate cites session reports (S-NNN-REPORT.md); the judge actually reads them and confirms the pattern is in there. If reflect hallucinated the citation, which LLMs sometimes do under context pressure, the judge returns SUPPRESS with a canonical code (no-evidence, no-sessions, covered-by-memory). This is the load-bearing check against the failure mode where the agent cites a session it didn't actually read.

proposal-triage deduplicates. It reads the existing PROP-NNN.md files and memory files and checks whether a proposal already covers this pattern, whether it conflicts with a stated operator preference, or whether one of the three conditions fails. Verdict is one of CREATE | DUPLICATE:<id> | SUPPRESS.

Survive both gates, and the candidate becomes a proposal that surfaces in Discord / Telegram. Fail either, and it's recorded as a sub-threshold observation in memory, there if a recurrence later promotes it, invisible to the operator until it does.

The three-condition rule

Before reflect even drafts a candidate, it applies a rule that's quoted verbatim from the skill source:

Repeated pattern - observed in 2+ sessions (1+ in newborn phase for Tier-1 micro-proposals)
Meaningful consequence - something tangibly goes wrong without fixing it
Operator-actionable change - a specific concrete change you can approve

If any of the three can't be stated without hand-waving, the candidate doesn't make it past reflect. This is what stops the "you should be more efficient" generic-pattern fatigue that always-on agents drift into.

Phase-aware behavior

Hermit knows how old it is. Reflect classifies it on every run:

newborn (< 3 days) - surfaces single-occurrence sub-threshold observations inline as Noticed: <pattern> so the operator gets visible early signal
juvenile (3–14 days) - emits weekly digests instead of per-observation lines
adult (≥ 14 days) - silent baseline; only graduating recurrences surface

Newborn hermits aren't useful if they're silent for two weeks. Adult hermits aren't useful if they're noisy. The phasing is what makes the same skill work at day 1 and day 30.

What "accept" actually does

The operator accepts a proposal in Discord / Telegram. What happens next depends on the proposal type:

Routine proposal - the JSON config block is parsed and appended to config.json directly. The next session picks up the new routine. No further work.
Skill proposal - if /skill-creator (an Anthropic-shipped plugin) is installed, hermit queues a session task with guidance to use it else Claude does it on its own.
Everything else - writes a NEXT-TASK.md file. The next /session-start offers it as the default task, with the proposal's evidence and proposed plan attached.

There is no path where the hermit auto-modifies its own skills, agents, or hooks. Operator approval is a precondition for anything that mutates the hermit's behavior beyond a config-only routine entry. The auto-curriculum is real — Voyager-style, in that the agent proposes new capabilities based on what it observed itself doing — but the editor-in-chief role is permanent and human.

Cost & local-first

I run several of these: dev work, company ops, my Home Assistant, fitness with strava sync, my wife's social media management and a few other. Each lives in its own project directory with its own memory, budget, and routines. They share nothing but my Claude subscription and my laptop's CPU. One important thing for me was token usage efficiency, since I run this with one subscription. That's why self learning was so important, that's why Hermits are token usage aware, they brief you on it's token usage and you can ask it directly regarding it as well.

Two numbers tell the architectural story:

Cache hit ratio: ~92% across the fleet. Hermit hits Anthropic's prompt cache aggressively - memory, compiled artifacts, OPERATOR.md, the bottom of CLAUDE.md. Of every token Claude reads in a hermit session, more than nine in ten are cache reads, billed at a fraction of the input rate. My quietest hermit hits 90.4%, my noisiest 95.5%. The single largest cost-control lever is keeping the cache warm — and the wizard-driven memory layout makes that the default rather than something I have to engineer.

Precheck-EMPTY: ~94% across the fleet. This is the architectural claim from earlier, now measured. On one hermit, a quiet research project, reflect-precheck.js short-circuited 100% of reflect invocations — zero LLM calls for the entire month. On the noisiest, it still handled 87%. Across the fleet, ~94% of reflects never hit the model. The cheap script does more cost work than any model-routing decision could.

Try it yourself

Hermit is open-source.

claude plugin marketplace add gtapps/claude-code-hermit
claude plugin install claude-code-hermit@claude-code-hermit --scope project
/claude-code-hermit:hatch
/claude-code-hermit:docker-setup

Repo and full README at github.com/gtapps/claude-code-hermit.