
Fernando Rodriguez

Posted on • Originally published at frr.dev

I'm paying $15 per million tokens to write 'fix: typo'

Yesterday I wrote a commit message with Claude Code. The diff was a one-line change: a typo in a comment. Claude Opus read the diff, thought for two seconds, and generated fix: correct typo in auth comment. That consumed about 800 input tokens and 30 output tokens, at $15 and $75 per million respectively. Cost: about a cent and a half. But multiply that by 40 commits per day, 250 working days per year, across a company with 200 developers using coding agents, and a cent and a half becomes roughly $28,000 a year spent on the intellectual equivalent of applying band-aids.

The problem isn't that Opus is expensive. The problem is that coding agents don't distinguish between $0.001 tasks and $0.10 tasks. Everything goes through the same model. Generate a commit message, classify an issue, validate a format -- everything hits the big model at the same cost as designing a microservices architecture. It's the equivalent of hiring a surgeon to apply band-aids.
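The arithmetic is worth making explicit. A back-of-envelope in Swift, using the prices above; the volume figures (40 commits/day, 250 days, 200 developers) are this post's assumptions, not measurements:

```swift
// Back-of-envelope: what "fix: typo" costs at frontier-model prices.
// Rates and volumes are the assumptions stated in the text above.
let inputRate = 15.0 / 1_000_000    // dollars per input token
let outputRate = 75.0 / 1_000_000   // dollars per output token

let perCommit = 800.0 * inputRate + 30.0 * outputRate  // ~$0.014
let perDevPerYear = perCommit * 40.0 * 250.0           // 40 commits/day, 250 days
let companyPerYear = perDevPerYear * 200.0             // 200 developers

print(String(format: "$%.3f per commit", perCommit))                    // $0.014 per commit
print(String(format: "$%.0f per year, company-wide", companyPerYear))   // $28500 per year, company-wide
```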

The numbers

Let's run the numbers with Claude Opus 4 pricing (the previous generation, which most teams still use in production):

| Task | Input tokens | Output tokens | Cost |
| --- | --- | --- | --- |
| Commit message (small diff) | ~800 | ~30 | $0.014 |
| Classify an issue | ~500 | ~50 | $0.011 |
| Validate commit format | ~300 | ~20 | $0.006 |
| Standup summary | ~2000 | ~200 | $0.045 |

None of these tasks needs a frontier-scale model with multi-step reasoning capability. They're classification and constrained-generation tasks -- the equivalent of sorting cards by color.

With Apple Intelligence's on-device model (3B parameters, included in macOS 26): cost $0.00, latency ~300ms, no network, no API key.

foundation-hooks

foundation-hooks is a set of 4 Swift binaries that use Apple's Foundation Models framework to automate development tasks that don't justify a cloud model:

| Binary | Function | Interface |
| --- | --- | --- |
| fm-commit-msg | Generates conventional commit messages from the diff | prepare-commit-msg hook |
| fm-validate-msg | Validates format and suggests corrections | commit-msg hook |
| fm-lql-create | Classifies and creates Linear issues via lql | CLI |
| fm-lql-standup | Generates standup summary from git log + issues | CLI |

All four share the same pattern: define a Swift struct with @Generable, feed the model minimal context, get structured output in milliseconds.

Installation:

git clone https://github.com/frr/foundation-hooks
cd foundation-hooks
make build && make install-hooks REPO=/path/to/your/repo

From that point on, every git commit gets an automatically generated conventional message. The hooks have been running in 11 production repositories for two weeks.

How it works: @Generable and constrained decoding

This is the part that deserves technical attention. @Generable isn't "ask the model to return JSON and hope for the best". It's constrained decoding -- the model literally cannot generate tokens that violate the schema.

The mechanism

  1. @Generable is a Swift macro that generates a JSON Schema at compile time from the struct.
  2. The framework injects that schema into the prompt as a response format specification.
  3. During inference, token masking is applied at each decoding step: vocabulary tokens that would produce output invalid under the schema have their logits masked out, so they carry probability 0 after the softmax.
  4. The model can only choose from valid tokens.

Apple describes this as "guided generation" in the WWDC25 documentation. It's the same technique OpenAI uses with response_format: json_schema and Anthropic applies in tool use. The difference: Apple integrates it into Swift's type system. Define the struct, the compiler generates the schema, the runtime applies it during inference. Type safety end-to-end.
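The masking step itself is easy to sketch. This toy decoding step is my illustration, not Apple's implementation (real runtimes mask logits to negative infinity before the softmax), but it shows why a hard constraint is absolute: forbidden tokens never get a chance to win, no matter how high their logits are:

```swift
// Toy constrained decoding step: greedy pick over the schema-allowed subset.
// Illustration only -- production runtimes mask logits inside the inference loop.
func maskedArgmax(logits: [Double], vocab: [String], allowed: Set<String>) -> String? {
    // Schema mask: forbidden tokens are simply removed from consideration.
    let survivors = zip(vocab, logits).filter { allowed.contains($0.0) }
    // Greedy pick among the survivors.
    return survivors.max(by: { $0.1 < $1.1 })?.0
}

let vocab  = ["fix", "feat", "bug", "update", "docs"]
let logits = [1.2,   0.8,    3.5,   2.9,      0.1]   // the model actually prefers "bug"
let schema: Set = ["fix", "feat", "docs"]            // but the schema forbids it

let picked = maskedArgmax(logits: logits, vocab: vocab, allowed: schema)  // → "fix"
```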

Three levels of constraint

@Generable
struct CommitMessage {
    // Level 1: HARD constraint — effective enum
    // Active token masking: only "fix", "feat", "refactor", etc.
    // Tokens that would form "bug" or "update" have probability 0.
    @Guide(.anyOf(["fix", "feat", "refactor", "test", "docs", "chore", "style"]))
    var type: String

    // Level 2: SOFT constraint — like a system prompt for this field
    // The model tends to follow it but isn't forced to.
    @Guide(description: "Scope of the change, e.g. auth, ui, db. One word, lowercase.")
    var scope: String

    // Level 3: no constraint — free string, the model decides
    var subject: String
}

The analogy: anyOf is a dropdown, description is an input with placeholder, and a field without Guide is an empty textarea. The difference between the three isn't one of degree but of mechanism. The first operates at the token level (the model cannot deviate), the second operates at the prompt level (the model tends to follow it), the third has no guidance.

This is relevant because the git hooks use case is exactly the scenario where hard constraints shine. A commit type must be one of 7 values. No ambiguity, no creativity, no reasoning. It's pure classification. A 3B parameter model with constrained decoding does this as well as a 200B model. The difference is one takes 300ms and is free, the other takes 2 seconds and costs money.

Complete code for a hook

This is fm-commit-msg, the prepare-commit-msg hook. It's 106 lines of Swift with no external dependencies:

import Foundation
import FoundationModels

@Generable
struct CommitMessage {
    @Guide(description: "Type of change", .anyOf(["fix", "feat", "refactor", "test", "docs", "chore", "style"]))
    var type: String

    @Guide(description: "Scope of the change, e.g. auth, ui, db, api. One word, lowercase.")
    var scope: String

    @Guide(description: "Imperative summary of the change, max 50 chars, lowercase, no period")
    var subject: String
}

guard SystemLanguageModel.default.isAvailable else {
    exit(0) // No Apple Intelligence — exit silently, user writes their own
}

Three things worth highlighting:

  1. Graceful degradation: if Apple Intelligence isn't available (Mac without Apple Silicon, model not downloaded), the hook exits with code 0 and git continues normally. Never blocks.

  2. Doesn't fabricate: the model receives git diff --cached --stat and a patch truncated to 3000 characters. Enough to classify and summarize, too little to confabulate details that aren't in the diff.

  3. Doesn't replace the human: the message is written to the commit file with git comments (#), so git commit displays it in the editor. The user can modify or discard it.
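Points 2 and 3 are each just a few lines. A sketch of what they might look like -- the helper names here are mine, not the repo's, and the 3000-character cap is the one described in the text:

```swift
import Foundation

// Cap the patch so the 4096-token context window isn't overflowed.
// The 3000-character limit is the one described in the article.
func truncatedPatch(_ patch: String, limit: Int = 3000) -> String {
    patch.count <= limit ? patch : String(patch.prefix(limit)) + "\n[patch truncated]"
}

// Prepend the suggestion to the commit message file as git comments (#),
// so `git commit` shows it in the editor and the user can keep, edit, or drop it.
func writeSuggestion(_ message: String, toCommitFile path: String) throws {
    let existing = (try? String(contentsOfFile: path, encoding: .utf8)) ?? ""
    let commented = message
        .split(separator: "\n")
        .map { "# suggestion: \($0)" }
        .joined(separator: "\n")
    try (commented + "\n" + existing).write(toFile: path, atomically: true, encoding: .utf8)
}
```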

The generation:

let session = LanguageModelSession(instructions: """
    You generate git commit messages in conventional commits format.
    Focus on WHY the change was made, not WHAT changed.
    The subject must be imperative mood, lowercase, no period, max 50 chars.
    """)

// prompt holds the --stat summary plus the truncated patch, assembled earlier in the hook
let result = try await session.respond(to: prompt, generating: CommitMessage.self)
let msg = result.content
let message = "\(msg.type)(\(msg.scope)): \(msg.subject)"

session.respond(to:generating:) returns a CommitMessage instance, not a String. No parsing. No regex. No try? JSONDecoder().decode(...). The struct is the contract and the compiler guarantees it.

Issue tracking integration: fm-lql-create

The same pattern works for issue tracking. fm-lql-create classifies a natural language description and creates a Linear issue via lql, a Linear CLI written in Rust:

@Generable
struct IssueClassification {
    @Guide(.anyOf(["bug", "feature", "improvement", "task", "chore"]))
    var type: String

    @Guide(.anyOf(["urgent", "high", "medium", "low", "none"]))
    var priority: String

    @Guide(description: "Clean, professional issue title. Max 80 chars.")
    var title: String

    @Guide(description: "One-line description for the issue body")
    var description: String
}

Usage:

$ fm-lql-create "auth token refresh crashes when expired"
PROD | high | bug | TOK: Auth: token refresh crashes on expiry
Token refresh fails silently when the OAuth token has expired, causing auth loop.

Press Enter to create, Ctrl-C to cancel:

The local model classifies the issue in ~500ms: type bug, priority high, clean title, one-line description. Then lql create creates it in Linear. The --dry-run flag shows the proposal without executing anything.

Two fields with anyOf (type, priority) guarantee the classification is valid. It cannot return "priority: very important" or "type: bugfix". The tokens are masked. Two fields with description (title, description) give controlled freedom to the model.
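For completeness, the preview in the transcript above is just string assembly over the struct's fields. A minimal sketch -- the struct here is a plain mirror of IssueClassification for illustration, and the PROD team prefix is hardcoded for the example:

```swift
// Plain mirror of the @Generable IssueClassification, for illustration only.
struct Classification {
    var type: String, priority: String, title: String, description: String
}

// Render the dry-run preview shown above: team | priority | type | title,
// followed by the one-line body.
func previewLines(team: String, issue: Classification) -> [String] {
    ["\(team) | \(issue.priority) | \(issue.type) | \(issue.title)", issue.description]
}

let issue = Classification(
    type: "bug", priority: "high",
    title: "Auth: token refresh crashes on expiry",
    description: "Token refresh fails silently when the OAuth token has expired."
)
let preview = previewLines(team: "PROD", issue: issue)
// preview[0] == "PROD | high | bug | Auth: token refresh crashes on expiry"
```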

Before and after

| Step | With coding agent (Opus) | With foundation-hooks |
| --- | --- | --- |
| Generate commit message | ~2 s, ~800 tokens, ~$0.014 | ~300 ms, 0 tokens, $0.00 |
| Validate format | ~1.5 s, ~300 tokens, ~$0.006 | ~200 ms, 0 tokens, $0.00 |
| Classify issue | ~2 s, ~500 tokens, ~$0.011 | ~500 ms, 0 tokens, $0.00 |
| Generate standup | ~3 s, ~2000 tokens, ~$0.045 | ~800 ms, 0 tokens, $0.00 |
| Requires network | Yes | No |
| Requires API key | Yes | No |
| Works on a plane | No | Yes |

The local model times are actual measurements on a MacBook Pro M4 Pro, not synthetic benchmarks.

What it can't do

Apple's on-device model is a 3B parameter model with a 4096-token context window. It has clear limits:

  • Large diffs: above ~3000 characters of patch, the context is truncated. For massive refactors touching 20 files, the model only sees the statistical summary (--stat), not the complete patch. The commit message will be generic but correct in format.

  • Architectural decisions: "Should I use a protocol or a concrete type here?" is a question that needs project context, codebase history, and multi-step reasoning. That's still big model territory.

  • Code generation: foundation-hooks doesn't generate code. It generates metadata about code: commit messages, classifications, summaries. The boundary is clear: if the task is to "write" something a human will review, use the big model. If the task is to "label" something a human already wrote, use the local model.

  • macOS 26+ with Apple Silicon only: doesn't work on Linux, doesn't work on Intel Macs. For heterogeneous teams, the hook exits silently and the user writes their own message.

Installation

# Prerequisites: macOS 26, Xcode 26, Apple Intelligence enabled
git clone https://github.com/frr/foundation-hooks
cd foundation-hooks
make build

# Install hooks in a specific repo
make install-hooks REPO=/path/to/your/repo

# Install CLI binaries to ~/.local/bin
make install-lql

# Install hooks in all known repos (edit Makefile to adjust the list)
make install-all

The Makefile copies the compiled binaries directly to .git/hooks/. No runtime, no daemon, no configuration. If the binary is in the hook, it works. If you don't want AI on a commit, git commit --no-verify.

The thesis

Coding agents are extraordinary tools for tasks that require complex reasoning. But the current pricing model doesn't distinguish by task complexity. Every interaction with the model -- from designing an architecture to writing "fix: typo" -- goes through the same pipeline, at the same cost, with the same latency.

The solution isn't to stop using coding agents. It's to stop using them for everything. Classification, validation, and constrained generation tasks are solvable with a 3B parameter model running locally. The hardware is already in your machine. The framework is already in the operating system. Only the code to connect them was missing.

foundation-hooks is 400 lines of Swift connecting those dots. Run make install-hooks REPO=. and every commit generates its own message, every issue classifies itself, and every standup writes itself in 800 ms. No network, no tokens, no cost.

The surgeon can stop applying band-aids.
