DEV Community

Fernando Rodriguez

Posted on • Originally published at frr.dev

Your Mac Has a Free LLM and You're Not Using It

You're paying anywhere from $20 to $200 a month for access to LLMs. Claude, GPT, Gemini, whatever. And most of the calls you're making from your scripts and dev tools boil down to something like this:

  • "Classify this bug into one of these five categories"
  • "Name this variable for me"
  • "Tell me if this commit is a fix, feat, or refactor"
  • "Summarize this block of text in two sentences"

Meanwhile, your Apple Silicon Mac has a 3-billion-parameter language model baked right in, integrated with the operating system. No cost, no internet needed, no API key, no network latency. And you're probably not using it at all.

## The Foundation Models Framework

With macOS 26 (Tahoe), Apple unlocked access to the model that powers Apple Intelligence via the Foundation Models framework. It’s a native Swift framework, available on macOS 26, iOS 26, and iPadOS 26—on any Apple Silicon device that supports Apple Intelligence.

What’s remarkable isn’t just that it’s free (though that’s nice). What’s truly amazing is that it generates typed output in Swift. It doesn’t just give you a String that you have to struggle to parse with regex or prayer. It delivers a strongly-typed struct.

```swift
import FoundationModels

@Generable
struct CommitClassification {
    // One @Guide can carry both the description and the allowed values.
    @Guide(description: "The type of change", .anyOf(["fix", "feat", "refactor", "test", "docs", "chore"]))
    let type: String

    @Guide(description: "One-line summary of the change, max 72 chars")
    let summary: String
}
```

The `@Generable` macro tells the framework to generate a schema at compile time. The model then uses that schema to produce structured output. `@Guide` restricts the possible values. In plain language, you're putting the train on rails so it can't go off track.

To use it:

```swift
let session = LanguageModelSession(instructions: """
You are a commit message classifier. Given a git diff,
classify the change and write a summary.
""")

let diff = "..." // your git diff here
// respond(...) returns a response wrapper; the generated struct is in `content`.
let result = try await session.respond(
    to: "Classify this diff:\n\(diff)",
    generating: CommitClassification.self
).content

print("\(result.type): \(result.summary)")
// "fix: handle nil response in auth flow"
```

That's it. No `URLSession`. No API key. No parsing JSON. No `try? JSONDecoder().decode(WhoKnowsWhat.self, from: data)`. The model runs on-device, on the Neural Engine in your Mac, and spits out a Swift type validated by the compiler.
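
One caveat before you wire this into a script: the model is only there if the machine supports Apple Intelligence and has it turned on. Here is a minimal sketch of a pre-flight check, assuming the `SystemLanguageModel.default.availability` property from the framework. The function name `onDeviceModelProblem` is mine, not Apple's, and the `canImport` guard is only there so the same file still builds on machines without the framework.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical helper: returns nil when the on-device model can be used,
/// otherwise a short human-readable reason.
func onDeviceModelProblem() -> String? {
    #if canImport(FoundationModels)
    guard #available(macOS 26.0, iOS 26.0, *) else {
        return "OS is older than macOS 26 / iOS 26"
    }
    let availability = SystemLanguageModel.default.availability
    if case .available = availability {
        return nil
    }
    // Covers cases such as an ineligible device or Apple Intelligence being off.
    return "Model unavailable: \(availability)"
    #else
    return "FoundationModels is not part of this SDK"
    #endif
}

if let problem = onDeviceModelProblem() {
    print("Skipping on-device classification: \(problem)")
} else {
    print("On-device model is ready")
}
```

Calling this once at startup lets a hook or script fall back to a remote model, or simply skip the step, instead of failing on the first `respond` call.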

## What It’s Good For (and What It’s Not)

Here’s where honesty matters. Apple’s model has roughly 3 billion parameters and is optimized for energy efficiency and low latency, not for complex reasoning. Apple states this explicitly in its documentation: the model is designed for **classification, extraction, summarization, and similar tasks**. It’s not designed for advanced reasoning or encyclopedic knowledge.

In public benchmarks, Apple’s model scores ~44% on MMLU, lower than models like Llama 3.2 3B or Gemma 2 2B. Why? Because Apple prioritized running the model efficiently without draining your battery, not winning a pub trivia contest.

And that’s fine for what we’re discussing. A significant percentage of development tooling tasks don’t require deep reasoning. They require **quick classification with a controlled vocabulary**:

| Task                                              | Need GPT-4?       | Is Apple’s Model Enough? |
| ------------------------------------------------- | ----------------- | ------------------------ |
| Classify a commit as fix/feat/refactor            | No                | Yes                     |
| Generate a variable name from context             | No                | Yes                     |
| Summarize a compile error                         | No                | Yes                     |
| Decide if an issue is a bug or a feature          | No                | Yes                     |
| Classify the tone of a PR message                 | No                | Yes, with caution        |
| Design a distributed system architecture          | Yes               | No                      |
| Explain a subtle concurrency bug                  | Yes               | No                      |
| Write a complex algorithm from scratch            | Yes               | No                      |

The line is clear: if the task has a finite set of possible answers and the context fits into a few sentences, Apple’s model likely works. If you need reasoning over hundreds of lines of code with cross-dependencies, you need a bigger model.

## Copy-Pasteable Recipes

Let’s make this practical. Three examples you can copy and use today (well, once you get macOS 26).

### 1. Error Triage

```swift
import FoundationModels

@Generable
struct ErrorTriage {
    @Guide(.anyOf(["critical", "warning", "info", "noise"]))
    let severity: String

    // One @Guide can carry both the description and the allowed values.
    @Guide(description: "Which team should handle this", .anyOf(["backend", "frontend", "infra", "ignore"]))
    let owner: String

    @Guide(description: "One sentence explaining the issue")
    let summary: String
}

let session = LanguageModelSession(instructions: """
You triage error messages from a CI pipeline.
Classify severity and assign to the right team.
""")

let error = "FATAL: column 'user_id' does not exist"
// The generated struct is in the response's `content`.
let triage = try await session.respond(
    to: "Triage: \(error)",
    generating: ErrorTriage.self
).content
// severity: "critical", owner: "backend",
// summary: "Missing column in database schema"
```

This runs in milliseconds. No internet connection. You can integrate it into pre-commit hooks, local CI scripts, or even a menu bar notifier. Imagine a monitor that classifies your pipeline errors and alerts you about only the critical ones—all running on your Mac, with zero server calls.
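
To make the "only the critical ones" idea concrete, here is a small sketch of the filtering step that would sit between the model and your notifier. `TriagedError` is a hypothetical plain-Swift mirror of the `ErrorTriage` fields, so this part has no dependency on the framework, and `alertLines` is a name I made up for illustration.

```swift
/// Hypothetical plain-Swift mirror of the `ErrorTriage` fields.
struct TriagedError {
    let severity: String
    let owner: String
    let summary: String
}

/// Keeps only the results worth interrupting someone for,
/// formatted as "[owner] summary".
func alertLines(from results: [TriagedError]) -> [String] {
    results
        .filter { $0.severity == "critical" }
        .map { "[\($0.owner)] \($0.summary)" }
}

let sample = [
    TriagedError(severity: "critical", owner: "backend",
                 summary: "Missing column in database schema"),
    TriagedError(severity: "noise", owner: "ignore",
                 summary: "Deprecation notice from a dependency"),
]

for line in alertLines(from: sample) {
    print(line)
}
// prints "[backend] Missing column in database schema"
```

Because `severity` is limited to four known strings by `.anyOf`, a plain string comparison is enough here. There is no fuzzy matching on free-form model output.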

### 2. Naming Assistant

```swift
import FoundationModels

@Generable
struct NamingSuggestion {
    @Guide(description: "camelCase name for the variable or function")
    let name: String

    @Guide(description: "Why this name is appropriate")
    let reasoning: String
}

let session = LanguageModelSession(instructions: """
You suggest variable and function names following
Swift naming conventions (camelCase, descriptive,
no abbreviations except standard ones like URL, ID).
""")

let context = "A function that takes a list of timestamps and returns the average interval between consecutive entries"
// The generated struct is in the response's `content`.
let suggestion = try await session.respond(
    to: "Suggest a name for: \(context)",
    generating: NamingSuggestion.self
).content
// name: "averageIntervalBetweenTimestamps"
```


### 3. Commit Message Generator

```swift
import Foundation
import FoundationModels

@Generable
struct CommitMessage {
    @Guide(.anyOf(["fix", "feat", "refactor", "test", "docs", "chore"]))
    let type: String

    @Guide(description: "Scope of the change, e.g., auth, ui, db")
    let scope: String

    @Guide(description: "Imperative summary, max 50 chars")
    let subject: String
}

let session = LanguageModelSession(instructions: """
Generate a conventional commit message from a git diff.
Use imperative mood. Be concise.
""")

let diff = try String(contentsOfFile: "/tmp/current.diff", encoding: .utf8)
// The generated struct is in the response's `content`.
let msg = try await session.respond(
    to: "Generate commit message:\n\(diff)",
    generating: CommitMessage.self
).content
print("\(msg.type)(\(msg.scope)): \(msg.subject)")
// "fix(auth): handle expired token in refresh flow"
```
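
One thing to keep in mind: as far as I understand it, a `@Guide(description:)` such as "max 50 chars" is a hint to the model, not a hard limit like `.anyOf`. If the length matters, enforce it in code. A minimal sketch, where `commitHeader` is a hypothetical helper and the 50-character default simply mirrors the description above:

```swift
import Foundation

/// Hypothetical helper: builds "type(scope): subject" and enforces the
/// subject length in code instead of trusting the model to respect it.
func commitHeader(type: String, scope: String, subject: String,
                  maxSubjectLength: Int = 50) -> String {
    let trimmed = subject.trimmingCharacters(in: .whitespacesAndNewlines)
    let clipped = String(trimmed.prefix(maxSubjectLength))
    // Omit the parentheses when the model returns an empty scope.
    let prefix = scope.isEmpty ? type : "\(type)(\(scope))"
    return "\(prefix): \(clipped)"
}

print(commitHeader(type: "fix", scope: "auth",
                   subject: "handle expired token in refresh flow"))
// prints "fix(auth): handle expired token in refresh flow"
```

Hard truncation is the bluntest possible policy. Rejecting the result and asking the model again is another option, at the cost of a second call.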


## Real-Life Example: SentimentKit and Apple’s Irony

I maintain an open-source project called [SentimentKit](https://github.com/frr/sentimentkit)—a Swift framework for sentiment analysis specialized in technical text. It exists because Apple’s official NLP tool, `NLTagger`, is hilariously bad at handling developer messages.

How bad? `NLTagger` scores "delete the temp file" as **-0.8** (very negative). "run make test" gets -0.6. "commit and push," -0.4. According to Apple’s sentiment analysis, programming is emotionally devastating. Deleting a temp file is basically a death threat.
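
If you want to check this on your own machine, here is a minimal sketch using `NLTagger` with the `.sentimentScore` scheme from the NaturalLanguage framework. `sentimentScore(for:)` is a hypothetical wrapper, the `canImport` guard just lets the file build on non-Apple platforms, and the exact numbers you get may differ from mine depending on the OS version.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical wrapper: returns NLTagger's sentiment score
/// (documented range -1.0 to 1.0), or nil when no score is available.
func sentimentScore(for text: String) -> Double? {
    #if canImport(NaturalLanguage)
    guard !text.isEmpty else { return nil }
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (tag, _) = tagger.tag(at: text.startIndex,
                              unit: .paragraph,
                              scheme: .sentimentScore)
    // The score arrives as a string in the tag's raw value.
    return tag.flatMap { Double($0.rawValue) }
    #else
    return nil // NaturalLanguage only ships on Apple platforms
    #endif
}

for phrase in ["delete the temp file", "run make test", "commit and push"] {
    let score = sentimentScore(for: phrase).map { String($0) } ?? "n/a"
    print("\(phrase): \(score)")
}
```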

No one noticed sooner because literally no one uses `NLTagger` for serious work. It doesn’t show up in a single academic paper on sentiment analysis in software engineering. We tried it, documented the bias, and built something better.

[...]