DEV Community

Harry Floyd

Posted on • Originally published at telegra.ph

The Decision Subtraction Framework: How to Evaluate Any AI Tool

Last week someone asked me which AI tools they should be using. The question hides a problem that costs real money: there are more capable AI tools available than any single person can evaluate.

ChatGPT Plus at $20/month. Claude at $20. Grok at $30. Cursor at $20. Copilot at $10. Each with a $100, $200, or $300 tier above it. Each claims to earn its place.

The real question is not which tool is best. The real question is: which tools subtract more decisions than they add?

The Three Lenses

1. Replacement Ratio

Formula: decisions replaced by the tool ÷ decisions it creates.

List every decision the tool makes for you. Then list every new decision it forces you to make. Divide the first by the second.

Thresholds:

  • ≥ 2.0 → Keep
  • 1.0 to under 2.0 → Evaluate
  • < 1.0 → Drop

Example: A code completion tool that writes a function body (replaces 5 decisions about syntax, structure, and naming) but requires review (adds 2 decisions about correctness) has a ratio of 5 ÷ 2 = 2.5. It passes.

A meeting summariser that replaces 1 decision (should I re-listen?) but creates 3 (verify accuracy, add context, decide what to share) has a ratio of 1 ÷ 3 ≈ 0.33. It fails.
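The ratio and thresholds above can be sketched in a few lines of Python. The function names are hypothetical, and two details are assumptions the article does not spell out: a ratio of exactly 2.0 is treated as Keep, and a tool that creates no new decisions gets an unbounded ratio.

```python
def replacement_ratio(replaced: int, created: int) -> float:
    """Decisions the tool replaces divided by decisions it creates."""
    if created == 0:
        # Assumption: no new decisions means an unbounded ratio.
        return float("inf")
    return replaced / created


def ratio_verdict(ratio: float) -> str:
    """Map a ratio onto the Keep / Evaluate / Drop thresholds."""
    if ratio >= 2.0:
        return "Keep"
    if ratio >= 1.0:
        return "Evaluate"
    return "Drop"


# The two examples from the text.
code_completion = replacement_ratio(replaced=5, created=2)
meeting_summariser = replacement_ratio(replaced=1, created=3)

print(code_completion, ratio_verdict(code_completion))                  # 2.5 Keep
print(round(meeting_summariser, 2), ratio_verdict(meeting_summariser))  # 0.33 Drop
```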

2. Friction Delta

Formula: time without the tool ÷ time with the tool.

Count onboarding time as part of the time with the tool, amortised over your first 10 uses. Break-even is onboarding time ÷ time saved per use: a tool that saves 30 minutes per use but took 2 hours to learn breaks even at 4 uses (120 ÷ 30). After that, it is pure gain.

Threshold: Break-even within 5 uses.

Catch: This lens breaks for tools that enable tasks you could not do at all before. A drug discovery simulation has an effectively infinite Friction Delta because there is no "time without the tool" to compare against. Score those as "can't evaluate on this lens" and rely on the others.
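As a rough sketch of the arithmetic, the break-even count and the amortised ratio might be computed like this. All function and parameter names are hypothetical, and the 45-minute and 15-minute task times in the last line are made-up numbers chosen only so that the saving is 30 minutes per use.

```python
import math
from typing import Optional


def break_even_uses(onboarding_min: float, saved_per_use_min: float) -> Optional[int]:
    """Uses needed before time saved repays onboarding time.

    Returns None when the tool saves no time per use (it never breaks even).
    """
    if saved_per_use_min <= 0:
        return None
    return math.ceil(onboarding_min / saved_per_use_min)


def friction_delta(without_min: float, with_min: float,
                   onboarding_min: float = 0.0, amortise_over: int = 10) -> float:
    """Time without the tool divided by time with it.

    Onboarding is spread evenly over the first `amortise_over` uses.
    """
    return without_min / (with_min + onboarding_min / amortise_over)


def passes_friction(uses: Optional[int], limit: int = 5) -> bool:
    """Threshold from the text: break-even within 5 uses."""
    return uses is not None and uses <= limit


# The example from the text: 2 hours to learn, 30 minutes saved per use.
uses = break_even_uses(onboarding_min=120, saved_per_use_min=30)
print(uses, passes_friction(uses))  # 4 True

# Hypothetical task times: 45 minutes by hand, 15 minutes with the tool.
print(round(friction_delta(45, 15, onboarding_min=120), 2))  # 1.67
```

For a capability-expanding tool there is no `without_min` to pass in, which is one way to see why the lens cannot score it.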

3. Attention ROI

Formula: output quality ÷ attention consumed.

Estimate cognitive load per use on a simple scale: 1 (fire and forget) to 4 (full attention required). Track whether it goes up or down over 10 uses.

Threshold: Attention per use should decrease over time. If you need to watch it more closely after ten uses than after one, something is wrong.
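One simple way to check the trend, assuming you log a 1 to 4 attention score after each use, is to compare the average of the earlier uses with the average of the later ones. This is only a sketch: the half-versus-half comparison and the name `attention_trend` are choices made here, not part of the framework.

```python
from statistics import mean
from typing import Sequence


def attention_trend(scores: Sequence[int]) -> str:
    """Compare mean attention in the first half of uses with the second half."""
    if len(scores) < 2:
        raise ValueError("need at least two logged uses")
    if any(s < 1 or s > 4 for s in scores):
        raise ValueError("scores must be on the 1-4 scale")
    half = len(scores) // 2
    early = mean(scores[:half])
    late = mean(scores[-half:])
    if late < early:
        return "decreasing"
    if late > early:
        return "increasing"
    return "flat"


# Ten hypothetical logged uses of one tool.
print(attention_trend([4, 4, 3, 3, 3, 2, 2, 2, 1, 1]))  # decreasing
print(attention_trend([3] * 10))                         # flat
```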

Where This Framework Lies to You

I tested this framework against the hardest cases I could find. It failed in five ways. Knowing them makes it useful:

  1. Decision quality matters more than quantity. One high-stakes judgment (should I deploy?) outweighs 10 trivial picks (camelCase or snake_case?). Weight each decision by its stakes before you count it.

  2. Friction Delta can't measure capability expansion. If a tool lets you do something new rather than just faster, skip this lens.

  3. Attention ROI rewards deskilling. The decreasing-attention threshold is a Goodhart target: it rewards tools that train you to rubber-stamp.

  4. Erasure cost is invisible. The framework never asks: if I use this for a year, what can I no longer do without it?

  5. Error asymmetry is invisible. Two tools can score identically while producing catastrophically different results when they fail.
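The first failure can be illustrated by weighting decisions instead of counting them. In the sketch below the weights (1 for a trivial pick, 20 for a deploy decision) are invented for illustration; the article gives no weighting scale.

```python
from typing import Sequence


def weighted_ratio(replaced_weights: Sequence[float],
                   created_weights: Sequence[float]) -> float:
    """Replacement Ratio computed on stakes-weighted decisions."""
    total_created = sum(created_weights)
    if total_created == 0:
        # Assumption: no created decisions means an unbounded ratio.
        return float("inf")
    return sum(replaced_weights) / total_created


trivial_picks = [1] * 10   # ten naming/style picks, weight 1 each (hypothetical)
deploy_call = [20]         # one "should I deploy?" judgment, weight 20 (hypothetical)

# Unweighted count: 10 replaced vs 1 created looks like a clear Keep.
print(len(trivial_picks) / len(deploy_call))       # 10.0
# Weighted by stakes, the same tool falls below 1.0.
print(weighted_ratio(trivial_picks, deploy_call))  # 0.5
```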

The Fourth Lens: Erasure Cost

Ask: "If I use this tool for six months and then stop, what skill will I have lost?"

Score it from 1 (nothing lost) to 4 (core competency outsourced). A score of 1-2 is safe. A score of 3 is a deliberate trade. A score of 4 is dependency, not tooling.
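If you keep scores in a spreadsheet or script, the scale maps to labels directly. A minimal sketch, with a hypothetical function name:

```python
def erasure_verdict(score: int) -> str:
    """Translate an Erasure Cost score (1-4) into the labels used above."""
    if score not in (1, 2, 3, 4):
        raise ValueError("Erasure Cost is scored from 1 to 4")
    if score <= 2:
        return "safe"
    if score == 3:
        return "deliberate trade"
    return "dependency"


for score in (1, 2, 3, 4):
    print(score, erasure_verdict(score))
```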

How to Apply: Monday Morning

  1. List every AI tool you have used in the last 30 days
  2. Score Replacement Ratio and Friction Delta for each
  3. Both pass → Keep. One fails → 7-day trial. Both fail → Cancel
  4. Score Erasure Cost for the survivors
  5. When evaluating a new tool: score it before subscribing
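Step 3 is a small decision table, sketched here with a hypothetical `triage` function that takes the pass/fail result of the first two lenses.

```python
def triage(ratio_passes: bool, friction_passes: bool) -> str:
    """Both pass: keep. One fails: 7-day trial. Both fail: cancel."""
    passed = int(ratio_passes) + int(friction_passes)
    return {2: "Keep", 1: "7-day trial", 0: "Cancel"}[passed]


print(triage(True, True))    # Keep
print(triage(True, False))   # 7-day trial
print(triage(False, False))  # Cancel
```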

Worked Examples

ChatGPT Plus ($20/month)

  • Replacement Ratio: 3.5. Replaces research lookups, drafting, formatting. Creates verification and prompt decisions. Pass.
  • Friction Delta: Breaks even in 2-3 uses. Shallow learning curve. Pass.
  • Attention ROI: Decreasing. Gets faster as you learn its patterns. Pass.
  • Erasure Cost: 2. The underlying skill (structuring an argument) is reinforced, not replaced.
  • Verdict: Keep.

Cursor Pro ($20/month)

  • Replacement Ratio: 4.0. Replaces syntax lookups, boilerplate, function structure. Creates code review decisions. Pass.
  • Friction Delta: Breaks even in 1-2 uses. Tab completion is instant. Pass.
  • Attention ROI: Steeply decreasing. Pass.
  • Erasure Cost: 3. Heavy users report difficulty writing syntax without it after 3+ months. A deliberate trade worth making.
  • Verdict: Keep for daily coding. Monitor erasure.

Meeting Summariser ($20/month, anonymised)

  • Replacement Ratio: 0.33. Replaces 1 decision. Creates 3. Fails.
  • Friction Delta: Never breaks even. You still attend the meetings and still verify the summaries. Fails.
  • Attention ROI: Flat. Every summary must be checked with the same level of attention. Fails.
  • Erasure Cost: 2. Minor skill atrophy.
  • Verdict: Cancel.
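The three scorecards above can be replayed as data. The sketch below makes several assumptions: a Replacement Ratio of 2.0 or more counts as a pass, break-even is taken as the upper end of each quoted range (so "2-3 uses" becomes 3), `None` stands for "never breaks even", and an Erasure Cost of 3 or more adds a "monitor erasure" note. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scorecard:
    name: str
    ratio: float                    # Replacement Ratio
    break_even_uses: Optional[int]  # None means the tool never breaks even
    erasure: int                    # Erasure Cost, 1-4

    def verdict(self) -> str:
        ratio_ok = self.ratio >= 2.0
        friction_ok = self.break_even_uses is not None and self.break_even_uses <= 5
        if ratio_ok and friction_ok:
            return "Keep, monitor erasure" if self.erasure >= 3 else "Keep"
        if ratio_ok or friction_ok:
            return "7-day trial"
        return "Cancel"


cards = [
    Scorecard("ChatGPT Plus", ratio=3.5, break_even_uses=3, erasure=2),
    Scorecard("Cursor Pro", ratio=4.0, break_even_uses=2, erasure=3),
    Scorecard("Meeting summariser", ratio=0.33, break_even_uses=None, erasure=2),
]

for card in cards:
    print(f"{card.name}: {card.verdict()}")
# ChatGPT Plus: Keep
# Cursor Pro: Keep, monitor erasure
# Meeting summariser: Cancel
```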

This framework connects to a deeper structural principle: a tool's value is the difficulty it removes. If it creates new difficulty of a different kind, it is not a tool. It is a job.

Full framework with diagram: https://telegra.ph/The-Decision-Subtraction-Framework-How-to-Evaluate-Any-AI-Tool-05-28
