DEV Community

Discussion on: I was paying 3x too much for AI APIs. Here's what I changed.

PEACEBINFLOW

This resonates. There's something quietly humbling about realizing you've been using a scalpel to spread butter.

The part that stuck with me isn't the caching or the prompt trimming—though both are solid—it's the psychological layer of it. We default to the "best" model not because we've evaluated the task, but because using the flagship feels like we're doing better work. There's a weird, subconscious prestige tied to picking GPT-4o or Sonnet, even when the output is functionally identical. It's like driving a sports car to check the mail.

Your rule about "transform this text a little" is a good heuristic. It makes me wonder how many of my own API calls are just glorified sed commands with a polite voice attached. We've gotten so good at offloading thinking to the cloud that we forgot some things just need a blunt instrument, not a philosopher.
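As a concrete (and purely hypothetical) version of that "glorified sed command" point: a surprising number of "transform this text a little" API calls are tasks like normalizing quotes or collapsing whitespace, which a couple of regexes handle for free. The function name and the specific transforms here are made up for illustration, not from the original post.

```python
import re

def normalize(text: str) -> str:
    """A blunt-instrument text transform: the kind of job that
    doesn't need a flagship model, or any model at all."""
    # Replace curly quotes with straight ones
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    # Collapse runs of spaces/tabs and trim the ends
    return re.sub(r"[ \t]+", " ", text).strip()

print(normalize('  \u201cHello\u201d   world '))  # "Hello" world
```

The point isn't that regexes replace models, just that it's worth asking which bucket a call falls into before reaching for the most expensive tool.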

Curious—did you notice any change in your own writing or code once you stopped expecting the model to handle all the nuance? I found that when I switched to cheaper models, I started being more precise with my prompts because I couldn't lean on the model's raw intelligence to fill in my vague gaps. It was an accidental discipline.

Phillip Tori

Yeah this is exactly it. The accidental discipline part hits.

Once I stopped throwing everything at Sonnet I realized how much of my prompting was basically "here's a vibe, figure out what I mean." The flagship models were papering over my own laziness. I'd write something ambiguous, the model would pick a reasonable interpretation, and I'd feel like it understood me. What actually happened was I got away with underspecifying. Cheaper models break that spell fast. When the reply is technically correct but obviously not what you meant, you sit there annoyed for a second before realizing the prompt could have meant three different things. That's a you problem, not a model problem.

The carryover into other writing was unexpected. I started catching the same vagueness in PR comments and in my own docs. When you've been forced to spell out exactly what "clean this up" means for a small model, you notice how often you're asking humans to do the same guessing.