Models are becoming utilities and apps get cloned in a week. I think the moat is one layer upstream: the moment before you've even finished the thought.
The whole industry is grinding on one thing right now: making AI answer better. Bigger models, sharper prompts, another wrapper app every week.
The more I build, the more convinced I am that the valuable part isn't on the answer side at all.
Look at the two layers everyone's fighting over. The model layer is turning into a utility — you rent it by the token, and everyone rents the same thing. The app layer is a cloning race: one feature takes off, and someone ships a copy within the week. Long term, neither layer is defensible. You can't build a moat out of something everyone can rent, or something anyone can copy.
So where does a moat actually fit?
In a place almost nobody is seriously working on: the thought you haven't finished having yet.
Every AI tool today stands downstream of "you already figured out what you want." You sort the idea out in your head, translate it into a prompt, and feed it to the machine. We even turned that translation step into a skill with a name — prompt engineering. Which, if you stop and look at it, is backwards: it's the human accommodating the machine. That's not a stable arrangement. It never has been, for any technology.
Upstream of all that is the moment an idea first surfaces, when you can't even articulate it to yourself yet. Whoever catches you there decides everything that happens after: which model gets called, which app opens, which path you take. The entire chain downstream gets routed by that first touch.
We've seen this movie. The search box won the internet not by having the best pages, but by owning the first moment of wanting to find something. Own that moment and you distribute everything behind it.
The intent layer is this generation's search box. Whoever owns it takes the biggest piece on the table.
So what is it, concretely?
A translation layer: from the fuzzy thing in your head to something a machine can execute precisely. It doesn't belong to any model or any app — it sits in front of all of them. Today that translation is done by hand, by you, and we call it "writing prompts." What I want to do is take that job away from the human and hand it to the machine.
Three concrete bets on how
One: stop making the human explain. Flip it — let the machine read your context and work out what you're doing. Most of what you want is already written in the thing you're looking at and acting on. You shouldn't have to say it again.
Two: do this where the context is richest. That's why my first move is the browser, not an input method editor (IME). An IME only knows you're typing. The page knows which box you're typing into, against what content, and for what purpose, which is far more signal. At this stage, depth of context beats breadth of coverage, and it's not close.
Three: in the middle you need a component whose only job is recognizing intent — compressing a pile of messy context into one precise, executable intent that everything downstream can run with. That's the core I'm still grinding on myself. I won't pretend it's solved.
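To make the shape of that component concrete, here is a minimal sketch of the interface only: page context goes in, one structured intent comes out. Everything in it is an assumption for illustration. `PageContext`, `Intent`, `recognize_intent`, and the field names are hypothetical, and the keyword rules are a toy stand-in for the recognizer, which is the unsolved part.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageContext:
    """Hypothetical snapshot of what a browser page can see (assumed fields)."""
    url: str
    focused_field_label: str  # which box the user is typing into
    nearby_text: str          # the content that box sits against
    draft: str                # what the user has typed so far


@dataclass(frozen=True)
class Intent:
    """One precise, executable intent for downstream models or apps (assumed shape)."""
    action: str
    target: str
    params: dict = field(default_factory=dict)


def recognize_intent(ctx: PageContext) -> Intent:
    """Toy rule-based stand-in: compress messy context into a single intent.

    A real recognizer would be far richer; these keyword checks only
    illustrate the input and output of the component.
    """
    label = ctx.focused_field_label.lower()
    if "reply" in label or "comment" in label:
        return Intent(
            action="draft_reply",
            target=ctx.url,
            params={"in_response_to": ctx.nearby_text[:80], "seed": ctx.draft},
        )
    if "search" in label:
        return Intent(action="search", target=ctx.url, params={"query": ctx.draft})
    # Not enough signal: say so instead of guessing.
    return Intent(action="unknown", target=ctx.url, params={"seed": ctx.draft})


if __name__ == "__main__":
    ctx = PageContext(
        url="https://example.com/thread/1",
        focused_field_label="Write a reply",
        nearby_text="Has anyone benchmarked this?",
        draft="yes, we",
    )
    print(recognize_intent(ctx))
```

The point of the sketch is the boundary: the human never writes a prompt, and everything downstream only ever sees the `Intent`.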
One line I hold hard
The system only goes as far as preparing the intent. Whether to fire it — that last press — is always yours. I'll pave the road right up to your feet. I won't take the step for you.
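In code, that line could look like a gate between preparing and firing. This is a sketch under my own assumptions, not a finished design: `PreparedIntent`, `fire_if_confirmed`, and the `confirm` callback are hypothetical names, and the callback stands in for whatever the real "last press" ends up being.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PreparedIntent:
    """An intent that is ready to run but has not run yet (hypothetical type)."""
    description: str          # what the user is shown before deciding
    run: Callable[[], str]    # the side effect, deferred until confirmation


def fire_if_confirmed(
    prepared: PreparedIntent,
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Execute the prepared intent only if the user explicitly confirms.

    The system never calls `run` on its own; `confirm` is the user's press.
    Returns the result of `run`, or None if the user declined.
    """
    if confirm(prepared.description):
        return prepared.run()
    return None


if __name__ == "__main__":
    prepared = PreparedIntent(
        description="Post reply: 'yes, we benchmarked it'",
        run=lambda: "posted",
    )
    # Stand-in for a real UI prompt: here the user declines, so nothing fires.
    print(fire_if_confirmed(prepared, confirm=lambda text: False))
```

Keeping the side effect inside `run` means there is exactly one code path that can fire it, and that path always goes through the user.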
This one isn't written for people passing by. If you're seriously thinking about where this layer lives and who ends up owning it — I want to meet you. Say hi: lingchong@iterant-ai.com.