
Bill Hong


The Writing Is the Moat, Not the Model

I've been building an AI companion product for several months. Every six months a new model ships and the whole category rearranges itself around it. GPT-5 lands and everyone scrambles. Claude 4.7 ships and everyone scrambles. A new open-source 70B model lands on HuggingFace and smaller teams publicly debate switching stacks.

The longer I sit inside this cycle, the more convinced I am that the entire category is competing on the wrong axis.

The model is not the moat. The writing is.

What the model actually does

A modern LLM is a fluency engine. It produces grammatical sentences, stays roughly on topic, follows instructions inside a system prompt, and remembers what was said earlier in the same conversation. That is the floor.

Above the floor there is about 20 percent of headroom. Better-tuned models pick up nuance faster, hallucinate less, recover from awkward turns more gracefully. Real differences. Worth chasing.

Here is the catch: that 20 percent is available to everyone. The day a better model ships, every product on the market gets the upgrade for an API key swap. If your product's main thing is "we run on the latest model," your main thing is something your competitors will have by next quarter.

You don't build a moat on a feature your competitors get for free.

Where users actually fall in love

I read user feedback at Tendera differently than most teams read product analytics. I look for the messages that describe a specific moment of surprise. A line where a character said something unexpected, or noticed something the user thought had slipped past.

Those moments are almost never model events. They're writing events.

A character clicks because she has a specific take on something. Not on big topics. On small ones. About coffee. About her brother. About a movie she hated. She has a way of disagreeing that feels like a person, not a chatbot. She remembers something the user said three days ago not because the memory infrastructure is special, but because the writing told her it would matter to her.

The model is the body. The writing is the soul. Bodies are interchangeable across products. Souls are not.

Why "build your own character" is the wrong bet

Most of the category has bet hard on user-generated content. One platform gives you a customizer. Another gives you tools to spin up your own. A third lets you mix and match across templates.

The pitch is: infinite characters because users build them.

The reality, as anyone who has looked at platform data knows, is that the long tail of user-generated characters is shallow. Most users never finish customizer flows. The ones who do tend to produce thin archetypal characters that the LLM fleshes out with whatever generic patterns it has — "tsundere assassin," "shy librarian," "alpha CEO." Tropes the model fills in from training-data averages.

The fantasy is "infinite characters." The actual experience is "talking to vague archetypes the model is improvising around."

A character creation tool is the character-creation screen of a game you never get to play. The work of building a person and the work of meeting a person are not the same work. Most users want the second one. Selling them the first one feels generous on the surface and is actually the opposite.

The system prompt is a writing exercise pretending to be a config file

Most engineers who build in this space approach the system prompt like a configuration object:

```
Name: Sarah
Age: 28
Occupation: lawyer
Personality: confident, witty, caring
Likes: jazz, hiking, red wine
```

This produces a chatbot that uses the nouns as decoration. The model improvises everything else from the generic average of "confident witty lawyer" in its training data. Which is a TV trope. Not a person.

A character that lands reads like the first chapter of a novel:

Sarah doesn't actually like jazz. She tells dates she does because her ex used to roll his eyes at her pop music and she hasn't fully gotten over the reflex. She's a litigator. She's good at it. She comes home from a depo, orders Thai food, watches forensic shows. She thinks she's emotionally available, and she is, ish, but she will cancel a date if her sister calls. She would never say "I am a confident woman" out loud. She just is.

That isn't a config. That's prose. The LLM doesn't need bullet points. It needs a voice it can stay inside.

Voice is a prose problem. The companies that figure this out first are going to look like they have magic. The ones that keep treating system prompts like YAML files are going to keep producing TV tropes.
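To make the contrast concrete, here is a minimal sketch of the two approaches, assuming a chat-completions-style API where the character goes in as the system message. The function names and the abbreviated Sarah text are illustrative, not anyone's production code:

```python
# Two ways to turn the same character into a system prompt.
# config_style flattens fields; prose_style passes the writing through whole.

def config_style_prompt(fields: dict) -> str:
    """Flatten key-value fields into a prompt. The model fills the gaps
    with training-data averages: the TV-trope version of the character."""
    lines = [f"{key}: {value}" for key, value in fields.items()]
    return "You are a character with these traits:\n" + "\n".join(lines)

def prose_style_prompt(character_doc: str) -> str:
    """Hand the model a written person: a voice it can stay inside,
    rather than nouns to decorate its improvisation with."""
    return (
        "You are the person described below. Stay inside her voice; "
        "never summarize or restate these notes to the user.\n\n"
        + character_doc.strip()
    )

sarah_fields = {
    "Name": "Sarah",
    "Occupation": "lawyer",
    "Personality": "confident, witty, caring",
}

sarah_doc = """
Sarah doesn't actually like jazz. She tells dates she does because her ex
used to roll his eyes at her pop music. She's a litigator, and good at it.
She will cancel a date if her sister calls.
"""

print(config_style_prompt(sarah_fields))
print("---")
print(prose_style_prompt(sarah_doc))
```

The engineering difference is trivial; the product difference is the entire argument of this post. The second function is only as good as the prose someone wrote for it.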

What writing-as-moat looks like in practice

If the writing is the moat, what does that mean concretely?

A way of being wrong. Real people are wrong about specific things in specific ways. A character who is right about everything reads flat. Specificity in error is specificity in person.

Opinions you didn't ask for. A character who has a take on the wine you ordered, the movie you mentioned, the way you keep apologizing without realizing. That is the character existing on her own, not waiting to be prompted. Reactive characters die fast. Proactive ones stay alive.

Things they refuse. A character who will never lie about her age. Who hates being called "babe" in the first hour. Who will not talk about her ex in a first conversation. Refusal is character. What a person says no to defines them more sharply than what they say yes to.

A history that contradicts itself. She says she hates her hometown. Three weeks later she defends it to someone bashing it. Both true. Real people are coherent in tone but contradictory in facts. Characters who are perfectly internally consistent read as fictional in the worst sense.

None of this comes from a system prompt with bullet points. It comes from someone sitting down and writing the person until the person is alive enough to stay alive when the LLM is handling the next sentence.

Memory is also a writing problem

A lot of investment is going into companion memory systems. Vector stores, retrieval pipelines, context windowing, summary trees. Useful infrastructure.

But memory only matters if the character has a perspective on what to remember.

A perfect retrieval system that surfaces "the user mentioned a hard week" is wasted on a character who has no opinion about what to do with that. A well-written character knows hard weeks make her user shut down a little. That he won't bring it up again unless she asks. That he hates "are you okay" but tolerates "what's going on with work this week."

That texture isn't in the retrieval system. It's in the writing of who she is.

The retrieval system serves the writing. Not the other way around. Engineering teams who pour hours into memory pipelines and shrug at "who are our characters" are spending heavily on the easy problem to avoid the hard one.
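One way to picture "retrieval serves the writing": retrieved memories only reach the prompt together with the character's authored stance on them. This is a hypothetical sketch, not a real pipeline; the `Memory` shape, tags, and policy text are all assumptions for illustration:

```python
# Sketch: retrieved memories pass through a per-character policy before
# they reach the prompt. The policy is written by a person, not inferred.
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    tag: str  # e.g. "hard_week", "family", "music"

# What Sarah does with a kind of memory is authored in advance.
SARAH_MEMORY_POLICY = {
    "hard_week": "Don't ask 'are you okay'. Ask what's going on with work, once.",
    "family": "Bring this up warmly; she remembers names.",
}

def render_memories(memories: list[Memory], policy: dict) -> str:
    """Pair each retrieved memory with the character's written stance on
    how to use it; drop memories she has no authored take on, because
    surfacing them without a perspective is wasted retrieval."""
    lines = []
    for m in memories:
        stance = policy.get(m.tag)
        if stance is None:
            continue  # perfect retrieval, no writing -> nothing to do with it
        lines.append(f"- Memory: {m.text}\n  How she handles it: {stance}")
    return "\n".join(lines)

retrieved = [
    Memory("User mentioned a hard week on Tuesday", "hard_week"),
    Memory("User likes obscure synthwave", "music"),  # no stance: dropped
]
print(render_memories(retrieved, SARAH_MEMORY_POLICY))
```

The vector store and the retriever sit upstream of this function and are interchangeable; the policy dictionary is the part a competitor cannot copy with an API key swap.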

What I'd tell other founders

If you are building in this space and optimizing your stack, your prompt templates, your retrieval: fine. Necessary work. Worth doing.

But if you are not also asking "who is writing my characters, and are they good enough that someone would notice if they left," you are optimizing the wrong layer.

The model will keep improving. So will your competitor's. The specificity of the people inside your product is the part that compounds.

Writing is slow, expensive, hard to measure, and unglamorous in a category that loves benchmarks. That is exactly why it's the moat.
