I version every prompt I send to Claude. Here's why.

#ai #llm #prompts #discuss

Q1: Why did I start logging every prompt I send to Claude and Cursor?

A. Because I lost a Sunday afternoon in September 2025 trying to reconstruct a prompt I had nailed down three weeks earlier. The model's built-in history was useless: I could not search by the shape of the prompt I half-remembered. After rewriting it from scratch and getting a worse answer, I decided to treat my prompts the way I treat my commits: append-only, plain text, living in my editor.

Q2: What does each entry look like?

A. One block per non-trivial prompt. ISO timestamp, model, one-line task, the prompt verbatim, a few lines of outcome. No tags. No app. No CLI. A real entry from August 14, 2025:

## 2025-08-14T16:32+09:00  claude-sonnet-4-5  ios/captio
TASK: Make the share extension's preview card render in <50ms when the
host app passes a 4KB plain-text string.

PROMPT:
> I have an iOS share extension built in Swift. When the user shares
> plain text, I want to render a preview card showing the first 200
> chars and the character count. The preview is currently taking ~180ms
> and I think it is the AttributedString conversion. Show me a version
> that uses NSAttributedString and a CATextLayer instead, and explain
> the tradeoff in one paragraph.

OUTCOME:
- Worked first try. Dropped to 38ms median on iPhone 13.
- Tradeoff was correctly named: lost dynamic type support.
- Reused the pattern two weeks later for the keyboard ext.

The timestamp is the only required field. Everything else is optional. Half my entries have no OUTCOME: block because the prompt failed and I bailed. That's fine. A log that punishes you for being honest is a log you stop writing.
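
As a rough sketch, the share of entries that never got an OUTCOME: block could be counted straight from the file. promptstats is a hypothetical helper name, and the sketch assumes every entry opens with a "## " header line like the one in the example above and that no other line starts with "## ".

```shell
# Hypothetical helper: count entries, and entries with no OUTCOME: block.
# Assumes each entry starts with a "## <timestamp> ..." header line.
promptstats () {
  local log="${1:-$HOME/notes/prompts.log}"
  awk '
    # A new header closes the previous entry; count it if it had no OUTCOME.
    /^## /      { if (seen && !outcome) missing++; seen = 1; outcome = 0; total++ }
    /^OUTCOME:/ { outcome = 1 }
    # The last entry has no following header, so close it here.
    END         { if (seen && !outcome) missing++
                  printf "%d entries, %d without OUTCOME\n", total, missing }
  ' "$log"
}
```

Calling promptstats with no argument reads ~/notes/prompts.log; passing a path reads that file instead.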

Q3: How is this different from what ChatGPT, Claude, and Cursor already give me?

A. Three histories, three search bars, three different relevance algorithms. None of them index by the thing I actually remember about old prompts — the shape, not the literal first sentence. I remember "the one where I asked it to write a property wrapper that throttled writes." Built-in searches are bad at shape. grep over my own plaintext log is good at shape, because I named the shape in the TASK: line.
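
A minimal sketch of what "grep by shape" can look like, assuming the entry layout shown earlier, where the TASK: line sits directly under the "## " header. promptshape is a hypothetical name. The phrase is treated as a regular expression, and only the first line of a wrapped TASK: is searched.

```shell
# Hypothetical helper: show the header and TASK: line of every entry
# whose TASK: line mentions the given phrase (case-insensitive).
promptshape () {
  local q="$1" log="${2:-$HOME/notes/prompts.log}"
  # -B1 pulls in the "## timestamp model project" header above each match.
  # The second grep drops the "--" separators grep prints between groups.
  grep -i -B1 -- "^TASK:.*$q" "$log" | grep -v -- '^--$'
}
```

For example, promptshape "property wrapper" lists when, and with which model, each property-wrapper prompt was written.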

Q4: When does the log start paying back?

A. Around month three. Months one and two are pure capture overhead: there is nothing to recall yet, so the work feels one-directional. The flywheel starts when I notice that I am looking things up in prompts.log more often than I am opening a model's sidebar. I tally-marked the "wait, what worked last time?" moments on a sticky note for two months: down from roughly five per day to under one. That is at least four of my daily 90-second context breaks reclaimed, about six minutes a day, which compounds.

The second-order payback is harder to measure but more interesting. Reading my own log end-to-end on a Saturday morning in March (ninety minutes total) surfaced three pattern clusters I had not noticed while writing them. One: roughly two-thirds of my failed prompts were failures of context, not capability. The model could have solved the problem; I had not given it the surrounding code or constraints. Two: my highest-hit-rate prompts cluster around naming the exact file and stating the constraint as a hard number. Three: I rephrase the same five questions every month across different projects. That observation gave me a small templates/ folder of reusable prompt scaffolds, and dropped my average turns-to-correct-answer on those repeating shapes from about 3.4 to 1.6 over the next two months.

Q5: When is it overkill?

A. If you write fewer than maybe 20 non-trivial prompts a week, the capture cost outweighs the recall benefit. If your team already shares prompts in a repo or doc, that shared store is more valuable than a private log. If your work is mostly one-shot ("write me a regex," "fix this typo"), the recall path doesn't matter, because you'll never look the prompt up again. I am also not religious about the format. I piggyback on a journaling habit I already had. If you don't have an "open a text file and write something" muscle, a fancier tool may be a better starting point for you than a flat file.

Q6: What's the unexpected win that I didn't plan for?

A. The log became a personal RAG. In April I was trying to get Claude to write a Swift property wrapper, and after two unsatisfying turns I pasted about 60 lines from prompts.log (every previous time I had asked for a property wrapper, including the failures) into the conversation as context, and asked it to write a new one in the same style. The third turn was the answer I wanted. Now I run a short shell function:

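prompthist () {
  # Copy matching log entries, with surrounding context, to the macOS clipboard.
  local q="$1"
  [ -n "$q" ] || { echo "usage: prompthist <phrase>" >&2; return 1; }
  # -B1 keeps the entry header above a TASK: match; -A20 keeps the entry body.
  # "--" stops grep from reading a phrase that starts with "-" as an option.
  grep -B1 -A20 -i -- "$q" ~/notes/prompts.log \
    | head -n 400 \
    | pbcopy
  # tr strips the padding that macOS wc puts in front of the count.
  echo "Copied matching prompt history to clipboard ($(pbpaste | wc -l | tr -d ' ') lines)."
}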

Running prompthist "property wrapper" puts the context around every previous time I asked about property wrappers, capped at 400 lines, into my clipboard. The model reads my past failures and writes around them. This is grep, not embeddings. For 14,000 lines and a sample size of one user, grep is enough.

Q7: What did I get wrong about it?

A. Three things. I assumed I would need tags. I don't. I tried JSON for "structure," and stopped within a month because schema decisions ate the writing energy. I assumed the log would teach me about the model; it actually taught me about myself. The highest-hit-rate prompts I write are uniformly under 80 words, second-person, name the exact file or function I care about, and state the constraint as a hard number. The clever, multi-example prompts I was proud of in 2024 had a worse track record than the boring ones. I had to read 200 of my own failures in one Saturday morning to see that.

Q8: What would I change if I started over today?

A. Two things. I would co-locate the log with my git repo from day one. Mine sits at ~/notes/prompts.log and is symlinked into every project, but for the first six months it lived only in ~/notes/ and I kept forgetting to look at it inside a Cursor session. The fix was a Cmd+T jump-to-file shortcut to prompts.log from any workspace. The second change: I would version it. The log itself is now in a private git repo with daily auto-commits. The commit history of my prompt history has answered "when did I last care about X?" twice already, and it cost me one afternoon to set up.
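
A sketch of what the versioning half could look like, under stated assumptions: ~/notes is already a git repository, and promptcommit and promptwhen are hypothetical names. How promptcommit gets run daily (cron, launchd, a shell hook) is left open here.

```shell
# Hypothetical helper: commit prompts.log if it changed since the last commit.
# Assumes the directory (default ~/notes) is already a git repository.
promptcommit () {
  local dir="${1:-$HOME/notes}"
  git -C "$dir" add prompts.log || return 1
  # `git diff --cached --quiet` exits 0 when nothing is staged,
  # so unchanged days do not produce empty commits.
  if git -C "$dir" diff --cached --quiet; then
    echo "prompts.log unchanged, nothing to commit."
  else
    git -C "$dir" commit -q -m "prompts.log: $(date +%Y-%m-%d)"
  fi
}

# Hypothetical helper for "when did I last care about X?":
# -S lists commits that added or removed the phrase (case-sensitive).
promptwhen () {
  local q="$1" dir="${2:-$HOME/notes}"
  git -C "$dir" log -S"$q" --date=short --format='%ad %s' -- prompts.log
}
```

promptwhen "property wrapper" prints one dated line per commit in which that phrase entered or left the log, newest first.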

I would not start with anything fancier than that. Vector embeddings, RAG pipelines, a homegrown CLI: I have looked at all three and the marginal benefit over grep is, for one user and a five-figure number of lines, indistinguishable from zero in practice. The thing the log gives me is not retrieval; it is the habit of capturing in the first place. Every tool I have evaluated lowered the retrieval cost at the expense of raising the capture cost. That trade is bad for me.

Q9: This one's for you.

If you've kept a prompt log, or tried and abandoned one, I'd like to hear what made it stick for you, or what made you drop it. Especially if you switched away from a database or app back to a flat file, or went the other direction. Two sentences is plenty.

I build Captio-style Simple Memo — a one-screen iOS app that emails my note to my inbox in under half a second. I have shipped it alone since 2024. I post here whenever a habit changes how I work.

