signalscout
I Analyzed 215 of My ChatGPT Conversations. Here's My "Usage DNA."

Everyone talks about prompt engineering. Nobody talks about prompt patterns — the habits you don't know you have.


The Setup

I exported my ChatGPT history and ran it through an analysis pipeline I built. Not a scraper — I used OpenAI's official data export, then wrote Python to cluster topics, classify intents, detect conversation loops, and fingerprint my prompting style.

Think of it as Spotify Wrapped, but for your AI usage.

Here's what 215 conversations, 695 messages, and 25,618 words revealed about how I actually use AI.

My Usage DNA

| Metric | Value |
| --- | --- |
| Average prompt length | 39.5 words |
| Median prompt length | 23 words |
| Vocabulary richness | 0.18 (4,610 unique / 25,618 total) |
| Avg conversation length | 6.7 turns |
| Most active hour | 12 AM ET (4 UTC) |
| Most active day | Monday |
| Sessions per week | 43 |

The median (23 words) vs average (39.5) gap is telling. Most of my prompts are short commands. But when I go long, I go long — dragging the average up. I'm either firing off "fix this" or writing a paragraph of context. There's no middle.
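That gap is easy to check on your own export. The core stats are a few lines of stdlib Python (a sketch, not the pipeline's exact code):

```python
from statistics import mean, median

def usage_stats(prompts):
    """Word-count and vocabulary stats for a list of prompt strings."""
    words_per_prompt = [len(p.split()) for p in prompts]
    all_words = [w.lower() for p in prompts for w in p.split()]
    return {
        "mean_len": round(mean(words_per_prompt), 1),
        "median_len": median(words_per_prompt),
        # mean well above median = right-skewed: mostly short commands,
        # plus a tail of long context dumps dragging the mean up
        "right_skewed": mean(words_per_prompt) > 1.5 * median(words_per_prompt),
        "vocab_richness": round(len(set(all_words)) / len(all_words), 2),
    }
```

The 1.5x skew threshold is my arbitrary cut-off; my own 39.5-vs-23 split clears it comfortably.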

43 sessions per week means I'm opening ChatGPT about 6 times a day. That's less than I expected. It feels like I live in the chat window, but apparently I batch my usage into focused sessions rather than constant drip queries.

How I Prompt: The Shape Distribution

Every prompt has a "shape" — a combination of length and structure:

| Shape | % | What it means |
| --- | --- | --- |
| Medium instruction | 38.1% | "Do X with Y constraints" — 16-50 words, directive |
| Short command | 19.7% | ≤15 words, imperative — "fix the build", "summarize this" |
| Long instruction | 16.3% | 50+ word specifications with context |
| Ultra short | 8.2% | "yes", "continue", "try again" |
| Medium question | 7.2% | Genuine information-seeking |
| Short question | 5.2% | Quick lookups |
| Essay prompt | 3.5% | Full context dumps |
| Code paste | 1.2% | Pasting code for analysis |

The insight: I'm 74% instruction, 12% question, 3.5% essay. I use AI as a tool operator, not a search engine. I already know what I want — I'm delegating execution, not seeking knowledge.
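The shape buckets are pure heuristics: word count plus punctuation. A sketch of the classifier (the ≤15 and 16-50 word cut-offs come from the table above; the ultra-short and essay cut-offs, and the code-paste bucket I've omitted, are my guesses):

```python
def prompt_shape(text: str) -> str:
    """Bucket a prompt by length and structure. No models, just counting."""
    words = len(text.split())
    question = text.rstrip().endswith("?")
    if words <= 2:
        return "ultra short"          # "yes", "continue", "try again"
    if words <= 15:
        return "short question" if question else "short command"
    if words <= 50:
        return "medium question" if question else "medium instruction"
    # 120 words is an arbitrary line between a long spec and a context dump
    return "essay prompt" if words > 120 else "long instruction"
```

Crude, but deterministic, and deterministic means you can compare month over month.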

This maps directly to how power users differ from casual users. Casual users ask questions ("What is X?"). Power users give instructions ("Build X with these constraints"). Here's the full intent breakdown at the message level:

| Intent | Count | % |
| --- | --- | --- |
| Question | 202 | 29% |
| Instruction | 79 | 11% |
| Brainstorm | 46 | 7% |
| Debug | 44 | 6% |
| Meta | 27 | 4% |
| Creative | 9 | 1% |
| Other | 288 | 41% |
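The intent labels come from rule-based classification (prompt length plus structural patterns). Here's a minimal sketch; the keyword cues are my illustration, not the pipeline's exact rules:

```python
import re

# Keyword cues are illustrative guesses; the real rules also weigh
# prompt length and structure.
RULES = [
    ("debug",      re.compile(r"\b(error|traceback|crash|doesn'?t work)", re.I)),
    ("brainstorm", re.compile(r"\b(ideas?|brainstorm|options|alternatives)\b", re.I)),
    ("creative",   re.compile(r"\b(story|poem|lyrics|slogan)\b", re.I)),
    ("meta",       re.compile(r"\b(your last (answer|response)|this chat)\b", re.I)),
]

def classify_intent(prompt: str) -> str:
    for label, pattern in RULES:
        if pattern.search(prompt):
            return label
    if prompt.rstrip().endswith("?"):
        return "question"
    if len(prompt.split()) >= 4:   # imperative-length prompt, no question mark
        return "instruction"
    return "other"
```

The big "other" bucket in my data is the honest cost of rules this simple: anything short and ambiguous falls through.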

6% of my prompts are debugging. That's a conversation with an AI about why the AI's previous output was wrong. The recursive irony isn't lost on me.

What I Talk About: 20 Topic Clusters

The topic clustering found 20 distinct domains across 215 conversations. The top 5:

  1. Work/Management (20 convos, 146 msgs) — Boss dynamics, union questions, workplace strategy. Longest conversations by far — 7.3 msgs average.
  2. Business/Finance (20 convos, 75 msgs) — Company analysis, bitcoin, investment reasoning. High breadth, lower depth.
  3. People/Content (18 convos, 35 msgs) — Content strategy, audience analysis. Short, punchy sessions.
  4. AI/Frontier Models (16 convos, 55 msgs) — Model comparisons, frontier capabilities, wild speculation.
  5. Career/Resume (14 convos, 25 msgs) — Resume writing, job applications, OpenAI research.

The insight: My heaviest AI usage isn't coding. It's workplace strategy — navigating human dynamics with an AI advisor. The conversations about boss interactions are 2x longer than anything else. I'm using ChatGPT as a management consultant.

The Loop: Where I Got Stuck

The loop detector found one significant conversation loop — a pair of conversations 4 days apart about the same unresolved topic (similarity: 0.41):

  • "Gateway Password Recovery" (April 9)
  • "OpenClaw vs Paperclip" (April 13)

Both were about OpenClaw configuration. Same problem, two attempts, no resolution. The loop detector flagged it as repeated_question / unresolved.

Only 1 loop out of 215 conversations sounds good, but the real number is probably higher: the detector uses semantic similarity with a conservative threshold, so it only flags near-repeats of the same question. The subtler loops — rephrasing the same question, approaching the same problem from different angles — need a more sophisticated model.
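A workable loop detector doesn't need embeddings. Here's a stdlib-only sketch of the TF-IDF cosine approach; my tokenization (lowercase + split) is cruder than the pipeline's, and the 0.4 threshold is tuned to my own data:

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(docs):
    """Hand-rolled TF-IDF: term frequency weighted by inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        {t: tf * (math.log(n / df[t]) + 1) for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def find_loops(conversations, threshold=0.4):
    """conversations: list of (title, full_text). Returns suspiciously similar pairs."""
    titles = [t for t, _ in conversations]
    vecs = tfidf_vectors([text for _, text in conversations])
    return [
        (titles[i], titles[j], round(cosine(vecs[i], vecs[j]), 2))
        for i, j in combinations(range(len(vecs)), 2)
        if cosine(vecs[i], vecs[j]) >= threshold
    ]
```

Raise the threshold and you only catch copy-paste repeats; lower it and every pair of coding conversations lights up. The conservative setting is why one flagged loop is a floor, not a count.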

The insight: Conversation loops are a signal of tool failure. When you ask the same thing twice across separate sessions, either the AI failed to solve it or you failed to retain the solution. Either way, it's wasted tokens and wasted time.

What Companies Already Know (That You Don't)

Here's the uncomfortable part: every major AI provider already has this data about you. OpenAI, Anthropic, Google — they can see your prompt patterns, your topic clusters, your conversation loops, your usage DNA. They use it for model training, safety research, and product decisions.

You can't see any of it.

There's no "Prompt Analytics" tab in ChatGPT settings. No "Your Usage Report" email. No "You asked about Python debugging 47 times this month — here's a shortcut." The data exists. The insights are extractable. They just don't give them to you.

The argument for building this as a user-facing tool isn't technical — it's philosophical. You should have at least as much insight into your own AI usage as the companies hosting it.

What This Means for AI Tooling

If you're building AI products, here's what my data suggests:

  1. Power users don't ask questions — they give instructions. Your UX should optimize for the imperative case, not the interrogative one. The chat input box is fine for questions. For instructions, you need structured input.

  2. Conversation loops are a product bug. If your users are asking the same thing in multiple sessions, your memory/context system has failed. Track repeat queries.

  3. Usage DNA is a feature. Show users their patterns. "You tend to write long prompts for coding tasks but short prompts for writing tasks — want to try being more specific on the writing side?" This is the AI equivalent of screen time reports, and it's equally valuable.

  4. The heaviest usage isn't what you think. I expected my top category to be coding. It was workplace strategy. Product teams optimizing for the "developer use case" might be missing their actual power users.

How I Built This

The pipeline is straightforward:

  • Input: conversations.json from OpenAI's data export
  • Topic clustering: TF-IDF + keyword extraction, no ML models needed
  • Intent classification: Rule-based (prompt length + structural patterns)
  • Loop detection: Cosine similarity between conversation pairs
  • Shape analysis: Word count + punctuation patterns
  • Output: JSON reports + Markdown summary

No API calls. No cloud processing. Everything runs locally on a laptop in under 10 seconds for 215 conversations. The analysis is deterministic — same input, same output, every time.

The code is Python, ~500 lines total. No transformers, no embeddings, no GPU. Just TF-IDF and heuristics. The point isn't sophistication — it's that useful insights don't require expensive infrastructure.
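For a sense of scale, the cluster-labeling step — pick each conversation's highest-scoring TF-IDF terms — is about ten lines. A sketch (the real pipeline also strips stop words, which this omits):

```python
import math
from collections import Counter

def top_keywords(docs, k=3):
    """Label each document with its k highest-scoring TF-IDF terms."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(docs)
    labels = []
    for tokens in tokenized:
        # Score each term: frequency here, discounted by how common it is overall
        scores = {t: tf * (math.log(n / df[t]) + 1)
                  for t, tf in Counter(tokens).items()}
        labels.append([t for t, _ in
                       sorted(scores.items(), key=lambda kv: -kv[1])[:k]])
    return labels
```

Group conversations that share top terms and you have topic clusters with human-readable names for free.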


Try It Yourself

Export your ChatGPT data (Settings → Data Controls → Export), then ask yourself:

  1. What's your instruction-to-question ratio?
  2. Which topic gets your longest conversations?
  3. Where are you looping — asking the same thing twice?

You might be surprised. I was.

Open Source

The analysis pipeline is open source: PromptLens on GitHub

MIT licensed. ~500 lines of Python. No API keys needed.


Ryan builds AI analysis tools and agent infrastructure. Find him on GitHub and DreamSiteBuilders.com.
