How 1,002,243 words of journal entries revealed patterns of a writer's mind.
"Today, I start writing down every thought that goes into my head."
I was 15 when I typed those words on September 17, 2011, the day I first joined 750words. I had no idea I was beginning a habit I would keep up to this day.
Fourteen years, 134 files, 1,167 individual entries, and over a million words of raw, unfiltered consciousness.
I also had no idea that one day I'd feed all of this into semantic analysis algorithms and discover things which shifted how I see the entire practice of journaling. Things about myself that I never could have seen from the inside.
This is what it means to document a life in real-time.
The Underestimated Art of Journal Writing
Let's be honest: journaling gets no respect in the literary world. It's seen as the training wheels of writing, something you do before you're ready for "real" work. But here's what's being missed: some of the most profound literature of the past century came from writers who kept obsessive personal records.
Virginia Woolf's diaries span thirty volumes. Joan Didion famously said she wrote entirely to find out what she was thinking. Anaïs Nin's diaries became more influential than her novels. Susan Sontag's journals, published posthumously, revealed the intellectual scaffolding behind her cultural criticism.
Journals are laboratories of consciousness, places where writers can experiment with voice, work through ideas, and document the daily texture of existence that forms the bedrock of all meaningful writing.
Here's what none of these writers had: the ability to analyze their own patterns with silicon capable of tricking people into thinking it's alive.
The Numbers Don't Lie
When I ran semantic analysis on my journal collection, the scale made things a little tricky. 1,065,212 words. To put that in perspective, that's roughly equivalent to twelve novels. Twelve books I've written without even trying to write books.
For the developers reading, let me get into the weeds:
Building the Semantic Analysis Engine
The core challenge was processing 134 markdown files (roughly one for each month) spanning 14 years while maintaining semantic context. With Cursor, I built a Python analyzer that combines regex pattern matching with categorical word mapping:
```python
import re
from collections import defaultdict
from pathlib import Path


class SemanticAnalyzer:
    def __init__(self, journal_dir):
        self.journal_dir = Path(journal_dir)
        # Counters grouped by analysis type; unseen keys default to 0
        self.results = {
            'tense_analysis': defaultdict(int),
            'sensory_analysis': {
                'sight': defaultdict(int),
                'sound': defaultdict(int),
                'touch': defaultdict(int),
                'smell': defaultdict(int),
                'taste': defaultdict(int)
            },
            'pronoun_analysis': defaultdict(int),
            'semantic_keywords': defaultdict(int),
            # ... more categories
        }
```
The key insight was creating semantic word sets rather than relying purely on NLP libraries. I manually curated categories like:
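```python
# Inside SemanticAnalyzer.__init__: hand-curated word sets
self.sight_words = {
    'see', 'look', 'watch', 'observe', 'notice', 'glance', 'stare',
    'bright', 'dark', 'colorful', 'vivid', 'clear', 'beautiful'
}
# Note: a two-word marker like 'used to' only matches if the text is
# scanned for phrases rather than looked up one token at a time
self.past_markers = {
    'was', 'were', 'had', 'did', 'been', 'went', 'remembered',
    'yesterday', 'ago', 'before', 'previously', 'used to'
}
```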
Why this approach works better than pure NLP: Context matters enormously in journal writing. "I feel" could be tactile or emotional depending on usage. Manual categorization captured nuance that automated parsing missed.
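As a rough illustration of how curated marker sets might turn into tense counts, here is a minimal sketch. The function name `count_tense_markers` and the present and future marker sets are assumptions made up for this example, not the script's actual lists; only a handful of the past markers come from the set above. Each marker is matched as a whole word or phrase, so a two-word marker like "used to" is counted too.

```python
import re
from collections import defaultdict

# Hypothetical, shortened marker sets for illustration only
TENSE_MARKERS = {
    'past': {'was', 'were', 'had', 'yesterday', 'used to'},
    'present': {'am', 'is', 'are', 'today', 'now'},
    'future': {'will', 'tomorrow', 'going to'},
}

def count_tense_markers(text):
    """Count tense markers, matching each as a whole word or phrase."""
    counts = defaultdict(int)
    lowered = text.lower()
    for tense, markers in TENSE_MARKERS.items():
        for marker in markers:
            # \b on both sides keeps 'is' from matching inside 'this'
            pattern = r'\b' + re.escape(marker) + r'\b'
            counts[tense] += len(re.findall(pattern, lowered))
    return dict(counts)

print(count_tense_markers("I was tired yesterday, but today I am going to write."))
# prints: {'past': 2, 'present': 2, 'future': 1}
```

Marker lists this short will miss most verbs, so the counts are only as good as the curation behind them.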
Pronoun Psychology
Pronoun analysis reveals perspective:
```python
# Method of SemanticAnalyzer
def analyze_pronouns(self, text):
    pronouns = {
        'first_person_singular': r'\b(I|me|my|mine|myself)\b',
        'first_person_plural': r'\b(we|us|our|ours|ourselves)\b',
        'second_person': r'\b(you|your|yours|yourself)\b',
        'third_person_singular': r'\b(he|him|his|she|her|its)\b',
        'third_person_plural': r'\b(they|them|their|theirs)\b'
    }
    # Count case-insensitive matches for each pronoun group
    for pronoun_type, pattern in pronouns.items():
        matches = re.findall(pattern, text, re.IGNORECASE)
        self.results['pronoun_analysis'][pronoun_type] += len(matches)
```
Technical note: The `\b` word boundaries were crucial here. Without them, the pattern for "them" would also match inside "theme" and skew the results. Regular expressions in semantic analysis require surgical precision.
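To see the effect of the boundaries in isolation, here is a small check on a made-up sentence (the sentence is invented for the demonstration and is not from the journals):

```python
import re

text = "The theme of them all: their anthem."

# Without boundaries, 'them' also matches inside 'theme' and 'anthem'
loose = re.findall(r'them', text, re.IGNORECASE)

# With \b on both sides, only the standalone pronoun is counted
strict = re.findall(r'\bthem\b', text, re.IGNORECASE)

print(len(loose), len(strict))
# prints: 3 1
```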
Processing Scale: 1M+ Words Efficiently
To handle the volume, I processed the files as a stream with memory-efficient file handling, cleaning each one before analysis:
```python
# Method of SemanticAnalyzer
def clean_text(self, text):
    # Remove metadata that would skew semantic analysis
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        line = line.strip()
        # Skip journal metadata, separators, and markdown headings
        if line.startswith(('Words:', 'Minutes:', '-----', '#')):
            continue
        cleaned_lines.append(line)
    text = ' '.join(cleaned_lines)
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'\s+', ' ', text)     # Normalize whitespace
    return text.strip()
```
Performance insight: The biggest bottleneck was data structure growth rather than processing. Using `defaultdict` with sorted final output kept memory usage reasonable even with 33,000+ unique keywords.
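A minimal sketch of what file-by-file processing with a single sort at the end could look like. The function name `count_keywords`, the `*.md` glob, and the sample filenames are assumptions for this example; the actual script may be organized differently.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

def count_keywords(journal_dir, top_n=10):
    """Tally word frequencies one file at a time, sorting only at the end."""
    counts = defaultdict(int)
    for path in sorted(Path(journal_dir).glob('*.md')):
        # Only one file's text is held in memory at any point
        text = path.read_text(encoding='utf-8')
        for word in re.findall(r'\b\w+\b', text.lower()):
            counts[word] += 1
    # Sort once, after every file has been counted
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Tiny demonstration with two throwaway files (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, '2011-09.md').write_text('write write feel', encoding='utf-8')
    Path(tmp, '2011-10.md').write_text('write feel see', encoding='utf-8')
    top = count_keywords(tmp, top_n=2)

print(top)
# prints: [('write', 3), ('feel', 2)]
```

The dictionary still grows with the vocabulary, which matches the observation above that data structure growth, not file reading, is the part to watch.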
Patterns Found
I am 61.8% present-tense focused. That's not something I would have guessed about myself. I always thought I was a nostalgic person, someone who lived in the past. But the data reveals instead that I'm someone who processes experience in real-time, who uses writing to make sense of what's happening now rather than what's already finished.
This changed how I understand my relationship to time. The past gets 28.3% of my linguistic attention, the future only 9.9%. I'm not actually nostalgic --- I'm present. I'm someone who writes to stay awake to the moment I'm living in.
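For readers who want to reproduce this kind of breakdown, percentages like these come from dividing each category count by the total. The sketch below uses made-up counts chosen only so the arithmetic lands on the same shares; they are not the real corpus totals, and `to_shares` is a hypothetical helper name.

```python
def to_shares(counts):
    """Convert raw counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: round(100 * value / total, 1) for key, value in counts.items()}

# Made-up counts for illustration only, not the actual marker totals
example_counts = {'present': 618, 'past': 283, 'future': 99}
shares = to_shares(example_counts)

print(shares)
# prints: {'present': 61.8, 'past': 28.3, 'future': 9.9}
```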
My pronoun usage reveals an intense (but not isolated) consciousness. 52.2% of my pronouns are first-person singular (I, me, my), which confirms what you'd expect from a journal. But here's what's interesting: 12% are second-person pronouns (you, your). I'm constantly talking to an imagined reader, or addressing my journal --- writing letters never sent. There's more going on than writing --- there's performing, teaching, connecting.
I write for a future version of myself, some imagined audience, some connection I am hoping to forge across time and space.
The Body Writes Before the Mind Does
When looking at sensory language, touch dominated everything else. "Feel" appears 1,516 times in my writing. The next most frequent sensory word, "see," appears only 799 times.
I feel my way through the world rather than thinking my way through it. Writing is profoundly embodied. I process reality through skin before I process it through my brain.
This can explain a lot about the creative process. Why I write better when I'm walking. Why I can't think clearly when I'm physically uncomfortable. Why my best insights come when I'm paying attention to the texture of experience, not just its meaning.
Code Deep Dive: Sensory Language Detection
Here's how I detected and categorized sensory language across the corpus:
```python
# Method of SemanticAnalyzer
def analyze_sensory_language(self, text):
    # Lowercase and tokenize into whole words
    words = re.findall(r'\b\w+\b', text.lower())
    sensory_categories = {
        'sight': self.sight_words,
        'sound': self.sound_words,
        'touch': self.touch_words,
        'smell': self.smell_words,
        'taste': self.taste_words
    }
    # A word is counted under every sense whose set contains it
    for word in words:
        for sense, word_set in sensory_categories.items():
            if word in word_set:
                self.results['sensory_analysis'][sense][word] += 1
```
The challenge: Words like "sharp" could be visual, tactile, or even taste-related. I solved this by accepting semantic overlap rather than forcing exclusive categorization. Some words appear in multiple sensory categories, which actually reflects how language works --- our sensory experiences are inherently cross-modal.
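Here is a small sketch of what accepting overlap means in practice. The word sets are shortened and hypothetical, with "sharp" deliberately placed in three of them; `count_with_overlap` is a name invented for the example.

```python
from collections import defaultdict

# Hypothetical, shortened word sets; 'sharp' sits in three of them on purpose
sensory_categories = {
    'sight': {'see', 'bright', 'sharp'},
    'touch': {'feel', 'rough', 'sharp'},
    'taste': {'sweet', 'bitter', 'sharp'},
}

def count_with_overlap(words):
    """Credit a word to every sense whose set contains it."""
    results = {sense: defaultdict(int) for sense in sensory_categories}
    for word in words:
        for sense, word_set in sensory_categories.items():
            if word in word_set:
                results[sense][word] += 1
    return results

results = count_with_overlap(['a', 'sharp', 'bright', 'light'])
print({sense: dict(found) for sense, found in results.items()})
# prints: {'sight': {'sharp': 1, 'bright': 1}, 'touch': {'sharp': 1}, 'taste': {'sharp': 1}}
```

One consequence worth keeping in mind: with overlap, the per-sense totals add up to more than the number of sensory words in the text.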
Data structure choice: I used nested dictionaries (`sensory_analysis['touch']['feel']`) rather than flattened keys. This made aggregation easier and kept related data together for analysis.
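To illustrate why the nested shape aggregates easily, consider this sketch. Only the "feel" and "see" figures are quoted from this article; the other counts are placeholders invented for the example.

```python
# Nested shape: sense -> word -> count.
# 'feel' and 'see' are quoted from the article; 'hold' and 'look'
# are placeholder values for illustration only.
sensory_analysis = {
    'touch': {'feel': 1516, 'hold': 10},
    'sight': {'see': 799, 'look': 20},
}

# Total per sense: one sum over each inner dictionary
totals = {sense: sum(words.values()) for sense, words in sensory_analysis.items()}

# Most frequent word within each sense
top_words = {sense: max(words, key=words.get) for sense, words in sensory_analysis.items()}

print(totals)     # prints: {'touch': 1526, 'sight': 819}
print(top_words)  # prints: {'touch': 'feel', 'sight': 'see'}
```

With flattened keys such as `'touch_feel'`, the same totals would require splitting every key string before grouping.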
The Compulsion to Document Everything
Embarrassingly, the script found 3,015 instances of the word "writing" across my journals. That's more frequent than words like "love" (1,160 times) or "today" (1,042 times).
I don't just write --- I'm obsessed with the act of writing. I write about writing about writing. It's recursive. Compulsive.
This meta-pattern reveals that I'm documenting the documentation of my life. The journal isn't an impartial record; it has become a character in my story.
"I need to" appears 1,194 times in my writing. More than any other phrase pattern. I'm constantly setting goals, making plans, promising myself I'll do better. But "actually" appears 1,023 times --- I'm also constantly correcting myself, revising my statements, trying to get closer to the truth.
This points to a precision orientation rather than a goal orientation. I'm someone who cares more about accuracy than progress, more about honesty than achievement.
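Phrase counts like the ones above can be gathered with a boundary-aware regular expression. This is a minimal sketch under my own assumptions; `count_phrase` is a hypothetical helper and the sample sentence is invented.

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive occurrences of a phrase on word boundaries."""
    # Allow any run of whitespace between the words of the phrase
    parts = [re.escape(part) for part in phrase.split()]
    pattern = r'\b' + r'\s+'.join(parts) + r'\b'
    return len(re.findall(pattern, text, re.IGNORECASE))

sample = "I need to write. Actually, I need  to rest. We kneed too much dough."
need_count = count_phrase(sample, "I need to")
actually_count = count_phrase(sample, "actually")

print(need_count, actually_count)
# prints: 2 1
```

Matching on whitespace runs means a phrase split across a double space or a line break is still counted.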
Evolution Is Real (And Measurable)
When I looked at yearly patterns, the data told a story I could never have seen from inside the experience. My language has become more sophisticated over time, but not in the way you'd expect.
In 2011--2013, my writing was raw, emotional, stream-of-consciousness.
By 2014--2016, I was writing structured manifestos and systematic goal-setting documents.
**2017--2019** brought philosophical depth and existential questioning.
**2020--2023** showed academic integration and theoretical frameworks.
**2024--2025** reveals a mature literary voice developing something I call "bloodwriting": the practice of writing that costs something to produce, that emerges from lived experience rather than academic theory.
But here's what's fascinating: my most productive years (2022--2023, when I wrote nearly half a million words) coincided with the periods when I simplified my approach. When I stopped obsessing over productivity systems and just wrote.
The more you try to optimize the writing process, the less you will actually write. My most prolific periods happened when I forgot about optimization entirely.
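Yearly word totals of the kind behind this comparison could be computed along these lines. The sketch assumes each file is named with a leading four-digit year, such as `2022-07.md`; that naming scheme, the function name `words_per_year`, and the sample entries are all assumptions for illustration.

```python
import re
from collections import defaultdict

def words_per_year(entries):
    """Sum word counts by year from (filename, text) pairs."""
    totals = defaultdict(int)
    for filename, text in entries:
        # Assumes the filename starts with a four-digit year, e.g. '2022-07.md'
        match = re.match(r'(\d{4})', filename)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        totals[match.group(1)] += len(re.findall(r'\b\w+\b', text))
    return dict(sorted(totals.items()))

# Invented sample entries, not real journal text
sample_entries = [
    ('2011-09.md', 'Today I start writing.'),
    ('2011-10.md', 'Still writing.'),
    ('2022-07.md', 'I just wrote, without a system.'),
]
yearly = words_per_year(sample_entries)

print(yearly)
# prints: {'2011': 6, '2022': 6}
```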
What Changed Everything
These discoveries changed how I understand consciousness itself.
I used to think self-awareness was about introspection, about looking inward and examining your thoughts. But the analysis revealed that self-awareness is more about pattern recognition across time.
I couldn't see these patterns from inside my own experience. Too close to the data. But when I stepped back to see the larger structures, I saw myself clearly.
This is why journaling matters more than we think. Not because it helps you "process your feelings" (though it does), but because it creates a dataset large enough to reveal patterns that would otherwise remain invisible.
Your consciousness has rhythms, preferences, defaults, and blind spots that you can't see while you're living inside them. But if you document enough of your thinking over enough time, patterns emerge that can teach you who you actually are, not just who you think you are.
The Illusion of Neoliberal Consumption
One of my recent entries contained a line that the omniscient chatbot flagged as significant:
"I think it's easy to get caught up in the trick that creating an organization system is the same as being organised. That collecting stationery is the same as being somebody who writes by hand a lot. The illusion of neoliberal consumption is appealing and seductive, isn't it." (July 31st, 2025)
This pattern showed up everywhere in my data. I've spent years obsessing over the perfect productivity system, the ideal writing app, the optimal journaling method. I mentioned 50+ different productivity tools across my journals. I averaged 2 or 3 months of commitment to each new digital tool before abandoning it for something else.
But here's the beautiful irony: my actual writing happened regardless of the system. The words came when they came, usually in spite of my elaborate organizational schemes, not because of them.
The analysis revealed that I'm someone who thinks systematically but creates intuitively. The systems provide psychological comfort, but the creation happens in the spaces between systems.
What the Data Can't Tell You (But Writing Can)
Of course, algorithms can only reveal so much. They can tell you that "content" is my most frequent emotional word (474 times), but they can't tell you what contentment feels like when you're twenty-nine years old, sitting in a basement bedroom in Calgary's summer heat, finally understanding that you don't need to solve yourself --- you just need to meet yourself where you are.
They can tell you that I write about "time" constantly, but they can't capture the specific quality of 4AM when the world feels both infinite and intimate, when you're the only person awake and the words come faster than you can type them.
They can reveal that I use increasingly complex punctuation patterns --- em-dashes, semicolons, ellipses --- but they can't explain how punctuation becomes a kind of breathing, a way of creating space and silence in the rush of language.
The algorithm found that I mention "legacy" 156 times across fourteen years, but it can't capture the specific terror and hope of wondering whether your words will matter to anyone after you're gone.
This is why we still need the human element. The data reveals the skeleton; the writing provides the flesh, blood, and breath.
Why You Should Start (If You Haven't Already)
If you're a writer who doesn't journal, you're missing out on the most powerful tool for understanding your own creative process. Not because journaling makes you a better writer (though it might), but because it creates a long-term experiment in consciousness that can teach you things about yourself you could never learn any other way.
Start simple. Don't worry about systems or apps or productivity optimization. Just write down what you're thinking. Every day, if you can. But don't make it a moral obligation --- make it a scientific experiment.
One of my favourite proverbs is that the best time to plant a tree was ten years ago. The second best time is now.
If you're going to write anyway, you might as well document the mind that's doing the writing. Fourteen years from now, you'll thank yourself for the data.
And who knows? Maybe you'll discover something about consciousness itself that changes how you understand what it means to be human.
The scripts are still running. The patterns are still emerging. The story is still being written.
"I am the sole record-keeper of this myth --- this archive is the only thing that leaves meaning, expanding slowly like newborn lungs." (April 2nd, 2022)
The data doesn't lie. But it also doesn't tell the whole truth.
That's why we still need writers.
Appendix
Script and Analysis
For those interested, here are links to the full JSON analysis and Python script.
My Previous Articles on Journal Writing
Rules of Journal Writing Learned After 5 Years
What I've Learned from Journal Writing for 10 Years
Want to Support Me?
If you've read this far, consider buying one of my books or a coffee on Ko-fi. Thank you so much!