Intro
Day 9. Today is less about model internals and more of a personal experiment: have a local AI analyze the entire chat history with one LINE friend. (LINE is the dominant messaging app in Japan.)
When I exported it, 8 years were sitting there — from the very first message to today. It started, we talked a lot, it went quiet for a while, then picked up again. That whole arc is in there.
Because the content is what it is, nothing left my machine: everything ran locally on my DGX Spark.
What I used: my home AI box (DGX Spark) + a Japanese sentiment model (for tone) + a bigger local model (to guess events from numbers).
Today's setup
What I wanted to do
Re-reading 8 years of messages one by one isn't realistic. So instead of reading the content, I looked only at the "shape" of the conversation — when, how much, and in what tone we talked.
Concretely:
- monthly message volume
- the trend of tone (positive / negative)
- then asking an AI to find "when something big happened"
Heads-up (the result)
From message counts and tone alone, the 8-year arc came out clearly on a chart. Started, went quiet, came back — the flow was visible without me re-reading a thing.
🔧 Pipeline
LINE chat export (text)
│
▼
1. Parse: split each message into {datetime, who, type, text}
│ (from here on, message text never leaves the machine)
▼
2. Aggregate: monthly counts, time-of-day, reply gaps
│
▼
3. Tone scoring: classify each of 66k messages pos/neu/neg
│
▼
4. Turning-point detection: from sudden changes in the numbers
│ + also show ONLY the numbers to a bigger AI and ask it to guess
▼
5. Answer check: compare against the real timeline
You can export a LINE chat as text from the chat screen ("send chat history").
Data size:
| Item | Value |
|---|---|
| Span | ~8 years 2 months |
| Total messages | 87,621 |
| Text messages | 66,329 |
| Stickers | 15,605 |
| Photos | 3,982 |
15,605 stickers… that's a lot.
The two AIs
| Step | Model | What it does | What it sees |
|---|---|---|---|
| 3. Tone | Japanese sentiment model (koheiduck/bert-japanese-finetuned-sentiment) | scores each message pos/neu/neg | 66k message texts (scores averaged per month) |
| 4. Turning points | a bigger local model (Qwen2.5 72B) | guesses "what happened to these two?" | only the per-month table of counts + tone scores (no conversation, no words) |
Both run locally on my own machine.
📊 Results
The 8-year arc of volume and tone
This chart is the highlight. Top: monthly message count. Bottom: tone (up = positive, down = negative). The x-axis is months since the conversation started. (Axis labels are in Japanese.)
Plotted, it isn't a steady climb or a flat line — it splits cleanly into "chapters": ramp-up → an 8-month silence → a second peak → a stable plateau. Four phases, at a glance.
Tone has two peaks of about +0.6, around the start and around when things resumed (overall mean ≈ 0, slightly negative in the later years). The interesting part: in the month before the silence, tone had already dropped to −0.1. The mood dimmed before the volume did.
There are two dips into negative tone. The one before the silence was an "omen." The other is the recent years — not an omen, but the effect of logistics-y messages ("what time are you home?") piling up.
💡 Mini-note: how is "tone" turned into a number?
The scoring is done by a Japanese sentiment model. Roughly:
- a BERT model fine-tuned on Japanese text labeled positive / neutral / negative
- judges with context, not just by spotting keywords
- returns a probability of "positive-ness" / "negative-ness" per message
- I used the difference as a per-message score
What kinds of messages scored how?
A few actual judgments (short, name- and place-free one-liners):
| Message | Verdict |
|---|---|
| 「楽しかったね!」 (that was fun!) | Positive |
| 「これめちゃうまい」 (this is so good) | Positive |
| 「おはようございます」 (good morning) | Neutral |
| 「もうお家?」 (home already?) | Neutral |
| 「全く集中できない」 (can't focus at all) | Negative |
| 「それは悔しいな、、」 (that's frustrating…) | Negative |
| (a long trip-planning message) | Neutral |
| (a snappy one-liner sent in a huff) | Negative |
Plain happy lines score positive; logistics ("good morning", "home already?") score neutral; tiredness or irritation scores negative. Even long, businesslike planning messages lean neutral.
Mornings are when we talk
Message density by weekday × hour (brighter = more).
A clear concentration at 7–9 a.m.!
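Under the hood, a heatmap like this is just a 7 × 24 table of counts. Here is a minimal sketch of how it could be built, assuming the message timestamps are already parsed into `datetime` objects. The function names `density_grid` and `hourly_totals` are illustrative, not taken from the code behind this post.

```python
from datetime import datetime

def density_grid(timestamps):
    """Count messages into a 7 x 24 grid: rows Mon..Sun, columns hour 0..23."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        # weekday() returns 0 for Monday through 6 for Sunday
        grid[ts.weekday()][ts.hour] += 1
    return grid

def hourly_totals(grid):
    """Collapse the grid across weekdays to get one count per hour of day."""
    return [sum(row[hour] for row in grid) for hour in range(24)]
```

Passing the grid to a plotting call such as matplotlib's `imshow` would give the weekday × hour picture; `hourly_totals` is a quick way to confirm where the morning peak sits.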
Could the AI guess the turning points?
First, the simple method: mechanically pick the points where message volume jumped or dropped, then check against the real timeline.
| Real event | Auto-detected timing |
|---|---|
| When it started | exact match |
| When it went quiet | exact match |
| When it resumed | exact match |
| When it got lively again | a few months off |
| A big life milestone | hard to detect (barely shows in counts) |
Sharp volume changes were nailed. But "a big life milestone" got missed. So I showed the same numbers to the bigger local model and asked "what happened?" — and got back:
- "around when it started" → roughly matches
- "a stretch of going silent" → matches the quiet period
- "a major life change" → placed just before the real milestone
Rather than hunting for a single spike, it reads the whole sequence of numbers as a "flow," so it could pick up even an event that barely moves the counts.
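To make "showing only the numbers" concrete, here is a rough sketch of how such a prompt could be assembled. The wording of the question, the column names, and the function name `build_prompt` are my own assumptions for illustration; the actual prompt used for this post is not shown in the article.

```python
def build_prompt(rows):
    """Build a numbers-only prompt.

    rows: list of (months_since_start, message_count, mean_tone) tuples.
    No message text is ever passed in, so none can leak into the prompt.
    """
    lines = ["month\tcount\ttone"]
    for month, count, tone in rows:
        # Explicit sign on tone so positive and negative months stand out
        lines.append(f"{month}\t{count}\t{tone:+.2f}")
    table = "\n".join(lines)
    question = (
        "Below is a per-month table for a chat between two people.\n"
        "Columns: months since the first message, message count, mean tone "
        "(-1 = negative, +1 = positive).\n"
        "From the numbers alone, guess when something significant happened "
        "and what kind of event it might have been.\n\n"
    )
    return question + table
```

The returned string is what would be handed to the local model. Because the input is a list of numeric tuples, the "no conversation, no words" guarantee is enforced by the data shape rather than by remembering to strip text later.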
💡 Takeaways
1. Volume + tone alone reveal the arc
Counts and tone were enough to see the 8-year shape. Silence marks the quiet stretch; a surge marks the resumption — straight off the chart.
2. A local model reads a story out of numbers
Given only monthly numbers, the model inferred even a barely-visible event ("something big around here"), and it lined up with reality. It connects scattered points into one flow.
3. A "negative" tone doesn't mean a bad relationship
The slight negative lean in later years isn't about getting along badly. Logistics messages ("what time are you home?") just don't score high. Low score ≠ trouble. It isn't that sentiment analysis is poor — the scores need to be read together with context.
🛠️ Technical details
Parsing & aggregation
- LINE export format is a date header plus time<TAB>name<TAB>text. Multi-line messages (4,987 of them) are merged back into the previous message.
- Speakers normalized to "A / B" by message count (no real names in anything public). Temporary group members and system lines excluded.
- Messages tagged by type (text / sticker / photo / call / unsent…). Tone uses text only; volume counts use all types.
- Aggregation and plotting in Python (pandas / matplotlib).
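The parsing steps above can be sketched roughly as follows. This is a simplified illustration, not the script used for this post. It assumes date headers start with `YYYY/MM/DD`, message lines look like `HH:MM<TAB>name<TAB>text`, and time-stamped lines without a speaker are system lines; real exports vary by app version and language setting, so the regular expressions would need adjusting.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layouts (hypothetical; check them against your own export)
DATE_RE = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})")
MSG_RE = re.compile(r"^(\d{1,2}):(\d{2})\t([^\t]*)\t(.*)$")
TIME_PREFIX_RE = re.compile(r"^\d{1,2}:\d{2}\t")

def parse_export(lines):
    """Turn raw export lines into a list of {datetime, who, text} dicts."""
    messages = []
    current_date = None
    for raw in lines:
        line = raw.rstrip("\n")
        msg = MSG_RE.match(line)
        if msg and current_date is not None:
            hour, minute, who, text = msg.groups()
            stamp = datetime(*current_date, int(hour), int(minute))
            messages.append({"datetime": stamp, "who": who, "text": text})
        elif TIME_PREFIX_RE.match(line):
            continue  # time stamp but no speaker: treated as a system line
        elif DATE_RE.match(line):
            current_date = tuple(int(p) for p in DATE_RE.match(line).groups())
        elif messages and line:
            # Continuation of a multi-line message: merge into the previous one
            messages[-1]["text"] += "\n" + line
    return messages

def anonymize_speakers(messages):
    """Relabel the two most frequent speakers as 'A' and 'B'; drop the rest."""
    counts = Counter(m["who"] for m in messages)
    top_two = [name for name, _ in counts.most_common(2)]
    labels = dict(zip(top_two, "AB"))
    return [{**m, "who": labels[m["who"]]} for m in messages if m["who"] in labels]
```

Dropping everyone outside the top two speakers is one simple way to exclude temporary group members. A continuation line that happens to start with a date would be misread as a header here, which is the kind of edge case a real parser has to handle.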
Tone (sentiment)
- koheiduck/bert-japanese-finetuned-sentiment, a 3-class (pos / neu / neg) Japanese model.
- 66,329 texts scored on GPU in batches; per message I take P(pos) − P(neg) in [−1, +1], then average per month.
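The scoring arithmetic itself is small enough to show in full. Below is a minimal sketch, assuming the classifier output for each message has already been turned into a dict of class probabilities. The key names `positive` / `negative` and the `YYYY-MM` month keys are assumptions made for this example, and the model call is left out.

```python
from collections import defaultdict

def tone_score(probs):
    """Per-message score in [-1, +1]: P(positive) minus P(negative)."""
    return probs["positive"] - probs["negative"]

def monthly_tone(scored):
    """Average per-message scores by month.

    scored: iterable of (month_key, probs) pairs, e.g. ("2020-01", {...}).
    Returns a dict mapping month_key to the mean tone score.
    """
    buckets = defaultdict(list)
    for month, probs in scored:
        buckets[month].append(tone_score(probs))
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}
```

A message the model calls confidently neutral ends up near 0, because both P(pos) and P(neg) are small. That is why a month full of logistics messages pulls the monthly average toward zero rather than toward either extreme.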
Turning-point detection
- Rule-based: long near-zero stretches (silence), large month-over-month surges, and tone peaks — all from numbers only.
- Plus: the per-month table of counts + tone scores fed to a bigger local model (Qwen2.5-72B via ollama) to guess events. No message text was given.
- Real event dates were kept in a local note only, used for annotation and the answer check.
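The rule-based part could look something like the sketch below. The thresholds (`max_count`, `min_months`, `ratio`, `min_count`) are illustrative values I picked for the example, not the ones used for this post, and tone peaks are left out for brevity.

```python
def find_silences(counts, max_count=5, min_months=3):
    """Return (start, end) index pairs for runs of near-zero months."""
    runs = []
    start = None
    for i, n in enumerate(counts):
        if n <= max_count:
            if start is None:
                start = i  # a quiet run begins here
        else:
            if start is not None and i - start >= min_months:
                runs.append((start, i - 1))
            start = None
    # Handle a quiet run that lasts until the end of the data
    if start is not None and len(counts) - start >= min_months:
        runs.append((start, len(counts) - 1))
    return runs

def find_surges(counts, ratio=3.0, min_count=100):
    """Return indices of months whose count is at least `ratio` times the previous month."""
    surges = []
    for i in range(1, len(counts)):
        prev, cur = counts[i - 1], counts[i]
        # max(prev, 1) avoids treating a jump from zero as an infinite ratio
        if cur >= min_count and cur >= ratio * max(prev, 1):
            surges.append(i)
    return surges
```

Rules like these catch sharp edges well, which matches the table above: start, silence, and resumption all show up as abrupt changes. An event that barely moves the counts never crosses a threshold, which is where handing the whole table to the bigger model helped.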
Privacy
- Every file containing message text (raw export, parsed data, scores) stays in a non-public folder.
- Only aggregate numbers and charts are published. The chart x-axis is relativized to "months since the conversation started," hiding actual dates.
- Apart from a few short, name- and place-free one-liners shown as scoring examples, no conversation content, real names, specific dates, or long text appears in the article or charts.
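Relativizing the x-axis is a one-line calculation. A small sketch, with `months_since_start` as an illustrative name rather than the function actually used:

```python
from datetime import date

def months_since_start(first, current):
    """Whole calendar months between the first message and the current one.

    Only year and month are used, so the day of the month never appears
    in the published value.
    """
    return (current.year - first.year) * 12 + (current.month - first.month)
```

Publishing only this offset keeps the shape of the chart intact while leaving the calendar dates in the local, non-public data.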
Tomorrow: Day 10
Weather forecasts say one temperature, but everyone feels it differently. Same degrees, different "do I need a coat?" So next I'm building my own personal "weather officer" AI: from past weather data, it'll tell me each morning something like "coat + beanie today." Over the next 100 days I'll teach it my own sense of cold — the start of a longer project.

