PEPPERCORN

Posted on May 29 • Edited on Jul 25

[Day 9] A local Japanese sentiment AI (BERT) read 8 years of a LINE chat, and the ups and downs surfaced from numbers alone

#localllm #ai #dgxspark #privacy

Intro

Day 9. Today is less about model internals and more of a personal experiment: having a local AI analyze my entire chat history with one LINE friend. (LINE is the dominant messaging app in Japan.)

When I exported it, 8 years were sitting there — from the very first message to today. It started, we talked a lot, it went quiet for a while, then picked up again. That whole arc is in there.

Because the content is what it is, nothing left my machine: everything ran locally on my DGX Spark.

What I used: my home AI box (DGX Spark) + a Japanese sentiment model (for tone) + a bigger local model (to guess events from numbers).

Today's setup

What I wanted to do

Re-reading 8 years of messages one by one isn't realistic. So instead of reading the content, I looked only at the "shape" of the conversation — when, how much, and in what tone we talked.

Concretely:

monthly message volume
the trend of tone (positive / negative)
then asking an AI to find "when something big happened"

Heads-up (the result)

From message counts and tone alone, the 8-year arc came out clearly on a chart. Started, went quiet, came back — the flow was visible without me re-reading a thing.

🔧 Pipeline

LINE chat export (text)
        │
        ▼
 1. Parse: split each message into {datetime, who, type, text}
        │   (from here on, message text never leaves the machine)
        ▼
 2. Aggregate: monthly counts, time-of-day, reply gaps
        │
        ▼
 3. Tone scoring: classify each of 66k messages pos/neu/neg
        │
        ▼
 4. Turning-point detection: from sudden changes in the numbers
        │   + also show ONLY the numbers to a bigger AI and ask it to guess
        ▼
 5. Answer check: compare against the real timeline

You can export a LINE chat as text from the chat screen ("send chat history").

Data size:

Item	Value
Span	~8 years 2 months
Total messages	87,621
Text messages	66,329
Stickers	15,605
Photos	3,982

15,605 stickers… that's a lot.

The two AIs

Step	Model	What it does	What it sees
3. Tone	Japanese sentiment model (`koheiduck/bert-japanese-finetuned-sentiment`)	scores each message pos/neu/neg	66k message texts (scores averaged per month)
4. Turning points	a bigger local model (`Qwen2.5` 72B)	guesses "what happened to these two?"	only the per-month table of counts + tone scores (no conversation, no words)

Both run locally on my own machine.

📊 Results

The 8-year arc of volume and tone

This chart is the highlight. Top: monthly message count. Bottom: tone (up = positive, down = negative). The x-axis is months since the conversation started. (Axis labels are in Japanese.)

Plotted, it isn't a steady climb or a flat line — it splits cleanly into "chapters": ramp-up → an 8-month silence → a second peak → a stable plateau. Four phases, at a glance.

Tone has two peaks of about +0.6, around the start and around when things resumed (overall mean ≈ 0, slightly negative in the later years). The interesting part: in the month before the silence, tone had already dropped to −0.1. The mood dimmed before the volume did.

There are two dips into negative tone. The one before the silence was an "omen." The other is the recent years — not an omen, but the effect of logistics-y messages ("what time are you home?") piling up.

💡 Mini-note: how is "tone" turned into a number?
The scoring is done by a Japanese sentiment model. Roughly:

it is a BERT model fine-tuned on Japanese text labeled positive / neutral / negative
it judges from context, not just by spotting keywords
it returns a probability for each class per message
I use P(positive) − P(negative) as the per-message score

What kinds of messages scored how?

A few actual judgments (short, name- and place-free one-liners):

Message	Verdict
「楽しかったね！」 (that was fun!)	Positive
「これめちゃうまい」 (this is so good)	Positive
「おはようございます」 (good morning)	Neutral
「もうお家？」 (home already?)	Neutral
「全く集中できない」 (can't focus at all)	Negative
「それは悔しいな、、」 (that's frustrating…)	Negative
(a long trip-planning message)	Neutral
(a snappy one-liner sent in a huff)	Negative

Plain happy lines score positive; logistics ("good morning", "home already?") score neutral; tiredness or irritation scores negative. Even long, businesslike planning messages lean neutral.

Mornings are when we talk

Message density by weekday × hour (brighter = more).

A clear concentration at 7–9 a.m.!
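
A density map like this comes from counting messages into a weekday × hour grid. Below is a minimal sketch of that counting step, using only the standard library. The function name `weekday_hour_grid` is made up for illustration, and this is not necessarily how the original chart was produced.

```python
from datetime import datetime


def weekday_hour_grid(timestamps):
    """Count messages into a 7 x 24 grid: rows Mon..Sun, columns hour 0..23."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        grid[ts.weekday()][ts.hour] += 1
    return grid
```

The resulting grid can be handed to matplotlib's `imshow` to get a "brighter = more" heatmap.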

Could the AI guess the turning points?

First, the simple method: mechanically pick the points where message volume jumped or dropped, then check against the real timeline.

Real event	Auto-detected timing
When it started	exact match
When it went quiet	exact match
When it resumed	exact match
When it got lively again	a few months off
A big life milestone	hard to detect (barely shows in counts)

Sharp volume changes were nailed. But "a big life milestone" got missed. So I showed the same numbers to the bigger local model and asked "what happened?" — and got back:

"around when it started" → roughly matches
"a stretch of going silent" → matches the quiet period
"a major life change" → placed just before the real milestone (nearly exact)

Rather than hunting for a single spike, it reads the whole sequence of numbers as a "flow," so it could pick up even an event that barely moves the counts.

💡 Takeaways

1. Volume + tone alone reveal the arc

Counts and tone were enough to see the 8-year shape. Silence marks the quiet stretch; a surge marks the resumption — straight off the chart.

2. A local model reads a story out of numbers

Given only monthly numbers, the model inferred even a barely-visible event ("something big around here"), and it lined up with reality. It connects scattered points into one flow.

3. A "negative" tone doesn't mean a bad relationship

The slight negative lean in later years isn't about getting along badly. Logistics messages ("what time are you home?") just don't score high. Low score ≠ trouble. It isn't that sentiment analysis is poor — the scores need to be read together with context.

🛠️ Technical details

Parsing & aggregation

The LINE export format is a date header followed by lines of time<TAB>name<TAB>text. Multi-line messages (4,987 of them) spill onto extra lines, so those continuation lines are merged back into the message they belong to.
Speakers normalized to "A / B" by message count (no real names in anything public). Temporary group members and system lines excluded.
Messages tagged by type (text / sticker / photo / call / unsent…). Tone uses text only; volume counts use all types.
Aggregation and plotting in Python (pandas / matplotlib).
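
The parsing step above could be sketched roughly like this. Assumptions, not facts from the actual export: the date header is taken to start with `YYYY/M/D`, and the names `DATE_RE`, `MSG_RE`, and `parse_export` are hypothetical. Type tagging and system-line filtering are left out, and blank lines inside a multi-line message are dropped.

```python
import re
from datetime import datetime

# Assumed header shape, e.g. "2020/01/05(Sun)"; adjust to the real export.
DATE_RE = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})")
# Message line: time<TAB>name<TAB>text
MSG_RE = re.compile(r"^(\d{1,2}):(\d{2})\t([^\t]*)\t(.*)$")


def parse_export(lines):
    """Turn export lines into a list of {datetime, who, text} dicts."""
    messages = []
    current_date = None
    for raw in lines:
        line = raw.rstrip("\n")
        date_match = DATE_RE.match(line)
        if date_match:
            current_date = tuple(int(part) for part in date_match.groups())
            continue
        msg_match = MSG_RE.match(line)
        if msg_match and current_date is not None:
            hour, minute, who, text = msg_match.groups()
            messages.append({
                "datetime": datetime(*current_date, int(hour), int(minute)),
                "who": who,
                "text": text,
            })
        elif messages and line:
            # Continuation of a multi-line message: append to the previous one.
            messages[-1]["text"] += "\n" + line
    return messages
```

One known weakness of this sketch: a continuation line that happens to look like a date header would be misread as one.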

Tone (sentiment)

koheiduck/bert-japanese-finetuned-sentiment, a 3-class (pos / neu / neg) Japanese model.
66,329 texts scored on GPU in batches; per message I take P(pos) − P(neg) in [−1, +1], then average per month.
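
As a minimal sketch of the scoring arithmetic only: the code below takes class probabilities that are already computed and turns them into per-message scores and monthly means. It does not load the model, and the names `tone_score` and `monthly_mean_tone` are hypothetical. In the real pipeline the probabilities would come from the sentiment model's output.

```python
from collections import defaultdict


def tone_score(p_pos, p_neg):
    """Per-message tone: P(pos) - P(neg), which lies in [-1, +1]."""
    return p_pos - p_neg


def monthly_mean_tone(scored):
    """scored: iterable of (month_key, p_pos, p_neu, p_neg) tuples.

    Returns {month_key: mean tone}, with months in sorted order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, p_pos, _p_neu, p_neg in scored:
        sums[month] += tone_score(p_pos, p_neg)
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}
```

Note that a message the model calls confidently neutral scores close to 0, so a month full of logistics messages pulls the average toward zero rather than toward either extreme.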

Turning-point detection

Rule-based: long near-zero stretches (silence), large month-over-month surges, and tone peaks — all from numbers only.
Plus: the per-month table of counts + tone scores fed to a bigger local model (Qwen2.5-72B via Ollama) to guess events. No message text was given.
Real event dates were kept in a local note only, used for annotation and the answer check.
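
The rule-based part could look something like the sketch below, which covers silences and surges (tone peaks are left out). Every threshold here (`max_count`, `min_months`, `ratio`, `min_count`) is a made-up illustrative value, and the function names are hypothetical. The rules actually used may differ.

```python
def find_silences(counts, max_count=5, min_months=3):
    """Return (start, end) index pairs, end inclusive, for runs where the
    monthly count stays <= max_count for at least min_months months."""
    runs = []
    start = None
    for i, count in enumerate(counts):
        if count <= max_count:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_months:
                runs.append((start, i - 1))
            start = None
    # A quiet run may extend to the end of the series.
    if start is not None and len(counts) - start >= min_months:
        runs.append((start, len(counts) - 1))
    return runs


def find_surges(counts, ratio=3.0, min_count=100):
    """Return indices of months whose count is at least `ratio` times the
    previous month's count and at least `min_count` in absolute terms."""
    surges = []
    for i in range(1, len(counts)):
        prev, cur = counts[i - 1], counts[i]
        # max(prev, 1) avoids a zero baseline right after a silent month.
        if cur >= min_count and cur >= ratio * max(prev, 1):
            surges.append(i)
    return surges
```

With these two functions, the end of a silence run followed by a surge index marks a "resumed" point, which is the kind of sharp change the simple method handled well.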

Privacy

Every file containing message text (raw export, parsed data, scores) stays in a non-public folder.
Only aggregate numbers and charts are published. The chart x-axis is relativized to "months since the conversation started," hiding actual dates.
Apart from a few short, name- and place-free one-liners shown as scoring examples, no conversation content, real names, specific dates, or long text appears in the article or charts.
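
To illustrate the "months since the conversation started" idea, here is a small sketch that builds a numbers-only table with a relative month index, so neither message text nor calendar dates appear in the output. The names `months_since_start` and `numbers_only_rows` are hypothetical, and the input shape (datetime, tone) is an assumption for the example.

```python
from datetime import datetime


def months_since_start(ts, start):
    """Whole calendar months between `start` and `ts` (0 for the first month)."""
    return (ts.year - start.year) * 12 + (ts.month - start.month)


def numbers_only_rows(messages):
    """messages: iterable of (datetime, tone) pairs; tone is None for
    non-text messages such as stickers or photos.

    Returns rows of (month_index, count, mean_tone). Months with no
    messages are kept with count 0 so that silences stay visible.
    """
    messages = sorted(messages, key=lambda pair: pair[0])
    if not messages:
        return []
    start = messages[0][0]
    buckets = {}
    for ts, tone in messages:
        idx = months_since_start(ts, start)
        bucket = buckets.setdefault(idx, {"count": 0, "tones": []})
        bucket["count"] += 1  # volume counts every message type
        if tone is not None:
            bucket["tones"].append(tone)  # tone uses text messages only
    rows = []
    for idx in range(max(buckets) + 1):
        bucket = buckets.get(idx, {"count": 0, "tones": []})
        tones = bucket["tones"]
        mean_tone = sum(tones) / len(tones) if tones else None
        rows.append((idx, bucket["count"], mean_tone))
    return rows
```

Rows in this shape are the sort of thing that can be plotted or shown to a larger model without exposing any conversation content.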

Tomorrow: Day 10

Weather forecasts say one temperature, but everyone feels it differently. Same degrees, different "do I need a coat?" So next I'm building my own personal "weather officer" AI: from past weather data, it'll tell me each morning something like "coat + beanie today." Over the next 100 days I'll teach it my own sense of cold — the start of a longer project.

#100ExperimentsWithDGX #LocalLLM

DEV Community