[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements
Intro
Day 3: I'm going to hand a year of credit card statements over to a local LLM and see what it can do.
This is experiment #3.
What I'm using today: DGX Spark + Ollama + Qwen2.5 (comparing 7B vs 72B). Ollama is the de-facto local-LLM runtime, and Qwen2.5 is a multilingual model from Alibaba (China) that handles Japanese reasonably well, apparently.
Today's setup
- Data: 12 months of credit card statements from a single card.
- Volume: 383 transactions, ¥2,761,555 in total spend.
- Goal: get the AI to spot waste patterns and propose savings.
Comparison axes:
- Model size: 7B (light) vs 72B (heavy)
- Input format: raw CSV vs pandas-aggregated summary
- → 4 patterns total
Takeaway: "If you ask an AI to aggregate raw data, the numbers come out way off." / "If you pre-aggregate with a spreadsheet tool first and then feed the AI, you get fast and accurate results." A small but practical finding.
1. Get the CSVs onto the DGX
Log into the credit card company's web statements page on myPC1 (my Windows laptop), download 12 months of CSVs, then push them to the DGX.
I deliberately skipped GitHub for the transfer this time — once you push something, it's in the history forever, and credit card data shouldn't be there even briefly. Instead, I used direct PC-to-PC transfer over SSH (one command, finishes in seconds; details in the collapsibles at the end). The .gitignore excludes private-data/ too, so accidental commits are ruled out.
2. Install Ollama
Ollama is the de-facto runtime for local LLMs. One command should be enough.
There was a small password hiccup during install (details below), but eventually it was up and running.
The DGX Spark specs really show through:
- Memory: 121 GB
- Default context window: ~262,144 tokens
In other words: "throw a whole book at it, no problem" territory. Reassuring.
3. Two model sizes: Qwen2.5 7B vs 72B
The strategy: same model family, different sizes. That way the differences come from size, not architecture.
- 7B (light): ~4.7 GB, downloads in 5 minutes. Fast.
- 72B (heavy): ~47 GB, 25 minutes to download. Slow but smart.
What does "B" mean? It's short for Billion, the number of "weights" inside the AI. More weights means it can remember more, basically. So 7B has 7 billion weights, 72B has 72 billion.
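As a rough sanity check on those download sizes: file size is roughly parameter count times bits per weight. Assuming the Ollama tags use roughly 4-bit quantization (my assumption; I didn't verify the exact quantization of these tags), the math lands in the right ballpark:

def approx_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    # ~4.5 bits/weight approximates 4-bit quantization plus overhead (assumption)
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(approx_size_gb(7))   # ≈ 3.9 GB, vs. the 4.7 GB download
print(approx_size_gb(72))  # ≈ 40.5 GB, vs. the 47 GB download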
Loading both onto the DGX simultaneously, memory usage looks like:
| AI model | Memory occupied |
|---|---|
| qwen2.5:72b | 61 GB |
| qwen2.5:7b | 8.2 GB |
| Total | 69 GB |
69 GB. Spacious!
4. Prepping the CSVs
Once I had the CSVs in hand, three small headaches before they were ready for the AI:
- Headache 1: An older encoding (Windows Japanese flavor) → needs converting to modern UTF-8
- Headache 2: Some merchant names contain commas, which breaks naive CSV parsing
- Headache 3: Each file has a "monthly total" line at the end that isn't actually data
Details in the collapsible. After cleanup, the 12 files merge into a single dataset:
| Item | Value |
|---|---|
| Transactions | 383 |
| Period | 12 months (1 year) |
| Total spend | ¥2,761,555 |
| Avg per tx | ¥7,210 |
| Median per tx | ¥3,000 |
| Largest single tx | ¥209,283 (overseas flight) |
| Smallest | ¥-3,980 (refund) |
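For reference, the table above is exactly the kind of thing pandas spits out in a few lines. A minimal sketch, assuming the merged DataFrame df built by the parser in the tech details (利用金額 is the amount column):

amount = df["利用金額"]  # amount column of the merged DataFrame
print("Transactions: ", len(df))
print("Total spend:  ", amount.sum())
print("Avg per tx:   ", round(amount.mean()))
print("Median per tx:", amount.median())
print("Largest tx:   ", amount.max())
print("Smallest tx:  ", amount.min())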
Now to feed this to 7B and 72B and see what each of them says.
5. Experiment 1: Throw the raw CSV at the AI
No tricks: all 383 rows, straight at the AI. Prompt is the full ask: "As a household budget consultant, output category breakdown / monthly trend / waste patterns / savings suggestions / lifestyle hypothesis."
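For concreteness, the prompt looked roughly like this. This is a paraphrase, not the verbatim text, and raw_csv_text is simply the merged CSV as a string:

SYSTEM_PROMPT = "You are a household budget consultant."  # paraphrase, not verbatim

user_prompt = (
    "Here are 12 months of credit card transactions as CSV.\n"
    "Output: (1) category breakdown, (2) monthly trend, (3) waste patterns, "
    "(4) savings suggestions, (5) a lifestyle hypothesis.\n\n"
    + raw_csv_text  # all 383 rows, ~22,000 characters
)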
7B's answer (75 seconds)
...this is where the numbers go wildly off.
| Item | What 7B said | Real data | Match? |
|---|---|---|---|
| Amazon total | ¥2,014,386 (257 tx) | ¥693,663 (166 tx) | ❌ |
| Amazon Downloads | ¥2,014,386 (257 tx) | ¥80,323 (50 tx) | ❌ |
| Outdoor brand | ¥495,740 | ¥154,820 | ❌ |
| A local recreation venue | "¥49,574" cited | (a different small charge actually exists) | ❌ |
None of the numbers line up. The Amazon total is roughly 3× off, Amazon Downloads about 25× off, and the venue it cited actually corresponds to a different charge entirely.
Reading 383 rows of CSV and computing totals turned out to be a heavy lift for the 7B model.
72B's answer (12m 9s)
What if we throw size at the problem? After 12 minutes of patience:
| Item | What 72B said | Real data | Match? |
|---|---|---|---|
| Amazon total | ¥635,792 (104 tx) | ¥693,663 (166 tx) | ❌ |
| AI/dev tools | ¥193,629 (21 tx) | ¥176,850 (24 tx) | ❌ |
| Travel | ¥487,555 (43 tx) | ¥416,268 (8 tx) | ❌ |
Not exact, but the off-by amounts are within ~10%, and there are no fabricated venues. A real improvement.
However — when asked about the monthly trend, here's what 72B said:
Month 1: ¥316,789 → Month 2: ¥229,600 → Month 3: ¥237,500 → ... → Month 12: ¥291,500
(Gradually increasing.)
The actual range is ¥69,961 (low) to ¥493,072 (high) — a chaotic up-and-down waveform. "Gradually increasing" isn't quite right. Even 72B isn't great at aggregating distributed data over a long CSV.
6. Experiment 2: Aggregate first, then feed the AI
If the AI struggles with aggregation, do the aggregation in a different tool first and only hand the AI the result.
The flow:
📥 Raw CSV (22,132 chars, 383 rows)
↓
🔧 Pre-aggregate with a spreadsheet tool (Python's pandas)
↓
📋 Aggregate summary (1,884 chars, ~90% smaller)
↓
🤖 Hand it to the AI (let it interpret and propose)
Python's pandas = a spreadsheet-like library for tabular data analysis, but scriptable and vastly more powerful than Excel functions.
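Here's a minimal sketch of the pre-aggregation step. It's my reconstruction of the idea rather than the exact script, and the merchant column name 利用先 is a guess; the point is that pandas does the sums and only the compact text summary goes into the prompt:

import pandas as pd

# df: merged transactions (see the parser in the tech details).
# 利用日 = date, 利用金額 = amount, 利用先 = merchant (column name assumed).
monthly = df.groupby(df["利用日"].dt.to_period("M"))["利用金額"].sum()
top_merchants = (
    df.groupby("利用先")["利用金額"]
      .agg(["sum", "count"])
      .nlargest(15, "sum")
)

summary_text = "\n".join([
    "## Monthly totals", monthly.to_string(),
    "## Top merchants (total, count)", top_merchants.to_string(),
])
# summary_text comes out at ~1-2k characters; this replaces the
# 22k-character raw CSV in the prompt.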
7B + pre-aggregated input (50 seconds)
Numbers are fully accurate now.
| Item | What 7B said | Real data | Match? |
|---|---|---|---|
| Amazon total | ¥693,663 | ¥693,663 | ✅ |
| AI/dev tools | ¥176,850 | ¥176,850 | ✅ |
| Monthly max | ¥493,072 | ¥493,072 | ✅ |
| Monthly min | ¥69,961 | ¥69,961 | ✅ |
Because it's quoting straight from the pre-aggregated numbers, the hallucinations vanished.
And 7B did this in 50 seconds — better quality than the 72B + raw CSV at 12 minutes. Quietly remarkable.
| | Before (raw CSV) | After (aggregated) |
|---|---|---|
| Time | 75s | 50s |
| Numbers | wildly off | exact |
| Verdict | not usable as-is | quote directly |
72B + pre-aggregated input (12m 13s)
72B's numbers also match exactly (well, since they're being quoted from pre-aggregated data, that's expected). The proposal quality was the strongest of the four patterns:
Reduce Amazon dependency
- Current: online shopping (Amazon family) is 25.1% of total (¥693,663).
- Suggestion: stick to essentials only, regular review, avoid impulse buys.
- Expected savings: ¥57,805/month average (25% reduction) → ¥693,660/year
...wait, hold on. Annual Amazon spend was ¥693,663. The "savings" 72B suggests is ¥693,660. That's basically the same number. So the proposal is effectively "stop buying on Amazon entirely (100%)" — definitely not 25%. Apparently 72B's percentage arithmetic isn't bulletproof either.
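For the record, a genuine 25% reduction works out quite differently (quick arithmetic, not 72B's output):

annual_amazon = 693_663
print(annual_amazon * 0.25)       # ≈ ¥173,416/year for a real 25% cut
print(annual_amazon * 0.25 / 12)  # ≈ ¥14,451/month
print(annual_amazon / 12)         # ≈ ¥57,805/month, i.e. the 100% figure 72B quoted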
That aside, the lifestyle hypothesis section was kind of striking. Here's what 72B observed:
- Heavy reliance on apps and subscriptions: "App/subscription" category is 10.5% of total
- Frequent international travel: "Travel/airline" is 15.1%, with notable overseas charges
- Frequent online shopping: "Online (Amazon)" is 25.1% of total
It's just one card's data, so this isn't a complete picture — but if I fed an AI my full household financials, the analysis and advice would probably go a lot deeper.
Summary: 4 patterns
| # | Model | Input | Time | Numerical accuracy | Proposal quality |
|---|---|---|---|---|---|
| 1 | 7B | Raw CSV | 75s | ❌ Numbers way off | △ |
| 2 | 72B | Raw CSV | 12m 9s | △ Misread monthly trend | ○ |
| 3 | 7B | Aggregated | 50s | ✅ Exact | ○ Some repetition |
| 4 | 72B | Aggregated | 12m 13s | ✅ Exact | ◎ Best (mind the % math) |
Quietly notable: 72B takes ~12 minutes regardless of input size (shrinking the prompt didn't change wall-clock time much). Output generation is the bottleneck. Which strengthens the case for "small model + pre-aggregate" as the cost-effective default.
7. Cross-check: the actual graphs
Before trusting any of the AI output, let me put the real numbers on charts using the spreadsheet tool (pandas).
Monthly spending
Average ¥230,130/month, but the range is ¥69,961 (lowest) to ¥493,072 (highest) — about a 7× spread. The 72B's "gradually increasing" claim was a bit off the mark; the reality is bouncy.
Category share
"Other" being 32% is because my categorization rule is sloppy. I just wrote a simple "if the merchant name contains keyword X, bucket Y" rule, and lots of merchants didn't match any keyword and ended up in "Other." Reading meaning from a merchant name is exactly the kind of thing AI is good at, so next time I'll let the AI do the categorization itself.
Top 15 merchants
Amazon at ¥421,978 (105 tx) is far and away #1. Amazon really is too convenient...
Weekday rhythm
Tuesday alone is ¥692,549 — way above the rest. Probably because that's when most of the subscription auto-charges land.
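The weekday figures themselves are a one-liner in pandas (same df as before):

by_weekday = df.groupby(df["利用日"].dt.day_name())["利用金額"].sum()
print(by_weekday.sort_values(ascending=False))  # Tuesday on top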
8. Today's takeaways
Separate "aggregation" from "interpretation"
| AI is bad at | AI is good at |
|---|---|
| Multi-row sum/average (numbers go wildly off) | Categorization (interpreting fuzzy meaning) |
| Percentage math (saw "25% off → 100% off") | Pattern recognition / hypothesis generation |
| Distributed aggregation like monthly totals | Narrative interpretation, savings proposals |
→ Aggregation is the spreadsheet tool's job; interpretation is the AI's. Split the work and you get both speed and accuracy. "Data prep matters before analysis" — yeah, that old saying really is true. Note to self.
Sometimes input quality beats raw size
"7B + pre-aggregated input in 50 seconds" outperformed "72B + raw CSV in 12 minutes". Sometimes you don't need a bigger model — you need cleaner input. Felt that one today.
The local-LLM angle
Feeding 12 months of raw credit card data to an AI without a single byte going to the cloud — it was surprisingly stress-free. This is one of the spots local LLMs really shine: personal data, or anything you'd rather not send to the cloud, belongs here.
9. Tech details (Claude explains)
The technical bits, written up by my AI pair.
- SCP transfer to the DGX (mDNS, no IP needed)
NVIDIA Sync auto-configures a Host alias in ~/AppData/Local/NVIDIA Corporation/Sync/config/ssh_config:
Host spark-XXXX.local
Hostname spark-XXXX.local
User [user]
Port 22
IdentityFile "...\\nvsync.key"
Which means I can SSH/SCP using spark-XXXX.local without ever looking up an IP. The .local suffix uses mDNS (Multicast DNS) for hostname resolution within the LAN.
Transfer command (one line, from PowerShell on the Windows side):
scp -r "C:\Users\[user]\Desktop\docs\dgx\csv" spark-XXXX.local:/home/[user]/personal/dgx-100-experiments/private-data/credit-card-csv
- Ollama install + the sudo-TTY catch + GPU detection log
Ollama install:
curl -fsSL https://ollama.com/install.sh | sh
Running this through Claude Code's Bash, it errored at the sudo password prompt — an interactive TTY is required there:
sudo: a terminal is required to read the password
Reopened a separate SSH session, ran the same command manually, and it went through.
Once installed, systemd auto-starts the service. The GPU detection log via journalctl -u ollama:
inference compute id=GPU-986c194b... name=CUDA0 description="NVIDIA GB10"
total="121.7 GiB" available="79.0 GiB"
default_num_ctx=262144
- VRAM (DGX Spark unified memory): 121.7 GiB
- Default context: 262,144 tokens
Compared with a typical RTX 4090 (24 GB VRAM, 8K–32K default context), the gap is significant.
- Loading both models simultaneously
ollama pull qwen2.5:7b # 4.7 GB
ollama pull qwen2.5:72b # 47 GB
After loading both, ollama ps shows:
NAME SIZE PROCESSOR CONTEXT
qwen2.5:72b 61 GB 100% GPU 32768
qwen2.5:7b 8.2 GB 100% GPU 32768
Total ~69 GB used out of 79 GB available. Both models stay resident, so switching between them is instant.
- Custom CSV parser for the credit card data
Three quirks needed handling: CP932 encoding, no quotes (commas in some merchant names break parsing), and a trailing summary row in each file.
import pandas as pd
from pathlib import Path

# COLUMNS = the 7 column names of the statement CSV, defined elsewhere in the
# script; 利用日 is the date column, 利用金額 the amount column.

def parse_line(line: str) -> list[str] | None:
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 7 or not fields[0]:
        return None  # skip blank/summary rows
    if len(fields) > 7:
        # unquoted commas inside the merchant name: re-join the middle fields
        merchant = ",".join(fields[1:-5])
        fields = [fields[0], merchant] + fields[-5:]
    return fields

def load_one(path: Path) -> pd.DataFrame:
    rows = []
    with path.open(encoding="cp932") as f:  # Windows Japanese encoding
        next(f)  # skip header (cardholder metadata)
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                rows.append(parsed)
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["利用日"] = pd.to_datetime(df["利用日"], format="%Y/%m/%d")
    df["利用金額"] = df["利用金額"].astype(int)
    return df
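Merging the 12 files then comes down to one concat (continuing from the code above; the directory is the one the scp command pushed to, with the relative path assuming the repo root as working directory):

csv_dir = Path("private-data/credit-card-csv")
df = pd.concat(
    [load_one(p) for p in sorted(csv_dir.glob("*.csv"))],
    ignore_index=True,
)
print(len(df))  # 383 transactions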
- Japanese fonts in matplotlib
japanize-matplotlib doesn't work on Python 3.12 — it imports distutils, which was removed from the standard library.
The modern replacement is matplotlib-fontja:
pip install matplotlib-fontja
import matplotlib_fontja # noqa: F401 ← just importing it sets up IPAexGothic
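With the font set up, Japanese labels just render. A minimal sketch, assuming the df from the parser section:

import matplotlib.pyplot as plt
import matplotlib_fontja  # noqa: F401  (importing it switches matplotlib to a Japanese font)

monthly = df.groupby(df["利用日"].dt.to_period("M"))["利用金額"].sum()
ax = monthly.plot(kind="bar", title="月別利用額")  # Japanese title renders without tofu
ax.set_ylabel("円")
plt.tight_layout()
plt.savefig("monthly_spend.png")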
- Calling Ollama from Python
The official ollama Python client is straightforward:
import ollama
client = ollama.Client()
stream = client.chat(
model="qwen2.5:72b",
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
],
options={"temperature": 0.3},
stream=True,
)
for chunk in stream:
print(chunk["message"]["content"], end="", flush=True)
Streaming makes long generation easier to watch unfold.
Tomorrow: Day 4
Day 4 plan: let a local AI sort 20,000 iPhone photos.
The actual goal is to have a local image-recognition model (CLIP family?) clean up my photo library so I can stop paying iCloud for storage upgrades...!



