PEPPERCORN

[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements

Intro

Day 3: I'm going to hand a year of credit card statements over to a local LLM and see what it can do.

This is experiment #3.

What I'm using today: DGX Spark + Ollama + Qwen2.5 (comparing 7B vs 72B). Ollama is the de facto local-LLM runtime, and Qwen2.5 is a multilingual model from Alibaba (China) that apparently handles Japanese reasonably well.


Today's setup

  • Data: 12 months of credit card statements from a single card.
  • Volume: 383 transactions, ¥2,761,555 in total spend.
  • Goal: get the AI to spot waste patterns and propose savings.
  • Comparison axes:
    • Model size: 7B (light) vs 72B (heavy)
    • Input format: raw CSV vs pandas-aggregated summary
    • 4 patterns total

Takeaway: "If you ask an AI to aggregate raw data, the numbers come out way off." / "If you pre-aggregate with a spreadsheet tool first and then feed the AI, you get fast and accurate results." A small but practical finding.


1. Get the CSVs onto the DGX

Log into the credit card company's web statements page on myPC1 (my Windows laptop), download 12 months of CSVs, then push them to the DGX.

I deliberately skipped GitHub for the transfer this time — once you push something, it's in the history forever, and credit card data shouldn't be there even briefly. Instead, I used direct PC-to-PC transfer over SSH (one command, finishes in seconds; details in the collapsibles at the end). The .gitignore excludes private-data/ too, so accidental commits are ruled out.
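
For completeness, the guard in .gitignore is just a one-line entry (shown as a sketch; `private-data/` is the directory named above):

```gitignore
# Keep private financial data out of version control entirely
private-data/
```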


2. Install Ollama

Ollama is the de facto runtime for local LLMs. One command should be enough.

There was a small password hiccup during install (details below), but eventually it was up and running.

The DGX Spark specs really show through:

  • Memory: 121 GB
  • Default context window: ~262,144 tokens

In other words: "throw a whole book at it, no problem" territory. Reassuring.
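
A rough sanity check on that claim (the tokens-per-character ratio is my ballpark assumption for Japanese text, not a measurement):

```python
# The full-year raw CSV fed in later is 22,132 characters. Assume a
# pessimistic 2 tokens per character (byte-level fallback on rare kanji).
raw_csv_chars = 22_132
context_window = 262_144  # Ollama's default context on the DGX Spark

worst_case_tokens = raw_csv_chars * 2
print(worst_case_tokens)  # 44264 -- still only ~17% of the window
assert worst_case_tokens < context_window
```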


3. Two model sizes: Qwen2.5 7B vs 72B

The strategy: same model family, different sizes. That way the differences come from size, not architecture.

  • 7B (light): ~4.7 GB, downloads in 5 minutes. Fast.
  • 72B (heavy): ~47 GB, 25 minutes to download. Slow but smart.

What does "B" mean? It's short for Billion: the number of "weights" inside the AI. The more weights, the more the model can remember, roughly speaking. So 7B has 7 billion weights, 72B has 72 billion.
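
A back-of-envelope check of the download sizes against those weight counts (my arithmetic; the result is consistent with the ~4-bit quantization Ollama ships by default, though I haven't verified the exact format):

```python
# (download size in bytes, parameter count) per model
models = {
    "qwen2.5:7b": (4.7e9, 7e9),
    "qwen2.5:72b": (47e9, 72e9),
}

for name, (size_bytes, params) in models.items():
    # ~0.65-0.67 bytes per weight, i.e. just over 4 bits per weight
    print(f"{name}: {size_bytes / params:.2f} bytes/weight")
```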

Loading both onto the DGX simultaneously, memory usage looks like:

| AI model | Memory occupied |
| --- | --- |
| qwen2.5:72b | 61 GB |
| qwen2.5:7b | 8.2 GB |
| Total | 69 GB |

69 GB. Spacious!


4. Prepping the CSVs

Once I had the CSVs in hand, three small headaches before they were ready for the AI:

  • Headache 1: An older encoding (Windows Japanese flavor) → needs converting to modern UTF-8
  • Headache 2: Some merchant names contain commas, which breaks naive CSV parsing
  • Headache 3: Each file has a "monthly total" line at the end that isn't actually data

Details in the collapsible. After cleanup, the 12 files merge into a single dataset:

| Item | Value |
| --- | --- |
| Transactions | 383 |
| Period | 12 months (1 year) |
| Total spend | ¥2,761,555 |
| Avg per tx | ¥7,210 |
| Median per tx | ¥3,000 |
| Largest single tx | ¥209,283 (overseas flight) |
| Smallest | ¥-3,980 (refund) |

Now to feed this to 7B and 72B and see what each of them says.


5. Experiment 1: Throw the raw CSV at the AI

No tricks: all 383 rows, straight at the AI. Prompt is the full ask: "As a household budget consultant, output category breakdown / monthly trend / waste patterns / savings suggestions / lifestyle hypothesis."

7B's answer (75 seconds)

...this is where the numbers go wildly off.

| Item | What 7B said | Real data | Match? |
| --- | --- | --- | --- |
| Amazon total | ¥2,014,386 (257 tx) | ¥693,663 (166 tx) | ❌ |
| Amazon Downloads | ¥2,014,386 (257 tx) | ¥80,323 (50 tx) | ❌ |
| Outdoor brand | ¥495,740 | ¥154,820 | ❌ |
| A local recreation venue | "¥49,574" cited | a different small charge actually exists | ❌ |

None of the numbers line up. Amazon total is roughly 3× off, Amazon Downloads about 25× off, and the cited venue context is a different charge entirely.

Reading 383 rows of CSV and computing totals turned out to be a heavy lift for the 7B model.

72B's answer (12m 9s)

What if we throw size at the problem? After 12 minutes of patience:

| Item | What 72B said | Real data | Match? |
| --- | --- | --- | --- |
| Amazon total | ¥635,792 (104 tx) | ¥693,663 (166 tx) | △ |
| AI/dev tools | ¥193,629 (21 tx) | ¥176,850 (24 tx) | △ |
| Travel | ¥487,555 (43 tx) | ¥416,268 (8 tx) | △ |

Not exact, but the totals are off by roughly 10-20% rather than by multiples, and there are no fabricated venues. A real improvement.

However — when asked about the monthly trend, here's what 72B said:

Month 1: ¥316,789 → Month 2: ¥229,600 → Month 3: ¥237,500 → ... → Month 12: ¥291,500
(Gradually increasing.)

The actual range is ¥69,961 (low) to ¥493,072 (high) — a chaotic up-and-down waveform. "Gradually increasing" isn't quite right. Even 72B isn't great at aggregating distributed data over a long CSV.


6. Experiment 2: Aggregate first, then feed the AI

If the AI struggles with aggregation, do the aggregation in a different tool first and only hand the AI the result.

The flow:

📥 Raw CSV (22,132 chars, 383 rows)
       ↓
🔧 Pre-aggregate with a spreadsheet tool (Python's pandas)
       ↓
📋 Aggregate summary (1,884 chars, ~90% smaller)
       ↓
🤖 Hand it to the AI (let it interpret and propose)

Python's pandas is a library for tabular data analysis: think spreadsheet functions, but scriptable and far more powerful than Excel formulas.
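
A minimal sketch of that pre-aggregation step, on toy data (column names here are illustrative; the real statements use Japanese headers):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "merchant": ["Amazon", "Amazon Downloads", "Airline"],
    "amount": [3_000, 1_500, 209_283],
})

# Monthly totals and per-merchant totals: the compact summary that
# replaces the full 383-row CSV in the prompt.
monthly = df.groupby(df["date"].dt.to_period("M"))["amount"].sum()
by_merchant = (df.groupby("merchant")["amount"]
                 .agg(["sum", "count"])
                 .sort_values("sum", ascending=False))

summary_text = f"Monthly totals:\n{monthly}\n\nTop merchants:\n{by_merchant}"
print(summary_text)  # this short string is what goes to the model
```

Judging from what the models quoted back, the real summary also carried category shares and min/max months; the point is simply that every number is computed deterministically before the LLM ever sees it.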

7B + pre-aggregated input (50 seconds)

Numbers are fully accurate now.

| Item | What 7B said | Real data | Match? |
| --- | --- | --- | --- |
| Amazon total | ¥693,663 | ¥693,663 | ✅ |
| AI/dev tools | ¥176,850 | ¥176,850 | ✅ |
| Monthly max | ¥493,072 | ¥493,072 | ✅ |
| Monthly min | ¥69,961 | ¥69,961 | ✅ |

Quoting straight from the pre-aggregated numbers, the hallucinations vanished.

And 7B did this in 50 seconds — better quality than the 72B + raw CSV at 12 minutes. Quietly remarkable.

| | Before (raw CSV) | After (aggregated) |
| --- | --- | --- |
| Time | 75s | 50s |
| Numbers | wildly off | exact |
| Verdict | not usable as-is | quote directly |

72B + pre-aggregated input (12m 13s)

72B's numbers also match exactly (well, since they're being quoted from pre-aggregated data, that's expected). The proposal quality was the strongest of the four patterns:

Reduce Amazon dependency

  • Current: online shopping (Amazon family) is 25.1% of total (¥693,663).
  • Suggestion: stick to essentials only, regular review, avoid impulse buys.
  • Expected savings: ¥57,805/month average (25% reduction) → ¥693,660/year

...wait, hold on. Annual Amazon spend was ¥693,663. The "savings" 72B suggests is ¥693,660. That's basically the same number. So the proposal is effectively "stop buying on Amazon entirely (100%)" — definitely not 25%. Apparently 72B's percentage arithmetic isn't bulletproof either.
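
Spelling the arithmetic out:

```python
annual_amazon = 693_663            # actual annual Amazon-family spend
proposed_monthly_saving = 57_805   # 72B's "25% reduction" figure

print(round(annual_amazon * 0.25))   # 173416 -- what 25% would really save per year
print(proposed_monthly_saving * 12)  # 693660 -- 72B's figure: ~100% of the spend
```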

That aside, the lifestyle hypothesis section was kind of striking. Here's what 72B observed:

  • Heavy reliance on apps and subscriptions: "App/subscription" category is 10.5% of total
  • Frequent international travel: "Travel/airline" is 15.1%, with notable overseas charges
  • Frequent online shopping: "Online (Amazon)" is 25.1% of total

It's just one card's data, so this isn't a complete picture — but if I fed an AI my full household financials, the analysis and advice would probably go a lot deeper.

Summary: 4 patterns

| # | Model | Input | Time | Numerical accuracy | Proposal quality |
| --- | --- | --- | --- | --- | --- |
| 1 | 7B | Raw CSV | 75s | ❌ Numbers way off | |
| 2 | 72B | Raw CSV | 12m 9s | △ Misread monthly trend | |
| 3 | 7B | Aggregated | 50s | ✅ Exact | ○ Some repetition |
| 4 | 72B | Aggregated | 12m 13s | ✅ Exact | ◎ Best (mind the % math) |

Quietly notable: 72B takes ~12 minutes regardless of input size (shrinking the prompt didn't change wall-clock time much). Output generation is the bottleneck. Which strengthens the case for "small model + pre-aggregate" as the cost-effective default.


7. Cross-check: the actual graphs

Before trusting any of the AI output, let me put the real numbers on charts using the spreadsheet tool (pandas).

Monthly spending


Average ¥230,130/month, but the range is ¥69,961 (lowest) to ¥493,072 (highest) — about a 7× spread. The 72B's "gradually increasing" claim was a bit off the mark; the reality is bouncy.

Category share


"Other" being 32% is because my categorization rule is sloppy. I just wrote a simple "if the merchant name contains keyword X, bucket Y" rule, and lots of merchants didn't match any keyword and ended up in "Other." Reading meaning from a merchant name is exactly the kind of thing AI is good at, so next time I'll let the AI do the categorization itself.
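
For reference, the sloppy rule looks roughly like this (the keyword lists below are illustrative, not my actual rule set):

```python
# "If the merchant name contains keyword X, bucket Y" -- first match wins.
CATEGORY_KEYWORDS = {
    "Online (Amazon)": ["AMAZON", "AMZN"],
    "Travel/airline": ["AIR", "HOTEL"],
    "App/subscription": ["APPLE.COM", "GOOGLE"],
}

def categorize(merchant: str) -> str:
    upper = merchant.upper()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in upper for kw in keywords):
            return category
    return "Other"  # anything unmatched lands here -- hence the 32%

print(categorize("AMAZON DOWNLOADS"))  # Online (Amazon)
print(categorize("Some Local Shop"))   # Other
```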

Top 15 merchants


Amazon at ¥421,978 (105 tx) is far and away #1. Amazon really is too convenient...

Weekday rhythm


Tuesday alone is ¥692,549 — way above the rest. Probably because that's when most of the subscription auto-charges land.
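
The weekday totals themselves are one groupby away (toy data again; column names illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    # 2024-01-02 and 2024-01-09 are Tuesdays, 2024-01-03 a Wednesday
    "date": pd.to_datetime(["2024-01-02", "2024-01-09", "2024-01-03"]),
    "amount": [10_000, 5_000, 2_000],
})

weekday_totals = df.groupby(df["date"].dt.day_name())["amount"].sum()
print(weekday_totals)  # Tuesday 15000, Wednesday 2000
```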


8. Today's takeaways

Separate "aggregation" from "interpretation"

| AI is bad at | AI is good at |
| --- | --- |
| Multi-row sums/averages (numbers go wildly off) | Categorization (interpreting fuzzy meaning) |
| Percentage math (a "25% reduction" that was really 100%) | Pattern recognition / hypothesis generation |
| Distributed aggregation like monthly totals | Narrative interpretation, savings proposals |

Aggregation is the spreadsheet tool's job; interpretation is the AI's. When you split the work, things go fast and accurate. "Data prep matters before analysis" — yeah, that old saying really is true. Note to self.

Sometimes input quality beats raw size

"7B + pre-aggregated input in 50 seconds" outperformed "72B + raw CSV in 12 minutes". Sometimes you don't need a bigger model — you need cleaner input. Felt that one today.

The local-LLM angle

Feeding 12 months of raw credit card data to an AI without a single byte going to the cloud turned out to be surprisingly stress-free. This is one of the spots local LLMs really shine: personal information, and anything else you'd rather not send to the cloud, is exactly what they're for.


9. Tech details (Claude explains)

The technical bits, written up by my AI pair.

  1. SCP transfer to the DGX (mDNS, no IP needed)

NVIDIA Sync auto-configures a Host alias in ~/AppData/Local/NVIDIA Corporation/Sync/config/ssh_config:

Host spark-XXXX.local
  Hostname spark-XXXX.local
  User [user]
  Port 22
  IdentityFile "...\\nvsync.key"

Which means I can SSH/SCP using spark-XXXX.local without ever looking up an IP. The .local suffix uses mDNS (Multicast DNS) for hostname resolution within the LAN.

Transfer command (one line, from PowerShell on the Windows side):

scp -r "C:\Users\[user]\Desktop\docs\dgx\csv" spark-XXXX.local:/home/[user]/personal/dgx-100-experiments/private-data/credit-card-csv
  2. Ollama install + the sudo-TTY catch + GPU detection log

Ollama install:

curl -fsSL https://ollama.com/install.sh | sh

Running this through Claude Code's Bash, it errored at the sudo password prompt — an interactive TTY is required there:

sudo: a terminal is required to read the password

Reopened a separate SSH session, ran the same command manually, and it went through.

Once installed, systemd auto-starts the service. The GPU detection log via journalctl -u ollama:

inference compute id=GPU-986c194b... name=CUDA0 description="NVIDIA GB10"
total="121.7 GiB" available="79.0 GiB"
default_num_ctx=262144
  • VRAM (DGX Spark unified memory): 121.7 GiB
  • Default context: 262,144 tokens

Compared with a typical RTX 4090 (24 GB VRAM, 8K–32K default context), the gap is significant.

  3. Loading both models simultaneously
ollama pull qwen2.5:7b   # 4.7 GB
ollama pull qwen2.5:72b  # 47 GB

After loading both, ollama ps shows:

NAME           SIZE      PROCESSOR    CONTEXT    
qwen2.5:72b    61 GB     100% GPU     32768
qwen2.5:7b     8.2 GB    100% GPU     32768

Total ~69 GB used out of 79 GB available. Both models stay resident, so switching between them is instant.

  4. Custom CSV parser for the credit card data

Three quirks needed handling: CP932 encoding, no quotes (commas in some merchant names break parsing), and a trailing summary row in each file.

from pathlib import Path

import pandas as pd

# COLUMNS: the 7 column names of the statement CSV (defined elsewhere);
# the first is the transaction date, the second the merchant name.

def parse_line(line: str) -> list[str] | None:
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 7 or not fields[0]:
        return None  # skip blank lines and the trailing summary row
    if len(fields) > 7:
        # Merchant name contained commas: re-join the middle fields
        merchant = ",".join(fields[1:-5])
        fields = [fields[0], merchant] + fields[-5:]
    return fields


def load_one(path: Path) -> pd.DataFrame:
    rows = []
    with path.open(encoding="cp932") as f:  # Windows Japanese encoding
        next(f)  # skip header (cardholder metadata)
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                rows.append(parsed)
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["利用日"] = pd.to_datetime(df["利用日"], format="%Y/%m/%d")  # transaction date
    df["利用金額"] = df["利用金額"].astype(int)  # amount (yen)
    return df
  5. Japanese fonts in matplotlib

japanize-matplotlib doesn't work on Python 3.12 — it imports distutils, which was removed from the standard library.

The modern replacement is matplotlib-fontja:

pip install matplotlib-fontja
import matplotlib_fontja  # noqa: F401  ← just importing it sets up IPAexGothic
  6. Calling Ollama from Python

The official ollama Python client is straightforward:

import ollama

client = ollama.Client()
stream = client.chat(
    model="qwen2.5:72b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ],
    options={"temperature": 0.3},
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)

Streaming makes long generation easier to watch unfold.


Tomorrow: Day 4

Day 4 plan: let a local AI sort 20,000 iPhone photos.

The actual goal is to have a local image-recognition model (CLIP family?) clean up my photo library so I can stop paying iCloud for storage upgrades...!


#100ExperimentsWithDGX #LocalLLM #Ollama
