[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements
Intro
Day 3: I'm going to hand a year of credit card statements over to a local LLM and see what it can do.
This is experiment #3.
What I'm using today: DGX Spark + Ollama + Qwen2.5 (comparing 7B vs 72B). Ollama is the de-facto local-LLM runtime, and Qwen2.5 is a multilingual model from Alibaba (China) that handles Japanese reasonably well, apparently.
Today's setup
- Data: 12 months of credit card statements from a single card.
- Volume: 383 transactions, ¥2,761,555 in total spend.
- Goal: get the AI to spot waste patterns and propose savings.
Comparison axes:
- Model size: 7B (light) vs 72B (heavy)
- Input format: raw CSV vs pandas-aggregated summary
- → 4 patterns total
Takeaway: "If you ask an AI to aggregate raw data, the numbers come out way off." / "If you pre-aggregate with a spreadsheet tool first and then feed the AI, you get fast and accurate results." A small but practical finding.
1. Get the CSVs onto the DGX
Log into the credit card company's web statements page on myPC1 (my Windows laptop), download 12 months of CSVs, then push them to the DGX.
I deliberately skipped GitHub for the transfer this time — once you push something, it's in the history forever, and credit card data shouldn't be there even briefly. Instead, I used direct PC-to-PC transfer over SSH (one command, finishes in seconds; details in the collapsibles at the end). The .gitignore excludes private-data/ too, so accidental commits are ruled out.
2. Install Ollama
Ollama is the de-facto runtime for local LLMs. One command should be enough.
There was a small password hiccup during install (details below), but eventually it was up and running.
The DGX Spark specs really show through:
- Memory: 121 GB
- Default context window: ~262,144 tokens
In other words: "throw a whole book at it, no problem" territory. Reassuring.
3. Two model sizes: Qwen2.5 7B vs 72B
The strategy: same model family, different sizes. That way the differences come from size, not architecture.
- 7B (light): ~4.7 GB, downloads in 5 minutes. Fast.
- 72B (heavy): ~47 GB, 25 minutes to download. Slow but smart.
What does "B" mean? It's short for Billion, the number of "weights" inside the AI. More weights means it can remember more, basically. So 7B has 7 billion weights, 72B has 72 billion.
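As a rough sanity check on those download sizes: file size is roughly parameter count times bits per weight. Assuming the Ollama tags use roughly 4-bit quantization (my assumption; I didn't verify the exact quantization of these tags), the math lands in the right ballpark:

def approx_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    # ~4.5 bits/weight approximates 4-bit quantization plus overhead (assumption)
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(approx_size_gb(7))   # ≈ 3.9 GB, vs. the 4.7 GB download
print(approx_size_gb(72))  # ≈ 40.5 GB, vs. the 47 GB download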
Loading both onto the DGX simultaneously, memory usage looks like:
| AI model | Memory occupied |
|---|---|
| qwen2.5:72b | 61 GB |
| qwen2.5:7b | 8.2 GB |
| Total | 69 GB |
69 GB. Spacious!
4. Prepping the CSVs
Once I had the CSVs in hand, three small headaches before they were ready for the AI:
- Headache 1: An older encoding (Windows Japanese flavor) → needs converting to modern UTF-8
- Headache 2: Some merchant names contain commas, which breaks naive CSV parsing
- Headache 3: Each file has a "monthly total" line at the end that isn't actually data
Details in the collapsible. After cleanup, the 12 files merge into a single dataset:
| Item | Value |
|---|---|
| Transactions | 383 |
| Period | 12 months (1 year) |
| Total spend | ¥2,761,555 |
| Avg per tx | ¥7,210 |
| Median per tx | ¥3,000 |
| Largest single tx | ¥209,283 (overseas flight) |
| Smallest | ¥-3,980 (refund) |
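For reference, the table above is exactly the kind of thing pandas spits out in a few lines. A minimal sketch, assuming the merged DataFrame df built by the parser in the tech details (利用金額 is the amount column):

amount = df["利用金額"]  # amount column of the merged DataFrame
print("Transactions: ", len(df))
print("Total spend:  ", amount.sum())
print("Avg per tx:   ", round(amount.mean()))
print("Median per tx:", amount.median())
print("Largest tx:   ", amount.max())
print("Smallest tx:  ", amount.min())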
Now to feed this to 7B and 72B and see what each of them says.
5. Experiment 1: Throw the raw CSV at the AI
No tricks: all 383 rows, straight at the AI. Prompt is the full ask: "As a household budget consultant, output category breakdown / monthly trend / waste patterns / savings suggestions / lifestyle hypothesis."
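For concreteness, the prompt looked roughly like this. This is a paraphrase, not the verbatim text, and raw_csv_text is simply the merged CSV as a string:

SYSTEM_PROMPT = "You are a household budget consultant."  # paraphrase, not verbatim

user_prompt = (
    "Here are 12 months of credit card transactions as CSV.\n"
    "Output: (1) category breakdown, (2) monthly trend, (3) waste patterns, "
    "(4) savings suggestions, (5) a lifestyle hypothesis.\n\n"
    + raw_csv_text  # all 383 rows, ~22,000 characters
)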
7B's answer (75 seconds)
...this is where the numbers go wildly off.
| Item | What 7B said | Real data | Match? |
|---|---|---|---|
| Amazon total | ¥2,014,386 (257 tx) | ¥693,663 (166 tx) | ❌ |
| Amazon Downloads | ¥2,014,386 (257 tx) | ¥80,323 (50 tx) | ❌ |
| Outdoor brand | ¥495,740 | ¥154,820 | ❌ |
| A local recreation venue | "¥49,574" cited | (a different small charge actually exists) | ❌ |
None of the numbers line up. The Amazon total is roughly 3× off, Amazon Downloads about 25× off, and the venue it cited actually corresponds to a different charge entirely.
Reading 383 rows of CSV and computing totals turned out to be a heavy lift for the 7B model.
72B's answer (12m 9s)
What if we throw size at the problem? After 12 minutes of patience:
| Item | What 72B said | Real data | Match? |
|---|---|---|---|
| Amazon total | ¥635,792 (104 tx) | ¥693,663 (166 tx) | ❌ |
| AI/dev tools | ¥193,629 (21 tx) | ¥176,850 (24 tx) | ❌ |
| Travel | ¥487,555 (43 tx) | ¥416,268 (8 tx) | ❌ |
Not exact, but the off-by amounts are within ~10%, and there are no fabricated venues. A real improvement.
However — when asked about the monthly trend, here's what 72B said:
Month 1: ¥316,789 → Month 2: ¥229,600 → Month 3: ¥237,500 → ... → Month 12: ¥291,500
(Gradually increasing.)
The actual range is ¥69,961 (low) to ¥493,072 (high) — a chaotic up-and-down waveform. "Gradually increasing" isn't quite right. Even 72B isn't great at aggregating distributed data over a long CSV.
6. Experiment 2: Aggregate first, then feed the AI
If the AI struggles with aggregation, do the aggregation in a different tool first and only hand the AI the result.
The flow:
📥 Raw CSV (22,132 chars, 383 rows)
↓
🔧 Pre-aggregate with a spreadsheet tool (Python's pandas)
↓
📋 Aggregate summary (1,884 chars, ~90% smaller)
↓
🤖 Hand it to the AI (let it interpret and propose)
Python's pandas = a spreadsheet-like library for tabular data analysis, but scriptable and vastly more powerful than Excel functions.
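Here's a minimal sketch of the pre-aggregation step. It's my reconstruction of the idea rather than the exact script, and the merchant column name 利用先 is a guess; the point is that pandas does the sums and only the compact text summary goes into the prompt:

import pandas as pd

# df: merged transactions (see the parser in the tech details).
# 利用日 = date, 利用金額 = amount, 利用先 = merchant (column name assumed).
monthly = df.groupby(df["利用日"].dt.to_period("M"))["利用金額"].sum()
top_merchants = (
    df.groupby("利用先")["利用金額"]
      .agg(["sum", "count"])
      .nlargest(15, "sum")
)

summary_text = "\n".join([
    "## Monthly totals", monthly.to_string(),
    "## Top merchants (total, count)", top_merchants.to_string(),
])
# summary_text comes out at ~1-2k characters; this replaces the
# 22k-character raw CSV in the prompt.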
7B + pre-aggregated input (50 seconds)
Numbers are fully accurate now.
| Item | What 7B said | Real data | Match? |
|---|---|---|---|
| Amazon total | ¥693,663 | ¥693,663 | ✅ |
| AI/dev tools | ¥176,850 | ¥176,850 | ✅ |
| Monthly max | ¥493,072 | ¥493,072 | ✅ |
| Monthly min | ¥69,961 | ¥69,961 | ✅ |
Because it's quoting straight from the pre-aggregated numbers, the hallucinations vanished.
And 7B did this in 50 seconds — better quality than the 72B + raw CSV at 12 minutes. Quietly remarkable.
| | Before (raw CSV) | After (aggregated) |
|---|---|---|
| Time | 75s | 50s |
| Numbers | wildly off | exact |
| Verdict | not usable as-is | quote directly |
72B + pre-aggregated input (12m 13s)
72B's numbers also match exactly (well, since they're being quoted from pre-aggregated data, that's expected). The proposal quality was the strongest of the four patterns:
Reduce Amazon dependency
- Current: online shopping (Amazon family) is 25.1% of total (¥693,663).
- Suggestion: stick to essentials only, regular review, avoid impulse buys.
- Expected savings: ¥57,805/month average (25% reduction) → ¥693,660/year
...wait, hold on. Annual Amazon spend was ¥693,663. The "savings" 72B suggests is ¥693,660. That's basically the same number. So the proposal is effectively "stop buying on Amazon entirely (100%)" — definitely not 25%. Apparently 72B's percentage arithmetic isn't bulletproof either.
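For the record, a genuine 25% reduction works out quite differently (quick arithmetic, not 72B's output):

annual_amazon = 693_663
print(annual_amazon * 0.25)       # ≈ ¥173,416/year for a real 25% cut
print(annual_amazon * 0.25 / 12)  # ≈ ¥14,451/month
print(annual_amazon / 12)         # ≈ ¥57,805/month, i.e. the 100% figure 72B quoted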
That aside, the lifestyle hypothesis section was kind of striking. Here's what 72B observed:
- Heavy reliance on apps and subscriptions: "App/subscription" category is 10.5% of total
- Frequent international travel: "Travel/airline" is 15.1%, with notable overseas charges
- Frequent online shopping: "Online (Amazon)" is 25.1% of total
It's just one card's data, so this isn't a complete picture — but if I fed an AI my full household financials, the analysis and advice would probably go a lot deeper.
Summary: 4 patterns
| # | Model | Input | Time | Numerical accuracy | Proposal quality |
|---|---|---|---|---|---|
| 1 | 7B | Raw CSV | 75s | ❌ Numbers way off | △ |
| 2 | 72B | Raw CSV | 12m 9s | △ Misread monthly trend | ○ |
| 3 | 7B | Aggregated | 50s | ✅ Exact | ○ Some repetition |
| 4 | 72B | Aggregated | 12m 13s | ✅ Exact | ◎ Best (mind the % math) |
Quietly notable: 72B takes ~12 minutes regardless of input size (shrinking the prompt didn't change wall-clock time much). Output generation is the bottleneck. Which strengthens the case for "small model + pre-aggregate" as the cost-effective default.
7. Cross-check: the actual graphs
Before trusting any of the AI output, let me put the real numbers on charts using the spreadsheet tool (pandas).
Monthly spending
Average ¥230,130/month, but the range is ¥69,961 (lowest) to ¥493,072 (highest) — about a 7× spread. The 72B's "gradually increasing" claim was a bit off the mark; the reality is bouncy.
Category share
"Other" being 32% is because my categorization rule is sloppy. I just wrote a simple "if the merchant name contains keyword X, bucket Y" rule, and lots of merchants didn't match any keyword and ended up in "Other." Reading meaning from a merchant name is exactly the kind of thing AI is good at, so next time I'll let the AI do the categorization itself.
Top 15 merchants
Amazon at ¥421,978 (105 tx) is far and away #1. Amazon really is too convenient...
Weekday rhythm
Tuesday alone is ¥692,549 — way above the rest. Probably because that's when most of the subscription auto-charges land.
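The weekday figures themselves are a one-liner in pandas (same df as before):

by_weekday = df.groupby(df["利用日"].dt.day_name())["利用金額"].sum()
print(by_weekday.sort_values(ascending=False))  # Tuesday on top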
8. Today's takeaways
Separate "aggregation" from "interpretation"
| AI is bad at | AI is good at |
|---|---|
| Multi-row sum/average (numbers go wildly off) | Categorization (interpreting fuzzy meaning) |
| Percentage math (saw "25% off → 100% off") | Pattern recognition / hypothesis generation |
| Distributed aggregation like monthly totals | Narrative interpretation, savings proposals |
→ Aggregation is the spreadsheet tool's job; interpretation is the AI's. Split the work and you get both speed and accuracy. "Data prep matters before analysis" — yeah, that old saying really is true. Note to self.
Sometimes input quality beats raw size
"7B + pre-aggregated input in 50 seconds" outperformed "72B + raw CSV in 12 minutes". Sometimes you don't need a bigger model — you need cleaner input. Felt that one today.
The local-LLM angle
Feeding 12 months of raw credit card data to an AI without a single byte going to the cloud — it was surprisingly stress-free. This is one of the spots local LLMs really shine: personal data, or anything you'd rather not send to the cloud, belongs here.
9. Tech details (Claude explains)
The technical bits, written up by my AI pair.
- SCP transfer to the DGX (mDNS, no IP needed)
NVIDIA Sync auto-configures a Host alias in ~/AppData/Local/NVIDIA Corporation/Sync/config/ssh_config:
Host spark-XXXX.local
Hostname spark-XXXX.local
User [user]
Port 22
IdentityFile "...\\nvsync.key"
Which means I can SSH/SCP using spark-XXXX.local without ever looking up an IP. The .local suffix uses mDNS (Multicast DNS) for hostname resolution within the LAN.
Transfer command (one line, from PowerShell on the Windows side):
scp -r "C:\Users\[user]\Desktop\docs\dgx\csv" spark-XXXX.local:/home/[user]/personal/dgx-100-experiments/private-data/credit-card-csv
- Ollama install + the sudo-TTY catch + GPU detection log
Ollama install:
curl -fsSL https://ollama.com/install.sh | sh
Running this through Claude Code's Bash, it errored at the sudo password prompt — an interactive TTY is required there:
sudo: a terminal is required to read the password
Reopened a separate SSH session, ran the same command manually, and it went through.
Once installed, systemd auto-starts the service. The GPU detection log via journalctl -u ollama:
inference compute id=GPU-986c194b... name=CUDA0 description="NVIDIA GB10"
total="121.7 GiB" available="79.0 GiB"
default_num_ctx=262144
- VRAM (DGX Spark unified memory): 121.7 GiB
- Default context: 262,144 tokens
Compared with a typical RTX 4090 (24 GB VRAM, 8K–32K default context), the gap is significant.
- Loading both models simultaneously
ollama pull qwen2.5:7b # 4.7 GB
ollama pull qwen2.5:72b # 47 GB
After loading both, ollama ps shows:
NAME SIZE PROCESSOR CONTEXT
qwen2.5:72b 61 GB 100% GPU 32768
qwen2.5:7b 8.2 GB 100% GPU 32768
Total ~69 GB used out of 79 GB available. Both models stay resident, so switching between them is instant.
- Custom CSV parser for the credit card data
Three quirks needed handling: CP932 encoding, no quotes (commas in some merchant names break parsing), and a trailing summary row in each file.
import pandas as pd
from pathlib import Path

# COLUMNS = the 7 column names of the statement CSV, defined elsewhere in the
# script; 利用日 is the date column, 利用金額 the amount column.

def parse_line(line: str) -> list[str] | None:
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 7 or not fields[0]:
        return None  # skip blank/summary rows
    if len(fields) > 7:
        # unquoted commas inside the merchant name: re-join the middle fields
        merchant = ",".join(fields[1:-5])
        fields = [fields[0], merchant] + fields[-5:]
    return fields

def load_one(path: Path) -> pd.DataFrame:
    rows = []
    with path.open(encoding="cp932") as f:  # Windows Japanese encoding
        next(f)  # skip header (cardholder metadata)
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                rows.append(parsed)
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["利用日"] = pd.to_datetime(df["利用日"], format="%Y/%m/%d")
    df["利用金額"] = df["利用金額"].astype(int)
    return df
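Merging the 12 files then comes down to one concat (continuing from the code above; the directory is the one the scp command pushed to, with the relative path assuming the repo root as working directory):

csv_dir = Path("private-data/credit-card-csv")
df = pd.concat(
    [load_one(p) for p in sorted(csv_dir.glob("*.csv"))],
    ignore_index=True,
)
print(len(df))  # 383 transactions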
- Japanese fonts in matplotlib
japanize-matplotlib doesn't work on Python 3.12 — it imports distutils, which was removed from the standard library.
The modern replacement is matplotlib-fontja:
pip install matplotlib-fontja
import matplotlib_fontja # noqa: F401 ← just importing it sets up IPAexGothic
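With the font set up, Japanese labels just render. A minimal sketch, assuming the df from the parser section:

import matplotlib.pyplot as plt
import matplotlib_fontja  # noqa: F401  (importing it switches matplotlib to a Japanese font)

monthly = df.groupby(df["利用日"].dt.to_period("M"))["利用金額"].sum()
ax = monthly.plot(kind="bar", title="月別利用額")  # Japanese title renders without tofu
ax.set_ylabel("円")
plt.tight_layout()
plt.savefig("monthly_spend.png")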
- Calling Ollama from Python
The official ollama Python client is straightforward:
import ollama
client = ollama.Client()
stream = client.chat(
model="qwen2.5:72b",
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
],
options={"temperature": 0.3},
stream=True,
)
for chunk in stream:
print(chunk["message"]["content"], end="", flush=True)
Streaming makes long generation easier to watch unfold.
Tomorrow: Day 4
Day 4 plan: let a local AI sort 20,000 iPhone photos.
The actual goal is to have a local image-recognition model (CLIP family?) clean up my photo library so I can stop paying iCloud for storage upgrades...!



