Originally published at chudi.dev
"I spend 10 hours a week just typing numbers from PDFs into spreadsheets."
That's what a freelance bookkeeper told me. She processes 50+ bank statements per month. Each one takes 10-15 minutes of manual transcription. The tedium is real.
I built StatementSync to fix this. One week from idea to production.
The Pain Point
Bookkeepers work with bank statements constantly. The workflow:
- Client sends PDF bank statement
- Open PDF, open spreadsheet
- Manually type each transaction
- Check for errors
- Repeat 50+ times per month
The tools that exist either:
- Require proprietary software (QuickBooks, Xero)
- Charge per file ($0.25-1.00 per statement)
- Have terrible accuracy (OCR garbage)
For someone processing 50+ statements monthly, per-file pricing adds up fast: at $0.25-1.00 per statement, 50 statements cost $12.50-50 a month, and the bill scales with volume.
Why Existing Tools Fail Bookkeepers
I asked the bookkeeper why she didn't use the tools that existed. The answers were consistent across five more interviews:
Per-file pricing punishes heavy users. At $0.25-1.00 per statement and 50+ statements monthly, the bill rivals a software subscription without subscription-level reliability: if one month's batch extracts badly, you have still paid for every file.
Proprietary software lock-in. QuickBooks can import bank data, but only after you configure a connection to the bank—which requires administrative access that clients often don't share with their bookkeeper. The bank statement PDF exists precisely because the connection doesn't.
OCR accuracy is unreliable. Generic OCR tools treat a bank statement like a document. Bank statements have fixed patterns: date column, description column, debit column, credit column, balance column. The patterns are predictable enough that OCR is overkill—and OCR's 80–85% accuracy on complex layouts creates more work correcting errors than manual entry would have.
The gap StatementSync fills: structure-aware parsing (not OCR, not AI), delivered via web with no setup, at a flat monthly rate. Three specific failures in the market, one specific fix.
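To illustrate what structure-aware parsing can look like, here is a minimal sketch. It assumes each transaction arrives as one line of extracted text in the order date, description, amount, balance. The line layout, the regex, and the names `ParsedRow` and `parseLine` are illustrative assumptions, not StatementSync's actual parser.

```typescript
// Hypothetical row shape; field names are illustrative.
interface ParsedRow {
  date: string; // as printed on the statement, e.g. "01/15"
  description: string;
  amount: number; // negative for debits
  balance: number;
}

// Assumed layout: MM/DD, free-text description, amount, running balance.
const ROW_PATTERN =
  /^(\d{2}\/\d{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})\s+(-?[\d,]+\.\d{2})$/;

// Strips thousands separators before converting to a number.
function toNumber(text: string): number {
  return Number(text.replace(/,/g, ""));
}

// Returns a parsed row, or null for lines that do not match the layout
// (column headers, page footers, wrapped description lines).
function parseLine(line: string): ParsedRow | null {
  const match = ROW_PATTERN.exec(line.trim());
  if (!match) return null;
  const [, date, description, amount, balance] = match;
  return {
    date,
    description,
    amount: toNumber(amount),
    balance: toNumber(balance),
  };
}
```

Because the pattern either matches a whole line or rejects it, a malformed line is skipped rather than turned into a wrong number. A real parser would need one such pattern per bank layout.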
Validation Before Code
MicroSaaSBot's validation phase scored StatementSync before I wrote a single line of code (the full system is described in "Introducing MicroSaaSBot"):
| Criteria | Score | Notes |
|---|---|---|
| Problem Severity | 8/10 | Daily pain point, high time cost |
| Persona Clarity | 9/10 | "Freelance bookkeeper processing 50+ statements/month" |
| Market Size | 7/10 | Niche but clear demand |
| Willingness to Pay | 8/10 | Currently paying for inferior solutions |
| Overall | 78/100 | Proceed |
Ideas scoring 60/100 or below get killed before any code is written; 70+ is the bar to build. This prevents building products nobody wants.
The validation phase confirmed:
- Real people have this problem
- They're already paying for solutions
- Current solutions have clear weaknesses to exploit
The Week
Day 1-2: Deep Validation
MicroSaaSBot's Researcher agent dug deeper:
- Competitive analysis (TextSoap, HappyFox, manual OCR tools)
- Pricing research ($0.25-1.00 per file is standard)
- Feature gap analysis (batch upload, bank-specific parsing)
Key insight: Flat-rate pricing would be a massive differentiator. Heavy users hate per-file fees.
Day 3: Architecture
MicroSaaSBot's Architect agent designed the system:
Frontend: Next.js 15 (App Router)
Auth: Clerk (handles signup, OAuth)
Database: Supabase PostgreSQL
Storage: Supabase Storage (PDFs, exports)
Payments: Stripe (subscriptions)
PDF Parsing: unpdf (serverless-compatible)
Hosting: Vercel
Critical decision: Pattern-based extraction instead of LLM inference.
LLM extraction would cost $0.01-0.05 per statement in API calls. Pattern-based extraction costs nothing at runtime. For a flat-rate product, this is the difference between profit and loss. Stripe's subscription billing docs make clear that sustainable flat-rate pricing only works when marginal cost per unit is negligible—otherwise usage spikes destroy margins.
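The margin argument can be made concrete with a little arithmetic. This sketch assumes the $19/month Pro price described later in this post and looks only at extraction cost, ignoring payment fees and hosting. The function name and the volumes are illustrative.

```typescript
// All amounts are in cents to avoid floating-point drift.
const FLAT_PRICE_CENTS = 1900; // $19/month Pro tier

// Revenue left per subscriber after paying for extraction.
function monthlyMarginCents(
  statements: number,
  costPerStatementCents: number
): number {
  return FLAT_PRICE_CENTS - statements * costPerStatementCents;
}

// LLM extraction at the upper estimate of $0.05 per statement.
const typicalUser = monthlyMarginCents(50, 5); // 1650 cents left
const heavyUser = monthlyMarginCents(200, 5); // 900 cents left
const breakEvenUser = monthlyMarginCents(380, 5); // 0 cents left

// Pattern-based extraction has no per-statement cost.
const patternBased = monthlyMarginCents(380, 0); // 1900 cents left
```

Under these assumptions a single subscriber would need about 380 statements a month to erase the margin at $0.05 each, so the risk is concentrated in the heaviest users, which is exactly who unlimited pricing attracts.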
Day 4-6: Implementation
MicroSaaSBot's Developer agent built:
Day 4: Auth flow, database schema, file upload
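// Prisma schema (the Tier and Status enums are defined elsewhere in the schema)
model User {
  id                   String   @id @default(cuid())
  clerkId              String   @unique
  email                String
  subscriptionTier     Tier     @default(FREE)
  conversionsThisMonth Int      @default(0) // usage counter for the monthly quota
  lastResetAt          DateTime @default(now()) // when the counter was last reset
}

model Conversion {
  id               String  @id @default(cuid())
  userId           String
  originalFileName String
  status           Status  @default(PENDING)
  extractedData    Json?   // parsed transactions, once extraction finishes
  excelPath        String? // storage path of the Excel export
  csvPath          String? // storage path of the CSV export
}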
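The `conversionsThisMonth` and `lastResetAt` fields suggest a monthly quota check. Here is one way that check could work, as a sketch: the reset rule (roll over when the UTC calendar month changes), the `PRO` tier name, and the `recordConversion` helper are assumptions, and the limit of 3 comes from the free tier described later in this post.

```typescript
type Tier = "FREE" | "PRO"; // "PRO" is an assumed enum value

interface UsageState {
  subscriptionTier: Tier;
  conversionsThisMonth: number;
  lastResetAt: Date;
}

const FREE_MONTHLY_LIMIT = 3;

// True when `now` falls in a different UTC calendar month than `last`.
function isNewMonth(last: Date, now: Date): boolean {
  return (
    last.getUTCFullYear() !== now.getUTCFullYear() ||
    last.getUTCMonth() !== now.getUTCMonth()
  );
}

// Returns the updated usage state if the conversion is allowed,
// or null if a free-tier user has exhausted this month's quota.
function recordConversion(user: UsageState, now: Date): UsageState | null {
  const rolledOver = isNewMonth(user.lastResetAt, now);
  const used = rolledOver ? 0 : user.conversionsThisMonth;
  if (user.subscriptionTier === "FREE" && used >= FREE_MONTHLY_LIMIT) {
    return null;
  }
  return {
    ...user,
    conversionsThisMonth: used + 1,
    lastResetAt: rolledOver ? now : user.lastResetAt,
  };
}
```

Resetting lazily at the first conversion of a new month avoids a scheduled job, at the cost of a stale counter for users who never come back.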
Day 5: PDF parsing engine
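import { extractText, getDocumentProxy } from "unpdf";

// Transaction, detectBank, and getParser are defined elsewhere in the codebase.
async function extractTransactions(pdfBuffer: Buffer): Promise<Transaction[]> {
  // unpdf expects a Uint8Array rather than a Node.js Buffer
  const pdf = await getDocumentProxy(new Uint8Array(pdfBuffer));
  // mergePages: true returns one string instead of one string per page
  const { text } = await extractText(pdf, { mergePages: true });
  // Pattern matching for supported banks
  const bank = detectBank(text);
  const parser = getParser(bank); // Chase, BofA, Wells, Citi, Capital One
  return parser.extract(text);
}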
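The `detectBank` and `getParser` helpers are not shown in this post. A minimal sketch of how such a dispatch could be structured follows. The marker strings, the `Bank` keys, and the registry are all assumptions for illustration; real statements would need markers verified against actual PDFs.

```typescript
interface Transaction {
  date: string;
  description: string;
  amount: number;
}

interface BankParser {
  extract(text: string): Transaction[];
}

type Bank = "chase" | "bofa" | "wells" | "citi" | "capitalone";

// Placeholder markers: text assumed to appear somewhere on each statement.
const BANK_MARKERS: Record<Bank, string> = {
  chase: "JPMorgan Chase",
  bofa: "Bank of America",
  wells: "Wells Fargo",
  citi: "Citibank",
  capitalone: "Capital One",
};

// Returns the first bank whose marker appears in the text, or null.
function detectBank(text: string): Bank | null {
  const lower = text.toLowerCase();
  for (const bank of Object.keys(BANK_MARKERS) as Bank[]) {
    if (lower.includes(BANK_MARKERS[bank].toLowerCase())) return bank;
  }
  return null;
}

// Registry of bank-specific parsers, filled in as each bank is supported.
const parsers = new Map<Bank, BankParser>();

// Fails loudly for unsupported banks instead of guessing a layout.
function getParser(bank: Bank | null): BankParser {
  const parser = bank ? parsers.get(bank) : undefined;
  if (!parser) throw new Error(`Unsupported bank: ${bank ?? "unknown"}`);
  return parser;
}
```

Keeping one parser per bank behind a shared interface means a format change at one bank touches only that bank's parser.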
Day 6: Export generation, Stripe integration, dashboard
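The export step is not shown in this post. As a rough sketch of the CSV side, the function below quotes fields the way RFC 4180 describes. The column names and the `ExportRow` shape are assumptions, not StatementSync's actual export format.

```typescript
interface ExportRow {
  date: string;
  description: string;
  amount: number;
}

// Quotes a field if it contains a comma, quote, or line break,
// doubling any embedded quotes (RFC 4180 style).
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Builds a CSV document with a header row and CRLF line endings.
function toCsv(rows: ExportRow[]): string {
  const header = "Date,Description,Amount";
  const lines = rows.map((row) =>
    [
      escapeCsvField(row.date),
      escapeCsvField(row.description),
      row.amount.toFixed(2),
    ].join(",")
  );
  return [header, ...lines].join("\r\n");
}
```

Escaping matters here because merchant descriptions on bank statements often contain commas, which would otherwise shift columns in the spreadsheet.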
Day 7: Deployment
MicroSaaSBot's Deployer agent:
- Configured Vercel deployment
- Set up Supabase production
- Connected Stripe webhooks
- Ran smoke tests
Live by end of day.
The Technical Challenge
Problem: pdf-parse doesn't work on Vercel serverless.
pdf-parse has native dependencies that fail on Vercel's serverless runtime. I discovered this at 2 AM when the production build crashed.
Solution: Switch to unpdf.
unpdf is built for serverless from the ground up. No native dependencies, works perfectly on Vercel. The switch took 2 hours but saved the deployment.
If you're processing PDFs on Vercel, Netlify, or any serverless platform, use unpdf. Not pdf-parse. Save yourself the debugging.
The Product
StatementSync today:
Free Tier:
- 3 conversions/month
- Single file upload
- 7-day history
Pro Tier ($19/month):
- Unlimited conversions
- Batch upload (20 files)
- 90-day history
- Priority support
The $19/month flat rate is the differentiator. Process 50 statements? Same price. Process 200? Same price. Heavy users save money. Light users get simplicity. The full case for flat-rate over per-file pricing is in "flat-rate vs per-file SaaS pricing".
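A quick way to see where the flat rate starts to win is to compute the break-even volume against the $0.25-1.00 per-file range quoted earlier. The helper names are illustrative.

```typescript
// All amounts are in cents to keep the arithmetic exact.
const FLAT_RATE_CENTS = 1900; // $19/month

// Monthly bill under per-file pricing.
function perFileCostCents(statements: number, perFileCents: number): number {
  return statements * perFileCents;
}

// Smallest monthly volume at which per-file pricing costs at least
// as much as the flat rate.
function breakEvenStatements(perFileCents: number): number {
  return Math.ceil(FLAT_RATE_CENTS / perFileCents);
}

const atCheapestRate = breakEvenStatements(25); // 76 statements at $0.25
const atPriciestRate = breakEvenStatements(100); // 19 statements at $1.00
const heavyPerFileBill = perFileCostCents(200, 25); // 5000 cents, i.e. $50
```

So the flat rate pays for itself somewhere between 19 and 76 statements a month depending on the competitor's price, and a 200-statement user saves at least $31 a month.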
Results
| Metric | Value |
|---|---|
| Time to build | 7 days |
| Processing time | 3-5 seconds per statement |
| Extraction accuracy | 99% |
| Supported banks | 5 (Chase, BofA, Wells, Citi, Capital One) |
| Runtime cost per extraction | $0 (pattern-based) |
What I'd Do Differently
- Start with one bank - Supporting 5 banks on day one was overkill. Start with Chase (most common), then add others based on demand.
- Skip the dashboard MVP - Users just want to upload and download. The fancy dashboard came before proving the core value.
- Launch before Day 7 - I could have deployed a working version by Day 5 and iterated publicly.
The Validation Framework
The scoring rubric MicroSaaSBot used is replicable for any idea:
Severity (0–10): How much does this problem hurt? Daily annoyance = 7+. Occasional friction = below 5. Quantify the time or money cost if you can.
Persona clarity (0–10): Can you describe the person who has this problem in one sentence with specific attributes? "Freelance bookkeepers who process 50+ bank statements monthly" = 9. "People who work with documents" = 2.
Existing solutions (0–10): Are people already paying for something that partially solves this? Existing paid solutions mean validated willingness to pay. No solutions means either a greenfield opportunity or no demand—confirm which before proceeding.
Differentiation path (0–10): Do you have a specific unfair advantage or structural difference from what exists? StatementSync's pattern-based extraction enabling flat-rate pricing was the differentiation. A technically identical product with the same pricing would score a 3 here.
Threshold for go: 70+. Not 60—that was a typo in my notes that survived into documentation. At 60, the idea is salvageable but needs more validation work before coding.
Kill at 60 or below. Build at 70+. The 10 points between 60 and 70 are where founders rationalize themselves into building things nobody wants.
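The rubric can be written down as a small function. One loud assumption: this post does not say how MicroSaaSBot weights the criteria, so the sketch weights all four equally and scales the 0-40 total to 0-100. The type and function names are illustrative.

```typescript
// The four rubric criteria, each scored 0-10.
interface IdeaScores {
  severity: number;
  personaClarity: number;
  existingSolutions: number;
  differentiation: number;
}

type Verdict = "build" | "validate more" | "kill";

// Assumption: equal weights, so a 0-40 total scales to 0-100.
function overallScore(scores: IdeaScores): number {
  const total =
    scores.severity +
    scores.personaClarity +
    scores.existingSolutions +
    scores.differentiation;
  return total * 2.5;
}

// Follows the rule stated above: build at 70+, kill at 60 or below,
// and treat the gap between as "needs more validation".
function verdict(score: number): Verdict {
  if (score >= 70) return "build";
  if (score <= 60) return "kill";
  return "validate more";
}
```

Making the middle band an explicit third outcome keeps a 65 from being quietly rounded up to a go decision.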
The Lesson
Building fast doesn't mean building sloppy. It means:
- Validate before you code - Kill bad ideas early
- Architecture matters - Pattern-based vs LLM extraction was the key decision
- Launch before perfect - Iteration beats planning
MicroSaaSBot compressed weeks of work into days by handling the tedious parts automatically. I focused on the decisions that mattered.
StatementSync is proof that AI-assisted development can ship real products, not just demos.
The First Users
StatementSync's first five users came from a single Reddit comment in r/bookkeeping.
I described the problem—10+ hours transcribing bank PDFs monthly—and asked if anyone had found a good solution. Four people replied they'd tried existing tools but found them too expensive or too complex. One asked if I knew of a flat-rate option.
I shared the link. All five signed up within 24 hours. One converted to paid within 48 hours.
This is the signal the scoring rubric can't measure: whether real people in the target community respond to your positioning with recognition rather than explanation. "Finally" is a better signal than "that's interesting." Three of those first five users opened with some version of "finally."
The first month surfaced edge cases no amount of testing with sample statements would have caught. Chase statements processed correctly. Bank of America required a parser fix—their statement format changed in 2024 and the transaction date pattern didn't match. One user reported a missing transaction; it turned out to be a summary balance row that the parser was treating as a transaction.
Each edge case became a specific parser improvement. After six weeks, accuracy was effectively 100% for the five supported banks.
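The summary-row bug shows the kind of guard these fixes add. A minimal sketch, assuming summary rows can be recognized by their label; the label list and names here are hypothetical and would differ per bank.

```typescript
interface CandidateRow {
  description: string;
  amount: number;
}

// Hypothetical labels; each bank's statement uses its own wording.
const SUMMARY_LABELS = [
  "beginning balance",
  "ending balance",
  "total deposits",
  "total withdrawals",
];

// True when the row's description starts with a known summary label.
function isSummaryRow(row: CandidateRow): boolean {
  const normalized = row.description.trim().toLowerCase();
  return SUMMARY_LABELS.some((label) => normalized.startsWith(label));
}

// Removes summary rows so that only real transactions are exported.
function dropSummaryRows(rows: CandidateRow[]): CandidateRow[] {
  return rows.filter((row) => !isSummaryRow(row));
}
```

Each reported edge case can then become one more label or pattern plus a regression test built from the statement that exposed it.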
The lesson: building fast puts you in front of real users quickly. Real users reveal the edge cases that exist in the messy real world, not the controlled sample PDFs you test against. The 7-day build wasn't the end of the project—it was the beginning of the iteration loop that actually made the product good.
Related: Serverless PDF Processing: unpdf vs pdf-parse | Portfolio: StatementSync