I've been building Autoreport for the past few months alongside my day job. The idea was simple: every Monday morning, get a PDF in your inbox with your Stripe numbers from the previous week — no dashboards, no manual work.
Here's how I built it, the stack I chose, and the lessons I learned along the way.
## The problem
I kept opening Stripe every Monday to check the previous week. Revenue, payments, new customers. It wasn't hard — just friction. And because it required effort, I kept skipping it or doing it badly.
I wanted the numbers to come to me, not the other way around.
## The stack
The entire backend runs on AWS, deployed with Terraform:
- Lambda (Python) — one function per stage of the pipeline: data extraction, AI narrative generation, PDF building, email delivery
- EventBridge — triggers the pipeline every Monday morning
- S3 — stores raw Stripe data and generated reports
- DynamoDB — tenant registry
- SES — email delivery
- Secrets Manager — stores each customer's Stripe API key
- Bedrock (Claude Haiku) — generates the AI narrative from the weekly metrics
- API Gateway + Lambda — handles Paddle webhooks for subscription management
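As a rough sketch of how the per-tenant pieces fit together (the table name, secret naming scheme, and function names here are my own assumptions, not the actual Autoreport code), the Monday Lambda might resolve each tenant from the DynamoDB registry and fetch that tenant's Stripe key from Secrets Manager:

```python
DYNAMO_TABLE = "autoreport-tenants"       # hypothetical table name
SECRET_PREFIX = "autoreport/stripe-key/"  # hypothetical secret naming scheme


def stripe_secret_name(tenant_id: str) -> str:
    """Build the Secrets Manager name holding this tenant's Stripe key."""
    return f"{SECRET_PREFIX}{tenant_id}"


def load_tenant_stripe_key(tenant_id: str) -> str:
    """Fetch a tenant's read-only Stripe API key from Secrets Manager."""
    import boto3  # imported here so the pure helper above has no AWS dependency
    sm = boto3.client("secretsmanager")
    resp = sm.get_secret_value(SecretId=stripe_secret_name(tenant_id))
    return resp["SecretString"]


def list_active_tenants() -> list[dict]:
    """Scan the tenant registry; a full scan is fine at beta scale."""
    import boto3
    table = boto3.resource("dynamodb").Table(DYNAMO_TABLE)
    return table.scan()["Items"]
```

One secret per tenant (rather than one shared secret with a JSON map) keeps rotation and revocation independent per customer.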
The landing page is a static site on Netlify. Payments go through Paddle as Merchant of Record.
## How it works
1. Customer subscribes via Paddle checkout.
2. An automated SES email asks for their read-only Stripe API key.
3. The customer replies with the key, and I register them manually via a script (intentionally simple for the beta).
4. Every Monday, EventBridge triggers the pipeline once per tenant.
5. Lambda pulls the Stripe data, sends it to Bedrock, builds a PDF, and delivers it via SES.
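The Monday run above can be sketched roughly like this. The `summarize_week` reduction and `run_pipeline_for_tenant` names are my assumptions (the real stages are separate Lambdas), but the Stripe call uses the standard `stripe.Charge.list` API with a `created` filter:

```python
from datetime import datetime, timedelta, timezone


def summarize_week(charges: list[dict]) -> dict:
    """Reduce a week of Stripe charge objects to the report's headline metrics.

    Charges are assumed to follow Stripe's Charge shape, e.g.
    {"amount": 2500, "paid": True, "customer": "cus_123", ...},
    with amounts in the smallest currency unit (cents).
    """
    paid = [c for c in charges if c.get("paid")]
    return {
        "revenue": sum(c["amount"] for c in paid) / 100,
        "payments": len(paid),
        "customers": len({c["customer"] for c in paid if c.get("customer")}),
    }


def run_pipeline_for_tenant(stripe_key: str) -> None:
    """One tenant's Monday run: pull data, narrate, build the PDF, email it."""
    import stripe  # third-party dependency, imported lazily
    stripe.api_key = stripe_key
    since = datetime.now(timezone.utc) - timedelta(days=7)
    charges = stripe.Charge.list(created={"gte": int(since.timestamp())})
    metrics = summarize_week(charges["data"])
    # narrative = Bedrock (Claude Haiku) invoked with the metrics
    # pdf = report rendered from metrics + narrative, stored in S3
    # SES send_raw_email delivers the PDF as an attachment
```

Keeping the metric reduction pure (no API calls) makes it trivial to unit-test with fixture data, which matters once multiple tenants depend on the numbers being right.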
## What surprised me
Bedrock prompt engineering took longer than expected. The first version of the AI narrative was alarmist — "critical concern", "immediate attention" — for perfectly normal week-over-week fluctuations. I had to be very explicit in the prompt about tone, context, and when strong language is actually warranted.
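A sketch of the kind of tone constraints that helped, paraphrased rather than the actual Autoreport prompt (the 90% success-rate threshold is an illustrative number, not the real one):

```python
TONE_RULES = """\
Tone rules:
- Describe week-over-week changes matter-of-factly; normal fluctuation is not a finding.
- Never use phrases like "critical concern" or "immediate attention" unless the
  payment success rate is below 90% or there are open disputes.
- Do not speculate about causes; report what the numbers show.
"""


def build_narrative_prompt(metrics: dict) -> str:
    """Render the weekly metrics plus explicit tone constraints into one prompt."""
    lines = "\n".join(f"- {name}: {value}" for name, value in metrics.items())
    return (
        "Write a short, calm weekly summary of these Stripe metrics "
        "for a SaaS founder.\n\n"
        f"{lines}\n\n{TONE_RULES}"
    )
```

Spelling out the forbidden phrases verbatim worked better than asking abstractly for a "neutral tone".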
The Merchant of Record route (Paddle) was the right call for avoiding tax and VAT headaches as a solo founder in Spain. The approval process took a few days but was worth it.
## What's next
Right now it's in beta. If you run a SaaS on Stripe and want your week summarized every Monday without lifting a finger, give it a try at autoreport.dev.
Happy to answer any questions about the architecture or the build process.
Originally published on my personal blog
## Top comments (4)
The "friction makes you skip it" insight is so real. I built something similar for my own projects — an automated weekly review agent that pulls data from Google Search Console, Yandex Webmaster, Gumroad analytics, and GA4 into a single report every Monday. Same motivation: I kept telling myself I'd check the dashboards manually and then just... didn't.
Smart move going serverless with Lambda per pipeline stage. Curious about the AI narrative generation — do you find that the LLM consistently picks up on the important week-over-week changes, or do you have to be pretty prescriptive about what to highlight? That's been my biggest challenge: getting the narrative to surface the signal, not just restate the numbers.
That's exactly the same itch — good to know the motivation translates across different data sources.
On your question about signal vs. restating numbers: honestly, the LLM doesn't consistently surface the right signal on its own. You have to be pretty prescriptive. What worked for me was structuring the prompt around three things: (1) explicitly tell it which metrics matter most and in what order, (2) give it thresholds for when something is actually worth flagging vs. normal variance, and (3) tell it what NOT to say — things like "requires immediate attention" unless success rate drops below a defined threshold or there are multiple open disputes.
The most useful constraint I added was telling it to acknowledge week-over-week drops matter-of-factly without alarm, and to focus the narrative on what the data shows rather than what it might mean. Early on it kept inventing reasons for the drop — "this could indicate seasonal fluctuation or reduced marketing spend" — which is just noise when you have 2 transactions.
Still a work in progress, but being explicit about tone and thresholds made a bigger difference than prompt length.
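The "flag vs. report quietly" decision from point (2) can be reduced to a small pure check before the metric ever reaches the prompt. The thresholds below are illustrative, not the ones Autoreport uses:

```python
def should_flag(current: float, previous: float, sample_size: int,
                min_samples: int = 20, drop_threshold: float = 0.30) -> bool:
    """Flag a metric only when the drop is large AND the sample is big
    enough that the change is more than noise."""
    if sample_size < min_samples:  # "you have 2 transactions" -> stay quiet
        return False
    if previous <= 0:              # no baseline, nothing meaningful to compare
        return False
    drop = (previous - current) / previous
    return drop >= drop_threshold
```

Doing this gating in code, and only telling the LLM about metrics that passed it, is more reliable than hoping the model applies the thresholds consistently on its own.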
The three-part prompt structure is really practical — especially point (3) about telling the LLM what NOT to say. I hit the same issue with my weekly review agent: it would confidently explain why impressions dropped 12% when the actual answer was "you have 3 clicks total, this is statistical noise." Explicit thresholds for when to flag vs when to just report the number quietly solved most of the hallucinated insights problem.
This is such a cool “build once, deliver value weekly” idea.
Simple on the surface, but actually solving a real pain point for people using Stripe.