So I've been working on something for a while now. It's called TechVerse. It's a SaaS e-commerce platform, and I built the whole thing from the ground up using the MERN stack.
I want to talk about the cloud architecture side of things. Not the textbook version. The real version. The one where you're staring at your screen at 2am trying to figure out why your WebSocket connections keep dropping, or why your API response times just spiked to 4 seconds.
Let me walk you through it.
The Problem I Was Trying to Solve
Here's the situation. There are thousands of tech retailers in Nigeria who sell laptops, phones, accessories, all kinds of stuff. Most of them run their entire business from a physical shop. No website. No online store. Nothing.
Why? Because getting a custom e-commerce site built costs a fortune. And the international platforms? They charge in dollars. That's a dealbreaker when your local currency fluctuates every other week.
So I thought, what if I built a SaaS platform that lets these businesses spin up a professional online store for a fraction of the cost? Pricing in local currency. Optimized for local internet speeds. Local payment gateways baked right in.
That was the idea. Now I had to figure out how to actually build and deploy it.
Choosing the Stack
I went with what I know best. MongoDB, Express, React, Node. The classic MERN stack. But I made some specific choices that matter for production.
React 19 with Vite 7 on the frontend. Vite is ridiculously fast for builds. The dev experience alone is worth it, but more importantly, the production bundles are tiny. That matters when your users are on 3G connections.
Node.js 20 with Express on the backend. Nothing fancy here. It works. It scales. The ecosystem is massive. I added Socket.io for real-time features like order notifications and live inventory updates.
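To make the real-time part concrete, here's roughly what a Socket.io setup on top of Express looks like. Treat it as a sketch: the event and room names are placeholders, not the actual TechVerse ones.

```javascript
// server.js: minimal Express + Socket.io setup (names are illustrative)
const express = require('express');
const http = require('http');
const { Server } = require('socket.io');

const app = express();
const server = http.createServer(app);

// Socket.io shares the same HTTP server, so the load balancer only needs one target port
const io = new Server(server, {
  cors: { origin: process.env.CLIENT_ORIGIN }, // e.g. the CloudFront domain
});

io.on('connection', (socket) => {
  // A store dashboard joins a room scoped to its own store's events
  socket.on('store:join', (storeId) => socket.join(`store:${storeId}`));
});

// Elsewhere in the API: after an order is saved, push a live notification
function notifyNewOrder(storeId, order) {
  io.to(`store:${storeId}`).emit('order:created', order);
}

server.listen(process.env.PORT || 3000);
module.exports = { notifyNewOrder };
```

One caveat worth knowing: with more than one EC2 instance behind a load balancer, Socket.io also needs its Redis adapter and sticky sessions so broadcasts and the long-polling fallback keep working across instances.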
MongoDB Atlas for the database. I considered self-hosting on EC2, but honestly, managed databases save you so much headache. Automated backups, point-in-time recovery, monitoring. All handled. I went with an M10 cluster to start.
Redis through ElastiCache for caching and session management. This was a game changer for performance. More on that later.
The Architecture (And Why Each Piece Exists)
Alright, let's break down the actual AWS setup. I'll explain why I chose each service, not just what it does.
The Entry Point: Route 53 and CloudFront
Every request starts at Route 53 for DNS resolution. Simple enough. But the real magic is CloudFront.
CloudFront is AWS's CDN, and it has edge locations in Lagos. That's huge. It means my users in Nigeria are hitting a server that's physically close to them, not one sitting in Ireland or Virginia.
I configured CloudFront to do two things. Static file requests go to an S3 bucket where my React build lives. API requests get forwarded to my backend through the Application Load Balancer. One domain, two destinations. Clean and simple.
I also attached an ACM certificate here. Free SSL. No reason not to use it.
The Frontend: S3
The React app gets built by Vite, and the output goes straight into an S3 bucket. No servers involved. S3 serves static files incredibly well, and combined with CloudFront caching, my frontend loads in under 2 seconds on a 3G connection.
I set up error page redirects so that 404s go back to index.html. That's essential for single-page apps with client-side routing. Without it, refreshing any page that isn't the root would give you a blank screen.
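There's more than one way to wire that up. With S3 static website hosting as the origin, pointing the error document at index.html is enough; with a plain S3 REST origin you'd do the equivalent through CloudFront custom error responses instead. A minimal sketch of the first approach, with a placeholder bucket name:

```javascript
// spa-fallback.js: one way to do the redirect, via S3 static website hosting
// (bucket name and region are placeholders)
const { S3Client, PutBucketWebsiteCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({ region: process.env.AWS_REGION });

async function enableSpaFallback() {
  await s3.send(new PutBucketWebsiteCommand({
    Bucket: 'techverse-frontend',             // placeholder bucket name
    WebsiteConfiguration: {
      IndexDocument: { Suffix: 'index.html' },
      ErrorDocument: { Key: 'index.html' },   // 404s fall back to the SPA shell
    },
  }));
}

enableSpaFallback();
```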
The Backend: EC2 with Auto Scaling
Here's where it gets interesting. My Node.js API runs on EC2 instances inside a public subnet. I have two instances behind an Application Load Balancer, with an Auto Scaling Group that can spin up to four instances based on CPU utilization.
Why not Fargate or Lambda? Honestly, for a WebSocket-heavy application, EC2 gives you more control. Lambda has cold starts that would kill the real-time experience. Fargate is great but adds complexity I didn't need yet. EC2 with a good Auto Scaling policy hits the sweet spot.
The ALB distributes traffic evenly and handles health checks. If one instance goes down, traffic automatically routes to the healthy ones. No manual intervention needed.
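The health check endpoint itself is nothing exotic. Something along these lines, where the path and the readiness check are illustrative rather than the exact ones in the repo:

```javascript
// health.js: a lightweight endpoint for the ALB target group to probe
const express = require('express');
const mongoose = require('mongoose');

const router = express.Router();

router.get('/health', (req, res) => {
  // readyState 1 means Mongoose has an open connection to the database
  const dbUp = mongoose.connection.readyState === 1;

  if (!dbUp) {
    // A non-2xx response makes the ALB mark this instance unhealthy
    return res.status(503).json({ status: 'degraded', db: 'down' });
  }
  res.json({ status: 'ok', uptime: process.uptime() });
});

module.exports = router;
```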
The Data Layer: MongoDB Atlas and ElastiCache
MongoDB Atlas runs in its own VPC, peered with mine, so traffic between the API and the database never touches the public internet. Fast and secure.
ElastiCache Redis handles three things for me. Session storage, so users stay logged in across multiple EC2 instances. Response caching, so repeated database queries don't hit MongoDB every time. And rate limiting, so I can throttle abusive requests without adding load to my application servers.
Before I added Redis, my average API response time was around 400ms. After? Under 200ms. That's the kind of improvement that users actually feel.
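Here's a rough sketch of what that layer can look like as Express middleware. I'm assuming ioredis here, and the key names, TTLs, and limits are illustrative.

```javascript
// redis.js: a rough sketch of the caching and rate-limiting layer
// (key names, TTLs, and limits are illustrative; assuming ioredis)
const Redis = require('ioredis');

// One shared client per process (see the connection-pooling lesson below)
const redis = new Redis(process.env.REDIS_URL);

// Cache GET responses in Redis so repeated queries skip MongoDB
function cache(ttlSeconds = 60) {
  return async (req, res, next) => {
    const key = `cache:${req.originalUrl}`;
    const hit = await redis.get(key);
    if (hit) return res.json(JSON.parse(hit)); // serve straight from Redis

    // Wrap res.json so the outgoing payload is stored on its way out
    const original = res.json.bind(res);
    res.json = (body) => {
      redis.set(key, JSON.stringify(body), 'EX', ttlSeconds).catch(() => {});
      return original(body);
    };
    next();
  };
}

// Throttle abusive clients: N requests per IP per minute
function rateLimit(limit = 100) {
  return async (req, res, next) => {
    const key = `rl:${req.ip}`;
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, 60); // start the 1-minute window
    if (count > limit) return res.status(429).json({ error: 'Too many requests' });
    next();
  };
}

// Usage: app.get('/api/products', rateLimit(100), cache(120), listProducts);
module.exports = { redis, cache, rateLimit };
```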
Monitoring and Email: CloudWatch and SES
CloudWatch collects logs and metrics from everything. EC2 instances, the load balancer, Redis, all of it. I set up alarms for CPU spikes, memory usage, and error rates. If something breaks at 3am, I get a notification.
Amazon SES handles transactional emails. Order confirmations, password resets, shipping updates. It's cheap and reliable. Way better than trying to manage your own SMTP server.
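For context, sending a transactional email through SES from Node is a single SDK call. A bare-bones sketch, with the sender address and content made up for illustration:

```javascript
// email.js: bare-bones SES send (sender address and content are placeholders)
const { SESClient, SendEmailCommand } = require('@aws-sdk/client-ses');

const ses = new SESClient({ region: process.env.AWS_REGION });

async function sendOrderConfirmation(to, order) {
  await ses.send(new SendEmailCommand({
    Source: 'orders@techverse.example',        // must be a verified SES identity
    Destination: { ToAddresses: [to] },
    Message: {
      Subject: { Data: `Order ${order.id} confirmed` },
      Body: { Html: { Data: `<p>Thanks! Your order ${order.id} is on its way.</p>` } },
    },
  }));
}

module.exports = { sendOrderConfirmation };
```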
Backups
Everything gets backed up to S3. MongoDB Atlas handles its own backups, but I also dump snapshots to S3 for extra safety. CloudWatch logs go there too. Storage is cheap. Losing data is not.
The CI/CD Pipeline
This part I'm actually proud of. GitHub Actions handles everything.
When I push to the main branch, here's what happens. The pipeline runs tests. If they pass, it builds the React frontend and syncs it to S3. Then it deploys the backend to the EC2 instances behind the load balancer. Zero downtime. The whole process takes about 4 minutes.
I also have separate workflows for staging and production. Feature branches deploy to staging automatically. Production requires a manual approval step. That one extra click has saved me from shipping broken code more than once.
Stripe Integration
Payments go through Stripe. The integration is bidirectional. My EC2 instances send payment requests to Stripe's API, and Stripe sends webhook events back for things like successful charges, refunds, and subscription updates.
I handle webhooks on a dedicated endpoint with signature verification. Never trust incoming data without verifying it. That's a lesson you only need to learn once.
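Here's the shape of that endpoint in Express, as a sketch. The route path and env var names are placeholders, but the raw-body requirement and the constructEvent call are the parts that matter.

```javascript
// webhooks.js: Stripe webhook endpoint with signature verification
// (route path and env var names are placeholders)
const express = require('express');
const Stripe = require('stripe');

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);
const router = express.Router();

// Stripe needs the raw request body to verify the signature,
// so this route must not be parsed by the global express.json() middleware
router.post('/webhooks/stripe', express.raw({ type: 'application/json' }), (req, res) => {
  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body,                           // raw Buffer
      req.headers['stripe-signature'],    // signature header set by Stripe
      process.env.STRIPE_WEBHOOK_SECRET   // signing secret from the dashboard
    );
  } catch (err) {
    // Bad signature: reject instead of processing untrusted data
    return res.status(400).send(`Webhook error: ${err.message}`);
  }

  switch (event.type) {
    case 'payment_intent.succeeded':
      // mark the order as paid, notify the store, etc.
      break;
    case 'charge.refunded':
      // handle refunds
      break;
  }

  res.json({ received: true });
});

module.exports = router;
```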
What This Actually Costs
Here's the part everyone wants to know. My monthly AWS bill for this setup is roughly $100 to $110. That breaks down to about:
- EC2 instances (t3.small): $25-30
- MongoDB Atlas (M10): $57
- ElastiCache Redis (t3.micro): $12
- S3 and CloudFront: $5-10
- Route 53 and misc: $2-3
For a production SaaS platform with auto-scaling, CDN, managed database, caching, monitoring, and automated deployments, that's pretty reasonable. It can comfortably handle hundreds of concurrent users and thousands of daily requests.
Lessons I Learned the Hard Way
Let me share a few things that bit me during this process.
WebSocket connections through CloudFront need specific configuration. You have to set up the right cache behaviors and forward the Upgrade header. I spent an entire weekend debugging why Socket.io worked locally but not in production. The fix was three lines of CloudFront config.
Don't skip the VPC design. I initially put everything in a public subnet because it was easier. Then I realized my database was exposed to the internet. Moved it to a private subnet immediately. Take the time to set up your network properly from day one.
Redis connection pooling matters. My first implementation created a new Redis connection for every request. Under load, I was hitting connection limits within minutes. Connection pooling fixed it instantly.
Auto Scaling needs a cooldown period. Without it, your instances will scale up and down like a yo-yo. I set a 5-minute cooldown, and the scaling became smooth and predictable.
Environment variables are not optional. I had a brief moment where I accidentally committed a JWT secret to GitHub. Rotated it immediately and moved everything to AWS Systems Manager Parameter Store. Use it. It's free.
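Pulling secrets out of Parameter Store at startup is only a few lines with the SDK. The parameter names below are placeholders:

```javascript
// config.js: load secrets from Parameter Store at boot instead of committing them
// (parameter names are placeholders)
const { SSMClient, GetParameterCommand } = require('@aws-sdk/client-ssm');

const ssm = new SSMClient({ region: process.env.AWS_REGION });

async function getSecret(name) {
  const { Parameter } = await ssm.send(new GetParameterCommand({
    Name: name,
    WithDecryption: true, // SecureString parameters are decrypted via KMS
  }));
  return Parameter.Value;
}

// Usage at boot, before the server starts listening:
// process.env.JWT_SECRET = await getSecret('/techverse/prod/jwt-secret');
module.exports = { getSecret };
```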
What's Next
The architecture I have now works well for the current stage. But I'm already thinking about what comes next as the platform grows.
I want to add a message queue. Probably SQS. Right now, some of my background tasks like sending emails and processing images run synchronously. That's fine with 50 users. It won't be fine with 500.
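If I do go with SQS, the producer side is simple enough. A hypothetical sketch, with the queue URL and message shape made up for illustration:

```javascript
// queue.js: a rough sketch of pushing background work onto SQS
// (queue URL env var and message shape are hypothetical)
const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({ region: process.env.AWS_REGION });

// Instead of awaiting the email send inside the request handler,
// drop a job on the queue and let a worker process it later
async function enqueueEmail(job) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.EMAIL_QUEUE_URL,   // hypothetical env var
    MessageBody: JSON.stringify(job),        // e.g. { to, template, data }
  }));
}

// Usage: await enqueueEmail({ to: order.email, template: 'order-confirmation', data: order });
module.exports = { enqueueEmail };
```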
I'm also looking at moving to containers eventually. ECS with Fargate would give me better resource utilization and simpler deployments. But that's a migration I'll do when the current setup starts showing strain, not before.
And I need better observability. CloudWatch is good for basics, but I want distributed tracing. Probably AWS X-Ray or something like Datadog. When you have multiple services talking to each other, you need to see the full picture of a request's journey.
Final Thoughts
Building a SaaS platform is one thing. Making it production-ready on AWS is a completely different challenge. It forces you to think about things you never consider during development. Network security. Scaling behavior. Cost optimization. Disaster recovery.
But here's what I've realized. You don't need a perfect architecture on day one. You need one that works, that you understand, and that you can evolve. Start simple. Add complexity only when you have a real reason to.
If you want to dig into the code, the full project is on GitHub: TechVerse on GitHub