Justin Wheeler
Better Late Than Never: Tackling the 2022 Cloud Portfolio Challenge in 2025 (With a Modern Twist)

Connect with me: LinkedIn
Support my work: Buy me a coffee ☕

You know that feeling when you add something to your to-do list with every intention of knocking it out quickly... and then three years pass? Yeah, me too.

Back in June 2022, Lars Klint posted the Cloud Portfolio Challenge: Load Balancing and Content Delivery Network on Pluralsight. The challenge was straightforward: build an image delivery service that returns images matching search criteria, while learning about load balancing, CDN, compute, and storage fundamentals.

I bookmarked it. Added it to my list. And promptly let life happen.

Fast forward to October 2025, and I finally dusted off that bookmark. But here's the thing—I wouldn't have been able to build what I built if I had actually met that 2022 deadline. The tech stack I used simply didn't exist back then, or wasn't mature enough to use effectively.

So this is my "better late than never" submission, powered by 2025's tech stack, and honestly? I'm glad I waited.

The Original Challenge (And How I Completely Reimagined It)

The original challenge asked participants to build an image delivery service with four key cloud components:

  • Compute - Running the application logic
  • Storage - Storing and serving images
  • Load Balancing - Distributing traffic across instances
  • CDN - Delivering content globally with low latency

Simple enough, right? Well, I decided to add a twist: What if the images were AI-generated from a curated set of prompts, and users could vote on which images they prefer in head-to-head battles?

Enter: AI Image Battle Arena 🥊

AI Image Battle Home

Instead of serving pre-existing images, my application:

  1. Asynchronously generates images using multiple AI providers (Freepik, Google Imagen, Leonardo AI) via scheduled cron jobs
  2. Presents two random images side-by-side for comparison (both from the same provider to ensure fair comparison)
  3. Lets users vote on which image they prefer via swipe gestures
  4. Tracks statistics and winners in a Valkey (Redis-compatible) database
  5. Serves everything through a CDN with load-balanced full-stack droplets

It's like Hot or Not, but for AI-generated art. Each comparison uses images from a single provider to keep things fair—I didn't want the user experience depending on multiple AI APIs being available simultaneously. It also makes the system more resilient, since ADK manages provider selection behind the scenes.
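To make the pairing concrete, here's a minimal sketch of that same-provider matchup logic. Names like newMatchup and imagesByProvider are illustrative, not the actual code:

// Sketch only; uses math/rand. Assumes at least one provider has two
// or more images available, which the cron jobs guarantee over time.
func newMatchup(imagesByProvider map[string][]string) (provider, left, right string) {
	var candidates []string
	for p, imgs := range imagesByProvider {
		if len(imgs) >= 2 { // need two images for a head-to-head battle
			candidates = append(candidates, p)
		}
	}
	provider = candidates[rand.Intn(len(candidates))]
	imgs := imagesByProvider[provider]
	i := rand.Intn(len(imgs))
	j := rand.Intn(len(imgs) - 1)
	if j >= i { // shift to guarantee two distinct images
		j++
	}
	return provider, imgs[i], imgs[j]
}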

AI Image Battle Left

Why Digital Ocean? (A Love Letter to Simplicity)

Let me be upfront: I'm a major cloud provider person. I'm an AWS Community Builder, AWS User Group Leader, and AWS Gold Jacket holder. I've worked extensively with AWS, Azure, and GCP for years—and I genuinely love these platforms. They power the world's largest applications, and their breadth of services is unmatched.

(Sidebar: Yes, even with today's AWS outage on October 20, 2025—because let's be real, all cloud providers have bad days. The big three have earned their reliability reputations.)

But I kept hearing from other developers in the community: "You should try Digital Ocean. It's so much simpler. The developer experience is amazing."

And you know what? They were right.

After years with the hyperscalers, Digital Ocean felt like a breath of fresh air. The UI is clean, intuitive, and doesn't make you feel like you need a map just to find what you're looking for. Everything is straightforward—no endless service catalogs, no decision paralysis about which of the 17 database options to choose.

Some highlights from my DO experience:

1. Managed Databases (Valkey)

Setting up a Valkey cluster (Redis-compatible) via Pulumi was incredibly straightforward. The API is intuitive—just specify size, region, and VPC attachment. No complex subnet CIDR calculations or security group wizardry. It just works.
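For reference, the entire cluster definition boils down to something like this. It's a sketch: the engine, version, and size slugs are assumptions to confirm against DO's current docs.

// Sketch: a managed Valkey cluster attached to the project VPC.
valkey, err := digitalocean.NewDatabaseCluster(ctx, "votes-cache", &digitalocean.DatabaseClusterArgs{
	Engine:             pulumi.String("valkey"),
	Version:            pulumi.String("8"),
	Size:               pulumi.String("db-s-1vcpu-1gb"),
	Region:             pulumi.String("nyc3"),
	NodeCount:          pulumi.Int(1),
	PrivateNetworkUuid: vpc.ID(), // keep the database VPC-isolated
})
if err != nil {
	return err
}
ctx.Export("valkeyPrivateHost", valkey.PrivateHost)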

Digital Ocean Console showing project resources

2. Spaces + CDN

Digital Ocean Spaces (S3-compatible object storage) comes with built-in CDN. Not "you need to set up CloudFront and configure origins and behaviors"—it's just... included. Upload your images, they're automatically CDN-distributed. Chef's kiss. 👌
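In Pulumi terms, the bucket-plus-CDN pair is two small resources. A sketch (the bucket name and ACL are my assumptions here):

// Sketch: a Spaces bucket with DO's built-in CDN in front of it.
bucket, err := digitalocean.NewSpacesBucket(ctx, "images", &digitalocean.SpacesBucketArgs{
	Region: pulumi.String("nyc3"),
	Acl:    pulumi.String("public-read"), // assumed public for direct CDN delivery
})
if err != nil {
	return err
}
cdn, err := digitalocean.NewCdn(ctx, "images-cdn", &digitalocean.CdnArgs{
	Origin: bucket.BucketDomainName, // e.g. <bucket>.nyc3.digitaloceanspaces.com
})
if err != nil {
	return err
}
ctx.Export("cdnEndpoint", cdn.Endpoint)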

DO Spaces Root

3. DO Metrics Agent

Here's a cool feature I didn't expect: Digital Ocean offers an enhanced metrics agent you can install on droplets that measures things like memory usage—metrics that aren't included in the free tier on other providers. I configured it in my UserData script, and suddenly I had deep observability into my droplet performance without additional cost.

DO Metrics Agent

4. Credits and Support

DO gave me $200 in promotional credits (60-day expiration) to test things out. And when I had questions about Droplet limits? Their support team responded quickly with helpful, non-robotic answers.

Digital Ocean promotional credits 1

Digital Ocean promotional credits 2

To be clear: AWS, Azure, and GCP are incredible platforms that I'll continue using for enterprise work. But for side projects, learning, and rapid prototyping? Digital Ocean's simplicity is genuinely refreshing. Different tools for different jobs.

5. Domain Management with Namecheap + DO Nameservers

For this project, I grabbed a domain from Namecheap: wheeleraiduel.online for just $0.98/year. Since this is a side project I don't plan to keep live beyond a year, why spend $12+ on a .com?

The setup is beautifully simple:

  1. Register domain on Namecheap (~$1)
  2. Point Namecheap to Digital Ocean's nameservers
  3. Manage all DNS records directly in the DO console
  4. Let's Encrypt SSL certificates auto-provision via Pulumi

This hybrid approach gives me Namecheap's pricing with DO's DNS management UX. Best of both worlds for a temporary project.
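Steps 3 and 4 above are just a couple of Pulumi resources too. A sketch of the domain and Let's Encrypt certificate (resource names are illustrative):

// Sketch: DO-managed DNS zone plus an auto-renewing Let's Encrypt cert.
domain, err := digitalocean.NewDomain(ctx, "site-domain", &digitalocean.DomainArgs{
	Name: pulumi.String("wheeleraiduel.online"),
})
if err != nil {
	return err
}
// Let's Encrypt validation works because the domain's DNS lives in DO.
cert, err := digitalocean.NewCertificate(ctx, "site-cert", &digitalocean.CertificateArgs{
	Name:    pulumi.String("site-cert"),
	Type:    pulumi.String("lets_encrypt"),
	Domains: pulumi.StringArray{domain.Name},
})
if err != nil {
	return err
}
ctx.Export("certName", cert.Name)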

Namecheap domain pointing to Digital Ocean nameservers

Why Random Prompts? (Safety First)

You might notice the application generates images from a curated set of prompts rather than taking user input. This was a deliberate design choice for three reasons:

1. Async Generation Architecture

Since images are generated via scheduled cron jobs (not on-demand), there's no user session to capture input from. The generation happens in the background, building up a library of images that the frontend randomly serves.

2. User Input Requires Special Care

Accepting user input means:

  • Input validation and sanitization
  • Rate limiting to prevent abuse
  • Moderation to filter inappropriate prompts
  • Storage and management of user data
  • Potential GDPR/privacy concerns

For a side project focused on cloud architecture and AI orchestration? That's scope creep I didn't need.

3. Prompt Injection Is Real

Generative AI is susceptible to prompt injection attacks where malicious users craft inputs to bypass safety filters or generate harmful content. By using a curated set of prompts (generated by Claude and Gemini during development), I completely eliminate this attack vector.

Example curated prompts:

  • "Robot holding a red skateboard"
  • "Astronaut riding a bicycle on the moon"
  • "Cat wearing sunglasses at a coffee shop"

Safe, fun, and focused on the technical infrastructure—not content moderation.
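In code, the curated list is nothing fancy. A sketch of how it can be baked into the backend (variable and function names are illustrative):

// Sketch: a static, curated prompt list; the cron job draws from it at
// random, so no user input ever reaches the image providers.
// Uses math/rand.
var prompts = []string{
	"Robot holding a red skateboard",
	"Astronaut riding a bicycle on the moon",
	"Cat wearing sunglasses at a coffee shop",
	// ...the rest of the curated set
}

func randomPrompt() string {
	return prompts[rand.Intn(len(prompts))]
}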

The Tech Stack That Made This Possible

Here's where 2025 tech really shines:

Backend: Go + Gin Framework

I built the API server in Go using the Gin web framework. Why Go? Fast, statically typed, great concurrency support, and perfect for cloud-native applications. The backend handles:

  • Image generation orchestration
  • Provider fallback logic
  • Vote tracking and statistics
  • Health checks for the load balancer
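As a flavor of the Gin side, here's a minimal sketch of the health endpoint the load balancer polls. The route path is an assumption; port 8080 matches the droplet setup described below.

// Minimal sketch of the Gin server with a load balancer health endpoint.
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()
	r.GET("/healthz", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"status": "ok"})
	})
	r.Run(":8080") // the backend port on each droplet
}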

DO Spaces Images

Frontend: Next.js 14

The image comparison interface is built with Next.js, featuring:

  • Mobile-first responsive design with swipe gestures
  • Framer Motion animations with spring physics
  • Real-time vote feedback
  • Server-side rendering for SEO

Infrastructure as Code: Pulumi (with Go)

I've used Terraform extensively, and I've worked with CloudFormation and AWS CDK. But for this project, I went with Pulumi—and I'm genuinely impressed.

Pulumi lets you write infrastructure code in real programming languages (I used Go for consistency with my backend). No HCL to learn, no YAML templating gymnastics—just actual code with real loops, conditionals, and type safety.

Here's what my infrastructure deploys:

// Simplified example from hosting/main.go
droplets := make([]*digitalocean.Droplet, dropletCount)
for i := 0; i < dropletCount; i++ {
	// Unique logical and physical names per instance (illustrative naming)
	logicalName := fmt.Sprintf("app-droplet-%d", i)
	physicalName := fmt.Sprintf("ai-battle-app-%d", i)

	droplet, err := digitalocean.NewDroplet(ctx, logicalName, &digitalocean.DropletArgs{
		Name:     pulumi.String(physicalName),
		Image:    pulumi.String("ubuntu-22-04-x64"),
		Size:     pulumi.String("s-2vcpu-2gb"),
		Region:   pulumi.String("nyc3"),
		VpcUuid:  vpc.ID(),
		UserData: getFullStackUserData(config),
	})
	if err != nil {
		return err
	}
	droplets[i] = droplet
}

Each droplet runs:

  • Backend Go API server (port 8080)
  • Frontend Next.js app (port 3000)
  • Nginx reverse proxy (port 80)
  • Automated log uploads to Spaces (hourly, gzip compressed)
  • DigitalOcean metrics agent

DO Spaces Logs

The Pulumi console gives you real-time visibility into deployments:

Pulumi console showing infrastructure updates

Why Pulumi over Terraform? A few reasons:

  • Type Safety: Compiler catches errors before deployment
  • Loops & Logic: Native language constructs instead of count hacks
  • Single Language: Same language as my backend (Go)
  • Better Error Messages: Actually tells you what's wrong

Pulumi might not have Terraform's community size yet, but for greenfield projects, it's a compelling choice.

Google Agent Development Kit (ADK)

Here's the real 2025 magic: Google's Agent Development Kit.

I learned about ADK from my connection Kelby Enevold, who told me how cool it was. ADK is Google's framework for building AI agents that can orchestrate tasks, use tools, and handle complex workflows.

In my application, ADK powers the "orchestrator agent" that:

  1. Randomly selects an AI image provider
  2. Calls the provider's API to generate an image
  3. Detects quota limits, rate limits, or errors
  4. Automatically falls back to a different provider
  5. Handles retries and error scenarios

This pattern—intelligent provider selection with automatic fallback—would have been manual spaghetti code without ADK. Instead, it's a clean agent-based architecture that "just works."
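ADK's agent handles the orchestration internally, but conceptually the fallback loop looks something like this plain-Go sketch of the pattern (not ADK's actual API):

// Conceptual sketch of provider fallback. Uses context, fmt, and math/rand.
type Provider interface {
	Name() string
	Generate(ctx context.Context, prompt string) ([]byte, error)
}

// generateWithFallback tries providers in random order until one succeeds,
// mirroring the quota/rate-limit fallback the ADK agent performs.
func generateWithFallback(ctx context.Context, providers []Provider, prompt string) ([]byte, error) {
	rand.Shuffle(len(providers), func(i, j int) {
		providers[i], providers[j] = providers[j], providers[i]
	})
	var lastErr error
	for _, p := range providers {
		img, err := p.Generate(ctx, prompt)
		if err == nil {
			return img, nil
		}
		lastErr = fmt.Errorf("%s failed: %w", p.Name(), err) // quota, rate limit, transient error
	}
	return nil, lastErr
}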

This literally wouldn't have been possible in 2022. Google ADK was announced at Google Cloud Next 2025, with the Python SDK reaching a stable v1.0.0 shortly after. It's built on the same foundation powering Google's own products like Agentspace and their Customer Engagement Suite.

Hot Storage vs. Cold Storage: Why Valkey Is Optional

One architectural decision I'm particularly proud of: Digital Ocean Spaces is my single source of truth.

Here's how the data architecture works:

Cold Storage: DO Spaces (Source of Truth)

Every generated image is stored in DO Spaces with rich metadata:

  • Image file: The actual PNG/JPEG
  • Object metadata:
    • provider: Which AI service generated it (freepik, google-imagen, leonardo-ai)
    • prompt: The text prompt used for generation
    • Other generation details

This metadata lives directly on the S3-compatible object storage. No separate database required.
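Because Spaces speaks the S3 API, attaching that metadata is a standard PutObject call. A sketch using the AWS SDK for Go v1 (bucket and key names are illustrative):

// Sketch: upload a generated image to Spaces with its metadata attached.
// Uses github.com/aws/aws-sdk-go; imageBytes holds the generated PNG.
sess, err := session.NewSession(&aws.Config{
	Endpoint: aws.String("https://nyc3.digitaloceanspaces.com"),
	Region:   aws.String("us-east-1"), // placeholder; Spaces routes by endpoint
})
if err != nil {
	return err
}
_, err = s3.New(sess).PutObject(&s3.PutObjectInput{
	Bucket: aws.String("ai-battle-images"), // illustrative bucket name
	Key:    aws.String("freepik/robot-skateboard.png"),
	Body:   bytes.NewReader(imageBytes),
	Metadata: map[string]*string{
		"provider": aws.String("freepik"),
		"prompt":   aws.String("Robot holding a red skateboard"),
	},
})
if err != nil {
	return err
}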

DO Spaces Metadata

Hot Storage: Valkey (Performance Cache)

Valkey stores:

  • Vote counts and statistics
  • Side win tracking (left vs. right)
  • Winning image references
  • Real-time leaderboard data

But here's the key insight: Valkey is purely for performance. It's a cache, not the source of truth.
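Valkey keeps Redis protocol compatibility, so any Redis client works unchanged. A sketch of the vote-tracking writes with go-redis (key names are illustrative):

// Sketch: record one vote with plain counters and a sorted set.
// Uses github.com/redis/go-redis/v9; Valkey speaks the same protocol.
func recordVote(ctx context.Context, rdb *redis.Client, provider, winnerKey, side string) error {
	pipe := rdb.Pipeline()
	pipe.Incr(ctx, "votes:total")
	pipe.Incr(ctx, "votes:side:"+side)             // left vs. right win tracking
	pipe.Incr(ctx, "votes:provider:"+provider)     // per-provider statistics
	pipe.ZIncrBy(ctx, "leaderboard", 1, winnerKey) // winning image reference
	_, err := pipe.Exec(ctx)
	return err
}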

The recreate_valkey Flag

In my GitHub Actions deploy workflow, there's a boolean parameter called recreate_valkey. When set to true, it:

  1. Scans all objects in DO Spaces
  2. Reads metadata from each image
  3. Rebuilds Valkey indexes from scratch
  4. Repopulates provider statistics

This means I can:

  • Delete the entire Valkey cluster to save costs
  • Recover from Valkey data corruption
  • Rebuild after accidental data loss
  • Migrate to a different caching solution

The images, prompts, and generation history are never lost. They live permanently in Spaces.
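Pulling the two snippets above together, the rebuild pass is just a walk over the bucket. A sketch (pagination kept, error handling trimmed, names illustrative):

// Sketch: rebuild hot-storage stats by walking every object in Spaces and
// re-reading the metadata stored with each image. Note that aws-sdk-go
// returns metadata keys in canonical form ("Provider", "Prompt").
func rebuildFromSpaces(ctx context.Context, svc *s3.S3, rdb *redis.Client, bucket string) error {
	return svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{Bucket: aws.String(bucket)},
		func(page *s3.ListObjectsV2Output, lastPage bool) bool {
			for _, obj := range page.Contents {
				head, err := svc.HeadObject(&s3.HeadObjectInput{
					Bucket: aws.String(bucket),
					Key:    obj.Key,
				})
				if err != nil {
					continue // real code would log and retry
				}
				if p, ok := head.Metadata["Provider"]; ok && p != nil {
					rdb.Incr(ctx, "images:provider:"+*p) // repopulate provider stats
				}
			}
			return true // continue paging through the bucket
		})
}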

Why This Matters

Many applications tightly couple their database and storage layers. If the database fails, critical metadata is gone forever. By storing metadata with the objects themselves, I've created a resilient architecture where:

  • Valkey failure = temporary performance degradation, not data loss
  • Spaces backup = complete system backup, including all metadata
  • Cost optimization = optional caching layer when budget is tight

For the "lite" version I'm planning in Part 2, I could potentially run without Valkey entirely—serving slightly slower but still functional. That's architectural flexibility.

The Build Process: Research First, Deploy Last

People sometimes ask me: "How do you even start large side projects like this?"

My answer: Start small. Start with research.

Here's how I structured this project:

Phase 1: Research (research/ directory)

Before writing a single line of business logic, I created small Proof of Concepts for each AI provider:

research/
├── freepik/          # Test Freepik API integration
├── google-imagen/    # Test Google Imagen API
├── leonardo-ai/      # Test Leonardo AI API
└── craiyon/          # Test Craiyon (spoiler: broken)

Each research folder had its own README documenting:

  • How to authenticate
  • How to make requests
  • Response formats
  • Pricing/quota limits
  • Gotchas and workarounds

This meant when I started building the actual backend, I wasn't debugging API integration issues—I already knew exactly how each provider worked.

Key lesson: Spend time in research mode. It pays dividends later.

Phase 2: Backend (backend/)

Once I understood the providers, I built the Go API server:

  • Provider abstraction layer
  • ADK orchestrator integration
  • Valkey vote tracking
  • Image storage to DO Spaces

Phase 3: Frontend (frontend/)

Next.js app with mobile-optimized voting interface:

  • Swipe left/right to vote
  • Real-time animations
  • Statistics display

Phase 4: Infrastructure (hosting/)

Last step: Deploy to the cloud.

Why last? Because cloud providers bill you from the moment you provision resources. By building locally first, I:

  • Avoided weeks of unnecessary billing
  • Used my DO credits efficiently (60-day expiration)
  • Deployed a complete, tested application—not a half-baked experiment

This order—research → backend → frontend → hosting—is how I approach every side project. It keeps costs down and reduces cloud debugging headaches.

GitHub Actions: The Full Deployment Pipeline

Once infrastructure code was ready, I built three GitHub Actions workflows to manage everything:

1. Deploy Workflow

One-click deployment via GitHub Actions:

  • Provisions all infrastructure (load balancer, droplets, database, CDN)
  • Configurable droplet count (2-10 instances)
  • Auto-deploys applications via UserData script
  • Sets up monitoring and log shipping

GitHub repository secrets configuration

2. Teardown Workflow

Safe infrastructure destruction:

  • Requires typing "DESTROY" to confirm (safety first!)
  • Cleans up auto-created DNS records
  • Removes all resources to stop billing
  • Saves ~$68/month when not in use

3. Refresh Workflow

State synchronization:

  • Syncs Pulumi state with actual cloud resources
  • Useful after manual changes in DO console
  • Detects and resolves state drift

Full transparency: This automation is a game-changer. Push code, watch it deploy, tear it down when done. No manual server configuration, no SSH debugging sessions.

What I Learned (And What Surprised Me)

1. Simplicity Has Value

AWS's breadth of services is powerful, but DO's focused simplicity meant I spent more time building features and less time reading documentation about VPC peering topologies.

2. IaC Language Matters

Using Go for both backend and infrastructure code created a cohesive developer experience. Context switching between languages is mentally taxing—Pulumi eliminated that.

3. AI Agents Are Production-Ready

Google ADK isn't just a toy—it handles real production workflows with fallback logic, error handling, and reliability. This is the future of orchestration.

4. UserData Is Underrated

My droplets auto-deploy everything via UserData scripts:

  • Install dependencies (Go, Node.js, nginx)
  • Clone and build applications
  • Configure systemd services
  • Set up cron jobs for log uploads
  • Install monitoring agents

No manual SSH configuration. No Ansible playbooks. Just boot and go.
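For flavor, here's roughly what the getFullStackUserData helper referenced in the earlier Pulumi snippet can look like: a Go function returning a cloud-init shell script. This is a trimmed sketch with a simplified signature; the metrics-agent line follows DO's documented installer.

// Trimmed sketch of the UserData helper used in hosting/main.go (the real
// version takes the project config and also writes systemd units, nginx
// config, and the hourly log-upload cron job).
func getFullStackUserData(repoURL string) pulumi.StringInput {
	script := `#!/bin/bash
set -euo pipefail

# Dependencies for the Go API, Next.js frontend, and reverse proxy
apt-get update
apt-get install -y golang nodejs npm nginx

# DigitalOcean metrics agent for enhanced droplet observability
curl -sSL https://repos.insights.digitalocean.com/install.sh | bash

# Fetch and build the application (build steps omitted in this sketch)
git clone ` + repoURL + ` /opt/app
`
	return pulumi.String(script)
}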

5. Research Time Is Never Wasted

Those small POCs in the research/ folder saved me countless debugging hours later. Investing in understanding your dependencies upfront is time well spent.

The Challenge Requirement Checklist ✅

So, did I actually complete the original challenge requirements?

  • ✅ Compute: Full-stack droplets running Go + Next.js + nginx
  • ✅ Storage: DO Spaces for images and logs
  • ✅ Load Balancing: DO Load Balancer distributing traffic across instances
  • ✅ CDN: Built-in CDN via DO Spaces
  • ✅ Image Delivery: Returns images (AI-generated) matching search criteria
  • ✅ Learning Outcome: Deep understanding of how these components work together

Bonus achievements:

  • ✅ Infrastructure as Code (Pulumi)
  • ✅ AI orchestration (Google ADK)
  • ✅ Automated deployment (GitHub Actions)
  • ✅ Production-grade monitoring (DO Metrics Agent)
  • ✅ Security hardening (non-root service user, VPC-isolated database)
  • ✅ Cost optimization (automated teardown, log compression)

Final Thoughts: Better Late Than Never

Did I miss the 2022 deadline? Absolutely.

Do I regret waiting? Not even a little.

This project showcases technology that didn't exist three years ago:

  • Google ADK for AI orchestration (released 2025)
  • Modern generative AI APIs (Imagen 3.0, Leonardo AI)
  • Pulumi's matured Go SDK
  • Next.js 14's App Router
  • Digital Ocean's enhanced features

Sometimes the best time to tackle a challenge is when you have the right tools for the job.

If you're sitting on a dusty to-do item from years ago, consider this your sign: maybe now is actually the perfect time. The tools have gotten better. Your skills have improved. And that "overdue" project might turn into your best work yet.

Thanks to Lars Klint for the original challenge, Kelby Enevold for introducing me to Google ADK, and the Digital Ocean team for making cloud infrastructure genuinely enjoyable to work with.

What's Next? Stay Tuned for Part 2 📅

As my DO promotional credits approach expiration near the end of the year, I'm planning to deploy a "lite" version of this project—a cost-optimized configuration designed to avoid bill shock while keeping the core functionality intact.

Part 2 will cover:

  • Cost optimization strategies for production
  • Scaling down gracefully without losing features
  • Balancing cloud costs vs. capabilities
  • Real-world lessons from running AI infrastructure on a budget

Follow me on LinkedIn to catch Part 2 when it drops!

Want to Build This Yourself?

The entire project is open source on GitHub:

The README includes:

  • Complete setup instructions
  • Architecture diagrams
  • API documentation
  • Cost breakdowns
  • Deployment guides

Give it a star if you found this interesting, and feel free to fork it for your own experiments!


What cloud challenges are sitting on your dusty to-do list? Drop them in the comments—let's hold each other accountable! 👇

Enjoyed this post? Buy me a coffee ☕ to support more cloud adventures!
