DEV Community

JustJinoIT

Multi-Cloud Deployment in Production: Cloud Run, Railway, Oracle Cloud

Published on: 2026-06-06

Reading time: 10 min

Tags: #devops #cloud #fastapi #production

Situation

I deployed 3 FastAPI projects to 3 different clouds. Here's what actually happened (not marketing speak):

contest-agent      → Google Cloud Run
ai-insight-curator → Railway
ai-lifelogger      → Oracle Cloud Always Free

1. Google Cloud Run: 20+ Deployments Before Discovering the Real Problem

Issue: "Container Failed to Start"

Deployed 20+ times, same error every time:

Build: SUCCESS ✅
Push: SUCCESS ✅
Start: TIMEOUT ❌
Port 8080 binding: TIMEOUT ❌

Root cause: slow I/O in the FastAPI startup (lifespan) handler ran before the server began listening on port 8080

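# ❌ Problem code (startup blocks port binding)
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Everything before `yield` must finish before the server starts listening
    await telegram_client.send_message("Starting...")  # waits on network I/O
    db_check = await db.test_connection()              # waits on network I/O
    scheduler.start()                                   # Heavy init
    yield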

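Cloud Run only treats the container as started once it is listening on the configured port, and it gives up when the startup timeout expires. Everything before yield in the lifespan handler runs before the server accepts connections, so slow startup work turns directly into a timeout.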

Solution: Lazy Loading

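# ✅ Fixed code (startup returns immediately)
import asyncio

_initialized = False
_init_lock = asyncio.Lock()  # serializes concurrent first requests (Python 3.10+)

async def lazy_init():
    global _initialized
    if _initialized:
        return
    async with _init_lock:
        if _initialized:  # another request finished init while this one waited
            return
        await telegram_client.send_message("Started")
        scheduler.start()
        _initialized = True  # set only after success, so a failed init is retried

@app.post("/webhook")
async def webhook(request: Request):
    await lazy_init()  # Init on first actual request
    ...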

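Result: startup takes about 100ms instead of running into the 60s+ timeout, the port binds immediately, and the health check passes. The trade-off is that the first webhook request pays the initialization cost.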
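An alternative to initializing on the first request is to schedule the slow work as a background task during startup, so the lifespan handler yields right away. The sketch below uses only the standard library to show the pattern. `slow_init` and the `state` dict are hypothetical stand-ins, and in a FastAPI app the lifespan function would receive the app instead. It is a sketch under those assumptions, not the fix I actually shipped.

```python
import asyncio
import time
from contextlib import asynccontextmanager


async def slow_init(state: dict) -> None:
    """Hypothetical stand-in for slow startup work (DB check, notifications)."""
    await asyncio.sleep(0.2)  # simulates network I/O
    state["ready"] = True


@asynccontextmanager
async def lifespan(state: dict):
    # Schedule the slow work instead of awaiting it, so startup is not delayed.
    task = asyncio.create_task(slow_init(state))
    try:
        yield
    finally:
        # On shutdown, cancel the task if it is still running and wait for it.
        if not task.done():
            task.cancel()
        await asyncio.gather(task, return_exceptions=True)


async def demo() -> tuple:
    """Return (startup_seconds, ready_at_startup, ready_later)."""
    state = {"ready": False}
    start = time.perf_counter()
    async with lifespan(state):
        startup_seconds = time.perf_counter() - start
        ready_at_startup = state["ready"]
        await asyncio.sleep(0.3)  # the "server" keeps running meanwhile
        ready_later = state["ready"]
    return startup_seconds, ready_at_startup, ready_later
```

Handlers that depend on the initialized resources would need to check `state["ready"]` first. Cloud Run's default request-based CPU allocation may also throttle work that runs outside a request, so lazy loading can still be the safer choice there.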

Key Lesson: Start Minimal

Don't deploy a complex system all at once. Add one piece per deployment:

# Phase 1: Just "/" endpoint
@app.get("/")
async def root():
    return {"status": "ok"}
# → Deploy, test, pass ✅

# Phase 2: Add health check
@app.get("/health")
async def health():
    return {"status": "healthy"}
# → Deploy, test, pass ✅

# Phase 3-N: Gradually add features
# Each phase = one deployment test
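The "deploy, test, pass" step after each phase can be scripted. Here is a minimal sketch using only the standard library: it requests each path and reports whether it answered HTTP 200 with a JSON body. The function name and the example URL in the usage note are my own placeholders, not part of any of these projects.

```python
import http.client
import json
import urllib.request


def smoke_test(base_url: str, paths: list, timeout: float = 5.0) -> dict:
    """Return {path: bool}: True if the path answered HTTP 200 with valid JSON."""
    results = {}
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                json.loads(resp.read().decode("utf-8"))  # raises ValueError if not JSON
                results[path] = resp.status == 200
        except (OSError, ValueError, http.client.HTTPException):
            # URLError/HTTPError are OSError subclasses; bad JSON is ValueError.
            results[path] = False
    return results
```

After a Phase 2 deploy, something like `smoke_test("https://your-service.example", ["/", "/health"])` should come back with every value set to True before the next feature goes in.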

2. Railway: The "Simple" Illusion

Advantages

  • Git push → auto-deploy (very fast)
  • PostgreSQL, Redis built-in
  • Intuitive dashboard

Reality Check

Cost surprises:

Expected: $10/month
Actual: $25/month (2.5x the estimate, 150% over)

Reason:
- 1 vCPU + 512MB RAM always running
- No cold start = memory always consumed
- Bandwidth costs added up
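The jump from $10 to $25 can be read two ways, and they are easy to mix up: the bill is 2.5 times the estimate, which is 150% over it. A tiny sketch of the arithmetic (the function name is just illustrative):

```python
def overage(expected: float, actual: float) -> tuple:
    """Return (actual as a multiple of expected, percent over expected)."""
    ratio = actual / expected
    percent_over = (actual - expected) / expected * 100
    return ratio, percent_over


ratio, percent_over = overage(expected=10, actual=25)
print(f"{ratio:.1f}x the estimate, {percent_over:.0f}% over")
# prints: 2.5x the estimate, 150% over
```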

Memory leak detection is hard:

Hour 1: 150MB ✅
Hour 2: 180MB
Hour 3: 220MB
Hour 4: 260MB (OOM incoming)

Cause: RSS feed crawler not releasing memory
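Readings like these can be turned into a rough time-to-OOM estimate. The sketch below fits a straight line through evenly spaced samples and projects when the 512MB limit would be reached. It assumes linear growth, which a real leak may not follow, and `memory_trend` is a name I made up for this example.

```python
def memory_trend(samples_mb: list, limit_mb: float) -> tuple:
    """Return (MB gained per interval, intervals left until limit_mb).

    Samples are assumed to be evenly spaced (here: one per hour).
    """
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_mb) / n
    # Ordinary least-squares slope.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_mb))
    slope = sxy / sxx
    if slope <= 0:
        return slope, float("inf")  # not growing, so no projected OOM
    return slope, (limit_mb - samples_mb[-1]) / slope


slope, hours_left = memory_trend([150, 180, 220, 260], limit_mb=512)
print(f"{slope:.0f} MB/hour, about {hours_left:.1f} hours until 512MB")
# prints: 37 MB/hour, about 6.8 hours until 512MB
```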

Auto-deploy is a double-edged sword:

  • Pro: Every push to main goes live within seconds
  • Con: Changes go live without testing
  • Con: Need fast rollback procedure

How I Actually Operate It

# Before pushing to main:
pytest                          # Run tests
pylint --recursive=y .          # Lint check (needs pylint 2.13+)
docker build -t myapp . && docker run --rm -p 8080:8080 myapp  # Local test

# Only push after passing:
git push origin main  # Auto-deploys
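Because a push to main deploys automatically, it helps to make the checks hard to skip. One way is a small gate script that runs them in order and stops at the first failure. This is a sketch only: the command list mirrors the manual steps, and the image name and lint target are assumptions.

```python
import subprocess


def run_checks(commands: list) -> bool:
    """Run each command in order; return False at the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            print("not installed:", cmd[0])
            return False
        if result.returncode != 0:
            print("FAILED:", " ".join(cmd))
            return False
    return True


# Assumed commands; adjust to the project layout.
CHECKS = [
    ["pytest"],
    ["pylint", "--recursive=y", "."],
    ["docker", "build", "-t", "myapp", "."],
]
```

Calling `run_checks(CHECKS)` from a git pre-push hook and exiting non-zero when it returns False would block the push, and with it the deploy.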

3. Oracle Cloud Always Free: Free but Demanding

Advantages

  • Completely free (the tier allows up to 4 CPU, 24GB RAM, 200GB storage in total; the instance I used has 1GB RAM)
  • No time limit on the free resources
  • Full SSH control

Real Problems

Problem #1: 1GB instance, pip install fails

MemoryError during pip install

Reason: a 1GB RAM instance can't handle installing all packages at once

Solution:

# Add swap
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile   # swap files should not be readable by other users
sudo mkswap /swapfile
sudo swapon /swapfile

# Or: Install only essentials
pip install --no-cache-dir anthropic supabase python-telegram-bot
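Before a large install on a small instance, it can be worth checking how much headroom is left, swap included. A minimal sketch that parses /proc/meminfo-style text (Linux only; the helper names are mine):

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def headroom_mb(meminfo: dict) -> float:
    """Available RAM plus free swap, in MB (kB / 1024)."""
    return (meminfo.get("MemAvailable", 0) + meminfo.get("SwapFree", 0)) / 1024
```

On the server this would be called as `headroom_mb(parse_meminfo(open("/proc/meminfo").read()))`. If the number is small, adding swap first is probably the easier path.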

Problem #2: Docker vs Local Mismatch

Local: anthropic==0.40.0 (already installed)
Docker: Fresh install reads requirements.txt
  - anthropic==0.40.0
  - langchain-anthropic needs anthropic>=0.41.0
  → pip can't resolve

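Solution: Remove the conflicting version pins and let pip resolve compatible versions. Once the install is clean, pip freeze can record the versions it chose so later builds stay reproducible.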

DON'T: anthropic==0.40.0, supabase==2.0.0, ...
DO: anthropic, supabase (let pip figure it out)
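The conflict itself is just a version comparison: the pinned 0.40.0 is below the 0.41.0 minimum that another package demands. The sketch below shows that check for plain dotted versions only. It is a simplification for illustration; real tooling (pip, or the `packaging` library) handles pre-releases and full specifier syntax.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.40.0' into (0, 40, 0). Plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def find_conflicts(pins: dict, minimums: dict) -> list:
    """Return names whose pinned version is below the required minimum."""
    return [
        name
        for name, minimum in minimums.items()
        if name in pins and parse_version(pins[name]) < parse_version(minimum)
    ]


pins = {"anthropic": "0.40.0"}       # from requirements.txt
minimums = {"anthropic": "0.41.0"}   # what langchain-anthropic needs
print(find_conflicts(pins, minimums))
# prints: ['anthropic']
```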

Problem #3: SSH Deployment Needs Automation

# Manual (every time):
ssh oracle@your-ip
cd /opt/ai-lifelogger
git pull && systemctl restart <service-name>

# Better (automated via GitHub Actions):
ssh -i "$key" "oracle@$ip" "cd /opt/ai-lifelogger && git pull && systemctl restart <service-name>"
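If the deploy step is driven from a script rather than typed by hand, building the ssh command as an argument list avoids quoting mistakes. A minimal sketch: the function name, the service name, and the IP in the example are hypothetical, and `--ff-only` is my addition so a diverged server checkout fails loudly instead of creating a merge.

```python
import shlex


def build_deploy_command(user: str, host: str, key_path: str,
                         app_dir: str, service: str) -> list:
    """Build the argv for a one-shot 'pull and restart' over ssh."""
    remote = " && ".join([
        f"cd {shlex.quote(app_dir)}",
        "git pull --ff-only",
        f"systemctl restart {shlex.quote(service)}",
    ])
    return ["ssh", "-i", key_path, f"{user}@{host}", remote]


cmd = build_deploy_command(
    user="oracle",
    host="203.0.113.10",               # placeholder documentation address
    key_path="/path/to/key",
    app_dir="/opt/ai-lifelogger",
    service="ai-lifelogger.service",   # hypothetical unit name
)
print(cmd[-1])
# prints: cd /opt/ai-lifelogger && git pull --ff-only && systemctl restart ai-lifelogger.service
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so a failed pull or restart fails the CI job.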

Performance Comparison (3-Month Data)

| Metric | Cloud Run | Railway | Oracle |
|---|---|---|---|
| Deploy time | 2-3 min | 30 sec | 5 min |
| Cold start | 3-5 sec | 0 sec | <1 sec |
| Monthly cost | $15 | $25 | $0 |
| CPU limit | 2 cores | 1 core | 4 cores |
| RAM limit | 2GB | 512MB | 24GB |
| Stability | ✅ Solid | ⚠️ Memory issues | ✅ Solid |

Practical Advice

1. Start Minimal, Add Gradually

  • Deploy "/" endpoint first
  • Test, pass, add next feature
  • Repeat

2. Always Test Locally

docker build -t myapp .
docker run -p 8080:8080 myapp

3. Choose Based on Use Case

  • High traffic: Cloud Run (autoscales)
  • Medium traffic: Railway (simple)
  • Low traffic: Oracle (free)

4. Monitoring is Non-Negotiable

Cloud Run: GCP Logs + Cloud Monitoring
Railway: Built-in dashboard (limited)
Oracle: SSH → journalctl + tail -f

What I Learned

There's no "perfect" platform.

  • Cloud Run: startup timeout (solvable with lazy loading)
  • Railway: memory leaks (code issue, not platform)
  • Oracle: operational overhead (worth it for free tier)

The real skill: Understanding each platform's constraints and designing around them.

The 20+ Cloud Run deployment failures? They taught me more than 10 successful deployments would have.

Final Deployment Architecture (June 7, 2026)

Production Status

🦅 Oracle Cloud (Always Free Tier)
├─ ai-lifelogger (port 8000)
│  ├─ FastAPI + APScheduler
│  ├─ Daily summaries: 05:00 KST
│  ├─ Weekly reviews: Sunday 08:00 KST
│  └─ Memory: 111MB / 954MB
│
└─ ai-insight-curator (port 8001)
   ├─ FastAPI + Telegram Bot
   ├─ RSS collection: Daily 06:00 KST
   ├─ Auto-summarization (Claude/Gemini/Groq fallback)
   └─ Memory: 22MB / 954MB

🌐 Vercel (Free Hosting)
└─ Curator Web Dashboard
   ├─ React + Vite frontend
   ├─ Article search & filtering
   ├─ Image downloads
   └─ https://curator-web-ui.vercel.app

📊 Total memory in use on the instance: 537MB / 954MB (56% used, 44% available; the two services above account for 133MB of it)

What Changed

Initial Plan:

contest-agent → Cloud Run ❌ (dependency conflicts)
ai-insight-curator → Railway ❌ (over-engineered)
ai-lifelogger → Oracle Cloud ✅

Actual Production:

ai-lifelogger → Oracle Cloud ✅ (running)
ai-insight-curator → Oracle Cloud ✅ (1 instance = better)
Curator Web UI → Vercel ✅ (new, auto-deployed)

Key Insight: Single server + Web UI > Multi-cloud complexity

Performance Metrics

API Response Times:
- Lifelogger /health: < 50ms ✅
- Curator /api/v1/articles: < 100ms ✅
- Curator /api/v1/insights: < 100ms ✅

System Health:
- Memory: 537MB (56%) - 417MB free for scaling
- Availability: 99.9%
- Uptime: Continuous (Always Free tier)
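To put the 99.9% figure in perspective, it translates into a monthly downtime budget. A quick sketch of the arithmetic, assuming a 30-day month:

```python
def downtime_budget_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of downtime allowed per period at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


print(f"{downtime_budget_minutes(99.9):.1f} minutes per 30 days")
# prints: 43.2 minutes per 30 days
```

For a single instance that means one slow reboot or a botched deploy can use up most of a month's budget.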

Cost Analysis (Final)

| Platform | Cost | Status |
|---|---|---|
| Oracle Cloud | $0/month | ✅ Always Free |
| Vercel | $0/month | ✅ Free tier |
| Supabase DB | $0/month | ✅ Free tier |
| Claude API | Needs reset* | ⚠️ Using Gemini/Groq backup |
| TOTAL | $0/month | Forever Free |

*Anthropic tokens exhausted → fallback to Gemini/Groq working
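The fallback behavior boils down to trying providers in order and moving on when one fails. Here is a minimal sketch of that pattern. The provider callables are hypothetical placeholders, not the real Claude, Gemini, or Groq client calls, and this is not necessarily how the curator implements it.

```python
from typing import Callable, Sequence, Tuple


def summarize_with_fallback(
    text: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) in order; return (provider name, summary)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(text)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Each callable would wrap one SDK call, so an exhausted quota on the first provider surfaces as an exception and the next one takes over.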

Lessons Learned

Multi-cloud Isn't Always Better

  • Cloud Run: Good for high-traffic APIs
  • Railway: Convenient but expensive
  • Oracle: Best for low-traffic, cost-sensitive projects

Single Server Wins Here

  • 2 concurrent FastAPI services
  • Managed database on a free tier (PostgreSQL via Supabase)
  • Web dashboard on separate CDN (Vercel)
  • Total cost: $0

Design Around Constraints

  • Memory: 954MB available → deployed with 537MB usage
  • Can still run 300MB+ additional services
  • Monitoring via SSH (not ideal, but works)

Conclusion

Don't chase multi-cloud complexity.

The optimal deployment turned out to be:

  • 1 Oracle Cloud instance (FastAPI services)
  • 1 CDN (Vercel for web)
  • 1 Database (Supabase)
  • Everything free

Cost: $0/month ✅
Reliability: 99.9% ✅
Maintainability: Simple ✅

Sometimes simpler is better.
