JustJinoIT

Posted on Jun 6

Multi-Cloud Deployment in Production: Cloud Run, Railway, Oracle Cloud (3-Month Report)

#devops #cloud #fastapi #production

Published on: 2026-06-06

Reading time: 10 min

Situation

I deployed 3 FastAPI projects to 3 different clouds. Here's what actually happened (not marketing speak):

contest-agent      → Google Cloud Run
ai-insight-curator → Railway
ai-lifelogger      → Oracle Cloud Always Free

1. Google Cloud Run: 20+ Deployments Before Discovering the Real Problem

Issue: "Container Failed to Start"

Deployed 20+ times, same error every time:

Build: SUCCESS ✅
Push: SUCCESS ✅
Start: TIMEOUT ❌
Port 8080 binding: TIMEOUT ❌

Root cause: slow I/O in the FastAPI startup hook ran before the server opened its port, so the container never started listening in time

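# ❌ Problem code (slow startup work delays port binding)
from contextlib import asynccontextmanager
from fastapi import FastAPI

# telegram_client, db, and scheduler are the app's own objects, defined elsewhere
@asynccontextmanager
async def lifespan(app: FastAPI):
    await telegram_client.send_message("Starting...")  # network call before the port opens
    db_check = await db.test_connection()              # another network call
    scheduler.start()                                   # heavy init
    yield

app = FastAPI(lifespan=lifespan)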

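Cloud Run requires the container to be listening on its port (8080 here) before the startup timeout expires. ASGI servers such as uvicorn run the lifespan startup code before they begin accepting connections, so slow startup work means no open port, and the deployment times out.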
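To make the ordering problem concrete, here is a minimal sketch using only the standard library's asyncio, not FastAPI or Cloud Run. The names `slow_init` and `seconds_until_bound` are made up for this illustration, and the 0.3 second delay stands in for the real startup I/O. It measures how long it takes until a socket is listening when the slow work runs before versus after the bind.

```python
import asyncio
import time


async def slow_init(delay: float) -> None:
    """Stand-in for startup I/O such as a DB check or a notification call."""
    await asyncio.sleep(delay)


async def _handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Trivial connection handler; the demo never actually connects."""
    writer.close()


async def seconds_until_bound(init_before_bind: bool, delay: float = 0.3) -> float:
    """Return how long it takes until the server socket is listening."""
    started = time.perf_counter()
    if init_before_bind:
        await slow_init(delay)  # the port is not open while this runs
    # Port 0 asks the OS for any free port on loopback.
    server = await asyncio.start_server(_handle, "127.0.0.1", 0)
    elapsed = time.perf_counter() - started
    if not init_before_bind:
        await slow_init(delay)  # same work, but the port is already open
    server.close()
    await server.wait_closed()
    return elapsed
```

Calling `asyncio.run(seconds_until_bound(True))` and then `asyncio.run(seconds_until_bound(False))` should show the first taking at least the full delay while the second returns almost immediately. The total work is the same in both cases; only its position relative to the bind changes.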

Solution: Lazy Loading

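# ✅ Fixed code (startup returns immediately)
import asyncio
from fastapi import FastAPI, Request

app = FastAPI()  # no slow work in startup
_initialized = False
_init_lock = asyncio.Lock()  # module-level lock is fine on Python 3.10+

async def lazy_init():
    global _initialized
    if _initialized:
        return
    async with _init_lock:  # concurrent first requests wait instead of racing
        if _initialized:
            return
        await telegram_client.send_message("Started")
        scheduler.start()
        _initialized = True  # set only after init succeeds, so a failure is retried

@app.post("/webhook")
async def webhook(request: Request):
    await lazy_init()  # Init on first actual request
    ...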

Result: Startup 100ms (was 60s+ timeout), port binding immediate, health check passes.

Key Lesson: Start Minimal

Don't deploy a complex system all at once. Lessons learned:

# Phase 1: Just "/" endpoint
@app.get("/")
async def root():
    return {"status": "ok"}
# → Deploy, test, pass ✅

# Phase 2: Add health check
@app.get("/health")
async def health():
    return {"status": "healthy"}
# → Deploy, test, pass ✅

# Phase 3-N: Gradually add features
# Each phase = one deployment test
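
The "one deployment test per phase" step can be scripted. Below is a minimal sketch using only the standard library; `smoke_test` is a name invented here, and the base URL is whatever your deployment prints, which this post does not specify. It requests each endpoint added so far and returns the decoded JSON so you can compare it with what the phase should return.

```python
import json
import urllib.request


def smoke_test(base_url: str, paths: list[str], timeout: float = 5.0) -> dict[str, dict]:
    """GET each path and return the decoded JSON body keyed by path.

    Raises on connection errors, on 4xx/5xx responses (urllib raises
    HTTPError for those), and on bodies that are not valid JSON.
    """
    results: dict[str, dict] = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            results[path] = json.loads(resp.read().decode("utf-8"))
    return results
```

After Phase 2, for example, `smoke_test(service_url, ["/", "/health"])` should come back with `{"status": "ok"}` for `/` and `{"status": "healthy"}` for `/health`, where `service_url` is your own deployment's address.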

2. Railway: The "Simple" Illusion

Advantages

Git push → auto-deploy (very fast)
PostgreSQL, Redis built-in
Intuitive dashboard

Reality Check

Cost surprises:

Expected: $10/month
Actual: $25/month (2.5x the estimate, 150% over)

Reason:
- 1 vCPU + 512MB RAM always running
- No cold start = memory always consumed
- Bandwidth costs added up

Memory leak detection is hard:

Hour 1: 150MB ✅
Hour 2: 180MB
Hour 3: 220MB
Hour 4: 260MB (OOM incoming)

Cause: RSS feed crawler not releasing memory

Auto-deploy is a double-edged sword:

Con: Changes go live without testing
Con: Need fast rollback procedure

How I Actually Operate It

# Before pushing to main:
pytest                 # Run tests
pylint your_package/   # Lint check (placeholder path: point it at your own package)
docker build -t myapp . && docker run --rm -p 8080:8080 myapp  # Local test

# Only push after passing:
git push origin main  # Auto-deploys
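
Since the push itself triggers the deploy, it can help to make the push conditional on the checks instead of relying on memory. This is a minimal sketch of such a gate; `run_gate` and `push_if_green` are names invented for this example, and the commands you pass in are your own.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(commands: Sequence[Sequence[str]]) -> bool:
    """Run each command in order; stop and return False at the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED (exit {result.returncode}); not pushing", file=sys.stderr)
            return False
    return True


def push_if_green(checks: Sequence[Sequence[str]], push_cmd: Sequence[str]) -> bool:
    """Run push_cmd only if every check passed; return True if the push succeeded."""
    if not run_gate(checks):
        return False
    return subprocess.run(push_cmd).returncode == 0
```

A call might look like `push_if_green([["pytest"], ["docker", "build", "-t", "myapp", "."]], ["git", "push", "origin", "main"])`. Commands are passed as argument lists rather than shell strings, so no shell quoting is involved.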

3. Oracle Cloud Always Free: Free but Demanding

Advantages

Completely free (4 CPU, 24GB RAM, 200GB storage)
No time limit (Always Free resources don't expire)
Full SSH control

Real Problems

Problem #1: 1GB instance, pip install fails

MemoryError during pip install

Reason: a 1GB RAM instance can't handle installing all packages at once

Solution:

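# Add swap
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile   # swap files should be readable by root only
sudo mkswap /swapfile
sudo swapon /swapfile
# (add an /etc/fstab entry if the swap should survive a reboot)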

# Or: Install only essentials
pip install --no-cache-dir anthropic supabase python-telegram-bot

Problem #2: Docker vs Local Mismatch

Local: anthropic==0.40.0 (already installed)
Docker: Fresh install reads requirements.txt
  - anthropic==0.40.0
  - langchain-anthropic needs anthropic>=0.41.0
  → pip can't resolve

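Solution: Remove the conflicting version pins and let pip resolve (running pip freeze afterwards records what it picked, if you want reproducible builds)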

DON'T: anthropic==0.40.0, supabase==2.0.0, ...
DO: anthropic, supabase (let pip figure it out)
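
The conflict in Problem #2 comes down to one comparison: an exact pin at 0.40.0 can never satisfy a `>=0.41.0` requirement. The sketch below shows that check in a deliberately simplified form. It only handles plain dotted numeric versions, whereas pip follows the full PEP 440 rules, and the helper names are made up for this example.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.40.0' into (0, 40, 0). Plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Return True if an exact pin meets a '>=' requirement."""
    # Tuples compare element by element, so 0.100.0 correctly beats 0.41.0,
    # which a plain string comparison would get wrong.
    return parse_version(pinned) >= parse_version(minimum)
```

With the versions from the post, `satisfies_minimum("0.40.0", "0.41.0")` is `False`, which is why the fresh Docker install fails while the local environment, with its already-installed package, never re-ran the check.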

Problem #3: SSH Deployment Needs Automation

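# Manual (every time):
ssh oracle@your-ip
cd /opt/ai-lifelogger
git pull && systemctl restart ai-lifelogger   # unit name is a placeholder; use your service's name

# Better (automated via GitHub Actions):
ssh -i "$key" oracle@"$ip" "cd /opt/ai-lifelogger && git pull && systemctl restart ai-lifelogger"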
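
If the deploy step is driven from a script rather than typed into a workflow file, building the command as an argument list avoids quoting mistakes. This is a minimal sketch; `build_deploy_command` is a hypothetical helper, and the user, host, directory, and systemd unit name are whatever your own setup uses.

```python
import shlex


def build_deploy_command(
    key_path: str, user: str, host: str, app_dir: str, service: str
) -> list[str]:
    """Build the ssh argument list for a pull-and-restart deploy."""
    # The remote side runs this through a shell, so quote anything variable.
    remote = " && ".join(
        [
            f"cd {shlex.quote(app_dir)}",
            "git pull --ff-only",  # refuse to create a merge commit on the server
            f"systemctl restart {shlex.quote(service)}",
        ]
    )
    return ["ssh", "-i", key_path, f"{user}@{host}", remote]
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`. Using `--ff-only` is a choice made for this sketch: it makes the deploy fail loudly if the server's checkout has diverged, rather than merging silently.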

Performance Comparison (3-Month Data)

Metric	Cloud Run	Railway	Oracle
Deploy time	2-3 min	30 sec	5 min
Cold start	3-5 sec	0 sec	<1 sec
Monthly cost	$15	$25	$0
CPU limit	2 cores	1 core	4 cores
RAM limit	2GB	512MB	24GB
Stability	✅ Solid	⚠️ Memory issues	✅ Solid

Practical Advice

1. Start Minimal, Add Gradually

Deploy "/" endpoint first
Test, pass, add next feature
Repeat

2. Always Test Locally

docker build -t myapp .
docker run -p 8080:8080 myapp

3. Choose Based on Use Case

High traffic: Cloud Run (autoscales)
Medium traffic: Railway (simple)
Low traffic: Oracle (free)

4. Monitoring is Non-Negotiable

Cloud Run: GCP Logs + Cloud Monitoring
Railway: Built-in dashboard (limited)
Oracle: SSH → journalctl + tail -f

What I Learned

There's no "perfect" platform.

Cloud Run: startup timeout (solvable with lazy loading)
Railway: memory leaks (code issue, not platform)
Oracle: operational overhead (worth it for free tier)

The real skill: Understanding each platform's constraints and designing around them.

The 20+ Cloud Run deployment failures? They taught me more than 10 successful deployments would have.

Bottom Line

If you're deploying to multiple clouds, expect problems—but they're solvable. Document them, learn from them, share them.

Your experience matters to other developers.
