Multi-Cloud Deployment in Production: Cloud Run, Railway, Oracle Cloud
Published on: 2026-06-06
Reading time: 10 min
Tags: #devops #cloud #fastapi #production
Situation
I deployed 3 FastAPI projects to 3 different clouds. Here's what actually happened (not marketing speak):
contest-agent → Google Cloud Run
ai-insight-curator → Railway
ai-lifelogger → Oracle Cloud Always Free
1. Google Cloud Run: 20+ Deployments Before Discovering the Real Problem
Issue: "Container Failed to Start"
Deployed 20+ times, same error every time:
Build: SUCCESS ✅
Push: SUCCESS ✅
Start: TIMEOUT ❌
Port 8080 binding: TIMEOUT ❌
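Root cause: the FastAPI lifespan handler did slow I/O before yielding, and the ASGI server (uvicorn, in a typical FastAPI setup) only starts listening on the port after lifespan startup completes.
# ❌ Problem code (startup work delays port binding)
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    await telegram_client.send_message("Starting...")  # network round trip before the port is bound
    db_check = await db.test_connection()  # another network round trip
    scheduler.start()  # heavy init
    yield

app = FastAPI(lifespan=lifespan)
Cloud Run expects the container to be listening on its port (8080 here) within the startup timeout. While the lifespan code was still running, nothing was listening, so every deploy timed out.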
Solution: Lazy Loading
# ✅ Fixed code (startup returns immediately)
from fastapi import Request

_initialized = False

async def lazy_init():
    global _initialized
    if _initialized:
        return
    _initialized = True  # set before the awaits so concurrent requests don't start a second init
    await telegram_client.send_message("Started")
    scheduler.start()

@app.post("/webhook")
async def webhook(request: Request):
    await lazy_init()  # Init on first actual request
    ...
Result: Startup 100ms (was 60s+ timeout), port binding immediate, health check passes.
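One gap in a flag-only guard: requests that arrive while the first init is still running skip it, and an init that fails is never retried. A minimal sketch of a lock-based variant, assuming plain asyncio; `LazyInitializer` and `slow_setup` are hypothetical names for illustration, not code from the project:

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LazyInitializer:
    """Runs an async setup function once; a failed attempt can be retried."""

    def __init__(self, setup: Callable[[], Awaitable[None]]) -> None:
        self._setup = setup
        self._done = False
        self._lock: Optional[asyncio.Lock] = None

    async def ensure(self) -> None:
        if self._done:
            return
        if self._lock is None:
            # Created lazily so the lock belongs to the running event loop.
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._done:  # another request finished the setup while we waited
                return
            await self._setup()
            self._done = True  # only marked done after the setup succeeded


async def demo() -> int:
    calls = 0

    async def slow_setup() -> None:  # hypothetical stand-in for the real startup work
        nonlocal calls
        await asyncio.sleep(0.01)
        calls += 1

    init = LazyInitializer(slow_setup)
    # Five "requests" arrive at once right after the container starts.
    await asyncio.gather(*(init.ensure() for _ in range(5)))
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints 1
```

In a handler this would be `await init.ensure()` in place of the bare flag check. The trade-off is that the first few requests wait for the setup to finish instead of proceeding without it.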
Key Lesson: Start Minimal
Don't deploy a complex system all at once. Lessons learned:
# Phase 1: Just "/" endpoint
@app.get("/")
async def root():
    return {"status": "ok"}
# → Deploy, test, pass ✅
# Phase 2: Add health check
@app.get("/health")
async def health():
    return {"status": "healthy"}
# → Deploy, test, pass ✅
# Phase 3-N: Gradually add features
# Each phase = one deployment test
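The "deploy, test, pass" step after each phase can be scripted. A minimal sketch using only the standard library; `smoke_test` and `http_get_json` are hypothetical helper names, and the base URL in the usage note is a placeholder:

```python
import json
import urllib.request
from typing import Callable, Dict, List


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (urlopen raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test(
    base_url: str,
    expected: Dict[str, dict],
    fetch: Callable[[str], dict] = http_get_json,
) -> List[str]:
    """Check each path against its expected JSON; return a list of failures."""
    failures = []
    for path, want in expected.items():
        try:
            got = fetch(base_url.rstrip("/") + path)
        except Exception as exc:  # network error, HTTP error, bad JSON
            failures.append(f"{path}: request failed ({exc})")
            continue
        if got != want:
            failures.append(f"{path}: expected {want}, got {got}")
    return failures
```

For phase 2 the call would look like `smoke_test("https://your-service.example", {"/": {"status": "ok"}, "/health": {"status": "healthy"}})`. An empty list means the phase passed and the next feature can go in.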
2. Railway: The "Simple" Illusion
Advantages
- Git push → auto-deploy (very fast)
- PostgreSQL, Redis built-in
- Intuitive dashboard
Reality Check
Cost surprises:
Expected: $10/month
Actual: $25/month (2.5× the estimate, 150% over)
Reason:
- 1 vCPU + 512MB RAM always running
- No cold start also means no scale-to-zero: memory is consumed around the clock
- Bandwidth costs added up
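To put the cost gap in numbers: $25 against a $10 estimate is 2.5 times the estimate, which is 150% over it. The arithmetic as a small sketch (`overage` is a hypothetical helper name):

```python
from typing import Tuple


def overage(expected: float, actual: float) -> Tuple[float, float]:
    """Return (actual as a multiple of expected, percent over expected)."""
    ratio = actual / expected
    percent_over = (actual - expected) / expected * 100
    return ratio, percent_over


ratio, percent_over = overage(10, 25)
print(f"{ratio:.1f}x the estimate, {percent_over:.0f}% over")
# prints: 2.5x the estimate, 150% over
```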
Memory leak detection is hard:
Hour 1: 150MB ✅
Hour 2: 180MB
Hour 3: 220MB
Hour 4: 260MB (OOM incoming)
Cause: RSS feed crawler not releasing memory
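A rough way to read hourly samples like these is to fit a straight line and project when the limit is hit. A minimal sketch, assuming linear growth and the 512MB limit of the Railway plan above; the function names are hypothetical:

```python
from typing import Optional, Sequence


def growth_rate_mb_per_hour(samples: Sequence[float]) -> float:
    """Average growth between the first and last hourly sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def hours_until_limit(samples: Sequence[float], limit_mb: float) -> Optional[float]:
    """Hours until the limit at the current rate, or None if memory is not growing."""
    rate = growth_rate_mb_per_hour(samples)
    if rate <= 0:
        return None
    return max(0.0, (limit_mb - samples[-1]) / rate)


samples = [150, 180, 220, 260]  # the hourly readings from the article
rate = growth_rate_mb_per_hour(samples)
eta = hours_until_limit(samples, limit_mb=512)
print(f"{rate:.1f} MB/hour, about {eta:.1f} hours to 512MB")
# prints: 36.7 MB/hour, about 6.9 hours to 512MB
```

This only says that memory is climbing and roughly when it runs out. Finding which objects are being held still needs a profiler such as the standard library's `tracemalloc`.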
Auto-deploy is a double-edged sword:
- Pro: every push to main is live in about 30 seconds
- Con: changes go live without testing unless you test before pushing
- Con: you need a fast rollback procedure
How I Actually Operate It
# Before pushing to main:
pytest # Run tests
pylint your_package # Lint check (your_package is a placeholder)
docker build -t myapp . && docker run -p 8080:8080 myapp # Local test
# Only push after passing:
git push origin main # Auto-deploys
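The "only push after passing" rule is easy to skip when done by hand, so it can help to wrap the checks in a script that stops at the first failure. A minimal sketch; `run_checks` is a hypothetical helper, and the command list in the usage note is an assumption based on the steps above:

```python
import subprocess
from typing import List, Optional, Sequence


def run_checks(commands: Sequence[List[str]]) -> Optional[List[str]]:
    """Run commands in order; return the first one that fails, or None if all pass."""
    for command in commands:
        try:
            result = subprocess.run(command)
        except OSError:  # e.g. the tool is not installed
            return command
        if result.returncode != 0:
            return command
    return None
```

A pre-push script could call `run_checks([["pytest"], ["pylint", "your_package"], ["docker", "build", "-t", "myapp", "."]])` and run `git push origin main` only when the result is `None`.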
3. Oracle Cloud Always Free: Free but Demanding
Advantages
- Completely free (the Always Free allowance goes up to 4 Arm cores, 24GB RAM, 200GB storage; the instance in this post is a 1GB one)
- No time limit on the free tier
- Full SSH control
Real Problems
Problem #1: 1GB instance, pip install fails
MemoryError during pip install
Reason: a 1GB RAM instance can't handle installing all packages at once
Solution:
# Add swap
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile # swap files should be readable by root only
sudo mkswap /swapfile
sudo swapon /swapfile
# Or: Install only essentials
pip install --no-cache-dir anthropic supabase python-telegram-bot
Problem #2: Docker vs Local Mismatch
Local: anthropic==0.40.0 (already installed)
Docker: Fresh install reads requirements.txt
- anthropic==0.40.0
- langchain-anthropic needs anthropic>=0.41.0
→ pip can't resolve
Solution: Remove version pins, let pip resolve
DON'T: anthropic==0.40.0, supabase==2.0.0, ...
DO: anthropic, supabase (let pip figure it out)
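Why pip gives up here: the pin `anthropic==0.40.0` can never satisfy `anthropic>=0.41.0`, so no single version works for both requirements. A minimal sketch of that comparison, assuming plain dotted numeric versions (real resolvers follow PEP 440, which handles pre-releases and more); the helper names are hypothetical:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.40.0' into (0, 40, 0). Numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(pinned: str, minimum: str) -> bool:
    """True if the pinned version meets a '>=' requirement."""
    return parse_version(pinned) >= parse_version(minimum)


print(satisfies_minimum("0.40.0", "0.41.0"))  # prints False: the pin conflicts
print(satisfies_minimum("0.41.0", "0.41.0"))  # prints True
```

Comparing as integer tuples matters: as strings, "0.9.0" would sort above "0.41.0".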
Problem #3: SSH Deployment Needs Automation
# Manual (every time):
ssh oracle@your-ip
cd /opt/ai-lifelogger
git pull && systemctl restart your-service # your-service: placeholder for the app's systemd unit
# Better (automated via GitHub Actions):
ssh -i "$key" "oracle@$ip" "cd /opt/ai-lifelogger && git pull && systemctl restart your-service"
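If the deploy step is driven from a script rather than typed inline, building the command as an argument list avoids quoting mistakes. A minimal sketch; `build_deploy_command` is a hypothetical helper, and the key path, host, and service name in the usage note are placeholders:

```python
import shlex
from typing import List


def build_deploy_command(
    key_path: str, user: str, host: str, app_dir: str, service: str
) -> List[str]:
    """Build the ssh invocation that pulls the repo and restarts the service."""
    remote = " && ".join(
        [
            f"cd {shlex.quote(app_dir)}",  # quote values that reach the remote shell
            "git pull",
            f"systemctl restart {shlex.quote(service)}",
        ]
    )
    return ["ssh", "-i", key_path, f"{user}@{host}", remote]
```

`build_deploy_command("deploy_key", "oracle", "203.0.113.10", "/opt/ai-lifelogger", "your-service")` returns a list that can be passed straight to `subprocess.run(..., check=True)` from a CI job.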
Performance Comparison (3-Month Data)
| Metric | Cloud Run | Railway | Oracle |
|---|---|---|---|
| Deploy time | 2-3 min | 30 sec | 5 min |
| Cold start | 3-5 sec | 0 sec | <1 sec |
| Monthly cost | $15 | $25 | $0 |
| CPU limit | 2 cores | 1 core | 4 cores |
| RAM limit | 2GB | 512MB | 24GB |
| Stability | ✅ Solid | ⚠️ Memory issues | ✅ Solid |
Practical Advice
1. Start Minimal, Add Gradually
- Deploy "/" endpoint first
- Test, pass, add next feature
- Repeat
2. Always Test Locally
docker build -t myapp .
docker run -p 8080:8080 myapp
3. Choose Based on Use Case
- High traffic: Cloud Run (autoscales)
- Medium traffic: Railway (simple)
- Low traffic: Oracle (free)
4. Monitoring is Non-Negotiable
Cloud Run: GCP Logs + Cloud Monitoring
Railway: Built-in dashboard (limited)
Oracle: SSH → journalctl + tail -f
What I Learned
There's no "perfect" platform.
- Cloud Run: startup timeout (solvable with lazy loading)
- Railway: memory leaks (code issue, not platform)
- Oracle: operational overhead (worth it for free tier)
The real skill: Understanding each platform's constraints and designing around them.
The 20+ Cloud Run deployment failures? They taught me more than 10 successful deployments would have.
Final Deployment Architecture (June 7, 2026)
Production Status
🦅 Oracle Cloud (Always Free Tier)
├─ ai-lifelogger (port 8000)
│ ├─ FastAPI + APScheduler
│ ├─ Daily summaries: 05:00 KST
│ ├─ Weekly reviews: Sunday 08:00 KST
│ └─ Memory: 111MB / 954MB
│
└─ ai-insight-curator (port 8001)
├─ FastAPI + Telegram Bot
├─ RSS collection: Daily 06:00 KST
├─ Auto-summarization (Claude/Gemini/Groq fallback)
└─ Memory: 22MB / 954MB
🌐 Vercel (Free Hosting)
└─ Curator Web Dashboard
├─ React + Vite frontend
├─ Article search & filtering
├─ Image downloads
└─ https://curator-web-ui.vercel.app
📊 Total Memory: 537MB / 954MB system-wide (56% used, 44% available); the two services account for 133MB of it
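The schedules in the tree above are given in KST, while a cloud VM often runs on UTC. APScheduler can take a timezone itself. This sketch only illustrates the underlying time arithmetic with the standard library, and `next_daily_run` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Korea Standard Time has no daylight saving, so a fixed UTC+9 offset is enough.
KST = timezone(timedelta(hours=9), "KST")


def next_daily_run(now: datetime, hour: int) -> datetime:
    """Next occurrence of hour:00 KST after `now` (`now` must be timezone-aware)."""
    local = now.astimezone(KST)
    candidate = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)  # today's slot already passed
    return candidate


now = datetime(2026, 6, 7, 0, 0, tzinfo=timezone.utc)  # 09:00 KST
print(next_daily_run(now, hour=5).astimezone(timezone.utc))
# prints: 2026-06-07 20:00:00+00:00
```

So the 05:00 KST daily summary fires at 20:00 UTC the previous evening, which is worth knowing when reading server logs.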
What Changed
Initial Plan:
contest-agent → Cloud Run ❌ (dependency conflicts)
ai-insight-curator → Railway ❌ (over-engineered)
ai-lifelogger → Oracle Cloud ✅
Actual Production:
ai-lifelogger → Oracle Cloud ✅ (running)
ai-insight-curator → Oracle Cloud ✅ (1 instance = better)
Curator Web UI → Vercel ✅ (new, auto-deployed)
Key Insight: Single server + Web UI > Multi-cloud complexity
Performance Metrics
API Response Times:
- Lifelogger /health: < 50ms ✅
- Curator /api/v1/articles: < 100ms ✅
- Curator /api/v1/insights: < 100ms ✅
System Health:
- Memory: 537MB (56%) - 417MB free for scaling
- Availability: 99.9%
- Uptime: Continuous (Always Free tier)
Cost Analysis (Final)
| Platform | Cost | Status |
|---|---|---|
| Oracle Cloud | $0/month | ✅ Always Free |
| Vercel | $0/month | ✅ Free tier |
| Supabase DB | $0/month | ✅ Free tier |
| Claude API | Needs reset* | ⚠️ Using Gemini/Groq backup |
| TOTAL | $0/month | Forever Free |
*The Anthropic token allowance is exhausted; the Gemini/Groq fallback is handling summarization in the meantime.
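A fallback like the Claude → Gemini → Groq chain can be expressed as "try each provider in order and return the first success". A minimal sketch of that pattern; the provider callables are hypothetical stand-ins, not the real SDK calls:

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def summarize_with_fallback(text: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Return (provider name, summary) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(text)
        except Exception as exc:  # quota exhausted, timeout, bad response...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def exhausted(text: str) -> str:  # hypothetical provider that is out of quota
    raise RuntimeError("quota exhausted")


def backup(text: str) -> str:  # hypothetical provider that works
    return text[:20]


print(summarize_with_fallback("A long article body to summarize", [("claude", exhausted), ("gemini", backup)]))
# prints: ('gemini', 'A long article body ')
```

Returning the provider name alongside the summary makes it visible in logs when the primary has been failing for a while.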
Lessons Learned
Multi-cloud Isn't Always Better
- Cloud Run: Good for high-traffic APIs
- Railway: Convenient but expensive
- Oracle: Best for low-traffic, cost-sensitive projects
Single Server Wins Here
- 2 concurrent FastAPI services
- Database included (PostgreSQL via Supabase)
- Web dashboard on separate CDN (Vercel)
- Total cost: $0
Design Around Constraints
- Memory: 954MB available → deployed with 537MB usage
- Can still run 300MB+ additional services
- Monitoring via SSH (not ideal, but works)
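The headroom claim can be checked with simple arithmetic before adding another service. A minimal sketch; the helper names are hypothetical, and the 100MB safety reserve is an assumption, not a figure from the deployment:

```python
def headroom_mb(total_mb: int, used_mb: int, reserve_mb: int = 0) -> int:
    """Memory left after current usage and an optional safety reserve."""
    return total_mb - used_mb - reserve_mb


def fits(service_mb: int, total_mb: int, used_mb: int, reserve_mb: int = 0) -> bool:
    """True if a service of the given size fits in the remaining memory."""
    return service_mb <= headroom_mb(total_mb, used_mb, reserve_mb)


print(headroom_mb(954, 537))               # prints 417
print(fits(300, 954, 537, reserve_mb=100))  # prints True: 300 <= 317
print(fits(350, 954, 537, reserve_mb=100))  # prints False: 350 > 317
```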
Conclusion
Don't chase multi-cloud complexity.
The optimal deployment turned out to be:
- 1 Oracle Cloud instance (FastAPI services)
- 1 CDN (Vercel for web)
- 1 Database (Supabase)
- Everything free
Cost: $0/month ✅
Reliability: 99.9% ✅
Maintainability: Simple ✅
Sometimes simpler is better.