How a simple "make it work remotely" task uncovered a complex production infrastructure battle
🎯 The Mission
Goal: Get Celery working on the remote server (it worked perfectly locally)
Expectation: A simple deployment
Reality: An infrastructure archaeology expedition 🏛️
📍 Starting Point: Local vs Remote
Local Environment:
✅ celery worker --app=myapp   # Works perfectly
✅ celery beat --app=myapp     # Schedules tasks
✅ Tasks processing smoothly
Remote Server Attempt:
❌ Same commands failing
❌ Tasks not processing
❌ No clear error messages
❌ "Why doesn't this work remotely?!"
Classic developer moment: "But it works on my machine!" 😅
🛠️ Our First Solution: PM2
Since everything worked locally, we decided to use PM2 to manage the Celery processes remotely:
# Our PM2 setup attempt:
pm2 start "celery worker -A myapp" --name "celery-worker"
pm2 start "celery beat -A myapp" --name "celery-beat"
pm2 save
Expected result: Celery running via PM2 ✅
Actual result: Chaos and confusion 🌪️
🚨 The Mysterious Behavior
The "Zombie Process" Mystery:
# We'd stop PM2 processes:
pm2 stop celery-worker
pm2 delete celery-worker
# But somehow Celery kept running! 🤯
ps aux | grep celery
# Still showing celery processes! (see the ownership check below)
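A quick way to see who actually owns those surviving processes is to check their parent PID with standard ps options (nothing here is project-specific):

ps axo pid,ppid,user,args | grep '[c]elery'
# A PPID of 1 means the process was started by SystemD (PID 1), not by PM2.

That one column was the first hint that another manager was already in play.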
The Food Analogy That Explained Everything:
Imagine you're cooking in a shared kitchen:
- You (PM2): "I'll make dinner tonight!"
- Professional Chef (SystemD): Already cooking the same meal
- Result: Two people making the same dish, stepping on each other's toes 🍳
In our case:
- PM2: "I'll manage Celery!"
- SystemD: Already professionally managing Celery
- Result: Process conflicts, file locks, and chaos
🔍 The Investigation Process
Step 1: Check Process Managers
# Check PM2
pm2 list # Our processes
# Check Supervisor (common alternative)
supervisorctl status # "No supervisor"
# The revelation - Check SystemD:
sudo systemctl status 'celery*'
# 😱 SIX services already running! (fuller listing commands below)
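For the full picture, listing every matching unit, including ones that are loaded but stopped or installed but disabled, helps (standard systemctl commands, nothing project-specific):

# All loaded celery units and their current states:
systemctl list-units --all 'celery*'

# Unit files on disk (catches services that exist but aren't currently loaded):
systemctl list-unit-files 'celery*'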
Step 2: The Hierarchy Discovery
# What we found:
Linux System
├── SystemD (System Service Manager)   ← The REAL boss
│   ├── celery-worker-dev.service      ✅ RUNNING
│   ├── celery-beat-dev.service        ✅ RUNNING
│   └── celery-flower-dev.service      ✅ RUNNING
│
└── PM2 (User Process Manager)         ← Our attempt
    ├── celery-worker                  ❌ CONFLICTING
    └── celery-beat                    ❌ CONFLICTING
The lightbulb moment: SystemD supersedes PM2! 💡
🎯 Why SystemD Always Won
The Service Hierarchy:
- SystemD runs at the system level (root privileges) as the OS's own service manager
- PM2 runs at the user level, as just another process
- SystemD supervises its units and automatically restarts them when they die
- So when we killed our PM2-managed processes, SystemD's own copies were still there, and killing those directly only got them restarted (see the unit snippet below)
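Under the hood, the "zombie" behavior comes from the restart policy baked into the units. The snippet below is only a sketch of the relevant [Service] lines: the unit name comes from the server, the rest is illustrative.

# sudo systemctl cat celery-worker-dev.service  ->  relevant [Service] lines (sketch):
[Service]
# This directive is why manually killed celery processes kept coming back:
Restart=always
RestartSec=5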
The Food Kitchen Analogy Extended:
Professional Restaurant Kitchen (SystemD):
- Head Chef (SystemD) manages everything
- Established recipes and timing
- Auto-restarts if something goes wrong
- Full kitchen control
Home Cook with Microwave (PM2):
- Trying to cook in same kitchen
- Different timing and methods
- Gets confused when Head Chef intervenes
- Limited control and access
🔧 The Real Issues We Discovered
1. Environment Conflicts
# Two environments running simultaneously:
DEV Services: Redis :6379, /app/dev/
MAIN Services: Redis :6378, /app/main/
# Beat scheduler file conflict:
_gdbm.error: Resource temporarily unavailable: 'celerybeat-schedule'
# Translation: two beat schedulers fighting over the same schedule file! (a fix is sketched below)
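One way to defuse this particular conflict, assuming the standard celery beat CLI (the paths below are illustrative), is to give each environment its own schedule file instead of letting both default to ./celerybeat-schedule:

# Dev beat writes its schedule here:
celery -A api.utils.celery:celery_app beat --schedule=/app/dev/run/celerybeat-schedule

# Main beat gets a separate file:
celery -A api.utils.celery:celery_app beat --schedule=/app/main/run/celerybeat-schedule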
2. Missing Environment Variables
# SystemD services missing .env access:
❌ No BE_REDIS_URL
❌ No DB_URL
❌ Authentication failures to Redis (one fix sketched below)
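SystemD doesn't read a project's .env file on its own; a unit only sees the variables it declares. Assuming the .env file sits in the app directory (path is an assumption), pointing the unit at it is one way to close the gap:

# In the [Service] section of the unit file (path illustrative):
EnvironmentFile=/app/dev/.env

# ...or declare the critical variables explicitly:
Environment="BE_REDIS_URL=redis://:password@host:6379"
Environment="DB_URL=postgresql+asyncpg://user:pass@host/db"

Note that EnvironmentFile= expects plain KEY=value lines, so a .env file that uses export statements needs a small cleanup first.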
3. Wrong Import Paths
# Services using incorrect Celery import:
❌ -A api.utils.celery.celery_app   # treated as a dotted module path (wrong)
✅ -A api.utils.celery:celery_app   # module:attribute reference (correct - quick check below)
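Before touching the services, the reference can be sanity-checked from the shell. celery report loads the app and prints broker/transport details, so it fails fast if the import is wrong (the venv path is assumed to match the service's):

# Fails immediately if -A can't be resolved:
venv/bin/celery -A api.utils.celery:celery_app report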
🛠️ The Solution Strategy
Step 1: Embrace SystemD (Stop Fighting It)
# Instead of fighting SystemD, work WITH it:
sudo systemctl stop celery-*-main.service # Stop conflicting services
sudo systemctl disable celery-*-main.service # Prevent auto-start
Step 2: Fix Environment Configuration
# Update SystemD service files with proper environment:
Environment="BE_REDIS_URL=redis://:password@host:6379"
Environment="DB_URL=postgresql+asyncpg://user:pass@host/db"
Step 3: Fix Import Paths
# Correct the Celery app reference (ExecStart also needs an absolute path):
ExecStart=/app/dev/venv/bin/celery -A api.utils.celery:celery_app worker
Step 4: Clean Up File Conflicts
# Remove corrupted beat schedule file:
sudo rm -f celerybeat-schedule*
sudo systemctl restart celery-beat-dev.service
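Putting the four steps together, the corrected worker unit ended up looking roughly like this. This is a sketch, not the server's actual file: paths, credentials, and log levels are illustrative.

# /etc/systemd/system/celery-worker-dev.service (illustrative)
[Unit]
Description=Celery worker (dev)
After=network.target

[Service]
WorkingDirectory=/app/dev
EnvironmentFile=/app/dev/.env
ExecStart=/app/dev/venv/bin/celery -A api.utils.celery:celery_app worker --loglevel=INFO
Restart=always

[Install]
WantedBy=multi-user.target

After editing any unit file, sudo systemctl daemon-reload followed by sudo systemctl restart celery-worker-dev.service picks up the changes.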
✅ The Victory
Before (The Chaos):
❌ PM2 vs SystemD battle
❌ Process conflicts
❌ File locking errors
❌ No visibility into what's happening
❌ "It works locally but not remotely!"
After (The Harmony):
✅ SystemD managing everything professionally
✅ 0.004s task execution time
✅ Real-time Flower monitoring dashboard
✅ Clean logs with success messages
✅ Automatic midnight operations
Production Logs (The Proof):
[INFO] Task reset_user_swipes[abc123] received
[INFO] Task reset_user_swipes[abc123] succeeded in 0.004s:
{'status': 'success', 'total_users_processed': 47}
📚 Key Learnings
1. Local ≠ Remote Environment
Just because it works locally doesn't mean remote deployment is straightforward. Production has different service management patterns.
2. Check Existing Infrastructure First
Before adding new process managers, discover what's already running. The server was already professionally configured!
3. Understand Service Hierarchies
SystemD (System Level) > PM2 (User Level)
Don't fight the system - work with it.
4. The "Food Kitchen" Principle
Multiple process managers = Multiple cooks in the same kitchen = Chaos
Better to have one professional system managing everything.
🏗️ The Architecture We Built
Remote Server Production Stack:
📱 FastAPI App → 🌸 Flower Dashboard → 🔴 Redis → ⚡ Celery Workers
                 (real-time monitor)   (task queue)  (processing)
⏰ Celery Beat + all of the above managed as SystemD services
Result: Enterprise-grade background task system processing thousands of operations daily.
💡 The Debugging Journey
1. "Works locally" β Deploy to remote
2. "Doesn't work remotely" β Try PM2
3. "PM2 acting weird" β Investigate processes
4. "Found SystemD!" β Understand conflicts
5. "Fix environment" β Configure properly
6. "Everything works!" β Production success
Time: 2 hours of detective work
Outcome: A robust, monitored, self-healing background task system
🏆 The Real Victory
Technical: Transformed apparent failure into production excellence
Learning: Sometimes the best solution is understanding what's already there
Impact: 10k+ users getting automated daily swipe resets
The journey from "but it works locally!" to "production-grade infrastructure" taught us that effective debugging is part detective work, part systems understanding, and part knowing when to work WITH the system instead of against it.
Tools mastered: SystemD, Celery, Redis, Flower, Linux service management
Skills gained: Production debugging, infrastructure archaeology, service conflict resolution
The moral of the story: Before adding new tools, understand what tools are already doing the job. Sometimes the mysterious behavior isn't a bug - it's a feature you didn't know existed. 🎯