DEV Community

Polliog
Docker Compose for Production: Lessons from Deploying a Log Management Platform

When I started building LogWard - an open-source alternative to Datadog - I made a controversial decision: no custom installation scripts. Just plain Docker Compose files that users can read, understand, and modify.

Most self-hosted platforms give you a curl | bash script that does "magic" behind the scenes. That approach might be convenient, but it breaks trust in a privacy-first platform. If you can't see what's being deployed, how can you trust it with your logs?

Here's what I learned deploying a production-grade log management platform with transparent Docker Compose configurations.

The Philosophy: Transparency Over Convenience

# This is what users see - no hidden steps
services:
  postgres:
    image: timescale/timescaledb:latest-pg16
    environment:
      POSTGRES_DB: logward
      POSTGRES_USER: logward
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

Why this matters:

  • Security teams can audit every line before deployment
  • Users understand what resources they're committing
  • No surprises about ports, volumes, or network configurations
  • Easy to customize for specific infrastructure needs

The trade-off? Users need to understand basic Docker concepts. But my target audience (European SMBs and developers) already runs Docker in production.

Lesson 1: Health Checks Are Non-Negotiable

Early versions of LogWard had a race condition: the backend would start before PostgreSQL was ready, leading to connection errors. The solution? Proper health checks and service dependencies.

postgres:
  image: timescale/timescaledb:latest-pg16
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U logward"]
    interval: 10s
    timeout: 5s
    retries: 5

backend:
  image: logward/backend:latest
  depends_on:
    postgres:
      condition: service_healthy  # 👈 Wait for actual health, not just "started"

The Problem with depends_on Alone

Most tutorials show depends_on without condition: service_healthy. This only ensures services start in order - not that they're ready to accept connections.

Real-world impact: Before health checks, ~30% of first-time deployments failed because the backend tried to connect to PostgreSQL before it finished initializing. After implementing health checks: 0% startup failures.

# ❌ BAD: Service might not be ready
depends_on:
  - postgres

# ✅ GOOD: Wait for actual readiness
depends_on:
  postgres:
    condition: service_healthy
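The same pattern extends to Redis, which LogWard also depends on. A sketch (the image tag and an unauthenticated `redis-cli ping` are assumptions — if `requirepass` is enabled, the test command needs the password as well):

```yaml
services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]  # succeeds once Redis answers PONG
      interval: 10s
      timeout: 5s
      retries: 5

  backend:
    depends_on:
      redis:
        condition: service_healthy
```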

Lesson 2: Pre-Built Images vs. Building Locally

Initially, I assumed users would clone the repo and run docker-compose up --build. This caused three problems:

  1. Slow deployments - building from source takes 5-10 minutes
  2. Build failures - different Node/pnpm versions caused inconsistencies
  3. Trust issues - "What if the build process does something malicious?"

The solution: Push pre-built images to Docker Hub and GitHub Container Registry.

backend:
  image: logward/backend:latest  # 👈 Pre-built, reproducible
  # Build from source is still possible for advanced users
  # build:
  #   context: .
  #   dockerfile: packages/backend/Dockerfile

Results:

  • Deployment time: 10 minutes → 2 minutes
  • Zero build-related support tickets
  • Users can pull and inspect images before running them: docker inspect logward/backend:latest

Multi-Platform Builds

LogWard supports both AMD64 and ARM64 (for M1 Macs and ARM servers). Using GitHub Actions:

# .github/workflows/publish-images.yml
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    platforms: linux/amd64,linux/arm64
    push: true
    tags: logward/backend:${{ steps.version.outputs.version }}
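For context, the steps around that snippet in a publish workflow might look like this (a sketch — the job name, checkout step, and Docker Hub secret names are assumptions; QEMU is what lets the amd64 runner build the arm64 image):

```yaml
# .github/workflows/publish-images.yml (abridged sketch)
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Emulation + buildx make linux/arm64 buildable on an amd64 runner
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          platforms: linux/amd64,linux/arm64
          push: true
          tags: logward/backend:latest
```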

Now users on any architecture just run docker compose up -d and it works.

Lesson 3: Environment Variables - The Right Way

Early versions had secrets in the compose file. Bad idea.

# ❌ NEVER do this
environment:
  DATABASE_URL: postgresql://logward:supersecret123@postgres:5432/logward

Instead, use .env files with generated secrets:

# docker-compose.yml
environment:
  DATABASE_URL: postgresql://logward:${DB_PASSWORD}@postgres:5432/logward
# .env (not committed to git)
DB_PASSWORD=
REDIS_PASSWORD=
API_KEY_SECRET=

Pro tip: I provide an install.sh script that generates secure random passwords automatically:

generate_password() {
    openssl rand -base64 32 | tr -d "=+/" | cut -c1-32
}

DB_PASSWORD=$(generate_password)
REDIS_PASSWORD=$(generate_password)

But the script is optional - users can still manually create .env if they prefer.
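A minimal version of that generator might look like this (a sketch — it mirrors the snippet above but draws 48 random bytes so the trimmed password can never fall short of 32 characters; the `write_env` helper name is mine, not from install.sh):

```shell
#!/bin/sh
# Generate a 32-character alphanumeric secret from OpenSSL's CSPRNG.
# (48 random bytes -> 64 base64 chars, so trimming to 32 never runs short.)
generate_password() {
    openssl rand -base64 48 | tr -d "=+/" | cut -c1-32
}

# Write a fresh .env, refusing to clobber an existing one so the
# installer is safe to re-run.
write_env() {
    target="$1"
    if [ -f "$target" ]; then
        echo "$target already exists, leaving it untouched" >&2
        return 0
    fi
    cat > "$target" <<EOF
DB_PASSWORD=$(generate_password)
REDIS_PASSWORD=$(generate_password)
API_KEY_SECRET=$(generate_password)
EOF
    chmod 600 "$target"  # secrets should not be world-readable
}

write_env .env
```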

Lesson 4: Volume Management for Data Persistence

Lost data is not an option for a log management platform. Here's how volumes work in LogWard:

services:
  postgres:
    volumes:
      - postgres_data:/var/lib/postgresql/data  # 👈 Named volume

volumes:
  postgres_data:
    driver: local  # 👈 Explicit driver (important for clustering later)

Why Named Volumes vs. Bind Mounts?

Bind mounts (./data:/var/lib/postgresql/data):

  • ❌ Permission issues on different systems
  • ❌ Harder to back up and restore
  • ❌ Not portable across Docker hosts

Named volumes:

  • ✅ Docker manages permissions
  • ✅ Easy to back up: docker run --rm -v postgres_data:/data -v $(pwd):/backup ubuntu tar czf /backup/postgres_backup.tar.gz /data
  • ✅ Can be migrated to network storage later

Backup Strategy

I document this explicitly in the deployment guide:

# Create backup
docker compose exec postgres pg_dump -U logward logward > backup_$(date +%Y%m%d).sql

# Restore from backup
docker compose exec -T postgres psql -U logward logward < backup_20250115.sql

Users need to know how to back up their data before disaster strikes.
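To keep those dumps from piling up, a small rotation helper can run from the same cron job as the `pg_dump` command (a sketch — the `rotate_backups` name, the `./backups` directory, and the 7-day retention are assumptions):

```shell
#!/bin/sh
# Delete database dumps older than a retention window (in days).
rotate_backups() {
    dir="$1"
    days="$2"
    find "$dir" -name 'backup_*.sql' -mtime +"$days" -delete
}

# Typical cron usage, after the pg_dump command above has written
# today's dump into ./backups:
#   rotate_backups ./backups 7
```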

Lesson 5: The Worker Pattern

LogWard has a worker service that shares the same image as the backend but runs background jobs (sending email alerts, processing Sigma rules, aggregating stats).

backend:
  image: logward/backend:latest
  command: ["server"]  # Default: runs Fastify API

worker:
  image: logward/backend:latest
  command: ["worker"]  # Same image, different entrypoint
  depends_on:
    postgres:
      condition: service_healthy

Why this architecture?

  • Separation of concerns - API stays responsive even during heavy background processing
  • Independent scaling - can run multiple workers without scaling the API
  • Single image - reduces complexity and storage

The command override is handled in the Dockerfile:

# packages/backend/Dockerfile
ENTRYPOINT ["./docker-entrypoint.sh"]
CMD ["server"]  # Default

# The entrypoint script maps "server" -> node dist/server.js
# and "worker" -> node dist/worker.js, so the compose `command:`
# values above pick the right process.
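One way to implement that mapping is a tiny dispatch script baked into the image (the source doesn't show the actual script, so this is a sketch — the `docker-entrypoint.sh` name and the `main` wrapper are mine):

```shell
#!/bin/sh
# docker-entrypoint.sh - dispatch a role name to the right Node entry file.
set -e

main() {
    case "$1" in
        server) exec node dist/server.js ;;
        worker) exec node dist/worker.js ;;
        *)      exec "$@" ;;  # fall through: run arbitrary commands for debugging
    esac
}

main "$@"
```

The fall-through arm means `docker run logward/backend sh` still drops you into a shell when you need to poke around the container.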

Lesson 6: Restart Policies Matter

Production services crash. It's inevitable. The question is: do they recover?

backend:
  restart: unless-stopped  # 👈 Survives reboots and crashes

Restart policy options:

  • no - Never restart (bad for production)
  • always - Always restart; a manually stopped container comes back when the Docker daemon restarts
  • on-failure - Restart only on non-zero exit codes
  • unless-stopped - Best for production: restarts after crashes and reboots, but stays down once explicitly stopped

Real-world scenario: A user reported their server rebooted for kernel updates. With restart: unless-stopped, LogWard came back online automatically. Without it, they would have had downtime until they manually restarted containers.
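With several services, a YAML anchor keeps the policy in one place instead of repeating it (a sketch — Compose ignores top-level `x-` extension fields, and `<<:` merges the shared keys into each service):

```yaml
x-restart: &restart-policy
  restart: unless-stopped

services:
  backend:
    <<: *restart-policy
    image: logward/backend:latest

  worker:
    <<: *restart-policy
    image: logward/backend:latest
```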

Lesson 7: Network Configuration

LogWard uses a custom bridge network instead of the default:

services:
  backend:
    networks:
      - logward-network

networks:
  logward-network:
    driver: bridge

Benefits:

  • Services can reference each other by name (postgres:5432, redis:6379)
  • Isolated from other Docker projects
  • Better security - services aren't accessible from other networks

Pitfall I discovered: The frontend needs to connect to the backend from outside the Docker network (browser β†’ backend). So I expose ports:

backend:
  ports:
    - "8080:8080"  # Host:Container
  networks:
    - logward-network

But internal services (like the worker) don't need exposed ports - they communicate via the internal network.
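Putting both halves of the lesson together: only the browser-facing service publishes a port, and everything else talks over the internal network (a sketch consistent with the services described above):

```yaml
services:
  backend:
    ports:
      - "8080:8080"        # the only service the browser reaches
    networks:
      - logward-network

  worker:
    networks:
      - logward-network    # no ports: reachable only inside the network

  postgres:
    networks:
      - logward-network    # backend and worker connect to postgres:5432

networks:
  logward-network:
    driver: bridge
```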

Lesson 8: Development vs. Production Compose Files

I maintain two compose files:

docker-compose.dev.yml - For contributors:

services:
  postgres:
    ports:
      - "5432:5432"  # 👈 Exposed for local development tools

  redis:
    ports:
      - "6379:6379"

docker-compose.yml - For production:

services:
  postgres:
    # No ports exposed - only accessible via internal network

Run with: docker compose -f docker-compose.dev.yml up

Why separate files?

  • Development needs database access from host (for migrations, debugging)
  • Production should never expose databases to the host network
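An alternative to two self-contained files is Compose's file merging: keep docker-compose.yml production-safe and put only the dev deltas in the second file (a sketch of that approach):

```yaml
# docker-compose.dev.yml - only the development-specific additions
services:
  postgres:
    ports:
      - "5432:5432"

  redis:
    ports:
      - "6379:6379"
```

Running `docker compose -f docker-compose.yml -f docker-compose.dev.yml up` merges the two files, layering the exposed ports on top of the production definitions.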

Lesson 9: Monitoring and Logs

Every service has logging configured:

# View logs for a specific service
docker compose logs -f backend

# View last 100 lines
docker compose logs --tail=100 backend

# View logs for all services
docker compose logs -f

I also added a health endpoint to the backend:

// GET /health
fastify.get('/health', async () => {
  const dbHealthy = await checkDatabase();
  const redisHealthy = await checkRedis();

  return {
    status: dbHealthy && redisHealthy ? 'healthy' : 'unhealthy',
    services: { postgres: dbHealthy, redis: redisHealthy }
  };
});

Users can monitor this with a simple cron job:

# Add to crontab
*/5 * * * * curl -f http://localhost:8080/health || systemctl restart docker-compose-logward
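The /health endpoint can also drive a container-level healthcheck, so `docker compose ps` reflects real application health rather than just "the process is running" (a sketch — it assumes `wget` exists in the backend image, as it does in most Alpine-based images):

```yaml
backend:
  image: logward/backend:latest
  healthcheck:
    test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health | grep -q '\"status\":\"healthy\"'"]
    interval: 30s
    timeout: 5s
    retries: 3
```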

Lesson 10: What About Kubernetes?

People ask: "Why not Kubernetes? Isn't Docker Compose just for development?"

My take: For 90% of self-hosted use cases (SMBs, startups, personal projects), Docker Compose is perfect:

  • ✅ Runs on a single $5/month VPS
  • ✅ No learning curve beyond basic Docker
  • ✅ Easy to back up and restore
  • ✅ Low operational complexity

Kubernetes makes sense when you need:

  • Multi-node clustering
  • Auto-scaling across dozens of instances
  • Complex orchestration with service meshes

But LogWard's target users don't need that complexity. They need simple, transparent, reliable deployments.

The Results

After 3 months of production deployments:

  • 0 deployment failures due to Docker issues
  • Average deployment time: 2 minutes (from git clone to running services)
  • Support tickets related to deployment: ~5% (vs. 40% in early versions)
  • Self-hosting adoption rate: 35% of users prefer self-hosting over our cloud offering

Key Takeaways

  1. Transparency builds trust - visible configuration files are better than magic scripts
  2. Health checks prevent race conditions - use condition: service_healthy
  3. Pre-built images save time - build once, deploy everywhere
  4. Named volumes for persistence - easier backups and migrations
  5. Restart policies for resilience - unless-stopped is production-ready
  6. Separate dev and production configs - security and convenience aren't the same
  7. Document backup procedures - before users need them
  8. Docker Compose scales - you don't need Kubernetes for everything

Try It Yourself

LogWard is open source (AGPLv3). You can see the complete Docker setup here:

# Try it in 2 minutes
git clone https://github.com/logward-dev/logward.git
cd logward/docker
cp ../.env.example .env
docker compose up -d

What's your experience with Docker Compose in production? Have you encountered different challenges? Let me know in the comments!

Top comments (2)

Aman Saxena

This is an interesting project. Can I also contribute to it if it's still in development? @polliog

Polliog

Hey! Absolutely, that would be amazing!
Yes, the project is very active (we just released v0.2.4). We are building a monorepo with SvelteKit 5 and Fastify, so there is plenty of fun stuff to work on.

The best way to start is to check the "good first issue" label on our GitHub Issues page. Or, if you have a specific feature in mind (or a missing SDK you want to build), feel free to open an issue to discuss it!