Karan Verma for Docker

Building a Scalable Event-Driven Pipeline with MongoDB, Docker, and Kafka

In modern DevOps workflows, handling real-time data streams efficiently is crucial for building scalable applications. In this guide, we'll explore how to set up an event-driven pipeline using MongoDB, Docker, and Kafka to handle high-throughput data processing with ease.

Imagine an e-commerce platform processing millions of orders in real time. Our setup ensures seamless, fault-tolerant data streaming between services.

1. Why Event-Driven Architectures?

Traditional request-driven architectures lean on batch jobs and struggle with real-time processing and scale. Event-driven systems address these problems by:

  • Decoupling components so each service can scale independently.
  • Processing data in real time instead of periodic batch runs.
  • Improving fault tolerance through asynchronous messaging.

Kafka serves as the central message broker, while MongoDB acts as a persistent data store for event logs and structured data.

2. Setting Up MongoDB with Docker

To run MongoDB in a containerized environment, use the following Docker Compose setup:

version: '3.8'
services:
  mongodb:
    image: mongo:latest
    container_name: mongodb
    restart: always
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
    volumes:
      - mongodb_data:/data/db
volumes:
  mongodb_data:

Run MongoDB with:

docker-compose up -d

Now, MongoDB is up and running on port 27017.
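
To verify the container is healthy, you can open a session with mongosh (bundled with recent mongo images) using the credentials from the Compose file:

docker exec -it mongodb mongosh -u root -p example --eval "db.runCommand({ ping: 1 })"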

3. Deploying Kafka in Docker

This setup uses ZooKeeper for broker coordination (recent Kafka releases can instead run in ZooKeeper-less KRaft mode). We'll deploy both with Docker Compose:

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: one for containers on the Compose network, one for the host.
      # Advertising only localhost:9092 would break access from other containers.
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      # A single broker can't satisfy the default replication factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Start Kafka with:

docker-compose up -d

Check Kafka logs to confirm it's running:

docker logs -f kafka
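
Before wiring anything else up, it's worth creating the events topic the pipeline will use. A sketch using the kafka-topics CLI bundled in the Confluent image (topic name and partition count are just examples):

docker exec -it kafka kafka-topics --create \
  --topic events \
  --bootstrap-server localhost:9092 \
  --partitions 3 \
  --replication-factor 1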

4. Connecting Kafka & MongoDB

Kafka Connect enables data streaming between Kafka and MongoDB.

Step 1: Install the MongoDB Kafka Connector

The connector plugin runs on a Kafka Connect worker, not on the broker itself, and the broker image doesn't ship the confluent-hub CLI, so the stack needs a dedicated Connect service (a sketch follows below). With that worker running, install the connector and restart it so the plugin loads:

docker exec -it kafka-connect confluent-hub install --no-prompt mongodb/kafka-connect-mongodb:latest
docker restart kafka-connect
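A minimal sketch of the Connect worker service to add alongside kafka in the Compose file; the image, converter settings, and topic names here are reasonable single-broker dev defaults, not the only valid choices:

  kafka-connect:
    image: confluentinc/cp-kafka-connect:latest
    container_name: kafka-connect
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      # Reach the broker via the internal listener
      CONNECT_BOOTSTRAP_SERVERS: kafka:29092
      CONNECT_REST_PORT: 8083
      CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
      CONNECT_GROUP_ID: connect-cluster
      CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: _connect-status
      # Single broker: internal topics can't replicate 3 ways
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      # Plain JSON events without embedded schemas
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: "false"
      CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components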

Step 2: Configure the connector

Create a mongo-sink.json file. Note that the connection.uri hostname only resolves if the Connect worker and MongoDB share a Docker network; the easiest route is to merge both stacks into a single Compose file:

{
  "name": "mongo-sink-connector",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "events",
    "connection.uri": "mongodb://root:example@mongodb:27017",
    "database": "eventDB",
    "collection": "eventLogs"
  }
}
Apply the configuration through the Connect worker's REST API on port 8083:

curl -X POST -H "Content-Type: application/json" --data @mongo-sink.json http://localhost:8083/connectors

Now, Kafka will stream events directly into MongoDB! 🚀
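
To smoke-test the path end to end, push a JSON event with the console producer and read it back from MongoDB (the payload fields are just examples):

docker exec -it kafka bash -c 'echo "{\"orderId\": 1, \"status\": \"created\"}" | kafka-console-producer --bootstrap-server localhost:9092 --topic events'
docker exec -it mongodb mongosh -u root -p example --eval 'db.getSiblingDB("eventDB").eventLogs.find()'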

5. Scaling with Docker Swarm and Kubernetes

Deploying with Docker Swarm

To deploy MongoDB and Kafka as Swarm services, first initialize Swarm:

docker swarm init

Deploy services:

docker stack deploy -c docker-compose.yml event-pipeline

Now the services run as a scalable stack. Note that Swarm ignores Compose options such as container_name and restart; use the deploy key to set replica counts and restart policies.
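
To confirm everything came up, list the stack's services and their replica counts:

docker stack services event-pipeline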

Deploying with Kubernetes and Helm

To deploy Kafka and MongoDB on Kubernetes, use Helm charts:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install kafka bitnami/kafka
helm install mongodb bitnami/mongodb

This gives you a repeatable, declarative deployment; for real high availability you'll also want multiple replicas.
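
As a sketch (Bitnami chart parameter names vary between versions, so check helm show values bitnami/kafka first), a replicated setup might look like:

helm install mongodb bitnami/mongodb --set architecture=replicaset --set replicaCount=3
helm install kafka bitnami/kafka --set controller.replicaCount=3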

6. Optimizing Docker Images for Performance

To build efficient and secure containers:

  • Use small base images like Alpine, with a pinned tag rather than latest: FROM alpine:3.20
  • Use multi-stage builds to keep compilers and build tools out of the final image (see the Dockerfile sketch after this section).
  • Use a .dockerignore file to keep unnecessary files out of the build context.
  • Enable Docker BuildKit (the default in recent Docker releases) for faster builds:

DOCKER_BUILDKIT=1 docker build .
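
As an illustration of the first two points, here's a hypothetical multi-stage Dockerfile for a Go-based consumer service (the module path and binary name are placeholders for whatever your services look like):

# Build stage: full toolchain, discarded from the final image
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY . .
RUN go build -o /out/consumer ./cmd/consumer

# Runtime stage: small pinned Alpine base with just the binary
FROM alpine:3.20
COPY --from=build /out/consumer /usr/local/bin/consumer
ENTRYPOINT ["consumer"]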

7. Automating with DevOps Tools

  • CI/CD Pipelines: Automate builds and deployments with Jenkins or GitHub Actions (a minimal workflow sketch follows this list).
  • Infrastructure as Code (IaC): Use Terraform or Kubernetes manifests for reproducible, scalable deployments.
  • Monitoring & Logging: Leverage Prometheus and Grafana to track system health.
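
As a starting point for the CI/CD bullet, a minimal GitHub Actions workflow that builds the image on every push to main; the image tag is a placeholder, and you'd add registry login before enabling push:

# .github/workflows/build.yml
name: build
on:
  push:
    branches: [main]
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: example/event-consumer:latest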

Final Thoughts

By integrating MongoDB, Kafka, and Docker, we've built a scalable event-driven pipeline. This setup is perfect for real-time analytics, log processing, and microservices architectures.

💡 "How have you tackled event-driven architectures? Let's discuss in the comments!"
