benjeddou monem

🚀 Supercharge Your File Storage: SeaweedFS + PostgreSQL in 15 Minutes

Tired of choosing between file storage performance and metadata flexibility? Discover how combining SeaweedFS's distributed storage with PostgreSQL's SQL power creates a production-ready solution. I'll walk you through a battle-tested Docker setup with pro tips! 🐳

Why This Combo Beats LevelDB 🔥

| Feature          | LevelDB (Default)  | PostgreSQL Power        |
|------------------|--------------------|-------------------------|
| Query Capability | Basic key-value    | Full SQL + JOINs        |
| Data Safety      | Single-node        | ACID Compliant          |
| Scalability      | Manual sharding    | Built-in replication    |
| Maintenance      | File-level backups | pg_dump + WAL archiving |

Real-world advantage: Need to find "all PDFs edited by user X last week"? Try that with LevelDB! 🔍


Core Architecture: The Metadata Engine 🧠

The filemeta Table Schema

CREATE TABLE IF NOT EXISTS filemeta (
  dirhash     BIGINT,         -- Optimized directory hash (no path locking!)
  name        VARCHAR(65535), -- Supports looong filenames (65k chars!)
  directory   VARCHAR(65535), -- Full path storage (e.g., '/user_uploads')
  meta        bytea,          -- Serialized protobuf (timestamps, permissions)
  PRIMARY KEY (dirhash, name) -- Blazing fast lookups + uniqueness
);

Pro Tip: The dirhash is a hash of the directory path, so the (dirhash, name) key gives row-level lookups and writes with no directory-level lock during concurrent writes. 🚫🔒
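With this schema, directory and name are ordinary columns, so path-based questions are plain SQL. Here is a minimal sketch, assuming the postgres host and the seaweed user and database from the Docker setup in this post; list_files_like is a hypothetical helper name. Anything stored inside meta (owner, timestamps) is a serialized blob and would need decoding before you can filter on it.

```shell
#!/bin/sh
# list_files_like DIR PATTERN
# Hypothetical helper: prints names in DIR that match a SQL LIKE pattern.
# Host "postgres" and user/database "seaweed" are assumptions taken from this post.
list_files_like() {
  # -v hands the values to psql as variables; :'dir' quotes them as SQL literals.
  # The query goes in on stdin because psql does not expand variables inside -c.
  PGPASSWORD="${PGPASSWORD:-seaweed}" psql -h postgres -U seaweed -d seaweed \
    -At -v dir="$1" -v pat="$2" <<'SQL'
SELECT name
FROM filemeta
WHERE directory = :'dir'
  AND name LIKE :'pat'
ORDER BY name;
SQL
}

# Usage: list_files_like /user_uploads '%.pdf'
```

Passing the directory and pattern as psql variables keeps quoting out of the shell string, which matters once file names contain apostrophes.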


Production-Grade Docker Setup 🐳

Network & Volume Configuration

# docker-compose.yml
version: "3.7"

volumes:
  postgres_data:  # Persistent metadata
  volume_data:    # Storage volume 1
  volume_data2:   # Storage volume 2 (replication)

networks:
  seaweed:
    driver: bridge
    attachable: true  # For future expansion

Services Breakdown

1. SeaweedFS Master + Volumes

services:
  master:
    image: chrislusf/seaweedfs
    command: "master -ip=master -defaultReplication=001"
    ports: ["9333:9333", "19333:19333"]  # HTTP API + gRPC (port + 10000)

  volume:
    image: chrislusf/seaweedfs
    command: "volume -dir=/data -mserver=master:9333 -port=8080"
    ports: ["8080:8080", "18080:18080"]  # HTTP + gRPC
    volumes: ["volume_data:/data"]

  volume2:
    image: chrislusf/seaweedfs
    command: "volume -dir=/data2 -mserver=master:9333 -port=8081"
    ports: ["8081:8081", "18081:18081"]  # HTTP + gRPC
    volumes: ["volume_data2:/data2"]

Replication Strategy: defaultReplication=001 keeps one extra copy of each file on a different volume server in the same rack, so every file exists twice.
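The three digits of a SeaweedFS replication code count extra copies at three levels: other data centers, other racks in the same data center, and other servers in the same rack. As a rough sketch, the total number of copies is one plus the sum of the digits; replica_copies is a hypothetical helper written for illustration, not part of the weed CLI.

```shell
#!/bin/sh
# replica_copies CODE
# Hypothetical helper: prints the total number of copies implied by a
# three-digit replication code such as 001.
replica_copies() {
  code=$1
  case $code in
    [0-9][0-9][0-9]) ;;
    *) echo "expected three digits, got: $code" >&2; return 1 ;;
  esac
  dc=${code%??}       # first digit: extra copies in other data centers
  rest=${code#?}
  rack=${rest%?}      # second digit: extra copies on other racks, same data center
  server=${code#??}   # third digit: extra copies on other servers, same rack
  echo $((1 + dc + rack + server))
}

# Usage:
#   replica_copies 001   # prints 2
#   replica_copies 110   # prints 3
```

With only two volume servers in this stack, 001 is the highest setting that can actually be satisfied.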


2. PostgreSQL Configuration

  postgres:
    image: postgres:15-alpine  # Smaller footprint
    environment:
      POSTGRES_USER: seaweed
      POSTGRES_PASSWORD: seaweed
      POSTGRES_DB: seaweed
    healthcheck:  # Critical for startup order
      test: ["CMD-SHELL", "pg_isready -U seaweed"]
      timeout: 5s
      retries: 10  # More retries for slow systems
    volumes:
      - postgres_data:/var/lib/postgresql/data

3. Filer Service (The Glue)

  filer:
    image: chrislusf/seaweedfs
    depends_on:
      postgres:
        condition: service_healthy
      master:
        condition: service_started
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        apk add --no-cache postgresql-client && 
        /scripts/check_create_filemeta.sh &&
        weed filer -master=master:9333
    volumes:
      - "./check_create_filemeta.sh:/scripts/check_create_filemeta.sh"
      - "./filer.toml:/etc/seaweedfs/filer.toml"

Critical Configuration Files 🔧

Table Initialization Script

#!/bin/sh
# check_create_filemeta.sh
set -eu  # Fail on any error or unset variable

# Create the metadata table if it does not exist yet.
# ON_ERROR_STOP makes psql exit non-zero when the statement fails.
PGPASSWORD="seaweed" psql -h postgres -U seaweed -d seaweed -v ON_ERROR_STOP=1 <<'SQL'
CREATE TABLE IF NOT EXISTS filemeta (
  dirhash     BIGINT,
  name        VARCHAR(65535),
  directory   VARCHAR(65535),
  meta        bytea,
  PRIMARY KEY (dirhash, name)
);
SQL

echo "Metadata table ready ✅"

Filer Configuration

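# filer.toml
[postgres]
enabled = true
hostname = "postgres"
port = 5432
username = "seaweed"  # Must match POSTGRES_USER
password = "seaweed"  # Must match POSTGRES_PASSWORD
database = "seaweed"  # Must match POSTGRES_DB
sslmode = "disable"  # Use "require" in production!

# Connection Pooling
connection_max_idle = 50
connection_max_open = 100  # Adjust based on load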

Pro Tips from Production 💡

  1. Monitor These Metrics 📊

    • PostgreSQL: Connection pool usage, dead tuples
    • SeaweedFS: Volume server disk IOPS
    • Filer: gRPC request latency
  2. Backup Strategy 💾

   # PostgreSQL daily dump (run from the directory containing docker-compose.yml)
   docker compose exec -T postgres pg_dump -U seaweed -Fc seaweed > seaweed.dump

   # SeaweedFS metadata backup (weed shell reads its commands from stdin)
   echo "fs.meta.save" | weed shell -master=master:9333
  3. Scale Out ⚖️
    • Add more volume servers
    • Use PostgreSQL read replicas for metadata
    • Implement Redis caching for hot metadata
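The daily dump from the backup strategy above overwrites seaweed.dump on every run. A minimal sketch of date-stamped dumps with simple retention could look like this; dump_name, prune_dumps, the backups directory, and the keep-seven policy are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for date-stamped dumps; names and retention are assumptions.

# dump_name PREFIX: prints PREFIX-YYYYMMDD.dump using the current UTC date.
dump_name() {
  echo "$1-$(date -u +%Y%m%d).dump"
}

# prune_dumps DIR KEEP: deletes all but the KEEP newest *.dump files in DIR.
prune_dumps() {
  dir=$1 keep=$2
  # Globs expand in sorted order, so the oldest date-stamped files come first.
  set -- "$dir"/*.dump
  [ -e "$1" ] || return 0   # nothing to prune
  while [ "$#" -gt "$keep" ]; do
    rm -f -- "$1"
    shift
  done
}

# Usage:
#   mkdir -p backups
#   docker compose exec -T postgres pg_dump -U seaweed -Fc seaweed \
#     > "backups/$(dump_name seaweed)"
#   prune_dumps backups 7
```

Because the date is part of the file name, retention needs no timestamps or extra state: sorting the names is enough.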

Let's Get Started! 🚀

# Spin up the stack (--wait requires Docker Compose v2)
docker compose up -d --wait

# Verify services
docker compose ps
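A container that shows as running is not necessarily a working cluster, so it can help to ask the master directly. This sketch assumes the master's /cluster/status HTTP endpoint returns JSON with an IsLeader field and that port 9333 is published as in the compose file; master_is_leader is a hypothetical helper name.

```shell
#!/bin/sh
# master_is_leader URL
# Hypothetical helper: succeeds when the master at URL reports itself as leader.
# Assumes /cluster/status returns JSON containing an "IsLeader" field.
master_is_leader() {
  curl -fsS "$1/cluster/status" |
    grep -Eq '"IsLeader"[[:space:]]*:[[:space:]]*true'
}

# Usage:
#   master_is_leader http://localhost:9333 && echo "master is up"
```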

Challenge: Try adding Prometheus monitoring! Start each weed server with the -metricsPort flag and point Prometheus at those ports. 📈


Discussion Time! 💬

  • Would you prefer this over S3-like storage for your use case?
  • What other metadata would you store in PostgreSQL?
  • Found an optimization? Let's hear it! 👇

Coffee Promise ☕: First person to spot a typo gets a virtual coffee!
