Tired of choosing between file storage performance and metadata flexibility? Discover how combining SeaweedFS's distributed storage with PostgreSQL's SQL power creates a production-ready solution. I'll walk you through a battle-tested Docker setup with pro tips! 🐳
## Why This Combo Beats LevelDB 🔥

| | LevelDB (Default) | PostgreSQL Power |
|---|---|---|
| Query Capability | Basic key-value | Full SQL + JOINs |
| Data Safety | Single-node | ACID compliant |
| Scalability | Manual sharding | Built-in replication |
| Maintenance | File-level backups | `pg_dump` + WAL archiving |

Real-world advantage: Need to find "all PDFs edited by user X last week"? Try that with LevelDB!
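With the metadata in Postgres, the name-and-path half of that question is a one-liner. A sketch (the `/user_x` directory is hypothetical; note that edit timestamps live inside the serialized protobuf `meta` column, so time-based filters still need the filer API rather than pure SQL):

```sql
-- Find every PDF stored under a (hypothetical) per-user directory
SELECT directory, name
FROM filemeta
WHERE name LIKE '%.pdf'
  AND directory LIKE '/user_x/%';
```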
## Core Architecture: The Metadata Engine 🧠

### The `filemeta` Table Schema
```sql
CREATE TABLE IF NOT EXISTS filemeta (
  dirhash   BIGINT,           -- Hash of the directory path (no path locking!)
  name      VARCHAR(65535),   -- Supports looong filenames (65k chars!)
  directory VARCHAR(65535),   -- Full path storage (e.g., '/user_uploads')
  meta      bytea,            -- Serialized protobuf (timestamps, permissions)
  PRIMARY KEY (dirhash, name) -- Blazing fast lookups + uniqueness
);
```
Pro Tip: `dirhash` is a 64-bit hash of the directory path, so the composite primary key gives index-backed lookups without locking whole directory paths during concurrent writes. 🚫🔒
## Production-Grade Docker Setup 🐳

### Network & Volume Configuration
```yaml
# docker-compose.yml
version: "3.7"

volumes:
  postgres_data:   # Persistent metadata
  volume_data:     # Storage volume 1
  volume_data2:    # Storage volume 2 (replication)

networks:
  seaweed:
    driver: bridge
    attachable: true  # For future expansion
```
### Services Breakdown

#### 1. SeaweedFS Master + Volumes
```yaml
services:
  master:
    image: chrislusf/seaweedfs
    command: "master -ip=master -defaultReplication=001"
    ports: ["9333:9333", "19333:19333"]  # API + metrics
  volume:
    image: chrislusf/seaweedfs
    command: "volume -dir=/data -mserver=master:9333 -port=8080"
    ports: ["8080:8080", "18080:18080"]  # Metrics matter!
    volumes: ["volume_data:/data"]
  volume2:
    image: chrislusf/seaweedfs
    command: "volume -dir=/data2 -mserver=master:9333 -port=8081"
    ports: ["8081:8081", "18081:18081"]
    volumes: ["volume_data2:/data2"]
```
Replication Strategy: `defaultReplication=001` keeps one extra copy on a different server within the same rack, so every file ends up on two volume servers.
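Once the stack is up, you can confirm the replication setting took effect by asking the master for its topology over its HTTP API (the exact JSON shape varies by SeaweedFS version):

```shell
# Query the master's topology; volumes should report replica placement "001"
curl -s http://localhost:9333/dir/status
```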
#### 2. PostgreSQL Configuration
```yaml
  postgres:
    image: postgres:15-alpine  # Smaller footprint
    environment:
      POSTGRES_USER: seaweed
      POSTGRES_PASSWORD: seaweed
      POSTGRES_DB: seaweed
    healthcheck:  # Critical for startup order
      test: ["CMD-SHELL", "pg_isready -U seaweed"]
      timeout: 5s
      retries: 10  # More retries for slow systems
    volumes:
      - postgres_data:/var/lib/postgresql/data
```
#### 3. Filer Service (The Glue)
```yaml
  filer:
    image: chrislusf/seaweedfs
    depends_on:
      postgres:
        condition: service_healthy
      master:
        condition: service_started
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        apk add --no-cache postgresql-client &&
        /scripts/check_create_filemeta.sh &&
        weed filer -master=master:9333
    volumes:
      - "./check_create_filemeta.sh:/scripts/check_create_filemeta.sh"
      - "./filer.toml:/etc/seaweedfs/filer.toml"
```
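Once the filer is healthy it speaks plain HTTP on port 8888 (the default), so a quick sanity check is an upload plus a JSON directory listing. This assumes you also publish `8888:8888` on the filer service, which the snippet above does not yet do:

```shell
# Upload a file through the filer: metadata lands in Postgres, bytes on the volume servers
echo "hello" > hello.txt
curl -F "file=@hello.txt" http://localhost:8888/test/

# List the directory as JSON
curl -H "Accept: application/json" http://localhost:8888/test/
```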
## Critical Configuration Files 🔧

### Table Initialization Script
```sh
#!/bin/sh
# check_create_filemeta.sh
set -e  # Fail on any error

PGPASSWORD="seaweed" psql -h postgres -U seaweed -d seaweed -c "
CREATE TABLE IF NOT EXISTS filemeta (
  dirhash   BIGINT,
  name      VARCHAR(65535),
  directory VARCHAR(65535),
  meta      bytea,
  PRIMARY KEY (dirhash, name)
);" || exit 1

echo "Metadata table ready ✅"
```
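After the script runs you can confirm the table exists from the host (the container name here is the bare service name `postgres`; depending on your Compose setup it may carry a project prefix):

```shell
# Should list the filemeta table
docker exec -e PGPASSWORD=seaweed postgres \
  psql -U seaweed -d seaweed -c '\dt filemeta'
```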
### Filer Configuration

```toml
# filer.toml
[postgres]
enabled = true
hostname = "postgres"
port = 5432
username = "seaweed"
password = "seaweed"
database = "seaweed"
sslmode = "disable"  # Use "require" in production!

# Connection pooling
connection_max_idle = 50
connection_max_open = 100  # Adjust based on load
```
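One sizing note: keep `connection_max_open` below PostgreSQL's own `max_connections` (typically 100 by default), or the filer will hit connection errors under load. You can check the server-side ceiling directly (container name as above):

```shell
# Show Postgres' connection limit
docker exec -e PGPASSWORD=seaweed postgres \
  psql -U seaweed -d seaweed -c 'SHOW max_connections;'
```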
## Pro Tips from Production 💡

**1. Monitor These Metrics 📊**

- PostgreSQL: connection pool usage, dead tuples
- SeaweedFS: volume server disk IOPS
- Filer: gRPC request latency

**2. Backup Strategy 💾**

```sh
# PostgreSQL daily dump
docker exec postgres pg_dump -U seaweed -Fc seaweed > seaweed.dump

# SeaweedFS metadata backup
echo "fs.meta.save" | weed shell -master=master:9333
```

**3. Scale Out ⚖️**

- Add more volume servers
- Use PostgreSQL read replicas for metadata
- Implement Redis caching for hot metadata
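A backup is only as good as its restore path. A restore sketch for the custom-format dump produced above (stop the filer first so nothing writes during the restore; `seaweed.dump` matches the backup example):

```shell
# Restore the custom-format dump into the seaweed database
docker exec -i -e PGPASSWORD=seaweed postgres \
  pg_restore -U seaweed -d seaweed --clean < seaweed.dump
```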
## Let's Get Started! 🚀

```sh
# Spin up the stack (--wait needs Docker Compose v2)
docker compose up -d --wait

# Verify services
docker compose ps
```
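For an end-to-end smoke test, push a file through the filer and watch its metadata row appear in Postgres (assumes port 8888 is published on the filer and the Postgres container is reachable as `postgres`):

```shell
# 1. Upload through the filer
echo "smoke" > smoke.txt
curl -F "file=@smoke.txt" http://localhost:8888/demo/

# 2. The same file should now be a row in filemeta
docker exec -e PGPASSWORD=seaweed postgres \
  psql -U seaweed -d seaweed -c "SELECT directory, name FROM filemeta WHERE name = 'smoke.txt';"
```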
Challenge: Try adding Prometheus monitoring using the exposed metrics ports! 📈
## Discussion Time! 💬

- Would you prefer this over S3-like storage for your use case?
- What other metadata would you store in PostgreSQL?
- Found an optimization? Let's hear it!

Coffee Promise ☕: First person to spot a typo gets a virtual coffee!