SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes. It implements an object store with O(1) disk seek and transparent cloud integration, handling billions of files efficiently.
What Is SeaweedFS?
SeaweedFS started as a distributed file system inspired by Facebook's Haystack paper. It has since evolved into a full-featured distributed storage system with S3 API compatibility, FUSE mounting, Hadoop integration, and WebDAV support.
Key Features:
- O(1) disk seek for file access
- S3 API compatible
- FUSE mount support
- Automatic data replication
- Erasure coding for storage efficiency
- Built-in tiering to cloud storage
- WebDAV, HDFS support
Quick Start
# Install SeaweedFS
wget https://github.com/seaweedfs/seaweedfs/releases/download/3.71/linux_amd64.tar.gz
tar xzf linux_amd64.tar.gz
# Start master server
./weed master -mdir=/tmp/mdata -port=9333 &
# Start volume server
./weed volume -dir=/tmp/vdata -max=5 -mserver=localhost:9333 -port=8080 &
# Start filer (optional, for directory structure)
./weed filer -master=localhost:9333 -port=8888 &
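Since the servers above are started in the background, a script that talks to them right away can race their startup. A minimal readiness check, assuming the default master port from the quick start and the requests library (the helper name and the use of the master's /cluster/status endpoint are my own choices, not part of any SeaweedFS client):

```python
import time

import requests


def wait_for_master(url: str, timeout: float = 10.0) -> bool:
    """Poll the master's status endpoint until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Any 2xx answer means the master's HTTP API is up.
            if requests.get(f"{url}/cluster/status", timeout=1).ok:
                return True
        except requests.RequestException:
            time.sleep(0.5)
    return False


if __name__ == "__main__":
    print("master ready:", wait_for_master("http://localhost:9333"))
```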
SeaweedFS API: Upload and Retrieve Files
import requests
MASTER = "http://localhost:9333"
FILER = "http://localhost:8888"
# Upload via master (volume-level)
# Step 1: Get a file ID
assign = requests.get(f"{MASTER}/dir/assign").json()
fid = assign["fid"]
url = assign["url"]
print(f"Assigned: fid={fid}, url={url}")
# Step 2: Upload the file
with open("photo.jpg", "rb") as f:
    response = requests.post(
        f"http://{url}/{fid}",
        files={"file": f}
    )
print(f"Uploaded: {response.json()}")
# Step 3: Read it back
data = requests.get(f"http://{url}/{fid}")
with open("downloaded.jpg", "wb") as f:
    f.write(data.content)
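The fid returned by /dir/assign encodes a volume id and a file key separated by a comma (e.g. "3,01637037d6"): the part before the comma tells the cluster which volume holds the file, and the part after is the hex-encoded file key plus cookie. A small helper to split it for logging or routing (the function name is my own, not part of SeaweedFS):

```python
def parse_fid(fid: str) -> tuple[int, str]:
    """Split a SeaweedFS file id into (volume_id, file_key_and_cookie).

    The part before the comma is the numeric volume id; the part after
    is the hex-encoded file key plus cookie.
    """
    volume, _, key = fid.partition(",")
    if not volume.isdigit() or not key:
        raise ValueError(f"not a valid fid: {fid!r}")
    return int(volume), key


volume_id, key = parse_fid("3,01637037d6")
print(volume_id, key)  # 3 01637037d6
```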
S3 API Compatibility
import boto3
# Connect to the SeaweedFS S3 gateway
# (start it first with: ./weed s3 -filer=localhost:8888 -port=8333)
s3 = boto3.client(
"s3",
endpoint_url="http://localhost:8333",
aws_access_key_id="any",
aws_secret_access_key="any"
)
# Create bucket
s3.create_bucket(Bucket="my-data")
# Upload file
s3.upload_file("report.pdf", "my-data", "reports/2026/q1.pdf")
# List objects
objects = s3.list_objects_v2(Bucket="my-data", Prefix="reports/")
for obj in objects.get("Contents", []):
    print(f"{obj['Key']}: {obj['Size']} bytes")
Filer API: Directory-Based Access
# Upload via filer (preserves directory structure)
curl -F "file=@data.csv" http://localhost:8888/datasets/2026/
# List directory
curl http://localhost:8888/datasets/2026/?pretty=y
# Download
curl http://localhost:8888/datasets/2026/data.csv -o local.csv
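Because filer paths map directly onto URL paths, file names containing spaces or other special characters must be percent-encoded while the "/" directory separators are left alone. A small URL-building helper, assuming the default filer address from above (the function name is mine, not a SeaweedFS API):

```python
from urllib.parse import quote


def filer_url(base: str, path: str) -> str:
    """Build a filer URL for a file path, percent-encoding special characters.

    safe="/" keeps the directory separators intact while encoding
    spaces and other reserved characters within each segment.
    """
    encoded = quote(path.lstrip("/"), safe="/")
    return f"{base.rstrip('/')}/{encoded}"


print(filer_url("http://localhost:8888", "/datasets/2026/q1 report.csv"))
# http://localhost:8888/datasets/2026/q1%20report.csv
```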
SeaweedFS vs MinIO
| Feature | SeaweedFS | MinIO |
|---|---|---|
| File access | O(1) disk seek | Standard |
| Small files | Optimized | Standard |
| Cloud tiering | Built-in | Enterprise |
| FUSE mount | Yes | No |
| Erasure coding | Yes | Yes |
| S3 compatible | Yes | Yes |
Resources
- SeaweedFS GitHub — 24K+ stars
- SeaweedFS Wiki