SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes. It implements an object store with O(1) disk seek and transparent cloud integration, handling billions of files efficiently.
What Is SeaweedFS?
SeaweedFS started as a distributed file system inspired by Facebook's Haystack paper. It has since evolved into a full-featured distributed storage system with S3 API compatibility, FUSE mounting, Hadoop integration, and WebDAV support.
Key Features:
- O(1) disk seek for file access
- S3 API compatible
- FUSE mount support
- Automatic data replication
- Erasure coding for storage efficiency
- Built-in tiering to cloud storage
- WebDAV, HDFS support
Quick Start
# Install SeaweedFS
wget https://github.com/seaweedfs/seaweedfs/releases/download/3.71/linux_amd64.tar.gz
tar xzf linux_amd64.tar.gz
# Start master server
./weed master -mdir=/tmp/mdata -port=9333 &
# Start volume server
./weed volume -dir=/tmp/vdata -max=5 -mserver=localhost:9333 -port=8080 &
# Start filer (optional, for directory structure)
./weed filer -master=localhost:9333 -port=8888 &
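Since the servers above are started in the background, a script that talks to them right away can race their startup. A minimal readiness check, assuming the default master port from the quick start and the requests library (the helper name and the use of the master's /cluster/status endpoint are my own choices, not part of any SeaweedFS client):

```python
import time

import requests


def wait_for_master(url: str, timeout: float = 10.0) -> bool:
    """Poll the master's status endpoint until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Any 2xx answer means the master's HTTP API is up.
            if requests.get(f"{url}/cluster/status", timeout=1).ok:
                return True
        except requests.RequestException:
            time.sleep(0.5)
    return False


if __name__ == "__main__":
    print("master ready:", wait_for_master("http://localhost:9333"))
```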
SeaweedFS API: Upload and Retrieve Files
import requests
MASTER = "http://localhost:9333"
FILER = "http://localhost:8888"
# Upload via master (volume-level)
# Step 1: Get a file ID
assign = requests.get(f"{MASTER}/dir/assign").json()
fid = assign["fid"]
url = assign["url"]
print(f"Assigned: fid={fid}, url={url}")
# Step 2: Upload the file
with open("photo.jpg", "rb") as f:
    response = requests.post(
        f"http://{url}/{fid}",
        files={"file": f}
    )
print(f"Uploaded: {response.json()}")
# Step 3: Read it back
data = requests.get(f"http://{url}/{fid}")
with open("downloaded.jpg", "wb") as f:
    f.write(data.content)
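The fid returned by /dir/assign encodes a volume id and a file key separated by a comma (e.g. "3,01637037d6"): the part before the comma tells the cluster which volume holds the file, and the part after is the hex-encoded file key plus cookie. A small helper to split it for logging or routing (the function name is my own, not part of SeaweedFS):

```python
def parse_fid(fid: str) -> tuple[int, str]:
    """Split a SeaweedFS file id into (volume_id, file_key_and_cookie).

    The part before the comma is the numeric volume id; the part after
    is the hex-encoded file key plus cookie.
    """
    volume, _, key = fid.partition(",")
    if not volume.isdigit() or not key:
        raise ValueError(f"not a valid fid: {fid!r}")
    return int(volume), key


volume_id, key = parse_fid("3,01637037d6")
print(volume_id, key)  # 3 01637037d6
```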
S3 API Compatibility
import boto3
# Connect to the SeaweedFS S3 gateway
# (start it first with: ./weed s3 -filer=localhost:8888 -port=8333)
s3 = boto3.client(
"s3",
endpoint_url="http://localhost:8333",
aws_access_key_id="any",
aws_secret_access_key="any"
)
# Create bucket
s3.create_bucket(Bucket="my-data")
# Upload file
s3.upload_file("report.pdf", "my-data", "reports/2026/q1.pdf")
# List objects
objects = s3.list_objects_v2(Bucket="my-data", Prefix="reports/")
for obj in objects.get("Contents", []):
    print(f"{obj['Key']}: {obj['Size']} bytes")
Filer API: Directory-Based Access
# Upload via filer (preserves directory structure)
curl -F "file=@data.csv" http://localhost:8888/datasets/2026/
# List directory
curl http://localhost:8888/datasets/2026/?pretty=y
# Download
curl http://localhost:8888/datasets/2026/data.csv -o local.csv
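Because filer paths map directly onto URL paths, file names containing spaces or other special characters must be percent-encoded while the "/" directory separators are left alone. A small URL-building helper, assuming the default filer address from above (the function name is mine, not a SeaweedFS API):

```python
from urllib.parse import quote


def filer_url(base: str, path: str) -> str:
    """Build a filer URL for a file path, percent-encoding special characters.

    safe="/" keeps the directory separators intact while encoding
    spaces and other reserved characters within each segment.
    """
    encoded = quote(path.lstrip("/"), safe="/")
    return f"{base.rstrip('/')}/{encoded}"


print(filer_url("http://localhost:8888", "/datasets/2026/q1 report.csv"))
# http://localhost:8888/datasets/2026/q1%20report.csv
```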
SeaweedFS vs MinIO
| Feature | SeaweedFS | MinIO |
|---|---|---|
| File access | O(1) disk seek | Standard |
| Small files | Optimized | Standard |
| Cloud tiering | Built-in | Enterprise |
| FUSE mount | Yes | No |
| Erasure coding | Yes | Yes |
| S3 compatible | Yes | Yes |
Resources
- SeaweedFS GitHub — 24K+ stars
- SeaweedFS Wiki