Alex Spinov
SeaweedFS Has a Free API: Distributed Object Storage for Billions of Files

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes. It implements an object store with O(1) disk seeks and transparent cloud integration, and it handles billions of files efficiently.

What Is SeaweedFS?

SeaweedFS started as a distributed file system inspired by Facebook's Haystack paper. It has since evolved into a full-featured distributed storage system with S3 API compatibility, FUSE mounting, Hadoop integration, and WebDAV support.

Key Features:

  • O(1) disk seek for file access
  • S3 API compatible
  • FUSE mount support
  • Automatic data replication
  • Erasure coding for storage efficiency
  • Built-in tiering to cloud storage
  • WebDAV, HDFS support
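
Replication deserves a note: SeaweedFS expresses placement as a three-digit string (for example 001 or 200), where the digits give the number of extra copies on other data centers, other racks, and other servers respectively. A minimal Python sketch of how to read such a string (the helper name is my own, not part of any SeaweedFS client):

```python
def decode_replication(placement: str) -> dict:
    """Decode a SeaweedFS replication string such as "001" or "210".

    Digit 1: extra copies in other data centers.
    Digit 2: extra copies on other racks within the same data center.
    Digit 3: extra copies on other servers within the same rack.
    """
    if len(placement) != 3 or not placement.isdigit():
        raise ValueError(f"invalid replication string: {placement!r}")
    dc, rack, server = (int(d) for d in placement)
    return {
        "other_datacenters": dc,
        "other_racks": rack,
        "other_servers": server,
        "total_copies": 1 + dc + rack + server,
    }

print(decode_replication("001"))  # one extra copy on another server, same rack
print(decode_replication("200"))  # two extra copies, each in a different data center
```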

Quick Start

# Install SeaweedFS
wget https://github.com/seaweedfs/seaweedfs/releases/download/3.71/linux_amd64.tar.gz
tar xzf linux_amd64.tar.gz

# Start master server
./weed master -mdir=/tmp/mdata -port=9333 &

# Start volume server
./weed volume -dir=/tmp/vdata -max=5 -mserver=localhost:9333 -port=8080 &

# Start filer (optional, for directory structure)
./weed filer -master=localhost:9333 -port=8888 &

SeaweedFS API: Upload and Retrieve Files

import requests

MASTER = "http://localhost:9333"
FILER = "http://localhost:8888"

# Upload via master (volume-level)
# Step 1: Get a file ID
assign = requests.get(f"{MASTER}/dir/assign").json()
fid = assign["fid"]
url = assign["url"]
print(f"Assigned: fid={fid}, url={url}")

# Step 2: Upload the file
with open("photo.jpg", "rb") as f:
    response = requests.post(
        f"http://{url}/{fid}",
        files={"file": f}
    )
print(f"Uploaded: {response.json()}")

# Step 3: Read it back
data = requests.get(f"http://{url}/{fid}")
with open("downloaded.jpg", "wb") as f:
    f.write(data.content)
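The fid returned by /dir/assign packs the volume ID and the file key into one string separated by a comma (for example 3,01637037d6). A small sketch of splitting it and rebuilding the public read URL served by the volume server (the helper names are mine, not part of SeaweedFS):

```python
def parse_fid(fid: str) -> tuple[int, str]:
    """Split a SeaweedFS file ID like "3,01637037d6" into
    (volume_id, file_key): the part before the comma addresses
    the volume, the rest identifies the file within it."""
    volume, _, key = fid.partition(",")
    if not key:
        raise ValueError(f"malformed fid: {fid!r}")
    return int(volume), key

def public_url(volume_host: str, fid: str) -> str:
    """Build the direct read URL served by a volume server."""
    return f"http://{volume_host}/{fid}"

vol, key = parse_fid("3,01637037d6")
print(vol, key)  # 3 01637037d6
print(public_url("localhost:8080", "3,01637037d6"))
# http://localhost:8080/3,01637037d6
```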

S3 API Compatibility

import boto3

# Connect to SeaweedFS S3 gateway
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8333",
    aws_access_key_id="any",
    aws_secret_access_key="any"
)

# Create bucket
s3.create_bucket(Bucket="my-data")

# Upload file
s3.upload_file("report.pdf", "my-data", "reports/2026/q1.pdf")

# List objects
objects = s3.list_objects_v2(Bucket="my-data", Prefix="reports/")
for obj in objects.get("Contents", []):
    print(f"{obj['Key']}: {obj['Size']} bytes")
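Under the hood, the S3 gateway stores buckets beneath the filer's /buckets directory, so the same object is also reachable through the filer API. A sketch of that path mapping (assuming the default /buckets prefix; verify against your filer configuration):

```python
from urllib.parse import quote

def filer_path_for_s3(bucket: str, key: str, prefix: str = "/buckets") -> str:
    """Map an S3 bucket/key pair to the filer path where the
    gateway stores it; slashes in the key stay path separators."""
    return f"{prefix}/{bucket}/" + quote(key)

print(filer_path_for_s3("my-data", "reports/2026/q1.pdf"))
# /buckets/my-data/reports/2026/q1.pdf
```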

Filer API: Directory-Based Access

# Upload via filer (preserves directory structure)
curl -F "file=@data.csv" http://localhost:8888/datasets/2026/

# List directory
curl http://localhost:8888/datasets/2026/?pretty=y

# Download
curl http://localhost:8888/datasets/2026/data.csv -o local.csv
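Because filer access is purely path-based, building URLs correctly matters once file names contain spaces or non-ASCII characters. A small offline-safe helper for that (my own sketch; the actual requests still need a filer running at localhost:8888):

```python
from urllib.parse import quote

def filer_url(base: str, directory: str, filename: str = "") -> str:
    """Build a filer URL, percent-encoding each path segment so
    names with spaces or unicode survive; slashes between segments
    are kept as separators."""
    path = "/".join(quote(seg) for seg in directory.strip("/").split("/"))
    url = f"{base.rstrip('/')}/{path}/"
    return url + quote(filename) if filename else url

print(filer_url("http://localhost:8888", "datasets/2026", "data.csv"))
# http://localhost:8888/datasets/2026/data.csv
print(filer_url("http://localhost:8888", "datasets/2026", "my report.csv"))
# http://localhost:8888/datasets/2026/my%20report.csv
```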

SeaweedFS vs MinIO

Feature          SeaweedFS        MinIO
File access      O(1) disk seek   Standard
Small files      Optimized        Standard
Cloud tiering    Built-in         Enterprise
FUSE mount       Yes              No
Erasure coding   Yes              Yes
S3 compatible    Yes              Yes

Need to scrape web data at scale? Check out my web scraping tools on Apify — production-ready actors for Reddit, Google Maps, and more. Questions? Email me at spinov001@gmail.com
