<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Courtney Robinson</title>
    <description>The latest articles on DEV Community by Courtney Robinson (@zcourts).</description>
    <link>https://dev.to/zcourts</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F496171%2F882cfb9f-c8e3-4e24-a4e9-ef2baed82bc8.png</url>
      <title>DEV Community: Courtney Robinson</title>
      <link>https://dev.to/zcourts</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zcourts"/>
    <language>en</language>
    <item>
      <title>How We Built an AI‑Native Object Store (Tensor Streaming, Erasure Coding, QUIC, Rust)</title>
      <dc:creator>Courtney Robinson</dc:creator>
      <pubDate>Wed, 19 Nov 2025 12:14:09 +0000</pubDate>
      <link>https://dev.to/zcourts/how-we-built-an-ai-native-object-store-tensor-streaming-erasure-coding-quic-rust-28b8</link>
      <guid>https://dev.to/zcourts/how-we-built-an-ai-native-object-store-tensor-streaming-erasure-coding-quic-rust-28b8</guid>
      <description>&lt;p&gt;Over the past year my team and I have been building an AI product that needed to serve &lt;strong&gt;large LLM model files&lt;/strong&gt; reliably, quickly, and privately.&lt;/p&gt;

&lt;p&gt;We assumed the existing tooling would “just work”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Git LFS
&lt;/li&gt;
&lt;li&gt;Hugging Face repos
&lt;/li&gt;
&lt;li&gt;S3 / MinIO
&lt;/li&gt;
&lt;li&gt;generic object stores
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But once we started working with &lt;strong&gt;multi‑GB safetensors&lt;/strong&gt;, &lt;strong&gt;gguf&lt;/strong&gt;, &lt;strong&gt;ONNX&lt;/strong&gt;, and &lt;strong&gt;datasets&lt;/strong&gt;, everything broke in predictable and painful ways.&lt;/p&gt;

&lt;p&gt;This post explains the technical journey that led us to build &lt;strong&gt;Anvil&lt;/strong&gt; — an &lt;strong&gt;open‑source, S3‑compatible, AI‑native object store built in Rust&lt;/strong&gt; — and how we designed it around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tensor‑level streaming
&lt;/li&gt;
&lt;li&gt;Model‑aware indexing
&lt;/li&gt;
&lt;li&gt;QUIC transport
&lt;/li&gt;
&lt;li&gt;Erasure‑coded distributed storage
&lt;/li&gt;
&lt;li&gt;Simple Docker deployment
&lt;/li&gt;
&lt;li&gt;Multi‑region clustering
&lt;/li&gt;
&lt;li&gt;gRPC APIs + S3 compatibility
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And why we decided to &lt;strong&gt;open source the entire project (Apache‑2.0)&lt;/strong&gt;.&lt;/p&gt;




&lt;h1&gt;The Pain That Set This All In Motion&lt;/h1&gt;

&lt;h3&gt;Git LFS&lt;/h3&gt;

&lt;p&gt;Failed repeatedly on multi‑GB model files: corruption, slow diffs, weird retry loops.&lt;/p&gt;

&lt;h3&gt;Hugging Face&lt;/h3&gt;

&lt;p&gt;Amazing for public hosting — but for private/internal models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rate limits&lt;/li&gt;
&lt;li&gt;slow downloads&lt;/li&gt;
&lt;li&gt;no control over the infra&lt;/li&gt;
&lt;li&gt;not ideal for production workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;S3 / MinIO&lt;/h3&gt;

&lt;p&gt;Rock‑solid for normal object storage, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;treats model files as “dumb blobs”
&lt;/li&gt;
&lt;li&gt;no safetensor/gguf indexing
&lt;/li&gt;
&lt;li&gt;no tensor‑level streaming
&lt;/li&gt;
&lt;li&gt;full downloads required before inference can start
&lt;/li&gt;
&lt;li&gt;expensive when replication is used for durability
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Our own app’s needs&lt;/h3&gt;

&lt;p&gt;We have users on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;machines with 4–8GB VRAM
&lt;/li&gt;
&lt;li&gt;laptops needing local/offline inference
&lt;/li&gt;
&lt;li&gt;mobile‑adjacent devices
&lt;/li&gt;
&lt;li&gt;distributed clusters needing fast warm starts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We could not afford a 5–15GB full model download on every startup.&lt;br&gt;&lt;br&gt;
We needed inference to start &lt;strong&gt;instantly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s when we realized:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Object stores were never built for AI workloads.&lt;br&gt;&lt;br&gt;
We needed something model‑aware.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h1&gt;Enter Anvil — What We Ended Up Building&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/worka-ai/anvil" rel="noopener noreferrer"&gt;https://github.com/worka-ai/anvil&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://worka.ai/docs/anvil/getting-started" rel="noopener noreferrer"&gt;https://worka.ai/docs/anvil/getting-started&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Landing:&lt;/strong&gt; &lt;a href="https://worka.ai/anvil" rel="noopener noreferrer"&gt;https://worka.ai/anvil&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/worka-ai/anvil/releases/latest" rel="noopener noreferrer"&gt;https://github.com/worka-ai/anvil/releases/latest&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Anvil started as an internal hack.&lt;br&gt;&lt;br&gt;
It’s now a complete, distributed object store built for ML systems.&lt;/p&gt;

&lt;p&gt;At a high level, Anvil is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;fully S3-compatible&lt;/strong&gt; (see the boto3 sketch after this list)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fully gRPC-native&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;simple (Docker-first) to run&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;built in Rust&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;open-source (Apache‑2.0)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;model-aware&lt;/strong&gt; (safetensors, gguf, onnx)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;supports tensor-streaming for partial inference loads&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;supports erasure coding (Ceph-style)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;clusterable (libp2p gossip + QUIC)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;multi-region&lt;/strong&gt; with isolated metadata
&lt;/li&gt;
&lt;/ul&gt;
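
&lt;p&gt;Because Anvil speaks the S3 API, existing tooling can point at it unchanged. Here is a minimal boto3 sketch; the endpoint and credentials are placeholder assumptions borrowed from the Docker quickstart later in this post, not guaranteed defaults:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch: any standard S3 client should work against Anvil.
# The endpoint and credentials below are placeholder assumptions.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="ANVIL_KEY",          # placeholder
    aws_secret_access_key="ANVIL_SECRET",   # placeholder
)

s3.create_bucket(Bucket="models")
s3.upload_file("llama3.safetensors", "models", "llama3.safetensors")
print([o["Key"] for o in s3.list_objects_v2(Bucket="models")["Contents"]])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;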

&lt;p&gt;Let’s dive into the internals.&lt;/p&gt;


&lt;h1&gt;Model‑Aware Indexing (safetensors / gguf / onnx)&lt;/h1&gt;

&lt;p&gt;This is one of the core innovations.&lt;/p&gt;

&lt;p&gt;When a model file is uploaded, Anvil automatically indexes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tensor names
&lt;/li&gt;
&lt;li&gt;byte offsets
&lt;/li&gt;
&lt;li&gt;dtypes
&lt;/li&gt;
&lt;li&gt;shapes
&lt;/li&gt;
&lt;li&gt;file segments
&lt;/li&gt;
&lt;li&gt;metadata
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows the client to do:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anvilml&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;

&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://models/llama3.safetensors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;q_proj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;layers.12.attn.q_proj.weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No full download.&lt;br&gt;&lt;br&gt;
No giant memory spike.&lt;br&gt;&lt;br&gt;
Just one tensor.&lt;/p&gt;
&lt;h3&gt;Why this matters&lt;/h3&gt;

&lt;p&gt;It enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;partial inference&lt;/strong&gt; on underpowered devices
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;instant warm starts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cold start reduction by ~12×&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;efficient multi‑variant fine‑tune workflows&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;Tensor‑Level Streaming Over QUIC&lt;/h1&gt;

&lt;p&gt;Instead of downloading the entire model file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the tensor index
&lt;/li&gt;
&lt;li&gt;Open a QUIC stream
&lt;/li&gt;
&lt;li&gt;Fetch only the byte ranges needed (sketched after this list)
&lt;/li&gt;
&lt;li&gt;Feed directly into the ML framework
&lt;/li&gt;
&lt;/ul&gt;
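
&lt;p&gt;To make the byte-range step concrete, here is a minimal sketch (not Anvil’s actual code) that parses a safetensors header to locate one tensor, with plain S3 ranged GETs standing in for the QUIC stream:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: boto3 ranged GETs stand in for Anvil's QUIC stream,
# and Anvil builds this index server-side at upload time.
import json

import boto3

s3 = boto3.client("s3", endpoint_url="http://localhost:9000")
bucket, key = "models", "llama3.safetensors"

def ranged(start, end):
    # Fetch bytes [start, end] inclusive via an HTTP range request.
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

# safetensors layout: an 8-byte little-endian header length, then a JSON
# header mapping tensor names to dtype, shape and data_offsets.
header_len = int.from_bytes(ranged(0, 7), "little")
header = json.loads(ranged(8, 8 + header_len - 1))

info = header["layers.12.attn.q_proj.weight"]
begin, end = info["data_offsets"]   # offsets relative to the data section
data_start = 8 + header_len         # data begins right after the header
tensor_bytes = ranged(data_start + begin, data_start + end - 1)

print(info["dtype"], info["shape"], len(tensor_bytes), "bytes")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;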

&lt;p&gt;This results in:&lt;/p&gt;
&lt;h3&gt;🟢 Cold Start&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;37.1s → 2.9s&lt;/strong&gt; on a real 3B‑parameter model.&lt;/p&gt;
&lt;h3&gt;🟢 Data transferred&lt;/h3&gt;

&lt;p&gt;6.3GB → &lt;strong&gt;128MB&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;🟢 CPU and memory usage far lower&lt;/h3&gt;

&lt;p&gt;QUIC gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stream multiplexing without head-of-line blocking
&lt;/li&gt;
&lt;li&gt;congestion control
&lt;/li&gt;
&lt;li&gt;lower connection-setup latency
&lt;/li&gt;
&lt;li&gt;less TLS handshake overhead than HTTP/2 over TCP
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And QUIC is increasingly a go-to transport for high-performance workloads like this.&lt;/p&gt;


&lt;h1&gt;Erasure Coding for AI‑Sized Objects&lt;/h1&gt;

&lt;p&gt;Traditional replication is &lt;strong&gt;expensive&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100GB model
&lt;/li&gt;
&lt;li&gt;3× replication
&lt;/li&gt;
&lt;li&gt;→ 300GB storage required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Erasure coding (as used in Ceph) gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100GB
&lt;/li&gt;
&lt;li&gt;+ parity shards
&lt;/li&gt;
&lt;li&gt;→ &lt;strong&gt;~150GB&lt;/strong&gt; for the same durability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anvil uses &lt;strong&gt;Reed‑Solomon&lt;/strong&gt; encoding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;configurable data/parity shard counts
&lt;/li&gt;
&lt;li&gt;missing shards rebuilt on the fly
&lt;/li&gt;
&lt;li&gt;shards placed across the cluster automatically
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a life‑saver for multi‑GB models and datasets.&lt;/p&gt;
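<br/>
&lt;p&gt;The arithmetic, as a quick back-of-the-envelope check (the 4+2 shard split below is an illustrative assumption; Anvil’s shard counts are configurable):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-the-envelope: 3x replication vs. Reed-Solomon erasure coding.
# The (k, m) split is an illustrative assumption, not an Anvil default.
size_gb = 100
replicas = 3
k, m = 4, 2  # k data shards plus m parity shards; survives any m shard losses

replicated = size_gb * replicas        # 300 GB on disk
erasure_coded = size_gb * (k + m) / k  # 150 GB on disk

print(f"3x replication: {replicated} GB, tolerates {replicas - 1} lost copies")
print(f"RS({k},{m}): {erasure_coded:.0f} GB, tolerates {m} lost shards")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;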


&lt;h1&gt;Multi‑Region Clustering (Gossip + Postgres)&lt;/h1&gt;

&lt;p&gt;We adopted a split‑metadata pattern:&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Global Postgres&lt;/strong&gt;&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;tenant metadata
&lt;/li&gt;
&lt;li&gt;bucket metadata
&lt;/li&gt;
&lt;li&gt;auth
&lt;/li&gt;
&lt;li&gt;region definitions
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Regional Postgres (one per region)&lt;/strong&gt;&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;object metadata
&lt;/li&gt;
&lt;li&gt;tensor index
&lt;/li&gt;
&lt;li&gt;block maps
&lt;/li&gt;
&lt;li&gt;journalling
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Node Discovery via libp2p&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Nodes gossip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;liveness
&lt;/li&gt;
&lt;li&gt;region membership
&lt;/li&gt;
&lt;li&gt;shard ownership
&lt;/li&gt;
&lt;li&gt;cluster size
&lt;/li&gt;
&lt;li&gt;bootstrap points
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zero-configuration cluster growth:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;anvil &lt;span class="nt"&gt;--bootstrap&lt;/span&gt; /dns/anvil1/tcp/7443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;Code: Upload + Stream a Tensor&lt;/h1&gt;

&lt;h3&gt;Upload a model file&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws &lt;span class="nt"&gt;--endpoint-url&lt;/span&gt; http://localhost:9000 s3 &lt;span class="nb"&gt;cp &lt;/span&gt;llama3.safetensors s3://models/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Stream a tensor&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;anvilml&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;

&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://models/llama3.safetensors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;layers.8.attn.q_proj.weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Deploy locally&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;Built for Local + Hybrid&lt;/h1&gt;

&lt;p&gt;We wanted something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runs offline
&lt;/li&gt;
&lt;li&gt;runs on laptops
&lt;/li&gt;
&lt;li&gt;runs on home labs
&lt;/li&gt;
&lt;li&gt;runs across small teams
&lt;/li&gt;
&lt;li&gt;runs in production clusters
&lt;/li&gt;
&lt;li&gt;doesn’t require k8s or cloud lock‑in
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So Anvil is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;single binary
&lt;/li&gt;
&lt;li&gt;Docker-first
&lt;/li&gt;
&lt;li&gt;multi-region optional
&lt;/li&gt;
&lt;li&gt;no external services besides Postgres
&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;Why Open Source?&lt;/h1&gt;

&lt;p&gt;Because object storage is &lt;strong&gt;infrastructure&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
People need to trust it.&lt;br&gt;&lt;br&gt;
Teams need to inspect and extend it.&lt;br&gt;&lt;br&gt;
Researchers need to experiment with it.&lt;br&gt;&lt;br&gt;
ML engineers need to run it offline.&lt;/p&gt;

&lt;p&gt;We’re releasing Anvil under &lt;strong&gt;Apache‑2.0&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;full source
&lt;/li&gt;
&lt;li&gt;production-ready release
&lt;/li&gt;
&lt;li&gt;detailed docs
&lt;/li&gt;
&lt;li&gt;Python SDK
&lt;/li&gt;
&lt;li&gt;S3 API
&lt;/li&gt;
&lt;li&gt;examples and tutorials
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to run models locally, self-host private AI workloads, or build infra around LLMs — we hope Anvil is useful.&lt;/p&gt;




&lt;h1&gt;Links&lt;/h1&gt;

&lt;h3&gt;GitHub&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/worka-ai/anvil" rel="noopener noreferrer"&gt;https://github.com/worka-ai/anvil&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Docs&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://worka.ai/docs/anvil/getting-started" rel="noopener noreferrer"&gt;https://worka.ai/docs/anvil/getting-started&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Landing&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://worka.ai/anvil" rel="noopener noreferrer"&gt;https://worka.ai/anvil&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Release&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/worka-ai/anvil/releases/latest" rel="noopener noreferrer"&gt;https://github.com/worka-ai/anvil/releases/latest&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If you have thoughts, critiques, architectural ideas, or want to break Anvil — we’d genuinely love feedback.&lt;br&gt;&lt;br&gt;
This is just the beginning.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>mlops</category>
      <category>s3</category>
    </item>
  </channel>
</rss>
