Yi Wan

When Cloudflare D1's 2MB Limit Taught Me a Hard Lesson About Database Design

As a developer working on ZinFlow, a web-to-EPUB converter app, I recently learned an expensive lesson about database constraints and proper architecture design. Let me share this painful but educational experience.

The Problem: Storing HTML Content in D1

My backend service extracts main content from web pages and processes it through what I call the "Smart Distillation Engine." Initially, I thought it would be convenient to store the extracted HTML content directly in Cloudflare's D1 database as a text field. After all, most articles aren't that large, right?

Wrong.

What I had overlooked was this crucial limit in D1's documentation:

Maximum string, BLOB or table row size: 2,000,000 bytes (2 MB)
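
That limit is measured in bytes, not characters, so the honest way to check an article against it is to look at its encoded size. Here is a minimal sketch of the check I should have been running from day one (the helper name is my own, not part of D1's API):

// D1's documented ceiling for a single string, BLOB, or table row
const D1_MAX_ROW_BYTES = 2_000_000;

// The limit is counted in UTF-8 bytes, so encode before comparing --
// String.prototype.length counts characters and can under-report badly
function exceedsD1RowLimit(html) {
  return new TextEncoder().encode(html).length > D1_MAX_ROW_BYTES;
}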

The Disaster Strikes

Everything worked fine during development and early testing. Then users started saving longer articles, academic papers, and detailed tutorials. Suddenly, I started receiving error reports about failed saves.

The error was clear but devastating: Database row size exceeded maximum allowed size.

Users were losing their carefully curated content, and I was losing sleep trying to figure out why some articles worked while others failed spectacularly.

The Root Cause Analysis

After digging deeper, I realized several fundamental issues with my approach:

  1. HTML content is unpredictable in size - A single long-form article can easily exceed 2MB when including embedded styles and metadata
  2. Database bloat - Storing large text content in database rows is generally poor practice
  3. Backup and migration nightmares - Large rows make database operations slower and more error-prone
  4. Scalability issues - As the service grows, this design would become increasingly problematic

The Solution: Hybrid Storage Architecture

I've now refactored the backend to use a hybrid approach:

// Simplified sketch of the new write path. Buffer isn't available in Workers
// by default, so sizes are measured with TextEncoder; the env.DB and
// env.CONTENT_BUCKET binding names are illustrative and depend on your wrangler config.
const SIZE_THRESHOLD = 1024 * 1024; // 1MB, a safe buffer below D1's 2MB row limit

async function storeArticleContent(env, articleData) {
  // Measure the UTF-8 byte size of the HTML, not its character count
  const contentSize = new TextEncoder().encode(articleData.html).length;

  if (contentSize > SIZE_THRESHOLD) {
    // Store large content in R2
    const r2Key = `articles/${articleData.id}/content.html`;
    await env.CONTENT_BUCKET.put(r2Key, articleData.html, {
      httpMetadata: { contentType: 'text/html; charset=utf-8' },
    });

    // Store metadata and the R2 reference in D1
    await env.DB.prepare(`
      INSERT INTO articles (id, title, url, content_location, storage_type, created_at)
      VALUES (?, ?, ?, ?, 'r2', ?)
    `).bind(
      articleData.id,
      articleData.title,
      articleData.url,
      r2Key,
      new Date().toISOString()
    ).run();
  } else {
    // Store small content directly in D1
    await env.DB.prepare(`
      INSERT INTO articles (id, title, url, content, storage_type, created_at)
      VALUES (?, ?, ?, ?, 'database', ?)
    `).bind(
      articleData.id,
      articleData.title,
      articleData.url,
      articleData.html,
      new Date().toISOString()
    ).run();
  }
}
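
For completeness, here is roughly what the matching read path looks like. It's a sketch under the same assumptions (illustrative env.DB and env.CONTENT_BUCKET bindings, and the articles table used in the inserts above):

// Sketch of the read path: D1 tells us where the content lives
async function loadArticleContent(env, articleId) {
  const row = await env.DB.prepare(`
    SELECT content, content_location, storage_type FROM articles WHERE id = ?
  `).bind(articleId).first();

  if (!row) return null;

  if (row.storage_type === 'r2') {
    // Large content lives in R2; D1 only holds the object key
    const object = await env.CONTENT_BUCKET.get(row.content_location);
    return object ? await object.text() : null;
  }

  // Small content was stored inline in D1
  return row.content;
}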

Lessons Learned

  1. Read the documentation thoroughly - Those limits exist for good reasons
  2. Separate content from metadata - Large content should live in object storage, not relational databases (see the schema sketch after this list)
  3. Plan for edge cases - Always consider the largest possible data your system might handle
  4. Test with realistic data - My test articles were all small blog posts, not 50-page academic papers
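
In practice, separating content from metadata just means the D1 table stores a pointer for anything large. Here is a minimal sketch of the articles schema the code above assumes; treat the exact columns and constraints as illustrative rather than my production schema:

// Schema for the metadata table, run once as a migration
// (for example via `wrangler d1 execute`); shown here as a JS string
const CREATE_ARTICLES_TABLE = `
  CREATE TABLE IF NOT EXISTS articles (
    id               TEXT PRIMARY KEY,
    title            TEXT NOT NULL,
    url              TEXT NOT NULL,
    content          TEXT,     -- inline HTML, only for small articles
    content_location TEXT,     -- R2 object key, only for large articles
    storage_type     TEXT NOT NULL CHECK (storage_type IN ('database', 'r2')),
    created_at       TEXT NOT NULL
  )
`;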

The Silver Lining

This painful experience led to a much better architecture. The hybrid approach gives me:

  • Reliability - No more failed saves due to size limits
  • Performance - Faster database queries with smaller rows
  • Scalability - R2 can handle files of virtually any size
  • Cost efficiency - Object storage is cheaper for large content

For Fellow Developers

If you're building content-heavy applications with Cloudflare's stack:

  • Use D1 for metadata, relationships, and small structured data
  • Use R2 for large content, files, and binary data
  • Set reasonable thresholds (I chose 1MB as a safe buffer below D1's 2MB limit)
  • Always validate input sizes before database operations (a rough pre-flight check is sketched below)
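
On that last point, here is the kind of pre-flight check I now run before any D1 write. It's a minimal sketch with a made-up helper name and an arbitrary safety margin; the 2,000,000-byte figure is the documented row limit:

// Documented D1 ceiling for a single row
const D1_MAX_ROW_BYTES = 2_000_000;

// Rough estimate of a row's size from its string columns; fail fast with a
// clear error instead of letting the INSERT blow up inside D1
function assertRowFits(columns, margin = 64 * 1024) {
  const encoder = new TextEncoder();
  const estimated = Object.values(columns)
    .filter((value) => typeof value === 'string')
    .reduce((sum, value) => sum + encoder.encode(value).length, 0);

  if (estimated + margin > D1_MAX_ROW_BYTES) {
    throw new Error(
      `Row would be ~${estimated} bytes, too close to D1's ${D1_MAX_ROW_BYTES}-byte limit`
    );
  }
}

Anything that trips a check like this belongs in R2, not in a D1 row.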

Sometimes the most valuable lessons come from the most frustrating bugs. This experience reminded me why proper system architecture matters, especially when building services that users depend on for their important content.

Have you run into similar storage limitations? I'd love to hear about your experiences and solutions in the comments!


Building ZinFlow has been a journey of learning about web content extraction, EPUB generation, and now proper cloud architecture. If you're interested in converting web articles to e-books, you can check out the app at wanyi.dev/en/zinflow.
