DEV Community

FlipRead

How I Built a Realistic Page-Flip Engine in the Browser — and Wired It to an AI API

A solo developer's technical deep-dive into HTML5 Canvas rendering, multi-format document parsing, and AI storybook generation — the full stack behind FlipFlow, a free online flipbook maker.


TL;DR

I built FlipFlow — a web app that converts PDF, PPT, Word, and images into interactive flipbooks with realistic page-turn animations. This post covers the three hardest technical problems I solved along the way:

  1. Page-flip physics in HTML5 Canvas that actually feel real
  2. Multi-format document pipeline (PDF, PPT, DOCX, and images, all normalized to one unified page sequence)
  3. AI generation via API to create illustrated flipbooks from a text prompt

If you're building anything involving document rendering, canvas animation, or AI-generated content pipelines, there's something here for you.

🔗 Live demo: flippingbooks.org/share/d8ca4875-919b-4d4d-ada9-1b08699006e8


The Problem I Was Actually Solving

Every week, millions of people export beautiful documents — product catalogs, brand lookbooks, digital magazines, course materials — as flat PDFs.

PDFs are read-only, static, and dead on arrival as a sharing format.

I wanted to build a free online flipbook maker that could:

  • Accept any common document format
  • Render it as a real, page-turning interactive publication
  • Add an AI layer so users could generate flipbooks without uploading anything

What I didn't expect was how hard the rendering layer would be.


Part 1: Building the Page-Flip Physics Engine

The core interaction — a page that folds, curves, and turns with realistic shadow and perspective — is harder than it looks.

The Math Behind the Curl

A page flip isn't a simple CSS transform. A real page curves as it turns. The visual deformation is a Bézier curve projection applied to a rectangular canvas layer.

Here's the simplified version of the core geometry:

function getFlipProgress(mouseX, bookCenterX, pageWidth) {
  // Normalize mouse position to -1 (left edge) → +1 (right edge)
  return Math.max(-1, Math.min(1, (mouseX - bookCenterX) / pageWidth));
}

function drawFlippingPage(ctx, progress, pageImageData) {
  const foldX = bookCenterX + progress * pageWidth;
  const foldAngle = progress * Math.PI;

  // Control points for the page curl Bézier
  const cp1x = foldX - pageWidth * 0.3;
  const cp1y = pageHeight * 0.1;
  const cp2x = foldX;
  const cp2y = pageHeight * 0.5;

  ctx.save();
  ctx.beginPath();
  ctx.moveTo(bookCenterX, 0);
  ctx.bezierCurveTo(cp1x, cp1y, cp2x, cp2y, foldX, pageHeight);
  ctx.clip();

  // Approximate perspective with an affine scale + skew
  // (2D canvas transforms cannot express a true perspective projection)
  ctx.transform(
    Math.cos(foldAngle), 0,
    Math.sin(foldAngle) * 0.3, 1,
    foldX, 0
  );

  ctx.drawImage(pageImageData, 0, 0);
  ctx.restore();
}

The key insight is that the fold axis doesn't stay at a fixed X — it follows the cursor horizontally, and the page "behind" the fold uses a mirrored, slightly desaturated copy of the same image to simulate the back face of the paper.
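
The back-face idea can be sketched roughly as follows. This is a flat-fold approximation (a vertical fold line, no curl), not the engine's actual code. The helper name `drawBackFace` and its `pageLeft` parameter are assumptions made for illustration, and the 2D context `filter` property it relies on is not available in every browser.

```javascript
// Hypothetical helper: paints the folded-over strip of a right-hand page
// as a mirrored, slightly desaturated copy of the front image.
// pageLeft is the x of the page's left edge (the book spine).
function drawBackFace(ctx, image, { pageLeft, foldX, pageWidth, pageHeight }) {
  // Width of the strip between the fold and the page's right edge.
  const flapWidth = pageLeft + pageWidth - foldX;
  if (flapWidth <= 0) return;

  ctx.save();

  // Clip to where the strip lands after folding over the axis.
  ctx.beginPath();
  ctx.rect(foldX - flapWidth, 0, flapWidth, pageHeight);
  ctx.clip();

  // Mirror around the fold axis: a page point x ends up at 2 * foldX - x.
  ctx.translate(foldX, 0);
  ctx.scale(-1, 1);

  // Wash the colors out a little; `filter` takes CSS filter syntax.
  ctx.filter = 'saturate(60%) brightness(95%)';

  // In the mirrored space the page's left edge sits at pageLeft - foldX.
  ctx.drawImage(image, pageLeft - foldX, 0, pageWidth, pageHeight);

  ctx.restore(); // also resets the clip, transform, and filter
}
```

Where `filter` is unsupported, drawing a translucent gray rectangle over the clipped strip gives a similar washed-out look.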

The Shadow Problem

Realistic shadows are what separates a convincing page flip from a fake-looking one.

I ended up using a linear gradient shadow pinned to the fold axis that intensifies as the page reaches mid-flip (progress ≈ 0), and fades out as the page completes its turn:

function drawPageShadow(ctx, foldX, progress) {
  const shadowIntensity = 1 - Math.abs(progress); // max at center
  const gradient = ctx.createLinearGradient(
    foldX - 60, 0,
    foldX + 60, 0
  );
  gradient.addColorStop(0, `rgba(0,0,0,0)`);
  gradient.addColorStop(0.5, `rgba(0,0,0,${shadowIntensity * 0.35})`);
  gradient.addColorStop(1, `rgba(0,0,0,0)`);

  ctx.fillStyle = gradient;
  ctx.fillRect(foldX - 60, 0, 120, pageHeight);
}

Performance: requestAnimationFrame vs setInterval

Early versions used setInterval for the animation loop. This is a mistake — setInterval doesn't synchronize with the browser's paint cycle. The fix is requestAnimationFrame, which gives you:

  • Frame rates locked to the display refresh (typically 60fps)
  • Automatic pausing when the tab is backgrounded
  • Drawing work scheduled right before each repaint, so no frame is rendered and then thrown away

function animationLoop() {
  if (isFlipping) {
    // Same fold axis that drawFlippingPage computes internally
    const foldX = bookCenterX + currentProgress * pageWidth;

    ctx.clearRect(0, 0, canvas.width, canvas.height);
    drawStaticPages();
    drawFlippingPage(ctx, currentProgress, activePageImage);
    drawPageShadow(ctx, foldX, currentProgress);

    // Ease toward target: close 12% of the remaining gap each frame
    currentProgress += (targetProgress - currentProgress) * 0.12;
  }
  requestAnimationFrame(animationLoop);
}
animationLoop();

The 0.12 easing factor is what creates the "physical deceleration" feel. Too high and the flip feels mechanical; too low and it feels laggy.
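
One caveat with a fixed per-frame factor is that it is applied once per frame, so a 120 Hz display reaches the target in about half the time a 60 Hz display does. A minimal sketch of a time-based variant follows. It assumes 0.12 was tuned at 60 fps, and the names `easingFactor` and `easeToward` are made up for this example.

```javascript
// Per-frame factor from the post, assumed to have been tuned at 60 fps.
const BASE_FACTOR = 0.12;
const BASE_FRAME_MS = 1000 / 60;

// Convert the per-frame factor into one that depends on elapsed time.
// After deltaMs the remaining gap shrinks by (1 - base)^(deltaMs / frame).
function easingFactor(deltaMs, base = BASE_FACTOR) {
  return 1 - Math.pow(1 - base, deltaMs / BASE_FRAME_MS);
}

// Move `current` toward `target` by the time-scaled factor.
function easeToward(current, target, deltaMs) {
  return current + (target - current) * easingFactor(deltaMs);
}
```

Inside a requestAnimationFrame callback, `deltaMs` would be the difference between the timestamp argument of this frame and the previous one.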

Mobile Touch Support

Getting drag gestures right on mobile required handling touchstart, touchmove, and touchend separately — and adding velocity tracking so a quick swipe completes the flip even if the finger lifts early:

let touchStartX = 0;
let velocity = 0;
let lastX = 0;

canvas.addEventListener('touchstart', e => {
  touchStartX = e.touches[0].clientX;
  lastX = touchStartX;
  velocity = 0;
});

canvas.addEventListener('touchmove', e => {
  const x = e.touches[0].clientX;
  velocity = x - lastX;      // pixels moved since the previous touchmove event
  lastX = x;
  updateFlipProgress(x);
  e.preventDefault();
}, { passive: false });

canvas.addEventListener('touchend', () => {
  // If swipe velocity > threshold, complete the flip
  if (Math.abs(velocity) > 8) {
    targetProgress = velocity > 0 ? 1 : -1;
  } else {
    // Snap back or forward based on current position
    targetProgress = currentProgress > 0 ? 1 : -1;
  }
});
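
Because touchmove events do not fire at a fixed rate, a pixel delta per event means different speeds on different devices. A hedged sketch of a time-normalized tracker follows. The names `createVelocityTracker` and `pickFlipTarget` are hypothetical, and the 0.5 px/ms threshold is an assumption (roughly 8 px per 16 ms event), not a value from the engine.

```javascript
// Hypothetical tracker: remembers the last sample and reports px per ms.
function createVelocityTracker() {
  let last = null;
  let velocity = 0;
  return {
    reset(x, timeMs) {
      last = { x, timeMs };
      velocity = 0;
    },
    sample(x, timeMs) {
      // Skip samples with no elapsed time to avoid dividing by zero.
      if (last && timeMs > last.timeMs) {
        velocity = (x - last.x) / (timeMs - last.timeMs);
      }
      last = { x, timeMs };
      return velocity;
    },
  };
}

// Same decision rule as the touchend handler above, with velocity in px/ms.
function pickFlipTarget(velocityPxPerMs, currentProgress, threshold = 0.5) {
  if (Math.abs(velocityPxPerMs) > threshold) {
    return velocityPxPerMs > 0 ? 1 : -1;
  }
  return currentProgress > 0 ? 1 : -1;
}
```

In the event handlers, `e.timeStamp` can serve as the `timeMs` argument.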

Part 2: The Multi-Format Document Pipeline

Supporting PDF, PPTX, DOCX, JPG, and PNG as inputs means handling five formats with very different parsing requirements. This was the most tedious engineering problem in the whole project.

The Architecture Decision: Normalize to Page Images First

The key architectural decision was to normalize everything to a sequence of page images before the flipbook renderer ever touches the content.

Input formats:
  PDF  ──┐
  PPTX ──┤
  DOCX ──┼→ Preprocessing Layer → [ page_001.png, page_002.png, ... ] → Flipbook Renderer
  JPG  ──┤
  PNG  ──┘

This means the renderer only ever deals with images. It doesn't know or care what the source format was. Adding a new input format later only requires writing a new preprocessor — the renderer stays unchanged.
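
One way this decoupling could look in code is a small registry that maps file extensions to preprocessors. This is an illustrative sketch only. `registerPreprocessor` and `toPageImages` are hypothetical names, and each preprocessor is assumed to resolve to an array of page images.

```javascript
// Hypothetical registry: file extension -> async preprocessor that
// resolves to an array of page images (data URLs in this sketch).
const preprocessors = new Map();

function registerPreprocessor(extensions, fn) {
  for (const ext of extensions) {
    preprocessors.set(ext.toLowerCase(), fn);
  }
}

// Pick a preprocessor by extension and return its page images.
// The renderer would only ever call this function.
async function toPageImages(fileName, data) {
  const ext = fileName.slice(fileName.lastIndexOf('.') + 1).toLowerCase();
  const preprocess = preprocessors.get(ext);
  if (!preprocess) {
    throw new Error(`Unsupported format: .${ext}`);
  }
  const pages = await preprocess(data);
  if (!Array.isArray(pages) || pages.length === 0) {
    throw new Error(`Preprocessor for .${ext} produced no pages`);
  }
  return pages;
}
```

Supporting a new format then comes down to one more `registerPreprocessor` call, with no change on the renderer side.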

PDF Rendering: pdf.js + Canvas

For PDF files, I used Mozilla's pdf.js library to render each page to a canvas at a target resolution:

async function pdfToPageImages(pdfUrl, targetDPI = 150) {
  const pdf = await pdfjsLib.getDocument(pdfUrl).promise;
  const pages = [];
  const scale = targetDPI / 72; // at scale 1, a pdf.js viewport is in PDF points (1/72 inch)

  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale });
    const canvas = document.createElement('canvas');
    canvas.width = viewport.width;
    canvas.height = viewport.height;

    await page.render({
      canvasContext: canvas.getContext('2d'),
      viewport
    }).promise;

    pages.push(canvas.toDataURL('image/jpeg', 0.92));
  }
  return pages;
}

Key lesson: 150 DPI is the sweet spot. 96 DPI looks blurry on retina screens; 300 DPI produces files that are too large to stream quickly. At 150 DPI, a 20-page PDF typically generates around 8–12MB of image data — fast enough to stream page-by-page as the user reads.
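
To make the trade-off concrete, here is a small arithmetic sketch of how DPI maps to canvas pixels. It relies on the fact that PDF page sizes are measured in points (1/72 inch) and uses approximate A4 dimensions. The function name `pagePixelSize` is made up for this example.

```javascript
const POINTS_PER_INCH = 72; // PDF user-space units per inch

// Pixel dimensions of a page rendered at the given DPI.
function pagePixelSize(widthPt, heightPt, dpi) {
  const scale = dpi / POINTS_PER_INCH;
  return {
    scale,
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// A4 is roughly 595 x 842 points.
for (const dpi of [96, 150, 300]) {
  const { width, height } = pagePixelSize(595, 842, dpi);
  const megapixels = ((width * height) / 1e6).toFixed(1);
  console.log(`${dpi} DPI: ${width} x ${height} px (${megapixels} MP)`);
}
```

Pixel count grows with the square of the DPI, so 300 DPI means about four times as many pixels per page as 150 DPI.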

PPTX / DOCX: Server-Side Conversion

Browser-side conversion of Office formats is unreliable. For PPTX and DOCX, I moved the conversion server-side using LibreOffice in headless mode:

# Convert PPTX to PDF first, then use pdf.js for page images
libreoffice --headless --convert-to pdf input.pptx --outdir /tmp/converted/

This gives consistent output across all Office document variants (older .ppt, .doc formats included).
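
If the server runs Node.js, the same command could be wrapped along these lines. This is a sketch under assumptions, not the production code. It assumes a `libreoffice` binary on the PATH, and `buildConvertArgs`, `expectedPdfPath`, and `convertToPdf` are hypothetical names.

```javascript
const { execFile } = require('node:child_process');
const path = require('node:path');

// Argument list for the same CLI call shown above.
function buildConvertArgs(inputPath, outDir) {
  return ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputPath];
}

// LibreOffice writes <input basename>.pdf into the output directory.
function expectedPdfPath(inputPath, outDir) {
  const base = path.basename(inputPath, path.extname(inputPath));
  return path.join(outDir, `${base}.pdf`);
}

// Hypothetical wrapper: resolves with the PDF path, rejects on failure
// or when the conversion exceeds the timeout.
function convertToPdf(inputPath, outDir, timeoutMs = 60000) {
  return new Promise((resolve, reject) => {
    execFile(
      'libreoffice',
      buildConvertArgs(inputPath, outDir),
      { timeout: timeoutMs },
      (err) => (err ? reject(err) : resolve(expectedPdfPath(inputPath, outDir)))
    );
  });
}
```

Using `execFile` with an argument array rather than a shell string means an uploaded file name is never interpreted by a shell.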

Image Inputs: Aspect Ratio Normalization

When users upload a batch of JPG/PNG images, they often have inconsistent dimensions. Before building the flipbook, I normalize all images to a consistent aspect ratio with white padding:

// targetRatio is width / height: 1.414 is A4 landscape;
// pass 1 / 1.414 (about 0.707) for A4 portrait pages
function normalizeImageAspect(img, targetRatio = 1.414) {
  const canvas = document.createElement('canvas');
  const imgRatio = img.width / img.height;

  if (imgRatio > targetRatio) {
    canvas.width = img.width;
    canvas.height = Math.round(img.width / targetRatio);
  } else {
    canvas.height = img.height;
    canvas.width = Math.round(img.height * targetRatio);
  }

  const ctx = canvas.getContext('2d');
  ctx.fillStyle = '#ffffff';
  ctx.fillRect(0, 0, canvas.width, canvas.height);

  // Center the image
  const offsetX = (canvas.width - img.width) / 2;
  const offsetY = (canvas.height - img.height) / 2;
  ctx.drawImage(img, offsetX, offsetY);

  return canvas.toDataURL('image/jpeg', 0.92);
}

Part 3: Wiring in the AI Generation Layer

The AI storybook feature — where a user types a prompt and gets a complete illustrated flipbook — was the most surprising part of the product to build.

The Generation Pipeline

User prompt
    ↓
Text generation (chapter/scene breakdown)
    ↓
Per-scene image generation
    ↓
Layout: text + image → page canvas
    ↓
Flipbook renderer

Prompt → Story Structure

I use a structured JSON output from the text model to define the page layout before any images are generated:

const storyPrompt = `
You are a children's book author. Given the following prompt, return a JSON array
of exactly 8 pages. Each page has:
  - "text": 1-2 sentences of story text (max 25 words)
  - "scene": a detailed image generation prompt for this page's illustration
  - "mood": one of ["wonder", "adventure", "calm", "excitement", "mystery"]

Prompt: "${userPrompt}"

Return ONLY valid JSON, no markdown, no preamble.
`;

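const response = await fetch('/api/generate-story', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }, // tell the server the body is JSON
  body: JSON.stringify({ prompt: storyPrompt })
});
if (!response.ok) {
  throw new Error(`Story generation failed with HTTP ${response.status}`);
}
const pages = await response.json();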
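
Text models do not always follow "return ONLY valid JSON", so it can help to validate the reply before any image generation starts. A minimal sketch follows, assuming the raw model text is available as a string. `parseStoryPages` is a hypothetical helper whose checks simply mirror the rules stated in the prompt above.

```javascript
const MOODS = ['wonder', 'adventure', 'calm', 'excitement', 'mystery'];

// Hypothetical validator for the model's reply; throws on any mismatch.
function parseStoryPages(rawText, expectedPages = 8) {
  // Tolerate stray preamble or markdown wrapping by slicing out the
  // outermost JSON array before parsing.
  const start = rawText.indexOf('[');
  const end = rawText.lastIndexOf(']');
  if (start === -1 || end <= start) {
    throw new Error('No JSON array found in model output');
  }
  const pages = JSON.parse(rawText.slice(start, end + 1));

  if (pages.length !== expectedPages) {
    throw new Error(`Expected ${expectedPages} pages, got ${pages.length}`);
  }
  pages.forEach((page, i) => {
    const label = `Page ${i + 1}`;
    if (typeof page.text !== 'string' || page.text.trim() === '') {
      throw new Error(`${label}: missing "text"`);
    }
    if (page.text.trim().split(/\s+/).length > 25) {
      throw new Error(`${label}: "text" is longer than 25 words`);
    }
    if (typeof page.scene !== 'string' || page.scene.trim() === '') {
      throw new Error(`${label}: missing "scene"`);
    }
    if (!MOODS.includes(page.mood)) {
      throw new Error(`${label}: unknown mood "${page.mood}"`);
    }
  });
  return pages;
}
```

A failed validation is a reasonable point to retry the text call once, since no image requests have been spent yet.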

Scene → Image Generation

Each scene description gets passed to an image generation API. The key implementation detail is parallelizing the image generation calls with a concurrency limit — generating all 8 images sequentially would be too slow:

async function generatePageImages(pages, concurrency = 3) {
  const results = new Array(pages.length);

  // Process in chunks of `concurrency`
  for (let i = 0; i < pages.length; i += concurrency) {
    const chunk = pages.slice(i, i + concurrency);
    const chunkResults = await Promise.all(
      chunk.map((page, idx) =>
        generateImage(page.scene, page.mood)
          .then(imgUrl => ({ index: i + idx, url: imgUrl }))
      )
    );
    chunkResults.forEach(({ index, url }) => {
      results[index] = url;
    });
    // Update progress indicator (clamped so a final partial chunk reports 100, not more)
    onProgress(Math.round(Math.min(i + concurrency, pages.length) / pages.length * 100));
  }

  return results;
}

Why concurrency = 3? Most image generation APIs have rate limits. 3 parallel requests is fast enough to keep the UX responsive (total generation time ~15–20 seconds for 8 pages) while staying well within rate limits.
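
Chunked batching waits for the slowest request in each chunk before starting the next one. An alternative technique, shown here as a sketch and not as the code FlipFlow runs, is a small worker pool that starts a new request as soon as any slot frees up. `mapWithConcurrency` is a hypothetical name.

```javascript
// Worker-pool variant: `limit` workers each pull the next index as soon
// as they finish, so at most `limit` calls are in flight at any time.
async function mapWithConcurrency(items, limit, worker, onProgress = () => {}) {
  const results = new Array(items.length);
  let next = 0; // next index to hand out
  let done = 0; // completed items, for progress reporting

  async function runWorker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
      done++;
      onProgress(Math.round((done / items.length) * 100));
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results; // same order as `items`
}
```

With the names from the post, a call might look like `mapWithConcurrency(pages, 3, page => generateImage(page.scene, page.mood), onProgress)`.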

Compositing Text + Image onto a Page Canvas

Once you have the text and the generated image, you need to composite them onto a consistent page layout:

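function compositePageCanvas(text, imageUrl, style = 'storybook') {
  const canvas = document.createElement('canvas');
  canvas.width = 800;
  canvas.height = 1130; // A4-ish

  const ctx = canvas.getContext('2d');
  const img = new Image();
  // Remote images taint the canvas and make toDataURL() throw unless they
  // are loaded with CORS (the image host must send the matching headers)
  img.crossOrigin = 'anonymous';

  return new Promise((resolve, reject) => {
    // Without this, a failed image load would leave the promise pending forever
    img.onerror = () => reject(new Error(`Failed to load image: ${imageUrl}`));
    img.onload = () => {
      // Background
      ctx.fillStyle = '#fffef5';
      ctx.fillRect(0, 0, canvas.width, canvas.height);

      // Illustration (roughly the top 70% of the page)
      ctx.drawImage(img, 40, 40, 720, 760);

      // Text area (roughly the bottom 30%)
      ctx.fillStyle = '#f5f0e8';
      ctx.fillRect(0, 830, 800, 300);

      // Render text with word wrap
      ctx.fillStyle = '#2c1810';
      ctx.font = 'bold 28px Georgia, serif';
      ctx.textAlign = 'center';
      wrapText(ctx, text, 400, 890, 700, 36);

      resolve(canvas.toDataURL('image/jpeg', 0.92));
    };
    // Start loading only after both handlers are attached
    img.src = imageUrl;
  });
}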
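
The `wrapText` helper called above is not shown in the post. One plausible implementation is a greedy word wrap built on `measureText`. The version below is an assumption that matches the call signature `wrapText(ctx, text, x, y, maxWidth, lineHeight)`, not the original helper.

```javascript
// Hypothetical implementation of the `wrapText` helper called above.
// Greedy word wrap: keep adding words to the current line until the next
// word would push the measured width past maxWidth, then start a new line.
function wrapText(ctx, text, x, y, maxWidth, lineHeight) {
  const words = text.split(/\s+/).filter(Boolean);
  const lines = [];
  let line = '';

  for (const word of words) {
    const candidate = line ? `${line} ${word}` : word;
    if (line && ctx.measureText(candidate).width > maxWidth) {
      lines.push(line);
      line = word; // a single overlong word still gets its own line
    } else {
      line = candidate;
    }
  }
  if (line) lines.push(line);

  // Draw each line one lineHeight below the previous one.
  lines.forEach((content, i) => ctx.fillText(content, x, y + i * lineHeight));
  return lines;
}
```

Since the caller sets `textAlign = 'center'`, the `x` argument is the horizontal center of each line, not its left edge.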

The SEO Architecture That Actually Drives Users to the Tool

Here's something I didn't fully appreciate until three months after launch: the content architecture around your tool matters as much as the tool itself.

Search engines crawl and index your entire content footprint — not just your landing page. Google evaluates topical authority, link depth, and whether your pages answer the real questions users are searching for.

For FlipFlow, the high-intent queries that drive signups are:

Query                                  Intent
convert PDF to flipbook online free    High conversion: user is ready to use a tool
online flipbook maker                  Navigational: looking for the category
free digital flipbook creator          Price-sensitive, high-volume
page flip effect online                Feature-seeking
digital magazine maker                 Vertical use case
HTML5 flipbook maker                   Technical audience (you, probably)

Each of these deserves its own dedicated page with matching content. A single landing page trying to rank for all of them won't outcompete domain-specific pages that answer each query precisely.


Lessons for Other Indie Devs

After building FlipFlow from scratch as a solo developer, here's what I'd tell myself 12 months ago:

1. Solve the core interaction in complete isolation first.
Don't wire up auth, billing, or the database until the page-flip physics feel right. The core UX is your product. Everything else is infrastructure.

2. Canvas content is invisible to search engines.
Anything rendered in <canvas> doesn't get indexed. Put your text content in the DOM and use canvas only for the visual layer. This matters a lot for SEO.

3. Normalize your data model early.
The "normalize everything to page images" decision was the best architectural call I made. It decoupled the rendering layer from the ingestion layer completely.

4. AI generation shifts your user profile in ways you can't predict.
When I added the AI storybook generator, the user base shifted overnight from designers and marketers toward authors, teachers, and content creators. Build for your initial audience, but don't be surprised when the AI features open up entirely new ones.

5. A genuine free tier is the best distribution mechanism.
Every flipbook a free user creates and shares is a live demo of your product. It's a backlink, a marketing asset, and a proof of concept — all at once.


Try It Yourself

If you made it to the end of this post, you clearly care about browser rendering, document conversion, or AI-powered content tools — which means FlipFlow was probably built for someone like you.

Upload a PDF, convert a presentation, or generate an AI storybook from scratch:

👉 flippingbooks.org — free, no account required to try

Or flip through a live demo right now:

FlipFlow live demo — click to open the interactive flipbook

↑ This flipbook was generated with FlipFlow. Click to try the different page-turning effects firsthand. You may be surprised by how much it can do.


Have questions about the Canvas rendering, the document pipeline, or the AI generation layer? Drop a comment — I read every single one and usually reply within a day.


Tags: #indiehacking #javascript #webdev #ai #buildInPublic #seo #pdf #canvas #saas


Keywords: online flipbook maker, convert PDF to flipbook, free digital flipbook creator, HTML5 flipbook maker, page flip effect online, digital magazine maker, FlipFlow, PDF to flipbook online free, interactive digital publication, AI storybook generator, support PDF PPT PPTX DOC DOCX JPG PNG convert
