DEV Community

monkeymore studio


Building a Browser-Based AI Image Upscaler

Introduction

In this article, we'll explore how to implement a powerful browser-based AI image upscaler that uses deep learning models to enlarge images while maintaining quality. This tool supports 2x and 4x upscaling using Real-ESRGAN and Real-CUGAN models, running entirely in the browser with WebGPU or WebGL acceleration.

Why Browser-Based Upscaling?

1. Privacy Protection

When users upscale images in the browser, their photos never leave their device.

2. Zero Server Costs

Running the AI model in the browser eliminates the need for:

  • GPU servers for deep learning inference
  • Bandwidth for uploading/downloading images
  • API costs for third-party upscaling services

3. Offline Capability

Once models are cached, users can upscale images without an internet connection.

Technical Architecture

Core Implementation

1. Data Structures

interface ImageItem {
  id: string;
  file: File;
  originalUrl: string;
  processedUrl: string | null;
  status: 'pending' | 'processing' | 'done' | 'error';
  progress: number;
  info: string;
  name: string;
}

interface ImgInstance {
  width: number;
  height: number;
  data: Uint8Array;
  getImageCrop(x: number, y: number, image: ImgInstance, x1: number, y1: number, x2: number, y2: number): void;
  padToTileSize(tileSize: number): void;
  cropToOriginalSize(width: number, height: number): void;
}
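When a user drops files into the queue, each one becomes an ImageItem in its initial state. A minimal factory sketch (the helper name and id scheme are illustrative assumptions, not part of the original code; the real ImageItem also stores the File object itself, omitted here to keep the sketch portable):

```typescript
// Hypothetical helper: build a queue entry in its initial "pending" state.
// The id scheme (name + timestamp) is an assumption for illustration.
function makeImageItem(file: { name: string }, originalUrl: string) {
  return {
    id: `${file.name}-${Date.now()}`,
    originalUrl,
    processedUrl: null as string | null, // filled in once upscaling finishes
    status: "pending" as "pending" | "processing" | "done" | "error",
    progress: 0,
    info: "",
    name: file.name,
  };
}
```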

2. Backend Detection

Automatically detect and use the best available backend:

const [backend, setBackend] = useState<"webgl" | "webgpu">("webgpu");
const [detectedBackend, setDetectedBackend] = useState<"webgl" | "webgpu" | null>(null);

useEffect(() => {
  const detectBackend = async () => {
    try {
      // Check if WebGPU is available
      if (navigator.gpu) {
        const adapter = await navigator.gpu.requestAdapter();
        if (adapter) {
          setDetectedBackend("webgpu");
          setBackend("webgpu");
          return;
        }
      }
    } catch (e) {
      console.log("WebGPU not available:", e);
    }

    // Fall back to WebGL
    setDetectedBackend("webgl");
    setBackend("webgl");
  };

  detectBackend();
}, []);

3. Processing Configuration

const [modelType, setModelType] = useState<"realesrgan" | "realcugan">("realcugan");
const [model, setModel] = useState("anime_plus");
const [factor, setFactor] = useState(4);
const [denoise, setDenoise] = useState("conservative");
const [tileSize, setTileSize] = useState(64);
const [minLap, setMinLap] = useState(12);
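Not every combination of these settings is valid: this article's option tables limit the scale factor to 2x/4x, denoise to five named levels, and tile size to the 32-256 range. A small guard (a hypothetical helper, not part of the original code) can reject a bad configuration before it ever reaches the worker:

```typescript
const DENOISE_LEVELS = ["conservative", "no-denoise", "denoise1x", "denoise2x", "denoise3x"];

// Hypothetical validation helper; the accepted ranges follow the option
// tables in this article (scale 2x/4x, tile size 32-256).
function validateConfig(factor: number, denoise: string, tileSize: number): string[] {
  const errors: string[] = [];
  if (factor !== 2 && factor !== 4) errors.push(`unsupported scale: ${factor}x`);
  if (!DENOISE_LEVELS.includes(denoise)) errors.push(`unknown denoise level: ${denoise}`);
  if (tileSize < 32 || tileSize > 256) errors.push(`tile size out of range: ${tileSize}`);
  return errors;
}
```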

Model Options:

  • Real-CUGAN: scales 2x and 4x; denoise levels conservative, no-denoise, denoise1x, denoise2x, denoise3x
  • Real-ESRGAN: models anime_fast, anime_plus, general_fast, general_plus
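These options map directly onto model file paths. Here is a sketch of the URL scheme as a standalone function, mirroring the layout the worker requests from GitHub (assuming that repository layout stays stable):

```typescript
const GITHUB_BASE = "https://raw.githubusercontent.com/linmingren/openmodels/main/models";

// Real-ESRGAN paths encode model name + tile size;
// Real-CUGAN paths encode scale + denoise level + tile size.
function buildModelUrl(opts: {
  modelType: "realesrgan" | "realcugan";
  model?: string;   // Real-ESRGAN only
  factor?: number;  // Real-CUGAN only
  denoise?: string; // Real-CUGAN only
  tileSize: number;
}): string {
  if (opts.modelType === "realesrgan") {
    return `${GITHUB_BASE}/realesrgan/${opts.model}-${opts.tileSize}/model.json`;
  }
  return `${GITHUB_BASE}/realcugan/${opts.factor}x-${opts.denoise}-${opts.tileSize}/model.json`;
}
```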

4. Processing a Single Image

const processImage = async (imageItem: ImageItem): Promise<void> => {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onerror = () => reject(new Error(`Failed to load ${imageItem.name}`));
    img.src = imageItem.originalUrl;

    img.onload = () => {
      const canvas = imgCanvasRef.current;
      canvas.width = img.width;
      canvas.height = img.height;
      const ctx = canvas.getContext("2d")!;
      ctx.drawImage(img, 0, 0);

      // Extract raw RGBA pixel data
      const data = ctx.getImageData(0, 0, img.width, img.height).data;
      const input = new ImgClass(img.width, img.height, new Uint8Array(data));

      // Create worker for processing
      const worker = new Worker("/upscale-worker.js");

      worker.onerror = (err) => {
        worker.terminate();
        reject(err);
      };

      worker.onmessage = (e) => {
        const { progress, done, output, info } = e.data;

        if (info) {
          setImages(prev => prev.map(item =>
            item.id === imageItem.id ? { ...item, info } : item
          ));
        }

        if (progress !== undefined) {
          setImages(prev => prev.map(item =>
            item.id === imageItem.id ? { ...item, progress } : item
          ));
        }

        if (done && output) {
          // Draw the upscaled RGBA buffer onto an offscreen canvas
          const outCanvas = document.createElement('canvas');
          outCanvas.width = input.width * factor;
          outCanvas.height = input.height * factor;
          const outCtx = outCanvas.getContext("2d")!;

          const imgData = outCtx.createImageData(outCanvas.width, outCanvas.height);
          imgData.data.set(new Uint8Array(output));
          outCtx.putImageData(imgData, 0, 0);

          outCanvas.toBlob((blob) => {
            if (!blob) {
              worker.terminate();
              reject(new Error("Failed to encode output image"));
              return;
            }
            const url = URL.createObjectURL(blob);
            setImages(prev => prev.map(item =>
              item.id === imageItem.id ? { ...item, processedUrl: url, status: 'done' } : item
            ));
            worker.terminate();
            resolve();
          }, "image/jpeg", 0.92);
        }
      };

      // Transfer the pixel buffer to the worker (zero-copy)
      worker.postMessage({
        input: input.data.buffer,
        factor: factor,
        denoise: denoise,
        tile_size: tileSize,
        min_lap: minLap,
        model_type: modelType,
        width: input.width,
        height: input.height,
        model: model,
        backend: backend,
        baseUrl: window.location.origin,
      }, [input.data.buffer]);
    };
  });
};

Web Worker Implementation

The heavy lifting happens in a Web Worker to keep the UI responsive:

1. Loading TensorFlow.js

importScripts("https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.22.0/dist/tf.min.js");

// Load WebGPU backend (optional)
try {
  importScripts("https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgpu@4.22.0/dist/tf-backend-webgpu.min.js");
} catch (e) {
  console.log("WebGPU backend not available, will use WebGL");
}

2. Image Class for Processing

class Img {
  constructor(width, height, data = new Uint8Array(width * height * 4)) {
    this.width = width;
    this.height = height;
    this.data = data;
  }

  getImageCrop(x, y, image, x1, y1, x2, y2) {
    const width = x2 - x1;
    for (let j = 0; j < y2 - y1; j++) {
      const destIndex = (y + j) * this.width * 4 + x * 4;
      const srcIndex = (y1 + j) * image.width * 4 + x1 * 4;
      this.data.set(image.data.subarray(srcIndex, srcIndex + width * 4), destIndex);
    }
  }

  padToTileSize(tileSize) {
    // Pad image to tile size for processing
    // ... implementation
  }

  cropToOriginalSize(width, height) {
    // Crop back to original size after upscaling
    // ... implementation
  }
}
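To see what getImageCrop does, here is the class exercised on a tiny RGBA buffer, typed up in plain TypeScript with no browser APIs involved:

```typescript
class Img {
  width: number;
  height: number;
  data: Uint8Array;

  constructor(width: number, height: number, data = new Uint8Array(width * height * 4)) {
    this.width = width;
    this.height = height;
    this.data = data;
  }

  // Copy the (x1,y1)-(x2,y2) region of `image` into this image at (x, y)
  getImageCrop(x: number, y: number, image: Img, x1: number, y1: number, x2: number, y2: number) {
    const width = x2 - x1;
    for (let j = 0; j < y2 - y1; j++) {
      const destIndex = (y + j) * this.width * 4 + x * 4;
      const srcIndex = (y1 + j) * image.width * 4 + x1 * 4;
      this.data.set(image.data.subarray(srcIndex, srcIndex + width * 4), destIndex);
    }
  }
}

// A 4x4 source image whose byte values equal their offsets, so copies are easy to check.
const src = new Img(4, 4, Uint8Array.from({ length: 64 }, (_, i) => i));

// Crop the 2x2 block whose top-left pixel is (1, 1).
const crop = new Img(2, 2);
crop.getImageCrop(0, 0, src, 1, 1, 3, 3);
// The first crop row starts at source byte offset (1*4 + 1)*4 = 20,
// the second row at (2*4 + 1)*4 = 36.
```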

3. Model Loading

self.addEventListener("message", async (e) => {
  const { data } = e;

  // Model files are hosted on GitHub
  const githubBaseUrl = "https://raw.githubusercontent.com/linmingren/openmodels/main/models";
  let modelPath;

  if (data?.model_type === "realesrgan") {
    modelPath = `realesrgan/${data?.model}-${data?.tile_size}`;
  } else {
    modelPath = `realcugan/${data?.factor}x-${data?.denoise}-${data?.tile_size}`;
  }

  const modelUrl = `${githubBaseUrl}/${modelPath}/model.json`;
  // Use the path as a stable cache key
  const modelName = modelPath.replace("/", "-");

  let model;
  // Try to load from the IndexedDB cache first
  try {
    model = await tf.loadGraphModel(`indexeddb://${modelName}`);
    self.postMessage({ progress: 10, info: "Loaded from cache" });
  } catch (error) {
    // Cache miss: download from GitHub and save for next time
    model = await tf.loadGraphModel(modelUrl);
    await model.save(`indexeddb://${modelName}`);
  }
});

4. Tile-Based Processing

async function enlargeImageWithFixedInput(model, inputImg, factor, inputSize, minLap) {
  const width = inputImg.width;
  const height = inputImg.height;
  const output = new Img(width * factor, height * factor);

  // Smallest tile counts that still leave at least minLap pixels of overlap
  // between neighboring tiles (overlap hides seams at tile borders)
  let numX = 1;
  for (; numX * inputSize < width + (numX - 1) * minLap; numX++);
  let numY = 1;
  for (; numY * inputSize < height + (numY - 1) * minLap; numY++);

  // Process each tile
  for (let i = 0; i < numX; i++) {
    for (let j = 0; j < numY; j++) {
      // Distribute tile origins evenly so the last tile ends at the image edge
      const x1 = numX === 1 ? 0 : Math.round((i * (width - inputSize)) / (numX - 1));
      const y1 = numY === 1 ? 0 : Math.round((j * (height - inputSize)) / (numY - 1));

      const tile = new Img(inputSize, inputSize);
      tile.getImageCrop(0, 0, inputImg, x1, y1, x1 + inputSize, y1 + inputSize);

      // Upscale tile using TensorFlow.js
      const scaled = await upscale(tile, model);

      // Copy to output (blending across the overlap region elided)
      output.getImageCrop(x1 * factor, y1 * factor, scaled, 0, 0, inputSize * factor, inputSize * factor);
    }
  }

  return output;
}
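The tile-count loop above is compact but easy to misread. Written as a standalone function, it finds the smallest number of tiles whose neighbors still overlap by at least minLap pixels:

```typescript
// Smallest n such that n tiles of `tileSize` pixels cover `extent` pixels
// while neighboring tiles overlap by at least `minLap` pixels.
function tileCount(extent: number, tileSize: number, minLap: number): number {
  let n = 1;
  while (n * tileSize < extent + (n - 1) * minLap) n++;
  return n;
}
```

For example, a 200-pixel edge with 64-pixel tiles and a 12-pixel minimum overlap needs four tiles: three tiles cover only 192 pixels before overlap is even considered.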

5. Upscale Function

async function upscale(image, model, alpha = false) {
  const result = tf.tidy(() => {
    const tensor = img2tensor(image);
    let result = model.predict(tensor);
    if (alpha) {
      result = tf.greater(result, 0.5);
    }
    return result;
  });

  const resultImage = await tensor2img(result);
  tf.dispose(result);
  return resultImage;
}

function img2tensor(image) {
  const imgdata = new ImageData(image.width, image.height);
  imgdata.data.set(image.data);
  const tensor = tf.browser.fromPixels(imgdata).div(255).toFloat().expandDims();
  return tensor;
}

async function tensor2img(tensor) {
  const [, height, width] = tensor.shape;
  const clipped = tf.tidy(() =>
    tensor.reshape([height, width, 3]).mul(255).cast("int32").clipByValue(0, 255)
  );
  const data = await tf.browser.toPixels(clipped);
  tf.dispose(clipped); // avoid leaking the intermediate tensor
  return new Img(width, height, new Uint8Array(data));
}

Processing Flow

  1. Load the image into a canvas and extract its RGBA pixel data
  2. Transfer the pixels to a Web Worker
  3. The worker loads the model from the IndexedDB cache, or downloads it from GitHub
  4. The worker upscales the image tile by tile, posting progress updates
  5. The main thread assembles the result into a blob URL for preview and download

Key Features

1. Multiple Model Support

  • Real-CUGAN: best for anime illustrations; scales 2x and 4x
  • Real-ESRGAN: best for general photos; scales 2x and 4x

2. Denoise Levels

  • Conservative: Minimal denoising, preserves details
  • No Denoise: Pure upscaling
  • Denoise 1x/2x/3x: Progressive noise reduction

3. Tile-Based Processing

Large images are processed in tiles to avoid memory issues:

  • Adjustable tile size (32-256)
  • Automatic overlap calculation
  • Seamless blending

4. Model Caching

Models are cached in IndexedDB for instant subsequent use:

  • First load: Download from GitHub
  • Subsequent: Load from local cache
  • Cache persists across sessions

Performance Characteristics

  1. First Run: 30-60 seconds (download + cache)
  2. Subsequent Runs: 5-15 seconds
  3. Tile Processing: Depends on image size and tile count
  4. Memory: ~500MB during processing
  5. WebGPU: 2-3x faster than WebGL

Browser Support

WebGPU requires:

  • Chrome 113+
  • Edge 113+
  • Safari 17+ (limited)

WebGL fallback:

  • All modern browsers
  • Slower but works everywhere

Use Cases

  1. Anime/Game Art: Upscale pixel art and sprites
  2. Photo Enhancement: Enlarge photos without blur
  3. Print Preparation: Increase resolution for printing
  4. Social Media: Create higher quality images
  5. Archival: Restore old low-resolution photos

Conclusion

Browser-based AI image upscaling brings professional-grade image enlargement to the web. The implementation uses:

  • TensorFlow.js for running deep learning models
  • WebGPU/WebGL for GPU-accelerated inference
  • Web Workers for non-blocking processing
  • IndexedDB for model caching
  • Tile-based processing for memory efficiency

Users can upscale images by 2x or 4x using Real-ESRGAN or Real-CUGAN models, all while their images never leave their device.


Try it yourself at Free Image Tools

Experience the power of browser-based AI image upscaling. No upload required - your images stay on your device!
