
Running DeepSeek Janus-Pro-1B in the Browser: A Comprehensive Guide

The ability to run large language models (LLMs) directly in the browser has opened new possibilities for privacy-preserving, client-side AI applications. In this blog post, we’ll explore how to run DeepSeek Janus-Pro-1B, a multimodal model that handles both text-to-image generation and image-grounded chat, entirely in the browser using WebGPU and Hugging Face’s Transformers.js library.

Why Browser-Based Inference?

  1. Privacy: Data never leaves the user’s device.
  2. Cost Efficiency: No server infrastructure required.
  3. Accessibility: Runs on any device with a modern browser and WebGPU support.

DeepSeek Janus-Pro-1B, designed for multimodal tasks like text-to-image generation, is now accessible via browser-based inference thanks to optimizations in Transformers.js and WebGPU acceleration.

Key Tools & Libraries

  1. Transformers.js: A JavaScript port of Hugging Face’s Transformers library, optimized for browser execution (a minimal usage sketch follows this list).
  2. WebGPU: A modern browser API for GPU acceleration, succeeding WebGL with significantly better performance for ML workloads.
  3. ONNX Runtime: The execution engine (ONNX Runtime Web) that runs the exported model graphs under the hood.

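Before diving into the full demo, here is a minimal sketch of how these pieces fit together. It is not part of the Janus demo: the model name (Xenova/all-MiniLM-L6-v2) is just a small example model commonly used with Transformers.js, and WebGPU availability is assumed.

import { pipeline } from "@huggingface/transformers";

// Request WebGPU acceleration; omit the device option to use the default
// WASM backend (ONNX Runtime Web).
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
  { device: "webgpu" },
);

// Embed a sentence entirely on the client
const output = await extractor("Client-side inference keeps data on-device.", {
  pooling: "mean",
  normalize: true,
});
console.log(output.dims); // e.g. [1, 384]
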
Demo Code Walkthrough

The following example demonstrates how to load and run DeepSeek Janus-Pro-1B in a Web Worker for non-blocking inference. The full code is available in the GitHub repository.

import {
  AutoProcessor,
  MultiModalityCausalLM,
  BaseStreamer,
  TextStreamer,
  InterruptableStoppingCriteria,
} from "@huggingface/transformers";

// Define constants
const IMAGE_GENERATION_COMMAND_PREFIX = "/imagine ";
const MAX_NEW_TEXT_TOKENS = 1024;

/**
 * Helper function to perform WebGPU feature detection
 */
let fp16_supported = false;
async function check() {
  try {
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) {
      throw new Error("WebGPU is not supported (no adapter found)");
    }
    fp16_supported = adapter.features.has("shader-f16");
    self.postMessage({
      status: "success",
      data: fp16_supported,
    });
  } catch (e) {
    self.postMessage({
      status: "error",
      data: e.toString(),
    });
  }
}

/**
 * This class uses the Singleton pattern to enable lazy-loading of the pipeline
 */
class ImageGenerationPipeline {
  static model_id = "onnx-community/Janus-Pro-1B-ONNX";

  static async getInstance(progress_callback = null) {
    this.processor ??= AutoProcessor.from_pretrained(this.model_id, {
      progress_callback,
    });

    this.model ??= MultiModalityCausalLM.from_pretrained(this.model_id, {
      dtype: fp16_supported
        ? {
            prepare_inputs_embeds: "q4",
            language_model: "q4f16",
            lm_head: "fp16",
            gen_head: "fp16",
            gen_img_embeds: "fp16",
            image_decode: "fp32",
          }
        : {
            prepare_inputs_embeds: "fp32",
            language_model: "q4",
            lm_head: "fp32",
            gen_head: "fp32",
            gen_img_embeds: "fp32",
            image_decode: "fp32",
          },
      device: {
        prepare_inputs_embeds: "wasm", // TODO use "webgpu" when bug is fixed
        language_model: "webgpu",
        lm_head: "webgpu",
        gen_head: "webgpu",
        gen_img_embeds: "webgpu",
        image_decode: "webgpu",
      },
      progress_callback,
    });

    return Promise.all([this.processor, this.model]);
  }
}

class ProgressStreamer extends BaseStreamer {
  constructor(total, on_progress) {
    super();
    this.total = total;
    this.on_progress = on_progress;

    this.count = null;
    this.start_time = null;
  }

  put(value) {
    if (this.count === null) {
      // Ignore the first batch of tokens (prompt)
      this.count = 0;
      this.start_time = performance.now();
      return;
    }

    const progress = ++this.count / this.total;

    this.on_progress({
      count: this.count,
      total: this.total,
      progress,
      time: performance.now() - this.start_time,
    });
  }

  end() {
    /* do nothing */
  }
}

const stopping_criteria = new InterruptableStoppingCriteria();

async function generate(messages) {
  // For this demo, we only respond to the last message
  const message = messages.at(-1);

  // Tell the main thread we are starting
  self.postMessage({ status: "start" });

  // Load the pipeline
  const [processor, model] = await ImageGenerationPipeline.getInstance();

  // Determine if the user wants to generate an image or text
  if (message.content.startsWith(IMAGE_GENERATION_COMMAND_PREFIX)) {
    const text = message.content.replace(IMAGE_GENERATION_COMMAND_PREFIX, "");

    const conversation = [
      {
        role: "<|User|>", // uses title case
        content: text,
      },
    ];
    const inputs = await processor(conversation, {
      chat_template: "text_to_image",
    });

    const callback_function = (output) => {
      self.postMessage({
        status: "image-update",
        ...output,
      });
    };

    const num_image_tokens = processor.num_image_tokens;
    const streamer = new ProgressStreamer(num_image_tokens, callback_function);

    const outputs = await model.generate_images({
      ...inputs,
      min_new_tokens: num_image_tokens,
      max_new_tokens: num_image_tokens,
      do_sample: true,
      streamer,
    });

    const blob = await outputs[0].toBlob();

    // Send the output back to the main thread
    self.postMessage({
      status: "image-update",
      blob,
    });
  } else {
    const inputs = await processor(
      message.image
        ? [
            {
              role: "<|User|>",
              content: "<image_placeholder>\n" + message.content,
              images: [message.image],
            },
          ]
        : [
            {
              role: "<|System|>",
              content:
                "You are a helpful assistant. Answer the user's questions in a concise manner.",
            },
            {
              role: "<|User|>",
              content: message.content,
            },
          ],
    );

    let startTime;
    let numTokens = 0;
    let tps;
    const token_callback_function = () => {
      startTime ??= performance.now();

      if (numTokens++ > 0) {
        tps = (numTokens / (performance.now() - startTime)) * 1000;
      }
    };
    const callback_function = (output) => {
      self.postMessage({
        status: "text-update",
        output,
        tps,
        numTokens,
      });
    };

    const streamer = new TextStreamer(processor.tokenizer, {
      skip_prompt: true,
      skip_special_tokens: true,
      callback_function,
      token_callback_function,
    });

    // Generate response
    const outputs = await model.generate({
      ...inputs,
      max_new_tokens: MAX_NEW_TEXT_TOKENS,
      do_sample: false,
      streamer,
      stopping_criteria,
    });
  }

  // Tell the main thread we are done
  self.postMessage({
    status: "complete",
  });
}

async function load() {
  self.postMessage({
    status: "loading",
    data: "Loading model...",
  });

  // Load the pipeline and save it for future use.
  const [processor, model] = await ImageGenerationPipeline.getInstance((x) => {
    // We also add a progress callback to the pipeline so that we can
    // track model loading.
    self.postMessage(x);
  });

  self.postMessage({ status: "ready" });
}

// Listen for messages from the main thread
self.addEventListener("message", async (e) => {
  const { type, data } = e.data;

  switch (type) {
    case "check":
      check();
      break;

    case "load":
      load();
      break;

    case "generate":
      stopping_criteria.reset();
      generate(data);
      break;

    case "interrupt":
      stopping_criteria.interrupt();
      break;

    case "reset":
      stopping_criteria.reset();
      break;
  }
}); 
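
The worker above never touches the DOM; a separate main-thread script drives it. The original demo wires this into a full UI, so the following is only a minimal sketch: the worker file name (worker.js), the prompt, and the DOM selectors are assumptions for illustration.

// Spawn the worker as an ES module
const worker = new Worker(new URL("./worker.js", import.meta.url), {
  type: "module",
});

worker.addEventListener("message", (e) => {
  switch (e.data.status) {
    case "ready":
      // Model is loaded; ask for an image with the /imagine prefix
      worker.postMessage({
        type: "generate",
        data: [{ role: "user", content: "/imagine a watercolor fox" }],
      });
      break;
    case "image-update":
      // Progress updates carry token counts; the final message carries a blob
      if (e.data.blob) {
        document.querySelector("#result").src = URL.createObjectURL(e.data.blob);
      }
      break;
    case "text-update":
      console.log(e.data.output, e.data.tps);
      break;
    case "complete":
      console.log("Generation finished");
      break;
  }
});

// Run WebGPU feature detection, then start downloading the model
worker.postMessage({ type: "check" });
worker.postMessage({ type: "load" });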

Running the Demo

Check out the live demo here: DeepSeek Janus-Pro-1B Browser Demo.

Key Features of the Demo:

  • Real-time progress updates during model loading and inference.
  • WebGPU-accelerated generation (requires Chrome 113+ or Edge 113+).
  • Full client-side execution—no data is sent to external servers.

Challenges & Optimizations

  1. Model Quantization: The weights are quantized (4-bit q4/q4f16 for the language model, fp16/fp32 for the other modules, as shown in the dtype configuration above) to shrink the download and speed up loading.
  2. Non-Blocking Inference: Running the model in a Web Worker keeps the main thread responsive, so the UI never freezes during generation (a short interrupt sketch follows this list).
  3. Browser Compatibility: WebGPU is not yet available in every browser, but it is critical for acceptable performance.
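
Because generation runs off the main thread, the page stays free to handle user input, including cancellation. A minimal sketch of wiring a Stop button to the worker’s "interrupt" handler (the button id is an assumption, and worker refers to the instance created in the earlier sketch):

// Ask the worker to stop generation via its InterruptableStoppingCriteria
document.querySelector("#stop").addEventListener("click", () => {
  worker.postMessage({ type: "interrupt" });
});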

Conclusion

Running DeepSeek Janus-Pro-1B in the browser showcases the potential of client-side AI. With tools like Transformers.js and WebGPU, complex models can now operate efficiently in constrained environments while preserving user privacy.

Next Steps:

  • Experiment with different prompts and model configurations.
  • Explore fine-tuning the model for domain-specific tasks.
  • Monitor WebGPU adoption to ensure broader compatibility.

For developers, this marks an exciting shift toward decentralized, user-centric AI applications. Dive into the example code and start building! 🚀
