My Adventures with Client-Side AI Models: Lessons from Trying Transformer.js

An honest, step-by-step account of why client-side AI failed in my personal project and how switching to a server-only approach saved me weeks of frustration.


The Dream vs. The Reality

When I first started this project, I had an ambitious vision: create an AI system that runs entirely on the client, keeping user data private. My goals were straightforward:

  • Rewrite text into easier-to-pronounce phrases
  • Analyze fluency in real time
  • Give pacing advice for speech
  • Offer interactive roleplay conversations
  • Ensure zero data leaves the user's device

I imagined a system where users could get AI assistance without ever sending sensitive speech or text data to a server. It sounded perfect.

Reality was very different. Weeks of debugging fundamental browser incompatibilities, memory crashes, and internal library bugs made me realize the dream wasn’t ready. Eventually, I had to remove all client-side AI code and switch to a server-only approach.


Phase 1: The Download Disaster

I began with a simple setup using transformer.js:

import { pipeline } from '@xenova/transformers';

// Simple documentation example
const generator = await pipeline('text-generation', 'Xenova/distilgpt2');
const response = await generator(
  "Rewrite this to be easier to say: creepy crawly crabs can be creative"
);

It looked straightforward. The library documentation promised easy browser-based AI, no server required.

But my browser immediately threw errors:

transformerService.js:29 Failed to initialize transformer model:
SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

What happened here?

Transformer.js was attempting to download model files from Hugging Face’s CDN. Instead of receiving JSON model files, it got HTML 404 pages. The library then tried to parse HTML as JSON, causing the syntax error.
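
One thing that would have saved me time is checking up front whether a model's files are actually reachable before calling pipeline(). A minimal sketch, assuming the standard Hugging Face Hub URL layout (https://huggingface.co/{model}/resolve/main/config.json), which may not hold for every model:

async function modelFilesReachable(modelId) {
  // Fetch the model's config.json and verify the response is JSON rather than an HTML error page
  const url = `https://huggingface.co/${modelId}/resolve/main/config.json`;
  try {
    const response = await fetch(url);
    const contentType = response.headers.get('content-type') || '';
    return response.ok && contentType.includes('json');
  } catch {
    return false;
  }
}

// Usage: bail out before initializing the pipeline
if (!(await modelFilesReachable('Xenova/distilgpt2'))) {
  console.warn('Model files not reachable; skipping client-side AI');
}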

I tried multiple models, thinking some might work:

const modelsToTry = [
  { name: 'TinyLlama', task: 'text-generation', model: 'Xenova/TinyLlama-1.1B-Chat-v1.0' },
  { name: 'GPT-2', task: 'text-generation', model: 'gpt2' },
  { name: 'DistilGPT-2', task: 'text-generation', model: 'Xenova/distilgpt2' },
  { name: 'FLAN-T5-small', task: 'text2text-generation', model: 'Xenova/flan-t5-small' }
];
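
The loading loop over that list is nothing sophisticated: try each candidate and keep the first pipeline that initializes. A simplified sketch, not my exact code:

let generator = null;

for (const candidate of modelsToTry) {
  try {
    console.log(`Trying ${candidate.name}...`);
    generator = await pipeline(candidate.task, candidate.model);
    console.log(`${candidate.name} loaded`);
    break; // keep the first model that loads successfully
  } catch (error) {
    console.warn(`${candidate.name} failed to load:`, error.message);
  }
}

if (!generator) {
  throw new Error('No client-side model could be loaded');
}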

I even tried configuring the CDN manually:

import { pipeline, env } from '@xenova/transformers';

env.allowLocalModels = false;
env.allowRemoteModels = true;
env.remoteURL = 'https://cdn.huggingface.co/';

Nothing worked. Many models advertised as browser-compatible simply didn't exist, or their download links were broken. At this point, I had already spent two days just trying to get a model to download.
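
In hindsight, part of the problem may have been the download configuration itself: newer releases of @xenova/transformers appear to expose env.remoteHost and env.remotePathTemplate rather than env.remoteURL, and the old cdn.huggingface.co endpoint seems to have been deprecated. Something like the following might fare better, though I never got to verify it:

import { pipeline, env } from '@xenova/transformers';

// Assumed settings for newer releases: point the library at the Hugging Face Hub directly
env.allowLocalModels = false;
env.allowRemoteModels = true;
env.remoteHost = 'https://huggingface.co/';
env.remotePathTemplate = '{model}/resolve/{revision}/';

const generator = await pipeline('text-generation', 'Xenova/distilgpt2');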


Phase 2: The Memory Massacre

Finally, I managed to download some models. I was hopeful, but the next problem appeared immediately: memory crashes.

const output = await generator("creepy crawly crabs can be creative");

Console error:

transformerService.js:89 An error occurred during model execution: "RangeError: offset is out of bounds"

This happened even with a six-word input.

I tried every parameter tweak I could imagine:

// Conservative attempt
const output = await generator(prompt, {
  max_new_tokens: 100,
  temperature: 0.7,
  do_sample: true,
});

// More conservative
const output = await generator(prompt, {
  max_length: 150,
  temperature: 0.1,
  do_sample: false,
});

// Minimal
const output = await generator(prompt, {
  max_length: 80,
  temperature: 0.1,
  do_sample: false,
});

// Default call
const output = await generator(prompt);

No combination worked. Browser memory limits caused tensor allocation errors internally. Even “tiny” models were too big for client-side execution. This problem was unfixable from my side.
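
If you do want to attempt client-side inference anyway, the one mitigation I can suggest is failing fast: estimate the device's memory before loading anything, and wrap execution so a RangeError degrades gracefully instead of breaking the page. A rough sketch relying on navigator.deviceMemory, which is Chrome-only and capped at 8 GB, so treat it as best-effort:

function roughMemoryBudgetGB() {
  // navigator.deviceMemory is non-standard (Chrome); fall back to a pessimistic guess elsewhere
  return (typeof navigator !== 'undefined' && navigator.deviceMemory) || 2;
}

async function safeGenerate(generator, prompt) {
  if (roughMemoryBudgetGB() < 4) {
    console.warn('Device looks memory-constrained; skipping client-side inference');
    return null;
  }
  try {
    return await generator(prompt);
  } catch (error) {
    if (error instanceof RangeError) {
      console.warn('Tensor allocation failed in the browser:', error.message);
      return null; // let the caller fall back to a server call
    }
    throw error;
  }
}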


Phase 3: The Tokenization Terror

For models that loaded without crashing, I encountered another bizarre problem: tokenizer errors.

transformerService.js:132 Model execution failed: text.split is not a function
TypeError: text.split is not a function

I triple-checked the input:

async generateText(prompt, options = {}) {
  // Coerce to a string, trim, and cap the length before handing it to the pipeline
  const stringPrompt = String(prompt || '').trim();
  const truncatedPrompt = stringPrompt.length > 200
    ? stringPrompt.substring(0, 200) + "..."
    : stringPrompt;
  const finalPrompt = String(truncatedPrompt);

  const output = await generator(finalPrompt);
  return output;
}

Console confirmed my input was always a string:

String prompt type: string "Process sentence: creepy crawly crabs can be creative"
Final prompt type: string "Process sentence: creepy crawly crabs can be creative"

Yet transformer.js still crashed. Internal bugs meant the library was passing non-strings to its own tokenizer, regardless of what I provided.

Even the simplest test failed:

try {
  const testOutput = await generator({
    question: "What is this?",
    context: "This is a test."
  });
} catch (testError) {
  console.error('Even minimal test failed:', testError);
}

At this point, I realized the library itself was fundamentally broken for client-side usage.


Phase 4: The Desperate Pivot

I tried switching to simpler models like classification or embeddings:

const modelsToTry = [
  { name: 'Sentiment', task: 'sentiment-analysis', model: 'Xenova/distilbert-base-uncased-finetuned-sst-2-english' },
  { name: 'Embeddings', task: 'feature-extraction', model: 'Xenova/all-MiniLM-L6-v2' },
];

They loaded successfully, which gave me hope. Maybe I could build my logic on top of sentiment analysis.

But as soon as I executed them, the same memory errors appeared:

transformerService.js:130 An error occurred during model execution: "RangeError: offset is out of bounds"

Even a fallback approach that converted sentiment output into speech therapy guidance proved impossible:

convertSentimentToSpeech(sentimentOutput, originalPrompt) {
  const phrase = this.extractPhrase(originalPrompt);

  if (originalPrompt.includes('Process sentence')) {
    return `For "${phrase}": Practice slow, controlled breathing. Break the phrase into smaller chunks.`;
  }

  if (originalPrompt.includes('Rewrite for smoother speech')) {
    const words = phrase.split(' ');
    const alternatives = [
      `1. ${words.slice(0, Math.ceil(words.length / 2)).join(' ')}`,
      `2. ${phrase.toLowerCase()}`,
      `3. ${phrase.replace(/\b(and|but)\b/g, ',')}`
    ];
    return alternatives.join('\n');
  }

  return `Guidance for "${phrase}": Use strategies and breathing techniques.`;
}
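
The extractPhrase helper referenced above isn't shown here; it just pulls the target phrase out of the prompt. A hypothetical version, assuming prompts in this project always put the phrase in quotes or after the last colon:

extractPhrase(prompt) {
  // Hypothetical helper: prefer a quoted phrase, otherwise take everything after the last colon
  const quoted = prompt.match(/"([^"]+)"/);
  if (quoted) return quoted[1];
  const colonIndex = prompt.lastIndexOf(':');
  return colonIndex >= 0 ? prompt.slice(colonIndex + 1).trim() : prompt.trim();
}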

No matter what I tried, the models crashed before returning any output.


Phase 5: The Breaking Point

At this point, I made the hard decision: remove transformer.js entirely.

What had been 350+ lines of complex client-side code with fallbacks, initialization, and error handling became a simple, reliable server-only API call:

class TransformerService {
  constructor() {}

  async rephrase(prompt, language = "en-US") {
    const response = await fetch('http://localhost:8000/rephrase', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, language }),
    });
    const data = await response.json();
    return data.response;
  }

  async fluencyCheck(inputText, language = "en-US") {
    const response = await fetch('http://localhost:8000/fluency-check', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: inputText, language }),
    });
    const data = await response.json();
    return data.response;
  }
}
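
The frontend code below also calls a pacingAdvice helper; it follows the exact same pattern as the two methods above. A sketch, assuming a matching /pacing-advice endpoint on the same server:

  async pacingAdvice(prompt, language = "en-US") {
    // Assumes a /pacing-advice endpoint shaped like /rephrase on the backend
    const response = await fetch('http://localhost:8000/pacing-advice', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, language }),
    });
    const data = await response.json();
    return data.response;
  }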

And the frontend logic simplified dramatically:

const getSuggestionsForPromptBox = useCallback(async () => {
  try {
    if (!prompt.trim()) return;
    setLoadingPromptBox(true);

    const [suggestionsResponse, pacingResponse] = await Promise.all([
      transformerService.rephrase(prompt, selectedLanguage),
      transformerService.pacingAdvice(prompt, selectedLanguage)
    ]);

    const lines = suggestionsResponse.split('\n')
      .filter(line => line.trim())
      .map(line => line.trim());
    setPromptSuggestions(lines);
    setPromptPacingAdvice(pacingResponse);
  } catch (error) {
    console.error('Error in getSuggestionsForPromptBox:', error);
    setPromptSuggestions([]);
    setPromptPacingAdvice('');
  } finally {
    setLoadingPromptBox(false);
  }
}, [prompt, selectedLanguage]);

No more hybrid fallbacks. No more crashes. No more internal library errors.


Lessons Learned

  • Browser AI is not ready for production
      ◦ Memory limits make even small models crash
      ◦ Internal library bugs are unfixable from user code
  • Model availability is unreliable
      ◦ Many “browser-compatible” models don’t exist
      ◦ Download links break frequently
      ◦ Documentation is often misleading
  • Debugging is nearly impossible
      ◦ Errors happen deep in minified library code
      ◦ Stack traces rarely point to actionable fixes
  • Server-side AI works reliably
      ◦ High performance and consistent responses
      ◦ No browser memory constraints
      ◦ Easier to maintain and debug

The Server-Side Alternative

Switching to a server-only AI implementation made everything simpler and more reliable. Using FastAPI and the Ollama Python client:

from fastapi import FastAPI, Request
import ollama

app = FastAPI()

@app.post("/rephrase")
async def rephrase_input(request: Request):
    data = await request.json()
    prompt = data.get("prompt")

    system_prompt = (
        f"Rewrite this sentence to be easier to say: \"{prompt}\""
    )

    response = ollama.generate(model="llama3", prompt=system_prompt)
    return {"response": response["response"]}

Ten minutes to implement. Worked perfectly every time.


Conclusion

After weeks of struggling with client-side AI:

  • Text generation models: Complete failure
  • Classification models: Load but crash during execution
  • Server-side AI: Works reliably

Sometimes the best engineering decision is knowing when to quit fighting the technology. In my project, abandoning client-side AI and going server-only delivered:

  • Simplicity
  • Reliability
  • Maintainability
  • Consistent, high-quality AI responses

The dream of fully private, client-side AI remains just that—a dream. But practical AI that works every time is achievable when moved to the server.
