My Adventures with Client-Side AI Models: Lessons from Trying Transformer.js

An honest, step-by-step account of why client-side AI failed in my personal project and how switching to a server-only approach saved me weeks of frustration.


The Dream vs. The Reality

When I first started this project, I had an ambitious vision: create an AI system that runs entirely on the client, keeping user data private. My goals were straightforward:

  • Rewrite text into easier-to-pronounce phrases
  • Analyze fluency in real time
  • Give pacing advice for speech
  • Offer interactive roleplay conversations
  • Ensure zero data leaves the user's device

I imagined a system where users could get AI assistance without ever sending sensitive speech or text data to a server. It sounded perfect.

Reality was very different. Weeks of debugging fundamental browser incompatibilities, memory crashes, and internal library bugs made me realize the dream wasn’t ready. Eventually, I had to remove all client-side AI code and switch to a server-only approach.


Phase 1: The Download Disaster

I began with a simple setup using transformer.js:

import { pipeline } from '@xenova/transformers';

// Simple documentation example
const generator = await pipeline('text-generation', 'Xenova/distilgpt2');
const response = await generator(
  "Rewrite this to be easier to say: creepy crawly crabs can be creative"
);

It looked straightforward. The library documentation promised easy browser-based AI, no server required.

But my browser immediately threw errors:

transformerService.js:29 Failed to initialize transformer model:
SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

What happened here?

Transformer.js was attempting to download model files from Hugging Face’s CDN. Instead of receiving JSON model files, it got HTML 404 pages. The library then tried to parse HTML as JSON, causing the syntax error.
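
One thing that would have saved me time is checking up front whether a model's files are actually reachable before calling pipeline(). A minimal sketch, assuming the standard Hugging Face Hub URL layout (https://huggingface.co/{model}/resolve/main/config.json), which may not hold for every model:

async function modelFilesReachable(modelId) {
  // Fetch the model's config.json and verify the response is JSON rather than an HTML error page
  const url = `https://huggingface.co/${modelId}/resolve/main/config.json`;
  try {
    const response = await fetch(url);
    const contentType = response.headers.get('content-type') || '';
    return response.ok && contentType.includes('json');
  } catch {
    return false;
  }
}

// Usage: bail out before initializing the pipeline
if (!(await modelFilesReachable('Xenova/distilgpt2'))) {
  console.warn('Model files not reachable; skipping client-side AI');
}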

I tried multiple models, thinking some might work:

const modelsToTry = [
  { name: 'TinyLlama', task: 'text-generation', model: 'Xenova/TinyLlama-1.1B-Chat-v1.0' },
  { name: 'GPT-2', task: 'text-generation', model: 'gpt2' },
  { name: 'DistilGPT-2', task: 'text-generation', model: 'Xenova/distilgpt2' },
  { name: 'FLAN-T5-small', task: 'text2text-generation', model: 'Xenova/flan-t5-small' }
];
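
The loading loop over that list is nothing sophisticated: try each candidate and keep the first pipeline that initializes. A simplified sketch, not my exact code:

let generator = null;

for (const candidate of modelsToTry) {
  try {
    console.log(`Trying ${candidate.name}...`);
    generator = await pipeline(candidate.task, candidate.model);
    console.log(`${candidate.name} loaded`);
    break; // keep the first model that loads successfully
  } catch (error) {
    console.warn(`${candidate.name} failed to load:`, error.message);
  }
}

if (!generator) {
  throw new Error('No client-side model could be loaded');
}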

I even tried configuring the CDN manually:

import { pipeline, env } from '@xenova/transformers';

env.allowLocalModels = false;
env.allowRemoteModels = true;
env.remoteURL = 'https://cdn.huggingface.co/';

Nothing worked. Many models advertised as browser-compatible simply didn't exist, or their download links were broken. At this point, I had already spent two days just trying to get a model to download.
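
In hindsight, part of the problem may have been the download configuration itself: newer releases of @xenova/transformers appear to expose env.remoteHost and env.remotePathTemplate rather than env.remoteURL, and the old cdn.huggingface.co endpoint seems to have been deprecated. Something like the following might fare better, though I never got to verify it:

import { pipeline, env } from '@xenova/transformers';

// Assumed settings for newer releases: point the library at the Hugging Face Hub directly
env.allowLocalModels = false;
env.allowRemoteModels = true;
env.remoteHost = 'https://huggingface.co/';
env.remotePathTemplate = '{model}/resolve/{revision}/';

const generator = await pipeline('text-generation', 'Xenova/distilgpt2');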


Phase 2: The Memory Massacre

Finally, I managed to download some models. I was hopeful, but the next problem appeared immediately: memory crashes.

const output = await generator("creepy crawly crabs can be creative");

Console error:

transformerService.js:89 An error occurred during model execution: "RangeError: offset is out of bounds"

This happened even with a six-word input.

I tried every parameter tweak I could imagine:

// Conservative attempt
const output = await generator(prompt, {
  max_new_tokens: 100,
  temperature: 0.7,
  do_sample: true,
});

// More conservative
const output = await generator(prompt, {
  max_length: 150,
  temperature: 0.1,
  do_sample: false,
});

// Minimal
const output = await generator(prompt, {
  max_length: 80,
  temperature: 0.1,
  do_sample: false,
});

// Default call
const output = await generator(prompt);

No combination worked. Browser memory limits caused tensor allocation errors internally. Even “tiny” models were too big for client-side execution. This problem was unfixable from my side.
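
If you do want to attempt client-side inference anyway, the one mitigation I can suggest is failing fast: estimate the device's memory before loading anything, and wrap execution so a RangeError degrades gracefully instead of breaking the page. A rough sketch relying on navigator.deviceMemory, which is Chrome-only and capped at 8 GB, so treat it as best-effort:

function roughMemoryBudgetGB() {
  // navigator.deviceMemory is non-standard (Chrome); fall back to a pessimistic guess elsewhere
  return (typeof navigator !== 'undefined' && navigator.deviceMemory) || 2;
}

async function safeGenerate(generator, prompt) {
  if (roughMemoryBudgetGB() < 4) {
    console.warn('Device looks memory-constrained; skipping client-side inference');
    return null;
  }
  try {
    return await generator(prompt);
  } catch (error) {
    if (error instanceof RangeError) {
      console.warn('Tensor allocation failed in the browser:', error.message);
      return null; // let the caller fall back to a server call
    }
    throw error;
  }
}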


Phase 3: The Tokenization Terror

For models that loaded without crashing, I encountered another bizarre problem: tokenizer errors.

transformerService.js:132 Model execution failed: text.split is not a function
TypeError: text.split is not a function

I triple-checked the input:

async generateText(prompt, options = {}) {
  // Coerce to a string, trim, and cap the length before handing it to the pipeline
  const stringPrompt = String(prompt || '').trim();
  const truncatedPrompt = stringPrompt.length > 200
    ? stringPrompt.substring(0, 200) + "..."
    : stringPrompt;
  const finalPrompt = String(truncatedPrompt);

  const output = await generator(finalPrompt);
  return output;
}

Console confirmed my input was always a string:

String prompt type: string "Process sentence: creepy crawly crabs can be creative"
Final prompt type: string "Process sentence: creepy crawly crabs can be creative"

Yet transformer.js still crashed. Internal bugs meant the library was passing non-strings to its own tokenizer, regardless of what I provided.

Even the simplest test failed:

try {
  const testOutput = await generator({
    question: "What is this?",
    context: "This is a test."
  });
} catch (testError) {
  console.error('Even minimal test failed:', testError);
}

At this point, I realized the library itself was fundamentally broken for client-side usage.


Phase 4: The Desperate Pivot

I tried switching to simpler models like classification or embeddings:

const modelsToTry = [
  { name: 'Sentiment', task: 'sentiment-analysis', model: 'Xenova/distilbert-base-uncased-finetuned-sst-2-english' },
  { name: 'Embeddings', task: 'feature-extraction', model: 'Xenova/all-MiniLM-L6-v2' },
];

They loaded successfully, which gave me hope. Maybe I could build my logic on top of sentiment analysis.

But as soon as I executed them, the same memory errors appeared:

transformerService.js:130 An error occurred during model execution: "RangeError: offset is out of bounds"

Even a fallback approach that converted sentiment output into speech therapy guidance proved impossible:

convertSentimentToSpeech(sentimentOutput, originalPrompt) {
  const phrase = this.extractPhrase(originalPrompt);

  if (originalPrompt.includes('Process sentence')) {
    return `For "${phrase}": Practice slow, controlled breathing. Break the phrase into smaller chunks.`;
  }

  if (originalPrompt.includes('Rewrite for smoother speech')) {
    const words = phrase.split(' ');
    const alternatives = [
      `1. ${words.slice(0, Math.ceil(words.length / 2)).join(' ')}`,
      `2. ${phrase.toLowerCase()}`,
      `3. ${phrase.replace(/\b(and|but)\b/g, ',')}`
    ];
    return alternatives.join('\n');
  }

  return `Guidance for "${phrase}": Use strategies and breathing techniques.`;
}
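
The extractPhrase helper referenced above isn't shown here; it just pulls the target phrase out of the prompt. A hypothetical version, assuming prompts in this project always put the phrase in quotes or after the last colon:

extractPhrase(prompt) {
  // Hypothetical helper: prefer a quoted phrase, otherwise take everything after the last colon
  const quoted = prompt.match(/"([^"]+)"/);
  if (quoted) return quoted[1];
  const colonIndex = prompt.lastIndexOf(':');
  return colonIndex >= 0 ? prompt.slice(colonIndex + 1).trim() : prompt.trim();
}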

No matter what I tried, the models crashed before returning any output.


Phase 5: The Breaking Point

At this point, I made the hard decision: remove transformer.js entirely.

What had been 350+ lines of complex client-side code with fallbacks, initialization, and error handling became a simple, reliable server-only API call:

class TransformerService {
  constructor() {}

  async rephrase(prompt, language = "en-US") {
    const response = await fetch('http://localhost:8000/rephrase', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, language }),
    });
    const data = await response.json();
    return data.response;
  }

  async fluencyCheck(inputText, language = "en-US") {
    const response = await fetch('http://localhost:8000/fluency-check', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input: inputText, language }),
    });
    const data = await response.json();
    return data.response;
  }
}
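
The frontend code below also calls a pacingAdvice helper; it follows the exact same pattern as the two methods above. A sketch, assuming a matching /pacing-advice endpoint on the same server:

  async pacingAdvice(prompt, language = "en-US") {
    // Assumes a /pacing-advice endpoint shaped like /rephrase on the backend
    const response = await fetch('http://localhost:8000/pacing-advice', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt, language }),
    });
    const data = await response.json();
    return data.response;
  }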

And the frontend logic simplified dramatically:

const getSuggestionsForPromptBox = useCallback(async () => {
  try {
    if (!prompt.trim()) return;
    setLoadingPromptBox(true);

    const [suggestionsResponse, pacingResponse] = await Promise.all([
      transformerService.rephrase(prompt, selectedLanguage),
      transformerService.pacingAdvice(prompt, selectedLanguage)
    ]);

    const lines = suggestionsResponse.split('\n')
      .filter(line => line.trim())
      .map(line => line.trim());
    setPromptSuggestions(lines);
    setPromptPacingAdvice(pacingResponse);
  } catch (error) {
    console.error('Error in getSuggestionsForPromptBox:', error);
    setPromptSuggestions([]);
    setPromptPacingAdvice('');
  } finally {
    setLoadingPromptBox(false);
  }
}, [prompt, selectedLanguage]);

No more hybrid fallbacks. No more crashes. No more internal library errors.


Lessons Learned

  • Browser AI is not ready for production
      ◦ Memory limits make even small models crash
      ◦ Internal library bugs are unfixable from user code
  • Model availability is unreliable
      ◦ Many “browser-compatible” models don’t exist
      ◦ Download links break frequently
      ◦ Documentation is often misleading
  • Debugging is nearly impossible
      ◦ Errors happen deep in minified library code
      ◦ Stack traces rarely point to actionable fixes
  • Server-side AI works reliably
      ◦ High performance and consistent responses
      ◦ No browser memory constraints
      ◦ Easier to maintain and debug

The Server-Side Alternative

Switching to a server-only AI implementation made everything simpler and more reliable. Using FastAPI and the Ollama Python client:

from fastapi import FastAPI, Request
import ollama

app = FastAPI()

@app.post("/rephrase")
async def rephrase_input(request: Request):
    data = await request.json()
    prompt = data.get("prompt")

    system_prompt = (
        f"Rewrite this sentence to be easier to say: \"{prompt}\""
    )

    response = ollama.generate(model="llama3", prompt=system_prompt)
    return {"response": response["response"]}

Ten minutes to implement. Worked perfectly every time.


Conclusion

After weeks of struggling with client-side AI:

  • Text generation models: Complete failure
  • Classification models: Load but crash during execution
  • Server-side AI: Works reliably

Sometimes the best engineering decision is knowing when to quit fighting the technology. In my project, abandoning client-side AI and going server-only delivered:

  • Simplicity
  • Reliability
  • Maintainability
  • Consistent, high-quality AI responses

The dream of fully private, client-side AI remains just that—a dream. But practical AI that works every time is achievable when moved to the server.
