Maksim Balabash

How to Create a Browser Extension that Runs an LLM Locally

The Internet has always felt like a bit of a crazy person, but in a good way. It still does, though after the expansion of recent years, it seems like that person is going through some mental health issues (or a transformation, like puberty).

When reading online content today, it's unfortunately common to realize that the text has a hidden agenda and is trying to pull you into someone's camp. And AI has only made the situation worse.

However, AI creates numerous opportunities, leading to the emergence of new tools and services daily.

WebLLM is one such tool: it makes it incredibly easy to run an LLM locally in your browser. We will create a Chrome extension that tries to uncover the ideological markers, promoted values, emotional manipulations, and logical fallacies behind the texts we read (here's a link to the repository).

We want an extension that:

  • extracts content from the page, a selected fragment, or textual information from an image
  • analyzes the content and identifies:
    • ideological influences
    • promoted values
    • emotional manipulations
    • logical fallacies
  • presents a concise report summarizing its findings

What can extensions do?

  • modify and enhance pages
  • read and analyze content of a page
  • automate browser tasks
  • interact with browser tabs and windows
  • integrate third-party APIs or services
  • provide UI within the browser
  • run tasks in the background
  • enhance security and privacy
  • etc.

What limitations do extensions have?

  • can't directly access the computer's files outside of special user interactions
  • extensions must request permissions to access certain data (pages, cookies, history), ensuring privacy and transparency
  • they run within the browser context and cannot modify system settings directly

At this point, we are good to go. Gathering data from a page, processing it in the background, and displaying insights in the popup is possible.

Anatomy of an extension

  1. Manifest
    • defines permissions, scripts, settings, resources, and security policies (see the sketch after this list)
  2. Background script
    • runs initially upon extension installation or browser startup
    • continuously listens for events, manages extension-wide tasks, and maintains global state
    • can be paused or stopped by the browser after inactivity
  3. Content script
    • injected automatically into specified pages whenever such a page loads or reloads
    • has direct access to the page's DOM, allowing modifications or reading of page data
    • destroyed when the page is closed or refreshed
  4. Popup script
    • controls the popup UI
    • runs when users click your extension's toolbar icon inside its own separate, temporary context
    • immediately terminates when the popup is closed
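
These pieces are wired together in the manifest. As a rough sketch (the field values here are illustrative, not copied from the repo), a Manifest V3 file for an extension like this might look like the following manifest.json:

{
  "manifest_version": 3,
  "name": "Undoctrinator",
  "version": "1.0.0",
  "permissions": ["storage", "activeTab", "scripting", "contextMenus"],
  "background": {
    "service_worker": "background.js",
    "type": "module"
  },
  "action": {
    "default_popup": "popup.html"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"]
    }
  ]
}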

WebLLM

WebLLM is a project built upon MLC LLM (Machine Learning Compilation for Large Language Models).

MLC LLM compiles and runs code on a unified, high-performance, cross-platform LLM inference engine, leveraging hardware acceleration (like GPUs or optimized CPU instructions) to ensure faster inference times.

This is how they describe themselves: "The mission of this project is to enable everyone to develop, optimize, and deploy AI models natively on everyone's platforms." Kudos to them!

[Figure: visualisation of how MLC LLM works]

WebLLM includes all the essential features you might need, such as in-browser inference, full OpenAI API compatibility, extensive model support, custom model integration, streaming, and more.

Three interesting directions that this tool opens up:

  • we can enhance privacy by analyzing sensitive data on-device, such as personal notes, emails, or private documents, without sending any information externally
  • our apps can deliver advanced AI services even in offline situations or areas with low connectivity
  • as models improve, we can personalize experiences and offer more interactive features to users without incurring additional costs for cloud-based AI inference.

Now that we have covered all the basics, let's move on to the implementation!

Choosing the Tech Stack

  • Manifest v3 (to comply with modern browser standards)
  • WebLLM (for executing ML models directly in-browser)
  • Tesseract.js (for OCR capabilities to extract text from images)
  • @mozilla/readability (to extract cleaned text from a page's DOM)
  • TypeScript (to make our life easier)
  • Vite (to build and bundle project files)

Model controller

Although we can run models locally, it would be nice to support OpenAI and Anthropic models as well. This would allow us to test additional models and keep our extension usable on older devices (which may perform poorly when running LLMs on their hardware).

We can accomplish this by using the strategy pattern. We will put specific details of each implementation into separate classes, all of which will implement the same abstract interface that defines the model API, making their objects interchangeable.

This is the interface that every model provider must implement:

export interface LLMStrategy {
  id: string
  contextWindowSize: number
  getCompletions(request: CompletionRequest): Promise<CompletionResponse>
  stopGeneration(): void
}
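The CompletionRequest and CompletionResponse types are never shown in this post. Based on how they are used below, a plausible shape (my assumption, not the repo's exact definition) would be:

// Hypothetical shapes, inferred from usage in this post
export interface CompletionRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[]
  temperature?: number
  maxTokens?: number
}

export interface CompletionResponse {
  content: string
}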

This is our entry point when working with the model:

// src/scripts/model.ts

export class ModelController {
  private strategy: LLMStrategy | null = null

  constructor(strategy?: LLMStrategy) {
    if (strategy) {
      this.strategy = strategy
    }
  }

  setStrategy(strategy: LLMStrategy) {
    this.strategy = strategy
  }

  getContextWindowSize() {
    if (!this.strategy) {
      throw new Error('To obtain the context window size, the model strategy must be set')
    }

    return this.strategy.contextWindowSize
  }

  getModelId() {
    if (!this.strategy) {
      throw new Error('To obtain the model id, the model strategy must be set')
    }

    return this.strategy.id
  }

  async getCompletions(request: CompletionRequest): Promise<CompletionResponse> {
    if (!this.strategy) {
      throw new Error('To get completions, the model strategy must be set')
    }

    try {
      // await here so that rejections are caught by this try/catch
      return await this.strategy.getCompletions({ temperature: 0.99, maxTokens: 4000, ...request })
    } catch (error) {
      throw new Error('Failed to get completions', { cause: error })
    }
  }

  stopGeneration() {
    if (!this.strategy) {
      throw new Error('To stop completions generation, the model strategy must be set')
    }

    return this.strategy.stopGeneration()
  }
}

And here's how we use it:

// src/scripts/model.ts
const model = new ModelController()

if (
  !config.modelToUse.length ||
  config.modelToUse === 'WEBLLM_LLAMA' ||
  config.modelToUse === 'WEBLLM_QWEN' ||
  config.modelToUse === 'WEBLLM_DEEPSEEK_LLAMA' ||
  config.modelToUse === 'WEBLLM_MISTRAL'
) {
  const localModelStrategy = new WebLLMStrategy()
  await localModelStrategy.initModel(config.modelToUse)
  model.setStrategy(localModelStrategy)
} else if (config.modelToUse === 'OPENAI') {
  model.setStrategy(new OpenAIStrategy(config.openAiApiKey))
} else if (config.modelToUse === 'ANTHROPIC') {
  model.setStrategy(new AnthropicStrategy(config.anthropicApiKey))
}

...

const response = await model.getCompletions({ ... })

The optimal way to use models through WebLLM inside an extension is CreateExtensionServiceWorkerMLCEngine, which supports caching, WebGPU, and more.

First, we need to register the worker in the background script of our extension:

// src/scripts/background.ts
... 

let webLLMWorker: ExtensionServiceWorkerMLCEngineHandler | undefined

chrome.runtime.onConnect.addListener(port => {
  if (webLLMWorker === undefined) {
    webLLMWorker = new ExtensionServiceWorkerMLCEngineHandler(port)
  } else {
    webLLMWorker.setPort(port)
  }

  port.onMessage.addListener(message => {
    webLLMWorker?.onmessage.bind(webLLMWorker)(message)
  })
})

...

Here's a lightweight example of how this can be used:

const webllm: MLCEngineInterface = await CreateExtensionServiceWorkerMLCEngine('Llama-3.2-3B-Instruct-q4f16_1-MLC')

const completion = await webllm.chat.completions.create({
  stream: true,
  temperature: 1,
  response_format: { type: 'json_object' },
  messages: [...]
})
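Since stream: true is set, the returned completion is an async iterable of OpenAI-style chunks, so consuming it looks something like this:

let reply = ''
for await (const chunk of completion) {
  // each chunk carries a small delta of the generated text
  reply += chunk.choices[0]?.delta?.content || ''
}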

That's all we need. Yep, it's as simple as that!

OpenAIStrategy and AnthropicStrategy are thin wrappers: each builds its provider-specific request objects and calls the corresponding API.

You can find the complete code for implementing each model strategy here (it's pretty straightforward).
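
To give you a taste, here is a rough sketch of what an OpenAIStrategy could look like, assuming the hypothetical CompletionRequest/CompletionResponse shapes from earlier (the repo has the real implementation):

// Illustrative sketch, not the repo's exact code
export class OpenAIStrategy implements LLMStrategy {
  id = 'gpt-4o-mini' // illustrative model id
  contextWindowSize = 128_000
  private abortController: AbortController | null = null

  constructor(private apiKey: string) {}

  async getCompletions(request: CompletionRequest): Promise<CompletionResponse> {
    this.abortController = new AbortController()

    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.apiKey}`
      },
      body: JSON.stringify({
        model: this.id,
        messages: request.messages,
        temperature: request.temperature,
        max_tokens: request.maxTokens
      }),
      signal: this.abortController.signal
    })

    if (!response.ok) {
      throw new Error(`OpenAI API error: ${response.status}`)
    }

    const data = await response.json()
    return { content: data.choices[0].message.content }
  }

  stopGeneration() {
    this.abortController?.abort()
  }
}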

Settings

[Figure: settings UI of the Undoctrinator app]

Here are two code fragments that update the extension's settings in browser storage and render the actual configuration:

// src/scripts/settings.ts

export class ConfigManager {
  private static readonly POSSIBLE_MODELS: ModelType[] = [
    'WEBLLM_LLAMA',
    'WEBLLM_MISTRAL',
    'WEBLLM_QWEN',
    'WEBLLM_DEEPSEEK_LLAMA',
    'OPENAI',
    'ANTHROPIC'
  ]

  static validateModelType(model: string): model is ModelType {
    return this.POSSIBLE_MODELS.includes(model as ModelType)
  }

  static async saveModelChoice(model: ModelType): Promise<void> {
    await chrome.storage.sync.set({ modelToUse: model })
  }

  static async saveApiKey(provider: 'openai' | 'anthropic', key: string): Promise<void> {
    const storageKey = provider === 'openai' ? 'openAiApiKey' : 'anthropicApiKey'
    await chrome.storage.sync.set({ [storageKey]: key })
  }
}

export class SettingsUIManager {
  private elements: {
    modelInputs: NodeListOf<HTMLInputElement>
    openAiInput: HTMLInputElement
    anthropicInput: HTMLInputElement
  }

  constructor() {
    const modelInputs = document.querySelectorAll<HTMLInputElement>('input[name="modelToUse"]')
    const openAiInput = document.getElementById('openAiApiKeyInput') as HTMLInputElement
    const anthropicInput = document.getElementById('anthropicApiKeyInput') as HTMLInputElement

    if (!modelInputs.length || !openAiInput || !anthropicInput) {
      throw new Error('Required UI elements not found')
    }

    this.elements = { modelInputs, openAiInput, anthropicInput }
  }

  initializeUI(config: ExtensionConfig): void {
    this.elements.modelInputs.forEach(input => {
      input.checked = input.value === config.modelToUse
    })
    this.elements.openAiInput.value = config.openAiApiKey
    this.elements.anthropicInput.value = config.anthropicApiKey

    this.updateInputStates(config.modelToUse)
  }

  setupEventListeners(): void {
    this.elements.modelInputs.forEach(input => {
      input.addEventListener('change', () => this.handleModelChange(input))
    })

    this.elements.openAiInput.addEventListener('change', e =>
      this.handleApiKeyChange('openai', (e.target as HTMLInputElement).value)
    )

    this.elements.anthropicInput.addEventListener('change', e =>
      this.handleApiKeyChange('anthropic', (e.target as HTMLInputElement).value)
    )
  }

  private updateInputStates(modelToUse: ModelType): void {
    const isWebLLM = modelToUse.startsWith('WEBLLM_')

    this.elements.openAiInput.disabled = isWebLLM || modelToUse === 'ANTHROPIC'
    this.elements.anthropicInput.disabled = isWebLLM || modelToUse === 'OPENAI'
  }

  private async handleModelChange(input: HTMLInputElement): Promise<void> {
    if (input.checked && ConfigManager.validateModelType(input.value)) {
      await ConfigManager.saveModelChoice(input.value as ModelType)
      this.updateInputStates(input.value as ModelType)
    }
  }

  private async handleApiKeyChange(provider: 'openai' | 'anthropic', value: string): Promise<void> {
    await ConfigManager.saveApiKey(provider, value)
  }
}

We can now easily switch between different models for convenience and to evaluate the output quality of each one.
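
The popup reads this configuration back via a getExtensionConfigFromStorage helper (used in the popup code below). Here is a minimal sketch, assuming the ExtensionConfig fields seen in this post (the repo's version may differ):

// Hypothetical minimal version
export async function getExtensionConfigFromStorage(): Promise<ExtensionConfig> {
  // chrome.storage.sync.get fills in these defaults for missing keys
  const defaults = { modelToUse: '', openAiApiKey: '', anthropicApiKey: '' }
  const stored = await chrome.storage.sync.get(defaults)
  return stored as ExtensionConfig
}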

Content extraction

To retrieve any information from the page, we need to request it from the content or background script, which will then pass the data back to the popup context, where we can process it. That's how things work in extensions.

We request it by emitting specific events that the scripts treat as commands. Our extension has three trigger buttons (two in the popup and one in the context menu) that fire those events. As a result, the text analysis function is called with the user's data:

// src/scripts/user-input.ts

...

export class UserInputController {
  ...

  handleImageInput = async (event: Event, onDataReady: analyzeContentFn) => {
    const input = event.target as HTMLInputElement
    const files = input.files
    if (!files || !this.ocrWorker) {
      return
    }

    const response = await this.ocrWorker.recognize(files[0])
    const content = removeExtraWhitespaces(response.data.text || '')

    await onDataReady(content, 'image')
  }

  setupEventListenerForTextSelection = (onDataReady: analyzeContentFn) => {
    const processSelection = async (selection: Record<string, unknown>) => {
      if (selection && selection.text && typeof selection.text === 'string') {
        await onDataReady(removeExtraWhitespaces(selection.text), 'selection')
      }
    }

    chrome.runtime.onMessage.addListener(message => {
      if (message.type === 'pendingSelectionReady') {
        chrome.runtime.sendMessage({ type: 'getPendingSelection' }, processSelection)
      }
    })
  }

  handlePageContentInput = async (onDataReady: analyzeContentFn) => {
    const documentClone = await this.getPageContent()
    const article = new Readability(documentClone).parse()
    const mainContent = removeExtraWhitespaces(
      `${article?.title || ''}\n${article?.textContent || ''}`
    )

    await onDataReady(mainContent, 'page')
  }

  private async getPageContent(timeout: number = 5000): Promise<Document> {
    if (!this.contentScriptConnection) {
      throw new Error('Connection to content script not established')
    }

    return new Promise((resolve, reject) => {
      const timeoutId = setTimeout(
        () => reject(new Error('Request timeout: Unable to retrieve page content')),
        timeout
      )

      const handleResponse = (response: { type: string; contents: unknown }) => {
        clearTimeout(timeoutId)

        const doc = this.domParser.parseFromString(response.contents as string, 'text/html')
        this.contentScriptConnection!.removeMessageHandler('pageContent', handleResponse)

        resolve(doc)
      }

      this.contentScriptConnection!.addMessageHandler('pageContent', handleResponse)
      this.contentScriptConnection!.sendMessage({ type: 'getPageContent' })
    })
  }
}

...

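The other end of that round trip lives in the content script. Assuming the connection wrapper above is backed by a long-lived port, a minimal handler that answers getPageContent might look like this (an illustration, not the repo's exact code):

// src/scripts/content.ts (sketch)
chrome.runtime.onConnect.addListener(port => {
  port.onMessage.addListener(message => {
    if (message.type === 'getPageContent') {
      // serialize the live DOM so the popup can re-parse it with DOMParser
      port.postMessage({
        type: 'pageContent',
        contents: document.documentElement.outerHTML
      })
    }
  })
})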

Prompt

We reached the point where we can obtain the text the user wants us to process, and we can run local and cloud-based LLMs. Let's create a prompt to guide the model in extracting the insights we need about the text.

// src/scripts/prompt.ts

export function craftInstructions(text: string): Prompt {
  if (!text.trim()) {
    throw new Error('Text parameter is required and cannot be empty')
  }

  const intensityKeywords = ['NONE', 'WEAK', 'MODERATE', 'STRONG', 'EXTREME'].join(' | ')

  const prompt = {
    system: `
        You are a media analysis assistant specializing in identifying manipulation tactics and bias in text.
        Your task is to analyze the provided text and generate a structured, evidence-based report in JSON format.
        Follow the detailed framework below to ensure comprehensive analysis:
        1. Requirements:
          - Evaluate both explicit and subtle manipulation tactics.
          - Base your analysis on clear, evidence-backed reasoning.
          - Include direct quotes for all identified elements.
          - Provide a concise explanation for each identified tactic, justifying why it qualifies as manipulation or bias.
          - Rate intensity levels as: ${intensityKeywords}.
          - Your response must be formatted as a properly structured JSON object.
        2. Analysis Framework:
          A) Claims and Logic:
            - Identify unsupported or unverifiable claims.
            - Detect logical fallacies.
            - Evaluate the coherence and consistency of arguments.
          B) Emotional Manipulations:
            - Recognize sensationalist language and hyperbole.
            - Identify emotional manipulation strategies.
            - Spot oversimplifications of complex issues.
            - Detect clickbait elements or misleading headlines.
            - Highlight loaded terms, charged language, or exaggerated claims.
            - Note out-of-context data, meaningless comparisons, or correlation-causation fallacies.
            - Flag anonymous or vague sources and appeals to false authority.
            - Identify instances of missing context or lack of diverse perspectives.
            - Assess fear-mongering, urgency creation, bandwagon effects, or social proof manipulation.
            - Evaluate false scarcity tactics, gaslighting techniques, guilt-tripping, and in-group vs. out-group framing.
          C) Political Ideology:
            - Detect ideological markers or affiliations.
            - Assess the strength and influence of ideological bias.
          D) Promoted Values:
            - Highlight explicitly or implicitly promoted values or agendas.
        3. Output Format:
          {
            ideologicalStrength: ${intensityKeywords},
            emotionalManipulationStrength: ${intensityKeywords},
            ideologicalMarkers: { quote: string, explanation: string, associatedIdeology: string }[],
            promotedValues: { quote: string, explanation: string, associatedIdeology: string }[],
            emotionalManipulations: { quote: string, explanation: string, targetEmotion: string }[],
            logicalFallacies: { quote: string, explanation: string, fallacyType: string }[],
          }
    `,
    user: `
      TEXT TO ANALYZE:
      """
      ${text}
      """
    `
  }

  prompt.system = removeExtraWhitespaces(prompt.system)
  prompt.user = removeExtraWhitespaces(prompt.user)

  return prompt
}

The full code for generating the model instructions can be found here.

Putting it all together

The main component of almost every extension is its popup, and here is how it looks in our case:

// src/scripts/popup.ts

...

const modelController = new ModelController()
const userInputController = new UserInputController()
const renderer = RenderController.getInstance()

async function initializePopup() {
  try {
    const tab = await getActiveTab()
    chrome.runtime.connect({ name: 'popup' })
    const config = await getExtensionConfigFromStorage()

    // initialize user input controller
    await Promise.all([
      ensureContentScriptInjected(tab.id!),
      userInputController.initTools(tab.id!)
    ])

    // bind input handlers
    renderer.setupEventListeners({
      onPageInput: () => userInputController.handlePageContentInput(analyzeContent),
      onImgInput: event => userInputController.handleImageInput(event, analyzeContent),
      registerSelectionInputHandler: () =>
        userInputController.setupEventListenerForTextSelection(analyzeContent),
      onCancel: () => modelController.stopGeneration()
    })

    // initialize model controller
    await renderer.transition(AppStates.modelInit)
    await modelController.initialize(config)
    await renderer.transition(AppStates.modelReady)

    await renderer.transition(AppStates.idle)
  } catch (error) {
    console.error('Error initializing the app:', error)

    const { name, message } = error as Error
    await renderer.transition(AppStates.error, { name, message })
  }
}

const analyzeContent: analyzeContentFn = async content => {
  try {
    if (!content.length) {
      throw new Error('Content and sourceType are required to perform the analysis')
    }

    const startTime = Date.now()
    await renderer.transition(AppStates.analysis, { initial: true })

    const metrics = await extractMetricsFromContent(content, modelController)
    if (metrics.length === 0 && renderer.currentState === AppStates.idle) {
      // user has canceled the operation
      return
    }

    await renderer.transition(AppStates.report)
    const report = await generateReport(metrics)

    await renderer.transition(AppStates.result, { data: report })
    console.info('Result:', report, 'done in:', (Date.now() - startTime) / 1000, 'seconds')
  } catch (error) {
    console.error('Error analyzing the content:', error)

    const { name, message } = error as Error
    await renderer.transition(AppStates.error, { name, message })
  }
}

window.onload = async () => {
  try {
    await initializePopup()
  } catch (error) {
    console.error('Error initializing app:', error)

    const { name, message } = error as Error
    await renderer.transition(AppStates.error, { name, message })
  }
}

I intentionally left some fragments out to avoid unnecessary details. Here is a link to the repository where you can find the full version of the code.
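
For example, one of the omitted helpers is ensureContentScriptInjected. A plausible minimal version (my assumption — check the repo for the real one) simply re-injects the content script with the chrome.scripting API:

// Hypothetical helper; 'content.js' is an assumed bundle name
async function ensureContentScriptInjected(tabId: number): Promise<void> {
  // re-injection matters for tabs opened before the extension was installed;
  // the real helper might first ping the content script instead
  await chrome.scripting.executeScript({
    target: { tabId },
    files: ['content.js']
  })
}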

The build is done with Vite, which runs the TypeScript compiler, minifies the JavaScript, and copies asset files. Packaging puts all the essential files into a .zip archive.
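
If you haven't set up a multi-entry Vite build before, it boils down to listing each script as a Rollup input. A minimal sketch (the entry paths are illustrative):

// vite.config.ts (sketch)
import { defineConfig } from 'vite'

export default defineConfig({
  build: {
    rollupOptions: {
      input: {
        popup: 'popup.html',
        background: 'src/scripts/background.ts',
        content: 'src/scripts/content.ts'
      },
      output: {
        // emit popup.js, background.js, content.js without hashes,
        // so the manifest can reference them by stable names
        entryFileNames: '[name].js'
      }
    }
  }
})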

Here are some helpful resources to share your extension with others: Chrome Extensions Best Practices, Creating a great listing page, Publish in the Chrome Web Store.

Afterword

I must admit that, after running diverse tests (different models, different texts), I found that LLMs themselves exhibit numerous biases. Various models have shown differing attitudes towards a range of subjects. This is something we will all have to deal with in the future, but that is a topic for another time.

I also noticed some performance issues with most of the models, especially larger ones (>=7B) on an M1 MacBook Pro. I'm not sure whether this is due to current inefficiencies in WebLLM's code or simply how it runs on such hardware at the moment. I didn't have time to dig into it (maybe next time).

Another observation is that I'm terrible at UI design, and this project gave me a lot of trouble trying to create something that would work (although I still don't like how it looks). If you feel my pain and have had a positive experience with UI/UX design automation tools like uizard, usegalileo, etc., I would love to hear about it.

The main takeaway from this project is that tools like WebLLM and transformers.js open up a ton of opportunities. WebNN is on the way, and I'm pretty sure I've missed others.

Have fun hacking, and thanks for your attention 👋
