DEV Community

Wesley Smith
How I Built a Chrome Extension That Summarizes Any Article in 2 Seconds Using AI

You know the workflow. You find a 12-minute article. You want the gist. So you Ctrl+A the whole page, paste it into ChatGPT, type "summarize this," and wait. Ten seconds. Fifteen seconds. Finally, a wall of text comes back that's somehow almost as long as the original article.

Then you do it again on the next tab. And the next one. And by the fourth article, you've spent more time summarizing than you would have spent just reading.

I got tired of this loop. Not because the AI part was bad -- GPT does a fine job summarizing -- but because the workflow was broken. The friction of copy, switch tab, paste, wait, read, switch back... it adds up. So I built a Chrome extension called TLDR that does the whole thing in one click, in about two seconds. Here is what I learned building it.

What TLDR Does

Click the extension icon. Get a summary. That's it.

Behind one click, the extension extracts the article text from the page, sends it to an LLM, and renders a summary with key bullet points in a clean popup. It caches results so revisiting an article is instant. And it gives you 36 different summary "styles" -- four tones, three lengths, three focus areas -- so the output actually matches how you think, not how a generic chatbot defaults.

Architecture: Four Moving Parts

TLDR is a Manifest V3 Chrome extension with four components that talk to each other through message passing:

[Content Script] --> extracts article from DOM
       |
       | chrome.tabs.sendMessage
       v
[Popup Script] --> orchestrates the flow, renders UI
       |
       | chrome.runtime.sendMessage
       v
[Service Worker] --> checks cache, calls AI, stores results
       |
       | fetch()
       v
[Groq API] --> LLaMA 3.1 inference

The content script runs on every page and exposes an article extraction function. The popup is the entry point -- when the user clicks the icon, it asks the content script to extract the article, then sends it to the service worker for summarization. The service worker handles caching, settings, and the actual API call. And Groq's API does the inference.

This separation is not just for cleanliness. Manifest V3 forces it. Content scripts can access the DOM but not extension APIs. Service workers can make API calls but cannot touch the DOM. The popup bridges the two.
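
To make the bridge concrete, here is a minimal sketch of the popup's orchestration step. The message names (`EXTRACT_ARTICLE`, `SUMMARIZE`) and payload shapes are illustrative assumptions, not necessarily what TLDR uses; in MV3 the chrome.* messaging APIs return promises, so the whole flow reads as straight-line async code:

```javascript
// Popup script: bridges the content script (DOM access) and the
// service worker (network access). Message names are illustrative.
async function summarizeCurrentTab() {
  // 1. Find the tab the user is looking at.
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });

  // 2. Ask the content script to extract the article from the DOM.
  const extraction = await chrome.tabs.sendMessage(tab.id, {
    type: 'EXTRACT_ARTICLE',
  });
  if (!extraction.success) return extraction; // e.g. { error: 'not_article' }

  // 3. Hand the text to the service worker, which calls the AI API.
  return chrome.runtime.sendMessage({
    type: 'SUMMARIZE',
    payload: extraction.article,
  });
}
```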

Article Extraction: The Surprisingly Hard Part

The first version of this extension used document.body.innerText. It was terrible. You get nav links, cookie banners, sidebar widgets, comment sections, footer text, ad copy -- basically everything except the actual article.

Naive DOM parsing fails because modern web pages are 80% chrome and 20% content. A typical news article page might have 50,000 characters of HTML, of which maybe 5,000 are the story you want to read.

The solution is Mozilla's Readability.js -- the same library that powers Firefox's Reader View. It uses a scoring algorithm that analyzes DOM nodes by their tag names, class names, content density, and position to identify the most likely "article" element. It is battle-tested on millions of pages.

The extraction pipeline looks like this:

import { Readability, isProbablyReaderable } from '@mozilla/readability';
import DOMPurify from 'dompurify';

export function extractArticle() {
  // Bail early if this page is not article-shaped
  if (!isProbablyReaderable(document)) {
    return { success: false, error: 'not_article' };
  }

  // Clone the DOM so Readability's mutations don't affect the live page
  const documentClone = document.cloneNode(true);

  // Strip noise before Readability even sees it
  documentClone.querySelectorAll('script, style, noscript, iframe')
    .forEach(el => el.remove());

  const reader = new Readability(documentClone, {
    charThreshold: 100,
    keepClasses: false,
    nbTopCandidates: 5,
  });

  const article = reader.parse();
  // ...sanitize with DOMPurify, calculate reading time, return
}

A few things worth noting. First, isProbablyReaderable is a lightweight pre-check -- if you are on a Google search results page or a login form, it rejects fast without doing the expensive parse. Second, we clone the entire document because Readability mutates the DOM during parsing (it removes elements, restructures nodes). If you pass it the live document, the page breaks. Third, even after Readability extracts the article, we run the title through DOMPurify with ALLOWED_TAGS: [] to strip any injected HTML. You would be surprised what some CMSes put in <title> tags.

The charThreshold: 100 setting is worth calling out. The default is 500, which causes Readability to reject shorter articles. Lowering it to 100 means we can summarize brief blog posts that the default config would skip.

The AI Layer: Why Groq, Not OpenAI

The first prototype used OpenAI's API. It worked. It was also slow. A typical summarization call took 8-15 seconds with GPT-3.5 Turbo, and the free tier is... not free. For a browser extension where the entire value proposition is speed, that was a dealbreaker.

Groq runs LLaMA 3.1 8B on custom LPU hardware. Same call, roughly two seconds. And the free tier gives you 30 requests per minute with generous daily limits. For a summarization task where you don't need GPT-4-level reasoning -- you need fast, competent text compression -- it is the right tool.

The API is OpenAI-compatible, so switching was a one-line URL change:

const GROQ_API_URL = 'https://api.groq.com/openai/v1/chat/completions';
const DEFAULT_MODEL = 'llama-3.1-8b-instant';

We request structured JSON output with response_format: { type: 'json_object' }, which means every response comes back as parseable JSON with summary, keyPoints, and tone fields. No regex extraction of markdown. No hoping the model follows your format. Structured output just works.
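
As a sketch, the request looks like any OpenAI-style chat completion with response_format added. The system prompt, temperature, and helper names here are assumptions for illustration; the URL, model, and response_format field match what the extension uses:

```javascript
const GROQ_API_URL = 'https://api.groq.com/openai/v1/chat/completions';

// Build an OpenAI-compatible chat completion request body.
function buildRequestBody(systemPrompt, articleText, model = 'llama-3.1-8b-instant') {
  return {
    model,
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: articleText },
    ],
    // Forces the model to emit parseable JSON.
    response_format: { type: 'json_object' },
  };
}

// Illustrative wrapper around fetch(); error handling kept minimal.
async function summarizeWithGroq(apiKey, systemPrompt, articleText) {
  const res = await fetch(GROQ_API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildRequestBody(systemPrompt, articleText)),
  });
  if (!res.ok) throw new Error(`Groq API error: ${res.status}`);
  const data = await res.json();
  // With json_object mode, the message content is guaranteed-parseable JSON.
  return JSON.parse(data.choices[0].message.content);
}
```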

36 Summary Styles: The Prompt Engineering

Most summarizers give you one output format. TLDR gives you 36 combinations: 4 tones (witty, professional, casual, academic) times 3 lengths (one-liner, brief, detailed) times 3 focus areas (key facts, opinions, implications). Each combination gets a distinct system prompt assembled at runtime.

The prompt builder composes these modularly:

export function buildSystemPrompt(settings = {}) {
  const tonePreset = TONE_PRESETS[settings.tone];
  const lengthPreset = LENGTH_PRESETS[settings.length];
  const focusPreset = FOCUS_PRESETS[settings.focus];

  return `You are TLDR, a brilliant summarizer...

STYLE: ${tonePreset.instruction}
LENGTH: ${lengthPreset.instruction}
FOCUS: ${focusPreset.instruction}
...`;
}

Each preset carries its own instruction text and few-shot examples. The witty preset says "Use wordplay, irony, or unexpected angles." The academic preset says "Acknowledge complexity, use precise terminology." The length presets are aggressively specific because LLMs tend to under-generate:

brief: {
  instruction: 'Write EXACTLY 2-3 complete sentences totaling 30-40 words. '
    + 'You MUST use AT LEAST 30 words. If your first draft is shorter, '
    + 'expand with context, significance, or relevant details.',
  maxTokens: 200,
  targetWords: 35,
}

This specificity came from testing. The comment at the top of the prompts file tells the story: "TUNED based on variation test results (34 articles, 283 API calls)." Early versions would ask for "a brief summary" and get back 8 words. You have to spell out minimums, use words like "MUST" and "NEVER," and give concrete sentence counts. LLMs respect constraints they can count.

One particularly stubborn problem: opening variety. Early testing showed that the model started 70%+ of summaries with "[Topic] is..." -- "AI is transforming healthcare," "The study is groundbreaking." Every summary opened the same way. The fix was adding explicit anti-patterns to the prompt:

BAD patterns to NEVER use:
- "[Topic] is..." or "[Topic] are..." (boring, every AI does this)
- "This article discusses..." (passive, meta)
- "The key takeaways are..." (robotic, predictable)

Combined with positive opening strategies ("Lead with the most surprising finding," "Start with an action or consequence"), this dramatically improved variety.

Manifest V3: The Service Worker Problem

If you have built Chrome extensions before, Manifest V3's service worker model is the biggest architectural change. In MV2, you had a persistent background page. In MV3, the service worker can be terminated at any time when idle.

This has one critical implication for message handling: you must register your onMessage listener at the top level of the service worker, synchronously, during initial execution. If you try to register it inside an async init function or after an await, Chrome might terminate the worker before your listener is set up.

// MUST be at top level for MV3 -- not inside an async function
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  handleMessage(message, sender)
    .then(sendResponse)
    .catch(error => {
      sendResponse({ success: false, error: { ... } });
    });
  return true; // Keep the message channel open for async response
});

That return true is easy to forget and brutal to debug. Without it, the message channel closes before your async handler resolves, and sendResponse silently fails. The popup just hangs.

The message passing architecture uses a typed message system where each message has a type field (SUMMARIZE, GET_CACHED_SUMMARY, SAVE_SETTINGS, etc.) that gets routed through a switch statement. It is simple, explicit, and easy to trace when debugging.
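
A sketch of that router, with stub handlers standing in for the real cache and API logic (the handler bodies here are placeholders, not TLDR's actual implementations):

```javascript
// Stub handlers -- in the real extension these hit the cache and the Groq API.
async function summarize(payload) {
  return { summary: `(${payload.title}) summarized` };
}

async function getCachedSummary(url) {
  return null; // stub: always a cache miss
}

// Route each typed message to its handler through an explicit switch.
async function handleMessage(message) {
  switch (message.type) {
    case 'SUMMARIZE':
      return { success: true, data: await summarize(message.payload) };
    case 'GET_CACHED_SUMMARY':
      return { success: true, data: await getCachedSummary(message.payload.url) };
    default:
      // Unknown types fail loudly instead of hanging the popup.
      return { success: false, error: { code: 'UNKNOWN_MESSAGE_TYPE' } };
  }
}
```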

Smart Caching

Nobody wants to wait two seconds for a summary they already generated. The caching layer uses a hash of the URL as the cache key:

_hashUrl(url) {
  let hash = 0;
  for (let i = 0; i < url.length; i++) {
    const char = url.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash = hash & hash; // Convert to 32-bit integer
  }
  return 'url_' + Math.abs(hash).toString(36);
}

It is a simple 31-multiplier hash (the `(hash << 5) - hash` trick is hash * 31, the same scheme as Java's String.hashCode) -- not cryptographically secure, but fast and collision-resistant enough for 100 cache entries. We use chrome.storage.local for the cache (per-machine, 10MB quota) and chrome.storage.sync for settings (synced across devices, 100KB quota).

Cache entries expire after 24 hours, and when the cache hits 100 entries, the oldest gets evicted. The eviction is simple -- find the entry with the smallest timestamp and delete it. No LRU linked list, no priority queue. For 100 entries, a linear scan is fine.
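
The linear scan is only a few lines. As a sketch, assuming the cache is held as a plain object of entries carrying timestamps:

```javascript
const MAX_CACHE_ENTRIES = 100;

// Evict the entry with the smallest timestamp once the cache is full.
// O(n) scan -- perfectly fine for n = 100.
function evictOldest(cache) {
  const keys = Object.keys(cache);
  if (keys.length < MAX_CACHE_ENTRIES) return null; // still room, nothing to do
  let oldest = keys[0];
  for (const key of keys) {
    if (cache[key].timestamp < cache[oldest].timestamp) oldest = key;
  }
  delete cache[oldest];
  return oldest;
}
```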

When the user hits "Regenerate," we pass forceRefresh: true which bypasses the cache, generates a fresh summary, and overwrites the cached version.
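
Putting the pieces together, the read path might look like the following sketch. The storage helpers are stubbed with an in-memory Map for illustration (the real extension uses chrome.storage.local), and the function names are assumptions:

```javascript
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // entries expire after 24 hours

// In-memory stand-ins for chrome.storage.local, for illustration only.
const store = new Map();
const readCache = async (key) => store.get(key) ?? null;
const writeCache = async (key, value) => { store.set(key, value); };

async function getSummary(url, generate, { forceRefresh = false } = {}) {
  const key = 'url_' + url; // the real code hashes the URL first
  if (!forceRefresh) {
    const cached = await readCache(key);
    if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
      return cached.summary; // fresh hit: no API call
    }
  }
  // Cache miss, stale entry, or explicit "Regenerate": call the AI.
  const summary = await generate(url);
  await writeCache(key, { summary, timestamp: Date.now() });
  return summary;
}
```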

What I Learned

The hard part was extraction, not AI. I spent more time debugging Readability edge cases -- paywalled sites, SPAs that load content after DOMContentLoaded, pages with multiple article-like sections -- than I spent on the AI integration. The Groq API just works. Getting clean text out of the wild web is the real challenge.

Users care about speed more than summary quality. The jump from 10+ seconds (OpenAI) to ~2 seconds (Groq) changed everything. At 10 seconds, people wonder if it is worth the wait. At 2 seconds, it feels instant, and they use it reflexively. The quality difference between GPT-3.5 and LLaMA 3.1 8B for summarization is marginal; the speed difference is transformative.

Groq's free tier is viable for production. Thirty requests per minute with no credit card required. For a browser extension where each user generates maybe 10-20 summaries per day, this is more than sufficient. If you are building a developer tool or personal productivity app, you do not need to spin up your own inference server.

Manifest V3 is an improvement, but the migration pain is real. The service worker lifecycle, the restricted API surface, the new permissions model -- they all push you toward better patterns (no persistent background state, explicit permissions, declarative APIs). But the documentation assumes you already know what changed, and debugging a terminated service worker is not fun.

Try It

TLDR is free and open source.

You need a free Groq API key (takes 30 seconds to get one at console.groq.com). Paste it into the settings page, and you are summarizing articles.

If you build something similar, or have questions about Manifest V3 or Readability.js, I would love to hear about it in the comments.
