DEV Community

tommy


What If the Page Could Detect Data Formats Without Copying? — Adding Page Scan + JWT Detection to a Chrome Extension

The Limits of Clipboard Detection

In my previous article, I built a Chrome extension that auto-detects clipboard data formats. Copy → click icon → detect → open in tool. Handy.

After using it for a while, something bugged me.

I was reading a Qiita article. A JSON API response was displayed in a <pre> block. Deeply nested, hard to read. I wanted to format it.

I selected the JSON with my mouse. Ctrl+C. Clicked the extension icon. Popup opened. "Format JSON" appeared. Clicked. Tool opened and formatted it.

Wait — the data is already on screen. Why do I need to copy it?

There was another problem. Copying JSON from Japanese tech blogs introduced smart quotes (“ ”) and trailing commas. The CMS silently converts straight quotes to "pretty" curly quotes. On screen it looks like "name", but the clipboard contains \u201Cname\u201D, and JSON.parse() fails instantly.

"If I scan the code blocks directly from the page, no copying needed, and no CMS interference."

Built it.


Chrome extension PureMark Detect is available on the Chrome Web Store.

What Changed in v1.2.0

Three new features.

1. Inline Preview

You can now see conversion results right inside the popup. JSON gets formatted, Base64 gets decoded, timestamps get converted. No need to open the full tool when you just want a quick peek.

Copy button included — paste the result anywhere.
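As a rough illustration of what such a preview step could look like, here is a minimal sketch. The function name, the format names, and the seconds-versus-milliseconds rule are assumptions made for this example; the extension's actual preview code is not shown in this article.

```typescript
// Hypothetical format names; the real extension may use different ones.
type PreviewFormat = 'json' | 'base64' | 'timestamp';

// Turn detected text into a short, human-readable preview string.
// Throws if the input is not valid for the given format.
function buildPreview(format: PreviewFormat, text: string): string {
  const input = text.trim();
  switch (format) {
    case 'json':
      // Re-serialize with two-space indentation.
      return JSON.stringify(JSON.parse(input), null, 2);
    case 'base64':
      // atob returns a binary string, which is fine for ASCII content.
      return atob(input);
    case 'timestamp': {
      // Assumption: 13 or more digits means milliseconds, otherwise seconds.
      const value = Number(input);
      const millis = input.length >= 13 ? value : value * 1000;
      return new Date(millis).toISOString();
    }
  }
}
```

For example, `buildPreview('timestamp', '86400')` yields an ISO 8601 string for one day after the Unix epoch.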

2. Page Scan

Hit "Scan this page" in the popup, and it scans all <pre> and <code> elements on the page. For each block that contains JSON, Base64, URL-encoded text, a Unified Diff, or a Unix timestamp, it injects action buttons below that block.

No copying. No selecting. Just click the button under the code block you're interested in.
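The scanning loop can be sketched roughly as follows. This is an illustration under assumptions, not the extension's code: the `detect` callback, the type names, and the selector are made up for the example, and the DOM is described through small structural interfaces so the snippet also runs outside a browser.

```typescript
// Structural stand-ins for DOM types, so the sketch does not depend on a browser.
interface TextBlock {
  textContent: string | null;
}
interface BlockSource {
  querySelectorAll(selector: string): ArrayLike<TextBlock>;
}
interface ScanHit {
  block: TextBlock;
  format: string;
}

// `detect` is a hypothetical callback: it returns a format name, or null for no match.
function scanCodeBlocks(
  root: BlockSource,
  detect: (text: string) => string | null,
): ScanHit[] {
  const hits: ScanHit[] = [];
  // Match <pre>, plus <code> that is not a direct child of <pre>,
  // so a <pre><code> pair is only reported once.
  const blocks = root.querySelectorAll('pre, :not(pre) > code');
  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    const text = (block.textContent ?? '').trim();
    if (text === '') continue;
    const format = detect(text);
    if (format !== null) hits.push({ block, format });
  }
  return hits;
}
```

In a content script, `document` should be usable as the `root` argument, and each hit's `block` is the element under which an action button would be inserted.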

3. JWT Detection

v1.2.0 adds JWT (JSON Web Token) detection. When it finds a token starting with eyJhbGciOi..., it decodes the header and payload right in the inline preview, along with an expiration badge.

JWT is Base64url-encoded, so it conflicts with existing Base64 detection. The fix was simple. A JWT header is a JSON object such as {"alg":...}, and the Base64url encoding of {" followed by a letter is eyJ, so tokens built from compact JSON start with eyJ. A regex checks this first:

// JWT detection — prioritized over Base64
test: (text: string) =>
  /^eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$/
    .test(text.trim()),

eyJ plus three dot-separated parts is enough to separate JWT from Base64 accurately. Both header and payload start with eyJ (the Base64url encoding of {" followed by an ASCII letter), which is a structural characteristic of JWT. Checking JWT before Base64 in the detection order was enough for the two to coexist.

Decoded results link to JWT Decoder for claim explanations and expiration visualization.
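For reference, decoding the header and payload and deriving an expiration flag can be sketched like this. The helper names and the result shape are assumptions for illustration, and the sketch only decodes: it does not verify the signature.

```typescript
// Hypothetical result shape; not taken from the extension's source.
interface DecodedJwt {
  header: Record<string, unknown>;
  payload: Record<string, unknown>;
  expired: boolean | null; // null when there is no numeric `exp` claim
}

// Convert a Base64url segment to standard Base64, then decode it as UTF-8.
function decodeBase64Url(segment: string): string {
  let base64 = segment.replace(/-/g, '+').replace(/_/g, '/');
  while (base64.length % 4 !== 0) base64 += '='; // restore stripped padding
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

// Decode only; the signature part is ignored and NOT verified.
function decodeJwt(token: string, nowSeconds: number = Date.now() / 1000): DecodedJwt {
  const [headerPart, payloadPart] = token.trim().split('.');
  const header = JSON.parse(decodeBase64Url(headerPart));
  const payload = JSON.parse(decodeBase64Url(payloadPart));
  const exp = payload.exp; // `exp` is a NumericDate: seconds since the Unix epoch
  return {
    header,
    payload,
    expired: typeof exp === 'number' ? exp <= nowSeconds : null,
  };
}
```

Passing `nowSeconds` explicitly keeps the expiration check deterministic, which makes the badge logic easy to test.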


Content Script Blew Up on import

Page scanning requires injecting code into the page as a Content Script. The detection logic was already implemented in detectors.ts on the popup side, so I just imported it —

Uncaught SyntaxError: Cannot use import statement outside a module

Content Scripts don't load as ES Modules.

Manifest V3 Service Workers support ES Modules (with "type": "module" set on the background entry in the manifest), but Content Scripts declared in the manifest are still injected as classic scripts, so static import statements don't work.

To share detection logic between popup and Content Script, a separate build for the Content Script is needed.

// vite.config.content.ts: dedicated Content Script build
import { resolve } from 'path';
import { defineConfig } from 'vite';

export default defineConfig({
  build: {
    outDir: 'dist',
    emptyOutDir: false, // Don't wipe the main build output
    lib: {
      entry: resolve(__dirname, 'src/content/scanner.ts'),
      name: 'ContentScanner', // Vite requires a global name for iife output; this one is a placeholder
      formats: ['iife'], // Self-contained bundle with no import statements
      fileName: () => 'content-scanner.js',
    },
  },
});

Bundling as iife (Immediately Invoked Function Expression) eliminates all import statements and inlines everything into a single file. The build command becomes two-stage:

vite build && vite build --config vite.config.content.ts

First build for popup and Service Worker, second for Content Script. Forget emptyOutDir: false and the first build's output gets nuked.


The Smart Quote Battle

The real reason for implementing page scan was that clipboard-based detection was unreliable.

Copying JSON from Japanese tech blogs produced this:

{"name": "value"}   What you see on screen
{“name”: “value”}   What actually lands in clipboard

Indistinguishable at a glance. But “ ” (U+201C, U+201D) are not " (U+0022), so JSON.parse() fails immediately.

The fix is to normalize quote characters before detection:

function normalizeQuotes(text: string): string {
  return text
    .replace(/[\u201C\u201D\u201E\u201F\u2033\u2036\uFF02]/g, '"')
    .replace(/[\u2018\u2019\u201A\u201B\u2032\u2035\uFF07]/g, "'");
}

Seven double-quote variants and seven single-quote variants. This alone dramatically improved JSON detection from Japanese blogs.

Plus trailing comma removal and { {...}, {...} } → [{...}, {...}] array bracket correction. Together these rescue the "slightly broken JSON" that blog CMSes generate.
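The repair steps could be combined along these lines. This is a minimal sketch, not the extension's implementation: `repairJson` and `parseLenient` are hypothetical names, and the regexes are deliberately naive, so they would also alter commas or curly quotes that appear inside string values.

```typescript
// Same normalization as shown above, repeated so this sketch runs on its own.
function normalizeQuotes(text: string): string {
  return text
    .replace(/[\u201C\u201D\u201E\u201F\u2033\u2036\uFF02]/g, '"')
    .replace(/[\u2018\u2019\u201A\u201B\u2032\u2035\uFF07]/g, "'");
}

// Hypothetical helper: apply the three repairs in order.
function repairJson(text: string): string {
  let fixed = normalizeQuotes(text).trim();
  // Drop commas that directly precede a closing brace or bracket.
  fixed = fixed.replace(/,\s*([}\]])/g, '$1');
  // An object cannot start with "{ {", so treat the outer braces as a mistyped array.
  if (/^\{\s*\{/.test(fixed) && /\}\s*\}$/.test(fixed)) {
    fixed = '[' + fixed.slice(1, -1) + ']';
  }
  return fixed;
}

// Try strict parsing first, and only fall back to the repaired text on failure.
function parseLenient(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return JSON.parse(repairJson(text));
  }
}
```

Parsing strictly first means valid JSON is never touched by the repair regexes; only input that already fails goes through them.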

Page scan reads raw DOM text directly, bypassing CMS smart quote conversion. Try it with PureMark Detect.


"Hello World" Detected as Base64

After implementing page scan and JWT detection, I tested on various sites.

One page had Hello World in a code block. Detected as Base64.

"Hello World" is not Base64.

But character-wise, HelloWorld matches [A-Za-z0-9+/]. atob() ignores ASCII whitespace, so it doesn't throw either. The two-stage filter of regex + actual decode verification wasn't enough.

Final solution:

function isBase64(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.length < 16) return false;
  if (/^[A-Za-z]+ [A-Za-z]/.test(trimmed)) return false; // Exclude English text
  if (!/^[A-Za-z0-9+/\s]+=*$/.test(trimmed)) return false;
  try { atob(trimmed.replace(/\s/g, '')); return true; }
  catch { return false; }
}

Minimum length raised to 16 characters, and "English word + space + English word" pattern rejected upfront. Page scan processes many code blocks, so it needs to be far stricter about false positives than clipboard detection.
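A few quick checks show what this filter accepts and rejects. The function is repeated from above so the snippet runs on its own, and the sample strings are my own. One thing they suggest is that a long, identifier-like string with no spaces still passes, so some false positives remain possible.

```typescript
function isBase64(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.length < 16) return false;
  if (/^[A-Za-z]+ [A-Za-z]/.test(trimmed)) return false; // Exclude English text
  if (!/^[A-Za-z0-9+/\s]+=*$/.test(trimmed)) return false;
  try { atob(trimmed.replace(/\s/g, '')); return true; }
  catch { return false; }
}

// Each sample pairs an input with the result the filter gives for it.
const samples: Array<[string, boolean]> = [
  ['Hello World', false],               // shorter than 16 characters
  ['The quick brown fox jumps', false], // starts with "word + space + word"
  ['SGVsbG8gV29ybGQh', true],           // Base64 of "Hello World!"
  ['HelloWorldHelloWorld', true],       // valid alphabet, no spaces: still accepted
];

const allAsExpected = samples.every(([text, expected]) => isBase64(text) === expected);
```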


Summary

Before (v1.0):

  • Select text → Copy → Click icon → Detect → Open tool
  • Smart quote contamination breaks detection
  • Must open full tool to see results
  • JWT misdetected as Base64

After (v1.2):

  • "Scan this page" → Auto-detect code blocks → Button click to jump to tool
  • No copying. No selecting. Reads raw DOM, bypassing smart quote issues
  • Inline preview shows conversion results without leaving the page
  • JWT correctly detected with header/payload decode display

Technically, the four key challenges were: Content Script IIFE builds, Unicode normalization, Base64 false positive prevention, and JWT detection priority design. All born from "it broke when I actually used it" — none predictable at design time.

PureMark Detect — Chrome Web Store | PureMark — Zero-click start, simply.
