DEV Community

Pavitra

I Built a Browser Inside a Browser, And It Translates Websites Without Breaking a Single Pixel

I was reading a beautifully designed Japanese tech blog. Clean typography. Perfect spacing. The kind of CSS that makes you want to shake the developer's hand.

Then I hit Google Translate.

The sidebar collapsed. The navigation overflowed. A button that once said "送信" now said "Transmission" and was three times wider than its container. The whole page looked like it had been through a blender.

I closed the tab. Stared at my ceiling. And thought:

"What if I could translate websites... without destroying them?"

That question cost me sleep, sanity, and one perfectly good weekend. But it also produced LingoLens — a translation browser that preserves the original design while swapping languages in real-time.

The part I'm most proud of? You can see your website in four different languages at the same time. Side by side. In a 2×2 grid. No layout shift.

Here's how I built it.

LingoLens Hero — Landing Page
The LingoLens landing page. Paste any URL. Choose Read Mode or Matrix Mode.


The Architecture

Before getting into the details, here's what LingoLens does at a high level:

Architecture diagram

The React app is the brain. The iframe is the body. They talk through window.postMessage.

Project structure for reference:

Lingo-lens/
├── app/
│   ├── page.tsx                    # Landing page
│   ├── read/[...url]/page.tsx      # Read Mode — single iframe proxy viewer
│   ├── matrix/[...url]/page.tsx    # Matrix Mode — 2×2 quad iframe viewer
│   ├── library/page.tsx            # Saved pages library
│   ├── api/proxy/route.ts          # The reverse proxy (Cheerio DOM surgery)
│   └── actions/
│       ├── translate.ts            # Single text translation (Lingo.dev)
│       ├── translateBatch.ts       # Batch translation (Lingo.dev)
│       ├── explain.ts              # AI Explain (Gemini + ch.at fallback)
│       ├── summarize.ts            # AI Summarize
│       ├── simplify.ts             # AI Simplify
│       └── meaning.ts              # AI Meaning (ch.at raw HTTPS)
├── components/
│   ├── TranslationPanel.tsx        # Side panel: search, edit, lock, export
│   ├── LanguageSelector.tsx        # 80+ language dropdown
│   └── query-provider.tsx          # TanStack React Query provider
├── lib/
│   ├── db.ts                       # IndexedDB schema (idb library)
│   ├── hooks/useLibrary.ts         # React Query hooks for persistence
│   ├── tts.ts                      # Text-to-speech with voice matching
│   └── languages.ts                # Language code → name mappings
└── public/
    └── translation-script.js       # 600+ lines injected into iframes

1. 💰 The "Lazy Translation" Strategy — Only Translate What You See

This one saved me from going broke on API tokens.

A typical Wikipedia article has 500+ translatable text nodes. If you translate the entire page upfront, that's thousands of API tokens burned, 10-30 seconds of waiting, and a credit card bill that makes you reconsider your career.

So LingoLens only translates what's currently visible on screen.

Magic Wand — Viewport Translation
Hit the ✨ Magic Wand. Only visible elements translate. Scroll down, hit again — only NEW elements translate. Already-translated ones get skipped.

When you hit the ✨ Magic Wand, the injected translation-script.js runs a viewport intersection check instead of grabbing everything:

if (event.data.type === 'TRIGGER_BATCH_TRANSLATE') {
    const visibleElements = [];
    const allElements = document.querySelectorAll(
        'p, h1, h2, h3, h4, h5, h6, li, td, span, div'
    );

    allElements.forEach(el => {
        if (!isValidElement(el)) return;
        if (translatedTexts.has(getUniqueId(el))) return; // Skip already translated

        // Only grab elements in the viewport
        const rect = el.getBoundingClientRect();
        const isVisible = (
            rect.top < window.innerHeight &&
            rect.bottom > 0 &&
            rect.left < window.innerWidth &&
            rect.right > 0
        );

        if (isVisible) {
            visibleElements.push({
                id: getUniqueId(el),
                text: el.innerText.trim()
            });
        }
    });

    window.parent.postMessage({
        type: 'BATCH_TRANSLATE_REQUEST',
        payload: visibleElements
    }, '*');
}

On a page with 500 text nodes, only ~20-40 are visible at any time. That's roughly 92-96% less text sent to the API per wand click.

It gets better — as you scroll and hit the wand again, it only translates the new visible elements. Already-translated ones are tracked in a Map() and skipped:

const translatedTexts = new Map(); // Cache: id → translated text
const originalTexts = new Map();   // Cache: id → original text

if (translatedTexts.has(id)) return; // never pay for the same string twice


The isValidElement() function also filters out noise — hidden elements, scripts, inputs, <code> blocks, and elements that only contain numbers or punctuation. So we don't waste tokens translating a "©2024" in the footer.
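
The text-only part of that filter can be sketched as a pure function. This is a minimal sketch, not the actual isValidElement() from translation-script.js: the name isTranslatableText and the "must contain at least one letter" rule are assumptions made for illustration.

```typescript
// Hypothetical helper: the name and the exact rule are assumptions.
// The RegExp constructor is used so the Unicode property escape \p{L}
// ("any letter in any script") is resolved at runtime.
const HAS_LETTER = new RegExp("\\p{L}", "u");

function isTranslatableText(raw: string): boolean {
    const text = raw.trim();
    if (text.length === 0) return false;
    // Strings made only of digits, symbols, and punctuation carry nothing to translate.
    return HAS_LETTER.test(text);
}
```

Under this rule "©2024" and "12.5%" are skipped, while "送信" and "Read more" are kept.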

Click-to-Translate and Marquee Select

Don't want to translate the whole viewport? Just click on a single paragraph. One API call, one element. Click again to toggle back to the original.


Hover over any text to see the translate tooltip. Click to translate just that element.

There's also a Marquee Select tool — draw a rectangle around any area and only elements inside your selection get translated. It uses getBoundingClientRect() intersection math:


Draw a rectangle. Only text elements intersecting your box get translated.

const intersectX = Math.max(0,
    Math.min(rect.right, boxRect.right) - Math.max(rect.left, boxRect.left)
);
const intersectY = Math.max(0,
    Math.min(rect.bottom, boxRect.bottom) - Math.max(rect.top, boxRect.top)
);

if (intersectX > 0 && intersectY > 0) {
    visibleElements.push({ id, text: el.innerText.trim() });
}

Four levels of precision:

| Method | What Gets Translated | Token Usage |
| --- | --- | --- |
| Click a word/paragraph | That single element | 🟢 Minimal |
| Marquee Select an area | Elements inside your rectangle | 🟡 Targeted |
| Magic Wand | Only visible viewport elements | 🟠 Efficient |
| Translate entire page | Everything | 🔴 Wasteful |

We don't do that last one. Ever.


2. 📐 See Your Website in Four Languages at Once — Matrix Mode

Imagine you're building a product that ships to Spain, Germany, Japan, and Saudi Arabia. You need to test your layout in all four languages. Normally that means opening four tabs, switching languages one by one, taking screenshots, and praying nothing broke.

With Matrix Mode, you paste one URL and see all four simultaneously:

Matrix Mode — 2×2 multi-language preview

Spanish top-left. German top-right. Arabic bottom-left. Japanese bottom-right. Same page, same moment.

Hit the ✨ Magic Wand and all four panes translate at once — each into their own language. You can instantly spot:

  • Does German overflow the navbar? (German words are long. "Geschwindigkeitsbegrenzung" doesn't fit in a button meant for "Speed limit".)
  • Does Arabic RTL break the sidebar?
  • Does Japanese wrap correctly in that card component?

Fitting Four Desktop Websites on One Screen

Here's the problem: if you shrink an iframe to 25% width, the website thinks it's on mobile. Media queries fire. Hamburger menu appears. Layout switches to single column. You're not testing desktop anymore.

My solution: render each iframe at a virtual 1440px desktop width, then use CSS transform: scale() to shrink the canvas:

const VIRTUAL_WIDTH = 1440;

useEffect(() => {
    const container = containerRef.current;
    if (!container) return;
    const observer = new ResizeObserver((entries) => {
        for (let entry of entries) {
            setDimensions({
                width: entry.contentRect.width,
                height: entry.contentRect.height,
            });
        }
    });
    observer.observe(container);
    return () => observer.disconnect();
}, []);

const scale = dimensions.width / VIRTUAL_WIDTH; // e.g., 0.48
const virtualHeight = dimensions.height / scale;

The JSX then applies those values:

<div
    className="absolute origin-top-left"
    style={{
        width: `${VIRTUAL_WIDTH}px`,
        height: `${virtualHeight}px`,
        transform: `scale(${scale})`
    }}
>
    <iframe src={proxyUrl} className="w-full h-full" />
</div>

The website thinks it has 1440px of space. Renders full desktop layout. Media queries don't fire. Then we shrink the whole thing down like a thumbnail. It's like putting ships in bottles — the ship is built full-size, then somehow it's inside the bottle.
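
The scaling arithmetic can be isolated into a small pure function. This is a sketch under assumptions: computeVirtualViewport is a hypothetical name, and the zero-width guard is a defensive addition for the first render, when the pane has not been measured yet and dividing by a zero scale would give an infinite height.

```typescript
interface VirtualViewport {
    scale: number;         // factor passed to transform: scale()
    virtualHeight: number; // unscaled height given to the wrapper
}

// Hypothetical helper mirroring the scale / virtualHeight math above.
function computeVirtualViewport(
    paneWidth: number,
    paneHeight: number,
    virtualWidth = 1440
): VirtualViewport {
    if (paneWidth <= 0 || virtualWidth <= 0) {
        // Pane not measured yet: fall back to no scaling instead of dividing by zero.
        return { scale: 1, virtualHeight: paneHeight };
    }
    const scale = paneWidth / virtualWidth;
    return { scale, virtualHeight: paneHeight / scale };
}
```

For a 720×450 pane this gives a scale of 0.5 and a 900px-tall virtual canvas, so the scaled result fills the pane exactly.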


Per-Pane Independence

Every pane is independently controllable:

Matrix Mode — Per-Pane Controls
Each pane has its own language selector, JSON export, and "Open in Read Mode" button.

  • 🔤 Language selector — change any pane to any of 80+ languages independently
  • 📥 Download JSON — export that pane's translations as an i18n-ready locale file
  • ↗️ Open in Read Mode — pop any pane into full single-view for deeper work

Locale JSON Export

Each pane can export a clean JSON mapping you can drop straight into your locales/ folder:

{
  "Read more": "Leer más",
  "Subscribe to our newsletter": "Suscríbete a nuestro boletín",
  "About the author": "Sobre el autor",
  "Get started for free": "Comienza gratis"
}

The export works through postMessage — each iframe packages its translation cache:

if (event.data.type === 'REQUEST_JSON_DOWNLOAD') {
    const exportData = {};
    translatedTexts.forEach((value, key) => {
        const original = originalTexts.get(key);
        if (original && value) {
            exportData[original] = value;
        }
    });
    window.parent.postMessage({
        type: 'JSON_DOWNLOAD_READY',
        payload: exportData,
        language: event.data.language
    }, '*');
}

Translate a live website, export the JSON, drop it into your i18n pipeline. You just localized your app by browsing it.

Routing postMessage to the Correct Iframe

With four iframes sending translation requests simultaneously, you need to know which pane sent which message. Solution: compare event.source against all four stored iframe refs:

const activeIframeIndex = iframeRefs.current.findIndex(
    ref => ref?.contentWindow === event.source
);
const activeLang = matrixLanguages[activeIframeIndex];

Instead of testing languages one at a time, you see all four side by side and catch layout breaks before they hit production.
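
One detail worth guarding in the event.source lookup: findIndex returns -1 when the message comes from a window that is not one of the four panes, and indexing the language array with -1 yields undefined. A minimal sketch of a guarded lookup follows; resolvePaneLanguage is a hypothetical helper, and paneWindows stands in for the contentWindow of each stored iframe ref.

```typescript
// Hypothetical helper: resolves which pane a message came from, or null for strangers.
function resolvePaneLanguage<W>(
    paneWindows: ReadonlyArray<W | null | undefined>,
    source: W | null,
    languages: ReadonlyArray<string>
): { index: number; language: string } | null {
    if (source === null) return null;
    const index = paneWindows.findIndex(win => win === source);
    // -1 means the sender is not one of our panes (another frame, an extension, ...).
    if (index === -1 || index >= languages.length) return null;
    return { index, language: languages[index] };
}
```

The parent's message handler can then drop anything that resolves to null instead of translating into an undefined language.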


3. 🛡️ The Layout Safety Inspector

Translation isn't just a text problem. It's a layout problem.

Translating "Speed limit" to "Geschwindigkeitsbegrenzung" changes the physical dimensions of the element. Buttons overflow. Flex containers wrap. CSS grids collapse.

Layout Safety Inspector

Red ⚠️ pulsing badge when translation causes overflow. Green ✅ badge when layout survived.

I built a micro-engine that runs a visual regression check after every translation:

// Before: snapshot dimensions
originalDimensions.set(id, {
    width: target.offsetWidth,
    height: target.offsetHeight,
    scrollWidth: target.scrollWidth,
    scrollHeight: target.scrollHeight
});

// After: wait 50ms for layout to settle, then measure
setTimeout(() => {
    const originalDims = originalDimensions.get(id);
    if (!originalDims) return; // no snapshot to compare against
    const newHeight = target.offsetHeight;

    const isOverflowing = target.scrollWidth > target.offsetWidth;
    const heightGrowth =
        (newHeight - originalDims.height) / (originalDims.height || 1);
    // A short element (< 50px) that grew by more than 50% has probably wrapped
    const isWrappingError = heightGrowth > 0.5 && originalDims.height < 50;

    if (isOverflowing || isWrappingError) {
        target.classList.add('lingo-layout-error');
        window.parent.postMessage({
            type: 'LAYOUT_ERROR_DETECTED',
            id,
            errorType: isOverflowing ? 'Overflow' : 'Text Wrapping'
        }, '*');
    }
}, 50);

When a translation breaks the layout, the element gets a pulsing red badge, dashed red outline, and a tooltip. When it survives? Subtle green ✅ on hover.

That 50ms timeout cost me an hour of debugging at 2 AM. When I measured offsetWidth right after changing innerText, I kept getting the old dimensions. Giving the browser a moment to settle before measuring fixed it. Tiny detail, but critical.

The inspector also avoids layout thrashing by reading all dimensional metrics in one pass before calling the translation APIs — keeps the main thread smooth even when batch-translating 40+ elements.
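
The decision rule in the check above can be pulled out of the setTimeout callback and tested without a DOM. A minimal sketch, using the same thresholds as the snippet; the names BoxMetrics and classifyLayout are hypothetical.

```typescript
interface BoxMetrics {
    offsetWidth: number;
    offsetHeight: number;
    scrollWidth: number;
}

type LayoutVerdict = "Overflow" | "Text Wrapping" | null;

// Same thresholds as the snippet above: any horizontal overflow, or a short
// element (under 50px tall) whose height grew by more than 50%.
function classifyLayout(before: BoxMetrics, after: BoxMetrics): LayoutVerdict {
    if (after.scrollWidth > after.offsetWidth) return "Overflow";
    const heightGrowth =
        (after.offsetHeight - before.offsetHeight) / (before.offsetHeight || 1);
    if (heightGrowth > 0.5 && before.offsetHeight < 50) return "Text Wrapping";
    return null;
}
```

A 20px-tall button that becomes 40px tall after translation is reported as "Text Wrapping"; the same button growing to 22px passes.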

This turns LingoLens from just a translator into a localization QA tool. You don't just see the translation — you see whether it fits.


4. 🧠 Context-Aware AI Explanations

Google Translate tells you "彼はバケツを蹴った" means "He kicked the bucket." But did someone actually kick a bucket? Or did someone die? Without context, no idea.

When you select text in the proxied page, a floating toolbar appears with four actions:

AI Toolbar — Explain, Summarize, Simplify, Meaning

Select text → floating toolbar with four AI actions. Each uses surrounding context + page title.

  • ✨ Explain — What does this mean in context? (Gemini 1.5 Flash)
  • 📝 Summarize — TL;DR this paragraph
  • 🧒 Simplify — Explain it like I'm five
  • 📖 Meaning — Deep semantic/cultural analysis (ch.at raw HTTPS)

Context Gathering

The important part isn't calling Gemini — anyone can do that. It's what gets sent. When you highlight text, I don't just send the highlighted words. DOM traversal grabs the surrounding paragraph context (up to 800 chars) plus the page title:

const handleAction = (actionType) => {
    const selection = window.getSelection();
    const selectedText = selection.toString().trim();

    if (selectedText) {
        const anchorNode = selection.anchorNode;
        let context = "";
        if (anchorNode) {
            const parentBlock = anchorNode.parentElement?.closest(
                'p, div, h1, h2, h3, h4, h5, h6, li, article, section'
            );
            context = parentBlock
                ? parentBlock.innerText.substring(0, 800)
                : (anchorNode.parentElement?.innerText || "");
        }

        window.parent.postMessage({
            type: actionType === 'explain' ? 'EXPLAIN_REQUEST' :
                  actionType === 'summarize' ? 'SUMMARIZE_REQUEST' :
                  actionType === 'simplify' ? 'SIMPLIFY_REQUEST' : 'MEANING_REQUEST',
            selectedText,
            surroundingText: context,
            pageTitle: document.title
        }, '*');
    }
};

On the server, Gemini gets a prompt that forces contextual answers — not generic dictionary definitions:

const prompt = `
You are an AI assistant helping a user understand a website while browsing.
Explain the meaning of the selected text STRICTLY in the context of this page.

RULES:
- Do NOT give a generic dictionary definition.
- Assume the user is a beginner.
- Keep it short (2–4 lines).
- Explain what it means *on this website*, not in general.

Page title: ${pageTitle}
Selected text: "${selectedText}"
Surrounding context: "${surroundingText}"
`;

const { text } = await generateText({
    model: google('gemini-1.5-flash'),
    prompt: prompt,
});

Explanation Dialog

Glassmorphic explanation card in bottom-right. Shows selected text, loading skeleton, then AI result with a TTS "Listen" button.

Every explanation gets auto-saved to IndexedDB — building a personal vocabulary library as you browse.


5. 🔄 The AI Fallback Chain

What happens when Gemini is down? Or the API key runs out? The user doesn't care about infrastructure problems — they want an answer.


The explain.ts action tries Gemini first, then falls back to ch.at using raw Node.js https.request():

export async function explainText(request: ExplanationRequest): Promise<ExplanationResponse> {
    try {
        // Tier 1: Gemini
        const { text } = await generateText({
            model: google('gemini-1.5-flash'),
            prompt: prompt,
        });
        return { success: true, explanation: text };

    } catch (error: any) {
        console.warn('Gemini failed, falling back to ch.at:', error.message);

        try {
            // Tier 2: Raw HTTPS to ch.at
            const payload = JSON.stringify({ q: fallbackPrompt, h: [] });
            const responseData = await new Promise<string>((resolve, reject) => {
                const req = https.request({
                    hostname: 'ch.at', port: 443, path: '/', method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Content-Length': Buffer.byteLength(payload)
                    }
                }, (res) => {
                    let data = '';
                    res.on('data', chunk => data += chunk);
                    res.on('end', () => resolve(data));
                });
                req.on('error', reject);
                req.write(payload);
                req.end();
            });

            // Parse "Q: ...\nA: answer" format
            let explanation = responseData;
            const answerIndex = responseData.indexOf('\nA: ');
            if (answerIndex !== -1) {
                explanation = responseData.slice(answerIndex + 4).trim();
            }
            return { success: true, explanation };
        } catch (fallbackError) {
            return { success: false, error: 'Failed using both Gemini and fallback.' };
        }
    }
}

Four different AI server actions. Three use Gemini. One uses raw HTTPS. One has a dual-tier fallback. The user never sees an error.
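
The nested try/catch generalizes to any number of tiers. The sketch below is one way to express that, not the code in explain.ts: firstSuccessful and extractAnswer are hypothetical names, and unlike the original parsing step, extractAnswer also trims the body when no "A: " marker is found.

```typescript
type Tier<T> = () => Promise<T>;

// Hypothetical generic runner: tries each tier in order and returns the first success.
async function firstSuccessful<T>(tiers: Tier<T>[]): Promise<T> {
    let lastError: unknown = new Error("No tiers configured");
    for (const tier of tiers) {
        try {
            return await tier();
        } catch (error) {
            lastError = error; // remember why this tier failed, then try the next one
        }
    }
    throw lastError;
}

// Pulls the answer out of a "Q: ...\nA: answer" body, as in the parsing step above.
function extractAnswer(body: string): string {
    const marker = "\nA: ";
    const index = body.indexOf(marker);
    return index === -1 ? body.trim() : body.slice(index + marker.length).trim();
}
```

With this shape, adding a third provider means appending one more function to the array rather than nesting another try/catch.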


6. 🪓 Cheerio DOM Surgery — Making Iframes Work

Fun fact: most of the modern web is designed to prevent exactly what I'm doing.

Websites set X-Frame-Options: DENY to block iframe embedding. They use CSP headers to prevent script injection. Relative URLs break on a different domain. SRI integrity hashes on stylesheets won't match the proxy.

My proxy route does full surgical DOM rewriting:


Every src, href, srcset, CSS @import, CSS url(), and inline style reference gets rewritten to route through the proxy:

const proxyResource = (target: string) => {
    if (!target) return target;
    if (target.startsWith('data:') || target.startsWith('#')) return target;
    try {
        const absoluteUrl = new URL(target, baseUrl).toString();
        return `${origin}/api/proxy?url=${encodeURIComponent(absoluteUrl)}&resource=true`;
    } catch (e) {
        return target;
    }
};

Then apply to everything:

// Stylesheets (+ strip SRI integrity hashes)
$('link[rel="stylesheet"]').each((_, el) => {
    $(el).removeAttr('integrity');
    $(el).removeAttr('crossorigin');
    $(el).attr('href', proxyResource($(el).attr('href')));
});

// Scripts
$('script[src]').each((_, el) => {
    $(el).removeAttr('integrity');
    $(el).removeAttr('crossorigin');
    $(el).attr('src', proxyResource($(el).attr('src')));
});

// Images (src AND srcset)
$('img').each((_, el) => {
    if ($(el).attr('src')) $(el).attr('src', proxyResource($(el).attr('src')));
    if ($(el).attr('srcset')) $(el).attr('srcset', proxySrcset($(el).attr('srcset')));
});

// Inline Styles (background-image: url(...))
$('*[style]').each((_, el) => {
    const style = $(el).attr('style');
    if (style && style.includes('url(')) {
        const newStyle = style.replace(/url\((['"]?)(.*?)\1\)/gi, (match, quote, path) => {
            if (!path) return match;
            // proxyResource already skips data: and # targets and tolerates unparseable URLs
            return `url(${quote}${proxyResource(path)}${quote})`;
        });
        $(el).attr('style', newStyle);
    }
});

// Remove CSP meta tags
$('meta[http-equiv="Content-Security-Policy"]').remove();

// Inject base href + our script
if ($('base').length === 0) {
    $('head').prepend(`<base href="${url}">`);
}
$('body').append(`<script src="${origin}/translation-script.js"></script>`);

CSS files also need rewriting — they can contain @import and url() references that chain recursively. A stylesheet imports three others, which import fonts, which reference SVGs. All need proxying.
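
A rough sketch of that stylesheet pass is below. It is an illustration, not the proxy's actual code: rewriteCssUrls is a hypothetical name, proxyOrigin is an assumed parameter, and the proxied URL shape is copied from proxyResource. Because every rewritten reference points back at the proxy, nested imports get rewritten when they are fetched in turn.

```typescript
// Hypothetical sketch: rewrites url(...) and string-form @import references in a stylesheet.
function rewriteCssUrls(css: string, cssUrl: string, proxyOrigin: string): string {
    const toProxy = (target: string): string => {
        if (!target || target.startsWith("data:") || target.startsWith("#")) return target;
        try {
            // Relative paths in CSS resolve against the stylesheet's own URL, not the page's.
            const absolute = new URL(target, cssUrl).toString();
            return `${proxyOrigin}/api/proxy?url=${encodeURIComponent(absolute)}&resource=true`;
        } catch {
            return target; // leave anything unparseable untouched
        }
    };

    return css
        // url(foo.png), url('foo.png'), url("foo.png")
        .replace(/url\(\s*(['"]?)(.*?)\1\s*\)/gi, (_m: string, quote: string, path: string) =>
            `url(${quote}${toProxy(path)}${quote})`)
        // @import "foo.css"; (the @import url(...) form is covered by the pass above)
        .replace(/@import\s+(['"])(.*?)\1/gi, (_m: string, quote: string, path: string) =>
            `@import ${quote}${toProxy(path)}${quote}`);
}
```

Resolving against the stylesheet URL matters: url("img/bg.png") inside https://example.com/css/site.css points at /css/img/bg.png, not /img/bg.png.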

The srcset attribute uses a particularly annoying format: "image-300.jpg 300w, image-600.jpg 600w, image.jpg 2x". Had to write a dedicated parser:

const proxySrcset = (srcset: string) => {
    if (!srcset) return srcset;
    return srcset.split(',').map(part => {
        const [url, descriptor] = part.trim().split(/\s+/);
        if (url) return `${proxyResource(url)} ${descriptor || ''}`.trim();
        return part;
    }).join(', ');
};
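
Splitting on every comma, as proxySrcset does above, still breaks when a URL itself contains a comma (for example in a query string). The sketch below loosely follows the shape of the HTML srcset parsing rules: a URL is a run of non-whitespace characters, and a comma only separates candidates when it ends a URL token or follows a descriptor. parseSrcset and rewriteSrcset are hypothetical names, and the sketch ignores rarer cases such as parentheses inside descriptors.

```typescript
interface SrcsetCandidate {
    url: string;
    descriptor: string; // "300w", "2x", or "" when absent
}

function parseSrcset(srcset: string): SrcsetCandidate[] {
    const candidates: SrcsetCandidate[] = [];
    const isSpace = (ch: string) => /\s/.test(ch);
    let i = 0;
    while (i < srcset.length) {
        // Skip separators between candidates.
        while (i < srcset.length && (isSpace(srcset.charAt(i)) || srcset.charAt(i) === ",")) i++;
        if (i >= srcset.length) break;
        // The URL is everything up to the next whitespace, commas included.
        let start = i;
        while (i < srcset.length && !isSpace(srcset.charAt(i))) i++;
        let url = srcset.slice(start, i);
        let descriptor = "";
        if (url.endsWith(",")) {
            url = url.replace(/,+$/, ""); // "a.jpg," means no descriptor
        } else {
            start = i;
            while (i < srcset.length && srcset.charAt(i) !== ",") i++;
            descriptor = srcset.slice(start, i).trim();
        }
        if (url) candidates.push({ url, descriptor });
    }
    return candidates;
}

function rewriteSrcset(srcset: string, rewrite: (url: string) => string): string {
    return parseSrcset(srcset)
        .map(c => `${rewrite(c.url)} ${c.descriptor}`.trim())
        .join(", ");
}
```

Passing proxyResource as the rewrite callback would give a drop-in alternative to proxySrcset.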

Basically built a localized VPN inside a Next.js API route. Just to make iframes work.


7. 🔧 Ditching the SDK for Raw HTTPS

Next.js Server Actions have aggressive caching and edge-runtime fetch closures. When I used the Lingo.dev SDK normally, connections would hang indefinitely — especially when the source language was null (websites without a lang="" attribute on their <html> tag).

Fix: abandon fetch() entirely and use native Node.js https.request():

'use server'
import https from 'node:https';
import { unstable_noStore as noStore } from 'next/cache';

export async function translateMarkdown(
    text: string, sourceLanguage: string | null, targetLanguage: string
) {
    noStore();

    const translateWithRetry = async (text: string, retries = 3): Promise<string> => {
        try {
            const result = await new Promise<string>((resolve, reject) => {
                const postData = JSON.stringify({
                    params: { fast: false, workflowId: crypto.randomUUID() },
                    locale: {
                        source: sourceLanguage || 'auto',
                        target: targetLanguage
                    },
                    data: { text }
                });

                const req = https.request('https://engine.lingo.dev/i18n', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Authorization': `Bearer ${process.env.LINGODOTDEV_API_KEY}`,
                        'Content-Length': Buffer.byteLength(postData)
                    }
                }, (res) => {
                    if (res.statusCode && (res.statusCode < 200 || res.statusCode >= 300)) {
                        return reject(new Error(`Lingo API Error ${res.statusCode}`));
                    }
                    let rawData = '';
                    res.on('data', chunk => rawData += chunk);
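                    res.on('end', () => {
                        // A throw inside this callback would escape the promise, so reject explicitly
                        try {
                            const json = JSON.parse(rawData);
                            resolve(json.data.text);
                        } catch (parseError) {
                            reject(parseError);
                        }
                    });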
                });
                req.on('error', reject);
                req.write(postData);
                req.end();
            });
            return result;
        } catch (err: any) {
            if (retries > 0) {
                const delay = 300 * (4 - retries);
                await new Promise(resolve => setTimeout(resolve, delay));
                return translateWithRetry(text, retries - 1);
            }
            throw err;
        }
    };

    const translated = await translateWithRetry(text);
    return { success: true, data: translated };
}


Three retries with linear backoff (300ms, 600ms, 900ms) made this rock-solid. It survives flaky 502s and network hiccups that would've killed a live demo.

For batch mode, texts are processed sequentially to avoid overwhelming the API. If any segment fails after 3 retries, it falls back to the original text instead of failing the whole batch.
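
Two pieces of that behavior are easy to sketch in isolation. The delays produced by 300 * (4 - retries) grow linearly (300, 600, 900ms); a doubling schedule is shown next to it for comparison. The sequential batch loop with per-segment fallback is an illustration of the description above, not the code in translateBatch.ts. All names here are hypothetical.

```typescript
// Delay used by the retry snippet above: attempt 0, 1, 2 gives 300, 600, 900 ms.
function linearDelay(attempt: number, baseMs = 300): number {
    return baseMs * (attempt + 1);
}

// Doubling alternative: attempt 0, 1, 2 gives 300, 600, 1200 ms, capped at maxMs.
function exponentialDelay(attempt: number, baseMs = 300, maxMs = 5000): number {
    return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Sequential batch: one request at a time, and a failed segment keeps its original text.
async function translateBatchSequential(
    texts: string[],
    translateOne: (text: string) => Promise<string>
): Promise<string[]> {
    const results: string[] = [];
    for (const text of texts) {
        try {
            results.push(await translateOne(text));
        } catch {
            results.push(text); // fall back instead of failing the whole batch
        }
    }
    return results;
}
```

Here translateOne is expected to do its own retrying, so a segment only falls back to the original text once its retries are exhausted.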


8. 🎨 Dynamic Theme Adaptation

This one's pure polish. Nobody asked for it.

LingoLens has a dark glassmorphic UI. When you proxy a bright website, the contrast looks jarring. When you proxy a dark site, the chrome disappears.

The injected script detects the website's theme color and sends it to the React app:

function detectThemeColor() {
    // Priority 1: <meta name="theme-color">
    const meta = document.querySelector('meta[name="theme-color"]');
    if (meta) return meta.content;

    // Priority 2: Header/nav background
    const header = document.querySelector('header, nav, .header, .nav, [role="banner"]');
    if (header) {
        const bg = window.getComputedStyle(header).backgroundColor;
        if (bg !== 'rgba(0, 0, 0, 0)' && bg !== 'transparent') return bg;
    }

    // Priority 3: Body background
    const bodyBg = window.getComputedStyle(document.body).backgroundColor;
    if (bodyBg !== 'rgba(0, 0, 0, 0)' && bodyBg !== 'transparent') return bodyBg;

    return null;
}

setTimeout(() => {
    const themeColor = detectThemeColor();
    if (themeColor) {
        window.parent.postMessage({ type: 'THEME_COLOR_DETECTED', color: themeColor }, '*');
    }
}, 1000);

On the React side, the color becomes a radial gradient glow applied to the page background and the browser chrome's boxShadow. The user probably never notices, but subconsciously it makes the reading experience feel cohesive. The app and website feel like one thing.
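
How the detected color turns into a glow is not shown, so here is one possible sketch. Everything in it is an assumption: the function names, the gradient shape, and the 0.25 alpha. It handles the two formats the detector is most likely to return, hex from the meta tag and rgb()/rgba() from getComputedStyle, and gives up on anything else.

```typescript
interface Rgb { r: number; g: number; b: number; }

// Hypothetical parser for "#rgb", "#rrggbb", "rgb(r, g, b)" and "rgba(r, g, b, a)".
function parseColor(input: string): Rgb | null {
    const value = input.trim().toLowerCase();
    const hex = /^#([0-9a-f]{3}|[0-9a-f]{6})$/.exec(value);
    if (hex) {
        const raw = hex[1];
        // Expand shorthand "#0af" to "00aaff".
        const digits = raw.length === 3 ? raw.split("").map(c => c + c).join("") : raw;
        return {
            r: parseInt(digits.slice(0, 2), 16),
            g: parseInt(digits.slice(2, 4), 16),
            b: parseInt(digits.slice(4, 6), 16),
        };
    }
    const rgb = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)/.exec(value);
    if (rgb) return { r: Number(rgb[1]), g: Number(rgb[2]), b: Number(rgb[3]) };
    return null; // named colors, hsl(), etc. are out of scope for this sketch
}

// Builds a CSS background value; the gradient geometry is purely illustrative.
function toGlowGradient(color: string, alpha = 0.25): string | null {
    const rgb = parseColor(color);
    if (!rgb) return null;
    return `radial-gradient(circle at top, rgba(${rgb.r}, ${rgb.g}, ${rgb.b}, ${alpha}), transparent 70%)`;
}
```

Returning null for unrecognized colors lets the React side keep its default dark background instead of applying a broken style.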


9. 🎛️ The Translation Panel

Translating text is step one. Managing translations is step two. LingoLens has a side panel that tracks every translation in real-time:

Translation Control Panel

Translation Panel slides in from right. Search, filter, edit inline, lock strings, listen with TTS, export JSON.

Each translation entry has rich metadata:

export interface TranslationEntry {
    original: string;
    translated: string;
    elementTag: string;
    isLocked: boolean;
    status: 'active' | 'modified';
    timestamp: number;
    layoutError?: boolean;
    errorType?: string;
}

Every translation can be edited inline — type a new translation and it instantly updates in the proxied page via postMessage. Locking a translation means it won't be overwritten when you switch languages or re-translate. Locked strings get a gold dashed outline.
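
The locking rule can be sketched as a merge step that runs whenever fresh translations arrive. This is a minimal illustration with hypothetical names (Entry, applyTranslations), using a trimmed-down entry shape rather than the full TranslationEntry.

```typescript
interface Entry {
    translated: string;
    isLocked: boolean;
}

// Hypothetical merge: apply fresh translations, but never touch locked ids.
function applyTranslations(
    current: Map<string, Entry>,
    incoming: Map<string, string>
): Map<string, Entry> {
    const next = new Map(current); // copy so the previous state stays intact
    incoming.forEach((translated, id) => {
        const existing = next.get(id);
        if (existing?.isLocked) return; // locked strings survive re-translation
        next.set(id, { translated, isLocked: false });
    });
    return next;
}
```

Returning a new Map instead of mutating the old one fits React state updates, where a changed reference is what triggers a re-render.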

Actions per entry:

| Action | What it Does |
| --- | --- |
| Highlight | Scrolls iframe to the element and flashes it |
| Lock/Unlock | Prevents re-translation when switching languages |
| Edit | Inline editing, Enter to save, Escape to cancel |
| Listen | TTS in target language |
| Explain | AI explanation for this entry |
| Revert | Restores original text |
| Export | Downloads all translations as JSON |

There's also a Vocabulary tab that shows AI explanations auto-saved from the current site.


10. 💾 Persistence with IndexedDB + React Query

Translating a long article takes time and tokens. If you close the tab and come back tomorrow, your translations should still be there.

Everything persists to IndexedDB (via idb) wrapped with TanStack React Query for reactive cache management:

interface LingoLensDB extends DBSchema {
    pages: {
        key: string;
        value: SavedPage;
        indexes: { 'by-url-lang': [string, string] };
    };
    vocabulary: {
        key: string;
        value: VocabularyEntry;
        indexes: { 'by-timestamp': number };
    };
}

The hooks provide a clean API — no manual state management, automatic cache invalidation:

export function useSavedPages() {
    return useQuery({
        queryKey: ['savedPages'],
        queryFn: async () => {
            const db = await getDB();
            if (!db) return [];
            const pages = await db.getAll('pages');
            return pages.sort((a, b) => b.lastVisited - a.lastVisited);
        }
    });
}

export function useSavePage() {
    const queryClient = useQueryClient();
    return useMutation({
        mutationFn: async (page: SavedPage) => {
            const db = await getDB();
            if (!db) throw new Error("Database not available");
            const tx = db.transaction('pages', 'readwrite');
            const existing = await tx.store.get(page.id);
            if (existing) {
                await tx.store.put({ ...existing, ...page, lastVisited: Date.now() });
            } else {
                await tx.store.put(page);
            }
            await tx.done;
        },
        onSuccess: (_, variables) => {
            queryClient.invalidateQueries({ queryKey: ['savedPages'] });
            queryClient.invalidateQueries({
                queryKey: ['savedPage', variables.url, variables.targetLanguage]
            });
        }
    });
}


When you revisit a page, saved translations get re-injected into the DOM automatically. IndexedDB for durable storage, React Query for reactive cache — you get database persistence with the DX of useState.

Library — Saved Pages

Every translated page saved with URL, title, language, and timestamp. Click to revisit with translations restored.


11. 🔊 Text-to-Speech with Voice Matching

Translating is great, but what if you want to hear how "Geschwindigkeitsbegrenzung" is pronounced?

LingoLens uses the Web Speech API with smart voice selection — it prioritizes natural-sounding voices:

export function playTextToSpeech(text: string, languageCode: string = 'en') {
    if (typeof window === 'undefined' || !window.speechSynthesis) return;

    const speak = () => {
        const utterance = new SpeechSynthesisUtterance(text);
        const voices = window.speechSynthesis.getVoices();
        const targetLangPrefix = languageCode.toLowerCase().split('-')[0];
        const langVoices = voices.filter(v =>
            v.lang.toLowerCase().startsWith(targetLangPrefix)
        );

        let bestVoice = langVoices.find(v =>
            v.name.includes('Google') ||
            v.name.includes('Online') ||
            v.name.includes('Natural') ||
            v.name.includes('Premium') ||
            v.name.includes('Siri')
        );

        if (!bestVoice) bestVoice = langVoices[0] || voices[0];
        if (bestVoice) {
            utterance.voice = bestVoice;
            utterance.lang = bestVoice.lang;
        } else {
            utterance.lang = languageCode;
        }

        utterance.rate = 0.9;
        window.speechSynthesis.cancel();
        window.speechSynthesis.speak(utterance);
    };

    // Chrome loads voices async — handle the race condition
    if (window.speechSynthesis.getVoices().length > 0) {
        speak();
    } else {
        window.speechSynthesis.onvoiceschanged = () => {
            speak();
            window.speechSynthesis.onvoiceschanged = null;
        };
    }
}

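Voice selection: the first voice for the target language whose name contains Google, Online, Natural, Premium, or Siri wins, in whatever order getVoices() returns them. In practice that picks the best voice on each platform: Google on Chrome, Siri on Safari, Microsoft on Edge.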
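
Note that a single find() with OR-ed conditions returns the first voice matching any keyword, so the keyword order has no effect on the result. If a strict Google > Online > Natural > Premium > Siri ranking is wanted, the keywords need to be walked in order. A minimal sketch; pickBestVoice and VoiceLike are hypothetical names, and VoiceLike only lists the two fields of SpeechSynthesisVoice the picker reads.

```typescript
interface VoiceLike { name: string; lang: string; }

const PREFERRED_KEYWORDS = ["Google", "Online", "Natural", "Premium", "Siri"];

// Hypothetical picker: an earlier keyword always beats a later one,
// regardless of the order in which getVoices() lists the voices.
function pickBestVoice<V extends VoiceLike>(voices: V[], languageCode: string): V | undefined {
    const prefix = languageCode.toLowerCase().split("-")[0];
    const langVoices = voices.filter(v => v.lang.toLowerCase().startsWith(prefix));
    for (const keyword of PREFERRED_KEYWORDS) {
        const match = langVoices.find(v => v.name.includes(keyword));
        if (match) return match;
    }
    // Same fallback as the original: any voice for the language, then any voice at all.
    return langVoices[0] ?? voices[0];
}
```

Because the function only needs name and lang, it accepts the array from window.speechSynthesis.getVoices() directly and can be unit-tested with plain objects.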


The PostMessage Protocol

The React app and iframe live in different security contexts — everything goes through window.postMessage. I ended up with 21 message types:

flowchart 8

It's basically TCP over window.postMessage. Request-response patterns, broadcasting, state sync: all over a single event listener. If I built this again, I'd create a typed BridgeClient / BridgeServer abstraction. But for a hackathon? It works.
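
A first step toward that typed contract could be a discriminated union shared by both sides. The sketch below covers only four of the message types that appear in this post, and the payload shapes are assumptions inferred from the snippets. The guard checks the type tag only, not the payload.

```typescript
// A few of the message types from this post, modeled as a discriminated union.
type BridgeMessage =
    | { type: "TRIGGER_BATCH_TRANSLATE" }
    | { type: "BATCH_TRANSLATE_REQUEST"; payload: { id: string; text: string }[] }
    | { type: "LAYOUT_ERROR_DETECTED"; id: string; errorType: "Overflow" | "Text Wrapping" }
    | { type: "THEME_COLOR_DETECTED"; color: string };

const KNOWN_TYPES: ReadonlyArray<string> = [
    "TRIGGER_BATCH_TRANSLATE",
    "BATCH_TRANSLATE_REQUEST",
    "LAYOUT_ERROR_DETECTED",
    "THEME_COLOR_DETECTED",
];

// Runtime guard for event.data: rejects anything without a known type tag.
function isBridgeMessage(data: unknown): data is BridgeMessage {
    if (typeof data !== "object" || data === null) return false;
    const type = (data as { type?: unknown }).type;
    return typeof type === "string" && KNOWN_TYPES.indexOf(type) !== -1;
}

// The compiler narrows `message` in each case, so a typo in a type string
// or a missing field becomes a compile error instead of a silent no-op.
function describeMessage(message: BridgeMessage): string {
    switch (message.type) {
        case "TRIGGER_BATCH_TRANSLATE": return "translate the visible viewport";
        case "BATCH_TRANSLATE_REQUEST": return `${message.payload.length} element(s) to translate`;
        case "LAYOUT_ERROR_DETECTED": return `${message.errorType} on ${message.id}`;
        case "THEME_COLOR_DETECTED": return `theme color ${message.color}`;
    }
}
```

In the message listener, a single isBridgeMessage(event.data) check up front also filters out unrelated postMessage traffic from the proxied page.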


The Bugs That Almost Broke Me

🐛 The Null Source Language Hang

If a website didn't have lang="" on its <html> tag, we sent sourceLanguage: null to Lingo.dev. The API hung indefinitely. Next.js killed the socket, throwing UND_ERR_SOCKET errors.

Fix: Default to "auto" if source language is null. One line saved the project.

🐛 The 50ms Layout Race Condition

Layout inspector initially measured with 0ms timeout. Measured dimensions before browser reflowed. Every translation looked "layout safe" because we were measuring the old layout.

Fix: setTimeout(fn, 50). Let the browser paint first.

🐛 The srcset Parsing Nightmare

srcset format looks like someone rolled their face on a keyboard. My first regex parser broke on URLs with commas in query strings.

Fix: Split by comma, then split each part by whitespace, take first token as URL.

🐛 The Matrix Mode Identity Crisis

With four iframes, translation responses got routed to the wrong pane. French appeared in the Japanese quadrant.

Fix: Compare event.source against all four iframe contentWindow references.

🐛 Chrome's Async Voice Loading

Chrome loads TTS voices asynchronously. Call getVoices() immediately = empty array = default English voice for everything.

Fix: Listen for onvoiceschanged, then speak(). Remove listener after first fire to avoid leaks.


The Tech Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Framework | Next.js 16 (App Router) | Server Actions, API routes, file routing |
| Language | TypeScript | Because any is not a personality type |
| Translation | Lingo.dev API | Fast, accurate, 80+ languages |
| AI | Google Gemini 1.5 Flash | Explain, summarize, simplify |
| AI Fallback | ch.at (raw HTTPS) | Meaning + Gemini fallback |
| HTML Parsing | Cheerio | Server-side DOM surgery |
| Styling | Tailwind CSS 4 + shadcn/ui | Glassmorphic UI |
| State | TanStack React Query | Async state, caching, mutations |
| Client DB | IndexedDB (via idb) | Offline page & vocab storage |
| TTS | Web Speech API | Browser TTS with voice matching |
| Animations | React Bits (Squares) | Animated background grid on landing |
| Injected Script | Vanilla JS (600+ lines) | Zero deps inside the iframe |
| Network | Node.js https | Raw HTTPS bypassing Next.js quirks |

What I Learned

  1. Translate less, not more. Viewport-only isn't just cheaper, it's faster. Users see translations in under a second instead of waiting 10+ seconds. Less work = better UX.

  2. The proxy is the product. All the AI features, Matrix Mode, Translation Panel: they're icing. The reverse proxy that makes websites load in iframes without breaking is the actual achievement. Get that wrong and nothing else matters.

  3. PostMessage needs a type system. 21+ message types, typos in strings become real bugs. Next time I'd build a shared typed contract.

  4. 50ms matters. Browser reflow timing, voice loading, iframe identity — the hardest bugs were all about timing.

  5. Polish compounds. Theme adaptation, layout badges, the scaling trick — no single detail is a selling point. But together they make people say "wait, this is actually good."

  6. Always have a fallback. Gemini goes down. APIs 502. The dual-tier fallback saved me at least twice during dev.

  7. Vanilla JS inside iframes is non-negotiable. Can't inject React into a proxied website — it might already use React, or Vue, or jQuery. Zero-dependency vanilla JS only.


Try It Yourself

git clone https://github.com/pavitra0/LingoLens.git
cd LingoLens
npm install

Create .env.local:

LINGODOTDEV_API_KEY=your_lingo_dev_key
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_key
FIRECRAWL_API_KEY=your_firecrawl_key

Then start the dev server:

npm run dev

Open http://localhost:3000. Paste any URL. Hit the ✨ Magic Wand. Watch only the visible text translate. Then switch to Matrix Mode and see it in four languages at once.

From paste to translation to Matrix Mode to JSON export.


Full Demo Flow

Built for the Lingo.dev Hackathon #2 with ❤️, viewport intersection checks, and the firm belief that translation should preserve beauty — not destroy it.

