Eugen

Posted on Apr 27

We Built a Chrome Extension With Clean Architecture. Here's Why It Was Worth the Extra Effort.

#chromeextension #saas #typescript #react

Most Chrome extension tutorials show you a popup with a counter. Click a button, increment a number, done. That's fine for learning the API.

But what if your extension needs OAuth with PKCE, multi-tenant team switching, content script injection into arbitrary web pages, offline detection, session refresh via background alarms, and role-based access control?

We built a Chrome extension for PaperLink - a document sharing and analytics platform. The extension lets you create secure share links for your documents and insert them directly into any text field on any page - Gmail compose, Slack message box, Google Docs, whatever you're typing in.

We used Clean Architecture for the whole thing. Here's what that looks like in practice, and why we'd do it again.

The stack

WXT (Vite-based extension framework) - handles manifest generation, HMR, multi-browser builds
React 19 with side panel UI
Tailwind CSS v4 for styling
Vitest for testing (50+ test files)
TypeScript 6 with strict mode

Why Clean Architecture for a browser extension?

Browser extensions are deceptive. They look small, but they run across 3 execution contexts (background service worker, side panel, content scripts), talk to external APIs over HTTP, store tokens in browser.storage, inject scripts into pages you don't control, and need to handle offline/expired/deactivated states gracefully.

If you put all of that in one file, you get something that works but can't be tested, can't be refactored, and breaks every time Chrome changes an API.

We split the extension into 4 layers, plus an entrypoints directory that wires them together:

extension/src/
  domain/          # Pure types, entities, no dependencies
  application/     # Use cases + port interfaces
  infrastructure/  # Chrome APIs, HTTP client, clipboard, storage
  presentation/    # React components + hooks
  entrypoints/     # WXT entry points (background.ts, sidepanel/App.tsx)

Domain: 7 files, zero dependencies

domain/
  entities/authSession.ts
  types/document.ts
  types/link.ts
  types/result.ts
  types/team.ts
  types/teamRole.ts
  types/apiErrors.ts

The domain layer has no imports from any other layer. No chrome.*, no React, no HTTP. Just TypeScript types and one entity class.

The AuthSession entity is the most interesting piece:

export class AuthSession {
  private constructor(
    private readonly accessToken: string,
    private readonly refreshToken: string,
    private readonly expiresAt: number,
    private readonly userId: string,
    private readonly activeTeamId: string,
  ) {}

  static create(params: {
    accessToken: string;
    refreshToken: string;
    expiresIn: number;
    userId: string;
    activeTeamId: string;
  }): AuthSession {
    return new AuthSession(
      params.accessToken,
      params.refreshToken,
      Date.now() + params.expiresIn * 1000,
      params.userId,
      params.activeTeamId,
    );
  }

  isExpired(): boolean {
    return Date.now() >= this.expiresAt;
  }

  requiresRefresh(): boolean {
    return Date.now() >= this.expiresAt - 60_000;
  }

  withActiveTeamId(teamId: string): AuthSession {
    return new AuthSession(
      this.accessToken,
      this.refreshToken,
      this.expiresAt,
      this.userId,
      teamId,
    );
  }

  toStorable(): StorableSession { /* ... */ }
  static fromStorable(data: StorableSession): AuthSession { /* ... */ }
}

The 60-second refresh buffer means the background worker can catch tokens before they expire. The withActiveTeamId method returns a new instance instead of mutating - immutability matters when the same session object flows through multiple async operations.

We also defined a tiny Result type that every use case returns:

export type Result<T, E extends string = string> =
  | { ok: true; value: T }
  | { ok: false; error: E }
  | { ok: false; error: 'cancelled' };

No exceptions crossing layer boundaries. Every failure is typed and explicit.
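The use cases in this post call success(...) and error(...), but those helpers are not shown. Here is a minimal sketch of what they might look like, assuming they simply wrap the two shapes of Result. parseExpiryDays and describeResult are hypothetical names, used only to show how narrowing on ok plays out at a call site:

```typescript
type Result<T, E extends string = string> =
  | { ok: true; value: T }
  | { ok: false; error: E }
  | { ok: false; error: 'cancelled' };

// Assumed helper shapes; the real ones may differ.
function success<T>(value: T): { ok: true; value: T } {
  return { ok: true, value };
}

function error<E extends string>(code: E): { ok: false; error: E } {
  return { ok: false, error: code };
}

type ExpiryError = 'empty' | 'not_a_whole_number';

// Hypothetical validation step: turns user input into a typed result.
function parseExpiryDays(raw: string): Result<number, ExpiryError> {
  if (raw.trim() === '') {
    return error('empty');
  }
  const days = Number(raw);
  return Number.isInteger(days) ? success(days) : error('not_a_whole_number');
}

// The caller has to check `ok` before touching `value`,
// and the compiler knows every possible error code.
function describeResult(result: Result<number, ExpiryError>): string {
  if (result.ok) {
    return `expires in ${result.value} days`;
  }
  switch (result.error) {
    case 'empty':
      return 'no expiry entered';
    case 'not_a_whole_number':
      return 'expiry must be a whole number';
    case 'cancelled':
      return 'cancelled by user';
  }
}
```

Because the error codes are string literals, adding a new code to ExpiryError makes the switch incomplete, and with strict settings the compiler flags it.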

Application: 10 use cases, 6 ports

application/
  ports/
    iAuthClient.ts
    iBrowserTabs.ts
    iClipboard.ts
    iContentScriptInjector.ts
    iPaperLinkApi.ts
    iStorage.ts
  use-cases/
    copyLinkUseCase.ts
    createLinkUseCase.ts
    insertLinkUseCase.ts
    listDocumentsUseCase.ts
    listLinksUseCase.ts
    openInAppUseCase.ts
    refreshSessionUseCase.ts
    signInUseCase.ts
    signOutUseCase.ts
    switchTeamUseCase.ts

Ports are interfaces. Use cases depend on ports, never on concrete implementations. This is the core inversion.
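As a rough illustration of that inversion, here is one port with two stand-in implementations. The writeText signature is inferred from how the use cases call the clipboard. InMemoryClipboard, DeniedClipboard and shareViaClipboard are hypothetical names, not code from the extension:

```typescript
// Port: owned by the application layer (shape assumed from usage in the post).
interface IClipboard {
  writeText(text: string): Promise<void>;
}

// Stand-in for a working clipboard. A real adapter would delegate to
// navigator.clipboard.writeText and nothing else.
class InMemoryClipboard implements IClipboard {
  lastText: string | null = null;

  async writeText(text: string): Promise<void> {
    this.lastText = text;
  }
}

// Stand-in for a clipboard the browser refuses to write to.
class DeniedClipboard implements IClipboard {
  async writeText(): Promise<void> {
    throw new Error('Write permission denied');
  }
}

// Application code sees only the port type, so any implementation fits.
async function shareViaClipboard(clipboard: IClipboard, url: string): Promise<boolean> {
  try {
    await clipboard.writeText(url);
    return true;
  } catch {
    return false;
  }
}
```

Swapping the in-memory version for a browser-backed adapter changes no application code, which is the point of putting the interface on the application side.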

Here's CreateLinkUseCase - the most complex one, because it orchestrates an API call, a clipboard copy, and an optional content script insertion:

export class CreateLinkUseCase {
  constructor(
    private readonly api: IPaperLinkApi,
    private readonly clipboard: IClipboard,
    private readonly contentScriptInjector: IContentScriptInjector,
  ) {}

  async execute(params: CreateLinkParams): Promise<Result<CreateLinkResult, CreateLinkError>> {
    if (params.teamRole === TEAM_ROLE_MEMBER) {
      return error('member_not_allowed');
    }

    // ... build input from params ...

    let link;
    try {
      link = await this.api.createLink(params.accessToken, input);
    } catch (err: unknown) {
      if (err instanceof Error && err.message === API_ERROR_FORBIDDEN) {
        return error('document_not_found');
      }
      return error('create_failed');
    }

    try {
      await this.clipboard.writeText(link.url);
    } catch {
      return error('clipboard_failed');
    }

    let inserted = false;
    if (params.mode === 'copyAndInsert' && params.activeTabId != null) {
      try {
        const result = await this.contentScriptInjector.insertAtCursor(
          params.activeTabId, link.url
        );
        inserted = result.inserted;
      } catch { /* clipboard already has the URL as fallback */ }
    }

    return success({ url: link.url, inserted });
  }
}

Notice what's NOT in this use case: no chrome.scripting.executeScript, no navigator.clipboard, no fetch. It doesn't know those exist. It talks to three interfaces and returns a typed result.

When content script injection fails (restricted page, no focused field, Chrome blocking it), the URL is already on the clipboard. The user still gets their link.
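The fallback is easy to pin down in a test with fakes. The sketch below is a reduced version of the flow, not the real CreateLinkUseCase: it leaves out the API call and role check. copyAndInsert, RecordingClipboard and ThrowingInjector are hypothetical names, and the port shapes are assumed from the listing above:

```typescript
interface IClipboard {
  writeText(text: string): Promise<void>;
}

interface InsertResult {
  inserted: boolean;
  reason?: string;
}

interface IContentScriptInjector {
  insertAtCursor(tabId: number, text: string): Promise<InsertResult>;
}

type CopyAndInsertResult =
  | { ok: true; value: { url: string; inserted: boolean } }
  | { ok: false; error: 'clipboard_failed' };

// Clipboard first, insertion second: a failed insertion never loses the link.
async function copyAndInsert(
  clipboard: IClipboard,
  injector: IContentScriptInjector,
  url: string,
  activeTabId: number | null,
): Promise<CopyAndInsertResult> {
  try {
    await clipboard.writeText(url);
  } catch {
    return { ok: false, error: 'clipboard_failed' };
  }

  let inserted = false;
  if (activeTabId != null) {
    try {
      inserted = (await injector.insertAtCursor(activeTabId, url)).inserted;
    } catch {
      // Swallowed on purpose: the clipboard already holds the URL.
    }
  }
  return { ok: true, value: { url, inserted } };
}

// Fake that records every write.
class RecordingClipboard implements IClipboard {
  written: string[] = [];

  async writeText(text: string): Promise<void> {
    this.written.push(text);
  }
}

// Fake that behaves like a restricted page where injection is blocked.
class ThrowingInjector implements IContentScriptInjector {
  async insertAtCursor(): Promise<InsertResult> {
    throw new Error('Cannot access contents of the page');
  }
}
```

With these two fakes, the call still resolves to an ok result with inserted set to false, and the recorded clipboard contains the URL. No browser API is involved.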

Infrastructure: where Chrome lives

infrastructure/
  api/paperLinkApiClient.ts
  auth/chromeAuthAdapter.ts
  clipboard/navigatorClipboardAdapter.ts
  content-scripts/rangeInsertionAdapter.ts
  storage/chromeStorageAdapter.ts
  config/appConfig.ts
  i18n/i18nProvider.tsx

Each adapter implements one port. The ChromeAuthAdapter is the OAuth implementation with PKCE:

export class ChromeAuthAdapter implements IAuthClient {
  async signIn(): Promise<AuthSession> {
    const codeVerifier = generateCodeVerifier();
    const codeChallenge = await generatePkceChallenge(codeVerifier);
    const state = generateState();
    const redirectUri = `https://${browser.runtime.id}.chromiumapp.org/`;

    const authUrl = new URL(`${APP_BASE_URL}/en/auth/extension-login`);
    authUrl.searchParams.set('client_id', CLIENT_ID);
    authUrl.searchParams.set('redirect_uri', redirectUri);
    authUrl.searchParams.set('code_challenge', codeChallenge);
    authUrl.searchParams.set('code_challenge_method', 'S256');
    authUrl.searchParams.set('state', state);

    const responseUrl = await browser.identity.launchWebAuthFlow({
      url: authUrl.toString(),
      interactive: true,
    });

    // ... validate state, extract code ...

    const tokenResponse = await this.exchangeCode(code, codeVerifier, redirectUri);
    return AuthSession.create({ /* ... */ });
  }
}

PKCE in a browser extension isn't optional - extensions can't keep client secrets, because anyone can unpack the shipped code. browser.identity.launchWebAuthFlow opens a browser-managed auth window, and the extension receives the authorization code in the redirect URL.
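The listing calls generateCodeVerifier and generatePkceChallenge without showing them. Below is one way they could be written with the Web Crypto API, following the S256 method from RFC 7636. This is a sketch, not the extension's actual code; base64UrlEncode is a hypothetical helper name:

```typescript
// Base64url without padding, the encoding RFC 7636 asks for.
function base64UrlEncode(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// 32 random bytes encode to a 43-character verifier,
// the minimum length the RFC allows.
function generateCodeVerifier(): string {
  const bytes = new Uint8Array(32);
  crypto.getRandomValues(bytes);
  return base64UrlEncode(bytes);
}

// S256: challenge = BASE64URL(SHA-256(ASCII(verifier))).
async function generatePkceChallenge(codeVerifier: string): Promise<string> {
  const data = new TextEncoder().encode(codeVerifier);
  const digest = await crypto.subtle.digest('SHA-256', data);
  return base64UrlEncode(new Uint8Array(digest));
}
```

A quick sanity check is the example in Appendix B of the RFC: the verifier dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk should produce the challenge E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM.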

The content script injector is the most unusual adapter. It uses browser.scripting.executeScript to run code inside the active tab:

export class RangeInsertionAdapter implements IContentScriptInjector {
  async insertAtCursor(tabId: number, text: string): Promise<InsertResult> {
    const results = await browser.scripting.executeScript({
      target: { tabId },
      func: insertTextAtCursor,
      args: [text],
    });

    const result = results[0];
    if (result?.result === true) {
      return { inserted: true };
    }
    return { inserted: false, reason: 'no_editable_focus' };
  }
}

The injected function handles three cases: <input>, <textarea>, and contenteditable elements. It uses setRangeText for form fields and the Selection API for rich text editors. This covers Gmail, Slack, Notion, Google Docs, and most web apps.
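The injected function itself is not shown in the post. A minimal sketch of the three cases might look like the following, assuming it returns true on success (which is what the adapter checks). The early document guard is an addition so the function can be called outside a page, for example in a unit test:

```typescript
// executeScript serializes this function, so it must be self-contained:
// no references to variables or imports from the extension's modules.
function insertTextAtCursor(text: string): boolean {
  if (typeof document === 'undefined') {
    return false; // not running inside a page
  }
  const active = document.activeElement;

  // Cases 1 and 2: <input> and <textarea> share the same text-field API.
  if (active instanceof HTMLInputElement || active instanceof HTMLTextAreaElement) {
    try {
      const start = active.selectionStart ?? active.value.length;
      const end = active.selectionEnd ?? start;
      active.setRangeText(text, start, end, 'end'); // caret lands after the text
    } catch {
      return false; // input types without text selection, e.g. number
    }
    // Let framework-controlled fields notice the change.
    active.dispatchEvent(new Event('input', { bubbles: true }));
    return true;
  }

  // Case 3: contenteditable hosts, via the Selection API.
  if (active instanceof HTMLElement && active.isContentEditable) {
    const selection = window.getSelection();
    if (!selection || selection.rangeCount === 0) {
      return false;
    }
    const range = selection.getRangeAt(0);
    range.deleteContents();
    range.insertNode(document.createTextNode(text));
    range.collapse(false); // move the caret to the end of the inserted text
    selection.removeAllRanges();
    selection.addRange(range);
    return true;
  }

  return false; // nothing editable is focused
}
```

Editors that keep their own document model may ignore or overwrite direct DOM changes like this, so the clipboard fallback stays relevant even where insertion reports success.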

Background service worker: 38 lines

export default defineBackground(() => {
  const storage = new ChromeStorageAdapter();
  const authClient = new ChromeAuthAdapter();
  const refreshUseCase = new RefreshSessionUseCase(authClient, storage);

  void browser.sidePanel.setPanelBehavior({ openPanelOnActionClick: true });
  void browser.alarms.create(ALARM_NAME, { periodInMinutes: 10 });

  browser.alarms.onAlarm.addListener((alarm) => {
    if (alarm.name !== ALARM_NAME) { return; }
    void (async () => {
      const stored = await storage.get<StorableSession>(STORAGE_KEY_AUTH_SESSION);
      if (!stored) { return; }
      const session = AuthSession.fromStorable(stored);
      if (session.requiresRefresh()) {
        await refreshUseCase.execute();
      }
    })();
  });
});

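Every 10 minutes, the alarm fires. If there's a stored session that is within 60 seconds of expiry, or already past it (requiresRefresh() is true in both cases), the worker refreshes it silently. Because the alarm period is longer than the 60-second buffer, a token can lapse between two alarms; the next alarm then replaces it without any action from the user.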

This is only 38 lines because all the logic lives in RefreshSessionUseCase. The background script is just wiring.
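To make the interplay between the 10-minute alarm and the 60-second buffer concrete, the sketch below replays the two checks from AuthSession as pure functions with an explicit clock. The token lifetimes in the usage note are invented for illustration, and firstRefreshingTick is a hypothetical helper that assumes the alarm is the only thing triggering a refresh:

```typescript
const REFRESH_BUFFER_MS = 60_000; // same buffer as AuthSession.requiresRefresh
const ALARM_PERIOD_MS = 10 * 60_000; // same period as the background alarm

function isExpired(expiresAt: number, now: number): boolean {
  return now >= expiresAt;
}

function requiresRefresh(expiresAt: number, now: number): boolean {
  return now >= expiresAt - REFRESH_BUFFER_MS;
}

// Walks alarm ticks forward and returns the first one that would refresh.
function firstRefreshingTick(expiresAt: number, firstTickAt: number): number {
  let tick = firstTickAt;
  while (!requiresRefresh(expiresAt, tick)) {
    tick += ALARM_PERIOD_MS;
  }
  return tick;
}
```

With a token that expires 15 minutes after sign-in and a first alarm at minute 10, firstRefreshingTick(900_000, 600_000) returns 1_200_000: the refresh happens at minute 20, after the token has already expired. With a lifetime of 20.5 minutes, firstRefreshingTick(1_230_000, 600_000) also returns 1_200_000, this time inside the buffer and before expiry. Both paths end in a refresh; which one a session takes depends on how its expiry lines up with the alarm.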

Architecture tests guard the layers

We wrote tests that fail if any layer imports from a forbidden layer:

const FORBIDDEN_DOMAIN = ['@/application', '@/infrastructure', '@/presentation'];

describe('Domain - no upward imports', () => {
  it.each(domainFiles)('%s has no forbidden imports', file => {
    const imports = readFileImports(file);
    for (const imp of imports) {
      for (const forbidden of FORBIDDEN_DOMAIN) {
        expect(imp.startsWith(forbidden)).toBe(false);
      }
    }
  });

  it.each(domainFiles)('%s has no chrome references', file => {
    const content = readFileContent(file);
    expect(content).not.toMatch(/\bchrome\./);
  });
});

The test parses actual import statements from source files and checks them against a forbidden list per layer. Domain can't import from application, infrastructure, or presentation. Application can't import from presentation or infrastructure adapters. Presentation can't import infrastructure adapters directly.

These tests catch violations at CI time, not during code review.
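The readFileImports helper is not shown. One simple way to get import specifiers out of a file is a handful of regular expressions over the source text, sketched below. extractImports and findForbidden are hypothetical names, and the regex approach is an assumption; a stricter version would use the TypeScript compiler API, since a regex can be fooled by import-like text inside strings or comments:

```typescript
// Collects module specifiers from static imports, re-exports,
// side-effect imports and dynamic import() calls.
function extractImports(source: string): string[] {
  const patterns = [
    /\bfrom\s*['"]([^'"]+)['"]/g, // import x from '...', export * from '...'
    /\bimport\s*['"]([^'"]+)['"]/g, // import '...'
    /\bimport\(\s*['"]([^'"]+)['"]\s*\)/g, // import('...')
  ];
  const found: string[] = [];
  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      found.push(match[1]);
    }
  }
  return found;
}

// Returns every specifier that starts with one of the forbidden prefixes.
function findForbidden(source: string, forbiddenPrefixes: string[]): string[] {
  return extractImports(source).filter((specifier) =>
    forbiddenPrefixes.some((prefix) => specifier.startsWith(prefix)),
  );
}
```

A domain test could then assert that findForbidden(content, FORBIDDEN_DOMAIN) is an empty array, which gives a failure message listing the offending specifiers instead of a bare false.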

What we actually shipped

The Chrome extension has:

OAuth with PKCE and team selection (multi-tenant)
Document list with cursor-based pagination and search
Link creation with optional password and expiry
"Create & Copy" and "Create & Insert in Page" actions
View existing links per document with view counts
Role-based UI (Members see read-only view)
Offline detection with banner
Session auto-refresh via background alarms
Stale session recovery
i18n (multiple locales)
Accessibility (ARIA labels, screen reader announcements)

All of that in ~60 source files and ~50 test files.

The tradeoffs

More files. A flat extension would be maybe 10 files. We have 60. Every new feature touches 3-4 layers.

Composition root in App.tsx. WXT doesn't have a DI container, so all adapters are instantiated at the top of the entry point and passed down. It's explicit, it works, but it's verbose.
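In outline, a hand-rolled composition root is a function that takes adapters and returns wired use cases. The sketch below uses two ports whose methods appear elsewhere in this post. The use case bodies, LoadSessionUseCase, composeUseCases and the 'authSession' key are hypothetical and heavily simplified:

```typescript
interface IClipboard {
  writeText(text: string): Promise<void>;
}

interface IStorage {
  get<T>(key: string): Promise<T | undefined>;
}

// Simplified stand-ins for real use cases.
class CopyLinkUseCase {
  private readonly clipboard: IClipboard;

  constructor(clipboard: IClipboard) {
    this.clipboard = clipboard;
  }

  async execute(url: string): Promise<void> {
    await this.clipboard.writeText(url);
  }
}

class LoadSessionUseCase {
  private readonly storage: IStorage;

  constructor(storage: IStorage) {
    this.storage = storage;
  }

  async execute(): Promise<unknown> {
    return this.storage.get('authSession'); // hypothetical key
  }
}

interface UseCases {
  copyLink: CopyLinkUseCase;
  loadSession: LoadSessionUseCase;
}

// The one place where concrete adapters meet use cases.
function composeUseCases(adapters: { clipboard: IClipboard; storage: IStorage }): UseCases {
  return {
    copyLink: new CopyLinkUseCase(adapters.clipboard),
    loadSession: new LoadSessionUseCase(adapters.storage),
  };
}
```

The entry point would call this once with the Chrome-backed adapters and hand the result to the component tree, while tests call it with fakes. The verbosity is the cost of having every dependency visible in one place.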

Infrastructure adapters vary in size. The clipboard adapter is 11 lines, storage is 24. The auth adapter is 168 lines because PKCE is inherently complex, and the API client is 194 lines because it maps every endpoint. But the real business logic still lives in use cases and domain - adapters are just plumbing. Every use case can be tested with fake implementations of the ports - no browser.* mocking needed.

Chrome Web Store review: approved in 24 hours

We submitted v1 as a minimal release to get through the review process quickly. The extension was live in the Chrome Web Store within 24 hours.

From the review logs, we could see a Google tester spent about 5 minutes on the platform - signed up, created a document, shared a link, tried the extension. That was enough. The permission justifications were clear because each Chrome API (storage, identity, scripting, activeTab, alarms) maps to exactly one infrastructure adapter with a single responsibility.

We didn't get a single rejection or follow-up question. For an extension requesting scripting and activeTab permissions - which Chrome reviews carefully - that's not typical.

We've already shipped v1.1.0 with improvements, and more updates are coming. If you try the current version and it feels rough around the edges - check back in a few days. We're iterating fast.

Screenshots for the Chrome Web Store listing

The CWS listing needs 1280x800 screenshots. We took actual screenshots of the extension in use and processed them with AI image generation - cropping the extension panel, placing it on a clean white background, and adding context. This gave us polished marketing screenshots without a designer.

Would we do it again?

Yes. Three reasons:

Testing. Every use case is tested against port interfaces, not Chrome API mocks. When Chrome changes its scripting API, we change one adapter. The use case tests don't move.
The "Insert in Page" feature. This required coordinating between the side panel (React), background worker (alarms), content scripts (DOM injection), and the API. Without clear boundaries, this would have been a debugging nightmare. With ports and use cases, each piece is isolated and testable.
The review process. Clean Architecture made CWS review trivial. Each permission maps to one adapter, each adapter implements one port, each port serves specific use cases. When a reviewer asks "why do you need scripting?", the answer is one file: RangeInsertionAdapter.ts, 58 lines, used by InsertLinkUseCase to paste share links into focused text fields. That's it.

Top comments (2)

Stjepan • May 1

Interesting post, thanks for sharing! I like clean architecture, AI agents can help with verbosity.

Eugen • May 3

Thanks!)