
holistis

Posted on • Originally published at longevityai.nl

I Added Legal Compliance Checks to My E2E Test Suite for €0. Here's How (and Why a Non-Developer Had To).

Originally published on longevityai.nl — for full context, comments and related articles, visit the source.

I am not a developer. I run a health AI platform in the Netherlands called Longevity AI. It generates personalized health reports using LLMs, auto-publishes blog posts about longevity, and operates under Dutch medical regulation: the IGJ (Health and Youth Care Inspectorate), Wet BIG, and GDPR.

That last part is the part most builders skip. I cannot.

If my platform publishes content that says "this supplement cures your condition" or "stop taking your medication," I am not looking at bad SEO. I am looking at an enforcement action by the IGJ. The Dutch health regulator has shut down platforms for exactly this kind of language. I have a file in my codebase called risks.md specifically about this.

Today I shipped something I think is useful for anyone building in a regulated space: a compliance scanner baked directly into my Playwright E2E test suite. It runs on every deploy. It costs €0. It uses no LLM.

Here is what it does and why I built it this way.

The problem: AI-generated content drifts toward prohibited language

My platform runs a nightly pipeline called Autopilot News Radar. It pulls from BBC Health, PubMed, EFSA, and ClinicalTrials, then generates three-language blog posts (NL/EN/FR) automatically. The posts go through a server-side compliance scanner before they are saved.

That scanner works. It has caught violations before they went live.

But I had a gap: no automated check was running against the actual live site. The server filter could pass a post, and some edge case in rendering, caching, or a manual publish could still get prohibited language onto the page. I would only discover that when a user flagged it -- or worse, a regulator did.

The solution: compliance.spec.ts

I added four Playwright tests that fetch real pages from the live site and run the same regex patterns the server uses.

// tools/muraqib/tests/compliance.spec.ts (simplified)
import { test, expect } from "@playwright/test";

// HIGH-severity patterns (Dutch): cure claims and advice to stop or replace medication.
const HIGH_SEVERITY_PATTERNS = [
  { ruleId: "no-cure-claim",           pattern: /\bgeneest\s+(gegarandeerd|altijd|100%|volledig)/i },
  { ruleId: "no-cure-claim",           pattern: /\bwondermiddel\b/i },
  { ruleId: "no-cure-claim",           pattern: /\b100%\s+effectief\b/i },
  { ruleId: "no-behandel-imperatief",  pattern: /stop\s+met\s+(je|jouw|de)\s+(medicatie|medicijn)/i },
  { ruleId: "no-behandel-imperatief",  pattern: /\bdit\s+vervangt\s+(medicatie|behandeling)/i },
];

// Test name in English: "no HIGH compliance violations in live blogs"
test("geen HIGH compliance-schendingen in live blogs", async ({ page }) => {
  await page.goto("/blog");   // relative path, resolved against baseURL in the Playwright config
  await waitForReady(page);   // project helper, not shown in this simplified excerpt

  // Collect the absolute URLs of every blog link on the overview page.
  const links = await page.locator('a[href*="/blog/"]').evaluateAll((els) =>
    els.map((el) => (el as HTMLAnchorElement).href).filter(Boolean)
  );

  const violations: string[] = [];

  // De-duplicate and scan at most 20 posts per run.
  for (const url of [...new Set(links)].slice(0, 20)) {
    await page.goto(url);
    const text = await page.evaluate(() => document.querySelector("article")?.textContent ?? "");

    // Record the first match of each pattern, with the rule ID and the page URL.
    for (const { ruleId, pattern } of HIGH_SEVERITY_PATTERNS) {
      const match = text.match(pattern);
      if (match) violations.push(`[${ruleId}] "${match[0]}" op ${url}`);
    }
  }

  // Message in English: "HIGH compliance violation(s) found on live site"
  expect(violations, "HIGH compliance-schending(en) gevonden op live site").toHaveLength(0);
});

No API calls. No LLM. A regex scan on the live rendered HTML, run by a real browser.

Why regex and not an LLM?

Because I need determinism and zero cost.

An LLM-based compliance check would:

  • Cost money on every deploy (I run 130 tests per deploy, nightly)
  • Be non-deterministic (same text, different answer on reruns)
  • Be a black box ("the AI said it was fine" is not an audit trail)

The patterns I check are not subtle. The IGJ is not looking for nuanced phrasing. They are looking for things like "genezingsclaims" (cure claims) and advice to stop medication. Regex handles this precisely and cheaply.
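The determinism argument can be illustrated with a small pure function. This is a minimal sketch, not the author's scanner: the patterns are copied from the excerpt above, while `scanText`, `Rule`, `Violation`, and `DEMO_RULES` are names assumed here for illustration.

```typescript
// Sketch only: scanText, Rule, Violation and DEMO_RULES are illustrative names.
type Rule = { ruleId: string; pattern: RegExp };
type Violation = { ruleId: string; match: string; index: number };

const DEMO_RULES: Rule[] = [
  { ruleId: "no-cure-claim", pattern: /\bwondermiddel\b/i },
  { ruleId: "no-cure-claim", pattern: /\b100%\s+effectief\b/i },
  { ruleId: "no-behandel-imperatief", pattern: /stop\s+met\s+(je|jouw|de)\s+(medicatie|medicijn)/i },
];

// Returns the first match of every rule, with its position as an audit trail.
function scanText(text: string, rules: Rule[] = DEMO_RULES): Violation[] {
  const found: Violation[] = [];
  for (const { ruleId, pattern } of rules) {
    const m = pattern.exec(text); // no "g" flag, so exec keeps no state between calls
    if (m) found.push({ ruleId, match: m[0], index: m.index });
  }
  return found;
}

// Same input, same output, on every run.
const sample = "Dit wondermiddel is 100% effectief.";
const first = JSON.stringify(scanText(sample));
const second = JSON.stringify(scanText(sample));
console.log(first === second); // true
```

Each result carries the rule ID, the matched text, and its offset, which is the kind of record an LLM verdict does not provide by default.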

The double-layer architecture

[AI generates content]
        |
        v
[server/compliance/scanner.ts]      blocks HIGH violations before save
        |
        v
[database -- rendered on site]
        |
        v
[tools/muraqib/tests/compliance.spec.ts]   E2E test on live page, every deploy

If the first layer fails (bug in server, manual override, edge case), the second layer catches it before the deploy completes. If the second layer catches something, GitHub Actions fails and I get notified.
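For comparison, the first layer could look roughly like the sketch below: a gate that refuses to call the save function when a HIGH pattern matches. The real server/compliance/scanner.ts is not shown in this article, so `Post`, `SaveResult`, `HIGH_PATTERNS`, and `guardBeforeSave` are all assumed names, and the save callback stands in for whatever persistence the platform uses.

```typescript
// Sketch only: not the contents of server/compliance/scanner.ts.
type Post = { slug: string; body: string };
type SaveResult = { saved: true } | { saved: false; ruleIds: string[] };

const HIGH_PATTERNS: { ruleId: string; pattern: RegExp }[] = [
  { ruleId: "no-cure-claim", pattern: /\bwondermiddel\b/i },
  { ruleId: "no-behandel-imperatief", pattern: /\bdit\s+vervangt\s+(medicatie|behandeling)/i },
];

// Calls save() only when no HIGH pattern matches the post body.
function guardBeforeSave(post: Post, save: (p: Post) => void): SaveResult {
  const ruleIds: string[] = [];
  for (const { ruleId, pattern } of HIGH_PATTERNS) {
    if (pattern.test(post.body) && ruleIds.indexOf(ruleId) === -1) {
      ruleIds.push(ruleId);
    }
  }
  if (ruleIds.length > 0) {
    return { saved: false, ruleIds }; // blocked: nothing reaches the database
  }
  save(post);
  return { saved: true };
}
```

Keeping the pattern list in one module that both this gate and the E2E spec import would be one way to keep the two layers checking the same rules.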

The four compliance tests

The suite runs four tests in total, including the HIGH-severity check shown above:

  1. Blog overview loads -- sanity check that /blog renders with at least one article
  2. No HIGH violations -- cure claims and medication-stop advice on live pages
  3. No em-dashes in public text -- house style rule enforced automatically
  4. Affiliate disclosure present -- blogs with affiliate links must contain a disclosure sentence

That last one is a legal requirement in the Netherlands too. If I link to a supplement with an affiliate ref, I need to disclose it. The test checks that the disclosure text is present on any page that contains an affiliate link.
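Tests 3 and 4 reduce to two small checks on the article content. The sketch below is an assumption about how they might be written: it treats a link as an affiliate link when its href carries a `ref=` query parameter or its rel attribute contains `sponsored`, and it treats any sentence containing the word "affiliate" as the disclosure. The article does not state the actual detection rules or the disclosure wording, so both are placeholders.

```typescript
// Sketch only: the affiliate-link and disclosure patterns are placeholders.
const EM_DASH = /\u2014/;
const AFFILIATE_LINK =
  /<a\b[^>]*(?:href="[^"]*[?&]ref=[^"]*"|rel="[^"]*\bsponsored\b[^"]*")[^>]*>/i;
const DISCLOSURE = /\baffiliate\b/i;

// Test 3: house style forbids em-dashes in public text.
function hasEmDash(articleText: string): boolean {
  return EM_DASH.test(articleText);
}

// Test 4: a page with an affiliate link must also contain a disclosure sentence.
function missingDisclosure(articleHtml: string, articleText: string): boolean {
  return AFFILIATE_LINK.test(articleHtml) && !DISCLOSURE.test(articleText);
}
```

In a Playwright test, `articleHtml` and `articleText` could come from the `innerHTML` and `textContent` of the article element, and the test would fail when either function returns true.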

What it does NOT catch

I want to be honest about the limits.

MEDIUM-severity violations are manual. Vague implied efficacy claims are contextual. My server scanner flags them and logs them. A human (me) reviews the log weekly. Automated blocking on MEDIUM creates too many false positives.
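The split between blocking and logging can be sketched as a triage step. The MEDIUM pattern below ("bewezen effectief", roughly "proven effective") is a hypothetical example and not one of the author's rules; `SeverityRule`, `Triage`, `SEVERITY_RULES`, and `triage` are likewise names assumed for this illustration.

```typescript
// Sketch only: the MEDIUM rule is a hypothetical placeholder.
type Severity = "HIGH" | "MEDIUM";
type SeverityRule = { ruleId: string; severity: Severity; pattern: RegExp };
type Triage = { blocked: boolean; reviewLog: string[] };

const SEVERITY_RULES: SeverityRule[] = [
  { ruleId: "no-cure-claim", severity: "HIGH", pattern: /\bwondermiddel\b/i },
  { ruleId: "implied-efficacy", severity: "MEDIUM", pattern: /\bbewezen\s+effectief\b/i },
];

// HIGH matches block the post; MEDIUM matches are only queued for human review.
function triage(text: string, rules: SeverityRule[] = SEVERITY_RULES): Triage {
  const result: Triage = { blocked: false, reviewLog: [] };
  for (const rule of rules) {
    const m = rule.pattern.exec(text);
    if (!m) continue;
    if (rule.severity === "HIGH") {
      result.blocked = true;
    } else {
      result.reviewLog.push(`[${rule.ruleId}] "${m[0]}"`);
    }
  }
  return result;
}
```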

It does not understand context. "Stop je medicatie" in a quote about why you should not stop medication would still trigger the pattern. I have handled this by writing content that avoids the phrase entirely.
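The false positive is easy to reproduce with the pattern from the spec. The two Dutch sentences below are made-up examples: the first warns against telling someone to stop, the second is the advice the rule is meant to catch, and the regex cannot tell them apart.

```typescript
// Pattern copied from the spec; the sample sentences are invented for illustration.
const STOP_MEDICATION = /stop\s+met\s+(je|jouw|de)\s+(medicatie|medicijn)/i;

// "Never say 'stop your medication' without consulting a doctor."
const warningSentence = 'Zeg nooit "stop met je medicatie" zonder overleg met een arts.';
// "Stop your medication once you feel better."
const adviceSentence = "Stop met je medicatie als je je beter voelt.";

console.log(STOP_MEDICATION.test(warningSentence)); // true (false positive)
console.log(STOP_MEDICATION.test(adviceSentence));  // true (intended catch)
```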

It is not a legal opinion. These patterns are based on documented compliance research. They are not a substitute for a lawyer reviewing the content.

Also shipped today: AI Coach tests

Alongside the compliance spec, I also added coach.spec.ts -- eight tests for the AI Coach feature, covering:

  • Login gate: unauthenticated users see a CTA, not the chat interface
  • Sending a message and receiving a response within timeout
  • Enter-key submission (this broke silently once before)
  • Disabled send button on empty input (prevents empty API calls)
  • Feedback buttons (thumbs up/down) working correctly
  • Back navigation to reports page
  • Zero JavaScript console errors during a full conversation

That last one is the test I trust most. A zero-console-error check on a full user flow catches a class of bugs no unit test finds.
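A zero-console-error check usually comes down to attaching listeners before the flow starts and asserting on what they collected afterwards. The helper below is a sketch under that assumption, not the code in coach.spec.ts. It declares minimal structural types (`PageLike`, `ConsoleMessageLike`, both invented here) so it compiles without Playwright installed; they mirror Playwright's `page.on("console", ...)` and `page.on("pageerror", ...)` events, and a real `Page` object is expected to satisfy them.

```typescript
// Sketch only: PageLike and ConsoleMessageLike are stand-ins for Playwright's types.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}

interface PageLike {
  on(event: "console", handler: (msg: ConsoleMessageLike) => void): unknown;
  on(event: "pageerror", handler: (err: Error) => void): unknown;
}

// Attach before navigating; the returned array fills up as the flow runs.
function collectConsoleErrors(page: PageLike): string[] {
  const errors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") {
      errors.push(msg.text()); // console.error(...) calls
    }
  });
  page.on("pageerror", (err) => {
    errors.push(err.message); // uncaught exceptions in the page
  });
  return errors;
}
```

In a test, the helper would be called first, the full conversation would run, and the final assertion would be that the array is still empty.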

Where Muraqib stands today

130 tests across: SEO, sitemap, robots.txt, blog rendering, anamnesis flow, lab analysis, onboarding, AI Coach, and compliance. The suite runs on GitHub Actions using Playwright 1.60.0 with the native Healer active -- it auto-repairs locators that break when the DOM changes.

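|                     | Before today | After today  |
|---------------------|--------------|--------------|
| Total tests         | 118          | 130          |
| Playwright version  | 1.48         | 1.60.0       |
| Compliance coverage | Server only  | Server + E2E |
| AI Coach coverage   | None         | 8 tests      |
| Healer              | Not active   | Active in CI |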

Every deploy runs all 130. A green check means a real browser walked through the platform and found nothing wrong. A red check means something broke before any user sees it.

Why ownership matters here

There are SaaS compliance monitoring tools. All of them put my legal patterns in a vendor's system, behind a subscription.

My patterns live in tools/muraqib/tests/compliance.spec.ts. They are in git. They have commit history. When the IGJ updates their enforcement guidance, I add a line and commit it. The next deploy runs the updated check.
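One way to make "add a line and commit it" safer is to store a few example sentences next to each pattern and check them on every run. This is a suggestion rather than something the article describes: the two patterns are taken from the spec, while the fixture sentences and the names `Fixture`, `TestedRule`, `RULES_WITH_FIXTURES`, and `failingFixtures` are invented for the sketch.

```typescript
// Sketch only: fixture sentences and all names here are illustrative.
type Fixture = { text: string; shouldMatch: boolean };
type TestedRule = { ruleId: string; pattern: RegExp; fixtures: Fixture[] };

const RULES_WITH_FIXTURES: TestedRule[] = [
  {
    ruleId: "no-cure-claim",
    pattern: /\bgeneest\s+(gegarandeerd|altijd|100%|volledig)/i,
    fixtures: [
      { text: "Dit supplement geneest gegarandeerd.", shouldMatch: true },
      { text: "Geen enkel supplement geneest alles.", shouldMatch: false },
    ],
  },
  {
    ruleId: "no-behandel-imperatief",
    pattern: /\bdit\s+vervangt\s+(medicatie|behandeling)/i,
    fixtures: [
      { text: "Dit vervangt medicatie volledig.", shouldMatch: true },
      { text: "Dit vervangt geen medische behandeling.", shouldMatch: false },
    ],
  },
];

// Returns one message per fixture whose outcome differs from what was expected.
function failingFixtures(rules: TestedRule[]): string[] {
  const failures: string[] = [];
  for (const rule of rules) {
    for (const fx of rule.fixtures) {
      if (rule.pattern.test(fx.text) !== fx.shouldMatch) {
        failures.push(`[${rule.ruleId}] unexpected result for "${fx.text}"`);
      }
    }
  }
  return failures;
}
```

A new rule then arrives in the same commit as its own positive and negative examples, so the git history shows both the pattern and the evidence that it behaves as intended.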

That is the practical argument for owning your test infrastructure: the rules that matter most to your business should live where you can see them, change them, and audit them.


Longevity AI runs at longevityai.nl. If you are building a health SaaS in a regulated industry and want to compare notes, reach out via the site.


