Ira Rainey
A Consent Dialog Listed 1,467 Partners — So I Used AI to Unmask Them

The Moment That Started It All

Someone sent me a link to an article about a bench on Bristol Live — a local UK news site — and when I clicked on it, a consent dialog popped up. Nothing unusual there. But something made me look closer at the fine print this time. The dialog was asking me to agree to share my data with 1,467 partners.

One thousand, four hundred and sixty-seven. For a story about a bench.

Consent Dialog

I'm not saying that advertising isn't important to allow businesses to generate revenue, but this felt a bit much. Curious, I tried to find out more. I scrolled through the partner list in the dialog, clicked into individual entries, read purpose descriptions and "legitimate interest" declarations, and quickly found myself deep in a rabbit hole. Hundreds of companies I'd never heard of, vague descriptions of data processing purposes, toggles nested inside toggles. After ten minutes I was no closer to understanding what any of these companies actually did with my data, or why a local news article needed nearly fifteen hundred of them. The dialog was technically giving me information, but in practice it told me almost nothing.

That's when I thought: there has to be a better way to find out more about what's going on. And the result is an open-source application called Meddling Kids.

The Illusion of Choice

A recent BBC article titled "We have more privacy controls yet less privacy than ever" hit the nail on the head. We're surrounded by cookie banners, privacy settings, and consent dialogs — yet somehow we end up with less privacy, not more. The article cites Cisco's 2024 Consumer Privacy Survey: 89% of people say they care about data privacy, but only 38% have actually done anything about it.

And honestly, can you blame the other 62%? The consent mechanism is designed to exhaust you into clicking "Accept". The alternative is scrolling through hundreds of partner names, deciphering purposes written in legalese, and toggling individual switches — all before you can read the article you came for, about a bench. Dr Carissa Veliz, author of Privacy is Power, put it well: "Mostly, people don't feel like they have control."

As a software engineer, that felt like an itch I should at least start to scratch. I figured if I could automate the process of visiting a site, accepting its consent dialog, and then capturing exactly what happens behind the scenes — cookies dropped, scripts loaded, network requests fired, storage written — maybe I could pull the mask off what's really going on.

Enter the Meddling Kids

Meddling Kids is a Scooby-Doo inspired privacy analysis tool, because they always unmasked the villain in the end. You give it a URL, it visits the site in a real browser, detects and dismisses the consent dialog, and then captures everything it sees: cookies, scripts, network traffic, localStorage, sessionStorage, and more. It then uses AI to analyse all of that data and produce a privacy report with a deterministic score out of 100.

The tech stack is a Vue 3 + TypeScript frontend with a Python FastAPI backend. Browser automation is handled by Playwright, running in headed mode on a virtual display (Xvfb) so that ad networks don't block it for being headless. Results stream to the UI in real time via Server-Sent Events.

But the interesting part is how AI is woven into pretty much every stage of the analysis, doing what it is good at: analysing large amounts of data quickly.

AI All the Way Down

Vision Models for Consent Detection

The first challenge is detecting the consent dialog itself. These overlays vary wildly across sites — different consent management platforms, different layouts, different button labels. A brittle CSS selector approach wasn't going to cut it.

Instead, Meddling Kids takes a screenshot of the loaded page and sends it to a vision-capable LLM. The model looks at the screenshot and identifies whether an overlay is present, what type it is (consent dialog, paywall, sign-in prompt, etc.), and the exact text of the button to click. If the model is confident enough, Playwright clicks that button, and the tool captures a before-and-after comparison.
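The confidence check on the model's reply can be sketched as a small parser over structured output. This is a hypothetical shape, not the project's actual schema — the field names (`overlay_present`, `button_text`, `confidence`) and the 0.7 threshold are assumptions for illustration:

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class OverlayDecision:
    overlay_present: bool
    overlay_type: Optional[str]   # e.g. "consent_dialog", "paywall", "sign_in"
    button_text: Optional[str]    # exact label Playwright should click
    confidence: float

def parse_overlay_response(raw: str, threshold: float = 0.7) -> Optional[OverlayDecision]:
    """Parse the vision model's JSON reply. Return None if the reply is
    malformed, no overlay was found, or confidence is below threshold,
    so the caller can fall through to the next detection strategy."""
    try:
        data = json.loads(raw)
        decision = OverlayDecision(
            overlay_present=bool(data["overlay_present"]),
            overlay_type=data.get("overlay_type"),
            button_text=data.get("button_text"),
            confidence=float(data.get("confidence", 0.0)),
        )
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None
    if not decision.overlay_present or decision.confidence < threshold:
        return None
    return decision
```

Returning `None` on any failure, rather than raising, is what lets a fallback chain treat every detector uniformly.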

There's a fallback chain too: if the vision call times out or can't parse the dialog, a text-only LLM attempt runs against the page content, and if that also fails, a local regex parser takes over. No single point of failure.
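A fallback chain like that can be as simple as trying detectors in priority order and swallowing failures. A minimal sketch — the strategy names and the return-the-button-text convention are my own, not the project's API:

```python
from typing import Callable, Optional, Sequence

def detect_overlay(
    strategies: Sequence[tuple[str, Callable[[], Optional[str]]]],
) -> Optional[str]:
    """Run detection strategies in priority order. Each returns the
    button text to click, or None (or raises) on failure; the first
    success wins, so no single detector is a point of failure."""
    for name, strategy in strategies:
        try:
            result = strategy()
        except Exception:
            continue  # e.g. a vision-call timeout: fall through to the next
        if result is not None:
            return result
    return None
```

For example, a timed-out vision call simply hands control to the text LLM, and then to the regex parser.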

Structured Analysis with the Microsoft Agent Framework

Under the hood, the analysis pipeline uses the Microsoft Agent Framework to orchestrate eight specialised AI agents, each with a focused role: consent extraction, tracking analysis, script classification, cookie explanation, storage analysis, report generation, summary findings, and more. They coordinate through a concurrent pipeline with controlled parallelism.

The structured report agent, for example, generates ten report sections in parallel, while a global semaphore limits concurrent LLM calls to avoid overwhelming the endpoint. Each agent uses structured output with JSON schemas and Pydantic models, so the responses are deterministic and parseable — no fragile prompt-and-pray string parsing.
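The semaphore pattern here is standard asyncio. A minimal sketch of parallel section generation under a global cap — the function names are illustrative, and the real pipeline orchestrates this through the Microsoft Agent Framework rather than raw `asyncio.gather`:

```python
import asyncio

MAX_CONCURRENT_LLM_CALLS = 10  # global cap to avoid overwhelming the endpoint

async def generate_sections(section_names, call_llm):
    """Generate report sections in parallel, but never have more than
    MAX_CONCURRENT_LLM_CALLS model calls in flight at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_LLM_CALLS)

    async def one(name):
        async with semaphore:          # blocks if the cap is reached
            return name, await call_llm(name)

    results = await asyncio.gather(*(one(n) for n in section_names))
    return dict(results)
```

The semaphore is shared across every agent's calls, which is what makes the cap global rather than per-agent.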

The application can also log everything to file for closer analysis: operations logs, Agent Framework threads, and final privacy reports.

The Pipeline

The whole analysis runs as a six-phase streaming pipeline over SSE, so results appear in the UI as they happen rather than after a long wait:

Meddling Kids in action

  1. Navigation — Playwright opens an isolated browser context, navigates to the URL, and waits for the network to settle and content to render.
  2. Page load and access check — Detects bot protection or access denied responses and bails out early if the site blocks us.
  3. Initial data capture — Snapshots cookies, scripts, network requests, and storage before any consent interaction. This is the pre-consent baseline — anything captured here was tracking you before you clicked a thing.
  4. Overlay handling — The vision model detects overlays, Playwright clicks through them, and a consent extraction agent pulls out partner lists, purposes, and CMP details. TC and AC consent strings are decoded and vendor IDs resolved against the IAB Global Vendor List and Google's ATP provider list.
  5. Concurrent AI analysis — Three workstreams run in parallel: script grouping and classification, a structured ten-section privacy report, and a tracking risk analysis. Once the tracking analysis finishes, a summary agent distils everything into prioritised findings. A global semaphore caps concurrent LLM calls at ten to avoid hammering the endpoint.
  6. Completion — The final privacy score, report, and summary stream back to the client.
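The SSE wire format those phases stream over is simple: each update is an `event:` line plus a `data:` payload, terminated by a blank line. A minimal formatter — the phase names and payload shape here are assumptions, not the tool's actual protocol:

```python
import json

def sse_event(phase: str, payload: dict) -> str:
    """Format one pipeline update as a Server-Sent Events frame:
    an `event:` line naming the phase, a `data:` line of JSON,
    and a blank line to terminate the frame."""
    return f"event: {phase}\ndata: {json.dumps(payload)}\n\n"
```

A FastAPI streaming response can yield frames like these as each phase completes, which is why results appear in the UI incrementally.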

Making Sense of the Data

A single news site analysis can surface hundreds of cookies, dozens of scripts, and thousands of network requests. No human is going to read through all of that manually, and that's exactly the point — the consent dialogs are counting on it.

The AI doesn't work in a vacuum though. Bundled with the tool are local databases sourced from public and permissively licensed sources that provide grounding context for the analysis — a form of RAG without a vector store. These include over 19,000 known tracker domains (from Privacy Badger, AdGuard, and EasyPrivacy), nearly 500 script URL patterns, the full IAB Global Vendor List (1,111 TCF vendors), Google's ATP provider list (598 providers), cookie and storage pattern databases, CMP platform signatures, 574 partner risk profiles across eight categories, and media group profiles for 16 UK publishers. This reference data is injected into agent prompts so the LLM can match what it finds against known entities rather than guessing — and it means a large chunk of the classification is deterministic before the model even gets involved.
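Matching a request host against a tracker list typically means checking the exact domain and every parent domain, so subdomains of a known tracker still hit. A sketch of that lookup (the helper is illustrative, not the tool's actual matcher):

```python
def is_tracker(host: str, tracker_domains: set[str]) -> bool:
    """Check a request host against a known-tracker set, matching the
    exact domain or any parent domain, so `stats.g.doubleclick.net`
    hits an entry for `doubleclick.net`."""
    parts = host.lower().rstrip(".").split(".")
    # Try every suffix except the bare TLD.
    return any(".".join(parts[i:]) in tracker_domains for i in range(len(parts) - 1))
```

With 19,000+ domains in a set, each lookup is a handful of O(1) membership checks — cheap enough to run on every network request, and entirely deterministic before any LLM is involved.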

Generated report

The AI agents summarise what the tracking data actually means in plain language. They surface the risk: which cookies are from data brokers, which scripts are fingerprinting you, which network requests fire before you've even had a chance to consent. The tool also decodes IAB TCF consent strings (those opaque euconsent-v2 values) and Google's Additional Consent strings to show exactly which vendors and purposes are encoded.
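For a flavour of what decoding involves: TCF consent strings are dot-separated, base64url-encoded (unpadded) segments of bit-packed fields, and the first six bits of the core segment hold the spec version. A deliberately tiny sketch that reads just that version field — full decoding (purposes, vendor ranges vs bitfields) follows the IAB TCF v2 specification and is considerably more involved:

```python
import base64

def tcf_core_version(consent_string: str) -> int:
    """Read the version field (the first 6 bits) of the core segment
    of an IAB TCF consent string. Segments are dot-separated and
    base64url-encoded without padding, so padding is restored first."""
    core = consent_string.split(".")[0]
    padded = core + "=" * (-len(core) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return raw[0] >> 2  # top 6 bits of the first byte
```

This is also why real euconsent-v2 values start with "C": the version bits `000010` for TCF v2 map to that base64 character.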

Where possible every cookie, script, and network request is explained and attributed to the company behind it. This makes it very clear what is going on behind the scenes.

Network traffic

Perhaps most usefully for non-technical users, there's a "What You Agreed To" digest — a two to three sentence summary, written at roughly a 12-year-old reading level, explaining what clicking "Accept" actually meant. Something like: "By clicking Accept, you allowed 847 companies to track your browsing activity and share data about you, including with data brokers."

Smart Caching to Keep Costs Down

Running vision and language models isn't free, so the tool caches aggressively. Script analysis is cached by script domain, not by the site being scanned — so a Google Ads script analysed on one site is an instant cache hit when the same script appears on another. Overlay dismissal strategies are cached per domain too. In testing against a large news site, a cold run made 72 LLM script calls while subsequent warm runs made zero.
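The domain-keyed cache idea fits in a few lines. A hypothetical shape — the real tool's cache and its persistence details will differ:

```python
from urllib.parse import urlparse

class ScriptAnalysisCache:
    """Cache script analyses by the script's host, not by the site
    being scanned, so the same third-party script is only analysed
    once across all scans."""

    def __init__(self):
        self._by_host: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_analyse(self, script_url: str, analyse) -> str:
        host = urlparse(script_url).netloc
        if host in self._by_host:
            self.hits += 1
        else:
            self.misses += 1
            self._by_host[host] = analyse(script_url)  # the expensive LLM call
        return self._by_host[host]
```

Keying on the host is what turns the second sight of a common ad script into a free lookup, which is exactly the 72-calls-then-zero behaviour described above.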

Try It Yourself

The whole thing is open source under AGPL-3.0, and you can pull a pre-built Docker image from GitHub Container Registry and have it running in minutes:

```shell
docker run -p 3001:3001 \
  -e AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/ \
  -e AZURE_OPENAI_API_KEY=your-api-key \
  -e AZURE_OPENAI_DEPLOYMENT=your-deployment \
  ghcr.io/irarainey/meddlingkids:latest
```

It works with both Azure OpenAI and standard OpenAI — you just need to bring your own model with vision capabilities. I used gpt-5.2-chat for the main analysis and vision work, and gpt-5.1-codex-mini for script analysis. Point your browser at http://localhost:3001 and start unmasking.

If you prefer, you can clone the repo and run it locally with Python and Node in the devcontainer, or build the Docker image yourself with the included Docker Compose file: `docker compose up --build`.

Everything you need to get going — setup, configuration options, Docker Compose, local development — is in the README on GitHub. There is also a comprehensive developer guide explaining how it all works.

What I Learned

Building this tool confirmed what I suspected: the scale of tracking on mainstream websites is genuinely staggering. Some UK news sites drop cookies before you've even interacted with the consent dialog. Scripts from dozens of advertising, analytics, and fingerprinting vendors fire off, tracking, selling, and sharing your data: everything from the stories you read to your health and political interests, precise location, and device characteristics. If you're logged into social media on the same device, that data is often shared there automatically too, feeding the algorithms that decide what appears in your feed. The consent dialog is, in reality, an illusion of control over your data, wrapped in legal form.

Prof Alan Woodward from Surrey University, quoted in that BBC article, argues that when people assume they're constantly tracked, they self-censor, and that harms free speech and weakens democracy. It's a strong claim, but spend a few minutes watching the tracker graph light up on a typical news site and it starts to feel less academic.

I don't think the answer is purely technical. There are some great privacy tools out there you should be using, but better regulation, better enforcement, and a cultural shift around data privacy all matter more than any tool I can build. Still, as software engineers, we're in a unique position to make the invisible visible. If nothing else, Meddling Kids lets you see exactly what you're agreeing to — and maybe that's worth knowing before you click "Accept" next time.

Oh, and that Bristol Live article? When unmasked it scored 100 out of 100.
Zoinks!

Zoinks - 100 score


The source code is on GitHub:
github.com/irarainey/meddlingkids

If you find it useful, give it a star. And if you run it against your own favourite news site, I'd love to hear what you find.
