So here's a fun one. You share a collaborative document publicly — a wiki page, a project roadmap, internal docs — and every editor's email address is silently exposed to anyone who knows where to look.
This isn't hypothetical. It just happened with Notion, where security researchers discovered that public pages leak the email addresses of every editor through the page's underlying API responses. And honestly? This is a pattern I've seen in multiple collaborative platforms over the years.
Let's talk about why this happens, how to check if you're affected, and how to actually prevent it — whether you're a developer building collaborative tools or just someone who shares docs publicly.
The Root Cause: API Responses That Say Too Much
Most collaborative editing platforms work the same way under the hood. The frontend renders a clean, pretty page. But the API response that feeds that frontend? It's usually packed with metadata the UI never displays.
Here's a simplified example of what a typical document API response might look like:
```json
{
  "page": {
    "id": "abc-123",
    "title": "Q3 Product Roadmap",
    "public": true,
    "blocks": [ "...content..." ],
    "editors": [
      {
        "id": "user_01",
        "name": "Sarah Chen",
        "email": "sarah.chen@yourcompany.com", // yikes
        "avatar": "https://...",
        "role": "admin"
      },
      {
        "id": "user_02",
        "name": "James Park",
        "email": "james.park@yourcompany.com", // double yikes
        "avatar": "https://...",
        "role": "member"
      }
    ],
    "last_edited_by": "user_01",
    "workspace": {
      "name": "Acme Corp",
      "plan": "enterprise" // now they know your org structure too
    }
  }
}
```
The UI might only show "Edited by Sarah C." but the raw API response hands over the full email, role, and workspace details. Anyone with browser dev tools open can see this. It's not even a hack — it's just reading the network tab.
This is a classic case of over-fetching at the API layer. The backend returns the full user object because it's convenient for authenticated views, and nobody thought to strip it down for public access.
How to Check If You're Exposed Right Now
Before we get into fixes, let's do some quick recon on your own public docs. This works for basically any web-based collaborative tool.
Step 1: Open your public page in an incognito window (so you're not authenticated).
Step 2: Open dev tools and check the Network tab. Filter by XHR/Fetch requests.
Step 3: Look through the JSON responses. Search for email, @, or any field that looks like PII.
```bash
# If you know the API endpoint, you can also just curl it
curl -s 'https://app.example.com/api/pages/abc-123' | \
python3 -c "
import sys, json

data = json.load(sys.stdin)

def find_emails(obj, path=''):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, str) and '@' in v and '.' in v:
                print(f'  {path}.{k}: {v}')
            find_emails(v, f'{path}.{k}')
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            find_emails(v, f'{path}[{i}]')

find_emails(data)
"
```
If that script prints anything, you've got a leak.
For Teams Using Collaborative Tools: Immediate Mitigation
If you can't control the platform's code, here's what you can do today:
- Audit every public page. Seriously. Make a list. Most platforms let admins search for publicly shared content.
- Use service accounts for public docs. Instead of editing public pages with personal accounts, create a generic publishing account. Only that account's metadata gets exposed.
- Unpublish, edit, republish through a single account. Tedious, but it works if you need a page public and can't risk email exposure.
- Use a custom domain with email aliases. If your team uses +addressing (like sarah+docs@company.com), at least you can identify which service leaked what.
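If you do go the plus-addressing route, attributing a leaked address back to a service is trivial to automate. A minimal sketch, assuming addresses follow the `name+tag@domain` convention (the function name is mine, not from any library):

```python
def leak_source(address):
    """Return the +tag from a plus-addressed email, or None if there isn't one."""
    local = address.split('@', 1)[0]  # everything before the @
    return local.split('+', 1)[1] if '+' in local else None

print(leak_source('sarah+docs@company.com'))  # docs
print(leak_source('sarah@company.com'))       # None
```

Run a harvested address list through this and you know exactly which tool to go yell at.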
But these are band-aids. The real fix is architectural.
Building It Right: API Response Shaping for Public Access
If you're building a platform with collaborative features and public sharing, here's how to not end up in a security researcher's tweet.
Use Separate Serializers for Public vs. Authenticated Responses
This is the single most important thing. Never reuse the same response shape for both contexts.
```python
# Don't do this — one serializer for everything
class UserSerializer:
    fields = ['id', 'name', 'email', 'avatar', 'role']


# Do this — separate public and private serializers
class PublicUserSerializer:
    fields = ['display_name', 'avatar_hash']  # minimal, non-identifying

class AuthenticatedUserSerializer:
    fields = ['id', 'name', 'email', 'avatar', 'role']


# In your view/controller
def get_page(request, page_id):
    page = Page.objects.get(id=page_id)
    if page.is_public and not request.user.is_authenticated:
        # Strip everything sensitive
        serializer = PageSerializer(
            page,
            user_serializer=PublicUserSerializer  # no emails, no full names
        )
    else:
        serializer = PageSerializer(
            page,
            user_serializer=AuthenticatedUserSerializer
        )
    return Response(serializer.data)
```
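If you're not on a framework with serializer classes, the same idea works as a plain allowlist filter over dicts. A framework-free sketch — the field names mirror the example above, and the allowlist contents are assumptions you'd tune for your own schema:

```python
# Allowlists per access context. Anything not listed simply never leaves the server.
PUBLIC_USER_FIELDS = {'display_name', 'avatar_hash'}
PRIVATE_USER_FIELDS = {'id', 'name', 'email', 'avatar', 'role'}

def serialize_user(user, authenticated):
    """Return only the fields permitted for this access context."""
    allowed = PRIVATE_USER_FIELDS if authenticated else PUBLIC_USER_FIELDS
    return {k: v for k, v in user.items() if k in allowed}

user = {
    'id': 'user_01',
    'name': 'Sarah Chen',
    'email': 'sarah.chen@yourcompany.com',
    'display_name': 'Sarah C.',
    'avatar_hash': 'a1b2',
}
print(serialize_user(user, authenticated=False))
# {'display_name': 'Sarah C.', 'avatar_hash': 'a1b2'}
```

The key design choice is that both paths are allowlists. A denylist ("strip `email` before sending") fails silently the day someone adds a `phone` field.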
Implement a Response Middleware That Catches Leaks
Defense in depth. Even if a developer forgets to use the right serializer, catch it at the edge.
```python
import logging
import re

logger = logging.getLogger(__name__)

EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')


class PIILeakDetectionMiddleware:
    """Scans outgoing public responses for accidental PII exposure."""

    def process_response(self, request, response):
        # Only check public/unauthenticated responses
        if request.user.is_authenticated:
            return response

        if response.get('Content-Type', '').startswith('application/json'):
            body = response.content.decode('utf-8')
            emails_found = EMAIL_PATTERN.findall(body)
            if emails_found:
                # Log it, alert on it, block it — your call
                logger.critical(
                    f"PII leak detected in public response: "
                    f"{request.path} contains {len(emails_found)} email(s)"
                )
                # In production, you might want to strip them or block the
                # response rather than just logging
        return response
```
Is regex-based email detection perfect? No. But it catches the obvious cases and it's saved me twice in production.
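To get a feel for what that pattern does and doesn't flag, here's a quick check against some response-like strings (the sample bodies are made up):

```python
import re

EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

# A leaked address embedded in a JSON body is caught
body = '{"editors": [{"email": "sarah.chen@yourcompany.com"}], "title": "Q3"}'
print(EMAIL_PATTERN.findall(body))   # ['sarah.chen@yourcompany.com']

# An @-sign without an alphabetic TLD is ignored, so version-ish strings pass
print(EMAIL_PATTERN.findall('release v2@2024.12'))  # []
```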
Add Integration Tests That Verify Public Responses Are Clean
This should be part of your CI pipeline:
```python
import json
import re


def test_public_page_does_not_leak_emails(client, public_page):
    """Ensure unauthenticated page views contain zero email addresses."""
    response = client.get(f'/api/pages/{public_page.id}')
    body = response.json()
    body_str = json.dumps(body)

    # No email addresses anywhere in the response
    assert not re.search(
        r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        body_str
    ), f"Public page response contains email addresses: {body_str[:200]}"

    # No internal user IDs either
    assert 'user_id' not in body_str
    assert 'workspace_id' not in body_str
```
The Bigger Principle: Treat Public Endpoints Like External APIs
Every time you flip a page from private to public, the access model changes completely. But most platforms treat it as a visibility toggle when it should be treated as an entirely different API surface.
I think about it this way: a public page should return roughly the same data you'd put in an RSS feed or a static HTML export. If you wouldn't put someone's email in an HTML meta tag, don't put it in a public API response.
A quick mental checklist before shipping any "make it public" feature:
- What metadata travels with the content? Editor info, timestamps, comments, revision history — all of it.
- Are user objects fully expanded or just referenced by opaque ID?
- Does the response change shape based on authentication? It should.
- Is there a test that fetches this endpoint unauthenticated and checks for PII?
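On the "opaque ID" point in that checklist: one way to reference users publicly without exposing internal IDs — an illustration, not the only approach — is to derive a stable token with an HMAC, so the same user always gets the same public reference but nobody can map it back:

```python
import hashlib
import hmac

# Hypothetical server-side key; rotate it and all public refs change.
SECRET = b'server-side-secret'

def opaque_user_ref(internal_id):
    """Derive a stable public reference that can't be reversed to the internal ID."""
    digest = hmac.new(SECRET, internal_id.encode(), hashlib.sha256).hexdigest()
    return digest[:16]
```

Same input, same token, so "Edited by" attribution still works across requests, but the token reveals nothing about your user table.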
Prevention Going Forward
Metadata leakage is one of those bugs that never shows up in functional testing because the feature technically works fine. The page loads, the content displays, everyone's happy — until someone opens the network tab.
The fix isn't complicated. It's just discipline:
- Default to minimal data in API responses. Add fields explicitly, don't strip them reactively.
- Separate your serialization layers by access context.
- Test unauthenticated endpoints the same way you'd test a public API.
- Run automated PII scanning on public response paths.
And if you're using a collaborative tool right now for public-facing docs, go check those network responses. You might be surprised what's in there.