MorningInsights
How We Built a 5x Faster NotebookLM Watermark Tool on AWS

Every product has an origin story. Ours started with a frozen browser tab.

I was working on a research presentation, had exported a 45-page PDF from Google NotebookLM, and needed the watermark off before a meeting. I loaded the free browser tool, uploaded the file, and watched my MacBook's fan spin up as the browser processed page by page. Three minutes later — with my browser tab completely frozen — I had my clean PDF.

Three minutes for something a computer should be able to do in seconds. That was the moment I decided to build something better.


The Technical Problem

The existing tools all shared the same architecture: upload file → run JavaScript in the browser → download result.

That architecture has a fundamental ceiling. JavaScript in a browser is single-threaded (Web Workers help, but you're still bound by the user's CPU cores and browser overhead). A 30-page PDF means 30 sequential canvas operations. A 60-second MP4 at 30fps means 1,800 sequential frame operations.

The data doesn't lie:

| File | Browser (sequential) | Why? |
| --- | --- | --- |
| 30-page PDF | ~3 minutes | 30 pages × ~6s/page |
| 1-minute MP4 | ~4 minutes | 1,800 frames × ~0.13s/frame |

The processing is inherently parallelizable. Each page of a PDF is independent. Each frame of a video is independent. The browser just can't parallelize them adequately.

Lambda can.


Architecture Decision: Why Lambda

I considered several server-side approaches:

Option A: Traditional server (EC2/VPS)

Pros: Simple. Cons: Scaling requires manual provisioning, cost for idle capacity, single server = single bottleneck.

Option B: Container-based (ECS/Fargate)

Pros: More control. Cons: Slower cold starts for parallel fan-out, more operational overhead.

Option C: Lambda

Pros: Instant horizontal scale, pay-per-use, no idle costs, AWS handles all infrastructure.

Cons: Cold starts, 15-minute max runtime (not an issue for our workloads), statelessness.

Lambda was the obvious choice. The key insight: Lambda's concurrency model maps perfectly to our parallelization need. For a 30-page PDF, I don't need 1 powerful server — I need 30 Lambda functions running simultaneously, each processing one page.


Building the Pipeline

Phase 1: Prototype (Days 1–3)

I started with a simple proof of concept:

```python
# Worker Lambda (Python, 2048MB)
import fitz  # PyMuPDF
import boto3

def handler(event, context):
    # Get page data from S3
    s3 = boto3.client('s3')
    page_data = s3.get_object(
        Bucket=event['bucket'],
        Key=event['page_key']
    )['Body'].read()

    # Open single-page PDF
    doc = fitz.open(stream=page_data, filetype="pdf")
    page = doc[0]

    # Detect and remove the watermark in place (Phase 2 below)
    remove_watermark(page)

    # Serialize the cleaned document and save it back to S3
    output_key = event['output_key']
    s3.put_object(Bucket=event['bucket'], Key=output_key, Body=doc.tobytes())

    return {'status': 'completed', 'output_key': output_key}
```

The prototype worked but had a problem: my watermark detection was too aggressive — it was removing some content that visually resembled the watermark pattern.

Phase 2: Watermark Detection Algorithm (Days 4–7)

NotebookLM's watermark has consistent characteristics across exports:

  • Specific frequency signature (detectable via FFT)
  • Consistent spatial positioning
  • Fixed opacity range
  • Specific text/logo pattern

I ended up combining three detection methods:

Method 1: Template matching — The watermark has a known visual pattern. Template matching finds it with ~95% accuracy.

Method 2: Frequency-domain analysis — FFT reveals repeating patterns. The watermark's repetition shows up as distinct peaks in the frequency domain.

Method 3: Statistical anomaly detection — The watermark pixels have different statistical properties than document content pixels (color distribution, gradient patterns).

Using all three with confidence weighting gave me 99%+ detection accuracy with near-zero false positives.
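The confidence weighting can be sketched roughly like this; the weights, threshold, and function names here are illustrative, not the production values:

```python
def combined_confidence(template_score: float,
                        fft_score: float,
                        stats_score: float) -> float:
    """Weighted average of the three detectors' scores (each in [0, 1])."""
    weights = (0.5, 0.3, 0.2)  # template matching is the strongest signal
    scores = (template_score, fft_score, stats_score)
    return sum(w * s for w, s in zip(weights, scores))

def is_watermark_region(template_score: float, fft_score: float,
                        stats_score: float, threshold: float = 0.85) -> bool:
    # Only erase a region when the detectors, combined, clear the threshold.
    # A high threshold trades a little recall for near-zero false positives.
    return combined_confidence(template_score, fft_score, stats_score) >= threshold
```

The key design point is that a single detector firing (say, body text that happens to resemble the template) can't clear a high threshold on its own.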

Phase 3: The Orchestration Layer (Days 8–10)

With worker Lambdas working, I needed an orchestrator to:

  1. Accept the uploaded file
  2. Split it into chunks (pages for PDF, GOP segments for MP4)
  3. Store chunks in S3
  4. Invoke N worker Lambdas simultaneously
  5. Wait for all workers to complete (via SQS completion messages)
  6. Reassemble the output
  7. Return a signed CloudFront URL
```python
# Orchestrator Lambda
import boto3
import json
import os
import uuid

BUCKET = os.environ['BUCKET']

def handler(event, context):
    job_id = str(uuid.uuid4())
    input_key = event['input_key']

    # Determine file type and split
    if input_key.endswith('.pdf'):
        chunks = split_pdf_to_pages(input_key)
    elif input_key.endswith('.mp4'):
        chunks = split_video_to_gops(input_key)
    else:
        chunks = [input_key]  # Single image

    # Dispatch workers in parallel: async "Event" invocations return
    # immediately, so N workers start nearly simultaneously
    lambda_client = boto3.client('lambda')

    for i, chunk_key in enumerate(chunks):
        output_key = f"processing/{job_id}/chunk_{i:04d}"
        lambda_client.invoke(
            FunctionName='nlms-worker',
            InvocationType='Event',
            Payload=json.dumps({
                'bucket': BUCKET,
                'page_key': chunk_key,
                'output_key': output_key,
                'job_id': job_id,
            })
        )

    # Store job state in DynamoDB
    update_job_state(job_id, 'processing', total_chunks=len(chunks))

    return {'job_id': job_id, 'total_chunks': len(chunks)}
```

The reassembly Lambda is triggered by an SQS queue that workers publish to upon completion. When all N chunks are complete (tracked in DynamoDB), it runs the assembly step.
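A minimal sketch of that completion tracking, assuming a DynamoDB table keyed by `job_id` with a `completed_chunks` attribute (the table shape and message fields are assumptions); the atomic `ADD` counter is what makes concurrent worker completions safe:

```python
import json

def on_worker_complete(message_body: str, table, assemble) -> bool:
    """SQS-triggered handler: count one finished chunk, and kick off
    reassembly when the last chunk for the job lands. `table` is a
    DynamoDB Table resource; `assemble` triggers the assembly step."""
    msg = json.loads(message_body)
    # ADD is an atomic counter update, so two workers finishing at the
    # same moment can't both read-modify-write a stale count.
    resp = table.update_item(
        Key={'job_id': msg['job_id']},
        UpdateExpression='ADD completed_chunks :one',
        ExpressionAttributeValues={':one': 1},
        ReturnValues='UPDATED_NEW',
    )
    done = resp['Attributes']['completed_chunks'] >= msg['total_chunks']
    if done:
        assemble(msg['job_id'])
    return done
```

Exactly one invocation sees the counter reach the total, so assembly fires once per job.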

Phase 4: The API Layer (Days 11–14)

I built the frontend on Next.js 14 (App Router) and the processing backend on API Gateway + Lambda.

The frontend API routes handle:

  • Authentication (JWT validation)
  • Credit deduction (DynamoDB)
  • Presigned URL generation for direct-to-S3 uploads

The heavy processing never touches Next.js — it goes straight from the browser to S3 via presigned URL, then triggers the Lambda pipeline.
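A sketch of the presigned-upload route's core, with a hypothetical `ALLOWED_TYPES` allowlist and key scheme (neither is from the original); the S3 client is passed in so the route stays testable:

```python
import uuid

# Hypothetical allowlist: reject unsupported types before issuing a URL
ALLOWED_TYPES = {'application/pdf': '.pdf', 'video/mp4': '.mp4',
                 'image/png': '.png', 'image/jpeg': '.jpg'}

def make_upload_key(content_type: str) -> str:
    """Namespace each upload under a random key so names never collide."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f'unsupported content type: {content_type}')
    return f"uploads/{uuid.uuid4()}{ALLOWED_TYPES[content_type]}"

def presign_upload(s3_client, bucket: str, content_type: str,
                   expires: int = 300) -> dict:
    key = make_upload_key(content_type)
    url = s3_client.generate_presigned_url(
        'put_object',
        Params={'Bucket': bucket, 'Key': key, 'ContentType': content_type},
        ExpiresIn=expires,  # short-lived: the browser uploads immediately
    )
    return {'url': url, 'key': key}
```

Pinning `ContentType` in the signed params means the browser's PUT must match it, which closes off uploading arbitrary file types through the URL.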


What Broke Along the Way

Problem 1: Lambda cold starts for first batch

When I deployed, the first request after idle always had a 2-3 second delay due to Lambda cold starts. Fixed with provisioned concurrency for the orchestrator Lambda (the workers are invoked enough that they warm up quickly).
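Provisioned concurrency is a one-line configuration change; a hedged example, where the function name, version qualifier, and instance count are placeholders (provisioned concurrency must target a published version or alias, not $LATEST):

```shell
# Keep 2 warm instances of the orchestrator ready at all times
aws lambda put-provisioned-concurrency-config \
  --function-name nlms-orchestrator \
  --qualifier 1 \
  --provisioned-concurrent-executions 2
```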

Problem 2: Large MP4 reassembly timing out

The reassembly Lambda was hitting the 15-minute limit on very long videos. Fixed by splitting reassembly into a tree of smaller merge steps, each running in its own invocation.
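One way to sketch the merge-tree planning; the fan-in of 8 and the function name are assumptions, not the production values:

```python
def merge_tree_levels(num_chunks: int, fan_in: int = 8) -> list:
    """Plan a k-ary merge: each level merges groups of up to `fan_in`
    chunks in separate Lambda invocations, so no single merge can run
    long enough to hit the 15-minute limit. Returns group sizes per level."""
    levels = []
    remaining = num_chunks
    while remaining > 1:
        groups = [min(fan_in, remaining - i) for i in range(0, remaining, fan_in)]
        levels.append(groups)
        remaining = len(groups)  # each group's output feeds the next level
    return levels
```

For 20 chunks with a fan-in of 8, the plan is one level of groups [8, 8, 4] followed by a final merge of the 3 intermediate outputs.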

Problem 3: Memory errors on large PDFs

100-page PDFs were causing memory issues in workers at 1024MB. Bumped worker Lambdas to 2048MB — this also improved CPU allocation (AWS allocates CPU proportionally to RAM).

Problem 4: S3 rate limiting on batch jobs

Submitting 50 files simultaneously caused S3 PUT throttling. Fixed with exponential backoff on uploads and distributing across multiple S3 prefixes.
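A sketch of the backoff wrapper, with an injectable sleep for testability; in production you would retry only on throttling errors such as S3's `SlowDown`, not on every exception as this simplified version does:

```python
import random
import time

def put_with_backoff(put_fn, max_attempts: int = 5,
                     base_delay: float = 0.2, _sleep=time.sleep):
    """Retry a throttled S3 PUT with exponential backoff plus full jitter.
    `put_fn` is a zero-arg callable wrapping the actual s3.put_object call."""
    for attempt in range(max_attempts):
        try:
            return put_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delays cap at 0.2s, 0.4s, 0.8s, ...; full jitter spreads
            # 50 simultaneous retries out instead of re-colliding them
            _sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

Full jitter matters here: without it, all 50 throttled uploads retry at the same instant and get throttled again.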


Results

After two weeks of evenings and weekends:

| Metric | Before (browser) | After (Lambda) |
| --- | --- | --- |
| 30-page PDF | 2 min 58s | 9s |
| 1-min MP4 | 4 min 18s | 23s |
| PNG/JPG | 2–3s | 3s |
| Batch (50 files) | Not supported | ~18s |
| Browser freeze | Always | Never |

The 50-file batch benchmark still impresses me — 50 PDFs that would take 90+ minutes in a browser completing in 18 seconds.


Business Results (First 6 Weeks)

  • Users: ~180 signups
  • Pro subscribers: 12 ($9.99/month)
  • MRR: $119.88
  • Infrastructure cost: ~$8/month at current scale
  • Unit margin: healthy

The engineering investment was ~80 hours over two weeks. For anyone building something similar, Lambda + S3 is a remarkably productive stack for file processing SaaS.


What's Next

  • Webhook support: In progress — let API users get notified on completion without polling
  • S3 direct integration: Let users point to their S3 bucket directly, skip the upload step
  • Higher concurrency tiers: For enterprise customers with very high volume

Frequently Asked Questions

Q: What was the most technically challenging part?

A: The watermark detection algorithm. Getting it to reliably detect NotebookLM's specific watermark without false positives required combining multiple detection methods and tuning confidence thresholds carefully.

Q: Why Next.js for the frontend and not a simpler stack?

A: Next.js App Router gives us server-side rendering for SEO, easy API routes for the lightweight API layer, and TypeScript throughout. The file processing itself is in Python Lambdas, so the "full-stack JS" concern doesn't apply to the heavy compute.

Q: How do you handle Lambda cold starts in production?

A: Provisioned concurrency on the orchestrator Lambda (the entry point). Worker Lambdas warm up naturally with traffic. At current scale, cold starts affect fewer than 2% of requests.

Q: What's the infrastructure cost at scale?

A: Lambda costs are genuinely low for this workload — approximately $0.003–0.005 per PDF processed. Even at 10,000 PDFs/month, infrastructure is under $50. S3 and CloudFront add a similar amount.
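A back-of-the-envelope check of that per-PDF figure; the GB-second price and per-page duration here are assumptions (roughly us-east-1 x86 Lambda pricing at the time of writing, and ~2 s of work per page at 2 GB):

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed Lambda compute price
MEMORY_GB = 2.0                     # worker memory (2048 MB)
SECONDS_PER_PAGE = 2.0              # assumed per-page processing time

def lambda_cost_per_pdf(pages: int) -> float:
    """Compute-only Lambda cost: pages run in parallel, but billing
    is the sum of all workers' GB-seconds, not the wall-clock time."""
    gb_seconds = pages * MEMORY_GB * SECONDS_PER_PAGE
    return gb_seconds * PRICE_PER_GB_SECOND
```

A 30-page PDF comes to 30 × 2 GB × 2 s = 120 GB-seconds, about $0.002 in compute, which lands in the quoted $0.003–0.005 range once request charges, S3, and transfer are added.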

Q: Would you use a different architecture if you started over?

A: Potentially Step Functions instead of custom orchestration for the fan-out coordination — it would simplify the state management. But Lambda + SQS works well and I understand it deeply, which matters for debugging.


Try the Result

→ NotebookLM Studio — 5x faster watermark removal, free to start

50 free credits. No credit card. Built on the architecture described above.
