Every client project starts the same way. You open Dribbble. You type "SaaS landing page." You scroll. And scroll. And scroll.
Three hours later you have 47 tabs open, zero direction, and a client who needs mockups by tomorrow. I got sick of it, so I started saving pages locally. Nine months later I have 7,559 indexed landing pages, a taxonomy I never planned to build, and a side project that accidentally became a product.
Here's what I actually learned.
How it started: a folder of screenshots
I'm a solo dev based in Portugal, running a small software shop called CMTecnologia. Most of our work is web — SaaS dashboards, e-commerce, agency sites. The pattern was always the same: before writing any code, I needed to study what good looks like. Not Figma concepts or Behance showpieces — real, deployed, live pages.
So I wrote a simple Node script that took a URL, fired up Puppeteer, grabbed a full-page screenshot, saved the metadata (title, description, URL, category), and dropped it into a folder.
```javascript
// The original scraper was embarrassingly simple
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.setViewport({ width: 1440, height: 900 });
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });

const screenshot = await page.screenshot({ fullPage: true });
const title = await page.title();
const description = await page
  .$eval('meta[name="description"]', el => el.getAttribute('content'))
  .catch(() => '');

await saveTemplate({ url, title, description, screenshot, category });
await browser.close();
```
Nothing fancy. I ran it whenever I found a page I liked — from newsletters, Twitter threads, "best landing pages" roundups, competitor research. Over a few months it grew from 50 pages to 500, then 2,000, then suddenly I had 7,500+ and realized I was sitting on something useful.
The scraping challenges nobody warns you about
If you think scraping is just `fetch` and `cheerio`, you're in for a surprise. At scale, even a modest 7,500 pages exposed problems I didn't expect.
Deduplication is non-trivial
URLs aren't unique identifiers. The same page lives at example.com, example.com/, example.com/index.html, and www.example.com. Some pages redirected, some had query params, some had localized variants. I ended up normalizing URLs and then running perceptual hashing on the screenshots to catch visual duplicates:
```typescript
function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = '';
  // Strip common tracking params
  for (const param of ['utm_source', 'utm_medium', 'utm_campaign', 'ref']) {
    url.searchParams.delete(param);
  }
  // Lowercase only the host; paths can be case-sensitive
  url.hostname = url.hostname.toLowerCase();
  // Strip trailing slash, keeping the root path as '/'
  url.pathname = url.pathname.replace(/\/+$/, '') || '/';
  return url.toString();
}
```
Even after normalization, about 12% of my initial set turned out to be near-duplicates. Perceptual hashing (average hash on downscaled thumbnails) caught most of them, but some required manual review.
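The average-hash step is simple enough to sketch. This is the idea rather than my production code: downscale each screenshot to a small grayscale grid (Sharp does the downscaling in practice; here the grid is passed in directly so the example stays library-free), threshold each pixel against the mean brightness, and pack the bits. Two near-duplicate screenshots then differ by only a few bits of Hamming distance.

```typescript
// Average hash (aHash) over a grayscale grid (values 0-255).
// In the real pipeline the grid comes from an 8x8 downscale of
// the screenshot; any rectangular grid works for the math.
function averageHash(grid: number[][]): string {
  const pixels = grid.flat();
  const mean = pixels.reduce((a, b) => a + b, 0) / pixels.length;
  // One bit per pixel: 1 if brighter than the mean, else 0
  return pixels.map(p => (p > mean ? '1' : '0')).join('');
}

// Hamming distance between two hashes; small distance = near-duplicate
function hammingDistance(a: string, b: string): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) d++;
  }
  return d;
}
```

With 8x8 grids this gives a 64-bit hash; pairs within a handful of bits of each other almost always render the same page and went into the manual-review queue.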
Thumbnail generation at scale
Full-page screenshots are huge — 3-8 MB each. I needed thumbnails for the gallery. Running Sharp on 7,500 images sounds simple until you hit memory limits, corrupt PNGs from pages that didn't fully load, and the sheer I/O of processing gigabytes of images on a VPS with 4GB of RAM:
```typescript
import sharp from 'sharp';

async function generateThumbnail(
  screenshotBuffer: Buffer,
  outputPath: string
): Promise<void> {
  await sharp(screenshotBuffer)
    .resize(640, null, {
      fit: 'inside',
      withoutEnlargement: true,
    })
    .webp({ quality: 78 })
    .toFile(outputPath);
}
```
I processed them in batches of 50 with a concurrency limiter, and it still took about 4 hours for the full set.
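The limiter itself is only a few lines. This is a sketch of the pattern, not my exact production code: keep at most `limit` jobs in flight and have each worker pull the next job as soon as its current one settles.

```typescript
// Run async jobs with at most `limit` in flight at once
async function runBatched<T>(
  jobs: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker pulls the next job index until the queue drains.
  // JS is single-threaded, so `next++` here is race-free.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i]();
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, jobs.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

With `limit` around 50, Sharp's memory use stays bounded instead of holding thousands of multi-megabyte buffers at once.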
Some pages fight back
Cloudflare challenges, consent modals, cookie banners, interstitial ads, client-side rendered SPAs that show nothing until JavaScript runs. About 8% of URLs just didn't produce useful screenshots. I learned to detect empty/broken screenshots by checking the image entropy — pages that were mostly white or mostly a single color got flagged for manual retry.
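The entropy check is just a histogram over pixel values: a mostly-white screenshot concentrates all its mass in a few buckets, so its Shannon entropy sits near zero, while a real page lands much higher. A sketch of the idea (the threshold here is illustrative, not a tuned value from my pipeline):

```typescript
// Shannon entropy (in bits) of a grayscale pixel buffer (values 0-255).
// Blank or single-color screenshots score near 0; real pages score
// much higher, so a low score flags the URL for manual retry.
function pixelEntropy(pixels: Uint8Array): number {
  const counts = new Array(256).fill(0);
  for (let i = 0; i < pixels.length; i++) {
    counts[pixels[i]]++;
  }

  let entropy = 0;
  for (const c of counts) {
    if (c === 0) continue;
    const prob = c / pixels.length;
    entropy -= prob * Math.log2(prob);
  }
  return entropy;
}

function looksBlank(pixels: Uint8Array, threshold = 1.0): boolean {
  return pixelEntropy(pixels) < threshold;
}
```

A pure white buffer scores exactly 0 bits; a buffer uniform over all 256 gray levels scores the maximum of 8 bits, which gives a wide margin to place the cutoff.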
Curation > Volume (the real lesson)
Here's the thing nobody tells you when you're building a template library: nobody wants 8,000 templates. That sounds counterintuitive — more should be better, right?
Wrong. When I showed early versions to designer friends, the feedback was consistent:
"There's too much here. I can't find anything. Show me the 200 best SaaS pages."
This was humbling. I had spent months collecting data, and the value wasn't in the volume — it was in the selection. So I built a curation layer: I went through the collection manually (yes, all of it, over two weeks) and flagged the best ~2,200 templates. These became the "curated" set — the ones I'd actually reference for client work.
The remaining ~5,300 stay in the library for completeness, but the curated set is front and center. It's a smaller, opinionated collection versus the full archive.
Takeaway for anyone building a resource library: Your curation is the product, not your scraper. Anyone can scrape 10,000 pages. Not everyone can tell you which 2,000 are worth studying.
The taxonomy challenge: 11 categories, endless edge cases
Categorizing landing pages sounds easy until you try it. Is Notion a SaaS page or a productivity page? Is Stripe a fintech page or a developer tools page? Is a page selling an AI writing tool filed under "AI" or "SaaS"?
I settled on 11 categories after iterating through about 6 different taxonomies:
- SaaS
- AI / ML
- Fintech
- E-commerce
- Agency / Studio
- Portfolio
- Course / Education
- Event / Conference
- Mobile App
- Developer Tools
- Other
The key insight: categories should match how people search, not how products define themselves. Developers looking for inspiration type "SaaS landing page" into the search bar — they don't type "B2B horizontal platform." Keep taxonomy user-centric.
I also added keyword tags for finer-grained filtering (dark mode, minimal, bold typography, illustration-heavy, etc.), which turned out to be more useful than the categories themselves for quick browsing.
The tech stack (for the devs in the room)
The gallery app itself is built on:
- Next.js 16 with the App Router — this was actually my first production project with Next 16. The new Proxy API for middleware was genuinely useful for the paywall gating.
- Tailwind CSS v4: the new engine is noticeably faster in dev mode. No config file needed, just `@import "tailwindcss";` in your CSS.
- Supabase for auth (magic link, no passwords, higher conversion) and Postgres on a self-hosted VPS for the data layer.
- Stripe Checkout for payments.
- Zod for all input validation and API schema enforcement.
The architecture is intentionally simple. The template data lives in a static manifest file (~3.9 MB JSON). No CMS, no admin panel, no complex data pipeline. At this scale, a single JSON file served from the edge is faster than any database query.
```typescript
// Simplified: how the gallery page loads templates
import manifestData from '@/data/manifest.json';

type Template = {
  id: string;
  url: string;
  title: string;
  description: string;
  category: string;
  tags: string[];
  thumbnailPath: string;
  curated: boolean;
};

export function getTemplates(options: {
  category?: string;
  curated?: boolean;
  search?: string;
  limit?: number;
  offset?: number;
}): Template[] {
  let results = manifestData as Template[];

  if (options.curated) {
    results = results.filter(t => t.curated);
  }
  if (options.category) {
    results = results.filter(t => t.category === options.category);
  }
  if (options.search) {
    const q = options.search.toLowerCase();
    results = results.filter(t =>
      t.title.toLowerCase().includes(q) ||
      t.description.toLowerCase().includes(q) ||
      // Lowercase tags too, so "Dark Mode" matches a search for "dark"
      t.tags.some(tag => tag.toLowerCase().includes(q))
    );
  }

  return results.slice(
    options.offset ?? 0,
    (options.offset ?? 0) + (options.limit ?? 50)
  );
}
```
For the paywall, I use Supabase auth middleware. Free users see 50 templates. Pro users get the full library plus the curated filter and high-res screenshots. Stripe webhooks keep the subscription status in sync:
```typescript
// Stripe webhook handler (simplified)
export async function POST(request: Request) {
  const body = await request.text();
  const signature = request.headers.get('stripe-signature')!;

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      body,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch {
    // constructEvent throws on a bad signature; reject the payload
    return new Response('invalid signature', { status: 400 });
  }

  switch (event.type) {
    case 'checkout.session.completed': {
      const session = event.data.object;
      await supabase.from('subscriptions').upsert({
        user_id: session.metadata?.user_id,
        stripe_customer_id: session.customer as string,
        plan: session.mode === 'payment' ? 'lifetime' : 'pro',
        status: 'active',
      });
      break;
    }
    case 'customer.subscription.deleted': {
      await supabase
        .from('subscriptions')
        .update({ status: 'cancelled' })
        .eq('stripe_customer_id', event.data.object.customer as string);
      break;
    }
  }

  return new Response('ok', { status: 200 });
}
```
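On the read side, the gate reduces to a pure function over the plan stored in that subscriptions table. A sketch of that check (the function name and the minimal `Template` shape are mine; the 50-template free cap mirrors what the site actually shows):

```typescript
type Plan = 'free' | 'pro' | 'lifetime';

interface Template {
  id: string;
  curated: boolean;
}

// Free users get a capped slice of the curated set;
// paying users (pro or lifetime) see the whole library.
function visibleTemplates(
  all: Template[],
  plan: Plan,
  freeCap = 50
): Template[] {
  if (plan === 'pro' || plan === 'lifetime') return all;
  return all.filter(t => t.curated).slice(0, freeCap);
}
```

Keeping entitlement logic in a pure function like this makes it trivial to unit-test, independent of Supabase or Stripe.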
Lifetime pricing converts 3x better than subscriptions (at launch)
This was the most surprising learning. I launched with three options:
| Plan | Price |
|---|---|
| Pro Monthly | EUR 19/mo |
| Pro Annual | EUR 149/yr |
| Lifetime (first 100) | EUR 49 one-time |
My assumption was that monthly would be the most popular — low commitment, easy to cancel. I was completely wrong. The lifetime plan outsold monthly by about 3:1 in the first week.
Why? A few theories:
- Indie hackers and developers hate recurring charges. We build SaaS but we don't like paying for it. A one-time price removes the mental overhead of "is this still worth EUR 19 this month?"
- Scarcity works. "First 100 customers" creates genuine urgency. People don't want to miss the best price.
- The math is obvious. EUR 49 lifetime vs EUR 19/month — anyone who plans to use it for 3+ months does the math instantly.
Takeaway: If you're launching a resource/reference product (not a tool that requires ongoing compute), consider leading with lifetime pricing. You sacrifice LTV for velocity, but early revenue and social proof matter more than optimizing for MRR in month one.
What I'd do differently
Start with curation, not volume. If I did this again, I'd curate 500 exceptional pages first and ship that. The 7,000+ total is impressive for marketing copy but the curated subset is where all the value lives.
Build the taxonomy early. I retroactively categorized thousands of pages. If I'd established the category system at page 100, I would have saved weeks.
Don't underestimate storage and CDN costs. 7,500 high-resolution screenshots plus thumbnails is about 67 GB. That's not trivial to host. I ended up using Cloudflare Tunnel to a VPS, which keeps costs manageable, but if you're starting fresh, consider a tiered storage approach from day one.
The product
If any of this sounds useful, the library is live at templates.cmtecnologia.pt. The free gallery shows 50 curated templates — no account needed. If you want the full 7,559 with search, filters, and the curated collection, Pro is EUR 49 lifetime while the early-bird slots last.
It's a reference library, not a code marketplace. Every card links to the original live page. The idea is to replace the 3-hour Dribbble scroll with a 10-minute focused search.
Wrapping up
Building this taught me that the hardest part of a data product isn't collecting data — it's making it useful. Curation, taxonomy, and presentation matter more than raw volume. And if you're launching something for the indie/developer audience, lifetime pricing is worth testing as your primary offer.
I'm still iterating — adding new pages weekly, improving the search, and working on tag-based filtering. If you have feedback or feature requests, I'd genuinely love to hear them in the comments.
I'm Mateus, solo dev at CMTecnologia in Portugal. I build web products and occasionally write about the process. You can find me on GitHub.