DEV Community

gudhalarya


# 🕸️ How I Built a Modern Web Scraper with FastAPI & Next.js

Live demo: https://web-scraper-zdoy.vercel.app/

Whether you’re a developer, researcher, or data enthusiast, web scraping is one of the most powerful ways to gather data from the internet. In this post, I’ll walk you through what web scraping is, why it’s useful, and how I built my own Modern Web Scraper using FastAPI (Python) for the backend and Next.js (React/TypeScript) for the frontend. I’ll also share the project structure, features, deployment approach, and some tips for getting started!


💡 What is Web Scraping?

Web scraping is the process of automatically extracting information from websites. Instead of copying and pasting data manually, you can use code to fetch web pages and parse out the data you need. This is widely used for:

  • Market price monitoring
  • News aggregation
  • Research and academic data collection
  • SEO analysis (meta tags, headers, keywords)
  • Competitive intelligence
  • Archiving and more!

Note: Always respect a website’s robots.txt and Terms of Service. Scrape responsibly!
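Python’s standard library can handle the robots.txt check for you. Here is a minimal sketch using urllib.robotparser; is_allowed is a hypothetical helper and the rules and URLs are made-up examples, not part of the project:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "https://example.com/blog/post"))     # True
print(is_allowed(rules, "https://example.com/private/data"))  # False
```

Against a live site you would instead call parser.set_url("https://example.com/robots.txt") and parser.read() before can_fetch.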


🛠️ Tech Stack

Backend: Python + FastAPI

  • FastAPI: Fast, modern web framework for building APIs
  • Requests: For making HTTP requests to target websites
  • BeautifulSoup: For parsing and extracting content from HTML
  • CORS Middleware: To allow frontend-backend communication
  • Deployed on Railway: Simple, low-cost deployment for Python APIs

Frontend: Next.js + React + TypeScript

  • Next.js: Framework for server-rendered React apps (easy deployment, SEO-friendly)
  • TypeScript: Type safety for reliability
  • Tailwind CSS: Rapid UI styling
  • Deployed on Vercel: Made by the creators of Next.js, so deployment is nearly zero-config

📁 Project Structure

My project is split into two main sections:

/backend         # FastAPI backend (main.py, requirements.txt)
/src/app         # Next.js frontend (page.tsx, layout.tsx, CSS)
/public          # Frontend static assets

✨ Key Features

  • Scrape any public website by entering its URL
  • CSS Selector support: Target specific elements (e.g. h1, .class, #id)
  • Extract all links or images from a page
  • Meta tag extraction: View meta, Open Graph, Twitter, and canonical tags
  • HTTP headers viewer: Inspect the response headers of any web page
  • Export results as TXT, CSV, or JSON
  • Configurable: Set timeout, User-Agent, follow links (crawl depth), and more
  • Modern UI: Responsive, clean, and easy to use
  • Privacy-friendly: No data is stored; all processing is local or via your backend
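The meta tag feature boils down to reading <meta> elements and the <link rel="canonical"> element. A minimal sketch with BeautifulSoup, assuming a hypothetical extract_meta helper that returns a flat dictionary (the app’s real output format may differ):

```python
from bs4 import BeautifulSoup


def extract_meta(html: str) -> dict[str, str]:
    """Collect <meta> name/property tags plus the canonical URL into one dict."""
    soup = BeautifulSoup(html, "html.parser")
    meta: dict[str, str] = {}
    for tag in soup.find_all("meta"):
        # Standard and Twitter tags use name=, Open Graph tags use property=
        key = tag.get("name") or tag.get("property")
        content = tag.get("content")
        if key and content is not None:
            meta[key] = content
    # rel is multi-valued in HTML; bs4 matches if any of its values is "canonical"
    canonical = soup.find("link", rel="canonical")
    if canonical and canonical.get("href"):
        meta["canonical"] = canonical["href"]
    return meta


sample = """<html><head>
<meta charset="utf-8">
<meta name="description" content="Demo page">
<meta property="og:title" content="Demo">
<meta name="twitter:card" content="summary">
<link rel="canonical" href="https://example.com/demo">
</head><body></body></html>"""

print(extract_meta(sample))
```

Tags without a name or property (such as <meta charset>) are skipped, so only key/value metadata ends up in the result.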

⚙️ How Does It Work?

  1. Frontend: You enter a URL (and optionally a CSS selector) in the web app.
  2. API Call: The frontend sends your request to the FastAPI backend.
  3. Scraper: The backend fetches the page using requests, parses it with BeautifulSoup, and extracts the desired content, links, images, or meta tags.
  4. Results: The data is sent back to the frontend for display, export, or further analysis.

🚦 How to Use the Modern Web Scraper

  1. Enter a URL:

    Example: https://example.com

  2. (Optional) Add a CSS Selector:

    • h1 for all h1 headings
    • .product-title for elements with class "product-title"
    • #main for the element with ID "main"
    • Leave blank to get the entire HTML
  3. Tweak the Config (Optional):

    • Set request timeout (for slow websites)
    • Change User-Agent (simulate different browsers)
    • Enable "Follow Links" to crawl linked pages
    • Enable "Include Metadata" to extract meta tags
  4. Scrape:

    • Click "Scrape" and see instant results in the UI
    • Use "Meta Tags" and "Headers" buttons to inspect SEO and HTTP info
  5. Export or Copy Results:

    • Download as TXT, CSV, or JSON
    • Copy to clipboard with one click
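The "Follow Links" option is essentially a breadth-first crawl with a depth limit. A minimal sketch, assuming a hypothetical crawl helper that stays on the starting domain and caps the number of pages (the project’s own limits may differ). The fetch function is injected so the logic can be tested offline; in the app it would wrap something like requests.get(url, timeout=10).text:

```python
from collections import deque
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    max_depth: int = 1,
    max_pages: int = 20,
) -> dict[str, str]:
    """Breadth-first crawl of same-domain links, at most max_depth hops from start_url."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if depth == max_depth:
            continue  # fetched, but don't follow its links past the depth limit
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            # Resolve relative links and drop #fragments so duplicates collapse
            link, _ = urldefrag(urljoin(url, a["href"]))
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


# A fake three-page site so the example runs without network access
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">X</a>',
    "https://example.com/a": '<a href="/b#top">B</a>',
    "https://example.com/b": '<a href="/">home</a>',
}
print(sorted(crawl("https://example.com/", site.__getitem__, max_depth=1)))
```

The seen set prevents revisiting pages (the home link on /b), and max_pages bounds the work even on link-heavy sites.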

🚀 How to Deploy Your Own Version

Backend (Python/FastAPI)

  • Push your /backend folder to GitHub
  • Deploy on Railway (or Render, Heroku, Fly.io)
  • Set the start command to:
  uvicorn main:app --host 0.0.0.0 --port $PORT

Frontend (Next.js)

  • Push your code to GitHub
  • Deploy on Vercel
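  • Set your backend API URL in .env.local for local development, and add the same variable in your Vercel project settings for production (.env.local is git-ignored by default, so it isn’t deployed):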
  NEXT_PUBLIC_API_URL=https://your-backend.up.railway.app

⚠️ Limitations & Things to Know

  • JavaScript-heavy sites:

    This scraper fetches static HTML only. If a website renders its content with JavaScript in the browser (as client-side React or Angular apps do), that content won’t appear in the scraped results. For fully rendered pages, consider a headless browser such as Playwright or Selenium.

  • Bot Protection:

    Some websites block scrapers using CAPTCHAs, rate limits, or IP bans. Always scrape ethically and responsibly.
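For JavaScript-heavy pages, a headless browser can render the page before parsing. Here is a hedged sketch using Playwright’s synchronous Python API; it is not something the current project does (it’s listed under next steps), and fetch_rendered_html is a hypothetical helper name. It needs pip install playwright plus playwright install chromium:

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load url in headless Chromium, let its JavaScript run, and return the final DOM as HTML."""
    # Imported lazily so the rest of the backend still works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity settles, a rough proxy for "rendering finished"
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned HTML can go straight into BeautifulSoup like a normal response body. Call it from a plain def endpoint, since Playwright’s sync API refuses to run inside an active asyncio event loop.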


💡 Lessons Learned & Next Steps

  • FastAPI + Next.js = modern, scalable, and fun to build!
  • Most scraping failures are due to JavaScript-heavy sites or anti-bot protections.
  • Next steps: Add Playwright support for JavaScript rendering, user authentication, and Docker for even easier deployments.

💬 Try It Yourself!

Want to see it live or check out the code?
👉 GitHub repo


Questions or feedback?

Drop a comment below or DM me on Twitter [@draken1974]

Happy scraping! 🕷️
