The live project is available at https://web-scraper-zdoy.vercel.app/
Web scraping is one of the most powerful techniques for gathering data from the internet, whether you’re a developer, researcher, or data enthusiast. In this post, I’ll walk you through what web scraping is, why it’s useful, and how I built my own Modern Web Scraper using FastAPI (Python) for the backend and Next.js (React/TypeScript) for the frontend. I'll also share my project structure, features, deployment approach, and tips for getting started!
💡 What is Web Scraping?
Web scraping is the process of automatically extracting information from websites. Instead of copying and pasting data manually, you can use code to fetch web pages and parse out the data you need. This is widely used for:
- Market price monitoring
- News aggregation
- Research and academic data collection
- SEO analysis (meta tags, headers, keywords)
- Competitive intelligence
- Archiving and more!
Note: Always respect a website’s robots.txt
and Terms of Service. Scrape responsibly!
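Python's standard library can check robots.txt rules for you before you scrape. Here's a minimal offline sketch (the rules below are made up for illustration; in practice you'd fetch the site's real `robots.txt` first):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules — in a real scraper you would fetch
# https://example.com/robots.txt and parse that instead.
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("MyScraper/1.0", "https://example.com/"))           # True
print(parser.can_fetch("MyScraper/1.0", "https://example.com/private/x"))  # False
```

Running a check like this before each request is a cheap way to stay on the right side of a site's crawling policy.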
🛠️ Tech Stack
Backend: Python + FastAPI
- FastAPI: Fast, modern web framework for building APIs
- Requests: For making HTTP requests to target websites
- BeautifulSoup: For parsing and extracting content from HTML
- CORS Middleware: To allow frontend-backend communication
- Deployed on Railway: Simple, free deployment for Python APIs
Frontend: Next.js + React + TypeScript
- Next.js: Framework for server-rendered React apps (easy deployment, SEO-friendly)
- TypeScript: Type safety for reliability
- Tailwind CSS: Rapid UI styling
- Deployed on Vercel: The best way to host Next.js apps
📁 Project Structure
My project is split into two main sections:
/backend # FastAPI backend (main.py, requirements.txt)
/src/app # Next.js frontend (page.tsx, layout.tsx, CSS)
/public # Frontend static assets
✨ Key Features
- Scrape any public website by entering its URL
- CSS Selector support: Target specific elements (e.g. `h1`, `.class`, `#id`)
- Extract all links or images from a page
- Meta tag extraction: View meta, Open Graph, Twitter, and canonical tags
- HTTP headers viewer: Inspect the response headers of any web page
- Export results as TXT, CSV, or JSON
- Configurable: Set timeout, User-Agent, follow links (crawl depth), and more
- Modern UI: Responsive, clean, and easy to use
- Privacy-friendly: No data is stored; all processing is local or via your backend
⚙️ How Does It Work?
- Frontend: You enter a URL (and optionally a CSS selector) in the web app.
- API Call: The frontend sends your request to the FastAPI backend.
- Scraper: The backend fetches the page using `requests`, parses it with `BeautifulSoup`, and extracts the desired content, links, images, or meta tags.
- Results: The data is sent back to the frontend for display, export, or further analysis.
🚦 How to Use the Modern Web Scraper
- Enter a URL:
  Example: https://example.com
- (Optional) Add a CSS Selector:
  - `h1` for all h1 headings
  - `.product-title` for elements with class "product-title"
  - `#main` for the element with ID "main"
  - Leave blank to get the entire HTML
- Tweak the Config (Optional):
  - Set request timeout (for slow websites)
  - Change User-Agent (simulate different browsers)
  - Enable "Follow Links" to crawl linked pages
  - Enable "Include Metadata" to extract meta tags
- Scrape:
  - Click "Scrape" and see instant results in the UI
  - Use "Meta Tags" and "Headers" buttons to inspect SEO and HTTP info
- Export or Copy Results:
  - Download as TXT, CSV, or JSON
  - Copy to clipboard with one click
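The three selector forms from the steps above behave like this in `BeautifulSoup` (the sample HTML is made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment demonstrating tag, class, and ID selectors.
html = """
<div id="main">
  <h1>Welcome</h1>
  <span class="product-title">Widget</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print([el.get_text() for el in soup.select("h1")])              # ['Welcome']
print([el.get_text() for el in soup.select(".product-title")])  # ['Widget']
print(soup.select("#main")[0].name)                             # 'div'
```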
🚀 How to Deploy Your Own Version
Backend (Python/FastAPI)
- Push your `/backend` folder to GitHub
- Deploy on Railway (or Render, Heroku, Fly.io)
- Use start command: `uvicorn main:app --host 0.0.0.0 --port $PORT`
Frontend (Next.js)
- Push your code to GitHub
- Deploy on Vercel
- Set your backend API URL in `.env.local`:
  `NEXT_PUBLIC_API_URL=https://your-backend.up.railway.app`
⚠️ Limitations & Things to Know
JavaScript-heavy sites:
This scraper fetches static HTML only. If a website loads content with JavaScript (like most React/Angular sites), the scraped data may be incomplete. For scraping fully JS-rendered pages, consider using Playwright or Selenium.

Bot Protection:
Some websites block scrapers using CAPTCHAs, rate limits, or IP bans. Always scrape ethically and responsibly.
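One way to scrape more gently in the face of rate limits is to retry with backoff on 429/5xx responses. Here's a sketch using `requests` with urllib3's `Retry`; the function name, defaults, and User-Agent string are illustrative assumptions:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def polite_session(retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Build a Session that backs off and retries on rate-limit/server errors."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,  # waits backoff * 2^n seconds between retries
        status_forcelist=[429, 500, 502, 503],
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # An honest, identifiable User-Agent is part of scraping responsibly.
    session.headers["User-Agent"] = "ModernWebScraper/1.0"
    return session
```

Combined with a robots.txt check, this keeps your scraper from hammering a server that is already telling it to slow down.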
💡 Lessons Learned & Next Steps
- FastAPI + Next.js = modern, scalable, and fun to build!
- Most scraping failures are due to JavaScript-heavy sites or anti-bot protections.
- Next steps: Add Playwright support for JavaScript rendering, user authentication, and Docker for even easier deployments.
💬 Try It Yourself!
Want to see it live or check out the code?
👉 GitHub repo
Questions or feedback?
Drop a comment below or DM me on Twitter [@draken1974]