The live project is available at https://web-scraper-zdoy.vercel.app/
Web scraping is one of the most powerful techniques for gathering data from the internet, whether you’re a developer, researcher, or data enthusiast. In this post, I’ll walk you through what web scraping is, why it’s useful, and how I built my own Modern Web Scraper using FastAPI (Python) for the backend and Next.js (React/TypeScript) for the frontend. I'll also share my project structure, features, deployment approach, and tips for getting started!
💡 What is Web Scraping?
Web scraping is the process of automatically extracting information from websites. Instead of copying and pasting data manually, you can use code to fetch web pages and parse out the data you need. This is widely used for:
- Market price monitoring
- News aggregation
- Research and academic data collection
- SEO analysis (meta tags, headers, keywords)
- Competitive intelligence
- Archiving and more!
Note: Always respect a website’s robots.txt
and Terms of Service. Scrape responsibly!
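Python's standard library can check robots.txt rules for you before you scrape. Here's a minimal offline sketch (the rules below are made up for illustration; in practice you'd fetch the site's real `robots.txt` first):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules — in a real scraper you would fetch
# https://example.com/robots.txt and parse that instead.
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("MyScraper/1.0", "https://example.com/"))           # True
print(parser.can_fetch("MyScraper/1.0", "https://example.com/private/x"))  # False
```

Running a check like this before each request is a cheap way to stay on the right side of a site's crawling policy.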
🛠️ Tech Stack
Backend: Python + FastAPI
- FastAPI: Fast, modern web framework for building APIs
- Requests: For making HTTP requests to target websites
- BeautifulSoup: For parsing and extracting content from HTML
- CORS Middleware: To allow frontend-backend communication
- Deployed on Railway: Simple, free deployment for Python APIs
Frontend: Next.js + React + TypeScript
- Next.js: Framework for server-rendered React apps (easy deployment, SEO-friendly)
- TypeScript: Type safety for reliability
- Tailwind CSS: Rapid UI styling
- Deployed on Vercel: The best way to host Next.js apps
📁 Project Structure
My project is split into two main sections:
/backend # FastAPI backend (main.py, requirements.txt)
/src/app # Next.js frontend (page.tsx, layout.tsx, CSS)
/public # Frontend static assets
✨ Key Features
- Scrape any public website by entering its URL
- CSS Selector support: Target specific elements (e.g. `h1`, `.class`, `#id`)
- Extract all links or images from a page
- Meta tag extraction: View meta, Open Graph, Twitter, and canonical tags
- HTTP headers viewer: Inspect the response headers of any web page
- Export results as TXT, CSV, or JSON
- Configurable: Set timeout, User-Agent, follow links (crawl depth), and more
- Modern UI: Responsive, clean, and easy to use
- Privacy-friendly: No data is stored; all processing is local or via your backend
⚙️ How Does It Work?
- Frontend: You enter a URL (and optionally a CSS selector) in the web app.
- API Call: The frontend sends your request to the FastAPI backend.
- Scraper: The backend fetches the page using `requests`, parses it with `BeautifulSoup`, and extracts the desired content, links, images, or meta tags.
- Results: The data is sent back to the frontend for display, export, or further analysis.
🚦 How to Use the Modern Web Scraper
- Enter a URL:
  Example: https://example.com
- (Optional) Add a CSS Selector:
  - `h1` for all h1 headings
  - `.product-title` for elements with class "product-title"
  - `#main` for the element with ID "main"
  - Leave blank to get the entire HTML
- Tweak the Config (Optional):
  - Set request timeout (for slow websites)
  - Change User-Agent (simulate different browsers)
  - Enable "Follow Links" to crawl linked pages
  - Enable "Include Metadata" to extract meta tags
- Scrape:
  - Click "Scrape" and see instant results in the UI
  - Use "Meta Tags" and "Headers" buttons to inspect SEO and HTTP info
- Export or Copy Results:
  - Download as TXT, CSV, or JSON
  - Copy to clipboard with one click
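The three selector forms from the steps above behave like this in `BeautifulSoup` (the sample HTML is made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment demonstrating tag, class, and ID selectors.
html = """
<div id="main">
  <h1>Welcome</h1>
  <span class="product-title">Widget</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print([el.get_text() for el in soup.select("h1")])              # ['Welcome']
print([el.get_text() for el in soup.select(".product-title")])  # ['Widget']
print(soup.select("#main")[0].name)                             # 'div'
```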
🚀 How to Deploy Your Own Version
Backend (Python/FastAPI)
- Push your `/backend` folder to GitHub
- Deploy on Railway (or Render, Heroku, Fly.io)
- Use start command: `uvicorn main:app --host 0.0.0.0 --port $PORT`
Frontend (Next.js)
- Push your code to GitHub
- Deploy on Vercel
- Set your backend API URL in `.env.local`:
  `NEXT_PUBLIC_API_URL=https://your-backend.up.railway.app`
⚠️ Limitations & Things to Know
JavaScript-heavy sites:
This scraper fetches static HTML only. If a website loads content with JavaScript (like most React/Angular sites), the scraped data may be incomplete. For scraping fully JS-rendered pages, consider using Playwright or Selenium.

Bot Protection:
Some websites block scrapers using CAPTCHAs, rate limits, or IP bans. Always scrape ethically and responsibly.
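One way to scrape more gently in the face of rate limits is to retry with backoff on 429/5xx responses. Here's a sketch using `requests` with urllib3's `Retry`; the function name, defaults, and User-Agent string are illustrative assumptions:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def polite_session(retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Build a Session that backs off and retries on rate-limit/server errors."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,  # waits backoff * 2^n seconds between retries
        status_forcelist=[429, 500, 502, 503],
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # An honest, identifiable User-Agent is part of scraping responsibly.
    session.headers["User-Agent"] = "ModernWebScraper/1.0"
    return session
```

Combined with a robots.txt check, this keeps your scraper from hammering a server that is already telling it to slow down.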
💡 Lessons Learned & Next Steps
- FastAPI + Next.js = modern, scalable, and fun to build!
- Most scraping failures are due to JavaScript-heavy sites or anti-bot protections.
- Next steps: Add Playwright support for JavaScript rendering, user authentication, and Docker for even easier deployments.
💬 Try It Yourself!
Want to see it live or check out the code?
👉 GitHub repo
Questions or feedback?
Drop a comment below or DM me on Twitter [@draken1974]