This is a submission for the Bright Data Web Scraping Challenge: Scrape Data from Complex, Interactive Websites
What I Built
This project uses Bright Data to scrape product data from Amazon and display the results on the page. You can search for anything you like, and the results load on the page as long as what you search for can be found on Amazon.
Demo
The project is split across two GitHub repositories: one for the frontend and one for the backend.
- You can find the live 🔥 demo here.
- GitHub Repo Frontend
- GitHub Repo Backend
How I Used Bright Data
The project is built on Bright Data.
I used the Bright Data Scraping Browser to retrieve the data from Amazon.
import 'dotenv/config'
import { Router } from 'express';
import puppeteer from 'puppeteer-core';
import process from 'node:process';
const router = Router();
// Scraping logic using Puppeteer and Bright Data
const scrapeData = async (searchTerm) => {
  // WebSocket endpoint of the Bright Data Scraping Browser, read from the environment
  const BROWSER_WS = process.env.BROWSER_WS;
  const URL = "https://www.amazon.com";
  // Connect to the remote browser instead of launching a local one
  const browser = await puppeteer.connect({
    browserWSEndpoint: BROWSER_WS,
  });
  // ... some code here
  await browser.close();
  return products;
};
// Define the API route for scraping
router.get('/scrape', async (req, res) => {
  // ... some code here
});
export default router;
The code connects to the Bright Data Scraping Browser through puppeteer-core, scrapes the Amazon data, and returns the contents as a JSON response.
I used Express.js to create the server and API endpoint for the frontend application, which is a React and Vite setup.
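The body of the route handler is elided in the snippet above. As a rough sketch of what such a handler could look like, the example below validates the search term, calls a scraping function, and sends JSON back. The query parameter name `q` and the `createScrapeHandler` helper are assumptions made for illustration, not the project's actual code; the scraping function is injected so the handler can be tried without a live browser session.

```javascript
// Sketch of a handler for GET /api/scrape.
// Assumptions (not from the original project): the search term arrives as
// the query parameter `q`, and `scrape` is an async function such as the
// scrapeData function shown earlier.
const createScrapeHandler = (scrape) => async (req, res) => {
  const searchTerm = String(req.query?.q ?? '').trim();

  // Reject empty searches before opening a remote browser session.
  if (!searchTerm) {
    return res.status(400).json({ error: 'Missing search term' });
  }

  try {
    const products = await scrape(searchTerm);
    return res.status(200).json({ products });
  } catch (err) {
    // Scraping can fail for many reasons (timeouts, page layout changes),
    // so answer with a generic server error instead of crashing.
    return res.status(500).json({ error: 'Scraping failed' });
  }
};
```

With Express, it could be wired up as `router.get('/scrape', createScrapeHandler(scrapeData));`.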
import express from 'express';
import scrapeRouter from './index.js'; // Import the logic from index.js
import cors from 'cors';
const app = express();
// Allow requests from all origins
app.use(cors());
// Use the scrapeRouter for /api routes
app.use('/api', scrapeRouter);
// Set the port
const PORT = 4040;
// Start the server
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});
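The server above hardcodes port 4040. Hosting platforms commonly supply the port through a `PORT` environment variable, so a small helper could read it and fall back to 4040 for local development. This is only a sketch under that assumption; `resolvePort` is a hypothetical helper and not part of the original project.

```javascript
// Sketch: pick the port from the environment, falling back to 4040.
// Assumption: the host exposes the port through a PORT variable.
const resolvePort = (env, fallback = 4040) => {
  const port = Number(env.PORT);
  // Accept only valid TCP port numbers; anything else uses the fallback.
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
};

const PORT = resolvePort(process.env);
```

The resulting `PORT` can be passed to `app.listen` exactly as before.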
Tailwind CSS is used for the styling and React Icons for the icons. The rest of the stack is listed below.
Deployment
The backend Express app is deployed separately from the frontend:
- Backend deployed on Render.com
- Frontend deployed on Netlify.com
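Because the two halves are hosted on different platforms, the frontend has to know the backend's address when it requests scraped data. The sketch below shows one way that request URL could be built. The `buildScrapeUrl` helper, the `q` query parameter, and the example host are assumptions for illustration; only the `/api/scrape` path comes from the code above.

```javascript
// Sketch: build the URL the frontend requests from the separately hosted API.
// Assumptions: `apiBaseUrl` is the deployed backend origin and the search
// term is sent as a `q` query parameter; both names are hypothetical.
const buildScrapeUrl = (apiBaseUrl, searchTerm) => {
  const url = new URL('/api/scrape', apiBaseUrl);
  // searchParams handles encoding of spaces and special characters.
  url.searchParams.set('q', searchTerm.trim());
  return url.toString();
};
```

In the React app this could be combined with Axios, for example `axios.get(buildScrapeUrl(baseUrl, term))`, with `baseUrl` taken from a `VITE_`-prefixed environment variable so the local and deployed builds can point at different backends.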
Stacks Used
- React
- Vite
- Tailwind CSS
- React Icons
- Axios
- Cors
- Bright Data (for proxy and data fetching)
- Render (for API hosting)
- Dotenv (to load environment variables)
- Express (to set up the server and routes)
- nodemon (local dev)
- puppeteer-core (scraping data from Amazon)