DEV Community

Alex Anie


Scrape Unscrapeable Amazon Dataset with BrightData, React.js and Node.js

This is a submission for the Bright Data Web Scraping Challenge: Scrape Data from Complex, Interactive Websites

What I Built

This project uses Bright Data to scrape product data from Amazon and display the results on the page. You can search for anything you like, and the results will load on the page as long as what you searched for can be found on Amazon.

Demo
The project uses two separate GitHub repositories: one for the frontend and one for the backend.


How I Used Bright Data

The project is built on Bright Data.

I used the Bright Data Scraping Browser to retrieve the dataset from Amazon.

import 'dotenv/config'
import { Router } from 'express';
import puppeteer from 'puppeteer-core';
import process from 'node:process';

const router = Router();

// Scraping logic using Puppeteer and BrightData
const scrapeData = async (searchTerm) => {
  const BROWSER_WS = process.env.BROWSER_WS; // Bright Data Scraping Browser WebSocket endpoint (with credentials), loaded from .env
  const URL = "https://www.amazon.com";

  const browser = await puppeteer.connect({
    browserWSEndpoint: BROWSER_WS,
  });

  // ... some code here

  await browser.close();
  return products;
};

// Define the API route for scraping
router.get('/scrape', async (req, res) => {
  // ... some code here
});

export default router;

The Bright Data Scraping Browser is driven through puppeteer-core to scrape Amazon data, and the results are returned as a JSON response.
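
The body of the route handler is elided above, so here is only a minimal sketch of how such a handler might validate the search term and shape the JSON response. The `createScrapeHandler` name, the `q` query parameter, and the response fields are assumptions for illustration, not the project's actual code; `scrape` stands in for a function like `scrapeData`.

```javascript
// Hypothetical handler factory: `scrape` is any async function that takes a
// search term and resolves to an array of products (for example, scrapeData).
// The `q` parameter name and the response shape are assumptions.
const createScrapeHandler = (scrape) => async (req, res) => {
  const searchTerm = String(req.query?.q ?? '').trim();

  if (!searchTerm) {
    // Reject empty searches before opening a remote browser session
    return res.status(400).json({ error: 'Missing search term' });
  }

  try {
    const products = await scrape(searchTerm);
    return res.json({ searchTerm, count: products.length, products });
  } catch (err) {
    // Scraping can fail (timeouts, blocked pages), so report a server error
    return res.status(500).json({ error: 'Scraping failed', details: err.message });
  }
};
```

With Express, this could be wired up as `router.get('/scrape', createScrapeHandler(scrapeData));`. Injecting the scrape function keeps the handler easy to exercise without opening a real browser session.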

I used Express.js to create the server and API endpoint for the frontend application, which is a React and Vite setup.

import express from 'express';
import scrapeRouter from './index.js'; // Import the logic from index.js
import cors from 'cors';

const app = express();

// allow all origin
app.use(cors());

// Use the scrapeRouter for /api routes
app.use('/api', scrapeRouter);

// Set the port
const PORT = 4040;

// Start the server
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});
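
To show how the frontend might talk to this server, here is a rough sketch of a client helper that calls the `/api/scrape` route. The `searchProducts` name and the `q` query parameter are assumptions, and `httpGet` is an injected function with the same shape as `axios.get(url, config)`, so the sketch does not claim to match the actual frontend code.

```javascript
// Hypothetical client helper. `httpGet` is any function shaped like
// axios.get(url, config) that resolves to an object with a `data` field.
const searchProducts = async (httpGet, baseUrl, searchTerm) => {
  const response = await httpGet(`${baseUrl}/api/scrape`, {
    params: { q: searchTerm }, // `q` is an assumed parameter name
  });
  return response.data; // the JSON body returned by the backend
};
```

In a React component this could be called as `searchProducts((url, config) => axios.get(url, config), 'http://localhost:4040', 'laptop')`, with the base URL swapped for the deployed backend in production.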


Tailwind CSS is used for the styling and React Icons for the icons. The rest of the stack is listed below.

Deployment

The backend Express app is deployed separately from the frontend:

  • Backend deployed on Render.com
  • Frontend deployed on Netlify.com
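
Hosting platforms commonly supply the listening port through a `PORT` environment variable, so a deployed Express server usually reads it instead of relying only on a fixed number. The helper below is a small sketch of that idea under that assumption; `resolvePort` is a hypothetical name, and 4040 is the local port used in this project.

```javascript
// Hypothetical helper: prefer the host-provided PORT, fall back to 4040 locally.
const resolvePort = (env, fallback = 4040) => {
  const parsed = Number.parseInt(env.PORT ?? '', 10);
  // Accept only valid TCP port numbers; anything else uses the fallback
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : fallback;
};
```

It could be used as `const PORT = resolvePort(process.env);` before calling `app.listen(PORT, ...)`.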

Stacks Used

  • React
  • Vite
  • Tailwind CSS
  • React Icons
  • Axios
  • cors
  • Bright Data (for proxy and data fetching)
  • Render (for API hosting)
  • dotenv (to load environment variables)
  • Express (to set up the server and routes)
  • nodemon (local development)
  • puppeteer-core (scraping data from Amazon)
