DEV Community

Cover image for How to Build an Amazon Product Search App with Next.js 14, Puppeteer, Cheerio, and shadcn UI: A Comprehensive Step-by-Step Guide
ANIKET JADHAV
ANIKET JADHAV

Posted on

How to Build an Amazon Product Search App with Next.js 14, Puppeteer, Cheerio, and shadcn UI: A Comprehensive Step-by-Step Guide

Hey there, fellow developers! ๐Ÿ‘‹

Ever thought about creating your very own Amazon product search tool? Whether it's for a personal project, data analysis, or just to showcase your web scraping skills, you're in the right place! In this guide, I'll walk you through building a sleek Amazon Product Search app using Next.js 14, Puppeteer, Cheerio, and shadcn UI for those stylish components. Letโ€™s dive in!


Table of Contents

  1. Project Setup with Next.js 14
  2. Installing and Configuring Puppeteer, Cheerio, and shadcn UI
  3. Building a User-Friendly Home Page
  4. Identifying the Information to Retrieve on the Amazon Product Page
  5. Creating the API for Web Scraping
  6. Integrating Frontend with Backend API
  7. Polishing the Application with Final Touches
  8. Wrapping Up and Conclusion
  9. Helpful Resources

Setting Up Next.js 14

Let's kick things off by setting up our Next.js project.

Step 1: Initialize Your Project

First things first, open up your terminal and run the following command to create a new Next.js app:

npx create-next-app@latest amazon-product-search
Enter fullscreen mode Exit fullscreen mode

During the setup, you'll be prompted with a series of questions like:

What is your project named? amazon-product-search
Would you like to use TypeScript? No / Yes
Would you like to use ESLint? No / Yes
Would you like to use Tailwind CSS? No / Yes
Would you like your code inside a `src/` directory? No / Yes
Would you like to use App Router? (recommended) No / Yes
Would you like to use Turbopack for `next dev`? No / Yes
Would you like to customize the import alias (`@/*` by default)? No / Yes
What import alias would you like configured? @/*
Enter fullscreen mode Exit fullscreen mode

Feel free to answer these prompts based on your project preferences. Once youโ€™ve made your selections, create-next-app will generate a folder named amazon-product-search and install all the necessary dependencies.

Step 2: Launch the Development Server

Navigate into your project directory and start the development server with:

cd amazon-product-search
npm run dev
Enter fullscreen mode Exit fullscreen mode

Open your browser and head over to http://localhost:3000. You should see the default Next.js welcome page, indicating that everything is set up correctly.


Installing Puppeteer, Cheerio, and shadcn UI

Next, we'll set up the essential libraries for web scraping and UI components.

Puppeteer

Puppeteer is a powerful Node.js library that provides a high-level API to control Chrome or Chromium browsers programmatically. Itโ€™s perfect for automating tasks like web scraping.

Install Puppeteer by running:

npm install puppeteer
Enter fullscreen mode Exit fullscreen mode

Cheerio

Cheerio is a fast, flexible, and lean implementation of jQuery designed for server-side HTML manipulation. Itโ€™s fantastic for parsing and extracting data from HTML.

Install Cheerio with:

npm install cheerio
Enter fullscreen mode Exit fullscreen mode

shadcn UI

shadcn UI offers a collection of beautifully designed, accessible React components. We'll use it for our Button, Input, and Card components to ensure our app looks polished and professional.

Step 1: Initialize shadcn UI

Run the following command to set up shadcn UI in your project:

npx shadcn@latest init
Enter fullscreen mode Exit fullscreen mode

For a quicker setup with default settings (like new-york for style, zinc for base color, and enabling CSS variables), you can use the -d flag:

npx shadcn@latest init -d
Enter fullscreen mode Exit fullscreen mode

Step 2: Configure components.json

You'll be prompted with a few questions to set up components.json:

Which style would you like to use? โ€บ New York
Which color would you like to use as base color? โ€บ Zinc
Do you want to use CSS variables for colors? โ€บ yes
Enter fullscreen mode Exit fullscreen mode

Step 3: Add shadcn UI Components

There are two ways to add components using shadcn UI:

Add Components Individually

Use the add command followed by the component names:

   npx shadcn@latest add button input card
Enter fullscreen mode Exit fullscreen mode

Interactive Component Selection

Alternatively, run the add command without specifying a component. This will present a list of available components to select and install interactively:

   npx shadcn@latest add
Enter fullscreen mode Exit fullscreen mode

Follow the prompts to choose the components you want to add.

shadcn components selection

These commands will create the respective component files in your components/ui directory. You can then import and use them throughout your project.


Building a User-Friendly Home Page

Now, let's craft a user-friendly interface where users can search for Amazon products effortlessly.

Home Page Component (app/page.jsx)

Replace the content of app/page.jsx with the following code:

"use client";
import { useState } from "react";
import Image from "next/image";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import {
  Card,
  CardContent,
  CardFooter,
  CardHeader,
  CardTitle,
} from "@/components/ui/card";

export default function Home() {
  const [loading, setLoading] = useState(false);
  const [query, setQuery] = useState("");
  const [products, setProducts] = useState([]);
  const [error, setError] = useState(null);

  const onSubmit = async (e) => {
    e.preventDefault();
    if (!query.trim()) return;
    setLoading(true);
    setError(null);

    try {
      const res = await fetch(
        `/api/searchprod?query=${encodeURIComponent(query)}`,
        {
          headers: {
            "Content-Type": "application/json",
          },
        }
      );
      const data = await res.json();
      if (data.success) {
        setProducts(data.products);
      } else {
        setError(data.error || "Failed to fetch products");
        setProducts([]);
      }
    } catch (err) {
      setError("An error occurred while fetching products");
      setProducts([]);
    } finally {
      setLoading(false);
    }
  };

  return (
    <main className="mx-auto flex flex-col mt-5 justify-center px-5 items-center max-w-7xl">
      <h1 className="text-3xl font-bold mb-5 text-transparent bg-clip-text bg-gradient-to-r from-yellow-400 via-red-500 to-yellow-600">
        Amazon Product Search
      </h1>

      <form
        onSubmit={onSubmit}
        className="flex gap-2 w-full md:px-28 px-0 mb-5"
      >
        <Input
          name="query"
          value={query}
          onChange={(e) => setQuery(e.target.value)}
          placeholder="Search for products..."
          className="w-full"
          disabled={loading}
          aria-label="Search"
        />
        <Button
          disabled={loading || !query.trim()}
          type="submit"
          className="whitespace-nowrap"
        >
          {loading ? "Searching..." : "Search"}
        </Button>
      </form>

      {error && (
        <div className="mb-5 w-full md:px-28 px-0">
          <div className="bg-red-100 text-red-700 px-4 py-3 rounded">
            {error}
          </div>
        </div>
      )}

      {!loading && !error && products.length === 0 && (
        <div className="text-center py-4 text-gray-500">
          Try searching for a product to see results.
        </div>
      )}

      {loading && (
        <div className="flex items-center justify-center py-10">
          <svg
            className="animate-spin h-5 w-5 mr-3 text-gray-600"
            xmlns="http://www.w3.org/2000/svg"
            fill="none"
            viewBox="0 0 24 24"
          >
            <circle
              className="opacity-25"
              cx="12"
              cy="12"
              r="10"
              stroke="currentColor"
              strokeWidth="4"
            ></circle>
            <path
              className="opacity-75"
              fill="currentColor"
              d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"
            ></path>
          </svg>
          <span>Loading products...</span>
        </div>
      )}

      {!loading && products.length > 0 && (
        <div className="grid xl:grid-cols-4 md:grid-cols-3 sm:grid-cols-2 grid-cols-1 gap-6 py-4 w-full">
          {products.map((product, i) => (
            <Card
              key={i}
              className="flex flex-col h-full rounded-xl shadow-2xl"
            >
              {product.image && product.image !== "N/A" ? (
                <CardContent className="relative h-64 bg-white rounded-t-xl">
                  <Image
                    src={product.image}
                    alt={product.title}
                    fill
                    style={{ objectFit: "contain" }}
                    className="p-4"
                  />
                </CardContent>
              ) : (
                <div className="h-64 flex items-center justify-center bg-gray-100">
                  <span className="text-gray-500">No Image Available</span>
                </div>
              )}
              <CardHeader className="flex-grow">
                <CardTitle className="text-base font-semibold line-clamp-2">
                  {product.title}
                </CardTitle>
              </CardHeader>
              <CardFooter className="flex flex-col items-start">
                {product.price && (
                  <p className="text-lg font-semibold text-green-600">
                    {product.price}
                  </p>
                )}
                {product.stars && (
                  <p className="text-sm text-yellow-500">{product.stars}</p>
                )}
                {product.reviews && (
                  <p className="text-sm text-gray-500">
                    {`${product.reviews} reviews`}
                  </p>
                )}
              </CardFooter>
            </Card>
          ))}
        </div>
      )}

      {!loading && !error && products.length === 0 && query.trim() && (
        <div className="text-center py-4 text-gray-500">
          {`No results found for "${query}"`}
        </div>
      )}
    </main>
  );
}
Enter fullscreen mode Exit fullscreen mode

Home screen

What's Happening Here

  • State Management: We're using useState to handle loading states, the user's query input, fetched products, and any potential errors.
  • Form Handling: The onSubmit function sends a request to our backend API with the search query.
  • Fetching Data: It fetches data from /api/searchprod and updates the UI based on the response.
  • Displaying Results: Products are displayed in a responsive grid with images, titles, prices, stars, and reviews, all styled using shadcn UI components.

Identifying the Information to Retrieve on the Amazon Product Page

To effectively scrape data from Amazon, we need to know what and where to extract. Here's how we approach it:

We'll use the Amazon listing results for the โ€œMacBook Proโ€ search term to retrieve details like the product title and price.

Step 1: Navigate to Amazon and Perform a Search

Head over to Amazon and search for "MacBook Pro." To inspect the page and understand the DOM structure, follow the instructions based on your operating system:

  • For Windows/Linux Users:

    • Right-Click Method: Right-click anywhere on the page and select "Inspect" from the context menu.
    • Keyboard Shortcut: Press Ctrl + Shift + I to open the Developer Tools directly.
  • For Mac Users:

    • Right-Click Method: Right-click (or Control + Click) anywhere on the page and select "Inspect" from the context menu.
    • Keyboard Shortcut: Press Option + Command + I to open the Developer Tools directly.

Amazon search

Step 2: Matching Information with DOM Selectors

Here's a table outlining the selectors we'll use:

Information DOM Selector Description
Product Title h2 a span Retrieves the text for the product title from search results.
Price (Whole) .a-price-whole Extracts the whole part of the product's price.
Price (Fraction) .a-price-fraction Extracts the fractional part of the product's price.
Reviews Count .a-size-base.s-underline-text Retrieves the number of reviews for the product.
Star Rating .a-icon-alt Retrieves the star rating of the product.
Product Image .s-image Extracts the product image URL from the search results.

Creating the API Route

Now, let's set up the backend that handles web scraping using Puppeteer and Cheerio.

API Route (app/api/searchprod/route.js)

Create a new file at app/api/searchprod/route.js and add the following code:

// app/api/searchprod/route.js
import { NextResponse } from "next/server";
import puppeteer from "puppeteer";
import * as cheerio from "cheerio";

export async function GET(request) {
  const searchParams = request.nextUrl.searchParams;
  const query = searchParams.get("query");

  if (!query) {
    return NextResponse.json({
      error: "No query provided",
      status: 400,
      success: false,
    });
  }

  const amazon = "https://www.amazon.in/";

  let browser;
  try {
    browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();

    // Set a user agent to mimic a real browser and avoid blocking
    await page.setUserAgent(
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
        "AppleWebKit/537.36 (KHTML, like Gecko) " +
        "Chrome/85.0.4183.102 Safari/537.36"
    );

    await page.goto(amazon, { waitUntil: "networkidle2" });

    // Type the query into the search box and press Enter
    await page.type("#twotabsearchtextbox", query);
    await Promise.all([
      page.keyboard.press("Enter"),
      page.waitForNavigation({ waitUntil: "networkidle2" }),
    ]);

    const html = await page.content();
    const $ = cheerio.load(html);

    const products = [];

    // Loop through each product item
    $(".s-main-slot .s-result-item").each((_, element) => {
      const title = $(element).find("h2 a span").text().trim();

      const priceWhole = $(element)
        .find(".a-price-whole")
        .first()
        .text()
        .trim();
      const priceFraction = $(element)
        .find(".a-price-fraction")
        .first()
        .text()
        .trim();
      const price = priceWhole
        ? `โ‚น${priceWhole}.${priceFraction || "00"}`
        : "N/A";

      const reviews = $(element)
        .find(".a-size-base.s-underline-text")
        .text()
        .trim();

      const stars = $(element).find(".a-icon-alt").text().trim();

      const image = $(element).find(".s-image").attr("src");

      if (title) {
        products.push({
          title,
          price,
          reviews: reviews || "N/A",
          stars: stars || "N/A",
          image: image || "N/A",
        });
      }
    });

    if (products.length === 0) {
      return NextResponse.json({
        error: "No products found",
        status: 404,
        success: false,
      });
    }

    return NextResponse.json({ products, success: true, status: 200 });
  } catch (error) {
    return NextResponse.json({
      error: "Something went wrong",
      status: 500,
      success: false,
    });
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Breaking It Down

Imports:

  • NextResponse: For sending JSON responses.
  • Puppeteer: Automates browser actions.
  • Cheerio: Parses and manipulates HTML.

Handling the GET Request:

  • Extract Query Parameter: Retrieves the query parameter from the URL.
  • Validation: Returns an error response if no query is provided.

Launching Puppeteer:

  • Headless Mode: Runs the browser in headless mode for efficiency.
  • User Agent: Sets a realistic user agent string to mimic a real browser and avoid detection/blocking.

Navigating to Amazon and Performing the Search:

  • Navigate to Amazon: Opens Amazon India's homepage and waits until the network is idle.
  • Perform Search: Types the search query into Amazon's search box and submits the form.

Scraping the Data:

  • Get Page Content: Retrieves the HTML content of the search results page.
  • Load into Cheerio: Parses the HTML for data extraction.
  • Extract Product Details: Gathers information such as title, price, reviews, stars, and image for each product.

Sending the Response:

  • Success: If products are found, sends them back as a JSON response.
  • No Products Found: Returns a 404 error if no products match the search query.
  • Error Handling: Catches any unexpected errors and sends a 500 error response.

Integrating Frontend with Backend API

With both frontend and backend set up, here's how they interact:

  1. User Input: The user types a product name into the search bar and hits "Search."
  2. API Call: The onSubmit function sends a GET request to /api/searchprod with the search query.
  3. Scraping: The API route uses Puppeteer to navigate Amazon, perform the search, and scrape the results using Cheerio.
  4. Displaying Results: The scraped data is sent back to the frontend, which then displays the products in a neat grid.

Home Screen GIF


Polishing the Application with Final Touches

package.json Configuration

Ensure your package.json includes the necessary dependencies:

{
  "name": "amazon-product-search",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint"
  },
  "dependencies": {
    "@radix-ui/react-icons": "^1.3.0",
    "@radix-ui/react-slot": "^1.1.0",
    "cheerio": "^1.0.0",
    "class-variance-authority": "^0.7.0",
    "clsx": "^2.1.1",
    "lucide-react": "^0.446.0",
    "next": "14.2.13",
    "puppeteer": "^23.4.1",
    "react": "^18",
    "react-dom": "^18",
    "tailwind-merge": "^2.5.2",
    "tailwindcss-animate": "^1.0.7"
  },
  "devDependencies": {
    "eslint": "^8",
    "eslint-config-next": "14.2.13",
    "postcss": "^8",
    "tailwindcss": "^3.4.1"
  }
}
Enter fullscreen mode Exit fullscreen mode

Best Practices

  • Respectful Scraping: Always check the target site's robots.txt and terms of service to ensure you're allowed to scrape their data.
  • Error Handling: Implement robust error handling to manage unexpected issues gracefully.
  • Performance Optimization: Puppeteer can be resource-intensive. Consider using caching or limiting scraping frequency to optimize performance.
  • Security: Validate and sanitize all inputs to protect against potential vulnerabilities.

Wrapping Up and Conclusion

And there you have itโ€”a fully functional Amazon Product Search app built with Next.js 14, Puppeteer, Cheerio, and shadcn UI! ๐ŸŽ‰ Whether you're aiming to enhance your web scraping skills or build data-driven applications, these tools provide incredible flexibility and power.

What's Next?

  • Enhance the UI: Add more styling or animations to make your app even more user-friendly.
  • Add More Features: Implement pagination, filters, or sorting to refine your search results.
  • Deploy Your App: Share your creation with the world by deploying it on platforms like Vercel or Heroku.

Happy coding, and I hope this guide assists you on your development journey! ๐Ÿš€

Helpful Resources

๐Ÿ”– If you enjoy this project, please give it a star on GitHub! It helps others discover and supports the ongoing development of the project โญ๏ธ.

Top comments (2)

Collapse
 
harishankerbubio profile image
harishanker-bub-io

would be better if you extract the products after scrolling.

Collapse
 
aniyy117 profile image
ANIKET JADHAV • Edited

Thatโ€™s actually a good point! Extracting the products after scrolling or changing page would allow me to get data from other pages and improve the accuracy. Iโ€™ll definitely work on that. Thanks for the valuable feedback โœจ