Nevo David Subscriber for novu

Originally published at novu.co

Creating a website aggregator with ChatGPT, React, and Node.js 🚀

A website aggregator is a website that collects data from other websites across the internet and puts the information in one place where visitors can access it.

There are many versions of website aggregators; some are search engines such as Google and DuckDuckGo, and some have more of a Product Hunt structure where you see a picture and a short text.

You will usually scrape the website, take its meta tags and h1-h6 headings, scan its sitemap.xml, and use some pattern to sort the information.
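
For comparison, here is a rough sketch of that conventional approach. The function name `extractSignals` is made up for this illustration, and the regular expressions are an assumption that only holds for simple, well-formed HTML; a real scraper would use a DOM parser instead.

```javascript
// Hypothetical helper: pulls the title, meta tags, and h1-h6 headings out of raw HTML.
// Regexes are brittle for HTML, so treat this as an illustration only.
function extractSignals(html) {
    const title = (html.match(/<title[^>]*>([\s\S]*?)<\/title>/i) || [])[1];

    // Collect every heading with its level, stripping any nested tags from the text
    const headings = [...html.matchAll(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi)].map(
        (m) => ({ level: Number(m[1]), text: m[2].replace(/<[^>]+>/g, "").trim() })
    );

    // Map each <meta name="..."> or <meta property="..."> to its content attribute
    const metas = {};
    for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
        const key = (tag.match(/\b(?:name|property)=["']([^"']+)["']/i) || [])[1];
        const content = (tag.match(/\bcontent=["']([^"']*)["']/i) || [])[1];
        if (key && content !== undefined) metas[key] = content;
    }

    return { title: title ? title.trim() : null, headings, metas };
}
```

You would then still need your own rules to decide which of these fields best describes the site, which is the part we hand over to ChatGPT in this article.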

Today I am going to use a different solution 😈
I will take the entire website content, send it to ChatGPT, and ask it to give me the information I need.

It's kinda crazy to see ChatGPT parse the website content.

So lettttsss do it 🚀


In this article, you'll learn how to build a website aggregator which scrapes content from a website and determines the website's title and description using ChatGPT.

What is ChatGPT?

ChatGPT is an AI language model trained by OpenAI to generate text and interact with users in a human-like conversational manner. It is worth mentioning that ChatGPT is free and open to public use.

Users can submit requests and get information or answers to questions from a wide range of topics such as history, science, mathematics, and current events in just a few seconds.

ChatGPT performs other tasks, such as proofreading, paraphrasing, and translation. It can also help with writing, debugging, and explaining code snippets. Its wide range of capabilities is the reason why ChatGPT has been trending.

ChatGPT is not available as an API yet :( In order to use it, we will have to scrape our way in 😈

Novu - the first open-source notification infrastructure

Just a quick background about us. Novu is the first open-source notification infrastructure. We basically help to manage all the product notifications. It can be In-App (the bell icon like you have in Facebook - Websockets), Emails, SMSs and so on.


I would be super happy if you could give us a star! And let me also know in the comments ❤️
https://github.com/novuhq/novu

Limitation with ChatGPT

As previously mentioned, ChatGPT is not accessible through a public API. Instead, we can use web scraping techniques to access it. This involves automating the process of logging in to the OpenAI website, solving the captcha (you can use 2captcha for this), and sending an API request with the OpenAI cookies. Fortunately, there is a public library that can handle these tasks for us. Keep in mind that this is not a formal API, so you may encounter limitations if you attempt to make a large number of requests. Additionally, it is not suitable for real-time requests. If you want to use it, consider implementing a queue system for background processing.
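
To give an idea of what such a queue could look like, here is a minimal in-memory sketch. The `JobQueue` class is hypothetical and not part of this project; it simply runs one async job at a time, in the order the jobs were added. A production setup would more likely use a persistent queue.

```javascript
// Hypothetical in-memory queue: runs one async job at a time, in insertion order.
class JobQueue {
    constructor() {
        this.jobs = [];
        this.running = false;
    }

    // Accepts a function that returns a promise; resolves with that job's result.
    add(job) {
        return new Promise((resolve, reject) => {
            this.jobs.push({ job, resolve, reject });
            this._next();
        });
    }

    // Drains the queue sequentially; does nothing if a drain is already in progress.
    async _next() {
        if (this.running) return;
        this.running = true;
        while (this.jobs.length > 0) {
            const { job, resolve, reject } = this.jobs.shift();
            try {
                resolve(await job());
            } catch (err) {
                reject(err);
            }
        }
        this.running = false;
    }
}
```

Each scrape-and-ask request could then be wrapped in `queue.add(() => ...)`, so ChatGPT only ever receives one request at a time from your server.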

Project Set up

Here, I'll guide you through creating the project environment for the web application. We'll use React.js for the front end and Node.js for the backend server.

Create the project folder for the web application by running the code below:



mkdir website-aggregator
cd website-aggregator
mkdir client server



Setting up the Node.js server

Navigate into the server folder and create a package.json file.



cd server && npm init -y



Install Express, Nodemon, and the CORS library.



npm install express cors nodemon



ExpressJS is a fast, minimalist framework that provides several features for building web applications in Node.js. CORS is a Node.js package that allows communication between different domains. Nodemon is a Node.js tool that automatically restarts the server after detecting file changes.

Create an index.js file - the entry point to the web server.



touch index.js



Set up a Node.js server using ExpressJS. The code snippet below returns a JSON object when you visit http://localhost:4000/api in your browser.



//👇🏻index.js
const express = require("express");
const cors = require("cors");
const app = express();
const PORT = 4000;

app.use(express.urlencoded({ extended: true }));
app.use(express.json());
app.use(cors());

app.get("/api", (req, res) => {
    res.json({
        message: "Hello world",
    });
});

app.listen(PORT, () => {
    console.log(`Server listening on ${PORT}`);
});



Install the unofficial ChatGPT API library and Puppeteer. The library uses Puppeteer as an optional peer dependency to automate bypassing the Cloudflare protections.



npm install chatgpt puppeteer



To use the ChatGPT API within the server/index.js file, you need to configure the file to use both the require and import keywords for importing libraries.

Therefore, update the server/package.json file to contain the type keyword.



{ "type": "module" }



Add the code snippet below at the top of the server/index.js file.



import { createRequire } from "module";
const require = createRequire(import.meta.url);
//...other code statements



Once you have completed the last two steps, you can now use ChatGPT within the index.js file.

Configure Nodemon by adding the start command to the list of scripts in the package.json file. The code snippet below starts the server using Nodemon.



//In server/package.json

"scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "start": "nodemon index.js"
  },



Congratulations! You can now start the server by using the command below.



npm start



Setting up the React application

Navigate into the client folder via your terminal and create a new React.js project.



cd client
npx create-react-app ./



Delete the redundant files, such as the logo and the test files from the React app, and update the App.js file to display "Hello World" as below.



function App() {
    return (
        <div>
            <p>Hello World!</p>
        </div>
    );
}
export default App;



Navigate into the src/index.css file and copy the code below. It contains all the CSS required for styling this project.



@import url("https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@300;400;500;600;700&display=swap");
* {
    box-sizing: border-box;
    margin: 0;
    padding: 0;
    font-family: "Space Grotesk", sans-serif;
}
input {
    padding: 10px;
    width: 70%;
    margin: 10px 0;
    outline: none;
}
button {
    width: 200px;
    cursor: pointer;
    padding: 10px 20px;
    outline: none;
    border: none;
    background-color: #6d9886;
}
.home,
.home__form,
.website__item,
.loading {
    display: flex;
    align-items: center;
    justify-content: center;
    flex-direction: column;
}
.home__form > h2,
.home__form {
    margin-bottom: 30px;
    text-align: center;
    width: 100%;
}
.home {
    min-height: 100vh;
    padding: 20px;
    width: 100%;
}
.website__container {
    width: 100%;
    min-height: 50vh;
    border-radius: 5px;
    display: flex;
    align-items: center;
    justify-content: center;
    flex-wrap: wrap;
    padding: 15px;
}
.website__item {
    width: 80%;
    margin: 10px;
    background-color: #f7f7f7;
    border-radius: 5px;
    padding: 30px;
    box-shadow: 0 2px 8px 0 rgba(99, 99, 99, 0.2);
}
.website__item > img {
    width: 70%;
    border-radius: 5px;
}
.website__item > h3 {
    margin: 10px 0;
}
.website__item > p {
    text-align: center;
    opacity: 0.7;
}
.loading {
    height: 100vh;
    background-color: #f2e7d5;
}



Update the App.js file to display an input field that allows you to provide the website's URL.



import React, { useState } from "react";

const App = () => {
    const [url, setURL] = useState("");

    const handleSubmit = (e) => {
        e.preventDefault();
        console.log({ url });
        setURL("");
    };

    return (
        <div className='home'>
            <form className='home__form'>
                <h2>Website Aggregator</h2>
                <label htmlFor='url'>Provide the website URL</label>
                <input
                    type='url'
                    name='url'
                    id='url'
                    value={url}
                    onChange={(e) => setURL(e.target.value)}
                />
                <button onClick={handleSubmit}>ADD WEBSITE</button>
            </form>
        </div>
    );
};

export default App;



Congratulations! You've successfully created the application's user interface. In the following sections, I'll walk you through scraping data from websites using Puppeteer and getting a website's description and title via ChatGPT.

How to scrape data using Puppeteer in Node.js

Puppeteer is a Node.js library that automates several browser actions such as form submission, crawling single-page applications, UI testing, and in particular, web scraping and generating screenshots of web pages.

Here, I'll guide you through scraping the website's content via Puppeteer in Node.js. We'll send the website URL provided by the user to the Node.js server and scrape the website's content from that URL.

Create an endpoint on the server that accepts the website's URL from the React app.



app.post("/api/url", (req, res) => {
    const { url } = req.body;

    //👇🏻 The URL from the React app
    console.log(url);
});
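
Since the server is going to open whatever URL it receives, it can be worth checking the value first. Below is a small sketch of such a check; `isValidHttpUrl` is a name I made up for this example, and it relies only on the built-in `URL` class.

```javascript
// Hypothetical helper: returns true only for absolute http(s) URLs.
function isValidHttpUrl(value) {
    if (typeof value !== "string") return false;
    try {
        const { protocol } = new URL(value);
        return protocol === "http:" || protocol === "https:";
    } catch {
        // new URL() throws on anything it cannot parse as an absolute URL
        return false;
    }
}

// Possible usage inside the route (assumption, not part of the original code):
// if (!isValidHttpUrl(url)) return res.status(400).json({ message: "Invalid URL" });
```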



Import the Puppeteer library and scrape the website's content as done below:



//👇🏻 Import Puppeteer at the top
const puppeteer = require("puppeteer");

app.post("/api/url", (req, res) => {
    const { url } = req.body;

    //👇🏻 Puppeteer webscraping function
    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto(url);

        //👇🏻 returns all the website content
        const websiteContent = await page.evaluate(() => {
            return document.documentElement.innerText.trim();
        });

        //👇🏻 returns the website meta image
        const websiteOgImage = await page.evaluate(() => {
            const metas = document.getElementsByTagName("meta");
            for (let i = 0; i < metas.length; i++) {
                if (metas[i].getAttribute("property") === "og:image") {
                    return metas[i].getAttribute("content");
                }
            }
        });

        console.log({ websiteContent, websiteOgImage })
        await browser.close();
    })();
});
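
One thing to watch out for: some websites put a relative path in their og:image tag, which will not load when rendered from a different origin. Here is a small sketch that resolves it against the page URL. The helper name `resolveImageUrl` is hypothetical, and it only uses the built-in `URL` class.

```javascript
// Hypothetical helper: turns a possibly relative og:image value into an absolute URL.
function resolveImageUrl(ogImage, pageUrl) {
    // The page may have no og:image tag at all
    if (!ogImage) return null;
    try {
        // Absolute values are kept as-is; relative ones are resolved against pageUrl
        return new URL(ogImage, pageUrl).href;
    } catch {
        return null;
    }
}
```

Within the route, you could call it as `resolveImageUrl(websiteOgImage, url)` before sending the image to the client.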



Add a function within the React app that sends the URL to the /api/url endpoint.



const handleSubmit = (e) => {
    e.preventDefault();
    setLoading(true);
    setURL("");
    //👇🏻 Calls the function.
    sendURL();
};

async function sendURL() {
    try {
        const request = await fetch("http://localhost:4000/api/url", {
            method: "POST",
            body: JSON.stringify({
                url,
            }),
            headers: {
                Accept: "application/json",
                "Content-Type": "application/json",
            },
        });
        const data = await request.json();
        //👇🏻 toggles the loading state if the request is successful
        if (data.message) {
            setLoading(false);
        }
    } catch (err) {
        console.error(err);
        //👇🏻 stops the loading screen if the request fails
        setLoading(false);
    }
}
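
Note that `fetch` only rejects on network errors, not on HTTP error statuses. If you want failed responses to land in the `catch` block as well, a small wrapper could look like the sketch below. `postJSON` and its `fetchImpl` parameter are names I made up for this example; the parameter exists only so the function can be exercised without a running server.

```javascript
// Hypothetical helper: POSTs JSON and throws on non-2xx responses.
// fetchImpl defaults to the global fetch and can be swapped for a stub.
async function postJSON(endpoint, payload, fetchImpl = fetch) {
    const response = await fetchImpl(endpoint, {
        method: "POST",
        body: JSON.stringify(payload),
        headers: {
            Accept: "application/json",
            "Content-Type": "application/json",
        },
    });
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
}
```

Inside `sendURL`, this would be called as `await postJSON("http://localhost:4000/api/url", { url })`.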



The code snippet above relies on a loading state that tracks whether the API request is still pending.



const [loading, setLoading] = useState(false);



Create a Loading component that is shown to the users when the request is pending.



import React from "react";

const Loading = () => {
    return (
        <div className='loading'>
            <h1>Loading, please wait...</h1>
        </div>
    );
};

export default Loading;



Display the Loading component whenever the content is yet to be available.



import Loading from "./Loading";
//👇🏻 Add this code within the App.js component

if (loading) {
    return <Loading />;
}



Congratulations! You've learnt how to scrape content from websites using Puppeteer. In the upcoming section, you'll learn how to communicate with ChatGPT in Node.js by generating websites' descriptions and brand names.

How to communicate with ChatGPT in Node.js

ChatGPT is not yet available as a public API. Therefore, to use it, we have to scrape our way in - meaning we'll perform a full browser automation that logs in to the OpenAI website, solves the captcha, and sends an API request with the OpenAI cookies.

Fortunately, a public library that does this is available and has already been installed as part of the project setup.

Import the ChatGPT API library and create a function that sends a request to ChatGPT.



import { ChatGPTAPIBrowser } from "chatgpt";

async function chatgptFunction(content) {
    // use puppeteer to bypass cloudflare (headful because of captchas)
    const api = new ChatGPTAPIBrowser({
        email: "<CHATGPT_EMAIL_ADDRESS>",
        password: "<CHATGPT_PASSWORD>",
    });
    await api.initSession();
    //👇🏻 Extracts the brand name from the website content
    const getBrandName = await api.sendMessage(
        `I have a raw text of a website, what is the brand name in a single word? ${content}`
    );
    //👇🏻 Extracts the brand description from the website content
    const getBrandDescription = await api.sendMessage(
        `I have a raw text of a website, can you extract the description of the website from the raw text. I need only the description and nothing else. ${content}`
    );
    //👇🏻 Returns the response from ChatGPT
    return {
        brandName: getBrandName.response,
        brandDescription: getBrandDescription.response,
    };
}
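
The scraped text usually contains long runs of whitespace and can get very large. Here is a sketch of a helper that tidies it up before it is placed into the prompt. `preparePrompt` and `MAX_CONTENT_CHARS` are hypothetical, and the 8000-character cap is an arbitrary value for illustration, not a documented ChatGPT limit.

```javascript
// Assumption: an arbitrary illustrative cap, not an official limit.
const MAX_CONTENT_CHARS = 8000;

// Hypothetical helper: collapses whitespace, clips the content, and appends it to the question.
function preparePrompt(question, content, maxChars = MAX_CONTENT_CHARS) {
    const normalized = String(content).replace(/\s+/g, " ").trim();
    const clipped = normalized.length > maxChars ? normalized.slice(0, maxChars) : normalized;
    return `${question} ${clipped}`;
}
```

The result could be passed to `api.sendMessage()` in place of the template strings used above.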



ChatGPT is super intelligent, and it will answer any question we ask it. So basically, we will ask it to write us the brand name and the description based on the complete text of the website.
The brand name can usually be found in the og:site_name meta tag, but to show you how cool it is, we will let ChatGPT extract it. As for the description, it's pretty crazy. It will tell us what the site is about and summarize everything!

Next, update the /api/url route as shown below:



//👇🏻 holds all the ChatGPT result
const database = [];
//👇🏻 generates a random string as ID
const generateID = () => Math.random().toString(36).substring(2, 10);

app.post("/api/url", (req, res) => {
    const { url } = req.body;

    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto(url);
        const websiteContent = await page.evaluate(() => {
            return document.documentElement.innerText.trim();
        });
        const websiteOgImage = await page.evaluate(() => {
            const metas = document.getElementsByTagName("meta");
            for (let i = 0; i < metas.length; i++) {
                if (metas[i].getAttribute("property") === "og:image") {
                    return metas[i].getAttribute("content");
                }
            }
        });
        //👇🏻 closes the browser before responding, so the process is not leaked
        await browser.close();
        //👇🏻 accepts the website content as a parameter
        let result = await chatgptFunction(websiteContent);
        //👇🏻 adds the brand image and ID to the result
        result.brandImage = websiteOgImage;
        result.id = generateID();
        //👇🏻 adds the result to the array
        database.push(result);
        //👇🏻 returns the results
        return res.json({
            message: "Request successful!",
            database,
        });
    })();
});
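
The `generateID` function takes a random number, converts it to base 36 (digits plus lowercase letters), and keeps up to eight characters after the leading "0.". Collisions are unlikely but possible, so here is a sketch of a variant that checks the existing items first. `generateUniqueID` is a hypothetical helper and not part of the original code.

```javascript
// Same generator as in the route: up to eight base-36 characters
const generateID = () => Math.random().toString(36).substring(2, 10);

// Hypothetical helper: retries until the ID is non-empty and not already in use.
function generateUniqueID(database) {
    const taken = new Set(database.map((item) => item.id));
    let id = generateID();
    while (id === "" || taken.has(id)) {
        id = generateID();
    }
    return id;
}
```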



To display the response within the React application, create a state that holds the server's response.



const [websiteContent, setWebsiteContent] = useState([]);

async function sendURL() {
    try {
        const request = await fetch("http://localhost:4000/api/url", {
            method: "POST",
            body: JSON.stringify({
                url,
            }),
            headers: {
                Accept: "application/json",
                "Content-Type": "application/json",
            },
        });
        const data = await request.json();
        if (data.message) {
            setLoading(false);
            //👇🏻 update the state with the server response
            setWebsiteContent(data.database);
        }
    } catch (err) {
        console.error(err);
        //👇🏻 stops the loading screen if the request fails
        setLoading(false);
    }
}



Lastly, update the App.js layout to display the server's response to the user.



const App = () => {
    //...other code statements
    //👇🏻 remove the quotation marks around the description
    const trimDescription = (content) =>
        content.match(/(?:"[^"]*"|^[^"]*$)/)[0].replace(/"/g, "");

    return (
        <div className='home'>
            <form className='home__form'>
                <h2>Website Aggregator</h2>
                <label htmlFor='url'>Provide the website URL</label>
                <input
                    type='url'
                    name='url'
                    id='url'
                    value={url}
                    onChange={(e) => setURL(e.target.value)}
                />
                <button onClick={handleSubmit}>ADD WEBSITE</button>
            </form>
            <main className='website__container '>
                {websiteContent.map((item) => (
                    <div className='website__item' key={item.id}>
                        <img src={item?.brandImage} alt={item?.brandName} />
                        <h3>{item?.brandName}</h3>
                        <p>{trimDescription(item?.brandDescription)}</p>
                    </div>
                ))}
            </main>
        </div>
    );
};
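
The regular expression in `trimDescription` has two alternatives: the first quoted span in the text, or the whole text when it contains no quotation marks at all. If ChatGPT returns a single unbalanced quotation mark, or the description is missing, `match()` returns `null` and the component throws. Here is a sketch of a more forgiving variant; `trimDescriptionSafe` is a hypothetical name, and the fallback behaviour is my own assumption about what you would want to display.

```javascript
// Hypothetical null-safe variant of trimDescription.
const trimDescriptionSafe = (content) => {
    const text = typeof content === "string" ? content : "";
    // First alternative: the first "quoted" span; second: a string with no quotes at all
    const match = text.match(/(?:"[^"]*"|^[^"]*$)/);
    // With an unbalanced quote neither alternative matches, so fall back to the full text
    return (match ? match[0] : text).replace(/"/g, "").trim();
};

console.log(trimDescriptionSafe('The description is: "An open-source tool."'));
// prints: An open-source tool.
console.log(trimDescriptionSafe("No quotes here"));
// prints: No quotes here
console.log(trimDescriptionSafe('Unbalanced "quote'));
// prints: Unbalanced quote
```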



Congratulations!🎉 You've completed the project for this tutorial.

Here is a sample of the result gotten from the application:

(Screenshot: results)

Conclusion

So far, we have covered,

  • what ChatGPT is,
  • how to scrape website content using Puppeteer, and
  • how to communicate with ChatGPT in a Node.js application

This tutorial walks you through an example of an application you can build using Puppeteer and ChatGPT. ChatGPT can be seen as the ultimate personal assistant, very useful in various fields to enable us to work smarter and better.

The source code for this tutorial is available here:

https://github.com/novuhq/blog/tree/main/website-aggregator-with-chatgpt-react

Thank you for reading!

Help me out!

If you feel like this article helped you understand how to build a website aggregator with ChatGPT, I would be super happy if you could give us a star! And let me also know in the comments ❤️
https://github.com/novuhq/novu

Top comments (23)

Nevo David • Edited

Hi Friends!
If you want to know exactly when I post, just register to my newsletter here

Sumit Saurabh

Very well explained! 👏👏👏

Nevo David

Thank you Sumit 🤗
Whatsupppp

Sumit Saurabh

I love the depth you cover in every article you post. That's what's up!!!

Harmony

I just noticed that Puppeteer doesn't work for me. I've tried several times. It seems it doesn't have the capability to scrape some websites.

Nevo David

Shouldn’t be a problem, maybe you are on headless mode?

Harmony

What do you mean by headless mode?

optimisedu

Wow. You have created blueprints / boilerplate for a monster. Great article, summary and boilerplate =)

Nevo David

Thank you 🤩

Okoro chimezie bright

Well done thanks for the update👍

Nevo David

You are welcome 🤗

Elham Najeebullah

I am only having one problem: every time I submit the add website form, it opens a new tab and takes me to the OpenAI site to log in first. Did anyone else have this problem, and is there a way to handle the login process within the code before interacting with the website's content?

Keri Medeiros

Experiencing same issue on my end.

Dom Sipowicz • Edited

I have a genuine question. Is there a reason to use ChatGPT over OpenAI's text-davincii-text or other OpenAI models? I know everyone uses ChatGPT (check the chart), but it's faster and more "the right way" to use the text/code models directly to perform those tasks.

Nevo David

Yes there is.
OpenAI AI uses GPT3,
Chat GPT uses a more advanced model of GPT3.5

Dom Sipowicz

Fair enough. Also based on my own testing, text-davinci-003 yielded similar and sometimes more accurate results than ChatGPT in some cases.

lorenz1989

A group of students from a renowned university have researched and combined ChatGPT with search tools such as Google, Bing, and DuckDuckGo. The results exceeded expectations.
Just log in to your OpenAI account to use it.
Let’s try it at: chrome.google.com/webstore/detail/...
Source: twitter.com/DataChaz/status/160818...

matija2209

I'm building a tool that requires scraping product listings. I mostly rely on structured data for price, reviews, sku, stock, etc. Do you think any of the OpenAI products could parse prices/reviews where there is no structured data? Thanks!

ajaxStardust

i'm looking forward to trying this!