Nasser Maronie

Posted on Jun 23, 2024

Building a Web Page Summarization App with Next.js, OpenAI, LangChain, and Supabase

#llm #langchain #openai #supabase

An app that can understand the context of any web page.

In this article, we'll show you how to create a handy web app that can summarize the content of any web page. Using Next.js for a smooth and fast web experience, LangChain for processing language, OpenAI for generating summaries, and Supabase for managing and storing vector data, we'll build a powerful tool together.

Why We're Building It

We all face information overload with so much content online. By making an app that gives quick summaries, we help people save time and stay informed. Whether you're a busy worker, a student, or just someone who wants to keep up with news and articles, this app will be a helpful tool for you.

How It Will Work

Our app will let users enter any website URL and quickly get a brief summary of the page. This means you can understand the main points of long articles, blog posts, or research papers without reading them fully.

Potential and Impact

This summarization app can be useful in many ways. It can help researchers skim through academic papers, keep news lovers updated, and more. Plus, developers can build on this app to create even more useful features.

Next.js

Next.js is a React framework developed by Vercel that lets developers build server-side rendered (SSR) and statically generated web applications with ease. It combines React with built-in routing, API routes, and build optimizations to create fast, scalable web applications.

OpenAI

The openai package for Node.js provides a way to interact with OpenAI's API, allowing developers to use language models like GPT-3.5 and GPT-4. With it you can integrate text generation and embeddings into your Node.js applications.

LangChain.js

LangChain is a framework for developing applications with language models. Originally developed for Python, it is also available for JavaScript and TypeScript as LangChain.js, which runs on Node.js. Here's an overview of LangChain in the context of Node.js:

What is LangChain?

LangChain is a library that simplifies the creation of applications using large language models (LLMs). It provides tools to manage and integrate LLMs into your applications, handle chaining of calls to these models, and enable complex workflows with ease.

How Do Large Language Models (LLMs) Work?

Large Language Models (LLMs) like OpenAI’s GPT-3.5 are trained on vast amounts of text data to understand and generate human-like text. They can generate responses, translate languages, and perform many other natural language processing tasks.

Supabase

Supabase is an open-source backend-as-a-service (BaaS) platform designed to help developers quickly build and deploy scalable applications. It offers a suite of tools and services that simplify database management, authentication, storage, and real-time capabilities, all built on top of PostgreSQL.

Prerequisites

Before we start, make sure you have the following:

Node.js and npm installed
A Supabase account
An OpenAI account

Step 1: Setting Up Supabase

First, we need to set up a Supabase project and create the necessary tables to store our data.

Create a Supabase Project

Go to Supabase and sign up for an account.
Create a new project and note your project URL and anon (public) API key. You'll need these later.

SQL Script for Supabase

Create a new SQL query in your Supabase dashboard and run the following scripts to create the required tables and functions:

First, enable the pgvector extension (named vector) for our vector store, if it isn't enabled already:



create extension if not exists vector;

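Next, create a table named "documents". This table stores each chunk of a web page's content together with its embedding vector. The vector size (1536) must match the dimension of the embedding model you use; OpenAI's text-embedding-ada-002, for example, produces 1536-dimensional vectors: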



create table if not exists documents (
    id bigint primary key generated always as identity,
    content text,
    metadata jsonb,
    embedding vector(1536)
);

Now, we need a function to query our embedded data:



create or replace function match_documents (
    query_embedding vector(1536),
    match_count int default null,
    filter jsonb default '{}'
) returns table (
    id bigint,
    content text,
    metadata jsonb,
    similarity float
) language plpgsql as $$
begin
    -- Qualify every column with the table name: unqualified names would be
    -- ambiguous with the output columns declared in "returns table" above.
    return query
    select
        documents.id,
        documents.content,
        documents.metadata,
        1 - (documents.embedding <=> query_embedding) as similarity
    from documents
    where documents.metadata @> filter
    order by documents.embedding <=> query_embedding
    limit match_count;
end;
$$;
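
To make the query more concrete: pgvector's <=> operator returns the cosine distance between two vectors, so 1 minus that distance is the cosine similarity. The TypeScript sketch below mimics what match_documents does, but in memory. The DocumentRow type and all helper names are made up for illustration, and matchesFilter only approximates the jsonb containment operator (@>) for flat, top-level keys.

```typescript
// Hypothetical in-memory shape mirroring a row of the "documents" table.
interface DocumentRow {
  id: number;
  content: string;
  metadata: Record<string, unknown>;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
// pgvector's <=> returns the cosine distance, i.e. 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length.");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rough stand-in for `metadata @> filter`, limited to top-level primitive values.
function matchesFilter(
  metadata: Record<string, unknown>,
  filter: Record<string, unknown>
): boolean {
  return Object.entries(filter).every(([key, value]) => metadata[key] === value);
}

// Filter by metadata, score each row, sort by similarity (highest first),
// and keep the first `matchCount` rows.
function matchDocuments(
  queryEmbedding: number[],
  rows: DocumentRow[],
  matchCount: number,
  filter: Record<string, unknown> = {}
) {
  return rows
    .filter((row) => matchesFilter(row.metadata, filter))
    .map((row) => ({
      id: row.id,
      content: row.content,
      metadata: row.metadata,
      similarity: cosineSimilarity(row.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```

In the real app the database does this work, and far more efficiently; the sketch is only meant to show which rows come back and in what order.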

Next, we need a table for storing each web page's details:



create table if not exists files (
    id bigint primary key generated always as identity,
    url text not null,
    created_at timestamp with time zone default timezone('utc'::text, now()) not null
);

Step 2: Setting Up OpenAI

Create OpenAI Project

Visit the OpenAI website: Go to OpenAI's website, sign up, and create a new project.
Navigate to the API section: After logging in, open the API section (usually accessible from the dashboard) and create a new API key.

Step 3: Setting Up Next.js

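Create the Next.js app

The code in this article assumes TypeScript, a src/ directory, the Pages Router, and the default @/* import alias, so choose those options when create-next-app prompts you: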



$ npx create-next-app summarize-page
$ cd ./summarize-page

Install the required dependencies:



npm install @langchain/community @langchain/core @langchain/openai @supabase/supabase-js langchain openai axios

Then install Material UI for building the interface. Feel free to use another UI library instead:



npm install @mui/material @emotion/react @emotion/styled

Step 4: OpenAI and Supabase clients

Next, we need to set up the OpenAI and Supabase clients. Create a src/libs directory in your project and add the following files.

`src/libs/openAI.ts`

This file will configure the OpenAI client.



import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";

const openAIApiKey = process.env.OPENAI_API_KEY;

if (!openAIApiKey) throw new Error("OpenAI API Key not found.");

export const llm = new ChatOpenAI({
  openAIApiKey,
  modelName: "gpt-3.5-turbo",
  temperature: 0.9,
});

export const embeddings = new OpenAIEmbeddings(
  {
    openAIApiKey,
  },
  { maxRetries: 0 }
);

llm: The language model instance, which will generate our summaries.
embeddings: This will create embeddings for our documents, which help in finding similar content.

`src/libs/supabaseClient.ts`

This file will configure the Supabase client.



import { createClient } from "@supabase/supabase-js";

const supabaseUrl = process.env.SUPABASE_URL || "";
const supabaseAnonKey = process.env.SUPABASE_ANON_KEY || "";

if (!supabaseUrl) throw new Error("Supabase URL not found.");
if (!supabaseAnonKey) throw new Error("Supabase Anon key not found.");

export const supabaseClient = createClient(supabaseUrl, supabaseAnonKey);

supabaseClient: The Supabase client instance to interact with our Supabase database.
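
Both client files repeat the same "read the variable, throw if it is missing" check. One possible way to factor that out is a small helper like the one below. The name requireEnv and the Env type are hypothetical and not part of the original project; the helper takes the environment object as a parameter so it can be tried without touching real environment variables.

```typescript
// Hypothetical helper: a minimal sketch, not part of the original project.
type Env = Record<string, string | undefined>;

// Returns the value of `name`, or throws if it is missing or empty.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

// Possible usage inside src/libs/supabaseClient.ts:
//   const supabaseUrl = requireEnv(process.env, "SUPABASE_URL");
//   const supabaseAnonKey = requireEnv(process.env, "SUPABASE_ANON_KEY");
```

Because the return type is string rather than string | undefined, the callers no longer need the || "" fallback to satisfy TypeScript.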

Step 5: Creating Services for Content and Files

Create a src/services directory and add the following files to handle fetching content and managing files.

`src/services/content.ts`

This service will fetch the web page content and clean it by removing HTML tags, scripts, and styles.



import axios from "axios";

export async function getContent(url: string): Promise<string> {
  const response = await axios.get(url);
  const htmlContent: unknown = response.data;

  // axios parses JSON responses into objects; only text/HTML can be cleaned
  if (typeof htmlContent !== "string" || !htmlContent) return "";

  // Remove unwanted elements and tags
  return htmlContent
    .replace(/style="[^"]*"/gi, "")
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
    .replace(/\s*on\w+="[^"]*"/gi, "")
    .replace(
      /<script(?![^>]*application\/ld\+json)[^>]*>[\s\S]*?<\/script>/gi,
      ""
    )
    .replace(/<[^>]*>/g, "")
    .replace(/\s+/g, " ");
}

This function fetches the HTML content of a given URL and cleans it up by removing styles, scripts, and HTML tags.
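
To see what the cleanup does in practice, here is the same chain of replacements pulled out into a standalone function and applied to a tiny HTML string. The name cleanHtml and the sample markup are made up for this illustration; the regular expressions are the ones used in getContent.

```typescript
// Same replacement chain as getContent, without the network request.
// `cleanHtml` is a hypothetical name used only for this illustration.
function cleanHtml(html: string): string {
  return html
    .replace(/style="[^"]*"/gi, "") // inline style attributes
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "") // <style> blocks
    .replace(/\s*on\w+="[^"]*"/gi, "") // inline event handlers such as onclick
    .replace(
      /<script(?![^>]*application\/ld\+json)[^>]*>[\s\S]*?<\/script>/gi,
      ""
    ) // scripts, except JSON-LD structured data
    .replace(/<[^>]*>/g, "") // every remaining tag
    .replace(/\s+/g, " "); // collapse runs of whitespace
}

const sampleHtml =
  '<html><head><style>p { color: red; }</style><script>alert("x")</script></head>' +
  '<body><p style="margin:0" onclick="go()">Hello   <b>world</b></p></body></html>';

console.log(cleanHtml(sampleHtml)); // prints "Hello world"
```

Note that the negative lookahead keeps the body of JSON-LD scripts, so structured data such as the author or headline stays in the text that is later embedded. HTML entities like &amp;amp; are not decoded by this chain.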

`src/services/file.ts`

This service saves the web page content into Supabase and generates summaries from the stored content.



import { embeddings, llm } from "@/libs/openAI";
import { supabaseClient } from "@/libs/supabaseClient";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { StringOutputParser } from "@langchain/core/output_parsers";
import {
  ChatPromptTemplate,
  HumanMessagePromptTemplate,
  SystemMessagePromptTemplate,
} from "@langchain/core/prompts";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { formatDocumentsAsString } from "langchain/util/document";

export interface IFile {
  id?: number | undefined;
  url: string;
  created_at?: Date | undefined;
}

export async function saveFile(url: string, content: string): Promise<IFile> {
  const doc = await supabaseClient
    .from("files")
    .select()
    .eq("url", url)
    .single<IFile>();

  if (!doc.error && doc.data?.id) return doc.data;

  const { data, error } = await supabaseClient
    .from("files")
    .insert({ url })
    .select()
    .single<IFile>();

  if (error) throw error;

  const splitter = new RecursiveCharacterTextSplitter({
    separators: ["\n\n", "\n", " ", ""],
  });

  const output = await splitter.createDocuments([content]);
  const docs = output.map((d) => ({
    ...d,
    metadata: { ...d.metadata, file_id: data.id },
  }));

  await SupabaseVectorStore.fromDocuments(docs, embeddings, {
    client: supabaseClient,
    tableName: "documents",
    queryName: "match_documents",
  });

  return data;
}

export async function getSummarization(fileId: number): Promise<string> {
  const vectorStore = await SupabaseVectorStore.fromExistingIndex(embeddings, {
    client: supabaseClient,
    tableName: "documents",
    queryName: "match_documents",
  });

  const retriever = vectorStore.asRetriever({
    filter: (rpc) => rpc.filter("metadata->>file_id", "eq", fileId),
    k: 2,
  });

  const SYSTEM_TEMPLATE = `Use the following pieces of context, explain what is it about and summarize it.
      If you can't explain it, just say that you don't know, don't try to make up some explanation.
      ----------------
      {context}`;

  const messages = [
    SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
    HumanMessagePromptTemplate.fromTemplate("{format_answer}"),
  ];
  const prompt = ChatPromptTemplate.fromMessages(messages);
  const chain = RunnableSequence.from([
    {
      context: retriever.pipe(formatDocumentsAsString),
      format_answer: new RunnablePassthrough(),
    },
    prompt,
    llm,
    new StringOutputParser(),
  ]);

  const format_summarization =
    `
    Give it title, subject, description, and the conclusion of the context in this format, replace the brackets with the actual content:

    [Write the title here]

    By: [Name of the author or owner or user or publisher or writer or reporter if possible, otherwise leave it "Not Specified"]

    [Write the subject, it could be a long text, at least minimum of 300 characters]

    ----------------

    [Write the description in here, it could be a long text, at least minimum of 1000 characters]

    Conclusion:
    [Write the conclusion in here, it could be a long text, at least minimum of 500 characters]
    `;

  const summarization = await chain.invoke(format_summarization);

  return summarization;
}

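saveFile: Returns the existing record if the URL was saved before. Otherwise it inserts a new row into files, splits the content into manageable chunks, tags each chunk with the file's id, and stores the chunks with their embeddings in the vector store.
getSummarization: Retrieves the two most similar chunks for the given file (k: 2) from the vector store and generates a summary from them using OpenAI.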
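
The chunking step can be pictured with a much simpler splitter than the one LangChain provides. The sketch below cuts text into fixed-width chunks with some overlap and then tags each chunk with a file_id, as saveFile does. It is a simplified stand-in, not the algorithm of RecursiveCharacterTextSplitter, which prefers to break on the configured separators; splitText, toDocuments, and ChunkDocument are hypothetical names.

```typescript
// Simplified fixed-width splitter for illustration only.
// This is NOT how RecursiveCharacterTextSplitter works internally.
function splitText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive.");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be at least 0 and smaller than chunkSize.");
  }

  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Hypothetical document shape: the chunk text plus the metadata used for filtering.
interface ChunkDocument {
  pageContent: string;
  metadata: { file_id: number };
}

// Attach the owning file's id to every chunk, mirroring what saveFile does.
function toDocuments(chunks: string[], fileId: number): ChunkDocument[] {
  return chunks.map((pageContent) => ({ pageContent, metadata: { file_id: fileId } }));
}

console.log(splitText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

The file_id stored in each chunk's metadata is what lets the retriever in getSummarization restrict the similarity search to a single page.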

Step 6: Creating an API Handler

Now, let's create an API handler to process the content and generate a summary.

`src/pages/api/content.ts`



import { getContent } from "@/services/content";
import { getSummarization, saveFile } from "@/services/file";
import { NextApiRequest, NextApiResponse } from "next";

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  // Only POST is supported; 405 tells clients the route exists but the method is wrong
  if (req.method !== "POST")
    return res.status(405).json({ message: "Method not allowed" });

  const { body } = req;

  try {
    const content = await getContent(body.url);
    const file = await saveFile(body.url, content);
    const result = await getSummarization(file.id as number);
    res.status(200).json({ result });
  } catch (err) {
    res.status(500).json({ error: err });
  }
}

This API handler receives a URL in the POST body, fetches the page content, saves it to Supabase, and generates a summary. It does so by calling getContent, saveFile, and getSummarization from our services in sequence.
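
One detail about the catch block is worth knowing: JSON.stringify skips the message and stack of a standard Error because those properties are non-enumerable, so { error: err } can reach the client as an empty object. A possible workaround is to send a plain string instead. The helper name toErrorMessage is hypothetical, and the fallback text is an arbitrary choice.

```typescript
// Hypothetical helper: turns an unknown thrown value into a readable string.
function toErrorMessage(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  return "Unknown error";
}

// A standard Error serializes to an empty object:
console.log(JSON.stringify({ error: new Error("boom") })); // {"error":{}}

// Sending the message keeps the information:
console.log(JSON.stringify({ error: toErrorMessage(new Error("boom")) })); // {"error":"boom"}
```

In the handler this would look like res.status(500).json({ error: toErrorMessage(err) }). Be careful not to expose internal details to end users in a production app.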

Step 7: Building the Frontend

Finally, let's create the frontend in src/pages/index.tsx to allow users to input URLs and display the summarizations.

`src/pages/index.tsx`



import axios from "axios";
import { useState } from "react";
import {
  Alert,
  Box,
  Button,
  Container,
  LinearProgress,
  Stack,
  TextField,
  Typography,
} from "@mui/material";

export default function Home() {
  const [loading, setLoading] = useState(false);
  const [url, setUrl] = useState("");
  const [result, setResult] = useState("");
  const [error, setError] = useState<any>(null);

  const onSubmit = async () => {
    try {
      setError(null);
      setLoading(true);
      const res = await axios.post("/api/content", { url });
      setResult(res.data.result);
    } catch (err) {
      console.error("Failed to fetch content", err);
      setError(err as any);
    } finally {
      setLoading(false);
    }
  };

  return (
    <Box sx={{ height: "100vh", overflowY: "auto" }}>
      <Container
        sx={{
          backgroundColor: (theme) => theme.palette.background.default,
          position: "sticky",
          top: 0,
          zIndex: 2,
          py: 2,
        }}
      >
        <Typography sx={{ mb: 2, fontSize: "24px" }}>
          Summarize the content of any page
        </Typography>

        <TextField
          fullWidth
          label="Input page's URL"
          value={url}
          onChange={(e) => {
            if (result) setResult("");
            setUrl(e.target.value);
          }}
          sx={{ mb: 2 }}
        />

        <Button
          disabled={loading}
          variant="contained"
          onClick={onSubmit}
        >
          Summarize
        </Button>
      </Container>

      <Container maxWidth="lg" sx={{ py: 2 }}>
        {loading ? (
          <LinearProgress />
        ) : (
          <Stack sx={{ gap: 2 }}>
            {result && (
              <Alert>
                <Typography
                  sx={{
                    whiteSpace: "pre-line",
                    wordBreak: "break-word",
                  }}
                >
                  {result}
                </Typography>
              </Alert>
            )}
            {error && <Alert severity="error">{error.message || error}</Alert>}
          </Stack>
        )}
      </Container>
    </Box>
  );
}

This React component allows users to input a URL, submit it, and display the generated summary. It handles loading states and error messages to provide a better user experience.
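
As written, the component sends whatever is in the text field to the API. A small check like the one below could be used to reject obviously invalid input before making the request, for example by disabling the Summarize button. This is a minimal sketch: isHttpUrl is a hypothetical helper, and accepting only http and https is an assumption about what the app should allow.

```typescript
// Hypothetical helper: true only for absolute http(s) URLs.
function isHttpUrl(value: string): boolean {
  try {
    const parsed = new URL(value.trim());
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    // new URL throws a TypeError when the input cannot be parsed
    return false;
  }
}

// Possible usage in the component:
//   <Button disabled={loading || !isHttpUrl(url)} ... >
```

The same check could also run in the API handler, since client-side validation alone does not protect the server.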

Step 8: Running the Application

Create a .env file in the root of your project to store your environment variables:



SUPABASE_URL=your-supabase-url
SUPABASE_ANON_KEY=your-supabase-anon-key
OPENAI_API_KEY=your-openai-api-key

Finally, start your Next.js application:



npm run dev

You should now have a running application where you can enter a web page's URL and receive a summary of that page.

Conclusion

Congratulations! You've built a fully functional web page summarization application using Next.js, OpenAI, LangChain, and Supabase. Users can input a URL, fetch the content, store it in Supabase, and generate a summary using OpenAI's capabilities. This setup provides a robust foundation for further enhancements and customization based on your needs.

Feel free to expand on this project by adding more features, improving the UI, or integrating additional APIs.

Check the source code in this repo:

https://github.com/firstpersoncode/summarize-page

Happy coding!
