
Train a Chatbot with a Video

Imagine building a chatbot that lets you upload a video and then answers questions about it in a conversational way.

For example, if you upload a recording of a Zoom meeting:

  • What were my action items in this meeting?
  • What were the key metrics presented in the slide show?

This blog post will guide you through creating an AI-powered application that lets you upload a video, extracts knowledge from it, and then answers questions about it in a conversational way.

Initialize the Project

Start by creating a new directory for your project and navigate into it:

mkdir video_analyzer && cd video_analyzer

Backend Setup with Node.js

  1. Initialize the backend: Create a new server directory:
mkdir server && cd server
npm init -y
  2. Install Dependencies: Install the necessary Node.js packages for the backend:
npm install express cors multer ffmpeg-static fluent-ffmpeg openai sharp ssim.js tesseract.js

This installs Express for the server, cors for cross-origin requests from the frontend, multer for handling the video upload, ffmpeg-static and fluent-ffmpeg for extracting audio and frames, sharp and ssim.js for comparing frames, tesseract.js for OCR, and the openai client for talking to the OpenAI API.
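
The server code and worker scripts later in this guide save uploaded videos to uploads/, the extracted audio to audio/, and the extracted frames to frames/. Neither multer (when given a destination function) nor ffmpeg creates these folders for you, so create them inside the server directory now:

mkdir uploads audio frames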

Frontend Setup with React and Vite

  1. Go back to the root of the project:
cd ..
  2. Initialize a new Vite project with React by running:
npm create vite@latest client -- --template react
  3. Navigate to the generated client directory:
cd client
  4. Install Dependencies: Inside the client directory, install the necessary dependencies by running:
npm install

This command installs React, Vite, and other necessary libraries defined in package.json.

Backend Development

Let's begin the development, starting with the backend.
Navigate to the server directory.

  1. Setting up Express server (server.js): Initialize the server and define routes for uploading videos, processing them, and handling user queries.
const express = require("express");
const multer = require("multer");
const crypto = require("crypto");
const { Worker } = require("worker_threads");
const app = express();
const port = 5078;
const OpenAI = require("openai");
const openai = new OpenAI({
  apiKey: "", // Add your OpenAI API key here (or read it from process.env.OPENAI_API_KEY)
});
const cors = require("cors");

const videoData = new Map();
const upload = multer({
  storage: multer.diskStorage({
    destination: function (req, file, cb) {
      cb(null, "uploads/");
    },
    filename: function (req, file, cb) {
      cb(null, file.originalname);
    },
  }),
});

app.use(cors());
app.use(express.json());

// Upload the video
app.post("/upload", upload.single("video"), (req, res) => {
  const key = crypto.randomBytes(16).toString("hex");
  res.send(key);

  // Start a worker thread and send the path to the video and the key to the worker
  createJob(req.file.path, key);
});

// Polling processing status
app.get("/video/:key", (req, res) => {
  const key = req.params.key;
  const data = videoData.get(key);
  if (!data) {
    res.status(404).send("Video not found");
  } else {
    // At this point the UI will show the form.
    // It's a single textarea with a submit button.
    // The label will say "What do you want to learn about this video?"
    res.send(data);
  }
});

// For answering questions
app.post("/completions/:key", async (req, res) => {
  const { question } = req.body;
  const knowledge = videoData.get(req.params.key);

  const systemMsg = `The following information describes a video.

  1. The transcript of the video, showing the start of each segment in seconds (as the key) and the text of the segment (as the value):
  ${JSON.stringify(knowledge.data.fromTrans, null, 2)}

  2. The result of OCR, which shows the start time (in seconds) of each detected text segment in the video as the key and the text as the value:
  ${JSON.stringify(knowledge.data.fromOCR[0], null, 2)}

  3. A description of the video:
  ${knowledge.data.fromOCR[1]}
  `;

  console.log(systemMsg);

  const completion = await openai.chat.completions.create({
    messages: [
      { role: "system", content: systemMsg },
      { role: "user", content: question },
    ],
    model: "gpt-4",
  });

  res.send(completion.choices[0].message.content);
});

app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});

function createJob(videoPath, key) {
  console.log("Creating job for", videoPath);

  const worker = new Worker("./workers/processVideo.js", {
    workerData: { videoPath, key },
  });

  worker.on("message", (message) => {
    videoData.set(key, message);
  });

  worker.on("error", (error) => {
    console.error("Worker error:", error);
  });

  worker.on("exit", (code) => {
    if (code !== 0) {
      console.error("Worker stopped with exit code", code);
    }
  });
}

The code above uses the Express framework to create a web server that manages video uploads, processes them in the background, and lets users ask questions about their content. Here's a breakdown of its main components and functionalities:

  • Setup and Middlewares: The application sets up an Express server, specifies a port number (5078), and initializes the OpenAI API client with an API key (left empty in the code snippet). It uses cors to enable Cross-Origin Resource Sharing and express.json() middleware to parse JSON request bodies.

  • Multer for File Uploads: multer is configured with disk storage, specifying the destination folder (uploads/) and using the original file name for the uploaded videos. This setup is used to handle video file uploads through the /upload endpoint.

  • Upload Endpoint (/upload): Handles video file uploads. It generates a unique key using crypto for the uploaded video, sends this key back to the client, and initiates a background job for processing the video by calling the createJob function with the video's file path and the generated key.

  • Polling Endpoint (/video/:key): Allows polling for the processing status of a video by its unique key. It looks up the video data in a Map object and returns the data if available or a 404 status if not found.

  • Question-Answering Endpoint (/completions/:key): Accepts questions about a video identified by a key. It retrieves the video's data, constructs a system message that includes video transcripts, OCR (Optical Character Recognition) results, and a description of the video, and then uses OpenAI's chat.completions endpoint to generate an answer based on this information.

  • Server Initialization: Starts the Express server, listening on the specified port, and logs a message indicating the server is running.

  • createJob Function: This function is responsible for initiating the processing of an uploaded video in a background thread. Here's a detailed explanation:

    • Parameters: Takes two parameters: videoPath, which is the path to the uploaded video file, and key, a unique identifier generated for the video.
    • Worker Threads: Creates a new worker thread from the ./workers/processVideo.js script, passing in videoPath and key as data to the worker. Worker threads allow CPU-intensive tasks (like video processing) to run in parallel with the main thread, preventing them from blocking the server's responsiveness.
    • Worker Communication: Sets up listeners for the worker's message, error, and exit events:
      • message: When a message is received from the worker, indicating that video processing is complete, it stores the result in the videoData map using the video's key. This allows the server to later retrieve and return this data in response to client requests.
      • error: Logs any errors that occur within the worker thread.
      • exit: Checks the exit code of the worker thread; a non-zero code indicates that the worker exited due to an error, which is logged for debugging purposes.

This setup enables the server to efficiently handle video processing tasks in the background while remaining responsive to client requests.
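
If you want to smoke-test the API without the frontend, the flow looks roughly like this (hypothetical commands; demo.mp4 stands for any local video file and <key> for the key returned by the upload):

curl -F "video=@demo.mp4" http://localhost:5078/upload
# -> returns a key, e.g. 3f9c1a2b...

curl http://localhost:5078/video/<key>
# -> 404 "Video not found" while processing, the extracted data once it's done

curl -X POST http://localhost:5078/completions/<key> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this video about?"}'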

  2. Video Processing: The createJob function launches the processVideo.js worker, responsible for orchestrating the processing of the video to extract all the data. Here is the code for it:
const { parentPort, workerData, Worker } = require("worker_threads");
const fs = require("fs").promises;
const path = require("path");
const pathToFfmpeg = require("ffmpeg-static");
const ffmpeg = require("fluent-ffmpeg");
ffmpeg.setFfmpegPath(pathToFfmpeg);
const cpus = require("os").cpus().length;
const OpenAI = require("openai");
const openai = new OpenAI({ apiKey: "" }); // Add your OpenAI API key here

async function main() {
  const { videoPath, key } = workerData;

  const [fromTrans, fromOCR] = await Promise.all([
    transcribe(videoPath),
    performOCR(videoPath),
  ]);

  parentPort.postMessage({ key, data: { fromTrans, fromOCR } });
}

async function transcribe(videoPath) {
  // Extract audio from video
  await extractAudio(videoPath, "audio/audio.mp3");
  const transcription = await transcribeAudio("audio/audio.mp3");

  let transcriptionOutput = {};
  for (let i = 0; i < transcription.segments.length; i++) {
    transcriptionOutput[transcription.segments[i].start] =
      transcription.segments[i].text;
  }

  return transcriptionOutput;
}

async function transcribeAudio(path) {
  const transcription = await openai.audio.transcriptions.create({
    file: require("fs").createReadStream(path),
    model: "whisper-1",
    response_format: "verbose_json",
  });
  return transcription;
}

function extractAudio(videoPath, audioOutputPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(videoPath)
      .output(audioOutputPath)
      .audioCodec("libmp3lame") // Use MP3 codec
      .on("end", function () {
        console.log("Audio extraction complete.");
        resolve();
      })
      .on("error", function (err) {
        console.error("Error:", err);
        reject(err);
      })
      .run();
  });
}

async function performOCR(videoPath) {
  // Resolve relative to the server's working directory, where multer saved the upload
  const vpath = path.resolve(videoPath);
  const outputDirectory = path.resolve(__dirname, "../", "frames");

  // Convert Video to frames
  await new Promise((resolve, reject) => {
    ffmpeg(vpath)
      .outputOptions("-vf fps=1") // This sets the frame extraction rate to 1 frame per second. Adjust as needed.
      .output(`${outputDirectory}/frame-%03d.jpg`) // Output file name pattern
      .on("end", () => {
        console.log("Frame extraction is done");
        resolve();
      })
      .on("error", (err) => {
        console.error("An error occurred: " + err.message);
        reject(err);
      })
      .run();
  });

  // Calculate the SSIM for each pair of frames
  const directoryPath = path.join(__dirname, "../", "frames");
  let fileNames = await fs.readdir(directoryPath);

  fileNames.sort(
    (a, b) => parseInt(a.match(/\d+/), 10) - parseInt(b.match(/\d+/), 10)
  );

  const pairs = fileNames
    .slice(0, -1)
    .map((_, i) => [fileNames[i], fileNames[i + 1]]);

  const numCPUs = cpus;
  const workers = Array.from(
    { length: numCPUs },
    () => new Worker("./workers/ssim_worker.js")
  );

  // Distribute the SSIM work
  const segmentSize = Math.ceil(pairs.length / workers.length);
  const resultsPromises = workers.map((worker, index) => {
    const start = index * segmentSize;
    const end = start + segmentSize;
    const segment = pairs.slice(start, end);

    worker.postMessage(segment);

    return new Promise((resolve, reject) => {
      worker.on("message", resolve);
      worker.on("error", reject);
    });
  });

  const SIMMresults = await Promise.all(resultsPromises);
  const indexes = determineStableFrames(SIMMresults.flat());
  const stableFramesPaths = getPaths(indexes, directoryPath);

  // Terminate SSIM workers
  workers.forEach((worker) => worker.terminate());

  // Perform OCR and cleanup
  const cpuCount = cpus;
  const chunkSize = Math.ceil(stableFramesPaths.length / cpuCount);
  const ocrPromises = [];
  const ocrWorkers = [];

  for (let i = 0; i < cpuCount; i++) {
    const start = i * chunkSize;
    const end = start + chunkSize;
    const imagesChunk = stableFramesPaths.slice(start, end);
    const worker = new Worker("./workers/ocrWorker.js", {
      workerData: { images: imagesChunk },
    });
    ocrWorkers.push(worker);
    ocrPromises.push(
      new Promise((resolve, reject) => {
        worker.on("message", resolve);
        worker.on("error", reject);
        worker.on("exit", (code) => {
          if (code !== 0)
            reject(new Error(`Worker stopped with exit code ${code}`));
        });
      })
    );
  }

  const images = stableFramesPaths.map((path) => ({
    type: "image_url",
    image_url: {
      url: encodeImage(path.filePath),
    },
  }));
  const visionAnnalysis = openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Describe these images. These images were extracted from a video, what is the video about?",
          },
          ...images,
        ],
      },
    ],
  });

  const [visionResults, ocrResults] = await Promise.all([
    visionAnnalysis,
    Promise.all(ocrPromises).then((results) => results.flat()),
  ]).catch((err) => {
    console.error("An error occurred:", err);
  });

  let OCRtext = "";
  for (let i = 0; i < ocrResults.length; i++) {
    OCRtext += `Location: ${ocrResults[i].location}s - ${ocrResults[i].text}\n`;
  }

  // Terminate OCR workers
  ocrWorkers.forEach((worker) => worker.terminate());

  // Cleanup the results with GPT-4
  const cleanSegments = await openai.chat.completions.create({
    messages: [
      {
        role: "system",
        content:
          "You are a helpful assistant tasked with cleaning up the results of an OCR (Optical Character Recognition) operation.",
      },
      {
        role: "user",
        content:
          "Below is the output of an OCR operation on a set of images." +
          'It contains the "Location" in seconds, followed by a "-", followed by the extracted text.' +
          "Please clean up the results keeping the Location and the \"-\". expected output: 'Location 60s - [Cleaned up text goes here]'. Discard all duplicates." +
          "\n" +
          OCRtext +
          ".\n" +
          'Return your response as a JSON object like this: { "2" : "Segment1", "8" : "Segment2" }, where the keys are the "Location" and the values are the cleaned up text segments. Please remove all duplicated segments.',
      },
    ],
    model: "gpt-4-turbo-preview",
    temperature: 0.5,
    response_format: { type: "json_object" },
  });

  let output;
  try {
    output = JSON.parse(cleanSegments.choices[0].message.content);
  } catch (err) {
    console.error("An error occurred parsing the response:", err);
  }

  return [output, visionResults.choices[0].message.content];
}

main();

// Function to determine stable frames based on SSIM results
function determineStableFrames(ssimResults) {
  let indices = [];
  let pushed = false;

  for (let i = 0; i < ssimResults.length; i++) {
    if (ssimResults[i] > 0.98 && !pushed) {
      indices.push(i);
      pushed = true;
    } else if (ssimResults[i] < 0.94 && pushed) {
      pushed = false;
    }
  }

  return indices;
}

function getPaths(indices, framesDir) {
  return indices.map((index) => {
    // Frame filenames are 1-indexed and follow the pattern 'frame-XXX.jpg'
    const frameNumber = (index + 1).toString().padStart(3, "0");
    const filename = `frame-${frameNumber}.jpg`;
    const filePath = path.join(framesDir, filename);

    // Here we simply return the filePath, but you can modify this part to actually read the file
    // For example, using fs.readFileSync(filePath) to load the image data
    return {
      index,
      filePath,
    };
  });
}

function encodeImage(filePath) {
  const image = require("fs").readFileSync(filePath);
  const base64Image = Buffer.from(image).toString("base64");
  return `data:image/jpeg;base64,${base64Image}`;
}

processVideo.js is the core of the video processing workflow. It performs several key tasks using worker threads to take advantage of multi-core CPUs for parallel processing. The main steps include:

  • Transcription of Audio: Extracts audio from the video file and then uses the OpenAI API to transcribe the audio into text. This is done by first converting the video to an audio file and then transcribing the audio to understand what is being said in the video.

  • Optical Character Recognition (OCR): This involves analyzing video frames to extract textual information present in them. The process includes:

    • Converting the video into a series of frames (images) at a specific frame rate.

    • Using SSIM (Structural Similarity Index Measure) to determine stable frames that don't have significant changes between them, aiming to reduce the number of frames to process and focus on those that likely contain new information.

    • Performing OCR on these selected frames to extract any text they contain.

    • Using OpenAI's GPT model to generate a descriptive analysis of the content based on the OCR results and the visual content of the frames.

  • Integration and Cleanup: The results from both the transcription and OCR are then cleaned up and integrated. This might involve removing duplicates, correcting errors, and formatting the data for further use, such as feeding it into an AI model for generating summaries or insights.

  • Communication with Main Thread: Throughout the process, the worker communicates with the main thread, sending back the processed data. This data can then be used in a web application, for instance, to display the video's contents in text form or to answer questions about the video content.
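
If you want to exercise this worker on its own while debugging, a minimal sketch (run from the server directory, with uploads/demo.mp4 as a placeholder for any test video you have on disk) could look like this:

const { Worker } = require("worker_threads");

// Launch the same worker the server uses, with a hand-picked test video
const worker = new Worker("./workers/processVideo.js", {
  workerData: { videoPath: "uploads/demo.mp4", key: "test" },
});

// The worker posts back { key, data: { fromTrans, fromOCR } } when it finishes
worker.on("message", (msg) => {
  console.log(JSON.stringify(msg.data, null, 2));
  process.exit(0);
});
worker.on("error", (err) => console.error("Worker error:", err));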

  3. SSIM Calculation: The ssim_worker.js worker calculates the similarity between consecutive frames to detect major changes in the video, such as a slide change in a screen-share presentation.
const { parentPort } = require("worker_threads");
const path = require("path");
const sharp = require("sharp");
const { ssim } = require("ssim.js");

// Listen for messages from the parent process
parentPort.on("message", async (pairs) => {
  const ssimResults = [];

  for (let i = 0; i < pairs.length; i++) {
    const path1 = path.join(__dirname, "../frames", pairs[i][0]);
    const path2 = path.join(__dirname, "../frames", pairs[i][1]);
    const img1 = await sharp(path1).raw().ensureAlpha().toBuffer();
    const img2 = await sharp(path2).raw().ensureAlpha().toBuffer();

    const metadata1 = await sharp(path1).metadata();

    const image1Data = {
      width: metadata1.width,
      height: metadata1.height,
      data: img1,
    };

    const image2Data = {
      width: metadata1.width, // Assuming both images have the same dimensions which is the case in video frames
      height: metadata1.height,
      data: img2,
    };

    // store the result in an array
    ssimResults.push(ssim(image1Data, image2Data).mssim);
  }

  // Send the results back to the parent process
  parentPort.postMessage(ssimResults);
});


The ssim_worker.js script is a specialized worker used in the process of analyzing video frames for stability. It uses the Structural Similarity Index (SSIM) algorithm to compare pairs of sequential video frames: a high SSIM value indicates that two frames are very similar, while a low value indicates differences. By identifying frames with high similarity, the script helps select "stable" frames that do not change significantly between them, effectively reducing the number of frames that need to be processed by OCR. This saves computational resources and can improve OCR accuracy by focusing on frames where new visual information is present.
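
To see how the stable-frame selection in determineStableFrames (shown earlier in processVideo.js) behaves, here is a small worked example with made-up SSIM scores: values above 0.98 mark a stable frame, values below 0.94 re-arm the detector, and anything in between changes nothing.

// Made-up SSIM scores between consecutive frames (1.0 = identical)
const scores = [0.99, 0.99, 0.55, 0.99, 0.99, 0.92, 0.99];

// i=0: 0.99 > 0.98, nothing pushed yet -> index 0 is kept
// i=1: still similar, already pushed   -> skipped (same stable scene)
// i=2: 0.55 < 0.94                     -> big change (e.g. a slide transition); re-arm
// i=3: 0.99 > 0.98 again               -> index 3 is kept (new stable scene)
// i=5: 0.92 < 0.94                     -> re-arm
// i=6: 0.99 > 0.98                     -> index 6 is kept
console.log(determineStableFrames(scores)); // [ 0, 3, 6 ]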

  4. OCR Calculation: The ocrWorker.js worker performs Optical Character Recognition (OCR) on the stable frames of the video in order to extract any text they contain.
const { parentPort, workerData } = require("worker_threads");
const { createWorker } = require("tesseract.js");

async function main() {
  const { images } = workerData;

  const worker = await createWorker("eng");

  const results = [];
  for (const img of images) {
    const {
      data: { text },
    } = await worker.recognize(img.filePath);
    results.push({
      text,
      location: img.index,
    });
  }

  await worker.terminate();
  return results;
}

main().then((results) => {
  parentPort.postMessage(results);
});


The ocrWorker.js script handles the OCR part of the workflow. It takes images (frames extracted from the video) and uses Tesseract.js, a JavaScript OCR library, to recognize and extract any text they contain, such as slide text, signs, or subtitles. Because frames are extracted at one frame per second, each result's location (the frame index) roughly corresponds to the time, in seconds, at which that text appears. The results from this worker can then be used for various purposes, including accessibility features, content analysis, and feeding into AI models for further processing or generating metadata about the video.

Storing the Extracted Data

The extracted knowledge from the video processing workflow, including transcriptions, vision, and OCR results, is stored in a JavaScript Map object within the server.js file. This Map is used as an in-memory storage mechanism to associate each processed video with its corresponding extracted data. The key to this Map is a unique identifier (a randomly generated hexadecimal string) for each video, and the value is the processed data resulting from the video analysis.

Storing Data in the Map

When a video is uploaded and processed, the following steps are taken to store the data in the Map:

Upon uploading a video through the /upload endpoint, a unique key for the video is generated using crypto.randomBytes(16).toString("hex"). This key serves as a unique identifier for each video upload session.

The video is then processed (audio transcription and OCR on the video frames), and the results of this processing are encapsulated in a message sent back from the worker thread responsible for the video processing (processVideo.js).

Once the main thread receives this message, it updates the Map with the key as the video's unique identifier and the value as the processed data (the message content), effectively storing the video's extracted knowledge.
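
To make the structure concrete, here is roughly what one entry in the Map looks like once processing completes (the key and all values below are invented for illustration):

videoData.set("3f9c1a2b4d6e8f0a1b2c3d4e5f6a7b8c", {
  key: "3f9c1a2b4d6e8f0a1b2c3d4e5f6a7b8c",
  data: {
    // Whisper transcription segments, keyed by their start time in seconds
    fromTrans: {
      "0": " Welcome everyone, let's get started.",
      "6.2": " First, a quick look at last quarter's numbers.",
    },
    // fromOCR[0]: cleaned-up OCR segments keyed by frame location (roughly seconds)
    // fromOCR[1]: the GPT vision description of the sampled frames
    fromOCR: [
      { "2": "Q3 Review", "14": "Revenue up 12% quarter over quarter" },
      "The frames appear to come from a screen-shared slide presentation about quarterly results.",
    ],
  },
});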

Using the Map in the /completions Endpoint

The stored data in the Map is used in the /completions/:key endpoint to generate answers to user queries based on the extracted video knowledge. Here's how it works:

When a request is made to the /completions/:key endpoint with a specific video's key, the server retrieves the video's processed data from the Map using the key provided in the URL.

This data includes the transcribed text from the video's audio, the text extracted via OCR from the video's frames, and a visual description of the video intended to represent the video's content accurately.

The server then constructs a prompt for the OpenAI API, including the extracted video knowledge and the user's query. This prompt is designed to provide the AI with enough context about the video to generate a relevant and informed response to the query.

The OpenAI API processes this prompt and returns a completion, which is then sent back to the client as the response to their query. This completion aims to be an insightful answer derived from the video's content, providing value to the user by leveraging the extracted knowledge.

Note: The use of a Map for storing the extracted knowledge allows fast and efficient retrieval of video data in a prototype like this one, but it is not a good solution for a production app: the data lives only in memory and is lost whenever the server restarts, so a database or other persistent store would be a better fit.

Frontend Development

Navigate to the client directory. Your Vite project should already have a src/App.jsx file; replace its contents with the code below.

Components

  1. App Component (src/App.jsx): This component manages the application state, including video uploading, processing, and form display logic.
import { useEffect, useState } from "react";
import "./App.css";
import { useMemo } from "react";

const states = {
  1: "Video Upload",
  2: "Video Processing",
  3: "Ask me",
};

function App() {
  const [isProcessing, setIsProcessing] = useState(false); // false, or the key of the video currently being processed
  const [isReady, setIsReady] = useState(false); // false, or the key of the video that has finished processing

  useEffect(() => {
    let intervalId;
    if (isProcessing) {
      intervalId = setInterval(() => {
        console.log("Polling for completion...", isProcessing);

        fetch(`http://localhost:5078/video/${isProcessing}`)
          .then((response) => {
            if (response.ok) {
              console.log("Setting isReady to", isProcessing);

              setIsReady(isProcessing);
              setIsProcessing(false);
            }
          })
          .catch((error) => {
            console.error("Error polling for completion:", error);
          });
      }, 3000);
    }

    return () => {
      clearInterval(intervalId);
    };
  }, [isProcessing]);

  const hideUploadWidget = useMemo(
    () => isProcessing || isReady,
    [isProcessing, isReady]
  );

  const title = useMemo(() => {
    if (isProcessing) {
      return states[2];
    } else if (isReady) {
      return states[3];
    } else {
      return states[1];
    }
  }, [isProcessing, isReady]);

  const hideSpinner = useMemo(() => !isProcessing, [isProcessing]);
  const hideForm = useMemo(() => !isReady, [isReady]);

  return (
    <div>
      <h1>{title}</h1>
      <VideoUpload
        hidden={hideUploadWidget}
        onUpload={(key) => setIsProcessing(key)}
      />
      <Spinner hidden={hideSpinner} />
      <Form hidden={hideForm} videoKey={isReady} />
    </div>
  );
}

export default App;

This component is designed to manage the state and presentation of the video processing workflow.

State Management

The component manages several pieces of state:

isProcessing: Indicates whether a video is currently being processed. When a video is processing, this variable holds the key of the video being processed.

isReady: Similar to isProcessing, this is intended to store the key of a video that has finished processing, indicating the video is ready.

Effect Hook

This hook runs when the isProcessing state changes. If a video is being processed (isProcessing is truthy), it starts an interval that polls a local server every 3 seconds to check if the video processing is complete. When the server responds positively (response.ok), it:

  • Logs the completion and sets isReady to the key of the video that was being processed, indicating the video is ready.

  • Sets isProcessing to false, indicating that no video is currently being processed.

  • The interval is cleared when the component unmounts or isProcessing changes, preventing memory leaks and unnecessary requests.

Memoization

The component uses useMemo to derive values based on the state, optimizing performance by avoiding unnecessary recalculations:

  • hideUploadWidget: Determines if the upload widget should be hidden. It's true if a video is either being processed or is ready.

  • title: Displays the current state of the application based on isProcessing and isReady, using a states object for readable state titles.

  • hideSpinner: Controls the visibility of a spinner, which is hidden unless a video is processing.

  • hideForm: Controls the visibility of a form, which is hidden until a video is ready.

Render Method

The component renders a div containing:

  • An h1 element displaying the current title.
  • A VideoUpload component that is conditionally hidden based on hideUploadWidget and sets isProcessing upon upload.
  • A Spinner component that is conditionally hidden based on hideSpinner.
  • A Form component that is conditionally hidden based on hideForm and is passed the video key.
  2. Video Upload Component: Handles video file selection and upload.
function VideoUpload({ hidden, onUpload }) {
  const [video, setVideo] = useState(null);

  // Function to handle video file selection
  const handleVideoChange = (event) => {
    setVideo(event.target.files[0]);
  };

  // Function to handle video upload (example to a server)
  const handleUpload = async () => {
    if (!video) {
      alert("Please select a video file first.");
      return;
    }

    const formData = new FormData();
    formData.append("video", video);

    // Example POST request to an API endpoint
    try {
      const response = await fetch("http://localhost:5078/upload", {
        method: "POST",
        body: formData,
      });

      if (response.ok) {
        onUpload && onUpload(await response.text());
      } else {
        alert("Failed to upload video.");
      }
    } catch (error) {
      console.error("Error during upload:", error);
    }
  };

  return (
    <div style={{ display: hidden ? "none" : "block" }}>
      <input type="file" accept="video/*" onChange={handleVideoChange} />
      <button onClick={handleUpload}>Upload Video</button>
    </div>
  );
}
  3. Form Component: Allows users to submit questions about the processed video.
function Form({ hidden, videoKey }) {
  const [question, setQuestion] = useState("");

  const handleQuestionChange = (event) => {
    setQuestion(event.target.value);
  };

  const handleSubmit = async () => {
    try {
      const response = await fetch(
        `http://localhost:5078/completions/${videoKey}`,
        {
          method: "POST",
          body: JSON.stringify({ question }),
          headers: {
            "Content-Type": "application/json",
          },
        }
      );

      if (response.ok) {
        // Handle successful response
        const answer = await response.text();
        console.log("Answer:", answer);
      } else {
        // Handle error response
        console.error(await response.text());
      }
    } catch (error) {
      console.error("Error during fetch:", error);
    }
  };

  return (
    <div className="ask-form" style={hidden ? { display: "none" } : null}>
      <textarea
        placeholder="Ask me anything about the video"
        value={question}
        onChange={handleQuestionChange}
      ></textarea>
      <button onClick={handleSubmit}>Submit</button>
    </div>
  );
}
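As written, the Form component only logs the answer to the browser console. An optional tweak (not part of the original component) is to keep the answer in state and render it under the button, for example:

function Form({ hidden, videoKey }) {
  const [question, setQuestion] = useState("");
  const [answer, setAnswer] = useState(""); // latest answer returned by the server

  const handleSubmit = async () => {
    try {
      const response = await fetch(
        `http://localhost:5078/completions/${videoKey}`,
        {
          method: "POST",
          body: JSON.stringify({ question }),
          headers: { "Content-Type": "application/json" },
        }
      );

      if (response.ok) {
        setAnswer(await response.text()); // display the answer instead of only logging it
      } else {
        console.error(await response.text());
      }
    } catch (error) {
      console.error("Error during fetch:", error);
    }
  };

  return (
    <div className="ask-form" style={hidden ? { display: "none" } : null}>
      <textarea
        placeholder="Ask me anything about the video"
        value={question}
        onChange={(event) => setQuestion(event.target.value)}
      ></textarea>
      <button onClick={handleSubmit}>Submit</button>
      {answer && <p>{answer}</p>}
    </div>
  );
}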
  4. Spinner Component: Provides visual feedback during video processing.
function Spinner({ hidden }) {
  return (
    <div style={{ display: hidden ? "none" : "block" }} className="lds-roller">
      <div></div>
      <div></div>
      <div></div>
      <div></div>
      <div></div>
      <div></div>
      <div></div>
      <div></div>
    </div>
  );
}

Styling

Apply CSS styles in src/index.css to make your app look neat and professional.

:root {
  font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;

  color-scheme: light dark;
  color: rgba(255, 255, 255, 0.87);
  background-color: #242424;

  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

body {
  margin: 0;
  display: flex;
  place-items: center;
  min-width: 320px;
  min-height: 100vh;
}

#root {
  max-width: 1280px;
  margin: 0 auto;
  padding: 2rem;
  text-align: center;
}

h1 {
  font-size: 3.2em;
  line-height: 1.1;
}

button {
  border-radius: 8px;
  border: 1px solid transparent;
  padding: 0.6em 1.2em;
  font-size: 1em;
  font-weight: 500;
  font-family: inherit;
  background-color: #1a1a1a;
  cursor: pointer;
  transition: border-color 0.25s;
}
button:hover {
  border-color: #646cff;
}
button:focus,
button:focus-visible {
  outline: 4px auto -webkit-focus-ring-color;
}

@media (prefers-color-scheme: light) {
  :root {
    color: #213547;
    background-color: #ffffff;
  }
  a:hover {
    color: #747bff;
  }
  button {
    background-color: #f9f9f9;
  }
}

.lds-roller {
  display: inline-block;
  position: relative;
  width: 80px;
  height: 80px;
  margin: 0 auto;
}
.lds-roller div {
  animation: lds-roller 1.2s cubic-bezier(0.5, 0, 0.5, 1) infinite;
  transform-origin: 40px 40px;
}
.lds-roller div:after {
  content: " ";
  display: block;
  position: absolute;
  width: 7px;
  height: 7px;
  border-radius: 50%;
  background: rgb(204, 25, 183);
  margin: -4px 0 0 -4px;
}
.lds-roller div:nth-child(1) {
  animation-delay: -0.036s;
}
.lds-roller div:nth-child(1):after {
  top: 63px;
  left: 63px;
}
.lds-roller div:nth-child(2) {
  animation-delay: -0.072s;
}
.lds-roller div:nth-child(2):after {
  top: 68px;
  left: 56px;
}
.lds-roller div:nth-child(3) {
  animation-delay: -0.108s;
}
.lds-roller div:nth-child(3):after {
  top: 71px;
  left: 48px;
}
.lds-roller div:nth-child(4) {
  animation-delay: -0.144s;
}
.lds-roller div:nth-child(4):after {
  top: 72px;
  left: 40px;
}
.lds-roller div:nth-child(5) {
  animation-delay: -0.18s;
}
.lds-roller div:nth-child(5):after {
  top: 71px;
  left: 32px;
}
.lds-roller div:nth-child(6) {
  animation-delay: -0.216s;
}
.lds-roller div:nth-child(6):after {
  top: 68px;
  left: 24px;
}
.lds-roller div:nth-child(7) {
  animation-delay: -0.252s;
}
.lds-roller div:nth-child(7):after {
  top: 63px;
  left: 17px;
}
.lds-roller div:nth-child(8) {
  animation-delay: -0.288s;
}
.lds-roller div:nth-child(8):after {
  top: 56px;
  left: 12px;
}
@keyframes lds-roller {
  0% {
    transform: rotate(0deg);
  }
  100% {
    transform: rotate(360deg);
  }
}

.ask-form {
  display: flex;
  justify-content: space-between;
}

.ask-form textarea {
  padding: 0.6em 1.2em;
  border-radius: 8px;
  border: 1px solid transparent;
  font-size: 1em;
  font-weight: 500;
  font-family: inherit;
  background-color: #1a1a1a;
  color: #fff;
  resize: none;
  transition: border-color 0.25s;
}

Running Your Application

  1. Start the backend server:
cd server
node server.js
  2. Run the frontend application:

Open a new terminal, navigate to the client directory, and start the Vite server:

cd client
npm run dev

Conclusion

You've just built a full-stack video analyzer app with React, Vite, and Node.js! This app not only processes videos but also leverages AI to let users interact with the processed content in a meaningful way. Remember, this guide is a starting point: you can extend the application with more features such as better error handling, persistent storage, and a more polished UI.
