Halleluyah

Trend Chat

This is a submission for the Bright Data Web Scraping Challenge: Most Creative Use of Web Data for AI Models

Trend Chat: Track, Analyze, and Chat with Trends from Niche Online Communities. Get Actionable Insights Powered by BrightData and AI.

Trend Chat is a powerful tool that enables users to track, analyze, and interact with trends from various online communities. Whether you're a business owner, marketer, or enthusiast looking to stay ahead of the curve, Trend Chat provides real-time insights into the conversations that matter most to your industry.

By leveraging BrightData for web scraping and AI for generative analysis, Trend Chat allows users to gather relevant data, analyze it for actionable insights, and chat with these trends to make informed decisions.


What I Built

Trend Chat is designed to solve the problem of information overload in today’s fast-paced digital world. It empowers users to track trends across niche online communities, scrape data from various websites, analyze that data to uncover key insights, and interact with those insights via a simple chat interface.

Key Features:

  • Web Scraping: Automatically gather data from websites, primarily Reddit, based on user requests, leveraging the Bright Data Scraping Browser for large, dynamic sites like Reddit.
  • Data Analysis: Analyze scraped data to uncover patterns, trends, and actionable insights, using named entity extraction to surface commonalities over a given period.
  • Chat Interface: Chat with the collected insights, allowing users to ask questions and get responses based on the data.
  • AI-Powered Insights: Use AI models to generate real-time analysis, recommendations, and insights from trends.
  • User Authentication: Secure user logins and data access through Supabase Authentication.

With Trend Chat, users can easily stay on top of emerging trends, spot potential business opportunities, and make data-driven decisions with confidence.


Demo

You can explore the live demo of Trend Chat at:

Trend Chat Demo

Screenshots:

  1. Hero Section View:

    Hero View

  2. Dashboard View:

    Dashboard View

  3. Chat Interface:

    Chat Interface


How I Used Bright Data

Bright Data plays a crucial role in Trend Chat by providing the web scraping infrastructure that powers the collection of data from Reddit. Using the Bright Data Scraping Browser, I can:

  1. Gather Real-Time Data: Bright Data allows me to scrape Reddit data based on specific user inputs, ensuring that the data is always up-to-date.
  2. Access Diverse Data: Scrape content from subreddits, including forum posts, comments, and articles, based on a wide range of topics and keywords.
  3. Customize Scraping: The flexibility of Bright Data enables me to tailor the scraping process to suit specific needs, such as collecting posts and discussions from specific subreddits related to emerging trends.

This integration with Bright Data is what enables Trend Chat to provide real-time, dynamic insights from niche online communities.

Sample Code for Scraping Reddit Data using Bright Data API (TypeScript)

import puppeteer, { Page } from 'puppeteer-core';

// Function to generate the Reddit search URL based on the keyword
function generateRedditSearchUrl(keyword: string): string {
    const encodedKeyword = encodeURIComponent(keyword);
    return `https://www.reddit.com/search/?q=${encodedKeyword}`;
}

// Function to scrape Reddit posts based on a search URL
async function scrapeRedditPosts(searchUrl: string) {
    const browser = await puppeteer.connect({
        // Replace with the WebSocket endpoint from your Bright Data Scraping Browser zone
        browserWSEndpoint: "wss://<your-scraping-browser-endpoint>",
    });

    console.log("Connected to browser...");
    const page = await browser.newPage();
    await page.goto(searchUrl, { waitUntil: 'domcontentloaded' });
    console.log("Navigated to Reddit search page");

    await page.waitForSelector('div[data-testid="post-container"]', { timeout: 30000 });
    const posts = await extractPostData(page);

    await browser.close();

    return posts;
}

// Function to extract post data from the Reddit search results page
async function extractPostData(page: Page) {
    return await page.evaluate(() => {
        const postElements = document.querySelectorAll('div[data-testid="post-container"]');
        const posts: Array<Record<string, string | null | undefined>> = [];

        postElements.forEach((postElement) => {
            const title = postElement.querySelector<HTMLElement>('h3')?.innerText;
            const author = postElement.querySelector<HTMLElement>('[data-testid="post_author_link"]')?.innerText;
            const upvotes = postElement.querySelector('[data-click-id="upvote"]')?.textContent;
            const comments = postElement.querySelector('[data-click-id="comments"]')?.textContent;

            posts.push({ title, author, upvotes, comments });
        });

        return posts;
    });
}

AI for Named Entity Extraction

To extract valuable information from the scraped posts, Trend Chat uses AI models powered by the Transformers library for Named Entity Recognition (NER). This allows us to identify and extract key entities such as people, places, organizations, or any other relevant keywords from the data.

We use a pre-trained NER model from the Hugging Face Transformers library to analyze Reddit posts and identify entities like product names, trending topics, and more.

Sample Code for Named Entity Recognition (NER) using Transformers (TypeScript)

import { pipeline } from '@huggingface/transformers';  // Importing the Hugging Face pipeline for NER

// pipeline() is asynchronous: it downloads and initializes the model before use
const extractEntities = async (text: string) => {
  const nerModel = await pipeline('ner', 'dbmdz/bert-large-cased-finetuned-conll03-english');  // Pre-trained NER model
  const entities = await nerModel(text);
  return entities;
};

const sampleText = "I love programming in Python and recently explored Next.js for building dynamic web apps!";
extractEntities(sampleText).then(entities => {
  console.log('Extracted Entities:', entities);
}).catch(error => console.error('Error extracting entities:', error));

The extracted entities can be further analyzed to find trends or insights such as frequent mentions of specific topics, brand names, or even upcoming technologies.
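As a sketch of that follow-up analysis (the helper names here are illustrative, not part of the app's actual code), the entity lists returned by the NER model can be tallied to rank the most-mentioned topics across a batch of posts:

```typescript
// Each entry in the NER output carries (at least) a surface `word` and a label
type NerEntity = { word: string; entity: string };

// Tally how often each entity word appears across a batch of analyzed posts
function countEntityMentions(results: NerEntity[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entities of results) {
    for (const { word } of entities) {
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  return counts;
}

// Top-N most mentioned entities, e.g. to surface trending topics in a subreddit
function topMentions(counts: Map<string, number>, n: number): [string, number][] {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}
```

For example, if two posts mention Tesla and one mentions Berlin, Tesla ranks first, which is the kind of "frequent mention" signal the insights are built on.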

Combining the Two

async function main() {
    const prompt = "Tell me about recent discussions on code editors and their features, especially AI-powered editors";  // Example prompt

    // Step 1: Extract keywords (NER returns entity objects, so map them to their surface words)
    const entities = await extractEntities(prompt);
    const keywords = entities.map((entity: { word: string }) => entity.word);
    console.log("Extracted Keywords: ", keywords);
    // Step 2: Search Reddit for the extracted keyword(s)
    for (const keyword of keywords) {
        const searchUrl = generateRedditSearchUrl(keyword);
        console.log(`Searching for "${keyword}" on Reddit...`);

        // Step 3: Scrape the posts for the keyword
        const posts = await scrapeRedditPosts(searchUrl);
        console.log(`Found ${posts.length} posts related to "${keyword}":`);
        console.log(posts);
    }
}

// Run the main function
main().catch(error => {
    console.error("Error running the pipeline:", error);
});

Named Entity Extraction for the Posts

import { pipeline } from '@huggingface/transformers';

// Example of Reddit post data (this would be scraped using BrightData)
const redditPosts = [
  "Tesla's stock price surged after the announcement of their new electric car model in Berlin.",
  "Apple released new MacBook Pro models with improved performance and battery life."
];

// Initialize the NER pipeline
async function performNamedEntityRecognition(posts: string[]) {
  try {
    // Use Hugging Face's NER pipeline with a pre-trained model
    const nlp = await pipeline('ner', 'dbmdz/bert-large-cased-finetuned-conll03-english');

    // Loop through each Reddit post and extract named entities
    const results = await Promise.all(
      posts.map(async (post) => {
        const entities = await nlp(post);
        return { post, entities };
      })
    );

    // Log the results
    results.forEach((result) => {
      console.log(`Post: ${result.post}`);
      console.log("Entities:", result.entities);
      console.log("-------------");
    });

  } catch (error) {
    console.error("Error during NER:", error);
  }
}

// Run NER on Reddit posts
performNamedEntityRecognition(redditPosts);

Tech Stack

Frontend:

  • Next.js: Used to build a dynamic and responsive frontend for displaying insights and interacting with trends.
  • React: Leveraged to manage UI components and handle user interactions in real-time.
  • Tailwind CSS: Used for fast, responsive styling.

Backend:

  • Supabase: Used for user authentication and database storage. Supabase simplifies the process of managing users and storing data.
  • AI Models (Gemini API and transformers library models): Used for generative AI capabilities, helping analyze scraped data and generate insights or recommendations based on trends.
  • TypeScript: The entire application is built in TypeScript for better type safety and code maintainability.

Scraping:

  • BrightData Web Scraping API: Powers the web scraping process by collecting data from Reddit and other online sources.
  • Reddit: The primary platform being scraped for posts, comments, and discussions around emerging trends.

Authentication:

  • Supabase Authentication: Secure login and user management for authenticated access to scraped data and insights.

How It Works

  1. User Authentication:

    Users log in via Supabase Authentication. Once authenticated, they can access the platform’s features, including scraping data and analyzing trends.

  2. Data Scraping:

    Using BrightData Web Scraping API, Trend Chat scrapes data from Reddit based on the user’s request. The scraped data can include posts, comments, and discussions from various subreddits.

  3. Data Storage & Analysis:

    The scraped data is stored in Supabase (using Supabase Database for efficient storage), allowing it to be queried and analyzed. AI models are then used to analyze this data, extracting trends, sentiments, and insights that can help businesses make data-driven decisions.

  4. Chatting with Insights:

    Users can interact with the platform through a chat interface, asking questions about the scraped data. The system generates AI-powered responses using Gemini API based on the data collected, helping users understand the trends and insights more clearly.

  5. Generate Insights:

    Using AI models, the system processes the collected data, generating insights such as sentiment analysis, keyword trends, or the most discussed topics within a subreddit. These insights are presented to users in an easy-to-understand format.

  6. Take Action:

    Users can use the insights provided to make informed decisions, whether for product development, marketing strategies, or community engagement. They can track ongoing trends or discover new opportunities by continuing to chat with the system.
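The chat step (4) can be sketched as follows. The exact Gemini API call is omitted, and `buildChatPrompt` and the `StoredPost` shape are illustrative names rather than the app's actual code; the point is simply that the model's answer is grounded by packing the scraped posts into the prompt:

```typescript
// Shape of a post as it might come back from the scraper or Supabase
// (field names are assumptions for illustration)
type StoredPost = { title: string; author: string; upvotes: string };

// Build a grounded prompt: the user's question plus the scraped posts as context.
// The resulting string would then be sent to the Gemini API for a response.
function buildChatPrompt(question: string, posts: StoredPost[]): string {
  const context = posts
    .map((p, i) => `${i + 1}. "${p.title}" by ${p.author} (${p.upvotes} upvotes)`)
    .join('\n');
  return [
    'You are a trend analyst. Answer using only the Reddit posts below.',
    '',
    'Posts:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}
```

Keeping the prompt assembly separate from the model call makes it easy to swap Gemini for another model or to cap how many posts are included as context.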


Why Trend Chat?

In today’s digital world, understanding and tracking trends across multiple communities can be difficult and time-consuming. Trend Chat simplifies this process by providing a one-stop solution for scraping, analyzing, and interacting with real-time insights.

  • Stay Informed: Get the latest updates on trends across different topics.
  • Make Data-Driven Decisions: Use actionable insights to inform business strategies.
  • Streamlined Process: Automate the process of collecting and analyzing data, saving you time and effort.
  • Customizable Insights: Tailor the insights to your specific needs and business goals.

Trend Chat is the perfect tool for businesses, marketers, and enthusiasts looking to stay ahead of the competition and make informed decisions based on real-time data.


Future Improvements

Here are some ideas for future improvements to Trend Chat:

  1. Expand Data Sources: Integrate additional scraping sources and APIs to expand the variety of data available.
  2. Advanced AI Capabilities: Enhance the AI model for deeper insights, such as predictive trends and sentiment forecasting.
  3. Real-Time Notifications: Add push notifications to alert users about emerging trends or important updates.
  4. Integration with Social Media: Allow users to scrape and analyze social media trends more easily.

Why I Qualify for More Credits

  1. Innovative Data Collection: My project leverages advanced scraping techniques to gather real-time Reddit posts, enabling a unique approach to identifying and analyzing trends for marketing insights, which demonstrates a clear need for reliable and efficient data extraction.

  2. Scalable Application: The use of automated pipelines for Named Entity Recognition (NER) on large volumes of Reddit posts directly supports the scalability of my project, highlighting the need for expanded resources to handle growing datasets.

  3. Impact on Small Businesses: My project helps small businesses gain valuable market insights, ultimately contributing to their digital transformation. This aligns with Bright Data's mission of supporting innovative, impactful solutions with access to quality data.


Conclusion

Trend Chat offers a unique combination of web scraping, data analysis, and AI-powered insights to help businesses and individuals stay on top of emerging trends. By leveraging BrightData for scraping and AI models for insights, Trend Chat provides real-time, actionable information that helps users make better, data-driven decisions. Whether you're tracking trends, analyzing community conversations, or generating business strategies, Trend Chat is the tool you need to keep ahead of the curve.


Thanks for Reading!

Thank you for checking out Trend Chat! I hope this tool helps you track, analyze, and interact with trends more effectively.


Links:

Top comments (2)

Shafayet Hossain

Can't confirm my signup... Why is it linked to localhost?

Halleluyah

I will adjust that from my Supabase page...