Dhanush Reddy
The Daily Gist: Never miss the best of Reddit again. Powered by Bright Data.

n8n and Bright Data Challenge: Unstoppable Workflow

This is a submission for the AI Agents Challenge powered by n8n and Bright Data.

What I Built

The Daily Gist is an n8n workflow that creates a 10- to 15-minute video summary of a subreddit's top posts, using the subreddit's RSS feed as its source (Reddit exposes a feed for any subreddit, e.g. https://www.reddit.com/r/ArtificialInteligence/.rss).

The following video is an example generated by my n8n workflow for the subreddit r/ArtificialInteligence.

Demo

n8n Workflow

GitHub Gist

Technical Implementation

This workflow is a scheduled job, built with n8n, that creates a video summary of the top posts from a chosen subreddit every day.

The workflow operates in several distinct stages:

n8n Workflow Part 1

n8n Workflow Part 2

n8n Workflow Part 3

  1. Data Ingestion: The process begins by ingesting post URLs from the subreddit's RSS feed.
  2. Content Scraping & Ranking: These URLs are passed to a Bright Data node, which scrapes the content of each post. The workflow then filters and ranks the posts, selecting the top 15 by sorting on upvote count, then comment count.
  3. AI-Powered Content Generation: For each of the top posts, the system utilizes two distinct Gemini models: one generates a representative image for the post, and another generates an audio narration of its summary (a rough API sketch follows this list).
  4. Video Synthesis: With a unique image and audio track for each post, an ffmpeg command is executed to sequence and merge these elements into a final, consolidated MP4 video (a sample command also follows below).
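
The post doesn't name the specific Gemini models or API calls, so the following is only a rough sketch of what a text-to-image request to the Gemini REST API looks like. The model name and prompt are placeholders, not the workflow's actual values; the audio track would come from a similar call to a Gemini TTS model.

# Hypothetical example: generate an illustration for a post summary.
# Model name and prompt are placeholders, not the workflow's actual values.
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "An illustration for a Reddit post about ..."}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }'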
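
The exact ffmpeg invocation isn't included in the post, but a minimal sketch of one common two-step approach looks like this: first render each image/audio pair into a short clip, then concatenate the clips. The filenames (post_01.png, post_01.mp3, segments.txt) are illustrative, not taken from the workflow.

# Step 1: turn each image/audio pair into a video segment (repeated per post)
ffmpeg -loop 1 -i post_01.png -i post_01.mp3 \
  -c:v libx264 -tune stillimage -pix_fmt yuv420p -c:a aac -shortest segment_01.mp4

# Step 2: concatenate all segments into the final video
# (segments.txt lists one line per clip: file 'segment_01.mp4')
ffmpeg -f concat -safe 0 -i segments.txt -c copy daily_gist.mp4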

In my case, the completed video is saved to a shared storage volume accessible by a Jellyfin media server running in a parallel container. This setup allows me to seamlessly stream and watch my personalized Reddit summary on any of my devices. While this is great for personal viewing, the workflow can be easily extended using n8n's built-in integrations to automatically upload the video to YouTube, send it via Telegram, or distribute it to virtually any other platform.

Bright Data Verified Node

The core data extraction was handled by Bright Data's Web Scraper node. The workflow begins by ingesting URLs from the RSS feed of the r/ArtificialInteligence subreddit. These URLs are then processed in a batch by the Bright Data node.

Scraping websites like Reddit poses a significant challenge. The content is loaded dynamically, and the HTML structure is complex, making traditional scraping with simple CSS selectors unreliable and difficult to maintain.

This is where Bright Data made it very easy. The node effortlessly extracted key information such as post titles, upvote counts, authors, and replies. The Bright Data node was critical, as it provides a managed solution that guarantees a structured response, eliminating the need to build and maintain a complex, fragile custom parser for Reddit's front-end.
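
For reference, based on the fields the workflow uses downstream (see the sorting snippet in the comments below), a single scraped post looks roughly like this. The exact field names and shape depend on Bright Data's Reddit dataset schema; the values here are made up.

{
  "url": "https://www.reddit.com/r/ArtificialInteligence/comments/abc123/example_post/",
  "title": "Example post title",
  "author": "example_user",
  "num_upvotes": 1234,
  "num_comments": 87
}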

Journey

Coming from a coding background, I was eager to see what n8n could do. The transition to n8n's visual workflow builder was seamless, and I was able to get up to speed very quickly.

The main technical challenge I encountered was a missing dependency in the standard deployment. The default self-hosted n8n image lacks FFmpeg, which was essential for my video automation workflow. I resolved this by building and publishing my own custom n8n image with FFmpeg included, which is now available for the community on GitHub: dhanushreddy291/n8n-with-ffmpeg.
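
The repository has the full setup, but the essence of such an image is only a few lines. Here is a minimal sketch, assuming the official n8nio/n8n base image (which is Alpine-based, hence apk); it may differ from the published Dockerfile.

# Start from the official n8n image
FROM n8nio/n8n:latest

# Switch to root to install ffmpeg, then drop back to the unprivileged node user
USER root
RUN apk add --no-cache ffmpeg
USER node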

This experience was incredibly valuable. It taught me that n8n's true power is the incredible speed at which you can build and deploy. Thanks to the extensive library of integrations, complex tasks that would normally require significant coding, like creating custom APIs or setting up cron jobs, can be accomplished in a matter of minutes. This ability to go from idea to a functional workflow so quickly has made it my go-to solution from now on.

This submission was made by Dhanush Reddy.

Top comments (2)

Ivan Isaac

Awesome project! Could you clarify how you pick the top 15—what exact formula do you use to combine upvotes and comments (any weights or time decay), and are those counts taken from the RSS feed or from the Bright Data scrape?

Dhanush Reddy • Edited

It was a simple sort using a Code node (in JS):
sort by upvotes first, followed by comments:

// Sort descending by upvotes, breaking ties by comment count, then take the top 15
return [...allItems]
  .sort((a, b) => b.num_upvotes - a.num_upvotes || b.num_comments - a.num_comments)
  .slice(0, 15);

Even though 15 are picked, some fail because Gemini won't generate an image for them at all (they are skipped during the video generation command).

They are taken from the Bright Data scrape, as the RSS node is just for getting URLs.