Bryan Doss

Posted on Dec 15

Using Bright Data and OpenAI to Auto-Generate TLDR-Style Newsletters

#devchallenge #brightdatachallenge #api #webdev

This is a submission for the Bright Data Web Scraping Challenge: Build a Web Scraper API to Solve Business Problems

What I Built

I really enjoy the daily TLDR newsletters, but I often find it lacking when it comes to more niche news topics. The Bright Data web scraping challenge seemed like a good opportunity for me to tackle this problem.

The end result was a daily TLDR-style newsletter, featuring topics of my choice, scraped autonomously and written entirely by AI, delivered to my email inbox every morning.

This might also qualify under the "Most Creative Use of Web Data for AI Models." However, since I am not really training or fine-tuning an LLM I think the primary submission prompt "Build a Web Scraper API to Solve Business Problems" is most appropriate.

Demo

Full code can be found on GitHub.

Below is example output: a "niche" newsletter that was in my inbox at 8:30am this morning.

I think this qualifies as niche enough...

How I Used Bright Data

I made use of the Web Scraper API endpoints - specifically, the Reddit Post and Google News datasets.

The Idea

The idea is simple:

Find hot news posts on subreddit(s) of interest
Summarize the reddit comments (since I can't form an opinion on my own /s)
Collect links from other news outlets for diverse viewpoints
Send pretty email with links, discussion summaries, and emojis for that sweet, sweet 🤌 retention 🤌
Schedule steps 1-4 to run every day before I wake up so my newsletter is ready for my morning brew

APIs and Data Sources

We could signup for a Reddit developer account and fetch the extra news article links from various APIs, but using Bright Data means we don't have to go to multiple places for the data we want.

Besides the content itself, we will also need a way to summarize the reddit discussions and a way to format the email. For this I chose OpenAI's gpt-4o-mini.

Finally, we'll want to send this to ourselves as an email. I decided on Mailgun since it's simple enough to get started and they have a generous free tier.

Design

This is the high-level design I settled on for the end-to-end workflow. There are always better solutions, but this one worked well enough and made sense to me.

For full details, see the code on GitHub

After coding it up, I simply scheduled it to run every day at 8:30am.

Results

I'm very happy with the outputs!

The addition of the Google News aggregation step means I often get several links to multiple outlets covering the same topic. I like being able to read different articles on the same story. Oftentimes the first article I read from Reddit won't have the full facts or is biased one way or another (shocked-pikachu.jpg 👁👄👁).

The quality and consistency of the Bright Data API responses is impressive.

I'm also pleasantly surprised at the quality of gpt-4o-mini's HTML formatting given the lack of prompt engineering I did to write the emails.

"Email template:\n{html_template}\n\nArticles w/summary:{articles}\n---\nGiven the above email template and articles with summary, format the articles into the email template. Replace all brackets with content from articles/comments. Be sure to include the article links in href. Do not respond with anything other than raw HTML, do not enclose HTML in quotes.",

If you try this out for yourself, be sure to change the subreddit when making the call to get_newsletter(). Active subreddits with several new posts a day are the best candidates for fresh, interesting newsletters.

Checkout the demo on GitHub.

You can find me on LinkedIn | CTO & Partner @ EES.

DEV Community

Using Bright Data and OpenAI to Auto-Generate TLDR-Style Newsletters

What I Built

Demo

How I Used Bright Data

The Idea

APIs and Data Sources

Design

Results

Top comments (0)

Read next

How PDFs Can Enhance Collaboration in Remote Work

React 19 Finally Stable, New Rust-Based JavaScript Framework, New Developer Tools, and more

What’s New in React 19? A Quick Guide with Code Examples

Deploying Traefik Proxy with Cloudflare Origin CA Certificate on k0s