🧠 NewsPulse AI – Real-Time News Analysis with LLMs & Web Scraping via Bright Data

Suman Kumar on May 25, 2025

This is a submission for the Bright Data AI Web Access Hackathon. We built NewsPulse AI to explore a simple but powerful question: "What if you cou...
 
Nevo David

Been cool seeing steady progress - it adds up. What do you think actually keeps things growing over time? Habits? Luck? Just showing up?

Suman Kumar

I think it's mostly habits and consistency. Even on days when things aren't perfect, just showing up makes a difference over time.

Ranjan Dailata • Edited

Suggestion - After the news analysis gets completed, it would be great to programmatically scroll to the "Article Analysis" section.
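
A minimal sketch of that kind of programmatic scroll, assuming the "Article Analysis" section is rendered with an element id such as `article-analysis` (a hypothetical id; the thread does not show the project's markup). The small interfaces exist only so the helper can be exercised outside a browser.

```typescript
// Minimal shapes of the DOM pieces used here, so the helper can also be
// called with a stand-in object outside a browser.
interface ScrollableElement {
  scrollIntoView(options?: { behavior?: "auto" | "smooth"; block?: "start" | "center" | "end" | "nearest" }): void;
}

interface DocumentLike {
  getElementById(id: string): ScrollableElement | null;
}

// Scrolls the section with the given id into view.
// Returns false when the section is not on the page yet.
function scrollToSection(doc: DocumentLike, sectionId: string): boolean {
  const section = doc.getElementById(sectionId);
  if (section === null) {
    return false;
  }
  section.scrollIntoView({ behavior: "smooth", block: "start" });
  return true;
}
```

In a browser this could be called as `scrollToSection(document, "article-analysis")` once the analysis results have rendered.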

Suman Kumar

Added the scroll, brother. Thanks!

Ranjan Dailata

The GitHub link in this blog post is broken.

Suman Kumar

I have updated the repo URL.

Dotallio

Super impressive to see the real-time insights without any DB in the loop. Really curious, how do you handle scaling when queries spike up?

Suman Kumar

Absolutely, scaling is definitely something we're thinking about for the next stage.

Right now, the system is optimized to meet the hackathon goals: fully real-time, stateless, and DB-free, focusing on live data access and analysis. It performs well under moderate load and showcases the core value of Bright Data's infrastructure.

But for production-level traffic or query spikes, we'd definitely need to:

Implement request queues and concurrent workers

Add rate-limiting to protect both system and target sites

Possibly introduce a temporary cache layer (e.g., Redis) for recent results

And eventually move to autoscaling infrastructure like AWS Fargate or GCP Cloud Run

So yes, the current setup is hackathon-ready, but scaling and load management are high on the roadmap as we evolve this into a production-grade tool. 🙌
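
Two of the ideas listed above, rate-limiting and a short-lived cache for recent results, can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `TokenBucket` and `TtlCache` are hypothetical names, the cache is an in-memory stand-in for something like Redis, and the clock is injectable only so the logic can be checked without waiting.

```typescript
// Injectable clock (milliseconds); defaults to the system clock.
type Clock = () => number;

// Token bucket: permits bursts of up to `capacity` requests and refills
// at `refillPerSec` tokens per second.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: Clock;

  constructor(capacity: number, refillPerSec: number, now: Clock = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true if the request may proceed, false if it should be rejected or queued.
  tryTake(): boolean {
    const current = this.now();
    const elapsedSec = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = current;
    if (this.tokens < 1) {
      return false;
    }
    this.tokens -= 1;
    return true;
  }
}

// In-memory cache with a fixed time-to-live per entry
// (a stand-in for an external store such as Redis).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: Clock;

  constructor(ttlMs: number, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

A request handler would typically check the cache first, then call `tryTake()` before starting a fresh scrape, and store the result with `set()` afterwards. A request queue with concurrent workers is a separate concern and is not covered by this sketch.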

Shweta Kale

Loved the idea!!

I had a question though: I noticed you used the API https://api.brightdata.com instead of @brightdata/mcp. How does that work? Does @brightdata/mcp use the API under the hood, or are they two separate things? In the documentation, the only method I saw was using @brightdata/mcp.

Suman Kumar

Heyy thanks! Glad you liked it 😄

Yes, @brightdata/mcp uses the same API.

So actually, I just took Bright Data's FastMCP server code and plugged it into my Express backend directly. Basically doing the same thing as @brightdata/mcp, just manually. I'm still using api.brightdata.com under the hood, but with a bit more control over how things run.
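
To make the "same API, called manually" idea concrete, here is a rough sketch of a direct HTTP call to api.brightdata.com from backend code. Only the host comes from this thread; the `/request` path, the `zone`, `url`, and `format` body fields, and the helper names are assumptions for illustration and should be checked against Bright Data's API documentation before use.

```typescript
// Host named in the thread; everything below it is assumed for illustration.
const BRIGHT_DATA_API = "https://api.brightdata.com";

interface ScrapeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the HTTP request for fetching one page through the API.
// ASSUMPTION: the endpoint path and the body fields are not confirmed by the thread.
function buildScrapeRequest(apiToken: string, zone: string, targetUrl: string): ScrapeRequest {
  return {
    url: `${BRIGHT_DATA_API}/request`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ zone, url: targetUrl, format: "raw" }),
  };
}

// Minimal fetch-like signature, so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Sends the request and returns the response body as text,
// throwing on any non-2xx status.
async function scrapePage(
  fetchImpl: FetchLike,
  apiToken: string,
  zone: string,
  targetUrl: string
): Promise<string> {
  const { url, ...init } = buildScrapeRequest(apiToken, zone, targetUrl);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  return response.text();
}
```

An Express route handler could call `scrapePage` with the runtime's `fetch` and pass the returned text on to the analysis step, which is where the extra control over retries, timeouts, and concurrency would live.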

Jin Park

Very interesting and impressive project!
But what makes it different from, say, engines like ground.news/?

Suman Kumar

Thanks for the great question! 🙌

Ground News offers static, outlet-level bias insights.
NewsPulse AI gives dynamic, live, article-level intelligence.

Right now, we use OpenAI for content analysis, but we're already planning to train our own models tailored for news sentiment, propaganda detection, and political bias, optimized for real-time media monitoring and transparency.

Jin Park

That's really cool!
I am definitely going to keep an eye out for your continued development! :)