Quam Bello

The Debrief: AI-Powered News Podcast

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


What I Built

I love listening to the news. But somewhere between the ads, the clickbait headlines, and the content rabbit holes, I always end up distracted and none the wiser.

The deeper problem though? Even when I did pay attention, I often didn't understand why something mattered. I remember when Nigeria removed the fuel subsidy. I knew it happened. I had no idea it would send the price of everything through the roof. Nobody broke it down for me in a way that connected the dots.

That gap between knowing the news and understanding it is what The Debrief is built to close.

The Debrief is an AI-powered podcast generator that automatically transforms live news into a fully produced, multi-voice audio briefing. Just tell it what you care about. It finds the news, analyses it through multiple expert lenses, writes the script, and delivers a broadcast-quality episode with distinct voices. No human in the loop.


Demo

🔗 Try it live: https://debrief-studio.onrender.com/


Code

🐙 GitHub: link


How I Used Gemma 4

First: Getting Reliable News

To build something useful, I needed a reliable, structured source of news. Through research I discovered that most news providers publish something called an RSS feed, a URL that returns XML content containing article titles, publication dates, and links.
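To make the RSS idea concrete, here is a minimal sketch of pulling titles, dates, and links out of feed XML with Python's standard library. The sample feed and the `parse_rss_items` helper are hypothetical, written for illustration; they are not taken from The Debrief's code, and real feeds may need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a real feed response.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>Fuel subsidy removed</title>
      <link>https://example.com/fuel-subsidy</link>
      <pubDate>Mon, 29 May 2023 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Central bank holds rates</title>
      <link>https://example.com/rates</link>
      <pubDate>Tue, 30 May 2023 08:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> list[dict]:
    """Return one dict per <item> with its title, link, and raw pubDate text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["published"], "|", entry["title"])
```

In a live setup the XML string would come from an HTTP request to the feed's URL rather than a constant.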

I sourced feed data from awesome-rss-feeds and a Kaggle dataset, then personally researched and verified additional RSS links specifically for Nigerian news sources, since that coverage was thin. After cleaning and compiling everything, I ended up with 3,166 verified entries, each with four columns: feed_name, country, section, and rss_link.

That catalog became the foundation of the entire system.


The Architecture: 4 Agents, One Brain

The Debrief runs on 4 agents, all powered by Gemma 4 as the core reasoning engine, orchestrated with LangGraph.


🔍 Agent 1: The Inquisitor

"Find me what's relevant."

The Inquisitor takes the user's natural language request and searches the RSS catalog for the most relevant sources. It's equipped with tools to query the dataset and selects the top 3 matching feeds based on country, section, and relevance to the request.

Ask it "Give me news about the music industry in Nigeria" and it will find Nigerian entertainment and music feeds, not generic world news.
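As a rough illustration of the kind of catalog tool the Inquisitor could call, the sketch below scores rows by country and section and returns the top 3. The four column names come from the catalog described earlier; the sample rows, the `search_feeds` function, and the scoring weights are all assumptions made for this example. In the real system Gemma 4 decides what to query, so treat this as a stand-in, not the actual tool.

```python
import csv
import io

# Hypothetical rows using the catalog's four columns.
CATALOG_CSV = """feed_name,country,section,rss_link
Naija Beats,Nigeria,music,https://example.com/naija-beats.xml
Lagos Showbiz,Nigeria,entertainment,https://example.com/lagos-showbiz.xml
World Desk,United Kingdom,world,https://example.com/world.xml
Abuja Politics,Nigeria,politics,https://example.com/abuja.xml
Global Music Wire,United States,music,https://example.com/gmw.xml
"""


def load_catalog(csv_text: str) -> list[dict]:
    """Parse the catalog CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def search_feeds(catalog, country=None, sections=(), top_k=3):
    """Rank feeds: +2 for a country match, +1 for a section match (assumed weights)."""
    wanted = {s.lower() for s in sections}
    scored = []
    for row in catalog:
        score = 0
        if country and row["country"].lower() == country.lower():
            score += 2
        if row["section"].lower() in wanted:
            score += 1
        if score > 0:
            scored.append((score, row))
    # list.sort is stable, so feeds with equal scores keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:top_k]]


if __name__ == "__main__":
    catalog = load_catalog(CATALOG_CSV)
    for feed in search_feeds(catalog, "Nigeria", ("music", "entertainment")):
        print(feed["feed_name"], feed["rss_link"])
```

Weighting country above section is one way to keep a Nigerian request from drifting toward generic music feeds from elsewhere; other weightings are equally defensible.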


📰 Agent 2: The Herald

"Here's what happened in the last 24 hours."

The Herald fetches articles published within the past 24 hours from the selected feeds, then selects the 5 stories most relevant to the user's request. It produces clean, structured output: no hallucinations, no invented headlines.

Technically the Herald doesn't use tools, but it's doing something deceptively hard: filtering signal from noise across multiple live sources and returning only what matters.
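The 24-hour window is the one part of the Herald that is plain date arithmetic, so here is a small sketch of it. It assumes each article is a dict with a `published` string in the RFC 822 style that RSS `pubDate` fields normally use; the `recent_articles` name and the dict shape are hypothetical. Picking the 5 most relevant stories is left to the model and is not shown.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_articles(articles, now, window_hours=24):
    """Keep articles published within the last `window_hours`, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    fresh = []
    for article in articles:
        try:
            published = parsedate_to_datetime(article.get("published", ""))
        except (TypeError, ValueError):
            continue  # skip entries with a missing or malformed date
        if published.tzinfo is None:
            # Assumption: treat dates without a timezone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if cutoff <= published <= now:
            fresh.append({**article, "published_at": published})
    fresh.sort(key=lambda a: a["published_at"], reverse=True)
    return fresh


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"title": "Old story", "published": "Mon, 08 Jan 2024 09:00:00 GMT"},
        {"title": "Yesterday afternoon", "published": "Tue, 09 Jan 2024 13:00:00 GMT"},
        {"title": "This morning", "published": "Wed, 10 Jan 2024 09:00:00 GMT"},
    ]
    for article in recent_articles(sample, now):
        print(article["title"])
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the filter easy to test.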


🧠 Agent 3: The Researcher

"Let me tell you what this actually means."

This is where The Debrief earns its name. Each article is routed through two parallel subgraphs:

Subgraph 1: Context Research
The Researcher asks clarifying questions about the article and uses web search to fill in background context a listener might be missing. It then summarises its findings into a clean briefing.

Subgraph 2: Persona Insights
This is where it gets interesting. News doesn't mean the same thing to everyone. A rate hike means something different to an economist than it does to a lawyer or a historian.

So I defined 9 expert personas, each with a distinct analytical lens:

| Persona | What They Bring |
|---|---|
| Critic | The skeptic in the room. Questions official narratives and asks the uncomfortable questions everyone is thinking but nobody is saying out loud. |
| Economist | Follows the money. Analyses market impact, inflation, employment, and what the numbers actually mean for businesses and households. |
| Geopolitician | Reads between the borders. Examines how international power dynamics, foreign policy, and treaties shape the story beneath the surface. |
| Historian | Been here before. Draws parallels to past events and answers the most important question: the last time this happened, how did it end? |
| Lawyer | Reads the fine print. Breaks down regulatory, legal, and policy implications: what it means for compliance, liability, and institutional response. |
| Politician | Understands the room. Analyses the political motivations and stakeholder pressures behind a story: who benefits and who doesn't. |
| Scientist | Follows the evidence. Cuts through speculation on health, climate, energy, and technology stories with data-backed context. |
| Socialite | Has the receipts. Tracks what high-profile individuals (CEOs, public figures, politicians) are publicly saying and what the cultural conversation looks like. |
| Tech Analyst | Sees the disruption coming. Examines the technological implications of a story and what it means for innovation and industry. |

A Persona Router (powered by Gemma 4) reads each article and selects the most relevant 2–4 personas to weigh in. Not every story needs all nine. A sports story doesn't need the Lawyer. A policy bill doesn't need the Tech Analyst.

The Historian always runs last because their insight contextualises what the other personas found, drawing historical parallels that only make sense once the full picture is assembled.
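The two rules above (2 to 4 personas per article, Historian last) can be enforced with a small guard around the router's output. This is a sketch under assumptions: the router is taken to return a list of persona names, and `plan_persona_run` is a hypothetical helper, not the project's code.

```python
PERSONAS = [
    "Critic", "Economist", "Geopolitician", "Historian", "Lawyer",
    "Politician", "Scientist", "Socialite", "Tech Analyst",
]


def plan_persona_run(selected, min_count=2, max_count=4):
    """Validate the router's picks and return the run order, Historian last."""
    order = []
    for name in selected:
        # Drop unknown names and duplicates the model might emit.
        if name in PERSONAS and name not in order:
            order.append(name)
    order = order[:max_count]
    if len(order) < min_count:
        raise ValueError(f"router must select at least {min_count} valid personas")
    if "Historian" in order:
        # Move the Historian to the end so it can draw on the other insights.
        order.remove("Historian")
        order.append("Historian")
    return order


if __name__ == "__main__":
    print(plan_persona_run(["Historian", "Economist", "Lawyer"]))
```

Validating in code rather than trusting the prompt means a malformed router response fails loudly instead of silently skipping analysis.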

All personas have access to web search. They don't just reason from the article; they go looking for evidence.


🎬 Agent 4: The Director

"Let's make this sound like a show."

The Director takes everything the Researcher produced and turns it into a broadcast-ready podcast episode. It runs in two stages:

Stage 1: The Headline Segment
The anchor opens the episode with a brief overview of all 5 stories, introduces the deep dive section, and sets the tone for what's coming.

Stage 2: Deep Dive (per story)
Each story gets its own deep dive segment, a natural conversation between two voices: one asking questions, one answering, both drawing from the full research, context, and persona insights assembled by The Researcher.

I originally wanted each persona to have their own distinct voice. But Gemini TTS supports a maximum of 2 voices per call, and honestly, more than 2 voices creates confusion rather than comprehension. So the conversation format (one interviewer, one expert) turned out to be the right call, both technically and editorially.

Gemma 4 generates the transcript. Gemini TTS converts it to audio with two distinct speaker voices. The last deep dive wraps up the episode and credits the sources.

Post-processing: all audio segments (headline + deep dives) are stitched together with pydub and stored on Cloudinary.
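The stitching step is conceptually just concatenation in episode order. The Debrief uses pydub for this; the sketch below swaps in Python's standard-library `wave` module so it runs without extra dependencies, which means it only handles uncompressed WAV data. The 24 kHz mono 16-bit format, the silent placeholder segments, and the function names are assumptions for illustration, and the Cloudinary upload is not shown.

```python
import io
import wave

SAMPLE_RATE = 24000  # assumed sample rate; match it to the real TTS output


def make_silence(seconds: float) -> bytes:
    """Build an in-memory mono 16-bit WAV of silence (placeholder for a segment)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * int(seconds * SAMPLE_RATE))
    return buffer.getvalue()


def stitch_wavs(segments: list[bytes]) -> bytes:
    """Concatenate WAV byte strings that share the same audio format."""
    if not segments:
        raise ValueError("need at least one segment")
    output = io.BytesIO()
    expected = None
    with wave.open(output, "wb") as out:
        for data in segments:
            with wave.open(io.BytesIO(data), "rb") as wav:
                fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
                if expected is None:
                    # The first segment defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError("segments must share channels, sample width, and rate")
                out.writeframes(wav.readframes(wav.getnframes()))
    return output.getvalue()


def duration_seconds(data: bytes) -> float:
    """Return the length of a WAV byte string in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    headline = make_silence(1.0)    # stands in for the headline segment
    deep_dive = make_silence(0.5)   # stands in for one deep dive
    episode = stitch_wavs([headline, deep_dive])
    print(duration_seconds(episode))
```

With pydub the same idea is usually expressed by adding `AudioSegment` objects together and exporting the result, which also handles compressed formats such as MP3.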


Model Choices

| Agent | Model | Why |
|---|---|---|
| The Director | gemma-4-31b | Long-form script generation needs full reasoning capability |
| The Inquisitor, Herald, Researcher | gemma-4-27b (MoE) | Faster, efficient, handles routing and structured tasks cleanly |

The MoE architecture of Gemma 4 27B activates only a fraction of its parameters per token, giving near-31B quality at significantly lower latency. Perfect for the high-frequency nodes.


Performance

For 5 articles, The Debrief takes approximately 8 minutes to complete, accounting for web searches, persona analysis, transcript generation, TTS conversion, and audio stitching.

The output is approximately 21 minutes of audio:

  • ~4 min 30s per deep dive segment
  • ~3 min for the headline segment

Known Limitations & What's Next

  • Transitions between deep dives need smoothing: the handoff between stories is functional but not yet seamless.
  • Persona-specific tools would take this further: giving the Socialite access to X (formerly Twitter) for real-time public sentiment, for example.
  • Generation time is the main UX friction: about 8 minutes is acceptable for a scheduled daily briefing, less ideal for breaking news.
  • Nigerian source coverage is still growing. If you're requesting very niche local Nigerian news, some feeds may not return results depending on your location.

Final Thoughts

All I have to do now is type in a request, and within minutes I have a fully produced audio briefing on exactly what I care about: no ads, no distractions, no rabbit holes. It makes perfect sense to listen while exercising, commuting, or doing literally anything else.

Gemma 4 turned out to be exactly the right model for this. The reasoning quality, the structured output reliability, and the MoE efficiency made it possible to run a multi-agent pipeline of this complexity at scale. I'll definitely be using it for future experiments.


