Building a news aggregator like Bawabet Elhadas was one of the projects that taught me the most. It isn't just a website that displays news. It's a complete system that fetches news from multiple sources, stores it, categorizes it, summarizes it with AI, and surfaces the most relevant trending stories for each user. In this article, I'll explain the technical challenges I faced and the solutions I used.
The first challenge was working with multiple news APIs. I used GNews and NewsData as primary sources. Each API has a different data schema, different rate limits, and different pricing. To handle this diversity, I created an abstraction layer that transforms data from each API into a unified format. Regardless of the source, data enters the system in the same shape, which saved me from having to change core code when adding new sources later.
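To make the idea concrete, here is a minimal sketch of such an abstraction layer. The unified fields follow the list I store per article; the raw interfaces and the adapter names (`fromGNews`, `fromNewsData`) are illustrative assumptions, not the exact provider schemas, so check each provider's documentation before relying on them. Dates are assumed to arrive as ISO 8601 strings.

```typescript
// Unified shape used by the rest of the system.
interface NormalizedArticle {
  title: string;
  description: string;
  source: string;
  link: string;
  image: string | null;
  publishedAt: Date;
  category: string;
}

// Assumed raw shapes (hypothetical; verify against each provider's docs).
interface GNewsRawArticle {
  title: string;
  description: string;
  url: string;
  image: string | null;
  publishedAt: string;
  source: { name: string };
}

interface NewsDataRawArticle {
  title: string;
  description: string | null;
  link: string;
  image_url: string | null;
  pubDate: string;
  source_id: string;
  category: string[];
}

// One adapter per provider: adding a source means adding one function.
function fromGNews(raw: GNewsRawArticle, category: string): NormalizedArticle {
  return {
    title: raw.title,
    description: raw.description,
    source: raw.source.name,
    link: raw.url,
    image: raw.image,
    publishedAt: new Date(raw.publishedAt),
    category,
  };
}

function fromNewsData(raw: NewsDataRawArticle): NormalizedArticle {
  return {
    title: raw.title,
    description: raw.description ?? "",
    source: raw.source_id,
    link: raw.link,
    image: raw.image_url,
    publishedAt: new Date(raw.pubDate),
    // Fall back to a generic category when the provider sends none.
    category: raw.category[0] ?? "general",
  };
}
```

Everything downstream (storage, scoring, display) only ever sees `NormalizedArticle`, so provider quirks stay inside the adapters.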
API quota management was a real problem. GNews allows 100 daily requests on the free tier, and NewsData allows 200. If I exhaust one quota, I need to switch to the other automatically. I built a routing system that tracks each API's consumption and routes requests to a source that still has quota left. I also implemented caching: news doesn't change every second, so I make requests every 30 minutes and store the results. This reduced API consumption by 90%.
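The routing and caching logic can be sketched roughly as follows. The daily limits come from the free tiers mentioned above; the names `SourceQuota`, `pickSource`, and `TtlCache` are hypothetical, the daily counter reset is left out, and the current time is passed in explicitly to keep the sketch easy to test.

```typescript
interface SourceQuota {
  name: string;
  dailyLimit: number;
  usedToday: number;
}

// Returns the first source that still has quota and records the request,
// or null when every source is exhausted for the day.
function pickSource(sources: SourceQuota[]): SourceQuota | null {
  for (const source of sources) {
    if (source.usedToday < source.dailyLimit) {
      source.usedToday += 1;
      return source;
    }
  }
  return null;
}

// Tiny time-based cache: an entry is served until its TTL has elapsed.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= now) return undefined;
    return entry.value;
  }

  set(key: string, value: T, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

A fetcher would first call `get` on the cache, and only on a miss call `pickSource` and hit the chosen API, so most reads never consume quota.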
Caching in PostgreSQL with Prisma was the backbone of the system. Each article is stored in the database with its core data: title, description, source, link, image, publish date, and category. I used a UUID as the primary key and added indexes on publish date and category for fast queries. I also added a unique constraint on the article link to prevent duplicates, because the same article sometimes comes from multiple sources.
AI summarization was one of the most enjoyable parts. I used OpenRouter to access different language models at a reasonable price. When a new article enters the system, I send it to the API with a custom prompt: "Summarize this article in 3 sentences in Arabic in a clear and concise way." The summary lets users read news quickly without needing to open each article. I did run into one problem: summarization takes about 3-5 seconds per article. So I made it asynchronous. The article is stored first, and the summary is added when it's ready.
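The store-first, summarize-later flow looks roughly like this. It is a sketch under assumptions: an in-memory `Map` stands in for the database, and `summarize` is a hypothetical function that would wrap the actual model call; neither is the production code.

```typescript
interface StoredArticle {
  id: string;
  content: string;
  summary: string | null;
}

// Any function that turns article text into a summary (e.g. a wrapper around a model API).
type Summarizer = (content: string) => Promise<string>;

// Saves the article immediately with no summary, then fills the summary in
// once the summarizer resolves. The caller does not need to await the result.
function ingestArticle(
  store: Map<string, StoredArticle>,
  article: { id: string; content: string },
  summarize: Summarizer,
): Promise<void> {
  store.set(article.id, { id: article.id, content: article.content, summary: null });
  return summarize(article.content)
    .then((summary) => {
      const saved = store.get(article.id);
      if (saved !== undefined) saved.summary = summary;
    })
    .catch(() => {
      // On failure the article stays visible without a summary;
      // a retry could be scheduled here.
    });
}
```

Because the article is visible as soon as it is stored, a slow or failing summarizer never blocks the feed; the UI only needs to handle `summary === null`.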
The Trending Score algorithm was the most interesting challenge. How do I determine which news is most important? I created an algorithm that calculates a score from several factors: article freshness (news from the last hour ranks higher than 3-hour-old news), the number of sources that published the same story (if 3 sources covered the same topic, it's probably important), and read count (the more people read it, the higher it ranks). The formula is simple but effective: Trending Score = Freshness × 0.4 + Source Count × 0.3 + Read Count × 0.3.
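Here is one way the formula could be implemented. The weights (0.4, 0.3, 0.3) are the ones from the formula above; how each factor is scaled to the range 0 to 1 is my assumption for this sketch (linear decay over 24 hours, a cap of 5 sources, reads relative to the most-read article), since raw counts on different scales would otherwise swamp each other.

```typescript
interface TrendingInput {
  ageHours: number;    // hours since publication
  sourceCount: number; // how many sources carried the same story
  readCount: number;   // how many times the article was read
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function trendingScore(input: TrendingInput, maxReadCount: number): number {
  // Assumed normalizations; tune these to your own data.
  const freshness = clamp01(1 - input.ageHours / 24);
  const sources = clamp01(input.sourceCount / 5);
  const reads = maxReadCount > 0 ? clamp01(input.readCount / maxReadCount) : 0;
  // Weights taken from the formula in the text.
  return freshness * 0.4 + sources * 0.3 + reads * 0.3;
}
```

With these assumptions, an article published an hour ago scores higher than an otherwise identical one published three hours ago, which matches the intended behavior.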
Personalization based on reading history was an advanced feature. When a user logs in (via NextAuth), the system tracks which categories they follow most — politics, sports, technology, etc. Then it shows news in their preferred categories first. It also tracks which sources they prefer, so articles from those sources appear higher. This personalization means each user sees a different interface tailored to their interests.
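As a rough sketch of that ordering: preferred categories come first, and within the same category preference level, preferred sources rank higher. The `ReadingHistory` shape and the `personalize` function are hypothetical names for illustration; how the read counts are collected and stored is not shown.

```typescript
interface FeedArticle {
  id: string;
  category: string;
  source: string;
}

// How many articles the user has read per category and per source.
interface ReadingHistory {
  categoryReads: Map<string, number>;
  sourceReads: Map<string, number>;
}

function personalize(articles: FeedArticle[], history: ReadingHistory): FeedArticle[] {
  const categoryAffinity = (a: FeedArticle): number => history.categoryReads.get(a.category) ?? 0;
  const sourceAffinity = (a: FeedArticle): number => history.sourceReads.get(a.source) ?? 0;
  // Copy before sorting so the caller's array is not mutated.
  return [...articles].sort((a, b) => {
    const byCategory = categoryAffinity(b) - categoryAffinity(a);
    if (byCategory !== 0) return byCategory;
    // Same category preference: fall back to source preference.
    return sourceAffinity(b) - sourceAffinity(a);
  });
}
```

For a user with no history, every affinity is 0 and the stable sort leaves the feed in its original order, so the same code path works for anonymous visitors.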
Updating news in near real time without WebSocket was a challenge. I didn't want to add WebSocket complexity because news doesn't need to update every second. Instead, I used client-side polling: every minute the client asks the server whether there is anything new. The server returns the latest article ID, and if it differs from the one the client already has, the client fetches the new articles. On the server side, I used cron jobs that run every 15 minutes to fetch fresh news from all sources.
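A single polling step might look like the sketch below. The two functions in `PollingDeps` are hypothetical placeholders for the real HTTP calls, so no endpoint names are implied.

```typescript
interface PollingDeps {
  // Asks the server for the ID of the newest article.
  fetchLatestId: () => Promise<string>;
  // Downloads the articles published after the given ID (null means "everything").
  fetchNewArticles: (sinceId: string | null) => Promise<void>;
}

// One polling step: fetches new articles only when the latest ID has changed,
// and returns the ID the client should remember for the next step.
async function pollOnce(lastSeenId: string | null, deps: PollingDeps): Promise<string> {
  const latestId = await deps.fetchLatestId();
  if (latestId !== lastSeenId) {
    await deps.fetchNewArticles(lastSeenId);
  }
  return latestId;
}

// In the browser this would be scheduled once a minute, for example:
// setInterval(() => {
//   pollOnce(lastSeenId, deps).then((id) => { lastSeenId = id; });
// }, 60_000);
```

The cheap ID check is what keeps polling affordable: most ticks cost one tiny request, and the heavier article fetch only happens when something actually changed.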
My advice for anyone building a news aggregator: start with just one source and understand the data well before adding others. Caching isn't a luxury — it's essential to avoid burning through your quota in the first hour. AI summarization adds huge value but keep it asynchronous so it doesn't slow down the system. And most importantly: test with real users to understand what news matters to them and what doesn't make a difference.
Powered by Ziad Amr