I was tired of tech news sites that track everything you click, load slowly, and hide half their content behind paywalls. So I built my own.
The result is PulseTech.news — a lightning-fast, automated tech news aggregator that updates every hour, covers 16 categories, and is fully GDPR/CCPA compliant with zero creepy tracking.
Here's exactly how I built it, the architectural decisions I made, and what I learned along the way.
The Stack (And Why I Chose It)
Before I get into the architecture, here's the full stack:
- PHP 8.x — custom lightweight framework, no Laravel, no Symfony
- MySQL 8.x — via PDO with prepared statements throughout
- Tailwind CSS — standalone binary, no Node build pipeline
- SimplePie — for RSS/Atom feed parsing
- Composer — for dependency management (vlucas/phpdotenv, simplepie)
The biggest decision here was rejecting heavy frameworks. I didn't need the overhead of Laravel for what is essentially a read-heavy content site. A custom lightweight PHP framework gave me sub-100ms page loads and complete control over every byte hitting the wire.
Architecture: The Repository Pattern
The core of PulseTech.news is built around the Repository Pattern. All data access is abstracted away from the page controllers and centralised in repository classes.
// Clean controller code — no SQL in sight
$articles = $articleRepo->getLatest($limit, $offset, $filters);
The main repositories are:
- ArticleRepository — handles all article retrieval, including language filtering (English by default, Spanish available)
- FeedRepository — manages feed sources and their language settings
Each repository receives a PDO instance via constructor injection, keeping the database logic contained and testable.
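To make the pattern concrete, here's a minimal sketch of what such a repository can look like. The class and method names match the post, but the column names, table schema, and filter handling are assumptions for illustration, not the actual source:

```php
<?php
// Hypothetical sketch of a repository with constructor-injected PDO.
// Internals (columns, filters) are illustrative, not the real codebase.
class ArticleRepository
{
    public function __construct(private PDO $pdo) {}

    /** Fetch the latest articles, optionally filtered by language. */
    public function getLatest(int $limit, int $offset, array $filters = []): array
    {
        $sql = 'SELECT * FROM articles';
        $params = [];
        if (!empty($filters['language'])) {
            $sql .= ' WHERE language = :language';
            $params[':language'] = $filters['language'];
        }
        $sql .= ' ORDER BY published_at DESC LIMIT :limit OFFSET :offset';

        $stmt = $this->pdo->prepare($sql);
        foreach ($params as $key => $value) {
            $stmt->bindValue($key, $value);
        }
        // LIMIT/OFFSET must be bound as integers, not quoted strings.
        $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
        $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
        $stmt->execute();
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}
```

Because the PDO instance is injected rather than created internally, a test can hand the repository an in-memory SQLite connection instead of the production MySQL one.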
For database access itself, I used a Singleton pattern:
$pdo = Database::getInstance()->getConnection();
One connection, one point of access, consistent throughout the application.
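A minimal sketch of that Singleton, assuming the shape implied by the call above. The DSN here is a placeholder; in the real app it would presumably be built from environment config rather than hard-coded:

```php
<?php
// Illustrative Singleton sketch. The class name matches the post;
// the internals are assumed. Placeholder DSN used here so the sketch
// runs anywhere; the real app would build a MySQL DSN from Config::Get().
class Database
{
    private static ?Database $instance = null;
    private PDO $connection;

    private function __construct()
    {
        $this->connection = new PDO(
            'sqlite::memory:', // placeholder for the real MySQL DSN
            null,
            null,
            [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
        );
    }

    public static function getInstance(): Database
    {
        // Lazily create the single instance on first access.
        return self::$instance ??= new Database();
    }

    public function getConnection(): PDO
    {
        return $this->connection;
    }
}
```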
The Scraper Engine
This was the most interesting part to build. The scraper (classes/Scraper.php) runs as a headless background process on an hourly cron cycle. Here's what it does:
1. Feed Management
RSS and Atom feeds from the world's top tech sources are managed via an admin panel. Adding a new source is a one-click operation.
2. Intelligent Categorisation
Rather than relying on the source's own tags (which are inconsistent), I built a weighted keyword detection system. Each article's title and description are scored against keyword sets for each category:
- AI
- Cybersecurity
- Apple / iOS / iPadOS / iPhone / Mac
- Android / Samsung
- Linux
- Windows
- Gaming
- Robots
- Google / Tesla
The weighting system ensures "AI" news stays in AI, "Cybersecurity" stays in security, and articles don't bleed into the wrong categories. This took the most iteration to get right.
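In simplified form, the weighted scoring can be sketched like this. The keyword sets and weights below are invented for illustration; the real sets are larger and were tuned over many iterations:

```php
<?php
// Simplified sketch of weighted keyword categorisation.
// Keywords and weights are illustrative, not the production sets.
function scoreArticle(string $title, string $description): string
{
    $keywords = [
        'AI'            => ['ai' => 3, 'llm' => 3, 'machine learning' => 2, 'openai' => 2],
        'Cybersecurity' => ['breach' => 3, 'ransomware' => 3, 'vulnerability' => 2],
        'Linux'         => ['linux' => 3, 'kernel' => 2, 'ubuntu' => 2],
    ];

    // Title matches count double, since headlines are the strongest signal.
    $sources = [[strtolower($title), 2], [strtolower($description), 1]];

    $scores = [];
    foreach ($keywords as $category => $terms) {
        $scores[$category] = 0;
        foreach ($sources as [$text, $multiplier]) {
            foreach ($terms as $term => $weight) {
                // Word-boundary match so "ai" doesn't fire on "email".
                if (preg_match('/\b' . preg_quote($term, '/') . '\b/', $text)) {
                    $scores[$category] += $weight * $multiplier;
                }
            }
        }
    }

    // Highest score wins; fall back to a general bucket on no match.
    arsort($scores);
    $best = array_key_first($scores);
    return $scores[$best] > 0 ? $best : 'General';
}
```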
3. Deduplication
Articles are deduplicated on source URL before insertion. No duplicate stories, even when multiple feeds cover the same news.
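The check-before-insert logic can be sketched as below. Column names are assumed, and the real scraper may additionally rely on a UNIQUE index on the URL column as a safety net:

```php
<?php
// Sketch of URL-based deduplication before insert.
// Table and column names are assumptions for illustration.
function insertIfNew(PDO $pdo, string $url, string $title): bool
{
    // Has this URL been seen before?
    $check = $pdo->prepare('SELECT 1 FROM articles WHERE url = :url');
    $check->execute([':url' => $url]);
    if ($check->fetchColumn() !== false) {
        return false; // duplicate: skip insertion
    }

    $insert = $pdo->prepare('INSERT INTO articles (url, title) VALUES (:url, :title)');
    $insert->execute([':url' => $url, ':title' => $title]);
    return true;
}
```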
The Shift from Vibe Coding to Agentic Engineering
I want to be honest about the build process here, because I think it matters.
The first version of PulseTech.news was largely vibe coded — prompting AI for code, tweaking until it worked, posting screenshots. The UI looked great. But the underlying system was fragile.
The real work came when I shifted to agentic engineering: designing structured workflows, context documents, validation loops, and a full architecture overview (PROJECT_ARCHITECTURE.md) that AI agents could operate within without breaking the build.
The difference was enormous. Instead of getting code that looked right, I got code that behaved correctly within the system. The Repository Pattern, the static helpers, the testing standards — all of it was designed so that an AI agent could contribute to the codebase following the same rules as a human developer.
Static Helper Classes
Rather than polluting controllers with raw superglobal access, I built a set of static helper classes:
Session::Get('user_id'); // Clean session access
Input::Get('page'); // Sanitised GET/POST input
Config::Get('DB_HOST'); // Environment variable access
UIHelper::ArticleCard($data); // Reusable UI components
Theme::isDark(); // Dark/light mode state
These keep the controllers clean and make the codebase easy for AI agents (and human developers) to navigate consistently.
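As one example, here is a plausible shape for the Input helper shown above. The method name matches the post; the sanitisation details (trimming, tag stripping, POST-over-GET precedence) are assumptions:

```php
<?php
// One possible shape for the Input helper; internals are assumed.
class Input
{
    /** Read a POST/GET value with a fallback default, lightly sanitised. */
    public static function Get(string $key, ?string $default = null): ?string
    {
        $value = $_POST[$key] ?? $_GET[$key] ?? null;
        if ($value === null) {
            return $default;
        }
        // Basic sanitisation: trim whitespace and strip HTML tags.
        return strip_tags(trim((string) $value));
    }
}
```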
SEO & Structured Data
Every listing on PulseTech.news is backed by JSON-LD Schema.org structured data, making the site highly discoverable. The header system manages:
- Page-specific Open Graph and Twitter Card meta tags
- Canonical URLs (auto-calculated pretty URLs)
- JSON-LD Organization and WebSite schemas
- Dynamic $pageTitle, $pageDescription, and $ogImage variables per page
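Emitting one of those schemas is straightforward. The sketch below shows a WebSite block; the function name and field values are placeholders for whatever the header system actually injects:

```php
<?php
// Illustrative sketch of emitting a Schema.org WebSite JSON-LD block.
// Function name and values are placeholders, not the actual codebase.
function websiteJsonLd(string $name, string $url): string
{
    $schema = [
        '@context' => 'https://schema.org',
        '@type'    => 'WebSite',
        'name'     => $name,
        'url'      => $url,
    ];
    // JSON_UNESCAPED_SLASHES keeps the URL readable in page source.
    return '<script type="application/ld+json">'
        . json_encode($schema, JSON_UNESCAPED_SLASHES)
        . '</script>';
}
```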
This was a deliberate investment in long-term organic traffic. SEO is compounding — the work you do today pays off for months.
Privacy First
PulseTech.news implements Google Consent Mode v2 and PII-free click tracking. Here's what that means in practice:
- No personal data is stored on click events
- Full GDPR/CCPA compliance without sacrificing analytics
- Consent banner with genuine reject option (not a dark pattern)
This wasn't just an ethical choice — it's increasingly a legal requirement, and it's a genuine differentiator for privacy-conscious users.
Security Standards
Every database interaction uses PDO prepared statements. No exceptions. All POST forms include CSRF tokens, and admin routes are protected via session-based authorisation checks.
// Always prepared statements — never raw interpolation
$stmt = $pdo->prepare("SELECT * FROM articles WHERE id = :id");
$stmt->execute([':id' => $id]);
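The CSRF side can be sketched in a few lines. The function names here are illustrative, not taken from the actual codebase, but the core ingredients (a random session token and a constant-time comparison) are standard:

```php
<?php
// Hedged sketch of session-based CSRF protection.
// Function names are illustrative, not from the real codebase.
function csrfToken(): string
{
    // Generate one cryptographically random token per session.
    if (empty($_SESSION['csrf_token'])) {
        $_SESSION['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $_SESSION['csrf_token'];
}

function csrfValidate(?string $submitted): bool
{
    // hash_equals prevents timing attacks on the comparison.
    return is_string($submitted)
        && isset($_SESSION['csrf_token'])
        && hash_equals($_SESSION['csrf_token'], $submitted);
}
```

The token goes into a hidden field on every POST form, and the handler rejects the request when csrfValidate() returns false.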
Testing
The project uses PHPUnit for automated testing, located in tests/. Every Repository and Business Logic class has a corresponding test file. The convention is strict: ClassNameTest.php, bootstrapped via tests/bootstrap.php.
./vendor/bin/phpunit
Having a test suite was essential when using AI agents to contribute code — it gave me a fast feedback loop to catch regressions before they hit production.
What's Next
PulseTech.news is live and updating hourly. Here's what's on the roadmap:
- User accounts with a personal 'Read Later' library
- AI-driven personalised feeds — only see the categories and sources you care about
- More sources and languages — currently English and Spanish, expanding soon
Try It
It's completely free. No paywalls. No bloat. 16 categories updated every hour.
I'd love feedback on the speed, the dark mode UI, and any tech sources you think I should add to the scraper. Drop them in the comments below.