DEV Community

ahmet gedik
ahmet gedik

Posted on

Building a Content Cleanup System for Old Videos

Introduction

Content accumulates. After months of fetching 25 trending videos from 7 regions every few hours, you end up with thousands of video records. Many become stale, irrelevant, or simply take up database space. Here's the cleanup system I built for ViralVidVault.

The Cleanup Pipeline

Our cleanup runs as the final step of each cron cycle:

<?php

class ContentCleaner
{
    private \PDO $db;
    private int $maxAgeDays;
    private int $minViews;

    public function __construct(
        \PDO $db,
        int $maxAgeDays = 90,
        int $minViews = 100,
    ) {
        $this->db = $db;
        $this->maxAgeDays = $maxAgeDays;
        $this->minViews = $minViews;
    }

    public function run(): CleanupReport
    {
        $report = new CleanupReport();

        // Step 1: Remove videos older than maxAge days
        $report->expired = $this->removeExpired();

        // Step 2: Remove low-performing videos older than 30 days
        $report->lowPerforming = $this->removeLowPerforming();

        // Step 3: Remove already-flagged stale videos older than 7 days
        $report->stale = $this->removeStale();

        // Step 4: Remove duplicate entries (same video, different fetches)
        $report->duplicates = $this->removeDuplicates();

        // Step 5: Vacuum the database
        $this->vacuum();

        return $report;
    }

    private function removeExpired(): int
    {
        $stmt = $this->db->prepare('
            DELETE FROM videos
            WHERE fetched_at < datetime("now", :days)
            AND is_active = 1
        ');
        $stmt->execute([':days' => "-{$this->maxAgeDays} days"]);
        return $stmt->rowCount();
    }

    private function removeLowPerforming(): int
    {
        $stmt = $this->db->prepare('
            DELETE FROM videos
            WHERE views < :minViews
            AND fetched_at < datetime("now", "-30 days")
        ');
        $stmt->execute([':minViews' => $this->minViews]);
        return $stmt->rowCount();
    }

    private function removeStale(): int
    {
        $result = $this->db->exec('
            DELETE FROM videos
            WHERE is_active = 0
            AND stale_at < datetime("now", "-7 days")
        ');
        return $result;
    }

    private function removeDuplicates(): int
    {
        $result = $this->db->exec('
            DELETE FROM video_regions
            WHERE rowid NOT IN (
                SELECT MIN(rowid)
                FROM video_regions
                GROUP BY video_id, region
            )
        ');
        return $result;
    }

    private function vacuum(): void
    {
        $this->db->exec('VACUUM');
    }
}
Enter fullscreen mode Exit fullscreen mode

The Report Object

<?php

class CleanupReport
{
    public int $expired = 0;
    public int $lowPerforming = 0;
    public int $stale = 0;
    public int $duplicates = 0;

    public function total(): int
    {
        return $this->expired + $this->lowPerforming + $this->stale + $this->duplicates;
    }

    public function __toString(): string
    {
        return sprintf(
            "Cleanup: %d removed (expired=%d, low=%d, stale=%d, dupes=%d)",
            $this->total(), $this->expired, $this->lowPerforming,
            $this->stale, $this->duplicates
        );
    }
}
Enter fullscreen mode Exit fullscreen mode

Integration with Cron

<?php
// At the end of fetch_videos.php

// Step 5: Cleanup old content
$cleaner = new ContentCleaner($db->getPdo(), maxAgeDays: 90, minViews: 100);
$report = $cleaner->run();
echo "Step 5: {$report}\n";
Enter fullscreen mode Exit fullscreen mode

Typical output on ViralVidVault:

Step 5: Cleanup: 47 removed (expired=12, low=18, stale=15, dupes=2)
Enter fullscreen mode Exit fullscreen mode

Database Size Management

Without cleanup, the SQLite database at viralvidvault.com grew to 85MB after 3 months. With the cleanup system running every cron cycle, it stays around 25MB. The VACUUM command reclaims the freed space.

Key Takeaways

  1. Run cleanup as the last step of your content pipeline
  2. Use age-based expiry for the bulk of removals
  3. Remove low-performing content to keep quality high
  4. Clean up stale records after a grace period
  5. VACUUM SQLite after bulk deletes to reclaim space

Part of the "Building ViralVidVault" series.

Top comments (0)