Introduction
Content accumulates. After months of fetching 25 trending videos from 7 regions every few hours, you end up with thousands of video records. Many become stale, irrelevant, or simply take up database space. Here's the cleanup system I built for ViralVidVault.
The Cleanup Pipeline
Our cleanup runs as the final step of each cron cycle:
<?php
class ContentCleaner
{
private \PDO $db;
private int $maxAgeDays;
private int $minViews;
public function __construct(
\PDO $db,
int $maxAgeDays = 90,
int $minViews = 100,
) {
$this->db = $db;
$this->maxAgeDays = $maxAgeDays;
$this->minViews = $minViews;
}
public function run(): CleanupReport
{
$report = new CleanupReport();
// Step 1: Remove videos older than maxAge days
$report->expired = $this->removeExpired();
// Step 2: Remove low-performing videos older than 30 days
$report->lowPerforming = $this->removeLowPerforming();
// Step 3: Remove already-flagged stale videos older than 7 days
$report->stale = $this->removeStale();
// Step 4: Remove duplicate entries (same video, different fetches)
$report->duplicates = $this->removeDuplicates();
// Step 5: Vacuum the database
$this->vacuum();
return $report;
}
private function removeExpired(): int
{
$stmt = $this->db->prepare('
DELETE FROM videos
WHERE fetched_at < datetime("now", :days)
AND is_active = 1
');
$stmt->execute([':days' => "-{$this->maxAgeDays} days"]);
return $stmt->rowCount();
}
private function removeLowPerforming(): int
{
$stmt = $this->db->prepare('
DELETE FROM videos
WHERE views < :minViews
AND fetched_at < datetime("now", "-30 days")
');
$stmt->execute([':minViews' => $this->minViews]);
return $stmt->rowCount();
}
private function removeStale(): int
{
$result = $this->db->exec('
DELETE FROM videos
WHERE is_active = 0
AND stale_at < datetime("now", "-7 days")
');
return $result;
}
private function removeDuplicates(): int
{
$result = $this->db->exec('
DELETE FROM video_regions
WHERE rowid NOT IN (
SELECT MIN(rowid)
FROM video_regions
GROUP BY video_id, region
)
');
return $result;
}
private function vacuum(): void
{
$this->db->exec('VACUUM');
}
}
The Report Object
<?php
class CleanupReport
{
public int $expired = 0;
public int $lowPerforming = 0;
public int $stale = 0;
public int $duplicates = 0;
public function total(): int
{
return $this->expired + $this->lowPerforming + $this->stale + $this->duplicates;
}
public function __toString(): string
{
return sprintf(
"Cleanup: %d removed (expired=%d, low=%d, stale=%d, dupes=%d)",
$this->total(), $this->expired, $this->lowPerforming,
$this->stale, $this->duplicates
);
}
}
Integration with Cron
<?php
// At the end of fetch_videos.php
// Step 5: Cleanup old content
$cleaner = new ContentCleaner($db->getPdo(), maxAgeDays: 90, minViews: 100);
$report = $cleaner->run();
echo "Step 5: {$report}\n";
Typical output on ViralVidVault:
Step 5: Cleanup: 47 removed (expired=12, low=18, stale=15, dupes=2)
Database Size Management
Without cleanup, the SQLite database at viralvidvault.com grew to 85MB after 3 months. With the cleanup system running every cron cycle, it stays around 25MB. The VACUUM command reclaims the freed space.
Key Takeaways
- Run cleanup as the last step of your content pipeline
- Use age-based expiry for the bulk of removals
- Remove low-performing content to keep quality high
- Clean up stale records after a grace period
- VACUUM SQLite after bulk deletes to reclaim space
Part of the "Building ViralVidVault" series.
Top comments (0)