DEV Community

ahmet gedik

Building a Cron-Based Content Freshness System

Content freshness is critical for a platform that shows trending videos. If your data is 24 hours old, it's not "trending" anymore. Here's how I built a cron-based content pipeline for DailyWatch that keeps content fresh across 8 regions while staying within API quotas.

The Pipeline Architecture

The entire fetch process runs as a single PHP script that cron calls at minute 20 of every second hour:

20 */2 * * * php /var/www/html/cron/fetch_videos.php >> /var/log/fetch.log 2>&1

The script runs 8 sequential steps:

// fetch_videos.php - Main content pipeline
$startTime = microtime(true);
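$db = Database::get();
// Assumption: the API key is read from the environment (the variable name is illustrative)
$apiKey = getenv('YOUTUBE_API_KEY');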

echo "[" . date('Y-m-d H:i:s') . "] Starting fetch pipeline\n";

// Step 1: Fetch popular/trending videos (global)
echo "STEP 1: Popular videos\n";
$popular = fetchPopularVideos($db, $apiKey);
echo "  Fetched {$popular} popular videos\n";

// Step 2: Update category list
echo "STEP 2: Categories\n";
$categories = fetchCategories($db, $apiKey);
echo "  Updated {$categories} categories\n";

// Step 3: Multi-region fetch
echo "STEP 3: Regional trending\n";
$regions = explode(',', getenv('FETCH_REGIONS') ?: 'US,GB');
foreach ($regions as $region) {
    $region = trim($region); // trim once so the API call and the log line use the same value
    $count = fetchRegionTrending($db, $apiKey, $region);
    echo "  {$region}: {$count} videos\n";
    usleep(300000); // 300ms between regions
}

// Step 4: Refresh stale video data
echo "STEP 4: Stale refresh\n";
$refreshed = refreshStaleVideos($db, $apiKey, maxAge: 86400);
echo "  Refreshed {$refreshed} stale videos\n";

// Step 5: Cleanup old content
echo "STEP 5: Cleanup\n";
$cleaned = cleanupOldVideos($db, maxAge: 604800); // 7 days
echo "  Removed {$cleaned} expired videos\n";

// Step 6: Rebuild search index
echo "STEP 6: FTS rebuild\n";
rebuildSearchIndex($db);
echo "  Search index updated\n";

// Step 7: Clear page cache
echo "STEP 7: Cache clear\n";
clearPageCache();

// Step 8: Submit new URLs to IndexNow
echo "STEP 8: IndexNow\n";
$submitted = submitToIndexNow($db);
echo "  Submitted {$submitted} URLs\n";

$elapsed = round(microtime(true) - $startTime, 2);
echo "Pipeline complete in {$elapsed}s\n\n";
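
Because the steps run back to back in one script, an uncaught exception in an early step would stop every later one. Here is a minimal sketch of one way to isolate them. It assumes a hypothetical `runStep()` helper that is not part of the original script, and the closures are stand-ins for the real step functions:

```php
<?php
// Hypothetical helper (not from the original pipeline): runs one named step,
// logs how long it took, and reports failure instead of aborting the script.
function runStep(string $label, callable $step): bool {
    $start = microtime(true);
    echo "{$label}\n";
    try {
        $step();
        $ok = true;
    } catch (Throwable $e) {
        echo "  ERROR: {$e->getMessage()}\n";
        $ok = false;
    }
    $elapsed = round(microtime(true) - $start, 2);
    echo "  ({$elapsed}s)\n";
    return $ok;
}

// Stand-in steps for illustration; the real pipeline would call
// fetchPopularVideos(), fetchCategories(), cleanupOldVideos(), etc. here.
$results = [
    'popular' => runStep('STEP 1: Popular videos', function () {
        echo "  Fetched 0 popular videos\n";
    }),
    'categories' => runStep('STEP 2: Categories', function () {
        throw new RuntimeException('API unavailable');
    }),
    'cleanup' => runStep('STEP 5: Cleanup', function () {
        echo "  Removed 0 expired videos\n";
    }),
];
```

With this shape, the simulated failure in step 2 is logged and the cleanup step still runs. Whether later steps should run after an earlier failure is a design choice that depends on how much they rely on fresh data.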

Step 3 in Detail: Regional Fetching

function fetchRegionTrending(PDO $db, string $apiKey, string $region): int {
    $url = 'https://www.googleapis.com/youtube/v3/videos?' . http_build_query([
        'part' => 'snippet,statistics,contentDetails',
        'chart' => 'mostPopular',
        'regionCode' => $region,
        'maxResults' => 50,
        'key' => $apiKey,
    ]);

    $response = @file_get_contents($url);
    if ($response === false) {
        echo "  WARNING: Failed to fetch region {$region}\n";
        return 0;
    }

    $data = json_decode($response, true);
    if (!isset($data['items'])) return 0;

    // Prepare both statements once and reuse them for every item
    $upsertVideo = $db->prepare("
        INSERT INTO videos (video_id, title, description, channel_title,
            channel_id, category_id, thumbnail_url, published_at,
            duration, view_count)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(video_id) DO UPDATE SET
            view_count = MAX(view_count, excluded.view_count),
            fetched_at = datetime('now')
    ");
    $tagRegion = $db->prepare("
        INSERT OR IGNORE INTO video_regions (video_id, region, fetched_at)
        VALUES (?, ?, datetime('now'))
    ");

    $count = 0;
    $db->beginTransaction();

    try {
        foreach ($data['items'] as $item) {
            $videoId = $item['id'];
            $snippet = $item['snippet'];

            // Upsert video
            $upsertVideo->execute([
                $videoId,
                $snippet['title'],
                mb_substr($snippet['description'] ?? '', 0, 500),
                $snippet['channelTitle'] ?? '',
                $snippet['channelId'] ?? '',
                (int)($snippet['categoryId'] ?? 0),
                $snippet['thumbnails']['medium']['url'] ?? '',
                $snippet['publishedAt'] ?? '',
                $item['contentDetails']['duration'] ?? '',
                (int)($item['statistics']['viewCount'] ?? 0),
            ]);

            // Tag region
            $tagRegion->execute([$videoId, $region]);

            $count++;
        }
        $db->commit();
    } catch (Throwable $e) {
        // Undo the partial batch so a failed region leaves no half-written rows
        $db->rollBack();
        echo "  WARNING: Failed to store region {$region}: {$e->getMessage()}\n";
        return 0;
    }

    return $count;
}
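
The upsert keeps the larger of the stored and incoming view counts, so a fetch that reports a lower number does not overwrite a higher one. The sketch below shows that behavior against an in-memory SQLite database. The cut-down table is an assumption for illustration, not the real DailyWatch schema, and it needs the `pdo_sqlite` extension with SQLite 3.24 or newer for `ON CONFLICT`:

```php
<?php
// Illustrative schema only: a cut-down version of the videos table.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("
    CREATE TABLE videos (
        video_id   TEXT PRIMARY KEY,
        title      TEXT NOT NULL,
        view_count INTEGER NOT NULL DEFAULT 0,
        fetched_at TEXT NOT NULL DEFAULT (datetime('now'))
    )
");

$upsert = $db->prepare("
    INSERT INTO videos (video_id, title, view_count)
    VALUES (?, ?, ?)
    ON CONFLICT(video_id) DO UPDATE SET
        view_count = MAX(view_count, excluded.view_count),
        fetched_at = datetime('now')
");

// The same made-up video arrives three times with different counts
$upsert->execute(['abc123', 'Example video', 100]);
$upsert->execute(['abc123', 'Example video', 80]);  // lower: stored count is kept
$upsert->execute(['abc123', 'Example video', 150]); // higher: stored count is raised

$rows = (int) $db->query('SELECT COUNT(*) FROM videos')->fetchColumn();
$views = (int) $db->query("SELECT view_count FROM videos WHERE video_id = 'abc123'")->fetchColumn();
echo "rows={$rows} view_count={$views}\n";
```

The table ends up with a single row whose count is 150. The same video can therefore appear in several regional charts without creating duplicates.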

Step 4: Stale Content Refresh

function refreshStaleVideos(PDO $db, string $apiKey, int $maxAge): int {
    // Find videos with outdated data
    $stale = $db->query("
        SELECT video_id FROM videos
        WHERE fetched_at < datetime('now', '-{$maxAge} seconds')
        ORDER BY view_count DESC
        LIMIT 50
    ")->fetchAll(PDO::FETCH_COLUMN);

    if (empty($stale)) return 0;

    // Batch fetch updated data (50 IDs per API call = 1 quota unit)
    $ids = implode(',', $stale);
    $url = 'https://www.googleapis.com/youtube/v3/videos?' . http_build_query([
        'part' => 'statistics',
        'id' => $ids,
        'key' => $apiKey,
    ]);

    $response = @file_get_contents($url);
    if ($response === false) return 0;

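    $data = json_decode($response, true);
    $updated = 0;

    // Prepare once and reuse for every item
    $update = $db->prepare("
        UPDATE videos SET
            view_count = ?,
            fetched_at = datetime('now')
        WHERE video_id = ?
    ");

    // Note: IDs missing from the response (e.g. deleted or private videos)
    // keep their old fetched_at and are selected again on the next run
    foreach ($data['items'] ?? [] as $item) {
        $update->execute([
            (int)($item['statistics']['viewCount'] ?? 0),
            $item['id'],
        ]);
        $updated++;
    }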

    return $updated;
}
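
The refresh caps itself at 50 IDs so that it fits in one API call. If a run ever needed to refresh more rows, the IDs could be split into batches of 50, each costing one more call. A minimal sketch, where `buildRefreshUrls()` is a hypothetical helper and the key and video IDs are placeholders:

```php
<?php
// Hypothetical helper: splits video IDs into batches and builds one
// videos.list URL per batch. The batch size of 50 follows the article.
function buildRefreshUrls(array $videoIds, string $apiKey, int $batchSize = 50): array {
    $urls = [];
    foreach (array_chunk($videoIds, $batchSize) as $batch) {
        $urls[] = 'https://www.googleapis.com/youtube/v3/videos?' . http_build_query([
            'part' => 'statistics',
            'id' => implode(',', $batch),
            'key' => $apiKey,
        ]);
    }
    return $urls;
}

// 120 made-up IDs need three requests: 50 + 50 + 20
$ids = array_map(fn (int $i): string => "video{$i}", range(1, 120));
$urls = buildRefreshUrls($ids, 'PLACEHOLDER_KEY');
echo count($urls) . " requests\n";
```

Each extra batch adds one request, so raising the `LIMIT` in the stale query trades quota for freshness in steps of 50 videos.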

Monitoring Output

A typical cron run at dailywatch.video produces:

[2026-02-28 20:20:01] Starting fetch pipeline
STEP 1: Popular videos
  Fetched 200 popular videos
STEP 2: Categories
  Updated 16 categories
STEP 3: Regional trending
  US: 50 videos
  GB: 50 videos
  DE: 50 videos
  FR: 50 videos
  IN: 50 videos
  BR: 50 videos
  AU: 50 videos
  CA: 50 videos
STEP 4: Stale refresh
  Refreshed 48 stale videos
STEP 5: Cleanup
  Removed 127 expired videos
STEP 6: FTS rebuild
  Search index updated
STEP 7: Cache clear
STEP 8: IndexNow
  Submitted 89 URLs
Pipeline complete in 28.34s

The entire pipeline completes in under 30 seconds and uses approximately 12 API quota units per run. At 12 runs per day (every 2 hours), that's 144 units out of a 10,000 daily budget.
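
The budget arithmetic can be written out as a small sketch. `dailyQuotaUsage()` is a hypothetical helper, and the inputs are the figures quoted in this post:

```php
<?php
// Hypothetical helper: projects daily quota use from a cron interval.
function dailyQuotaUsage(int $unitsPerRun, int $intervalHours, int $dailyBudget): array {
    $runsPerDay = intdiv(24, $intervalHours);
    $unitsPerDay = $unitsPerRun * $runsPerDay;
    return [
        'runs_per_day' => $runsPerDay,
        'units_per_day' => $unitsPerDay,
        'percent_of_budget' => round($unitsPerDay / $dailyBudget * 100, 2),
    ];
}

// Figures from the post: ~12 units per run, every 2 hours, 10,000 units per day
$usage = dailyQuotaUsage(unitsPerRun: 12, intervalHours: 2, dailyBudget: 10000);
echo "{$usage['units_per_day']} units/day ({$usage['percent_of_budget']}% of budget)\n";
```

At these numbers the pipeline uses about 1.44% of the daily budget, which leaves room for more regions or a shorter interval.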
