Mufthi Ryanda

Scaling to 300K+ Records Daily: How We Handle High Volume Data Processing with Lumen & MySQL

Building a lean, mean data processing machine that handles 100 I/O operations per second without breaking a sweat

When your application suddenly needs to process hundreds of thousands of records daily, with peak loads hitting 100 I/O operations per second, you quickly learn that standard CRUD operations won't cut it. Here's how we transformed our Lumen application into a high-performance data-processing powerhouse.

The Challenge

Our monitoring system processes 300,000+ records daily, generating complex reports and exports while maintaining sub-second response times. The system handles everything from real-time aggregations to massive CSV exports, all while keeping memory usage under control.

Strategy 1: Database Schema Optimization

JSON Columns with Generated Virtual Columns

Instead of creating multiple tables with complex joins, we leveraged MySQL's JSON capabilities with a twist:

// Migration: Create virtual columns for frequently queried JSON fields
Schema::table('data_xxx', function (Blueprint $table) {
    $table->string('feedback_extracted')->virtualAs(
        "JSON_UNQUOTE(JSON_EXTRACT(content_data, '$.feedback'))"
    )->index();

    $table->decimal('amount_extracted', 15, 2)->virtualAs(
        "CAST(JSON_EXTRACT(content_data, '$.amount') AS DECIMAL(15,2))"
    )->index();
});

Why this works:

  • Virtual columns are computed on the fly but can still be indexed
  • Eliminates the need for complex joins
  • Maintains data flexibility while enabling fast queries (see the example query below)
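
For example, filtering on the extracted value hits the index directly instead of parsing JSON row by row. A quick sketch (the feedback value and amount threshold are made up for illustration):

// Both predicates use the indexes on the virtual columns,
// so MySQL never has to re-parse content_data per row.
$flagged = DB::table('data_xxx')
    ->where('feedback_extracted', 'NEGATIVE')
    ->where('amount_extracted', '>', 1000)
    ->get();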

Strategic Indexing

// Composite indexes for common query patterns
Schema::table('data_xxx', function (Blueprint $table) {
    $table->index(['branch', 'visit_date', 'visit_type']);
    $table->index(['personnel_id', 'visit_date', 'status']);
    $table->index(['visit_type', 'status', 'feedback_extracted']);
});
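
Column order matters here: MySQL walks a composite index left to right, so equality-filtered columns should come before range-filtered ones. A hedged example of a query the first index serves well (the branch code and visit type values are made up):

// Equality on branch plus the range on visit_date resolve against the
// ['branch', 'visit_date', 'visit_type'] index via its leftmost prefix.
$visits = DB::table('data_xxx')
    ->where('branch', 'JAKARTA-01')
    ->whereBetween('visit_date', ['2024-01-01', '2024-01-31'])
    ->where('visit_type', 'ONSITE')
    ->get();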

Strategy 2: Query Optimization Patterns

Avoiding N+1 with Smart Aggregation

Instead of loading relations, we aggregate at the database level:

public function getDataSummary($filters)
{
    return DB::table('data_xxx')
        ->select([
            'branch',
            DB::raw('SUM(CASE WHEN status = "COMPLETED" THEN 1 ELSE 0 END) as completed'),
            DB::raw('SUM(CASE WHEN status = "PLANNED" THEN 1 ELSE 0 END) as planned'),
            DB::raw('AVG(CAST(JSON_EXTRACT(content_data, "$.score") AS DECIMAL)) as avg_score')
        ])
        ->where('visit_date', $filters['date'])
        ->groupBy('branch')
        ->get();
}

Generator-Powered Data Processing

For large datasets, we use PHP generators to maintain constant memory usage:

public function processLargeDataset($filters): \Generator
{
    $query = DB::table('data_xxx')
        ->where('visit_date', '>=', $filters['start_date'])
        ->where('visit_date', '<=', $filters['end_date'])
        ->orderBy('id');

    foreach ($query->lazy(2000) as $record) {
        yield $this->transformRecord($record);
    }
}

// Usage
foreach ($this->processLargeDataset($filters) as $processedRecord) {
    // Memory stays constant regardless of dataset size
    $this->handleRecord($processedRecord);
}

Strategy 3: High-Performance Export System

Memory-Efficient CSV Generation

Our export system handles massive datasets while keeping memory usage under 50MB:

public function exportToCSV($filters): string
{
    // Create temporary file
    $tempFile = tmpfile();
    $tempPath = stream_get_meta_data($tempFile)['uri'];

    // Write headers
    fputcsv($tempFile, ['Date', 'Branch', 'Personnel', 'Customer', 'Result']);

    // Stream data in chunks
    foreach ($this->getExportData($filters) as $record) {
        fputcsv($tempFile, [
            $record['visit_date'],
            $record['branch_name'],
            $record['personnel_name'],
            $record['customer_name'],
            $record['visit_result']
        ]);
    }

    // Flush buffered writes before reading the temp file back
    fflush($tempFile);

    // Upload to storage as a stream
    $finalPath = "exports/data_" . date('Y-m-d_H-i-s') . ".csv";
    Storage::put($finalPath, fopen($tempPath, 'r'));

    fclose($tempFile); // tmpfile() is removed automatically on close
    return $finalPath;
}

private function getExportData($filters): \Generator
{
    $query = DB::table('data_xxx')
        ->select([
            'visit_date', 'branch_name', 'personnel_name', 
            'customer_name', 'visit_result'
        ])
        ->where('visit_date', $filters['date'])
        ->orderBy('id');

    foreach ($query->lazy(2000) as $record) {
        yield (array) $record;
    }
}
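
If the export is requested synchronously and never needs to land in storage, the same generator can feed the HTTP response directly and skip the temporary file. A minimal sketch using Symfony's StreamedResponse (which Lumen's HTTP layer is built on); treat it as an alternative path rather than part of the flow above:

use Symfony\Component\HttpFoundation\StreamedResponse;

public function streamCSV($filters): StreamedResponse
{
    return new StreamedResponse(function () use ($filters) {
        $out = fopen('php://output', 'w');
        fputcsv($out, ['Date', 'Branch', 'Personnel', 'Customer', 'Result']);

        // Rows are written straight to the output buffer, one at a time
        foreach ($this->getExportData($filters) as $record) {
            fputcsv($out, [
                $record['visit_date'],
                $record['branch_name'],
                $record['personnel_name'],
                $record['customer_name'],
                $record['visit_result'],
            ]);
        }

        fclose($out);
    }, 200, [
        'Content-Type' => 'text/csv',
        'Content-Disposition' => 'attachment; filename="data_export.csv"',
    ]);
}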

Background Processing with Chunked Operations

For time-intensive operations, we use job queues with intelligent chunking:

public function processInBackground($requestData)
{
    // Create tracking record
    $exportLog = $this->createExportLog($requestData);

    // Queue the processing job
    Queue::push(new ProcessDataExport($exportLog->id, $requestData));

    return $exportLog;
}

// In the job class
public function handle()
{
    $startTime = microtime(true);

    foreach ($this->getDataInChunks() as $chunk) {
        $this->processChunk($chunk);

        // Prevent memory leaks and timeouts
        if (microtime(true) - $startTime > 300) { // 5 minutes
            Queue::push(new ProcessDataExport($this->logId, $this->remainingData));
            return;
        }
    }

    $this->markAsCompleted();
}
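
getDataInChunks() is glossed over above; one way to implement it is an id-keyed cursor, so a re-queued job can resume where the previous run stopped. A sketch, assuming a lastProcessedId property that is carried over when the job re-queues itself:

// Hypothetical helper: yields fixed-size chunks ordered by id so the
// job can be re-queued and resume from the last processed id.
private function getDataInChunks(int $chunkSize = 2000): \Generator
{
    $lastId = $this->lastProcessedId ?? 0;

    do {
        $chunk = DB::table('data_xxx')
            ->where('id', '>', $lastId)
            ->orderBy('id')
            ->limit($chunkSize)
            ->get();

        if ($chunk->isNotEmpty()) {
            $lastId = $chunk->last()->id;
            $this->lastProcessedId = $lastId; // carried over on re-queue
            yield $chunk;
        }
    } while ($chunk->count() === $chunkSize);
}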

Strategy 4: Caching and Optimization

Smart Cache Invalidation

public function getCachedSummary($filters)
{
    $cacheKey = 'summary_' . md5(serialize($filters));

    // For today's data, cache for 30 minutes
    // For historical data, cache for 24 hours
    $ttl = $filters['date'] === date('Y-m-d') ? 1800 : 86400;

    return Cache::remember($cacheKey, $ttl, function () use ($filters) {
        return $this->generateSummary($filters);
    });
}
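
The TTLs above handle staleness passively; the "invalidation" half can be as simple as a version stamp baked into the key and bumped whenever bulk writes land. A sketch rather than the original implementation (the summary_version key and the invalidateSummaries() hook are assumptions):

// Version-stamped keys: bumping the version makes every stale summary key
// unreachable without enumerating or deleting individual entries.
public function getCachedSummaryVersioned($filters)
{
    $version  = Cache::get('summary_version', 1);
    $cacheKey = "summary_v{$version}_" . md5(serialize($filters));

    $ttl = $filters['date'] === date('Y-m-d') ? 1800 : 86400;

    return Cache::remember($cacheKey, $ttl, function () use ($filters) {
        return $this->generateSummary($filters);
    });
}

public function invalidateSummaries(): void
{
    // Call after bulk imports or status updates
    Cache::forever('summary_version', Cache::get('summary_version', 1) + 1);
}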

Performance Results

Before optimization:

  • Memory usage: 500MB+ for large exports
  • Export time: 5+ minutes for 100K records
  • Database CPU: 80%+ during peak hours

After optimization:

  • Memory usage: <50MB consistently
  • Export time: 30 seconds for 100K records
  • Database CPU: <30% during peak hours
  • Response time: <200ms for most queries

Key Takeaways

  1. JSON columns + virtual indexes eliminate complex joins while maintaining query performance
  2. PHP generators keep memory usage constant regardless of dataset size
  3. Strategic chunking prevents timeouts and resource exhaustion
  4. Proper indexing strategy is crucial for high volume operations
  5. Stream processing beats loading everything into memory

The beauty of this approach is its simplicity: no complex technology, no exotic databases, just well-optimized PHP and MySQL doing what they do best.
