Building a lean, mean data processing machine that handles 100 I/O operations per second without breaking a sweat
When your application suddenly needs to process hundreds of thousands of records daily, with peak loads hitting 100 I/O operations per second, you quickly learn that standard CRUD operations won't cut it. Here's how we transformed our Lumen application into a high-performance data-processing powerhouse.
The Challenge
Our monitoring system processes 300,000+ data records daily, generating complex reports and exports while maintaining sub-second response times. The system handles everything from real-time aggregations to massive CSV exports, all while keeping memory usage under control.
Strategy 1: Database Schema Optimization
JSON Columns with Generated Virtual Columns
Instead of creating multiple tables with complex joins, we leveraged MySQL's JSON capabilities with a twist:
// Migration: Create virtual columns for frequently queried JSON fields
Schema::table('data_xxx', function (Blueprint $table) {
    $table->string('feedback_extracted')->virtualAs(
        "JSON_UNQUOTE(JSON_EXTRACT(content_data, '$.feedback'))"
    )->index();

    $table->decimal('amount_extracted', 15, 2)->virtualAs(
        "CAST(JSON_EXTRACT(content_data, '$.amount') AS DECIMAL(15,2))"
    )->index();
});
Why this works:
- Virtual columns are computed on the fly but can be indexed
- Eliminates need for complex joins
- Maintains data flexibility while enabling fast queries (see the illustrative query below)
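As a quick illustration (the column values below are made up, not from our codebase), queries can filter on the extracted JSON fields through the indexed virtual columns instead of calling JSON_EXTRACT in the WHERE clause:

// Illustrative query: the filters reference the indexed virtual columns directly,
// so the optimizer can use those indexes instead of evaluating JSON_EXTRACT per row.
$positiveHighValue = DB::table('data_xxx')
    ->where('feedback_extracted', 'POSITIVE')   // hypothetical feedback value
    ->where('amount_extracted', '>=', 1000)
    ->get();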
Strategic Indexing
// Composite indexes for common query patterns
Schema::table('data_xxx', function (Blueprint $table) {
    $table->index(['branch', 'visit_date', 'visit_type']);
    $table->index(['personnel_id', 'visit_date', 'status']);
    $table->index(['visit_type', 'status', 'feedback_extracted']);
});
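To benefit from a composite index, the filters should follow its leftmost-prefix order. Here's a sketch of a query shape that uses the first index end to end (the column values are hypothetical):

// Equality filters on branch, visit_date and visit_type follow the
// ['branch', 'visit_date', 'visit_type'] column order, so the whole
// lookup can be resolved from that composite index.
$visits = DB::table('data_xxx')
    ->where('branch', 'BRANCH-01')        // hypothetical branch code
    ->where('visit_date', '2024-01-15')   // hypothetical date
    ->where('visit_type', 'FOLLOW_UP')    // hypothetical visit type
    ->get();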
Strategy 2: Query Optimization Patterns
Avoiding N+1 with Smart Aggregation
Instead of loading relations, we aggregate at the database level:
public function getDataSummary($filters)
{
    return DB::table('data_xxx')
        ->select([
            'branch',
            DB::raw('SUM(CASE WHEN status = "COMPLETED" THEN 1 ELSE 0 END) as completed'),
            DB::raw('SUM(CASE WHEN status = "PLANNED" THEN 1 ELSE 0 END) as planned'),
            DB::raw('AVG(CAST(JSON_EXTRACT(content_data, "$.score") AS DECIMAL)) as avg_score')
        ])
        ->where('visit_date', $filters['date'])
        ->groupBy('branch')
        ->get();
}
Generator-Powered Data Processing
For large datasets, we use PHP generators to maintain constant memory usage:
public function processLargeDataset($filters): \Generator
{
    $query = DB::table('data_xxx')
        ->where('visit_date', '>=', $filters['start_date'])
        ->where('visit_date', '<=', $filters['end_date'])
        ->orderBy('id');

    foreach ($query->lazy(2000) as $record) {
        yield $this->transformRecord($record);
    }
}

// Usage
foreach ($this->processLargeDataset($filters) as $processedRecord) {
    // Memory stays constant regardless of dataset size
    $this->handleRecord($processedRecord);
}
Strategy 3: High-Performance Export System
Memory-Efficient CSV Generation
Our export system handles massive datasets while keeping memory usage under 50MB:
public function exportToCSV($filters): string
{
    // Create temporary file (removed automatically when closed)
    $tempFile = tmpfile();
    $tempPath = stream_get_meta_data($tempFile)['uri'];

    // Write headers
    fputcsv($tempFile, ['Date', 'Branch', 'Personnel', 'Customer', 'Result']);

    // Stream data in chunks
    foreach ($this->getExportData($filters) as $record) {
        fputcsv($tempFile, [
            $record['visit_date'],
            $record['branch_name'],
            $record['personnel_name'],
            $record['customer_name'],
            $record['visit_result']
        ]);
    }

    // Upload to storage as a stream, then release both handles
    $finalPath = "exports/data_" . date('Y-m-d_H-i-s') . ".csv";
    $readStream = fopen($tempPath, 'r');
    Storage::put($finalPath, $readStream);
    fclose($readStream);
    fclose($tempFile);

    return $finalPath;
}
private function getExportData($filters): \Generator
{
    $query = DB::table('data_xxx')
        ->select([
            'visit_date', 'branch_name', 'personnel_name',
            'customer_name', 'visit_result'
        ])
        ->where('visit_date', $filters['date'])
        ->orderBy('id');

    foreach ($query->lazy(2000) as $record) {
        yield (array) $record;
    }
}
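Serving the finished file follows the same streaming idea. The controller action below is only a minimal sketch (not part of the original export service), assuming the Storage facade is enabled in Lumen and the export already exists on the configured disk:

use Illuminate\Support\Facades\Storage;
use Symfony\Component\HttpFoundation\StreamedResponse;

// Hypothetical download endpoint: copy the stored CSV to the client as a stream
// instead of loading the whole file into memory first.
public function download(string $path): StreamedResponse
{
    return new StreamedResponse(function () use ($path) {
        $stream = Storage::readStream($path); // open the export as a read stream
        fpassthru($stream);                   // pipe it straight to the output buffer
        fclose($stream);
    }, 200, [
        'Content-Type'        => 'text/csv',
        'Content-Disposition' => 'attachment; filename="export.csv"',
    ]);
}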
Background Processing with Chunked Operations
For time-intensive operations, we use job queues with intelligent chunking:
public function processInBackground($requestData)
{
    // Create tracking record
    $exportLog = $this->createExportLog($requestData);

    // Queue the processing job
    Queue::push(new ProcessDataExport($exportLog->id, $requestData));

    return $exportLog;
}
// In the job class
public function handle()
{
    $startTime = microtime(true);

    foreach ($this->getDataInChunks() as $chunk) {
        $this->processChunk($chunk);

        // Prevent memory leaks and timeouts
        if (microtime(true) - $startTime > 300) { // 5 minutes
            Queue::push(new ProcessDataExport($this->logId, $this->remainingData));
            return;
        }
    }

    $this->markAsCompleted();
}
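The post doesn't show getDataInChunks(), so here is one possible shape for it: keyset pagination on the primary key, with the job remembering its progress so a re-dispatched instance can resume where the previous run stopped. The $this->lastId property and the chunk size are assumptions, not part of the original job.

// Assumed helper: pages through data_xxx by primary key, recording progress
// on the job so a re-dispatched instance can pick up at the same point.
private function getDataInChunks(int $chunkSize = 2000): \Generator
{
    do {
        $chunk = DB::table('data_xxx')
            ->where('id', '>', $this->lastId)   // resume after the last processed row
            ->orderBy('id')
            ->limit($chunkSize)
            ->get();

        if ($chunk->isNotEmpty()) {
            $this->lastId = $chunk->last()->id; // record progress before yielding
            yield $chunk;
        }
    } while ($chunk->count() === $chunkSize);
}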
Strategy 4: Caching and Optimization
Smart Cache Invalidation
public function getCachedSummary($filters)
{
    $cacheKey = 'summary_' . md5(serialize($filters));

    // For today's data, cache for 30 minutes;
    // for historical data, cache for 24 hours
    $ttl = $filters['date'] === date('Y-m-d') ? 1800 : 86400;

    return Cache::remember($cacheKey, $ttl, function () use ($filters) {
        return $this->generateSummary($filters);
    });
}
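The TTLs above bound staleness, but when a write should be reflected immediately, the matching summary key can also be dropped explicitly. A minimal sketch, assuming the writing code knows the same filter set used to build the key (this helper is not in the original post):

// Assumed companion to getCachedSummary(): drop the cached summary affected
// by a write so the next read rebuilds it from the database.
public function invalidateSummary(array $filters): void
{
    Cache::forget('summary_' . md5(serialize($filters)));
}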
Performance Results
Before optimization:
- Memory usage: 500MB+ for large exports
- Export time: 5+ minutes for 100K records
- Database CPU: 80%+ during peak hours
After optimization:
- Memory usage: <50MB consistently
- Export time: 30 seconds for 100K records
- Database CPU: <30% during peak hours
- Response time: <200ms for most queries
Key Takeaways
- JSON columns + virtual indexes eliminate complex joins while maintaining query performance
- PHP generators keep memory usage constant regardless of dataset size
- Strategic chunking prevents timeouts and resource exhaustion
- Proper indexing strategy is crucial for high-volume operations
- Stream processing beats loading everything into memory
The beauty of this approach is its simplicity: no complex technology, no exotic databases, just well-optimized PHP and MySQL doing what they do best.