DEV Community

Discussion on: Processing One Billion Rows in PHP!

Gunabalan.S

I ran a plain PHP script on a WAMP environment with 8 GB of RAM,
and it processed approximately 10 million records without running into problems.

<?php
/**
 * The test.csv file contains a small data set for verifying the accuracy of the calculations.
 * 
 * karaikal;1
 * karaikal;2
 * karaikal;2
 * karaikal;3
 * Mumbai;1
 * Mumbai;2
 * Mumbai;3
 * Mumbai;4
 * Mumbai;4
 * 
 * Then switch to the actual dataset: weather_stations.csv
 */
$seconds = time(); //time stamp

//load file
//$fileHandle = fopen('./weather_stations.csv', 'r');
$fileHandle = fopen('./test.csv', 'r');
if ($fileHandle === false) {
    exit("Unable to open the input file\n");
}

$data = [];

$index = 0;
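while (($line = fgets($fileHandle)) !== false) {
    $values = explode(";", $line);
    if (count($values) < 2) {
        continue; // skip blank or malformed lines
    }
    $key = trim($values[0]);
    $value = trim($values[1]);
    // m1: comma-separated list of every value seen for this key
    $data[$key]['m1'] = isset($data[$key]['m1']) ? ($data[$key]['m1'] . ',' . $value) : $value;
    // m: how often each value occurs for this key (used for the mode)
    $data[$key]['m'][$value] = ($data[$key]['m'][$value] ?? 0) + 1;
    $index++;
}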

krsort($data);

foreach ($data as $key => $val) {
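    $dataPerCity = explode(',', $val['m1']);
    // The median is only valid on sorted values; the input order is not guaranteed
    sort($dataPerCity, SORT_NUMERIC);
    $count = count($dataPerCity);
    $sum = array_sum($dataPerCity);
    $middle = intdiv($count - 1, 2);

    // Mean (Average)
    $mean = $sum / $count;

    // Median: average the two middle values when the count is even
    $median = ($count % 2 == 0) ? ($dataPerCity[$middle] + $dataPerCity[$middle + 1]) / 2 : $dataPerCity[$middle];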

    // Mode
    $mode = array_search(max($val['m']), $val['m']);

    // Output results
    echo "$key / Mean: $mean Median: $median Mode: $mode\n";
}

fclose($fileHandle);

$seconds2 = time();
$elapsedTime = $seconds2 - $seconds;
echo "Elapsed time in seconds: $elapsedTime\n";
echo "Records processed : $index\n";
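
The script keeps every value per city in the comma-separated 'm1' string, so memory use grows with the number of rows, which is probably part of why it tops out around 10 million records. Since the frequency map in 'm' already holds the same information, the mean, median and mode can be derived from it alone. Below is a minimal sketch of that idea, not a benchmarked replacement. The function names buildHistograms and statsFromHistogram are hypothetical, and it assumes each value is always written the same way (for example "1" and "1.0" would be counted as different values).

```php
<?php
/**
 * Build a per-city histogram (value => occurrences) from "city;value" lines.
 * Hypothetical helper for illustration; accepts any iterable of lines.
 */
function buildHistograms(iterable $lines): array
{
    $histograms = [];
    foreach ($lines as $line) {
        $parts = explode(';', trim($line), 2);
        if (count($parts) !== 2) {
            continue; // skip blank or malformed lines
        }
        $city = trim($parts[0]);
        $value = trim($parts[1]);
        if ($value === '') {
            continue;
        }
        $histograms[$city][$value] = ($histograms[$city][$value] ?? 0) + 1;
    }
    return $histograms;
}

/**
 * Derive mean, median and mode from a single value => occurrences histogram.
 */
function statsFromHistogram(array $histogram): array
{
    if ($histogram === []) {
        throw new InvalidArgumentException('Histogram must not be empty');
    }

    // Order the distinct values numerically so cumulative counts make sense
    ksort($histogram, SORT_NUMERIC);

    $count = array_sum($histogram);
    $sum = 0.0;
    foreach ($histogram as $value => $n) {
        $sum += (float) $value * $n;
    }

    // Zero-based positions of the middle element(s); equal when $count is odd
    $lowPos = intdiv($count - 1, 2);
    $highPos = intdiv($count, 2);
    $seen = 0;
    $lowVal = null;
    $highVal = null;
    foreach ($histogram as $value => $n) {
        $seen += $n;
        if ($lowVal === null && $seen > $lowPos) {
            $lowVal = (float) $value;
        }
        if ($seen > $highPos) {
            $highVal = (float) $value;
            break;
        }
    }

    // Mode: on a tie, the smallest value wins because of the sort above
    $mode = (float) array_search(max($histogram), $histogram);

    return [
        'mean' => $sum / $count,
        'median' => ($lowVal + $highVal) / 2,
        'mode' => $mode,
    ];
}

// Usage with the sample rows from test.csv
$sample = [
    "karaikal;1", "karaikal;2", "karaikal;2", "karaikal;3",
    "Mumbai;1", "Mumbai;2", "Mumbai;3", "Mumbai;4", "Mumbai;4",
];
foreach (buildHistograms($sample) as $city => $histogram) {
    $s = statsFromHistogram($histogram);
    echo "$city / Mean: {$s['mean']} Median: {$s['median']} Mode: {$s['mode']}\n";
}
```

To run this against a file, the lines can be fed from a generator that wraps fgets(), so only the histograms stay in memory. How much this saves depends on how many distinct values each city has.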