DEV Community

Discussion on: Processing One Billion Rows in PHP!

Eckehard

Is there a test file to download? As far as I understand, the source data can be created with a script, but for a first test a simple file would be easier.

Florian Engelhardt

Hey there,
I created a repo with the source code for PHP at github.com/realFlowControl/1brc
It also includes a PHP script to generate measurements.txt:

php createMeasurements.php 10000 # to create a file with 10k entries

Hope this helps!

Eckehard

Hi Florian,

Works great! The timing is reported in ms, but it seems to be seconds.

It took about 160 s to write a 132 MB file to my local drive, but the time to read the file will depend heavily on the file system, caching and the reading strategy. Depending on your system, reading will be much faster the second time you open the file.

Notepad++ opens the file without any delay, since it only reads the part that is displayed. The standard editor takes about one minute to do the same, because it reads the whole file at once. We can do the same by reading the file in chunks. Since file operations are handled in the background by the hard drive, these operations usually do not put much strain on the processor.
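
To make the chunked-reading idea concrete, here is a minimal sketch in PHP. It is not taken from Florian's repo; the function name countLinesChunked and the 1 MiB default chunk size are my own assumptions. It only counts lines, so it shows the reading strategy, not the actual aggregation.

```php
<?php
// Hypothetical helper (not part of the 1brc repo): read a file in
// fixed-size chunks so that only one chunk is held in memory at a time,
// and count the lines along the way.
function countLinesChunked(string $path, int $chunkSize = 1 << 20): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = 0;
    $lastByte = "\n"; // treat an empty file as having zero lines

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $lines += substr_count($chunk, "\n");
        $lastByte = $chunk[strlen($chunk) - 1];
    }
    fclose($handle);

    // A final line without a trailing newline still counts as a line.
    return $lastByte === "\n" ? $lines : $lines + 1;
}
```

Calling countLinesChunked('measurements.txt') twice in a row and timing each call with hrtime(true) should show the caching effect mentioned above, although how large the difference is will depend on the system.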

So if we do that, we get a nice contest on file operations, but since most high-level languages give you little control over the details, what does the contest really measure?