
Discussion on: Processing One Billion Rows in PHP!

Eckehard

Interesting task. However, the dataset is not very realistic: usually you would have a timestamp for each reading, so you would need to process the timestamps too.

In a real-life case, you would try to avoid this kind of "brute force" reading:

  • Put your data in a time series database such as InfluxDB.
  • Use separate tables for separate weather stations to keep the amount of data to be processed smaller.
  • Use some "pre-aggregation": if you know the average for each hour, you can calculate a yearly average from 8,760 values rather than from every raw reading (see the sketch after this list).
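As a rough illustration of the pre-aggregation idea, here is a minimal PHP sketch, assuming a hypothetical MySQL `readings(station, recorded_at, temperature)` table (all table, column, and connection names are invented for the example):

```php
<?php
// Minimal pre-aggregation sketch. The `readings` table and the
// connection details are hypothetical placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=weather', 'user', 'pass');

// Step 1: roll raw readings up to one row per station per hour.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS hourly_averages AS
    SELECT station,
           DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00') AS hour,
           AVG(temperature) AS avg_temp
    FROM readings
    GROUP BY station, hour
");

// Step 2: a yearly average now reads ~8,760 rows per station
// instead of every raw reading. Note this weights each hour
// equally, which is exactly the approach described above.
$stmt = $pdo->prepare("
    SELECT station, AVG(avg_temp) AS yearly_avg
    FROM hourly_averages
    WHERE hour >= :start AND hour < :end
    GROUP BY station
");
$stmt->execute([':start' => '2024-01-01', ':end' => '2025-01-01']);

foreach ($stmt as $row) {
    echo "{$row['station']}: {$row['yearly_avg']}\n";
}
```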
Jack • Edited

This is the right answer, and even simple business metrics get segmented like this. The actual "solutions" you end up needing in the real world come down to efficient queries and table structures working in concert with caching. Intelligent database design is what you need to process billions of rows... lowly SQL. It also confuses me when people compare how fast a language is at performing tasks which, in real life, are going to be handled by the database. I appreciate the thought exercise, but we end up arguing about which orange soda tastes best at the apple pie fair.
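To make the "let the database handle it" point concrete, here is a minimal sketch against the same hypothetical `readings` table as above: the brute-force version streams every row into PHP, while the aggregate query does the whole pass inside the database.

```php
<?php
// Same hypothetical `readings(station, recorded_at, temperature)` table.
$pdo = new PDO('mysql:host=localhost;dbname=weather', 'user', 'pass');

// Brute force: pull every row into PHP and aggregate by hand.
$sums = [];
$counts = [];
foreach ($pdo->query("SELECT station, temperature FROM readings") as $row) {
    $sums[$row['station']]   = ($sums[$row['station']] ?? 0) + $row['temperature'];
    $counts[$row['station']] = ($counts[$row['station']] ?? 0) + 1;
}

// Letting the database do it: one aggregate query, no per-row PHP work.
$stmt = $pdo->query("
    SELECT station,
           MIN(temperature) AS min_t,
           AVG(temperature) AS avg_t,
           MAX(temperature) AS max_t
    FROM readings
    GROUP BY station
");
foreach ($stmt as $row) {
    printf("%s min=%.1f avg=%.1f max=%.1f\n",
        $row['station'], $row['min_t'], $row['avg_t'], $row['max_t']);
}
```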