Good catch Egon!
I had to try it and see it with my own eyes. I find this fascinating because it contradicts my mental model of a memory bus that reaches very high throughput only when the fetched data is truly contiguous. Data-oriented is still the fastest in my benchmark, but I'm now taking my data-oriented insight with a grain of salt. Thanks for bringing up the pointer in the loop!
I'm even more surprised because in unrelated experiments, I had never seen a compelling argument, based on performance numbers, for choosing between passing a small/medium object around by value and passing it by pointer. In the Ants case, we see that passing by pointer can provide a significant speedup.
Now I'm curious because I still think that fetching from memory is the bottleneck, and that everything else (CPU, L1 cache reads, etc.) should be negligible. This would mean (maybe) that the pointer approach lets us retrieve fewer bytes from memory by not fetching the full Ant instance, while still achieving high throughput when retrieving non-contiguous bytes. I would need to do a bit more research to figure this out.