From my (further) limited understanding on the topic, cache misses occur when the processors cannot predict what you will access next. For example, if I'm writing a game engine and I use a map of entity IDs to entities, then iterate through the keys and access the values, the processor cannot prefetch the next entity because it's not necessarily contiguous in memory. However, if I used an array of entities and mapped their position to their ID separately (or better yet just kept their ID on them and used the array on its own), then I could iterate through the entity array and the processor can predict what entity comes next and load it into the cache.
This is the kind of stuff that I'd like to learn more about, but it seems like the kind of thing that is rarely stated anywhere, and seems so platform specific. I bet a person could learn a lot more about it by writing a few benchmarks for the specific target platform, but it's rare that we get to hear much about results from little experiments like that.