A Deep Dive into Arrays – Part 1: What, Why & How

#computerscience #programming #datastructures

An array is a data structure that holds multiple values of the same data type in contiguous memory locations.
Let’s break this down:

What’s a Data Structure? A data structure is a specific way of storing and accessing data in a computer so it can be used efficiently. Common operations: Read, Write, Update, Delete.
We choose different data structures based on our needs. Some examples:
- Arrays (specifically 2D arrays or matrices) are used to represent pixels in an image. Since array elements are stored in contiguous memory, accessing them is very fast — something that's critical in graphics rendering.
- Trees are a natural choice for implementing file systems. Folders nested within folders can be best modeled as trees, with the root folder at the top.

Arrays are optimized for locality of reference — a key concept in making memory access fast.

This refers to the tendency of programs to access:

These patterns are utilized by:

The CPU: When the CPU fetches data from memory, it also fetches nearby memory locations, anticipating future access.
Programmers: Take an image processing task where pixels are stored in a matrix (2D array). In most programming languages, matrices are stored in row-major order, meaning all columns of row 1 are stored first, then row 2, and so on.
So consider this code:
```
        for col in range(N):
            for row in range(N):
                sum += matrix[row][col]
```
This accesses data column-first, skipping memory in the way it’s laid out, leading to cache misses. A better approach would be to access data row-first.

Array elements are stored contiguously (spatial locality).
When you access array[0], the CPU preloads array[1], array[2], etc., into the CPU cache.
This makes index-based access blazing fast. Hence, arrays are ideal when:
You need to store large amounts of data of the same type
You need to perform common operations across all elements, like scaling pixels in an image.