The complexity table for arrays and linked lists is the most memorised, least understood reference in interview prep. Arrays give O(1) access. Linked lists give O(1) insertion. That's the answer on every flashcard. The table doesn't lie. It's incomplete. The missing piece, once you have it, makes every entry in that table feel obvious.
TL;DR:
- Arrays and linked lists differ for one reason: contiguous memory vs scattered memory.
- Every entry in the complexity table follows from that single fact.
- Two O(n) operations aren't equal in practice. A million-element array scan can often run an order of magnitude faster than the equivalent linked list traversal because of cache locality.
- Linked lists win when your algorithm already holds a pointer to the mutation point. They lose when you have to traverse to find it first.
- The right question on any problem isn't "which has better Big-O?" It's "which operation does this algorithm perform most?"
What the table tells you, and where it stops
The standard table has six or seven rows: access by index, search, insert at start, insert at end, insert at known position, delete at known position, memory overhead. Each row has two cells. Most engineers memorise the cells and stop there.
The table is correct. Arrays really do give you O(1) index access. Linked lists really do give you O(1) insertion at a known node. The cells aren't wrong. They describe outcomes without explaining why those outcomes happen, which means you can't generalise to a problem you haven't seen.
Memorising complexity tables teaches conclusions. Understanding memory layout teaches reasoning.
If your interviewer asks "which would you use here?" on an unfamiliar problem, a memorised table doesn't help. You need the mechanism underneath.
Memory layout is the root cause
Both data structures store the same thing: a sequence of values. They differ in exactly one way: where those values live.
An array is a single block of contiguous memory. Element 0 sits at the base address. Element 1 sits at base + element_size. Element i sits at base + (i * element_size). The address of any element is a single arithmetic expression. That arithmetic is what makes index access O(1). It's also what makes insertion in the middle O(n): to make room at index 2 of a 5 element array, every element from index 2 onward has to shift right.
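To make the shift concrete, here's a rough Python sketch of what a dynamic array's insert has to do. Real implementations do this in C with a memmove, but the cost shape is the same; the function name and sample values here are purely illustrative.

```python
def array_insert(arr, index, value):
    """Roughly what list.insert(index, value) does under the hood."""
    arr.append(None)                    # grow by one slot; may trigger a resize
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]             # shift every later element right: O(n)
    arr[index] = value                  # the slot is now free

nums = [10, 20, 30, 40, 50]
array_insert(nums, 2, 25)
print(nums)   # [10, 20, 25, 30, 40, 50]
```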
A linked list is a chain of nodes that can live anywhere in memory. Node 0 might be at address 1000. Node 1 at address 7500. Node 2 at address 3200. The only way to find node 2 is to start at node 0 and follow the next pointers. That traversal is what makes index access O(n). It's also why insertion at a known node is O(1): you allocate a new node, point it at the next node, and update the previous node's pointer. Two pointer assignments. No shifting.
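A minimal sketch of that splice, using an illustrative Node class rather than any particular library's:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    # Two pointer assignments, no shifting: O(1) once you already hold `node`.
    node.next = Node(value, node.next)

# head -> 10 -> 30; splice 20 in after the head without touching anything else
head = Node(10, Node(30))
insert_after(head, 20)
```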
Every entry in the complexity table is a consequence of these two facts. The math gives you the cells. The layout tells you why.
When the same Big-O isn't the same: cache locality
This is what the table can't show. Both arrays and linked lists are technically O(n) for a full traversal. In practice, an array traversal of a million integers can often run an order of magnitude faster than the equivalent linked list traversal. Same Big-O. Very different wall clock time.
The reason is the CPU cache.
When the CPU reads element 0 from an array, it doesn't fetch just that one integer. It loads an entire cache line, typically 64 bytes, into the L1 cache. For a 4 byte integer array, that single fetch pulls in the next 15 elements alongside arr[0]. Iterating through them costs nanoseconds per access because they're already on chip. The CPU prefetcher notices the access pattern and starts loading the next cache line before you ask for it.
Linked list nodes are scattered across the heap. Each node.next dereference points at an unrelated address. Each pointer chase is a potential cache miss, and a cache miss is roughly a 100 nanosecond round trip to main memory. Multiply that by a million nodes and the gap is enormous.
```python
# Both loops are O(n). The constants differ by ~10x in practice.

# Array: cache-friendly contiguous reads
total = 0
for x in arr:
    # next 15 ints already in L1 after the first read
    total += x

# Linked list: cache-hostile pointer chases
total = 0
node = head
while node:
    total += node.value
    # node.next probably triggers a cache miss
    node = node.next
```
I keep noticing this is what surprises engineers most when they first see it. They've quoted O(n) for traversal in both cases for years. The constant factor never came up because Big-O hides it.
This is the reason ArrayList in Java and Python's built-in list are array-backed. You're paying the occasional O(n) resize cost in exchange for cache-friendly iteration on every operation between resizes. That trade is almost always worth it on real workloads, even ones that involve frequent appends.
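You can watch that over-allocation from Python itself. A small sketch, with the caveat that the exact byte counts vary by CPython version and platform:

```python
import sys

# CPython's list over-allocates: most appends land in already-reserved
# contiguous memory, and the backing array only grows occasionally.
lst, prev = [], 0
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)   # list object plus its backing array
    if size != prev:
        print(f"len={len(lst):>2}  allocated={size} bytes")
        prev = size
```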
If you mention this in an interview, you've shown that you understand the hardware level, not just the math. Two candidates can both quote O(n) for traversal. Only one knows that one of those O(n)s is dramatically slower in practice.
Big-O tells you how growth behaves. Hardware tells you what actually feels fast.
When linked lists actually win
Honest credit where it's due: linked lists are the right structure in a narrow set of cases. Don't read the cache argument as "always pick arrays."
- Mutation at a known position: if your algorithm already holds a pointer to the node where you're inserting or deleting, linked lists are O(1) and arrays are O(n). The keyword is "already holds." The standard LeetCode example is LRU Cache, where a hash map gives you O(1) access to the relevant node and a doubly linked list gives you O(1) reordering on every access. Neither structure alone solves the problem (see the sketch after this list).
- Pointer-only manipulation: reversing a segment, splitting a list at a node, merging two sorted sequences. None of these require moving data; they're pure pointer reassignment. The singly linked list course on my own site covers seven distinct patterns built on this property.
- No reallocation cost: dynamic arrays double their capacity when full, which involves copying every element. Linked lists grow one node at a time. For genuinely unbounded streaming workloads with random insertion, the linked list's amortised cost can come out ahead.
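Here's a condensed sketch of that hash-map-plus-doubly-linked-list combination. Class and method names are illustrative, not from any standard library; the point is that every get and put only ever splices nodes it already holds pointers to.

```python
class DNode:
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                               # key -> node: O(1) access
        self.head, self.tail = DNode(), DNode()     # sentinels: most / least recent ends
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        # O(1) removal because we already hold the node
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        # O(1) insertion right after the head sentinel
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._unlink(node)                          # move to front = two splices
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._unlink(self.map.pop(key))
        node = DNode(key, value)
        self.map[key] = node
        self._push_front(node)
        if len(self.map) > self.capacity:
            lru = self.tail.prev                    # least recently used node
            self._unlink(lru)
            del self.map[lru.key]

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touching "a" makes "b" the least recently used
cache.put("c", 3)   # evicts "b", not "a"
```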
The most frequent mistake I watch beginners make is reaching for a linked list the moment they see "insert" or "delete" in a problem, without checking whether they already hold a pointer to the position. Without that pointer, you traverse first (O(n)) and then insert (O(1)). The traversal eats the advantage. Total cost is O(n), same as the array shift, with worse constants because of cache misses.
An O(1) operation that requires O(n) setup is still O(n).
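To see the trap in code: a hypothetical insert-by-index for a singly linked list, reusing the Node class from the earlier sketch (assume index >= 1 for brevity):

```python
def insert_at_index(head, index, value):
    node = head
    for _ in range(index - 1):          # the hidden cost: O(n) traversal to find the spot
        node = node.next
    node.next = Node(value, node.next)  # the famous O(1) insert, after O(n) setup
```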
How to choose without memorising the table
Once memory layout is your mental model, the choice becomes a small set of questions you can ask on any problem:
- Does the algorithm need random access by index, or binary search? Pick an array. Linked lists can't do O(log n) binary search because finding the middle requires traversal.
- Does the algorithm scan the whole structure repeatedly, where speed matters? Pick an array. Cache locality is doing real work in your favour.
- Does the algorithm hold direct pointers to the nodes it mutates (head, tail, or nodes reached directly through a hash map)? A linked list might genuinely win.
- Does the algorithm reverse, split, or merge segments by relinking? Linked lists are the natural fit.
- Does the algorithm need both random access and frequent reordering at known positions? You probably want both: a hash map of keys to linked list nodes. LRU Cache is the canonical example.
Almost every interview problem involving a sequence of mutations is either an array problem (because access dominates) or a hash map plus linked list problem (because both operations matter). Pure linked list only problems are surprisingly rare in interview banks. They mostly show up when the problem itself is testing whether you can do pointer manipulation cleanly.
An aside on the table itself: the complexity tables you see everywhere aren't useless, even if they're incomplete. They're a fast lookup once you know what's underneath. The point isn't to throw the table out. It's to stop treating it as the answer when it's the symptom.
If Big-O tables still feel like facts you memorise rather than intuitions you can derive, I wrote a longer breakdown covering:
- stacks and queues from the same memory model
- adjacency lists vs matrices
- and why some O(n) operations feel dramatically faster in practice
What's a problem where you reached for a linked list and only realised mid implementation that an array would have been cleaner?