Hash maps dominate autocomplete implementations, but that's not the full story
Most production autocomplete systems use hash maps. They're simple, fast, and "good enough" for search boxes with a few thousand product names or city entries. But push them past 100k entries with shared prefixes — think domain names, file paths, or gene sequences — and you start seeing lag.
The classic computer science answer is "use a trie" (prefix tree). The reality? Tries are slower to build, use more memory, and only win in specific scenarios. I tested both on a 500k-word dictionary to see where the crossover happens.
The mental model: what each structure actually does
A hash map stores complete strings as keys. When you type "pyth", it doesn't know that "python" and "pythonic" share a prefix — it just checks whether "pyth" exists as a key (it doesn't) and returns nothing. To support autocomplete, you have to scan ALL keys looking for matches. That's $O(n \cdot k)$ where $n$ is the dictionary size and $k$ is the average word length, since each `startswith()` comparison can cost up to $k$ character checks.
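A minimal sketch of that brute-force scan (function and variable names are illustrative, not from a specific library):

```python
def autocomplete_hashmap(words: dict, prefix: str, limit: int = 10) -> list:
    """Scan every key: O(n * k) work no matter how long the prefix is."""
    matches = []
    for word in words:               # n iterations over all keys
        if word.startswith(prefix):  # up to k character comparisons each
            matches.append(word)
            if len(matches) == limit:
                break
    return matches

dictionary = {w: True for w in ["python", "pythonic", "pyramid", "java"]}
print(autocomplete_hashmap(dictionary, "pyth"))  # → ['python', 'pythonic']
```

Note that the loop touches every key even when the prefix matches almost nothing — that full scan is exactly what starts to hurt past ~100k entries.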
A prefix tree (trie) stores strings character-by-character in a tree structure. Each node represents one character. The path from root to a node spells out a prefix. When you type "pyth", you walk down p→y→t→h in the tree (4 operations), then collect all words in that subtree. That's $O(p + m)$ where $p$ is the prefix length and $m$ is the number of matches.
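Here's a sketch of that structure — the class and method names are my own, assuming a plain dict-of-children node layout:

```python
class TrieNode:
    """One node per character; children maps the next character to its node."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True  # mark the end of a complete word

    def autocomplete(self, prefix: str) -> list:
        # Walk down the prefix: O(p) node hops
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []  # no word starts with this prefix
            node = node.children[ch]
        # Collect every word in the subtree below: O(m) in the match count
        results = []

        def collect(n: TrieNode, path: str) -> None:
            if n.is_word:
                results.append(prefix + path)
            for ch, child in sorted(n.children.items()):
                collect(child, path + ch)

        collect(node, "")
        return results


t = Trie()
for w in ["python", "pythonic", "pyramid", "java"]:
    t.insert(w)
print(t.autocomplete("pyth"))  # → ['python', 'pythonic']
```

The key difference from the hash-map scan: a dead-end prefix fails after at most $p$ hops, and a live one never looks at words outside its subtree.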