I recently built a custom in-page “Ctrl + F”-style search and highlight feature.
The goal sounded simple:
- Support multi-word queries
- Prefer full phrase matches
- Fall back to individual token matches
- Highlight results in the DOM
- Skip <code> and <pre> blocks
In my head?
“Easy. Just build a regex.”
Step 1: Build the Regex
If a user searches:
power shell
I generate a pattern like:
power[\s\u00A0]+shell|power|shell
The logic:
- Try to match the full phrase first
- If that fails, match individual tokens
On paper? Clean.
In isolation? Works.
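One way to generate that pattern from a raw query is sketched below. It is illustrative, not the exact code behind the feature. The names escapeRegExp and buildSearchRegex, the gi flags, and ordering the fallback tokens longest-first are all assumptions.

```javascript
// Escape characters that have special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build "phrase|token1|token2" from a raw query,
// allowing any run of whitespace (including NBSP) between phrase words.
function buildSearchRegex(query) {
  const tokens = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (tokens.length === 0) return null; // empty query: nothing to search for

  // Deduplicate, then try longer tokens first so "shells" beats "shell".
  const unique = [...new Set(tokens)].sort((a, b) => b.length - a.length);
  if (tokens.length === 1) return new RegExp(unique[0], "gi");

  const phrase = tokens.join("[\\s\\u00A0]+");
  return new RegExp([phrase, ...unique].join("|"), "gi");
}

console.log(buildSearchRegex("power shell")); // /power[\s\u00A0]+shell|power|shell/gi
```

Escaping matters because queries such as c++ or a.b would otherwise be read as regex syntax rather than literal text.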
Step 2: Enter the DOM
This is where things escalated.
Instead of just running string.match(), I had to:
- Walk the DOM
- Avoid header UI
- Avoid <pre>, <code>, <script>, <style>
- Avoid breaking syntax highlighting
- Replace only text nodes
- Preserve structure
That meant using a TreeWalker.
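const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
  acceptNode(node) {
    const p = node.parentElement;
    // A text node without a parent element has nothing to highlight.
    if (!p) return NodeFilter.FILTER_REJECT;
    // Skip protected content; this also keeps any syntax-highlighting
    // markup inside <pre>/<code> intact.
    if (p.closest("code, pre, script, style")) {
      return NodeFilter.FILTER_REJECT;
    }
    return NodeFilter.FILTER_ACCEPT;
  },
});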
Now we’re not just doing regex.
We’re doing controlled DOM mutation.
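A sketch of how that mutation step might be kept small: do the matching on the text node's string with a pure function, then swap the node out in one go. The names segmentText and highlightTextNode are hypothetical. segmentText runs anywhere; highlightTextNode needs a browser DOM.

```javascript
// Pure and DOM-free: split one text node's string into plain and matched
// segments. A fresh RegExp per call means no lastIndex state is shared.
function segmentText(text, re) {
  const segments = [];
  let last = 0;
  // matchAll requires the g flag and throws a TypeError otherwise.
  for (const m of text.matchAll(new RegExp(re.source, re.flags))) {
    if (m[0] === "") continue; // nothing to highlight for an empty match
    if (m.index > last) segments.push({ text: text.slice(last, m.index), match: false });
    segments.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) segments.push({ text: text.slice(last), match: false });
  return segments;
}

// Browser-only: swap a single Text node for text nodes plus <mark> elements.
// Returns the number of marks inserted (0 leaves the node untouched).
function highlightTextNode(node, re) {
  const segments = segmentText(node.nodeValue, re);
  const hits = segments.filter((s) => s.match).length;
  if (hits === 0) return 0;
  const frag = document.createDocumentFragment();
  for (const s of segments) {
    if (s.match) {
      const mark = document.createElement("mark");
      mark.textContent = s.text;
      frag.appendChild(mark);
    } else {
      frag.appendChild(document.createTextNode(s.text));
    }
  }
  node.parentNode.replaceChild(frag, node);
  return hits;
}

// Usage in the browser: collect first, mutate second, so replacing a node
// cannot cut the TreeWalker's traversal short.
//   const nodes = [];
//   while (walker.nextNode()) nodes.push(walker.currentNode);
//   for (const n of nodes) highlightTextNode(n, re);
```

Keeping the string logic separate means most of the behaviour can be tested without a DOM at all.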
Step 3: The Alternation Problem
This is where it got interesting.
Even though the phrase appears first in the alternation:
phrase|token1|token2
The engine still happily matches:
powershell
PowerShell
Depending on context.
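The reason is how JavaScript's regex engine scans: it returns the leftmost match, and the order of alternatives only decides between candidates that start at the same index. Inside PowerShell there is no whitespace, so the phrase alternative can never succeed there and the individual tokens win. The sample string below is illustrative.

```javascript
// The pattern from Step 1: phrase first, then the individual tokens.
const re = /power[\s\u00A0]+shell|power|shell/gi;

// The engine walks left to right; at each index it takes the first
// alternative that matches there, so "phrase first" is only a tie-breaker.
const text = "PowerShell vs power shell";
const hits = [...text.matchAll(re)].map((m) => `${m.index}:${m[0]}`);

console.log(hits); // [ '0:Power', '5:Shell', '14:power shell' ]
```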
So now the problem isn’t “regex syntax”.
It’s:
- Overlapping matches
- Execution order
- Resetting lastIndex
- Avoiding double mutation
- Preventing nested <mark> elements
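Of those, the lastIndex issue is the easiest to show in isolation. A RegExp with the g (or y) flag remembers where its last match ended, so reusing one instance across text nodes can silently skip matches. The findRanges helper below is a hypothetical sketch of one defensive pattern: reset lastIndex before every scan and hand back plain ranges.

```javascript
// A global RegExp is stateful: test() and exec() advance re.lastIndex.
const shared = /shell/gi;
const first = shared.test("PowerShell"); // true; lastIndex is now 10
const second = shared.test("shell");     // false; the search started at index 10

// Hypothetical helper: reset lastIndex before every scan and return
// plain {start, end} ranges, so no regex state leaks between text nodes.
function findRanges(text, re) {
  if (!re.global) throw new TypeError("findRanges expects a RegExp with the g flag");
  re.lastIndex = 0;
  const ranges = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    if (m[0] === "") {
      re.lastIndex++; // step past empty matches instead of looping forever
      continue;
    }
    ranges.push({ start: m.index, end: m.index + m[0].length });
  }
  return ranges; // exec() returning null has already reset lastIndex to 0
}

console.log(first, second);                    // true false
console.log(findRanges("PowerShell", shared)); // [ { start: 5, end: 10 } ]
```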
Step 4: Two Passes?
At one point I thought:
Maybe this shouldn’t be one regex.
Maybe the logic should be:
- Try phrase match
- If none found, then try token match
Which sounds simple…
Until you realise your DOM has already been mutated once.
Now you’re managing state across passes.
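One way around that cross-pass state is to run both passes on plain strings and mutate the DOM only once the final ranges are known. This is a sketch under assumptions: planHighlights and rangesFor are hypothetical names, and the "if none found" check here is per string. Called per text node, the phrase pass will not see a phrase split across elements (for example power <em>shell</em>), because each text node is scanned on its own.

```javascript
// Escape characters that are special inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collect {start, end} ranges for every match of a global RegExp.
function rangesFor(text, re) {
  return [...text.matchAll(re)].map((m) => ({ start: m.index, end: m.index + m[0].length }));
}

// Hypothetical helper: decide phrase vs token ranges on a plain string.
// Nothing here touches the DOM, so there is no half-mutated state to manage.
function planHighlights(text, query) {
  const words = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (words.length === 0) return [];

  // Pass 1: the full phrase, allowing any run of whitespace or NBSP between words.
  const phraseHits = rangesFor(text, new RegExp(words.join("[\\s\\u00A0]+"), "gi"));
  if (phraseHits.length > 0) return phraseHits;

  // Pass 2 (fallback): individual tokens, longest first.
  const tokens = [...new Set(words)].sort((a, b) => b.length - a.length);
  return rangesFor(text, new RegExp(tokens.join("|"), "gi"));
}

console.log(planHighlights("just powershell", "power shell"));
// [ { start: 5, end: 10 }, { start: 10, end: 15 } ]
```

The ranges can then be turned into text nodes and <mark> elements in a single mutation step.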
The Realisation
I understand JavaScript logic.
I understand regex.
But applying that logic safely across a live DOM tree?
That’s a different tier of problem.
Regex is deterministic.
The DOM is structural and stateful.
And once you start replacing text nodes, everything becomes delicate.
What I Learned
- Regex problems are easy in isolation.
- DOM mutation problems are easy in isolation.
- Combining them multiplies complexity.
Also:
The line between “simple feature” and “mini search engine” is very thin.
Where I Am Now
The search works.
Mostly.
It highlights.
It skips protected blocks.
It respects structure.
But it’s not a browser-level Ctrl + F.
Not yet.
And that’s the interesting part.
I now respect the DOM far more than I did before.
And I never thought I’d find myself saying this:
I get the logic of JavaScript.
Making that logic behave predictably inside a living DOM tree is the real challenge.
There’s still refinement to do.
Edge cases to tame.
State to simplify.
But that’s the gap between “feature complete” and “actually robust.”
And I’m somewhere in the middle of it.