Remove Duplicate Letters Coding Problem Solution

The “Remove Duplicate Letters” problem asks you to take a string and remove repeated characters so that every distinct letter appears exactly once in the result. The twist is that you are not allowed to reorder characters. The output must be a subsequence of the original string, meaning it preserves the original relative order of the characters you keep. On top of that, among all valid answers, you must return the lexicographically smallest one.

This combination of constraints is what makes the problem tricky. Removing duplicates alone is easy. Producing the smallest possible string while keeping only one occurrence of each letter, and still respecting subsequence order, is the real challenge.

Why “keep the first occurrence” is not correct

A common first attempt is to scan the string and keep each character the first time you see it, skipping later repeats. This guarantees uniqueness, but it often fails to produce the lexicographically smallest result, as the sketch below shows. Sometimes you want to drop an early occurrence of a character so that a smaller character can appear earlier in the result, as long as you can still include the dropped character later.
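
To see the failure concretely, here is a minimal sketch of the first-occurrence strategy (the function name is just for illustration). On "bcabc" it returns "bca", yet "abc" is also a valid unique-letter subsequence and is smaller: dropping the early 'b' and 'c' lets 'a' lead, and both letters reappear later.

```python
def keep_first_occurrence(s: str) -> str:
    # Naive strategy: keep each character the first time it appears.
    seen = set()
    result = []
    for ch in s:
        if ch not in seen:
            seen.add(ch)
            result.append(ch)
    return "".join(result)

print(keep_first_occurrence("bcabc"))  # "bca" -- but the optimal answer is "abc"
```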

This is where the problem becomes less like a simple deduplication task and more like building an optimal subsequence under constraints.

The key insight: you are constructing a smallest subsequence with all unique letters

A useful way to think about the goal is this: you want the lexicographically smallest subsequence that contains every distinct letter exactly once. That framing makes it clear that every decision is about whether a character should be included now or postponed for a better arrangement.

Lexicographical order makes early positions extremely important. If you can place a smaller character earlier without losing the ability to include all letters later, you should do it.
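
Lexicographic order here is just ordinary string comparison, which compares character by character and lets the first difference decide:

```python
print("abc" < "bac")    # True: a smaller character in the first position wins
print("acdb" < "adcb")  # True: the first differing position ('c' < 'd') decides
```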

Want to explore more coding problem solutions? Check out the Diagonal Traverse and Minimum Falling Path Sum.

Why a monotonic stack approach fits perfectly

The most effective strategy uses a stack-like construction to build the answer from left to right. As you scan the string, you decide whether to place the current character into the result. If it is already included, you skip it. If it is not included, you consider whether it should displace larger characters already in the result to make the overall string lexicographically smaller.

This is where the “monotonic” idea appears. You want the result to be as small as possible, so you prefer smaller characters earlier. If the current character is smaller than the character at the end of your current result, you may want to remove that larger character and replace it with the smaller one.
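
The displacement rule can be written as a single condition. In the sketch below (the names `stack`, `ch`, and `remaining` are illustrative), the test says: pop the most recently placed letter only when the incoming letter is smaller and the popped letter still has copies ahead:

```python
def should_pop(stack: list, ch: str, remaining: dict) -> bool:
    # Displace the last chosen letter only if the incoming letter is
    # smaller AND the displaced letter can be re-added from a later copy.
    return bool(stack) and stack[-1] > ch and remaining[stack[-1]] > 0
```

The second half of this condition is the safety check discussed next.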

The crucial safety check: can you remove a character and still include it later?

You cannot pop characters from the result freely, because each character must appear exactly once. Removing a character is only safe if you know it will appear again later in the string, giving you another chance to include it.

This is why tracking remaining occurrences is essential. If a character at the end of your current result appears again later, you can safely remove it now and re-add it later in a better position. If it does not appear again, removing it would make it impossible to include that character at all, which would violate the problem’s requirement.
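
One convenient way to implement that check (an implementation choice, not something the problem dictates) is to precompute the index of each character’s last occurrence; the letter at the end of the result is safe to remove while scanning position i exactly when its last occurrence is greater than i:

```python
def last_occurrences(s: str) -> dict:
    # Later assignments overwrite earlier ones, so each character ends
    # up mapped to the index of its final appearance in the string.
    return {ch: i for i, ch in enumerate(s)}

print(last_occurrences("cbacdcbc"))  # {'c': 7, 'b': 6, 'a': 2, 'd': 4}
```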

How the algorithm builds the optimal result

As you scan the string, you maintain two core pieces of information: which characters are already in the result, and how many occurrences of each character remain ahead of the scan position. When you encounter a character not yet in the result, you compare it with the last character currently in the result. If that last character is lexicographically larger and still occurs later in the string, you remove it to improve the result’s ordering. You repeat this as long as it remains safe and beneficial.

Then you insert the current character, marking it as included. This process ensures the result stays lexicographically minimal at every step while still guaranteeing that all letters can be included exactly once.
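
Putting the pieces together, here is one way to implement the whole procedure in Python, a sketch of the standard greedy-stack solution. This version tracks remaining counts; the last-occurrence map shown earlier would work equally well.

```python
def remove_duplicate_letters(s: str) -> str:
    # How many copies of each character are still unscanned.
    remaining = {}
    for ch in s:
        remaining[ch] = remaining.get(ch, 0) + 1

    stack = []         # the answer, built left to right
    in_result = set()  # characters already placed on the stack

    for ch in s:
        remaining[ch] -= 1          # this occurrence is now consumed
        if ch in in_result:
            continue                # each letter appears at most once
        # Greedily displace larger letters that will reappear later.
        while stack and stack[-1] > ch and remaining[stack[-1]] > 0:
            in_result.remove(stack.pop())
        stack.append(ch)
        in_result.add(ch)

    return "".join(stack)

print(remove_duplicate_letters("bcabc"))     # "abc"
print(remove_duplicate_letters("cbacdcbc"))  # "acdb"
```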

Why this approach is correct

The correctness comes from combining greedy choice with a feasibility guarantee. Greedy choice says you should place the smallest possible character early whenever doing so improves lexicographical order. Feasibility is maintained by only removing characters that you can still add later.

Because you never discard a character that would be lost forever, you preserve the ability to include all unique letters. Because you continuously improve the left-to-right order whenever it is safe, the final string is the smallest possible among all valid unique-letter subsequences.

Time and space complexity considerations

This approach runs efficiently because each character is pushed onto the stack at most once and popped at most once, so the total work scales linearly with the length of the string. Space usage is modest as well: the stack and the tracking structures each hold at most one entry per distinct letter, so for the usual lowercase-English version of the problem they are bounded by a 26-letter alphabet.

This efficiency is one reason the problem is so well known. It looks complex, but the optimal solution is both elegant and fast once you recognize the right pattern.

Why this problem is popular in interviews

Remove Duplicate Letters is a classic interview problem because it tests whether candidates can combine greedy reasoning with careful constraints. It is easy to produce a valid, unique-letter string, but much harder to guarantee lexicographical minimality without breaking feasibility.

It also checks whether candidates understand when a stack is more than just a storage tool. Here, the stack represents an evolving optimal decision boundary.

What this problem teaches beyond string manipulation

Beyond its specific requirements, this problem teaches a reusable idea: when you need the smallest possible result under subsequence constraints, a greedy strategy backed by a monotonic stack and a “can I safely remove this?” test is often the right tool.

If you can clearly explain why early characters dominate lexicographical order, why you sometimes remove previously chosen letters, and how remaining-occurrence tracking makes that safe, you demonstrate strong algorithmic reasoning. That depth of understanding makes “Remove Duplicate Letters” an excellent exercise in greedy optimization and stack-based construction.
