It was a normal day; I was grinding some LeetCode for an upcoming interview. I started to warm up with some easy problems and came across the classic "maximum depth of a binary tree" problem. Like always, I first solved it recursively and then tried the iterative approach, which is "supposed" to be optimal. But I noticed a difference in the runtimes of the two solutions: the iterative one was consistently slower by 2ms.
The Discrepancy
Let’s first understand the problem statement in case you haven’t solved it before. We have to find the maximum depth of the binary tree, which is the number of nodes along the longest path from the root node down to the farthest leaf node.
Now that we have understood the problem, let’s discuss the approaches real quick.
The recursive approach
The recursive approach is pretty straightforward: for each node, compute the maximum depth of its left and right subtrees and return 1 + the larger of the two. Check the following image for the solution.
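In code, the recursive idea looks roughly like this. This is a minimal sketch, assuming LeetCode's standard `TreeNode` shape; the class names here are mine:

```java
// Hypothetical TreeNode, mirroring LeetCode's usual definition.
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int val) { this.val = val; }
}

class RecursiveSolution {
    // An empty tree has depth 0; otherwise it's 1 plus the deeper subtree.
    static int maxDepth(TreeNode root) {
        if (root == null) return 0;
        return 1 + Math.max(maxDepth(root.left), maxDepth(root.right));
    }
}
```

Each recursive call gets its own frame on the thread's call stack, which is where the depth bookkeeping implicitly lives.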
The iterative (or “optimal”) approach
Here we will sort of simulate the recursive solution itself. The recursive solution uses the allocated stack memory of the RAM on which it pushes stack frames for every recursive function call. We’re going to do a similar thing by creating our own stacks to record the nodes and the current depth of the nodes.
Create 2 stacks, nodeStack and depthStack, to store TreeNodes and depths, respectively. Also initialize an ans variable for the maximum depth. Push the root node and 1 into nodeStack and depthStack, respectively, as a starting point. Run a while loop until nodeStack is empty. In every iteration, pop the top element of both stacks, update ans if the popped depth is larger, and push the current node's non-null left and right children into nodeStack along with depth + 1 into depthStack for each. Finally, when the loop completes, return ans as the final answer.
Check the below screenshot for code.
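The steps above can be sketched as follows. Again a minimal sketch assuming LeetCode's `TreeNode` definition; the class name is mine:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical TreeNode, mirroring LeetCode's usual definition.
class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int val) { this.val = val; }
}

class IterativeSolution {
    static int maxDepth(TreeNode root) {
        if (root == null) return 0;
        Deque<TreeNode> nodeStack = new ArrayDeque<>();
        Deque<Integer> depthStack = new ArrayDeque<>(); // Integer, not int: boxing happens here
        nodeStack.push(root);
        depthStack.push(1);
        int ans = 0;
        while (!nodeStack.isEmpty()) {
            TreeNode node = nodeStack.pop();
            int depth = depthStack.pop(); // auto-unboxing
            ans = Math.max(ans, depth);
            if (node.left != null)  { nodeStack.push(node.left);  depthStack.push(depth + 1); }
            if (node.right != null) { nodeStack.push(node.right); depthStack.push(depth + 1); }
        }
        return ans;
    }
}
```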
The Big-O Illusion: 2ms of Hidden Latency
In the above screenshots, you can see that the iterative solution is taking 2 ms more than the recursive one. You might think that it is just a latency fluke from the LeetCode server, but it’s actually consistent. It points to something much deeper—and much more fascinating—hidden within the architecture of the Java Virtual Machine (JVM).
As software engineers, we are trained to analyze algorithms using Big-O notation. On paper, both of these approaches run in $O(N)$ time complexity. In fact, you could argue the iterative approach is 'better' because it protects us from running out of memory. Yet, the stopwatch doesn't lie.
To understand why our seemingly optimized iterative loop is losing the race to recursion, we have to look past our code syntax and look directly at how the CPU and JVM manage memory allocations under the hood. We need to look at the battle between the Hardware Call Stack and the JVM Heap.
The Hidden Costs of the Iterative Illusion
When we move from a recursive to an iterative solution, we often congratulate ourselves on writing the "optimal" version. But here's the reality check: you didn't destroy the stack; you just moved it.
Recursion relies on the thread's native call stack, which is tightly managed and heavily optimized by the OS and CPU. When you rewrite the same problem using a Stack or Deque object, you are essentially writing a user-space simulation of that call stack, built entirely on top of the JVM heap and managed by the JVM.
Shifting the bookkeeping from the stack to the heap might save the code from crashing with a StackOverflowError, but it forces the JVM to pay a heavier architectural cost. Let's look at 2 main ways the iterative loop is quietly charging you.
The Array resizing tax
When you initialize an ArrayDeque, Java allocates an internal array with a small default capacity. As you push nodes onto the stack while traversing deep branches, that array can fill up. When it fills up completely, ArrayDeque must allocate a new, larger array and copy every element of the old array into it. This reallocation and copying takes time.
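If you can estimate the maximum depth up front, you can sidestep most of this tax: ArrayDeque has a constructor that takes an initial capacity. A small sketch (the class and method names are mine):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PresizedStackDemo {
    // Pre-sizing the backing array avoids the grow-and-copy cycles
    // a default-capacity ArrayDeque pays as the stack deepens.
    static Deque<Integer> makeDepthStack(int expectedDepth) {
        return new ArrayDeque<>(expectedDepth); // ArrayDeque(int numElements)
    }
}
```

This doesn't eliminate allocation, but it collapses many incremental grow-and-copy steps into one upfront allocation.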
Primitive Boxing/Unboxing
Look closely at the second stack: Deque<Integer> depthStack.
Java collections cannot store raw primitives like int; they can only store Objects. When you execute depthStack.push(currDepth + 1), Java implicitly converts your raw primitive int into an Integer object behind the scenes. This is called Autoboxing.
While it is much faster than creating a custom class, it still forces the JVM to fetch a cached Integer or instantiate a new wrapper object, adding minor execution steps that recursion entirely avoids by keeping raw primitives in stack slots and hardware registers.
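If the boxing cost matters, one workaround is to keep the depths in a bare int[] and manage the top index yourself, so no wrapper objects are ever created. A hypothetical sketch (the class name and doubling growth policy are my own choices, not anything the standard library prescribes):

```java
import java.util.Arrays;

// A minimal stack of primitive ints: no Integer boxing on push/pop.
class PrimitiveDepthStack {
    private int[] depths;
    private int top = 0;

    PrimitiveDepthStack(int capacity) { depths = new int[capacity]; }

    void push(int d) {
        // Grow by doubling when full, like most dynamic arrays.
        if (top == depths.length) depths = Arrays.copyOf(depths, depths.length * 2);
        depths[top++] = d;
    }

    int pop() { return depths[--top]; }

    boolean isEmpty() { return top == 0; }
}
```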
Together, these two factors add measurable overhead to the iterative solution, which accounts for the 2ms gap between the approaches.
Conclusion
While recursion wins this specific micro-benchmark by a millisecond or two, it introduces a critical vulnerability: StackOverflowError. In production systems, if incoming data contains a severely skewed tree 10,000+ nodes deep, the fixed-size thread call stack can quickly exhaust its space and crash the entire application thread.
The iterative solution safely shifts this volatile memory footprint to the dynamic JVM Heap, which can scale up to gigabytes of flexible space.
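You can see both sides of the trade-off in one small experiment: build a severely left-skewed tree (effectively a linked list) and run both traversals on it. The Node and method names below are mine, and the exact depth at which recursion blows up depends on the thread's stack size (e.g. the -Xss setting), so a very large depth is used to make the overflow reliable:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DeepTreeDemo {
    static class Node { Node left, right; }

    // Crashes with StackOverflowError on sufficiently deep trees.
    static int recursiveDepth(Node n) {
        return n == null ? 0 : 1 + Math.max(recursiveDepth(n.left), recursiveDepth(n.right));
    }

    // Heap-backed stacks: depth is limited by heap size, not stack size.
    static int iterativeDepth(Node root) {
        if (root == null) return 0;
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        nodes.push(root);
        depths.push(1);
        int ans = 0;
        while (!nodes.isEmpty()) {
            Node n = nodes.pop();
            int d = depths.pop();
            ans = Math.max(ans, d);
            if (n.left != null)  { nodes.push(n.left);  depths.push(d + 1); }
            if (n.right != null) { nodes.push(n.right); depths.push(d + 1); }
        }
        return ans;
    }

    // Build a left-skewed "tree" of the given depth.
    static Node skewed(int depth) {
        Node root = new Node(), cur = root;
        for (int i = 1; i < depth; i++) { cur.left = new Node(); cur = cur.left; }
        return root;
    }
}
```

On a million-node skewed tree, the iterative version returns normally while the recursive one dies long before reaching the leaf.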
True optimization isn't about chasing a faster LeetCode green screen; it’s about understanding the memory mechanics of your language so you can intentionally choose the right architectural trade-offs for your system's constraints.

