Shubham Gupta for Logixy

Posted on May 19 • Originally published at youtu.be

Median of Two Sorted Arrays — LeetCode #4 (Hard)

#leetcode #python #algorithms #array

The Problem

Given two sorted arrays, return the median of all their elements combined, in logarithmic time.

Example

Input: nums1 = [1,3], nums2 = [2]

Output: 2.00000

merged array = [1,2,3] and median is 2.

Constraints

0 <= m, n <= 1000
1 <= m + n <= 2000
-10^6 <= nums1[i], nums2[i] <= 10^6
Required runtime: O(log(m+n))

The Key Insight

Let's slow down. Look at what the merge throws away. We carefully ordered all eleven numbers. But we only needed the one in the middle. The rest was wasted effort.

Here's the real question. What does the median actually tell us about how the numbers are split? The median is the point where the numbers divide into a smaller half and a larger half. Equal sized halves when the total's even; otherwise the left half gets one extra.

So forget merging. What if we just drew a dividing line through each array, a left part and a right part? Eleven numbers total. We want six on the left, five on the right. That left count never changes.

Here's the move. If we cut the first array, the cut in the second is forced. Both left parts together must add up to six. So we only choose one thing: where to cut the smaller array. The other cut follows automatically.

When is a cut correct? Every number on the left must be less than or equal to every number on the right. Because both arrays are sorted, we only check the four numbers touching the cut lines: the two ends of the left, the two starts of the right.

The test is simple. The first array's left edge must not exceed the second array's right edge, and the other way around too. And to find the right cut, we don't try every spot. We binary search, like guessing a number while halving the range each time.

Watch out for one trap. You might pick the bigger array to search. Don't. Cut the smaller one, or its forced cut can fall off the edge. And when a cut sits at the very start or end, there's no number there. We treat a missing left edge as negative infinity, a missing right edge as positive infinity.

Why It Works

Notice what we never did. We never merged the lists. Each guess threw away half the remaining cut positions, so a few guesses were enough.

Walking Through It

Let's trace it. First list: one, three, eight, nine, fifteen. Second list: seven, eleven, eighteen, nineteen, twenty-one, twenty-five. Eleven numbers total. The left half needs six. We binary search the cuts of the smaller first array.

The cut in the first array can be anywhere from zero to five. Start the search: low is zero, high is five. Midpoint of the search is cut position two. That puts two numbers from the first array on the left, so the second gives four.

Now the four border numbers. First array: left edge three, right edge eight. Second array: left edge nineteen, right edge twenty-one. Check one: is the first array's left edge, three, less than or equal to the second's right edge, twenty-one? Yes.

Check two: is the second array's left edge, nineteen, less than or equal to the first's right edge, eight? No. Nineteen is way too big. So this cut is wrong. The second array put too many numbers on its left. We need more from the first array instead.

Move the search to the right. Low becomes three. Try again. New midpoint cut: position four. Four numbers from the first array on the left, so the second gives just two.

Border numbers now. First array: left edge nine, right edge fifteen. Second array: left edge eleven, right edge eighteen. Check one: first array's left edge nine, less than or equal to the second's right edge eighteen? Yes.

Check two: second array's left edge eleven, less than or equal to the first's right edge fifteen? Yes. Both pass. This cut is valid. The left half holds the six smallest numbers, the right half the five largest.

The total, eleven, is odd. The median is the largest number on the left: the bigger of nine and eleven, which is eleven.

Complexity

Each step halves the cuts still worth considering. So the number of steps grows with the logarithm of the smaller array's length. And we only ever store a few border values, never a merged list. The memory stays constant no matter how big the inputs grow.

The Code

OK, same logic in Python. Swap so A is the smaller array, work out the left half size, and set the search bounds on A's cut. Each loop pass picks a cut in A, and the cut in B follows automatically, since the left half size is fixed.

Read the four border values. When a cut sits at an edge, fall back to negative or positive infinity. If both checks pass, the cut is valid. An odd total returns the largest left value, an even total averages the two middle ones.

Otherwise shift the search. If A's left edge is too big, go left. Otherwise go right, and try again.

class Solution(object):
    def findMedianSortedArrays(self, nums1, nums2):
        A, B = nums1, nums2
        if len(A) > len(B):
            A, B = B, A
        m, n = len(A), len(B)
        half = (m + n + 1) // 2
        lo, hi = 0, m
        while lo <= hi:
            i = (lo + hi) // 2
            j = half - i
            left1 = A[i - 1] if i > 0 else float('-inf')
            right1 = A[i] if i < m else float('inf')
            left2 = B[j - 1] if j > 0 else float('-inf')
            right2 = B[j] if j < n else float('inf')
            if left1 <= right2 and left2 <= right1:
                if (m + n) % 2:
                    return max(left1, left2)
                return (max(left1, left2) + min(right1, right2)) / 2.0
            elif left1 > right2:
                hi = i - 1
            else:
                lo = i + 1

Wrap-up

And that's the trick. Stop merging, start cutting. When a problem demands O(log n) time, ask what you can throw away each step.

📺 Watch the full walkthrough on YouTube: https://youtu.be/6e5TLbzYyaw

Narration uses an AI-generated voice (Supertonic TTS).

DEV Community