2
\$\begingroup\$

Problem Statement

(Source: Leetcode Problem 4: Median of Two Sorted Arrays [Hard])
(Topics: [Array] [Binary Search] [Divide and Conquer])

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

The overall time complexity should be O(log(m+n)).

I was able to solve this problem with the following code:

def findMedianSortedArrays(self, nums1, nums2):
    nums = nums1 + nums2
    nums.sort()
    if len(nums) % 2 == 0:
        return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
    else:
        return nums[len(nums) // 2] 

which has a time complexity of \$O((m+n)\log(m+n))\$ and space complexity of \$O(m+n)\$.

Admittedly while I didn't find the problem too difficult overall, I do want to still try to improve the time complexity of my code to at least \$O(\log(m+n))\$.

My question is:

What am I misunderstanding about the approach needed to make the code run in \$O(\log(m+n))\$ time?

Edit

Per toolic's suggestion, I have modified the code to run on newer versions of Python (of course also using snake_case to write function names for readability). I used this online Python editor to test my fixes:

def find_median_two_sorted_arrays(nums1: list, nums2: list):
    nums = nums1 + nums2
    nums.sort
    if len(nums) % 2 == 0:
        return (nums[(len(nums) // 2) - 1] + nums[len(nums) // 2]) / 2.0
    else:
        return nums[len(nums) // 2] 

Note: I would like to say that I know (after doing some research) that this can be done in \$O(\log(\min(n,m)))\$ time complexity with \$O(1)\$ space complexity. However, this is not what I want as this is admittedly very out of my reach with my understanding of Python.

\$\endgroup\$
3
  • 2
    \$\begingroup\$ A O(log(min(m, n))) approach has been mentioned here: codereview.stackexchange.com/a/202358/35991. \$\endgroup\$
    – Martin R
    Commented Dec 6, 2024 at 17:07
  • 2
    \$\begingroup\$ @TobySpeight The time-limit-exceeded tag states "You may use this tag instead of performance when an online judge rejects your solution to a programming-challenge due to time constraints". We explicitly allow code which doesn't scale. \$\endgroup\$
    – Peilonrayz
    Commented Dec 9, 2024 at 1:29
  • \$\begingroup\$ The crucial point of the task is finding a median of two arrays, not a single array obtained by joining the two. \$\endgroup\$
    – CiaPan
    Commented Dec 11, 2024 at 18:37

2 Answers 2

7
\$\begingroup\$

mentioned that \$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.

Nah! It's really about the sizes \$n, m\$, rather than about the algorithm. If one is of comparable magnitude to the other, then \$O(\min(n, m))\$ is simply \$O(n)\$, a case that's easily dealt with. For the \$\min\$ to be interesting, one of these must hold:

  • \$n \ll m\$, or
  • \$n \gg m\$

That's saying that one of the input lists is essentially "empty" for purposes of Big-Oh analysis, and won't significantly contribute to the overall time.

Take e.g. the \$n \ll m\$ case. Since \$n\$ is so small, we could merge its entries into the larger list with negligible cost. Or just use a \$O(\log m+n))\$ solution, confident that that's really \$O(\log m)\$.


improve the time complexity of my code to at least \$O(\log m+n))\$.

Your code uses a linear time expression, nums1 + nums2. You need to be able to talk about "the midpoint of the combined arrays" without ever touching the majority of array elements. A datastructure like this will assist with that:

class SlicedList:
    """Models a slice of a list using start and end index, with zero copies."""

    def __init__(
        self, nums: list[float], start: int = 0, end: int | None = None
    ) -> None:
        self.nums = nums
        self.start = start
        self.end = len(nums) if end is None else end

    def slice(self, start: int, end: int | None = None) -> "SlicedList":
        length = self.end - self.start
        end_i: int = length if end is None else end
        assert 0 <= start <= end_i <= length

        return SlicedList(self.nums, self.start + start, self.start + end_i)

    def __getitem__(self, i: int) -> float:
        return self.nums[self.start + i]

    def __len__(self) -> int:
        return self.end - self.start

And then this will help with the binary searching.

def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]:  # noqa PLR0911
    """Given two sorted lists of numbers, return (list_id, idx) in O(log n) time.

    We are concerned with the kth element of the merged lists a and b.

    list_id: 0 for a, 1 for b, according to which contains the kth sorted value
    idx: index of the kth element in either list a or b
    """
    assert 0 <= k < len(a) + len(b)
    if not a:
        return 1, k
    if not b:
        return 0, k
    if k == 0:
        return int(a[0] > b[0]), k

    # binary search
    ia, ib = len(a) // 2, len(b) // 2
    if ia + ib < k:
        if a[ia] > b[ib]:
            return kth_idx(a, b.slice(ib + 1), k - ib - 1)
        return kth_idx(a.slice(ia + 1), b, k - ia - 1)

    if a[ia] > b[ib]:
        return kth_idx(a.slice(0, ia), b, k)
    return kth_idx(a, b.slice(0, ib), k)
\$\endgroup\$
1
  • \$\begingroup\$ And nums.sort() is even worse than nums1+nums2. \$\endgroup\$
    – Teepeemm
    Commented Dec 12, 2024 at 23:54
4
\$\begingroup\$

The previous answer addresses your main concern about complexity. This will address one of the now-deleted comments on the question regarding functionality.

Portability

The function that was posted must be part of a class which was not posted. The code can't be run as-is due to the presence of the self keyword. I removed self and added an example function call:

def findMedianSortedArrays(nums1, nums2):
    nums = nums1 + nums2
    nums.sort
    if len(nums) % 2 == 0:
        return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
    else:
        return nums[len(nums) // 2] 

print(findMedianSortedArrays([6,4,5], [2,3,1]))

When I run the code on Python version 2.7, I get the expected answer:

3.5

However, when I run the code on Python version 3.7, I get an error:

TypeError: list indices must be integers or slices, not float

Since Python 2.7 is deprecated, I suggest modifying the code to run on newer 3.x versions as well.

Naming

The PEP 8 style guide recommends using snake_case for function names:

find_median_sorted_arrays
\$\endgroup\$
0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.