2
\$\begingroup\$

Problem Statement

(Source: Leetcode Problem 4: Median of Two Sorted Arrays [Hard])
(Topics: [Array] [Binary Search] [Divide and Conquer])

Given two sorted arraysnums1 andnums2 of sizem andn respectively, return themedian of the two sorted arrays.

The overall time complexity should beO(log(m+n)).

I was able to solve this problem with the following code:

def findMedianSortedArrays(self, nums1, nums2):    nums = nums1 + nums2    nums.sort()    if len(nums) % 2 == 0:        return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0    else:        return nums[len(nums) // 2]

which has a time complexity of\$O((m+n)\log(m+n))\$ and space complexity of\$O(m+n)\$.

Admittedly while I didn't find the problem too difficult overall, I do want to still try to improve the time complexity of my code to at least\$O(\log(m+n))\$.

My question is:

What am I misunderstanding about the approach needed to make the code run in\$O(\log(m+n))\$ time?

Edit

Per toolic's suggestion, I have modified the code to run on newer versions of Python (of course also usingsnake_case to write function names for readability). I usedthis online Python editor to test my fixes:

def find_median_two_sorted_arrays(nums1: list, nums2: list):    nums = nums1 + nums2    nums.sort    if len(nums) % 2 == 0:        return (nums[(len(nums) // 2) - 1] + nums[len(nums) // 2]) / 2.0    else:        return nums[len(nums) // 2]

Note: I would like to say that I know (after doing some research) that this can be done in\$O(\log(\min(n,m)))\$ time complexity with\$O(1)\$ space complexity. However, this is not what I want as this is admittedly very out of my reach with my understanding of Python.

askedDec 6, 2024 at 16:20
CrSb0001's user avatar
\$\endgroup\$
3
  • 2
    \$\begingroup\$A O(log(min(m, n))) approach has been mentioned here:codereview.stackexchange.com/a/202358/35991.\$\endgroup\$CommentedDec 6, 2024 at 17:07
  • 2
    \$\begingroup\$@TobySpeight Thetime-limit-exceeded tag states "You may use this tag instead of performance when an online judge rejects your solution to a programming-challenge due to time constraints". We explicitly allow code which doesn't scale.\$\endgroup\$CommentedDec 9, 2024 at 1:29
  • \$\begingroup\$The crucial point of the task is finding a median oftwo arrays, not a single array obtained by joining the two.\$\endgroup\$CommentedDec 11, 2024 at 18:37

2 Answers2

7
\$\begingroup\$

mentioned that\$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.

Nah!It's really about the sizes\$n, m\$, rather than about the algorithm.If one is of comparable magnitude to the other, then\$O(\min(n, m))\$ is simply\$O(n)\$, a case that's easily dealt with.For the\$\min\$ to be interesting, one of these must hold:

  • \$n \ll m\$, or
  • \$n \gg m\$

That's saying that one of the input lists is essentially "empty"for purposes of Big-Oh analysis, and won't significantly contributeto the overall time.

Take e.g. the\$n \ll m\$ case.Since\$n\$ is so small, we could merge its entries into the larger listwith negligible cost.Or just use a\$O(\log m+n))\$ solution, confident that that's really\$O(\log m)\$.


improve the time complexity of my code to at least\$O(\log m+n))\$.

Your code uses a linear time expression,nums1 + nums2.You need to be able to talk about "the midpoint of the combined arrays"without ever touching the majority of array elements.A datastructure like this will assist with that:

class SlicedList:    """Models a slice of a list using start and end index, with zero copies."""    def __init__(        self, nums: list[float], start: int = 0, end: int | None = None    ) -> None:        self.nums = nums        self.start = start        self.end = len(nums) if end is None else end    def slice(self, start: int, end: int | None = None) -> "SlicedList":        length = self.end - self.start        end_i: int = length if end is None else end        assert 0 <= start <= end_i <= length        return SlicedList(self.nums, self.start + start, self.start + end_i)    def __getitem__(self, i: int) -> float:        return self.nums[self.start + i]    def __len__(self) -> int:        return self.end - self.start

And then this will help with the binary searching.

def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]:  # noqa PLR0911    """Given two sorted lists of numbers, return (list_id, idx) in O(log n) time.    We are concerned with the kth element of the merged lists a and b.    list_id: 0 for a, 1 for b, according to which contains the kth sorted value    idx: index of the kth element in either list a or b    """    assert 0 <= k < len(a) + len(b)    if not a:        return 1, k    if not b:        return 0, k    if k == 0:        return int(a[0] > b[0]), k    # binary search    ia, ib = len(a) // 2, len(b) // 2    if ia + ib < k:        if a[ia] > b[ib]:            return kth_idx(a, b.slice(ib + 1), k - ib - 1)        return kth_idx(a.slice(ia + 1), b, k - ia - 1)    if a[ia] > b[ib]:        return kth_idx(a.slice(0, ia), b, k)    return kth_idx(a, b.slice(0, ib), k)
answeredDec 6, 2024 at 18:20
J_H's user avatar
\$\endgroup\$
1
  • \$\begingroup\$Andnums.sort() is even worse thannums1+nums2.\$\endgroup\$CommentedDec 12, 2024 at 23:54
4
\$\begingroup\$

The previous answer addresses your main concern about complexity. This will address one of the now-deleted comments on the question regarding functionality.

Portability

The function that was posted must be part of a class which was not posted. The code can't be run as-is due to the presence of theself keyword. I removedself and added an example function call:

def findMedianSortedArrays(nums1, nums2):    nums = nums1 + nums2    nums.sort    if len(nums) % 2 == 0:        return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0    else:        return nums[len(nums) // 2] print(findMedianSortedArrays([6,4,5], [2,3,1]))

When I run the code on Python version 2.7, I get the expected answer:

3.5

However, when I run the code on Python version 3.7, I get an error:

TypeError: list indices must be integers or slices, not float

Since Python 2.7 is deprecated, I suggest modifying the code to run on newer 3.x versions as well.

Naming

ThePEP 8 style guide recommends using snake_case for function names:

find_median_sorted_arrays
answeredDec 8, 2024 at 11:31
toolic's user avatar
\$\endgroup\$
0

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.