Problem Statement
(Source: Leetcode Problem 4: Median of Two Sorted Arrays [Hard])
(Topics: [Array] [Binary Search] [Divide and Conquer])
Given two sorted arrays
nums1andnums2of sizemandnrespectively, return themedian of the two sorted arrays.The overall time complexity should be
O(log(m+n)).
I was able to solve this problem with the following code:
def findMedianSortedArrays(self, nums1, nums2): nums = nums1 + nums2 nums.sort() if len(nums) % 2 == 0: return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0 else: return nums[len(nums) // 2]which has a time complexity of\$O((m+n)\log(m+n))\$ and space complexity of\$O(m+n)\$.
Admittedly while I didn't find the problem too difficult overall, I do want to still try to improve the time complexity of my code to at least\$O(\log(m+n))\$.
My question is:
What am I misunderstanding about the approach needed to make the code run in\$O(\log(m+n))\$ time?
Edit
Per toolic's suggestion, I have modified the code to run on newer versions of Python (of course also usingsnake_case to write function names for readability). I usedthis online Python editor to test my fixes:
def find_median_two_sorted_arrays(nums1: list, nums2: list): nums = nums1 + nums2 nums.sort if len(nums) % 2 == 0: return (nums[(len(nums) // 2) - 1] + nums[len(nums) // 2]) / 2.0 else: return nums[len(nums) // 2]Note: I would like to say that I know (after doing some research) that this can be done in\$O(\log(\min(n,m)))\$ time complexity with\$O(1)\$ space complexity. However, this is not what I want as this is admittedly very out of my reach with my understanding of Python.
- 2\$\begingroup\$A O(log(min(m, n))) approach has been mentioned here:codereview.stackexchange.com/a/202358/35991.\$\endgroup\$Martin R– Martin R2024-12-06 17:07:32 +00:00CommentedDec 6, 2024 at 17:07
- 2\$\begingroup\$@TobySpeight Thetime-limit-exceeded tag states "You may use this tag instead of performance when an online judge rejects your solution to a programming-challenge due to time constraints". We explicitly allow code which doesn't scale.\$\endgroup\$2024-12-09 01:29:06 +00:00CommentedDec 9, 2024 at 1:29
- \$\begingroup\$The crucial point of the task is finding a median oftwo arrays, not a single array obtained by joining the two.\$\endgroup\$CiaPan– CiaPan2024-12-11 18:37:54 +00:00CommentedDec 11, 2024 at 18:37
2 Answers2
mentioned that\$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.
Nah!It's really about the sizes\$n, m\$, rather than about the algorithm.If one is of comparable magnitude to the other, then\$O(\min(n, m))\$ is simply\$O(n)\$, a case that's easily dealt with.For the\$\min\$ to be interesting, one of these must hold:
- \$n \ll m\$, or
- \$n \gg m\$
That's saying that one of the input lists is essentially "empty"for purposes of Big-Oh analysis, and won't significantly contributeto the overall time.
Take e.g. the\$n \ll m\$ case.Since\$n\$ is so small, we could merge its entries into the larger listwith negligible cost.Or just use a\$O(\log m+n))\$ solution, confident that that's really\$O(\log m)\$.
improve the time complexity of my code to at least\$O(\log m+n))\$.
Your code uses a linear time expression,nums1 + nums2.You need to be able to talk about "the midpoint of the combined arrays"without ever touching the majority of array elements.A datastructure like this will assist with that:
class SlicedList: """Models a slice of a list using start and end index, with zero copies.""" def __init__( self, nums: list[float], start: int = 0, end: int | None = None ) -> None: self.nums = nums self.start = start self.end = len(nums) if end is None else end def slice(self, start: int, end: int | None = None) -> "SlicedList": length = self.end - self.start end_i: int = length if end is None else end assert 0 <= start <= end_i <= length return SlicedList(self.nums, self.start + start, self.start + end_i) def __getitem__(self, i: int) -> float: return self.nums[self.start + i] def __len__(self) -> int: return self.end - self.startAnd then this will help with the binary searching.
def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]: # noqa PLR0911 """Given two sorted lists of numbers, return (list_id, idx) in O(log n) time. We are concerned with the kth element of the merged lists a and b. list_id: 0 for a, 1 for b, according to which contains the kth sorted value idx: index of the kth element in either list a or b """ assert 0 <= k < len(a) + len(b) if not a: return 1, k if not b: return 0, k if k == 0: return int(a[0] > b[0]), k # binary search ia, ib = len(a) // 2, len(b) // 2 if ia + ib < k: if a[ia] > b[ib]: return kth_idx(a, b.slice(ib + 1), k - ib - 1) return kth_idx(a.slice(ia + 1), b, k - ia - 1) if a[ia] > b[ib]: return kth_idx(a.slice(0, ia), b, k) return kth_idx(a, b.slice(0, ib), k)- \$\begingroup\$And
nums.sort()is even worse thannums1+nums2.\$\endgroup\$Teepeemm– Teepeemm2024-12-12 23:54:58 +00:00CommentedDec 12, 2024 at 23:54
The previous answer addresses your main concern about complexity. This will address one of the now-deleted comments on the question regarding functionality.
Portability
The function that was posted must be part of a class which was not posted. The code can't be run as-is due to the presence of theself keyword. I removedself and added an example function call:
def findMedianSortedArrays(nums1, nums2): nums = nums1 + nums2 nums.sort if len(nums) % 2 == 0: return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0 else: return nums[len(nums) // 2] print(findMedianSortedArrays([6,4,5], [2,3,1]))When I run the code on Python version 2.7, I get the expected answer:
3.5However, when I run the code on Python version 3.7, I get an error:
TypeError: list indices must be integers or slices, not floatSince Python 2.7 is deprecated, I suggest modifying the code to run on newer 3.x versions as well.
Naming
ThePEP 8 style guide recommends using snake_case for function names:
find_median_sorted_arraysYou mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

