I doing some algorithm reviews and am soliciting opinions on my implementation of the Merge Sort in Python.
Anything from readability to optimization is fair game.
In particular, I am wondering if accessing the elements of a list usingpop(0) somehow slows things down. I am doing this because I am not aware of a good way to peek into the last for the last element.
Note: The code works on both Python 3 and 2.
import mathdef merge(left, right): merge_result = [] while (len(left) > 0 and len(right) > 0): if left[0] > right[0]: merge_result.append(right.pop(0)) else: merge_result.append(left.pop(0)) while (len(left)): merge_result.append(left.pop(0)) while (len(right)): merge_result.append(right.pop(0)) return merge_resultdef merge_sort(array_to_be_sorted): if len(array_to_be_sorted) < 2: return array_to_be_sorted middle = math.floor(len(array_to_be_sorted) / 2) left = array_to_be_sorted[0:middle] right = array_to_be_sorted[middle:] return merge(merge_sort(left),merge_sort(right))2 Answers2
.pop(0) would remove the first element from a list - this is a \$O(n)\$ operation since everything after must "shift" (Time Complexity reference).
I would not pop fromleft andright subarrays and instead keep track of indexes in both theleft andright lists:
def merge(left, right): merge_result = [] left_index = right_index = 0 while left_index < len(left) and right_index < len(right): if left[left_index] > right[right_index]: merge_result.append(right[right_index]) right_index += 1 else: merge_result.append(left[left_index]) left_index += 1 merge_result += left[left_index:] merge_result += right[right_index:] return merge_resultAnd, if you are preparing for an interview, make sure to write this function from memory couple times - it'll persist in your memory pretty soon - focus on the pattern (using left and right indexes for merging), don't try to remember the code line-by-line.
General issues
Here
while (len(left) > 0 and len(right) > 0): ....you can write just
while len(left) > 0 and len(right) > 0: ....or even:
while left and right: ...Also,
while (len(right)): merge_result.append(right.pop(0))above the second line is overintended. It should be
while len(right): merge_result.append(right.pop(0))Here,
middle = math.floor(len(array_to_be_sorted) / 2)just do
middle = len(array_to_be_sorted) // 2Performance
This is my primary concern. Your implementation works correctly, but, alas, it's slow:
OP's mergesort in 4356 milliseconds.coderodde's mergesort in 700 milliseconds.Native sort in 33 milliseconds.OP's mergesort agrees with native sort: Truecoderodde's mergesort agrees with native sort: TrueAs you can see, the native sort routine is unbeatable since it is implemented in native machine code (apart from being Timsort (as far as I know)). What comes to routine I provided, it runs in\$\Theta(n \log n)\$ time using a swapping array pattern: it creates an auxiliary array with the same contents as the actual array to sort and keeps alternating between the two. The idea is that, when merging, we do merge from one array to another, which avoids creating new smaller arrays to hold the left and right parts of the input range. All in all, I had this in mind:
import mathimport randomimport timedef millis(): return int(round(time.time() * 1000))def op_merge(left, right): merge_result = [] while left and right: if left[0] > right[0]: merge_result.append(right.pop(0)) else: merge_result.append(left.pop(0)) while len(left): merge_result.append(left.pop(0)) while len(right): merge_result.append(right.pop(0)) return merge_resultdef op_merge_sort(array_to_be_sorted): if len(array_to_be_sorted) < 2: return array_to_be_sorted middle = math.floor(len(array_to_be_sorted) / 2) left = array_to_be_sorted[0:middle] right = array_to_be_sorted[middle:] return op_merge(op_merge_sort(left), op_merge_sort(right))def coderodde_merge(source_array, target_array, left_start_index, right_start_index, right_end_index): left_end_index = right_start_index # Just rename: left_index = left_start_index right_index = right_start_index target_array_index = left_start_index while left_index < left_end_index and right_index < right_end_index: if source_array[left_index] > source_array[right_index]: target_array[target_array_index] = source_array[right_index] right_index += 1 else: target_array[target_array_index] = source_array[left_index] left_index += 1 target_array_index += 1 while left_index < left_end_index: target_array[target_array_index] = source_array[left_index] target_array_index += 1 left_index += 1 while right_index < right_end_index: target_array[target_array_index] = source_array[right_index] target_array_index += 1 right_index += 1def coderodde_merge_sort_impl(source_array, target_array, start_index, end_index): range_len = end_index - start_index if range_len < 2: return middle_index = start_index + range_len // 2 coderodde_merge_sort_impl(target_array, source_array, start_index, middle_index) coderodde_merge_sort_impl(target_array, source_array, middle_index, end_index) coderodde_merge(source_array, target_array, start_index, middle_index, end_index)def coderodde_merge_sort(array): aux_array = array[:] coderodde_merge_sort_impl(aux_array, array, 0, len(array))def benchmark_op_mergesort(): start_time = millis() arr = op_merge_sort(array1) end_time = millis() print("OP's mergesort in", end_time - start_time, "milliseconds.") return arrdef benchmark_coderodde_mergesort(): start_time = millis() coderodde_merge_sort(array2) end_time = millis() print("coderodde's mergesort in", end_time - start_time, "milliseconds.")def benchmark_native_sort(): start_time = millis() array3.sort() end_time = millis() print("Native sort in", end_time - start_time, "milliseconds.")def arrays_are_same(arr1, arr2): if len(arr1) != len(arr2): return False for i in range(len(arr1)): if arr1[i] != arr2[i]: return False return Truearray1 = []for i in range(200_000): array1.append(random.randint(-1_000_000, 2_000_000))array2 = array1[:]array3 = array1[:]array1 = benchmark_op_mergesort()benchmark_coderodde_mergesort()benchmark_native_sort()print("OP's mergesort agrees with native sort:", arrays_are_same(array1, array3))print("coderodde's mergesort agrees with native sort:", arrays_are_same(array2, array3))You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.
