Is this efficient?
"""merge sort algorithm in python"""def merge_sort(array): if len(array) == 1 or len(array) < 1: # base case return array int_mid = len(array) / 2 left = merge_sort(array[:int_mid]) right = merge_sort(array[int_mid:]) return merge_arrays(left, right)def merge_arrays(left, right): left_index = 0 right_index = 0 new_array = [] size = (len(left) + len(right)) for i in range(size): if left[left_index] <= right[right_index]: new_array.append(left[left_index]) left_index += 1 else: new_array.append(right[right_index]) right_index += 1 if right_index >= len(right): for i in range(left_index, len(left)): new_array.append(left[i]) break elif left_index >= len(left): for i in range(right_index, len(right)): new_array.append(right[i]) break return new_arrayif __name__ == '__main__': print merge_sort([8, 5, 3, 4, 6]) print merge_sort([10, 9, 8, 7, 6, 5, 4]) print merge_sort([1, 2, 3, 4, 5]) print merge_sort(['b', 'c', 'a']) print merge_sort([54, 26, 93, 17, 77, 31, 44, 55, 20]) print merge_sort([])- \$\begingroup\$If you're using python2 you may want to switch the
ranges toxrange. Probably will speed things up a bit.\$\endgroup\$Dair– Dair2016-06-18 02:32:02 +00:00CommentedJun 18, 2016 at 2:32
2 Answers2
Code
What comes to yourmerge_sort, you can rewrite the very first check succintly as follows:
if len(array) < 2: # base caseAlgorithm
You could try something like this:
def coderodde_mergesort_impl(source, target, source_offset, target_offset, range_length): if range_length < 2: return half_range_length = range_length // 2 coderodde_mergesort_impl(target, source, target_offset, source_offset, half_range_length) coderodde_mergesort_impl(target, source, target_offset + half_range_length, source_offset + half_range_length, range_length - half_range_length) left_run_index = source_offset right_run_index = source_offset + half_range_length left_run_bound = right_run_index right_run_bound = source_offset + range_length target_index = target_offset while left_run_index < left_run_bound and right_run_index < right_run_bound: if source[right_run_index] < source[left_run_index]: target[target_index] = source[right_run_index] right_run_index += 1 else: target[target_index] = source[left_run_index] left_run_index += 1 target_index += 1 while left_run_index < left_run_bound: target[target_index] = source[left_run_index] target_index += 1 left_run_index += 1 while right_run_index < right_run_bound: target[target_index] = source[right_run_index] target_index += 1 right_run_index += 1def coderodde_mergesort_ext(array, from_index, to_index): range_length = to_index - from_index aux = [] i = from_index while i < to_index: aux.append(array[i]) i += 1 coderodde_mergesort_impl(aux, array, 0, from_index, range_length)def coderodde_mergesort(array): coderodde_mergesort_ext(array, 0, len(array))The above gives me the following performance figures:
Jaime_mergesort in 18399 milliseconds.coderodde_mergesort in 16601 milliseconds.op_merge_sort in 22787 milliseconds.
Hope that helps.
No, it's not very efficient... There are operations you typically want to minimize if you want a performing solution, and memory allocation is one of the top ones. But yourmerge_arrays function is creating a new list every time it gets called, which is not a good thing. An efficient implementation of mergesort will do the sorting in-place (with a copy of the list before if the original has to be kept unchanged), and will reuse the merge buffer for all merging operations.
def mergesort(array, left=0, right=None): if right is None: right = len(array) if right - left <= 1: return mid = left + (right - left) // 2 mergesort(array, left, mid) mergesort(array, mid, right) merge(array, left, mid, right)def merge(array, left, mid, right): buffer = array[left:mid] read_left = 0 read_right = mid write = left while read_left < len(buffer) and read_right < right: if array[read_right] < buffer[read_left]: array[write] = array[read_right] read_right += 1 else: array[write] = buffer[read_left] read_left += 1 write += 1 while read_left < len(buffer): array[write] = buffer[read_left] read_left += 1 write += 1Notice how the left part is the only one copied out into the buffer, as that is all that is needed. Notice also the loop structure, which is substantially different from yours. It is a matter of taste, but the single while loop for the merging, with an extra loop to copy items that could be left in the buffer (notice that any item left on the right side is already in the right place) seems to be preferred in most production implementations I've seen, and I find it more clear, but YMMV.
The above code is creating a new buffer for every merge operation, which makes the code a little simpler. A possible way of reusing a buffer is to have it be a static variable ofmerge. Static variables are a little funny in Python, but it could look something like this:
def merge(array, left, mid, right): left_len = mid - left if len(merge.buffer) < left_len: merge.buffer.extend([None] * (len(buffer) - left_len)) for write, read in enumerate(range(left, mid)): merge.buffer[write] = array[read] read_left = 0 read_right = mid write = left while read_left < left_len and read_right < right: if array[read_right] < buffer[read_left]: array[write] = array[read_right] read_right += 1 else: array[write] = buffer[read_left] read_left += 1 write += 1 while read_left < len(buffer): array[write] = buffer[read_left] read_left += 1 write += 1merge.buffer = []I wouldn't be surprised though if, this being Python, the explicit looping took longer than the previous version. But this would be preferred in a lower level language.
Another possible optimization is to scan the left half of the array prior to the copying into the buffer, to avoid moving around items that are already in the right place:
def merge(array, left, mid, right): while array[left] < array[mid]: left += 1 ...and the rest of the code would stay the same.
- \$\begingroup\$From what I know, correct me if I am wrong, in-place merge sort is of theoretical interest and in practice might me slower than then typical implementation. Anyway, thanks for your answer, it is useful.\$\endgroup\$Orestis– Orestis2016-06-18 08:30:16 +00:00CommentedJun 18, 2016 at 8:30
- 1\$\begingroup\$That would be a bufferless mergesort that you have in mind. Doesn't apply to this code, which is how most real world mergesort are implemented.\$\endgroup\$Jaime– Jaime2016-06-18 08:33:35 +00:00CommentedJun 18, 2016 at 8:33
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.