Merge sort Python implementation

Question 1

Is this efficient?

"""merge sort algorithm in python"""def merge_sort(array):    if len(array) == 1 or len(array) < 1:  # base case        return array    int_mid = len(array) / 2    left = merge_sort(array[:int_mid])    right = merge_sort(array[int_mid:])    return merge_arrays(left, right)def merge_arrays(left, right):    left_index = 0    right_index = 0    new_array = []    size = (len(left) + len(right))    for i in range(size):        if left[left_index] <= right[right_index]:            new_array.append(left[left_index])            left_index += 1        else:            new_array.append(right[right_index])            right_index += 1        if right_index >= len(right):            for i in range(left_index, len(left)):                new_array.append(left[i])            break        elif left_index >= len(left):            for i in range(right_index, len(right)):                new_array.append(right[i])            break    return new_arrayif __name__ == '__main__':    print merge_sort([8, 5, 3, 4, 6])    print merge_sort([10, 9, 8, 7, 6, 5, 4])    print merge_sort([1, 2, 3, 4, 5])    print merge_sort(['b', 'c', 'a'])    print merge_sort([54, 26, 93, 17, 77, 31, 44, 55, 20])    print merge_sort([])

Question 2

If you're using python2 you may want to switch theranges toxrange. Probably will speed things up a bit.

Question 3

Code

What comes to yourmerge_sort, you can rewrite the very first check succintly as follows:

if len(array) < 2:  # base case

Algorithm

You could try something like this:

def coderodde_mergesort_impl(source,                        target,                        source_offset,                        target_offset,                        range_length):    if range_length < 2:        return    half_range_length = range_length // 2    coderodde_mergesort_impl(target,                             source,                             target_offset,                             source_offset,                             half_range_length)    coderodde_mergesort_impl(target,                             source,                             target_offset + half_range_length,                             source_offset + half_range_length,                             range_length - half_range_length)    left_run_index = source_offset    right_run_index = source_offset + half_range_length    left_run_bound = right_run_index    right_run_bound = source_offset + range_length    target_index = target_offset    while left_run_index < left_run_bound and right_run_index < right_run_bound:        if source[right_run_index] < source[left_run_index]:            target[target_index] = source[right_run_index]            right_run_index += 1        else:            target[target_index] = source[left_run_index]            left_run_index += 1        target_index += 1    while left_run_index < left_run_bound:        target[target_index] = source[left_run_index]        target_index += 1        left_run_index += 1    while right_run_index < right_run_bound:        target[target_index] = source[right_run_index]        target_index += 1        right_run_index += 1def coderodde_mergesort_ext(array, from_index, to_index):    range_length = to_index - from_index    aux = []    i = from_index    while i < to_index:        aux.append(array[i])        i += 1    coderodde_mergesort_impl(aux, array, 0, from_index, range_length)def coderodde_mergesort(array):    coderodde_mergesort_ext(array, 0, len(array))

The above gives me the following performance figures:

Jaime_mergesort in 18399 milliseconds.coderodde_mergesort in 16601 milliseconds.op_merge_sort in 22787 milliseconds.

Hope that helps.

Question 4

"/" is fine because I am using python 2.7.

Question 5

@Orestis I updated the answer.

Question 6

No, it's not very efficient... There are operations you typically want to minimize if you want a performing solution, and memory allocation is one of the top ones. But yourmerge_arrays function is creating a new list every time it gets called, which is not a good thing. An efficient implementation of mergesort will do the sorting in-place (with a copy of the list before if the original has to be kept unchanged), and will reuse the merge buffer for all merging operations.

def mergesort(array, left=0, right=None):    if right is None:        right = len(array)    if right - left <= 1:        return    mid = left + (right - left) // 2    mergesort(array, left, mid)    mergesort(array, mid, right)    merge(array, left, mid, right)def merge(array, left, mid, right):    buffer = array[left:mid]    read_left = 0    read_right = mid    write = left    while read_left < len(buffer) and read_right < right:        if array[read_right] < buffer[read_left]:            array[write] = array[read_right]            read_right += 1        else:            array[write] = buffer[read_left]            read_left += 1        write += 1    while read_left < len(buffer):        array[write] = buffer[read_left]        read_left += 1        write += 1

Notice how the left part is the only one copied out into the buffer, as that is all that is needed. Notice also the loop structure, which is substantially different from yours. It is a matter of taste, but the single while loop for the merging, with an extra loop to copy items that could be left in the buffer (notice that any item left on the right side is already in the right place) seems to be preferred in most production implementations I've seen, and I find it more clear, but YMMV.

The above code is creating a new buffer for every merge operation, which makes the code a little simpler. A possible way of reusing a buffer is to have it be a static variable ofmerge. Static variables are a little funny in Python, but it could look something like this:

def merge(array, left, mid, right):    left_len = mid - left    if len(merge.buffer) < left_len:        merge.buffer.extend([None] * (len(buffer) - left_len))    for write, read in enumerate(range(left, mid)):        merge.buffer[write] = array[read]    read_left = 0    read_right = mid    write = left    while read_left < left_len and read_right < right:        if array[read_right] < buffer[read_left]:            array[write] = array[read_right]            read_right += 1        else:            array[write] = buffer[read_left]            read_left += 1        write += 1    while read_left < len(buffer):        array[write] = buffer[read_left]        read_left += 1        write += 1merge.buffer = []

I wouldn't be surprised though if, this being Python, the explicit looping took longer than the previous version. But this would be preferred in a lower level language.

Another possible optimization is to scan the left half of the array prior to the copying into the buffer, to avoid moving around items that are already in the right place:

def merge(array, left, mid, right):    while array[left] < array[mid]:        left += 1    ...

and the rest of the code would stay the same.

Question 7

From what I know, correct me if I am wrong, in-place merge sort is of theoretical interest and in practice might me slower than then typical implementation. Anyway, thanks for your answer, it is useful.

Question 8

That would be a bufferless mergesort that you have in mind. Doesn't apply to this code, which is how most real world mergesort are implemented.

coderodde 32.3k15 gold badges79 silver badges205 bronze badges · Accepted Answer · 2016-06-18 12:16:00Z

Code

What comes to yourmerge_sort, you can rewrite the very first check succintly as follows:

if len(array) < 2:  # base case

Algorithm

You could try something like this:

def coderodde_mergesort_impl(source,                        target,                        source_offset,                        target_offset,                        range_length):    if range_length < 2:        return    half_range_length = range_length // 2    coderodde_mergesort_impl(target,                             source,                             target_offset,                             source_offset,                             half_range_length)    coderodde_mergesort_impl(target,                             source,                             target_offset + half_range_length,                             source_offset + half_range_length,                             range_length - half_range_length)    left_run_index = source_offset    right_run_index = source_offset + half_range_length    left_run_bound = right_run_index    right_run_bound = source_offset + range_length    target_index = target_offset    while left_run_index < left_run_bound and right_run_index < right_run_bound:        if source[right_run_index] < source[left_run_index]:            target[target_index] = source[right_run_index]            right_run_index += 1        else:            target[target_index] = source[left_run_index]            left_run_index += 1        target_index += 1    while left_run_index < left_run_bound:        target[target_index] = source[left_run_index]        target_index += 1        left_run_index += 1    while right_run_index < right_run_bound:        target[target_index] = source[right_run_index]        target_index += 1        right_run_index += 1def coderodde_mergesort_ext(array, from_index, to_index):    range_length = to_index - from_index    aux = []    i = from_index    while i < to_index:        aux.append(array[i])        i += 1    coderodde_mergesort_impl(aux, array, 0, from_index, range_length)def coderodde_mergesort(array):    coderodde_mergesort_ext(array, 0, len(array))

The above gives me the following performance figures:

Jaime_mergesort in 18399 milliseconds.coderodde_mergesort in 16601 milliseconds.op_merge_sort in 22787 milliseconds.

Hope that helps.

Movatterモバイル変換

Stack Exchange Network

Merge sort Python implementation

2 Answers2

You mustlog in to answer this question.

Related

Hot Network Questions

Subscribe to RSS