Locating a sequence in a sorted array

Question 1

I read the following question:Searching an element in a sorted array and I thought that I could give it a try in Python.

Given a sorted list of integers and an integer. Return the (index) bounds of the sequence of this element in the list.

Example:

l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]0 not found in list1:(0, 0)2:(1, 2)3:(3, 5)4:(6, 11)5 not found in list6:(12, 12)7:(13, 13)8:(14, 14)9:(15, 18)

Here is my program (using Python 3).It uses a dichotomic search and returns the bounds to help the search of the start and end of the sequence.

def main():    l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]    print(l)    for i in range(10):        try:            print(find_sequence(i, l))        except Exception as e:            print(str(e))def find_sequence(x, l):    """Return a tuple (begin, end) containing the index bounds of the sequence of x"""    left, found, right = dichotomic_search(x, l)    begin = outside_bound(x, l, left, found)    end = outside_bound(x, l, right, found)    return begin, enddef outside_bound(x, l, outside, inside):    """Return the outside bound of the sequence"""    if l[outside] == x:        return outside    middle = -1    previous_middle = -2    while middle != previous_middle:        previous_middle = middle        middle = (outside + inside) // 2        if l[middle] == x:            inside = middle        else:            outside = middle    return insidedef dichotomic_search(x, l):    """Return a tuple of indexes (left, found, right)    left: leftmost index where x might be found    found: index where x is    right: rightmost index where x might be found    """    left = 0    right = len(l) - 1    if l[left] > x or l[right] < x:        raise Exception(str(x) + ' not found in list')    if l[left] == x:        return left, left, right    if l[right] == x:        return left+1, right, right # we know that l[left]!=x    while left < right:        middle = (left + right) // 2        if l[middle] == x:            return left, middle, right        elif l[middle] < x:            left = middle + 1 # to prevent fixed point        elif l[middle] > x:            right = middle # impossible to do -1 because of the integer division    raise Exception(str(x) + ' not found in list')if __name__ == "__main__":    main()

I'm not so fond of themiddle != previous_middle, but I didn't find a more elegant way (yet).

Question 2

Do you know aboutbisect (i.e. did you implement that search yourself on purpose)?

Question 3

I implemented it on purpose (and I think using bisect the search for the second extremity of the sequence can't be optimized - whereas I'm reusing the bounds of my dichotomic search)

Question 4

Why not justbisect_left(l, x), bisect_right(l, x) - 1? The worry about reusing the bounds seems misplaced to me. In the average case you're searching an array that's half the size—but it's a binary search, so it only saves you one iteration. Whereas thebisect module has a fast C implementation.

Question 5

"it only saves you one iteration" only if you search the element. If you search the bounds, it might turn out to be worse. For instance, search for1:{0, 1, 2, 2, 2, 2, 2, 2, 2}.

Question 6

I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:

def find_indexes(xs, x)  start = find_first_index(xs, x)  return ((start, find_last_index(xs, x, start=start)) if start else None)

For a simple implementation offind_first_index andfind_last_index, checkbisect.bisect_left andbisect.bisect_right source codehere.

Question 7

My approach might be complicated, but it aims at being efficient. When you callfind_last_index, The first part of the work might have already been done by thefind_first_index (the part "finding the index"). What I could do though, is "improve"bisect.bisect_left to remember the first time the element was found (to use it as a bound forfind_last_index)

Question 8

Infind_last_index, one could usefirst_index as a lower bound for instance!

Question 9

Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument.

Question 10

(you could use the same name as bisect:lo)

Question 11

Following @tokland suggestion, here is what I came with:Thefind_left_bound returns some bounds that can be used byfind_right_bound to reduce the search area.

def main():    l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9]    print(l)    for i in range(10):        try:            print(str(i) + ': ' + str(find_sequence(l, i)))        except Exception as e:            print(str(e))def find_sequence(l, x):    """Return a tuple (begin, end) containing the index bounds of the sequence of x"""    begin, (left, right) = find_left_bound(l, x)    if l[begin] != x:        raise Exception(str(x) + ' not found in list')    end = find_right_bound(l, x, left, right)    return begin, enddef find_left_bound(l, x):    left = 0    right = len(l)    lbound = left # bounds for the 'right_bound' search    rbound = right    while left < right:        middle = (left + right) // 2        if l[middle] < x:            left = middle + 1        else:            right = middle            # If it's relevant, improve the bounds for the right search            if l[middle] > x:                rbound = middle            elif middle > lbound:                lbound = middle    return left, (lbound, rbound)def find_right_bound(l, x, left, right):    while left < right:        middle = (left + right) // 2        if l[middle] > x:            right = middle        else:            left = middle + 1    return left - 1if __name__ == "__main__":    main()

I should renamefind_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...

I don't see other ways to improve the efficiency.

Question 12

I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's aValueError, as the user has supplied values that are invalid.

raise ValueError(str(x) + ' not found in list')

This has the added bonus that you don't need to catch other, unrelated errors with yourtry except inmain.

tokland 11.2k1 gold badge21 silver badges26 bronze badges · Answer 1 · 2015-10-25 10:06:01Z

1

\$\begingroup\$

I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:

def find_indexes(xs, x)  start = find_first_index(xs, x)  return ((start, find_last_index(xs, x, start=start)) if start else None)

For a simple implementation offind_first_index andfind_last_index, checkbisect.bisect_left andbisect.bisect_right source codehere.

Share

editedOct 25, 2015 at 10:06

answeredOct 24, 2015 at 15:09

tokland

11.2k1 gold badge21 silver badges26 bronze badges

\$\endgroup\$

4

\$\begingroup\$My approach might be complicated, but it aims at being efficient. When you callfind_last_index, The first part of the work might have already been done by thefind_first_index (the part "finding the index"). What I could do though, is "improve"bisect.bisect_left to remember the first time the element was found (to use it as a bound forfind_last_index)\$\endgroup\$
oliverpool
– oliverpool
2015-10-25 08:12:58 +00:00
CommentedOct 25, 2015 at 8:12
\$\begingroup\$Infind_last_index, one could usefirst_index as a lower bound for instance!\$\endgroup\$
oliverpool
– oliverpool
2015-10-25 08:14:50 +00:00
CommentedOct 25, 2015 at 8:14
\$\begingroup\$Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument.\$\endgroup\$
tokland
– tokland
2015-10-25 10:07:05 +00:00
CommentedOct 25, 2015 at 10:07
\$\begingroup\$(you could use the same name as bisect:lo)\$\endgroup\$
oliverpool
– oliverpool
2015-10-25 10:29:28 +00:00
CommentedOct 25, 2015 at 10:29

Add a comment |

oliverpool 1,92212 silver badges28 bronze badges · Answer 2 · 2015-10-25 13:20:09Z

Following @tokland suggestion, here is what I came with:Thefind_left_bound returns some bounds that can be used byfind_right_bound to reduce the search area.

def main():    l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9]    print(l)    for i in range(10):        try:            print(str(i) + ': ' + str(find_sequence(l, i)))        except Exception as e:            print(str(e))def find_sequence(l, x):    """Return a tuple (begin, end) containing the index bounds of the sequence of x"""    begin, (left, right) = find_left_bound(l, x)    if l[begin] != x:        raise Exception(str(x) + ' not found in list')    end = find_right_bound(l, x, left, right)    return begin, enddef find_left_bound(l, x):    left = 0    right = len(l)    lbound = left # bounds for the 'right_bound' search    rbound = right    while left < right:        middle = (left + right) // 2        if l[middle] < x:            left = middle + 1        else:            right = middle            # If it's relevant, improve the bounds for the right search            if l[middle] > x:                rbound = middle            elif middle > lbound:                lbound = middle    return left, (lbound, rbound)def find_right_bound(l, x, left, right):    while left < right:        middle = (left + right) // 2        if l[middle] > x:            right = middle        else:            left = middle + 1    return left - 1if __name__ == "__main__":    main()

I should renamefind_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...

I don't see other ways to improve the efficiency.

SuperBiasedMan 13.5k5 gold badges37 silver badges62 bronze badges · Answer 3 · 2015-10-28 15:44:10Z

I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's aValueError, as the user has supplied values that are invalid.

raise ValueError(str(x) + ' not found in list')

This has the added bonus that you don't need to catch other, unrelated errors with yourtry except inmain.

Movatterモバイル変換

Stack Exchange Network

Locating a sequence in a sorted array

3 Answers3

You mustlog in to answer this question.

Linked

Related

Hot Network Questions

Subscribe to RSS