2
\$\begingroup\$

I read the following question:Searching an element in a sorted array and I thought that I could give it a try in Python.

Given a sorted list of integers and an integer. Return the (index) bounds of the sequence of this element in the list.

Example:

l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]0 not found in list1:(0, 0)2:(1, 2)3:(3, 5)4:(6, 11)5 not found in list6:(12, 12)7:(13, 13)8:(14, 14)9:(15, 18)

Here is my program (using Python 3).It uses a dichotomic search and returns the bounds to help the search of the start and end of the sequence.

def main():    l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]    print(l)    for i in range(10):        try:            print(find_sequence(i, l))        except Exception as e:            print(str(e))def find_sequence(x, l):    """Return a tuple (begin, end) containing the index bounds of the sequence of x"""    left, found, right = dichotomic_search(x, l)    begin = outside_bound(x, l, left, found)    end = outside_bound(x, l, right, found)    return begin, enddef outside_bound(x, l, outside, inside):    """Return the outside bound of the sequence"""    if l[outside] == x:        return outside    middle = -1    previous_middle = -2    while middle != previous_middle:        previous_middle = middle        middle = (outside + inside) // 2        if l[middle] == x:            inside = middle        else:            outside = middle    return insidedef dichotomic_search(x, l):    """Return a tuple of indexes (left, found, right)    left: leftmost index where x might be found    found: index where x is    right: rightmost index where x might be found    """    left = 0    right = len(l) - 1    if l[left] > x or l[right] < x:        raise Exception(str(x) + ' not found in list')    if l[left] == x:        return left, left, right    if l[right] == x:        return left+1, right, right # we know that l[left]!=x    while left < right:        middle = (left + right) // 2        if l[middle] == x:            return left, middle, right        elif l[middle] < x:            left = middle + 1 # to prevent fixed point        elif l[middle] > x:            right = middle # impossible to do -1 because of the integer division    raise Exception(str(x) + ' not found in list')if __name__ == "__main__":    main()

I'm not so fond of themiddle != previous_middle, but I didn't find a more elegant way (yet).

askedOct 24, 2015 at 11:11
oliverpool's user avatar
\$\endgroup\$
4
  • \$\begingroup\$Do you know aboutbisect (i.e. did you implement that search yourself on purpose)?\$\endgroup\$CommentedOct 24, 2015 at 11:53
  • \$\begingroup\$I implemented it on purpose (and I think using bisect the search for the second extremity of the sequence can't be optimized - whereas I'm reusing the bounds of my dichotomic search)\$\endgroup\$CommentedOct 24, 2015 at 12:17
  • \$\begingroup\$Why not justbisect_left(l, x), bisect_right(l, x) - 1? The worry about reusing the bounds seems misplaced to me. In the average case you're searching an array that's half the size—but it's a binary search, so it only saves you one iteration. Whereas thebisect module has a fast C implementation.\$\endgroup\$CommentedNov 1, 2015 at 19:19
  • \$\begingroup\$"it only saves you one iteration" only if you search the element. If you search the bounds, it might turn out to be worse. For instance, search for1:{0, 1, 2, 2, 2, 2, 2, 2, 2}.\$\endgroup\$CommentedNov 1, 2015 at 21:16

3 Answers3

1
\$\begingroup\$

I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:

def find_indexes(xs, x)  start = find_first_index(xs, x)  return ((start, find_last_index(xs, x, start=start)) if start else None)

For a simple implementation offind_first_index andfind_last_index, checkbisect.bisect_left andbisect.bisect_right source codehere.

answeredOct 24, 2015 at 15:09
tokland's user avatar
\$\endgroup\$
4
  • \$\begingroup\$My approach might be complicated, but it aims at being efficient. When you callfind_last_index, The first part of the work might have already been done by thefind_first_index (the part "finding the index"). What I could do though, is "improve"bisect.bisect_left to remember the first time the element was found (to use it as a bound forfind_last_index)\$\endgroup\$CommentedOct 25, 2015 at 8:12
  • \$\begingroup\$Infind_last_index, one could usefirst_index as a lower bound for instance!\$\endgroup\$CommentedOct 25, 2015 at 8:14
  • \$\begingroup\$Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument.\$\endgroup\$CommentedOct 25, 2015 at 10:07
  • \$\begingroup\$(you could use the same name as bisect:lo)\$\endgroup\$CommentedOct 25, 2015 at 10:29
1
\$\begingroup\$

Following @tokland suggestion, here is what I came with:Thefind_left_bound returns some bounds that can be used byfind_right_bound to reduce the search area.

def main():    l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9]    print(l)    for i in range(10):        try:            print(str(i) + ': ' + str(find_sequence(l, i)))        except Exception as e:            print(str(e))def find_sequence(l, x):    """Return a tuple (begin, end) containing the index bounds of the sequence of x"""    begin, (left, right) = find_left_bound(l, x)    if l[begin] != x:        raise Exception(str(x) + ' not found in list')    end = find_right_bound(l, x, left, right)    return begin, enddef find_left_bound(l, x):    left = 0    right = len(l)    lbound = left # bounds for the 'right_bound' search    rbound = right    while left < right:        middle = (left + right) // 2        if l[middle] < x:            left = middle + 1        else:            right = middle            # If it's relevant, improve the bounds for the right search            if l[middle] > x:                rbound = middle            elif middle > lbound:                lbound = middle    return left, (lbound, rbound)def find_right_bound(l, x, left, right):    while left < right:        middle = (left + right) // 2        if l[middle] > x:            right = middle        else:            left = middle + 1    return left - 1if __name__ == "__main__":    main()

I should renamefind_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...

I don't see other ways to improve the efficiency.

answeredOct 25, 2015 at 13:20
oliverpool's user avatar
\$\endgroup\$
1
\$\begingroup\$

I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's aValueError, as the user has supplied values that are invalid.

raise ValueError(str(x) + ' not found in list')

This has the added bonus that you don't need to catch other, unrelated errors with yourtry except inmain.

answeredOct 28, 2015 at 15:44
SuperBiasedMan's user avatar
\$\endgroup\$

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.