I read the following question:Searching an element in a sorted array and I thought that I could give it a try in Python.
Given a sorted list of integers and an integer. Return the (index) bounds of the sequence of this element in the list.
Example:
l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]0 not found in list1:(0, 0)2:(1, 2)3:(3, 5)4:(6, 11)5 not found in list6:(12, 12)7:(13, 13)8:(14, 14)9:(15, 18)Here is my program (using Python 3).It uses a dichotomic search and returns the bounds to help the search of the start and end of the sequence.
def main(): l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9] print(l) for i in range(10): try: print(find_sequence(i, l)) except Exception as e: print(str(e))def find_sequence(x, l): """Return a tuple (begin, end) containing the index bounds of the sequence of x""" left, found, right = dichotomic_search(x, l) begin = outside_bound(x, l, left, found) end = outside_bound(x, l, right, found) return begin, enddef outside_bound(x, l, outside, inside): """Return the outside bound of the sequence""" if l[outside] == x: return outside middle = -1 previous_middle = -2 while middle != previous_middle: previous_middle = middle middle = (outside + inside) // 2 if l[middle] == x: inside = middle else: outside = middle return insidedef dichotomic_search(x, l): """Return a tuple of indexes (left, found, right) left: leftmost index where x might be found found: index where x is right: rightmost index where x might be found """ left = 0 right = len(l) - 1 if l[left] > x or l[right] < x: raise Exception(str(x) + ' not found in list') if l[left] == x: return left, left, right if l[right] == x: return left+1, right, right # we know that l[left]!=x while left < right: middle = (left + right) // 2 if l[middle] == x: return left, middle, right elif l[middle] < x: left = middle + 1 # to prevent fixed point elif l[middle] > x: right = middle # impossible to do -1 because of the integer division raise Exception(str(x) + ' not found in list')if __name__ == "__main__": main()I'm not so fond of themiddle != previous_middle, but I didn't find a more elegant way (yet).
- \$\begingroup\$Do you know about
bisect(i.e. did you implement that search yourself on purpose)?\$\endgroup\$jonrsharpe– jonrsharpe2015-10-24 11:53:22 +00:00CommentedOct 24, 2015 at 11:53 - \$\begingroup\$I implemented it on purpose (and I think using bisect the search for the second extremity of the sequence can't be optimized - whereas I'm reusing the bounds of my dichotomic search)\$\endgroup\$oliverpool– oliverpool2015-10-24 12:17:01 +00:00CommentedOct 24, 2015 at 12:17
- \$\begingroup\$Why not just
bisect_left(l, x), bisect_right(l, x) - 1? The worry about reusing the bounds seems misplaced to me. In the average case you're searching an array that's half the size—but it's a binary search, so it only saves you one iteration. Whereas thebisectmodule has a fast C implementation.\$\endgroup\$Gareth Rees– Gareth Rees2015-11-01 19:19:47 +00:00CommentedNov 1, 2015 at 19:19 - \$\begingroup\$"it only saves you one iteration" only if you search the element. If you search the bounds, it might turn out to be worse. For instance, search for
1:{0, 1, 2, 2, 2, 2, 2, 2, 2}.\$\endgroup\$oliverpool– oliverpool2015-11-01 21:16:54 +00:00CommentedNov 1, 2015 at 21:16
3 Answers3
I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:
def find_indexes(xs, x) start = find_first_index(xs, x) return ((start, find_last_index(xs, x, start=start)) if start else None)For a simple implementation offind_first_index andfind_last_index, checkbisect.bisect_left andbisect.bisect_right source codehere.
- \$\begingroup\$My approach might be complicated, but it aims at being efficient. When you call
find_last_index, The first part of the work might have already been done by thefind_first_index(the part "finding the index"). What I could do though, is "improve"bisect.bisect_leftto remember the first time the element was found (to use it as a bound forfind_last_index)\$\endgroup\$oliverpool– oliverpool2015-10-25 08:12:58 +00:00CommentedOct 25, 2015 at 8:12 - \$\begingroup\$In
find_last_index, one could usefirst_indexas a lower bound for instance!\$\endgroup\$oliverpool– oliverpool2015-10-25 08:14:50 +00:00CommentedOct 25, 2015 at 8:14 - \$\begingroup\$Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument.\$\endgroup\$tokland– tokland2015-10-25 10:07:05 +00:00CommentedOct 25, 2015 at 10:07
- \$\begingroup\$(you could use the same name as bisect:
lo)\$\endgroup\$oliverpool– oliverpool2015-10-25 10:29:28 +00:00CommentedOct 25, 2015 at 10:29
Following @tokland suggestion, here is what I came with:Thefind_left_bound returns some bounds that can be used byfind_right_bound to reduce the search area.
def main(): l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9] print(l) for i in range(10): try: print(str(i) + ': ' + str(find_sequence(l, i))) except Exception as e: print(str(e))def find_sequence(l, x): """Return a tuple (begin, end) containing the index bounds of the sequence of x""" begin, (left, right) = find_left_bound(l, x) if l[begin] != x: raise Exception(str(x) + ' not found in list') end = find_right_bound(l, x, left, right) return begin, enddef find_left_bound(l, x): left = 0 right = len(l) lbound = left # bounds for the 'right_bound' search rbound = right while left < right: middle = (left + right) // 2 if l[middle] < x: left = middle + 1 else: right = middle # If it's relevant, improve the bounds for the right search if l[middle] > x: rbound = middle elif middle > lbound: lbound = middle return left, (lbound, rbound)def find_right_bound(l, x, left, right): while left < right: middle = (left + right) // 2 if l[middle] > x: right = middle else: left = middle + 1 return left - 1if __name__ == "__main__": main()I should renamefind_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...
I don't see other ways to improve the efficiency.
I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's aValueError, as the user has supplied values that are invalid.
raise ValueError(str(x) + ' not found in list')This has the added bonus that you don't need to catch other, unrelated errors with yourtry except inmain.
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.
