Why std::binary_search of std::list Works,

But You Shouldn't Use It!

David Kieras, University of Michigan

Prepared for EECS 381, 1/26/2013

A common error made by beginning users of the Standard Library containers and algorithms is to write code whose run-time complexity (big-O) is a flaming disaster. These facilities were invented by algorithm and data structure fanatics who took big-O really seriously; their goal was to make it possible for you to use "best of breed" data structures and algorithms very easily. They certainly did not intend to help you write awful code easily!

In fact, in most cases, the Standard Library tries to keep you out of trouble by making it inconvenient to use in a really inefficient way. But thanks to the great generality and power of the C++ templates used in the Standard Library, there are some loopholes that allow you to write hopelessly inefficient code as easily as very efficient code. One such loophole is that you can easily write Standard Library code that does a binary search on a linked list, which is so ridiculously inefficient that saying "binary search on a linked list" is actually a geeky programming joke! Linked lists are supposed to be searched linearly! The purpose of this document is to explain why the Standard Library makes telling this joke so easy, and to demonstrate with some run-time comparisons why it is so bad.

The Standard Library allows you to apply the binary_search and lower_bound algorithms to any sorted sequence container, including std::list, and it will produce a correct result. The following works for any sorted sequence container:

    binary_search(container.begin(), container.end(), probe)
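As a minimal sketch of that claim, the same call can be wrapped in a small helper and applied to both a std::vector and a std::list. The helper name contains_sorted and the example function are assumptions made for illustration; they are not part of the Standard Library or of the original notes.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Hypothetical helper (not from the notes): true if probe occurs in a sorted container.
template <class Container, class T>
bool contains_sorted(const Container& container, const T& probe)
{
    // Debug-only check of the precondition that the range is sorted.
    assert(std::is_sorted(container.begin(), container.end()));
    return std::binary_search(container.begin(), container.end(), probe);
}

// Usage sketch: the identical call compiles and gives the same answers for both containers.
inline void contains_sorted_example()
{
    const std::vector<int> v = {1, 3, 5, 7, 9};
    const std::list<int> l(v.begin(), v.end());
    assert(contains_sorted(v, 7) && contains_sorted(l, 7));
    assert(!contains_sorted(v, 4) && !contains_sorted(l, 4));
}
```

Both calls give correct answers; nothing in the source code hints that the std::list version does far more work.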

However, if you look at any normal code for binary search (e.g. as in Kernighan and Ritchie), it is written to use array subscripting. Array subscripting, a so-called random-access mechanism, runs in constant time, regardless of the size of the array or value of the subscript. In contrast, a linked list has the property that the only way you can find a particular node is to start at one end of the list and follow the links from one node to the next, checking them one at a time. Unlike with array subscripting, there is no way to compute the location of a list node directly from its numerical position in the list; it could be anywhere in memory. So how is it that you can do a binary_search on a std::list?
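For comparison, here is a sketch of the conventional array form of binary search, written with subscripts. It is in the spirit of the textbook versions but is not copied from any of them, and the function name array_binary_search is an assumption made for illustration.

```cpp
#include <cassert>

// Classic array binary search using subscripts (illustrative sketch).
// Returns the index of target in the sorted array a[0..n-1], or -1 if absent.
int array_binary_search(const int a[], int n, int target)
{
    assert(n >= 0);
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;   // midpoint computed by arithmetic alone
        if (a[mid] < target)                // a[mid] is a constant-time access
            low = mid + 1;
        else if (a[mid] > target)
            high = mid - 1;
        else
            return mid;
    }
    return -1;
}
```

Every step here depends on a[mid] costing the same no matter what mid is, which is exactly what a linked list cannot offer.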

How binary_search is implemented. Below is a somewhat simplified copy of the Metrowerks Standard Library version of lower_bound; the binary_search algorithm just calls lower_bound and checks the result (other implementations might differ, but only in details).

    template <class ForwardIterator, class T>
    ForwardIterator
    lower_bound(ForwardIterator first, ForwardIterator last, const T& value)
    {
        typedef typename iterator_traits<ForwardIterator>::difference_type difference_type;
        difference_type len = distance(first, last);   // size of the range still to be searched
        while (len > 0)
        {
            ForwardIterator i = first;
            difference_type len2 = len / 2;
            advance(i, len2);                          // move i to the midpoint
            if (*i < value)
            {
                first = ++i;                           // continue in the upper half
                len -= len2 + 1;
            }
            else
                len = len2;                            // continue in the lower half
        }
        return first;
    }
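The remark that binary_search "just calls lower_bound and checks the result" can be sketched as follows. This is an illustration under assumptions, not the Metrowerks source: the name my_binary_search is hypothetical, and the sketch delegates to std::lower_bound.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of binary_search written in terms of lower_bound.
template <class ForwardIterator, class T>
bool my_binary_search(ForwardIterator first, ForwardIterator last, const T& value)
{
    // Debug-only check of the precondition; compiled out when NDEBUG is defined.
    assert(std::is_sorted(first, last));
    ForwardIterator i = std::lower_bound(first, last, value);
    // i refers to the first element that is not less than value (or is last).
    // It is a match only if value is not less than *i either.
    return i != last && !(value < *i);
}
```

Note that the check uses only operator<, so the element type does not need operator== for the search itself.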

First, see how this algorithm is written in terms of iterators, so that it can apply to any sequence container that supports the standard iterator interface. The two input iterators are first and last, marking the beginning and end of the range to be searched. Among the iterator types, input iterators can be iterators pointing into any type of container.

The basic binary search algorithm involves calculating the midpoint of a range of values, and then checking the value at that midpoint. The code does this by calling std::distance, which returns the numerical distance between the first and last iterators. This distance is divided by two, and then std::advance is called to move a copy of the first iterator forward by that amount to get to the midpoint of the range. The distance and advance functions are also function templates that are defined so that they work with iterators into any type of container. Template magic is used to specialize them for different iterator types. For random-access iterators (which behave like pointers or subscripts, supplied by std::vector and std::deque), the definition of distance that is used is:

    template <class RandomAccessIterator>
    inline
    typename iterator_traits<RandomAccessIterator>::difference_type
    __distance(RandomAccessIterator first, RandomAccessIterator last, random_access_iterator_tag)
    {
        return last - first;
    }

The subtraction operator is defined for these iterators because the internal pointers can simply be subtracted to get the distance directly via pointer arithmetic. This is exactly what subtracting the indices would do in the array form of the binary search algorithm.

However, for the more general input iterators, which can only move one step at a time, the definition used to implement distance is:

    template <class InputIterator>
    inline
    typename iterator_traits<InputIterator>::difference_type
    __distance(InputIterator first, InputIterator last, input_iterator_tag)
    {
        typename iterator_traits<InputIterator>::difference_type result = 0;
        for (; first != last; ++first)
            ++result;
        return result;
    }

In other words, compute the distance between two general input iterators by incrementing the first until it equals the last, and count how many increments are required.
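One common way to arrange the "template magic" that picks between two such definitions is tag dispatch on the iterator category. The sketch below assumes that technique; the names my_distance and my_distance_impl are hypothetical, and a real library's internals may be organized differently.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Selected for random-access iterators: a single subtraction, constant time.
template <class RandomAccessIterator>
typename std::iterator_traits<RandomAccessIterator>::difference_type
my_distance_impl(RandomAccessIterator first, RandomAccessIterator last,
                 std::random_access_iterator_tag)
{
    return last - first;
}

// Selected for all weaker iterator categories: count the steps, linear time.
template <class InputIterator>
typename std::iterator_traits<InputIterator>::difference_type
my_distance_impl(InputIterator first, InputIterator last, std::input_iterator_tag)
{
    typename std::iterator_traits<InputIterator>::difference_type result = 0;
    for (; first != last; ++first)
        ++result;
    return result;
}

// Hypothetical front end: the iterator's category tag chooses the overload at compile time.
template <class Iterator>
typename std::iterator_traits<Iterator>::difference_type
my_distance(Iterator first, Iterator last)
{
    typedef typename std::iterator_traits<Iterator>::iterator_category category;
    return my_distance_impl(first, last, category());
}

// Usage sketch: the same answer from both containers, reached by different routes.
inline void my_distance_example()
{
    const std::vector<int> v(10, 0);
    const std::list<int> l(10, 0);
    assert(my_distance(v.begin(), v.end()) == 10);   // one subtraction
    assert(my_distance(l.begin(), l.end()) == 10);   // ten increments
}
```

The std::list iterator carries a bidirectional tag, which converts to input_iterator_tag but not to random_access_iterator_tag, so the counting version is the only one that can be chosen for it.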

The advance function has a similar pair of specializations. Advancing a random-access iterator simply adds the number of steps to the iterator, corresponding directly to pointer arithmetic, but advancing an input iterator requires incrementing the iterator the supplied number of times.
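The matching pair for advance might look like the sketch below, again assuming tag dispatch and using hypothetical names (my_advance, my_advance_impl). For simplicity the sketch handles only non-negative step counts.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Random-access iterators: one addition, constant time.
template <class RandomAccessIterator, class Distance>
void my_advance_impl(RandomAccessIterator& i, Distance n, std::random_access_iterator_tag)
{
    i += n;
}

// Weaker iterator categories: n separate increments, linear time.
// Simplifying assumption of this sketch: n is not negative.
template <class InputIterator, class Distance>
void my_advance_impl(InputIterator& i, Distance n, std::input_iterator_tag)
{
    assert(n >= 0);
    for (; n > 0; --n)
        ++i;
}

// Hypothetical front end: dispatch on the iterator's category tag.
template <class Iterator, class Distance>
void my_advance(Iterator& i, Distance n)
{
    typedef typename std::iterator_traits<Iterator>::iterator_category category;
    my_advance_impl(i, n, category());
}

// Usage sketch: both iterators end up at the fourth element.
inline void my_advance_example()
{
    std::vector<int> v = {10, 20, 30, 40, 50};
    std::list<int> l(v.begin(), v.end());
    std::vector<int>::iterator vi = v.begin();
    std::list<int>::iterator li = l.begin();
    my_advance(vi, 3);   // pointer-style arithmetic
    my_advance(li, 3);   // three link-following steps
    assert(*vi == 40 && *li == 40);
}
```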

When these functions are applied to a built-in array or a std::vector container, the distance and advance functions will compile down to simple subtraction and addition of the subscript values or pointers, taking almost no time. But if applied to a list, whose iterators support only moving forward or back by one step at a time, then the binary search must use the distance function to count the nodes in the range by following the links, and then, each time the range narrows, use advance to step node by node to the new midpoint. Surely all this link-following will add up to a substantial amount of time!
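To get a feel for how much link-following is involved, here is a rough instrumented sketch that mirrors the structure of the lower_bound listing above (one distance call, then one advance per pass) on a std::list<int> and counts every iterator increment. The struct and function names are hypothetical, and the counts apply to this sketch rather than to any particular library implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical result type: whether the value was found, and how many
// iterator increments (link-following steps) the search performed.
struct ListSearchCost {
    bool found;
    std::size_t link_steps;
};

inline ListSearchCost counted_list_search(const std::list<int>& sorted, int value)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));   // debug-only precondition check
    typedef std::list<int>::const_iterator Iter;
    ListSearchCost cost = {false, 0};
    Iter first = sorted.begin();

    // Equivalent of distance(first, last): walk the whole list once.
    std::size_t len = 0;
    for (Iter p = first; p != sorted.end(); ++p) {
        ++len;
        ++cost.link_steps;
    }

    while (len > 0) {
        Iter i = first;
        const std::size_t len2 = len / 2;
        // Equivalent of advance(i, len2): step to the midpoint one node at a time.
        for (std::size_t k = 0; k < len2; ++k) {
            ++i;
            ++cost.link_steps;
        }
        if (*i < value) {
            first = ++i;
            ++cost.link_steps;
            len -= len2 + 1;
        } else {
            len = len2;
        }
    }
    cost.found = (first != sorted.end()) && !(value < *first);
    return cost;
}
```

In this sketch a search of an n-node list always follows at least n links (the initial count) and at most about 2n, so the link-following cost grows linearly with the list no matter how few comparisons are made.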

The C++98 Standard in fact states that lower_bound and binary_search will run in O(log n) time when applied to a container with random-access iterators. When applied to a container that lacks random-access iterators, like std::list, the Standard states that the search will be logarithmic in the number of comparisons, but linear in the number of nodes visited. So whether it runs faster than a linear search depends on how much time it takes to do the comparisons compared to counting the nodes over and over again.
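The comparison side of that trade-off can be observed directly by handing std::lower_bound a comparator that counts its own calls, and doing the same for a plain linear scan. This is a sketch under assumptions: CountingLess and the two helper functions are hypothetical names, and the exact counts depend on the library implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical comparator that counts how many times it is called.
struct CountingLess {
    std::size_t* count;
    explicit CountingLess(std::size_t* c) : count(c) {}
    bool operator()(int a, int b) const { ++*count; return a < b; }
};

// Number of comparisons std::lower_bound makes on a sorted list.
inline std::size_t binary_search_comparisons(const std::list<int>& sorted, int value)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));   // debug-only precondition check
    std::size_t count = 0;
    (void)std::lower_bound(sorted.begin(), sorted.end(), value, CountingLess(&count));
    return count;
}

// Number of comparisons made by a linear scan that stops at the first
// element that is not less than value.
inline std::size_t linear_search_comparisons(const std::list<int>& sorted, int value)
{
    std::size_t count = 0;
    for (std::list<int>::const_iterator p = sorted.begin(); p != sorted.end(); ++p) {
        ++count;
        if (!(*p < value))
            break;
    }
    return count;
}
```

For a long list the binary search makes only a handful of comparisons where the linear scan may make one per node, so the binary search can only win if each comparison is expensive enough to outweigh all the extra link-following.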

Let's find out what happens. Sometimes you need benchmarks to see how theory works out in practice. I defined two classes of objects which contain an ID value used in operator< and operator==, and with constructors that give each object a unique value. One class, Cheap, uses a single integer for the ID, so comparisons should be very fast. The other,