Why std::binary_search of std::list Works,
But You Shouldn't Use It!
David Kieras, University of Michigan
Prepared for EECS 381, 1/26/2013
A common error made by beginning users of the Standard Library containers and algorithms is to write code whose
run-time complexity (big-O) is a flaming disaster. These facilities were invented by algorithm and data structure fanatics who
took big-O really seriously; their goal was to make it possible for you to use "best of breed" data structures and algorithms
very easily. They certainly did not intend to help you write awful code easily!
In fact, in most cases, the Standard Library tries to keep you out of trouble by making it inconvenient to use in a really
inefficient way. But thanks to the great generality and power of the C++ templates used in the Standard Library, there are
some loopholes that allow you to write hopelessly inefficient code as easily as very efficient code. One such loophole is that
you can easily write Standard Library code that does a binary search on a linked list, which is so ridiculously inefficient that
saying "binary search on a linked list" is actually a geeky programming joke! Linked lists are supposed to be searched
linearly! The purpose of this document is to explain why the Standard Library makes telling this joke so easy, and to
demonstrate with some run-time comparisons why it is so bad.
The Standard Library allows you to apply the binary_search and lower_bound algorithms to any sorted sequence
container, including std::list, and it will produce a correct result. The following works for any sequence container:
    binary_search(container.begin(), container.end(), probe)
However, if you look at any normal code for binary search (e.g. as in Kernighan and Ritchie), it is written to use array
subscripting. Array subscripting, a so-called random-access mechanism, runs in constant time, regardless of the size of the
array or value of the subscript. In contrast, a linked list has the property that the only way you can find a particular node is to
start at one end of the list and follow the links from one node to the next, checking them one at a time. Unlike with array
subscripting, there is no way to compute the location of a list node directly from its numerical position in the list - it could be
anywhere in memory. So how is it that you can do a binary_search on a std::list?
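For comparison, a subscript-based binary search of the kind alluded to above might look like the following minimal sketch. The function name subscript_binary_search and the -1 "not found" convention are illustrative assumptions, not code from Kernighan and Ritchie or from the Standard Library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Classic binary search using array subscripting on a sorted vector.
// Returns the index of probe, or -1 if probe is not present.
// (Name and return convention are hypothetical, chosen for illustration.)
long subscript_binary_search(const std::vector<int>& data, int probe)
{
    // Precondition: the data must already be sorted (checked in debug builds only).
    assert(std::is_sorted(data.begin(), data.end()));

    long low = 0;
    long high = static_cast<long>(data.size()) - 1;
    while (low <= high) {
        // Computing and using the midpoint is constant-time index arithmetic.
        long mid = low + (high - low) / 2;
        int candidate = data[static_cast<std::size_t>(mid)];
        if (candidate < probe)
            low = mid + 1;       // probe can only be in the upper half
        else if (probe < candidate)
            high = mid - 1;      // probe can only be in the lower half
        else
            return mid;          // found it
    }
    return -1;
}
```

Every step here depends on data[mid] being reachable in constant time, which is exactly what a linked list cannot offer.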
How binary_search is implemented. Below is a somewhat simplified copy of the Metrowerks Standard Library version
of lower_bound; the binary_search algorithm just calls lower_bound and checks the result (other implementations
might differ, but only in details).
template <class ForwardIterator, class T>
ForwardIterator
lower_bound(ForwardIterator first, ForwardIterator last, const T& value)
{
    typedef typename iterator_traits<ForwardIterator>::difference_type difference_type;
    difference_type len = distance(first, last);
    while (len > 0)
    {
        ForwardIterator i = first;
        difference_type len2 = len / 2;
        advance(i, len2);
        if (*i < value)
        {
            first = ++i;
            len -= len2 + 1;
        }
        else
            len = len2;
    }
    return first;
}
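The remark that binary_search "just calls lower_bound and checks the result" can be sketched roughly as follows. This is an assumption about the general shape of such a wrapper, not the Metrowerks source; binary_search_sketch and list_contains are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <list>

// Sketch: a binary_search built on top of std::lower_bound.
template <class ForwardIterator, class T>
bool binary_search_sketch(ForwardIterator first, ForwardIterator last, const T& value)
{
    // lower_bound returns the first position whose element is not less than value.
    ForwardIterator i = std::lower_bound(first, last, value);
    // It is a match only if that position is inside the range
    // and value is not less than the element found there.
    return i != last && !(value < *i);
}

// Convenience wrapper: compiles and gives correct answers for a sorted std::list,
// even though the search has to walk the links to find each midpoint.
bool list_contains(const std::list<int>& sorted, int probe)
{
    return binary_search_sketch(sorted.begin(), sorted.end(), probe);
}
```

Note that only operator< is used, so the element type does not need operator== for the search itself.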

First, see how this algorithm is written in terms of iterators, so that it can apply to any sequence container that supports
the standard iterator interface. The two input iterators are first and last, marking the beginning and end of the range to be
searched. Among the iterator types, input iterators can be iterators pointing into any type of container.

The basic binary search algorithm involves calculating the midpoint of a range of values, and then checking the value at
that midpoint. The code does this by calling std::distance, which returns the numerical distance between the first and last
iterators. This distance is divided by two, and then std::advance is called to move a copy of the first iterator forward by that
amount to get to the midpoint of the range. The distance and advance functions are also function templates that are defined so
that they work with iterators into any type of container. Template magic is used to specialize them for different iterator types.
For random-access iterators (which behave like pointers or subscripts, supplied by std::vector and std::deque), the
definition of distance that is used is:

template <class RandomAccessIterator>
inline
typename iterator_traits<RandomAccessIterator>::difference_type
__distance(RandomAccessIterator first, RandomAccessIterator last, random_access_iterator_tag)
{
    return last - first;
}

The subtraction operator is defined for these iterators because the internal pointers can simply be subtracted to get the
distance directly via pointer arithmetic. This is exactly what subtracting the indices would do in the array form of the binary
search algorithm.

However, for the more general input iterators, which can be moved only one step at a time, the definition
used to implement distance is:

template <class InputIterator>
inline
typename iterator_traits<InputIterator>::difference_type
__distance(InputIterator first, InputIterator last, input_iterator_tag)
{
    typename iterator_traits<InputIterator>::difference_type result = 0;
    for (; first != last; ++first)
        ++result;
    return result;
}

In other words, compute the distance between two general input iterators by incrementing the first until it equals the last,
and count how many increments are required.

The advance function has a similar pair of specializations. Advancing a random-access iterator simply adds the number
of steps to the iterator, corresponding directly to pointer arithmetic, but advancing an input iterator requires incrementing the
iterator the supplied number of times.
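One way the "template magic" could select between these definitions is tag dispatch on the iterator category. The following is a minimal sketch under that assumption; it is not the Metrowerks code, and the namespace sketch, the *_impl helpers, and middle_value are hypothetical names introduced here.

```cpp
#include <cassert>
#include <iterator>
#include <list>     // included for the usage described below
#include <vector>   // included for the usage described below

namespace sketch {

// Random-access iterators: distance is a single subtraction.
template <class RandomAccessIterator>
typename std::iterator_traits<RandomAccessIterator>::difference_type
distance_impl(RandomAccessIterator first, RandomAccessIterator last,
              std::random_access_iterator_tag)
{
    return last - first;
}

// Any other iterator: count increments until first reaches last.
template <class InputIterator>
typename std::iterator_traits<InputIterator>::difference_type
distance_impl(InputIterator first, InputIterator last, std::input_iterator_tag)
{
    typename std::iterator_traits<InputIterator>::difference_type result = 0;
    for (; first != last; ++first)
        ++result;
    return result;
}

// Dispatcher: the iterator's category tag picks the overload at compile time.
template <class Iterator>
typename std::iterator_traits<Iterator>::difference_type
distance(Iterator first, Iterator last)
{
    typedef typename std::iterator_traits<Iterator>::iterator_category category;
    return distance_impl(first, last, category());
}

// Random-access iterators: advance is a single addition.
template <class RandomAccessIterator, class Distance>
void advance_impl(RandomAccessIterator& i, Distance n, std::random_access_iterator_tag)
{
    i += n;
}

// Any other iterator: step forward n times (this sketch handles only n >= 0).
template <class InputIterator, class Distance>
void advance_impl(InputIterator& i, Distance n, std::input_iterator_tag)
{
    assert(n >= 0);
    for (; n > 0; --n)
        ++i;
}

template <class Iterator, class Distance>
void advance(Iterator& i, Distance n)
{
    typedef typename std::iterator_traits<Iterator>::iterator_category category;
    advance_impl(i, n, category());
}

// Finds the middle element the same way the lower_bound loop finds its midpoint.
template <class Container>
typename Container::value_type middle_value(const Container& c)
{
    assert(!c.empty());
    typename Container::const_iterator i = c.begin();
    sketch::advance(i, sketch::distance(c.begin(), c.end()) / 2);
    return *i;
}

} // namespace sketch
```

Calling sketch::middle_value on a std::vector<int> and on a std::list<int> with the same contents gives the same answer, but the vector call resolves to the subtraction and addition overloads while the list call resolves to the counting loops. The list iterator's bidirectional tag is accepted by the input-iterator overloads because the standard category tags are related by inheritance.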

When these functions are applied to a built-in array or a std::vector container, the distance and advance functions
will compile down to simple subtraction and addition of the subscript values/pointers, taking almost no time. But if applied to
a list, whose iterators support only moving forward or back by one step at a time, then the binary search will require using the
distance function to repeatedly count the nodes between the ends of the narrowing range and then advance with increment
and count again to position at the midpoint. Surely all this link-following will add up to a substantial amount of time!

The C++98 Standard in fact states that lower_bound and binary_search will run in O(log n) time when applied to a
container with random-access iterators. When applied to a container that lacks random-access iterators, like std::list, the
Standard states that the search will be logarithmic with the number of comparisons, but linear with the number of nodes
visited. So whether it runs faster than a linear search depends on how much time it takes to do the comparisons compared to
counting the nodes over and over again.
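To make the "logarithmic comparisons, linear node visits" distinction concrete, the lower_bound loop shown earlier can be instrumented with two counters. This is a rough sketch, not the author's benchmark; SearchCost and counted_list_search are hypothetical names, and the code is deliberately specialized to a sorted std::list<int>.

```cpp
#include <cstddef>
#include <list>

// Tallies for one search: element comparisons and iterator increments.
struct SearchCost {
    std::size_t comparisons;
    std::size_t steps;      // links followed
    bool found;
};

// Instrumented copy of the lower_bound loop for a sorted std::list<int>.
SearchCost counted_list_search(const std::list<int>& sorted, int value)
{
    SearchCost cost = {0, 0, false};
    std::list<int>::const_iterator first = sorted.begin();
    std::list<int>::const_iterator last = sorted.end();

    // distance(first, last) for a list: walk every node once.
    std::ptrdiff_t len = 0;
    for (std::list<int>::const_iterator it = first; it != last; ++it) {
        ++len;
        ++cost.steps;
    }

    while (len > 0) {
        std::list<int>::const_iterator i = first;
        std::ptrdiff_t len2 = len / 2;
        // advance(i, len2) for a list: follow len2 links.
        for (std::ptrdiff_t k = 0; k < len2; ++k) {
            ++i;
            ++cost.steps;
        }
        ++cost.comparisons;
        if (*i < value) {
            first = ++i;
            ++cost.steps;
            len -= len2 + 1;
        } else {
            len = len2;
        }
    }

    // binary_search's final check on the lower_bound result.
    if (first != last) {
        ++cost.comparisons;
        cost.found = !(value < *first);
    }
    return cost;
}
```

For a list of n elements, the comparison count grows roughly with log n, while the step count is at least n (from the initial node count alone) and stays below about 2n plus a few extra increments. Whether that beats a plain linear search then depends on how expensive a comparison is relative to following a link.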

Let's find out what happens. Sometimes you need benchmarks to see how theory works out in practice. I defined two
classes of objects which contain an ID value used in operator< and operator==, and with constructors that give each
object a unique value. One class, Cheap, uses a single integer for the ID, so comparisons should be very fast. The other,