




















































Typology: Lecture notes
A Simple Introduction Read this first
This is a short, simple and practical introduction to the ideas of data structures and algorithms. These are important ideas, because they are the basis for all programming.
An algorithm is a way of doing something. It is a method, or recipe, or set of instructions to
We start with a very simple example. Suppose we have two pieces of data, x and y, and we need to exchange their values. As a test case:

    before: x=3 and y=7
    after:  x=7 and y=3

This needs to work with any values. Whatever x and y are before, afterwards they must have their values exchanged. How to do this? What algorithm to use? We might say:

    x ← y    // this means copy the value of y to x
    y ← x

Does this work? We can check using a trace table. We work this out on paper, keeping track of the values of the variables at each step, like this:

                      x   y
    Before            3   7   start values
    after x ← y       7   7   the 7 was copied to x
    after y ← x       7   7   the 7 was copied back to y
    At the end        7   7

So, this does not work. When we copied 7 to x, we lost the initial value of x. One way which does work is to use an extra storage space, which we will call 'temp', because it is only for temporary use:

    temp ← x
    x ← y
    y ← temp

We check this with a trace table:

                      x   y   temp
    Before            3   7   ??     It makes no difference what temp is
    after temp ← x    3   7   3      the 3 was copied to temp
    after x ← y       7   7   3      the 7 was copied to x
    after y ← temp    7   3   3      temp (start value of x) copied to y
    At the end        7   3          Values exchanged
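The exchange using a temporary variable can be sketched in Java. This is only an illustration: the class name SwapDemo and the method name exchange are hypothetical, and the two values are held in a two-element array so that the method can change them.

```java
// A minimal sketch of the three-step exchange using a temporary variable.
class SwapDemo {
    // Exchanges pair[0] (playing the role of x) and pair[1] (playing the role of y).
    static void exchange(int[] pair) {
        int temp = pair[0];   // temp <- x
        pair[0] = pair[1];    // x <- y
        pair[1] = temp;       // y <- temp
    }

    public static void main(String[] args) {
        int[] pair = {3, 7};  // x=3, y=7, the test case from the text
        exchange(pair);
        System.out.println("x=" + pair[0] + " y=" + pair[1]);  // prints x=7 y=3
    }
}
```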
We will use Java and C to show algorithms written as actual programs. But you should focus on the actual algorithms and data structures, not the code.
A data structure is a way of organising a set of data values. We will use diagrams to show them. For example, a list: Here the boxes represent data nodes, and the arrows are pointers.
A node is one of the items in a data structure. Nodes might be any type - for example integers, decimal numbers, characters or strings, dates or times, pixel colours or whatever. Usually all the nodes in a structure are the same type, whatever that is. Often the nodes are key-value pairs, in records (like a struct in C). For example, in a personnel application we might have nodes like this:

    Field:  Payroll number   First name   Last name   DateOfBirth   Department
    Data:   1123             John         Smith       15.1.1990     Sales
    Role:   Key              Values (all the other fields)

So each node has a unique key field - everybody has a different key value. We will often search a data structure for the node with a key field we want. In examples, we will usually take a node to have a key which is an integer, and forget about value fields. This is just to make things simple. Using nodes with keys and values of different types does not alter the algorithms or the structures. Usually a node will also have one or more fields which are pointers, to link the node to other nodes, to make a structure.
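As an illustration, the personnel record above could be written as a small Java class. This is a minimal sketch: the class name PersonnelNode and its field names are hypothetical, and the date of birth is kept as plain text for simplicity.

```java
// A node with one key field and several value fields.
class PersonnelNode {
    final int payrollNumber;   // the key: unique for each person, never changes
    String firstName;          // the remaining fields are the values
    String lastName;
    String dateOfBirth;
    String department;

    PersonnelNode(int payrollNumber, String firstName, String lastName,
                  String dateOfBirth, String department) {
        this.payrollNumber = payrollNumber;
        this.firstName = firstName;
        this.lastName = lastName;
        this.dateOfBirth = dateOfBirth;
        this.department = department;
    }

    public static void main(String[] args) {
        PersonnelNode node =
            new PersonnelNode(1123, "John", "Smith", "15.1.1990", "Sales");
        // Look at the key and two of the value fields.
        System.out.println(node.payrollNumber + " " + node.lastName + " " + node.department);
    }
}
```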
We show pointers in a data structure as actual arrows which point at other nodes. A pointer is simply a way of 'getting to' another node. How pointers are implemented is different in different languages. In C, a pointer is an actual address in memory.
In Java a pointer is a reference. This is not an actual memory address. It cannot be, since Java objects move around in memory as the program executes. But still, if we have a reference, we can 'get to' the object pointed at. Some pointers are variables with names. For example in the list, we have a pointer named 'head', and this points to the first node in the list. The simplest way to think of this is that the value of 'head' is the address of the first node in memory. We often use the idea of a null pointer. This is a pointer which points 'nowhere'. For example the last node in a list has a null pointer to the next node - because there is no next node. Often the null pointer value will be something which could not be the address of a real node - such as -1.
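The ideas of a named 'head' pointer and a null pointer can be sketched in Java, where the null pointer is written as the literal null. The names ListDemo, Node and countNodes are hypothetical, chosen only for this illustration.

```java
// A minimal sketch of a list built from nodes linked by references.
class ListDemo {
    static class Node {
        int key;
        Node next;   // reference to the next node, or null if there is none

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Follows the pointers from head until the null pointer is reached.
    static int countNodes(Node head) {
        int count = 0;
        Node current = head;
        while (current != null) {
            count++;
            current = current.next;   // 'get to' the next node
        }
        return count;
    }

    public static void main(String[] args) {
        // head -> 5 -> 8 -> 2 -> null
        Node head = new Node(5, new Node(8, new Node(2, null)));
        System.out.println(countNodes(head));   // prints 3
    }
}
```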
We often have a choice of using different algorithms. How to decide which is better? Remember all algorithms must 'work' - that is, end with the desired result. Otherwise they do not count as an algorithm. An algorithm is better if
In many areas of computing it is useful to have random numbers. Can we find an algorithm to generate a sequence of random numbers? One way is to take some starting number and multiply by something to get the next one. For example
At first that looks pretty random. But then we see the number always ends in 1. And if we look further down the output we see..

    961 281 1 121 641 561 881

so we get 1, then 121, where we started, then the sequence repeats. These are not really random numbers. In fact no algorithm can make truly random numbers on normal digital devices. Instead we use pseudo-random numbers. These are very mixed up and will go through an extremely long sequence before they repeat. But sooner or later they will repeat.

In our simple algorithm we started rand as

    rand ← 1

This is called the seed - the first value in the sequence.

Both C and Java have standard library code to create pseudo-random numbers in a much better way (using a different algorithm). In C the function is rand(), which produces a number from 0 to RAND_MAX, which is at least 32767. This is seeded by the function srand(). The usual way is to seed it with the result of the time() function, so the sequence is seeded by the current computer time. This means we will get a different sequence every time the program runs..

    #include <stdio.h>
    #include <stdlib.h>   // needed for rand()
    #include <time.h>     // needed for time()

    int main(int argc, char** argv) {
        int counter;
        srand(time(0));               // seed with current time
        counter = 1;
        while (counter < 10) {
            printf("%d\n", rand());   // call rand and output it
            counter++;
        }
        return (0);
    }

One run produces..

    736868008
    841317755
    3139694
    1449466950
    1445477844
    2003343631
    1022628446
    1119168331
    1005563998

but we would get a different sequence on every run.

In Java a class named Random models a pseudo-random number generator:

    import java.util.Random;

    public class Test {
        public static void main(String[] args) {
            Random rng = new Random();   // construct a random number generator
            for (int counter = 0; counter < 9; counter++)   // 9 times
            {
                int number = rng.nextInt(1000000);   // get next number, in range 0 to 999999
                System.out.println(number);          // output it
            }
        }
    }

Output:

    534576
    220608
    335031
    232252
    713572
    225434
    946502
    159104
    268880

The seed here is very likely to be different on every run.

The idea of random numbers is discussed in Donald Knuth's The Art of Computer Programming Volume 2. This text is the foundation of Computer Science, and all should read it.
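The repeating sequence quoted earlier in this section can be reproduced with a very small multiply-and-take-remainder generator. This is a sketch under stated assumptions: the multiplier 121 and the modulus 1000 are assumptions, chosen because starting from the seed 1 they give the quoted values (.. 961 281 1 121 641 561 881). The names SimpleRand, next and period are hypothetical.

```java
// A minimal sketch of a weak pseudo-random generator: multiply, then keep the last 3 digits.
class SimpleRand {
    // Assumed rule: rand <- (rand * 121) mod 1000
    static int next(int rand) {
        return (rand * 121) % 1000;
    }

    // Counts how many steps it takes before the seed value comes round again.
    static int period(int seed) {
        int rand = next(seed);
        int steps = 1;
        while (rand != seed) {
            rand = next(rand);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int rand = 1;                      // the seed
        for (int counter = 0; counter < 4; counter++) {
            rand = next(rand);
            System.out.println(rand);      // prints 121, 641, 561, 881
        }
        System.out.println("repeats after " + period(1) + " steps");  // repeats after 25 steps
    }
}
```

With these assumed constants every value ends in 1 and the sequence repeats after only 25 steps, which shows why such a simple rule is a poor source of pseudo-random numbers.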
Recursion is when an algorithm uses itself. An example is factorial. The factorial of a number is the product of all the numbers down to
    int main(int argc, char** argv) {
        printf("%d\n", fib(5));
        return (0);
    }

Output is

    In fib with n= 5
    In fib with n= 4
    In fib with n= 3
    In fib with n= 2
    In fib with n= 1
    In fib with n= 2
    In fib with n= 3
    In fib with n= 2
    In fib with n= 1
    5

How to explain the output? This calls

    Fib(5) which calls
        Fib(4) which calls
            Fib(3) which calls
                Fib(2) returns 1 and
                Fib(1) returns 1
            and Fib(2) returns 1
        and Fib(3) which calls
            Fib(2) returns 1 and
            Fib(1) returns 1

Recursion is sometimes slightly tricky.
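The pattern of calls can be checked with a Java sketch of the same idea. It assumes that fib returns 1 for n of 2 or less and otherwise adds fib(n-1) and fib(n-2); that assumption is consistent with the trace shown. The class name FibTrace and the list named calls are hypothetical, added only to record the order of the calls.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of a recursive Fibonacci function that records each call.
class FibTrace {
    static final List<Integer> calls = new ArrayList<>();

    static int fib(int n) {
        calls.add(n);                            // remember the order of the calls
        System.out.println("In fib with n= " + n);
        if (n <= 2) {
            return 1;                            // assumed base case: fib(1) = fib(2) = 1
        }
        return fib(n - 1) + fib(n - 2);          // the algorithm uses itself, twice
    }

    public static void main(String[] args) {
        System.out.println(fib(5));              // prints the trace, then 5
    }
}
```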
An array is a basic data structure in which we can access a node using an index. In pseudo-code we will show array elements with the index in square brackets. So

    values[328] ← 2398

means we have an array named values, and the assignment writes 2398 into the node with index 328. So an array is just a set of numbered boxes. We will have arrays that start with box number 0, as in Java and C and many languages.
In most languages array elements are stored next to each other in memory, and boxes are found using a simple calculation. For example in C:

    int array[100];   // declare and create array of 100 ints
    int * ptr;        // ptr is a pointer to an int
    ptr = array;      // ptr points to start of array
    array[10] = 37;   // put value in the array in box 10
    printf("%d\n", array[10]);        // 37 - fetch array box
    printf("%d\n", *(array+10));      // 37 - treat array as pointer to start of block
    printf("%d\n", *(ptr+10));        // 37 - use ptr in same way
    printf("%p\n", (void *) ptr);     // address of start of array = 0x7ffe652cb520
    printf("%p\n", (void *) (ptr+1)); // next int location = 0x7ffe652cb524
    printf("%zu\n", sizeof(int));     // size of int = 4 bytes

We can access the element with index 10 as array[10]. We can also get it as *(ptr+10), when we have said ptr=array, so that ptr points to the start of the array. The value of ptr is 0x7ffe652cb520, which is the address in RAM (in hex) of where the array starts. See that ptr+1 is 0x7ffe652cb524, not 0x7ffe652cb521. This is because we have declared ptr to be a pointer to an int, so ptr+1 is the address of the next int, not the next byte. Ints here are 4 bytes long, so it adds 4 to the address, not 1. So it finds array[10] by adding 40 to the address of the start of the array. In general

    address of nth element = address of start of array + n * (size of an element in bytes)
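The address calculation can be worked through as plain arithmetic. Java does not let a program see real memory addresses, so this sketch only does the sum on numbers, using the start address and the 4-byte int size from the C run above. The names AddressCalc and elementAddress are hypothetical.

```java
// A minimal sketch of the address formula, as arithmetic on plain numbers.
class AddressCalc {
    // address of nth element = start address + n * element size in bytes
    static long elementAddress(long start, int n, int elementSize) {
        return start + (long) n * elementSize;
    }

    public static void main(String[] args) {
        long start = 0x7ffe652cb520L;   // start address from the C run
        System.out.println(Long.toHexString(elementAddress(start, 1, 4)));   // 7ffe652cb524
        System.out.println(Long.toHexString(elementAddress(start, 10, 4)));  // 7ffe652cb548
    }
}
```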
We use a running total, which we initialize to zero. Then we iterate through the array, adding in each element. For example for an array of size 6:

    total ← 0
    index ← 0
    while index < 6
        total ← total + array[index]
        index ← index + 1

So in Java, here with an array of 5 elements -

    int[] array = {1, 5, 2, 3, 4};
    int total = 0;
    int index = 0;
    while (index < 5) {
        total += array[index];
        index++;
    }
    System.out.println(total);   // 15

We get 1+5+2+3+4 = 15
into {3,1,8,4,9}

We could create an array of the same size as the first, then copy the first element into the last element of the new array, the second into one from last, and so on. But this uses extra memory, and we can do it 'in-place' using only one extra location. We iterate half way through the array, exchanging each element with the one a corresponding distance from the end. In Java..

    int[] array = {9,4,8,1,3};
    // iterate half-way through
    for (int index = 0; index < array.length/2; index++) {
        // do exchange
        int t = array[index];
        array[index] = array[array.length-1-index];
        array[array.length-1-index] = t;
    }
    for (int val : array)   // test
        System.out.println(val);

Here we use array.length, so this works for any size array. We also use the 'for-each' loop in for (int val : array).
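For comparison, the first approach described above (copying into a second array of the same size) might be sketched like this. The names ReverseCopy and reversedCopy are hypothetical. The original array is left unchanged, at the cost of the extra memory.

```java
import java.util.Arrays;

// A minimal sketch of reversing by copying into a new array.
class ReverseCopy {
    // Returns a new array holding the elements of source in reverse order.
    static int[] reversedCopy(int[] source) {
        int[] result = new int[source.length];
        for (int index = 0; index < source.length; index++) {
            // first goes into last, second into one from last, and so on
            result[source.length - 1 - index] = source[index];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] array = {9, 4, 8, 1, 3};
        System.out.println(Arrays.toString(reversedCopy(array)));  // prints [3, 1, 8, 4, 9]
    }
}
```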
There seems little point in searching for an int in an array. If we know what the int is - why look for it? Usually the data we handle has some identifying unique key, and other data fields. We look for some key, to read back the data fields linked to it. In Java, we would define a Node class like this:

    class Node {
        final int ID;
        String data;

        Node(int id, String val) {
            this.ID = id;
            this.data = val;
        }

        public String toString() {
            return "Node ID=" + ID + " data=" + data;
        }
    }

This has just one value field, named 'data', but we could have several. The ID field is an int. It does not have to be an int. The ID identifies the node, and it would not make sense to change it, so we declare it to be final. We could create two nodes with the same ID, which would be wrong. We could make this impossible - but we do not, just to keep it simple. Then we can modify our array reverse code to use an array of Nodes:

    Node[] array = new Node[5];
    array[0] = new Node(9,"one");
    array[1] = new Node(4,"two");
    array[2] = new Node(8,"three");
    array[3] = new Node(1,"four");
    array[4] = new Node(3,"five");
    // iterate half-way through
    for (int index = 0; index < array.length/2; index++) {
        // do exchange
        Node t = array[index];
        array[index] = array[array.length-1-index];
        array[array.length-1-index] = t;
    }
    for (Node val : array)   // test
        System.out.println(val);

The output is:

    Node ID=3 data=five
    Node ID=1 data=four
    Node ID=8 data=three
    Node ID=4 data=two
    Node ID=9 data=one
This is about searching some data, in an array, for some given key value.
Iteration means looping or repeating. We often iterate through an array. For example, suppose we want to store a value, 999, in every element of an array which has 100 elements - here is the pseudo-code:

    index ← 0                  // initialise index (start value)
    while index < 100          // start loop
        values[index] ← 999    // write 999 into array box
        index ← index + 1      // move to next box
    end loop                   // stop when get to last box

The // starts a comment - an explanation of what is happening. Here is that algorithm as a Java program:
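A minimal sketch of such a program, following the pseudo-code step by step. The class name FillArray and the method name filled are hypothetical.

```java
// A minimal sketch: write the same value into every box of an array.
class FillArray {
    // Creates an array of the given size with every element set to value.
    static int[] filled(int size, int value) {
        int[] values = new int[size];
        int index = 0;                 // initialise index (start value)
        while (index < size) {         // start loop
            values[index] = value;     // write value into array box
            index = index + 1;         // move to next box
        }                              // stop after the last box
        return values;
    }

    public static void main(String[] args) {
        int[] values = filled(100, 999);
        System.out.println(values[0] + " " + values[99]);  // prints 999 999
    }
}
```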
    #include <stdio.h>
    #include <stdlib.h>   // needed for rand()
    #include <time.h>     // needed for time()

    int main(int argc, char** argv) {
        int array[100];
        // fill array with random integers 0 to 99
        int counter;
        srand(time(0));
        counter = 0;
        while (counter < 100) {
            array[counter] = rand() % 100;
            printf("%d: %d\n", counter, array[counter]);   // for testing
            counter++;
        }
        // do linear search for target 56 (for example)
        int target = 56;
        counter = 0;
        while (counter < 100) {
            if (array[counter] == target) {
                printf("Found at %d\n", counter);
                return 0;   // exit
            }
            counter++;
        }
        printf("Not found\n");
        return (0);
    }

Sample output:

    0: 40
    1: 49
    2: 18
    3: 73
    ..
    45: 66
    46: 97
    47: 77
    48: 12
    49: 56
    50: 82
    51: 4
    52: 14
    ..
    98: 12
    99: 82
    Found at 49

The same in Java:

    import java.util.Random;

    public class Test {
        public static void main(String[] args) {
            int[] array = new int[100];
            // fill array at random
            Random rng = new Random();
            for (int counter = 0; counter < 100; counter++) {
                array[counter] = rng.nextInt(100);
                System.out.println(counter + " " + array[counter]);   // for testing
            }
            int target = 56;   // search for 56
            for (int counter = 0; counter < 100; counter++) {
                if (array[counter] == target) {
                    System.out.println("Found at " + counter);
                    return;
                }
            }
            System.out.println("Not present");
        }
    }
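The same linear search is more useful when the array holds nodes with keys, as discussed earlier: we look for a key in order to read back the data linked to it. This is a minimal sketch, assuming a cut-down copy of the Node class from earlier. The names NodeSearch and find are hypothetical, and returning null for 'not present' is a design choice made only for this illustration.

```java
// A minimal sketch of a linear search for a key in an array of nodes.
class NodeSearch {
    static class Node {
        final int ID;     // the key
        String data;      // the value

        Node(int id, String val) {
            this.ID = id;
            this.data = val;
        }
    }

    // Returns the data of the first node whose ID equals key, or null if there is none.
    static String find(Node[] array, int key) {
        for (int counter = 0; counter < array.length; counter++) {
            if (array[counter].ID == key) {
                return array[counter].data;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node[] array = {
            new Node(9, "one"), new Node(4, "two"), new Node(8, "three"),
            new Node(1, "four"), new Node(3, "five")
        };
        System.out.println(find(array, 8));   // prints three
        System.out.println(find(array, 7));   // prints null - no node has key 7
    }
}
```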
We can measure how long a program takes. For example in Java:

    import java.util.Random;

    public class Test {
        public static void main(String[] args) {
            final int N = 10000000;
            int[] array = new int[N];
            // fill array at random
            ..
            int target = 1234567;   // search
            long startTime = System.nanoTime();
            for (int counter = 0; counter < N; counter++) {
                if (array[counter] == target) {
                    System.out.println("Found at " + counter);
                    System.out.println((System.nanoTime()-startTime)/1000000 + " milliseconds");
                    return;
                }
            }
            System.out.println("Not present");
            System.out.println((System.nanoTime()-startTime)/1000000 + " milliseconds");
        }
    }

We are using an array of 10 million values, and calling the number of values N. System.nanoTime() gives a time reading in nanoseconds. We remember the start time, and get the time again when we find the target, or when we reach the end without finding it. Typical output is:

    Found at 7110511
    17 milliseconds

But actual speed is not very useful. The same algorithm on a faster computer would be faster. And if the computer is doing other tasks at the same time (which it will be), this will slow things down. A better way is to count the number of steps taken - because this is the same on slow or fast computers. In this algorithm there are 2 kinds of steps - the if statement to compare
    index ← 0
    value ← random
    while index < N
        array[index] ← value
        value ← value + random
        index ← index + 1
    endloop

This uses 'random' to mean some random value. We simply add a random amount to the last array element to get the next - so they will be in increasing order. Here is code in Java:

    final int N = 10;
    int[] array = new int[N];
    Random rng = new Random();
    int value;
    value = 0;
    for (int counter = 0; counter < N; counter++) {
        value += rng.nextInt(5);
        array[counter] = value;
        System.out.println(counter + " " + array[counter]);   // for testing
    }

The random value might be 0, so we might have 2 elements next to each other which are equal.
If the data is sorted, we can use the binary search algorithm. We first compare the middle number with the target.

    If equal, we have found it.
    If it is too small, we look in the 'top half'.
    If it is too big, we look in the bottom half.

This is called a binary search because binary means two, and the data is divided into two halves. It is also called a bisection search. Bisection means cut into two. So we have two pointers, for the top and bottom of a section. To start with this is the whole array. After the first step, it is the top half or the bottom half. And so on until we find it, or the section cannot be divided any further (top is next to bottom). The two end boxes of that last section are then checked directly, because the first and last boxes of the array are never chosen as a middle. The algorithm to search an array of size N:

    bottom ← 0                       // set top and bottom pointers
    top ← N-1                        // whole array to start with
    do
        middle ← (top+bottom)/2      // find middle of section
        if array[middle] = target    // found it?
            found it - end
        if array[middle] > target    // middle too big
            top ← middle             // use bottom half
        else                         // middle must be too small
            bottom ← middle          // use top half
    while top > bottom+1             // until section cannot be divided
    if array[bottom] = target or array[top] = target
        found it - end               // check the two end boxes
    not found

Here's the code in Java:

    int target = 15;   // search
    int bottom = 0;
    int top = N-1;
    do {
        int middle = (top+bottom)/2;
        System.out.println("Bottom="+bottom+" middle="+middle+" top="+top);
        if (array[middle] == target) {
            System.out.println("Found at "+middle);
            return;
        }
        if (array[middle] > target)
            top = middle;
        else
            bottom = middle;
    } while (top > bottom+1);
    // the end boxes may never have been a middle, so check them
    if (array[bottom] == target) {
        System.out.println("Found at "+bottom);
        return;
    }
    if (array[top] == target) {
        System.out.println("Found at "+top);
        return;
    }
    System.out.println("Not found");

and a typical run, with the array listed first (looking for 15):

    0 0
    1 4
    2 7
    3 7
    4 11
    5 11
    6 13
    7 15
    8 15
    9 17
    Bottom=0 middle=4 top=9   // first look at 4 - too small
    Bottom=4 middle=6 top=9   // top half - look at 6 too small
    Bottom=6 middle=7 top=9   // top quarter - found at 7
    Found at 7
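A common variant of binary search leaves the middle box out of the next section (top ← middle-1 or bottom ← middle+1) and stops when the section is empty. The sketch below is one way to write that variant, not the listing above. The names BinarySearchDemo and binarySearch are hypothetical, and returning -1 for 'not found' is an assumption of this sketch. It uses the sample array from the run above.

```java
// A minimal sketch of the binary search variant that excludes the middle box.
class BinarySearchDemo {
    // Returns an index where target is found in the sorted array, or -1 if it is absent.
    static int binarySearch(int[] array, int target) {
        int bottom = 0;
        int top = array.length - 1;
        while (bottom <= top) {                  // section still has at least one box
            int middle = (top + bottom) / 2;
            if (array[middle] == target) {
                return middle;                   // found it
            }
            if (array[middle] > target) {
                top = middle - 1;                // middle too big: use bottom half
            } else {
                bottom = middle + 1;             // middle too small: use top half
            }
        }
        return -1;                               // section is empty: not found
    }

    public static void main(String[] args) {
        int[] array = {0, 4, 7, 7, 11, 11, 13, 15, 15, 17};
        System.out.println(binarySearch(array, 15));  // prints 7
        System.out.println(binarySearch(array, 17));  // prints 9 (the last box)
        System.out.println(binarySearch(array, 5));   // prints -1 (not present)
    }
}
```

When the array holds equal values, as with the two 15s here, this finds one of them but not necessarily the first.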