Open Addressing in Hash Tables: Collision Resolution Strategies, Lecture notes of Data Structures and Algorithms

These lecture notes cover open addressing in hash tables, focusing on linear probing, quadratic probing, and double hashing as collision resolution strategies. They include examples, advantages and disadvantages, and guidelines for choosing a hash function and managing clustering. The document also touches upon cryptographic hash functions and their applications.

What you will learn

  • What are the common collision resolution strategies in open addressing?
  • What are the advantages and disadvantages of quadratic probing?
  • How does linear probing work in open addressing?
  • What is double hashing and how is it different from other probing methods?
  • What is open addressing in hash tables?

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

rexana
Lecture 12: Open Addressing

Data Structures and Algorithms

CSE 373 19 SP - KASEY CHAMPION

Administrivia

Exercise 2 due tonight.

  • Make sure you're assigning pages properly please!

Exercise 3 out sometime tonight.

Midterm in one week!

For the midterm, you are allowed one 8.5" x 11" sheet of paper (both sides) for notes.

  • I strongly recommend you handwrite your note sheet.
  • But you are free to generate it with a computer if you prefer.

Idea for note sheet: in the real world you can often google stuff, so write down what you would look up. It should also help you study.

We will provide you with identities; we'll post the sheet in the exam resources early next week.

Resizing

Our running time in practice depends on λ. What do we do when λ is big? Resize the array!

  • Usually we double, but that's not quite the best idea here
  • Increase array size to the next prime number that's roughly double the current size
    • Prime numbers tend to redistribute keys, because you're now modding by a completely unrelated number.
    • If x % TableSize = k then x % (2*TableSize) gives either k or k + TableSize.
  • Rule of thumb: resize when λ is somewhere around 1 if you're doing separate chaining.
  • When you resize, you have to rehash everything!

CSE 373 SU 19 - ROBBIE WEBER

pollEV.com/cse373su

Can we just copy over our old chains?
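The resizing rule above can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the class name `ResizeSketch`, the helpers `nextPrime` and `isPrime`, and the use of integer keys in a list of `LinkedList` chains are all assumptions made for the example. It also suggests why the old chains cannot simply be copied over: a key's bucket depends on the table size, so every key has to be rehashed.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: integer keys, h(k) = k % tableSize, separate chaining.
final class ResizeSketch {
    /** Smallest prime >= n, found by trial division (fine for table-sized numbers). */
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Builds a table of roughly double the size (next prime) and rehashes every key. */
    static List<LinkedList<Integer>> resize(List<LinkedList<Integer>> old) {
        int newSize = nextPrime(2 * old.size());
        List<LinkedList<Integer>> bigger = new ArrayList<>();
        for (int i = 0; i < newSize; i++) {
            bigger.add(new LinkedList<>());
        }
        for (LinkedList<Integer> chain : old) {
            for (int key : chain) {
                // The bucket is recomputed: key % newSize usually differs from key % oldSize.
                bigger.get(Math.floorMod(key, newSize)).add(key);
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < 5; i++) table.add(new LinkedList<>());
        for (int key : new int[] {1, 6, 11, 16, 21}) table.get(key % 5).add(key);
        System.out.println("before: " + table);
        System.out.println("after:  " + resize(table));
    }
}
```

With a table of size 5, the keys 1, 6, 11, 16, 21 all share bucket 1. After resizing to 11 they land in buckets 1, 6, 0, 5, and 10. Plain doubling to size 10 would leave them in just two buckets (1 and 6), which matches the "k or k + TableSize" observation.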

Review: Handling Collisions

Solution 1: Chaining

Each space holds a "bucket" that can store multiple values. A bucket is often implemented with a LinkedList.

Operation          Best    Average     Worst
put(key, value)    O(1)    O(1 + λ)    O(n)
get(key)           O(1)    O(1 + λ)    O(n)
remove(key)        O(1)    O(1 + λ)    O(n)

Average case: depends on the average number of elements per chain.

Load factor λ: if n is the total number of key-value pairs and c is the capacity of the array, then λ = n / c.
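As a rough illustration of chaining and the load factor, here is a minimal sketch of a chained set of integers. The class name `ChainedSet` and its methods are assumptions made for this example, and it stores keys only rather than key-value pairs.

```java
import java.util.LinkedList;

// Hypothetical sketch: a chained hash set of ints with h(k) = k % capacity.
final class ChainedSet {
    private final LinkedList<Integer>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedSet(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    /** Adds key if absent; walking the chain costs O(1 + lambda) on average. */
    boolean add(int key) {
        LinkedList<Integer> chain = buckets[Math.floorMod(key, buckets.length)];
        if (chain.contains(key)) return false;
        chain.add(key);
        size++;
        return true;
    }

    boolean contains(int key) {
        return buckets[Math.floorMod(key, buckets.length)].contains(key);
    }

    /** lambda = n / c: stored keys divided by the number of buckets. */
    double loadFactor() {
        return (double) size / buckets.length;
    }
}
```

For instance, adding 1, 5, and 9 to a `ChainedSet` of capacity 4 puts all three keys in bucket 1 and gives a load factor of 3 / 4 = 0.75. The worst case O(n) in the table corresponds to every key ending up in one chain like this.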

Linear Probing

Insert the following values into the hash table using a hash function of % table size and linear probing to resolve collisions:

1, 5, 11, 7, 12, 17, 6, 25

Result, assuming a table of size 10 (indices 0 to 9):

Index   0   1   2    3    4   5   6   7   8    9
Value   -   1   11   12   -   5   6   7   17   25

Linear Probing

Insert the following values into the hash table using a hash function of % table size and linear probing to resolve collisions:

38, 19, 8, 109, 10

Result, assuming a table of size 10 (indices 0 to 9):

Index   0   1     2    3   4   5   6   7   8    9
Value   8   109   10   -   -   -   -   -   38   19

Problem:

  • Linear probing causes clustering
  • Clustering causes more looping when probing

Primary Clustering: when probing causes long chains of occupied slots within a hash table.
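The two linear probing exercises can be replayed with a short sketch like the one below. It assumes a table of size 10 and integer keys; the class name `LinearProbingSketch` and the method `insertAll` are made up for this example, and duplicate keys are not handled.

```java
import java.util.Arrays;

// Hypothetical sketch: linear probing with h(k) = k % size; null marks an empty slot.
final class LinearProbingSketch {
    /** Inserts each key, stepping forward one slot (with wraparound) per collision. */
    static Integer[] insertAll(int size, int... keys) {
        if (keys.length > size) {
            throw new IllegalArgumentException("more keys than slots");
        }
        Integer[] table = new Integer[size];
        for (int key : keys) {
            int index = Math.floorMod(key, size);
            while (table[index] != null) {
                index = (index + 1) % size; // try the next slot
            }
            table[index] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(insertAll(10, 1, 5, 11, 7, 12, 17, 6, 25)));
        // prints [null, 1, 11, 12, null, 5, 6, 7, 17, 25]
        System.out.println(Arrays.toString(insertAll(10, 38, 19, 8, 109, 10)));
        // prints [8, 109, 10, null, null, null, null, null, 38, 19]
    }
}
```

In the second run the occupied slots 8, 9, 0, 1, 2 form one contiguous run (wrapping around the end of the array), which is the primary clustering described above: any new key that hashes into that run has to walk to its end.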


Can we do better?

Clusters are caused by picking new space near natural index

Solution 2: Open Addressing

Type 2: Quadratic Probing

Instead of checking i slots past the original location, check i² slots from the original location.

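int findFinalLocation(Key s) {
    int naturalHash = this.getHash(s);
    int index = naturalHash % TableSize;
    int i = 0;
    // isInUse(index) stands in for the slide's "index in use" check.
    while (isInUse(index)) {
        i++;
        // Probe 1, 4, 9, ... slots away from the natural index.
        index = (naturalHash + i * i) % TableSize;
    }
    return index;
}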

Quadratic Probing

Insert the following values into the hash table using a hash function of % table size and quadratic probing to resolve collisions:

89, 18, 49, 58, 79, 27

For example, 79 collides twice before it finds a free slot:

(79 % 10 + 0 * 0) % 10 = 9
(79 % 10 + 1 * 1) % 10 = 0
(79 % 10 + 2 * 2) % 10 = 3

Result (table of size 10):

Index   0    1   2    3    4   5   6   7    8    9
Value   49   -   58   79   -   -   -   27   18   89

Problems:

  • If λ ≥ ½ we might never find an empty spot. Infinite loop!
  • Can still get clusters

Now try to insert 9. Uh-oh
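The quadratic probing exercise can be replayed with a sketch along these lines. The class name `QuadraticProbingSketch` and its methods are assumptions for illustration. Unlike the slide's loop, `findSlot` stops after as many probes as there are slots: the offsets i * i % size repeat with that period, so further probing cannot reach anything new.

```java
import java.util.Arrays;

// Hypothetical sketch: quadratic probing with h(k) = k % size; null marks an empty slot.
final class QuadraticProbingSketch {
    /** Returns a free slot for key, or -1 if no free slot is reachable by probing. */
    static int findSlot(Integer[] table, int key) {
        int size = table.length;
        int home = Math.floorMod(key, size);
        for (int i = 0; i < size; i++) {
            int index = (int) ((home + (long) i * i) % size);
            if (table[index] == null) return index;
        }
        return -1; // every probed slot was occupied
    }

    static Integer[] insertAll(int size, int... keys) {
        Integer[] table = new Integer[size];
        for (int key : keys) {
            int slot = findSlot(table, key);
            if (slot < 0) throw new IllegalStateException("no reachable free slot for " + key);
            table[slot] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(insertAll(10, 89, 18, 49, 58, 79, 27)));
        // prints [49, null, 58, 79, null, null, null, 27, 18, 89]

        // Made-up table: slots 0, 1, 4, 5, 6, 9 are occupied, slots 2, 3, 7, 8 are free.
        Integer[] crowded = new Integer[10];
        for (int slot : new int[] {0, 1, 4, 5, 6, 9}) crowded[slot] = slot;
        System.out.println(findSlot(crowded, 10)); // prints -1
    }
}
```

The second table is a hypothetical one, not taken from the slides. A key with natural index 0 only ever probes slots 0, 1, 4, 9, 6, and 5 in a table of size 10, so it finds no free slot even though four slots are empty. The slide's unbounded `while` loop would spin forever in that situation.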

Quadratic Probing

There were empty spots. What gives?

Quadratic probing is not guaranteed to check every possible spot in the hash table.

The following is true:

If the table size is a prime number p, then the first p/2 probes check distinct indices.

Notice we have to assume p is prime to get that guarantee.
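One way to see the role of primality is to count how many distinct offsets the first few quadratic probes reach. The sketch below is only an illustration; the class name `ProbeCoverage` and the method `offsets` are hypothetical, and the table sizes 16 and 17 are picked just for contrast.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: which offsets i*i % size do the probes i = 0 .. probes-1 reach?
final class ProbeCoverage {
    static Set<Integer> offsets(int size, int probes) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int i = 0; i < probes; i++) {
            seen.add((int) ((long) i * i % size));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(offsets(17, 9)); // prime size: 9 probes, 9 distinct offsets
        System.out.println(offsets(16, 8)); // power of two: 8 probes, only 4 distinct offsets
    }
}
```

For the prime size 17, probes 0 through 8 give the offsets 0, 1, 4, 9, 16, 8, 2, 15, 13, all different. For size 16, probes 0 through 7 give 0, 1, 4, 9, 0, 9, 4, 1, so only four distinct slots are ever checked. The reason the prime case works: if i² and j² leave the same remainder mod p, then p divides (i - j)(i + j), and for distinct i and j in the first half of the range neither factor can be a multiple of p.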

Secondary Clustering

Insert the following values into the hash table using a hash function of % table size and quadratic probing to resolve collisions:

19, 39, 29, 9

Result (table of size 10):

Index   0    1   2   3    4   5   6   7   8   9
Value   39   -   -   29   -   -   -   -   9   19

Secondary Clustering: with quadratic probing, keys that share a natural index probe the same sequence of table cells, even though those cells are not necessarily next to one another.
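A small sketch can make the shared probe sequence visible. The class name `SecondaryClusteringSketch` and the method `probeSequence` are made up for this example, which assumes a table of size 10 as in the exercise.

```java
import java.util.Arrays;

// Hypothetical helper: the first `count` slots quadratic probing visits for a key.
final class SecondaryClusteringSketch {
    static int[] probeSequence(int key, int size, int count) {
        int[] sequence = new int[count];
        int home = Math.floorMod(key, size);
        for (int i = 0; i < count; i++) {
            sequence[i] = (int) ((home + (long) i * i) % size);
        }
        return sequence;
    }

    public static void main(String[] args) {
        for (int key : new int[] {19, 39, 29, 9}) {
            System.out.println(key + " -> " + Arrays.toString(probeSequence(key, 10, 4)));
        }
    }
}
```

All four keys have natural index 9, so each one visits slots 9, 0, 3, 8 in that order. The second key needs one extra probe, the third two, the fourth three, which is the secondary clustering effect.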


Double Hashing

Probing causes us to check the same indices over and over. Can we check different ones instead?

Use a second hash function!

h'(k, i) = (h(k) + i * g(k)) % T

int findFinalLocation(Key s) {
    int naturalHash = this.getHash(s);
    int index = naturalHash % TableSize;
    int i = 0;
    // isInUse(index) stands in for the slide's "index in use" check.
    while (isInUse(index)) {
        i++;
        // Most effective if jumpHash(s), i.e. g(k), returns a value relatively prime to the table size.
        index = (naturalHash + i * jumpHash(s)) % TableSize;
    }
    return index;
}

Second Hash Function

Effective if g(k) returns a value that is relatively prime to table size

  • If T is a power of 2, make g(k) return an odd integer
  • If T is a prime, make g(k) return anything except a multiple of the TableSize
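The guideline for a prime table size can be sketched as follows. The class name `DoubleHashingSketch` and the specific second hash g(k) = 1 + (k % (T - 1)) are assumptions chosen for illustration, not something the slides prescribe. That choice always yields a step between 1 and T - 1, so it is never a multiple of a prime T.

```java
// Hypothetical sketch: double hashing over int keys; the table size is assumed to be prime.
final class DoubleHashingSketch {
    private final Integer[] table;

    DoubleHashingSketch(int primeSize) {
        table = new Integer[primeSize];
    }

    /** Assumed second hash g(k): a step in 1 .. T-1, never a multiple of T. */
    int jumpHash(int key) {
        return 1 + Math.floorMod(key, table.length - 1);
    }

    /** Inserts key and returns its slot, or -1 if the table is full. */
    int insert(int key) {
        int home = Math.floorMod(key, table.length);
        int step = jumpHash(key);
        for (int i = 0; i < table.length; i++) {
            int index = (int) ((home + (long) i * step) % table.length);
            if (table[index] == null || table[index] == key) {
                table[index] = key;
                return index;
            }
        }
        return -1; // with a prime size the probes cover every slot, so the table is full
    }
}
```

With a table of size 11, the keys 11, 22, and 33 all have natural index 0, but their steps are 2, 3, and 4. They end up in slots 0, 3, and 4, so keys that collide at the natural index no longer follow the same probe sequence.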

Running Times

What are the running times for:

Operation   Best    Worst
insert      O(1)    O(n)  (we have to make sure the key isn't already in the bucket)
find        O(1)    O(n)
delete      O(1)    O(n)

CSE 332 SU 18 - ROBBIE WEBER

In-Practice

For open addressing:

We'll assume you've set λ appropriately, and that all the operations are Θ(1).

The actual dependence on λ is complicated; see the textbook (or ask me in office hours).

And the explanations are well beyond the scope of this course.