Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Hash-Based Indexing in Database Management Systems: Static, Extensible, and Linear Hashing, Slides of Introduction to Database Management Systems

An in-depth exploration of hash-based indexing techniques used in database management systems (dbms). The discussion covers static hashing, extensible hashing, and linear hashing, their operations, issues, and tradeoffs. Static hashing uses a fixed number of buckets, extensible hashing allows the number of buckets to grow or shrink, and linear hashing is a dynamic hashing technique with no need for a directory.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

prakash
prakash 🇮🇳

4.6

(10)

63 documents

1 / 29

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Database Management Systems Design
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d

Partial preview of the text

Download Hash-Based Indexing in Database Management Systems: Static, Extensible, and Linear Hashing and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Database Management Systems Design

Hash-Based Indexing

• Read

  • New Book: Chapter 11

• Hash methods can be used for index files to support

efficient searches by equality

  • Often require 1 to 2 I/O operation

• Three type of hashing schemes

  • Static Hashing
  • Extensible Hashing
  • Linear Hashing

• In practice, commercial DBMS use hashing indexing for

temporary calculations

  • Aggregation and joins
  • Tree-based indices are use as actual indices on relations

Hash Index (clustered)

121 Jil NY $ 123 Bob NY $ 1237 Pat WI $ 2381 Bill LA $ 8387 Ned SJ $ 4882 Al SF $ 9403 Ned NY $ 81982 Tim MIA $

H()

Account

attribute

Hash Index (Unclustered)

121 Jil NY $ 123 Bob NY $ 1237 Pat WI $ 2381 Bill LA $ 8387 Ned SJ $ 4882 Al SF $ 9403 Ned NY $ 81982 Tim MIA $

H()

LA
NY
NY
NY
MIA
SJ
SF
WI

city

Static Hashing: Issues

• Number of primary buckets is fixed at file creation

• Hash function maps key to a bucket number

• Typical hash function

  • H(k) = a*k + b
  • Bucket number = h(k) mod N

• Key can be int or char

  • Char – each character is mapped to ASCII, and all value are added to get an integer
  • Parameters a and b are choose to tune the distribution of values (i.e. need to play with this values to get them right …)

• When a primary bucket get full, need to create an overflow

page and chain it to primary bucket.

Static hashing: Operations

• Search for key value k:

  • Hash k to find the bucket, call this bucket B
  • Search records in B to find the one(s) with key k
  • If records are found
    • clustered , the data record is there: Cost: 1 I/O
    • unclustered , need to fetch the actual data page: Cost 2 I/Os
  • If records are not found, need to search in overflow pages (if there are any) - Clustered : Cost: (1 + number of pages searched) * I/O - Unclustered : Cost: (2 + number of pages searched) * I/O

• The more overflow page you have, the worst the

performance get

  • Need to keep overflow pages to 1 or 2, but rarely gets done!

Extensible hashing

• Allows the number of buckets to grow or shrink

• Hash function hashes to slots in a directory

  • Slots store the page id of the bucket
  • Directory can be kept in buffer pool
  • Directory can have hundreds or thousand of slots to buckets

• When a bucket gets full

  • Create a new bucket and split records between the new and full bucket - Redistributes the data - Hash function still works!!!
  • Overflow page is need only if you have many duplicate records

Binary Pattern Hashing Technique

• Hash function will map search key to binary

pattern

– Ex h(51) = 00110011

• Last d bits in the pattern are taken as bucket

number!

– Ex. If d = 2, then h(51) = 00110011 will yield

bucket number 11, which is 3 in binary

  • Thus, 51 goes to bucket 3

• The number of d of bits used to hash the

search key is called the depth

• Two types of depths

The use of the depth

• Depth tells us the number of bits that we need

to use to pick a bucket

– Ex. H(4) = 100, d = 2, tell us to use 00 to identify

slot. This would be slot 00.

• Directory has a global depth

– Used to hash key to proper slot

• Each bucket has a local depth

– Used when bucket need to be slipt

• Let us see what happens when we need to

insert the value 22 into the hash index Docsity.com

Hash File before the Insert 22

Directory

Page

Bucket A

Bucket B

Bucket C

Bucket D

The issue of a full bucket: Insert 20

Directory

Page

Bucket A

Bucket B

Bucket C

Bucket D

H(20) = 10100

d = 2

gives slot 00.

This bucket is

Full

Splitting a bucket

• A full bucket gets split into two buckets

– Their directory slots are called corresponding

elements

• These buckets have the same hash value at

the current depth d

• But at depth d + 1, they differ by 1 bit

– one has a 1 at bit position d + 1

– the other has a 0 at bit position d + 1

• Example:

– Bucket A is split into two buckets: bucket A andDocsity.com

Corresponding elements & buckets

2 Bucket A

3 Bucket A

3 Bucket A

Operation insert 20

After

Split

Original bucket A Split image of bucket A

splitting

Local depth is changed from 2 to 3

Need 3 bits for hashing

Hash = 00

Hash = 000

Hash = 100

Expanding the Directory

Directory

Page

Bucket A

Bucket B

Bucket C

Bucket D

3 Bucket A

Number of slots

is not enough.

Need to double

size of the

directory

and increase

Global depth