Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Query Evaluation Techniques - Database Management Systems - Lecture Slides, Slides of Introduction to Database Management Systems

Some concept of Database Management Systems are Access Methods Layer, Basic Structure, Common Structures, Designing Systems, Join Processing, Modern Computers, Query Evaluation Techniques. Main points of this lecture are: Query Evaluation Techniques, Different, Algorithms, Execute, Relational Operators, Selection, Projection, Joins, Aggregates, Engine

Typology: Slides

2012/2013

Uploaded on 04/27/2013

prakash
prakash 🇮🇳

4.6

(10)

63 documents

1 / 22

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Database Management Systems Design
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16

Partial preview of the text

Download Query Evaluation Techniques - Database Management Systems - Lecture Slides and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Database Management Systems Design

Query Evaluation Techniques

  • Read :
    • Chapter 12, sec 12.1-12.
    • Chapter 13
  • Purpose:
    • Study different algorithms to execute (evaluate) SQL relational operators - Selection - Projection - Joins - Aggregates - Etc.

Processing a query

  • Parser –
    • transforms query into a tree expression (parse tree)
    • Looks into catalog for metadata about tables, attributes, operators, etc.
  • Optimizer
    • transforms query expression tree into a query plan tree
      • Tree with metadata about relations, attributes, operators, and algorithms to run each operator
    • Searches several alternative query plan trees to find the cheapest - Based on I/O cost, CPU cost (and network cost)
  • Execution Engine
    • Takes the plan from optimizer, interprets it and runs it.

Issues in selecting a query plan

  • Need to understand cost of different plan
    • Plan – algorithm to run a given relational operator
    • Example- Selection: “Get all students with gpa = 4.00”
      • Plan 1 – scan heap file to find records with gap 4.
      • Plan 2 – Use an index on gpa to find records with 4.
      • Plan 3 – Sort table students on gap attribute, then scan sorted records for those with gpa = 4.
  • Need to have statistics about:
    • Relation size, attribute size
    • Distribution of attribute values
      • Uniform vs skewed
    • Disk speed
    • Memory size
    • Etc.

What information is stored?

  • Table
    • Table name
    • File that stores and file type (heap file, clustered B+ tree, …)
    • Attributes names and types
    • Indices defined on the table
    • Integrity constrains
    • Statistics
  • Index
    • Index name and type of structure (B+ tree, external hash, …)
    • Search key
    • Statistics
  • Views
    • View name and definition

What are the statistics?

  • Relational Cardinality
    • Number of tuples in table R : Ntuples(R)
  • Relation Size
    • Number of pages used to store R : NPages(R)
  • Index Cardinality
    • Number of distinct key values in index I : NKeys(I)
  • Index size
    • Number of pages for index I: NPages(I)
      • B+-tree – number of leaf pages
  • Number of Tuples Per Page
    • Number of tuples in a page (avg) R or I: NTPages(R)
  • Index Height
    • Number of non-leaf levels: IHeight(I)
  • Index Range
    • Min and max values in the index: ILow(I) and IHigh(I)

Sample Catalog

  • Tables:
    • Sailors(sid:integer, sname:string, rating:integer, age:real);
    • Reservations(sid:integer, bid:integer, day:dates,rname:string);
  • Catalog table for attributes:
    • Attribut_Cat(attr_name:string, rel_name:string, type:string, position:integer)

Catalog Instance: Attribute_Cat

  • Attr_name Attribute_Cat string Attr_name Rel_name Type Position
    • Rel_name Attribute_Cat string - Type Attribute_Cat string
      • Position Attribute_Cat Integer - Sid Sailors Integer
        • Sname Sailors string
          • Rating Sailors integer - Age Sailors real - Sid Reserves integer - Bid Reserves Integer - Day Reserves Dates
          • rname Reserves string

Single-Table access paths

  • Consider SQL query: Select sid, slogin, sname From Students Where gpa == 4.00 and age < 25;
  • Need to read tuples and evaluate where

clause!

  • Accessing a table mostly done via two kinds

of access paths

  • File Scan
    • Read each page on data file (e.g. heap file) and get every tuple, then evaluate predicate

SQL and relational algebra

  • Typically query parser will transform SQL

query into parse tree, and then into query

plan tree

  • Query plan tree is a tree that represents a

relational algebra expression!

  • Every node in the tree implements a relational

operator + the scan operator.

  • The scan operator is used to fetch tuples from

disk

  • First access path that gets evaluated! Docsity.com

Index Matching

  • Given a query Q, with a predicate (conjunct)

p, we say that an index I matches the

predicate p if and only if

  • the index can be used to retrieve just the tuples that satisfy p.
  • The index might match all the conjuncts in

the selection or just a few (or even just one)

  • Primary conjuncts – conjuncts that are

matched by the index

  • If the index matches the whole where

Rules for matching

  • Hash index: one or more conjunct in the

form attribute = value in a selection have

attributes that match the index search key.

  • Need to include all attributes in search key
  • Tree Index (B+-tree or ISAM): one or more

terms of the form attribute op value in a

selection have attributes that match a

prefix of the search key.

  • op can be any of : >, <, =, <>, >=, <=
  • Note on prefixes:
  • If the search key for a B+tree is: <a, b, c> the following are prefixes of this search key: Docsity.com

More examples

  • Index on Reserves with search key <bid, sid>
  • Hash Index or B-tree match the condition
    • rname = “Tom” and bid = 10 and sid = 4
  • Why?
    • Can be rewritten as bid = 10 and sid = 4 and rname = “Tom”
    • First two conjuncts bid = 10 and sid = 4 match search key
    • Conjuct rname = “Tom” must be evaluate afterward Docsity.com

Selectivity of Access paths

  • Each access paths a selectivity, which is the

number of pages to be retrieved by it

  • Both data pages and index pages (if any)
  • It is helpful to define the notion of selectivity

factors for conjuncts

  • Given a predicate p, the selectivity factor p is the fraction of tuples in a relation R that satisfy p.
  • Denoted as SF p
  • Selectivity for conjunct p applied to a relation

R can then be computed as:

NTPages R

SF NTuples R selectivity p