Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Query Processing in Database Systems: A Comprehensive Guide, Slides of Introduction to Database Management Systems

University of Waterloo Introduction to Database Management Systems

Query Processing, Query Processing Components, Possible Execution Plan, Pictorial Representation, Query Processing Methodology, Query Normalization, Simplification, Restructuring, Index Selection Guidelines, Clustering and Joins, Selecting Alternatives, Optimization Granularity, Query Optimization, Relation, Cost Calculation

Typology: Slides

2011/2012

Uploaded on 02/13/2012

richiey 🇨🇦

32 documents

1 / 21

This page cannot be seen from the preview

Don't miss anything!

7-1

Query Processing

high level user query (SQL)

Query

Processor

low level data manipulation

commands

(execution plan)

Query Compiler

Plan

Generator

Plan

Cost

Estimator

Plan Evaluator

7-2

Query Processing Components

■

Query language that is used

●

SQL: “intergalactic dataspeak”

■

Query execution methodology

●

The steps that one goes through in executing high-level

(declarative) user queries.

■

Query optimization

●

How do we determine a good execution plan?

Partial preview of the text

Download Query Processing in Database Systems: A Comprehensive Guide and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Query Processing

high level user query (SQL)

Query

Processor

low level data manipulation

commands

(execution plan)

Query Compiler

Plan

Generator

Plan

Cost

Estimator

Plan Evaluator

Query Processing Components

Query language that is used

SQL: “intergalactic dataspeak”

Query execution methodology

The steps that one goes through in executing high-level

(declarative) user queries.

Query optimization

How do we determine a good execution plan?

What are we trying to do?

Consider query

“For each project whose budget is greater than $250000 and which employs more than two employees, list the names and titles of employees.”

In SQL

SELECT Ename, Title

FROM Emp, Project, Works

WHERE Budget > 250000

AND Emp.Eno=Works.Eno

AND Project.Pno=Works.Pno

AND Project.Pno IN

(SELECT w.Pno

FROM Works w

GROUP BY w.Pno

HAVING SUM (*) > 2)

How to execute this query?

A Possible Execution Plan

1. T 1 ← Scan Project table and select all tuples

with Budget value > 250000

2. T 2 ← Join T 1 with the Works relation

3. T 3 ← Join T 2 with the Emp relation

4. T 4 ← Group tuples of T 3 over Pno

5. Scan tuples in each group of T 4 and for groups

that have more than 2 tuples, Project over

Ename, Title

Note: Overly simplified – we’ll detail later.

Lexical and syntactic analysis

check validity (similar to compilers) check for attributes and relations type checking on the qualification

Put into (query normal form

Conjunctive normal form ( p 11 ∨ p 12 ∨…∨ p 1 n ) ∧…∧ ( p (^) m 1 ∨ p (^) m 2 ∨…∨ p (^) mn ) Disjunctive normal form ( p 11 ∧ p 12 ∧…∧ p 1 n ) ∨…∨ ( p (^) m 1 ∧ p (^) m 2 ∧…∧ p (^) mn ) OR's mapped into union AND's mapped into join or selection

Query Normalization

Refute incorrect queries

Type incorrect

If any of its attribute or relation names are not defined in

the global schema

If operations are applied to attributes of the wrong type

Semantically incorrect

Components do not contribute in any way to the

generation of the result

Only a subset of relational calculus queries can be tested

for correctness

Those that do not contain disjunction and negation

To detect

➠ connection graph (query graph) ➠ join graph

Analysis

SELECT Ename,Resp FROM Emp, Works, Project WHERE Emp.Eno = Works.Eno AND Works.Pno = Project.Pno AND Pname = ‘CAD/CAM’ AND Dur > 36 AND Title = ‘Programmer’

Query graph (^) Join graph

Analysis – Example

Dur>

Pname=ëCAD/CAMí

Ename

Emp.Eno=Works.Eno Works.Pno=Project.Pno

RESULT

Title = ëProgrammerí

Resp

Works Emp.Eno=Works.Eno Works.Pno=Project.Pno

Emp Project Emp^ Project

Works

If the query graph is not connected, the query

may be wrong.

SELECT Ename,Resp FROM Emp, Works, Project WHERE Emp.Eno = Works.Eno AND Pname = ‘CAD/CAM’ AND Dur > 36 AND Title = ‘Programmer’

Analysis

Pname=ëCAD/CAMí

Ename (^) RESULT

Resp

Works

Emp Project

Convert SQL to relational algebra

Make use of query trees

Example

SELECT Ename FROM Emp, Works, Project WHERE Emp.Eno = Works.Eno AND Works.Pno = Project.Pno AND Ename <> ‘J. Doe’ AND Pname = ‘CAD/CAM’ AND (Dur = 12 OR Dur = 24)

Restructuring

ΠENAME

σDUR=12 OR DUR=

σPNAME=ìCAD/CAMî

σENAME≠ìJ. DOEî

Project Works Emp

Project

Select

Join

⋈PNO

⋈ENO

How to implement operators

Selection (assume R has n pages)

Scan without an index – O( n)

Scan with index

➠ B +^ index – O(log n ) ➠ Hash index – O(1)

Projection

Without duplicate elimination – O( n )

With duplicate elimination

➠ Sorting-based – O( n log n ) ➠ Hash-based – O( n + t ) where t is the result of hashing phase

How to implement operators

(cont’d)

Join

Nested loop join: R ⋈ S

foreach tuple r∈R do

foreach tuple s∈S do

if r==s then add <r,s> to result

O( n * m )

Improvements possible by

➠ page-oriented nested loop join ➠ block-oriented nested loop join

How to implement operators

(cont’d)

Join

Index nested loop join: R ⋈ S

foreach tuple r∈R do use index on join attr. to find tuples of S foreach such tuple s∈S do add <r,s> to result

Sort-merge join

➠ Sort R and S on the join attribute ➠ Merge the sorted relations

Hash join

➠ Hash R and S using a common hash function ➠ Within each bucket, find tuples where r = s

Example 2

SELECT e.Ename, w.Resp

FROM Emp e, Works w

WHERE e.Age BETWEEN 45 AND 60

AND e.Title=‘Programmer’

AND e.Eno=w.Eno

Clearly, Emp should be the outer relation.

Suggests that we build a hash index on w.Eno.

What index should we build on Emp?

B+ tree on e.Age could be used, OR an index on e.Title could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions. ➠ As a rule of thumb, equality selections more selective than range selections.

As both examples indicate, our choice of indexes is guided by

the plan(s) that we expect an optimizer to consider for a query.

Have to understand optimizers!

Examples of Clustering

SELECT e.Title

FROM Emp e

WHERE e.Age > 40

B+ tree index on e.Age can be used to get

qualifying tuples.

How selective is the condition?

Is the index clustered?

Clustering and Joins

SELECT e.Ename, p.Pname

FROM Emp e, Project p

WHERE p.Budget=‘350000’

AND e.City=p.City

Clustering is especially important when accessing inner tuples

in Index Nested Loop join.

Should make index on e.City clustered.

Suppose that the WHERE clause is instead:

WHERE e.Title=‘Programmer’ AND e.City=p.City If many employees are Programmers, Sort-Merge join may be worth considering. A clustered index on p.City would help.

Summary: Clustering is useful whenever many tuples are to

be retrieved.

SELECT Ename

FROM Emp e,Works w

WHERE e.Eno = w.Eno

AND w.Dur > 37

Strategy 1

ΠENAME (σDUR>37∧EMP.ENO=ASG.ENO (Emp × Works))

Strategy 2

ΠENAME (Emp ⋈ENO (σDUR>37 (Works)))

Strategy 2 is “better” because

It avoids Cartesian product It selects a subset of Works before joining

How to determine the “better” alternative?

Selecting Alternatives

Query Optimization Issues –

Optimization Timing

Static

compilation ⇒ optimize prior to the execution

difficult to estimate the size of the intermediate results

⇒ error propagation

can amortize over many executions

Dynamic

run time optimization

exact information on the intermediate relation sizes

have to reoptimize for multiple executions

Hybrid

compile using a static algorithm

if the error in estimate sizes > threshold, reoptimize at

run time

Query Optimization Issues –

Statistics

Relation

cardinality

size of a tuple

fraction of tuples participating in a join with another

relation

Attribute

cardinality of domain

actual number of distinct values

Common assumptions

independence between different attribute values

uniform distribution of attribute values within their

domain

Query Optimization Components

Cost function (in terms of time)

I/O cost + CPU cost

These might have different weights

Can also maximize throughput

Solution space

The set of equivalent algebra expressions (query trees).

Search algorithm

How do we move inside the solution space?

Exhaustive search, heuristic algorithms (iterative

improvement, simulated annealing, genetic,…)

Cost Calculation

Cost function takes CPU and I/O processing into

account

Instruction and I/O path lengths

Estimate the cost of executing each node of the

query tree

Is pipelining used or are temporary relations created?

Estimate the size of the result of each node

Selectivity of operations – “reduction factor”

Error propagation is possible

Join

Special case: A is a key of R and B is a foreign key

of S;

card ( R ⋈ A=B S ) = card ( S )

More general:

card ( R ⋈ S ) = SF J ∗ card ( R ) ∗ card ( S )

Intermediate Relation Size

Search Space

Characterized by “equivalent” query plans

Equivalence is defined in terms of equivalent query

results

Equivalent plans are generated by means of

algebraic transformation rules

The cost of each plan may be different

Focus on joins

Search Space – Join Trees

For N relations, there are O( N !)

equivalent join trees that can be

obtained by applying commutativity

and associativity rules

SELECT Ename,Resp FROM Emp, Works, Project WHERE Emp.Eno=Works.Eno AND Works.PNO=Project.PNO

Project

Emp Works

Project (^) Works

Emp

Project

Works

Emp

×

Commutativity of binary operations

R × S ⇔ S × R

R ⋈ S ⇔ S ⋈ R

R ∪ S ⇔ S ∪ R

Associativity of binary operations

( R × S ) × T ⇔ R × ( S × T )

( R ⋈ S ) ⋈ T ⇔ R ⋈ ( S ⋈ T )

Idempotence of unary operations

Π A ’(Π A ’’( R )) ⇔ Π A ’( R )

σ p 1 ( A 1 )(σ p 2 ( A 2 )( R )) = σ p 1 ( A 1 ) ∧ p 2 ( A 2 )( R )

where R [ A ] and A' ⊆ A , A" ⊆ A and A' ⊆ A"

Transformation Rules

Equivalent Query

ΠEname

σPname=`CAD/CAMí ∧ (Dur=12 ∨ Dur=24) ∧ Ename<>`J. DOEí

×

Works Project Emp

Emp

ΠEname

σEname <> `J. Doeí

Project Works

ΠEno,Ename

σPname = `CAD/CAMí

ΠPno

σDur = 12 ∧ Dur=

ΠPno,Eno

ΠPno,Ename

Another Equivalent Query

Search Strategy

How to “move” in the search space.

Deterministic

Start from base relations and build plans by adding one relation at each step Dynamic programming: breadth-first Greedy: depth-first

Randomized

Search for optimalities around a particular starting point Trade optimization time for execution time Better when > 5-6 relations Simulated annealing Iterative improvement

Search Algorithms

Restrict the search space

Use heuristics ➠ E.g., Perform unary operations before binary operations Restrict the shape of the join tree ➠ Consider only linear trees, ignore bushy ones

R 1 R 2

R 3

R 4

Linear Join Tree

R 1 R 2 R 3 R 4

Bushy Join Tree

Query Processing in Database Systems: A Comprehensive Guide, Slides of Introduction to Database Management Systems

Related documents

Partial preview of the text

Download Query Processing in Database Systems: A Comprehensive Guide and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Query Processing

high level user query (SQL)

Query

Processor

low level data manipulation

commands

(execution plan)

Query Compiler

Plan

Generator

Plan

Cost

Estimator

Plan Evaluator

Query Processing Components

Query language that is used

SQL: “intergalactic dataspeak”

Query execution methodology

The steps that one goes through in executing high-level

(declarative) user queries.

Query optimization

How do we determine a good execution plan?

What are we trying to do?

Consider query

In SQL

SELECT Ename, Title

FROM Emp, Project, Works

WHERE Budget > 250000

AND Emp.Eno=Works.Eno

AND Project.Pno=Works.Pno

AND Project.Pno IN

(SELECT w.Pno

FROM Works w

GROUP BY w.Pno

HAVING SUM (*) > 2)

How to execute this query?

A Possible Execution Plan

1. T 1 ← Scan Project table and select all tuples

with Budget value > 250000

2. T 2 ← Join T 1 with the Works relation

3. T 3 ← Join T 2 with the Emp relation

4. T 4 ← Group tuples of T 3 over Pno

5. Scan tuples in each group of T 4 and for groups

that have more than 2 tuples, Project over

Ename, Title

Note: Overly simplified – we’ll detail later.

Lexical and syntactic analysis

Put into (query normal form

Query Normalization

Refute incorrect queries

Type incorrect

If any of its attribute or relation names are not defined in

the global schema

If operations are applied to attributes of the wrong type

Semantically incorrect

Components do not contribute in any way to the

generation of the result

Only a subset of relational calculus queries can be tested

for correctness

Those that do not contain disjunction and negation

To detect

Analysis

Analysis – Example

If the query graph is not connected, the query

may be wrong.

Analysis

Convert SQL to relational algebra

Make use of query trees

Example

Restructuring

ΠENAME

σDUR=12 OR DUR=

σPNAME=ìCAD/CAMî

σENAME≠ìJ. DOEî

⋈PNO

⋈ENO