Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Query Optimization - Database Management Systems - Project 3 | CSCI 1270, Study Guides, Projects, Research of Database Management Systems (DBMS)

Brown University Database Management Systems (DBMS)

Material Type: Project; Class: Database Management Systems; Subject: Computer Science; University: Brown University; Term: Fall 2009;

Typology: Study Guides, Projects, Research

2009/2010

Uploaded on 02/24/2010

koofers-user-k0j 🇺🇸

10 documents

1 / 7

This page cannot be seen from the preview

Don't miss anything!

CS 127

Project 3- Query Optimization

Database Management Systems

Sunday, December 13, 2009, 11:59 P.M.

Project 3

Query Optimization

Out: November, 23, 2009

In: Sunday, December 13, 2009, 11:59 P.M.

1 Silly Premise

Ben Koen runs I SQL YOU!TM, a service which will execute your SQL queries when given

a database. Unfortunately, Ben hasn’t heard of trees as data structures (incidentally, his

family tree doesn’t split).

Luckily for you, Ben hires you not to implement his B-Tree (he made last year’s students

do that), but to work with him in handling the queries. “Some people!” exclaims Ben,

“don’t know the love that comes from a massive, slow join!”

Save Ben’s business before Ben destroys it.

2 Introduction

We task you with the following:

•Collecting statistics on the data

•Generating a cost estimation for a given query plan

•Giving a justification for your cost model

Your code will implement a cost model: a relatively small program that receives queries

and uses information about the database to help the query optimizer pick the best query

plan.

Most DBMS’s have a stage during optimization that comes before the cost model: query

rewriting. Usually, after a query has been parsed into a relational algebra expression from

the SQL, a rewriter creates an equivalent RA expression that will cost less in most cases.

An example of this would be the RA expression:

σamount >2500(borrower ./ loan)

Which can be re-written to the equivalent expression:

borrower ./ (σamount >2500(loan))

which usually costs less. We are not asking you to implement a re-writer.1We only ask

for the cost model and plan generator.

1unless you want grad credit. See ’Grad Credit’.

Project 3- Query Optimization December 11, 2009 1

Partial preview of the text

Download Query Optimization - Database Management Systems - Project 3 | CSCI 1270 and more Study Guides, Projects, Research Database Management Systems (DBMS) in PDF only on Docsity!

Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.

Project 3

Query Optimization

Out: November, 23, 2009

In: Sunday, December 13, 2009, 11:59 P.M.

1 Silly Premise

Ben Koen runs I SQL YOU!TM, a service which will execute your SQL queries when given a database. Unfortunately, Ben hasn’t heard of trees as data structures (incidentally, his family tree doesn’t split). Luckily for you, Ben hires you not to implement his B-Tree (he made last year’s students do that), but to work with him in handling the queries. “Some people!” exclaims Ben, “don’t know the love that comes from a massive, slow join!” Save Ben’s business before Ben destroys it.

2 Introduction

We task you with the following:

Collecting statistics on the data
Generating a cost estimation for a given query plan
Giving a justification for your cost model

Your code will implement a cost model: a relatively small program that receives queries and uses information about the database to help the query optimizer pick the best query plan. Most DBMS’s have a stage during optimization that comes before the cost model: query rewriting. Usually, after a query has been parsed into a relational algebra expression from the SQL, a rewriter creates an equivalent RA expression that will cost less in most cases. An example of this would be the RA expression:

σamount > 2500 (borrower ./ loan) Which can be re-written to the equivalent expression:

borrower ./ (σamount > 2500 (loan)) which usually costs less. We are not asking you to implement a re-writer.^1 We only ask for the cost model and plan generator.

(^1) unless you want grad credit. See ’Grad Credit’.

Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.

3 Plan Generator

3.1 Generating the Cost Model

As mentioned in lecture, there are several ways a DBMS can use its indices to select data as appropriate (see The Plans, below). Given metadata about the database, the plan generator finds the lowest-cost option. This is done with a number of formulas. For your plan generator, you will need to use the following data accessible by the methods outlined:

nr: The number of tuples in r. Statistics.getTuples()
br: The number of blocks in r. Statistics.getPages()
fr: The block size (number of tuples per block) of r (typically br = dnr/fre). Global for the system BufferPool.PAGE SIZE
sr: Tuple size (in bytes). Statistics.getTupleSize()
V(att, r): Range of values for attribute att in r (called the Attribute Variance). Statistics.getVariance()
SC(att, r): Number of possible values for attribute att in r (called the Selection Car- dinality). You may assume for simplicity that we only deal with continuous intervals. Statistics.getCardinality()

Your job for this portion is to implement the calculateStatistics() method in Statis- tics.java. You must scan through the file and collect the variance and cardinality for each field.

3.2 Using the Cost Model for Plan Generation

Each PlanNode takes either a table or some children PlanNodes and estimates its cost based on the data available from each. As an example, take the following query:

Πbranch name(σcustomer city=“Harrison”(customer ./ (account ./ depositor)))

This evaluates to a tree like:

Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.

many different values there are) for each attribute in a table. You will only be writing the code to calculate variance, the cardinality is calculated on the fly in Statistics.getCardinality().

5 The Plans

For each plan node, write the following methods within the class:

estimateDiskCost() - estimate how many units of time will be spent fetching from disk for the current node, which should include the cost of the descendants
estimateTupleCount() - estimate how many tuples the result of this node will contain
estimateTupleSize() - estimate how big an individual tuple will be

We have provided explanations of the following plan nodes to help you figure out a cost model. Remember to write down your justification for your cost model in your README!

5.1 Aggregate

Input: PlanNode child Op o Description: An aggregate operation reads over the child node and applies the operation. The result is just an INT TYPE.

5.2 Joins

Input: PlanNode outer PlanNode inner Description: In general, a join operation can take any two relations to join. In this case, we assume that the outer relation has a foreign key dependency on the primary key of the inner table. So the tuple count of the join result is the same as that of the outer relation. The JoinNode class contains the estimateTupleCount() and estimateTupleSize() methods for all join operations, as these estimates will be the same for all join operations. The tuple size for join operations will merely be the sum of the tuples of both child operators.

5.2.1 Block Nested Loops Join

This is the operator you wrote in Executor. Read all blocks in the inner operator for every block in the outer operator and join.

Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.

5.2.2 Index Nested Inner Join

This is similar to block nested loops join, except you traverse an index of the inner operator rather than the entire relation.

5.2.3 Merge Inner Join

Assume that the sorting has already been done. You receive two sorted operators and are expected to join them.

5.3 Filter

Input: PlanNode child Attribute a Description: You are filtering on a predicate, assume here that it is EQUAL. You will need to look at the entire relation. The resulting relation will be related to the cardinality of the attribute being predicated. The actual size of each tuple will remain the same.

5.4 Primary Index Seek

Input: Table t Attribute selection Description: Seeking an element in a primary index will depend on the size and depth of the index.

5.5 Secondary Index Seek

Input: Table t Attribute selection Description: Seeking an element in a secondary index will also depend on the size and depth of the index.

5.6 Project

Input: PlanNode child Attributes a Description: Projecting merely removes columns, so the read cost and result size should be the same as the child, but the tuple size will only be the length of the attributes being projected.

Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.

With the second operating on much smaller amounts of data. The simplest method to do this is called predicate pushdown. You must find the columns in a query plan which are selected out of the table and pick the ones which are never used in any operator other than the outermost selection (as seen in the above example). It is intuitive to see that the ordering of joins in a multi-join query plan could also affect the running time. If you know that attributes A,B exist in tables R 1 ,R 2 ,R 3 , where |R 1 | ≤ |R 2 | ≤ |R 3 |. You know that any query which joins the smaller two together before it joins with the bigger one will perform better than a query which joins the larger two first.

8 The Code

Retrieve the code like you did last time from https://database.cs.brown.edu/svn/cs127rep/QueryOptimizer.

9 Handin

We expect the following components to be included in your handin:

The plannodes package with all estimation functions filled in
The simpledb.Statistics class
README justifying your formulas for each PlanNode

The code you wrote must pass all the unit tests in plannodes.test. If it does not, make sure you have a solid, reasonable explanation in your README. You can handin your project using the script:

/course/cs127/bin/cs127_handin optimizer

Good luck, and as always, feel free to ask TA’s any questions you like!

Query Optimization - Database Management Systems - Project 3 | CSCI 1270, Study Guides, Projects, Research of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Query Optimization - Database Management Systems - Project 3 | CSCI 1270 and more Study Guides, Projects, Research Database Management Systems (DBMS) in PDF only on Docsity!

Project 3

Query Optimization

Out: November, 23, 2009

In: Sunday, December 13, 2009, 11:59 P.M.

1 Silly Premise

2 Introduction

3.1 Generating the Cost Model

3.2 Using the Cost Model for Plan Generation

5.1 Aggregate

5.2 Joins

5.3 Filter

5.4 Primary Index Seek

5.5 Secondary Index Seek

5.6 Project