



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Material Type: Project; Class: Database Management Systems; Subject: Computer Science; University: Brown University; Term: Fall 2009;
Typology: Study Guides, Projects, Research
1 / 7
This page cannot be seen from the preview
Don't miss anything!
Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.
Ben Koen runs I SQL YOU!TM, a service which will execute your SQL queries when given a database. Unfortunately, Ben hasn’t heard of trees as data structures (incidentally, his family tree doesn’t split). Luckily for you, Ben hires you not to implement his B-Tree (he made last year’s students do that), but to work with him in handling the queries. “Some people!” exclaims Ben, “don’t know the love that comes from a massive, slow join!” Save Ben’s business before Ben destroys it.
We task you with the following:
Your code will implement a cost model: a relatively small program that receives queries and uses information about the database to help the query optimizer pick the best query plan. Most DBMS’s have a stage during optimization that comes before the cost model: query rewriting. Usually, after a query has been parsed into a relational algebra expression from the SQL, a rewriter creates an equivalent RA expression that will cost less in most cases. An example of this would be the RA expression:
σamount > 2500 (borrower ./ loan) Which can be re-written to the equivalent expression:
borrower ./ (σamount > 2500 (loan)) which usually costs less. We are not asking you to implement a re-writer.^1 We only ask for the cost model and plan generator.
(^1) unless you want grad credit. See ’Grad Credit’.
Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.
3 Plan Generator
As mentioned in lecture, there are several ways a DBMS can use its indices to select data as appropriate (see The Plans, below). Given metadata about the database, the plan generator finds the lowest-cost option. This is done with a number of formulas. For your plan generator, you will need to use the following data accessible by the methods outlined:
Your job for this portion is to implement the calculateStatistics() method in Statis- tics.java. You must scan through the file and collect the variance and cardinality for each field.
Each PlanNode takes either a table or some children PlanNodes and estimates its cost based on the data available from each. As an example, take the following query:
Πbranch name(σcustomer city=“Harrison”(customer ./ (account ./ depositor)))
This evaluates to a tree like:
Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.
many different values there are) for each attribute in a table. You will only be writing the code to calculate variance, the cardinality is calculated on the fly in Statistics.getCardinality().
5 The Plans
For each plan node, write the following methods within the class:
We have provided explanations of the following plan nodes to help you figure out a cost model. Remember to write down your justification for your cost model in your README!
Input: PlanNode child Op o Description: An aggregate operation reads over the child node and applies the operation. The result is just an INT TYPE.
Input: PlanNode outer PlanNode inner Description: In general, a join operation can take any two relations to join. In this case, we assume that the outer relation has a foreign key dependency on the primary key of the inner table. So the tuple count of the join result is the same as that of the outer relation. The JoinNode class contains the estimateTupleCount() and estimateTupleSize() methods for all join operations, as these estimates will be the same for all join operations. The tuple size for join operations will merely be the sum of the tuples of both child operators.
5.2.1 Block Nested Loops Join
This is the operator you wrote in Executor. Read all blocks in the inner operator for every block in the outer operator and join.
Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.
5.2.2 Index Nested Inner Join
This is similar to block nested loops join, except you traverse an index of the inner operator rather than the entire relation.
5.2.3 Merge Inner Join
Assume that the sorting has already been done. You receive two sorted operators and are expected to join them.
Input: PlanNode child Attribute a Description: You are filtering on a predicate, assume here that it is EQUAL. You will need to look at the entire relation. The resulting relation will be related to the cardinality of the attribute being predicated. The actual size of each tuple will remain the same.
Input: Table t Attribute selection Description: Seeking an element in a primary index will depend on the size and depth of the index.
Input: Table t Attribute selection Description: Seeking an element in a secondary index will also depend on the size and depth of the index.
Input: PlanNode child Attributes a Description: Projecting merely removes columns, so the read cost and result size should be the same as the child, but the tuple size will only be the length of the attributes being projected.
Project 3- Query Optimization Sunday, December 13, 2009, 11:59 P.M.
With the second operating on much smaller amounts of data. The simplest method to do this is called predicate pushdown. You must find the columns in a query plan which are selected out of the table and pick the ones which are never used in any operator other than the outermost selection (as seen in the above example). It is intuitive to see that the ordering of joins in a multi-join query plan could also affect the running time. If you know that attributes A,B exist in tables R 1 ,R 2 ,R 3 , where |R 1 | ≤ |R 2 | ≤ |R 3 |. You know that any query which joins the smaller two together before it joins with the bigger one will perform better than a query which joins the larger two first.
8 The Code
Retrieve the code like you did last time from https://database.cs.brown.edu/svn/cs127rep/QueryOptimizer.
9 Handin
We expect the following components to be included in your handin:
The code you wrote must pass all the unit tests in plannodes.test. If it does not, make sure you have a solid, reasonable explanation in your README. You can handin your project using the script:
/course/cs127/bin/cs127_handin optimizer
Good luck, and as always, feel free to ask TA’s any questions you like!