Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

ER-to-Relational Mapping and Database Normalization, Slides of Introduction to Database Management Systems

An overview of er-to-relational mapping and database normalization. It covers the principles of improving relations, avoiding information repetition, and dealing with anomalies. The document also introduces the concepts of normal forms, functional dependencies, and superkeys. It provides examples and explanations to help understand these concepts.

Typology: Slides

2011/2012

Uploaded on 02/13/2012

richiey
richiey 🇨🇦

32 documents

1 / 17

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
5-1
Design Process -Where are we?
Conceptual
Design
Conceptual
Schema
(ER Model)
Logical
Design
Logical Schema
(Relational Model)
Step 1: ER-to-Relational
Mapping
Step 2: Normalization:
“Improving” the design
5-2
nRelations should have semantic unity
nInformation repetition should be avoided
lAnomalies: insertion, deletion, modification
nAvoid null values as much as possible
lDifficulties with interpretation
àdon’t know, don’t care, known but unavailable,
does not apply
lSpecification of joins
nAvoid spurious joins
Relational Design Principles
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download ER-to-Relational Mapping and Database Normalization and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Design Process - Where are we?

Conceptual

Design

Conceptual Schema (ER Model)

Logical

Design

Logical Schema (Relational Model)

Step 1: ER-to-Relational

Mapping

Step 2: Normalization:

“Improving” the design

n Relations should have semantic unity

n Information repetition should be avoided

l Anomalies: insertion, deletion, modification

n Avoid null values as much as possible

l Difficulties with interpretation

‡ don’t know, don’t care, known but unavailable,

does not apply

l Specification of joins

n Avoid spurious joins

Relational Design Principles

Bad Design

EMP-PROJ

Employee No Employee Name Salary Title

… Project No

Duration

Resp

Project Name

Budget

ENO

EMP-PROJ

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000 M. Smith 34000 M. Smith

Analyst Analyst 34000 A. Lee Mech. Eng. 27000 A. Lee Mech. Eng. 27000 J. Miller Programmer 24000 B. Casey Syst. Anal. 34000 L. Chu Elect. Eng. 40000 R. Davis Mech. Eng. 27000

E

E

E

E

E

E

E

E

E

E8 J. Jones Syst. Anal.^34000

PNO DURATIONRESP

P1 12 Manager P1 Analyst P2 6 Analyst P3 10 Consultant P4 48 Engineer P2 18 Programmer P2 24 Manager P4 48 Manager P3 36 Engineer P3 40 Manager

PNAME BUDGET

Instrumentation 150000 Instrumentation 150000 Database Develop. 135000

Database Develop. 135000 Database Develop. 135000

CAD/CAM 250000

CAD/CAM 250000

CAD/CAM 250000

Maintenance 310000

Maintenance 310000

n The TITLE, SALARY, BUDGET attribute values are

repeated for each project that the engineer is involved in.

l Waste of space

l Complicates updates

Information Repetition

ENO

EMP-PROJ

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000 M. Smith 34000 M. Smith

Analyst Analyst 34000 A. Lee Mech. Eng. 27000 A. Lee Mech. Eng. 27000 J. Miller Programmer 24000 B. Casey Syst. Anal. 34000 L. Chu Elect. Eng. 40000 R. Davis Mech. Eng. 27000

E

E

E

E

E

E

E

E

E

E8 J. Jones Syst. Anal.^34000

PNO DURATIONRESP

P1 12 Manager P1 Analyst P2 6 Analyst P3 10 Consultant P4 48 Engineer P2 18 Programmer P2 24 Manager P4 48 Manager P3 36 Engineer P3 40 Manager

PNAME BUDGET

Instrumentation 150000 Instrumentation 150000 Database Develop. 135000

Database Develop. 135000 Database Develop. 135000

CAD/CAM 250000

CAD/CAM 250000

CAD/CAM 250000

Maintenance 310000

Maintenance 310000

This example instance of EMP -PROJ relation violates one of the constraints in our earlier design. Which one?

n If any attribute of project (say BUDGET of P1) is

modified, all the tuples for all employees who

work on that project need to be modified.

Modification Anomaly

ENO

EMP-PROJ

ENAME TITLE SALARY

J. Doe Elect. Eng. 40000 M. Smith 34000 M. Smith

Analyst Analyst 34000 A. Lee Mech. Eng. 27000 A. Lee Mech. Eng. 27000 J. Miller Programmer 24000 B. Casey Syst. Anal. 34000 L. Chu Elect. Eng. 40000 R. Davis Mech. Eng. 27000

E

E

E

E

E

E

E

E

E

E8 J. Jones Syst. Anal. 34000

PNO DURATIONRESP

P1 12 Manager P1 Analyst P2 6 Analyst P3 10 Consultant P4 48 Engineer P2 18 Programmer P2 24 Manager P4 48 Manager P3 36 Engineer P3 40 Manager

PNAME BUDGET

Instrumentation 150000 Instrumentation 150000 Database Develop. 135000

Database Develop. 135000 Database Develop. 135000

CAD/CAM 250000

CAD/CAM 250000

CAD/CAM 250000

Maintenance 310000

Maintenance 310000

MGR

E

E

E

E

E

E

E

E

E

E

What to do?

n Take each relation individually and “improve” it in terms

of the desired characteristics

l Normal forms

‡ Atomic values (1NF)

‡ Can be defined according to keys and dependencies.

‡ Functional Dependencies ( 2NF, 3NF, BCNF)

‡ Multivalued dependencies (4NF)

l Normalization

‡ Normalization is a process of concept separation which applies a top - down methodology for producing a schema by subsequent

refinements and decompositions.

‡ Do not combine unrelated sets of facts in one table; each relation

should contain an independent set of facts.

‡ Universal relation assumption

‡ 1NF to 3NF; 1NF to BCNF

Normalization Issues

n How do we decompose a schema into a desirable normal

form?

n What criteria should the decomposed schemas follow in order

to preserve the semantics of the original schema?

l Reconstructability: recover the original relation ⇒ no spurious joins

l Lossless decomposition: no information loss

l Dependency preservation: the constraints (i.e., dependencies) that

hold on the original relation should be enforceable by means of the

constraints (i.e., dependencies) defined on the decomposed relations.

n What happens to queries?

l Processing time may increase due to joins

l Denormalization

Normal Forms

4NF

BCNF

3NF

2NF

1NF

All relations

Some Basics

n Attributes

l Prime attribute is a member of any key

l Non-prime attribute is any attribute which is not prime

n Full functional dependency

l A FD XY is a full functional dependency if X is minimal, i.e.,

removal of any attribute A from X means the dependency does not

hold anymore.

l Formally - iff for all AX , ( X { A }) Y.

n Partial functional dependency

l Formally - iff for some AX , ( X { A }) Y.

n Transitive dependency

l Formally - XY and YZ and XZ and YX and ZY

X → /

f

Y

X

p

Y

/

Normal Forms Based on FDs

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

First Normal Form (1NF )

1NF eliminates the relations within relations

or relations as attributes of tuples.

eliminate the partial functional

dependencies of non-prime attributes

to key attributes

eliminate the transitive functional

dependencies of non-prime attributes to key attributes

eliminate the partial and transitive

functional dependencies of prime (key)

attributes to key.

Lossless &

Dependency

preserving

Lossless

n All attribute values are atomic

n 1NF relation cannot have an attribute value that

is:

l a set of values (set-value)

l a tuple of values (nested relation)

n This is a standard assumption in relational

DBMSs and in the rest of this section

n In object-oriented DBMSs this assumption is

relaxed.

First Normal Form

n Two possible definitions:

l A relation R ∈2NF iff all non-prime attributes in R are fully functionally dependent on primary key.

l A relation R ∈2NF iff the attributes are either

‡ a candidate key, or

‡ fully dependent on every key.

n Partial functional dependencies cause problems.

n 2NF is only of historical importance, since it is subsumed

by 3NF.

n In the example, EMP-PROJ is not 2NF, we turn it into

2NF by decomposing it:

l EMP(ENO, ENAME, TITLE, SALARY)

l PROJ(PNO,PNAME,BUDGET,MGR)

l ASSIGN(ENO,PNO,DURATION,RESP)

Second Normal Form

3NF – Example

n EMP is not in 3NF because of fd 2

l TITLE → SALARY but TITLE is not a superkey and

SALARY is not prime

l Problem is that ENO transitively determines SALARY

(as well as directly determining it)

n Solution:

fd 1

fd 2

EMP

ENO ENAME TITLE SALARY

EMP

ENO ENAME TITLE TITLE^ SALARY

PAY

fd 1 fd 2

Boyce-Codd Normal Form

n You can still have transitive dependencies in 3NF

if the dependent attribute(s) are prime.

n A 1NF relation scheme R is in BCNF if for every

non-trivial functional dependency XY , X is a

superkey.

n Properties of BCNF

l All non-prime attributes are fully dependent on every

key.

l All prime attributes are fully dependent on the keys that

they do not belong to.

l No attribute is non-trivially dependent on any set of

non-prime attributes.

n Formally: A relation scheme R defined over

U = { A 1

, A 2

, …, A n

} is in BCNF if for all

functional dependencies that hold on R of the

form XA , where XU and AU , at least

one of the following holds :

l XA is a trivial functional dependency

l X is a superkey for R

n No transitive dependencies.

Boyce-Codd Normal Form

BCNF – Example

n Assume the following definition of the PROJECT

relation with:

l Each employee on a project has a unique location and

responsibility with respect to that project, and

l Only one project can be found at each location

n FDs would be

PJNO ENO LOCATION RESP

PROJECT

which makes PROJECT in 3NF but not in BCNF

n Lossless join decomposition:

l If R is decomposed into R 1 , …, Rn , it should be possible

to reconstruct R with no additional (spurious) tuples.

l If a relation scheme R is decomposed into R 1 and R 2 ,

then at least one of the following FDs should be in F

R 1 ∩ R 2 → R 1

R 1 ∩ R 2 → R 2

n Dependency preservation:

l If a relation scheme R is decomposed into R 1 and R 2 ,

then every FD in F that holds on relation R (even the

implied ones) should be guaranteed to hold whenever

the projected dependencies within relations R 1 and R 2

are enforced.

Why These Rules?

Closure of a Set of FDs

n This is most easily done by converting it to the

problem of computing the closure of a set of

attributes.

n For each FD defined on the base relations, pick the

attribute (or set of attributes) that appear on its

left-hand-side

l Find their closure which gives the set of attributes that

are dependent on that attribute

‡ Theorem 1: XYF

iff YComputeX +( X , F ).

‡ Theorem 2: X is a superkey of R iff ComputeX +( X , F ) = R.

l This also gives the set of FDs that can be inferred from

the original FD.

function ComputeX

( X , F )

begin

X

X

while there exists YZF such that

YX

and ZX

then X

X

Z

return ( X

)

end

Closure of a Set of Attributes

Attribute Closure Example

n Let F consist of

l A → B

l C → D, E

l E, G → H

n ComputeX

({C, G}, F )

l Initial: X

= {C, G}

l Iteration 1( C → D, E): X

= {C, G, D, E}

l Iteration 2 (E, G → H): X

= {C, G, D, E, H}

BCNF Decomposition Example

n We start with D = {ENO, ENAME, TITLE, PNO,

PNAME, RESP}

n Iteration 1

l Pick one of the FDs that violate BNCF

l ENO → ENAME, TITLE

n D = { R 1

, R 2

} where

l R 1 (ENO, PNO, PNAME, RESP)

l R 2 (ENO, ENAME, TITLE)

n R 2

is in BCNF, but R 1

is not

BCNF Decomposition Example

n Iteration 2

l D has R 1 which is not in BCNF

l Pick one of the FDs that violate BNCF

l PNO → PNAME

n D = { R 2

, R 3

, R 4

} where

l R 3 (ENO, PNO, RESP)

l R 4 (PNO, PNAME)

n Both relations are in BCNF

n Threfore, replace EMP with R 2

, R 3

, R 4

Complexity of Normalization

n Assume we are given a set of attributes A and a set of FDs

F , and let n = the size of this input (at most O (| A |*| F |)).

l The number of dependencies in F + may be exponential in n.

l A + can be found in linear time.

l Testing whether XY is in F + can be done in linear time.

l Testing whether a decomposition is lossless can be done in linear

time.

l Testing whether a decomposition is dependency preserving can be

done in polynomial time.

l Testing whether a relation scheme is in BCNF is NP-complete.

l There is a quadratic algorithm to find a set of relations over attributes A where

‡ Each is in 3NF

‡ The set preserves all dependencies in F , and

‡ The set correspond to a lossless decomposition of the universal relation covering all of A.