









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
An overview of er-to-relational mapping and database normalization. It covers the principles of improving relations, avoiding information repetition, and dealing with anomalies. The document also introduces the concepts of normal forms, functional dependencies, and superkeys. It provides examples and explanations to help understand these concepts.
Typology: Slides
1 / 17
This page cannot be seen from the preview
Don't miss anything!
Design Process - Where are we?
Conceptual
Design
Conceptual Schema (ER Model)
Logical
Design
Logical Schema (Relational Model)
Step 1: ER-to-Relational
Mapping
Step 2: Normalization:
“Improving” the design
n Relations should have semantic unity
n Information repetition should be avoided
l Anomalies: insertion, deletion, modification
n Avoid null values as much as possible
l Difficulties with interpretation
‡ don’t know, don’t care, known but unavailable,
does not apply
l Specification of joins
n Avoid spurious joins
Relational Design Principles
Bad Design
EMP-PROJ
Employee No Employee Name Salary Title
… Project No
Duration
Resp
Project Name
Budget
J. Doe Elect. Eng. 40000 M. Smith 34000 M. Smith
Analyst Analyst 34000 A. Lee Mech. Eng. 27000 A. Lee Mech. Eng. 27000 J. Miller Programmer 24000 B. Casey Syst. Anal. 34000 L. Chu Elect. Eng. 40000 R. Davis Mech. Eng. 27000
E8 J. Jones Syst. Anal.^34000
P1 12 Manager P1 Analyst P2 6 Analyst P3 10 Consultant P4 48 Engineer P2 18 Programmer P2 24 Manager P4 48 Manager P3 36 Engineer P3 40 Manager
≈
≈
Instrumentation 150000 Instrumentation 150000 Database Develop. 135000
Database Develop. 135000 Database Develop. 135000
Maintenance 310000
Maintenance 310000
n The TITLE, SALARY, BUDGET attribute values are
repeated for each project that the engineer is involved in.
l Waste of space
l Complicates updates
Information Repetition
J. Doe Elect. Eng. 40000 M. Smith 34000 M. Smith
Analyst Analyst 34000 A. Lee Mech. Eng. 27000 A. Lee Mech. Eng. 27000 J. Miller Programmer 24000 B. Casey Syst. Anal. 34000 L. Chu Elect. Eng. 40000 R. Davis Mech. Eng. 27000
E8 J. Jones Syst. Anal.^34000
P1 12 Manager P1 Analyst P2 6 Analyst P3 10 Consultant P4 48 Engineer P2 18 Programmer P2 24 Manager P4 48 Manager P3 36 Engineer P3 40 Manager
≈
≈
Instrumentation 150000 Instrumentation 150000 Database Develop. 135000
Database Develop. 135000 Database Develop. 135000
Maintenance 310000
Maintenance 310000
This example instance of EMP -PROJ relation violates one of the constraints in our earlier design. Which one?
n If any attribute of project (say BUDGET of P1) is
modified, all the tuples for all employees who
work on that project need to be modified.
Modification Anomaly
J. Doe Elect. Eng. 40000 M. Smith 34000 M. Smith
Analyst Analyst 34000 A. Lee Mech. Eng. 27000 A. Lee Mech. Eng. 27000 J. Miller Programmer 24000 B. Casey Syst. Anal. 34000 L. Chu Elect. Eng. 40000 R. Davis Mech. Eng. 27000
E8 J. Jones Syst. Anal. 34000
P1 12 Manager P1 Analyst P2 6 Analyst P3 10 Consultant P4 48 Engineer P2 18 Programmer P2 24 Manager P4 48 Manager P3 36 Engineer P3 40 Manager
≈
≈
Instrumentation 150000 Instrumentation 150000 Database Develop. 135000
Database Develop. 135000 Database Develop. 135000
Maintenance 310000
Maintenance 310000
≈
What to do?
n Take each relation individually and “improve” it in terms
of the desired characteristics
l Normal forms
‡ Atomic values (1NF)
‡ Can be defined according to keys and dependencies.
‡ Functional Dependencies ( 2NF, 3NF, BCNF)
‡ Multivalued dependencies (4NF)
l Normalization
‡ Normalization is a process of concept separation which applies a top - down methodology for producing a schema by subsequent
refinements and decompositions.
‡ Do not combine unrelated sets of facts in one table; each relation
should contain an independent set of facts.
‡ Universal relation assumption
‡ 1NF to 3NF; 1NF to BCNF
Normalization Issues
n How do we decompose a schema into a desirable normal
form?
n What criteria should the decomposed schemas follow in order
to preserve the semantics of the original schema?
l Reconstructability: recover the original relation ⇒ no spurious joins
l Lossless decomposition: no information loss
l Dependency preservation: the constraints (i.e., dependencies) that
hold on the original relation should be enforceable by means of the
constraints (i.e., dependencies) defined on the decomposed relations.
n What happens to queries?
l Processing time may increase due to joins
l Denormalization
Normal Forms
…
4NF
BCNF
3NF
2NF
1NF
All relations
Some Basics
n Attributes
l Prime attribute is a member of any key
l Non-prime attribute is any attribute which is not prime
n Full functional dependency
l A FD X → Y is a full functional dependency if X is minimal, i.e.,
removal of any attribute A from X means the dependency does not
hold anymore.
l Formally - iff for all A ∈ X , ( X − { A }) → Y.
n Partial functional dependency
l Formally - iff for some A ∈ X , ( X − { A }) → Y.
n Transitive dependency
l Formally - X → Y and Y → Z and X → Z and Y → X and Z ⊄ Y
X → /
f
Y
X →
p
Y
/
Normal Forms Based on FDs
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
First Normal Form (1NF )
1NF eliminates the relations within relations
or relations as attributes of tuples.
eliminate the partial functional
dependencies of non-prime attributes
to key attributes
eliminate the transitive functional
dependencies of non-prime attributes to key attributes
eliminate the partial and transitive
functional dependencies of prime (key)
attributes to key.
Lossless &
Dependency
preserving
Lossless
n All attribute values are atomic
n 1NF relation cannot have an attribute value that
is:
l a set of values (set-value)
l a tuple of values (nested relation)
n This is a standard assumption in relational
DBMSs and in the rest of this section
n In object-oriented DBMSs this assumption is
relaxed.
First Normal Form
n Two possible definitions:
l A relation R ∈2NF iff all non-prime attributes in R are fully functionally dependent on primary key.
l A relation R ∈2NF iff the attributes are either
‡ a candidate key, or
‡ fully dependent on every key.
n Partial functional dependencies cause problems.
n 2NF is only of historical importance, since it is subsumed
by 3NF.
n In the example, EMP-PROJ is not 2NF, we turn it into
2NF by decomposing it:
l EMP(ENO, ENAME, TITLE, SALARY)
l PROJ(PNO,PNAME,BUDGET,MGR)
l ASSIGN(ENO,PNO,DURATION,RESP)
Second Normal Form
3NF – Example
n EMP is not in 3NF because of fd 2
l TITLE → SALARY but TITLE is not a superkey and
SALARY is not prime
l Problem is that ENO transitively determines SALARY
(as well as directly determining it)
n Solution:
fd 1
fd 2
EMP
EMP
PAY
fd 1 fd 2
Boyce-Codd Normal Form
n You can still have transitive dependencies in 3NF
if the dependent attribute(s) are prime.
n A 1NF relation scheme R is in BCNF if for every
non-trivial functional dependency X → Y , X is a
superkey.
n Properties of BCNF
l All non-prime attributes are fully dependent on every
key.
l All prime attributes are fully dependent on the keys that
they do not belong to.
l No attribute is non-trivially dependent on any set of
non-prime attributes.
n Formally: A relation scheme R defined over
U = { A 1
, A 2
, …, A n
} is in BCNF if for all
functional dependencies that hold on R of the
form X → A , where X ⊆ U and A ⊆ U , at least
one of the following holds :
l X → A is a trivial functional dependency
l X is a superkey for R
n No transitive dependencies.
Boyce-Codd Normal Form
BCNF – Example
n Assume the following definition of the PROJECT
relation with:
l Each employee on a project has a unique location and
responsibility with respect to that project, and
l Only one project can be found at each location
n FDs would be
PROJECT
which makes PROJECT in 3NF but not in BCNF
n Lossless join decomposition:
l If R is decomposed into R 1 , …, Rn , it should be possible
to reconstruct R with no additional (spurious) tuples.
l If a relation scheme R is decomposed into R 1 and R 2 ,
then at least one of the following FDs should be in F
‡ R 1 ∩ R 2 → R 1
‡ R 1 ∩ R 2 → R 2
n Dependency preservation:
l If a relation scheme R is decomposed into R 1 and R 2 ,
then every FD in F that holds on relation R (even the
implied ones) should be guaranteed to hold whenever
the projected dependencies within relations R 1 and R 2
are enforced.
Why These Rules?
Closure of a Set of FDs
n This is most easily done by converting it to the
problem of computing the closure of a set of
attributes.
n For each FD defined on the base relations, pick the
attribute (or set of attributes) that appear on its
left-hand-side
l Find their closure which gives the set of attributes that
are dependent on that attribute
‡ Theorem 1: X → Y ∈ F
iff Y ⊆ ComputeX +( X , F ).
‡ Theorem 2: X is a superkey of R iff ComputeX +( X , F ) = R.
l This also gives the set of FDs that can be inferred from
the original FD.
function ComputeX
( X , F )
begin
X
← X
while there exists Y → Z ∈ F such that
Y ⊆ X
and Z ⊆ X
then X
← X
∪ Z
return ( X
)
end
Closure of a Set of Attributes
Attribute Closure Example
n Let F consist of
l A → B
l C → D, E
l E, G → H
n ComputeX
({C, G}, F )
l Initial: X
= {C, G}
l Iteration 1( C → D, E): X
= {C, G, D, E}
l Iteration 2 (E, G → H): X
= {C, G, D, E, H}
BCNF Decomposition Example
n We start with D = {ENO, ENAME, TITLE, PNO,
PNAME, RESP}
n Iteration 1
l Pick one of the FDs that violate BNCF
l ENO → ENAME, TITLE
n D = { R 1
, R 2
} where
l R 1 (ENO, PNO, PNAME, RESP)
l R 2 (ENO, ENAME, TITLE)
n R 2
is in BCNF, but R 1
is not
BCNF Decomposition Example
n Iteration 2
l D has R 1 which is not in BCNF
l Pick one of the FDs that violate BNCF
l PNO → PNAME
n D = { R 2
, R 3
, R 4
} where
l R 3 (ENO, PNO, RESP)
l R 4 (PNO, PNAME)
n Both relations are in BCNF
n Threfore, replace EMP with R 2
, R 3
, R 4
Complexity of Normalization
n Assume we are given a set of attributes A and a set of FDs
F , and let n = the size of this input (at most O (| A |*| F |)).
l The number of dependencies in F + may be exponential in n.
l A + can be found in linear time.
l Testing whether X → Y is in F + can be done in linear time.
l Testing whether a decomposition is lossless can be done in linear
time.
l Testing whether a decomposition is dependency preserving can be
done in polynomial time.
l Testing whether a relation scheme is in BCNF is NP-complete.
l There is a quadratic algorithm to find a set of relations over attributes A where
‡ Each is in 3NF
‡ The set preserves all dependencies in F , and
‡ The set correspond to a lossless decomposition of the universal relation covering all of A.