










Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
An overview of key concepts in database normalization and relational database design. It covers topics such as strong and weak entity types, relationship types, attributes, candidate keys, database schemas, and normalization techniques like 1nf, 2nf, and 3nf. The document also discusses sql statements for querying and manipulating data, including select, join, having, and update. Additionally, it covers topics related to referential integrity, views, and triggers. This information would be useful for students studying database design, relational database management systems, and sql programming in computer science or information systems courses at the university level.
Typology: Summaries
1 / 18
This page cannot be seen from the preview
Don't miss anything!
Name/Topic Relevant Equations, Values, Definitions, and Theorems
Database System
The DBMS software together with the data itself. Sometimes, the applications are also included.
File Based System
Collections of application programs that perform services for the end users. Each program defines and
manages its own data.
System Catalog
Provide descriptions of data to enable program-data independence.
Database
Management System
A software system the enables users to define, create, and maintain the database and provides
controlled access to the database. It is the software that interacts with the users’ application programs
and the database.
Data Definition
Language (DDL)
Permits specification of data types, structures, and any data constraints. All specifications are stored in
the database.
Data Manipulations
Language (DML)
General enquiry facility (query language) of the data. Used for selecting, inserting, deleting, and
updating data in the database.
Database Terminology
Relation: The table itself
Attribute: Column in the table
Tuple: Row in the table
Distributed Database
Logically interrelated collections of shared data (and a description of this data) physically distributed
over a computer network. They are the software systems that permit the management of the
distributed database and makes the distribution transparent to users.
Entity Type
A group of objects with the same properties identified by enterprises as having an independent
existence.
Strong entity types are entity types that are not
existence-dependent on some other entity type.
Weak entity types are existence-dependent on other
entity types.
Entity Occurence
A uniquely identifiable object of an entity type.
Relationship Type
Sets of meaningful associations among entity types.
Relationship
Occurence
Uniquely identifiable associations which include one occurrence from each participating entity type.
Relationship Degree
The number of participating entities in the
relationship.
Two is binary
Three is ternary
Four is quaternary
financial institution
Recursive Relationship
A type of relationship where the same entity type participates in the relationship more than once with
different roles. Relationships may be given role names to indicate the purpose that each participating
entity type plays in a relationship.
Attribute
Properties of entities or relationship types.
Attribute Domain
A set of allowable values for one or more attributes.
Single-values attributes hold a single value for
each occurrence of an entity type.
Multi-valued attributes hold multiple values for
each occurrence of an entity type.
Simple attributes are composed of a single
component with an independent existence.
Composite attributes are composed of multiple
components, each with an independent existence.
Derived Attribute
A value that is derivable from the value of a related attributes, or sets of attributes, not necessarily in
the same entity type.
Candidate Key
Minimal sets of attributes that uniquely identify each occurrence of an entity type.
Primary Key
A candidate key that is selected to uniquely identify each occurrence of an entity type.
Composite Key
A candidate key that consists of two or more attributes.
Constraint
Represents restrictions in the real world. The main type of constraint on relationships is called
multiplicity.
Multiplicity
The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of
an associated entity type through a particular relationship. Multiplicity is made up of two types of
restrictions, cardinality and participation.
Cardinality:
Describes the maximum
number of possible
relationship occurrences
for an entity participating
in a given relationship
type.
Participation:
Determines whether all or
only some entity
occurrences participate in a
relationship.
Fan Trap
When a model represents a relationship between
entity types, but the pathway between certain
entity occurrences is ambiguous. To fix a fan trap,
you can organize the existing relationships.
Degree
The number of attributes (columns) in a relation (table).
Cardinality
The number of tuples (rows) in a relation (table).
Relational Database
A collection of normalized relations with distinct relation names.
Relation Schema
A named relation defined by a set of attribute and domain name pairs.
Delational Database
Schema
A set of relation schemas, each with a distinct name.
Relationship
Properties
a relation) has a distinct name.
from the same domain.
theoretically.
Entity Integrity
No attribute of a primary key can be null
Referential Integrity
If a foreign key exists in a relation, either foreign key value must match a candidate key value of some
tuple in its home relation or the foreign key value must be wholly null.
Rules for Deriving
Relations
Strong Entity Types:
Create a relation that includes all simple attributes
of that entity. For composite attributes, include
only constituent simple attributes.
Weak Entity Types:
Create a relation that includes all simple attributes
of that entity. The primary key is partially or fully
derived from each owner entity.
1:* Binary Relationship Types:
Entity on ‘one side’ is designated the parent entity
and entity on ‘many side’ is the child entity
Post copy of the primary key attribute(s) of parent
entity into relation representing child entity, to act
as a foreign key.
Mandatory Participation on Both Sides of a 1:
Relationship:
Combine entities involved into one relation and
choose one of the primary keys of original entities
to be primary key of new relation, while other (if
one exists) is used as an alternate key
Mandatory Participation on One Side of a 1:
Relationship:
Identify parent and child entities using
participation constraints. The entity with optional
participation is designated the parent entity, and
the other entity is designated as the child entity.
Copy of the primary key of parent is placed in
relation representing child entity. If the
relationship has one or more attributes, these
attributes should follow the posting of the primary
key to the child relation.
Optional Participation on Both Sides of 1:
Relationship:
Designation of the parent and child entities is
arbitrary unless you can find out more about the
relationship. Consider 1:1 Staff Uses Car
relationship with optional participation on both
sides. Assume majority of cars, but not all, are
used by staff and only minority of staff use cars.
Car entity, although optional, is closer to always
being present in the relationship than Staff entity.
Therefore, designate Staff as parent entity and Car
as the child entity.
Superclass/Subclass Relationship Types:
: Binary Relationship Types:
Create relation to represent the relationship and
include any attributes that are part of the
relationship. Post a copy of the primary key
attribute(s) of the entities that participate in
relationship into new relation, to act as foreign
keys. These foreign keys will also form primary key
of new relation, possibly in combination with some
of the attributes of the relationship.
Complex Relationship Types:
Create a relation to represent the relationship and
include any attributes that are part of the
relationship. Post copy of primary key attribute(s)
of entities that participate in the complex
relationship into new relation, to act as foreign
keys. Any foreign keys that represent a ‘many’
relationship (for example, 1.., 0..) generally will
also form the primary key of new relation, possibly
in combination with some of the attributes of the
relationship.
Multi-Values Attributes:
Create new relation to represent multi-valued
attribute and include primary key of entity in new
relation, to act as a foreign key. Unless the multi-
valued attribute is itself an alternate key of the
entity, the primary key of new relation is a
combination of the multi-valued attribute and the
primary key of the entity.
Normalization
A technique for producing a set of relations with desirable properties, given the data requirements of an
enterprise. The relationship between attributes must be optimized.
Data Redundancy
The major aim of relational database design is to group attributes into relations to minimize data
redundancy. The benefits of this are a reduction in the file storage space, and it allows updates to be
achieved with a minimal number of operations, thus reducing the opportunities for inconsistencies.
Update Anomalies
Insertion Anomalies:
In the StaffBranch table below,
you are not able to insert a new
branch without also inserting a
new staff.
Deletion Anomalies:
If you remove SA9 you also are
losing the only record of branch
B007 when using the un-
normalized approach. If you
remove a branch, you will also
lose every instance of staff that
works at that branch in the
StaffBranch table.
Modification Anomalies:
If you modify the address of a
branch that multiple staff work
at, only 1 staff will have the
updated record (unless you
manually modify every row in
the BranchStaff table).
Projection
'),…',
(𝑅): The projection operation works on a
single relation 𝑅 and defines a relation that
contains a vertical subset of 𝑅, extracting the
values of specified attributes and eliminating
duplicates.
Cartesian Product
𝑅 × 𝑆: The cartesian product operation defines a
relation that is the concatenation of every tuple of
relation 𝑅 with every tuple of relation 𝑆.
Cartesian product and selection can be reduced to
a single operation called Join.
Union
(𝑅 ∪ 𝑆): The union of two relations 𝑅 and 𝑆
defines a relation that contains all the tuples of 𝑅,
or 𝑆, or both 𝑅 and 𝑆, with duplicate tuples being
eliminated. 𝑅 and 𝑆 must be union compatible.
Projection may be used to make two relations
union-compatible.
Set Difference
𝑅 − 𝑆: The set difference operation defines a
relation consisting of the tuples that are in relation
𝑅, but not in 𝑆. 𝑅 and 𝑆 must be union compatible.
Intersection
(𝑅 ∩ 𝑆): The intersection operation defines a
relation consisting of the set of all tuples that are
in both 𝑅 and 𝑆. 𝑅 and 𝑆 must be union
compatible.
Rename Operation
The rename operation provides a new name 𝑆 for the expression 𝐸, and optionally names the attributes
as 𝑎
.
,
/('
!
,'
"
,…,'
)
(𝐸) or 𝜌
/
or 𝜌
('
!
,'
"
,…,'
)
Join Operations
𝑅 ⋈ 𝑆 is the natural join operation. It is an
equijoin of the two relations 𝑅 and 𝑆 over all
common attributes 𝑥. One occurrence of each
common attribute is eliminated from the result. In
contrast, equijoin doesn’t eliminate attributes.
2
𝑆 defines a relation that contains the tuples
of 𝑅 that participate in the join of 𝑅 with 𝑆
satisfying the predicate 𝐹. We can rewrite semijoin
using a projection and a join:
2
3
2
Use outer join to display rows in the result that
don’t have matching values in the join column.
4
𝑆 denotes a left outer join in which tuples
from 𝑅 that don’t have matching values in
common columns of 𝑆 are also included in the
resulting relations.
2
𝑆 is equivalent to 𝜎
2
. The theta join operation defines a relation that contains tuples
satisfying the predicate 𝐹 from the cartesian product of 𝑅 and 𝑆. The predicate 𝐹 is of the form 𝑅. As
with cartesian product, the degree of a theta join is the sum of the degrees of the operand relations 𝑅
and 𝑆. If a predicate 𝐹 contains only equality (=) the term equijoin is used.
Division
𝑅 ÷ 𝑆 defines a relation over the attributes 𝐶 that
consist of a set of tuples from 𝑅 that match
combinations of every tuple in 𝑆.
Aggregation
In extended relational algebra there is also aggregation. ℱ
35
(𝑅) applies the aggregate function list 𝐴𝐿 to
the relation 𝑅 to define a relation over the aggregate list. 𝐴𝐿 contains one or more (aggregate function,
attribute) pairs. The main aggregate functions are COUNT, SUM, AVG, MIN, and MAX.
Grouping
Extended relational algebra also has grouping. 𝒢
63
35
(𝑅) groups the tuples of relation 𝑅 by the grouping
attributes 𝐺𝐴 and then applies the aggregate function list 𝐴𝐿 to define a new relation. 𝐴𝐿 contains one
or more (aggregate function, attribute) pairs. The resulting relation contains the grouping attributes 𝐺𝐴
along with the results of each of the aggregate functions.
SQL is a transform-oriented language with 2 major components:
Most components of an SQL statement are case insensitive, except for literal character data.
Literals are constants used in SQL statements. All non-numeric literals must be enclosed in single quotes
(e.g. 'London'). All numeric literals must not be enclosed in quotes (e.g. 650.00).
Backus-Naur Form
Upper-case letters represent reserved words. Lower-case letters represent user-defined words.
| indicates a choice among alternatives Curly braces indicate a required element
Square brackets indicate an optional element (…) indicates optional repetition (0 or more)
SELECT Statements
SELECT specifies which columns are to appear in the output
FROM specifies table(s) to be used
WHERE filters rows
GROUP BY forms groups of rows with same column value
HAVING filters groups subject to some condition
ORDER BY specifies the order of the output
Outer Joins
If one row of a joined table is unmatched, the row is omitted from result table. Outer join operations
retain rows that do not satisfy the join condition.
Left, Right, and Full outer joins are supported in SQL.
Unmatched columns are filled with NULLs.
EXIST and NOT EXIST
EXISTS and NOT EXISTS are for use only with subqueries. They produce a simple true/false result.
EXISTS is true if and only if there exists at least one row in the result table returned by subquery. It
returns false if subquery returns an empty result table.
and EXCEPT
UNION of two tables, A and B, results in a table containing all rows in either A or B or both.
Intersection (INTERCEPT) results in a table containing all rows common to both A and B.
Difference (EXCEPT) results in a table containing all rows in A but not in B.
The format of set operator clause in each case is: op [ALL] [CORRESPONDING [BY {column1 [, ...]}]]
The two tables must be union compatible to perform the above operations.
If CORRESPONDING BY is specified, set operation is performed on the named column(s).
If CORRESPONDING is specified without the BY clause, set operation is performed on common columns.
If ALL is specified, the result can include duplicate rows.
Insert a new row into a relation.
INSERT INTO TableName [ (columnList) ]
VALUES (dataValueList)
columnList is optional, if omitted, SQL assumes a list of all
columns in their original CREATE TABLE order.
Number of items in each list must be the same.
Must be direct correspondence in position of items in two
lists.
Data type of each item in dataValueList must be
compatible with data type of the corresponding column.
Update an existing row in a relation.
UPDATE TableName
SET columnName1 = dataValue
[, columnName2 = dataValue2...]
[WHERE searchCondition]
TableName can be a name of a base table or an updatable
view.
SET clause specifies names of one or more columns that
are to be updated.
WHERE clause is optional, if omitted, named columns are
updated for all rows in table. If the WHERE clause is
specified, only those rows that satisfy searchCondition are
updated.
New dataValue(s) must be compatible with data type for
the corresponding column.
Delete an existing row in a relation.
TableName can be name of a base table or an updatable
view.
searchCondition is optional; if omitted, all rows are
deleted from the table. This does not delete the table.
Integrity Enhancement
Feature (IEF)
The purpose of the integrity enhancement feature (IEF) is to allow constraint checking to be centralized
and standardized. It protects databases from becoming inconsistent.
Required Data:
In the CREATE and ALTER table statements, add
NOT NULL after the data type.
Position VARCHAR(10) NOT NULL
Domain Constraints:
In the CREATE and ALTER table statements, include
CHECK clause which can reference only the column
being defined.
ISO standard allows the domain to be defined
more explicitly using CREATE DOMAIN.
Domains can be removed using DROP DOMAIN. If
RESTRICT is specified and the domain is used in an
existing table, view, or assertion definition, the
drop will fail. If CASCADE is specified, the table
column is changed to use the domain’s underlying
data type and any constraint or default clause is
replaced by column constraints or default clauses.
CHECK (searchCondition)
CREATE DOMAIN DomainName [As] datatype
[DEFAULT defaultOption]
[CHECK (searchCondition)]
DROP DOMAIN DomainName
Entity Integrity:
The primary key of a table must contain a unique,
non-null value for each row. ISO standard supports
PRIMARY KEY clause in CREATE and ALTER TABLE
statements.
You can only have one PRIMARY KEY clause per
table, but you can still ensure uniqueness for
alternate keys using UNIQUE.
clientNo VARCHAR(5) NOT NULL,
propertyNO VARCHAR(5) NOT NULL,
UNIQUE(clientNo, propertyNo)
Referential Integrity:
Foreign keys are a column or set of columns that links each row in the child table containing the foreign
key to the row of the parent table containing the matching candidate key. Referential integrity means
that if a foreign key contains a value, that value must refer to an existing row in a parent table.
Actions that attempt to update/delete a candidate key value in the parent table with matching rows in
the child table is dependent on the referential action specified using ON UPDATE and ON DELETE
subclauses of the foreign key.
Delete the row from the parent and delete the
matching rows in the child, and so on in cascading
manner.
Delete the row from the parent and set each
component of FK in the child to the specified
default. Only valid if DEFAULT specified for FK
columns.
Reject delete from the parent. This is the default
setting.
Delete the row from the parent and set the FK
column(s) in the child to NULL.
The WITH CHECK OPTION clause ensures that if a
row fails to satisfy the WHERE clause of defining
query, it is not added to underlying base table.
CREATE VIEW officeStaff AS subselect
FROM Staff
WHERE branchNo = ‘B003’;
DROP VIEW ViewName [RESTRICT | CASCADE]
LOCAL and CASCADED determine the scope of the
check testing.
LOCAL checks only the view being defined whereas
CASCADED checks the underlying views.
You can use DROP VIEW to drop a view.
With CASCADE, all related dependent objects are
deleted; i.e. any views defined on the view being
dropped.
With RESTRICT (default), if any other objects
depend for their existence on the continued
existence of the view being dropped, the
command is rejected.
A view is updatable if and only if:
expression, or aggregate function) and no column appears more than once
or difference
If a column in the view is based on an aggregate function, the column may appear only in SELECT and
ORDER BY clauses of queries that access the view, and the column may not be used in WHERE nor be an
argument to an aggregate function in any query based on the view. Additionally, grouped views may
never be joined with a base table or a view.
The WITH CHECK OPTION clause enforces constraints on the database and helps preserve database
integrity. It can only be used for updatable views.
New rows appear within a view when insert/update on the view causes them to satisfy the WHERE
condition. Rows that enter or leave a view are called migrating rows.
WITH CHECK OPTION prohibits a row migrating out of the view. Also, when an INSERT or UPDATE on the
view violates the WHERE condition of the defining query, the operation is rejected.
Indexes are a structure that provides accelerated
access to the rows of a table based on the values
of one or more columns. They can significantly
improve the performance of a query.
CREATE [UNIQUE] INDEX IndexName
ON TableName (columnName [ASC| DESC] [,..])
DROP INDEX IndexName
You can drop indexes using the DROP INDEX
clause.
View Materialization
Storing a view as a temporary table when a view is first queried. Thereafter, queries based on
materialized view can be faster than re-computing the view each time.
With a regular view, operations are done for every query. With a materialized view, the operations are
done when the view is created or refreshed only (a materialized view does not contain up-to-the-minute
information).
Materialized views consume space because a copy of the data or at least of information derivable from
the data is kept.
View Resolution
Any operations on a view are automatically translated into operations on relations from which it is
derived.
View resolution mechanism may be slow, particularly if the view is accessed frequently.
Access Control
Authorization identifier is a normal SQL identifier used to establish the identity of a user. Usually has an
associated password.
Used to determine which database objects a user may reference and what operations may be
performed on those objects.
Each object created in SQL has an owner, as defined in the AUTHORIZATION clause of schema to which
object belongs.
Privileges
Actions a user is permitted to carry out on a given
base table or view can be restricted.
GRANT {PrivilegeList | ALL PRIVILEGES}
ON ObjectName
TO {AuthorizationIdList | PUBLIC}
The owner of table must grant other users the
necessary privileges using the GRANT statement.
You can only grant privileges you have access to
grant. You can only revoke privileges you have
granted.
To create a view, the user must have SELECT
privilege on all tables that make up the view and
REFERENCES privilege on the named columns.
PUBLIC allows access to be granted to all present
and future authorized users. ObjectName can be a
base table, view, character set…
WITH GRANT OPTION allows privileges to be
passed on.
REVOKE takes away privileges granted with
{PrivilegeList | ALL PRIVILEGES}
ON ObjectName
FROM {AuthorizationIdList | PUBLIC}
ALL PRIVILEGES refers to all privileges granted to a
user by a user revoking privileges.
GRANT OPTION FOR allows privileges passed on
via WITH GRANT OPTION of GRANT to be revoked
separately from the privileges themselves.
REVOKE fails if it results in an abandoned object,
such as a view, unless the CASCADE keyword has
been specified. Privileges granted to this user by
other users are not affected. CASCASE will also
revoke the access to users which was granted by
the user who is having their privileges revoked.
Constraint may be defined as INITIALLY
IMMEDIATE or INITIALLY DEFERRED, indicating
mode the constraint assumes at start of each
transaction.
{ALL | constraintName [,... ]}
In INITIALLY IMMEDIATE, also possible to specify
whether the mode can be changed subsequently
using qualifier [NOT] DEFERRABLE. The default
mode is INITIALLY IMMEDIATE.
Triggers
Triggers define an action that the database takes when some events occur in the application.
Triggers are executed whenever the triggering event occurs.
They are stored like procedures in the database and called implicitly when the event occurs.
The purpose of triggers is to validate input data and maintain complex integrity constraint (impossible
through table constraints).
Triggering events include insertion, deletion, and update of rows in a table. They can also be set to cover
specific named columns of a table (update).
Statement type:
What causes the trigger to fire (Cause)
CREATE TRIGGER TriggerName
BEFORE|AFTER|INSTEAD OF
ON
[REFERENCING
[WHEN (triggerCondition)]
Timing:
BEFORE, AFTER, INSTEAD OF event
occurs
executing the original SQL statement
Level:
STATEMENT or ROW
that is affected
regardless of how many rows are affected
(default)
statements (COMMIT or ROLLBACK), SQL schema
definition or manipulation statements (e.g.
creation/ deletion of tables, user-defined types,
other triggers).
The
an old or new row (OLD/NEW or OLD ROW / NEW
ROW) in case of a row-level trigger (REFERENCING
NEW ROW AS alias).
You can only create triggers on tables that you own. You must have the CREATE TRIGGER system
privilege. You cannot execute a COMMIT command in a trigger.
INSTEAD-OF triggers fire when a user issues a DML command associated with a complex view.
Syntax to drop a trigger is DROP TRIGGER <trigger_name>;
Syntax to enable or disable a trigger is ALTER TRIGGER <trigger_name> [ENABLE | DISABLE];
As more than one trigger can be defined on a table, order of firing is important. The following order is
observed:
o Execute any BEFORE row-level trigger.
o Execute the statement itself.
o Apply any referential constraints.
o Execute any AFTER row-level trigger.
SQL vs NoSQL
Traditional RDBMSs encounter challenges in handling Big Data. NoSQL and NewSQL data stores are
alternatives that can handle this huge volume of data and provide the scalability.
MySQL is still best for applications with complex queries or If you have strict rules for the applications
data. While NoSQL is fast, that doesn’t mean SQL is slow. However, if you anticipate minimal changes in
the data structure, there’s no reason to use SQL.
NoSQL is also very good for the cloud.
NoSQL is designed to scale outwards horizontally (by using many separate pieces of hardware) as
opposed to SQL which is designed to scale upwards (by improving hardware). SQL can still scale
horizontally, but it’s not common.
Basically Available:
The data store is available all the
time whenever it is accessed,
even if parts of it are
unavailable.
Soft-State:
It does not need to be consistent
always and can tolerate
inconsistency for a certain time
period..
Eventually Consistent:
After a certain time period, the
data store comes to a consistent
state
CAP Theorem
Only two of three CAP properties (consistency, availability, and partition tolerance) can be satisfied by
networked shared-data systems at the same time.
Consistency:
equivalent to having a single up-
to-date instance of the data.
Consistency in CAP is not the
same as consistency in ACID.
Availability:
the data is available to serve
requests at the moment it is
needed.
Partition Tolerance:
the capacity of the networked
shared-data system to tolerate
network partitions.
DBMS are typically CA systems whereas NoSQL are typically AP systems.
Scaling
There are three scaling dimensions:
There are four scaling characteristics: