9 분 소요

image


Main Issues:

  • What problems are caused by redundancy?
  • What is criteria for good relation?
  • What is functional dependency?
    • Concepts of FDs
    • Inference Rules for FDs
    • Closure Property
    • Relationship : FD and Key
  • What is normalization?
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce Codd Normal Form (BCNF)


Relational Design : Normalization

image


Problems : Bad Relational Schema

  • For a “badly” designed relation, the following problems occur:
    • Redundancy: The same information is repeated.
    • Insert Anomaly: We can not insert new information.
    • Delete Anomaly: We may lose unwanted information.
    • Update Anomaly: We need to change it in many places.


“Bad” Relation: Example

image


Problems

  • Redundancy: Each department information is repeated as the number of employees work for it.
  • Insert Anomaly: Suppose that we insert new department; Then, we can not insert that department if some employee is not assigned to it. Why?
  • Delete Anomaly: Suppose that we delete some department; Then, we may lose all employee information working for it. What if we delete last employee working for department?
  • Update Anomaly: Suppose that we change the name of some department; Then, it may cause this update to be made for every employee working for it.


Solution : Decomposition

  • We decompose Bad relation as follows:

image


  • Are these two relations good?
  • Can we recover original information exactly?

  • Those two relations are good because:
  • No Redundancy : Each department information occurs only once.
  • No Insert Anomaly : We can insert new department even though no employee is assigned to it.
  • No Delete Anomaly : Even if we delete some department, we still can keep all the employees working for it.
  • No Update Anomaly : We can change the name of department only once.


Other “Bad” Relation: Example

image


Guidelines : Designing Relational Schema

  • Guideline 1: Design a relational schema with good information; In good relations, there are no redundancy and no insertion, deletion, update anomalies.
  • Guideline 2: Design a relational schema so that they can be joined with appropriate joining pair attributes (primary key, foreign key). Otherwise, the join results may contain spurious tuples.
  • Guideline 3: (If possible) avoid placing attributes in a relation whose values may frequently be NULL.


Functional Dependency (FD)

  • A Functional dependency (FD) is used to specify formal measures of the “goodness” of relational designs.
  • FDs are constraints that are derived from the meaning and inter-relationships of the attributes.
  • FDs are derived from the real-world constraints on the attributes.
  • FDs and keys are used to define normal forms for relations.


  • A FD XY in a relation R is defined as: For any two tuples t1 and t2 in R, if t1[X] = t2[X] holds, then t1[Y] = t2[Y] also holds where X, Y : a set of attributes in R.
  • Whenever two tuples have the same value for X, they also must have the same value for Y.
  • Value(s) of X uniquely (functionally) decides the value(s) of Y.
  • If X → Y holds in R, this does not say that Y → X holds.
  • If every tuple for X has distinct value(s), then X can decide any attribute(s) in R; Why?


FD : Example

  • Given a tuple and the value(s) of X, we can decide the corresponding value of Y .

  • R = (A, B, C, D, E)

    • FD : (1) A → B
    • (2) C → D


image


  • R = (A, B, C, D, E)
    • FD : (1) A → {B, D}
    • (2) {B, C} → E


image


FD : 실제 예 (1)

  • FD는 한 relation에서 실세계의 제약조건을 반영. DB Designer는 이러한 제약조건을 FD로 변환하여 정의;

STUDENT(ID, club, prof#, course#, major)

  • Each student joins only one club.
    • ID → club
  • Each student has only one major.
    • ID → major
  • Each professor teaches only one course.
    • prof# → course#


  • FD는 relation 안의 모든 tuple들이 항상 만족해야 하는 조건.


image


  • 만약 새로운 tuple <Eve, Dance, John, OS, . . >을 insert 한다면?

  • (참고로) 여기서 Key는 무엇인지?


FD : 실제 예 (2)

Student(S#, C#, sname, age, cname, credit, grade, D#, dname, office)

  • Each student has only one his/her name and age.
    • S# → {sname, age}
  • Each course has only one its name and credit.
    • C# → {cname, credit}
  • Each student belongs to only one department.
    • S# → D#
  • Each department has only one its name and office.
    • D# → {dname, office}
  • For each student and his/her taken course, only one grade exists.
    • {S#, C#} → grade


FD : 실제 예 (3)

  • 다음의 각 FD가 만족되는지 검증하라.

image


1) SSN → name ? answer : __ 2) position → phone ? answer : __ 3) phone → position ? answer : ___ 4) {position, phone} → phone ? answer : __


Normalization


History

  • E. F. Codd : English Computer Scientist; While working for IBM, he first proposed the process of normalization and established normal forms: 1NF, 2NF, 3NF, BCNF; “ A Relational Model of Data for Large Shared Data Banks”
  • “There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition, nonsimple domains are replaced by ”domains whose elements are atomic (= non-decomposable) values.”


Normalization

  • Normalization:
    • Process of decomposing “bad“ relations into “smaller”, but “good”(no redundancy, no anomalies) relations
  • Normal forms:
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce Codd Normal Form (BCNF)
    • Fourth Normal Form (4NF)
  • 2NF, 3NF, BCNF: based on keys and FDs.
  • 4NF: based on keys and MVDs.


First Normal Form (1NF)

  • A relation R is 1NF if
    • each tuple must be identified by primary key;
    • each attribute must have atomic (= only one) value;
  • In other words, R does not allow
    • multi-valued attribute
    • composite attribute c) Combination of a) and b): “Nested relation”


  • Consider a PERSON relation; This is not in 1NF because {phones} is multi-valued attribute.

image


  • There are 3 methods to normalize into 1NF.


Convert into 1NF : Method A

  • Method A: Expand the key so that there is a separate tuple for each phone number;

image


  • The following problems may occur;
    • Primary key is changed as {SSN, phones}.
    • Redundancy: Same information(SSN, name, age) is repeated.
    • Anomalies may occur.


Convert into 1NF : Method B

  • Method B; If maximum number of phone values(say, 3) is known, replace phone attribute by 3 atomic attributes.

image


  • The following problems may occur;
    • Many null values exist.
    • Spurious semantics about ordering among phone values.
    • Hard to query.
    • Hard to redesign ; What if few more phones are added?


Convert into 1NF : Method C

  • Method C; Remove phones attribute that violates 1NF and place it in a separate relation; Decompose it!

image


  • Method C is the best; Completely general!
    • No redundancy
    • No anomalies
    • Easy to query.
    • Easy to redesign


More than 1 Multivalued Attributes

  • Consider a PERSON relation; It has 2 multivalued attributes;
    • PERSON(SSN, {hobbies}, {phones})
  • Suppose we use method A;
    • PERSON(SSN, hobbies, phones)
    • Key is a {all attributes};
    • Lots of redundancy! All possible of combination of values for every ID.
  • We recommend Method C;
    • PERSON1(SSN, hobbies)
    • PERSON2(SSN, phones)


Nested Relations

  • Consider STUDENT relation; This is not in 1NF because {transcript} is multi-valued/composite attribute; That means “Nested Relation (= Relation within relation)”!

image


  • Remove nested relation transcript and put it in separated relation;
    • STUDENT(ID, name)
    • TRANSCRIPT(ID, course, year, grade)


Normalization : 기본 개념

  • (지금부터) 모든 relation은 1NF라고 가정함.
  • Redundancy와 Anomaly 현상은 “bad” FD들이 존재할 때 발생함;
  • 이러한 “bad” FD들을 단계별로 찾아내서 소거하는 것이 정규화의 기본 개념. 다음은 전형적인“bad” FD들;
    • Key에 부분 종속되는 FD들을 소거 : 2NF
    • Key에 이행 종속되는 FD 들을 소거 : 3NF
    • Super Key가 아닌 것에 종속되는 FD들을 소거 : BCNF


Partial Dependency (부분 종속)

  • An attribute that is a member of some key is called a prime attribute. An attribute that is not a member of any key is non-prime attribute.
  • Given FD X → Y, Y is partially dependent on X if there exists FD such that Y is dependent of a proper subset of X.


image


Second Normal Form (2NF)

  • A relation R is 2NF if any non-prime attribute is not partially dependent of any key.

image


2NF : Example

image


  • Is this relation in 2NF?
  • No! Why? Non-prime attribute ‘ename’ is partially dependent on key ‘{SSN, PNO}’ because {SSN, PNO} → ename also holds. and 2) SSN → ename is given,
  • Each of non-prime attributes {pname, location} is partially dependent on key {SSN, PNO} because {SSN, PNO} → pname, {SSN, PNO} → location also holds and 3) PNO → {pname, location} is given


image


Transitive Dependency (이행종속)

  • Given FD X → Z, Z is transitive dependent on X if there exist FDs such that X → Y and Y → Z

image


Third Normal Form (3NF)

  • A relation R is 3NF if any non-prime attribute is not transitive dependent of any key.

image


  • A relation R is 3NF if for every FD : X → A,
    • (1) X is a super key,
      • (= X contains any key)
    • (2) A is a prime attribute
      • (= A is a member of some key)


3NF : Example

image


  • Is this relation 3NF?
  • No! Why? Non-prime attribute ‘dname’ is transitive dependent on key ‘SSN’ because SSN → dname also holds.
  • In other words, in 2) DNO → dname, ‘DNO’ is not a super key and also, ‘dname’ is not a prime attribute;


  • We decompose as follows:

image


Normalization : Exercise

Normalization : Exercise (1)

image


  • What is key? Key = {ID, course}
  • This relation is not in 2NF because non-prime attribute {dept} is partially dependent on key {ID, course}
  • We have the following problems: Explain!
    • Redundancy
    • Insert Anomaly
    • Delete Anomaly
    • Update Anomaly


Normalization : Exercise (2)

  • We decompose STUDENT as follows:

image


  • In both relations, no more problems in 2NF; Explain!
    • No Redundancy
    • No Insert Anomaly
    • No Delete Anomaly
    • No Update Anomaly


Normalization : Exercise (3)

image


  • This relation is not in 3NF because non-prime attribute ‘college’ is transitive dependent on key = {ID};
  • In other words, in 2) dept → college, ‘dept’ is not a super key; also, ‘college’ is not a prime attribute.
  • We still have the following problems: Explain!
    • Redundancy
    • Insert Anomaly
    • Delete Anomaly
    • Update Anomaly


Normalization : Exercise (4)

  • We decompose STUDENT2 as follows:

image


  • In the both relations, no more problems; Explain!
    • No Redundancy
    • No Insert Anomaly
    • No Delete Anomaly
    • No Update Anomaly


Boyce-Codd Normal Form (BCNF)

  • A relation R is in BCNF if for every FD : X → A, X is a super key
  • In other words, all FDs that hold over R satisfy super key constraints

image


BCNF : Example

image


  • There are 2 keys; {student, course}, {student, prof}
  • This relation is in 3NF, but not BCNF; Why?
    • In 2) prof → course, ‘prof’ is not a super key;
  • We still have redundancy problem: The same ‘course’ name is repeated many times. And any anomalies?


image


  • We decompose as follows:

image


  • In both BCNF relations, there exists no more any redundancy and no more anomaly problems.


Why BCNF?

  • Every BCNF guarantees no redundancy and no anomalies caused by FDs. – Since X is a super key, the two tuples must be identical (y1 = y2, z1 = z2); But this is not allowed. Why?

image


  • Suppose R is a binary relation; R = (A, B)
    • Does R satisfy 2NF?
    • Does R satisfy 3NF?
    • Does R satisfy BCNF?


  • Normalization : Exercise
    • Consider the following CAR relation;

image


  • Decompose the above CAR relation with
    • no redundancy
    • no insert anomaly
    • no delete anomaly
    • no update anomaly


댓글남기기