[Database] Normalization
Main Issues:
- What problems are caused by redundancy?
- What is criteria for good relation?
- What is functional dependency?
- Concepts of FDs
- Inference Rules for FDs
- Closure Property
- Relationship : FD and Key
- What is normalization?
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce Codd Normal Form (BCNF)
Relational Design : Normalization
Problems : Bad Relational Schema
- For a “badly” designed relation, the following problems occur:
- Redundancy: The same information is repeated.
- Insert Anomaly: We can not insert new information.
- Delete Anomaly: We may lose unwanted information.
- Update Anomaly: We need to change it in many places.
“Bad” Relation: Example
Problems
- Redundancy: Each department information is repeated as the number of employees work for it.
- Insert Anomaly: Suppose that we insert new department; Then, we can not insert that department if some employee is not assigned to it. Why?
- Delete Anomaly: Suppose that we delete some department; Then, we may lose all employee information working for it. What if we delete last employee working for department?
- Update Anomaly: Suppose that we change the name of some department; Then, it may cause this update to be made for every employee working for it.
Solution : Decomposition
- We decompose Bad relation as follows:
- Are these two relations good?
-
Can we recover original information exactly?
- Those two relations are good because:
- No Redundancy : Each department information occurs only once.
- No Insert Anomaly : We can insert new department even though no employee is assigned to it.
- No Delete Anomaly : Even if we delete some department, we still can keep all the employees working for it.
- No Update Anomaly : We can change the name of department only once.
Other “Bad” Relation: Example
Guidelines : Designing Relational Schema
- Guideline 1: Design a relational schema with good information; In good relations, there are no redundancy and no insertion, deletion, update anomalies.
- Guideline 2: Design a relational schema so that they can be joined with appropriate joining pair attributes (primary key, foreign key). Otherwise, the join results may contain spurious tuples.
- Guideline 3: (If possible) avoid placing attributes in a relation whose values may frequently be NULL.
Functional Dependency (FD)
- A Functional dependency (FD) is used to specify formal measures of the “goodness” of relational designs.
- FDs are constraints that are derived from the meaning and inter-relationships of the attributes.
- FDs are derived from the real-world constraints on the attributes.
- FDs and keys are used to define normal forms for relations.
- A FD X → Y in a relation R is defined as: For any two tuples t1 and t2 in R, if t1[X] = t2[X] holds, then t1[Y] = t2[Y] also holds where X, Y : a set of attributes in R.
- Whenever two tuples have the same value for X, they also must have the same value for Y.
- Value(s) of X uniquely (functionally) decides the value(s) of Y.
- If X → Y holds in R, this does not say that Y → X holds.
- If every tuple for X has distinct value(s), then X can decide any attribute(s) in R; Why?
FD : Example
-
Given a tuple and the value(s) of X, we can decide the corresponding value of Y .
-
R = (A, B, C, D, E)
- FD : (1) A → B
- (2) C → D
- R = (A, B, C, D, E)
- FD : (1) A → {B, D}
- (2) {B, C} → E
FD : 실제 예 (1)
- FD는 한 relation에서 실세계의 제약조건을 반영. DB Designer는 이러한 제약조건을 FD로 변환하여 정의;
STUDENT(ID, club, prof#, course#, major)
- Each student joins only one club.
- ID → club
- Each student has only one major.
- ID → major
- Each professor teaches only one course.
- prof# → course#
- FD는 relation 안의 모든 tuple들이 항상 만족해야 하는 조건.
-
만약 새로운 tuple <Eve, Dance, John, OS, . . >을 insert 한다면?
-
(참고로) 여기서 Key는 무엇인지?
FD : 실제 예 (2)
Student(S#, C#, sname, age, cname, credit, grade, D#, dname, office)
- Each student has only one his/her name and age.
- S# → {sname, age}
- Each course has only one its name and credit.
- C# → {cname, credit}
- Each student belongs to only one department.
- S# → D#
- Each department has only one its name and office.
- D# → {dname, office}
- For each student and his/her taken course, only one grade exists.
- {S#, C#} → grade
FD : 실제 예 (3)
- 다음의 각 FD가 만족되는지 검증하라.
1) SSN → name ? answer : __ 2) position → phone ? answer : __ 3) phone → position ? answer : ___ 4) {position, phone} → phone ? answer : __
Normalization
History
- E. F. Codd : English Computer Scientist; While working for IBM, he first proposed the process of normalization and established normal forms: 1NF, 2NF, 3NF, BCNF; “ A Relational Model of Data for Large Shared Data Banks”
- “There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition, nonsimple domains are replaced by ”domains whose elements are atomic (= non-decomposable) values.”
Normalization
- Normalization:
- Process of decomposing “bad“ relations into “smaller”, but “good”(no redundancy, no anomalies) relations
- Normal forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- 2NF, 3NF, BCNF: based on keys and FDs.
- 4NF: based on keys and MVDs.
First Normal Form (1NF)
- A relation R is 1NF if
- each tuple must be identified by primary key;
- each attribute must have atomic (= only one) value;
- In other words, R does not allow
- multi-valued attribute
- composite attribute c) Combination of a) and b): “Nested relation”
- Consider a PERSON relation; This is not in 1NF because {phones} is multi-valued attribute.
- There are 3 methods to normalize into 1NF.
Convert into 1NF : Method A
- Method A: Expand the key so that there is a separate tuple for each phone number;
- The following problems may occur;
- Primary key is changed as {SSN, phones}.
- Redundancy: Same information(SSN, name, age) is repeated.
- Anomalies may occur.
Convert into 1NF : Method B
- Method B; If maximum number of phone values(say, 3) is known, replace phone attribute by 3 atomic attributes.
- The following problems may occur;
- Many null values exist.
- Spurious semantics about ordering among phone values.
- Hard to query.
- Hard to redesign ; What if few more phones are added?
Convert into 1NF : Method C
- Method C; Remove phones attribute that violates 1NF and place it in a separate relation; Decompose it!
- Method C is the best; Completely general!
- No redundancy
- No anomalies
- Easy to query.
- Easy to redesign
More than 1 Multivalued Attributes
- Consider a PERSON relation; It has 2 multivalued attributes;
- PERSON(SSN, {hobbies}, {phones})
- Suppose we use method A;
- PERSON(SSN, hobbies, phones)
- Key is a {all attributes};
- Lots of redundancy! All possible of combination of values for every ID.
- We recommend Method C;
- PERSON1(SSN, hobbies)
- PERSON2(SSN, phones)
Nested Relations
- Consider STUDENT relation; This is not in 1NF because {transcript} is multi-valued/composite attribute; That means “Nested Relation (= Relation within relation)”!
- Remove nested relation transcript and put it in separated relation;
- STUDENT(ID, name)
- TRANSCRIPT(ID, course, year, grade)
Normalization : 기본 개념
- (지금부터) 모든 relation은 1NF라고 가정함.
- Redundancy와 Anomaly 현상은 “bad” FD들이 존재할 때 발생함;
- 이러한 “bad” FD들을 단계별로 찾아내서 소거하는 것이 정규화의 기본 개념. 다음은 전형적인“bad” FD들;
- Key에 부분 종속되는 FD들을 소거 : 2NF
- Key에 이행 종속되는 FD 들을 소거 : 3NF
- Super Key가 아닌 것에 종속되는 FD들을 소거 : BCNF
Partial Dependency (부분 종속)
- An attribute that is a member of some key is called a prime attribute. An attribute that is not a member of any key is non-prime attribute.
- Given FD X → Y, Y is partially dependent on X if there exists FD such that Y is dependent of a proper subset of X.
Second Normal Form (2NF)
- A relation R is 2NF if any non-prime attribute is not partially dependent of any key.
2NF : Example
- Is this relation in 2NF?
- No! Why? Non-prime attribute ‘ename’ is partially dependent on key ‘{SSN, PNO}’ because {SSN, PNO} → ename also holds. and 2) SSN → ename is given,
- Each of non-prime attributes {pname, location} is partially dependent on key {SSN, PNO} because {SSN, PNO} → pname, {SSN, PNO} → location also holds and 3) PNO → {pname, location} is given
Transitive Dependency (이행종속)
- Given FD X → Z, Z is transitive dependent on X if there exist FDs such that X → Y and Y → Z
Third Normal Form (3NF)
- A relation R is 3NF if any non-prime attribute is not transitive dependent of any key.
- A relation R is 3NF if for every FD : X → A,
- (1) X is a super key,
- (= X contains any key)
- (2) A is a prime attribute
- (= A is a member of some key)
- (1) X is a super key,
3NF : Example
- Is this relation 3NF?
- No! Why? Non-prime attribute ‘dname’ is transitive dependent on key ‘SSN’ because SSN → dname also holds.
- In other words, in 2) DNO → dname, ‘DNO’ is not a super key and also, ‘dname’ is not a prime attribute;
- We decompose as follows:
Normalization : Exercise
Normalization : Exercise (1)
- What is key? Key = {ID, course}
- This relation is not in 2NF because non-prime attribute {dept} is partially dependent on key {ID, course}
- We have the following problems: Explain!
- Redundancy
- Insert Anomaly
- Delete Anomaly
- Update Anomaly
Normalization : Exercise (2)
- We decompose STUDENT as follows:
- In both relations, no more problems in 2NF; Explain!
- No Redundancy
- No Insert Anomaly
- No Delete Anomaly
- No Update Anomaly
Normalization : Exercise (3)
- This relation is not in 3NF because non-prime attribute ‘college’ is transitive dependent on key = {ID};
- In other words, in 2) dept → college, ‘dept’ is not a super key; also, ‘college’ is not a prime attribute.
- We still have the following problems: Explain!
- Redundancy
- Insert Anomaly
- Delete Anomaly
- Update Anomaly
Normalization : Exercise (4)
- We decompose STUDENT2 as follows:
- In the both relations, no more problems; Explain!
- No Redundancy
- No Insert Anomaly
- No Delete Anomaly
- No Update Anomaly
Boyce-Codd Normal Form (BCNF)
- A relation R is in BCNF if for every FD : X → A, X is a super key
- In other words, all FDs that hold over R satisfy super key constraints
BCNF : Example
- There are 2 keys; {student, course}, {student, prof}
- This relation is in 3NF, but not BCNF; Why?
- In 2) prof → course, ‘prof’ is not a super key;
- We still have redundancy problem: The same ‘course’ name is repeated many times. And any anomalies?
- We decompose as follows:
- In both BCNF relations, there exists no more any redundancy and no more anomaly problems.
Why BCNF?
- Every BCNF guarantees no redundancy and no anomalies caused by FDs. – Since X is a super key, the two tuples must be identical (y1 = y2, z1 = z2); But this is not allowed. Why?
- Suppose R is a binary relation; R = (A, B)
- Does R satisfy 2NF?
- Does R satisfy 3NF?
- Does R satisfy BCNF?
- Normalization : Exercise
- Consider the following CAR relation;
- Decompose the above CAR relation with
- no redundancy
- no insert anomaly
- no delete anomaly
- no update anomaly
댓글남기기