Here are the answers to the questions, organized by group.
Group-A (Very Short Answer Type Question) - [1 x 10 = 10 marks]
- (I) The information about data in a database is called metadata.
- (II) Which one of the following commands is used to modify a column inside a table? ALTER TABLE.
- (III) What is the full form of NTFS? New Technology File System.
- (IV) What is the full form of TCL? Transaction Control Language.
- (V) The Consistency property of ACID states that only valid data will be written to the database.
- (VI) In which of the following formats data is stored in the database management system? Tables (or Relational format / Structured format).
- (VII) The database system must take special actions to ensure that transactions operate properly without interference from concurrently executing database statements. This property is referred to as Isolation.
- (VIII) The database design prevents some data from being represented due to redundancy anomaly. (Specific anomalies like update, insert, delete are also valid answers).
- (IX) We can use the following three rules to find logically implied functional dependencies. This collection of rules is called Armstrong's Axioms.
- (X) Which character function can be used to return a specified portion of a character string in SQL? SUBSTRING (or SUBSTR).
- (XI) The normal form which satisfies multivalued dependencies and which is in BCNF is 4NF (Fourth Normal Form).
- (XII) DBMS periodically suspends all processing and synchronizes its files and journals through the use of Checkpointing.
Group-B (Short Answer Type Question) - Answer any three [5 x 3 = 15 marks]
Here are answers to three questions from Group-B:
2. State Armstrong's three axioms. [5]
Armstrong's Axioms are a set of inference rules used to infer all possible functional dependencies from a given set of functional dependencies. They are considered sound (do not generate incorrect FDs) and complete (can generate all correct FDs).
- Reflexivity Axiom: If Y is a subset of X, then X → Y.
- Explanation: If a set of attributes X contains all attributes of another set Y, then X functionally determines Y. This is always true by definition.
- Example: If you have attributes {A, B, C} and X = {A, B}, Y = {A}, then {A, B} → {A}.
- Augmentation Axiom: If X → Y, then XZ → YZ for any set of attributes Z.
- Explanation: If X functionally determines Y, adding the same set of attributes Z to both sides of the dependency preserves the dependency.
- Example: If {StudentID} → {StudentName}, then {StudentID, CourseID} → {StudentName, CourseID}.
- Transitivity Axiom: If X → Y and Y → Z, then X → Z.
- Explanation: If X functionally determines Y, and Y in turn functionally determines Z, then X also functionally determines Z.
- Example: If {EmployeeID} → {DepartmentID} and {DepartmentID} → {DepartmentName}, then {EmployeeID} → {DepartmentName}.
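As an illustration, here is a minimal Python sketch of the attribute-closure computation that these axioms justify (reflexivity seeds the closure with X itself; augmentation and transitivity justify each expansion step). The FDs used are the EmployeeID/DepartmentID example above.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs."""
    result = set(attrs)                      # reflexivity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs                # augmentation + transitivity
                changed = True
    return result

fds = [({"EmployeeID"}, {"DepartmentID"}),
       ({"DepartmentID"}, {"DepartmentName"})]
print(closure({"EmployeeID"}, fds))
# -> {'EmployeeID', 'DepartmentID', 'DepartmentName'} (set order may vary)
```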
3. What is functional dependency? What is join dependency? [5]
- Functional Dependency (FD):
  A functional dependency is a constraint between two sets of attributes in a relation. It is denoted X → Y, meaning that the value of attribute set X uniquely determines the value of attribute set Y. In other words, for any two tuples in the relation, if they agree on X, they must also agree on Y.
  - Formally: For a relation R, X → Y holds if for any two tuples t1 and t2 in R, t1[X] = t2[X] implies t1[Y] = t2[Y].
  - Example: In a Student relation (StudentID, StudentName, Major), StudentID → StudentName is an FD because each StudentID uniquely identifies a StudentName.
- Join Dependency (JD):
  A join dependency is a constraint that specifies a condition for decomposing a relation into a set of smaller relations without loss of information. It states that a relation R can be losslessly reconstructed by taking the natural join of its projections R1, R2, ..., Rn. A join dependency is a more general form of dependency than a multivalued dependency or a functional dependency.
  - Formally: A join dependency *(R1, R2, ..., Rn) holds on a relation R if R = R1 ⋈ R2 ⋈ ... ⋈ Rn, where R1, R2, ..., Rn are projections of R and ⋈ is the natural join.
  - Example: Consider a relation Supplier_Part_Project (Supplier, Part, Project). A join dependency *({Supplier, Part}, {Part, Project}, {Supplier, Project}) means the relationship between suppliers, parts, and projects is fully captured by knowing which supplier supplies which part, which part is used in which project, and which supplier is involved in which project. Joining these three projections reproduces the original table without spurious tuples. Join dependencies are crucial for achieving Fifth Normal Form (5NF).
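To make the FD definition concrete, here is a minimal Python sketch (with hypothetical Student rows) that checks whether an FD X → Y holds in a given set of tuples:

```python
def fd_holds(rows, X, Y):
    """Return True if no two rows agree on X but disagree on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False
        seen[x_val] = y_val
    return True

students = [
    {"StudentID": 1, "StudentName": "Asha", "Major": "CS"},
    {"StudentID": 2, "StudentName": "Ravi", "Major": "CS"},
    {"StudentID": 1, "StudentName": "Asha", "Major": "Math"},  # double major
]
print(fd_holds(students, ["StudentID"], ["StudentName"]))  # True
print(fd_holds(students, ["Major"], ["StudentName"]))      # False
```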
4. Explain Lossless and Lossy decomposition by using suitable examples. [5]
When a relation is decomposed into smaller relations during normalization, it's essential that the decomposition is lossless.
- Lossless Decomposition:
  A decomposition of a relation R into R1, R2, ..., Rn is lossless if the natural join of R1, R2, ..., Rn results in exactly the original relation R. No information is lost, and no "spurious" (incorrect) tuples are generated.
  - Condition for a lossless join (decomposition into two relations R1 and R2): the decomposition is lossless if and only if either
    - (R1 ∩ R2) → R1 (the common attributes form a superkey of R1), or
    - (R1 ∩ R2) → R2 (the common attributes form a superkey of R2).
  - Example: Consider Employee_Dept (EmpID, EmpName, DeptID, DeptName) with FDs EmpID → EmpName and DeptID → DeptName. Decompose it into:
    - Employee (EmpID, EmpName, DeptID)
    - Department (DeptID, DeptName)
    The intersection of the two schemas is DeptID. Since DeptID is a key for Department (DeptID → DeptName), this decomposition is lossless: joining Employee and Department on DeptID reproduces the original Employee_Dept relation exactly.
- Lossy Decomposition:
  A decomposition is lossy if joining the decomposed relations back together produces spurious tuples that were not present in the original relation, or loses original information. This can lead to incorrect or inconsistent data.
  - Example: Consider Student_Subject_Teacher (StudentID, Subject, Teacher). Assume:
    - StudentID → Subject (each student takes one subject)
    - Subject → Teacher (each subject is taught by one teacher)
    - No direct StudentID → Teacher dependency.
    Decompose it into:
    - Student_Teacher (StudentID, Teacher)
    - Subject_Teacher (Subject, Teacher)

    Consider the data (note that both subjects are taught by the same teacher, which is what exposes the problem):

    Student_Subject_Teacher
    | StudentID | Subject | Teacher |
    | :-------- | :------ | :------ |
    | 101       | Math    | Smith   |
    | 102       | Physics | Smith   |

    Decomposition:

    Student_Teacher
    | StudentID | Teacher |
    | :-------- | :------ |
    | 101       | Smith   |
    | 102       | Smith   |

    Subject_Teacher
    | Subject | Teacher |
    | :------ | :------ |
    | Math    | Smith   |
    | Physics | Smith   |

    Now join Student_Teacher and Subject_Teacher on Teacher:

    Student_Teacher ⋈ Subject_Teacher
    | StudentID | Teacher | Subject |
    | :-------- | :------ | :------ |
    | 101       | Smith   | Math    |
    | 101       | Smith   | Physics | <-- Spurious!
    | 102       | Smith   | Math    | <-- Spurious!
    | 102       | Smith   | Physics |

    The join introduced spurious tuples (e.g., student 101 apparently taking Physics). This happens because Teacher, the only common attribute, is not a key of either decomposed relation and does not uniquely identify a StudentID or a Subject. Thus, this is a lossy decomposition.
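A minimal Python sketch of the lossy example above: it projects the rows onto the two schemas, natural-joins them on Teacher, and prints the spurious tuples.

```python
from itertools import product

# Original relation: (StudentID, Subject, Teacher)
original = {(101, "Math", "Smith"), (102, "Physics", "Smith")}

student_teacher = {(sid, t) for (sid, _subj, t) in original}     # (StudentID, Teacher)
subject_teacher = {(subj, t) for (_sid, subj, t) in original}    # (Subject, Teacher)

# Natural join of the two projections on the common attribute Teacher
rejoined = {(sid, subj, t)
            for (sid, t), (subj, t2) in product(student_teacher, subject_teacher)
            if t == t2}

print(rejoined - original)
# -> {(101, 'Physics', 'Smith'), (102, 'Math', 'Smith')}  (the spurious tuples)
```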
Group-C (Long Answer Type Question) - Answer any three [15 x 3 = 45 marks]
Here are answers to three questions from Group-C:
7. (a) What is the difference between DELETE, TRUNCATE and DROP commands? [3]
These three SQL commands are used to remove data or objects from a database, but they differ significantly in their scope, functionality, and how they interact with transactions.
Feature | DELETE | TRUNCATE TABLE | DROP TABLE |
---|---|---|---|
Command Type | DML (Data Manipulation Language) | DDL (Data Definition Language) | DDL (Data Definition Language) |
Purpose | Removes rows from a table. Can use WHERE clause to filter. | Removes all rows from a table, effectively emptying it. | Removes the entire table (schema and data) from the database. |
Rollback | Can be rolled back (if within a transaction). | Cannot be rolled back (auto-commits). | Cannot be rolled back. |
Speed | Slower, as it logs each row deletion. | Faster, as it deallocates data pages. | Fastest, as it removes the table definition. |
Triggers | Fires DELETE triggers. | Does not fire triggers. | N/A (table is removed). |
Constraints | Checks integrity constraints (e.g., foreign keys). | Generally bypasses some integrity checks (e.g., foreign key constraints can prevent truncation if table is referenced). | Removes all associated constraints. |
Identity/Auto-Increment | Does not reset auto-increment counter. | Resets auto-increment counter. | N/A (table is removed). |
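As a small illustration of the DML/DDL difference, here is a minimal Python sketch using sqlite3 with an assumed toy emp table (note that SQLite has no TRUNCATE command; DELETE without a WHERE clause is its closest equivalent). A DELETE can be rolled back inside a transaction, while DROP TABLE removes the table definition itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
con.commit()

con.execute("DELETE FROM emp WHERE id = 1")   # DML: row-level, filterable
con.rollback()                                # undo the delete
print(con.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # -> 2

con.execute("DROP TABLE emp")                 # DDL: table schema and data removed
con.commit()
```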
7. (b) Explain various update anomalies that can arise in a relational database with examples. [7]
Update anomalies are problems that occur in poorly designed (unnormalized) relational databases where redundant data can lead to inconsistencies. They are primarily categorized into three types:
- Insertion Anomaly:
  - Definition: Occurs when certain data cannot be inserted into the database without the presence of other, unrelated data, or without introducing redundant information.
  - Example: Consider a Course_Instructor relation (CourseID, CourseName, InstructorName, InstructorOffice) with FDs CourseID → CourseName and InstructorName → InstructorOffice. If you want to add a new instructor (e.g., 'Prof. White', office 'A-101') who is not yet assigned to any course, you cannot do so without also inserting a CourseID and CourseName. You might be forced to use a NULL or dummy CourseID, which violates good database design principles and can lead to integrity issues.
- Deletion Anomaly:
  - Definition: Occurs when deleting a fact about one entity unintentionally deletes facts about another, unrelated entity because of data redundancy.
  - Example: Using the Course_Instructor table from above:

    | CourseID | CourseName | InstructorName | InstructorOffice |
    | :------- | :--------- | :------------- | :--------------- |
    | CS101    | Databases  | Smith          | B-201            |
    | CS102    | Networks   | Jones          | C-302            |
    | CS103    | OS         | Smith          | B-201            |

    If CS102 (Networks) is dropped and Jones teaches only CS102, deleting the row for CS102 also removes all information about instructor Jones (including their InstructorOffice), which should be preserved even if the course is no longer offered.
- Update Anomaly (Modification Anomaly):
  - Definition: Occurs when updating a piece of data requires changing multiple records; if all copies of the redundant data are not updated consistently, the data becomes inconsistent.
  - Example: Using the same Course_Instructor table: if instructor Smith changes office from 'B-201' to 'B-202', every row where Smith is the instructor (CS101 and CS103) must be updated. If one instance is updated and the other is missed, the database shows two different office numbers for the same instructor.
Solution: These anomalies are typically resolved through database normalization, which decomposes relations into smaller, more focused relations that satisfy higher normal forms (e.g., 3NF, BCNF). The Course_Instructor example should be decomposed into:
- Course (CourseID, CourseName, InstructorName)
- Instructor_Info (InstructorName, InstructorOffice)
This eliminates the redundancy and prevents these anomalies.
7. (c) Explain the functionalities of DBA. [5]
A Database Administrator (DBA) is a professional responsible for the overall management, maintenance, security, and performance of a database system. Their role is crucial for an organization's data integrity and availability. Key functionalities of a DBA include:
- Database Design and Implementation:
- Working with developers and users to define the logical and physical schema of the database.
- Creating database objects like tables, views, indexes, stored procedures, and triggers.
- Ensuring appropriate data types, constraints, and relationships are defined.
- Security Management:
- Establishing and maintaining user accounts, roles, and permissions (privileges).
- Implementing and enforcing security policies to protect sensitive data from unauthorized access.
- Auditing database activities to detect and investigate security breaches.
- Backup and Recovery:
- Designing and implementing comprehensive backup strategies to prevent data loss.
- Performing regular backups and testing recovery procedures to ensure data can be restored effectively in case of failures (e.g., hardware crash, data corruption, human error).
- Performance Tuning and Optimization:
- Monitoring database performance metrics (e.g., I/O, CPU, memory usage, query response times).
- Identifying and resolving performance bottlenecks by optimizing queries, creating/modifying indexes, and adjusting database configuration parameters.
- Analyzing query execution plans and providing recommendations to developers.
- Data Integrity and Consistency:
- Ensuring the accuracy, consistency, and reliability of data through the enforcement of integrity constraints (e.g., primary keys, foreign keys, unique constraints, check constraints).
- Troubleshooting and resolving data corruption or inconsistency issues.
- Installation, Configuration, and Upgrade:
- Installing and configuring DBMS software (e.g., Oracle, SQL Server, MySQL, PostgreSQL).
- Managing database server parameters and settings.
- Applying patches, updates, and upgrades to the DBMS software.
- Capacity Planning:
- Monitoring database growth and predicting future storage, memory, and processing requirements.
- Planning for necessary hardware and software upgrades to ensure the database can handle increasing workloads.
- Troubleshooting and Support:
- Diagnosing and resolving database-related issues, errors, and system outages.
- Providing technical support and guidance to application developers and end-users.
In essence, a DBA is the custodian of the database, ensuring its continuous operation, integrity, security, and optimal performance.
9. (a) Consider the relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {AB->C, A->DE, B->F, F->GH, D->J}. Decompose R into 3NF. [7]
To decompose R into 3NF using the Synthesis Algorithm:
Step 1: Find a minimal cover for F.
First, break down FDs with multiple attributes on the RHS:
- A->DE becomes: A->D, A->E
- F->GH becomes: F->G, F->H
So, the new set of FDs is:
F' = {AB->C, A->D, A->E, B->F, F->G, F->H, D->J}
Now, check for redundancy in F' (all LHS are single, except AB->C):
- Is AB->C derivable from others? No.
- Is A->D derivable? No.
- Is A->E derivable? No.
- Is B->F derivable? No.
- Is F->G derivable? No.
- Is F->H derivable? No.
- Is D->J derivable? No.
No FDs are redundant. No extraneous attributes on the LHS. Thus, F' itself is a minimal cover:
G = {AB->C, A->D, A->E, B->F, F->G, F->H, D->J}
Step 2: Find all candidate keys of R.
Attributes on RHS: C, D, E, F, G, H, J.
Attributes not on RHS: A, B, I. These must be part of any candidate key.
Let's check (ABI)+:
(ABI)+ = A, B, I (initial)
A -> D, E (from A)
B -> F (from B)
F -> G, H (from F)
D -> J (from D)
So, (ABI)+ = {A, B, C, D, E, F, G, H, I, J}.
Since (ABI)+ contains all attributes of R, {A, B, I} is a candidate key.
Since 'I' is not determined by any other attribute, it must be part of every candidate key. Thus, {A, B, I} is the only candidate key.
Step 3: Decompose into 3NF using the Synthesis Algorithm.
For each FD X → Y in G, create a relation schema with attributes X ∪ Y. If none of the created schemas contains a candidate key of R, add a schema consisting of a candidate key.
- For AB->C: R1(A, B, C)
- For A->D: R2(A, D)
- For A->E: R3(A, E)
- For B->F: R4(B, F)
- For F->G: R5(F, G)
- For F->H: R6(F, H)
- For D->J: R7(D, J)
Check if any of these relations contain a candidate key of R ({A, B, I}). None of them do.
Therefore, add a relation schema that consists of the candidate key:
- R8(A, B, I)
The 3NF decomposition of R is:
{R1(A, B, C), R2(A, D), R3(A, E), R4(B, F), R5(F, G), R6(F, H), R7(D, J), R8(A, B, I)}
Relations produced from FDs with the same left-hand side may be combined, e.g., R2 and R3 into R(A, D, E) and R5 and R6 into R(F, G, H), giving the more compact form {R(A, B, C), R(A, D, E), R(B, F), R(F, G, H), R(D, J), R(A, B, I)}. Either form is lossless, dependency-preserving, and entirely in 3NF.
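A minimal Python sketch of the key step above: computing attribute closures under the minimal cover G to confirm that {A, B, I} is a candidate key and that I cannot be dropped from it.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (LHS, RHS) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("ABCDEFGHIJ")
G = [(set("AB"), set("C")), (set("A"), set("D")), (set("A"), set("E")),
     (set("B"), set("F")), (set("F"), set("G")), (set("F"), set("H")),
     (set("D"), set("J"))]

print(closure(set("ABI"), G) == R)   # True  -> {A, B, I} is a candidate key
print(closure(set("AB"), G) == R)    # False -> I cannot be dropped
```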
9. (b) Define strong entity set and weak entity set. Give a proper example. [4]
In the Entity-Relationship (ER) model, entity sets are collections of real-world objects. Their classification as strong or weak depends on their identifying characteristics.
- Strong Entity Set:
  - Definition: A strong entity set possesses a sufficient set of attributes to form a primary key for its instances. It can exist in the database independently, without relying on the existence of any other entity set.
  - Characteristics:
    - Has its own primary key (underlined with a solid line in ER diagrams).
    - Represented by a single rectangle in an ER diagram.
  - Example: An EMPLOYEE entity set where each employee has a unique EmployeeID as its primary key. An employee's existence is independent of any specific project or department they work on.
- Weak Entity Set:
  - Definition: A weak entity set does not have enough attributes to form a primary key on its own. Its existence depends on another entity set, called the owner or identifying entity set. To uniquely identify a weak entity, the owner's primary key is combined with the weak entity's own partial key (discriminator).
  - Characteristics:
    - Does not have a full primary key; it has a partial key (discriminator) that identifies weak entities only within the context of their owner.
    - Its primary key is a composite of the owner's primary key and its own partial key.
    - Represented by a double rectangle in an ER diagram.
    - The identifying relationship between the weak entity and its owner is represented by a double diamond.
  - Example: A DEPENDENT entity set representing the dependents of an EMPLOYEE. A dependent cannot exist in the database without being associated with a specific employee.
    - EMPLOYEE (EmployeeID (PK), Name, ...) - strong entity.
    - DEPENDENT (DependentName (partial key), BirthDate, Relationship) - weak entity. Its primary key is (EmployeeID, DependentName), meaning DependentName is unique only within a given EmployeeID.
9. (c) What do you mean by derived attribute? Give an example. [4]
- Derived Attribute:
  A derived attribute is an attribute in an ER model or relational database whose value is not stored explicitly but is computed from other stored attributes. It is calculated "on the fly" when requested, which reduces data redundancy and helps maintain consistency.
  - Characteristics:
    - Its value is computed from other attributes.
    - It is not stored directly in the database.
    - Changes in the source attributes are automatically reflected in the derived value.
    - It is represented by a dashed oval in an ER diagram.
- Advantages:
  - Eliminates data redundancy.
  - Ensures consistency, since the value is always up to date with its source attributes.
- Disadvantages:
  - Incurs computational overhead each time the value is accessed.
  - Can make query processing slightly slower than reading a stored attribute, depending on the complexity of the derivation.
- Example:
  Consider an EMPLOYEE entity with a stored attribute DateOfBirth. An Age attribute can be derived from DateOfBirth and the current date.
  - If an employee's DateOfBirth is '1990-01-15', their Age is CurrentYear - 1990 (adjusted for month and day). This Age is not stored in the EMPLOYEE table.
  - Every time an employee's age is queried, the system computes it from DateOfBirth and the current date, so Age is always accurate and never needs manual updates.
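A minimal Python sketch (a hypothetical helper, not tied to any particular DBMS) of deriving Age from the stored DateOfBirth at query time:

```python
from datetime import date

def age(date_of_birth: date, today: date | None = None) -> int:
    """Derive Age from the stored DateOfBirth instead of storing it."""
    today = today or date.today()
    years = today.year - date_of_birth.year
    # subtract one year if this year's birthday has not happened yet
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age(date(1990, 1, 15), today=date(2024, 1, 1)))   # -> 33
print(age(date(1990, 1, 15), today=date(2024, 1, 15)))  # -> 34
```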
10. (A) What is blocking factor? Explain the difference between B-tree and B+ tree indexing with proper example. [5+5+5]
This question has three parts, each worth 5 marks.
10. (A) Part 1: What is blocking factor? [5]
The blocking factor (or block factor) refers to the number of logical records (tuples/rows) that can be stored within a single physical block (or disk block) on a storage device. A physical block is the smallest unit of data that the disk can read or write at one time.
- Calculation:
  Blocking Factor = floor(Block Size / Record Size)
  where:
  - Block Size: the fixed size of a physical block on disk (e.g., 4 KB, 8 KB).
  - Record Size: the average size of a single logical record in the database.
  - floor: since a record cannot be split across blocks, we take the largest integer less than or equal to the division result.
- Significance:
- I/O Efficiency: The primary goal of blocking is to minimize the number of disk I/O operations. Reading an entire block at once retrieves multiple records. A higher blocking factor means fewer disk reads are needed to fetch a given number of records, significantly improving performance, as disk I/O is a major bottleneck in database systems.
- Storage Utilization: A well-chosen blocking factor can optimize how disk space is used. If record sizes are very small compared to block size, and the blocking factor is low, it can lead to wasted space within blocks.
- Buffer Management: When a block is read into memory, all records within it become available in the buffer. A higher blocking factor allows more records to be processed per buffer load, which can be beneficial for sequential processing.
Example:
Suppose Block Size = 8192 bytes (8 KB) and Record Size = 200 bytes.
Blocking Factor = floor(8192 / 200) = floor(40.96) = 40.
This means 40 records can be stored in one disk block. To retrieve 100 records, you would need ceiling(100 / 40) = 3 disk I/O operations, instead of 100 if each record were stored separately.
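The same arithmetic as a minimal Python sketch:

```python
import math

def blocking_factor(block_size: int, record_size: int) -> int:
    return block_size // record_size          # floor division: records per block

def blocks_needed(num_records: int, bfr: int) -> int:
    return math.ceil(num_records / bfr)       # whole blocks that must be read

bfr = blocking_factor(8192, 200)
print(bfr)                       # -> 40
print(blocks_needed(100, bfr))   # -> 3
```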
10. (A) Part 2: Explain the difference between B-tree and B+ tree indexing with proper example. [5]
Both B-trees and B+ trees are balanced tree data structures commonly used for indexing in database management systems to enable efficient data retrieval, insertion, and deletion, especially on disk-based storage.
- B-Tree:
- Structure: Internal nodes and leaf nodes can store both keys and pointers to data records.
- Data Storage: Data pointers (or actual data records) can exist at any node level, including internal nodes.
- Search: A search for a key might terminate at an internal node if the key is found there.
- Range Queries: Less efficient for range queries. To find all keys within a range, the traversal might involve moving up and down the tree multiple times, as sibling nodes are typically not linked.
- Example: A B-tree node could look like [Key1 | DataPtr1 | Key2 | DataPtr2]. If a search finds Key1 in an internal node, it can directly access DataPtr1.
- B+ Tree:
- Structure: Internal nodes store only keys and pointers to child nodes (acting as an index). All actual data pointers (or data records) are stored exclusively in the leaf nodes.
- Data Storage: All data records are stored only at the leaf level. Internal nodes serve purely as an index to guide the search.
- Search: All searches for data must traverse from the root down to a leaf node.
- Range Queries: Highly efficient for range queries. Once the first record in the range is found in a leaf node, subsequent records within the range can be found by traversing the linked list of leaf nodes sequentially. This minimizes disk I/O for range scans.
- Example: An internal B+ tree node looks like [Key1 | ChildPtr1 | Key2 | ChildPtr2]; these keys are copies that only guide the search. A leaf node looks like [Key1 | DataPtr1 | Key2 | DataPtr2 | PointerToNextLeaf]. Searching for Key1 always descends to a leaf node to retrieve DataPtr1.
Key Differences Summarized:
Feature | B-Tree | B+ Tree |
---|---|---|
Data in Internal Nodes | Yes (keys and data pointers/records) | No (only keys and child pointers) |
Data in Leaf Nodes | Yes (keys and data pointers/records) | Yes (all data pointers/records) |
Leaf Node Linking | No (typically, not guaranteed) | Yes (linked for sequential access) |
Search Termination | Can terminate at internal or leaf nodes | Always terminates at a leaf node |
Range Queries | Less efficient | Highly efficient |
Space Utilization | Can be more compact if data is small and can be stored in internal nodes | Internal nodes only store keys, potentially more levels for same data, but optimized for disk I/O |
Typical Use | File systems (e.g., ext4, XFS) | Database indexes (e.g., MySQL InnoDB, SQL Server) |
B+ trees are generally preferred for database indexing due to their consistent search path (always to a leaf node, optimizing disk seeks) and superior efficiency for range queries, which are common in databases.
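A minimal, simplified in-memory Python sketch of why the linked leaf level helps range queries: once the first qualifying leaf is reached, the scan only follows next-leaf pointers. (A real B+ tree would locate that starting leaf through the internal index rather than scanning from the head of the chain.)

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys          # sorted keys (stand-ins for key + data pairs)
        self.next = None          # pointer to the next leaf in key order

def chain(*key_groups):
    """Build leaves and link them left to right; return the first leaf."""
    leaves = [Leaf(list(g)) for g in key_groups]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]

def range_scan(first_leaf, lo, hi):
    out, leaf = [], first_leaf
    while leaf is not None:
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        if leaf.keys and leaf.keys[-1] > hi:   # past the range: stop early
            break
        leaf = leaf.next
    return out

head = chain([5, 10], [15, 20], [25, 30], [35, 40])
print(range_scan(head, 12, 32))   # -> [15, 20, 25, 30]
```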
10. (A) Part 3: Insert the following elements in B-Tree of order 4: 65, 66, 70, 71, 74, 80, 91, 81, 99, 82, 75, 77, 89, 56. [5]
A B-Tree of order 4 (m=4) implies:
- Maximum keys per node: m - 1 = 3
- Maximum children per node: m = 4
- Minimum keys per internal node (except root): ceil(m/2) - 1 = 2 - 1 = 1
- Minimum children per internal node (except root): ceil(m/2) = 2
- Root node minimum children: 2 (if it is not a leaf)

A node is written as [Key1 | Key2 | Key3]. When a node splits, the median key moves up to the parent.
- Insert 65: [65]
- Insert 66: [65 | 66]
- Insert 70: [65 | 66 | 70] (node is full)
- Insert 71: [65 | 66 | 70] overflows, so it splits around 70, which moves up.
  Root: [70]; children: [65 | 66] (left of 70), [71] (right of 70)
- Insert 74: goes into [71].
  Root: [70]; children: [65 | 66], [71 | 74]
- Insert 80: goes into [71 | 74].
  Root: [70]; children: [65 | 66], [71 | 74 | 80] (node is full)
- Insert 91: [71 | 74 | 80] overflows, so it splits around 80, which moves up.
  Root: [70 | 80]; children: [65 | 66] (left of 70), [71 | 74] (between 70 and 80), [91] (right of 80)
- Insert 81: goes into [91] (the node holding keys greater than 80).
  Root: [70 | 80]; children: [65 | 66], [71 | 74], [81 | 91]
- Insert 99: goes into [81 | 91].
  Root: [70 | 80]; children: [65 | 66], [71 | 74], [81 | 91 | 99] (node is full)
- Insert 82: [81 | 91 | 99] overflows, so it splits around 91, which moves up into the root [70 | 80].
  Root: [70 | 80 | 91] (node is full); children: [65 | 66], [71 | 74], [81 | 82] (left of 91), [99] (right of 91)
- Insert 75: goes into [71 | 74] (between 70 and 80).
  Root: [70 | 80 | 91]; children: [65 | 66], [71 | 74 | 75] (node is full), [81 | 82], [99]
- Insert 77: [71 | 74 | 75] overflows, so it splits around 74, which moves up into the root [70 | 80 | 91], leaving [71] and [75 | 77].
  The root would become [70 | 74 | 80 | 91], which exceeds the maximum of 3 keys, so the root itself splits around 80, which becomes the new root.
  New root: [80]
  Left child of the new root (values < 80): [70 | 74], with children:
  - < 70: [65 | 66]
  - between 70 and 74: [71]
  - between 74 and 80: [75 | 77]
  Right child of the new root (values > 80): [91], with children:
  - < 91: [81 | 82]
  - > 91: [99]
- Insert 89: goes into [81 | 82] (a child of [91]), giving [81 | 82 | 89]. The node is now full (3 keys) but has not overflowed, so no split is needed.
- Insert 56: goes into [65 | 66] (a child of [70 | 74]), giving [56 | 65 | 66]. Again the node is full but has not overflowed, so no split is needed. (In a B-tree of order 4 a node may legally hold up to 3 keys; a node is split only when an insertion would force a fourth key into it.)

Final B-Tree Structure (Order 4):

                     [ 80 ]
                   /        \
          [ 70 | 74 ]         [ 91 ]
          /     |     \        /    \
    [56|65|66] [71] [75|77] [81|82|89] [99]

* **Root:** `[80]`
* **Level 1 Children:**
  * Left child of `80` (values < 80): `[70 | 74]`
  * Right child of `80` (values > 80): `[91]`
* **Level 2 Leaf Children:**
  * Children of `[70 | 74]`: `[56 | 65 | 66]` (< 70), `[71]` (between 70 and 74), `[75 | 77]` (between 74 and 80)
  * Children of `[91]`: `[81 | 82 | 89]` (< 91), `[99]` (> 91)
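To sanity-check the result, here is a minimal Python sketch (nodes represented as (keys, children) tuples, an assumed simplification) that encodes the final tree above and verifies that every inserted key is reachable by the standard B-tree search rule:

```python
def search(node, key):
    keys, children = node
    if key in keys:
        return True
    if not children:          # reached a leaf without finding the key
        return False
    i = 0
    while i < len(keys) and key > keys[i]:
        i += 1                # pick the subtree whose key range covers `key`
    return search(children[i], key)

leaf = lambda *ks: (list(ks), [])
tree = ([80], [
    ([70, 74], [leaf(56, 65, 66), leaf(71), leaf(75, 77)]),
    ([91],     [leaf(81, 82, 89), leaf(99)]),
])

inserted = [65, 66, 70, 71, 74, 80, 91, 81, 99, 82, 75, 77, 89, 56]
assert all(search(tree, k) for k in inserted)
print("all", len(inserted), "keys reachable")
```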
10. (C) Explain different Hashing techniques. [5]
Hashing is a technique used in databases to directly map search keys to specific disk block addresses, providing very fast data retrieval (ideally O(1) average time). It involves a hash function that transforms a search key into an address, and buckets (disk blocks) where records are stored.
Hashing techniques are primarily categorized into static and dynamic hashing:
- Static Hashing:
- Concept: The number of buckets (disk blocks) is fixed when the database is created and remains constant. If the number of records grows significantly, the hash function might distribute keys unevenly, leading to a high number of collisions and performance degradation.
- Collision Resolution: Since multiple keys can map to the same bucket, techniques are needed to handle these "collisions":
- Separate Chaining: Each bucket stores a pointer to a linked list. All records that hash to the same bucket are added to this linked list. This is robust but requires extra space for pointers.
- Open Addressing: If a bucket is full, the system probes for the next available bucket in a predefined sequence (e.g., linear probing, quadratic probing, double hashing). This can lead to clustering.
- Overflow Area: A separate, dedicated area of disk blocks is maintained to store records that overflow from their primary buckets. Pointers link overflowing buckets to the overflow area.
- Dynamic Hashing:
- Concept: The number of buckets adapts dynamically to the growth or shrinkage of the database. This allows the system to maintain high performance by keeping the number of collisions low, even with varying data volumes.
- Types:
- Extendable Hashing:
- Uses a directory (an array of pointers) where each entry points to a data bucket.
- When a bucket overflows, it splits into two; if the overflowing bucket's local depth already equals the directory's global depth, the directory doubles in size. The hash function uses an increasing number of bits of the key's hash value to determine the bucket, effectively expanding the address space.
- Advantage: Provides very good average performance (usually two disk accesses for a lookup: one for the directory, one for the bucket). No complete rehashing of the table is required.
- Disadvantage: The directory can become very large for extremely large databases, potentially requiring its own disk I/O.
- Linear Hashing:
- Does not use a directory. It maintains a split pointer that points to the next bucket to be split.
- When any bucket overflows, the bucket pointed to by the split pointer is split, not necessarily the overflowing one. This leads to gradual, incremental growth of the file.
- A new hash function (using an additional bit) is used for the split bucket. The split pointer moves sequentially.
- Advantage: No directory overhead. More graceful and continuous growth compared to extendable hashing's potential directory doublings.
- Disadvantage: May temporarily have some buckets with longer overflow chains than others because the split is systematic, not always on the problematic bucket.
- Extendable Hashing:
Conclusion:
The choice between static and dynamic hashing depends on the expected database size fluctuation. Dynamic hashing is generally preferred for large, frequently changing databases due to its adaptability and consistent performance, while static hashing can be suitable for databases with a stable and predictable size.
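As a small illustration of static hashing with separate chaining, here is a minimal Python sketch with a fixed bucket count and a toy modulo hash function:

```python
NUM_BUCKETS = 4                      # fixed at "database creation" time

def bucket_of(key: int) -> int:
    return key % NUM_BUCKETS         # simple hash function

buckets = [[] for _ in range(NUM_BUCKETS)]

for key in [23, 11, 7, 15, 42, 8]:
    buckets[bucket_of(key)].append(key)    # colliding keys chain in one bucket

def lookup(key: int) -> bool:
    return key in buckets[bucket_of(key)]  # search only one bucket's chain

print(buckets)        # -> [[8], [], [42], [23, 11, 7, 15]]
print(lookup(15))     # -> True
```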