Here are the answers to the questions from the provided PDF, categorized as requested.
Group-A (Very Short Answer Type Question) - [1 x 10 = 10]
- (I) Rectangles in ER diagram represents? d) entity set
- (II) Which one is the fundamental operations in the relational algebra? d) Cartesian Product (Selection, projection, union, and set difference are also fundamental operations; the Cartesian product is considered fundamental because the various joins can be derived from it.)
- (III) Which of the following establishes a top-to-bottom relationship among the items? c) Hierarchical schema
- (IV) Which of the following is not an example of DBMS? d) Google (MySQL, Microsoft Access, and IBM DB2 are all database management systems. Google is a search engine company, though they do develop their own internal database technologies, they are not typically listed as a general-purpose DBMS product in this context.)
- (V) Which of the following statement is TRUE about SQL Injection? c) SQL Injection is a Code Penetration Technique (It involves injecting malicious SQL code.)
- (VI) What is Granularity? c) The size of record (Granularity refers to the size of the data unit, in particular the lockable unit in concurrency control; it can range from a single field or record up to a block, a file, or the whole database. Among the given options, "the size of record" is the most fitting general answer, though "the size of data item" is equally defensible.)
- (VII) Which of the following is the subset of SQL commands used to manipulate Oracle Structures, including tables? d) Data Definition Language (DDL commands like CREATE, ALTER, DROP are used to manipulate database structures.)
- (VIII) A data dictionary does not provide information about d) how data is used (A data dictionary describes the structure of the database, its objects, constraints, security. It doesn't typically track how end-users process or apply the data in their applications.)
- (IX) Which of the following set should be associated with weak entity set for weak entity to be meaningful? b) Strong entity set (A weak entity set is dependent on a strong entity set for its existence and identification.)
- (X) Which of the following is the full form of TCL? c) transaction control language
- (XI) Which of the following is TRUE about the type of SQL Injection attack? d) All of the above (SQL injection can be used to install malicious programs (via certain database features), export valuable data, and get user login details.)
- (XII) The values appearing in given attributes of any tuple in the referencing relation must likewise occur in specified attributes of at least one tuple in the referenced relation, according to the ___ integrity constraint. a) Referential
Group-B (Short Answer Type Question) - Answer any three [5 x 3 = 15]
2. Discuss the ACID properties of transaction. [5]
ACID properties are a set of characteristics that guarantee that database transactions are processed reliably. They are crucial for maintaining data integrity and consistency in a DBMS.
- Atomicity: An atomic transaction is an indivisible unit of work. It means that either all operations within the transaction are completed successfully (committed), or none of them are (rolled back). There is no partial completion. If any part of the transaction fails, the entire transaction is aborted, and the database is returned to its state before the transaction began.
- Example: In a bank transfer from account A to B, funds are debited from A and credited to B. If the debit succeeds but the credit fails, Atomicity ensures both operations are undone, and the funds are returned to A.
- Consistency: A transaction brings the database from one valid and consistent state to another. It ensures that any transaction, when executed, will leave the database in a valid state, respecting all defined rules, constraints, and triggers (e.g., primary keys, foreign keys, unique constraints). If a transaction attempts to violate any integrity rule, it will be rolled back.
- Example: If a bank transfer must maintain the sum of balances, a consistent transaction ensures that after the transfer, the total amount of money across all accounts remains the same.
- Isolation: Transactions are executed independently and concurrently without interference from other concurrent transactions. The intermediate state of a transaction is not visible to other transactions. This ensures that the concurrent execution of multiple transactions results in a system state that is equivalent to some serial (sequential) execution of those transactions.
- Example: If two transactions try to read and update the same bank account balance concurrently, Isolation ensures that one transaction doesn't see the partial update of the other, preventing dirty reads or lost updates.
- Durability: Once a transaction has been committed, its changes are permanent and will survive any subsequent system failures (e.g., power outages, crashes). The committed data is written to stable storage (like disk) and will not be lost.
- Example: After a bank transfer is committed, even if the system crashes immediately after, the changes to the account balances are permanently recorded and will be reflected when the system recovers.
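As a concrete illustration (not part of the original answer), here is a minimal Python sketch using the standard sqlite3 module: the accounts table and the failure condition are invented for the demo, but the commit/rollback pattern is exactly how atomicity and durability show up in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # hypothetical in-memory bank database
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """Debit A and credit B as one atomic unit of work."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
        if conn.execute("SELECT balance FROM accounts WHERE name = 'A'").fetchone()[0] < 0:
            raise ValueError("insufficient funds")   # simulate a failure mid-transaction
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()        # durability: committed changes persist
    except Exception:
        conn.rollback()      # atomicity: the partial debit is undone

transfer(500)   # fails and is rolled back -> balances unchanged
transfer(30)    # succeeds -> A = 70, B = 80
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
```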
3. Distinguish between locking and timestamp protocols for concurrency controls. [5]
Both locking and timestamp protocols are mechanisms used in database management systems to ensure concurrency control and maintain the ACID properties, especially isolation.
Feature | Locking Protocol (e.g., Two-Phase Locking - 2PL) | Timestamp Protocol |
---|---|---|
Mechanism | Transactions acquire locks on data items before accessing them. Locks can be shared (read) or exclusive (write). A transaction cannot access a data item if a conflicting lock is held. | Each transaction and data item is assigned a unique timestamp. Operations are ordered according to these timestamps. Older transactions take precedence. |
Conflict Handling | Blocking: If a transaction requests a lock on a data item that is already locked by a conflicting transaction, it waits until the lock is released. | Rollback: If an operation from an older transaction (smaller timestamp) tries to access a data item that has already been accessed/modified by a younger transaction (larger timestamp) in a way that violates serializability, the younger transaction is rolled back. |
Deadlock | Prone to deadlocks. Two or more transactions can get into a state where each is waiting for the other to release a lock, leading to a standstill. Deadlock detection and resolution mechanisms are required. | Not prone to deadlocks. Conflicts are resolved by rolling back transactions based on timestamps, preventing cycles of waiting. (Can have "livelock" if a transaction keeps getting rolled back). |
Rollbacks | Fewer rollbacks (primarily for deadlocks or aborts). | More rollbacks (especially if many transactions access data out of timestamp order). |
Starvation | Possible if a transaction repeatedly loses the race for a lock. | Possible (livelock) if a transaction repeatedly attempts an operation that causes it to be rolled back. |
Implementation Complexity | Relatively complex to manage locks (acquire, release, upgrade, downgrade) and detect/resolve deadlocks. | Simpler in concept, but requires careful management of read/write timestamps on data items. |
Typical Use Cases | Widely used in most commercial DBMS (e.g., Two-Phase Locking, Strict Two-Phase Locking). | Used in some specialized or distributed DBMS. Can be less efficient in highly concurrent environments due to frequent rollbacks. |
Nature | Pessimistic: Assumes conflicts will happen and tries to prevent them by blocking. | Optimistic/Pessimistic (depends on variant): Can be more optimistic, allowing operations and rolling back if a conflict is detected. |
In summary, locking protocols prevent conflicts by blocking access, which can lead to deadlocks, while timestamp protocols resolve conflicts by rolling back transactions, avoiding deadlocks but potentially increasing rollbacks.
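For the timestamp side of the comparison, the following is a minimal sketch (own naming, ignoring recoverability and the Thomas write rule) of the basic timestamp-ordering checks described above: each data item remembers the largest timestamps that have read and written it, and an operation that arrives "too late" forces a rollback instead of blocking.

```python
class DataItem:
    """Tracks the largest timestamps that have read/written this item."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    # A transaction may not read a value already overwritten by a younger transaction.
    if ts < item.write_ts:
        return "rollback"            # abort and restart with a new timestamp
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    # A transaction may not write a value that younger transactions have already read or written.
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"
    item.write_ts = ts
    return "ok"

x = DataItem()
print(write(x, ts=5))   # ok: T5 writes x
print(read(x, ts=3))    # rollback: older T3 reads a value written by younger T5
print(read(x, ts=8))    # ok: T8 reads x
print(write(x, ts=6))   # rollback: T6's write conflicts with T8's read
```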
4. What do you mean by heuristic based optimization? [5]
Heuristic-based optimization in database query processing refers to the use of rules of thumb, experience, and intuitive approaches to transform a relational algebra query into an equivalent but more efficient execution plan. These rules are not guaranteed to find the absolute optimal plan (as they don't explore all possibilities), but they typically lead to a good enough plan with significantly less computational cost than exhaustive search methods.
Query optimization is the process of choosing the most efficient way to execute a SQL query. The query optimizer typically generates multiple execution plans and selects the one with the lowest estimated cost.
Characteristics of Heuristic-based Optimization:
- Rule-based: It relies on a set of predefined rules or transformations that have been empirically found to improve query performance.
- Faster Optimization: Because it doesn't explore every possible execution path, the optimization process itself is much faster. This is crucial for interactive systems where query compilation time needs to be minimal.
- Sub-optimal Plans: It may not always produce the globally optimal execution plan. However, for most queries, it yields a plan that performs well.
- Less Resource Intensive: It doesn't require extensive statistical information about the data (like detailed histograms) that cost-based optimizers do, making it less complex to implement and manage for simpler systems.
Common Heuristic Rules (Transformation Rules):
- Perform selection operations as early as possible: Selections reduce the number of rows processed, which minimizes the data passed to subsequent operations.
- Example: Instead of joining Employees with Departments and then filtering with WHERE Salary > 50000, perform the selection first: (σ Salary > 50000 (Employees)) ⋈ Departments.
- Perform projection operations as early as possible: Projections reduce the number of columns in intermediate results, saving memory and I/O.
- Example: If you only need EmployeeName and DepartmentName, project these attributes early in the query plan.
- Combine common sub-expressions: If parts of a query are repeated, compute them once and reuse the result.
- Replace Cartesian products with joins (when possible): Cartesian products are very expensive (multiplying rows). If a join condition exists, it's always better to use a specific join operator.
- Push down selections and projections through joins: Apply selections and projections before a join whenever possible to reduce the size of the relations being joined.
- Apply operations on smaller relations first: In commutative operations (like joins), process the smaller relation first to minimize intermediate results.
- Order join operations strategically: In a multi-way join, the order in which relations are joined significantly impacts performance. Heuristics might try to join relations with restrictive join conditions or those that produce smaller intermediate results first.
Heuristic-based optimization is simpler to implement and provides a quick, reasonably good query plan. Modern DBMS optimizers often combine heuristic rules with cost-based optimization (which uses statistical information to estimate costs for various plans) to achieve better overall performance.
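A tiny illustration of the "push selections before joins" heuristic, using invented in-memory relations rather than a real optimizer: both plans return the same rows, but the heuristic plan feeds far fewer tuples into the join.

```python
# Toy relations (hypothetical data) to show why "perform selections early" helps.
employees = [{"emp_id": i, "dept_id": i % 3, "salary": 40000 + i * 100} for i in range(1000)]
departments = [{"dept_id": d, "dept_name": f"D{d}"} for d in range(3)]

def join(left, right, key):
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Plan 1: join first, then select -- the join materialises all 1000 employee rows.
plan1 = [row for row in join(employees, departments, "dept_id") if row["salary"] > 130000]

# Plan 2 (heuristic): select first, then join -- only the qualifying rows are joined.
high_paid = [e for e in employees if e["salary"] > 130000]
plan2 = join(high_paid, departments, "dept_id")

assert plan1 == plan2                        # same answer...
print(len(employees), "vs", len(high_paid))  # ...but far fewer rows flow into the join
```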
Group-C (Long Answer Type Question) - Answer any three [15 x 3 = 45]
7. (a) An organization purchases items from a number of suppliers. Suppliers are identified by SUP_ID. It keeps track of the number of each item type purchased from each supplier. It also keeps a record of supplier's addresses. Supplied items are identified by ITEM_TYPE and have description (DESC). There may be more than one such address for each supplier and the price charged by supplier for each item type is stored. Identify the entities, attributes and relationships for this organization and construct an E-R diagram. [8+7=15 marks total, but this is part A, likely 8 marks for ER diagram]
Let's break down the requirements to identify entities, attributes, and relationships, then draw the ER diagram.
Entities:
- Supplier: An organization from which items are purchased.
  - Attributes: SUP_ID (Primary Key), SupplierName (implicit, but good to add for realism)
  - Note on Address: "There may be more than one such address for each supplier" implies that Address is either a multivalued attribute or a separate entity with its own attributes (e.g., AddressType, City, Zip). For this ER diagram, let's treat it as a separate entity for better modeling, which gives a one-to-many relationship with Supplier.
- Item: The type of item purchased.
  - Attributes: ITEM_TYPE (Primary Key), Description (DESC)
- Address: To handle multiple addresses per supplier.
  - Attributes: AddressID (Primary Key, a unique identifier for each address record), Street, City, State, ZipCode
Relationships:
- Supplies: Connects Supplier and Item. This relationship tracks which supplier supplies which item, the price charged for that item type, and the quantity.
  - Relationship Type: Many-to-Many (a Supplier can supply many Items, and an Item can be supplied by many Suppliers).
  - Attributes on the M:N relationship: Price (the price charged by this supplier for this item type) and QuantityPurchased (the number of each item type purchased from each supplier; this sounds like an attribute of the purchase transaction rather than just the "supplies" relationship, but it can be associated here if it means the cumulative quantity of the interaction).
- Has_Address: Connects Supplier and Address.
  - Relationship Type: One-to-Many (a Supplier can have multiple Addresses, but an Address belongs to one Supplier).
ER Diagram Sketch (Textual Representation):
- Entities:
  - SUPPLIER (SUP_ID, SupplierName) - Strong Entity
  - ITEM (ITEM_TYPE, Description) - Strong Entity
  - ADDRESS (AddressID, Street, City, State, ZipCode) - Strong Entity (or dependent if keyed together with SUP_ID; modelled here as strong to allow reuse or independent existence, or simply to clarify that it is not just an attribute of SUPPLIER)
- Relationships:
  - SUPPLIES (between SUPPLIER and ITEM)
    - Cardinality: M:N (Many-to-Many)
    - Attributes: Price, QuantityPurchased
  - HAS_ADDRESS (between SUPPLIER and ADDRESS)
    - Cardinality: 1:N (One-to-Many; a supplier can have many addresses, but each address record belongs to one supplier)
Simplified ER Diagram (Conceptual):
+-------------+ 1 N +---------+
| SUPPLIER |<-----Has_Address----->| ADDRESS |
| - SUP_ID (PK)| | - AddressID (PK)|
| SupplierName| | Street |
+-------------+ | City |
| State |
| ZipCode |
+---------+
| N
|
| Supplies (Relationship)
| (Attributes: Price, QuantityPurchased)
|
M
+-------------+
| ITEM |
| - ITEM_TYPE (PK)|
| Description|
+-------------+
ER Diagram (Standard Notation - visual description):
- Draw a rectangle for SUPPLIER. Inside, list attributes SUP_ID (underlined as the PK) and SupplierName.
- Draw a rectangle for ITEM. Inside, list attributes ITEM_TYPE (underlined as the PK) and Description.
- Draw a rectangle for ADDRESS. Inside, list attributes AddressID (underlined as the PK), Street, City, State, ZipCode.
- Draw a diamond for the SUPPLIES relationship between SUPPLIER and ITEM.
  - Draw a line from SUPPLIER to SUPPLIES with cardinality 'M' at SUPPLIER's end.
  - Draw a line from ITEM to SUPPLIES with cardinality 'N' at ITEM's end.
  - Draw ovals for Price and QuantityPurchased attached to the SUPPLIES diamond.
- Draw a diamond for the HAS_ADDRESS relationship between SUPPLIER and ADDRESS.
  - Draw a line from SUPPLIER to HAS_ADDRESS with cardinality '1' at SUPPLIER's end.
  - Draw a line from ADDRESS to HAS_ADDRESS with cardinality 'N' at ADDRESS's end.
This structure allows for a supplier to have multiple addresses, and for multiple items to be supplied by multiple suppliers, with specific prices and quantities for each supplier-item pair.
7. (b) Insert the following entries in the B tree of order 3: 10,20,30,12,15,35,18,38,8,50 [7]
A B-Tree of order 3 (m=3) implies:
- Maximum keys per node: m-1 = 2
- Maximum children per node: m = 3
- Minimum keys per internal node (except root): ceil(m/2)-1 = ceil(3/2)-1 = 2-1 = 1
- Minimum children per internal node (except root): ceil(m/2) = 2
- Root node minimum children: 2 (if not a leaf)
Let's represent a node as [Key1 | Key2]. When a node is already full ([Key1 | Key2]) and a new key arrives, it splits: the median key moves up to the parent.
- Insert 10: [10]
- Insert 20: [10 | 20]
- Insert 30: [10 | 20 | 30] (node is full). Split: median 20 moves up.
  Root: [20]; children: [10] (left of 20), [30] (right of 20)
- Insert 12: Insert into [10] (left child of 20). [10] becomes [10 | 12].
- Insert 15: Insert into [10 | 12] (node becomes full). Split: median 12 moves up to root [20].
  Root: [12 | 20]; children: [10] (left of 12), [15] (between 12 and 20), [30] (right of 20)
- Insert 35: Insert into [30] (right child of 20). [30] becomes [30 | 35].
- Insert 18: Insert into [15] (between 12 and 20). [15] becomes [15 | 18].
- Insert 38: Insert into [30 | 35] (node becomes full). Split: median 35 moves up to root [12 | 20].
  Root [12 | 20] becomes [12 | 20 | 35], which is itself full and must also split (worked out in the detailed steps below).
  Leaf children at this point: [10], [15 | 18], [30], [38]
- Insert 8: Insert into [10] (left child of 12). [10] becomes [8 | 10].
- Insert 50: Insert into [38] (right child of 35). [38] becomes [38 | 50].
State of the B-Tree after all insertions (visual representation):
[ 20 ]
/ \
/ \
[12] [35]
/ \ / \
[8|10] [15|18] [30] [38|50]
Step-by-step with splits:
Initial: []
- Insert 10: [10]
- Insert 20: [10, 20]
- Insert 30: Node [10, 20, 30] is full. Split: median 20 moves up.
  Root: [20]; children: [10] (left of 20), [30] (right of 20)
- Insert 12: Go to [10]. [10, 12]
- Insert 15: Go to [10, 12]. Node [10, 12, 15] is full. Split: median 12 moves up to root [20].
  Root: [12, 20]; children: [10] (left of 12), [15] (between 12 and 20), [30] (right of 20)
- Insert 35: Go to [30]. [30, 35]
- Insert 18: Go to [15]. [15, 18]
- Insert 38: Go to [30, 35]. Node [30, 35, 38] is full. Split: median 35 moves up to root [12, 20].
  Root [12, 20] becomes [12, 20, 35] (the root is now full and needs to split!).
  Leaf children are: [10], [15, 18], [30], [38]
- Root split (triggered by 35 moving up): [12, 20, 35] is full. Median 20 moves up to become the new root.
  New root: [20]
  Left child of the new root: [12], with children [10] (left of 12) and [15, 18] (right of 12)
  Right child of the new root: [35], with children [30] (left of 35) and [38] (right of 35)
  Current tree structure:

        [ 20 ]
       /      \
    [12]      [35]
    /   \     /   \
  [10] [15|18] [30] [38]

- Insert 8: Go to [10]. [8, 10]
- Insert 50: Go to [38]. [38, 50]
Final B-Tree of Order 3:
[ 20 ]
/ \
/ \
[12] [35]
/ \ / \
[8|10] [15|18] [30] [38|50]
This is the final structure. Each node has 1 or 2 keys, and internal nodes have 2 or 3 children, satisfying the B-tree of order 3 properties.
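As a cross-check (not required by the question), here is a small Python sketch of order-3 B-tree insertion with median-up splitting; the Node class and function names are my own, and running it on the given key sequence reproduces the tree above.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for leaves

    def is_leaf(self):
        return not self.children

ORDER = 3             # maximum children per node
MAX_KEYS = ORDER - 1  # maximum keys per node

def insert(root, key):
    """Insert key, returning the (possibly new) root."""
    promoted = _insert(root, key)
    if promoted is None:
        return root
    mid, left, right = promoted
    return Node([mid], [left, right])    # the tree grows one level at the root

def _insert(node, key):
    """Insert into a subtree; return (median, left, right) if the node split."""
    if node.is_leaf():
        node.keys.append(key)
        node.keys.sort()
    else:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                        # choose the child whose range contains the key
        promoted = _insert(node.children[i], key)
        if promoted is None:
            return None
        mid, left, right = promoted
        node.keys.insert(i, mid)
        node.children[i:i + 1] = [left, right]
    if len(node.keys) <= MAX_KEYS:
        return None
    m = len(node.keys) // 2               # overflow: split around the median key
    left = Node(node.keys[:m], node.children[:m + 1])
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    return node.keys[m], left, right

def show(node, depth=0):
    print("  " * depth + str(node.keys))
    for child in node.children:
        show(child, depth + 1)

root = Node()
for k in [10, 20, 30, 12, 15, 35, 18, 38, 8, 50]:
    root = insert(root, k)
show(root)   # expected: [20] / [12] [35] / [8,10] [15,18] [30] [38,50]
```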
8. a) Discuss the ACID properties of transaction. [5]
(This question is a duplicate of 2.a in Group-B. I will provide the same answer here, but if answering in an exam, one would point out the duplication or simply provide the answer once.)
ACID properties are a set of characteristics that guarantee that database transactions are processed reliably. They are crucial for maintaining data integrity and consistency in a DBMS.
- Atomicity: An atomic transaction is an indivisible unit of work. It means that either all operations within the transaction are completed successfully (committed), or none of them are (rolled back). There is no partial completion. If any part of the transaction fails, the entire transaction is aborted, and the database is returned to its state before the transaction began.
- Example: In a bank transfer from account A to B, funds are debited from A and credited to B. If the debit succeeds but the credit fails, Atomicity ensures both operations are undone, and the funds are returned to A.
- Consistency: A transaction brings the database from one valid and consistent state to another. It ensures that any transaction, when executed, will leave the database in a valid state, respecting all defined rules, constraints, and triggers (e.g., primary keys, foreign keys, unique constraints). If a transaction attempts to violate any integrity rule, it will be rolled back.
- Example: If a bank transfer must maintain the sum of balances, a consistent transaction ensures that after the transfer, the total amount of money across all accounts remains the same.
- Isolation: Transactions are executed independently and concurrently without interference from other concurrent transactions. The intermediate state of a transaction is not visible to other transactions. This ensures that the concurrent execution of multiple transactions results in a system state that is equivalent to some serial (sequential) execution of those transactions.
- Example: If two transactions try to read and update the same bank account balance concurrently, Isolation ensures that one transaction doesn't see the partial update of the other, preventing dirty reads or lost updates.
- Durability: Once a transaction has been committed, its changes are permanent and will survive any subsequent system failures (e.g., power outages, crashes). The committed data is written to stable storage (like disk) and will not be lost.
- Example: After a bank transfer is committed, even if the system crashes immediately after, the changes to the account balances are permanently recorded and will be reflected when the system recovers.
8. b) Describe the Wait-die and wound-wait protocols for deadlock prevention. [5]
Wait-die and Wound-wait are two different deadlock prevention schemes used in concurrency control, specifically for handling resource requests in a way that avoids deadlocks by rolling back one of the transactions. They are based on the concept of timestamps, where each transaction is assigned a unique timestamp upon its creation (usually its start time). A smaller timestamp indicates an older transaction.
Let Ti be the transaction requesting a resource held by Tj.
TS(Ti) is the timestamp of Ti, and TS(Tj) is the timestamp of Tj.
- Wait-Die Scheme (Non-Preemptive):
  - Rule: If TS(Ti) < TS(Tj) (Ti is older than Tj), then Ti is allowed to wait for Tj to release the resource. If TS(Ti) > TS(Tj) (Ti is younger than Tj), then Ti is rolled back (dies) and restarted later with its original timestamp.
  - Logic: An older transaction (Ti) makes a younger transaction (Tj) wait for the resource. A younger transaction (Ti) is never allowed to wait for an older transaction (Tj); instead, it dies to prevent a cycle.
  - Analogy: An old person waits for a young person to finish. A young person, on encountering an old person using something, steps aside and comes back later.
  - Advantage: Prevents deadlocks. Ensures that only older transactions wait for younger ones, avoiding a circular wait.
  - Disadvantage: Can lead to unnecessary rollbacks (especially if a younger transaction might have quickly finished and released the resource) and potentially starvation for younger transactions if they repeatedly request resources held by older ones and keep getting rolled back.
- Wound-Wait Scheme (Preemptive):
  - Rule: If TS(Ti) < TS(Tj) (Ti is older than Tj), then Ti wounds Tj (i.e., Tj is rolled back and restarts with its original timestamp); after Tj is rolled back, Ti acquires the resource. If TS(Ti) > TS(Tj) (Ti is younger than Tj), then Ti waits for Tj to release the resource.
  - Logic: An older transaction (Ti) has priority and will preempt (wound) a younger transaction (Tj) holding a needed resource. A younger transaction (Ti) must always wait for an older transaction (Tj).
  - Analogy: An old person takes priority and asks a young person to give up the resource. A young person waits for an old person.
  - Advantage: Prevents deadlocks. Guarantees no starvation for older transactions (they never wait for younger ones).
  - Disadvantage: Can lead to cascading rollbacks if the wounded transaction had already performed visible actions. A transaction might be rolled back even if it could have finished soon.
Comparison Summary:
Feature | Wait-Die | Wound-Wait |
---|---|---|
Requesting Tx (Ti) is Older | Ti Waits | Ti Wounds (rolls back) Tj |
Requesting Tx (Ti) is Younger | Ti Dies (rolls back) | Ti Waits |
Rollback Trigger | Younger transaction requesting a resource from older | Older transaction requesting a resource from younger |
Starvation | Younger transactions might starve (repeated rollbacks) | Older transactions are guaranteed to proceed (no starvation for older) |
Preemption | Non-preemptive | Preemptive |
Both schemes ensure deadlock freedom. Wait-die generally leads to fewer rollbacks than wound-wait on average, but Wound-wait ensures that older transactions eventually finish.
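The two rules can be summarised in a few lines of Python (a sketch with invented function names, where a smaller timestamp means an older transaction):

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: older requesters wait, younger requesters die (are rolled back)."""
    return "wait" if ts_requester < ts_holder else "rollback requester"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: older requesters wound (roll back) the holder, younger requesters wait."""
    return "rollback holder" if ts_requester < ts_holder else "wait"

# Ti (timestamp 5) requests a lock held by Tj (timestamp 9): Ti is older.
print(wait_die(5, 9), "|", wound_wait(5, 9))   # wait | rollback holder
# Ti (timestamp 9) requests a lock held by Tj (timestamp 5): Ti is younger.
print(wait_die(9, 5), "|", wound_wait(9, 5))   # rollback requester | wait
```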
8. c) Describe B tree with example. [5]
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is optimized for systems that read and write large blocks of data, making it ideal for disk-based storage in database management systems and file systems.
Key Properties of a B-Tree of Order m (where m is the maximum number of children a node can have):
- All leaf nodes are at the same level (balanced): This ensures consistent search times for any record.
- Each internal node (except the root) has at least ceil(m/2) children.
- The root node has at least 2 children (unless it is a leaf node itself, i.e., the tree contains only one node).
- A non-leaf node with k keys has k+1 children.
- All keys within a node are kept in sorted order.
- Keys in a node act as separation values: if a node has keys K1, K2, ..., Kk and k+1 child pointers P0, P1, ..., Pk, then:
  - P0 points to a subtree containing keys less than K1.
  - Pi points to a subtree containing keys greater than Ki and less than Ki+1 (for 0 < i < k).
  - Pk points to a subtree containing keys greater than Kk.
- Nodes typically store both keys and pointers to the actual data records.
Example of a B-Tree (Order 3, same as 7.b):
Let's insert the keys 10, 20, 30, 12, 15, 35, 18, 38, 8, 50 into a B-tree of order 3. (This example is identical to question 7.b, so the resulting tree will be the same.)
Order m = 3 implies:
- Max keys per node: m-1 = 2
- Max children per node: m = 3
- Min keys per internal node (except root): ceil(3/2)-1 = 2-1 = 1
- Min children per internal node (except root): ceil(3/2) = 2
Insertion Steps (summarized from 7.b for brevity):
- [10]
- [10, 20]
- [10, 20, 30] splits; 20 moves up. Root: [20]; children: [10], [30]
- 12 inserted into [10] -> [10, 12]
- 15 inserted into [10, 12]; the node splits; 12 moves up to [20]. Root: [12, 20]; children: [10], [15], [30]
- 35 inserted into [30] -> [30, 35]
- 18 inserted into [15] -> [15, 18]
- 38 inserted into [30, 35]; the node splits; 35 moves up to [12, 20]. Root [12, 20] becomes [12, 20, 35] (full), so the root splits: 20 moves up to become the new root. New root: [20]; left child: [12] (with children [10] and [15, 18]); right child: [35] (with children [30] and [38])
- 8 inserted into [10] -> [8, 10]
- 50 inserted into [38] -> [38, 50]
Resulting B-Tree (Order 3):
[ 20 ]
/ \
/ \
[12] [35]
/ \ / \
[8|10] [15|18] [30] [38|50]
- The root [20] has one key and two children ([12] and [35]).
- The internal nodes [12] and [35] each have one key and two children, satisfying the minimum requirement.
- All leaf nodes ([8|10], [15|18], [30], [38|50]) are at the same level.
- Each node has its keys in sorted order.
- The keys guide the search: e.g., to find 15, start at [20]; 15 < 20, so go left to [12]; 15 > 12, so go right to [15|18]. Found.
B-trees are highly efficient for disk-based databases because they minimize the number of disk accesses required for search, insertion, and deletion operations by keeping the tree shallow and wide.
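To illustrate the search path described above, here is a short sketch of B-tree lookup; it assumes the Node class and the root built in the insertion sketch after question 7(b).

```python
def search(node, key):
    """Descend from the root, using the keys as separators (as in the lookup for 15 above)."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                                   # move right past smaller separator keys
    if i < len(node.keys) and node.keys[i] == key:
        return True                              # found in this node
    if node.is_leaf():
        return False                             # nowhere left to descend
    return search(node.children[i], key)         # follow the matching child pointer

print(search(root, 15), search(root, 99))        # True False (using the tree built in 7.b)
```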
9. a) Why is the optimization of a query needed? Write the different steps in query optimization. What do you mean by heuristic based optimization? [7] (Question 9 carries 15 marks in total across parts a, b and c; for this part, assume a split of roughly 3 marks for the need, 2 for the steps and 2 for heuristic optimization.)
Why is the optimization of a query needed? [3 marks]
Query optimization is a critical component of any database management system for several reasons:
- Performance: The primary goal is to find the most efficient execution plan for a given SQL query, minimizing the time it takes to retrieve the results. A poorly optimized query can take orders of magnitude longer to execute than an optimized one, severely impacting application responsiveness and user experience.
- Resource Utilization: Optimization aims to reduce the consumption of system resources such as CPU cycles, memory, and disk I/O. Efficient resource usage allows the database to handle more concurrent users and queries, improving overall system throughput.
- Scalability: As data volumes grow and user loads increase, optimized queries are essential for maintaining performance and ensuring that the database system can scale effectively without becoming a bottleneck.
- Transparency to Users: Users write queries in a declarative language (SQL), specifying what data they want, not how to retrieve it. The query optimizer's job is to translate this declarative request into an efficient procedural execution plan, abstracting away the complexities of data storage and access methods from the user.
- Cost Reduction: In cloud environments, where resources are often billed based on usage (CPU, I/O), query optimization directly translates to cost savings by reducing the compute and storage resources required.
Without optimization, even simple queries on large datasets could become prohibitively slow, making the database unusable for practical applications.
Write the different steps in query optimization. [2 marks]
The query optimization process typically involves several phases:
- Parsing and Translation: The SQL query is parsed into a syntax tree and then translated into an internal representation, often a relational algebra expression.
- Canonical Form Generation: The relational algebra expression is transformed into a canonical (standard) form. This involves applying a set of equivalence rules to simplify the expression and eliminate redundant operations.
- Logical Query Plan Generation (Heuristic Optimization): Heuristic rules are applied to the relational algebra expression to generate equivalent, but potentially more efficient, logical query plans. This phase aims to reorder operations (e.g., selections and projections performed early) and combine operations.
- Physical Query Plan Generation (Cost-based Optimization): For each logical plan, the optimizer considers various physical implementation strategies (e.g., specific join algorithms like nested-loop, hash-join, merge-join; access paths like index scan, table scan). It uses statistics about the data (e.g., number of rows, index availability, data distribution) to estimate the cost (CPU, I/O, network) of each physical plan and selects the one with the lowest estimated cost.
- Code Generation and Execution: The chosen physical plan is translated into executable code that the database execution engine can run.
What do you mean by heuristic based optimization? [2 marks]
(This question is a duplicate of 4.a in Group-B. I will provide the same answer here, but if answering in an exam, one would point out the duplication or simply provide the answer once.)
Heuristic-based optimization in database query processing refers to the use of rules of thumb, experience, and intuitive approaches to transform a relational algebra query into an equivalent but more efficient execution plan. These rules are not guaranteed to find the absolute optimal plan (as they don't explore all possibilities), but they typically lead to a good enough plan with significantly less computational cost than exhaustive search methods.
Characteristics:
- Rule-based: It relies on a set of predefined rules or transformations (e.g., "perform selection operations as early as possible").
- Faster Optimization: It doesn't explore every possible execution path, making the optimization process quicker.
- Sub-optimal Plans: It may not always produce the globally optimal execution plan, but generally yields good performance.
- Less Resource Intensive: It doesn't require extensive statistical information about the data, making it less complex to implement compared to cost-based optimizers.
9. b) What is meant by Granularity of Locking? [4]
Granularity of locking refers to the size of the data item (or resource) that is being locked by a transaction. It determines the extent of data that is inaccessible to other concurrent transactions when a lock is held. In a database, data can be locked at various levels, from very fine-grained to very coarse-grained.
Levels of Granularity (from coarsest to finest):
- Database Level: Locking the entire database. Only one transaction can access the database at a time. This provides high consistency but severely limits concurrency. (e.g., for schema changes or maintenance).
- File/Table Level: Locking an entire file or a table. All records within that file/table are locked. High concurrency is prevented for that specific table.
- Block/Page Level: Locking an entire disk block or page. This allows concurrent access to different blocks/pages within the same table. This is a common compromise.
- Record/Tuple Level: Locking individual records (rows) within a table. This allows maximum concurrency, as different transactions can access different records in the same table concurrently.
- Field/Attribute Level: Locking individual fields (columns) within a record. This provides the highest level of concurrency but is very complex to implement and manage due to the overhead of many small locks.
Trade-offs of Granularity:
- Coarse Granularity (e.g., Table Level):
- Pros: Lower locking overhead (fewer locks to manage), simpler to implement.
- Cons: Reduces concurrency significantly (even if transactions need different parts of the table, they might be blocked), leading to lower throughput.
- Fine Granularity (e.g., Record Level):
- Pros: Maximizes concurrency (more transactions can execute in parallel), leading to higher throughput.
- Cons: Higher locking overhead (more locks to manage, more storage for lock tables), more complex to implement (e.g., issues like "phantom reads" can arise).
DBMS typically use a multiple granularity locking protocol that allows transactions to lock at different levels, often with an intention lock mechanism. An intention lock (e.g., Intention Shared, Intention Exclusive) on a higher-level node (like a table) indicates that a lock will be placed on a lower-level node (like a record), preventing other transactions from acquiring conflicting coarse-grained locks on the higher level. This provides flexibility and a good balance between concurrency and overhead.
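As an illustration of the intention-lock idea (a sketch with invented names, using the standard IS/IX/S/X compatibility matrix and omitting SIX): a transaction that will lock records first takes an intention lock on the table, and a table-level request is granted only if it is compatible with every lock already held there.

```python
# Lock-mode compatibility for multiple-granularity locking (IS, IX, S, X).
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
}

def can_grant(requested, held_modes):
    """A table-level lock is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(requested, h)] for h in held_modes)

# T1 updates one row (IX on the table), T2 wants to read another row (IS): allowed.
print(can_grant("IS", ["IX"]))   # True  -> fine-grained readers and writers coexist
# T3 wants to lock the whole table exclusively while T1 still holds IX: blocked.
print(can_grant("X", ["IX"]))    # False -> the coarse lock must wait
```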
9. c) Briefly discuss different deadlock avoidance technique in concurrent transactions. [4]
Deadlock avoidance techniques ensure that a system will never enter a deadlocked state by checking the resource allocation state before allowing a transaction to proceed. Unlike deadlock prevention, which imposes rigid rules, avoidance techniques dynamically evaluate each request.
The core idea is to always ensure that there is at least one safe sequence of transactions (a sequence in which all transactions can complete without a deadlock).
Two common deadlock avoidance techniques based on timestamps are:
-
Wait-Die Scheme:
- Concept: A transaction Ti requests a resource held by Tj.
- If
TS(Ti) < TS(Tj)
(Ti is older), Ti is allowed to wait for Tj. - If
TS(Ti) > TS(Tj)
(Ti is younger), Ti is immediately rolled back (dies) and restarted later with its original timestamp. - Logic: Older transactions are allowed to wait for younger ones, but younger transactions must die if they request a resource held by an older transaction. This prevents a "circular wait" condition where a younger transaction waits for an older one, which could eventually lead to a cycle.
-
Wound-Wait Scheme:
- Concept: A transaction Ti requests a resource held by Tj.
- If
TS(Ti) < TS(Tj)
(Ti is older), Ti wounds (rolls back) Tj. Tj is restarted later with its original timestamp. Ti then acquires the resource. - If
TS(Ti) > TS(Tj)
(Ti is younger), Ti waits for Tj to release the resource. - Logic: An older transaction always takes precedence; it can preempt (wound) a younger transaction. A younger transaction always waits for an older one. This ensures that older transactions eventually complete without being blocked indefinitely by younger ones.
General Characteristics of Deadlock Avoidance:
- Requires A Priori Information: These techniques often require information about future resource requests (or at least knowledge of which transactions are currently holding/requesting resources) to make decisions. Timestamp-based methods simplify this by using the timestamp as the decision criterion.
- No Deadlocks: By definition, a deadlocked state is never reached.
- Potential for Rollbacks: Both wait-die and wound-wait achieve avoidance by rolling back transactions, which can be costly.
- Overhead: There is computational overhead associated with checking and potentially rolling back transactions.
Compared to deadlock prevention (which uses strict rules like no preemption or resource ordering) and deadlock detection (which allows deadlocks to occur and then identifies and resolves them), avoidance techniques attempt to prevent deadlocks dynamically while allowing more concurrency than strict prevention rules.
10. Write short notes (any three) [5+5+5=15]
(i) Fifth Normal Form (5NF)
- Definition: Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is the highest level of database normalization. A relation is in 5NF if it is in 4NF and if every join dependency in it is implied by the candidate keys. It aims to eliminate redundancy that arises from complex relationships that cannot be captured by functional or multi-valued dependencies alone.
- Purpose: 5NF is designed to handle relations where there are multiple independent many-to-many relationships, which might cause "join dependencies" that can lead to redundancy if not decomposed. It ensures that decomposition is done only if it is truly lossless and necessary, meaning no information is lost and no spurious tuples are generated when the decomposed tables are joined back.
- Example (Classic): Consider a relation Supplier_Part_Project (Supplier, Part, Project).
  - A supplier supplies certain parts.
  - A part is used in certain projects.
  - A supplier is involved in certain projects.
  - If the rule is: "A supplier S supplies part P for project J if and only if S supplies P, P is used in J, and S is involved in J," then this suggests a join dependency * ( {Supplier, Part}, {Part, Project}, {Supplier, Project} ).
  - If this relation is not in 5NF, one entry can imply the existence of others, causing redundancy. To achieve 5NF, Supplier_Part_Project would be decomposed into the three binary relations Supplies(Supplier, Part), UsedIn(Part, Project), and InvolvedIn(Supplier, Project). Joining these three relations should reconstruct the original table without spurious tuples.
- Practicality: 5NF is rarely applied in practice. The redundancy it addresses is highly specific and often arises from very complex real-world constraints that are difficult to model. Achieving 5NF usually results in a large number of tables, increasing join complexity and potentially impacting query performance. Most practical databases are normalized to 3NF or BCNF.
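A small sketch (with invented tuples) of why the 5NF decomposition is lossless when the join dependency holds: projecting Supplier_Part_Project onto the three binary relations and naturally joining them back reproduces exactly the original tuples.

```python
# Hypothetical Supplier_Part_Project tuples that satisfy the join dependency described above.
spj = {("S1", "P1", "J1"), ("S1", "P2", "J1"), ("S2", "P1", "J1"), ("S1", "P1", "J2")}

# The 5NF decomposition into three binary projections.
supplies    = {(s, p) for s, p, j in spj}
used_in     = {(p, j) for s, p, j in spj}
involved_in = {(s, j) for s, p, j in spj}

# Natural join of the three projections.
rejoined = {(s, p, j)
            for s, p in supplies
            for p2, j in used_in if p == p2
            for s2, j2 in involved_in if s == s2 and j == j2}

print(rejoined == spj)   # True only when the join dependency really holds in the data
```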
(ii) View
- Definition: A view (also known as a virtual table) is a logical table based on the result set of a SQL query. It does not store data itself but rather the definition of how to retrieve the data. When a view is queried, the DBMS executes the underlying query and presents the result as if it were a physical table.
- Purpose/Advantages:
- Security: Views can restrict access to specific rows and/or columns of a table, providing a granular level of security without granting direct access to the base tables.
- Simplification: Complex queries involving joins, aggregations, or subqueries can be encapsulated within a view, simplifying future data access for users and applications.
- Data Independence: Views provide a layer of abstraction between the application and the underlying physical structure of the tables. If the base table schema changes (e.g., a column name changes), the view definition can be updated without affecting applications that query the view.
- Consistency: Views can present a consistent, standardized representation of data, even if the underlying tables are complex or frequently modified.
- Business Logic: Views can embed business rules and calculations, ensuring that data is consistently presented and derived.
- Types:
- Simple View: Based on a single table, without aggregate functions or GROUP BY clauses. Often updatable.
- Complex View: Based on multiple tables, contains aggregate functions, GROUP BY, or DISTINCT. Generally not directly updatable.
- Example: Consider Employees (EmpID, Name, Salary, DeptID) and Departments (DeptID, DeptName).

  CREATE VIEW EmployeeDetails AS
  SELECT E.EmpID, E.Name, E.Salary, D.DeptName
  FROM Employees E
  JOIN Departments D ON E.DeptID = D.DeptID;

  Now, users can run SELECT * FROM EmployeeDetails; to get combined employee and department information without needing to write the join themselves.
(iii) Trigger in database
- Definition: A trigger is a special type of stored procedure that is automatically executed (fired) when a specific data modification event occurs in a database table or view. These events can be
INSERT
,UPDATE
, orDELETE
operations. Triggers can be defined to fireBEFORE
orAFTER
the event. - Purpose/Advantages:
- Enforce Complex Business Rules: Triggers can enforce complex integrity constraints that cannot be expressed easily (or at all) using standard declarative constraints (like primary keys, foreign keys, or check constraints).
- Audit Logging: Automatically record changes made to data, useful for auditing, tracking history, or maintaining data integrity.
- Data Synchronization: Automatically update or synchronize data in other tables based on changes in the current table (e.g., maintaining aggregate summary tables).
- Automated Actions: Perform automatic actions in response to data modifications, such as sending notifications or adjusting related data.
- Derived Data Management: Automatically compute and update derived data in other tables when source data changes.
- Types:
- Row-Level Trigger: Fires once for each row affected by the triggering DML statement. (e.g.,
FOR EACH ROW
). - Statement-Level Trigger: Fires once per DML statement, regardless of the number of rows affected. (e.g.,
FOR EACH STATEMENT
).
- Row-Level Trigger: Fires once for each row affected by the triggering DML statement. (e.g.,
-
Example (Oracle PL/SQL syntax):
Suppose you want to maintain a log of salary changes for employees.
CREATE TABLE Salary_Audit_Log (
    LogID NUMBER GENERATED ALWAYS AS IDENTITY,
    EmpID NUMBER,
    Old_Salary NUMBER,
    New_Salary NUMBER,
    Change_Date TIMESTAMP DEFAULT SYSTIMESTAMP
);

CREATE OR REPLACE TRIGGER trg_salary_audit
AFTER UPDATE OF Salary ON Employees
FOR EACH ROW
WHEN (OLD.Salary <> NEW.Salary)
BEGIN
    INSERT INTO Salary_Audit_Log (EmpID, Old_Salary, New_Salary)
    VALUES (:OLD.EmpID, :OLD.Salary, :NEW.Salary);
END;
/
This trigger will automatically insert a row into
Salary_Audit_Log
every time an employee's salary is updated and the new salary is different from the old one.
(iv) Database recovery
- Definition: Database recovery refers to the process of restoring a database to a correct and consistent state after a system failure (e.g., hardware failure, software error, power outage, media failure, transaction failure). The goal is to ensure data durability and consistency, meaning that all committed transactions are reflected in the database and incomplete transactions are undone.
- Key Concepts:
- Transaction Log (Journal): A fundamental component of recovery. It records all changes made to the database, including undo (previous state) and redo (new state) information for each operation, as well as transaction start/commit/abort records.
- Checkpoints: A mechanism where the DBMS periodically halts all activities, forces all modified buffer blocks (dirty pages) to disk, and records a checkpoint record in the log. Checkpoints reduce the amount of log scanning needed during recovery.
- Backup: A copy of the entire database at a specific point in time. Used primarily for recovering from media failures.
- Recovery Procedures:
- Deferred Update (NO-UNDO/REDO): Changes are written to the database only after a transaction commits. If a crash occurs before commit, no undo is needed. Redo is needed for committed transactions.
- Immediate Update (UNDO/REDO): Changes are written to the database as they occur. If a crash occurs, committed transactions need redo, and uncommitted transactions need undo.
- Failure Types and Recovery Actions:
- Transaction Failure: A transaction aborts or is rolled back (e.g., due to logical error, deadlock, constraint violation). Recovery involves undoing the changes made by that specific transaction using the log.
- System Crash (Soft Crash): A hardware or software error causes the system to stop without damaging disk contents. Recovery involves:
- Redoing changes of committed transactions (those committed after the last checkpoint).
- Undoing changes of uncommitted transactions (those active at the time of crash).
- Media Failure (Hard Crash): A non-recoverable error (e.g., disk head crash) damages part of the disk. Recovery involves:
- Restoring the database from the most recent backup.
- Redoing all committed transactions from the log that occurred after the backup was taken.
- Importance: Database recovery ensures business continuity and data integrity, as data is one of the most valuable assets of an organization.
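A minimal sketch of log-based recovery under the immediate-update (UNDO/REDO) scheme; the log record format and values are invented, but the redo-committed / undo-uncommitted passes follow the procedure described above.

```python
# Toy write-ahead log: each update record carries the old value (for UNDO)
# and the new value (for REDO). The record format is invented for the demo.
log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 100, 80), ("COMMIT", "T1"),
    ("START", "T2"), ("UPDATE", "T2", "B", 50, 70),          # T2 never committed
]
db = {"A": 80, "B": 70}          # state on disk at the time of the crash

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

# REDO phase: reapply every update of committed transactions, in log order.
for rec in log:
    if rec[0] == "UPDATE" and rec[1] in committed:
        _, _, item, old, new = rec
        db[item] = new

# UNDO phase: roll back uncommitted transactions, scanning the log backwards.
for rec in reversed(log):
    if rec[0] == "UPDATE" and rec[1] not in committed:
        _, _, item, old, new = rec
        db[item] = old

print(db)   # {'A': 80, 'B': 50} -> T1's change kept, T2's change undone
```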
11. (A) What is the difference between vertical and horizontal fragmentation. [5+5+5 = 15 marks total, but this is part A, likely 5 marks]
Fragmentation is the process of breaking a database relation into smaller, more manageable pieces (fragments) that can be distributed across different sites in a distributed database system. This is done to improve performance, availability, and parallelism.
Feature | Vertical Fragmentation | Horizontal Fragmentation |
---|---|---|
How data is divided | Divides a table's columns (attributes) into groups. Each fragment contains a subset of the original table's columns and all original rows. | Divides a table's rows (tuples) into groups. Each fragment contains a subset of the original table's rows and all original columns. |
Basis of division | Based on functional dependencies or access patterns, grouping frequently accessed columns together. | Based on predicates (conditions) on one or more attributes of the table. |
Reconstruction | The original table can be reconstructed by performing a natural join operation on the fragments (using the primary key, which must be replicated in all fragments). | The original table can be reconstructed by performing a union operation on the fragments. |
Primary Key Handling | The primary key (or a unique identifier) must be included in every fragment to allow for joining. | The primary key is included in each fragment, as each fragment is just a subset of rows. |
Typical Use Case | When different applications or user groups need access to different subsets of columns from the same table (e.g., one department needs employee's personal info, another needs their professional info). | When different applications or user groups primarily deal with different subsets of rows (e.g., sales data for different regions, historical data vs. current data). |
Example | Employee (EmpID, Name, Salary, DeptID, Address) could be vertically fragmented into EmpPersonal (EmpID, Name, Address) and EmpWork (EmpID, Salary, DeptID). | Sales (SaleID, Region, Amount, Date) could be horizontally fragmented into Sales_North (WHERE Region = 'North') and Sales_South (WHERE Region = 'South'). |
Overhead | Requires joins for reconstruction. | Requires unions for reconstruction. |
Data Redundancy | Primary key is duplicated across fragments. | No duplication of rows (if fragmentation is disjoint). |
Both fragmentation types can be combined (mixed fragmentation) to create more complex distributed database designs. The choice of fragmentation strategy depends on the specific query patterns, transaction workloads, and distribution requirements of the application.
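A small in-memory sketch (invented rows and column names) showing both fragmentation styles and their reconstruction rules: a union for horizontal fragments, and a join on the replicated primary key for vertical fragments.

```python
# Hypothetical Sales and Employee rows used to illustrate both fragmentation styles.
sales = [
    {"sale_id": 1, "region": "North", "amount": 200},
    {"sale_id": 2, "region": "South", "amount": 150},
]
employees = [{"emp_id": 1, "name": "Asha", "address": "Pune", "salary": 50000, "dept_id": 10}]

# Horizontal fragmentation: split rows by a predicate; reconstruct with a union.
sales_north = [r for r in sales if r["region"] == "North"]
sales_south = [r for r in sales if r["region"] == "South"]
assert sorted(sales_north + sales_south, key=lambda r: r["sale_id"]) == sales

# Vertical fragmentation: split columns, repeating the primary key; reconstruct with a join.
emp_personal = [{"emp_id": e["emp_id"], "name": e["name"], "address": e["address"]} for e in employees]
emp_work     = [{"emp_id": e["emp_id"], "salary": e["salary"], "dept_id": e["dept_id"]} for e in employees]
rejoined = [{**p, **w} for p in emp_personal for w in emp_work if p["emp_id"] == w["emp_id"]]
assert rejoined == employees
print("both reconstructions succeeded")
```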
11. (B) Write short notes on Distributed database management system. [5]
A Distributed Database Management System (DDBMS) is a system that allows a single logical database to be spread physically across multiple interconnected computer systems (nodes) located in different geographical locations, often over a network. Users interact with the DDBMS as if it were a single, centralized database, without knowing where the data is actually stored.
Key Characteristics:
- Data Distribution: Data is fragmented and stored across multiple sites. This can be horizontal (rows), vertical (columns), or a combination.
- Transparency: A well-designed DDBMS provides various forms of transparency to users:
- Location Transparency: Users don't need to know where the data is physically stored.
- Fragmentation Transparency: Users don't need to know that the data is fragmented.
- Replication Transparency: Users don't need to know if data is replicated (copies of data at multiple sites for availability/performance).
- Concurrency Transparency: Concurrent transactions are managed to ensure data integrity and isolation.
- Failure Transparency: The system recovers from failures without user intervention, masking the failure's impact.
- Distributed Transaction Management: Handles transactions that span multiple sites, ensuring atomicity, consistency, isolation, and durability across the distributed environment (e.g., using Two-Phase Commit protocol).
- Distributed Query Processing: Queries may involve data from multiple sites. The DDBMS must optimize query execution by minimizing data transfer across the network.
- Replication: Data can be duplicated across multiple sites to improve availability and read performance, but introduces challenges in maintaining consistency during updates.
Advantages of DDBMS:
- Improved Availability: If one site fails, other sites can continue operating.
- Increased Reliability: Data replication ensures data is not lost if a single site fails.
- Enhanced Performance: Queries can be processed in parallel across multiple sites, and data can be placed closer to the users who access it most frequently (data locality).
- Scalability and Modularity: The system can be scaled by adding new sites or easily adapted to organizational structure changes.
- Reduced Network Traffic: By keeping data close to its primary users, less data needs to be transferred across the network for local operations.
Disadvantages of DDBMS:
- Increased Complexity: Design, implementation, and management are significantly more complex than centralized systems.
- Cost: Higher development, deployment, and maintenance costs.
- Security Concerns: Managing security across distributed sites can be challenging.
- Concurrency Control and Deadlock Management: More complex due to distributed nature.
Examples of DDBMS include Oracle RAC, Apache Cassandra, MongoDB (for sharding/replication), and various specialized distributed database systems.
11. (C) Write short notes on Web based database management system. [5]
A Web-based Database Management System (WDBMS) refers to a database system that is accessed and managed primarily through a web browser interface, leveraging web technologies (like HTTP, HTML, CSS, JavaScript) to interact with the underlying database. It essentially means the database's functionalities, data access, and often administration, are exposed via a web application.
Key Characteristics and Architecture:
- Client-Server Architecture (typically 3-tier or N-tier):
- Web Browser (Client): The user interface (HTML, CSS, JavaScript) runs in a standard web browser, eliminating the need for special client software installation.
- Web Server: Handles HTTP requests from the browser, serves web pages, and acts as an intermediary. (e.g., Apache, Nginx, IIS).
- Application Server (Middleware): Contains the business logic and connects the web server to the database. It processes requests, executes code (e.g., PHP, Python/Django, Java/Spring, Node.js/Express), and interacts with the database.
- Database Server: The actual DBMS (e.g., MySQL, PostgreSQL, Oracle, SQL Server) where the data is stored and managed.
- Standard Web Protocols: Communication between the browser, web server, and application server typically uses standard web protocols like HTTP/HTTPS.
- Platform Independence: Since access is via a web browser, the system is largely independent of the client's operating system or hardware.
- Accessibility: Data and functionalities are accessible from anywhere with an internet connection, promoting remote work and collaboration.
- Scalability: Web applications can be scaled by adding more web/application servers or using load balancers to distribute traffic.
Advantages:
- Ubiquitous Access: Accessible from any device with a web browser and internet connection.
- No Client-Side Installation: Reduces deployment and maintenance overhead for client applications.
- Ease of Deployment and Updates: Updates to the application logic are deployed centrally on the server, immediately benefiting all users.
- Lower Total Cost of Ownership (TCO): Reduced client software costs and simplified administration.
- Standardized Interface: Uses familiar web browser interface.
Disadvantages:
- Security Concerns: Exposing the database over the web introduces security vulnerabilities (e.g., SQL Injection, XSS), requiring robust security measures.
- Performance Overhead: Web protocols (HTTP) can be chatty, and latency over the internet can impact performance compared to desktop applications.
- Browser Compatibility: Ensuring consistent functionality and appearance across different web browsers can be challenging.
- Limited Richness of UI: Historically, web interfaces were less rich than desktop applications, though modern web technologies have largely overcome this.
Examples: Many online services (e.g., e-commerce sites, online banking, social media platforms) are essentially web-based database management systems. Content Management Systems (CMS) like WordPress and Joomla also operate on this principle.