Distributed Databases

This assignment is ungraded and should not be submitted. The questions review key concepts from the reading for this module.

Expected duration: 15–30 minutes
Deadline: None (no explicit submission is required)
Points: None

Help Policy

Authorized Resources: Any
Notes: You may jointly work on this assignment with classmates

Assignment

Answer the following questions from Fundamentals of Database Management Systems (solutions):

What is a distributed database? What is a distributed database management system? (Question 12.6)

A distributed database spreads data across multiple servers (including those in disparate geographic locations), but the data comprises a single logical database.

A distributed database management system (DBMS) manages a distributed database, particularly providing location transparency – i.e., the abstraction of the physical location of the data – for clients.
Describe the two-phase commit approach to updating replicated data. (Question 12.11)

The two-phase commit protocol has a “prepare” phase where each replica must elect to proceed with the commit and the “commit” phase where the transition is committed if all replicas elected to proceed with the commit. (Conversely, the transaction is aborted if any replica aborts the transaction.)
Australian Boomerang, Ltd. wants to design a distributed relational database. The company is headquartered in Perth and has major operations in Sydney, Melbourne, and Darwin. The database involved consists of five tables, labeled A, B, C, D, and E, with the following characteristics:
- Table A consists of 500,000 records and is heavily used in Perth and Sydney.
- Table B consists of 100,000 records and is frequently required in all four cities.
- Table C consists of 800 records and is frequently required in all four cities.
- Table D consists of 75,000 records. Records 1-30,000 are most frequently used in Sydney. Records 30,001–75,000 are most frequently used in Melbourne.
- Table E consists of 20,000 records and is almost exclusively in Perth.
Design a distributed relational database for Australian Boomerang. Justify your placement, replication, and partitioning of the tables. (Exercise 12.1)
Based on the information provided, the tables would be located as follows:
- Table A is replicated at Perth and Sydney because both locations use it frequently. Replication also increases its availability and minimizes communication costs.
- If the data in Table B is not volatile (i.e., the frequency of updates is low), it is replicated at Perth, Sydney, Melbourne, and Darwin because it is used frequently in each location. If the data is volatile, it may be replicated to two cities for availability, but the expected increase in availability must be weighed against the cost of synchronous (or asynchronous) updates.
- Table C is replicated in two cities for availability. Given the small number of records, the data can be distributed to other sites when required for distributed joins.
- Table D is horizontally partitioned in Sydney, and Melbourne. Records 1–30000 are stored in Sydney and records 30001–75000 in Melbourne.
- Table E is stored in Perth because it is rarely used elsewhere.