Physical Database Design
This assignment is ungraded and should not be submitted. The questions review key concepts from the reading for this module.
- Expected duration
- 1.5–2 hours
- Deadline
- None (no explicit submission is required)
- Points
- None
Help Policy
- Authorized Resources
- Any
- Notes
- You may jointly work on this assignment with classmates
Assignment
Answer the following questions from Fundamentals of Database Management Systems (solutions):
-
Describe the idea of the simple linear index. What are its shortcomings? (Question 8.6)
A “simple linear index” (more traditionally referred to as an inverted index) maps values to the location of records. Given a specific value, the inverted index provides direct access to the record in a separate file.
A simple linear index has too major shortcomings: it cannot be stored in memory when the data is of significant size (e.g., millions of entries), and it cannot be updated easily when values are created, updated, or deleted.
-
Describe the idea of the B+-tree index. What are its advantages over the simple linear index? (Question 8.8)
A B+ tree is a form of a multilevel index that takes the form of a balanced tree were each path from the root of the tree to a leaf has the same length. Unlike a simple linear index, the root of a B+ tree can always be kept in memory (because it is typically the size of a memory page) and adding or removing values requires—at worst—modifying all nodes between the leaf and the root of the tree (\(O(\log n)\) complexity where \(n\) is the total number of records indexed).
- Answer the following general questions about indexes (Question 8.11):
-
Can an index be built over a non-unique field?
Yes, an index can be built over non-unique fields.
-
Can an index be built over a field if the file is not stored in sequence by that field?
Yes. Most indexes are built over fields that do not correspond to the physical storage of records.
-
Can an index be built over a combination of fields as well as over a single field?
Yes, although the index may not be able to be used for queries that do not specify constraints on all the fields. For example, an index over two fields, city and state, may not be able to be used to improve performance for a query that specifies only a state.
-
Is there a limit to the number of indexes that can be built for a file?
Theoretically, yes: The number of indexes is limited to all combinations of the fields (otherwise there would be duplicates).
-
How is an index affected when a change is made to a file? Does every change to a file affect every one of its indexes?
If the change modifies an indexed field, then the index (or indexes that include that field) must be updated. Changes to fields that are not indexed do not affect an index.
-
Can an index be used to achieve sequential access? Explain.
Yes, simply scan the index sequentially.
-
-
What is the difference between horizontal and vertical partitioning? What is their common advantage? Are their disadvantages the same or different? Explain. (Question 8.29)
Horizontal partitioning (or horizontal sharding) divides a table by its rows and stores each group of rows separately whereas vertical partitioning (or vertical sharding) divides a table by its columns and stores each group of columns separately. Partitioning may improve security and performance although it also increases complexity. Vertical partitioning has the disadvantage of joining each partition when the complete record is required.
-
What is denormalization? Denormalization, while improving performance under certain circumstances, also leads to a serious problem. How does denormalization improve performance and what is this major drawback? (Question 8.34)
Denormalization is an intentional violation of the various normal forms in an effort to improve performance, typically by reducing the number of joins required. The downside is significant data redundancy, increasing storage costs and requiring additional controls to ensure that data is consistent.
-
Consider the B+-tree index, that follows (Exercise 8.3):
-
A record has just been added to Cylinder 6, causing a cylinder split. The highest key value on Cylinder 6 is now 2156; the highest key value on Cylinder 20, and the empty reserve cylinder that received half of Cylinder 6’s records, is now 2348. Update the tree index accordingly.
-
A record has just been added to Cylinder 10, causing a cylinder split. The highest key value on Cylinder 10 is now 3780; the highest key value on Cylinder 25, and the empty reserve cylinder that received half of Cylinder 10’s records, is now 3900. Update the tree index accordingly. (Note: this question is intended to be independent of the prior question. Start each part from the figure shown.)
-
- A hashed file has space for 70 records. Relative record numbers of 0–69
label each of the 70 record positions. In addition, there is space for
several overflow (synonym) records. Draw a picture of the file and, using
the division-remainder method, store records with each of the following four
digit keys, taking collisions into account as necessary (Exercise 8.4):
- 4000
- 5207
- 0360
- 1410
Hash Index Record Key Synonym Pointer … … 10 4000 70 27 5207 … … 70 360 71 71 1410 … … -
Consider the Super Baseball League Player file shown below. It lists all of the players in the league by their unique identification number, their name, age, the year they joined the league, and the team on which they are currently playing.
Player Number Player Name Age First Year Team Number 1538 Fred Williams 23 2003 12 1882 Tom Parker 29 2000 35 2071 Juan Gomez 33 1990 12 2364 Steve Smith 24 2002 20 2757 Tim Jones 37 1988 18 3186 Dave Lester 29 1998 18 3200 Rod Smith 25 2002 20 3834 Chico Lopez 24 2003 12 4950 Chris Vernon 26 2003 15 5296 Barry Morton 30 1995 35 - Construct a B+-tree index of the type shown in this chapter for the
Player file, assuming that there are now many more records than are
shown above. The file and the index have the following characteristics
(Minicase 8.2.b):
- The file is stored on eight cylinders of the disk. The highest key
values on the eight cylinders, in order, are:
- 1427
- 1965
- 2848
- 3721
- 4508
- 5396
- 6530
- 7442
- Each index record has order 4—i.e., is has at most 4 children.
- There are three index records at the lowest level of the tree index.
- The file is stored on eight cylinders of the disk. The highest key
values on the eight cylinders, in order, are:
-
The same as the prior part, but now there are four index records at the lowest level of the tree index. (Minicase 8.2.c)
-
The same as the prior part, but each index record has order 3 and there are four index records at the lowest level of the tree index. (Minicase 8.2.d)
- Construct a B+-tree index of the type shown in this chapter for the
Player file, assuming that there are now many more records than are
shown above. The file and the index have the following characteristics
(Minicase 8.2.b):