This assignment is ungraded and should not be submitted. The questions review key concepts from the reading for this module.

Expected duration
1.5–2 hours
Deadline
None (no explicit submission is required)
Points
None

Help Policy

Authorized Resources
Any
Notes
You may jointly work on this assignment with classmates

Assignment

Answer the following questions from Fundamentals of Database Management Systems (solutions):

  1. Describe the idea of the simple linear index. What are its shortcomings? (Question 8.6)

    A “simple linear index” (more traditionally referred to as an inverted index) maps values to the location of records. Given a specific value, the inverted index provides direct access to the record in a separate file.

    A simple linear index has too major shortcomings: it cannot be stored in memory when the data is of significant size (e.g., millions of entries), and it cannot be updated easily when values are created, updated, or deleted.

  2. Describe the idea of the B+-tree index. What are its advantages over the simple linear index? (Question 8.8)

    A B+ tree is a form of a multilevel index that takes the form of a balanced tree were each path from the root of the tree to a leaf has the same length. Unlike a simple linear index, the root of a B+ tree can always be kept in memory (because it is typically the size of a memory page) and adding or removing values requires—at worst—modifying all nodes between the leaf and the root of the tree (\(O(\log n)\) complexity where \(n\) is the total number of records indexed).

  3. Answer the following general questions about indexes (Question 8.11):
    1. Can an index be built over a non-unique field?

      Yes, an index can be built over non-unique fields.

    2. Can an index be built over a field if the file is not stored in sequence by that field?

      Yes. Most indexes are built over fields that do not correspond to the physical storage of records.

    3. Can an index be built over a combination of fields as well as over a single field?

      Yes, although the index may not be able to be used for queries that do not specify constraints on all the fields. For example, an index over two fields, city and state, may not be able to be used to improve performance for a query that specifies only a state.

    4. Is there a limit to the number of indexes that can be built for a file?

      Theoretically, yes: The number of indexes is limited to all combinations of the fields (otherwise there would be duplicates).

    5. How is an index affected when a change is made to a file? Does every change to a file affect every one of its indexes?

      If the change modifies an indexed field, then the index (or indexes that include that field) must be updated. Changes to fields that are not indexed do not affect an index.

    6. Can an index be used to achieve sequential access? Explain.

      Yes, simply scan the index sequentially.

  4. What is the difference between horizontal and vertical partitioning? What is their common advantage? Are their disadvantages the same or different? Explain. (Question 8.29)

    Horizontal partitioning (or horizontal sharding) divides a table by its rows and stores each group of rows separately whereas vertical partitioning (or vertical sharding) divides a table by its columns and stores each group of columns separately. Partitioning may improve security and performance although it also increases complexity. Vertical partitioning has the disadvantage of joining each partition when the complete record is required.

  5. What is denormalization? Denormalization, while improving performance under certain circumstances, also leads to a serious problem. How does denormalization improve performance and what is this major drawback? (Question 8.34)

    Denormalization is an intentional violation of the various normal forms in an effort to improve performance, typically by reducing the number of joins required. The downside is significant data redundancy, increasing storage costs and requiring additional controls to ensure that data is consistent.

  6. Consider the B+-tree index, that follows (Exercise 8.3):

    B+-tree index

    1. A record has just been added to Cylinder 6, causing a cylinder split. The highest key value on Cylinder 6 is now 2156; the highest key value on Cylinder 20, and the empty reserve cylinder that received half of Cylinder 6’s records, is now 2348. Update the tree index accordingly.

      B+-tree index

    2. A record has just been added to Cylinder 10, causing a cylinder split. The highest key value on Cylinder 10 is now 3780; the highest key value on Cylinder 25, and the empty reserve cylinder that received half of Cylinder 10’s records, is now 3900. Update the tree index accordingly. (Note: this question is intended to be independent of the prior question. Start each part from the figure shown.)

      B+-tree index

  7. A hashed file has space for 70 records. Relative record numbers of 0–69 label each of the 70 record positions. In addition, there is space for several overflow (synonym) records. Draw a picture of the file and, using the division-remainder method, store records with each of the following four digit keys, taking collisions into account as necessary (Exercise 8.4):
    1. 4000
    2. 5207
    3. 0360
    4. 1410
    Hash Index Record Key Synonym Pointer
     
    10 4000 70
    27 5207  
     
    70 360 71
    71 1410  
     
  8. Consider the Super Baseball League Player file shown below. It lists all of the players in the league by their unique identification number, their name, age, the year they joined the league, and the team on which they are currently playing.

    Player Number Player Name Age First Year Team Number
    1538 Fred Williams 23 2003 12
    1882 Tom Parker 29 2000 35
    2071 Juan Gomez 33 1990 12
    2364 Steve Smith 24 2002 20
    2757 Tim Jones 37 1988 18
    3186 Dave Lester 29 1998 18
    3200 Rod Smith 25 2002 20
    3834 Chico Lopez 24 2003 12
    4950 Chris Vernon 26 2003 15
    5296 Barry Morton 30 1995 35
    1. Construct a B+-tree index of the type shown in this chapter for the Player file, assuming that there are now many more records than are shown above. The file and the index have the following characteristics (Minicase 8.2.b):
      • The file is stored on eight cylinders of the disk. The highest key values on the eight cylinders, in order, are:
        1. 1427
        2. 1965
        3. 2848
        4. 3721
        5. 4508
        6. 5396
        7. 6530
        8. 7442
      • Each index record has order 4—i.e., is has at most 4 children.
      • There are three index records at the lowest level of the tree index.

      B+-tree index

    2. The same as the prior part, but now there are four index records at the lowest level of the tree index. (Minicase 8.2.c)

      B+-tree index

    3. The same as the prior part, but each index record has order 3 and there are four index records at the lowest level of the tree index. (Minicase 8.2.d)

      B+-tree index