SQL | Comp Sci 364

Learning the Structured Query Language (SQL) requires practice – it’s no different than learning any other programming language. This programming exercise (PEX) offers an opportunity to practice various types of SQL queries.

Learning Objectives
Help Policy
Instructions
Submission
- Grading
- Automated Tests

Expected duration: 8–10 hours
Deadline: 2300 on Lesson 20
Points: 75 points
Submission: Canvas

Learning Objectives

Use SQL to query a database

Help Policy

Authorized Resources

Any, except posted solutions

Notes

Never copy another person’s work and submit it as your own.

Do not jointly work on this homework, but you may ask classmates questions about SQL syntax.

You must document all help received from all sources, including the instructor and instructor-provided course materials (including your textbook), except for the following:

Resources related to SQL syntax (e.g., PostgreSQL’s Reference for SQL Commands) unless received from a classmate or an AI system (e.g., ChatGPT)

Instructions

Use the provided link (distributed separately) to access the assignment in GitHub Classroom.

If the ConstructCo and LargeCo databases don’t already exist in your PostgreSQL cluster, create them and load the corresponding files to populate the data for this assignment. Both files are in the Git repository that is created in GitHub Classroom.

ConstructCo

The ConstructCo database (ER diagram) stores data for a consulting company that tracks all charges to projects. The charges are based on the hours each employee works on each project.

LargeCo

The LargeCo database (ER diagram) stores data for a company that sells paint products. The company tracks the sale of products to customers. The database keeps data on the following entities:

customers (LGCUSTOMER),
sales (LGINVOICE),
products (LGPRODUCT),
which products are on which invoices (LGLINE);
employees (LGEMPLOYEE),
the salary history of each employee (LGSALARY_HISTORY),
departments (LGDEPARTMENT),
product brands (LGBRAND),
vendors (LGVENDOR), and
which vendors supply each product (LGSUPPLIES).

Some of the tables contain only a few rows of data, while other tables are quite large; for example, there are only eight departments, but more than 3300 invoices containing over 11000 invoice lines.

The exercises are adapted from Database Systems: Design, Implementation, & Management (Coronel and Morris 2017). Modify the file in the src directory that corresponds to each query. For example, write the query for the first exercise in the file src/01.sql. The SQL comment that appears at the beginning of the file is for your convenience and may remain in the file. Do not modify any files in the test directory; doing so could prevent the automated tests from running correctly.

Submission

Create an archive of your Git repository (you can use GitHub’s “Clone or download” button when viewing your repository for this purpose) and submit that archive to Canvas. Be sure to include your documentation statement as part of the submission.

GitHub Classroom also tags the latest commit at the due date for the assignment.

Grading

The following grading rubric will be used for this assignment:

Description	Points
Queries (@ 3 points each)	75

Each query is worth the same amount. No partial credit will be awarded for queries that fail the automated tests unless you can prove that a test case rejected valid input!

Automated Tests

GitHub Actions tests all the queries when you push your changes to GitHub (i.e., continuous integration). If the results from executing your query match the expected results, then your query is correct; otherwise, you should review your query and the test results to determine where they differ. The entire class shares the resources provided for auto-grading; more specifically, there is a maximum number of minutes per month for GitHub Actions, and each job counts toward that maximum. You should test each query against a local copy of the database before pushing to GitHub.

Because GitHub Actions resources are shared, each workflow run in excess of 25 will incur a 1-point penalty.

To encourage judicious use of our shared resources, the aforementioned penalty will be applied as a deterrent against pushing changes that are not correct. For example, if you complete all the queries correctly and your repository has 22 workflow runs, then you will receive full credit, but if you complete all the queries correctly and your repository has 30 workflow runs, then you will incur a 5-point penalty and receive only 70 points for this assignment. If you test each query locally, then you should have no trouble avoiding a penalty. You may check the number of workflow runs by clicking on the “Actions” tab of your repository.

Each test case executes your query and compares its results to the expected results. Any difference is reported as a failure and the difference is reported in the test log. For example, the following difference is reported when a query fails to rename a COUNT expression:

--- test/12.out	2020-02-03 15:49:50.624441740 +0000
+++ /tmp/12.out	2020-02-03 15:49:57.944441740 +0000
@@ -1,13 +1,13 @@
-    brand_name     | numproducts
--------------------+-------------
- BINDER PRIME      |          27
- BUSTERS           |          25
- FORESTERS BEST    |          15
- HOME COMFORT      |          36
- LE MODE           |          36
- LONG HAUL         |          41
- OLDE TYME QUALITY |          27
- STUTTENFURST      |          27
- VALU-MATTE        |          18
+    brand_name     | count
+-------------------+-------
+ BINDER PRIME      |    27
+ BUSTERS           |    25
+ FORESTERS BEST    |    15
+ HOME COMFORT      |    36
+ LE MODE           |    36
+ LONG HAUL         |    41
+ OLDE TYME QUALITY |    27
+ STUTTENFURST      |    27
+ VALU-MATTE        |    18
 (9 rows)

The red lines (prefixed by -) represent the expected results, and the green lines (prefixed by +) indicate the actual results from executing your query. In this case, the difference is trivial: because the COUNT expression was not renamed, it changes the alignment of that column, which causes the textual representation of the two result sets to differ. In other cases, the majority of the results may be correct:

--- test/23.out	2020-02-03 22:51:06.441649555 +0000
+++ /tmp/23.out	2020-02-03 22:51:14.317649555 +0000
@@ -1,9 +1,9 @@
        903 | ROBIN      | ADDISON    | 323 LORETTA PLACE          | Mobile           | AL         | 36693    | 2015-08-26 |          230.63
        643 | NINA       | ALLEN      | 680 RED TALON DRIVE        | Robertsdale      | AL         | 36574    | 2015-06-21 |           11.99
        295 | DORTHY     | AUSTIN     | 829 BIG BEND LOOP          | Diamond Shamrock | AL         | 36614    | 2015-04-24 |          589.75
-       393 | FOSTER     | BERNAL     | 1299 EAST 3RD AVENUE       | Birmingham       | AL         | 35280    |            |
+       393 | FOSTER     | BERNAL     | 1299 EAST 3RD AVENUE       | Birmingham       | AL         | 35280    |            |            0.00
        853 | GAYLORD    | BOLTON     | 1069 LUGENE LANE           | Montgomery       | AL         | 36131    | 2015-11-25 |          372.68
        925 | ALANA      | BOOKER     | 1874 I STREET              | Mccullough       | AL         | 36502    | 2015-12-12 |          208.85
       1248 | LISA       | BRADY      | 491 LOWLAND AVENUE         | Daphne           | AL         | 36577    | 2015-12-05 |          414.47

Here, the actual result contains NULL instead of the expected value of 0.00 for the Bernal Foster’s largest invoice, although the rest of the results are correct.

You are strongly encouraged to complete this assignment several days prior to the due date so that you have time to resolve any failures for the automated tests. If you have questions regarding how to interpret their output, do not hesitate to ask the instructor.