SQL
Learning the Structured Query Language (SQL) requires practice – it’s no different than learning any other programming language. This programming exercise (PEX) offers an opportunity to practice various types of SQL queries.
- Expected duration
- 8–10 hours
- Deadline
- 2300 on Lesson 20
- Points
- 75 points
Learning Objectives
- Use SQL to query a database
Help Policy
- Authorized Resources
- Any, except posted solutions
- Notes
- Never copy another person’s work and submit it as your own.
-
Do not jointly work on this homework, but you may ask classmates questions about SQL syntax.
-
You must document all help received from all sources, including the instructor and instructor-provided course materials (including your textbook), except for the following:
- Resources related to SQL syntax (e.g., PostgreSQL’s Reference for SQL Commands)
Instructions
Use the provided link (distributed separately – look in Blackboard under “Assignments” for the invitation link) to access the assignment in GitHub Classroom.
If the ConstructCo and LargeCo databases don’t already exist in your PostgreSQL cluster, create them and load the corresponding files to populate the data for this assignment. Both files are in the Git repository that is created in GitHub Classroom.
- ConstructCo
- The ConstructCo database (ER diagram) stores data for a consulting company that tracks all charges to projects. The charges are based on the hours each employee works on each project.
- LargeCo
- The LargeCo database (ER diagram) stores data for a company
that sells paint products. The company tracks the sale of products to
customers. The database keeps data on the following entities:
- customers (
LGCUSTOMER
), - sales (
LGINVOICE
), - products (
LGPRODUCT
), - which products are on which invoices (
LGLINE
); - employees (
LGEMPLOYEE
), - the salary history of each employee (
LGSALARY_HISTORY
), - departments (
LGDEPARTMENT
), - product brands (
LGBRAND
), - vendors (
LGVENDOR
), and - which vendors supply each product (
LGSUPPLIES
).
Some of the tables contain only a few rows of data, while other tables are quite large; for examples, there are only eight departments, but more than 3300 invoices containing over 11000 invoice lines.
- customers (
The exercises are adapted from Database Systems: Design, Implementation, &
Management (Coronel and Morris 2017). Modify the file in the src
directory
that corresponds to each query. For example, write the query for the first
exercise in the file src/01.sql
. The SQL comment that appears at the
beginning of the file is for your convenience and may remain in the file. Do
not modify any files in the test
directory; doing so could prevent the
automated tests from running correctly.
Submission
Create an archive of your Git repository (you can use GitHub’s “Clone or download” button when viewing your repository for this purpose) and submit that archive to Blackboard. Be sure to include your documentation statement as part of the submission.
GitHub Classroom also tags the latest commit at the due date for the assignment.
Grading
The following grading rubric will be used for this assignment:
Description | Points |
---|---|
Queries (@ 3 points each) | 75 |
Each query is worth the same amount. No partial credit will be awarded for queries that fail the automated tests unless you can prove that a test case rejected valid input!
Automated Tests
Travis CI will test all the queries when you push your changes to GitHub (i.e., continuous integration). If the results from executing your query match the expected results, then your query is likely to be correct; otherwise, you should review your query and the test results to determine where they differ. The entire class must share the resources provided by Travis CI so there may be a significant delay before running the tests. You should test each query against a local copy of the database before pushing to GitHub.
Each test case executes your query and compares its results to the expected
results. Any difference is reported as a failure and the difference is reported
in the test log. For example, the following difference is reported when a query
fails to rename a COUNT
expression:
--- test/12.out 2020-02-03 15:49:50.624441740 +0000
+++ /tmp/12.out 2020-02-03 15:49:57.944441740 +0000
@@ -1,13 +1,13 @@
- brand_name | numproducts
--------------------+-------------
- BINDER PRIME | 27
- BUSTERS | 25
- FORESTERS BEST | 15
- HOME COMFORT | 36
- LE MODE | 36
- LONG HAUL | 41
- OLDE TYME QUALITY | 27
- STUTTENFURST | 27
- VALU-MATTE | 18
+ brand_name | count
+-------------------+-------
+ BINDER PRIME | 27
+ BUSTERS | 25
+ FORESTERS BEST | 15
+ HOME COMFORT | 36
+ LE MODE | 36
+ LONG HAUL | 41
+ OLDE TYME QUALITY | 27
+ STUTTENFURST | 27
+ VALU-MATTE | 18
(9 rows)
The red lines (prefixed by -
) represent the expected results, and the green
lines (prefixed by +
) indicate the actual results from executing your
query. In this case, the difference is trivial: because the COUNT
expression
was not renamed, it changes the alignment of that column, which causes the
textual representation of the two result sets to differ. In other cases, the
majority of the results may be correct:
--- test/23.out 2020-02-03 22:51:06.441649555 +0000
+++ /tmp/23.out 2020-02-03 22:51:14.317649555 +0000
@@ -1,9 +1,9 @@
903 | ROBIN | ADDISON | 323 LORETTA PLACE | Mobile | AL | 36693 | 2015-08-26 | 230.63
643 | NINA | ALLEN | 680 RED TALON DRIVE | Robertsdale | AL | 36574 | 2015-06-21 | 11.99
295 | DORTHY | AUSTIN | 829 BIG BEND LOOP | Diamond Shamrock | AL | 36614 | 2015-04-24 | 589.75
- 393 | FOSTER | BERNAL | 1299 EAST 3RD AVENUE | Birmingham | AL | 35280 | |
+ 393 | FOSTER | BERNAL | 1299 EAST 3RD AVENUE | Birmingham | AL | 35280 | | 0.00
853 | GAYLORD | BOLTON | 1069 LUGENE LANE | Montgomery | AL | 36131 | 2015-11-25 | 372.68
925 | ALANA | BOOKER | 1874 I STREET | Mccullough | AL | 36502 | 2015-12-12 | 208.85
1248 | LISA | BRADY | 491 LOWLAND AVENUE | Daphne | AL | 36577 | 2015-12-05 | 414.47
Here, the actual result contains NULL
instead of the expected value of 0.00
for the Bernal Foster’s largest invoice, although the rest of the results are
correct.
You are strongly encouraged to complete this assignment several days prior to the due date so that you have time to resolve any failures for the automated tests. If you have questions regarding how to interpret their output, do not hesitate to ask the instructor.