PostgreSQL

PostgreSQL is an open source relational database management system that has seen active development for multiple decades. Distributions run on most operating systems including Linux, macOS, and Windows. This lab focuses on pgAdmin, which is a cross-platform graphical user interface for administration.

In this lab, you will construct a database for a garage that handles automotive repairs. The tables required are described below along with each table’s fields and keys. Detailed instructions are included for constructing the first table; you are expected to repeat the steps for the others.

Note that no table or field names are capitalized. When executing a SQL query, the PostgreSQL server forces all table and attribute names to lower case. Thus, SELECT * FROM Person is seen as SELECT * FROM person by the server. (This can be overcome by placing double quotes (") around names, but not capitalizing names is easier.)

Table	Field	Data Type	Keys and Constraints
person	id	serial	primary key, not NULL
	lastname	varchar(25)	not NULL
	firstname	varchar(25)	not NULL
	phone	numeric(10)	not NULL
customer	id	serial	primary and foreign key, not NULL
employee	id	serial	primary and foreign key, not NULL
	wage	numeric(5, 2)	default value = 5.15, not NULL
car	carid	serial	primary key, not NULL
	customerid	serial	foreign key, not NULL
	vin	char(17)	unique
	make	varchar(15)	not NULL
	model	varchar(50)	not NULL
	year	numeric(4)	not NULL
work	workid	serial	primary key, not NULL
	carid	serial	foreign key, not NULL
	description	text	not NULL
	date	date	not NULL
	cost	numeric(7, 2)	default = 0.00, not NULL
workrecord	employeeid	serial	primary and foreign key, not NULL
	workid	serial	primary and foreign key, not NULL

Note: PostgreSQL considers varchar to be an alias for character varying. In pgAdmin, you might only see the character varying data type.

Configuring pgAdmin

Open pgAdmin. Because pgAdmin has just been installed, a connection to the PostgreSQL server must be established although you might simply be requested to provide the password for the local PostgreSQL instance.

If not immediately prompted for a password, right click Servers in the object browser (left side of the window), and click Create > Server…. Enter a server name (e.g., “PostgreSQL 11”) on the General tab and “localhost” as the Host name / address on the Connection tab. Click the “Save” button on the dialog.

Now that you have logged into the PostgreSQL server, the object browser displays contents of the database cluster. pgAdmin allows users to view a hierarchy of databases and their objects in the left frame. In the main window, the Dashboard tab displays details about the selected object and any pertinent statistics. When applicable, the SQL tab has a reverse-engineered SQL script that could be used to (re)construct the current object.

Creating Tables

Create a new database named garage by right-clicking the server and then clicking Create > Database…. Expand database, schema, and public nodes for the garage database. Right click on Tables and select Create > Table…. This opens the table creation wizard.

On the first tab, you will need to name your new table. Type “person” in the Name field and then move to the second tab to define the table columns.

Click the + icon. Create the id column as defined above. Repeat this process to define all the other fields. When defining the phone field, the length should be 10 and precision should be 0. The length attribute refers to the total number of digits while the precision refers only to those digits following the decimal point.

Move to the final tab to view the SQL statement that will actually create the table. Examine the SQL and make sure you understand the CREATE TABLE (...) portion.

Define all of the remaining tables. You will want to do this in the order they appear so you don’t have trouble setting the foreign key constraints. When creating foreign key constraints, use FK_tablename-field as a template for naming. The Foreign Key Properties tab specifies which table is being referenced, and the Columns tab allows for specifying the exact fields. The final tab, Action, allows you to specify what action the database should take if the key in the parent table is modified or removed. PostgreSQL specifies a NO ACTION (default) option which essentially mimics the RESTRICT option. All foreign keys should specify CASCADE on update and RESTRICT on delete.

As an objection-relational database management system, PostgreSQL also understands the concept of inheritance across tables which potentially could remove some of the need for foreign keys. Since the customer table contains only a single column, we’ll make a copy of it called customer2. This time, specify that it inherits (on the Columns tab of the create table wizard) from the person table. No columns need to be explicitly defined. Open the Query Tool (Tools > Query Tool) and execute the following two queries. Browse the person and customer2 tables by right-clicking each in the object browser and clicking View / Edit Data > All Rows.

INSERT INTO person (id, lastname, firstname, phone)
VALUES (-1, 'p-last', 'p-first', 123456789);
INSERT INTO customer2 (id, lastname, firstname, phone)
VALUES (-2, 'c-last', 'c-first', 3334445555);

You can see that the customer2 table displays only the information that was inserted directly into that specific table. Conversely, the person table contains a copy of the customer2 row. Although convenient, one major problem with this functionality is key constraints—they are not copied from the parent to the child table(s). Run the following query and then browse the person table again.

INSERT INTO person (id, lastname, firstname, phone)
VALUES (-2, 'x-last', 'x-first', 5436547654);

One expects an error to occur because the person table already contains a person with an id of -2 (the insert should violate the primary key constraint). Unfortunately, constraints neither transfer down to child tables nor carry across a parent and child. No error is raised even though the database is now in an inconsistent state.

More information regarding the inherits feature—including caveats—can be found at https://www.postgresql.org/docs/current/ddl-inherit.html. When you’ve finished experimenting with the inherits feature, drop (remove) the second customer table.

pgAdmin’s Query Tool

Included in pgAdmin is a query tool that can be used to execute arbitrary SQL statements. The top half of the query tool is the entry window where SQL commands are entered. The bottom section is reserved for the query’s output. Because the garage database is currently empty, the tables must be populated. Download garage_data.sql. This file consists of SQL statements that will insert data into each table. Open the Query Tool (Tools > Query Tool). Next, open garage_data.sql and examine the SQL. Try to understand what each query does. When you are ready, click the Execute / Refresh button (lightning bolt icon) to run the queries.

The Query Tool retains a history of all the queries which have been executed in the Query History tab. Along with the individual SQL statements, it keeps track of runtimes and results (number of rows retrieved/affected) until clicking the Clear history button. Clearing the history at this point is a good idea since so many individual queries have been executed.

Type SELECT * FROM person to view the entire person table. When a query is sent to the PostgreSQL server, the query optimizer generates a query plan which specifies how the SQL statement will be executed. pgAdmin can show the steps involved in running a query in a graphical way. Write a SQL statement to view all the records in the person table, but instead of executing the query, ask PostgreSQL to explain it by clicking the Explain button (an option when clicking the drop-down list beside the lightning bolt icon). The Explain tab is shown with a diagram of the person table. Clicking on the person table should bring up a tool tip reading something like

Seq Scan on person (cost=0.00..101 rows=5000 width=37).

The first piece of text indicates the operation, and the second indicates the table. Next come four numbers. The numbers contain the following information:

Estimated startup cost (how long before the first row can be returned)
Estimated total cost (if all rows are retrieved)
Estimated number of rows this step will return
Estimated average widths (in bytes) of output rows

Running several more complicated queries will demonstrate the complexities of the query plan. Three queries are listed below; following each query description is an explanation of the SQL required to perform it. If you are comfortable with the SQL language, try not to read the explanation unless you are stuck. (The SQL statement itself may be found below.) Pay most attention to the final query which takes advantage of a subquery expression by building upon the previous two queries.

Obtain a listing of the first and last names of all customers. The results should be sorted by last name in ascending order.

Since the customer table does not directly store an individual’s name, we must join the customer table to the person table. The key here (no pun intended) is the foreign key constraint we declared when we created the customer table. We need to specify that customer with id X is the same individual as person with id X. To do this join in SQL, include the following condition in the WHERE clause: person.id = customer.id. Ordering the data is done with ORDER BY, which in this case will look like ORDER BY lastname ASC. The rest should be straightforward.
Obtain the id number and number of vehicles for all customers who have more than three cars.

This query is relatively straightforward except for the fact that it requires an aggregate function—COUNT. No join operations are required, for the car table already stores the customer number. To count the number of cars (individual rows in the car table) associated with a single customer, include COUNT(*) in the SELECT clause. It is also necessary to specify how to perform the count by using GROUP BY customerid. This clause tells PostgreSQL to count the number of rows associated with each customer (because customerid is unique to a single customer). Finally, enter a HAVING clause to discard customers with fewer than three vehicles. The clause will look like HAVING COUNT(*) > 3 or HAVING COUNT(*) >= 4.
Obtain the first and last names and the total amount of money spent by individuals who own three or more vehicles and have spent more than fifteen hundred dollars at the garage. Sort in descending order by the total amount of money spent.

Once again, the customer and person tables must be joined. (The explanation of how to do this may be found above.) Additionally, we need to join the customer and car tables and the car and work tables. A join is specified by stating tableA.field = tableB.field in the WHERE clause. The WHERE clause also needs to specify that the id number of the person is in the id numbers of people who have more than three cars. Using the construct field IN (subquery) will achieve this goal. The subquery expression may only return a single column. Thus, modify your second query and do not display the number of vehicles belonging to each individual. The GROUP BY clause is also a little tricky in this situation. Because we’re displaying the first and last names of each person, these fields must be in the GROUP BY clause. Unfortunately, first and last names are not guaranteed to be unique so the id number should also be included. As in the second query, the HAVING clause restricts results to those individuals who meet certain additional criteria—they have spent more than 1500 dollars. The last step is ordering the results in descending order which is specified with DESC after the field name (as opposed to ASC in the first query).

Adding additional table indexes can improve query performance. Obtain the listing of first and last names in the person table where the last names are in ascending order (do not order first names as well). Examine the steps in the query plan. You should see a sequential scan followed by a sort. Look at the time required to do both operations. Now, index the lastname field of the person table using the following syntax:

CREATE INDEX index_name ON table (field [, ...]);

Run the first and last name query again. Pay special attention to the changes in the query plan. No longer is there a scan followed by a sort; the index allows the individual steps to be combined into one which is faster than before. Indexing fields that are used in join operations can significantly improve performance. Because queries in the garage database use primary keys for join operations, additional indexes are unneeded—PostgreSQL automatically defines a unique index when a primary key is specified.

Final Notes

PostgreSQL interfaces with many different programming languages by providing library functions for Java (Java Database Connectivity (JDBC)), PHP Hypertext Preprocessor (PHP), Open Database Connectivity (ODBC), C, C++, Perl, Ruby, and Python to name a few. More information regarding PostgreSQL can be found at https://www.postgresql.org.

Solutions

The following are the SQL statements requested previously:

Obtain a listing of the first and last names of all customers. The results should be sorted by last name in ascending order.

 SELECT person.firstname, person.lastname
 FROM person, customer
 WHERE person.id = customer.id
 ORDER BY person.lastname ASC;

Obtain the id number and number of vehicles for all customers who have more than three cars.
```
 SELECT customerid, COUNT(*)
 FROM car
 GROUP BY customerid
 HAVING COUNT(*) >= 4;
```

Obtain the first and last names and the total amount of money spent by individuals who own three or more vehicles and have spent more than fifteen hundred dollars at the garage. Sort in descending order by the total amount of money spent.

 SELECT firstname, lastname, SUM(cost) AS total_spent
 FROM person, customer, car, work
 WHERE person.id = customer.id AND
   customer.id = car.customerid AND
   car.carid = work.carid AND
   person.id IN (SELECT customerid
                 FROM car
                 GROUP BY customerid
                 HAVING COUNT(*) > 3)
 GROUP BY person.id, person.firstname, person.lastname
 HAVING SUM(cost) > 1500
 ORDER BY total_spent DESC;