PostgreSQL
PostgreSQL is an open source relational database management system that has seen active development for multiple decades. Distributions run on most operating systems including Linux, macOS, and Windows. This lab focuses on pgAdmin, which is a cross-platform graphical user interface for administering PostgreSQL databases.
- Expected Duration
- 1–2 hours
- Deadline
- None (no explicit submission is required)
- Points
- None
Learning Objectives
- Create a relational database using PostgreSQL (pgAdmin)
- Execute SQL queries using pgAdmin
Help Policy
- Authorized Resources
- Any
- Notes
- Never copy another person’s work and submit it as your own.
-
You must document all help received from all sources, including the instructor and instructor-provided course materials (such as the textbook).
Assignment
Prior to starting this assignment, you must complete the software installation instructions to create a virtual machine (VM) and install PostgreSQL (and pgAdmin) in that VM.
This tutorial introduces many topics that we will cover in detail throughout the rest of the course. You are not expected to be able to write the SQL queries by yourself, but trying to work through them will be good practice for upcoming lessons.
In this lab, you will construct a database for a garage that handles automotive repairs. This lab introduces data definition and data manipulation using the Structured Query Language (SQL).
Configuring pgAdmin
Open pgAdmin. Because pgAdmin has just been installed, a connection to the PostgreSQL server must be established. Right click Servers in the object browser (left side of the window), and click Register > Server…. Enter a server name (e.g., “PostgreSQL”) on the General tab. On the Connection tab, enter the following information:
- Host name/address
- localhost
- Maintenance database
- student
- Username
- student
- Password
- CompSci364
You may also want to save the password so you are not prompted to enter it repeatedly. Click the “Save” button on the dialog.
Now that you have logged into the PostgreSQL server, the object browser displays contents of the database cluster. pgAdmin allows users to view a hierarchy of databases and their objects in the left frame. In the main window, the Dashboard tab displays details about the selected object and any pertinent statistics. When applicable, the SQL tab has a reverse-engineered SQL script that could be used to (re)construct the current object.
Create the Database
Create a new database named garage
by right-clicking the server and then
clicking Create > Database…. Enter “garage” in the Database field and
click the Save button.
In the object browser (left side bar), you should see the garage
database
listed. If not, right-click Databases in the object browser and then click
the Refresh… button.
Tables
The tables required are described below along with each table’s fields and keys. Detailed instructions are included for constructing the first table; you are expected to repeat the steps for the others.
Expand database, schema, and public nodes for the garage
database in the
object browser (left side bar).1 Right click on Tables and select *Create
Table…* to open the table creation wizard. Use the following steps to create the
person
table:
- On the first tab, you will need to name your new table. Type “person” in the Name field.
- Move to the second tab, which is labeled Columns.
- Click the + icon. Create the
id
column by typing “id” in the Name field. Select “serial” as the data type. Ensure that Not NULL? is selected. - Click the + icon. Enter “lastname” in the Name field. Select “character varying” as the data type, and enter “25” in the Length field. (This length specifies the maximum number of characters that will be stored – longer values are truncated to fit.) Ensure that Not NULL? is selected.
- Define the
firstname
field analogously to thelastname
field. - Click the + icon. Enter “phone” in the Name field. Select “numeric” as
the data type, and enter “10” as the length. (The length of a
NUMERIC
field is the total number of digits that will be stored; precision is the number of digits that follow the decimal point.) Ensure that Not NULL? is selected.
- Click the + icon. Create the
- Move to the Constraints tab. In the tabbed pane, Primary Key should be
selected. Click the + icon. Select the Definition tab. Select
id
in the Columns field to make theid
attribute the table’s primary key. - Move to the final tab, which is labeled SQL, to view the SQL statement
that will actually create the table. Examine the SQL and make sure you
understand the
CREATE TABLE (...)
portion. - Click the “Save” button to create the table.
Verify that the person
table appears in the object browser. (You may have to
right click Tables and then click Refresh… to see it listed).
At this point, you may be wondering if all this effort, particularly navigating
through the table creation wizard, is worth it. As you’re beginning, you’ll
probably find the table creation wizard to be helpful because it eliminates the
need to know SQL’s CREATE TABLE
syntax. In the long term, though, writing the
SQL command “by hand” is much faster as you might have guessed from reviewing
it before clicking the “Save” button in the wizard.
Define all the remaining tables that are shown below. Do this in the order they appear so you don’t have trouble setting the foreign key constraints. Instructions to create a foreign key constraint follow the table definitions.
Table | Field | Data Type | Keys and Constraints |
---|---|---|---|
person | id | serial | primary key, not NULL |
lastname | character varying(25) | not NULL | |
firstname | character varying(25) | not NULL | |
phone | numeric(10) | not NULL | |
customer | id | serial | primary and foreign key, not NULL |
employee | id | serial | primary and foreign key, not NULL |
wage | numeric(5, 2) | default value = 5.15, not NULL | |
car | carid | serial | primary key, not NULL |
customerid | serial | foreign key, not NULL | |
vin | character(17) | unique | |
make | character varying(15) | not NULL | |
model | character varying(50) | not NULL | |
year | numeric(4) | not NULL | |
work | workid | serial | primary key, not NULL |
carid | serial | foreign key, not NULL | |
description | text | not NULL | |
date | date | not NULL | |
cost | numeric(7, 2) | default = 0.00, not NULL | |
workrecord | employeeid | serial | primary and foreign key, not NULL |
workid | serial | primary and foreign key, not NULL |
Creating a foreign key is done in the Constraints tab. Use the following
steps to define the foreign key for the customer
table:
- Select the Constraints tab in the table creation wizard.
- In the resulting tabbed pane, select Foreign Key, click the + icon.
- Click the Columns tab, ensure that
id
is selected for the Local column field, and selectpublic.person
for the References field andid
for the Referencing field. - Click the “Add” button below these fields.
This foreign key stipulates that any entry in the customer
table is
associated with a corresponding entry in the person
table. This concept is
known as referential integrity – i.e., the customer’s id must refer to (or
reference) a valid person’s id.
Queries
Included in pgAdmin is a query tool that can be used to execute arbitrary SQL statements. The top half of the query tool is the entry window where SQL commands are entered. The bottom section is reserved for the query’s output. Because the garage database is currently empty, the tables must be populated. Download garage_data.sql. This file consists of SQL statements that will insert data into each table. Open the Query Tool (Tools > Query Tool). Next, open garage_data.sql and examine the SQL.
Note: Due to the size of the file, do not copy-paste it into the query window. Doing so might cause pgAdmin to hang.
Try to understand what each query does. Click the play button on the toolbar (tooltip: “Execute script”) to run the queries.
The Query Tool retains a history of all the queries which have been executed in the Query History tab. Along with the individual SQL statements, it keeps track of runtimes and results (number of rows retrieved/affected) until clicking the Clear history button. Clearing the history at this point is a good idea since so many individual queries have been executed.
Type SELECT * FROM person
to view the entire person table. When a query is
sent to the PostgreSQL server, the query optimizer generates a query plan which
specifies how the SQL statement will be executed. pgAdmin can show the steps
involved in running a query in a graphical way. Write a SQL statement to view
all the records in the person table, but instead of executing the query, ask
PostgreSQL to explain it by clicking the Explain button on the tool bar. The
Explain tab is shown with a diagram of the person table. Clicking on the person
table should bring up a tool tip reading something like
Seq Scan on person (cost=0.00..101 rows=5000 width=37).
The first piece of text indicates the operation, and the second indicates the table. Next come four numbers. The numbers contain the following information:
- Estimated startup cost (how long before the first row can be returned)
- Estimated total cost (if all rows are retrieved)
- Estimated number of rows this step will return
- Estimated average widths (in bytes) of output rows
Running several more complicated queries will demonstrate the complexities of the query plan. Three queries are listed below; following each query description is an explanation of the SQL required to perform it. If you are comfortable with the SQL language, try not to look at the solutions unless you are stuck. Pay most attention to the final query which takes advantage of a subquery expression by building upon the previous two queries.
-
Obtain a listing of the first and last names of all customers. The results should be sorted by last name in ascending order.
Since the customer table does not directly store an individual’s name, we must join the customer table to the person table. The key here (no pun intended) is the foreign key constraint we declared when we created the customer table. We need to specify that customer with id X is the same individual as person with id X. To do this join in SQL, include the following condition in the
WHERE
clause:person.id = customer.id
. Ordering the data is done withORDER BY
, which in this case will look likeORDER BY lastname ASC
. The rest should be straightforward.SELECT person.firstname, person.lastname FROM person, customer WHERE person.id = customer.id ORDER BY person.lastname ASC;
-
Obtain the id number and number of vehicles for all customers who have more than three cars.
This query is relatively straightforward except for the fact that it requires an aggregate function—
COUNT
. No join operations are required, for the car table already stores the customer number. To count the number of cars (individual rows in the car table) associated with a single customer, includeCOUNT(*)
in theSELECT
clause. It is also necessary to specify how to perform the count by usingGROUP BY customerid
. This clause tells PostgreSQL to count the number of rows associated with each customer (because customerid is unique to a single customer). Finally, enter aHAVING
clause to discard customers with fewer than three vehicles. The clause will look likeHAVING COUNT(*) > 3
orHAVING COUNT(*) >= 4
.SELECT customerid, COUNT(*) FROM car GROUP BY customerid HAVING COUNT(*) >= 4;
-
Obtain the first and last names and the total amount of money spent by individuals who own three or more vehicles and have spent more than fifteen hundred dollars at the garage. Sort in descending order by the total amount of money spent.
Once again, the customer and person tables must be joined. (The explanation of how to do this may be found above.) Additionally, we need to join the customer and car tables and the car and work tables. A join is specified by stating
tableA.field = tableB.field
in theWHERE
clause. TheWHERE
clause also needs to specify that the id number of the person is in the id numbers of people who have more than three cars. Using the constructfield IN (subquery)
will achieve this goal. The subquery expression may only return a single column. Thus, modify your second query and do not display the number of vehicles belonging to each individual. TheGROUP BY
clause is also a little tricky in this situation. Because we’re displaying the first and last names of each person, these fields must be in theGROUP BY
clause. Unfortunately, first and last names are not guaranteed to be unique so the id number should also be included. As in the second query, theHAVING
clause restricts results to those individuals who meet certain additional criteria—they have spent more than 1500 dollars. The last step is ordering the results in descending order which is specified withDESC
after the field name (as opposed toASC
in the first query).SELECT firstname, lastname, SUM(cost) AS total_spent FROM person, customer, car, work WHERE person.id = customer.id AND customer.id = car.customerid AND car.carid = work.carid AND person.id IN (SELECT customerid FROM car GROUP BY customerid HAVING COUNT(*) > 3) GROUP BY person.id, person.firstname, person.lastname HAVING SUM(cost) > 1500 ORDER BY total_spent DESC;
Indexes
Adding additional table indexes can improve query performance. Obtain the listing of first and last names in the person table where the last names are in ascending order (do not order first names as well). Examine the steps in the query plan. You should see a sequential scan followed by a sort. Look at the time required to do both operations. Now, index the lastname field of the person table using the following syntax:
CREATE INDEX index_name ON table (field [, ...]);
Run the first and last name query again. Pay special attention to the changes in the query plan. No longer is there a scan followed by a sort; the index allows the individual steps to be combined into one which is faster than before. Indexing fields that are used in join operations can significantly improve performance. Because queries in the garage database use primary keys for join operations, additional indexes are unneeded—PostgreSQL automatically defines a unique index when a primary key is specified.
-
PostgreSQL organizes a server as a hierarchy: database > schema > object. (A table is one type of object.) A schema is loosely analogous to a package or module in most programming languages: schemas are used to group related objects, and objects in different schemas can share the same name. Schemas can also simplify access control by granting roles different permissions to different schemas. In practice, we will use the
public
schema, which exists by default in a PostgreSQL database. ↩