CSCI 3410 - Database Systems

Lecture Notes

Clément Aubert

August 8, 2022 (05:47:16 PM)

List of Problems



As of August 2022, the main author of those notes (Dr. Aubert) is not scheduled to teach CSCI 3410 - Database Systems in the forseeable future. As a result, those notes are archived.

Please, also note that pandoc-include-code and pandoc-numbering, required to compile those notes, are not compatible with the current version of pandoc (cf. the instructions). As a result, compiling those notes will require increasing version tinkering.

How to Use This Guide

How to Read This Guide

These lecture notes are written in an elusive style: they are a support for the explanations that will be made at the board. Reading them before coming to the lecture will help you getting a sense of the next topic we will be discussing, but you may sometimes have trouble deciphering their … unique style.

On top of the notes, you will find in this document:

Any feedback is greatly appreciated. Please refer to for how to contribute to those notes. The syllabus is at, and the webpage for those notes is at

Please, refer to those notes using this entry (Aubert 2019):

	author={Aubert, Clément},
	title={CSCI 3410 - Database Systems},
	institution={{School of Computer and Cyber Sciences, Augusta University}},
	location={Augusta, Georgia, USA},
	type={Lecture notes}

How to Access the Code in This Guide

There are four way to access the code shared in those lecture notes:

  1. You can simply copy-and-paste it from the document and use it as it is.
  2. You can browse the source code of the code snippets at to download them directly.
  3. You can clone the repository containing the notes, figures and code snippets to have a local copy of it. You can find instructions on how to do that at Instructions on how to compile those notes and how to contribute are linked from this document, if you are curious.
  4. You can use the links enclosed in the document.

For this latter aspect, note that some portion of code starts with a path in comment, and are followed by a link, like so:

/* code/sql/HW_HelloWorld.sql */
SELECT "Hello World!";

This means that this code can be found at

and that you can click the link below the code directly to access it1.

The SQL code frequently starts with


This parts starts by deleting the schema HW_NAME_OF_SCHEMA if it exists, then create and use it: it allows the code to run independently of your installation. It needs to be used with care, though, since it would delete everything you have in the HW_NAME_OF_SCHEMA schema before re-creating it, but empty.

Finally, the comments

-- start snippet something


-- end snippet something

can be ignored, as their are an artifice from pandoc-include-code to select which portion of the code to display in those notes.

Planned Schedule

A typical (meeting twice a week, ±17 weeks, ±30 classes) semester is divided as follows:

For information purposes, an indication like this:

marks the (usual) separation between two lectures.

Exams Yearbooks

To give you a sense of what you will be asked during the exams, quizzes and projects, or simply to practise, please find below the exams given previous semesters, in reverse chronological order. The quizzes are not indicated, but were generally a mix of up to five exercises and one problem from the relevant chapter(s).

Fall 2021

Spring 2021

Fall 2020

Spring 2020

Due to the Covid-19 pandemic, only one exam took place, and the final exam was taken remotely on D2L. A second project, more ambitious, was also asked from the students, and accounted for a large portion of their grade.

Fall 2019

Spring 2019

Spring 2018

Fall 2017

Typesetting and Acknowledgments

The source code for those notes is hosted at rocketgit, typeset in markdown, and then compiled using pandoc and multiple filters (pandoc-numbering, the citeproc library, pandoc-include-code). The drawings use various LaTeX packages, including PGF, TikZ, tikz-er2, pgf-umlcd and tikz-dependency. The help from the TeX - LaTeX Stack Exchange community greatly improved this document. The u͟n͟d͟e͟r͟l͟i͟n͟e͟3 text is obtained using YayText, the unicode symbols are searched in the “Unicode characters and corresponding LaTeX math mode commands”. Finally, the pdf version of the document uses Linux Libertine fonts, the html version uses Futura.

Those lecture notes were created under an Affordable Learning Georgia Mini-Grant for Ancillary Materials Creation and Revision (Proposal M71).

Affordable Learning Georgia 

Those lecture notes have greatly benefited from the contributions of many students, included but not limited to Crystal Anderson, Bobby Mcmanus, Minh Nguyen and Poonam Veeral. Additionally, (Redacted), Mark Holcomb, Assya Sellak, Sydney Strong and Patrick Woolard helped smash some bugs in the tools used to produce this document.

Please refer to for a detail of the contributions.


You can find at the end of this document the list of references, and some particular resources listed at the beginning of each chapter. Let me introduce some of them:

Those resources are listed as complements, but it is not require to read them to understand the content of those notes. (Watt and Eng 2014) –being available free of charge– is more descriptive than the current notes, and as such can constitutes a great complement. Unfortunately, it lacks some technical aspects, and the database program aspect is not discussed in detail.



The Need for a Specialized Tool

There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.

In theory, pretty much any programming language can be used to

But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.

In those lecture notes, we will introduce one of this tool–the SQL programming language– and the theory underneath it–the relational model–. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL offered by the NoSQL approach, as well as the limitations and highlights of both models.

What is a Database?

A database (DB) is a collection of related data.

It has two components, the data (= information, can be anything, really) and the management (= logical organization) of the data, generally through a Database Management System.

A database

  1. Represents a mini-world, a “Universe of Disclosure” (UoD).
  2. Is logically coherent, with a meaning.
  3. Has been populated for a purpose.

The mini-world is the part of the world, or universe, that will be represented in the database: as we can not represent the whole universe (every position of every atom at any given moment since the big-bang!), we must agree on what “slice” of it we should represent. Typically, a data-base designed to help in calculating students’ grades will include students’ names, transcript, classes taken, etc., but certainly not their height, favorite color or where they usually seat in class: altough all of this information is “part of the universe”, we will not need it and decide to exclude it from our data.

A DBMS has multiple components, as follows:

A Simplified DBMS

Note that

Database Management System (DBMS)

A DBMS contains a general purpose software that is used to

  1. Define (= datatype, constraints, structures, etc.)
  2. Construct / Create the data (= store the data)
  3. Manipulate / Maintain (= change the structure, query the data, update it, etc.)
  4. Share / Control access (= among users, applications)

You can think of a tool to

  1. Specify a storage unit,
  2. Fill it,
  3. Allow to change its content, as well as its organization,
  4. Allow multiple persons to access all or parts of it at the same time.

How Are the Tasks Distributed?

Exactly like a program can have

a DBMS offers multiple (sub)tasks and can be interacted with different persons with different roles.

Role Task
Client Specify the business statement, the specifications
DB Administrator Install, configure, secure and maintain up-to-date the DBMS
Designer Lay out the global organization of the data
Programmer Implement the database, work on the programs that will interface with it
User Provide, search, and edit the data (usually)

In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.

Life of a Project

From the business statement to the usage, a project generally follows one of this path:

The life of a project

Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.

An Example

Let us consider the following:


Name Student_number Class Major
Morgan 18 2 IT
Bob 17 1 CS


Course_name Course_number Credit_hours Department
Intro. to CS 1301 4 CS
DB Systems 3401 3 CS
Principles of Scripting and Automation 2120 3 AIST


Section_identifier Course_num Semster Year Instructor
2910 1301 Fall 2019 Kate
9230 2103 Spring 2020 Todd


Student_number Section_identifier Grade
17 2910 A
18 2910 B


Course_number Prerequisite_number
2120 1301
1302 1301

You can describe the structure as a collection of relations, and a collection of columns:


Relation Name Number of Columns


Column Name Datatype Belongs to relation
Name String STUDENT
Student_number Integer STUDENT
Class String STUDENT
Major String STUDENT
Course_name String COURSE
Course_number Integer COURSE
Credit_hours Integer COURSE
Department String COURSE
Prerequisite_number Integer PREREQUISITE


  • Database structure and records, 5 files (=collection of records), each containing data records of the same type, stored in a persistent way.
  • Each record has a structure, different data elements, each has a data type.
  • Records have relationships between them (for instance, you expect the Course_number of PREREQUISITE to occur as a Course_number in COURSE).


  • This organization will allow some interactions. For instance, we can obtain the answer to questions like

    “What is the name of the course whose number is 1301?”,
    “What courses is Kate teaching this semester?”,
    “Does Bob meets the pre-requisite for 2910?”

    Note that this last query is a bit different, as it forces us to look up information in multiple relations.

  • We should also be able to perform updates, removal, addition of records in an efficient way (using auxiliary files (indexes), optimization).

  • Finally, selection (for any operation) requires care: do we want all the records, some of them, exactly one?


Why are the files separated like that? Why do not we store the section with the course with the students? For multiple reasons:

  • To avoid redundancy (“data normalization”), or having it controlled,
  • To controle multiple levels of access (multiple user interface),
  • Without sacrificing the usability!

In separating the datae, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.

How Is a Database Conceived?

  1. Specification and analysis. “Each student number will be unique, but they can have the same name. We want to access the letter grade, but not the numerical grade”, etc. This gives the businnes statement.
  2. Conceptual design
  3. Logical design
  4. Physical design

There is a gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. When we will be discussing high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.

Characteristics of the Database Approach

  1. A database is more than just data: it also contains a complete description of the structure and constraints. We generally have a catalog (a.k.a. the meta-data, the schema) and the data (we can also have self-describing data, where meta-data and data are interleaved, but note that both are still present).
  2. Data-abstraction: A DBMS provides a conceptual representation, and hides implementation details. This implies that changing the internals of the database should not require to change the application (the DBMS) or the way any of the client (program, or CLI) was interacting with the data.
  3. Support of multiple views of the data: view is a subset of the database, or virtual data.
  4. Sharing and multiuser transaction processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety). Isolation, atomicity (all or nothing): cf. the ACID principles.


Exercise 1.1

What is the difference between a database and the meta-data of the database?

Exercise 1.2

Is a pile of trash a database? Why, or why not?

Exercise 1.3

Define the word “miniworld”.

Exercise 1.4

Expand the acronym “DBMS”.

Exercise 1.5

Name two DBMS.

Exercise 1.6

Name the four different kinds of action that can be performed on data.

Exercise 1.7

Assign each of the following task to one of the “character” (administrator, client, etc.) we introduced:

Task Assigned to
Install a DBMS on a server.  
Sketch the schema so that the data will not be redundant.  
Write client-side application that uses the DBMS API.  
Establish the purpose of the database.  
Exercise 1.8

List some of the tasks assigned to the Database Administrator.

Exercise 1.9

Why do DBMS include concurrency control?

Exercise 1.10

Do I have to change my DBMS if I want to change the structure of my data?

Exercise 1.11

What is independence between program and data? Why does it matter?

Exercise 1.12

Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) being stored in the same file? Why, or why not?

Exercise 1.13

Which one comes first, the physical design, the conceptual design, or the logical design?

Exercise 1.14

What is a virtual data? How can I access it?

Solution to Exercises

Solution 1.1

The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.

Solution 1.2

No, because it lacks a logical structure.

Solution 1.3

The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.

Solution 1.4

Database Management System

Solution 1.5

Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. Are not valid “SQL”, “NoSQL”, “Relational Model”, or such: we are asking for the names of actual softwares!

Solution 1.6

The four actions are:

  • Add / Insert
  • Update / Modify
  • Search / Query
  • Delete / Remove
Solution 1.7

We can have something like:

Task Assigned to
Install a DBMS on a server. Administrator, IT service
Sketch the schema so that the data will not be redundant. Designer
Write client-side application that uses the DBMS API. Programmer, Developer
Establish the purpose of the database. Client, business owner
Solution 1.8

The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.

Solution 1.9

DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. It is to avoid inconsistency to appear in the data.

Solution 1.10

Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model”, e.g., one of the NoSQL models, can require to change the DBMS.

Solution 1.11

The application should not be sensible to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the data will not require to change the programs.

Solution 1.12

If we were to store all the information about the classes in the student records, then we would have to store it as many time as its number of students! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.

Solution 1.13

The conceptual design.

Solution 1.14

It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not an information stored in the database.


Problem 1.1 (Define a database for CAMPUS)

Define a CAMPUS database organized into three files as follows:

  • A BUILDING file storing the name and GPS coordinates of each building.
  • A ROOM file storing the building, number and floor of each room.
  • A PROF file storing the name, phone number, email and room number where the office is located for each professor.
Pb 1.1 – Question 1

A database catalog is made of two part: a table containing the relations’ name and their number of columns, and a table containing the columns’ name, their data type, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, fig. 1.3) or (Elmasri and Navathe 2015, fig. 1.3). Write the database catalog corresponding to the CAMPUS database.

Pb 1.1 – Question 2

Invent data for such a database, with two buildings, three rooms and two professors.

Pb 1.1 – Question 3

Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS database, which is an up-to-date and accurate representation of its miniworld:

  1. Is it possible to list all the professors?
  2. Is it possible to tell in which department is a professor?
  3. Is it possible to get the office hours of a professor?
  4. Is it possible to list all the professors whose offices are in the same building?
  5. Is it possible to list all the rooms?
  6. If a new professor arrives, and has to share his office with another professor, do you have to revise your database catalog?
  7. Can you list which professors are at the same floor?
  8. Can you tell which professor has the highest evaluations?

Solutions to Selected Problems

Solution to Problem 1.1 (Define a database for CAMPUS)
Pb 1.1 – Solution to Q. 1

The database catalog should be similar to the following:


Relation name Number of columns


Column name Datatype Belongs to relation
Building_Name Character(30) Building
GPSLat Decimal(9,6) Building
GPSLon Decimal(9,6) Building
Building_Name Character(30) ROOM
Room_Number Integer(1) ROOM
Floor Integer (1) ROOM
Prof_Name Character (30) PROF
Phone Integer (10) PROF
Email Character (30) PROF
Room_Number Integer (1) PROF
Pb 1.1 – Solution to Q. 2

For the data, you could have:

  • For the BUILDING file, we could have:
(Allgood Hall, 33.47520, -82.02503)
(Institut Galilé,  48.959001, 2.339999)
  • For the ROOM file, we could have:
(Allgood Hall, 128, 1)
(Institut Galilé, 205, 3)
(Allgood Hall, 228, 2)
  • For the PROF file, we could have:
(Aubert, 839401,, 128)
(Mazza, 938130,, 205)
Pb 1.1 – Solution to Q. 3

If everything we knew about the campus came from that database, then

  1. Yes, we could list all the professors.
  2. No, we could not tell in which department is a professor.
  3. No, we could not get the office hours of a professor.
  4. Yes, we could list all the professors whose offices are in the same building.
  5. Yes, we could list all the rooms.
  6. If a new professor arrives, and has to share his office with another professor, we would not have to revise our database catalog (it is fine for two professor to have the same room number, in our model).
  7. Yes, we could list which professors are at the same floor.
  8. No, we could not tell which professor has the highest evaluations.

The Relational Model




The relational data model (or relational database schema) is:

Domains, Attributes, Tuples and Relations


  • Domain (or type) = set of atomic (as far as the relation is concerned) values. You can compare it to datatype and literals, and indeed it can be given in the form of a data type, but it can be named and carry a logical definition (i.e., List_of_major as an enumerated data type, instead of just String), enforce some constraints (i.e., UNIQUE, to force all the values to be different), or even have a default value.
  • Attribute = Attribute name + attribute domain (but we’ll just write the name).
  • Relation Schema (or scheme) = description of a relation, often written “RELATION_NAME(Attribute1, …, Attributen)”, where n is the degre (arity) of the relation, and the domain of Attributei is written dom(Attributei).
  • Tuple t of the schema R(A1, …, An) is an ordered list of values <v1, …, vn> where vi is in dom(Ai) or a special NULL value.
  • Relation (or relation state) r of the schema R(A1, …, An), also written r(R), is the set of n-tuples t1, …, tm where each ti is a tuple of the schema R(A1, …, An).

Characteristics of Relations

  • In a relation, the order of tuples does not matter (a relation is a set). Order in tuple do matter (alternate representation where this is not true exist, cf. self-describing data).
  • Value is atomic = “flat relational model”, we will always be in the first normal form (not composite, not multi-valued).
  • NULL is N/A, unknown, unavailable (or withheld).
  • While a relation schema is to be read like an assertion (e.g., “Every student has a name, a SSN, …”) a tuple is a fact (e.g., “The student Bob Taylor has SSN 12898, …”).
  • Relations represents uniformly entities (STUDENT(…)) and relations (PREREQUISITE(Course_number, Prerequisite_number)).


  • STUDENT = relation schema + current relation state
  • STUDENT(Name, …, Major) = relation schema only
  • STUDENT.Name = Attribute Name in the relation STUDENT
  • t[Name], t[Name, Major], t.Name (overloading the previous notation) for the value of Name (and Major) in the tuple t.


We now study constraints on the tuples. There are constraints on the scheme, for instance, “a relation cannot have two attributes with the same name”, but we studied those already. The goal of those constraints is to maintain the validity of the relations, and to enforce particular connexions between relations.

Inherent Model-Based Constraints (implicit)

Those are part of the definition of the relational model and are independent of the particular relation we are looking at.

  • You can not have two identical tuples in the same relation,
  • The arity of the tuple must match the arity of the relation.

Schema-Based Constraints (explicit)

Those constraints are parts of the schema.

  • The value must match its domain (“Domain constraint”), knowing that a domain can have additional constraints (NOT NULL, UNIQUE).
  • The entity integrity constraint: no primary key value can be NULL5.
  • The referential integrity constraint: referred values must exist.

Those last two constraints will be studied in the next section.

Application-Based Constraints (semantics)

Constraints that cannot be expressed in the schema, and hence must be enforced by

  • the application program,
  • or the database itself, using triggers or assertions.

Examples: “the age of an employee must be greater than 16”, “this year’s salary increase must be more than last year’s”.


Since we can not have two identical tuples in the same relation, there must be a subset of values that distinguish them. We study the corresponding subset of attributes.

Let us consider the following example:

Yellow Square 10 (5, 3)
Blue Rectangle 10 (3, 9)
Blue Circle 9 (4, 6)

and the following sets of attributes:

{A, B, C, D} {A} {B, C} {D}
Superkey ?
Key ?

Note that here we “retro-fit” those definitions, in database design, they come first (i.e., you define what attributes should always distinguish between tuples before populating your database). We are making the assumption that the data pre-exist to the specification to make the concept clearer.

Foreign Keys

A foreign key (FK) is a set of attributes whose values must match the value in a tuple in another, pre-defined relation. Formally, the set of attributes FK in the relation schema R1 is a foreign key of R1 (“referencing relation”) that references R2 (“referenced relation”) if

If there is a foreign key from R1 to R2, then we say that there is a referential integrity constraint from R1 to R2. We draw it with an arrow from the FK to the PK. Note that it is possible that R1 = R2.


CAR(VIN (PK), Make, Model, Year) DRIVER(State (PK), Licence_number (PK), Name, Address) INSURANCE(Policy_Number (PK), Insured_Car (FK to CAR.VIN), Insured_Driver_State (FK to DRIVER.State), Insured_Driver_Num (FK to DRIVER.Licence_number), Rate) PRICE(Stock_number (PK), Car_Vin (FK to CAR.VIN), Price, Margin)

Transactions and Operations

The operations you can perform on your data are of two kinds: retrievals and updates.

They are two constraints for updates:

  1. The new relation state must be “valid” (i.e., comply with the constraints).
  2. There might be transition constraints (your balance cannot become negative, for instance).

A transaction is a series of retrievals and updates performed by an application program, that leaves the database in a consistent state.

In the following, we give examples of insertion, deletion and update that could be performed, as well as how they could lead a database to become inconsistent. The annotations (1.), (2.) and (3.) refer to the “remedies”, discussed afterward.


Insert <109920, Honda, Accord, 2012> into CAR

How things can go wrong:

  • Inserting the values in the wrong order (meta)
  • NULL for any value of the attributes of the primary key (1.)
  • Duplicate value for all the values in the primary key (1.)
  • Wrong number of arguments (1.)
  • Fail to reference an existing value for a foreign key (1.)


Delete the DRIVER tuple with State = GA and Licence_number = 123

How things can go wrong:

  • Deleting tuples inadvertently (meta)
  • Deleting tuples that are referenced (1., 2., 3.)

Update (a.k.a. Modify)

Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges

How things can go wrong:

  • NULL for the any value of the attributes of the primary key (1.)
  • Duplicate value for the primary key (1.)
  • Change value that are referenced (1., 2., 3.)
  • Change foreign key to a non-existing value (1.)

Dealing with Violations

When the operation leads the database to become inconsistent, you can either:

  1. Reject (restrict) the operation,
  2. Cascade (propagate) the modification,
  3. Set default, or set NULL, the corresponding value(s).


Exercise 2.1

What are the meta-data and the data called in the relational model?

Exercise 2.2

Connect the dots:

Row •   • Attribute
Column header •   • Tuple
Table •   • Relation
Exercise 2.3

What do we call the number of attributes in a relation?

Exercise 2.4

At the logical level, does the order of the tuples in a relation matter?

Exercise 2.5

What is the difference between a database schema and a database state?

Exercise 2.6

What should we put as a value in an attribute if its value is unknown?

Exercise 2.7

What, if any, is the difference between a superkey, a key, and a primary key?

Exercise 2.8

Name the two kinds of integrity that must be respected by the tuples in a relation.

Exercise 2.9

What is entity integrity? Why is it useful?

Exercise 2.10

Are we violating an integrity constraint if we try to set the value of an attribute that is part of a primary key to NULL? If yes, which one?

Exercise 2.11

If in a relation R1, an attribute A1 is a foreign key referencing an attribute A2 in a relation R2, what does this implies about A2?

Exercise 2.12

Give three examples of operations.

Exercise 2.13

What is the difference between an operation and a transaction?

Exercise 2.14

Consider the following two relations:

COMPUTER(Owner, RAM, Year, Brand)
OS(Name, Version, Architecture)

For each, give

  1. The arity of the relation,
  2. A (preferably plausible) example of tuple to insert.
Exercise 2.15

Give three different ways to deal with operations whose execution in isolation would result in the violation of one of the constraint.

Exercise 2.16

Define what is the domain constraint.

Exercise 2.17

Circle the correct statements:

  • Every key is a superkey.
  • Every superkey is a singleton.
  • Every singleton is either a superkey, or a key.
  • Every primary key is a key.
  • Every superkey with one element is a key.
Exercise 2.18

Consider the following three relations:

AUTHOR(Ref, Name, Address) BOOK(ISSN, AuthorRef, Title) GAINED-AWARD(Ref, Name, BookISSN, Year)  

For each relation, answer the following:

  1. What is, presumably, the primary key?
  2. Are they, presumably, any foreign key?
  3. Using the model you defined, could we determine which author won the greatest number of awards a particular year?
Exercise 2.19

Consider the following three relations

TRAIN(Ref (PK), Model, Year) CONDUCTOR(CompanyID (PK), Name, ExperienceLevel) ASSIGNED-TO(TrainRef (PK, FK to TRAIN.Ref), ConductorID (PK, FK to CONDUCTOR.CompanyID), Date (PK))  

  1. What are the foreign keys in the ASSIGNED-TO relation? What are they refering?

  2. In the ASSIGNED-TO relation, explain why the Date attribute is part of the primary key. What would happen if it was not?

  3. Assuming the database is empty, are the following instructions valid? If not, what integrity constraint are they violating?

    1. Insert <'AM-356', 'Surfliner', 2012> into TRAIN
    2. Insert <NULL, 'Graham Palmer', 'Senior'> into CONDUCTOR
    3. Insert <'XB-124', 'GPalmer', '02/04/2018'> into ASSIGNED-TO
    4. Insert <'BTed, 'Bobby Ted', 'Senior'> and <'BTed', 'Bobby Ted Jr.', 'Junior'> into CONDUCTOR
Exercise 2.20

Consider the following relation schema and state:

2 Blue Austin true
1 Yellow Paris true
1 Purple Pisa false
2 Yellow Augusta true

Assuming that this is all the data we will ever have, discuss whenever {A, B, C, D}, {A, B} and {B} are superkeys and/or keys.

Exercise 2.21

Consider the following relation and possible state. Assuming that this is all the data we will ever have, give two superkeys, and one key, for this relation.

1 Austin true Shelly
1 Paris true Cheryl
3 Pisa false Sheila
1 Augusta true Ash
1 Pisa true Linda
Exercise 2.22

Consider the following relation and possible state. Assuming that this is all the data we will ever have, give three superkeys for this relation, and, for each of them, indicate if they are a key as well.

1 A Austin true
2 B Paris true
1 C Pisa false
2 C Augusta true
1 B Augusta true
Exercise 2.23

Consider the following two relations:

BUILDING(Name (PK), Address) ROOM(Code (PK), Building (FK to BUILDING.Name))  

  1. Give two possible tuples for the BUILDING relation, and two possible tuples for the ROOM relation such that the state is consistent.
  2. Based on the data you gave previously, write (in pseudo-code) one INSERT and one UPDATE instruction. Both should violate the integrity of your database.
Exercise 2.24

Consider the following two relations:

  • A Movie relation, with attributes “Title” and “Year”. The “Title” attribute should be the primary key.
  • A Character relation, with attributes “Name”, “First_Appearance”. The “Name” attribute should be the primary key, and the “First_Appearance” attribute should be a foreign key referencing the Movie relation.
  1. Draw its relational model.
  2. Give an example of data that would violate the integrity of your database, and name the kind of integrity you are violating.

Solution to Exercises

Solution 2.1

The meta-data is called the schema, and the data is called the relation state. You can refer to the diagram we studied at the beginnig of the Chapter for a reminder.

Solution 2.2

Row is Tuple, Column header is Attribute, Table is Relation.

Solution 2.3

The degree, or arity, of the relation.

Solution 2.4

No, it is a set.

Solution 2.5

The schema is the organization of the database (the meta-data), while the state is the state is the content of the database (the data).

Solution 2.6


Solution 2.7

A superkey is a subset of attributes such that no two tuples have the same combination of values for all those attributes. A key is a minimal superkey, i.e., a superkey from which we cannot remove any attribute without losing the uniqueness constraint. The primary key is one of the candidate key, i.e., the key that was chosen.

Solution 2.8

Referential integrity and entity integrity.

Solution 2.9

Entity integrity ensures that each row of a table has a unique and non-null primary key value. It allows to make sure that every tuple is different from the others, and helps to “pick” elements in the database.

Solution 2.10

Yes, the entity integrity constraint.

Solution 2.11

Then we know that A2 is the primary key of R2, and that A1 and A2 have the same domain.

Solution 2.12

Reading from the database, performing UPDATE or DELETE operations.

Solution 2.13

An operation is an “atomic action” that can be performed on the database (adding an element, updating a value, removing an element, etc.). A transaction is a series of such operations, and the assumption is that, even if it can be made of operations that, taken individually, could violate a constraint, the overall transaction will leave the database in a consistent state.

Solution 2.14
  1. The arities of the relations are: COMPUTER has for arity 4, and OS has for arity 3.
  2. Examples of tuple to insert are (“Linda McFather”, 32, 2017, “Purism”), and (“Debian”, “Stable”, “amd64”).
Solution 2.15

An operation whose execution in isolation would result in the violation of a constraint can either a) be “restricted” (i.e., not executed), b) result in a propagation (i.e., the tuples that would violate a constraint are updated or deleted accordingly), or c) result in some values in tuples that would violate a constraint to be set to a default value, or the NULL value (this last option works only if the constraint violated is the referential entity constraint).

Solution 2.16

The requirement that each tuple must have for an attribute A an atomic value from the domain dom(A), or NULL.

Solution 2.17

“Every key is a superkey.”, “Every primary key is a key.” and “Every superkey with one element is a key.” are correct statements.

Solution 2.18

To answer 1 and 2, the diagram would become:

AUTHOR(Ref (PK), Name, Address) BOOK(ISSN (PK), AuthorRef (FK to AUTHOR.REF), Title) GAINED-AWARD(Ref (PK), Name, BookISSN (FK to BOOK.ISSN), Year)  

For the last question, the answer is yes: based on the ISSN of the book, we can retrieve the author of the book. Hence, knowing which book was awarded which year, by looking in the GAINED-AWARD table, gives us the answer to that question.

Solution 2.19
  1. In ASSIGNED-TO, TrainRef is a FK to TRAIN.Ref, and ConductorID is a FK to CONDUCTOR.CompanyID.
  2. In this model, a conductor can be assigned to different trains on different days. If Date was not part of the PK of ASSIGNED-TO, then a conductor could be assigned to only one train.
    1. Yes, this instruction is valid.
    2. No, it violates the entity integrity constraint: NULL can be given as a value to an attribute that is part of the PK.
    3. No, it violates the referential integrity constraint: 'XB-124 and 'GPalmer' are not values in TRAIN.Ref and CONDUCTOR.CompanyID.
    4. No, it violates the key constraint: two tuples cannot have the same value for the values of the primary key.
Solution 2.20
  • {A, B, C, D} is a superkey (the set of all the attributes is always a superkey), but not a superkey, as removing e.g. D would still make it a superkey.
  • {A, B} is a superkey and a key, as neither {A} nor {B} are keys.
  • {A} is not a key, and not a superkey: multiple tuples have the value 1.
Solution 2.21
For this relation, {A, B, C, D}, {A, B, C}, and {D} are superkey. Only the latter, {D}, is a key (for {A, B, C}, removing either A or C still gives a superkey).
Solution 2.22
Possible superkeys are {A, B, C, D}, {A, B, C}, {A, C, D}, {B, C, D}, {A, B}, {B, C} . The possible keys are {A, B} {A, C}, and {B, C}.
Solution 2.23
  1. For the BUILDING relation: <“A.H”, “123 Main St.”>, <“U.H.”, “123 Main St.”>. For the ROOM relation: <12, “A.H.”>, <15, “A.H.”>.
  2. INSERT <"A.H.", NULL> would violate the requirement not to have two tuples with the same value for the attributes that constitute the primary key in the BUILDING relation. UPDATE ROOM with CODE = 12 to Building = "G.C.C." would create an entry referencing a name in the BUILDING relation that does not exist.
Solution 2.24
  1. The relations would be drawn as follows:

MOVIE(Title (PK), Year) CHARACTER(Name(PK), First_Appearance (FK referencing MOVIE.Title))  

  1. Inserting <“Ash”, “Evil Dead”> into the CHARACTER relation would cause an error if the database was empty, since no movie with the primary key “Evil Dead” has been introduced yet: this would be a referential integrity constraint violation. To violate the entity integrity constraint, it would suffice to insert the value <NULL, 2019> into the MOVIE relation.


Problem 2.1 (Find a candidate key for the CLASS relation)

Consider the relation representing classes taught in a university:

CLASS(Major, Number, Section, Instructor, Term, Year, Time, Weekdays, Room)

The goal is to be able to have multiple offerings (classes) of courses over several semesters. Here are some examples of values for the attributes:

Attribute Possible Value
Number 1301, 3401, 1201, …
Section A, B, C, …
Instructor John Smith, Sophie Adams, …
Term Spring, Fall, …
Year 1990, 2010, …
Time 1400, 1230, 0900, …
Weekdays M, MW, MWF, …
Room UH 120, GCC 3014, …

List three possible candidate keys and describe under what conditions each candidate key would be valid.

Problem 2.2 (Design a relational model for a cinema company)

A cinema company wants you to design a relational model for the following set-up:

  • The company has movie stars. Each star has a name, birth date, and unique ID.
  • The company has the following information about movies: title, year, length, and genre. Each movie has a unique ID and features multiple stars.
  • The company owns movie theaters as well. Each theater has a name, address, and a unique ID.
  • Furthermore, each theater has a set of auditoriums. Each auditorium has a unique number, and seating capacity.
  • Each theater can schedule movies at show-times. Each show-time has a unique ID, a start time, is for a specific movie, and is in a specific theater auditorium.
  • The company sells tickets for scheduled show-times. Each ticket has a unique ticket ID and a price.

Problem 2.3 (Design a relational model for bills)

Propose a relational model for the following situation:

  • The database will be used to store all of the bills that are debated and voted on by the U.S. House of Representatives (HR). Each bill has a name, a unique sponsor who must be a member of the HR, and an optional date of when it was discussed.
  • It must record the name, political group, and beginning and expected end-of-term dates for each HR member.
  • It will also record the names of the main HR positions: Speaker, Majority Leader, Minority Leader, Majority Whip, and Minority Whip.
  • Finally, it will record the vote of every member of the HR for each bill.

Problem 2.4 (Relational model for universities)

Propose a relational model for the following situation:

  • You want to store information about multiple universities. A university has multiple departments, a name and a website.
  • Each department offers multiple courses. A course has a name, one (or multiple, when it is cross-listed) code, a number of credit hours.
  • A campus has a name, an address, and belong to one university.
  • A department has a contact address, a date of creation and a unique code.

Problem 2.5 (Relational model for an auction website)

We want to design a relational model for an auction website. Members (that can be buyers, sellers, both or neither) can participate in the sale of items.

  • Members are identified by a unique identifier and have an email address and a nickname.
  • Buyers have a unique identifier, a preferred method of payment and a shipping address.
  • Sellers have a unique identifier, a rating and a bank account number.
  • Items are offered by a seller for sale and are identified by a unique item number. Items also have a name and a starting bid price.
  • Members make bids for items that are for sale. Each bid has a unique identifier, a bidding price and a timestamp.

When creating your schema, do not add any new information, and try as much as possible to avoid relations that will create redundant data and NULL entries. Note that we should be able to uniquely determine the member account linked to the seller account, and similarly for buyers accounts. Furthermore, members can have at most one buyer and one seller account.

Problem 2.6 (Relational model for a pet shelter)

We want to design a relational model for an animal shelter, with three goals in mind: to keep track of the pets currently sheltered, of the veterinarian for each type of pet, and of each pet’s favorite toy (needed during a visit to the veterinarian!).

Follow the specification below:

  • An animal has a type (cat, fish, dog, etc.), an arrival date, a name, and an id number.
  • Every type of animal has a veterinarian.
  • A veterinarian has a name, a phone number, an email address, and a postal address.
  • Multiple types of animals can have the same veterinarian.
  • A toy has a location, a description, a name, and is best suited for a particular type of animal.
  • Each animal has at most one preferred toy.

When creating your schema (that you can draw at the back of previous page), do not add any new information (except possibly “id” attributes), and try as much as possible to avoid relations that will create redundant data and NULL entries. Identify the primary key for each relation that you create. When you are done, answer the true / false question below.

With your model … Yes No
…it is possible to determine which pet don’t have a favorite toy.    
…it is possible to determine what is the average stay in the shelter.    
…it is possible to determine if a pet’s favorite toy is best suited for their type.    
…it is possible for multiple types of animal to have the same veterinarian.    
…it is possible for multiple veterinarians to be attributed to the same type.    

Solutions to Selected Problems

Solution to Problem 2.1 (Find a candidate key for the CLASS relation)

We discuss four possible choices:

  1. {Major, Number, Section, Year} This key would be valid if there was only 1 semester per year.
  2. {Instructor, Term} This key would be valid if instructors were always teaching the same unique class each term (i.e., an instructor only teaching CSCI 3410 in the Fall, and nobody else teaching it during Fall).
  3. {Room, Weekdays, Time} This key would be valid if the same room was used all the time (accross years, and terms) for the same class. Note also that remote classes would probably become problematic.
  4. {Major, Number, Term, Year} This key would be valid if no two sections of the same class was offered at the same time.

All in all, {Major, Number, Term, Year, Section} seems like the safest choice.

Solution to Problem 2.2 (Design a relational model for a cinema company)

A possible solution is:

STAR(ID (PK), Name, BirthDate) MOVIE(ID (PK), Title, Year, Length, Genre) FEATURE-IN(StarId (PK, FK to STAR.ID), MovieId (PK, FK to MOVIE.ID)) THEATER(ID (PK), Name, Address) AUDITORIUM(ID (PK), Capacity, Theater (FK to THEATER.ID)) SHOWTIME(ID (PK), MovieId (FK to MOVIE.ID), AuditoriumId (FK to AUDITORIUM.ID), StartTime) TICKETS(ID (PK), ShowTimeId (FK to SHOWTIME.ID), Price)

Solution to Problem 2.3 (Design a relational model for bills)

Be careful: saying that a bill has a unique sponsor does not imply that a the sponsor is a good primary key for the bills: a house member could very well be the sponsor of multiple bills! It just implies that a single attribute is enough to hold the name of the sponsor.

BILL(Name, Sponsor (FK to MEMBER.ID), Date, ID (PK)) MEMBER(Name, Political Group, BTerm, ETerm, ID (PK)) REPRESENTATIVE(Role (PK), Member (FK to MEMBER.ID)) VOTE(Bill (PK, FK to BILL.ID), Member (PK, FK to MEMBER.ID), Vote)  

For simplicity, we added an ID to our MEMBER and BILL relations. Note that having a “role” in the MEMBER relation to store the information about speaker, etc., would be extremely inefficient, since we would add an attribute to the ~435 members that would be NULL in ~430 of them.

Solution to Problem 2.4 (Relational model for universities)

A possible solution follows. The part that is the hardest to accomodate is the fact that a course can have multiple codes. We are reading here “cross-listed” as “a course that is offered under more than one departmental heading and can receive different codes (e.g., CSCI XXXX and AIST YYYY)”.

UNIVERSITY (Name (PK), Website) CAMPUS (Address (PK), University (FK to UNIVERSITY.Name)) DEPARTMENT (Code (PK), Contact, CreationDate, University (FK to UNIVERSITY.Name)) COURSE (Name (PK), CreditHours) OFFERING (Department (PK, FK to DEPARTMENT.Name), Course (PK, FK to COURSE.Name), Code)

Solution to Problem 2.6 (Relational model for a pet shelter)

A possible solution follows.

TYPE(Veterinarian(FK to VETERINARIAN.Id), Name (PK)) VETERINARIAN (Name, Phone, Email, Address, Id (PK)) ANIMAL (Name, ArrivalDate, FavoriteToy (FK to TOY.ID), Type (FK to TYPE.Name), Id (PK)) TOY (Id (PK), Location, Description, Name, BestSuited (FK to TYPE.Name))  

In this model,
…it is possible to determine which pet don’t have a favorite toy.
…it is not possible to determine what is the average stay in the shelter, because their exit date is not stored.
…it is possible to determine if a pet’s favorite toy is best suited for their type.
…it is possible for multiple types of animal to have the same veterinarian, as the same value for “Veterinarian” could occur in multiple tuples in the TYPE relation. If both “Veterinarian” and “Name” were parts of the primary key, then that would not be the case.
…it is not possible for multiple veterinarians to be attributed to the same type, as the name of the type is the primary key in the TYPE relation.

The SQL Programming Language


This chapter will be “code-driven”: the code will illustrate and help you understand some concepts. You may want to have a look at the “Setting Up Your Work Environment” Section as early as possible in this lecture. On top of being a step-by-step guide to install and configure a relational database managment system, it contains a list of useful links.



  • There are other models than relational: document-based, graph, column-based, and key-value models. Those corresponds to the “NoSQL” data-model, that are often more flexible, but only defined by opposition. They will be studied separately, in the Presentation of NoSQL Chapter.
  • The most commons DBMS are relational database management system (RDBMS): Most of them supports semi-structured data, i.e., models that are not strictly speaking relational, some are “multi-model DBMS”.
  • The Structured Query Language (SQL) is the language for RDBMS, it is made of 4 sublanguages:
    • Data Query Language,
    • Data Definition Language (schema creation and modification),
    • Data Control Language (authorizations, users),
    • Data Manipulation Language (insert, update and delete).


Yet Another Vocabulary

“Common” / Relational SQL
“Set of databases” Catalog (named collection of schema)7
“Database” Schema
Relation Table
Tuple Row
Attribute Column, or Field

Schema Elements

A schema is made of

  • Tables (≈ relation)
  • Type (≈ datatype)
  • Domain (≈ more complex datatype)
  • View (result set of a stored query on the data, ≈ saved search)
  • Assertion (constraints, transition constraints)
  • Triggers (tool to automate certain actions after pre-defined operations are performed)
  • Stored procedures (≈ functions)

Type and domains are two different things in some implementations, cf. for instance PostgreSQL, where a domain is defined to be essentially a datatype with constraint.8


SQL is a programming language: it has a strict syntax, sometimes cryptic error messages, it evolves, etc. Some of its salient aspects are:


The following is an adaptation of, the canonical source being MySQL’s documentation:

  • For integer types, you can use INTEGER (or its short-hand notation INT) or SMALLINT.
  • For floating-point types, you can use FLOAT and DOUBLE (or its synonym, REAL). MySQL also allows the syntax FLOAT(M,D) or REAL(M,D), where the values can be stored up to M digits in total where D represents the decimal point.
  • For monetary amounts, it is recommended to use DECIMAL(10, 2) (or its synonym in MySQL NUMERIC).
  • Characters can be stored using CHAR and VARCHAR: the length (resp. maximal length) of the CHAR (resp. VARCHAR) has to be declared, and CHAR are right-padded with spaces to the specified length. Historically, 255 was the size used, because it is the largest number of characters that can be counted with an 8-bit number, but, whenever possible, the “right size” should be used.
  • You can store a single bit using BIT(1), and a boolean using BOOLEAN (or BOOL, both actually being aliases for TINYINT(1)).
  • For date and time types, you can use DATE, TIME, DATETIME and TIMESTAMP (which convert the current day / time to from the current time zone to UTC).

There are many other datatypes, but they really depends on the particular implementation, so we will not consider them too much.

First Commands

/* code/sql/HW_Faculty.sql */
-- We first drop the schema if it already exists:

-- Then we create the schema:

Or we could have use the syntax:

-- Now, let us create a table in it:
  Fname VARCHAR(15),
   No String!
   The value "15" vas picked randomly, any value below 255 would
   more or less do the same. Note that declaring extremely large
   values without using them can impact the performance of
   your database, cf. for instance
  Room INT,
   shorthand for INTEGER, are also available: SMALLINT, FLOAT, REAL, DEC
   The "REAL" datatype is like the "DOUBLE" datatype of C# (they are actually synonyms in SQL):
   more precise than the "FLOAT" datatype, but not as exact as the "NUMERIC" datatype.
  Title CHAR(3),
  -- fixed-length string, padded with blanks if needed
  Tenured BIT(1),
  -- True / False (= 0) / Unknown
  Hiring DATE,
   The DATE is always supposed to be entered in a YEAR/MONTH/DAY variation.
   To tune the way it will be displayed, you can use the "DATE_FORMAT" function
   but you can enter those values only using the "standard" literals
   (cf. )
  Last_seen TIME,
  FavoriteFruit ENUM ('apple', 'orange', 'pear'),