CSCI 3410 - Database Systems

Lecture Notes

Clément Aubert

August 8, 2022 (05:47:16 PM)

List of Problems

Preamble

Disclaimer

As of August 2022, the main author of those notes (Dr. Aubert) is not scheduled to teach CSCI 3410 - Database Systems in the forseeable future. As a result, those notes are archived.

Please, also note that pandoc-include-code and pandoc-numbering, required to compile those notes, are not compatible with the current version of pandoc (cf. the INSTALL.md instructions). As a result, compiling those notes will require increasing version tinkering.

How to Use This Guide

How to Read This Guide

These lecture notes are written in an elusive style: they are a support for the explanations that will be made at the board. Reading them before coming to the lecture will help you getting a sense of the next topic we will be discussing, but you may sometimes have trouble deciphering their … unique style.

On top of the notes, you will find in this document:

Any feedback is greatly appreciated. Please refer to https://spots.augusta.edu/caubert/db/ln/README.html#contributing for how to contribute to those notes. The syllabus is at https://spots.augusta.edu/caubert/db/, and the webpage for those notes is at https://spots.augusta.edu/caubert/db/ln/.

Please, refer to those notes using this entry (Aubert 2019):

@report{AubertCSCI3410-DatabaseSystems,
	author={Aubert, Clément},
	title={CSCI 3410 - Database Systems},
	url={https://spots.augusta.edu/caubert/db/ln/}, 
	urldate={2019-11-03},
	year={2019},
	institution={{School of Computer and Cyber Sciences, Augusta University}},
	location={Augusta, Georgia, USA},
	langid={en},
	type={Lecture notes}
}
entry.bib

How to Access the Code in This Guide

There are four way to access the code shared in those lecture notes:

  1. You can simply copy-and-paste it from the document and use it as it is.
  2. You can browse the source code of the code snippets at https://rocketgit.com/user/caubert/CSCI_3410/source/tree/branch/master/tree/notes/code to download them directly.
  3. You can clone the repository containing the notes, figures and code snippets to have a local copy of it. You can find instructions on how to do that at https://spots.augusta.edu/caubert/db/ln/README.html. Instructions on how to compile those notes and how to contribute are linked from this document, if you are curious.
  4. You can use the links enclosed in the document.

For this latter aspect, note that some portion of code starts with a path in comment, and are followed by a link, like so:

/* code/sql/HW_HelloWorld.sql */
SELECT "Hello World!";
HW_HelloWorld.sql

This means that this code can be found at

https://rocketgit.com/user/caubert/CSCI_3410/source/tree/branch/master/
blob/notes/code/sql/HW_HelloWorld.sql

and that you can click the link below the code directly to access it1.

The SQL code frequently starts with

DROP SCHEMA IF EXISTS HW_NAME_OF_SCHEMA;
CREATE SCHEMA HW_NAME_OF_SCHEMA;
USE HW_NAME_OF_SCHEMA;

This parts starts by deleting the schema HW_NAME_OF_SCHEMA if it exists, then create and use it: it allows the code to run independently of your installation. It needs to be used with care, though, since it would delete everything you have in the HW_NAME_OF_SCHEMA schema before re-creating it, but empty.

Finally, the comments

-- start snippet something

and

-- end snippet something

can be ignored, as their are an artifice from pandoc-include-code to select which portion of the code to display in those notes.

Planned Schedule

A typical (meeting twice a week, ±17 weeks, ±30 classes) semester is divided as follows:

For information purposes, an indication like this:


marks the (usual) separation between two lectures.

Exams Yearbooks

To give you a sense of what you will be asked during the exams, quizzes and projects, or simply to practise, please find below the exams given previous semesters, in reverse chronological order. The quizzes are not indicated, but were generally a mix of up to five exercises and one problem from the relevant chapter(s).

Fall 2021

Spring 2021

Fall 2020

Spring 2020

Due to the Covid-19 pandemic, only one exam took place, and the final exam was taken remotely on D2L. A second project, more ambitious, was also asked from the students, and accounted for a large portion of their grade.

Fall 2019

Spring 2019

Spring 2018

Fall 2017

Typesetting and Acknowledgments

The source code for those notes is hosted at rocketgit, typeset in markdown, and then compiled using pandoc and multiple filters (pandoc-numbering, the citeproc library, pandoc-include-code). The drawings use various LaTeX packages, including PGF, TikZ, tikz-er2, pgf-umlcd and tikz-dependency. The help from the TeX - LaTeX Stack Exchange community greatly improved this document. The u͟n͟d͟e͟r͟l͟i͟n͟e͟3 text is obtained using YayText, the unicode symbols are searched in the “Unicode characters and corresponding LaTeX math mode commands”. Finally, the pdf version of the document uses Linux Libertine fonts, the html version uses Futura.

Those lecture notes were created under an Affordable Learning Georgia Mini-Grant for Ancillary Materials Creation and Revision (Proposal M71).

Affordable Learning Georgia 

Those lecture notes have greatly benefited from the contributions of many students, included but not limited to Crystal Anderson, Bobby Mcmanus, Minh Nguyen and Poonam Veeral. Additionally, (Redacted), Mark Holcomb, Assya Sellak, Sydney Strong and Patrick Woolard helped smash some bugs in the tools used to produce this document.

Please refer to https://spots.augusta.edu/caubert/db/ln/README.html#authors-and-contributors for a detail of the contributions.

Resources

You can find at the end of this document the list of references, and some particular resources listed at the beginning of each chapter. Let me introduce some of them:

Those resources are listed as complements, but it is not require to read them to understand the content of those notes. (Watt and Eng 2014) –being available free of charge– is more descriptive than the current notes, and as such can constitutes a great complement. Unfortunately, it lacks some technical aspects, and the database program aspect is not discussed in detail.

Introduction

Resources

The Need for a Specialized Tool

There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.

In theory, pretty much any programming language can be used to

But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.

In those lecture notes, we will introduce one of this tool–the SQL programming language– and the theory underneath it–the relational model–. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL offered by the NoSQL approach, as well as the limitations and highlights of both models.

What is a Database?

A database (DB) is a collection of related data.

It has two components, the data (= information, can be anything, really) and the management (= logical organization) of the data, generally through a Database Management System.

A database

  1. Represents a mini-world, a “Universe of Disclosure” (UoD).
  2. Is logically coherent, with a meaning.
  3. Has been populated for a purpose.

The mini-world is the part of the world, or universe, that will be represented in the database: as we can not represent the whole universe (every position of every atom at any given moment since the big-bang!), we must agree on what “slice” of it we should represent. Typically, a data-base designed to help in calculating students’ grades will include students’ names, transcript, classes taken, etc., but certainly not their height, favorite color or where they usually seat in class: altough all of this information is “part of the universe”, we will not need it and decide to exclude it from our data.

A DBMS has multiple components, as follows:

A Simplified DBMS

Note that

Database Management System (DBMS)

A DBMS contains a general purpose software that is used to

  1. Define (= datatype, constraints, structures, etc.)
  2. Construct / Create the data (= store the data)
  3. Manipulate / Maintain (= change the structure, query the data, update it, etc.)
  4. Share / Control access (= among users, applications)

You can think of a tool to

  1. Specify a storage unit,
  2. Fill it,
  3. Allow to change its content, as well as its organization,
  4. Allow multiple persons to access all or parts of it at the same time.

How Are the Tasks Distributed?

Exactly like a program can have

a DBMS offers multiple (sub)tasks and can be interacted with different persons with different roles.

Role Task
Client Specify the business statement, the specifications
DB Administrator Install, configure, secure and maintain up-to-date the DBMS
Designer Lay out the global organization of the data
Programmer Implement the database, work on the programs that will interface with it
User Provide, search, and edit the data (usually)

In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.

Life of a Project

From the business statement to the usage, a project generally follows one of this path:

The life of a project

Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.


An Example

Let us consider the following:

STUDENT

Name Student_number Class Major
Morgan 18 2 IT
Bob 17 1 CS

COURSE

Course_name Course_number Credit_hours Department
Intro. to CS 1301 4 CS
DB Systems 3401 3 CS
Principles of Scripting and Automation 2120 3 AIST

SECTION

Section_identifier Course_num Semster Year Instructor
2910 1301 Fall 2019 Kate
9230 2103 Spring 2020 Todd

GRADE_REPORT

Student_number Section_identifier Grade
17 2910 A
18 2910 B

PREREQUISITE

Course_number Prerequisite_number
2120 1301
1302 1301

You can describe the structure as a collection of relations, and a collection of columns:

RELATIONS

Relation Name Number of Columns
STUDENT 4
COURSE 4
SECCTION 5
GRADE_REPORT 3
PREREQUISITE 2

COLUMNS

Column Name Datatype Belongs to relation
Name String STUDENT
Student_number Integer STUDENT
Class String STUDENT
Major String STUDENT
Course_name String COURSE
Course_number Integer COURSE
Credit_hours Integer COURSE
Department String COURSE
Prerequisite_number Integer PREREQUISITE

Structure

  • Database structure and records, 5 files (=collection of records), each containing data records of the same type, stored in a persistent way.
  • Each record has a structure, different data elements, each has a data type.
  • Records have relationships between them (for instance, you expect the Course_number of PREREQUISITE to occur as a Course_number in COURSE).

Interactions

  • This organization will allow some interactions. For instance, we can obtain the answer to questions like

    “What is the name of the course whose number is 1301?”,
    “What courses is Kate teaching this semester?”,
    “Does Bob meets the pre-requisite for 2910?”

    Note that this last query is a bit different, as it forces us to look up information in multiple relations.

  • We should also be able to perform updates, removal, addition of records in an efficient way (using auxiliary files (indexes), optimization).

  • Finally, selection (for any operation) requires care: do we want all the records, some of them, exactly one?

Organization

Why are the files separated like that? Why do not we store the section with the course with the students? For multiple reasons:

  • To avoid redundancy (“data normalization”), or having it controlled,
  • To controle multiple levels of access (multiple user interface),
  • Without sacrificing the usability!

In separating the datae, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.

How Is a Database Conceived?

  1. Specification and analysis. “Each student number will be unique, but they can have the same name. We want to access the letter grade, but not the numerical grade”, etc. This gives the businnes statement.
  2. Conceptual design
  3. Logical design
  4. Physical design

There is a gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. When we will be discussing high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.

Characteristics of the Database Approach

  1. A database is more than just data: it also contains a complete description of the structure and constraints. We generally have a catalog (a.k.a. the meta-data, the schema) and the data (we can also have self-describing data, where meta-data and data are interleaved, but note that both are still present).
  2. Data-abstraction: A DBMS provides a conceptual representation, and hides implementation details. This implies that changing the internals of the database should not require to change the application (the DBMS) or the way any of the client (program, or CLI) was interacting with the data.
  3. Support of multiple views of the data: view is a subset of the database, or virtual data.
  4. Sharing and multiuser transaction processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety). Isolation, atomicity (all or nothing): cf. the ACID principles.

Exercises

Exercise 1.1

What is the difference between a database and the meta-data of the database?

Exercise 1.2

Is a pile of trash a database? Why, or why not?

Exercise 1.3

Define the word “miniworld”.

Exercise 1.4

Expand the acronym “DBMS”.

Exercise 1.5

Name two DBMS.

Exercise 1.6

Name the four different kinds of action that can be performed on data.

Exercise 1.7

Assign each of the following task to one of the “character” (administrator, client, etc.) we introduced:

Task Assigned to
Install a DBMS on a server.  
Sketch the schema so that the data will not be redundant.  
Write client-side application that uses the DBMS API.  
Establish the purpose of the database.  
Exercise 1.8

List some of the tasks assigned to the Database Administrator.

Exercise 1.9

Why do DBMS include concurrency control?

Exercise 1.10

Do I have to change my DBMS if I want to change the structure of my data?

Exercise 1.11

What is independence between program and data? Why does it matter?

Exercise 1.12

Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) being stored in the same file? Why, or why not?

Exercise 1.13

Which one comes first, the physical design, the conceptual design, or the logical design?

Exercise 1.14

What is a virtual data? How can I access it?

Solution to Exercises

Solution 1.1

The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.

Solution 1.2

No, because it lacks a logical structure.

Solution 1.3

The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.

Solution 1.4

Database Management System

Solution 1.5

Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. Are not valid “SQL”, “NoSQL”, “Relational Model”, or such: we are asking for the names of actual softwares!

Solution 1.6

The four actions are:

  • Add / Insert
  • Update / Modify
  • Search / Query
  • Delete / Remove
Solution 1.7

We can have something like:

Task Assigned to
Install a DBMS on a server. Administrator, IT service
Sketch the schema so that the data will not be redundant. Designer
Write client-side application that uses the DBMS API. Programmer, Developer
Establish the purpose of the database. Client, business owner
Solution 1.8

The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.

Solution 1.9

DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. It is to avoid inconsistency to appear in the data.

Solution 1.10

Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model”, e.g., one of the NoSQL models, can require to change the DBMS.

Solution 1.11

The application should not be sensible to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the data will not require to change the programs.

Solution 1.12

If we were to store all the information about the classes in the student records, then we would have to store it as many time as its number of students! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.

Solution 1.13

The conceptual design.

Solution 1.14

It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not an information stored in the database.

Problems

Problem 1.1 (Define a database for CAMPUS)

Define a CAMPUS database organized into three files as follows:

  • A BUILDING file storing the name and GPS coordinates of each building.
  • A ROOM file storing the building, number and floor of each room.
  • A PROF file storing the name, phone number, email and room number where the office is located for each professor.
Pb 1.1 – Question 1

A database catalog is made of two part: a table containing the relations’ name and their number of columns, and a table containing the columns’ name, their data type, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, fig. 1.3) or (Elmasri and Navathe 2015, fig. 1.3). Write the database catalog corresponding to the CAMPUS database.

Pb 1.1 – Question 2

Invent data for such a database, with two buildings, three rooms and two professors.

Pb 1.1 – Question 3

Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS database, which is an up-to-date and accurate representation of its miniworld:

  1. Is it possible to list all the professors?
  2. Is it possible to tell in which department is a professor?
  3. Is it possible to get the office hours of a professor?
  4. Is it possible to list all the professors whose offices are in the same building?
  5. Is it possible to list all the rooms?
  6. If a new professor arrives, and has to share his office with another professor, do you have to revise your database catalog?
  7. Can you list which professors are at the same floor?
  8. Can you tell which professor has the highest evaluations?

Solutions to Selected Problems

Solution to Problem 1.1 (Define a database for CAMPUS)
Pb 1.1 – Solution to Q. 1

The database catalog should be similar to the following:

RELATIONS

Relation name Number of columns
BUILDING 3
ROOM 3
PROF 4

COLUMNS

Column name Datatype Belongs to relation
Building_Name Character(30) Building
GPSLat Decimal(9,6) Building
GPSLon Decimal(9,6) Building
Building_Name Character(30) ROOM
Room_Number Integer(1) ROOM
Floor Integer (1) ROOM
Prof_Name Character (30) PROF
Phone Integer (10) PROF
Email Character (30) PROF
Room_Number Integer (1) PROF
Pb 1.1 – Solution to Q. 2

For the data, you could have:

  • For the BUILDING file, we could have:
(Allgood Hall, 33.47520, -82.02503)
(Institut Galilé,  48.959001, 2.339999)
  • For the ROOM file, we could have:
(Allgood Hall, 128, 1)
(Institut Galilé, 205, 3)
(Allgood Hall, 228, 2)
  • For the PROF file, we could have:
(Aubert, 839401, dae@ipn.net, 128)
(Mazza, 938130, Dm@fai.net, 205)
Pb 1.1 – Solution to Q. 3

If everything we knew about the campus came from that database, then

  1. Yes, we could list all the professors.
  2. No, we could not tell in which department is a professor.
  3. No, we could not get the office hours of a professor.
  4. Yes, we could list all the professors whose offices are in the same building.
  5. Yes, we could list all the rooms.
  6. If a new professor arrives, and has to share his office with another professor, we would not have to revise our database catalog (it is fine for two professor to have the same room number, in our model).
  7. Yes, we could list which professors are at the same floor.
  8. No, we could not tell which professor has the highest evaluations.

The Relational Model

Resources

Concepts

Terminology

The relational data model (or relational database schema) is:

Domains, Attributes, Tuples and Relations

Definitions

  • Domain (or type) = set of atomic (as far as the relation is concerned) values. You can compare it to datatype and literals, and indeed it can be given in the form of a data type, but it can be named and carry a logical definition (i.e., List_of_major as an enumerated data type, instead of just String), enforce some constraints (i.e., UNIQUE, to force all the values to be different), or even have a default value.
  • Attribute = Attribute name + attribute domain (but we’ll just write the name).
  • Relation Schema (or scheme) = description of a relation, often written “RELATION_NAME(Attribute1, …, Attributen)”, where n is the degre (arity) of the relation, and the domain of Attributei is written dom(Attributei).
  • Tuple t of the schema R(A1, …, An) is an ordered list of values <v1, …, vn> where vi is in dom(Ai) or a special NULL value.
  • Relation (or relation state) r of the schema R(A1, …, An), also written r(R), is the set of n-tuples t1, …, tm where each ti is a tuple of the schema R(A1, …, An).

Characteristics of Relations

  • In a relation, the order of tuples does not matter (a relation is a set). Order in tuple do matter (alternate representation where this is not true exist, cf. self-describing data).
  • Value is atomic = “flat relational model”, we will always be in the first normal form (not composite, not multi-valued).
  • NULL is N/A, unknown, unavailable (or withheld).
  • While a relation schema is to be read like an assertion (e.g., “Every student has a name, a SSN, …”) a tuple is a fact (e.g., “The student Bob Taylor has SSN 12898, …”).
  • Relations represents uniformly entities (STUDENT(…)) and relations (PREREQUISITE(Course_number, Prerequisite_number)).

Notation

  • STUDENT = relation schema + current relation state
  • STUDENT(Name, …, Major) = relation schema only
  • STUDENT.Name = Attribute Name in the relation STUDENT
  • t[Name], t[Name, Major], t.Name (overloading the previous notation) for the value of Name (and Major) in the tuple t.

Constraints

We now study constraints on the tuples. There are constraints on the scheme, for instance, “a relation cannot have two attributes with the same name”, but we studied those already. The goal of those constraints is to maintain the validity of the relations, and to enforce particular connexions between relations.

Inherent Model-Based Constraints (implicit)

Those are part of the definition of the relational model and are independent of the particular relation we are looking at.

  • You can not have two identical tuples in the same relation,
  • The arity of the tuple must match the arity of the relation.

Schema-Based Constraints (explicit)

Those constraints are parts of the schema.

  • The value must match its domain (“Domain constraint”), knowing that a domain can have additional constraints (NOT NULL, UNIQUE).
  • The entity integrity constraint: no primary key value can be NULL5.
  • The referential integrity constraint: referred values must exist.

Those last two constraints will be studied in the next section.

Application-Based Constraints (semantics)

Constraints that cannot be expressed in the schema, and hence must be enforced by

  • the application program,
  • or the database itself, using triggers or assertions.

Examples: “the age of an employee must be greater than 16”, “this year’s salary increase must be more than last year’s”.

Keys

Since we can not have two identical tuples in the same relation, there must be a subset of values that distinguish them. We study the corresponding subset of attributes.

Let us consider the following example:

A B C D
Yellow Square 10 (5, 3)
Blue Rectangle 10 (3, 9)
Blue Circle 9 (4, 6)

and the following sets of attributes:

{A, B, C, D} {A} {B, C} {D}
Superkey ?
Key ?

Note that here we “retro-fit” those definitions, in database design, they come first (i.e., you define what attributes should always distinguish between tuples before populating your database). We are making the assumption that the data pre-exist to the specification to make the concept clearer.

Foreign Keys

A foreign key (FK) is a set of attributes whose values must match the value in a tuple in another, pre-defined relation. Formally, the set of attributes FK in the relation schema R1 is a foreign key of R1 (“referencing relation”) that references R2 (“referenced relation”) if

If there is a foreign key from R1 to R2, then we say that there is a referential integrity constraint from R1 to R2. We draw it with an arrow from the FK to the PK. Note that it is possible that R1 = R2.


Example

CAR(VIN (PK), Make, Model, Year) DRIVER(State (PK), Licence_number (PK), Name, Address) INSURANCE(Policy_Number (PK), Insured_Car (FK to CAR.VIN), Insured_Driver_State (FK to DRIVER.State), Insured_Driver_Num (FK to DRIVER.Licence_number), Rate) PRICE(Stock_number (PK), Car_Vin (FK to CAR.VIN), Price, Margin)

Transactions and Operations

The operations you can perform on your data are of two kinds: retrievals and updates.

They are two constraints for updates:

  1. The new relation state must be “valid” (i.e., comply with the constraints).
  2. There might be transition constraints (your balance cannot become negative, for instance).

A transaction is a series of retrievals and updates performed by an application program, that leaves the database in a consistent state.

In the following, we give examples of insertion, deletion and update that could be performed, as well as how they could lead a database to become inconsistent. The annotations (1.), (2.) and (3.) refer to the “remedies”, discussed afterward.

Insert

Insert <109920, Honda, Accord, 2012> into CAR

How things can go wrong:

  • Inserting the values in the wrong order (meta)
  • NULL for any value of the attributes of the primary key (1.)
  • Duplicate value for all the values in the primary key (1.)
  • Wrong number of arguments (1.)
  • Fail to reference an existing value for a foreign key (1.)

Delete

Delete the DRIVER tuple with State = GA and Licence_number = 123

How things can go wrong:

  • Deleting tuples inadvertently (meta)
  • Deleting tuples that are referenced (1., 2., 3.)

Update (a.k.a. Modify)

Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges

How things can go wrong:

  • NULL for the any value of the attributes of the primary key (1.)
  • Duplicate value for the primary key (1.)
  • Change value that are referenced (1., 2., 3.)
  • Change foreign key to a non-existing value (1.)

Dealing with Violations

When the operation leads the database to become inconsistent, you can either:

  1. Reject (restrict) the operation,
  2. Cascade (propagate) the modification,
  3. Set default, or set NULL, the corresponding value(s).

Exercises

Exercise 2.1

What are the meta-data and the data called in the relational model?

Exercise 2.2

Connect the dots:

Row •   • Attribute
Column header •   • Tuple
Table •   • Relation
Exercise 2.3

What do we call the number of attributes in a relation?

Exercise 2.4

At the logical level, does the order of the tuples in a relation matter?

Exercise 2.5

What is the difference between a database schema and a database state?

Exercise 2.6

What should we put as a value in an attribute if its value is unknown?

Exercise 2.7

What, if any, is the difference between a superkey, a key, and a primary key?

Exercise 2.8

Name the two kinds of integrity that must be respected by the tuples in a relation.

Exercise 2.9

What is entity integrity? Why is it useful?

Exercise 2.10

Are we violating an integrity constraint if we try to set the value of an attribute that is part of a primary key to NULL? If yes, which one?

Exercise 2.11

If in a relation R1, an attribute A1 is a foreign key referencing an attribute A2 in a relation R2, what does this implies about A2?

Exercise 2.12

Give three examples of operations.

Exercise 2.13

What is the difference between an operation and a transaction?

Exercise 2.14

Consider the following two relations:

COMPUTER(Owner, RAM, Year, Brand)
OS(Name, Version, Architecture)

For each, give

  1. The arity of the relation,
  2. A (preferably plausible) example of tuple to insert.
Exercise 2.15

Give three different ways to deal with operations whose execution in isolation would result in the violation of one of the constraint.

Exercise 2.16

Define what is the domain constraint.

Exercise 2.17

Circle the correct statements:

  • Every key is a superkey.
  • Every superkey is a singleton.
  • Every singleton is either a superkey, or a key.
  • Every primary key is a key.
  • Every superkey with one element is a key.
Exercise 2.18

Consider the following three relations:

AUTHOR(Ref, Name, Address) BOOK(ISSN, AuthorRef, Title) GAINED-AWARD(Ref, Name, BookISSN, Year)  

For each relation, answer the following:

  1. What is, presumably, the primary key?
  2. Are they, presumably, any foreign key?
  3. Using the model you defined, could we determine which author won the greatest number of awards a particular year?
Exercise 2.19

Consider the following three relations

TRAIN(Ref (PK), Model, Year) CONDUCTOR(CompanyID (PK), Name, ExperienceLevel) ASSIGNED-TO(TrainRef (PK, FK to TRAIN.Ref), ConductorID (PK, FK to CONDUCTOR.CompanyID), Date (PK))  

  1. What are the foreign keys in the ASSIGNED-TO relation? What are they refering?

  2. In the ASSIGNED-TO relation, explain why the Date attribute is part of the primary key. What would happen if it was not?

  3. Assuming the database is empty, are the following instructions valid? If not, what integrity constraint are they violating?

    1. Insert <'AM-356', 'Surfliner', 2012> into TRAIN
    2. Insert <NULL, 'Graham Palmer', 'Senior'> into CONDUCTOR
    3. Insert <'XB-124', 'GPalmer', '02/04/2018'> into ASSIGNED-TO
    4. Insert <'BTed, 'Bobby Ted', 'Senior'> and <'BTed', 'Bobby Ted Jr.', 'Junior'> into CONDUCTOR
Exercise 2.20

Consider the following relation schema and state:

A B C D
2 Blue Austin true
1 Yellow Paris true
1 Purple Pisa false
2 Yellow Augusta true

Assuming that this is all the data we will ever have, discuss whenever {A, B, C, D}, {A, B} and {B} are superkeys and/or keys.

Exercise 2.21

Consider the following relation and possible state. Assuming that this is all the data we will ever have, give two superkeys, and one key, for this relation.

A B C D
1 Austin true Shelly
1 Paris true Cheryl
3 Pisa false Sheila
1 Augusta true Ash
1 Pisa true Linda
Exercise 2.22

Consider the following relation and possible state. Assuming that this is all the data we will ever have, give three superkeys for this relation, and, for each of them, indicate if they are a key as well.

A B C D
1 A Austin true
2 B Paris true
1 C Pisa false
2 C Augusta true
1 B Augusta true
Exercise 2.23

Consider the following two relations:

BUILDING(Name (PK), Address) ROOM(Code (PK), Building (FK to BUILDING.Name))  

  1. Give two possible tuples for the BUILDING relation, and two possible tuples for the ROOM relation such that the state is consistent.
  2. Based on the data you gave previously, write (in pseudo-code) one INSERT and one UPDATE instruction. Both should violate the integrity of your database.
Exercise 2.24

Consider the following two relations:

  • A Movie relation, with attributes “Title” and “Year”. The “Title” attribute should be the primary key.
  • A Character relation, with attributes “Name”, “First_Appearance”. The “Name” attribute should be the primary key, and the “First_Appearance” attribute should be a foreign key referencing the Movie relation.
  1. Draw its relational model.
  2. Give an example of data that would violate the integrity of your database, and name the kind of integrity you are violating.

Solution to Exercises

Solution 2.1

The meta-data is called the schema, and the data is called the relation state. You can refer to the diagram we studied at the beginnig of the Chapter for a reminder.

Solution 2.2

Row is Tuple, Column header is Attribute, Table is Relation.

Solution 2.3

The degree, or arity, of the relation.

Solution 2.4

No, it is a set.

Solution 2.5

The schema is the organization of the database (the meta-data), while the state is the state is the content of the database (the data).

Solution 2.6

NULL

Solution 2.7

A superkey is a subset of attributes such that no two tuples have the same combination of values for all those attributes. A key is a minimal superkey, i.e., a superkey from which we cannot remove any attribute without losing the uniqueness constraint. The primary key is one of the candidate key, i.e., the key that was chosen.

Solution 2.8

Referential integrity and entity integrity.

Solution 2.9

Entity integrity ensures that each row of a table has a unique and non-null primary key value. It allows to make sure that every tuple is different from the others, and helps to “pick” elements in the database.

Solution 2.10

Yes, the entity integrity constraint.

Solution 2.11

Then we know that A2 is the primary key of R2, and that A1 and A2 have the same domain.

Solution 2.12

Reading from the database, performing UPDATE or DELETE operations.

Solution 2.13

An operation is an “atomic action” that can be performed on the database (adding an element, updating a value, removing an element, etc.). A transaction is a series of such operations, and the assumption is that, even if it can be made of operations that, taken individually, could violate a constraint, the overall transaction will leave the database in a consistent state.

Solution 2.14
  1. The arities of the relations are: COMPUTER has for arity 4, and OS has for arity 3.
  2. Examples of tuple to insert are (“Linda McFather”, 32, 2017, “Purism”), and (“Debian”, “Stable”, “amd64”).
Solution 2.15

An operation whose execution in isolation would result in the violation of a constraint can either a) be “restricted” (i.e., not executed), b) result in a propagation (i.e., the tuples that would violate a constraint are updated or deleted accordingly), or c) result in some values in tuples that would violate a constraint to be set to a default value, or the NULL value (this last option works only if the constraint violated is the referential entity constraint).

Solution 2.16

The requirement that each tuple must have for an attribute A an atomic value from the domain dom(A), or NULL.

Solution 2.17

“Every key is a superkey.”, “Every primary key is a key.” and “Every superkey with one element is a key.” are correct statements.

Solution 2.18

To answer 1 and 2, the diagram would become:

AUTHOR(Ref (PK), Name, Address) BOOK(ISSN (PK), AuthorRef (FK to AUTHOR.REF), Title) GAINED-AWARD(Ref (PK), Name, BookISSN (FK to BOOK.ISSN), Year)  

For the last question, the answer is yes: based on the ISSN of the book, we can retrieve the author of the book. Hence, knowing which book was awarded which year, by looking in the GAINED-AWARD table, gives us the answer to that question.

Solution 2.19
  1. In ASSIGNED-TO, TrainRef is a FK to TRAIN.Ref, and ConductorID is a FK to CONDUCTOR.CompanyID.
  2. In this model, a conductor can be assigned to different trains on different days. If Date was not part of the PK of ASSIGNED-TO, then a conductor could be assigned to only one train.
    1. Yes, this instruction is valid.
    2. No, it violates the entity integrity constraint: NULL can be given as a value to an attribute that is part of the PK.
    3. No, it violates the referential integrity constraint: 'XB-124 and 'GPalmer' are not values in TRAIN.Ref and CONDUCTOR.CompanyID.
    4. No, it violates the key constraint: two tuples cannot have the same value for the values of the primary key.
Solution 2.20
  • {A, B, C, D} is a superkey (the set of all the attributes is always a superkey), but not a superkey, as removing e.g. D would still make it a superkey.
  • {A, B} is a superkey and a key, as neither {A} nor {B} are keys.
  • {A} is not a key, and not a superkey: multiple tuples have the value 1.
Solution 2.21
For this relation, {A, B, C, D}, {A, B, C}, and {D} are superkey. Only the latter, {D}, is a key (for {A, B, C}, removing either A or C still gives a superkey).
Solution 2.22
Possible superkeys are {A, B, C, D}, {A, B, C}, {A, C, D}, {B, C, D}, {A, B}, {B, C} . The possible keys are {A, B} {A, C}, and {B, C}.
Solution 2.23
  1. For the BUILDING relation: <“A.H”, “123 Main St.”>, <“U.H.”, “123 Main St.”>. For the ROOM relation: <12, “A.H.”>, <15, “A.H.”>.
  2. INSERT <"A.H.", NULL> would violate the requirement not to have two tuples with the same value for the attributes that constitute the primary key in the BUILDING relation. UPDATE ROOM with CODE = 12 to Building = "G.C.C." would create an entry referencing a name in the BUILDING relation that does not exist.
Solution 2.24
  1. The relations would be drawn as follows:

MOVIE(Title (PK), Year) CHARACTER(Name(PK), First_Appearance (FK referencing MOVIE.Title))  

  1. Inserting <“Ash”, “Evil Dead”> into the CHARACTER relation would cause an error if the database was empty, since no movie with the primary key “Evil Dead” has been introduced yet: this would be a referential integrity constraint violation. To violate the entity integrity constraint, it would suffice to insert the value <NULL, 2019> into the MOVIE relation.

Problems

Problem 2.1 (Find a candidate key for the CLASS relation)

Consider the relation representing classes taught in a university:

CLASS(Major, Number, Section, Instructor, Term, Year, Time, Weekdays, Room)

The goal is to be able to have multiple offerings (classes) of courses over several semesters. Here are some examples of values for the attributes:

Attribute Possible Value
Major CSCI, AIST, CYBER, HIST, …
Number 1301, 3401, 1201, …
Section A, B, C, …
Instructor John Smith, Sophie Adams, …
Term Spring, Fall, …
Year 1990, 2010, …
Time 1400, 1230, 0900, …
Weekdays M, MW, MWF, …
Room UH 120, GCC 3014, …

List three possible candidate keys and describe under what conditions each candidate key would be valid.


Problem 2.2 (Design a relational model for a cinema company)

A cinema company wants you to design a relational model for the following set-up:

  • The company has movie stars. Each star has a name, birth date, and unique ID.
  • The company has the following information about movies: title, year, length, and genre. Each movie has a unique ID and features multiple stars.
  • The company owns movie theaters as well. Each theater has a name, address, and a unique ID.
  • Furthermore, each theater has a set of auditoriums. Each auditorium has a unique number, and seating capacity.
  • Each theater can schedule movies at show-times. Each show-time has a unique ID, a start time, is for a specific movie, and is in a specific theater auditorium.
  • The company sells tickets for scheduled show-times. Each ticket has a unique ticket ID and a price.

Problem 2.3 (Design a relational model for bills)

Propose a relational model for the following situation:

  • The database will be used to store all of the bills that are debated and voted on by the U.S. House of Representatives (HR). Each bill has a name, a unique sponsor who must be a member of the HR, and an optional date of when it was discussed.
  • It must record the name, political group, and beginning and expected end-of-term dates for each HR member.
  • It will also record the names of the main HR positions: Speaker, Majority Leader, Minority Leader, Majority Whip, and Minority Whip.
  • Finally, it will record the vote of every member of the HR for each bill.

Problem 2.4 (Relational model for universities)

Propose a relational model for the following situation:

  • You want to store information about multiple universities. A university has multiple departments, a name and a website.
  • Each department offers multiple courses. A course has a name, one (or multiple, when it is cross-listed) code, a number of credit hours.
  • A campus has a name, an address, and belong to one university.
  • A department has a contact address, a date of creation and a unique code.

Problem 2.5 (Relational model for an auction website)

We want to design a relational model for an auction website. Members (that can be buyers, sellers, both or neither) can participate in the sale of items.

  • Members are identified by a unique identifier and have an email address and a nickname.
  • Buyers have a unique identifier, a preferred method of payment and a shipping address.
  • Sellers have a unique identifier, a rating and a bank account number.
  • Items are offered by a seller for sale and are identified by a unique item number. Items also have a name and a starting bid price.
  • Members make bids for items that are for sale. Each bid has a unique identifier, a bidding price and a timestamp.

When creating your schema, do not add any new information, and try as much as possible to avoid relations that will create redundant data and NULL entries. Note that we should be able to uniquely determine the member account linked to the seller account, and similarly for buyers accounts. Furthermore, members can have at most one buyer and one seller account.


Problem 2.6 (Relational model for a pet shelter)

We want to design a relational model for an animal shelter, with three goals in mind: to keep track of the pets currently sheltered, of the veterinarian for each type of pet, and of each pet’s favorite toy (needed during a visit to the veterinarian!).

Follow the specification below:

  • An animal has a type (cat, fish, dog, etc.), an arrival date, a name, and an id number.
  • Every type of animal has a veterinarian.
  • A veterinarian has a name, a phone number, an email address, and a postal address.
  • Multiple types of animals can have the same veterinarian.
  • A toy has a location, a description, a name, and is best suited for a particular type of animal.
  • Each animal has at most one preferred toy.

When creating your schema (that you can draw at the back of previous page), do not add any new information (except possibly “id” attributes), and try as much as possible to avoid relations that will create redundant data and NULL entries. Identify the primary key for each relation that you create. When you are done, answer the true / false question below.

With your model … Yes No
…it is possible to determine which pet don’t have a favorite toy.    
…it is possible to determine what is the average stay in the shelter.    
…it is possible to determine if a pet’s favorite toy is best suited for their type.    
…it is possible for multiple types of animal to have the same veterinarian.    
…it is possible for multiple veterinarians to be attributed to the same type.    

Solutions to Selected Problems

Solution to Problem 2.1 (Find a candidate key for the CLASS relation)

We discuss four possible choices:

  1. {Major, Number, Section, Year} This key would be valid if there was only 1 semester per year.
  2. {Instructor, Term} This key would be valid if instructors were always teaching the same unique class each term (i.e., an instructor only teaching CSCI 3410 in the Fall, and nobody else teaching it during Fall).
  3. {Room, Weekdays, Time} This key would be valid if the same room was used all the time (accross years, and terms) for the same class. Note also that remote classes would probably become problematic.
  4. {Major, Number, Term, Year} This key would be valid if no two sections of the same class was offered at the same time.

All in all, {Major, Number, Term, Year, Section} seems like the safest choice.


Solution to Problem 2.2 (Design a relational model for a cinema company)

A possible solution is:

STAR(ID (PK), Name, BirthDate) MOVIE(ID (PK), Title, Year, Length, Genre) FEATURE-IN(StarId (PK, FK to STAR.ID), MovieId (PK, FK to MOVIE.ID)) THEATER(ID (PK), Name, Address) AUDITORIUM(ID (PK), Capacity, Theater (FK to THEATER.ID)) SHOWTIME(ID (PK), MovieId (FK to MOVIE.ID), AuditoriumId (FK to AUDITORIUM.ID), StartTime) TICKETS(ID (PK), ShowTimeId (FK to SHOWTIME.ID), Price)


Solution to Problem 2.3 (Design a relational model for bills)

Be careful: saying that a bill has a unique sponsor does not imply that a the sponsor is a good primary key for the bills: a house member could very well be the sponsor of multiple bills! It just implies that a single attribute is enough to hold the name of the sponsor.

BILL(Name, Sponsor (FK to MEMBER.ID), Date, ID (PK)) MEMBER(Name, Political Group, BTerm, ETerm, ID (PK)) REPRESENTATIVE(Role (PK), Member (FK to MEMBER.ID)) VOTE(Bill (PK, FK to BILL.ID), Member (PK, FK to MEMBER.ID), Vote)  

For simplicity, we added an ID to our MEMBER and BILL relations. Note that having a “role” in the MEMBER relation to store the information about speaker, etc., would be extremely inefficient, since we would add an attribute to the ~435 members that would be NULL in ~430 of them.


Solution to Problem 2.4 (Relational model for universities)

A possible solution follows. The part that is the hardest to accomodate is the fact that a course can have multiple codes. We are reading here “cross-listed” as “a course that is offered under more than one departmental heading and can receive different codes (e.g., CSCI XXXX and AIST YYYY)”.

UNIVERSITY (Name (PK), Website) CAMPUS (Address (PK), University (FK to UNIVERSITY.Name)) DEPARTMENT (Code (PK), Contact, CreationDate, University (FK to UNIVERSITY.Name)) COURSE (Name (PK), CreditHours) OFFERING (Department (PK, FK to DEPARTMENT.Name), Course (PK, FK to COURSE.Name), Code)


Solution to Problem 2.6 (Relational model for a pet shelter)

A possible solution follows.

TYPE(Veterinarian(FK to VETERINARIAN.Id), Name (PK)) VETERINARIAN (Name, Phone, Email, Address, Id (PK)) ANIMAL (Name, ArrivalDate, FavoriteToy (FK to TOY.ID), Type (FK to TYPE.Name), Id (PK)) TOY (Id (PK), Location, Description, Name, BestSuited (FK to TYPE.Name))  

In this model,
…it is possible to determine which pet don’t have a favorite toy.
…it is not possible to determine what is the average stay in the shelter, because their exit date is not stored.
…it is possible to determine if a pet’s favorite toy is best suited for their type.
…it is possible for multiple types of animal to have the same veterinarian, as the same value for “Veterinarian” could occur in multiple tuples in the TYPE relation. If both “Veterinarian” and “Name” were parts of the primary key, then that would not be the case.
…it is not possible for multiple veterinarians to be attributed to the same type, as the name of the type is the primary key in the TYPE relation.

The SQL Programming Language

Resources

This chapter will be “code-driven”: the code will illustrate and help you understand some concepts. You may want to have a look at the “Setting Up Your Work Environment” Section as early as possible in this lecture. On top of being a step-by-step guide to install and configure a relational database managment system, it contains a list of useful links.

Actors

Technologies

  • There are other models than relational: document-based, graph, column-based, and key-value models. Those corresponds to the “NoSQL” data-model, that are often more flexible, but only defined by opposition. They will be studied separately, in the Presentation of NoSQL Chapter.
  • The most commons DBMS are relational database management system (RDBMS): Most of them supports semi-structured data, i.e., models that are not strictly speaking relational, some are “multi-model DBMS”.
  • The Structured Query Language (SQL) is the language for RDBMS, it is made of 4 sublanguages:
    • Data Query Language,
    • Data Definition Language (schema creation and modification),
    • Data Control Language (authorizations, users),
    • Data Manipulation Language (insert, update and delete).

SQL

Yet Another Vocabulary

“Common” / Relational SQL
“Set of databases” Catalog (named collection of schema)7
“Database” Schema
Relation Table
Tuple Row
Attribute Column, or Field

Schema Elements

A schema is made of

  • Tables (≈ relation)
  • Type (≈ datatype)
  • Domain (≈ more complex datatype)
  • View (result set of a stored query on the data, ≈ saved search)
  • Assertion (constraints, transition constraints)
  • Triggers (tool to automate certain actions after pre-defined operations are performed)
  • Stored procedures (≈ functions)

Type and domains are two different things in some implementations, cf. for instance PostgreSQL, where a domain is defined to be essentially a datatype with constraint.8

Syntax

SQL is a programming language: it has a strict syntax, sometimes cryptic error messages, it evolves, etc. Some of its salient aspects are:

Datatypes

The following is an adaptation of w3resource.com, the canonical source being MySQL’s documentation:

  • For integer types, you can use INTEGER (or its short-hand notation INT) or SMALLINT.
  • For floating-point types, you can use FLOAT and DOUBLE (or its synonym, REAL). MySQL also allows the syntax FLOAT(M,D) or REAL(M,D), where the values can be stored up to M digits in total where D represents the decimal point.
  • For monetary amounts, it is recommended to use DECIMAL(10, 2) (or its synonym in MySQL NUMERIC).
  • Characters can be stored using CHAR and VARCHAR: the length (resp. maximal length) of the CHAR (resp. VARCHAR) has to be declared, and CHAR are right-padded with spaces to the specified length. Historically, 255 was the size used, because it is the largest number of characters that can be counted with an 8-bit number, but, whenever possible, the “right size” should be used.
  • You can store a single bit using BIT(1), and a boolean using BOOLEAN (or BOOL, both actually being aliases for TINYINT(1)).
  • For date and time types, you can use DATE, TIME, DATETIME and TIMESTAMP (which convert the current day / time to from the current time zone to UTC).

There are many other datatypes, but they really depends on the particular implementation, so we will not consider them too much.

First Commands

/* code/sql/HW_Faculty.sql */
-- We first drop the schema if it already exists:
DROP SCHEMA IF EXISTS HW_Faculty;

-- Then we create the schema:
CREATE SCHEMA HW_Faculty;


/*
Or we could have use the syntax:

CREATE DATABASE HW_FACUTLY;
 */
-- Now, let us create a table in it:
CREATE TABLE HW_Faculty.PROF (
  Fname VARCHAR(15),
  /*
   No String!
   The value "15" vas picked randomly, any value below 255 would
   more or less do the same. Note that declaring extremely large
   values without using them can impact the performance of
   your database, cf. for instance https://dba.stackexchange.com/a/162117/
   */
  Room INT,
  /*
   shorthand for INTEGER, are also available: SMALLINT, FLOAT, REAL, DEC
   The "REAL" datatype is like the "DOUBLE" datatype of C# (they are actually synonyms in SQL):
   more precise than the "FLOAT" datatype, but not as exact as the "NUMERIC" datatype.
   cf. https://dev.mysql.com/doc/refman/8.0/en/numeric-types.html
   */
  Title CHAR(3),
  -- fixed-length string, padded with blanks if needed
  Tenured BIT(1),
  Nice BOOLEAN,
  -- True / False (= 0) / Unknown
  Hiring DATE,
  /*
   The DATE is always supposed to be entered in a YEAR/MONTH/DAY variation.
   To tune the way it will be displayed, you can use the "DATE_FORMAT" function
   (cf. https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format),
   but you can enter those values only using the "standard" literals
   (cf. https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html )
   */
  Last_seen TIME,
  FavoriteFruit ENUM ('apple', 'orange', 'pear'),
  PRIMARY KEY (Fname, Hiring)
);


/*
 Or, instead of using the fully qualified name HW_Faculty.PROF,
 we could have done:

 USE HW_Faculty;
 CREATE TABLE PROF(…)
 */
-- Let us use this schema, from now on.
USE HW_Faculty;

-- Let us insert some "Dummy" value in our table:
INSERT INTO PROF
VALUES (
  "Clément", -- Or 'Clément'.
  290,
  'PhD',
  0,
  NULL,
  '19940101', -- Or '940101',  '1994-01-01',  '94/01/01'
  '090500', -- Or '09:05:00', '9:05:0',  '9:5:0',  '090500'
  -- Note also the existence of DATETIME, with 'YYYY-MM-DD
  --		   HH:MM:SS'
  'Apple' -- This is not case-sensitive, oddly enough.
);
HW_Faculty.sql

Useful Commands

The following commands are particularly useful. They allow you to get a sense of the current state of your databases.

For Schemas

In the following, <SchemaName> should be substituted with an actual schema name.

SHOW SCHEMAS; -- List the schemas.
SHOW TABLES; -- List the tables in a schema.
DROP SCHEMA <SchemaName>; -- "Drop" (erase) SchemaName.

You can also use the variation

DROP SCHEMA IF EXISTS <SchemaName>;

that will not issue an error if <SchemaName> does not exist.

For Tables

In the following, <TableName> should be substituted with an actual table name.

SHOW CREATE TABLE <TableName>-- Gives the command to "re-construct" TableName.
DESCRIBE <TableName>; -- Show the structure of TableName.
DROP TABLE <TableName>; -- "Drop" (erase) TableName.

Note that if the table <TableName> you are trying to erase is referenced by other tables through foreign keys, you will obtain an error

ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails

you must delete first the table containing the foreign key, as by default this operation is restricted.

If you want to erase a table if it exists, you can use the variation

DROP TABLE IF EXISTS <TableName>;

that will not issue an error if <TableName> does not exist.

See Also

SELECT * FROM <TableName> -- List all the rows in TableName.
SHOW WARNINGS; -- Show the content of the latest warning issued.

Overview of Constraints

There are six different kind of constraints that one can add to an attribute:

  1. Primary Key
  2. Foreign Key
  3. NOT NULL
  4. UNIQUE
  5. DEFAULT
  6. CHECK

We already know the first two from the relational model. The other four are new, and could not be described in this model.

We will review them below, and show how they can be specified at the time the table is declared, or added and removed later. For more in-depth examples, you can refer to https://www.w3resource.com/mysql/creating-table-advance/constraint.php.

Note that all of them but DEFAULT are indeed, constraints, as they prevent the user from inserting some data (i.e. you can not insert NULL if the attribute has the constraint NOT NULL). DEFAULT is not a constraint in that sense, as it does not prevent some data from being inserted, but it is called a constraint nevertheless. We will see another example of such “helper” qualification with AUTO-INCREMENT.

Declaring Constraints

We will now see how to declare those constraints when we create the table (except for the foreign key, which we save for later).

/* code/sql/HW_ConstraintsPart1.sql */
DROP SCHEMA IF EXISTS HW_ConstraintsPart1;

CREATE SCHEMA HW_ConstraintsPart1;

USE HW_ConstraintsPart1;

CREATE TABLE HURRICANE (
  Name VARCHAR(25) PRIMARY KEY,
  WindSpeed INT DEFAULT 76 CHECK (WindSpeed > 74 AND
    WindSpeed < 500),
  -- 75mph is the minimum to be considered as a hurricane
  --		    cf. https://www.hwn.org/resources/bws.html
  Above VARCHAR(25)
);

CREATE TABLE STATE (
  Name VARCHAR(25) UNIQUE,
  Postal_abbr CHAR(2) NOT NULL
);
HW_ConstraintsPart1.sql

If we wanted to combine multiple constraints, we could10, but we would have to follow the order described at https://dev.mysql.com/doc/refman/8.0/en/create-table.html, which is NOT NULL, DEFAULT, AUTO_INCREMENT, UNIQUE, PRIMARY KEY, CHECK (even if, in practise, derivation from this order is oftentimes accepted by DBMSes).

MySQL can output a description of those tables for us:

MariaDB [HW_ConstraintsPart1]> DESCRIBE HURRICANE;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| Name      | varchar(25) | NO   | PRI | NULL    |       |
| WindSpeed | int(11)     | YES  |     | 76      |       |
| Above     | varchar(25) | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

MariaDB [HW_ConstraintsPart1]> DESCRIBE STATE;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| Name        | varchar(25) | NO   | PRI | NULL    |       |
| Postal_abbr | char(2)     | NO   | UNI | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Note that more than one attribute can be the primary key, in which case the syntax needs to be something like the following:

/* code/sql/HW_PKtest.sql */
DROP SCHEMA IF EXISTS HW_PKtest;

CREATE SCHEMA HW_PKtest;

USE HW_PKtest;

CREATE TABLE TEST (
  A INT,
  B INT,
  PRIMARY KEY (A, B)
);
HW_PKtest.sql

Note that in this case, a statement like

INSERT INTO TEST VALUE (1, NULL);

would result in an error: all the values that are part of the primary key needs to be non-NULL.

For the UNIQUE constraint, note that NULL can be inserted: the rationale is that all the values need to be different from one another or NULL.

A couple of comments about the CHECK constraint:

  • Before MariaDB 10.2.1, WindSpeed INT CHECK (WindSpeed > 74 AND WindSpeed < 500) would have been parsed but would not have any effect, cf. https://mariadb.com/kb/en/constraint/#check-constraints. Since MariaDB 10.2.1, the CHECK constraint are enforced.
  • If we try to violate the CHECK constraint, with a command like
INSERT INTO HURRICANE VALUES ("Test1", 12, NULL);

then the insertion would not take place, and the system would issue an error message:

ERROR 4025 (23000): CONSTRAINT `HURRICANE.WindSpeed` failed for `HW_ConstraintsPart1]>`.`HURRICANE`
  • Note that you could still insert a value of NULL for the wind, and it would not triggered the error.

To use the DEFAULT value, use

INSERT INTO HURRICANE VALUES ("Test2", DEFAULT, NULL);

Note that, by default, the DEFAULT value is NULL, regardless of the datatype. You can experiment it by running the following code:

/* code/sql/HW_DefaultTest.sql */
CREATE TABLE TEST (
  TestA VARCHAR(15),
  TestB INT,
  TestC FLOAT,
  TestD BOOLEAN,
  TestE BIT(1),
  TestF DATE
);

INSERT INTO TEST
VALUES (
  DEFAULT,
  DEFAULT,
  DEFAULT,
  DEFAULT,
  DEFAULT,
  DEFAULT);

SELECT *
FROM TEST;
HW_DefaultTest.sql

Editing Constraints

Let us know pretend that we want to edit some attributes, by either adding or removing constraints. SQL’s syntax is a bit inconsistent on this topic, because it treats the constraints as being of different natures.

Primary Keys

Adding a primary key:

ALTER TABLE STATE ADD PRIMARY KEY (Name); 

Removing the primary key:

ALTER TABLE STATE DROP PRIMARY KEY;

UNIQUE Constraint

Adding a UNIQUE constraint:

ALTER TABLE STATE ADD UNIQUE (Postal_abbr);

Removing a UNIQUE constraint:

ALTER TABLE STATE DROP INDEX Postal_abbr;

Note the difference between adding and removing the UNIQUE constraint: the parenthesis around (Postal_abbr) are mandatory when adding the constraint, but would cause an error when removing it!

NOT NULL Constraint

Adding the NOT NULL constraint:

ALTER TABLE STATE MODIFY Postal_abbr CHAR(2) NOT NULL;

Removing the NOT NULL constraint:

ALTER TABLE STATE MODIFY Postal_abbr CHAR(2);

The syntax of NOT NULL comes from the fact that this constraint is taken to be part of the datatype.

Default value

Changing the default value:

ALTER TABLE HURRICANE ALTER COLUMN WindSpeed SET DEFAULT 74;

Removing the default value:

ALTER TABLE HURRICANE ALTER COLUMN  WindSpeed DROP DEFAULT;

Note that if you change the default value, it does not change the values you inserted retro-actively. To resume on our previous example, the values inserted with DEFAULT as a value would still be NULL even after executing the following instruction:

/* code/sql/HW_DefaultTest.sql */
ALTER TABLE TEST
  ALTER COLUMN TestA SET DEFAULT "A";

SELECT *
FROM TEST;
HW_DefaultTest.sql

Foreign key

Adding a foreign key constraint:

ALTER TABLE HURRICANE ADD FOREIGN KEY (Above) REFERENCES STATE(Name); 

Removing a foreign key constraint is out of the scope of this lecture. If you are curious, you can have a look at https://www.w3schools.com/sql/sql_foreignkey.asp: dropping a foreign key constraint requires your constraint to have a name, something we did not introduce.

Two important remarks:

  • The datatype of the foreign key has to be the exactly the same as the datatype of the attribute that we are referring.
  • The target of the foreign key must be the primary key.

Refer to Problem 3.4 (Constraints on foreign keys) for a slightly more accurate picture of the constraints related to the creation of foreign keys. Note that a foreign key could be declared at the time of creation of the table as well, using the syntax we will introduce below.

Testing the Constraints

Let us test our constraints:

INSERT INTO STATE VALUES('Georgia', 'GA');
INSERT INTO STATE VALUES('Texas', 'TX');
INSERT INTO STATE VALUES('FLORIDA', 'FL');
UPDATE STATE SET Name = 'Florida'
    WHERE Postal_abbr = 'FL';

-- There's an error with the following request. Why?
INSERT INTO HURRICANE VALUES('Irma', 150, 'FL');

/*
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_ConstraintsPart1`.`HURRICANE`, CONSTRAINT `HURRICANE_ibfk_1` FOREIGN KEY (`Above`) REFERENCES `STATE` (`Name`))
*/

INSERT INTO HURRICANE VALUES('Harvey', DEFAULT, 'Texas');
INSERT INTO HURRICANE VALUES('Irma', 150, 'Florida');
DELETE FROM HURRICANE
    WHERE Name = 'Irma';
INSERT INTO HURRICANE VALUES('Irma', 150, 'Georgia');

UPDATE HURRICANE SET Above = 'Georgia'
    WHERE Name = 'Irma';

/*
MariaDB [HW_ConstraintsPart1]> SELECT * FROM HURRICANE;
+--------+-----------+---------+
| Name   | WindSpeed | Above   |
+--------+-----------+---------+
| Harvey |        74 | Texas   |
| Irma   |       150 | Georgia |
+--------+-----------+---------+
*/

-- There's an error with the following request. Why?
UPDATE HURRICANE SET Above = 'North Carolina'
    WHERE Name = 'Irma';

-- Let's patch it, by adding North Carolina to our STATE table.
INSERT INTO STATE VALUES('North Carolina', 'NC');
UPDATE HURRICANE SET Above = 'North Carolina'
    WHERE Name = 'Irma';

Foreign Keys

Let us come back more specifically to foreign key.

A First Example

In the example below, we introduce the foreign key update and delete rules. We also introduce, passing by, the enumerated data type, and how to edit it.

CREATE TABLE STORM (
  NAME VARCHAR(25) PRIMARY KEY,
  Kind ENUM ("Tropical
    Storm", "Hurricane"),
  WindSpeed INT,
  Creation DATE
);

-- We can change the enumerated datatype:
ALTER TABLE STORM MODIFY Kind ENUM ("Tropical Storm",
  "Hurricane", "Typhoon");

CREATE TABLE STATE (
  NAME VARCHAR(25) UNIQUE,
  Postal_abbr CHAR(2) PRIMARY KEY,
  Affected_by VARCHAR(25),
  FOREIGN KEY (Affected_by) REFERENCES STORM (NAME) ON
    DELETE SET NULL ON UPDATE CASCADE
);
HW_Storm.sql

Note that we can “inline” the foreign key constraint like we “inlined” the primary key constraint (cf. https://stackoverflow.com/q/24313143/), but that it will not be enforced!

Let us now illustrate this table by introducing some data in it:

INSERT INTO STORM
VALUES (
  "Harvey",
  "Hurricane",
  130,
  "2017-08-17");

-- In the following, the entry gets created, but date is
--		   "corrected" to "2017-17-08"!
--		   INSERT INTO STORM
--		     VALUES ("Dummy", "Hurricane", 120,
--	   "2017-17-08");
--		    The error message returned is
--		    ERROR 1292 (22007) at line 34:
-- Incorrect
--      date
--	     value:
--		   "2017-17-08" for column
--	     `HW_STORM`.`STORM`.`Creation`
--		at
--		   row 1
--		    In the following, we explicitely use
--     "DATE",
--	 and
--	       since
--		   the date is incorrect, nothing gets
--     inserted.
--		   INSERT INTO STORM
--		     VALUES ("Dummy2", "Hurricane", 120,
--  DATE
--		 "2017-17-08");
--		    ERROR 1525 (HY000) at line 40:
-- Incorrect
--      DATE
--	     value:
--		   "2017-17-08"
--		    The next one sets NULL for DATE.
INSERT INTO STORM
VALUES (
  "Irma",
  "Tropical Storm",
  102,
  DEFAULT);
HW_Storm.sql

MySQL will always notify you if there is an error in a date attribute when you use the DATE prefix.

INSERT INTO STATE
VALUES (
  "Georgia",
  "GA",
  NULL);

INSERT INTO STATE
VALUES (
  "Texas",
  "TX",
  NULL);

INSERT INTO STATE
VALUES (
  "Florida",
  "FL",
  NULL);

-- This instruction is not using the primary key, is that a
--		   problem?
UPDATE
  STATE
SET Affected_by = "Harvey"
WHERE Name = "Georgia";

UPDATE
  STORM
SET Name = "Harley"
WHERE Name = "Harvey";

DELETE FROM STORM
WHERE Name = "Harley";
HW_Storm.sql

We will see in the “Reverse-Engineering” section why this schema is poorly designed, but for now, let’s focus on the foreign keys and their restrictions.

Foreign Keys Restrictions

The following is a code-driven explanation of the foreign key update and delete rules (or “restrictions”). It is intended to make you understand the default behavior of foreig keys, and to understand how the system reacts to the possible restrictions.

CREATE TABLE F_Key(
    Attribute VARCHAR(25) PRIMARY KEY
    );

CREATE TABLE Table_default(
    Attribute1 VARCHAR(25) PRIMARY KEY,
    Attribute2 VARCHAR(25),
    FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
    );

-- By default, this foreign key will restrict.

CREATE TABLE Table_restrict(
    Attribute1 VARCHAR(25) PRIMARY KEY,
    Attribute2 VARCHAR(25),
    FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
        ON DELETE RESTRICT
        ON UPDATE RESTRICT
    );

CREATE TABLE Table_cascade(
    Attribute1 VARCHAR(25) PRIMARY KEY,
    Attribute2 VARCHAR(25),
    FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
        ON DELETE CASCADE
        ON UPDATE CASCADE
    );

CREATE TABLE Table_set_null(
    Attribute1 VARCHAR(25) PRIMARY KEY,
    Attribute2 VARCHAR(25),
    FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
        ON DELETE SET NULL
        ON UPDATE SET NULL 
    );

/*
* You might encounter a 
* ON UPDATE SET DEFAULT
* but this reference option (cf. https://mariadb.com/kb/en/library/foreign-keys/ )
* worked only with a particular engine ( https://mariadb.com/kb/en/library/about-pbxt/ )
* and will not be treated here.
*/

INSERT INTO F_Key VALUES('First Test');
INSERT INTO Table_default VALUES('Default', 'First Test');
INSERT INTO Table_restrict VALUES('Restrict', 'First Test');
INSERT INTO Table_cascade VALUES('Cascade', 'First Test');
INSERT INTO Table_set_null VALUES('Set null', 'First Test');

SELECT * FROM Table_default;
SELECT * FROM Table_restrict;
SELECT * FROM Table_cascade;
SELECT * FROM Table_set_null;

-- The following will fail because of the Table_default table:
UPDATE F_Key SET Attribute = 'After Update'
    WHERE Attribute = 'First Test';
DELETE FROM F_Key
    WHERE Attribute = 'First Test';

-- Let us drop this table, and try again.
DROP TABLE Table_default;

-- The following fails too, this time because of the Table_restrict table:
UPDATE F_Key SET Attribute = 'After Update'
    WHERE Attribute = 'First Test';
DELETE FROM F_Key
    WHERE Attribute = 'First Test';

-- Let us drop this table, and try again.
DROP TABLE Table_restrict;

-- Let's try again:
UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';

-- And let's print the situation after this update:
SELECT * FROM Table_cascade;
SELECT * FROM Table_set_null;

/*
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade;
+------------+--------------+
| Attribute1 | Attribute2   |
+------------+--------------+
| Cascade    | After Update |
+------------+--------------+
1 row in set (0.00 sec)

MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null;
+------------+------------+
| Attribute1 | Attribute2 |
+------------+------------+
| Set null   | NULL       |
+------------+------------+
1 row in set (0.00 sec)
*/

-- Let's make a second test.
INSERT INTO F_Key VALUES('Second Test');
INSERT INTO Table_cascade VALUES('Default', 'Second Test');
INSERT INTO Table_set_null VALUES('Restrict', 'Second Test');

DELETE FROM F_Key
    WHERE Attribute = 'Second Test';

/*
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade;
+------------+--------------+
| Attribute1 | Attribute2   |
+------------+--------------+
| Cascade    | After Update |
+------------+--------------+
1 row in set (0.00 sec)

MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null;
+------------+------------+
| Attribute1 | Attribute2 |
+------------+------------+
| Restrict   | NULL       |
| Set null   | NULL       |
+------------+------------+
2 rows in set (0.00 sec)
*/

Constructing and Populating a New Example

Construction

  • Remember, we start by creating a schema and tables inside of it.
  • What if foreign keys are mutually dependent? What if we have something like:

PROF(Login (PK), Name, Department (FK to DEPARTMENT.Code)) DEPARTMENT(Code (PK), Name, Head (FK to PROF.Login))  

Then note that we cannot create both tables as pictured directly, as PROF requires DEPARTMENT to exist, to have a foreign key referencing it, and similarly for DEPARTMENT: it is an egg and chicken situation! Hence, we have to first create a table without the foreign key, and then add it later on, as described below:

/* code/sql/HW_ProfExample.sql */
CREATE TABLE PROF (
  Login VARCHAR(25) PRIMARY KEY,
  NAME VARCHAR(25),
  Department CHAR(5)
);

CREATE TABLE DEPARTMENT (
  Code CHAR(5) PRIMARY KEY,
  NAME VARCHAR(25),
  Head VARCHAR(25),
  FOREIGN KEY (Head) REFERENCES PROF (LOGIN) ON UPDATE CASCADE
);

ALTER TABLE PROF
  ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT (Code);
HW_ProfExample.sql

Note the structure of the ALTER TABLE command:

  • KEY Department REFERENCES Code;⇒ error
  • KEY (Department) REFERENCES (Code);⇒ error
  • KEY PROF(Department) REFERENCES DEPARTMENT(Code); ⇒ ok
CREATE TABLE STUDENT (
  Login VARCHAR(25) PRIMARY KEY,
  NAME VARCHAR(25),
  Registered DATE,
  Major CHAR(5),
  FOREIGN KEY (Major) REFERENCES DEPARTMENT (Code)
);

CREATE TABLE GRADE (
  Login VARCHAR(25),
  Grade INT,
  PRIMARY KEY (LOGIN, Grade),
  FOREIGN KEY (LOGIN) REFERENCES STUDENT (LOGIN)
);
HW_ProfExample.sql

On a side note, note that we do not have the same difficulty when inserting a value in a table that contains a foreign key referencing itself: it is accepted to insert a value that is referencing itself, as illustrated below.

CREATE TABLE TEST (
  ID INT PRIMARY KEY,
  Reference INT,
  FOREIGN KEY (Reference) REFERENCES TEST (ID)
);

INSERT INTO TEST
VALUES (
  1,
  1);
HW_FK_Self_Reference.sql

Populating

We can insert multiple values at once:

INSERT INTO DEPARTMENT
VALUES (
  "MATH",
  "Mathematics",
  NULL),
(
  "CS",
  "Computer
    Science",
  NULL);
HW_ProfExample.sql

We can specify which attributes we are giving:

INSERT INTO DEPARTMENT (
  Code,
  Name)
VALUES (
  "CYBR",
  "Cyber Secturity");
HW_ProfExample.sql

And we can even specify the order (even the trivial one):

INSERT INTO PROF (
  LOGIN,
  Department,
  Name)
VALUES (
  "caubert",
  "CS",
  "Clément Aubert");

INSERT INTO PROF (
  LOGIN,
  Name,
  Department)
VALUES (
  "aturing",
  "Alan Turing",
  "CS"),
(
  "perdos",
  "Paul
    Erdős",
  "MATH"),
(
  "bgates",
  "Bill Gates",
  "CYBR");

INSERT INTO STUDENT (
  LOGIN,
  Name,
  Registered,
  Major)
VALUES (
  "jrakesh",
  "Jalal Rakesh",
  DATE "2017-12-01",
  "CS"),
(
  "svlatka",
  "Sacnite Vlatka",
  "2015-03-12",
  "MATH"),
(
  "cjoella",
  "Candice Joella",
  "20120212",
  "CYBR"),
(
  "aalyx",
  "Ava Alyx",
  20121011,
  "CYBR"),
(
  "caubert",
  "Clément Aubert",
  NULL,
  "CYBR");

INSERT INTO GRADE
VALUES (
  "jrakesh",
  3.8),
(
  "svlatka",
  2.5);
HW_ProfExample.sql

(Note the date literals)

By default, the values that are not given are set to their respective DEFAULT values.

/* code/sql/HW_DefaultTest.sql */
INSERT INTO TEST (
  TestB)
VALUES (
  1);

SELECT *
FROM TEST;

-- The value of TestA is set to "A",
--     all the other values are set to NULL.
HW_DefaultTest.sql

A Bit More on Foreign Keys

Note that we can create foreign keys to primary keys made of multiple attributes, and to the primary key of the table we are currently creating.

/* code/sql/HW_AdvancedFK.sql */
CREATE TABLE T1 (
  A1 INT,
  A2 INT,
  B INT,
  PRIMARY KEY (A1, A2)
);

CREATE TABLE T2 (
  A1 INT,
  A2 INT,
  B1 INT PRIMARY KEY,
  B2 INT,
  -- We can create a "pair" of foreign key in one line, as
  -- follows:
  FOREIGN KEY (A1, A2) REFERENCES T1 (A1, A2),
  -- We can create a foreign key that references the primary
  -- key of the table we are currently creating, and name
  -- it, as follows:
  CONSTRAINT My_pk_to_T1 FOREIGN KEY (B2) REFERENCES T2 (B1)
);
HW_AdvancedFK.sql

In the example, we are also naming our foreign key. The benefit of naming our fk constraint is that, if we violate it, for instance with

INSERT INTO T2 VALUES (1, 1, 1, 3);

then the name of the constraint (here “My_pk_to_T1”) will be displayed in the error message:

Cannot add or update a child row: a foreign key constraint fails (`db_9_9837c1`.`t2`, CONSTRAINT 
`My_pk_to_T1` FOREIGN KEY (`B2`) REFERENCES `t2`(`B1`))

A First Look at Conditions

Order of clauses does not matter, not even for optimization purpose.

UPDATE <table>
SET <attribute1> = <value1>, <attribute2> = <value2>, …
WHERE <condition>; 
SELECT <attribute list, called projection attributes>
FROM <table list>
WHERE <condition>;
DELETE FROM <table list>
WHERE <condition>;

Conditions can

SELECT LOGIN
FROM STUDENT;

UPDATE
  DEPARTMENT
SET Head = "aturing"
WHERE Code = "MATH";

UPDATE
  DEPARTMENT
SET Head = "bgates"
WHERE Code = "CS"
  OR Code = "CYBR";

SELECT LOGIN
FROM STUDENT
WHERE NOT Major = "CYBR";

SELECT LOGIN,
  Name
FROM PROF
WHERE Department = "CS";

SELECT LOGIN
FROM STUDENT
WHERE Major = "CYBR"
  AND Registered > DATE "20121001";

SELECT LOGIN
FROM STUDENT
WHERE Name LIKE "Ava%";

SELECT Name
FROM PROF
WHERE LOGIN LIKE "_aubert";
HW_ProfExample.sql

Note that LIKE is by default case-insensitive, both in MariaDB and in MySQL. The COLLATE operator can be used to force the search to be case-sensitive, as well as LIKE BINARY.


Three-Valued Logic

Cf. (Elmasri and Navathe 2010, 5.1.1), (Elmasri and Navathe 2015, 7.1.1)

The Boolean logic in SQL is three-valued: a statement can be true, false or unknown. If you pick the following two commands:

/* code/sql/HW_DefaultTest.sql */
SELECT *
FROM TEST
WHERE TestA = "A";

SELECT *
FROM TEST
WHERE TestA <> "A";
HW_DefaultTest.sql

you may believe that they will capture all the tuples in the TEST table, as the value for TestA is either "A" or not "A", but you would be wrong. If the value of TestA is NULL, then both conditions would fail, as SQL cannot say that the value is or is not "A": it is simply undefined!

Meaning of NULL

NULL is

  1. Unknown value (“Nobody knows”)

    What is the date of birth of Jack the Ripper?

    Does P equal NP?

  2. Unavailable / Withheld (“I do not have that information with me at the moment”)

    What is the number of english spies in France?

    What is the VIN of your car?

    What is the identity of the Tiananmen Square person?

  3. Not Applicable (“Your question does not make sense”)

    What is the US SSN of a French person?

    What is the email address of an author of the XIXth century?

Comparison with Unknown Values

If NULL is involved in a comparison, the result evaluates to “Unknown.”

AND T F U
T T F U
F F F F
U U F U
OR T F U
T T T T
F T F U
U T U U
NOT  
T F
F T
U U

You can test if a value is NULL with IS NULL.

INSERT INTO DEPARTMENT
VALUES (
  "Hist",
  "History",
  NULL);

SELECT *
FROM DEPARTMENT
WHERE Head IS NULL;

SELECT *
FROM DEPARTMENT
WHERE Head IS NOT NULL;

SELECT COUNT(*)
FROM GRADE
WHERE Grade IS NULL;
HW_ProfExample.sql

Note that you can not use IS to compare values: this key word is reserved to test if a value is (not) NULL, and nothing else.

This means that if you want to capture all the tuples, you cannot write

/* code/sql/HW_DefaultTest.sql */
SELECT *
FROM TEST
WHERE TestA = "A";

SELECT *
FROM TEST
WHERE TestA <> "A";
HW_DefaultTest.sql

but should have something like

/* code/sql/HW_DefaultTest.sql */
SELECT *
FROM TEST
WHERE TestA IS NULL;

SELECT *
FROM TEST
WHERE TestA IS NOT NULL;
HW_DefaultTest.sql

Trivia

There are no if…then…else statements in SQL, but you can do something similar with CASE (cf. https://dev.mysql.com/doc/refman/8.0/en/case.html). However, note that SQL is probably not the right place to try to control the flow of execution.

This probably depends on the system a lot, but one could wonder if MySQL uses some form of short-cut evaluation when comparing with NULL. Unfortunately, even with three times (!) the verbose option, MySQL does not give more insight as to whenever it drops comparing values once a NULL was encountered (cf. https://dev.mysql.com/doc/refman/8.0/en/mysql-command-options.html#option_mysql_verbose, you can log-in using mysql -u testuser -p --password=password -v -v -v to activate the most verbose mode). Normally, EXPLAIN (https://dev.mysql.com/doc/refman/8.0/en/explain.html) should be useful in answering this question, but failed to answer it as well.

Various Tools

For DISTINCT, ALL and UNION, cf. (Elmasri and Navathe 2010, 4.3.4) or (Elmasri and Navathe 2015, 6.3.4). For ORDER BY, cf. (Elmasri and Navathe 2010, 4.3.6) or (Elmasri and Navathe 2015, 6.3.6). For aggregate functions, cf. (Elmasri and Navathe 2010, 5.1.7) or (Elmasri and Navathe 2015, 7.1.7).

AUTO_INCREMENT

Something that is not exactly a constraint, but that can be used to “qualify” domains, is the AUTO_INCREMENT feature of MySQL. Cf. https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html, you can have MySQL increment a particular attribute (most probably intended to be your primary key, or some form of counter) for you.

A typical example could be:

/* code/sql/HW_AutoIncrement.sql */
CREATE TABLE PERSON (
  PersonID INT AUTO_INCREMENT,
  Name VARCHAR(255),
  PRIMARY KEY (PersonID)
);

INSERT INTO PERSON (
  Name)
VALUES (
  'Lars'),
(
  'Kristina'),
(
  'Sophie');

SELECT *
FROM PERSON;
HW_AutoIncrement.sql

This way, the burden of having to keep track of the persons’ ids is left to the program, and not to the person inserting data in the table.

Transactions

We can save the current state, and start a series of transactions, with the command

START TRANSACTION;

All the commands that follows are “virtually” executed: you can undo them all using

ROLLBACK;

This puts you back in the state you were in before starting the transaction. If you want all the commands you typed in-between to be actually enforced, you can use the command

COMMIT;

Nested transactions are technically possible, but they are counter-intuitive and should be avoided, cf. https://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-2630-nested-transactions-are-real/.

DISTINCT / ALL

The result of a SELECT query, for instance, is a table, and SQL treats tables as multi-set, hence there can be repetitions in the result of a query, but we can remove them:

SELECT DISTINCT Major FROM STUDENT;

The default behaviour is equivalent to specifying ALL, and it display the duplicates. In this case, it would be

> SELECT Major FROM STUDENT;
+-------+
| Major |
+-------+
| CS    |
| CYBR  |
| CYBR  |
| CYBR  |
| MATH  |
+-------+

UNION

Set-theoretic operations are available as well. For instance, one can use:

(SELECT Login FROM STUDENT) UNION (SELECT Login FROM PROF);

to collect all the logins from both tables.

There is also INTERSECT and EXCEPT in the specification, but MySQL does not implement them (cf. https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems#Database_capabilities).

ORDER BY

You can have ORDER BY specifications:

SELECT LOGIN
FROM GRADE
WHERE Grade > 2.0
ORDER BY Grade;

SELECT LOGIN
FROM GRADE
WHERE Grade > 2.0
ORDER BY Grade DESC;

SELECT LOGIN,
  Major
FROM STUDENT
ORDER BY Major,
  Name;
HW_ProfExample.sql

ORDER BY order by ascending order by default.

Aggregate Functions

You can use MAX, SUM, MIN, AVG, COUNT to peform simple operations.

SELECT MAX(Registered) FROM STUDENT;

returns the “greatest” date of registration of a student, i.e., the date of the latest registration.

SELECT COUNT(Name) FROM STUDENT;

returns the number of names, i.e., the number of students.

SELECT COUNT(DISTINCT Name) FROM STUDENT;

returns the number of different names (which in this case is the same as the number of names, since we have no homonyms).

Note that AVG returns the average of all non-NULL values, as we can see on the following example:

/* code/sql/HW_Avg.sql */
CREATE TABLE TEST (
  Test INT
);

INSERT INTO TEST
VALUES (
  NULL),
(
  0),
(
  10);

SELECT AVG(Test)
FROM TEST;

-- Returns 5.0
HW_Avg.sql

The same goes for e.g. MAX:

/* code/sql/HW_Max.sql */
CREATE TABLE TEST (
  A DATE
);

INSERT INTO TEST
VALUES (
  DATE "2020-01-01"),
(
  DATE "2019-01-01"),
(
  NULL);

SELECT MAX(A)
FROM TEST;

-- Returns 2020-01-01
HW_Max.sql

Aliases for Columns

We can use aliases for the columns. Compare

SELECT Login FROM PROF;
+---------+
| Login   |
+---------+
| aturing |
| caubert |
| bgates  |
| perdos  |
+---------+

with

SELECT Login AS "Username" FROM PROF;
+----------+
| Username |
+----------+
| aturing  |
| caubert  |
| bgates   |
| perdos   |
+----------+

Aliases can also be used on table names. Adiases for columns are a helpful way of describing the result of the query, while alias on table have a specific purpose that will be clearer as we study select-project-join queries.

More Select Queries

For select-project-join, cf. (Elmasri and Navathe 2010, 4.3.1) or (Elmasri and Navathe 2015, 6.3.1). For aliases, cf. (Elmasri and Navathe 2010, 4.3.2) or (Elmasri and Navathe 2015, 6.3.2), For nested queries, cf. (Elmasri and Navathe 2010, 5.1.2) or (Elmasri and Navathe 2015, 7.1.2).

Select-Project-Join

SELECT LOGIN
FROM PROF,
  DEPARTMENT
WHERE DEPARTMENT.Name = "Mathematics"
  AND Department = Code;
HW_ProfExample.sql
  • Department.Name = 'Mathematics' is the selection condition
  • Department = Code is the join condition, because it combines two tuples.
  • Why do we use the fully qualified name attribute for Name?
  • We have to list all the tables we want to consult, even if we use fully qualified names.
SELECT Name
FROM STUDENT,
  GRADE
WHERE Grade > 3.0
  AND STUDENT.Login = GRADE.Login;
HW_ProfExample.sql
  • Grade > 3.0 is the selection condition
  • STUDENT.Login = GRADE.Login is the join condition

We can have two join conditions!

SELECT PROF.Name
FROM PROF,
  DEPARTMENT,
  STUDENT
WHERE STUDENT.Name = "Ava Alyx"
  AND STUDENT.Major = DEPARTMENT.Code
  AND DEPARTMENT.Head = PROF.Login;
HW_ProfExample.sql

Note that for the kind of join we are studiying (called “inner joins”), the order does not matter.

In Problem 3.3 (Duplicate rows in SQL), we saw that SQL was treating tables as multi-sets, i.e., repetitions are allowed. This can lead to strange behaviour when performing Select-Project-Join queries. Consider the following example:

Aliasing Tuples

We can use aliases on tables to shorten the previous query:

SELECT PROF.Name
FROM PROF,
  DEPARTMENT,
  STUDENT AS B
WHERE B.Name = "Ava Alyx"
  AND B.Major = DEPARTMENT.Code
  AND DEPARTMENT.Head = PROF.Login;
HW_ProfExample.sql

We can use multiple aliases to make it even shorter (but less readable):

SELECT A.Name
FROM PROF AS A,
  DEPARTMENT AS B,
  STUDENT AS C
WHERE C.Name = "Ava Alyx"
  AND C.Major = B.Code
  AND B.Head = A.Login;
HW_ProfExample.sql

For those two, aliases are convenient, but not required to write the query. In some cases, we cannot do without aliases. For instance if we want to compare two rows in the same table:

SELECT Other.Login
FROM GRADE AS Mine,
  GRADE AS Other
WHERE Mine.Login = "aalyx"
  AND Mine.Grade < Other.Grade;
HW_ProfExample.sql

Generally, when you want to perform a join within the same table, then you have to “make two copies of the tables” and name them differently using aliases. Let us try to write a query that answers the question

What are the login of the professors that have the same department as the professor whose login is caubert?

We need a way of distinguising between the professors we are projecting on (the one whole login is caubert) and the one we are joining with (the ones that have the same department). This can be done using something like:

SELECT JOINT.Login
FROM PROF AS PROJECT,
  PROF AS JOINT
WHERE PROJECT.Login = "caubert"
  AND PROJECT.Department = JOINT.Department;
HW_ProfExample.sql

Note that we are “opening up two copies of the PROF tables”, and naming them differently (PROJECT and JOINT).


Another (improved) example of a similar query is

SELECT Fellow.Name AS "Fellow of Ava"
FROM STUDENT AS Ava,
  STUDENT AS Fellow
WHERE Ava.Name = "Ava Alyx"
  AND Fellow.Major = Ava.Major
  AND NOT Fellow.Login = Ava.Login;
HW_ProfExample.sql

A couple of remarks about this query:

  • At the beginning of the query, AS "Fellow of Ava" is another kind of aliasing, mentioned in a previous section.
  • In the condition, NOT Fellow.Login = Ava.Login guarantees that we will not select Ava again, and exclude her from the results (Ava is not supposed to be a fellow of herself).
  • In the (unlikely, but possible) case of an homonym, writing NOT Fellow.Name = Me.Name; instead of NOT Fellow.Login = Ava.Login would prevent the homonym from occuring in the results.
  • In the condition, substituting AND NOT Me = Fellow by NOT Fellow.Login = Ava.Login would not work: you have to compare attributes of the tuples, not the tuples.

Nested Queries

Let us look at a first example

SELECT LOGIN
FROM GRADE
WHERE Grade > (
    SELECT AVG(Grade)
    FROM GRADE);
HW_ProfExample.sql

A nested query is made of an outer query (SELECT Login…) and an inner query (SELECT AVG(Grade)…). Note that the inner query does not terminate with a ;.

Logical operators such as ALL or IN can be used in nested queries. To learn more about those operators, refer to https://www.w3schools.com/sql/sql_operators.asp.

An example could be

SELECT LOGIN
FROM GRADE
WHERE Grade >= ALL (
    SELECT Grade
    FROM GRADE
    WHERE Grade IS NOT NULL);
HW_ProfExample.sql

Note that

  • We have to use >=, and not >, since no grade is strictly greater than itself.
  • The part IS NOT NULL is needed: otherwise, if one of the grade is NULL, then the comparison would yelds “unknown”, and no grade would be greater than all of the others.
  • This query could be simplified, using MAX:
SELECT LOGIN
FROM GRADE
WHERE Grade >= (
    SELECT MAX(Grade)
    FROM GRADE);
HW_ProfExample.sql

Answering the question

What are the logins of the professors belonging to a department that is the major of at least one student whose name ends with an “a”?

–that sounds like the what would ask a police officer in a whodunit– could be answer using

SELECT LOGIN
FROM PROF
WHERE DEPARTMENT IN (
    SELECT Major
    FROM STUDENT
    WHERE LOGIN LIKE "%a");
HW_ProfExample.sql

For this query, we could not use =, since more than one major could be returned.

Furthermore, nested query that uses = can often be rewritten without being nested. For instance,

SELECT LOGIN
FROM PROF
WHERE DEPARTMENT = (
    SELECT Major
    FROM STUDENT
    WHERE LOGIN = "cjoella");
HW_ProfExample.sql

becomes

SELECT PROF.Login
FROM PROF,
  STUDENT
WHERE DEPARTMENT = Major
  AND STUDENT.Login = "cjoella";
HW_ProfExample.sql

Conversly, you can sometimes write select-project-join as nested queries For instance,

SELECT Name
FROM STUDENT,
  GRADE
WHERE Grade > 3.0
  AND STUDENT.Login = GRADE.Login;
HW_ProfExample.sql

becomes

SELECT Name
FROM STUDENT
WHERE LOGIN IN (
    SELECT LOGIN
    FROM GRADE
    WHERE Grade > 3.0);
HW_ProfExample.sql

Procedures

A “stored” procedure is a SQL function statements that can take arguments and may be called from another part of your program. Stated differently, a procedure is a serie of statements stored in a schema, that can easily be executed repeatedly, cf. https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html or https://mariadb.com/kb/en/library/create-procedure/.

Imagine we have the following:

/* code/sql/HW_ProcedureExamples.sql */
CREATE TABLE STUDENT (
  Login INT PRIMARY KEY,
  NAME VARCHAR(30),
  Major VARCHAR(30),
  Email VARCHAR(30)
);

INSERT INTO STUDENT
VALUES (
  123,
  "Test A",
  "CS",
  "a@a.edu"),
(
  124,
  "Test B",
  "IT",
  "b@a.edu"),
(
  125,
  "Test C",
  "CYBR",
  "c@a.edu");
HW_ProcedureExamples.sql

SQL is extremely litteral: when it reads the delimiter ;, it must execute the command that was shared. But a procedure, being composed of commands, will contain the ; symbol. To “solve” this (weird) issue, and be able to define a procedure, we have to “temporarily alter the language”, using DELIMITER // that makes the delimiter being // instead of ;11.

In any case, we can then define and execute a simpe procedure called STUDENTLIST as follows:

DELIMITER $$
CREATE PROCEDURE STUDENTLIST ()
BEGIN
  SELECT *
  FROM STUDENT;

-- This ";" is not the end of the procedure definition!
END;
$$
-- This is the delimiter that marks the end of the procedure
--		   definition.
DELIMITER ;

-- Now, we want ";" to be the "natural" delimiter again.
CALL STUDENTLIST ();

-- Now, we want ";" to be the "natural" delimiter again.
CALL STUDENTLIST ();
HW_ProcedureExamples.sql

A procedure an also take arguments, and an example could be:

DELIMITER $$
CREATE PROCEDURE STUDENTLOGIN (
  NameP VARCHAR(30)
)
BEGIN
  SELECT LOGIN
  FROM STUDENT
  WHERE NameP = Name;

END;
$$
DELIMITER ;

SHOW CREATE PROCEDURE STUDENTLOGIN;

-- This display information about the procedure just created.
--		    We can pass quite naturally an argument
--   to
--       our
--		  procedure.
CALL STUDENTLOGIN ("Test A");
HW_ProcedureExamples.sql

Triggers

A trigger is a series of statements stored in a schema that can be automatically executed whenever a particular event in the schema occurs. Triggers are extremely powerfull, and are a way of automating part of the work in your database. In MariaDB, you could have the following program.

Imagine we have the following:

CREATE TABLE STUDENT (
  Login VARCHAR(30) PRIMARY KEY,
  Average FLOAT
);

CREATE TABLE GRADE (
  Student VARCHAR(30),
  Exam VARCHAR(30),
  Grade INT,
  PRIMARY KEY (Student, Exam),
  FOREIGN KEY (Student) REFERENCES STUDENT (LOGIN)
);
HW_TriggerExample.sql

We want to create a trigger that counts the number of times something was inserted in our STUDENT table. SQL supports some primitive form of variables (cf. https://dev.mysql.com/doc/refman/8.0/en/user-variables.html and https://mariadb.com/kb/en/library/user-defined-variables/). There is no “clear” form of type, https://dev.mysql.com/doc/refman/8.0/en/user-variables.html reads:

In addition, the default result type of a variable is based on its type at the beginning of the statement. This may have unintended effects if a variable holds a value of one type at the beginning of a statement in which it is also assigned a new value of a different type. To avoid problems with this behavior, either do not assign a value to and read the value of the same variable within a single statement, or else set the variable to 0, 0.0, or ’’ to define its type before you use it.

In other words, SQL just “guess” the type of your value and go with it. Creating a simple trigger that increment a variable every time an insertion is performed in the STUDENT table can be done as follows:

SET @number_of_student = 0;

CREATE TRIGGER NUMBER_OF_STUDENT_INC
  AFTER INSERT ON STUDENT
  FOR EACH ROW
  SET @number_of_student = @number_of_student + 1;
HW_TriggerExample.sql

Now, assume we want to create a trigger that calculates the average for us. Note that the trigger will need to manipulate two tables (STUDENT and GRADE) at the same time.

CREATE TRIGGER STUDENT_AVERAGE
  AFTER INSERT ON GRADE
  FOR EACH ROW -- Woh, a whole query inside our trigger!
  UPDATE STUDENT
  SET STUDENT.Average = (
  SELECT AVG(Grade)
  FROM GRADE
  WHERE GRADE.Student = STUDENT.Login)
WHERE STUDENT.Login = NEW.Student;

-- The "NEW" keyword here refers to the "new" entry
--		    that is being inserted by the INSERT
--       statement
--		  triggering
--		   the trigger.
HW_TriggerExample.sql

The source code contains examples of insertion and explanations on how to witness the trigger in action.

Setting Up Your Work Environment

This part is a short tutorial to install and configure a working relational DBMS. We will proceed in 5 steps:

  1. Install the required software,
  2. Create a user,
  3. Log-in as this user,
  4. Create and populate our first database,
  5. Discuss the security holes in our set-up.

Installation

You will install the MySQL DataBase Managment System, or its community-developed fork, MariaDB. Below are the instruction to install MySQL Community Edition on Windows 10 and macOS, and MariaDB on Linux-based distribution, but both are developped for every major operating system (macOS, Windows, Debian, Ubuntu, etc.): feel free to pick one or the other, it will not make a difference in this course (up to some minor aspects). MySQL is more common, MariaDB is growing, both are released under GNU General Public License, well-documented and free of charge for their “community” versions.

It is perfectly acceptable, and actually encouraged, to install MySQL or MariaDB on a virtual machine for this class. You can use the Windows Subsystem for Linux, VMware or Virtual Box to run a “sandboxed” environment that should keep your data isolated from our experimentations.

Below are precise and up-to-date instructions, follow them carefully, read the messages displayed on your screen, make sure a step was correctly executed before moving to the next one, and everything should be all right. Also, remember:

  1. Do not wait, set your system early.
  2. To look for help, be detailed and clear about what you think went wrong.

The following links could be useful:

Installing MySQL on Windows 10

  1. Visit https://dev.mysql.com/downloads/installer/, click on “Download” next to

    Windows (x86, 32-bit), MSI Installer XXX YYY (mysql-installer-web-community-XXX.msi)

    where XXX is a number version (e.g., 8.0.13.0.), and YYY is the size of the file (e.g., 16.3M). On the next page, click on the (somewhat hidden) “No thanks, just start my download.” button.

  2. Save the “mysql-installer-web-community-XXX.msi” file, and open it. If there is an updated version of the installer available, agree to download it. Accept the license term.

  3. We will now install the various components needed for this class, leaving all the choices by defaults. This means that you need to do the following:

    1. Leave the first option on “Developer Default” and click on “Next”, or click on “Custom”, and select the following: mysql Installation 
    2. Click on “Next” even if you do not meet all the requirements
    3. Click on “Execute”. The system will download and install several softwares (this may take some time).
    4. Click on “Next” twice, leave “Type and Networking” on “Standalone MySQL Server / Classic MySQL Replication” and click “Next”, and leave the next options as they are (unless you know what you do and want to change the port, for instance) and click on “Next”.
    5. You now need to choose a password for the MySQL root account. It can be anything, just make sure to memorize it. Click on “Next”.
    6. On the “Windows Service” page, leave everything as it is and click on “Next”.
    7. On the “Plugins and Extensions” page, leave everything as it is and click on “Next”.
    8. Finally, click “Execute” on the “Apply Configuration” page, and then on “Finish”.
    9. Click on “Cancel” on the “Product Configuration” page and confirm that you do not want to add products: we only need to have MySQL Server XXX configured.
  4. We now want to make sure that MySQL is running: launch Windows’ “Control Panel”, then click on “Administrative Tools”, and on “Services”. Look for “MySQLXX”, its status should be “Running”. If it is not, right-click on it and click on “Start”.

  5. Open a command prompt (search for cmd, or use PowerShell) and type

    cd "C:\Program Files\MySQL\MySQL Server 8.0\bin"

    If this command fails, it is probably because the version number changed: open the file explorer, go to C:\Program Files\MySQL\, look for the right version number, and update the command accordingly.

    Then, enter

    mysql -u root -p

    and enter the password you picked previously for the root account. You are now logged as root in your database management system, you should see a brief message, followed by a prompt

    mysql >
  6. Now, move on to “Creating a User”.

Installing MySQL on macOS

The instructions are almost the same as for Windows. Read https://dev.mysql.com/doc/refman/8.0/en/osx-installation-pkg.html and download the file from https://dev.mysql.com/downloads/mysql/ once you selected “macOS” as your operating system. Install it, leaving everything by default but adding a password (refer to the instructions for windows). Then, open a command-line interface (the terminal), enter

mysql -u root -p

and enter the password you picked previously for the root account. You are now logged as root in your database management system, you should see a brief message, followed by a prompt

mysql >

Now, move on to “Creating a User”.

Installing MariaDB on Linux

  1. Install, through your standard package management system (apt or aptitude for debian-based systems, pacman for Arch Linux, etc.), the packages mysql-client and mysql-server (or default-mysql-client and default-mysql-server) as well as their dependencies12.

  2. Open a terminal and type

    /etc/init.d/mysql status

    or, as root,

    service mysql status

    to see if MySQL is running: if you read something containing

    Active: active (running)

    then you can move on to the next step, otherwise run (as root)

    service mysqld start

    and try again.

  3. As root, type in your terminal

    mysql_secure_installation

    You will be asked to provide the current password for the root MySQL user: this password has not be defined yet, so just hit “Enter”. You will be asked if you want to set a new password (that you can freely chose, just make sure to memorize it). Then, answer “n” to the question “Remove anonymous users?”, “Y” to “Disallow root login remotely?”, “n” to “Remove test database and access to it?” and finally “Y” to “Reload privilege tables now?”.

  4. Still as root, type in your terminal

    mysql -u root -p

    and enter the password you picked previously for the root account. You are now logged as root in your database management system: you should see a brief message, followed by a prompt

    MariaDB [(none)]>
  5. Now, move on to “Creating a User”.

Creating a User

This step will create a non-root user13 and grant it some rights. Copy-and-paste or type the following three commands, one by one (that is, enter the first one, hit “enter”, enter the second, hit “enter”, etc.). This step will create a non-root user14 and grant it some rights. Copy-and-paste or type the following three commands, one by one (that is, enter the first one, hit “enter”, enter the second, hit “enter”, etc.).

We first create a new user called testuser on our local installation, and give it the password password:

CREATE USER 'testuser'@'localhost' IDENTIFIED BY 'password';

Then, we grant the user all the privileges on the databases whose name starts with HW_:

GRANT ALL PRIVILEGES ON  `HW\_%` . * TO 'testuser'@'localhost';

Be careful: backticks (`) are surrounding HW\_% whereas single quotes (') are surrounding testuser and localhost.

And then we quit the DBMS, using

EXIT;

The message displayed after the two first commands should be

Query OK, 0 rows affected (0.00 sec)

and the message displayed after the last command should be

Bye

Logging-In as testuser

We now log in as the normal user called “testuser”.

Linux users should type as a normal user, i.e., not as root, in their terminal the following, and Windows users should type in their command prompt the following15:

mysql -u testuser -p

Enter password as your password. If you are prompted with a message

ERROR 1045 (28000): Access denied for user 'testuser'@'localhost' (using password: YES)

then you probably typed the wrong password. Otherwise, you should see a welcoming message from MySQL or MariaDB and a prompt.

To save yourself the hassle of typing the password, you can use

mysql -u testuser -ppassword

or

mysql -u testuser -p --password=password

to log-in as testuser immediately.

If at some point you want to know if you are logged as root or testuser, simply enter

\s;

Creating Our First Database

Now, let us create our first schema, our first table, populate it with data, and display various information.

We first create the schema (or database) HW_FirstTest:

CREATE DATABASE HW_FirstTest; -- Or CREATE SCHEMA HW_FirstTest;

Let us make sure that we created it:

SHOW DATABASES;

Let us use it:

USE HW_FirstTest;

And see what it contains now:

SHOW TABLES;

We now create a table called TableTest, wtih two integer attributes called Attribute1 and Attribute2:

CREATE TABLE TableTest (Attribute1 INT, Attribute2 INT);

And can make sure that the table was indeed created:

SHOW TABLES;

We can further ask our DBMS to display the structure of the table we just created:

DESCRIBE TableTest; -- Can be abbreviated as DESC TableTest;

And even ask to get back the code that would create the exact same structure (but without the data!):

SHOW CREATE TABLE TableTest;

Now, let us populate it with some data:

INSERT INTO TableTest
    VALUES (1,2),
           (3,4),
           (5,6);

Note that the SQL syntax and your DBMS are completely fine with your statement spreading over multiple lines. Let us now display the data stored in the table:

SELECT * FROM TableTest;

After that last command, you should see

+------------+------------+
| Attribute1 | Attribute2 |
+------------+------------+
|          1 |          2 |
|          3 |          4 |
|          5 |          6 |
+------------+------------+

Finally, we can erase the content of the table, then erase (“drop”) the table, and finally the schema:

DELETE FROM TableTest; -- Delete the rows
DROP TABLE TableTest; -- Delete the table
DROP DATABASE HW_FirstTest; -- Delete the schema

You’re all set! All you have to do is to quit, using the command

EXIT;

Security Concerns

Note that we were quite careless when we set-up our installation:

  • We installed a software without checking its signature. MySQL has a short tutorial on how to check the signature of their packages.
  • We did not impose any requirement on the root password of our installation. Using a good, secure, and unique password, should have been required / advised.
  • We left all the options on default, whereas a good, secure, installation, always fine-tune what is enabled and what is not.
  • We chosed a very weak password for testuser that is common to all of our installation.
  • Using the command mysqldump -u testuser -ppassword means that the password will be stored in the history of your command-line interface (that you should be able to access using history or Get-History for Powershell) and could be accessed by anyone having access to it.

All of those are obvious security risks, and make this installation unsafe to be a production environment. We will only use it as a testing / learning environment, but it is strongly recommended to:

  • Install it on a virtual machine, so that your personal files would not be impacted by any mis-use of your DBMS,
  • Perform a fresh, secured installation if you want to use a DBMS for anything but testing / learning purposes.

Exercises

Exercise 3.1

For each of the following, fill in the blanks:

  • In SQL, a relation is called a ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ .
  • In SQL, every statement ends with ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟, and in-line comments start with a ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ .
  • In SQL, there is no string datatype, so we have to use ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟.
  • The Data Control Language of SQL’s role is to ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ .
Exercise 3.2

What does it mean to say that SQL is at the same time a “data definition language” and a “data manipulation language”?

Exercise 3.3

Name three kind of objects (for lack of a better word) a CREATE statement can create.

Exercise 3.4

Write a SQL statement that adds a primary key constraint to an attribute named ID in an already existing table named STAFF.

Exercise 3.5

Complete each row of the following table with either a datatype or two different examples:

Data type Examples
  4, -32
Char(4)  
VarChar(10) 'Train', 'Michelle'
Bit(4)  
  TRUE, UNKNOWN
Exercise 3.6

In the datatype CHAR(3), what does the 3 indicate?

Exercise 3.7

Explain this query: CREATE SCHEMA FACULTY;.

Exercise 3.8

Write code to

  • declare a first table with two attributes, one of which is the primary key,
  • declare a second table with two attributes, one of which is the primary key, and the other references the primary key of the first table,
  • insert one tuple in the first table,
  • insert one tuple in the second table, referencing the only tuple of the first table,

You are free to come up with an example (even very simple or cryptic) or to re-use an example from class.

Exercise 3.9

Explain this query:

ALTER TABLE TABLEA
    DROP INDEX Attribute1;
Exercise 3.10

If I want to enter January 21, 2016, as a value for an attribute with the DATE datatype, what value should I enter?

Exercise 3.11

Write a statement that inserts the values "Thomas" and 4 into the table TRAINS.

Exercise 3.12

If PkgName is the primary key in the table MYTABLE, what can you tell about the number of rows returned by the following statement?

SELECT * FROM MYTABLE WHERE PkgName = 'MySQL';.

Exercise 3.13

If you want that every time a referenced row is delted, all the refering rows are deleted as well, what mechanism should you use?

Exercise 3.14

By default, does the foreign key restrict, cascade, or set null on update? Can you justify this choice?

Exercise 3.15

If a database designer is using the ON UPDATE SET NULL for a foreign key, what mechanism is (s)he implementing (i.e., describe how the database will react a certain operation)?

Exercise 3.16

If the following is part of the design of a table:

FOREIGN KEY (DptNumber) REFERENCES DEPARTMENT(Number)
    ON DELETE SET DEFAULT
    ON UPDATE CASCADE;

What happen to the rows whose foreign key DptNumber are set to 3 if the row in the DEPARTEMENT table with primary key Number set to 3 is…

  1. … deleted?
  2. …updated to 5?
Exercise 3.17

If the following is part of the design of a WORKER table:

FOREIGN KEY WORKER(DptNumber) REFERENCES DEPARTMENT(DptNumber)
    ON UPDATE CASCADE;

What happen to the rows whose foreign key DptNumber are set to 3 if the row in the DEPARTMENT table with primary key Number set to 3 is…

  1. … deleted?
  2. … updated to 5?
Exercise 3.18

Given a relation TOURIST(Name, EntryDate, Address), write a SQL statement printing the name and address of all the tourists who entered the territory after the 15 September, 2012.

Exercise 3.19

Describe what the star do in the statement

SELECT ALL * FROM MYTABLE;
Exercise 3.20

What is the fully qualified name of an attribute? Give an example.

Exercise 3.21

If DEPARTMENT is a database, what is DEPARTMENT.*?

Exercise 3.22

What is a multi-set? What does it mean to say that MySQL treats tables as multisets?

Exercise 3.23

What is the difference between

SELECT ALL * FROM MYTABLE;

and

SELECT DISTINCT * FROM MYTABLE;

How are the results the same? How are they different?

Exercise 3.24

What is wrong with the statement

SELECT * WHERE Name = 'CS' FROM DEPARTMENT;
Exercise 3.25

Write a query that returns the number of row (i.e., of entries, of tuples) in a table named BOOK.

Exercise 3.26

When is it useful to use a select-project-join query?

Exercise 3.27

When is a tuple variable useful?

Exercise 3.28

Write a query that changes the name of the professor whose Login is 'caubert' to 'Hugo Pernot' in the table PROF.

Exercise 3.29

Can an UPDATE statement have a WHERE condition using an attribute that is not the primary key? If no, justify, if yes, tell what could happen.

Exercise 3.30

Give the three possible meaning of the NULL value, and an example for each of them.

Exercise 3.31

What are the values of the following expressions (i.e., do they evaluate to TRUE, FALSE, or UNKNOWN)?

  • TRUE AND FALSE
  • TRUE AND UNKNOWN
  • NOT UNKNOWN
  • FALSE OR UNKNOWN
Exercise 3.32

Write the truth table for AND for the three-valued logic of SQL.

Exercise 3.33

What comparison expression should you use to test if a value is different from NULL?

Exercise 3.34

Explain this query:

SELECT Login 
    FROM PROF
    WHERE Department IN ( SELECT Major
                        FROM STUDENT
                        WHERE Login = 'jrakesh');

Can you rewrite it without nesting queries?

Exercise 3.35

What is wrong with this query?

SELECT Name FROM STUDENT
    WHERE Login IN
    ( SELECT Code FROM Department WHERE head = 'aturing');
Exercise 3.36

Write a query that returns the sum of all the values stored in the Pages attribute of a BOOK table.

Exercise 3.37

Write a query that adds a Pages attribute of type INT into a (already existing) BOOK table.

Exercise 3.38

Write a query that removes the default value for a Pages attribute in a BOOK table.

Exercise 3.39

Under which conditions does SQL allow you to enter the same row in a table twice?

Exercise 3.40

Explain this query: ROLLBACK;.

Exercise 3.41

Explain this query: DELIMITER ;.

Solution to Exercises

Solution 3.1

The blanks can be filled as follow:

  • In SQL, a relation is called a ͟ ͟ ͟ ͟table ͟ ͟ ͟ ͟ ͟ ͟ ͟ .
  • In SQL, every statement ends with ͟ ͟a semi-colon (;) ͟ ͟, and in-line comments start with a ͟ ͟ ͟two minus signs (--) ͟ ͟ .
  • In SQL, there is no string datatype, so we have to use ͟ ͟ ͟VARCHAR(x) or CHAR(x) where x is an integer reflecting the maximum (or fixed) size of the string ͟ ͟ ͟ ͟ ͟ .
  • The Data Control Language of SQL’s role is to ͟ ͟ control access to the data stored, by creating users and granting them rights ͟ ͟ .
Solution 3.2

It can specify the conceptual and internal schema, and it can manipulate the data.

Solution 3.3

Database (schema), table, view, assertion, trigger, etc.

Solution 3.4

ALTER TABLE STAFF ADD PRIMARY KEY(ID);

Solution 3.5
Data type Examples
INT 4, -32
CHAR(4) 'abCD', "dEfG"
VARCHAR(10) 'Train', 'Michelle'
BIT(4) B'1010', B'0101'

BOOL`` |TRUE,FALSE,NULL`

NULL is actually a valid answer for every single type of

Solution 3.6

That we can store exactly three characters.

Solution 3.7

It creates a schema, i.e., a database, named Faculty.

Solution 3.8
A simple and compact code could be:
/* code/sql/HW_Short.sql */
CREATE TABLE A (
  Att1 INT PRIMARY KEY,
  Att2 INT
);

CREATE TABLE B (
  Att3 INT PRIMARY KEY,
  Att4 INT,
  FOREIGN KEY (Att4) REFERENCES A (Att1)
);

INSERT INTO A
VALUES (
  1,
  2);

INSERT INTO B
VALUES (
  3,
  1);
HW_Short.sql
Solution 3.9

It removes the UNIQUE constraint on the Attribute1 in the TABLEA table.

Solution 3.10

DATE'2016-01-21', '2016-01-21', '2016/01/21', '20160121'.

Solution 3.11

INSERT INTO TRAINS VALUES('Thomas', 4);

Solution 3.12

We know that at most one (but possibly 0) row will be returned.

Solution 3.13

We should use a referential triggered action clause, ON DELETE CASCADE.

Solution 3.14

By default, the foreign key restricts updates. This prevents unwanted update of information: if an update needs to be propagated, then it needs to be “acknowledged” and done explicitely.

Solution 3.15

If the referenced row is updated, then the attribute of the referencing rows are set to NULL.

Solution 3.16

In the referencing rows,

  1. the department number is set to the default value.
  2. the department number is updated accordingly.
Solution 3.17
  1. This operation is rejected: the row in the DEPARTMENT table with primary key Number set to 3 cannot be deleted if a row in the WORKER table references it.
  2. In the referencing rows, the department number is updated accordingly.
Solution 3.18

We could use the following:

SELECT Name, Address
    FROM TOURIST
    WHERE EntryDate > DATE'2012-09-15';
Solution 3.19

It selects all the attributes, it is a wildcard.

Solution 3.20

The name of the relation with the name of its schema and a period beforehand. An example would be EMPLOYEE.Name.

Solution 3.21

All the tables in that database.

Solution 3.22

A multiset is a set where the same value can occur twice. In MySQL, the same row can occur twice in a table.

Solution 3.23

They both select all the rows in the MYTABLE table, but ALL will print the duplicate values, whereas DISTINCT will print them only once.

Solution 3.24

You cannot have the WHERE before FROM.

Solution 3.25

SELECT COUNT(*) FROM BOOK;

Solution 3.26

We use those query that projects on attributes using a selection and join conditions when we need to construct for information based on pieces of data spread in multiple tables.

Solution 3.27

It makes the distinction between two different rows of the same table, it is useful when we want to select a tuple in a relation that is in a particular relation with a tuple in the same relation. Quoting https://stackoverflow.com/a/7698796/:

They are useful for saving typing, but there are other reasons to use them:

  • If you join a table to itself you must give it two different names otherwise referencing the table would be ambiguous.
  • It can be useful to give names to derived tables, and in some database systems it is required… even if you never refer to the name.
Solution 3.28

We could use the following:

UPDATE PROF SET Name = 'Hugo Pernot'
    WHERE Login = 'caubert';
Solution 3.29

Yes, we can have select condition that does not use primary key. In that case, it could be the case that we update more than one tuple with such a command (which is not necessarily a bad thing).

Solution 3.30

Unknown value (“Will it rain tomorrow?”), unavailable / withheld (“What is the phone number of Harry Belafonte?”), N/A (“What is the email address of Abraham Lincoln?”).

Solution 3.31
  • TRUE AND FALSEFALSE
  • TRUE AND UNKNOWNUNKNOWN
  • NOT UNKNOWNUNKNOWN
  • FALSE OR UNKNOWNFALSE
Solution 3.32
  • TRUE AND TRUETRUE
  • TRUE AND FALSEFALSE
  • TRUE AND UNKNOWNUNKNOWN
  • FALSE AND FALSEFALSE
  • UNKNOWN AND UNKNOWNUNKNOWN
  • FALSE AND UNKNOWNFALSE
  • The other cases can be deduced by symmetry.

For a more compact presentation, refer to the three-valued truth table.

Solution 3.33

IS NOT

Solution 3.34

It list the login of the professors teaching in the department where the student whose login is “jrakesh” is majoring. It can be rewritten as

SELECT PROF.Login 
    FROM PROF, STUDENT
    WHERE Department = Major
    AND STUDENT.Login = 'jrakesh';
Solution 3.35

It tries to find a Login in a Code.

Solution 3.36

SELECT SUM(Pages) FROM BOOK;

Solution 3.37

ALTER TABLE BOOK ADD COLUMN Pages INT;

Solution 3.38

ALTER TABLE BOOK ALTER COLUMN Pages DROP DEFAULT;

Solution 3.39

Essentially, if there are no primary key in the relation, and if no attribute has the UNIQUE constraint. Cf. also this previous problem.

Solution 3.40

This command, ROLLBACK, undoes the last transaction, i.e. it allows to “roll back” to a previous point identified by START TRANSACTION;: everything that happened between those two commands is “undone”, as if it had never been executed.

Solution 3.41

This command, DELIMITER ;, changes the SQL interpreter delimiter back to being “;”. It is useful when defining procedures, as the traditional delimiter’s role have to be “suspended” to wrap a series of commands inside a function.

Problems

Problem 3.1 (Discovering the documentation)

The goal of this problem is to learn where to find the documentation for your DBMS, and to understand how to read the syntax of SQL commands.

You can consult (Elmasri and Navathe 2010, Table 5.2, p. 140) or (Elmasri and Navathe 2015, Table 7.2, p. 235), for a very quick summary of the most common commands. Make sure you are familiar with the Backus–Naur form (BNF) notation commonly used:

  • non-terminal symbols (i.e., variables, parameters) are enclosed in angled brackets,
<…>
  • optional parts are shown in square brackets,
[…]
  • repetitons are shown in braces
{…}
  • alternatives are shown in parenthesis and separated by vertical bars,
(…|…|…)

The most complete lists of commands are probably at

Those are the commands implemented in the DBMS you are actually using. Since there are small variations from one implementation to the other, it is better to take one of this link as a reference in the future.

As a starting point, looking at the syntax for CREATE TABLE commands is probably a good start, cf. https://mariadb.com/kb/en/create-table/ or https://dev.mysql.com/doc/refman/8.0/en/create-table.html.


Problem 3.2 (Create and use a simple table in SQL)

This problem will guide you in manipulating a very simple table in SQL.

Pb 3.2 – Question 1

Log in as testuser, create a database named HW_Address, use it, and create two tables:

CREATE TABLE NAME(
    FName VARCHAR(15),
    LName VARCHAR(15),
    ID INT,
    PRIMARY KEY(ID)
);

CREATE TABLE ADDRESS(
    StreetName VARCHAR(15),
    Number INT,
    Habitants INT,
    PRIMARY KEY(StreetName, Number)
);
Pb 3.2 – Question 2

Observe the output produced by the command DESC ADDRESS;.

Pb 3.2 – Question 3

Add a foreign key to the ADDRESS table, using

ALTER TABLE ADDRESS 
    ADD FOREIGN KEY (Habitants)
    REFERENCES NAME(ID);

And observe the new output produced by the command DESC ADDRESS;.

Is it what you would have expected? How informative is it? Can you think of a command that would output more detailled information, including a reference to the existence of the foreign key?

Pb 3.2 – Question 4

Draw the relational model corresponding to that database and identify the primary and foreign keys.

Pb 3.2 – Question 5

Add this data to the NAME table:

INSERT INTO NAME VALUES ('Barbara', 'Liskov', 003);
INSERT INTO NAME VALUES ('Tuong Lu', 'Kim', 004);
INSERT INTO NAME VALUES ('Samantha', NULL, 080);

What command can you use to display this infomation back? Do you notice anything regarding the values we entered for the ID attribute?

Pb 3.2 – Question 6

Add some data into the ADDRESS table:

INSERT INTO ADDRESS
    VALUES
    ('Armstrong Drive', 10019, 003),
    ('North Broad St.', 23, 004),
    ('Robert Lane', 120, NULL);

What difference do you notice with the insertions we made in the NAME table? Which syntax seems more easy to you?

Pb 3.2 – Question 7

Write a SELECT statement that returns the ID number of the person whose first name is “Samantha”.

Pb 3.2 – Question 8

Write a statement that violates the entity integrity constraint. What is the error message returned?

Pb 3.2 – Question 9

Execute an UPDATE statement that violates the referential integrity constraint. What is the error message returned?

Pb 3.2 – Question 10

Write a statement that violates another kind of constraint. Explain what constraint you are violating and explain the error message.


Problem 3.3 (Duplicate rows in SQL)

Log in as testuser and create a database titled HW_REPETITION. Create in that database a table (the following questions refer to this table as EXAMPLE, but you are free to name it whatever you want) with at least two attributes that have different data types. Do not declare a primary key yet. Answer the following:

Pb 3.3 – Question 1

Add a tuple to your table using

INSERT INTO EXAMPLE VALUES(X, Y);

where the X and Y are values have the right datatype. Try to add this tuple again. What do you observe? (You can use SELECT * FROM EXAMPLE; to observe what is stored in this table.)

Pb 3.3 – Question 2

Alter your table to add a primary key, using

ALTER TABLE EXAMPLE ADD PRIMARY KEY (Attribute);

where Attribute is the name of the attribute you want to be a primary key. What do you observe?

Pb 3.3 – Question 3

Empty your table using

DELETE FROM EXAMPLE;

and alter your table to add a primary key, using the command we gave at the previous step. What do you observe?

Pb 3.3 – Question 4

Try to add the same tuple twice. What do you observe?


Problem 3.4 (Constraints on foreign keys)

From the notes, recall the following about foreign keys:

Two important remarks:

  • The datatype of the foreign key has to be the exactly the same as the datatype of the attribute to which we are referring.
  • The target of the foreign key must be the primary key.

But, the situation is slightly more complex. Test for yourself by editing the following code as indicated:

/* code/sql/HW_FKtest.sql */
DROP SCHEMA IF EXISTS HW_FKtest;

CREATE SCHEMA HW_FKtest;

USE HW_FKtest;

CREATE TABLE TARGET (
  Test VARCHAR(15) PRIMARY KEY
);

CREATE TABLE SOURCE (
  Test VARCHAR(25),
  FOREIGN KEY (Test) REFERENCES TARGET (Test)
);
HW_FKtest.sql
  1. Remove the PRIMARY KEY constraint.
  2. Replace PRIMARY KEY with UNIQUE.
  3. Replace one of the VARCHAR(25) with CHAR(25).
  4. Replace one of the VARCHAR(25) with INT.
  5. Replace one of the VARCHAR(25) with VARCHAR(15)
  6. Once you have edited and run the program in all of its modified versions, adjust the remarks above to better reflect the reality of the implementation we are using.

Problem 3.5 (Revisiting the PROF table)

Create the PROF, DEPARTMENT, STUDENT and GRADE tables as in the “Constructing and populating a new example” section. Populate them with some data (copy it from the notes or come up with your own data).

To obtain exactly the same schema as the one we developped and edited, you can use mysqldump to “dump” this table, with a command like

mysqldump -u testuser -ppassword\
    -h localhost --add-drop-database\
    --skip-comments --compact\
    HW_ProfExample > dump.sql

The code we studied during the lecture is more or less the following.

/* code/sql/HW_ProfExampleRevisitedRevisited.sql */
DROP SCHEMA IF EXISTS HW_ProfExampleRevisited;

CREATE SCHEMA HW_ProfExampleRevisited;

USE HW_ProfExampleRevisited;

CREATE TABLE PROF (
  Login VARCHAR(25) PRIMARY KEY,
  NAME VARCHAR(25),
  Department CHAR(5)
);

CREATE TABLE DEPARTMENT (
  Code CHAR(5) PRIMARY KEY,
  NAME VARCHAR(25),
  Head VARCHAR(25),
  FOREIGN KEY (Head) REFERENCES PROF (LOGIN) ON UPDATE CASCADE
);

ALTER TABLE PROF
  ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT (Code);

CREATE TABLE STUDENT (
  Login VARCHAR(25) PRIMARY KEY,
  NAME VARCHAR(25),
  Registered DATE,
  Major CHAR(5),
  FOREIGN KEY (Major) REFERENCES DEPARTMENT (Code)
);

CREATE TABLE GRADE (
  Login VARCHAR(25),
  Grade INT,
  PRIMARY KEY (LOGIN, Grade),
  FOREIGN KEY (LOGIN) REFERENCES STUDENT (LOGIN)
);

INSERT INTO DEPARTMENT
VALUES (
  'MATH',
  'Mathematics',
  NULL),
(
  'CS',
  'Computer
    Science',
  NULL);

INSERT INTO DEPARTMENT (
  Code,
  Name)
VALUES (
  'CYBR',
  'Cyber Secturity');

INSERT INTO PROF (
  LOGIN,
  Department,
  Name)
VALUES (
  'caubert',
  'CS',
  'Clément Aubert');

INSERT INTO PROF (
  LOGIN,
  Name,
  Department)
VALUES (
  'aturing',
  'Alan Turing',
  'CS'),
(
  'perdos',
  'Paul
    Erdős',
  'MATH'),
(
  'bgates',
  'Bill Gates',
  'CYBR');

INSERT INTO STUDENT (
  LOGIN,
  Name,
  Registered,
  Major)
VALUES (
  'jrakesh',
  'Jalal Rakesh',
  DATE '2017-12-01',
  'CS'),
(
  'svlatka',
  'Sacnite Vlatka',
  '2015-03-12',
  'MATH'),
(
  'cjoella',
  'Candice Joella',
  '20120212',
  'CYBR'),
(
  'aalyx',
  'Ava Alyx',
  20121011,
  'CYBR'),
(
  'caubert',
  'Clément Aubert',
  NULL,
  'CYBR');

INSERT INTO GRADE
VALUES (
  'jrakesh',
  3.8),
(
  'svlatka',
  2.5);
HW_ProfExampleRevisitedRevisited.sql

We will resume working on this model, and enhance it.

Pb 3.5 – Question 1

Draw the complete relational model for this database (i.e., for the PROF, DEPARTMENT, STUDENT and GRADE relations).

Pb 3.5 – Question 2

Create and populate a LECTURE table as follows:

  • It should have four attributes: Name, Instructor, Code, and Year, of types VARCHAR(25) for the first two, CHAR(5) for Code, and YEAR(4) for Year.
  • The Year and Code attributes should be the primary key (yes, have two attributes be the primary key).
  • The Instructor attribute should be a foreign key referencing the Login attribute in PROF.
  • Populate the LECTURE table with some made-up data.

Try to think about some of the weakenesses of this representation. For instance, can it accomodate two instructors for the same class? Write down two possibles scenarios in which this schema would not be appropriate.

Pb 3.5 – Question 3

The GRADE table had some limitations too. For example, every student could have only one grade. Add two columns to the GRADE table using:

ALTER TABLE GRADE
    ADD COLUMN LectureCode CHAR(5),
    ADD COLUMN LectureYear YEAR(4);

Add a foreign key:

ALTER TABLE GRADE
    ADD FOREIGN KEY (LectureYear, LectureCode)
    REFERENCES LECTURE(Year, Code);

Use DESCRIBE and SELECT to observe the schema of the GRADE table and its rows. Is it what you would have expected?

Pb 3.5 – Question 4
Update the tuples in GRADE with some made-up data. Now every row should contain, in addition to a login and a grade, a lecture year and a lecture code.
Pb 3.5 – Question 5
Update the relational model you previously drew to reflect the new situation of your tables.
Pb 3.5 – Question 6

Write SELECT statements answering the following questions (where PROF.Name, LECTURE.Name, YYYY, LECTURE.Code and STUDENT.Login should be relevant values considering your data):

  1. “Could you give me the logins and grades of the students who took LECTURE.Name in YYYY?”
  2. “Could you list the instructors who taught in year YYYY without any duplicates?”
  3. “Could you list the name and grade of all the student who ever took the class LECTURE.Code?”
  4. “Could you tell me which years was the class LECTURE.Code taught?”
  5. “Could you list the other classes taught the same year as the class LECTURE.Code?”
  6. “Could you print the names of the students who registered after STUDENT.Login?”
  7. “Could you tell me how many departments’ heads are teaching this year?”

Problem 3.6 (TRAIN table and more advanced SQL coding)

Look at the SQL code below and then answer the following questions.

/* code/sql/HW_Train.sql */
CREATE TABLE TRAIN (
  ID VARCHAR(30),
  Model VARCHAR(30),
  ConstructionYear YEAR (4)
);

CREATE TABLE CONDUCTOR (
  ID VARCHAR(20),
  NAME VARCHAR(20),
  ExperienceLevel VARCHAR(20)
);

CREATE TABLE ASSIGNED_TO (
  TrainId VARCHAR(20),
  ConductorId VARCHAR(20),
  Day DATE,
  PRIMARY KEY (TrainId, ConductorId)
);
HW_Train.sql
Pb 3.6 – Question 1

Modify the CREATE statement that creates the TRAIN table (lines 1–5), so that ID would be declared as the primary key. It is sufficient to only write the line(s) that need to change.

Pb 3.6 – Question 2

Write an ALTER statement that makes ID become the primary key of the CONDUCTOR table.

Pb 3.6 – Question 3

Modify the CREATE statement that creates the ASSIGNED_TO table (lines 13–18), so that it has two foreign keys: ConductorId references the ID attribute in CONDUCTOR and TrainId references the ID attribute in TRAIN. It is sufficient to only write the line(s) that need to change.

Pb 3.6 – Question 4

Write INSERT statements that insert one tuple of your choosing in each relation (no NULL values). These statements should respect all the constraints (including the ones we added in the previous questions) and result in actual insertions. (Remember that four digits is a valid value for an attribute with the YEAR(4) datatype.)

Pb 3.6 – Question 5

Write a statement that sets the value of the ExperienceLevel attribute to “Senior” in all the tuples where the ID attribute is “GP1029” in the CONDUCTOR relation.

Pb 3.6 – Question 6

Write a SELECT statement that answers each of the following questions:

  1. “What are the identification numbers of the trains?”
  2. “What are the names of the conductors with a”Senior” experience level?”
  3. “What are the construction years of the”Surfliner” and “Regina” models that we have?”
  4. “What is the ID of the conductor that was responsible of the train referenced”K-13” on 2015/12/14?”
  5. “What are the models that were ever conducted by the conductor whose ID is”GP1029”?”

Problem 3.7 (Read, correct, and write SQL statements for the COFFEE database)

Suppose we have the relational model depicted below, with the indicated data in it:

COFFEE

Ref Origin TypeOfRoast PricePerPound
001 Brazil Light 8.90
121 Bolivia Dark 7.50
311 Brazil Medium 9.00
221 Sumatra Dark 10.25

CUSTOMER

CardNo Name Email
001 Bob Hill b.hill@isp.net
002 Ana Swamp swampa@nca.edu
003 Mary Sea brig@gsu.gov
004 Pat Mount pmount@fai.fr

SUPPLY

Provider Coffee
Coffee Unl. 001
Coffee Unl. 121
Coffee Exp. 311
Johns & Co. 221

PROVIDER

Name Email
Coffee Unl. bob@cofunl.com
Coffee Exp. pat@coffeex.dk
Johns & Co. NULL

In the following, we will assume that this model was implemented in a DBMS (MySQL or MariaDB), the primary keys being COFEE.Ref, CURTOMER.CardNo, SUPPLY.Provider and SUPPLY.Coffee, and PROVIDER.Name, and the foreign keys being as follows:

FavCoffee in the CUSTOMER relation refers to Ref in the COFFEE relation
Provider in the SUPPLY refers to Name in the PROVIDER relation
Coffee in the SUPPLY refers to Ref in the COFFEE relation

Read and write SQL commands for the following “what-if” scenarios. Assume that:

  1. Datatype do not matter: we use only strings and appropriate numerical datatypes.
  2. Every statement respects SQL’s syntax (there’s no “a semi-colon is missing” trap).
  3. None of these commands are actually executed; the data is always in the state depicted above.

You can use COFFEE.1 to denote the first tuple (or row) in COFFEE, and similarly for other relations and tuples (so that, for instance, SUPPLY.4 corresponds to "Johns & Co"., 221).

Pb 3.7 – Question 1

Draw the relational model of this table.

Pb 3.7 – Question 2
Determine if the following insertion statements would violate the the entity integrity constraint, (“primary key cannot be NULL and should be unique”), the referential integrity constraint (“the foreign key must refer to something that exists”), if there would be some other kind of error (ignoring the plausability / revelance of inserting that tuple), or if it would result in successful insertion.
INSERT INTO CUSTOMER VALUES(005, 'Bob Hill', NULL, 001);
INSERT INTO COFFEE VALUES(002, "Peru", "Decaf", 3.00);
INSERT INTO PROVIDER VALUES(NULL, "contact@localcof.com");
INSERT INTO SUPPLY VALUES("Johns  Co.", 121);
INSERT INTO SUPPLY VALUES("Coffee Unl.", 311, 221);
Pb 3.7 – Question 3

Assuming that the referential triggered action clause ON UPDATE CASCADE is used for each of the foreign keys in this database, list the tuples modified by the following statements:

UPDATE CUSTOMER SET FavCoffee = 001
    WHERE CardNo = 001;

UPDATE COFFEE SET TypeOfRoast = 'Decaf'
    WHERE Origin = 'Brazil';

UPDATE PROVIDER SET Name = 'Coffee Unlimited'
    WHERE Name = 'Coffee Unl.';

UPDATE COFFEE SET PricePerPound = 10.00
    WHERE PricePerPound > 10.00;
Pb 3.7 – Question 4
Assuming that the referential triggered action clause ON DELETE CASCADE is used for each of the foreign keys in this database, list the tuples modified by the following statements:
DELETE FROM CUSTOMER
    WHERE Name LIKE '%S%';

DELETE FROM COFFEE
    WHERE Ref = 001;

DELETE FROM SUPPLY
    WHERE Provider = 'Coffee Unl.'
        AND Coffee = 001;

DELETE FROM PROVIDER
    WHERE Name = 'Johns & Co.';
Pb 3.7 – Question 5
Assume that there is more data in our table than what was given at the beginning of the problem. Write SQL queries that answer the following questions:
  1. “What are the origins of your dark coffees?”
  2. “What is the reference of Bob’s favorite coffee?” (note: it does not matter if you return the favorite coffee of all the Bobs in the database.)
  3. “What are the names of the providers who did not give their email?”
  4. “How many coffees does Johns & co. provide us with?”
  5. “What are the names of the providers of my dark coffees?”

Problem 3.8 (Write select queries for the DEPARTMENT table)

Consider the following SQL code:

/* code/sql/HW_Department.sql */
CREATE TABLE DEPARTMENT (
  ID INT PRIMARY KEY,
  NAME VARCHAR(30)
);

CREATE TABLE EMPLOYEE (
  ID INT PRIMARY KEY,
  NAME VARCHAR(30),
  Hired DATE,
  Department INT,
  FOREIGN KEY (Department) REFERENCES DEPARTMENT (ID)
);

INSERT INTO DEPARTMENT
VALUES (
  1,
  "Storage"),
(
  2,
  "Hardware");

INSERT INTO EMPLOYEE
VALUES (
  1,
  "Bob",
  20100101,
  1),
(
  2,
  "Samantha",
  20150101,
  1),
(
  3,
  "Mark",
  20050101,
  2),
(
  4,
  "Karen",
  NULL,
  1),
(
  5,
  "Jocelyn",
  20100101,
  1);
HW_Department.sql

Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.

  1. The name of the employees working in the Storage department ("Bob", "Samantha", "Karen" and "Jocelyn"),
  2. The name of the employee that has been hired for the longest period of time ("Mark"),
  3. The name(s) of the employee(s) from the Storage department who has(have) been hired for the longest period of time. Phrased differently, the oldest employees of the Storage department ("Bob" and "Jocelyn").

Problem 3.9 (Write select queries for the COMPUTER table)

Consider the following SQL code:

/* code/sql/HW_Computer.sql */
DROP SCHEMA IF EXISTS HW_Computer;

CREATE SCHEMA HW_Computer;

USE HW_Computer;

CREATE TABLE COMPUTER (
  ID VARCHAR(20) PRIMARY KEY,
  Model VARCHAR(40)
);

CREATE TABLE PRINTER (
  ID VARCHAR(20) PRIMARY KEY,
  Model VARCHAR(40)
);

CREATE TABLE CONNEXION (
  Computer VARCHAR(20),
  Printer VARCHAR(20),
  PRIMARY KEY (Computer, Printer),
  FOREIGN KEY (Computer) REFERENCES COMPUTER (ID),
  FOREIGN KEY (Printer) REFERENCES PRINTER (ID)
);

INSERT INTO COMPUTER
VALUES (
  'A',
  'DELL A'),
(
  'B',
  'HP X'),
(
  'C',
  'ZEPTO D'),
(
  'D',
  'MAC Y');

INSERT INTO PRINTER
VALUES (
  '12',
  'HP-140'),
(
  '13',
  'HP-139'),
(
  '14',
  'HP-140'),
(
  '15',
  'HP-139');

INSERT INTO CONNEXION
VALUES (
  'A',
  '12'),
(
  'A',
  '13'),
(
  'B',
  '13'),
(
  'C',
  '14');
HW_Computer.sql

Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.

  1. The number of computers connected to the printer whose ID is '13' (2).
  2. The number of different models of printers (2).
  3. The model(s) of the printer(s) connected to the computer whose ID is 'A' ('HP-140' and 'HP-139').
  4. The ID(’s) of the computer(s) not connected to any printer ('D').

Problem 3.10 (Write select queries for the SocialMedia schema)

Consider the following SQL code:

/* code/sql/HW_SocialMedia.sql */
CREATE TABLE ACCOUNT (
  ID INT PRIMARY KEY,
  NAME VARCHAR(25),
  Email VARCHAR(25) UNIQUE
);

CREATE TABLE SUBSCRIBE (
  Subscriber INT,
  Subscribed INT,
  DATE DATE,
  FOREIGN KEY (Subscriber) REFERENCES ACCOUNT (ID),
  FOREIGN KEY (Subscribed) REFERENCES ACCOUNT (ID),
  PRIMARY KEY (Subscriber, Subscribed)
);

CREATE TABLE VIDEO (
  ID INT PRIMARY KEY,
  Title VARCHAR(25),
  Released DATE,
  Publisher INT,
  FOREIGN KEY (Publisher) REFERENCES ACCOUNT (ID)
);

CREATE TABLE THUMBS_UP (
  Account INT,
  Video INT,
  DATE DATE,
  PRIMARY KEY (Account, Video),
  FOREIGN KEY (Account) REFERENCES ACCOUNT (ID),
  FOREIGN KEY (Video) REFERENCES VIDEO (ID)
);

INSERT INTO ACCOUNT
VALUES (
  1,
  "Bob Ross",
  "bob@ross.com"),
(
  2,
  NULL,
  "anon@fai.com"),
(
  3,
  "Martha",
  NULL);

INSERT INTO SUBSCRIBE
VALUES (
  2,
  1,
  DATE "2020-01-01"),
(
  3,
  1,
  DATE "2019-03-03"),
(
  3,
  2,
  DATE "2019-03-03"),
(
  2,
  2,
  DATE "2019-03-03"),
(
  1,
  2,
  DATE "2019-03-03");

-- The first entry means that 2 subscribed to 1, not the
--		   other way around.
--		    And similarly for the other entries.
INSERT INTO VIDEO
VALUES (
  10,
  "My first video!",
  DATE "2020-02-02",
  1),
(
  20,
  "My second video!",
  DATE "2020-02-03",
  1),
(
  30,
  "My vacations",
  DATE "2020-02-04",
  2);

INSERT INTO THUMBS_UP
VALUES (
  2,
  10,
  DATE "2020-02-02"),
(
  3,
  10,
  DATE "2020-02-02"),
(
  2,
  20,
  DATE "2020-02-02"),
(
  1,
  30,
  DATE "2020-02-05");
HW_SocialMedia.sql

Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.

  1. The title of all the videos ("My first video!", "My second video!", "My vacations").
  2. The release date of the video whose title is "My first video!" ("2020-02-02").
  3. The ID of the account(s) where the “Name” attribute was not given ("2").
  4. The ID of the videos whose title contains the word "video" ("10", "20").
  5. The number of thumbs up for the video with title "My vacations" ("1").
  6. The title of the oldest video ("My first video!").
  7. The names of the accounts who gave a thumbs up to the video with ID 30 ("Bob Ross").
  8. The ID of the account with the greatest number of subscribers ("2").

Problem 3.11 (Write select queries for a variation of the COMPUTER table)

Consider the following SQL code:

CREATE TABLE COMPUTER (
  ID VARCHAR(20) PRIMARY KEY,
  Model VARCHAR(40)
);

CREATE TABLE PERIPHERAL (
  ID VARCHAR(20) PRIMARY KEY,
  Model VARCHAR(40),
  TYPE ENUM ('mouse', 'keyboard', 'screen', 'printer')
);

CREATE TABLE CONNEXION (
  Computer VARCHAR(20),
  Peripheral VARCHAR(20),
  PRIMARY KEY (Computer, Peripheral),
  FOREIGN KEY (Computer) REFERENCES COMPUTER (ID),
  FOREIGN KEY (Peripheral) REFERENCES PERIPHERAL (ID)
);

INSERT INTO COMPUTER
VALUES (
  'A',
  'Apple IIc Plus'),
(
  'B',
  'Commodore SX-64');

INSERT INTO PERIPHERAL
VALUES (
  '12',
  'Trendcom Model',
  'printer'),
(
  '14',
  'TP-10
    Thermal Matrix',
  'printer'),
(
  '15',
  'IBM Selectric',
  'keyboard');

INSERT INTO CONNEXION
VALUES (
  'A',
  '12'),
(
  'B',
  '14'),
(
  'A',
  '15');
HW_ComputerVariation.sql

Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.

  1. The model of the computer whose ID is 'A' ('Apple IIc Plus').
  2. The type of the peripheral whose ID is '14' (printer).
  3. The model of the printers (Trendcom Model, TP-10 Thermal Matrix).
  4. The model of the peripherals whose NAME starts with 'IBM' ('IBM Selectric').
  5. The model of the peripherals connected to the computer whose ID is 'A' (Trendcom Model, IBM Selectric).
  6. The number of peripheral connected to the computer whose model is Apple IIc Plus (2).

Problem 3.12 (Improving a role-playing game with a relational model)

A friend of yours wants you to review and improve the code for a role-playing game.

The original idea was that each character has a name, a class (e.g., Bard, Assassin, Druid), a certain amount of experience, a level, one or more weapons (providing bonuses) and the ability to complete quests. A quest has a name and rewards the characters who completed it with a certain amount of experience and, on rare occaisions, with a special item.

Your friend came up with the following code:

CREATE TABLE CHARACTER(
    Name VARCHAR(30) PRIMARY KEY,
    Class VARCHAR(30),
    XP INT,
    LVL INT,
    Weapon_Name VARCHAR(30),
    Weapon_Bonus INT,
    Quest_Completed VARCHAR(30)
);

CREATE TABLE QUEST(
    ID VARCHAR(20) PRIMARY KEY,
    Completed_By VARCHAR(30),
    XP_Gained INT,
    Special_Item VARCHAR(20),
    FOREIGN KEY (Completed_By) REFERENCES CHARACTER(Name)
);

ALTER TABLE CHARACTER
    ADD FOREIGN KEY (Quest_Completed) REFERENCES QUEST(ID);

However, there are several problems with the code:

  • A character can have only one weapon. (All the attempts to “hack” the CHARACTER table to add an arbitrary number of weapons ended up creating horrible messes.)
  • Every time a character completes a quest, a copy of the quest must also be created. (Your friend is not so sure why, but nothing else works. Also it seems that a character can complete only one quest, but your friend is not sure about that either.)
  • It would be nice to be able to store features that are tied to the class, not the character, like the bonuses they provide and their associated elements (e.g., all bards use fire, all assassins use wind, etc.), but your friend simply cannot figure out how to make that happen.

Can you provide a relational model (there is no need to write the SQL code, but do remember to indicate the primary and foreign keys) that would solve all of your friend’s troubles?

Problem 3.13 (A simple database for books)

Consider the following code:

/* code/sql/HW_SimpleBook.sql */
DROP SCHEMA IF EXISTS HW_SimpleBook;

CREATE SCHEMA HW_SimpleBook;

USE HW_SimpleBook;

CREATE TABLE AUTHOR (
  FName VARCHAR(30),
  LName VARCHAR(30),
  Id INT PRIMARY KEY
);

CREATE TABLE PUBLISHER (
  NAME VARCHAR(30),
  City VARCHAR(30),
  PRIMARY KEY (NAME, City)
);

CREATE TABLE BOOK (
  Title VARCHAR(30),
  Pages INT,
  Published DATE,
  PublisherName VARCHAR(30),
  PublisherCity VARCHAR(30),
  FOREIGN KEY (PublisherName, PublisherCity) REFERENCES
    PUBLISHER (NAME, City),
  Author INT,
  FOREIGN KEY (Author) REFERENCES AUTHOR (Id),
  PRIMARY KEY (Title, Published)
);

INSERT INTO AUTHOR
VALUES (
  "Virginia",
  "Wolve",
  01),
(
  "Paul",
  "Bryant",
  02),
(
  "Samantha",
  "Carey",
  03);

INSERT INTO PUBLISHER
VALUES (
  "Gallimard",
  "Paris"),
(
  "Gallimard",
  "New-York"),
(
  "Jobs Pub.",
  "New-York");

INSERT INTO BOOK
VALUES (
  "What to eat",
  213,
  DATE '20170219',
  "Gallimard",
  "Paris",
  01),
(
  "Where to live",
  120,
  DATE '20130212',
  "Gallimard",
  "New-York",
  02),
(
  "My Life, I",
  100,
  DATE '18790220',
  "Gallimard",
  "Paris",
  03),
(
  "My Life, II",
  100,
  DATE '18790219',
  "Jobs Pub.",
  "New-York",
  03);
HW_SimpleBook.sql

The values inserted in the database is just to provide some examples; you should assume there is more data in it than what we have inserted. In this long problem, you will be asked to write commands to select, update, delete, insert data, and to improve upon the relational model.

Pb 3.13 – Question 1

Write a command that selects:

  1. The Title of all the books.
  2. The distinct Name of the publishers.
  3. The Titles and Published dates of the books published since January 31, 2012.
  4. The first and last names of the authors published by "Gallimard" (from any city).
  5. The first and last names of the authors who were not published by an editor in "New-York".
  6. The ID of the authors who published a book whose name starts with "Where".
  7. The total number of pages in the database.
  8. The number of pages in the longest book written by the author whose last name is "Wolve".
  9. The titles of the books published in the 19th century.
Pb 3.13 – Question 2
Write a command that updates the title of all the books written by the author whose ID is 3 to "BANNED". Is there any reason for this command to be rejected by the system? If yes, explain the reason.
Pb 3.13 – Question 3
Write one or multiple commands that would delete the author whose ID is 3 and all the books written by that author. Make sure you do not violate any foreign key constraints.
Pb 3.13 – Question 4
Write a command that would create a table used to record the awards granted to authors for particular books. Assume that each award has its own name, is awarded every year, and that it is awarded to an author for a particular book. Pick appropriate attributes, datatypes16, primary and foreign keys, and, as always, avoid redundancy.
Pb 3.13 – Question 5
Draw the relational model of the database you created (including all the relations given in the code and the ones you added).
Pb 3.13 – Question 6
Discuss two limitations of the model and how to improve it.

Problem 3.14 (A database for website certificates)

A certificate for a website has a serial number (SN) and a common name (CN). It must belong to an organization and be signed by a certificate authority (CA). The organization and CA must both have an SN and a CN. A CA can be trusted, not trusted, or not evaluated. The code below is an attempt to represent this situation and is populated with examples.

CREATE TABLE ORGANIZATION(
    SN VARCHAR(30) PRIMARY KEY,
    CN VARCHAR(30)
    );

CREATE TABLE CA(
    SN VARCHAR(30) PRIMARY KEY,
    CN VARCHAR(30),
    Trusted BOOL
    );

CREATE TABLE CERTIFICATE(
    SN VARCHAR(30) PRIMARY KEY,
    CN Varchar(30),
    Org VARCHAR(30) NOT NULL,
    Issuer VARCHAR(30) NOT NULL,
    Valid_Since DATE,
    Valid_Until DATE,
    FOREIGN KEY (Org) 
        REFERENCES ORGANIZATION(SN)
        ON DELETE CASCADE,
    FOREIGN KEY (Issuer) REFERENCES CA(SN)
    );

INSERT INTO ORGANIZATION VALUES
    ('01', 'Wikimedia Foundation'),
    ('02', 'Free Software Foundation');

INSERT INTO CA VALUES
    ('A', "Let's Encrypt", true),
    ('B', 'Shady Corp.', false),
    ('C', 'NewComer Ltd.', NULL);

INSERT INTO CERTIFICATE VALUES
    ('a', '*.wikimedia.org', '01', 'A', 
            20180101, 20200101),
    ('b', '*.fsf.org', '02', 'A',
            20180101, 20191010),
    ('c', '*.shadytest.org', '02', 'B',
            20190101, 20200101),
    ('d', '*.wikipedia.org', '01', 'C',
            20200101, 20220101);
  1. Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.
    1. The CN’s of all certificates ("*.wikimedia.org, \*.fsf.org, \*.shadytest.org, \*.wikipedia.org").
    2. The SN’s of the organizations whose CN contains "Foundation" ("01, 02").
    3. The CN’s and expiration dates of all the certificates that expired, assuming today is the 6th of December 2019 ("\*.fsf.org", 2019 − 10 − 10).
    4. The CN’s of the CA’s that are not trusted ("Shady Corp., NewComer Ltd."),
    5. The CN’s of the certificates that are signed by a CA that is not trusted ("\*.shadytest.org, \*.wikipedia.org").
    6. The number of certificates signed by the CA whose CN is "Let's encrypt" (2).
    7. A table listing the CN’s of the organizations along with the CN’s of their certificates ("Wikimedia Foundation, \*.wikimedia.org, Free Software Foundation, \*.fsf.org, Free Software Foundation, \*.shadytest.org, Wikimedia Foundation, \*.wikipedia.org").
  2. In this set-up, what happens if the following commands are issued? List all the entries that are modified or deleted, or specify if the command would not change anything and explain why.
    1. DELETE FROM CA WHERE SN = 'A';
    2. UPDATE ORGANIZATION SET CN = "FSF" WHERE SN = '02';
    3. UPDATE ORGANIZATION SET SN = "01" WHERE SN = '02';
    4. DELETE FROM ORGANIZATION;

Problem 3.15 (A simple database for published pieces of work)

Consider the following code:

/* code/sql/HW_Work.sql */
CREATE TABLE AUTHOR (
  NAME VARCHAR(30) PRIMARY KEY,
  Email VARCHAR(30)
);

CREATE TABLE WORK (
  Title VARCHAR(30) PRIMARY KEY,
  Author VARCHAR(30),
  FOREIGN KEY (Author) REFERENCES AUTHOR (NAME) ON DELETE
    CASCADE ON UPDATE CASCADE
);

CREATE TABLE BOOK (
  ISBN INT PRIMARY KEY,
  Work VARCHAR(30),
  Published DATE,
  Price DECIMAL(10, 2),
  FOREIGN KEY (WORK) REFERENCES WORK (Title) ON DELETE
    RESTRICT ON UPDATE CASCADE
);

CREATE TABLE EBOOK (
  ISBN INT PRIMARY KEY,
  Work VARCHAR(30),
  Published DATE,
  Price DECIMAL(10, 2),
  FOREIGN KEY (WORK) REFERENCES WORK (Title) ON DELETE
    RESTRICT ON UPDATE CASCADE
);

INSERT INTO AUTHOR
VALUES (
  "Virginia W.",
  "vw@isp.net"), -- A.1
(
  "Paul B.", "pb@isp.net"), -- A.2
(
  "Samantha T.", "st@fai.fr") -- A.3
;

INSERT INTO WORK
VALUES (
  "What to eat",
  "Virginia W.") -- W.1
;

INSERT INTO BOOK
VALUES (
  15155627,
  "What to eat",
  DATE '20170219',
  12.89) -- B.1
;

INSERT INTO EBOOK
VALUES (
  15155628,
  "What to eat",
  DATE '20170215',
  9.89) -- E.1
;
HW_Work.sql

Assume the following:

  1. Every statement respects SQL’s syntax (there’s no “a semi-colon is missing” trap).
  2. None of the commands in the rest of this problem are actually executed; they are for hypothetical “what if” questions.

Also, note that each row inserted between line 39 and 50 is given a name in comment ("A.1, A.2, A.3, W.1", etc.).

Pb 3.15 – Question 1
Draw the relational model corresponding to this series of commands.
Pb 3.15 – Question 2

Determine if the following insertion statements would violate the entity integrity constraint, the referential integrity constraint, if there would be some other kind of error, or if it would result in successful insertion.

INSERT INTO EBOOK VALUES (0, NULL, 20180101, 0);
INSERT INTO AUTHOR VALUES("Mary B.", "mb@fai.fr", NULL);
INSERT INTO WORK VALUES("My Life", "Claude A.");
INSERT INTO BOOK VALUES(00000000, NULL, DATE'20001225', 90.9);
INSERT INTO AUTHOR VALUES("Virginia W.", "alt@isp.net");
Pb 3.15 – Question 3

List the rows (A.2, W.1, etc.) modified by the following statements. Be careful about the conditions on foreign keys!

UPDATE AUTHOR SET Email = 'Deprecated' WHERE Email LIKE '%isp.net';
UPDATE WORK SET Title = "How to eat" WHERE Title = "What to eat";
DELETE FROM WORK;
DELETE FROM AUTHOR WHERE Name = "Virginia W.";
Pb 3.15 – Question 4

Assume that there is more data than what we inserted. Write a command that selects:

  • The prices of all the ebooks.
  • The distinct names of the authors who have authored a piece of work.
  • The names of the authors using fai.fr for their email domain.
  • The prices of the ebooks published after 2018.
  • The price of the most expensive book.
  • The number of the pieces of work written by the author whose name is “Virginia W..”
  • The email of the author who wrote the piece called “My Life.”
  • The ISBN’s of the books containing a work written by the author whose email is “vw@isp.net.”
Pb 3.15 – Question 5
Write a command that updates the title of all the pieces of work written by the author whose name is “Virginia W.” to “BANNED.” Is there any reason for this command to be rejected by the system? If yes, explain the reason.
Pb 3.15 – Question 6
Write one or multiple commands that would delete the work whose title is “My Life”, as well as all of the book and ebook versions of it.
Pb 3.15 – Question 7
Discuss two limitations of the model and how to improve it.

Problem 3.16 (A simple database for authors of textbooks)

Consider the following code:

/* code/sql/HW_TextbookAuthored.sql */
DROP SCHEMA IF EXISTS HW_TEXTBOOKAUTHORED;

CREATE SCHEMA HW_TEXTBOOKAUTHORED;

USE HW_TEXTBOOKAUTHORED;

CREATE TABLE TEXTBOOK (
  Title VARCHAR(50),
  ISBN CHAR(13) PRIMARY KEY,
  Price DECIMAL(10, 2)
);

CREATE TABLE AUTHOR (
  LName VARCHAR(30),
  FName VARCHAR(30),
  Email VARCHAR(30),
  PRIMARY KEY (Lname, Fname)
);

CREATE TABLE AUTHORED (
  Book CHAR(13),
  FOREIGN KEY (Book) REFERENCES TEXTBOOK (ISBN),
  AuthorLName VARCHAR(30),
  AuthorFName VARCHAR(30),
  FOREIGN KEY (AuthorLName, AuthorFName) REFERENCES AUTHOR
    (LName, Fname)
);

INSERT INTO TEXTBOOK
VALUES (
  'Starting Out with Java: Early Objects',
  9780133776744,
  30.00),
(
  'NoSQL for Mere Mortals',
  9780134023212,
  47.99);

INSERT INTO AUTHOR
VALUES (
  'Sullivan',
  'Dan',
  NULL),
(
  'Gaddis',
  'Tony',
  NULL);

INSERT INTO AUTHORED
VALUES (
  9780134023212,
  'Sullivan',
  'Dan'),
(
  9780133776744,
  'Gaddis',
  'Tony');
HW_TextbookAuthored.sql

The meaning of the AUTHORED table is that a tuple  < I, L, F> represents that the author whose last name is L and whose first name is F wrote the textbook whose ISBN is I.

Answer the following:

  1. Write a command that updates the email address of ‘Gaddis’, ‘Tony.’
  2. Write a command that inserts a textbook of your choice into the TEXTBOOK table. No value should be NULL.
  3. Write a command that makes ‘Gaddis’, ‘Tony’ the author of the textbook you just added to our database.
  4. Write a command that makes “0.01” the default value for the Price attribute of the TEXTBOOK relation.
  5. Write a command that inserts a textbook of your choice in the TEXTBOOK table and have the price set to the default value.
  6. Write a command that creates a table called EDITOR with three attributes: Name, Address, and Website. The Name attribute should be the primary key. Insert two tuples in the EDITOR table, making sure that one should has the Name attribute set to “Pearson.”
  7. Write a command that creates a table called PUBLISHED with two attributes: Editor and Textbook. The Editor attribute should reference the EDITOR table and the Textbook attribute should reference the TEXTBOOK table.
  8. Write a command that makes “Pearson” the editor of the textbook whose ISBN is 9780133776744.

Answer the following short questions based on what is in our model so far:

  1. Can an author have authored more than one textbook?
  2. Can a textbook have more than one author?
  3. Can a textbook without an ISBN be inserted in the TEXTBOOK relation?
  4. Can the price of a textbook be negative?
  5. Can two authors have the same first and last names?
  6. Can two textbooks have the same title?
  7. Can two editors have the same address?

Problem 3.17 (A simple database for capstone projects)

Consider the following code:

/* code/sql/HW_Capstone.sql */
DROP SCHEMA IF EXISTS HW_CAPSTONE;

CREATE SCHEMA HW_CAPSTONE;

USE HW_CAPSTONE;

CREATE TABLE STUDENT (
  FName VARCHAR(50),
  Id CHAR(13) PRIMARY KEY,
  GraduationYear INT,
  GraduationSemester ENUM ("Fall", "Spring", "Summer")
);

CREATE TABLE PROGRAMMING_LANGUAGE (
  NAME VARCHAR(50) PRIMARY KEY,
  Licence VARCHAR(50)
);

CREATE TABLE PROJECT (
  CodeName VARCHAR(50),
  Leader CHAR(13),
  PRIMARY KEY (CodeName, Leader),
  FOREIGN KEY (Leader) REFERENCES STUDENT (Id)
);

CREATE TABLE USED_LANGUAGE (
  ProjectCodeName VARCHAR(50),
  ProjectLeader CHAR(13),
  UsedLanguage VARCHAR(50),
  PRIMARY KEY (ProjectCodeName, ProjectLeader, UsedLanguage),
  FOREIGN KEY (ProjectCodeName, ProjectLeader) REFERENCES
    PROJECT (CodeName, Leader),
  FOREIGN KEY (UsedLanguage) REFERENCES PROGRAMMING_LANGUAGE (NAME)
);


/*
 */
INSERT INTO STUDENT
VALUES (
  "Mary",
  "0123456789100",
  2025,
  "Summer"),
(
  "Steve",
  "0000000000000",
  2025,
  "Fall"),
(
  "Claude",
  "9999999999999",
  2024,
  "Fall"),
(
  "Meghan",
  "0987654321098",
  2023,
  "Spring");

INSERT INTO PROGRAMMING_LANGUAGE
VALUES (
  "Rust",
  "MIT"),
(
  ".NET Core",
  "MIT"),
(
  "Racket",
  "LGPL"),
(
  "Python",
  "PSF");

-- Taken from
-- https://en.wikipedia.org/wiki/Comparison_of_open-source_programming_language_licensing
INSERT INTO PROJECT
VALUES (
  "Brick Break",
  "0123456789100"),
(
  "Brick Break",
  "0000000000000"),
(
  "Grade Calculator",
  "0123456789100"),
(
  "Undecided",
  "9999999999999");

INSERT INTO USED_LANGUAGE
VALUES (
  "Brick Break",
  "0123456789100",
  "Rust"),
(
  "Brick Break",
  "0000000000000",
  ".NET Core"),
(
  "Brick Break",
  "0000000000000",
  "Python"),
(
  "Grade Calculator",
  "0123456789100",
  "Racket");
HW_Capstone.sql

The meaning of the USED_LANGUAGE table is that a tuple   < N, L, U> represents the fact that the project whose code name is N and whose leader is L uses the programming language U.

Pb 3.17 – Question 1

Answer the following short questions based on the model implemented above. You can simply answer “True” or “False”, or justify your reasoning (e.g. with code).

  1. Can a project uses multiple programming languages?
  2. Can a student be the leader of multiple projects?
  3. Can multiple projects have the same code name?
  4. Could Claude simply enter NULL for the value of his project’s code name, since he’s undecided?
  5. Can a project be created without project leader?
  6. Can we know who is working on a project without being its leader?
Pb 3.17 – Question 2

Draw the relational model corresponding to this code.

Pb 3.17 – Question 3

Write the following commands.

  1. Write a command that insert a new student in the STUDENT table.
  2. Write a command that updates the code name of the project (“Undecided”, “9999999999999”) to “VR in ER”.
  3. Write a command that updates the graduation year of the student whose id is “0987654321098” to 2024, and the semester to “Fall”.
  4. Write a command that changes the STUDENT table to make it impossible to enter NULL for the first name of a student, without changing the primary key.
  5. Write a command that changes the datatype of GraduationYear to SMALLINT.
  6. Write a command that adds an attribute “ReleaseDate” to the PROJECT table.
  7. If you managed to write the previous command correctly, write a command that sets the release date of the project (“Brick Break”, “0123456789100”) to the 26th of November 2022.
  8. Write a command that makes it impossible for a student to be the leader in more than one project

Problem 3.18 (A simple database for vaccines)

Consider the following code:

/* code/sql/HW_Vaccine.sql */
CREATE TABLE COMPANY (
  Name VARCHAR(50) PRIMARY KEY,
  Website VARCHAR(255) CHECK (Website LIKE "https://%")
);

CREATE TABLE DISEASE (
  Name VARCHAR(50) PRIMARY KEY,
  Communicable BOOL,
  -- Whether the disease can be transmitted from a human to
  --      another.
  TYPE ENUM ("infectious", "deficiency", "hereditary")
);

CREATE TABLE VACCINE (
  Name VARCHAR(50) PRIMARY KEY,
  Manufacturer VARCHAR(50) NOT NULL,
  FOREIGN KEY (Manufacturer) REFERENCES COMPANY (NAME) ON
    UPDATE CASCADE
);

CREATE TABLE EFFICACY (
  DiseaseName VARCHAR(50),
  VaccineName VARCHAR(50),
  Efficacy DECIMAl(5, 2),
  PRIMARY KEY (DiseaseName, VaccineName),
  FOREIGN KEY (DiseaseName) REFERENCES DISEASE (NAME),
  FOREIGN KEY (VaccineName) REFERENCES VACCINE (NAME)
);

INSERT INTO COMPANY
VALUES (
  "Moderna",
  "https://www.modernatx.com/");

INSERT INTO DISEASE
VALUES (
  "Coronavirus disease 2019",
  TRUE,
  "infectious");

INSERT INTO VACCINE
VALUES (
  "mRNA-1273",
  "Moderna");

INSERT INTO EFFICACY
VALUES (
  "Coronavirus disease 2019",
  "mRNA-1273",
  94.1);
HW_Vaccine.sql
Pb 3.18 – Question 1

Answer the following short questions. In our implementation…

  1. … can two companies have exactly the same name?
  2. … can two companies have the same website?
  3. … can a company not have a website?
  4. … can the same vaccine be manufactured by multiple companies?
  5. … can a vaccine not have a manufacturer?
  6. … can a disease being neither communicable nor not communicable?
  7. … can the same vaccine have different efficacies for different diseases?
Pb 3.18 – Question 2

Answer the following questions:

  1. What does CHECK (Website LIKE "https://*") do?

  2. Why did we picked the DECIMAl(5,2) datatype?

  3. What is the benefit / are the benefits of having a separate EFFICACY table over having something like

    CREATE TABLE VACCINE(
        Name VARCHAR(50) PRIMARY KEY,
        Manufacturer VARCHAR(50),
        Disease VARCHAR(50),
        Efficacy DECIMAl(5,2),
        FOREIGN KEY (Manufacturer) REFERENCES COMPANY (Name)
    );

?

Pb 3.18 – Question 3

Draw the relational model corresponding to this code.

Pb 3.17 – Question 4

Write the following commands.

  1. Write a command that insert “Pfizer” in the COMPANY table (you can make up the website or look it)
  2. Write a command that insert the “Pfizer-BioNTech COVID-19 Vaccine” in the VACCINE table, and a command that store the efficacy of that vaccine against the “Coronavirus disease 2019” disease (you can make up the values or look them up).
  3. Write a command that updates the name of the company “Moderna” to “Moderna, Inc.” everywhere.
  4. Write a command that lists the name of all the companies.
  5. Write a command that deletes the “Coronavirus disease 2019” entry from the DISEASE table (if only!). This command should return an error. Explain it and leave the command commented.
  6. Write two commands: one that adds “physiological” to the possible types of diseases, and one that inserts a physiological disease in the DISEASE table.
  7. Write a command that return the list of all the companies that manufacture a vaccine against “Coronavirus disease 2019”.

Problem 3.19 (A database for residencies)

Consider the following code:

/* code/sql/HW_Residency.sql */
DROP SCHEMA IF EXISTS HW_RESIDENCY;

CREATE SCHEMA HW_RESIDENCY;

USE HW_RESIDENCY;

CREATE TABLE PERSON (
  FName VARCHAR(40),
  LName VARCHAR(40),
  SSN VARCHAR(11) PRIMARY KEY,
  Birthdate DATE
);

CREATE TABLE HOUSE (
  Address VARCHAR(40) PRIMARY KEY,
  Color ENUM ("blue", "white", "green")
);

CREATE TABLE RESIDENCY (
  Person VARCHAR(11),
  House VARCHAR(40),
  PrincipalResidence BOOLEAN,
  Status ENUM ("own", "rent", "squat", "other"),
  FOREIGN KEY (Person) REFERENCES PERSON (SSN),
  FOREIGN KEY (House) REFERENCES HOUSE (Address) ON DELETE CASCADE
);

INSERT INTO PERSON
VALUES (
  NULL,
  "Doe",
  "000-00-0000",
  NULL), -- P.1
(
  "Michael", "Keal", "000-00-0001", DATE "1983-02-11"), -- P.2
(
  "James", "Baldwin", "000-00-0002", DATE
    "1967-01-01"), -- P.3
(
  "Mridula", "Warrier", "000-00-0003", DATE "1990-02-11");

-- P.4
INSERT INTO HOUSE
VALUES (
  "123 Main St.",
  "blue"), -- H.1
(
  "456 Second St.", "white"), -- H.2
(
  "11 Third St.", "blue");

-- H.3
INSERT INTO RESIDENCY
VALUES (
  "000-00-0001",
  "123 Main St.",
  TRUE,
  "own"), -- R.1
(
  "000-00-0001", "456 Second St.", FALSE, "own"), -- R.2
(
  "000-00-0002", "123 Main St.", TRUE, "rent"), -- R.3
(
  "000-00-0003", "456 Second St.", TRUE, "own");

-- R.4
HW_Residency.sql

Note that each row inserted in the PERSON, HOUSE and RESIDENCY tables is given the name and noted as afterwards as a comment ("P.1, P.2, P.3, P.4, H.1", etc.).

Answer the following questions and problems, assuming that none of the commands in the rest of the problem are actually executed.

Pb 3.19 – Question 1

Draw the relational model corresponding to this series of commands (it is not necessary to include the state).

Pb 3.19 – Question 2
Write a command that violates the entity integrity constraint.
Pb 3.19 – Question 3
Write a command that violates the referential integrity constraint.
Pb 3.19 – Question 4

List the rows (e.g. “P.2”, “H.1”, or “none”) modified by the following statements:

  1. UPDATE HOUSE SET COLOR = "green";
  2. DELETE FROM RESIDENCY WHERE House LIKE "1%";
  3. DELETE FROM HOUSE WHERE Address = "456 Second St.";
  4. DELETE FROM PERSON WHERE Birthdate=DATE"1990-02-11";
Pb 3.19 – Question 5

Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.

  1. The Addresses' of the houses in the system (“11 Third St., 123 Main St., 456 Second St.”`).
  2. The SSN’s of the people whose first name was not entered in the system ("000-00-0000").
  3. All the different colors of houses ("white, blue").
  4. The Address of the residency of "James Baldwin" ("123 Main St.").
  5. The first name of the oldest person in the database ("James").
  6. "Michael Keal"’s principal residency address ("123 Main St.").
  7. The distinct first and last names of the homeowners ("Michael Keal, Mridula Warrier").
  8. The SSN’s of the people that have the same principal residency as "James Baldwin" ("000-00-0001").
Pb 3.19 – Question 6
Write a command that updates the SSN of "James Baldwin" to "000-00-0010". Is there any reason for this command to be rejected by the system? If yes, explain the reason.
Pb 3.19 – Question 7

Answer the following short questions from the data in our model, as it is currently:

  1. Is it possible for two people to have the same last name?
  2. Is it possible for a person to have multiple principal residencies?
  3. Is it possible for a house to not be yellow?
  4. Is it possible for the SSN to be any series of 11 characters?
  5. Is it possible for a person to own any number of houses?
  6. Is it possible for a person to rent at most one house?
Pb 3.19 – Question 8
Consider the data currently in the RESIDENCY table and give a possible primary key.
Pb 3.19 – Question 9
Discuss why the primary key identified from the previous question for the RESIDENCY table is a good choice.

Problem 3.20 (A database for research fundings)

Consider the following code:

/* code/sql/HW_ScientificResearch.sql */
CREATE TABLE SCIENTIST (
  SSN INT PRIMARY KEY,
  Name VARCHAR(30) NOT NULL
);

CREATE TABLE PROJECT (
  Code CHAR(4) PRIMARY KEY,
  Name VARCHAR(150) NOT NULL
);

CREATE TABLE CONTRIBUTESTO (
  Scientist INT,
  Project CHAR(4),
  Hours INT,
  PRIMARY KEY (Scientist, Project),
  FOREIGN KEY (Scientist) REFERENCES SCIENTIST (SSN),
  FOREIGN KEY (Project) REFERENCES PROJECT (Code) ON DELETE
    CASCADE ON UPDATE CASCADE
);

CREATE TABLE FUNDINGAGENCY (
  Name VARCHAR(150) PRIMARY KEY,
  TYPE ENUM ("State", "Federal", "Foundation"),
  Creation YEAR
);

CREATE TABLE FUNDS (
  Agency VARCHAR(150),
  Project CHAR(4),
  Amount DECIMAL(12, 2),
  FOREIGN KEY (Agency) REFERENCES FUNDINGAGENCY (NAME) ON
    UPDATE CASCADE ON DELETE RESTRICT,
  FOREIGN KEY (Project) REFERENCES PROJECT (Code)
);

INSERT INTO SCIENTIST
VALUES (
  "000000000",
  "Mike"), -- S.1
(
  "000000001", "Sabine"), -- S.2
(
  "000000002", "James"), -- S.3
(
  "000000003", "Emily"), -- S.4
(
  "000000004", "Claire");

-- S.5
INSERT INTO PROJECT
VALUES (
  "AA",
  "Advancing Airplanes"), -- P.1
(
  "BA", "Better Airplanes"), -- P.2
(
  "BB", "Better Buildings"), -- P.3
(
  "CC", "Creative Creation");

-- P.4
INSERT INTO CONTRIBUTESTO
VALUES (
  "000000001",
  "AA",
  12), -- C.1
(
  "000000001", "BB", 10), -- C.2
(
  "000000002", "AA", 5), -- C.3
(
  "000000003", "BA", 3), -- C.4
(
  "000000000", "BB", 1), -- C.5
(
  "000000000", "AA", 1);

-- C.6
INSERT INTO FUNDINGAGENCY
VALUES (
  "National Science Foundation",
  "Federal",
  1950), -- FA.1
(
  "French-American Cultural Exchange", "Foundation", 2017);

-- FA.2
INSERT INTO FUNDS
VALUES (
  "National Science Foundation",
  "AA",
  100000), -- F.1
(
  "French-American Cultural Exchange", "CC", 10000);

-- F.2
HW_ScientificResearch.sql

Note that each row inserted in the tables is given a name and noted as afterwards as a comment ("S.1, S.2, P.1, C.1, FA.1", etc.).

Answer the following questions and problems, assuming that none of the commands in the rest of the problem are actually executed.

Pb 3.20 – Question 1

Draw the relational model corresponding to this series of commands (it is not necessary to include the state).

Pb 3.20 – Question 2

Draw the relational model corresponding to this series of commands (no need to include the state).

Pb 3.20 – Question 3

How could you edit line 12 so that negative values and NULL would not be admitted as values for Hours?

Pb 3.20 – Question 4

Write a command that would violate the referential integrity constraint.

Pb 3.20 – Question 5

List the rows affected (updated or deleted) by the following commands. If no rows are affected because the command would would violate the entity integrity constraint, the referential integrity constraint, or if there would be some other kind of error, please indicate it.

  1. UPDATE SCIENTIST SET SSN = "000000001" WHERE Name = "Claire";
  2. UPDATE FUNDINGAGENCY SET Name = "NSF" WHERE Name = "National Science Foundation";
  3. DELETE FROM FUNDINGAGENCY WHERE Name = "French-American Cultural Exchange";
Pb 3.20 – Question 6

Write a query that selects …(In parenthesis, the values returned in this set-up, but you have to be general.)

  1. …the name of the funding agencies created after 2000 ("French-American Cultural Exchange")

  2. …the code of the projects that contains the word "Airplanes" ("AA", "BA")

  3. …the number of hours scientists contributed to the project "AA" (18)

  4. …the code of the projects to which the scientist named Sabine contributed ("AA", "BB")

  5. …the name of the projects who benefited from federal funds ("Advancing Airplanes")

  6. …the name of the scientist who contributed to the same project as Mike ("Sabine", "James")

  7. …the name of the projects that are not funded by an agency ("Better Airplanes", "Better Buildings")

  8. …the name of the scientist who contributed the most (in terms of hours) to the project named "Advancing Airplanes" (Sabine).

Pb 3.20 – Question 7

Identify and discuss two limitations of this model, and offer a way to remedy at least one of them.


Problem 3.21 (Improving a Relational Model for a Printing Station)

Consider the following code:

   CREATE TABLE ROOM(
        Nickname VARCHAR(40) PRIMARY KEY,
        Size INT,
        ComputerOrPhoneInIt BOOL NOT NULL
    );

    CREATE TABLE COMPUTER(
        Nickname VARCHAR(40) PRIMARY KEY,
        OperatingSystem VARCHAR(50),
        Room VARCHAR(40),
        FOREIGN KEY (Room) REFERENCES ROOM(Nickname)  
    );

    CREATE TABLE PHONE(
        Nickname VARCHAR(40) PRIMARY KEY,
        OperatingSystem VARCHAR(50),
        Room VARCHAR(40),
        FOREIGN KEY (Room) REFERENCES ROOM(Nickname)  
    );


    CREATE TABLE PRINTER(
        Nickname VARCHAR(40) PRIMARY KEY,
        ConnectedTo VARCHAR(40),
        Room VARCHAR(40),
        FOREIGN KEY (Room) REFERENCES ROOM(Nickname)  
    );

It was written by some friends of yours to store data for their printing station: their shop offers computers, phones and printers located in different rooms, and they want to keep track of some information about those. They have multiple issues with this implementation, and require your help to identify them and design a new model addressing those.

Pb 3.21 – Question 1

For each issue listed below, explain what causes it, and if there is a way to address it (you do not need to write actual code, simply explain how you would proceed, using keywords if it clarifies).

  • They have a hard time coming up with different nicknames for their printers, computers, rooms and phones every time they add one.
  • The attribute ROOM.ComputerOrPhoneInIt is a bit cumbersome, as they have to remember to update it to FALSE if the last computer or phone is removed from a room.
  • The OperatingSystem attributes in PHONE and COMPUTER are not very convenient, as they do not provide an easy way for instance to list all the windows computers and phones, or all the 64-bits architectures. Examples of values are “Windows 10 IoT Core version 1.8.9, 64 bits”,“Android Pie”, “macOS 11 Big Sur, updated last week”, etc.
  • It seems that they cannot record when a printer is connected to more than one device.
Pb 3.21 – Question 2

Draw a relational model (no need to write sql code) that would improve their implementation. You are free (and encouraged!) to create new relations and alter the original attributes. No need to specify the domains unless you want to add particular constraints, but remember to draw the primary and foreign keys.


Problem 3.22 (Write select queries for a (third!) variation of the COMPUTER table)

Consider the following code:

CREATE TABLE COMPUTER (
    ID VARCHAR(20) PRIMARY KEY,
    Model VARCHAR(40)
);

CREATE TABLE PERIPHERAL (
    ID VARCHAR(20) PRIMARY KEY,
    Model VARCHAR(40),
    Type ENUM ('mouse', 'keyboard', 'screen', 'printer'),
    LastConnexion DATETIME
);

CREATE TABLE CONNEXION (
    Computer VARCHAR(20),
    Peripheral VARCHAR(20),
    PRIMARY KEY (Computer, Peripheral),
    FOREIGN KEY (Computer) REFERENCES COMPUTER (ID)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    FOREIGN KEY (Peripheral) REFERENCES PERIPHERAL (ID)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);


CREATE TRIGGER last_connexion_update
    BEFORE INSERT ON CONNEXION
    FOR EACH ROW
        UPDATE PERIPHERAL
        SET LastConnexion = NOW()
        WHERE NEW.Peripheral = PERIPHERAL.ID;

INSERT INTO COMPUTER
VALUES
    ('A','Apple IIc Plus'),  -- C.1
    ('B','Commodore SX-64'); -- C.2

INSERT INTO PERIPHERAL(ID, Model, Type)
VALUES
    ('12', 'Trendcom Model', 'printer'),       -- P.1
    ('14', 'TP-10 Thermal Matrix', 'printer'), -- P.2
    ('15', 'IBM Selectric', 'keyboard');       -- P.3

INSERT INTO CONNEXION
VALUES
    ('A', '12'),  -- X.1
    ('B', '14'),  -- X.2
    ('A', '15');  -- X.3
Pb 3.22 – Question 1
Draw the relational model corresponding to this series of commands (no need to include the state).
Pb 3.22 – Question 2
Fill the following table.
  True False
A peripheral can be connected to multiple computers    
The ID of a computer must be a letter    
Whether the connexion is wired or wireless can be determined    
Every computer must have a different model    
A peripheral can be a mouse and a keyboard at the same time    
A computer can be connected to multiple peripheral    
A computer can be connected to another computer    
A peripheral can be connected to another peripheral    
Pb 3.22 – Question 3

List the rows (i.e., C.2, X.1, or even “none”) deleted by the following statements:

  1. DELETE FROM CONNEXION WHERE Computer = 'A';
  2. DELETE FROM COMPUTER WHERE ID = 'A';
  3. DELETE FROM PERIPHERAL WHERE ID = '15';
  4. DELETE FROM CONNEXION WHERE Computer <> 'A';
Pb 3.22 – Question 4

Write a query that selects …(In parenthesis, the values returned in this set-up, but you have to be general.)

  1. …the type of the peripheral with Id 12 (printer)
  2. ID of the computer whose model name contain "Apple" (A).
  3. …the number of computer in the database (2).
  4. …all the different kind of peripheral, without duplication (printer, keyboard).
  5. …the ID of the computer connected to a keyboard (A)
  6. …the model of the computer connected to the "TP-10 Thermal Matrix" peripheral (Commodore SX-64).
Pb 3.22 – Question 5
Discuss what would happen after the command INSERT INTO CONNEXION VALUES ('B', '12'); is executed.

Solutions to Selected Problems

Solution to Problem 3.2 (Create and use a simple table in SQL)

This problem is supposed to be a straightforward application of what we studied in class. Look back at Setting Up Your Work Environment if you feel like you are stuck before referencing this solution.

Pb 3.2 – Solution to Q. 1

We simply log-in as indicated in the “Logging-in as testuser” section. Then we enter:

CREATE DATABASE HW_Address;
USE HW_Address;

This creates the tables asked for in the problem.

Pb 3.2 – Solution to Q. 2

Ommiting the Extra column, we have:

MariaDB [HW_Address]>     DESC ADDRESS;
+------------+-------------+------+-----+---------+
| Field      | Type        | Null | Key | Default |
+------------+-------------+------+-----+---------+
| StreetName | varchar(15) | NO   | PRI | NULL    |
| Number     | int(11)     | NO   | PRI | NULL    |
| Habitants  | int(11)     | YES  |     | NULL    |
+------------+-------------+------+-----+---------+
Pb 3.2 – Solution to Q. 3

We add the foreign key, still omitting the Extra column:

MariaDB [HW_Address]> DESC ADDRESS;
+------------+-------------+------+-----+---------+
| Field      | Type        | Null | Key | Default |
+------------+-------------+------+-----+---------+
| StreetName | varchar(15) | NO   | PRI | NULL    |
| Number     | int(11)     | NO   | PRI | NULL    |
| Habitants  | int(11)     | YES  | MUL | NULL    |
+------------+-------------+------+-----+---------+

The only difference is the MUL value, which is a bit surprising: quoting https://dev.mysql.com/doc/refman/8.0/en/show-columns.html,

If Key is MUL, then the column is the first column of a nonunique index in which multiple occurrences of a given value are permitted within the column.

In other words, this does not carry any information about the fact that ADDRESS.Habitants is now a foreign key referencing NAME.ID. A way of displaying information about that foreign key is using SHOW CREATE TABLE:

MariaDB [HW_Address]> SHOW CREATE TABLE ADDRESS;
+---------+----------------------+
| Table   | Create Table  
+---------+----------------------+
| ADDRESS | CREATE TABLE `ADDRESS` (
`StreetName` varchar(15) NOT NULL,
`Number` int(11) NOT NULL,
`Habitants` int(11) DEFAULT NULL,
PRIMARY KEY (`StreetName`,`Number`),
KEY `Habitants` (`Habitants`),
CONSTRAINT `ADDRESS_ibfk_1` FOREIGN KEY (`Habitants`) REFERENCES `NAME` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+---------+----------------------+
1 row in set (0.01 sec)
Pb 3.2 – Solution to Q. 4

NAME(FName, LName, ID (PK)) ADDRESS(StreetName (PK), Number (PK), Habitants (FK referencing NAME.ID))

Pb 3.2 – Solution to Q. 5

To display the information back, we can use

SELECT * FROM NAME;

We should notice that the ID attribute values lost their leading zeros.

Pb 3.2 – Solution to Q. 6

This syntax is better for “bulk insertion” since it allows for us to write fewer commands and to focus on the data being inserted. However, if an error occurs, then nothing gets inserted.

Pb 3.2 – Solution to Q. 7

SELECT ID FROM NAME WHERE FName = 'Samantha';

Pb 3.2 – Solution to Q. 8

This is a command that violates the entity integrity constraint:

INSERT INTO NAME VALUES ('Maria', 'Kashi', NULL);

The error message that it returns is:

ERROR 1048 (23000): Column 'ID' cannot be null

Another way of violating the entity integrity constraint is:

INSERT INTO NAME VALUES ('Maria', 'Kashi', 80);

The error message that it returns is:

ERROR 1062 (23000): Duplicate entry '80' for key 'PRIMARY'
Pb 3.2 – Solution to Q. 9

This is an UPDATE statement that violates the entity integrity constraint:

UPDATE ADDRESS SET Habitants = 340 WHERE Number = 120;

The error message that it returns is:

ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_Address`.`ADDRESS`, CONSTRAINT `ADDRESS_ibfk_1` FOREIGN KEY (`Habitants`) REFERENCES `NAME` (`ID`))
Pb 3.2 – Solution to Q. 10

Here is the query that violates another type of constraint:

INSERT INTO NAME VALUE ('Hi');

The error message that it returns is:

ERROR 1136 (21S01): Column count does not match value count at row 1

The query statement violates the implicit constraint by trying to insert a row with fewer values than there are attributes in the table.

Another example of a statement that violates another type of constraint is:

INSERT INTO ADDRESS VALUES ('Maria', 'Random', 98);

This is a violation of an explicit constraint, which is that the value must match the domain (datatype) of the attribute where it is inserted. However, MySQL and MariaDB do not return an error, they simply replace 'Random' with 0.


Solution to Problem 3.3 (Duplicate rows in SQL)

Here is how we created our table:

CREATE SCHEMA HW_REPETITION;
USE HW_REPETITION;

CREATE TABLE EXAMPLE(
    X VARCHAR(15),
    Y INT
);
Pb 3.3 – Solution to Q. 1

The command to add a tuple to our table is:

INSERT INTO EXAMPLE VALUES('Train', 4);

If we execute this command twice, then SQL is OK with it, and inserts the same tuple twice:

SELECT * FROM EXAMPLE;

Displays:

+-------+------+
| X     | Y    |
+-------+------+
| Train |    4 |
| Train |    4 |
+-------+------+

This is an illustration of the fact that the data in a table in SQL is not a set, as opposed to a state in a relation in the relational model.

Pb 3.3 – Solution to Q. 2

The command:

ALTER TABLE EXAMPLE ADD PRIMARY KEY (X);

Should return:

ERROR 1062 (23000): Duplicate entry 'Train' for key 'PRIMARY'

We tried to declare that X was a primary key, but SQL disagreed, since two rows have the same value for that attribute.

Pb 3.3 – Solution to Q. 3

Once the table is empty, X now qualifies as a candidate key, and can now be made a primary key. SQL stops complaining and lets us assign it as a primary key.

Pb 3.3 – Solution to Q. 4

After trying this insertion statement twice:

INSERT INTO EXAMPLE VALUES('Train', 4);

SQL refuses to insert the tuple after the second attempt:

ERROR 1062 (23000): Duplicate entry 'Train' for key 'PRIMARY'

Notice that this is exactly the same error message as before, when we tried to add the primary key while we had a duplicate row of tuples!


Solution to Problem 3.4 (Constraints on foreign keys)
  1. Removing the PRIMARY KEY constraint, SQL throws the following error message:

    ERROR 1005 (HY000): Can't create table `HW_FK_test`.`SOURCE` (errno: 150 "Foreign key constraint is incorrectly formed")
  2. Replacing PRIMARY KEY with UNIQUE does not generate any error messages.

  3. Replacing one of the VARCHAR(25) with CHAR(25) does not generate any error messages.

  4. Replacing VARCHAR(25) with INT results in this error message:

    ERROR 1005 (HY000): Can't create table `HW_FK_test`.`SOURCE` (errno: 150 "Foreign key constraint is incorrectly formed")
  5. Replacing one of the VARCHAR(25) with VARCHAR(15) does not generate any error messages.

  6. The remarks become:

    • The datatype of the foreign key has to be “compatible” with the datatype of the attribute to which we are referring.
    • The target of the foreign key must be the primary key or have the UNIQUE constraint.

Solution to Problem 3.5 (Revisiting the PROF table)
Pb 3.5– Solution to Q. 1

Ignoring the LECTURE relation, we have:

PROF(Login (PK), Name, Department (FK to DEPARTMENT.Code)) DEPARTMENT(Code (PK), Name, Head (FK to PROF.Login)) LECTURE (Code (PK), Year (PK), Name, Instructor (FK to PROF.Login) STUDENT(Login (PK), Name, Registered, Major (FK to DEPARTMENT.Code)) GRADE (Login (PK, FK to STUDENT.Login), Grade (PK), LectureCode (FK to LECTURE.Code), LectureYear (FK to LECTURE.Year))  

Pb 3.5– Solution to Q. 2

The code is straightforward:

CREATE TABLE HW_Lecture (
  NAME VARCHAR(25),
  Instructor VARCHAR(25),
  Year YEAR (4),
  Code CHAR(5),
  PRIMARY KEY (Year, Code),
  FOREIGN KEY (Instructor) REFERENCES PROF (LOGIN)
);

INSERT INTO HW_Lecture
VALUES (
  'Intro to CS',
  'caubert',
  2017,
  '1304'),
(
  'Intro
    to Algebra',
  'perdos',
  2017,
  '1405'),
(
  'Intro to
    Cyber',
  'aturing',
  2017,
  '1234');
HW_ProfExampleRevisitedRevisited.sql

However, this representation can not handle the following situations:

We come back to those short-coming in the “Reverse-Engineering” section, using more abstract tools (such as Entity Diagrams) that have not been introduced yet.

Pb 3.5– Solution to Q. 3

The statements are immediate:

DESCRIBE GRADE;

SELECT *
FROM GRADE;
HW_ProfExampleRevisitedRevisited.sql

What may be surprising is that the values for LectureCode and LectureYear are set to NULL in all the tuples.

Pb 3.5– Solution to Q. 4

We use UPDATE statements:

UPDATE
  GRADE
SET LectureCode = '1304',
  LectureYear = 2017
WHERE LOGIN = 'jrakesh'
  AND Grade = '2.85';

UPDATE
  GRADE
SET LectureCode = '1405',
  LectureYear = 2017
WHERE LOGIN = 'svlatka'
  OR (LOGIN = 'jrakesh'
    AND Grade = '3.85');

UPDATE
  GRADE
SET LectureCode = '1234',
  LectureYear = 2017
WHERE LOGIN = 'aalyx'
  OR LOGIN = 'cjoella';
HW_ProfExampleRevisitedRevisited.sql
Pb 3.5– Solution to Q. 5
We refer back to the solution to Q. 1.
Pb 3.5– Solution to Q. 6

We use SELECT statements:

SELECT LOGIN,
  Grade
FROM GRADE
WHERE Lecturecode = '1304'
  AND LectureYear = '2017';

SELECT DISTINCT Instructor
FROM HW_Lecture
WHERE Year = 2017;

SELECT Name,
  Grade
FROM STUDENT,
  GRADE
WHERE GRADE.LectureCode = 1405
  AND STUDENT.Login = GRADE.Login;

SELECT Year
FROM HW_Lecture
WHERE Code = '1234';

SELECT Name
FROM HW_Lecture
WHERE Year IN (
    SELECT Year
    FROM HW_Lecture
    WHERE CODE = '1234');

SELECT B.name
FROM STUDENT AS A,
  STUDENT AS B
WHERE A.Name = 'Ava Alyx'
  AND A.Registered > B.Registered;

SELECT COUNT(DISTINCT PROF.Name) AS 'Head Teaching This Year'
FROM HW_Lecture,
  DEPARTMENT,
  PROF
WHERE Year = 2017
  AND Instructor = Head
  AND Head = PROF.Login;
HW_ProfExampleRevisitedRevisited.sql
Solution to Problem 3.6 (TRAIN table and more advanced SQL coding)

The code below includes the answers to all of the questions for this problem:

-- Question 1:
CREATE TABLE TRAIN (
  Id VARCHAR(30) PRIMARY KEY, -- This line was changed.
  Model VARCHAR(30),
  ConstructionYear YEAR (4)
);

-- Question 2 :
CREATE TABLE CONDUCTOR (
  Id VARCHAR(20),
  NAME VARCHAR(20),
  ExperienceLevel VARCHAR(20)
);

ALTER TABLE CONDUCTOR
  ADD PRIMARY KEY (Id);

-- Question 3
CREATE TABLE ASSIGNED_TO (
  TrainId VARCHAR(20),
  ConductorId VARCHAR(20),
  Day DATE,
  PRIMARY KEY (TrainId, ConductorId),
  FOREIGN KEY (TrainId) REFERENCES TRAIN (Id), -- This line was changed
  FOREIGN KEY (ConductorId) REFERENCES CONDUCTOR (Id) -- This line was changed
);

-- Question 4:
/* 
 We insert more than one tuple, to make the SELECT statements that follow easier
 to test and debug.
 */
INSERT INTO TRAIN
VALUES (
  'K-13',
  'SurfLiner',
  2019),
(
  'K-12',
  'Regina',
  2015);

INSERT INTO CONDUCTOR
VALUES (
  'GP1029',
  'Bill',
  'Junior'),
(
  'GP1030',
  'Sandrine',
  'Junior');

INSERT INTO ASSIGNED_TO
VALUES (
  'K-13',
  'GP1029',
  DATE '2015/12/14'),
(
  'K-12',
  'GP1030',
  '20120909');

-- Question 5:
UPDATE
  CONDUCTOR
SET ExperienceLevel = 'Senior'
WHERE Id = 'GP1029';

-- Question 6:
-- 1.
SELECT Id
FROM TRAIN;

-- 2.
SELECT Name
FROM CONDUCTOR
WHERE ExperienceLevel = 'Senior';

-- 3.
SELECT ConstructionYear
FROM TRAIN
WHERE Model = 'SurfLiner'
  OR Model = 'Regina';

-- 4.
SELECT ConductorId
FROM ASSIGNED_TO
WHERE TrainId = 'K-13'
  AND Day = '2015/12/14';

-- 5.
SELECT Model
FROM TRAIN,
  ASSIGNED_TO
WHERE ConductorID = 'GP1029'
  AND TrainId = TRAIN.ID;
HW_Train.sql
Solution to Problem 3.7 (Read, correct, and write SQL statements for the COFFEE database)

Solution to Question 1:

COFFEE (Ref (PK), Origin, TypeOfRoast, PricePerPound) CUSTOMER (CardNo (PK), Name, Email, FavCoffee (FK to COFFEE.Ref)) SUPPLY (Provider (PK, FK to PROVIDEV.ID), Coffee (PK, FK to COFEE.Ref)) PROVIDER (Name (PK), Email)  

The answers to the rest of the questions are in the following code:

/* code/sql/HW_DBCoffee.sql */
-- Question 2:
START TRANSACTION;

INSERT INTO CUSTOMER
VALUES (
  005,
  'Bob Hill',
  NULL,
  001);

INSERT INTO COFFEE
VALUES (
  002,
  "Peru",
  "Decaf",
  3.00);

-- The following statement raises an error.
--		   INSERT INTO PROVIDER
--		     VALUES (NULL, "contact@localcof.com");
--		   ERROR 1048 (23000) at line 68: Column
--    'Name'
--	   cannot
--	      be
--		   null
INSERT INTO SUPPLY
VALUES (
  "Johns & Co.",
  121);

-- The following statement raises an error.
--		   -INSERT INTO SUPPLY
--		      VALUES ("Coffee Unl.", 311, 221);
--		    ERROR 1136 (21S01): Column count
-- doesn't
--       match
--	     value
--		   count at row 1
--		    Rest the changes:
ROLLBACK;

-- Question 3:
START TRANSACTION;

UPDATE
  CUSTOMER
SET FavCoffee = 001
WHERE CardNo = 001;

-- Rows matched: 1  Changed: 1  Warnings: 0
SELECT *
FROM CUSTOMER;

ROLLBACK;

START TRANSACTION;

UPDATE
  COFFEE
SET TypeOfRoast = 'Decaf'
WHERE Origin = 'Brazil';

-- Rows matched: 2  Changed: 2  Warnings: 0
SELECT *
FROM COFFEE;

ROLLBACK;

START TRANSACTION;

UPDATE
  PROVIDER
SET Name = 'Coffee Unlimited'
WHERE Name = 'Coffee Unl.';

-- Rows matched: 1  Changed: 1  Warnings: 0
SELECT *
FROM PROVIDER;

SELECT *
FROM SUPPLY;

ROLLBACK;

START TRANSACTION;

UPDATE
  COFFEE
SET PricePerPound = 10.00
WHERE PricePerPound > 10.00;

-- Rows matched: 1  Changed: 1  Warnings: 0
SELECT *
FROM COFFEE;

ROLLBACK;

-- Question 4:
START TRANSACTION;

DELETE FROM CUSTOMER
WHERE Name LIKE '%S%';

-- Query OK, 2 rows affected (0.01 sec)
SELECT *
FROM CUSTOMER;

ROLLBACK;

START TRANSACTION;

DELETE FROM COFFEE
WHERE Ref = 001;

-- Query OK, 1 row affected (0.00 sec)
SELECT *
FROM COFFEE;

SELECT *
FROM SUPPLY;

ROLLBACK;

START TRANSACTION;

DELETE FROM SUPPLY
WHERE Provider = 'Coffee Unl.'
  AND Coffee = '001';

-- Query OK, 1 row affected (0.00 sec)
SELECT *
FROM SUPPLY;

ROLLBACK;

START TRANSACTION;

DELETE FROM PROVIDER
WHERE Name = 'Johns & Co.';

-- Query OK, 1 row affected (0.00 sec)
SELECT *
FROM PROVIDER;

SELECT *
FROM SUPPLY;

ROLLBACK;

-- Question 5:
-- 1.
SELECT Origin
FROM COFFEE
WHERE TypeOfRoast = 'Dark';

-- 2.
SELECT FavCoffee
FROM CUSTOMER
WHERE Name LIKE 'Bob%';

-- 3.
SELECT Name
FROM PROVIDER
WHERE Email IS NULL;

-- 4.
SELECT COUNT(*)
FROM SUPPLY
WHERE Provider = 'Johns & Co.';

-- 5.
SELECT Provider
FROM COFFEE,
  SUPPLY
WHERE TypeOfRoast = 'Dark'
  AND Coffee = Ref;
HW_DBCoffee.sql
Solution to Problem 3.8 (Write select queries for the DEPARTMENT table)
SELECT EMPLOYEE.Name
FROM EMPLOYEE,
  DEPARTMENT
WHERE DEPARTMENT.Name = "Storage"
  AND EMPLOYEE.Department = DEPARTMENT.ID;
HW_Department.sql
SELECT Name
FROM EMPLOYEE
WHERE Hired <= ALL (
    SELECT Hired
    FROM EMPLOYEE
    WHERE Hired IS NOT NULL);
HW_Department.sql
SELECT EMPLOYEE.Name
FROM EMPLOYEE,
  DEPARTMENT
WHERE Hired <= ALL (
    SELECT Hired
    FROM EMPLOYEE
    WHERE Hired IS NOT NULL
      AND DEPARTMENT.Name = "Storage"
      AND EMPLOYEE.Department = DEPARTMENT.ID)
  AND DEPARTMENT.Name = "Storage"
  AND EMPLOYEE.Department = DEPARTMENT.ID;
HW_Department.sql
Solution to Problem 3.11 (Write select queries for a variation of the COMPUTER table)
SELECT Model
FROM COMPUTER
WHERE ID = 'A';

SELECT TYPE
FROM PERIPHERAL
WHERE ID = '14';

SELECT Model
FROM PERIPHERAL
WHERE TYPE = 'printer';

SELECT Model
FROM PERIPHERAL
WHERE Model LIKE 'IBM%';

SELECT Model
FROM PERIPHERAL,
  CONNEXION
WHERE Computer = 'A'
  AND Peripheral = PERIPHERAL.ID;

SELECT COUNT(Computer)
FROM CONNEXION,
  COMPUTER
WHERE Model = 'Apple IIc Plus'
  AND Computer = COMPUTER.ID;
HW_ComputerVariation.sql
Solution to Problem 3.10 (Write select queries for the SocialMedia schema)
/* code/sql/HW_SocialMedia.sql */
-- … the title of all the videos ("My first video!", "My
--		   second video!", "My vacations").
SELECT TITLE
FROM VIDEO;

-- … the release date of the video whose title is "My first
--		   video!" ("2020-02-02").
SELECT Released
FROM VIDEO
WHERE Title = "My first video!";

-- … the ID of the account(s) where the "Name" attribute
--		   was not given ("2").
SELECT ID
FROM ACCOUNT
WHERE Name IS NULL;

-- … the ID of the videos whose title contains the word
--		   "video" ("10", "20").
SELECT ID
FROM VIDEO
WHERE TITLE LIKE "%video%";

-- or
SELECT ID
FROM VIDEO
WHERE Title REGEXP 'video';

-- … the number of thumbs up for the video with title "My
--		   vacations" ("1").
SELECT COUNT(*)
FROM THUMBS_UP,
  VIDEO
WHERE VIDEO.Title = "My vacations"
  AND VIDEO.ID = THUMBS_UP.Video;

-- … the title of the oldest video ("My first video!").
SELECT Title
FROM VIDEO
WHERE Released <= ALL (
    SELECT Released
    FROM VIDEO);

-- or
SELECT Title
FROM VIDEO
WHERE Released = (
    SELECT Min(Released)
    FROM VIDEO);

-- or even
SELECT Title
FROM VIDEO
ORDER BY Released ASC
LIMIT 1;

-- … the names of the accounts who gave a thumbs up to the
--		   video with id 30 ("Bob Ross").
SELECT Name
FROM ACCOUNT,
  THUMBS_UP
WHERE THUMBS_UP.Video = 30
  AND THUMBS_UP.Account = ACCOUNT.ID;

-- … the ID of the account with the greatest number of
--		   subscribers ("2").
SELECT Subscribed
FROM SUBSCRIBE
GROUP BY Subscribed
ORDER BY COUNT(Subscriber) DESC
LIMIT 1;
HW_SocialMedia.sql
Solution to Problem 3.12 (Improving a role-playing game with a relational model)

The following solves all the issues with your friend’s code design. As quests only rarely provide a special item, we added a relation to avoid having a Special-item in the QUEST table since that would be NULL too often.

CLASS(Name (PK), Bonus, Element) CHARACTER(Name (PK), Class (FK to CLASS.Name), XP, LVL) WEAPON(Name (PK), Bonus, Possessed-By (FK to CHARACTER.Name)) QUEST(Name (PK), XP) COMPLETED-BY(Character (PK, FK to CHARACTER.Name), Quest (PK, FK to QUEST.Name)) SPECIAL-ITEM(Name (P), Quest (FK to QUEST.Name))


Solution to Problem 3.13 (A simple database for books)
Pb 3.13 – Solution to Q. 1

Here are possible ways of getting the required information:

  1. The Title of all the books:

    SELECT Title FROM BOOK;
  2. The distinct Name of the publishers.

    SELECT DISTINCT Name FROM PUBLISHER;
  3. The Titles and Published dates of the books published since January 31, 2012.

    SELECT Title, Published FROM BOOK
    WHERE Published > DATE'20120131';
  4. The first and last names of the authors published by "Gallimard" (from any city).

    SELECT FName, LName FROM AUTHOR, BOOK
    WHERE PublisherName = "Gallimard"
        AND Author = ID;
  5. The first and last names of the authors who were not published by an editor in "New-York".

    SELECT FName, LName FROM AUTHOR, BOOK
    WHERE NOT PublisherCity= "New-York"
        AND Author = ID;
  6. The ID of the authors who published a book whose name starts with "Where".

    SELECT Author FROM BOOK
    WHERE Title LIKE 'Where%';
  7. The total number of pages in the database.

    SELECT SUM(Pages) FROM BOOK;
  8. The number of pages in the longest book written by the author whose last name is "Wolve".

    SELECT MAX(PAGES) FROM BOOK, AUTHOR
    WHERE LName = "Wolve"
        AND Author = ID;            
  9. The title of the books published in the 19th century.

    SELECT Title FROM BOOK
    WHERE Published >= DATE'18010101' 
        AND Published <= DATE'19001231';
Pb 3.13 – Solution to Q. 2
We can use the following command:
UPDATE BOOK SET Title = "BANNED"
WHERE Author = 3;

The pair (title, publication date) is the primary key in the BOOK table, so if the author whose ID is 3 has published more than one book at a particular date, then our update will be rejected, as applying it would result in violating the entity integrity constraint.

Pb 3.13 – Solution to Q. 3
To delete the required rows, we can use:
DELETE FROM BOOK WHERE Author = 3;
DELETE FROM AUTHOR WHERE ID = 3;

Note that trying to delete the rows in the AUTHOR table before deleting the rows in the BOOK table could cause a referential integrity violation, since the BOOK table has a foreign key assigned to the AUTHOR table’s Id field.

Pb 3.13 – Solution to Q. 4
We could design that table as follows:
CREATE TABLE AWARD(
    Name VARCHAR(30),
    Year DATE,
    BookTitle VARCHAR(30),
    BookPubDate DATE,
    FOREIGN KEY (BookTitle, BookPubDate)
        REFERENCES BOOK(Title, Published),
    PRIMARY KEY (Name, Year)
);

Note that there is no need to store the name of the author in this relation: this information can be recovered by looking in the BOOK table for the name of the author of the awarded book.

Pb 3.13 – Solution to Q. 5
We obtain something as follows:

AUTHOR (FName, LName, ID (PK)) AWARD (Name (PK), Year (PK), BookTitle (FK to BOOK.Title), BookDate (FK to BOOK.Date)) PUBLISHER (Name (PK), City (PK))) BOOK (Title (PK), Pages, Published (PK), PublisherName (FK to PUBLISHER.Name), PublisherCity (FK to PUBLISHER.City), Author (FK to AUTHOR.ID)};  

Note that having two attributes as the primary key makes the referencing of foreign keys more cumbersome.

Pb 3.13 – Solution to Q. 6

Two of the flaws that come to mind are:

  1. The choice of the primary key for the BOOK relation: two books with the same title cannot be published on the same day, which is a serious limitation. Using a primary key like ISBN would be much more appropriate.
  2. This design makes it impossibile to deal with books written by multiple authors or published by multiple publishers. We could address this by having two separate tables, IS_THE_AUTHOR_OF and PUBLISHED_BY, that “maps” the book’s ISBN with author’s or editor’s primary key.

Solution to Problem 3.14 (A database for website certificates)

The solution can be read from the following code:

/* code/sql/HW_Certificate.sql */
DROP SCHEMA IF EXISTS HW_Certificate;

CREATE SCHEMA HW_Certificate;

USE HW_Certificate;


/*
SN = Serial Number
CN = Common Name
CA = Certificate Authority
 */
CREATE TABLE ORGANIZATION (
  SN VARCHAR(30) PRIMARY KEY,
  CN VARCHAR(30)
);

CREATE TABLE CA (
  SN VARCHAR(30) PRIMARY KEY,
  CN VARCHAR(30),
  Trusted BOOL
);

CREATE TABLE CERTIFICATE (
  SN VARCHAR(30) PRIMARY KEY,
  CN VARCHAR(30) NOT NULL,
  Org VARCHAR(30) NOT NULL,
  Issuer VARCHAR(30),
  Valid_Since DATE,
  Valid_Until DATE,
  FOREIGN KEY (Org) REFERENCES ORGANIZATION (SN) ON DELETE CASCADE,
  FOREIGN KEY (Issuer) REFERENCES CA (SN)
);

INSERT INTO ORGANIZATION
VALUES (
  '01',
  'Wikimedia Foundation'),
(
  '02',
  'Free
    Software Foundation');

INSERT INTO CA
VALUES (
  'A',
  "Let's Encrypt",
  TRUE),
(
  'B',
  'Shady Corp.',
  FALSE),
(
  'C',
  'NewComer Ltd.',
  NULL);

INSERT INTO CERTIFICATE
VALUES (
  'a',
  '*.wikimedia.org',
  '01',
  'A',
  20180101,
  20200101),
(
  'b',
  '*.fsf.org',
  '02',
  'A',
  20180101,
  20191010),
(
  'c',
  '*.shadytest.org',
  '02',
  'B',
  20190101,
  20200101),
(
  'd',
  '*.wikipedia.org',
  '01',
  'C',
  20200101,
  20220101);

-- CN of all certificates.
SELECT CN
FROM CERTIFICATE;

-- (*.wikimedia.org | *.fsf.org | *.shadytest.org |
--		   *.wikipedia.org)
--		    The SN of the organizations whose CN
--      contains
--		   "Foundation"
SELECT SN
FROM ORGANIZATION
WHERE CN LIKE "%Foundation%";

-- (01 | 02)
--		    The CN and expiration date of all the
--	   certificates
--		that
--		   expired (assuming we are the 6th of
--    December
--	   2019).
SELECT CN,
  Valid_Until
FROM CERTIFICATE
WHERE Valid_Until < DATE '20191206';

-- (*.fsf.org,  2019-10-10)
--		    The CN of the CA that are not trusted.
SELECT CN
FROM CA
WHERE Trusted IS NOT TRUE;

-- (Shady Corp. |  NewComer Ltd.)
--		    The CN of the certificates that are
--   signed
--      by
--	a
--	   CA
--		that
--		   is not trusted.
SELECT CERTIFICATE.CN
FROM CERTIFICATE,
  CA
WHERE Trusted IS NOT TRUE
  AND CA.SN = CERTIFICATE.Issuer;

-- (Shady Corp. | NewComer Ltd.)
--		    The number of certificates signed by
-- the
--    CA
--	  whose
--	     CN
--		is
--		   "Let's encrypt".
SELECT COUNT(CERTIFICATE.SN) AS "Number of certificates signed
    by Let's encrypt"
FROM CERTIFICATE,
  CA
WHERE CERTIFICATE.Issuer = CA.SN
  AND CA.CN = "Let's encrypt";

-- (2)
--		    A table listing the CN of the
--    organizations
--	  along
--	       with
--		   the CN of their certificates.
SELECT ORGANIZATION.CN AS Organization,
  CERTIFICATE.CN AS Certificate
FROM ORGANIZATION,
  CERTIFICATE
WHERE CERTIFICATE.Org = ORGANIZATION.SN;

-- ( Wikimedia Foundation,  *.wikimedia.org | Free Software
--		   Foundation, *.fsf.org | Free Software
--	Foundation
--	  ,
--		   *.shadytest.org | Wikimedia Foundation ,
--		*.wikipedia.org
--		  )
/* 
DELETE FROM CA WHERE SN = 'A';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`HW_Certificate`.`CERTIFICATE`, CONSTRAINT `CERTIFICATE_ibfk_2` FOREIGN KEY (`Issuer`) REFERENCES `CA` (`SN`))

=> Rejected, because an entry in CERTIFICATE references this tuple (referential integrity constraint).

UPDATE ORGANIZATION SET CN = "FSF" WHERE SN = '02';
Query OK, 1 row affected (0.008 sec)
Rows matched: 1  Changed: 1  Warnings: 0

=> Ok, change 
('02', 'Free Software Foundation');
into
('02', 'FSF');
in ORGANIZATION

MariaDB [HW_Certificate]> UPDATE ORGANIZATION SET SN = "01" WHERE SN = '02';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`HW_Certificate`.`CERTIFICATE`, CONSTRAINT `CERTIFICATE_ibfk_1` FOREIGN KEY (`Org`) REFERENCES `ORGANIZATION` (`SN`) ON DELETE CASCADE)

=> Rejected, because an entry in CERTIFICATE references this tuple (referential integrity constraint). 
This query would have been rejected even if this tuple was not referenced, since it would have violated the entity integrity constraint.

DELETE FROM ORGANIZATION;

=> Deletes all the content of organization and of certificate.
 */
HW_Certificate.sql
Solution to Problem 3.15 (A simple database for published pieces of work)
Pb 3.15 – Solution to Q. 1

The relational model for this code is:

WORK(Title (PK), Author (FK to AUTHOR.Name)) AUTHOR(Name (PK), Email) BOOK(ISBN (PK), Work (FK to WORK.Title), Published, Price) EBOOK(ISBN (PK), Work (FK to WORK.Title), Published, Price)  

Pb 3.15 – Solution to Q. 2

The solution to the next questions can be read from the following code:

/* code/sql/HW_Work.sql */
/*
 Determine if the following insertion statements would violate the the Entity integrity constraint,
 the Referential integrity constraint, if there would be some Other kind of error, or if it would
 result in uccessful insertion.
 */
START TRANSACTION;

-- We don't want to perform the actual insertions.
INSERT INTO EBOOK
VALUES (
  0,
  NULL,
  20180101,
  0);


/*
 Query OK, 1 row affected (0.003 sec)
 So, "Successful insertion".
 */
-- The following statement raises an error.
-- INSERT INTO AUTHOR VALUES ("Mary B.", "mb@fai.fr", NULL);
/*
 ERROR 1136 (21S01): Column count doesn't match value count at row 1
 So, "Other kind of error".
 */
-- The following statement raises an error.
-- INSERT INTO WORK VALUES ("My Life", "Claude A.");
/*
 ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
 (`HW_EXAM_1`.`WORK`, CONSTRAINT `WORK_ibfk_1` FOREIGN KEY (`Author`) REFERENCES `AUTHOR` (`Name`)
 ON DELETE CASCADE ON UPDATE CASCADE)
 So, "Referential integrity constraint"
 */
INSERT INTO BOOK
VALUES (
  00000000,
  NULL,
  DATE '20001225',
  90.9);


/*
 Query OK, 1 row affected (0.000 sec)
 So, "Successful insertion".
 */
-- The following statement raises an error.
--  INSERT INTO AUTHOR VALUES ("Virginia W.", "alt@isp.net");
/*
 ERROR 1062 (23000): Duplicate entry 'Virginia W.' for key 'PRIMARY'
 So, "Entity integrity constraint".
 */
ROLLBACK;

-- We go back to the previous state.
/*
 List the rows (i.e., A.2, W.1, etc.) modified by the following statements
 (be careful about the conditions on foreign keys!):
 */
START TRANSACTION;

-- We don't want to perform the following operations.
UPDATE
  AUTHOR
SET Email = 'Deprecated'
WHERE Email LIKE '%isp.net';

/*
 Query OK, 2 rows affected (0.010 sec)
 Rows matched: 2  Changed: 2  Warnings: 0
 This changed A.1 and A.2
 */
UPDATE
  WORK
SET Title = "How to eat"
WHERE Title = "What to eat";


/*
 Rows matched: 1  Changed: 1  Warnings: 0
 SQL returns only the number of row changed in the WORK table,
 but other rows have been changed as well.
 This changed W.1, B.1, E.1.
 */
-- The following statement raises an error.
-- DELETE FROM WORK;
/*
 ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
 (`HW_EXAM_1`.`BOOK`, CONSTRAINT `BOOK_ibfk_1` FOREIGN KEY (`Work`) REFERENCES `WORK` (`Title`) ON UPDATE CASCADE)
 Does not change any row.
 */
-- The following statement raises an error.
--  DELETE FROM AUTHOR WHERE Name = "Virginia W.";
/*
 ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
 (`HW_EXAM_1`.`BOOK`, CONSTRAINT `BOOK_ibfk_1` FOREIGN KEY (`Work`) REFERENCES `WORK` (`Title`) ON UPDATE CASCADE)
 Does not change any row.
 */
ROLLBACK;

-- We go back to the previous state.
-- You can now assume that there is more data than
-- what we inserted, if that helps you. Write a
-- command that selects …

--  We insert some dummy values for this
--  next part.
INSERT INTO WORK
VALUES (
  "My Life",
  "Paul B."),
(
  "What to eat, 2",
  "Virginia W.");

INSERT INTO BOOK
VALUES (
  15355627,
  "My Life",
  DATE '20180219',
  15.00),
(
  12912912,
  "What to eat, 2",
  DATE '20200101',
  13);

INSERT INTO EBOOK
VALUES (
  15150628,
  "My Life",
  DATE '20190215',
  10.89),
(
  42912912,
  "What to eat, 2",
  DATE '20200115',
  12);

-- … the price of all the ebooks.
SELECT Price
FROM EBOOK;

-- … the (distinct) names of the authors who have authored
-- a piece of work.
SELECT DISTINCT Author
FROM WORK;

-- … the name of the authors using fai.fr for their email.
SELECT Name
FROM AUTHOR
WHERE Email LIKE '%fai.fr';

-- … the price of the ebooks published after 2018.
SELECT Price
FROM BOOK
WHERE Published >= 20180101;


/*
 Note that
 SELECT Price FROM BOOK WHERE Published > 2018;
 would return all the prices, along with a warning:
 Incorrect datetime value: '2018'
 */
-- … the price of the most expensive book.
SELECT MAX(Price)
FROM BOOK;

-- … the number of pieces of work written by the author
-- whose name is “Virginia W.”.
SELECT COUNT(*)
FROM WORK
WHERE WORK.Author = "Virginia W.";

-- … the email of the author who wrote the piece of work
-- called “My Life”.
SELECT Email
FROM AUTHOR,
  WORK
WHERE WORK.Title = "My Life"
  AND WORK.Author = AUTHOR.Name;

-- the isbn(s) of the book containing a work written by the
-- author whose email is "vw@isp.net".
SELECT ISBN
FROM BOOK,
  WORK,
  AUTHOR
WHERE AUTHOR.Email = "vw@isp.net"
  AND WORK.Author = AUTHOR.Name
  AND BOOK.Work = WORK.Title;


/*
 Write a command that updates the title of all the pieces of work written by the author whose name is “Virginia W. to”BANNED".
 Is there any reason for this command to be rejected by the system? If yes, explain which one.
 */
-- The following statement raises an error.
/*
UPDATE
 WORK
SET
 Title = "BANNED"
WHERE
 Author = "Virginia W.";
 */
/*
 Gives an error, since "Title" is the primary key in the WORK table, and Virginia W. has authored two pieces of work or more,
 they are both given the title "BANNED", which violates the unicity of value in primary keys.
 */
-- Write one or multiple commands that would delete the work
-- whose title is “My Life”, as well as
--    all
--       of
--	   the
--		 books
-- and ebooks versions of it.
--  The following statement raises an
-- error.
-- DELETE FROM WORK
-- WHERE Title = "My Life";
/*
 Fails
 ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
 (`HW_EXAM_1`.`BOOK`, CONSTRAINT `BOOK_ibfk_1` FOREIGN KEY (`Work`) REFERENCES `WORK` (`Title`) ON UPDATE CASCADE)
 */
-- We have to first delete the corresponding publications:
DELETE FROM BOOK
WHERE WORK = "My Life";

DELETE FROM EBOOK
WHERE WORK = "My Life";

-- And then we can delete the work:
DELETE FROM WORK
WHERE Title = "My Life";


/*
 And, no, we cannot delete "simply" from multiple tables in one command.
 Some workaround exists, cf. https://stackoverflow.com/q/1233451/ .
 */
HW_Work.sql
Pb 3.15 – Solution to Q. 3

Finally, to answer the last question, here is a list of the possible limitations:

  1. Having the name or the title as a primary key (in the AUTHOR and WORK tables) is not a good idea: we cannot have two authors with the same name or two pieces of work with the same title!
  2. If all the attributes in the BOOK and the EBOOK tables are going to be the same, then we should probably have only one table called PUBLICATION with a boolean to indicate whenever the publication is digital or on paper.
  3. Having a mix of ON DELETE CASCADE and ON DELETE RESTRICT is not really justified and makes the tables harder to use. We should have used the same update policy on both tables.

Solution to Problem 3.16 (A simple database for authors of textbooks)
The answers can be found in the following snippet:
/*
 code/sql/HW_TEXTBOOK_AUTHORED_SOL.sql
 */
/*
 EXERCISE 1

 Write a command that updates the email address of 'Gaddis', 'Tony' to "tgaddis@pearson.com"
 */
UPDATE
  AUTHOR
SET Email = "tgaddis@pearson.com"
WHERE LName = 'Gaddis'
  AND FName = 'Tony';


/*
 You can use
 SELECT * FROM AUTHOR;
 to check that the modification took place.
 */
/*
 EXERCISE 2

 Write a command that inserts the textbook of your choice in the
 TEXTBOOK table. No value should be NULL, but you can invent
 the values.
 */
INSERT INTO TEXTBOOK
VALUES (
  'Fundamentals of Database Systems',
  9780133970777,
  165.89);


/*
 You can use
 SELECT * FROM TEXTBOOK;
 to check that the insertion was correctly made.
 */
/*
 EXERCISE 3

 Write a command that makes 'Gaddis', 'Tony' the author of the
 textbook you just added to our database.
 */
INSERT INTO AUTHORED
VALUES (
  9780133970777,
  'Gaddis',
  'Tony');


/*
 You can use
 SELECT * FROM AUTHORED;
 to check that the insertion was correctly made.


 EXERCISE 4

 Write a command that makes "0.01" becomes the
 default value for the Price attribute of the
 TEXTBOOK relation.
 */
ALTER TABLE TEXTBOOK
  ALTER COLUMN Price SET DEFAULT 0.01;


/*
 You can use
 DESCRIBE TEXTBOOK;
 to check that the Price attribute now has a default
 value.


 EXERCISE 5

 Write a command that insert a textbook of
 your choice in the TEXTBOOK table, with the
 price set to the default value.
 */
INSERT INTO TEXTBOOK
VALUES (
  'Proof Theory',
  9780486490731,
  DEFAULT);


/*
 You can use
 SELECT * FROM TEXTBOOK;
 to check that the insertion was correctly made.


 EXERCISE 6

 Write a command that creates a table called EDITOR
 with 3 attributes, "Name", "Address" and "Website".
 The "Name" attribute should be the primary key.
 Then, insert two tuples in the EDITOR table, one
 should have the "Name" attribute set to "Pearson".
 */
CREATE TABLE EDITOR (
  NAME VARCHAR(30) PRIMARY KEY,
  Address VARCHAR(255),
  Website VARCHAR(100)
);

INSERT INTO EDITOR
VALUES (
  'Pearson',
  NULL,
  'http://pearsoned.com/'),
(
  'Dover',
  NULL,
  'https://store.doverpublications.com/');


/*
 You can use
 DESCRIBE EDITOR;
 to check that the table was actually created, and
 SELECT * FROM EDITOR;
 to check that the values were inserted.


 EXERCISE 7

 Write a command that creates a table called PUBLISHED
 with 2 attributes, "Editor", and "Textbook".
 The "Editor" attribute should references the EDITOR
 table, and the "Textbook" attribute should reference
 the TEXTBOOK table.
 */
CREATE TABLE PUBLISHED (
  Editor VARCHAR(30),
  FOREIGN KEY (Editor) REFERENCES EDITOR (NAME),
  Textbook CHAR(13),
  FOREIGN KEY (Textbook) REFERENCES TEXTBOOK (ISBN)
);


/*
 You can use
 DESCRIBE PUBLISHED;
 to check that the table was actually created.

 EXERCISE 8

 Write a command that makes "Pearson" the editor of
 the textbook whose ISBN is 9780133776744.
 */
INSERT INTO PUBLISHED
VALUES (
  "Pearson",
  9780133776744);


/*
 You can use
 SELECT * FROM PUBLISHED;
 to check that the table was actually created.


 EXERCISE 9

 Answer the following short questions. In our model, as it is, …

 Can an author have authored more than one textbook?
 Yes.

 Can a textbook have more than one author?
 Yes.

 Can a textbook without ISBN be inserted in the TEXTBOOK relation?
 No, unless you create a "dummy" (fake) value for it,
 like 0000000000000, but this value can be used only
 once, since ISBN is the primary key.

 Can the price of a textbook be negative?
 Yes. We can actually test it:
 INSERT INTO TEXTBOOK VALUES ("Test", 0000000000000, -1);

 Can two author have the same first and last name?
 No. The query:
 INSERT INTO AUTHOR VALUES ('Smith', 'Bob', NULL), ('Smith', 'Bob', NULL);
 returns
 ERROR 1062 (23000): Duplicate entry 'Smith-Bob' for key 'PRIMARY'

 Can two textbooks have the same title?
 Yes, as long as they have different ISBN. The command
 INSERT INTO TEXTBOOK VALUES ("Test", 0000000000001, NULL), ("Test", 0000000000002, NULL);
 is processed just fine.

 Can two editiors have the same address?
 Yes. The command:
 INSERT INTO EDITOR VALUES ("Test 1", "123 Main St.", NULL), ("Test 2", "123 Main St.", NULL);
 is processed just fine.
HW_TextbookAuthoredSol.sql
Solution to Problem 3.17 (A simple database for capstone projects)
The answers can be found in the following snippet:
/*
code/sql/HW_CapstoneSol.sql
 */
/*

I. Short Questions (6 pts)

Answer the following short questions based on the model implemented above.
You can simply answer "True" or "False", or justify your reasoning (e.g. with code).
 */
-- 1. Can a project uses multiple programming languages?
--		Yes.
--       2. Can a student be the leader of multiple
--	  projects?
--		Yes.
--       3. Can multiple projects have the same code name?
--		Yes.
--       4. Could Claude simply enter NULL for the value
--	  of his pproject's code name, since he's undecided?
--       No.
--       5. Can a project be created without project
--	  leader?
--       No.
--      6. Can we know who is working on a project
--	 without being its leader?
--       No.
/*

II. Relational Model (6 pts.)

Draw the relational model corresponding to this code.
You can hand-draw it and join a scan or a picture, or simply hand me back the sheet where you drew it.
 */
/*

III. Simple Commands (8 pts.)

Below, you are asked to write commands that perform various actions.
Please, leave them uncommented, unless you can't write them correctly, in which case it's ok to leave them commented.
The first question is answered as an example.
 */
-- 0. Write a command that list all the names of the
--	       programming languages.
SELECT Name
FROM PROGRAMMING_LANGUAGE;

-- 1. Write a command that insert a new student in the
--	       STUDENT table.
--		(You should invent the values).
INSERT INTO STUDENT
VALUES (
  "Bob",
  "0987654321234",
  NULL,
  NULL);

-- 2. Write a command that updates the code name of the
--	  project ("Undecided", "9999999999999") to "VR in
--	  ER".
UPDATE
  PROJECT
SET CodeName = "VR in ER"
WHERE CodeName = "Undecided"
  AND Leader = "9999999999999";

-- 3. Write a command that updates the graduation year of the
--	  student whose id is "0987654321098" to 2024, and
--	  the semester to "Fall".
UPDATE
  STUDENT
SET GraduationYear = 2024,
  GraduationSemester = "Fall"
WHERE id = "0987654321098";

-- 4. Write a command that changes the STUDENT table to make
--	  it impossible to enter NULL for the first name of
--	  a student, without changing the primary key.
ALTER TABLE STUDENT MODIFY FName VARCHAR(50) NOT NULL;

-- 5. Write a command that changes the datatype of
--	  GraduationYear to SMALLINT.
ALTER TABLE STUDENT MODIFY GraduationYear SMALLINT;

-- 6. Write a command that adds an attribute "ReleaseDate" to
--	  the PROJECT table.
ALTER TABLE PROJECT
  ADD COLUMN ReleaseDate DATE;

-- 6.bis If you managed to write the previous command
--	 correctly, write a command that sets the release
--	 date of the project ("Brick Break",
--  "0123456789100")
--     to
--	 the 26th of November 2022.
UPDATE
  PROJECT
SET ReleaseDate = DATE "20221126"
WHERE CodeName = "Brick Break"
  AND Leader = "0123456789100";

-- 7. Write a command that makes it impossible for a student
--	  to be the leader in more than one project
--	  (This command should return an error)
--	  ALTER TABLE PROJECT ADD UNIQUE (Leader);
HW_CapstoneSol.sql
Solution to Problem 3.18 (A simple database for vaccines)
The answers can be found in the following snippet:
/* code/sql/HW_VaccineSol.sql */
/*

I. Short Questions (3 pts.)

Answer the following short questions. In our implementation…

1. … can two companies have exactly the same name?

No, as COMPANY.Name is the only attribute in the primary key of COMPANY.

2. … can two companies have the same website?

Yes, nothing prevents it.

3. … can a company not have a website?

Yes, the domain of COMPANY.Website is "VARCHAR(255)", without a constraint preventing it from being "NULL".

4. … can the same vaccine be manufactured by multiple companies?

No, as VACCINE.Manufacturer is an attribute in VACCINE that accepts only one value.

5. … can a vaccine not have a manufacturer?

No, as VACCINE.Manufacturer bears the "NOT NULL" constraint.

6. … can a disease being neither communicable nor not communicable?

Yes, as DISEASE.Communicable is of type "BOOL", it accepts the "NULL" value.

7. … can the same vaccine have different efficacies for different diseases?

Yes, the EFFICACY table has for primary key VaccineName and DiseaseName, which implies that the same vaccine can occur repeatedly as long as it is associated with different diseases.
 */
/*

II. Longer Questions (6 pts.)

Answer the following questions:

1. What does `CHECK (Website LIKE "https://*")` do?

It refrains any value not starting with  "https://" to be inserted as a value for the COMPANY.Website attribute.
Note that in particular it forbids a website from not being secured (that is, http:// is not a valid protocol).

2. Why did we picked the `DECIMAl(5,2)` datatype?

It is the appropriate datatype to represent percentage values represented as ranging from 100.00 to 0.00.
The discussion at https://stackoverflow.com/a/2762376/ also highlights that percent can be represented as decimal(5,4) with a check to insure that the value will range between 1.0000 and 0.0000.

3. What is the benefit / are the benefits of having a separate EFFICACY table over having something like

CREATE TABLE VACCINE(
 Name VARCHAR(50) PRIMARY KEY,
 Manufacturer VARCHAR(50),
 Disease VARCHAR(50),
 Efficacy DECIMAl(5,2),
 FOREIGN KEY (Manufacturer) REFERENCES COMPANY (Name)
);

?

This implementation does not allow to record that the same vaccine can have different efficacies for different diseases.
Stated differently, it forbids to represent vaccines efficient against multiple diseases faitfully.
 */
/*

III. Relational Model (6 pts.)

Draw the relational model corresponding to this code.
You can hand-draw it and join a scan or a picture, or simply hand me back a sheet.
 */
/*

IV. Simple Commands (5 pts.)

Below, you are asked to write commands that perform various actions.
Please, leave them uncommented, unless
 - you can not write them correctly, but want to share your attempt,
 - it is specified that it should return an error.

The first question is answered as an example.
 */
-- 0. Write a command that list the names of
--	all the diseases.
SELECT Name
FROM DISEASE;

-- 1. Write a command that insert "Pfizer" in the
--	 COMPANY table (you can make up the website or look
--    it)
INSERT INTO COMPANY
VALUES (
  "Pfizer",
  "https://www.pfizer.com/");

--  2. Write a command that insert the "Pfizer-BioNTech
--	  COVID-19 Vaccine" in the VACCINE table, and a
--    command
--	  that store the efficacy of that vaccine against
--	  the "Coronavirus disease 2019" disease
--	 ( you can make up the values or look them up).
INSERT INTO VACCINE
VALUES (
  "Pfizer-BioNTech COVID-19 Vaccine",
  "Pfizer");

INSERT INTO EFFICACY
VALUES (
  "Coronavirus disease 2019",
  "Pfizer-BioNTech COVID-19 Vaccine",
  89);

--  3. Write a command that updates the name of the
--	  company "Moderna" to "Moderna, Inc." everywhere.
UPDATE
  COMPANY
SET Name = "Moderna, Inc."
WHERE Name = "Moderna";

--  4. Write a command that lists the name of all the
--	  companies.
SELECT Name
FROM COMPANY;

--  5. Write a command that deletes the "Coronavirus disease
--	  2019" entry from the DISEASE table (if only!).
/*
DELETE FROM DISEASE
WHERE Name = "Coronavirus disease 2019";
 */
--  This command should return an error. Explain it and leave
--       the command commented.
--     The "Coronavirus disease 2019" value in DISEASE.Name
-- is
--    refereed to by two entries in the EFFICACY table.
--     As the foreign key from EFFICACY.DiseaseName to
--    DISEASE.Name does not specify its policy "ON DELETE",
--  its
--    default behavior is to restrict deletion, causing the
--    error.
--       6. Write two commands: one that adds "physiological"
--   to
--	  the possible types of diseases, and one that
--   inserts
--	  a physiological disease in the DISEASE table.
ALTER TABLE DISEASE MODIFY TYPE ENUM ("infectious",
  "deficiency", "hereditary", "physiological");

INSERT INTO DISEASE
VALUES (
  "Asthma",
  FALSE,
  "physiological");

--  7 (difficult). Write a command that return the list of
--		     all the companies that manufacture a
--		     vaccine against "Coronavirus disease
--    2019".
SELECT VACCINE.Manufacturer
FROM VACCINE,
  EFFICACY
WHERE VACCINE.Name = EFFICACY.VaccineName
  AND EFFICACY.DiseaseName = "Coronavirus disease 2019";
HW_VaccineSol.sql
Solution to Problem 3.19 (A database for residencies)

The file code/sql/HW_ResidencySol.sql contains the solution to the code part of this problem.

Pb 3.19 – Solution to Q. 1
The relational model is:

PERSON(FName, LName, SSN (PK), Birthdate) HOUSE(Address (PK), Color) RESIDENCY(Person (FK to PERSON.SSN), House (FK to HOUSE.Address), PrincipalResidence, Status)  

Pb 3.19 – Solution to Q. 2

To violate the entity integrity constraint, it suffices to insert a tuple with NULL as a value for one of the attributes of a primary key or to insert a value that was already inserted.

Two examples are:

INSERT INTO PERSON VALUES ("Bob", "Ross", NULL, DATE"1942-10-29");

which would return ERROR 1048 (23000): Column 'SSN' cannot be null.

INSERT INTO HOUSE VALUES ("123 Main St.", "green");

which would return ERROR 1062 (23000): Duplicate entry '123 Main St.' for key 'PRIMARY'.

Pb 3.19 – Solution to Q. 3

To violate the referential integrity constraint, it suffices to insert a tuple where the value for one of the attributes of a foreign key does not exist in the referenced table.

For instance,

INSERT INTO RESIDENCY VALUES ("999-99-9999", NULL, NULL, NULL);

would return

ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_Residency_SOL`.`RESIDENCY`, CONSTRAINT `RESIDENCY_ibfk_1` FOREIGN KEY (`Person`) REFERENCES `PERSON` (`SSN`))

Since there is no row in the PERSON table with the value "999-99-9999" for SSN.

Pb 3.19 – Solution to Q. 4

The answers can be found in the following snippet:

/*
 In the following we use transactions
 to be able to simulate the "what if"
 aspect of the questions: we will not
 commit the changes we are testing,
 and roll back on them before moving to
 the next question.
 */
-- Exercise 4
--		    List the rows (i.e., P.2, H.1, or even
--	   “none”)
--		   modified by the following statements:
START TRANSACTION;

UPDATE
  HOUSE
SET COLOR = "green";

-- H.1, H.2 and H.3
ROLLBACK;

START TRANSACTION;

DELETE FROM RESIDENCY
WHERE House LIKE "1%";

-- R.1,and R.3
ROLLBACK;

START TRANSACTION;

DELETE FROM HOUSE
WHERE Address = "456 Second St.";

-- H.2, R.2 and R.4 (because of the foreign key).
ROLLBACK;

START TRANSACTION;

-- Commented, because it causes an error.
--		    DELETE FROM PERSON
--		    WHERE Birthdate = DATE "1990-02-11";
--		    None, because of the foreign key and
-- the
--	     referential
--		   integrity constraint.
--		    ERROR 1451 (23000): Cannot delete or
--    update
--      a
--	     parent
--		  row:
--		   a foreign key constraint fails
--		   (`HW_RESIDENCY_SOL`.`RESIDENCY`,
--   CONSTRAINT
--		   `RESIDENCY_ibfk_1` FOREIGN KEY
-- (`Person`)
--	    REFERENCES
--		   `PERSON` (`SSN`))
ROLLBACK;
HW_ResidencySol.sql
Pb 3.19 – Solution to Q. 5

The answers can be found in the following snippet:

-- Exercise 5
/* Write a query that selects …
 … the addresses of the houses in the system (11 Third St., 123 Main St., 456 Second St.).
 */
SELECT Address
FROM HOUSE;

-- … the SSN of the persons whose first name was not
--		   entered in the system (000-00-0000).
SELECT SSN
FROM PERSON
WHERE FName IS NULL;

-- … all the different colors of houses (white, blue).
SELECT DISTINCT COLOR
FROM HOUSE;

-- … the address of the residency of James Baldwin (123
--		   Main St.).
SELECT House
FROM RESIDENCY,
  PERSON
WHERE PERSON.Fname = "James"
  AND PERSON.LName = "Baldwin"
  AND PERSON.SSN = RESIDENCY.Person;

-- … the first name of the oldest person in the database
--		   (James).
SELECT FName
FROM PERSON
WHERE Birthdate = (
    SELECT MIN(Birthdate)
    FROM PERSON
    WHERE Birthdate IS NOT NULL);

-- … Michael Keal’s principal residency address (123 Main
--		   St.).
SELECT RESIDENCY.House
FROM RESIDENCY,
  PERSON
WHERE PERSON.FName = "Michael"
  AND PERSON.LName = "Keal"
  AND PERSON.SSN = RESIDENCY.Person
  AND RESIDENCY.PrincipalResidence = TRUE;

-- … the (distinct) first and last names of the homeowners
--		   (Michael Keal, Mridula Warrier).
SELECT DISTINCT (PERSON.FName),
  PERSON.LName
FROM PERSON,
  RESIDENCY
WHERE RESIDENCY.Status = "own"
  AND RESIDENCY.Person = PERSON.SSN;

-- cf comment at snippet homonyms
SELECT PERSON.FName,
  PERSON.LName
FROM PERSON
WHERE SSN IN ( SELECT DISTINCT (RESIDENCY.Person)
    FROM RESIDENCY
    WHERE RESIDENCY.Status = "own");

-- … the SSN of the persons that have the same principal
--		   residency as James Baldwin
-- (000-00-0001).
SELECT RoomMate.Person
FROM RESIDENCY AS James,
  RESIDENCY AS RoomMate,
  PERSON
WHERE PERSON.FName = "James"
  AND PERSON.LName = "Baldwin"
  AND PERSON.SSN = James.Person
  AND James.House = RoomMate.House
  AND NOT James.Person = RoomMate.Person
  AND RoomMate.PrincipalResidence = TRUE;
HW_ResidencySol.sql

Note that the query that returns the name of the homeowners can be improved.

--		    If we have homonymns in our database,
--   e.g.
INSERT INTO PERSON
VALUES (
  "A",
  "B",
  "000-00-0010",
  NULL),
(
  "A",
  "B",
  "000-00-0011",
  NULL);

INSERT INTO HOUSE
VALUES (
  "H",
  NULL);

-- H.3
INSERT INTO RESIDENCY
VALUES (
  "000-00-0010",
  "H",
  TRUE,
  "own"),
(
  "000-00-0011",
  "H",
  TRUE,
  "own");

-- Then the query below fails, in the sense that it reports
--		   the name "A, B" only once.
SELECT DISTINCT (PERSON.FName),
  PERSON.LName
FROM PERSON,
  RESIDENCY
WHERE RESIDENCY.Status = "own"
  AND RESIDENCY.Person = PERSON.SSN;

-- A better (and not much more complicated) solution would
--		   have been
SELECT PERSON.FName,
  PERSON.LName
FROM PERSON
WHERE SSN IN ( SELECT DISTINCT (RESIDENCY.Person)
    FROM RESIDENCY
    WHERE RESIDENCY.Status = "own");
HW_ResidencySol.sql
Pb 3.19 – Solution to Q. 6

To update the SSN of "James Baldwin" to "000-00-0010", we could use:

UPDATE PERSON SET SSN = "000-00-0010" WHERE FName = "James" AND LName = "Baldwin";

However, this command would be rejected because of the foreign key constraint. On UPDATE, the foreign key from RESIDENCY.Person to PERSON.SSN restricts by default. The error would be:

ERROR 1451 (23000) at line 75: Cannot delete or update a parent row: a foreign key constraint fails (`HW_Residency_SOL`.`RESIDENCY`, CONSTRAINT `RESIDENCY_ibfk_1` FOREIGN KEY (`Person`) REFERENCES `PERSON` (`SSN`))
Pb 3.19 – Solution to Q. 7

In our model, as it is currently,

  1. It is possible for two people to have the same last name.
  2. It is possible for a person to have multiple principal residencies.
  3. It is not possible for a house to not be yellow.
  4. It is possible for the SSN to be any series of 11 characters.
  5. It is possible for a person to own any number of houses.
  6. It is possible for a person to rent any number of houses.
Pb 3.19 – Solution to Q. 8

Considering the given state for the RESIDENCY table, the following two are possible primary keys:

  1. Person and PrincipalResidence
  2. Person and House
Pb 3.19 – Solution to Q. 9

The first key would not accomodate a person with multiple secondary residencies, which is not a good thing. The second key could make sense, since it would refrain a person from declaring the same address twice as their residency. The only case that could be hard to work around is if a person was trying to own multiple units at the same address; however, this is more an issue with the primary key of HOUSE than an issue with the primary key we suggested for RESIDENCY.


Solution to Problem 3.20 (A database for research fundings)
(Some of) the answers can be found in the following snippet:
/* code/sql/HW_ScientificResearchSol.sql */
-- List the rows affected (updated or deleted) by the
--	 following commands.
--	  If no rows are affected because the command would
--      would
--	 violate the entity integrity constraint, the
--      referential
--	 integrity constraint, or if there would be some
--   other
--	kind
--	 of error, please indicate it.
START TRANSACTION;


/*
UPDATE
 SCIENTIST
SET SSN = "000000001"
WHERE Name = "Claire";
 */
-- ERROR 1062 (23000) at line 106: Duplicate entry '1'
-- for
--	 key 'PRIMARY'
ROLLBACK;

START TRANSACTION;

UPDATE
  FUNDINGAGENCY
SET Name = "NSF"
WHERE Name = "National Science Foundation";

SELECT *
FROM FUNDINGAGENCY;

-- FA. 1
SELECT *
FROM FUNDS;

-- F.1
ROLLBACK;

START TRANSACTION;


/*
DELETE FROM FUNDINGAGENCY
WHERE Name = "French-American Cultural Exchange";
 */
-- ERROR 1451 (23000): Cannot delete or update a parent row:
--	 a foreign key constraint fails
--	 (`HW_SCIENTIFIC_RESEARCH`.`FUNDS`, CONSTRAINT
--	 `FUNDS_ibfk_1` FOREIGN KEY (`Agency`) REFERENCES
--	 `FUNDINGAGENCY` (`Name`) ON UPDATE CASCADE)
ROLLBACK;

-- List the name of the funding agencies created after 2000
--	 ("French-American Cultural Exchange")
SELECT Name
FROM FUNDINGAGENCY
WHERE Creation >= 2000;

-- List the code of the projects that contains the word
--	 "Airplanes" ("AA", "BA")
SELECT CODE
FROM PROJECT
WHERE Name LIKE ("%Airplanes%");

-- List the number of hours scientists contributed to the
--	 project "AA" (18)
SELECT SUM(Hours)
FROM CONTRIBUTESTO
WHERE Project = "AA";

-- List the code of the projects to which the scientist named
--	 Sabine contributed ("AA", "BB")
SELECT Project
FROM CONTRIBUTESTO,
  SCIENTIST
WHERE SCIENTIST.Name = "Sabine"
  AND SCIENTIST.SSN = CONTRIBUTESTO.Scientist;

-- Give the name of the projects who benefited from federal
--	 funds ("Advancing Airplanes")
SELECT PROJECT.Name
FROM PROJECT,
  FUNDS,
  FUNDINGAGENCY
WHERE FUNDINGAGENCY.Type = "Federal"
  AND FUNDINGAGENCY.Name = FUNDS.Agency
  AND FUNDS.Project = PROJECT.Code;

-- Give the name of the scientist who contributed to the same
--	 project as Mike ("Sabine", "James")
SELECT DISTINCT (Fellow.Name) AS "Mike's fellow"
FROM SCIENTIST AS Mike,
  SCIENTIST AS Fellow,
  CONTRIBUTESTO AS A,
  CONTRIBUTESTO AS B
WHERE Mike.Name = "Mike"
  AND Mike.SSN = A.Scientist
  AND A.Project = B.Project
  AND B.Scientist = Fellow.SSN
  AND NOT Fellow.Name = "Mike";

-- List the name of the projects that are not funded by an
--	 agency ("Better Airplanes", "Better Buildings")
SELECT DISTINCT (PROJECT.Name)
FROM PROJECT,
  FUNDS
WHERE NOT PROJECT.Code IN (
    SELECT FUNDS.Project
    FROM FUNDS);

-- Give the name of the scientist who contributed the most
--	 (in terms of hours) to the project named
-- "Advancing
--	 Airplanes" (Sabine)
SELECT SCIENTIST.Name
FROM SCIENTIST,
  CONTRIBUTESTO
WHERE CONTRIBUTESTO.Hours >= (
    SELECT MAX(Hours)
    FROM CONTRIBUTESTO,
      PROJECT
    WHERE PROJECT.Name = "Advancing Airplanes"
      AND PROJECT.Code = CONTRIBUTESTO.Project)
  AND CONTRIBUTESTO.Scientist = SCIENTIST.SSN;
HW_ScientificResearchSol.sql
Solution to Problem 3.21 (Improving a Relational Model for a Printing Station)
Pb 3.21 – Solution to Q. 1
  • Instead of making the nickname attribute being the primary key, they could have it as a non-prime attribute, and use some id with auto-increment as a primary key for the computers, rooms and phones without having to come up with new names all the times.
  • They should simply remove that attribute, and write a SELECT query that returns this information whenever they need it.
  • The best way to address this issue is probably to have a separate table for operating systems, with the attributes they are interested in (architecture, manufacturer of the OS, etc.), and foreign keys from Computer and Phone to it.
  • Making both the Nickname and the ConnectedTo attributes be the primary key would solve their issue, but could potentiall introduce a lot of reduncdancy. The best way is probably to have a separate table that list the connections. Since computers and phones are in two different tables, this creates an additional challenge, since we would need to have a “connection table for computers”, and a “connection table for phones”. We recommend actually merging those two tables into one, that would additionaly have an attribute to set if the device is a phone or a computer.
Pb 3.21 – Solution to Q. 2

A possible solution would consist in

  • Have a OS relation with attributes such as manufacturer, architecture, last update, and an id attribute for the primary key,
  • Merging Computer and Phone into a single Device relation, which would contain an additional attribute Type to distinguish between computers and phones, and a foreign key to OS,
  • Having a Connection relation whose attributes would be foreign keys to Device and Printer,
  • Remove the ComputerOrPhoneInIt attribute,
  • Have id attributes in Room and Device, which would be their sole primary key, but leaving the Nickname attribute in case they would like to store that information.

Solution to Problem 3.22 (Write select queries for a (third!) variation of the COMPUTER table)
(Some of) the answers can be found in the following snippet:
/* code/sql/HW_ComputerVariationAdvancedSol.sql */
START TRANSACTION;

DELETE FROM CONNEXION
WHERE Computer = 'A';

SELECT *
FROM COMPUTER;

SELECT *
FROM PERIPHERAL;

SELECT *
FROM CONNEXION;

ROLLBACK;

START TRANSACTION;

DELETE FROM COMPUTER
WHERE ID = 'A';

SELECT *
FROM COMPUTER;

SELECT *
FROM PERIPHERAL;

SELECT *
FROM CONNEXION;

ROLLBACK;

START TRANSACTION;

DELETE FROM PERIPHERAL
WHERE ID = '15';

SELECT *
FROM COMPUTER;

SELECT *
FROM PERIPHERAL;

SELECT *
FROM CONNEXION;

ROLLBACK;

START TRANSACTION;

DELETE FROM CONNEXION
WHERE Computer <> 'A';

SELECT *
FROM COMPUTER;

SELECT *
FROM PERIPHERAL;

SELECT *
FROM CONNEXION;

ROLLBACK;

SELECT TYPE
FROM PERIPHERAL
WHERE ID = '12';

SELECT ID
FROM COMPUTER
WHERE Model LIKE '%Apple%';

SELECT COUNT(ID)
FROM COMPUTER;

SELECT DISTINCT (TYPE)
FROM PERIPHERAL;

SELECT CONNEXION.Computer
FROM CONNEXION,
  PERIPHERAL
WHERE PERIPHERAL.Type = 'keyboard'
  AND PERIPHERAL.ID = CONNEXION.Peripheral;

SELECT COMPUTER.Model
FROM CONNEXION,
  PERIPHERAL,
  COMPUTER
WHERE PERIPHERAL.Model = 'TP-10 Thermal Matrix'
  AND PERIPHERAL.ID = CONNEXION.Peripheral
  AND CONNEXION.Computer = COMPUTER.ID;

INSERT INTO CONNEXION
VALUES (
  'B',
  '12');

SELECT *
FROM COMPUTER;

SELECT *
FROM PERIPHERAL;

-- Note that the "LastConnexion" attribute has been updated.
SELECT *
FROM CONNEXION;
HW_ComputerVariationAdvancedSol.sql

Designing a Good Database

Resources

This part of the lecture covers significantly more material than the other, hence we give the details of the references below:

Interest for High-Level Design

Previous relational models have mistakes and limitations:

We could go back and forth between relational models (~ logical level) and SQL implementations (~ physical level), but we will use even more high-level tools (~ conceptual level):

The conceptual data model is (in theory at least) independent of the choice of database technology.

Remember that in relational models, relations were representing entities (Student) and relationships (Majors_In). At the conceptual level, and more particularly in ER diagram, the distinction is made between entities and relationship.


Entity-Relationship Model

Data is organized into entities (with attributes), relationships between entities (with attributes as well).

Entities

  • Entity = Thing, object, with independent existence.
  • Each entity has attributes (properties)

Entity A :

  • Name = Clément Aubert
  • Address = HCOB, HA, E. 128 ; Invented St., Auguta, GA
  • Diploma = Ph.D in CS; BS in Math
  • Highest Diploma = Ph.D in CS
  • Favorite Class = CSCI 1301
  • Favorite Sport = NULL

Some vocabulary:

  • Entity = actual thing (individual)
  • Entity type = collection of entities with the same attributes
  • Entity set (or collection) = collection of all entities of a particular entity type.

Attributes

Attributes can be

  • Composite (divided in smaller parts) or simple (atomic)
  • Single-valued or multi-valued
  • Stored vs derived
  • Nested!

{…} = multi-valued

(…) = complex

For instance, one could

  • store the name using a composite attribute (First Name, {Middle Name}, Last Name),
  • store multiple addresses using the “schema” {Address(Street, Number, Apt, City, State, ZIP)},
  • derive the value of “Highest Diploma” using the value(s) stored in “Diploma”.

Key Attributes

A key attribute is an attribute whose value is distinct for each entity in the entity set.

  • Serve to identify an entity,
  • Can be more than one such attribute (and we leave the options open),
  • Cannot be multiple attributes: if more than one attribute is needed to make a key attribute, combine them into a composite attribute and make it the key.
  • A composite attribute that is a key attribute should not still be a key attribute if we were to remove one of the attribute (similar to the minimality requirement).
  • An entity with no key is called a weak entity type: it is an entity that will be identified thanks to its relation to other entities, and thanks to its partial key (we will discuss this later).

Drawing Entity Types

  • Entity = squared box (name in upper case)
  • Attribute = rounded box connected to square box (name in lower case)
If the attribute is …, then…
composite other attributes are connected to it
multi-valued the box have double lines
derived the box have dotted lines
a key the name of the attribute is underlined

 

 

In the following, we’ll focus on the relationship between the entities more than on the attributes of particular entities, so we’ll sometimes simply draw

 

leaving the attributes un-specified (but that does not mean that they all have to be atomic) or even just

 

but that does not mean that the entity type have no attribute!


Relationships

Vocabulary

  • Relationship = actual relation (or action) between entities (“teaches”, “loves”, “possesses”, etc.).
  • Relationship instance = r1 associates n entities e1, …, en (“Pr. X teaches CSCI YYY”, “There is love between Mary and Paul”, etc.)
  • Relationship set = collection of instances
  • Relationship type = abstraction (“Every course belong to one instructor”, “Love is a relation between two persons”, etc).

E1, … En participate in R, e1, …, en participate in r1, n is the degree.

 

Note that we can have Entity Set 1 = Entity Set 2, in which case we say the relation is recursive17.

Naming convention:

  • Use a singular name for entity types.
  • Use a verb for relationship.
  • Relationship types are drawn in diamonds.
  • Drawing usually reads left to right, and top-down.

 

Role Names and Recursive Relations

Convenient, and sometimes mandatory, to give role names.

If we want to stress that we are considering only one aspect of an entity type (that is, a person is not only an employee, a company is not only an employer, but this aspect is crucial for the “EMPLOYS” relation):

We can also use it to make the “right-side” and the “left-side” of a recursive relationship explicit:

Finally, we will sometimes use “Role Name of Entity 1 : Role Name of Entity 2” as a notation for the relation between them. For instance, we can write “Employer:Employee” to denote the “EMPLOYS” relation, and we will also use this notation when the relationship is between different entities, and write e.g. “PERSON:POSITION” for the “OCCUPIES” relation.

Constraints

Two constraints, called “structural constraints”, applies to relationship types: cardinality ratio and participation constraint. They both concerns the number of relationship instances an entity can participate in (which is different from the cardinality of a relationship type).

Cardinality Ratio

Maximum number of relationships instances that an entity can participate in.

For binary relations, can be 1 : 1, N : 1, 1 : N, or M : N. The 1 stands for “at most 1”, and the M, N, and P stand for “possibly more than 1”, or “no maximum”. In ER diagram, we do not count, and do not make the distinction between “at most 5” and “at most 10”, for instance18.

Possible examples include:

Relation Possible Ratio Explanation
MENTOR : MENTEE 1 : N “A mentor can have multiple mentees, a mentee has at most one mentor.”
PERSON : SSN 1 : 1 “A person has one SSN, a SSN belongs to one person.”
COURSE : DEPARTMENT N : 1 “A course is offered by one department, a department can offer any number of courses.”
STUDENT : TEAM M : N “A student can participate in multiple team, a team can have multiple students.”

We indicate the ratio on the edges:

Note that reflexive relations can have any ratio as well. An example of M : N recursive relation could be:

Participation Constraint

Minimum number of relationships instances that an entity can participant it, a.k.a. “minimum cardinality constraint.”

The participation can be total (a.k.a. existence dependency, the entity must be in that relationship at least once) or partial (the entity may or may not be in that relationship).

Total is drawn with a double line, partial is drawn with a single line:

This reads “a course must be offered by a department, but a department may or may not offer courses.”

Attributes

Relationships can have attributes too. The typical example is a date attribute, but other examples include

  • TEACHING relation between PROF and CLASS (N : M) could have a “Quarter” attribute.
  • MENTORING relation between MENTOR and MENTEE (1 : N) could have a “Since” attribute.
  • EMITED_DRIVING_LICENCE between DMV and PERSON (N : 1) could have a “Date” attribute.

Note that an attribute on a relationship type can be atomic or composite, single or multi-valued, stored or derived, but that it cannot be a key attribute (after all, there are no entity to identify!).

Note that there are some moving aspects here: atributes on 1 : 1, 1 : N, N : 1 relationships can be migrated (to the N side when there is one, or to either side where there is none).

For instance, imagine that every phone uses exactly (= “at most and at least”) one carrier, that a carrier can provide network to multiple phones, and that the average quality of the network is an attribute in this relationship:

Then each instance of the relation would be of the form (“Phone X”, “Carrier Y”, “9/10”) for some way of ranking the average quality from 0 to 10. Note that, from the fact that the relationship is N : 1, this means that there is only one tuple involving “Phone X”: this means that the average quality could actually be seen as a property of the phone, and hence be migrated as an attribute to the phone side:

Note that we could not migrate the “average phone quality” to the “Carrier” side: imagine if we had the instances (“Phone X”, “Carrier Y”, “9/10”) and (“Phone Z”, “Carrier Y”, “3/10”), then should the attribute of “Carrier Y” be “9/10” or “3/10”: we have no way of deciding based on this model. Whenever it is a good choice to migrate this attribute or not will depend on the requirement of the models, and it may not always be appropriate to migrate the attribute to the entity. In the case of 1 : 1 relationship, migrating the attribute to both sides (i.e., to both entities) would be a mistake, since it would introduce redundancy in your model.

As an exercise, you can look at the relationships TEACHING, MENTORING and EMITED_DRIVING_LICENCE that are listed above, and see if the attributes can be migrated or not, and if yes, on which side.

Relationships of Degree Higher than Two

Of course, relationships can have a degree higher than two. An example of a ternary relation could be:

To determine cardinality ratio, one should fix all but one parameters, and wonder how many values of the remaining parameter can be in that relationship. Another wording for the same idea can be found in this thread.

Four our example, Customer Y and Bank Z could be in relationship with more than one account (hence the “N”). On the opposite, Customer Y and Account K would be in relationship with only one bank (hence the “1” on the bottom), and Bank Z and Account K would belong to only one customer (hence the “1” on the left).

Let us look at two other examples. First, assume we want to collect information about the treatment prescribed by physicians to patients, we could use a relationship like the following one:

Where

  • The “P” stands for the fact that the same physician can prescribe the same treatment to multiple patients,
  • The “N” stands for the fact that different treatment can be prescribe by the same physiciant to the same patient,
  • The “M” stands for the fact that the same patient can get the same treatment from different physicians.

Now, if we want to store information about who is the president of a country during a term, we could get something like:

Note that this representation of the data assumes that a citizen cannot be the president of two different countries during the same term (the right 1), which could be debatable.

It is sometimes impossible to do without relations with arity greater than 2. For instance, consider the following two diagrams19:

You should realize that they convey different information. For instance, you can know for a fact that a person visit a bookshop only if they bought something in it, while the second diagram de-correlate the act of buying with the visit to a bookshop. Similarly, the second diagram could give you a hint that a person that owns a copy of a book Z and visits a bookshop X that sells it could also visit it, but you will not know that for sure.

An example of recursive ternary relation could be:

An example of relation of degree 4 could be:

The cardinality ratio are computed using the same method as described before.


Weak Entity Types

There are actually two sorts of entity types:

  • Strong (a.k.a. regular, the ones we studied so far), with a key attribute,
  • Weak, without key attribute.

Weak (or child) entity types are identified by identifying / owner type that is related to it, in conjunction with one attribute (the partial key). That relation is called identifying (or supporting) relationship, and weak entities have a total participation constraint. The partial key is an attribute, that, when paired with an entity with which they are in relation through their identifying relationship, allows to identify a particular entity.

Weak entities and identifying relationships have a double border, and partial key have a dotted underline, as follows:

The idea here is that we do not need to gather data about all the dependent in the world, or in isolation, but are interested in dependent only if they are related to en employee in our database. Just having the name of a dependent is not enough to identify them, but having their name and the SSN of the employee they are related to is enough. The identifying relation always have ratio 1 : M or 1 : 1: a weak entity cannot be related to more than one entity of the owner type, so that M : N ratio are not possible (cf. e.g. https://dba.stackexchange.com/q/17207). If you need to have, for instance, a dependant connected to multiple employees, then that means that your dependent entity should be strong, because it has an existence “of its own”.

You may wonder why we do not represent weak entities simply as (composite, multi-valued) attributes of their owner type. For instance, why would we use

instead of

? The answer depends whenever we need to have the ability to represent our weak entities (here, PET) as being in relationship with other entities (that can themselves be weak!), as follows:

This would be impossible if PET was an attribute of FRIEND! Whenever the pet entity type is involved in other relationships or not should help you in deciding which representation to choose.

  • Weak entities types can sometimes be replaced by complex (composite, multi-valued) attributes, unless they are involved in other relationships.
  • Owner can itself be weak!
  • The degree of the identifying relationship can be more than 2 (cf. e.g., https://stackoverflow.com/q/15393587/).

Another example of weak entity whose owner is weak as well could be:

The idea being that the Health care provider cares about an insure only if they are covered by them, and that they care about the doula only if they are currently helping one of their insure.

Alternative Notations

Multiple notations have been used to represent the ratio and constraint on relationship.

A Quick Overview of the Notations for ER Diagram (courtesy of wikipedia)

In the following, we introduce two of them: the Min/Max and the Crow’s foot notations.

Notation with Explicit Maximal (Min/Max Notation)

The two constraints can be written on the same side, and the N, M, P ratio can be replaced by actual number, providing more information.

For instance,

could be drawn as

meaning that

  • A car can be used to carpool between 1 and 5 persons (and that it must be used for at least 1 person),
  • A person can be registered for 0, 1, 2 or 3 carpool at the same time.

More generally, we have the following:

Crow’s Foot Notation

Enhanced Entity–Relationship Model

Extended (or Enhanced) ER Models (EER) have additionaly:

  • Subtype / Subclass: “every professor is an employee”. There is a class / subclass relationship (you can proceed by specialization or generalization).
  • Category (to represent UNION): an OWNER entity that can be either a PERSON, a BANK, or a COMPANY entity type.

Closer to object-oriented programming.

Reverse Engineering

It is possible to go from relational models to ER models, and sometimes needed: if you are given an implementation that seems poorly design, this can be a way of “backing up” and thinking about the (sometimes implicit) choices that were made during the implementation, to eventually correct them.

For instance, consider the code we studied in “A First Example”:

CREATE TABLE STORM (
  NAME VARCHAR(25) PRIMARY KEY,
  Kind ENUM ("Tropical
    Storm", "Hurricane"),
  WindSpeed INT,
  Creation DATE
);

-- We can change the enumerated datatype:
ALTER TABLE STORM MODIFY Kind ENUM ("Tropical Storm",
  "Hurricane", "Typhoon");

CREATE TABLE STATE (
  NAME VARCHAR(25) UNIQUE,
  Postal_abbr CHAR(2) PRIMARY KEY,
  Affected_by VARCHAR(25),
  FOREIGN KEY (Affected_by) REFERENCES STORM (NAME) ON
    DELETE SET NULL ON UPDATE CASCADE
);
HW_Storm.sql

It corresponds to the following relational model:

STORM(Name (PK), Kind, WindSpeed, Creation) STATE(PostalAbbr (PK), Name, AffectedBy (FK to STORM.Name))  

which in turn corresponds to the following ER diagram:

Looking at this diagram made it obvious that our code has a flaw: a stom can affect more than one state! Turning the 1 on the left-hand side of the “AFFECTS” relationship into a M is immediate on the diagram, but, of course, mapping it back to a relational model, and then implementing it correctly, will require more work. In any case, if you had not noted already this flaw, reverse-engineering this code highlighted it quite clearly.

If we look back at Problem 3.5 (Revisiting the PROF table), we had already made a first step, since we converted the code into the following relational model:

PROF(Login (PK), Name, Department (FK to DEPARTMENT.Code)) DEPARTMENT(Code (PK), Name, Head (FK to PROF.Login)) LECTURE (Code (PK), Year (PK), Name, Instructor (FK to PROF.Login) STUDENT(Login (PK), Name, Registered, Major (FK to DEPARTMENT.Code)) GRADE (Login (PK, FK to STUDENT.Login), Grade (PK), LectureCode (FK to LECTURE.Code), LectureYear (FK to LECTURE.Year))  

Going a bit further, we could extrapolate just a little bit and get the following ER diagram:

As we noted in our solution to the second question, this model has several limitations. To list a few, this representation can not handle the following situations:

  • If multiple instructors teach the same class,
  • If the lecture is taught more than once a year (either because it is taught in the Fall, Spring and Summer, or if multiple sections are offered at the same time),
  • If a Lecture is cross-listed, then some duplication of information will be needed.

Looking at it as an ER diagram should help you in understanding why we have those flaws, and how they could be addressed, and “testing” the model should be made easier in its ER form than as SQL code.


ER-to-Relational Models Mapping

Intro

We have to map all of the following:

Entity Strong, Weak
Attributes Composite, Key, Atomic, Multi-valued, Partial Key
Relationships Binary (1 : 1, N : 1, 1 : N, N : M), n-ary

Using four tools: Relations, Attributes, Primary Keys, Foreign Keys.

Algorithm

We will use three techniques to represent some of the relationships, the foreign key approach, the merged relations approach and the cross-reference approach. They are detailed and illustrated after the algorithm, which goes as follows:

# is mapped to
1 Strong Entity Relation with all the simple attributes. Decompose complex (composite) attributes. Pick a key to be the PK, if it is composite, take its elements.
2 Weak Entity Relation with all the simple attributes. Decompose complex attributes. Add as a foreign key the primary key of the relation corresponding to the owner entity type, and make it a primary key, in addition to the partial key of the weak entity. If the owner entity type is itself weak, start with it.
3 Binary 1 : 1 Relationship Types Foreign Key, Merge Relations or Cross-Reference approach
4 Binary 1 : N Relationship Types Foreign Key or Cross-Reference approach
5 Binary M : N Relationship Types Cross-Reference approach
6 n-ary Relationship Types Cross-Reference approach
7 Multi-valued Attributes Create a new relation, add as a foreign key the primary key of the relation corresponding to the original strong entity type. Make all the attributes be the primary key.

whose primary key is the foreign key to the relation corresponding to the entity.

  1. Foreign Key Approach: choose one of the relation (preferably with total participation constraint, or on the N side), add a foreign key and all the attributes of the relationship.
  2. Merged Relation Approach: If both participation constraints are total, just merge them. Primary key = just pick one (or take both). If we were working on the implementation, we would add a NOT NULL constraint on the attribute that is not part of the primary key anymore.
  3. Cross-Reference or Relationship Relation Approach: Create a lookup table with an appropriate number of foreign keys, pick some of them (the one on the N side, both if the ratio is M : N, for n-ary it is a bit more complex, cf. example below) as the primary key.

Every time a relationships have attributes, they are mapped to the resulting relation.

Let us look in more details at some of those steps. For strong entities, using steps 1 and 7, the following:

would give:

DESK(Serial(PK), Building, Room) DESK_COLOR(Desk (PK, FK referencing DESK.Serial))  

And note that if Serial was a complex attribute, we would just “unfold” it, or decompose it, and make all the resulting attributes the primary key of the relation. If one of the attribute was at the same time multi-valued and composite, as follows:

Then we would obtain:

COMPUTER(MAC(PK)) COMPUTER_COLOR(Compter (PK, FK referencing COMPUTER.MAC), Name, Email)  

For relationships, things are a bit more complicated. Consider the following:

Since it is a 1 : 1 relationship where one of the side has a partial constraint, we have the choice between two approaches. The foreign key approach would give:

ENT.A(KeyA (PK), FK (FK to ENT.B.KeyB)) ENT.B(KeyB (PK))  

Note that we could also have added the foreign key on the side of ENT.B, referencing the key of ENT.A. But since ENT.A has a total participation constraint, we know that the value of FK will always exist, whereas some entities in ENT.B may not be in relationship with an entity from ENT.A, creating the (nefast) need for NULL values.

For the same diagram, the cross-reference approach would give:

ENT.A(KeyA (PK)) ENT.B(KeyB (PK)) MAPPING(KeyA (PK, FK referencing ENT.A.KeyA), KeyB(FK referencing ENT.B.KeyB))  

Similarly, note that, in MAPPING, KeyB, or KeyA and KeyB, would also be valid primary keys, but that it makes more sense to have KeyA being the primary key, since we know that ENT.A has a total participation constraint, but ENT.B does not.

If both participation constraints were total, as follows:

Then we could use the merged relations approach, and get:

ENT.A.AND.B.(KeyA (PK), KeyB)  

We picked KeyA to be the primary key for the same reason as before. Note that merging the two entities into one relation also means that you have eventually to do some work on the relations that were referring to them.

Of course, if ENT.A and ENT.B are the same entity (that is, REL is recursive), we would get:

ENT.A(KeyA (PK), Rel(FK referencing Ent.A.KeyA))  

or

ENT.A(KeyA (PK)) REL(KeyA1 (PK, FK referencing Ent.A.KeyA), KeyA2 (FK referencing Ent.A.KeyA))  

depending on the approach we chose.

Binary 1 : N and binary M : N relationships are dealt with in a similar way, using foreign key or cross-reference approaches. The most difficult part of the mapping is with n-ary relationships: we have to use cross-reference approaches, but determining the primary key is not an easy task. Consider the following20:

The arity constraints here can be rephrased as:

  • A member can reserve a particular equipment at multiple time slots (the N),
  • An equipment can be reserved at a particular time slot by only one member (the 1 on the left),
  • A member can reserve only one equipment per time slot (the 1 on the right).

And note that there is no total participation constraint.

To reprent the RESERVES relationship, we need to create a relation with attributes referencing the primary key of MEMBER, the primary key of TIME_SLOT, and the primary key of EQUIPMENT. Making them all the primary key does not represent the fact that the same equipment cannot be booked twice during the same slot, nor that a member can book only one equipment per slot, but allows members to reserve a particular equipment at multiple time slots. To improve this situation, we can either

  1. take the foreign key to MEMBER and the foreign key to TIME_SLOT to be the primary key of this relation,
  2. or take the foreign key to EQUIPMENT and the foreign key to TIME_SLOT to be the primary key of this relation.

Both solutions enforce only some of the requirement expressed by the ER diagram.

Outro

ER Model Relational Model
Entity type Entity relation
1 : 1 or 1 : N relationship type Foreign key (or relationship relation)
M : N relationship type Relationship relation and two foreign keys
n-ary relationship type Relationship relation and n foreign keys
Simple attribute Attribute
Composite attribute Set of simple component attributes
Multivalued attribute Relation and foreign key
Value set Domain
Key attribute Primary key

You can have a look at e.g. http://holowczak.com/converting-e-r-models-to-relational-models/ to get a slightly different explanation of this conversion, and additional pointers.

Guidelines and Normal Form

What makes a good database? At the logical (conceptual) and physical (implementation) levels. We will answer belowe this question broadly, and will then use the concept of functional dependency to capture some of those notions precisely, mathematically, and be able to detect issues preventing our database from meeting the usual goals.

In general, a good data base should:

  1. Enforces information preservation (and avoid loss of information)
  2. Have minimum redundancy
  3. Makes queries easy (avoid redundant work, make SELECT and select-project-join easy)

Normally, consistency will simply follows if those goals are met.

For ER diagrams, some of the usual techniques21 are:

General Rules

Semantics

1 relation corresponds to 1 entity or 1 relationship type

No Anomalies

  1. Insertion Anomalies
    Having to invent values or to put NULL to insert tuples, especially on a key attribute!
  2. Deletion Anomalies
    Loosing information inadvertently
  3. Modification Anomalies
    Updates have to be consistent.

(Bad!) Example:

---------- (Login, Name, AdvisoryName, AdvisorOffice, Major, MajorHead)

-----------(Office, PhoneNumber, Building)
  1. Advisor without student
  2. Delete last student of advisor
  3. Advisor change name.

NULL Should Be Rare

NULL has 3 meanings, wastes space, and makes join / nested projections harder.

Example:


STUDENT(Login, …, siblingEnrolled)

Transform into “Emergency Contact in University” relation (bonus: allow multiple contacts).

Identical Attributes in Different Tables Should Be (Primary, Forgein) Key Pairs

Example with advisorOffice and Office: if we try to write a join to obtain the phone number of a student’s advisor, we will obtain all the phone.

Example


MARKER(Owner, Color, OwnerOffice, Brand, BrandEmail)

TEACHER(Office, Name, Phone)

Corrected to:


MARKER(Owner, Color, B͟r͟a͟n͟d͟)

TEACHER(Office, N͟a͟m͟e͟, Phone)

BRAND(N͟a͟m͟e͟, Email)

Functional Dependencies

Functional dependencies (FD) is a formal tool used to assess how “good” a database is, a property of the relation schema. Functional dependencies list the constraints between two sets of attributes from the database. For instance, if X and Y are (sets of) attributes, X → Y reads “X fixes Y”, and implies that the value(s) of Y is fixed by the value(s) of X.

Using Semantics of Attributes

“What should be.”

Let us list all the attributes of our previous example:

MARKER.Owner, MARKER.Color, MAKER.Brand, TEACHER.Office, TEACHER.Name,
TEACHER.Phone, BRAND.Name, BRAND.Email

Think about their dependencies, and list them:

  • TEACHER.NameTEACHER.Office
  • BRAND.NameBRAND.Email
  • TEACHER.OfficeTEACHER.Name
  • TEACHER.OfficeTEACHER.Phone
  • MAKER.Owner and MARKER.ColorMARKER.Brand ?

Using Relation States

“What is.”, can disprove some of the assumptions made previously, but should not add new dependencies based on it (they may be by chance!).

  • Maybe TEACHER.OfficeTEACHER.Name does not hold, because teachers share offices?
  • Maybe TEACHER.NameMARKER.Brand and MARKER.Color seemed to be enforced by the state, but we should not add a functional dependency based on that: there are no “requirement” that a Teacher must always buy the same brand and color, this could simply true be by chance so far and should not be imposed to the teachers.

A particular state cannot enforce a FD, but it can negate one.

Example:

Att. 1 Att. 2 Att. 3
Bob 15 Boston
Bob 13 Boston
Jane 12 Augusta
Emily 12 Augusta
May hold Will not hold
Att. 2 → Att. 3 Att1 → Att2
Att. 3 → Att. 2 Att. 3 → Att. 2
Att. 1 → Att. 3 Att. 2 → Att. 1
{Att. 1, Att. 2} → Att. 3 {Att. 3, Att. 2} → Att. 1

Notations

Or, more conveniently:

If an attribute is a foreign key to another, we will draw an arrow between relations:

Note that:

  • X and Y are sets, we will write A instead of {A}, but keep writing {A, B} for {A, B}.
  • {A1, …, An} → {B1, …, Bm} means that A1 and … and An fix B1, and that A1 and … and An fix Bn, etc.
  • FD1, FD2, …, FDn for the list of functional dependencies, F for all of them.
  • A → B does not imply nor refute B → A.
  • We will not write the FD that are implied by (this variation of) Armstrong’s axioms:
    • Reflexivity: If Y is a subset of X, then X → Y
    • Augmentation: If X → Y, then {X, Z} → Y
    • Transitivity: If X → Y and Y → Z, then X → Z

We will assume that the consequence of those axioms always hold (“closure under those rules”), but will generaly not write them explicitely, since they do not carry any new or additional information.

Definitions

Remember superkey (not minimal key), key, candidate key, secondary key? We now have a formal definition.

In one particular relation R(A1,…,An),

  • If {A1, …, An} → Y for all attribute Y, then {A1, …, An} is a superkey.
  • If {A1, …, An}/Ai is not a superkey anymore for all Ai, then {A1, …, An} is a key.
  • We will often discard candidate keys and focus on one primary key.
  • If Ai is a member of some candidate key of R, it is a prime attribute of R. It is a non-prime attribute otherwise.

Given a FD {A1, …, An} → Y,

  • It is a full functional dependency if for all Ai, {A1, …, An}/Ai → Y, does not hold.
  • It is a partial dependency otherwise.

A FD : X → Y is a transivive dependency if there exist a set of attribute B s.t.

  • B ≠ X, B ≠ Y
  • B is not a candidate key,
  • B is not a subset of any candidate key,
  • X → B and B → Y hold

Normal Forms and Keys

There exists multiple normal forms: First, Second, Third, Fourth, Fifth normal form (“X”NF), … Stronger than the Third, there is the Boyce-Codd NF (BCNF), but we will focus on the first three, that are “cumulative”: to you satisfy N, a relation have to satisfy N − 1, N − 2, etc. The normal form of a relation is the highest normal form condition that it meets.

Fist Normal Form

Definition

The domain of all attributes must be atomic (simple, indivisible): exclude multi-valued and composite attributes.

Sometimes, additional requirement that every relation has a primary key. We will take this requirement to be part of the definition of 1NF, but some authors take a relation to be in 1NF if it has at least candidate keys (i.e., multiple possible keys, but no primary key, which makes their definition more general, cf. (Elmasri and Navathe 2015, 14.4.1)). Hence, we will always assume that a primary key is given, and it will be underlined.

Normalization

This essentially consists in

  • Picking a primary key,
  • Making the complex and multi-valued attributes atomic, following what was done when mapping entity-relationship models to relational models: by either “flattening” the complex attribute (i.e., picking the attributes composing it) or by creating a relation that will allow to store multiple values and linking it to the original relation.

Second Normal Form

Definition

1NF + Every non-prime attribute is fully functionnaly dependent on the primary key.

Normalization

For each attribute A of the relation whose primary key is A1, …, An:

  • Is it prime (i.e., is A ∈ {A1, …, An})?
    • Yes → Done.
    • No → Is it partially dependent on the primary key ?
      • No, it is fully dependent on the primary key → Done
      • Yes, it depends only of {A1, …, Ak} → Do the following:
        • Create a new relation with A and {A1, …, Ak}, make {A1, …, Ak} the primary key, and “import” all the functional dependencies,
        • Remove A from the original relation, and all the functional dependencies that implied it,
        • Add a foreign key to {A1, …, Ak} from their original counterparts in the original relation.

becomes

Refinment: note that if more than one attribute depends of the same subset {A1, …, Ak}, we will create two relations: that is useless, we could have created just one. For instance, considering

applying the algorithm would give (the incorrect, since a foreign key can not refer two attributes in two different tables)

whereas a more subtle algorithm would give

Note that in both cases, all the relations are in Second Normal Form, though (and valid if we ignore the foreign key issue discussed above).

Note also that, sometimes, removing the “original” relation may be preferable: cf. an example in Problem 4.33 (COFFEE relation: primary key and normal form).

Note also that if our primary key is a singleton (a set with only one element), then there is nothing to do, we are in 2NF as soon as we are in 1NF: every functional dependency from a single element is always full!

Third Normal Form

Definition

2NF + no non-prime attribute is transitively dependent on the primary key.

Normalization

For each attribute A of the relation whose primary key is A1, …, An:

  • Is it prime (i.e., is A ∈ {A1, …, An})?
    • Yes → Done.
    • No → Is it transitively dependent on the primary key ?
      • No, there is no {A1, …, Ak} such that {A1, …, An} → {A1, …, Ak} → A and {A1, …, Ak} ⊈ {A1, …, An} and A ∉ {A1, …, Ak} → Done
      • Yes, there is such a {A1, …, Am} → Do the following:
        • Create a new relation with A and {A1, …, Ak}, make {A1, …, Ak} the primary key, and import all the functional dependencies,
        • Remove A from the original relation, as well as all the functional dependencies involving it,
        • Add a foreign key from {A1, …, Ak} to their original counterparts in the original relation.

Examples

We can have a look at another example:

Note that {State, Driver_Licence_Num}, would be a valid primary key for this relation, and that adding it would make it a relation in 1NF.

As we can see, the name “Driver” is somehow counter-intuitive, since the relation also carries information about Governors. This relation is actually not in 2NF, because the FD {State, Driver_Licence_Num} → Governor is not fully functional. A possible way to fix it is to get:

As you can see, the 2NF helped us in separating properly the entities.

An example of a relation that is in 2NF but not in 3NF could be:

As we can see, all the non-prime attributes are fully functionally dependent from Login, which is our primary key. But, obviously, one of this dependecy is transitive, and breaks the 3NF. A way to fix it is:

As we can see, 3NF also helped us in separating properly the entities, in a slightly different way.

In conclusion, we can observe that every FD X → Y s.t. X is a proper subset of the primary key, or a non-prime attribute, is problematic. 2NF is a guarantee that every entity has its own relation, 3NF is a way to avoid data inconsistency.


Unified Modeling Language Diagrams

Overview

One approach for analysis, design, implementation and deployment of databases and their applications. Databases interact with multiple softwares and users, we need a common language.

Unified Modeling Language is a standard:

  • Generic
  • Language-independent
  • Platform-independent

Wide, powerful, but also intimidating.

You know UML from object-oriented programming language:

That is an example of a class diagram (with class name, attributes and operators, as well as a particular way to represent that a class extends another) , there are other types of diagrams, they are not unrelated! For instance, using communication diagrams, deployment diagrams, and state chart diagrams, you can collect the requirements needed to draw a class diagram! They each offer a viewpoint on a software that will help you in making sure the various pieces will fit together: it is a tool commonly used in software engineering, and useful in database design.

Types of Diagrams

There are 14 different types of diagrams, divided between two categories: structural and behavioral.

UML Diagram Hierarchie

(Source: https://commons.wikimedia.org/wiki/File:UML_diagrams_overview.svg)

Structural UML Diagrams

They describe structural, or static, relationships between objects, softwares.

  • Class diagram describes static structures: classes, interfaces, collaborations, dependencies, generalizations, etc. We can represent conceptual data base schema with them!
  • Object diagram, a.k.a. instance diagram, represents the static view of a system at a particular time. You can think of a “freeze” of a program, to be able to observe the value of the variables and the objects (or instances) created.
  • Component diagram describes the organization and the dependencies among software components (e.g., executables, files, libraries, etc.), to describe how an arbitrary large software system is split into pieces.
  • Deployment diagram is the description of the physical deployment of artifacts (i.e., software components) on nodes (i.e., hardware). If your program runs on a local computer, fetching data from the Internet, and storing output on a server, you may describe this situation using this sort of diagram.

In this category also exist Composite structure diagram, Package diagram and Profile diagram.

Behavioral UML diagrams

They describe the behavioral, or dynamic, relationship, between components.

  • Use case diagram describes the interaction between the user and the system. Supposedly, it is the privileged tool to communicate with end-users.
  • State machine diagram, a.k.a., state chart diagram, describes how a system react to external events. You can picture yourself a complex form of finite state automata diagram.
  • Activity diagram is a flow of control between activities. You may have seen them already, they are supposedly easy to follow:
Activity Diagram Quiz Example

Then there is the sub-category of “Interaction diagrams”:

  • Sequence diagram describes the interactions between objects over time, the flow of information or messages between objects. It is helpful to grasp the time ordering of the interactions.
  • Communication diagram, a.k.a., collaboration diagram, describes the interactions between objects as a serie of sequenced messages. It is helpful to grasp the structure of the objects, who is interacting with who.

This sub-category also comprise Timing diagram and Interaction overview diagram.

Zoom on Classes Diagrams

Looking at the “COMPANY conceptual schema in UML class diagram notation”, and comparing it with the “ER schema diagram for the COMPANY database” from the textbook, can help you in writing your own “Rosetta Stone” between ER and UML diagram. Let us introduce some UML terminology for the class diagrams.

UML ER
Class Entity Type
Class Name Entity Name
Attributes Attributes
Operations (or Method) Sometimes Derived Attributes
Association Relationship Type
Link Relationship Instance
Multiplicities Structural Constraint

As well as for ER diagram, the domain (or data type) of the attributes is optional. A composite attribute in a ER diagram can be interpreted as a structured domain in a UML diagram (think of a struct), and a multi-valued attribute requires to create a new class.

Associations are, to some extend, more expressive than relationship types:

  • As for relationship types, they can be recursive (or reflexive), and uses role names to clarify the roles of both parties.
  • As for relationship types they can have attributes: actually, a whole class can be connected to an association.
  • As for relationship types, they can express a cardinality constraint on the relation between classes. They are written as min .. max, with * for “no maximum”, and the following shorthands: * stands for 0..* and 1 stands for 1..1. An association with 1 on one side and * on the other (resp. 1 and 1, * and 1, * and *) is sometimes called “one-to-many” (resp., “one-to-one”, “many-to-one”, “many-to-many”). The notation in partially inverted w.r.t. ER diagrams:

Additionally, associations can be “extended”, and they are not the only kind of relationship that can be expressed between two classes.

  • As opposed to the relationship types, they can be given a direction, indicating that the user should be able to navigate them only in one direction, or in two (which is the default). This is used for security or privacy purposes.
  • As opposed to the relationship types, they can be qualified, implying that a class is not connected to the other class as a whole, but to one particular attribute, called the qualifier, or discriminator.
  • As opposed to the relationship types, they are part of a bigger collection of relationships. Other relationships include:

Qualified associations can be used for weak entities, but not only.

Class Diagram Relationships

Some of those subtleties depend on your need, and are subjective, but are important tool to design properly a database, and relieving the programmer from the burden of figuring out many details.

Exercises

Exercise 4.1

Name the three high-level models we will be learning about in this class (expand the acronyms).

Exercise 4.2

What could be the decomposition of an attribute used to store an email address? When could that be useful?

Exercise 4.3

What would be the benefit of having a composite attribute “Phone” made of two attributes (Number and Description) being multi-valued? Answer this question, and draw the resulting attribute.

Exercise 4.4

Draw the ER diagram for a “COMPUTER” entity that has one multivalued attribute “Operating_System”, a composite attribute “Devices” (decomposed into “Keyboard” and “Mouse”) and an “ID” key attribute.

Exercise 4.5

Draw the ER diagram for a “CELLPHONE” entity that has a composite attribute “Plan” (decomposed into “Carrier” and “Price”), an “MIN” (Mobile Identification Number) key attribute, and a multi-valued “App_Installed” attribute.

Exercise 4.6

Name one difference between a primary key in the relational model and a key attribute in the ER model.

Exercise 4.7

What is a derived attribute? Give two examples and justify them.

Exercise 4.8

Invent an entity type with at least one composite attribute and one atomic attribute, but no multi-valued attributes. Identify a possible key attribute and draw the entity type you obtained using the conventions we used in class.

Exercise 4.9

What is the degree of a relationship type?

Exercise 4.10

What is a self-referencing, or recursive, relationship type? Give two examples.

Exercise 4.11

What does it mean for a binary relationship type “Owner” between entity types “Person” and “Computer” to have a cardinality ratio of M : N?

Exercise 4.12

What are the two possible structural constraints on a relationship type?

Exercise 4.13

Draw the diagram for a “VideoGame” entity that would allow to store the name of the game, the supported platform(s) and the release date. Then, add a recursive relationship on that entity called “Is the sequel of” and specify all the constraints.

Exercise 4.14

Draw a diagram to represent a relationship type R between two entities types A and B such that:

  • An entity in A may or may not be in relationship R with an entity in B.
  • An entity in B must be in relationship R with an entity in A.
  • An entity in A can be in relationship R with at most one entity in B.
  • An entity in B can be in relationship R with any number of entities in A.
Exercise 4.15

Express the constraints represented in the following diagram in plain English.

Exercise 4.16

What does it mean for a binary relationship type “is the Chair of” between entity types “Professor” and “Department” to have a cardinality ratio of 1:N? Would it make sense to be have a total participation constraint on one side, and if yes, on which side?

Exercise 4.17

Express the constraints represented in the following diagram in plain English.

Exercise 4.18

For the following binary relationships, suggest cardinality ratios based on the common-sense meaning of the entity types.

Entity 1 Cardinality Ratio Entity 2
STUDENT : MAJOR
CAR : TAG
INSTRUCTOR : LECTURE
INSTRUCTOR : OFFICE
COMPUTER : OPERATING_SYSTEM
Exercise 4.19

Give an example of a binary relationship type of cardinality 1 : N.

Exercise 4.20

Give an example of a binary relationship type of cardinality N : 1 and draw the corresponding diagram (you do not have to include details on the participating entity types).

Exercise 4.21

Draw an ER diagram with a single entity type, with two stored attributes, and one derived attribute. In your answer, it should be clear that the value for the derived attribute can always be obtained from the value(s) of the other attribute(s).

Exercise 4.22

Draw an ER diagram expressing the total participation of an entity type “BURGER” in a binary relation “CONTAINS” with an entity type “INGREDIENT”. What would be the cardinality ratio of such a relation?

Exercise 4.23

Under what condition(s) can an attribute of a binary relationship type be migrated to become an attribute of one of the participating entity types?

Exercise 4.24

Consider the following diagram:

  1. Express the constraints represented in the diagram in plain English.
  2. Imagine there is a “FromIP” attribute on the “OPENED_BY” relationship that stores the IP used by the user to open the ticket. Could you migrate the attribute to one of the entity? Explain how you would do it, or why it is impossible.
Exercise 4.25

Consider the following diagram:

  1. Express the constraints about the maximums represented in the diagram in plain English.
  2. Briefly explain why there are no total participation constraints on this diagram.
Exercise 4.26

Suppose a “PRODUCES” relationship with an attribute “Amount” exists between a “PRODUCER” entity type and a “MOVIE” entity type, with ratio 1 : M. Migrate the “Amount” attribute to one of the entity types and draw the resulting diagram.

Exercise 4.27

Suppose a “MEMBERSHIP relationship with an attribute”Level” (e.g., “silver”, “platinium”, etc.) exists between a “PERSON” entity type and a “CLUB” entity type, with ratio M : 1. Migrate the “Level” attribute to one of the entity types and draw the resulting diagram.

Exercise 4.28

Assume with have three entity types, “Lecture Notes”, “Class” and “Professor.”

  1. Draw a diagram of a ternary relationship between the three entities.
  2. Draw a diagram that has two binary relationships from one of the three entities to the other two entities.
  3. Come up with a question that could be answered using one model but not the other from the previous steps (specify which relationship would be able to answer your question).

You can specify role names in your diagrams for added clarity, and remember to list all the constraints.

Exercise 4.29

Can we always replace a ternary relationship with three binary relationships? Give an example.

Exercise 4.30

What is the difference between an entity type and a weak entity type?

Exercise 4.31

What is a partial key?

Exercise 4.32

Why do weak entity type have a total participation constraint?

Exercise 4.33

Invent a weak entity type, its identifying (owner) entity type and the identifiying (or supporting) relationship. Both entities should have (partial) key, and each should have at least one composite attribute.

Exercise 4.34

Convert the following ER diagram into a relational model:

Exercise 4.35

Convert the following ER diagram into a relational model:

Exercise 4.36

What is insertion anomaly? Give an example.

Exercise 4.37

What is deletion anomaly? Is it a desirable feature?

Exercise 4.38

Why should we avoid attributes whose value will often be NULL? Can the usage of NULL be completely avoided?

Exercise 4.39

Consider the following relation:

PROF(S͟S͟N͟, Name, Department, Bike_brand)

Why is it a poor design to have a “Bike_brand” attribute in such a relation? How should we store this information?

Exercise 4.40

Consider the following relation:

STUDENT(S͟S͟N͟, Name, , Sibling_On_Campus)

Why is it a poor design to have a “Sibling_On_Campus” attribute in such a relation? How should we store this information?

Exercise 4.41

Consider the following relational database schema:

STUDENT(L͟o͟g͟i͟n͟, Name, , Major, Major_Head)
DEPARTMENT(C͟o͟d͟e͟, Name, Major_Head)

Assuming that “Major” is a foreign key referencing “DEPARTMENT.Code”, what is the problem with that schema? How could you address it?

Exercise 4.42

Why can we not infer a functional dependency automatically from a particular relation state?

Exercise 4.43

Consider the relation R(A,B,C,D,E,F) and the following functional dependencies:

  1. F → {D, C}, D → {B, E}, {B, E} → A
  2. {A, B} → {C, D}, {B, E} → F
  3. A → {C, D}, E → F, D → B

For each set of functional dependency, give a key for R. We want a key, so it has to be minimal.

Exercise 4.44

Consider the relation R(A,B,C,D,E,F) and the following functional dependencies:

A → {D, E}, D → {B, F}, {B, E} → A, {A, C} → {B, D, F}, A → F

Answer the following:

  1. How many candidate keys is there? List them.
  2. How many transitive dependencies can you find? Give them and justify them.
Exercise 4.45

What is a composite attribute in a ER diagram? Can a relational schema with composite attribute be in Second Normal Form?

Exercise 4.46

Consider the relation R(A,B,C,D) and answer the following:

  1. If {A, B} is the only key, is {A, B} → {C, D}, {B, C} → D a 2NF? List the nonprime attributes and justify.
  2. If {A, B, C} is the only key, is A → {B, D}, {A, B, C} → D a 2NF? List the nonprime attributes and justify.
Exercise 4.47

Consider the relation R(A,B,C,D,E,F) with candidate keys {A, B} and C. Remember that, in all generality, to be a prime attribute, you just need to be part of a possible candidate key. Answer the following:

  1. What are the prime attributes in R?
  2. Is {C, D} → E a fully functional dependency?
  3. Write a set of functional dependencies containing at least one transitive depency, and justify your answer.
Exercise 4.48

Consider the relation R(A,B,C,D,E) and the following functional dependencies:

  1. C → D, {C, B} → A, A → {B, C, D}, B → E
  2.  A → {C, D}, C → B, D → E, {E, C} → A
  3. {A, B} → D, D → {B, C}, E → C

For each one, give one candidate key for R.

Exercise 4.49

Consider the relation R(A,B,C,D,E) and answer the following:

  1. If {A, B} is the primary key, is B → E, C → D a 2NF? List the nonprime attributes and justify.
  2. If {A} is the primary key, is B → C, B → D a 2NF? List the nonprime attributes and justify.
Exercise 4.50

Consider the relation R(A,B,C,D,E,F), and let {B, D} be the primary key, and have additionnaly the functional dependencies {A, D} → E, C → F. This relation is not in 3NF, can you tell why?

Exercise 4.51

Consider the relation R(A,B,C,D) and answer the following:

  1. If A is the only key, is A → {B, C, D}, {A, B} → C, {B, C} → D a 3NF? List the nonprime attributes and justify.
  2. If B is the only key, is B → {A, C, D}, A → {C, D}, {A, C} → D a 3NF? List the nonprime attributes and justify.
Exercise 4.52

Consider the relation R(A,B,C,D,E) and the functional dependencies {A, B} → C, B → D, C → E. Answer the following:

  1. A by itself is not a primary key, but what is the only key that contains A?
  2. List the non-prime attributes.
  3. This relation is not in 2NF: what transformation can you operate to obtain a 2NF?
  4. One of the relation you obtained at the previous step is likely not to be in 3NF. Can you normalize it? If yes, how?
Exercise 4.53

What are the two different categories of UML diagram?

Exercise 4.54

Can a C++ developer working on Linux and a Java developer working on MacOS use the same class diagram as a basis to write their programs? Justify your answer.

Exercise 4.55

What kind of diagram should we use if we want to …

  1. describe the functional behavior of the system as seen by the user?
  2. capture the flow of messages in a software?
  3. represent the workflow of actions of an user?
Exercise 4.56

Name two reasons why one would want to use a UML class diagram over an ER diagram to represent a conceptual schema.

Exercise 4.57

Consider the following diagram:

Give the number of attributes for both classes, and suggest two operations for the class that does not have any. Discuss the multiplicities: why did the designer picked those values?

Exercise 4.58
Convert the following ER diagram to a UML class diagram.

Exercise 4.59

Briefly explain the difference between an aggregation and a composition association.

Exercise 4.60

How is generalization (or inheritance) represented in a UML class diagram? Why is such a concept useful?

Exercise 4.61

Convert the following ER diagram into a UML class diagram:

Exercise 4.62

Convert the following UML class diagram into an ER diagram:

Solution to Exercises

Solution 4.1

The three high-level models we will be learning about are the Unified Modeling Language, Entity Relationship, and Enhanced Entity–Relationship models.

Solution 4.2

A useful decomposition of an email address attribute could be: the username part before the @ sign, and the domain part afterwards (that could even be sub-divided between the domain name and its top-level domain). It might be useful to have statistics about the domains of the users or to sort the usernames by length, etc.

Solution 4.3

Having a “Phone” attribute being multi-valued would allow to store multiple phone numbers for the same entity. Typically, one would want to store a pair (Number, Description) for their office phone, their cell, etc. The resulting attribute would be drawn as follows:

Solution 4.4

Solution 4.5

Solution 4.6

There can be more than one key in the ER model, but it has to be made of a single attribute, whereas a primary key can be made of multiple attributes.

Solution 4.7

A derived attribute is an attribute whose value can be determined by the value of other attributes. For instance:
- The value of an “Age” attribute could be determined from the value of an “Date of birth” attribute and the current day.
- The value of a “State” attribute can be determined from the value of a “Zip code” attribute.
- The value of a “Body Mass Index” attribute could be calculated from the values of height and weight attributes.
- The value of an “Initials” attribute could be determined using the values of the “First Name”, “Middle Name”, and “Last Name” attributes.

Solution 4.8

Solution 4.9

The degree of a realationship type is the number of its participating entity types.

Solution 4.10

A self-referencing relationship type is where the same entity type participates more than once. On a SEATS entity type, it would be an attribute like “is to the left of” or on a PERSONS entity type, it would be and attribute like “is married to”.

Solution 4.11

The cardinality ratio on the binary relationship type “Owner” between the entity types “Person” and “Computer” means that a person can own multiple computers, and a computer can have multiple owners.

Solution 4.12

The two possible structural constraints on a relationship type are the cardinality ratio and participation constraints.

Solution 4.13

We would obtain the following diagram:

Note that the “M / N” part could be discussed: having 1 instead of M would mean that a videogame that is a sequel is the sequel of at most one videogame. This could make sense, but would forbid for instance to register “Battletoads & Double Dragon - The Ultimate Team” as the sequel of both Double Dragon and Battletoads. Having 1 instead of N would mean that every videogame has at most one sequel: it would prevent from registering both “Super Mario Land” and “Super Mario World” as the sequels of “Super Mario Bros. 3”.

However, note that all the participation constraints are partial: having a total participation constraint would mean (on the left side) that every game is a sequel, or (on the right side) that every game have at least one sequel, two statements that are obviously wrong.

Solution 4.14

We would obtain the following diagram:

Solution 4.15

A key opens only one door, and every key must open at least one door. A door can be opened by multiple keys, and some doors may not be opened by any key (think of doors that do not have a lock).

Solution 4.16

The binary relation type “is the Chair of” with a cardinality ratio of 1:N between entity types “Professor” and “Department” means that a department can have at most one professor as its chair, but that a professor can be the chair of multiple departments. It could make sense to require that every department has a chair, hence writing a double line between the Department entity and the “is the Chair of” relationship, but it would not make sense to have a total participation constraint on the side of the professor (which would mean that every professor has to be the chair of a department).

Solution 4.17

An operating system may be supported by many computers, but it is also possible that no computer supports it (think of an operating system in development, or developed for embeded devices). A computer must support at least one operating system and can support multiple operating systems.

Solution 4.18
Entity 1 Cardinality Ratio Entity 2 Explanation
STUDENT N : 1 MAJOR “A student has one major, but multiple students can have the same major”
CAR 1 : 1 TAG “A car has exactly one tag, a tag belongs to one particular car.”
INSTRUCTOR 1 : N LECTURE “An instructor can teach multiple lecture, but a lecture is taught by only one person.”
INSTRUCTOR 1 : N OFFICE “An instructor can have multiple office, but an office belongs to only one instructor”
COMPUTER M : N OPERATING_SYSTEM “A computer can have multiple operating system, the same operating system can be installed on more than one computer.”

Some of these choices are debatable (typically, almost any combination seems reasonable for the INSTRUCTOR : OFFICE relation).

Solution 4.19

A binary of relationship of SUPERVISOR as a recursive relationship on EMPLOYEE.

Solution 4.20

Solution 4.21

Solution 4.22

Solution 4.23

An attribute of a binary relationship type can be migrated to one of the participating entity types when the cardinality ratio is 1 : N, 1 : 1, or N : 1. It can be migrated “to the N side” or, if there is no N side, to either side. Note that for n-ary relationships, at least one ratio needs to be 1 for the attribute to be allowed to migrate (and “to the N side”, or, if there is no N side, to any side).

Solution 4.24
  1. A ticket must be opened by a exactly one user, and an user can open any number of tickets (including 0).
  2. We could migrate the “FromIP” attribute to the “TICKET” entity: the intuition is that while the IP adress of a user can evolve through time, the IP used at the time of creation of the ticket is unique to the ticket, and hence can become an attribute of the ticket.
Solution 4.25
  1. A citizen can be the president of at most one country during a given term. A country can have only one citizen as their president during a given term. A citizen can be the president of the same country over multiple terms.
  2. Some citizen will never be president, some country may not have presidents (think royalty), some terms may not relate to any presidency (e.g. “3rd century BC”).
Solution 4.26

We could have the following:

Solution 4.27

We could have the following:

Solution 4.28
  1. A possible example of ternary relationship is:

  2. One example of two binary relationships could be:

  3. A question like

    “Who wrote the lecture notes X?”

    could be answered with the binary relationships but not the ternary. Conversely, a question like

    “What are the lecture notes refered to by Prof. X in their class Y?”

    could not be answered using the binary relationships (since we do not know what classes are taught by Prof. X).

Solution 4.29

No, a ternary relationship cannot always be replaced by three binary relationship. For instance, if I have a “Travelling to” relationship between a “Person”, a “City” and a “Transport mode”, to represent the fact that a person is travelling to a city using a particular mode of transportation, there is no way I can convey the same information using binary relationships.

Solution 4.30

The weak entity type does not have a key attribute, it cannot be distinguised from the other weak entities based on a single attribute, for that we also need to know its relationship to some other entity type.

Solution 4.31

For a weak entity attribute, it is the attribute that can uniquely identify weak entites that are related to the same owner entity.

Solution 4.32

Otherwise, we could not identify entities in it without owner entity.

Solution 4.33

A possible solution is:

Note that the two composite attributes are “generic”, in the sense that you can re-use those examples easily.

Solution 4.34

A possible option is:

PERSON(SSN (PK), DOB, Stays_At (FK to PLACE.Address) ADDRESS(Address (PK), Rooms)  

Note that “Stays_At” could also be a separate relation, with two attributes, “Address” and “Person”, linked to respectively PLACE.Address and PERSON.SSN, and both being the primary key of the relation.

Solution 4.35

A possible option is:

EMPLOYEE(Shift, Name, SSN (PK)) RESERVATION(Id (PK, TakenBy (FK to EMPLOYEE.SSN), StartTime, EndTime, Date, CustomerFName (FK to CUSTOMER.Fname), CustomerLName (FK to CUSTOMER.Lname)) CUSTOMER(FName (PK), LName (PK), Phone)  

Note that to more faithfully represent the total participation constraints, one could add NOT NULL attributes to TakenBy, CustomerFName and CustomerLName in RESERVATION.

Solution 4.36

When you have to invent a primary key or add a lot of NULL value to be able to add a tuple. I want to add a room in my DB, but the only place where rooms are listed are as an attribute on a Instructor table, so I have to “fake” an instructor to add a room.

Solution 4.37

A delete anomaly exists when certain attributes are lost because of the deletion of other attributes. It is not desirable, since it can lead to the loss of information.

Solution 4.38

Because they waste space, they are ambiguous (N/A, or unknown, or not communicated?), and they make querries harder. No, it is necessary sometimes.

Solution 4.39

Because it will be NULL most of the time. In a separate relation, e.g. a “BIKE” relation, with two attributes, “Owner” and “Brand”, “Owner” being a foreign key referencing the SSN attribute of PROF.

Solution 4.40

Because it will be NULL most of the time, and because students could have more than one sibling on campus. In a separate relation, e.g. in a “EMERGENCY_CONTACT” relation, with two attributes, “Student” (refercing the SSN attribute of STUDENT), and “Contact”. If the emergency contacts are not related to the student, or if we want to preserve the fact that one student is a sibling to another, we can create another relation to store that information.

Solution 4.41

Major_Head will give update anomalies. By putting the Head of the department in the DEPARTMENT relation only, i.e., removing it from STUDENT.

Solution 4.42

Just because a coincidence exists (i.e., “in my data set, no android user is color-blind”) does not mean that it will always be true (i.e., “no color-blind person will ever use android”). Functional dependencies should come from a principled reasoning about the attributes, and not from the observation of the data.

Solution 4.43
  1. F
  2. {A, B, E}
  3. {A, E}
Solution 4.44
  1. Only one: {A, C},
  2. A → F by A → D, D → F.
Solution 4.45

A composite attribute is an attribute made of multiple attributes, like an “Address” attribute could be composed of the “sub”-attributes “Street”, “City”, “Zip” and “State. A relational schema needs a primary key and to have only atomic domains to be in first normal form, so, no, a relational schema with composite attributes can not be in second normal form.

Solution 4.46
  1. Yes. C and D are non prime, and they fully depend on {A, B}.
  2. No. D is the only non prime, and it depends only on A.
Solution 4.47
  1. A, B and C.
  2. No, because we can remove D,
  3. A → D, D → E and A → E
Solution 4.48
  1. {B, C}, A
  2. A, {C, E},
  3. {A, D, E}, {A, B, E}
Solution 4.49
  1. No. C, D, E, and E has a partial relation to B