Primary Keys
Adding a primary key:
ALTER TABLE STATE ADD PRIMARY KEY (Name);
Removing the primary key:
ALTER TABLE STATE DROP PRIMARY KEY;
Lecture Notes
August 8, 2022 (05:47:16 PM)
CAMPUS
SQL
SQL
SQL
codingSQL
statements for the COFFEE databaseAs of August 2022, the main author of those notes (Dr. Aubert) is not scheduled to teach CSCI 3410 - Database Systems in the forseeable future. As a result, those notes are archived.
Please, also note that pandoc-include-code and pandoc-numbering, required to compile those notes, are not compatible with the current version of pandoc (cf. the INSTALL.md instructions). As a result, compiling those notes will require increasing version tinkering.
These lecture notes are written in an elusive style: they are a support for the explanations that will be made at the board. Reading them before coming to the lecture will help you getting a sense of the next topic we will be discussing, but you may sometimes have trouble deciphering their … unique style.
On top of the notes, you will find in this document:
Any feedback is greatly appreciated. Please refer to https://spots.augusta.edu/caubert/db/ln/README.html#contributing for how to contribute to those notes. The syllabus is at https://spots.augusta.edu/caubert/db/, and the webpage for those notes is at https://spots.augusta.edu/caubert/db/ln/.
Please, refer to those notes using this entry (Aubert 2019):
@report{AubertCSCI3410-DatabaseSystems,
author={Aubert, Clément},
title={CSCI 3410 - Database Systems},
url={https://spots.augusta.edu/caubert/db/ln/},
urldate={2019-11-03},
year={2019},
institution={{School of Computer and Cyber Sciences, Augusta University}},
location={Augusta, Georgia, USA},
langid={en},
type={Lecture notes}
}
There are four way to access the code shared in those lecture notes:
For this latter aspect, note that some portion of code starts with a path in comment, and are followed by a link, like so:
HW_HelloWorld.sqlThis means that this code can be found at
and that you can click the link below the code directly to access it1.
The SQL
code frequently starts with
DROP SCHEMA IF EXISTS HW_NAME_OF_SCHEMA;
CREATE SCHEMA HW_NAME_OF_SCHEMA;
USE HW_NAME_OF_SCHEMA;
This parts starts by deleting the schema HW_NAME_OF_SCHEMA
if it exists, then create and use it: it allows the code to run independently of your installation. It needs to be used with care, though, since it would delete everything you have in the HW_NAME_OF_SCHEMA
schema before re-creating it, but empty.
Finally, the comments
-- start snippet something
and
-- end snippet something
can be ignored, as their are an artifice from pandoc-include-code to select which portion of the code to display in those notes.
A typical (meeting twice a week, ±17 weeks, ±30 classes) semester is divided as follows:
For information purposes, an indication like this:
marks the (usual) separation between two lectures.
To give you a sense of what you will be asked during the exams, quizzes and projects, or simply to practise, please find below the exams given previous semesters, in reverse chronological order. The quizzes are not indicated, but were generally a mix of up to five exercises and one problem from the relevant chapter(s).
Due to the Covid-19 pandemic, only one exam took place, and the final exam was taken remotely on D2L. A second project, more ambitious, was also asked from the students, and accounted for a large portion of their grade.
OR
operator, and Exercise 3.30)SQL
coding)SQL
statements for the COFFEE database)SQL
coding)SQL
statements for the COFFEE database)The source code for those notes is hosted at rocketgit, typeset in markdown, and then compiled using pandoc and multiple filters (pandoc-numbering, the citeproc library, pandoc-include-code). The drawings use various LaTeX packages, including PGF, TikZ, tikz-er2, pgf-umlcd and tikz-dependency. The help from the TeX - LaTeX Stack Exchange community greatly improved this document. The u͟n͟d͟e͟r͟l͟i͟n͟e͟3 text is obtained using YayText, the unicode symbols are searched in the “Unicode characters and corresponding LaTeX math mode commands”. Finally, the pdf
version of the document uses Linux Libertine fonts, the html
version uses Futura.
Those lecture notes were created under an Affordable Learning Georgia Mini-Grant for Ancillary Materials Creation and Revision (Proposal M71).
Those lecture notes have greatly benefited from the contributions of many students, included but not limited to Crystal Anderson, Bobby Mcmanus, Minh Nguyen and Poonam Veeral. Additionally, (Redacted), Mark Holcomb, Assya Sellak, Sydney Strong and Patrick Woolard helped smash some bugs in the tools used to produce this document.
Please refer to https://spots.augusta.edu/caubert/db/ln/README.html#authors-and-contributors for a detail of the contributions.
You can find at the end of this document the list of references, and some particular resources listed at the beginning of each chapter. Let me introduce some of them:
Those resources are listed as complements, but it is not require to read them to understand the content of those notes. (Watt and Eng 2014) –being available free of charge– is more descriptive than the current notes, and as such can constitutes a great complement. Unfortunately, it lacks some technical aspects, and the database program aspect is not discussed in detail.
This work is under Creative Commons Attribution 4.0 International License or later.
Some figures and resources are borrowed from other sources, in which case it is indicated clearly.
There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.
In theory, pretty much any programming language can be used to
But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.
In those lecture notes, we will introduce one of this tool–the SQL
programming language– and the theory underneath it–the relational model–. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL
offered by the NoSQL approach, as well as the limitations and highlights of both models.
A database (DB) is a collection of related data.
It has two components, the data (= information, can be anything, really) and the management (= logical organization) of the data, generally through a Database Management System.
A database
The mini-world is the part of the world, or universe, that will be represented in the database: as we can not represent the whole universe (every position of every atom at any given moment since the big-bang!), we must agree on what “slice” of it we should represent. Typically, a data-base designed to help in calculating students’ grades will include students’ names, transcript, classes taken, etc., but certainly not their height, favorite color or where they usually seat in class: altough all of this information is “part of the universe”, we will not need it and decide to exclude it from our data.
A DBMS has multiple components, as follows:
Note that
A DBMS contains a general purpose software that is used to
You can think of a tool to
Exactly like a program can have
a DBMS offers multiple (sub)tasks and can be interacted with different persons with different roles.
Role | Task |
---|---|
Client | Specify the business statement, the specifications |
DB Administrator | Install, configure, secure and maintain up-to-date the DBMS |
Designer | Lay out the global organization of the data |
Programmer | Implement the database, work on the programs that will interface with it |
User | Provide, search, and edit the data (usually) |
In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.
From the business statement to the usage, a project generally follows one of this path:
Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.
Let us consider the following:
STUDENT
Name | Student_number | Class | Major |
---|---|---|---|
Morgan | 18 | 2 | IT |
Bob | 17 | 1 | CS |
COURSE
Course_name | Course_number | Credit_hours | Department |
---|---|---|---|
Intro. to CS | 1301 | 4 | CS |
DB Systems | 3401 | 3 | CS |
Principles of Scripting and Automation | 2120 | 3 | AIST |
SECTION
Section_identifier | Course_num | Semster | Year | Instructor |
---|---|---|---|---|
2910 | 1301 | Fall | 2019 | Kate |
9230 | 2103 | Spring | 2020 | Todd |
GRADE_REPORT
Student_number | Section_identifier | Grade |
---|---|---|
17 | 2910 | A |
18 | 2910 | B |
PREREQUISITE
Course_number | Prerequisite_number |
---|---|
2120 | 1301 |
1302 | 1301 |
You can describe the structure as a collection of relations, and a collection of columns:
RELATIONS
Relation Name | Number of Columns |
---|---|
STUDENT | 4 |
COURSE | 4 |
SECCTION | 5 |
GRADE_REPORT | 3 |
PREREQUISITE | 2 |
COLUMNS
Column Name | Datatype | Belongs to relation |
---|---|---|
Name | String | STUDENT |
Student_number | Integer | STUDENT |
Class | String | STUDENT |
Major | String | STUDENT |
Course_name | String | COURSE |
Course_number | Integer | COURSE |
Credit_hours | Integer | COURSE |
Department | String | COURSE |
… | … | … |
Prerequisite_number | Integer | PREREQUISITE |
This organization will allow some interactions. For instance, we can obtain the answer to questions like
“What is the name of the course whose number is 1301?”,
“What courses is Kate teaching this semester?”,
“Does Bob meets the pre-requisite for 2910?”
Note that this last query is a bit different, as it forces us to look up information in multiple relations.
We should also be able to perform updates, removal, addition of records in an efficient way (using auxiliary files (indexes), optimization).
Finally, selection (for any operation) requires care: do we want all the records, some of them, exactly one?
Why are the files separated like that? Why do not we store the section with the course with the students? For multiple reasons:
In separating the datae, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.
There is a gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. When we will be discussing high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.
What is the difference between a database and the meta-data of the database?
Is a pile of trash a database? Why, or why not?
Define the word “miniworld”.
Expand the acronym “DBMS”.
Name two DBMS.
Name the four different kinds of action that can be performed on data.
Assign each of the following task to one of the “character” (administrator, client, etc.) we introduced:
Task | Assigned to |
---|---|
Install a DBMS on a server. | |
Sketch the schema so that the data will not be redundant. | |
Write client-side application that uses the DBMS API. | |
Establish the purpose of the database. |
List some of the tasks assigned to the Database Administrator.
Why do DBMS include concurrency control?
Do I have to change my DBMS if I want to change the structure of my data?
What is independence between program and data? Why does it matter?
Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) being stored in the same file? Why, or why not?
Which one comes first, the physical design, the conceptual design, or the logical design?
What is a virtual data? How can I access it?
The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.
No, because it lacks a logical structure.
The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.
Database Management System
Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. Are not valid “SQL”, “NoSQL”, “Relational Model”, or such: we are asking for the names of actual softwares!
The four actions are:
We can have something like:
Task | Assigned to |
---|---|
Install a DBMS on a server. | Administrator, IT service |
Sketch the schema so that the data will not be redundant. | Designer |
Write client-side application that uses the DBMS API. | Programmer, Developer |
Establish the purpose of the database. | Client, business owner |
The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.
DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. It is to avoid inconsistency to appear in the data.
Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model”, e.g., one of the NoSQL models, can require to change the DBMS.
The application should not be sensible to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the data will not require to change the programs.
If we were to store all the information about the classes in the student records, then we would have to store it as many time as its number of students! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.
The conceptual design.
It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not an information stored in the database.
CAMPUS
)Define a CAMPUS
database organized into three files as follows:
BUILDING
file storing the name and GPS coordinates of each building.ROOM
file storing the building, number and floor of each room.PROF
file storing the name, phone number, email and room number where the office is located for each professor.A database catalog is made of two part: a table containing the relations’ name and their number of columns, and a table containing the columns’ name, their data type, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, fig. 1.3) or (Elmasri and Navathe 2015, fig. 1.3). Write the database catalog corresponding to the CAMPUS
database.
Invent data for such a database, with two buildings, three rooms and two professors.
Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS
database, which is an up-to-date and accurate representation of its miniworld:
CAMPUS
)The database catalog should be similar to the following:
RELATIONS
Relation name | Number of columns |
---|---|
BUILDING |
3 |
ROOM |
3 |
PROF |
4 |
COLUMNS
Column name | Datatype | Belongs to relation |
---|---|---|
Building_Name | Character(30) | Building |
GPSLat | Decimal(9,6) | Building |
GPSLon | Decimal(9,6) | Building |
Building_Name | Character(30) | ROOM |
Room_Number | Integer(1) | ROOM |
Floor | Integer (1) | ROOM |
Prof_Name | Character (30) | PROF |
Phone | Integer (10) | PROF |
Character (30) | PROF |
|
Room_Number | Integer (1) | PROF |
For the data, you could have:
BUILDING
file, we could have:
(Allgood Hall, 33.47520, -82.02503) (Institut Galilé, 48.959001, 2.339999)
ROOM
file, we could have:
(Allgood Hall, 128, 1)
(Institut Galilé, 205, 3) (Allgood Hall, 228, 2)
PROF
file, we could have:
(Aubert, 839401, dae@ipn.net, 128) (Mazza, 938130, Dm@fai.net, 205)
If everything we knew about the campus came from that database, then
The relational data model (or relational database schema) is:
List_of_major
as an enumerated data type, instead of just String
), enforce some constraints (i.e., UNIQUE
, to force all the values to be different), or even have a default value.NULL
value.NULL
is N/A, unknown, unavailable (or withheld).We now study constraints on the tuples. There are constraints on the scheme, for instance, “a relation cannot have two attributes with the same name”, but we studied those already. The goal of those constraints is to maintain the validity of the relations, and to enforce particular connexions between relations.
Those are part of the definition of the relational model and are independent of the particular relation we are looking at.
Those constraints are parts of the schema.
NOT NULL
, UNIQUE
).NULL
5.Those last two constraints will be studied in the next section.
Constraints that cannot be expressed in the schema, and hence must be enforced by
Examples: “the age of an employee must be greater than 16”, “this year’s salary increase must be more than last year’s”.
Since we can not have two identical tuples in the same relation, there must be a subset of values that distinguish them. We study the corresponding subset of attributes.
Let us consider the following example:
A | B | C | D |
---|---|---|---|
Yellow | Square | 10 | (5, 3) |
Blue | Rectangle | 10 | (3, 9) |
Blue | Circle | 9 | (4, 6) |
and the following sets of attributes:
{A, B, C, D} | {A} | {B, C} | {D} | |
---|---|---|---|---|
Superkey ? | ✔ | ✘ | ✔ | ✔ |
Key ? | ✘ | ✘ | ✘ | ✔ |
Note that here we “retro-fit” those definitions, in database design, they come first (i.e., you define what attributes should always distinguish between tuples before populating your database). We are making the assumption that the data pre-exist to the specification to make the concept clearer.
A foreign key (FK) is a set of attributes whose values must match the value in a tuple in another, pre-defined relation. Formally, the set of attributes FK in the relation schema R1 is a foreign key of R1 (“referencing relation”) that references R2 (“referenced relation”) if
NULL
If there is a foreign key from R1 to R2, then we say that there is a referential integrity constraint from R1 to R2. We draw it with an arrow from the FK to the PK. Note that it is possible that R1 = R2.
NULL
. Note also that all the values must be different, as the same value cannot occur twice as the primary key of tuples: we don’t want to enter the same VIN twice, that would mean we are registering a car that was already registered in our database!NULL
. Furthermore, their pair must be different from all the other values. Stated differently, you can have <GA, 1234>
, <GA, 0000>
and <NC, 1234>
as values for the <State, Licence-num> pair, even if they have one element in common, what is forbidden is to have both element in common (i.e., you cannot have <GA, 1234>
twice). If both elements were common, that would mean that we are registering a driver that was already in the database.NULL
(which it could be), then it has to be a value that occurs as the VIN value of some tuple in the CAR relation. For the Insured-Driver-State and Insured-Driver-Licence-Num, the situation is similar: they must either both be NULL
, or be values that occurs paired together as the values for State and Licence-Num in a tuple in the CAR relationship. If e.g. Insured-Car was containing the VIN of a car not in the CAR relation, that would mean we are trying to insure a car that is “not known” from the database’s perspective, something we certainly want to avoid.The operations you can perform on your data are of two kinds: retrievals and updates.
They are two constraints for updates:
A transaction is a series of retrievals and updates performed by an application program, that leaves the database in a consistent state.
In the following, we give examples of insertion, deletion and update that could be performed, as well as how they could lead a database to become inconsistent. The annotations (1.), (2.) and (3.) refer to the “remedies”, discussed afterward.
Insert <109920, Honda, Accord, 2012> into CAR
How things can go wrong:
NULL
for any value of the attributes of the primary key (1.)Delete the DRIVER tuple with State = GA and Licence_number = 123
How things can go wrong:
Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges
How things can go wrong:
NULL
for the any value of the attributes of the primary key (1.)When the operation leads the database to become inconsistent, you can either:
NULL
, the corresponding value(s).What are the meta-data and the data called in the relational model?
Connect the dots:
Row • | • Attribute | |
Column header • | • Tuple | |
Table • | • Relation |
What do we call the number of attributes in a relation?
At the logical level, does the order of the tuples in a relation matter?
What is the difference between a database schema and a database state?
What should we put as a value in an attribute if its value is unknown?
What, if any, is the difference between a superkey, a key, and a primary key?
Name the two kinds of integrity that must be respected by the tuples in a relation.
What is entity integrity? Why is it useful?
Are we violating an integrity constraint if we try to set the value of an attribute that is part of a primary key to NULL
? If yes, which one?
If in a relation R1, an attribute A1 is a foreign key referencing an attribute A2 in a relation R2, what does this implies about A2?
Give three examples of operations.
What is the difference between an operation and a transaction?
Consider the following two relations:
COMPUTER(Owner, RAM, Year, Brand)
OS(Name, Version, Architecture)
For each, give
Give three different ways to deal with operations whose execution in isolation would result in the violation of one of the constraint.
Define what is the domain constraint.
Circle the correct statements:
Consider the following three relations:
For each relation, answer the following:
Consider the following three relations
What are the foreign keys in the ASSIGNED-TO relation? What are they refering?
In the ASSIGNED-TO relation, explain why the Date attribute is part of the primary key. What would happen if it was not?
Assuming the database is empty, are the following instructions valid? If not, what integrity constraint are they violating?
Insert <'AM-356', 'Surfliner', 2012> into TRAIN
Insert <NULL, 'Graham Palmer', 'Senior'> into CONDUCTOR
Insert <'XB-124', 'GPalmer', '02/04/2018'> into ASSIGNED-TO
Insert <'BTed, 'Bobby Ted', 'Senior'> and <'BTed', 'Bobby Ted Jr.', 'Junior'> into CONDUCTOR
Consider the following relation schema and state:
A | B | C | D |
---|---|---|---|
2 | Blue | Austin | true |
1 | Yellow | Paris | true |
1 | Purple | Pisa | false |
2 | Yellow | Augusta | true |
Assuming that this is all the data we will ever have, discuss whenever {A, B, C, D}, {A, B} and {B} are superkeys and/or keys.
Consider the following relation and possible state. Assuming that this is all the data we will ever have, give two superkeys, and one key, for this relation.
A | B | C | D |
---|---|---|---|
1 | Austin | true |
Shelly |
1 | Paris | true |
Cheryl |
3 | Pisa | false |
Sheila |
1 | Augusta | true |
Ash |
1 | Pisa | true |
Linda |
Consider the following relation and possible state. Assuming that this is all the data we will ever have, give three superkeys for this relation, and, for each of them, indicate if they are a key as well.
A | B | C | D |
---|---|---|---|
1 | A | Austin | true |
2 | B | Paris | true |
1 | C | Pisa | false |
2 | C | Augusta | true |
1 | B | Augusta | true |
Consider the following two relations:
INSERT
and one UPDATE
instruction. Both should violate the integrity of your database.Consider the following two relations:
The meta-data is called the schema, and the data is called the relation state. You can refer to the diagram we studied at the beginnig of the Chapter for a reminder.
Row is Tuple, Column header is Attribute, Table is Relation.
The degree, or arity, of the relation.
No, it is a set.
The schema is the organization of the database (the meta-data), while the state is the state is the content of the database (the data).
NULL
A superkey is a subset of attributes such that no two tuples have the same combination of values for all those attributes. A key is a minimal superkey, i.e., a superkey from which we cannot remove any attribute without losing the uniqueness constraint. The primary key is one of the candidate key, i.e., the key that was chosen.
Referential integrity and entity integrity.
Entity integrity ensures that each row of a table has a unique and non-null primary key value. It allows to make sure that every tuple is different from the others, and helps to “pick” elements in the database.
Yes, the entity integrity constraint.
Then we know that A2 is the primary key of R2, and that A1 and A2 have the same domain.
Reading from the database, performing UPDATE
or DELETE
operations.
An operation is an “atomic action” that can be performed on the database (adding an element, updating a value, removing an element, etc.). A transaction is a series of such operations, and the assumption is that, even if it can be made of operations that, taken individually, could violate a constraint, the overall transaction will leave the database in a consistent state.
An operation whose execution in isolation would result in the violation of a constraint can either a) be “restricted” (i.e., not executed), b) result in a propagation (i.e., the tuples that would violate a constraint are updated or deleted accordingly), or c) result in some values in tuples that would violate a constraint to be set to a default value, or the NULL
value (this last option works only if the constraint violated is the referential entity constraint).
The requirement that each tuple must have for an attribute A an atomic value from the domain dom(A), or NULL
.
“Every key is a superkey.”, “Every primary key is a key.” and “Every superkey with one element is a key.” are correct statements.
To answer 1 and 2, the diagram would become:
For the last question, the answer is yes: based on the ISSN of the book, we can retrieve the author of the book. Hence, knowing which book was awarded which year, by looking in the GAINED-AWARD table, gives us the answer to that question.
NULL
can be given as a value to an attribute that is part of the PK.'XB-124
and 'GPalmer'
are not values in TRAIN.Ref
and CONDUCTOR.CompanyID
.INSERT <"A.H.", NULL>
would violate the requirement not to have two tuples with the same value for the attributes that constitute the primary key in the BUILDING relation. UPDATE ROOM with CODE = 12 to Building = "G.C.C."
would create an entry referencing a name in the BUILDING relation that does not exist.
Consider the relation representing classes taught in a university:
CLASS(Major, Number, Section, Instructor, Term, Year, Time, Weekdays, Room)
The goal is to be able to have multiple offerings (classes) of courses over several semesters. Here are some examples of values for the attributes:
Attribute | Possible Value |
---|---|
Major | CSCI, AIST, CYBER, HIST, … |
Number | 1301, 3401, 1201, … |
Section | A, B, C, … |
Instructor | John Smith, Sophie Adams, … |
Term | Spring, Fall, … |
Year | 1990, 2010, … |
Time | 1400, 1230, 0900, … |
Weekdays | M, MW, MWF, … |
Room | UH 120, GCC 3014, … |
List three possible candidate keys and describe under what conditions each candidate key would be valid.
A cinema company wants you to design a relational model for the following set-up:
Propose a relational model for the following situation:
Propose a relational model for the following situation:
We want to design a relational model for an auction website. Members (that can be buyers, sellers, both or neither) can participate in the sale of items.
When creating your schema, do not add any new information, and try as much as possible to avoid relations that will create redundant data and NULL
entries. Note that we should be able to uniquely determine the member account linked to the seller account, and similarly for buyers accounts. Furthermore, members can have at most one buyer and one seller account.
We want to design a relational model for an animal shelter, with three goals in mind: to keep track of the pets currently sheltered, of the veterinarian for each type of pet, and of each pet’s favorite toy (needed during a visit to the veterinarian!).
Follow the specification below:
When creating your schema (that you can draw at the back of previous page), do not add any new information (except possibly “id” attributes), and try as much as possible to avoid relations that will create redundant data and NULL
entries. Identify the primary key for each relation that you create. When you are done, answer the true / false question below.
With your model … | Yes | No |
---|---|---|
…it is possible to determine which pet don’t have a favorite toy. | ||
…it is possible to determine what is the average stay in the shelter. | ||
…it is possible to determine if a pet’s favorite toy is best suited for their type. | ||
…it is possible for multiple types of animal to have the same veterinarian. | ||
…it is possible for multiple veterinarians to be attributed to the same type. |
We discuss four possible choices:
All in all, {Major, Number, Term, Year, Section} seems like the safest choice.
A possible solution is:
Be careful: saying that a bill has a unique sponsor does not imply that a the sponsor is a good primary key for the bills: a house member could very well be the sponsor of multiple bills! It just implies that a single attribute is enough to hold the name of the sponsor.
For simplicity, we added an ID
to our MEMBER
and BILL
relations. Note that having a “role” in the MEMBER
relation to store the information about speaker, etc., would be extremely inefficient, since we would add an attribute to the ~435 members that would be NULL
in ~430 of them.
A possible solution follows. The part that is the hardest to accomodate is the fact that a course can have multiple codes. We are reading here “cross-listed” as “a course that is offered under more than one departmental heading and can receive different codes (e.g., CSCI XXXX and AIST YYYY)”.
A possible solution follows.
In this model,
…it is possible to determine which pet don’t have a favorite toy.
…it is not possible to determine what is the average stay in the shelter, because their exit date is not stored.
…it is possible to determine if a pet’s favorite toy is best suited for their type.
…it is possible for multiple types of animal to have the same veterinarian, as the same value for “Veterinarian” could occur in multiple tuples in the TYPE relation. If both “Veterinarian” and “Name” were parts of the primary key, then that would not be the case.
…it is not possible for multiple veterinarians to be attributed to the same type, as the name of the type is the primary key in the TYPE relation.
SQL
, but none of its implementation.This chapter will be “code-driven”: the code will illustrate and help you understand some concepts. You may want to have a look at the “Setting Up Your Work Environment” Section as early as possible in this lecture. On top of being a step-by-step guide to install and configure a relational database managment system, it contains a list of useful links.
“Common” / Relational | SQL |
---|---|
“Set of databases” | Catalog (named collection of schema)7 |
“Database” | Schema |
Relation | Table |
Tuple | Row |
Attribute | Column, or Field |
A schema is made of
Type and domains are two different things in some implementations, cf. for instance PostgreSQL, where a domain is defined to be essentially a datatype with constraint.8
SQL
is a programming language: it has a strict syntax, sometimes cryptic error messages, it evolves, etc. Some of its salient aspects are:
SQL
is “kind of” case-insensitive9, does not care about spaces and new lines.--
, multi-line comments uses /* …*/
.;
.The following is an adaptation of w3resource.com, the canonical source being MySQL’s documentation:
INTEGER
(or its short-hand notation INT
) or SMALLINT
.FLOAT
and DOUBLE
(or its synonym, REAL
). MySQL also allows the syntax FLOAT(M,D)
or REAL(M,D)
, where the values can be stored up to M
digits in total where D
represents the decimal point.DECIMAL(10, 2)
(or its synonym in MySQL NUMERIC
).CHAR
and VARCHAR
: the length (resp. maximal length) of the CHAR
(resp. VARCHAR
) has to be declared, and CHAR
are right-padded with spaces to the specified length. Historically, 255 was the size used, because it is the largest number of characters that can be counted with an 8-bit number, but, whenever possible, the “right size” should be used.BIT(1)
, and a boolean using BOOLEAN
(or BOOL
, both actually being aliases for TINYINT(1)
).DATE
, TIME
, DATETIME
and TIMESTAMP
(which convert the current day / time to from the current time zone to UTC).There are many other datatypes, but they really depends on the particular implementation, so we will not consider them too much.
/* code/sql/HW_Faculty.sql */
-- We first drop the schema if it already exists:
DROP SCHEMA IF EXISTS HW_Faculty;
-- Then we create the schema:
CREATE SCHEMA HW_Faculty;
/*
Or we could have use the syntax:
CREATE DATABASE HW_FACUTLY;
*/
-- Now, let us create a table in it:
CREATE TABLE HW_Faculty.PROF (
Fname VARCHAR(15),
/*
No String!
The value "15" vas picked randomly, any value below 255 would
more or less do the same. Note that declaring extremely large
values without using them can impact the performance of
your database, cf. for instance https://dba.stackexchange.com/a/162117/
*/
Room INT,
/*
shorthand for INTEGER, are also available: SMALLINT, FLOAT, REAL, DEC
The "REAL" datatype is like the "DOUBLE" datatype of C# (they are actually synonyms in SQL):
more precise than the "FLOAT" datatype, but not as exact as the "NUMERIC" datatype.
cf. https://dev.mysql.com/doc/refman/8.0/en/numeric-types.html
*/
Title CHAR(3),
-- fixed-length string, padded with blanks if needed
Tenured BIT(1),
Nice BOOLEAN,
-- True / False (= 0) / Unknown
Hiring DATE,
/*
The DATE is always supposed to be entered in a YEAR/MONTH/DAY variation.
To tune the way it will be displayed, you can use the "DATE_FORMAT" function
(cf. https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format),
but you can enter those values only using the "standard" literals
(cf. https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html )
*/
Last_seen TIME,
FavoriteFruit ENUM ('apple', 'orange', 'pear'),
PRIMARY KEY (Fname, Hiring)
);
/*
Or, instead of using the fully qualified name HW_Faculty.PROF,
we could have done:
USE HW_Faculty;
CREATE TABLE PROF(…)
*/
-- Let us use this schema, from now on.
USE HW_Faculty;
-- Let us insert some "Dummy" value in our table:
INSERT INTO PROF
VALUES (
"Clément", -- Or 'Clément'.
290,
'PhD',
0,
NULL,
'19940101', -- Or '940101', '1994-01-01', '94/01/01'
'090500', -- Or '09:05:00', '9:05:0', '9:5:0', '090500'
-- Note also the existence of DATETIME, with 'YYYY-MM-DD
-- HH:MM:SS'
'Apple' -- This is not case-sensitive, oddly enough.
);
The following commands are particularly useful. They allow you to get a sense of the current state of your databases.
In the following, <SchemaName>
should be substituted with an actual schema name.
SHOW SCHEMAS; -- List the schemas.
SHOW TABLES; -- List the tables in a schema.
DROP SCHEMA <SchemaName>; -- "Drop" (erase) SchemaName.
You can also use the variation
DROP SCHEMA IF EXISTS <SchemaName>;
that will not issue an error if <SchemaName>
does not exist.
In the following, <TableName>
should be substituted with an actual table name.
SHOW CREATE TABLE <TableName>-- Gives the command to "re-construct" TableName.
DESCRIBE <TableName>; -- Show the structure of TableName.
DROP TABLE <TableName>; -- "Drop" (erase) TableName.
Note that if the table <TableName>
you are trying to erase is referenced by other tables through foreign keys, you will obtain an error
1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails ERROR
you must delete first the table containing the foreign key, as by default this operation is restricted.
If you want to erase a table if it exists, you can use the variation
DROP TABLE IF EXISTS <TableName>;
that will not issue an error if <TableName>
does not exist.
SELECT * FROM <TableName> -- List all the rows in TableName.
SHOW WARNINGS; -- Show the content of the latest warning issued.
There are six different kind of constraints that one can add to an attribute:
NOT NULL
UNIQUE
DEFAULT
CHECK
We already know the first two from the relational model. The other four are new, and could not be described in this model.
We will review them below, and show how they can be specified at the time the table is declared, or added and removed later. For more in-depth examples, you can refer to https://www.w3resource.com/mysql/creating-table-advance/constraint.php.
Note that all of them but DEFAULT
are indeed, constraints, as they prevent the user from inserting some data (i.e. you can not insert NULL
if the attribute has the constraint NOT NULL
). DEFAULT
is not a constraint in that sense, as it does not prevent some data from being inserted, but it is called a constraint nevertheless. We will see another example of such “helper” qualification with AUTO-INCREMENT
.
We will now see how to declare those constraints when we create the table (except for the foreign key, which we save for later).
/* code/sql/HW_ConstraintsPart1.sql */
DROP SCHEMA IF EXISTS HW_ConstraintsPart1;
CREATE SCHEMA HW_ConstraintsPart1;
USE HW_ConstraintsPart1;
CREATE TABLE HURRICANE (
Name VARCHAR(25) PRIMARY KEY,
WindSpeed INT DEFAULT 76 CHECK (WindSpeed > 74 AND
WindSpeed < 500),
-- 75mph is the minimum to be considered as a hurricane
-- cf. https://www.hwn.org/resources/bws.html
Above VARCHAR(25)
);
CREATE TABLE STATE (
Name VARCHAR(25) UNIQUE,
Postal_abbr CHAR(2) NOT NULL
);
If we wanted to combine multiple constraints, we could10, but we would have to follow the order described at https://dev.mysql.com/doc/refman/8.0/en/create-table.html, which is NOT NULL
, DEFAULT
, AUTO_INCREMENT
, UNIQUE
, PRIMARY KEY
, CHECK
(even if, in practise, derivation from this order is oftentimes accepted by DBMSes).
MySQL
can output a description of those tables for us:
> DESCRIBE HURRICANE;
MariaDB [HW_ConstraintsPart1]+-----------+-------------+------+-----+---------+-------+
Field | Type | Null | Key | Default | Extra |
| +-----------+-------------+------+-----+---------+-------+
varchar(25) | NO | PRI | NULL | |
| Name | int(11) | YES | | 76 | |
| WindSpeed | varchar(25) | YES | | NULL | |
| Above | +-----------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
> DESCRIBE STATE;
MariaDB [HW_ConstraintsPart1]+-------------+-------------+------+-----+---------+-------+
Field | Type | Null | Key | Default | Extra |
| +-------------+-------------+------+-----+---------+-------+
varchar(25) | NO | PRI | NULL | |
| Name | char(2) | NO | UNI | NULL | |
| Postal_abbr | +-------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Note that more than one attribute can be the primary key, in which case the syntax needs to be something like the following:
/* code/sql/HW_PKtest.sql */
DROP SCHEMA IF EXISTS HW_PKtest;
CREATE SCHEMA HW_PKtest;
USE HW_PKtest;
CREATE TABLE TEST (
A INT,
B INT,
PRIMARY KEY (A, B)
);
Note that in this case, a statement like
INSERT INTO TEST VALUE (1, NULL);
would result in an error: all the values that are part of the primary key needs to be non-NULL
.
For the UNIQUE
constraint, note that NULL
can be inserted: the rationale is that all the values need to be different from one another or NULL
.
A couple of comments about the CHECK
constraint:
WindSpeed INT CHECK (WindSpeed > 74 AND WindSpeed < 500)
would have been parsed but would not have any effect, cf. https://mariadb.com/kb/en/constraint/#check-constraints. Since MariaDB 10.2.1, the CHECK
constraint are enforced.CHECK
constraint, with a command likeINSERT INTO HURRICANE VALUES ("Test1", 12, NULL);
then the insertion would not take place, and the system would issue an error message:
4025 (23000): CONSTRAINT `HURRICANE.WindSpeed` failed for `HW_ConstraintsPart1]>`.`HURRICANE` ERROR
To use the DEFAULT
value, use
INSERT INTO HURRICANE VALUES ("Test2", DEFAULT, NULL);
Note that, by default, the DEFAULT
value is NULL
, regardless of the datatype. You can experiment it by running the following code:
Let us know pretend that we want to edit some attributes, by either adding or removing constraints. SQL
’s syntax is a bit inconsistent on this topic, because it treats the constraints as being of different natures.
Adding a primary key:
ALTER TABLE STATE ADD PRIMARY KEY (Name);
Removing the primary key:
ALTER TABLE STATE DROP PRIMARY KEY;
UNIQUE
ConstraintAdding a UNIQUE
constraint:
ALTER TABLE STATE ADD UNIQUE (Postal_abbr);
Removing a UNIQUE constraint:
ALTER TABLE STATE DROP INDEX Postal_abbr;
Note the difference between adding and removing the UNIQUE
constraint: the parenthesis around (Postal_abbr)
are mandatory when adding the constraint, but would cause an error when removing it!
NOT NULL
ConstraintAdding the NOT NULL
constraint:
ALTER TABLE STATE MODIFY Postal_abbr CHAR(2) NOT NULL;
Removing the NOT NULL
constraint:
ALTER TABLE STATE MODIFY Postal_abbr CHAR(2);
The syntax of NOT NULL
comes from the fact that this constraint is taken to be part of the datatype.
Changing the default value:
ALTER TABLE HURRICANE ALTER COLUMN WindSpeed SET DEFAULT 74;
Removing the default value:
ALTER TABLE HURRICANE ALTER COLUMN WindSpeed DROP DEFAULT;
Note that if you change the default value, it does not change the values you inserted retro-actively. To resume on our previous example, the values inserted with DEFAULT
as a value would still be NULL
even after executing the following instruction:
Adding a foreign key constraint:
ALTER TABLE HURRICANE ADD FOREIGN KEY (Above) REFERENCES STATE(Name);
Removing a foreign key constraint is out of the scope of this lecture. If you are curious, you can have a look at https://www.w3schools.com/sql/sql_foreignkey.asp: dropping a foreign key constraint requires your constraint to have a name, something we did not introduce.
Two important remarks:
Refer to Problem 3.4 (Constraints on foreign keys) for a slightly more accurate picture of the constraints related to the creation of foreign keys. Note that a foreign key could be declared at the time of creation of the table as well, using the syntax we will introduce below.
Let us test our constraints:
INSERT INTO STATE VALUES('Georgia', 'GA');
INSERT INTO STATE VALUES('Texas', 'TX');
INSERT INTO STATE VALUES('FLORIDA', 'FL');
UPDATE STATE SET Name = 'Florida'
WHERE Postal_abbr = 'FL';
-- There's an error with the following request. Why?
INSERT INTO HURRICANE VALUES('Irma', 150, 'FL');
/*
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_ConstraintsPart1`.`HURRICANE`, CONSTRAINT `HURRICANE_ibfk_1` FOREIGN KEY (`Above`) REFERENCES `STATE` (`Name`))
*/
INSERT INTO HURRICANE VALUES('Harvey', DEFAULT, 'Texas');
INSERT INTO HURRICANE VALUES('Irma', 150, 'Florida');
DELETE FROM HURRICANE
WHERE Name = 'Irma';
INSERT INTO HURRICANE VALUES('Irma', 150, 'Georgia');
UPDATE HURRICANE SET Above = 'Georgia'
WHERE Name = 'Irma';
/*
MariaDB [HW_ConstraintsPart1]> SELECT * FROM HURRICANE;
+--------+-----------+---------+
| Name | WindSpeed | Above |
+--------+-----------+---------+
| Harvey | 74 | Texas |
| Irma | 150 | Georgia |
+--------+-----------+---------+
*/
-- There's an error with the following request. Why?
UPDATE HURRICANE SET Above = 'North Carolina'
WHERE Name = 'Irma';
-- Let's patch it, by adding North Carolina to our STATE table.
INSERT INTO STATE VALUES('North Carolina', 'NC');
UPDATE HURRICANE SET Above = 'North Carolina'
WHERE Name = 'Irma';
Let us come back more specifically to foreign key.
In the example below, we introduce the foreign key update and delete rules. We also introduce, passing by, the enumerated data type, and how to edit it.
CREATE TABLE STORM (
NAME VARCHAR(25) PRIMARY KEY,
Kind ENUM ("Tropical
Storm", "Hurricane"),
WindSpeed INT,
Creation DATE
);
-- We can change the enumerated datatype:
ALTER TABLE STORM MODIFY Kind ENUM ("Tropical Storm",
"Hurricane", "Typhoon");
CREATE TABLE STATE (
NAME VARCHAR(25) UNIQUE,
Postal_abbr CHAR(2) PRIMARY KEY,
Affected_by VARCHAR(25),
FOREIGN KEY (Affected_by) REFERENCES STORM (NAME) ON
DELETE SET NULL ON UPDATE CASCADE
);
Note that we can “inline” the foreign key constraint like we “inlined” the primary key constraint (cf. https://stackoverflow.com/q/24313143/), but that it will not be enforced!
Let us now illustrate this table by introducing some data in it:
INSERT INTO STORM
VALUES (
"Harvey",
"Hurricane",
130,
"2017-08-17");
-- In the following, the entry gets created, but date is
-- "corrected" to "2017-17-08"!
-- INSERT INTO STORM
-- VALUES ("Dummy", "Hurricane", 120,
-- "2017-17-08");
-- The error message returned is
-- ERROR 1292 (22007) at line 34:
-- Incorrect
-- date
-- value:
-- "2017-17-08" for column
-- `HW_STORM`.`STORM`.`Creation`
-- at
-- row 1
-- In the following, we explicitely use
-- "DATE",
-- and
-- since
-- the date is incorrect, nothing gets
-- inserted.
-- INSERT INTO STORM
-- VALUES ("Dummy2", "Hurricane", 120,
-- DATE
-- "2017-17-08");
-- ERROR 1525 (HY000) at line 40:
-- Incorrect
-- DATE
-- value:
-- "2017-17-08"
-- The next one sets NULL for DATE.
INSERT INTO STORM
VALUES (
"Irma",
"Tropical Storm",
102,
DEFAULT);
MySQL will always notify you if there is an error in a date attribute when you use the DATE
prefix.
INSERT INTO STATE
VALUES (
"Georgia",
"GA",
NULL);
INSERT INTO STATE
VALUES (
"Texas",
"TX",
NULL);
INSERT INTO STATE
VALUES (
"Florida",
"FL",
NULL);
-- This instruction is not using the primary key, is that a
-- problem?
UPDATE
STATE
SET Affected_by = "Harvey"
WHERE Name = "Georgia";
UPDATE
STORM
SET Name = "Harley"
WHERE Name = "Harvey";
DELETE FROM STORM
WHERE Name = "Harley";
We will see in the “Reverse-Engineering” section why this schema is poorly designed, but for now, let’s focus on the foreign keys and their restrictions.
The following is a code-driven explanation of the foreign key update and delete rules (or “restrictions”). It is intended to make you understand the default behavior of foreig keys, and to understand how the system reacts to the possible restrictions.
CREATE TABLE F_Key(
VARCHAR(25) PRIMARY KEY
Attribute
);
CREATE TABLE Table_default(
VARCHAR(25) PRIMARY KEY,
Attribute1 VARCHAR(25),
Attribute2 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
);
-- By default, this foreign key will restrict.
CREATE TABLE Table_restrict(
VARCHAR(25) PRIMARY KEY,
Attribute1 VARCHAR(25),
Attribute2 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
ON DELETE RESTRICT
ON UPDATE RESTRICT
);
CREATE TABLE Table_cascade(
VARCHAR(25) PRIMARY KEY,
Attribute1 VARCHAR(25),
Attribute2 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
ON DELETE CASCADE
ON UPDATE CASCADE
);
CREATE TABLE Table_set_null(
VARCHAR(25) PRIMARY KEY,
Attribute1 VARCHAR(25),
Attribute2 FOREIGN KEY (Attribute2) REFERENCES F_Key(Attribute)
ON DELETE SET NULL
ON UPDATE SET NULL
);
/*
* You might encounter a
* ON UPDATE SET DEFAULT
* but this reference option (cf. https://mariadb.com/kb/en/library/foreign-keys/ )
* worked only with a particular engine ( https://mariadb.com/kb/en/library/about-pbxt/ )
* and will not be treated here.
*/
INSERT INTO F_Key VALUES('First Test');
INSERT INTO Table_default VALUES('Default', 'First Test');
INSERT INTO Table_restrict VALUES('Restrict', 'First Test');
INSERT INTO Table_cascade VALUES('Cascade', 'First Test');
INSERT INTO Table_set_null VALUES('Set null', 'First Test');
SELECT * FROM Table_default;
SELECT * FROM Table_restrict;
SELECT * FROM Table_cascade;
SELECT * FROM Table_set_null;
-- The following will fail because of the Table_default table:
UPDATE F_Key SET Attribute = 'After Update'
WHERE Attribute = 'First Test';
DELETE FROM F_Key
WHERE Attribute = 'First Test';
-- Let us drop this table, and try again.
DROP TABLE Table_default;
-- The following fails too, this time because of the Table_restrict table:
UPDATE F_Key SET Attribute = 'After Update'
WHERE Attribute = 'First Test';
DELETE FROM F_Key
WHERE Attribute = 'First Test';
-- Let us drop this table, and try again.
DROP TABLE Table_restrict;
-- Let's try again:
UPDATE F_Key SET Attribute = 'After Update' WHERE Attribute = 'First Test';
-- And let's print the situation after this update:
SELECT * FROM Table_cascade;
SELECT * FROM Table_set_null;
/*
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade;
+------------+--------------+
| Attribute1 | Attribute2 |
+------------+--------------+
| Cascade | After Update |
+------------+--------------+
1 row in set (0.00 sec)
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null;
+------------+------------+
| Attribute1 | Attribute2 |
+------------+------------+
| Set null | NULL |
+------------+------------+
1 row in set (0.00 sec)
*/
-- Let's make a second test.
INSERT INTO F_Key VALUES('Second Test');
INSERT INTO Table_cascade VALUES('Default', 'Second Test');
INSERT INTO Table_set_null VALUES('Restrict', 'Second Test');
DELETE FROM F_Key
WHERE Attribute = 'Second Test';
/*
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_cascade;
+------------+--------------+
| Attribute1 | Attribute2 |
+------------+--------------+
| Cascade | After Update |
+------------+--------------+
1 row in set (0.00 sec)
MariaDB [HW_CONSTRAINTS_PART3]> SELECT * FROM Table_set_null;
+------------+------------+
| Attribute1 | Attribute2 |
+------------+------------+
| Restrict | NULL |
| Set null | NULL |
+------------+------------+
2 rows in set (0.00 sec)
*/
Then note that we cannot create both tables as pictured directly, as PROF requires DEPARTMENT to exist, to have a foreign key referencing it, and similarly for DEPARTMENT: it is an egg and chicken situation! Hence, we have to first create a table without the foreign key, and then add it later on, as described below:
/* code/sql/HW_ProfExample.sql */
CREATE TABLE PROF (
Login VARCHAR(25) PRIMARY KEY,
NAME VARCHAR(25),
Department CHAR(5)
);
CREATE TABLE DEPARTMENT (
Code CHAR(5) PRIMARY KEY,
NAME VARCHAR(25),
Head VARCHAR(25),
FOREIGN KEY (Head) REFERENCES PROF (LOGIN) ON UPDATE CASCADE
);
ALTER TABLE PROF
ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT (Code);
Note the structure of the ALTER TABLE
command:
KEY Department REFERENCES Code;
⇒ errorKEY (Department) REFERENCES (Code);
⇒ errorKEY PROF(Department) REFERENCES DEPARTMENT(Code);
⇒ okCREATE TABLE STUDENT (
Login VARCHAR(25) PRIMARY KEY,
NAME VARCHAR(25),
Registered DATE,
Major CHAR(5),
FOREIGN KEY (Major) REFERENCES DEPARTMENT (Code)
);
CREATE TABLE GRADE (
Login VARCHAR(25),
Grade INT,
PRIMARY KEY (LOGIN, Grade),
FOREIGN KEY (LOGIN) REFERENCES STUDENT (LOGIN)
);
On a side note, note that we do not have the same difficulty when inserting a value in a table that contains a foreign key referencing itself: it is accepted to insert a value that is referencing itself, as illustrated below.
HW_FK_Self_Reference.sqlWe can insert multiple values at once:
HW_ProfExample.sqlWe can specify which attributes we are giving:
HW_ProfExample.sqlAnd we can even specify the order (even the trivial one):
INSERT INTO PROF (
LOGIN,
Department,
Name)
VALUES (
"caubert",
"CS",
"Clément Aubert");
INSERT INTO PROF (
LOGIN,
Name,
Department)
VALUES (
"aturing",
"Alan Turing",
"CS"),
(
"perdos",
"Paul
Erdős",
"MATH"),
(
"bgates",
"Bill Gates",
"CYBR");
INSERT INTO STUDENT (
LOGIN,
Name,
Registered,
Major)
VALUES (
"jrakesh",
"Jalal Rakesh",
DATE "2017-12-01",
"CS"),
(
"svlatka",
"Sacnite Vlatka",
"2015-03-12",
"MATH"),
(
"cjoella",
"Candice Joella",
"20120212",
"CYBR"),
(
"aalyx",
"Ava Alyx",
20121011,
"CYBR"),
(
"caubert",
"Clément Aubert",
NULL,
"CYBR");
INSERT INTO GRADE
VALUES (
"jrakesh",
3.8),
(
"svlatka",
2.5);
(Note the date literals)
By default, the values that are not given are set to their respective DEFAULT
values.
Note that we can create foreign keys to primary keys made of multiple attributes, and to the primary key of the table we are currently creating.
/* code/sql/HW_AdvancedFK.sql */
CREATE TABLE T1 (
A1 INT,
A2 INT,
B INT,
PRIMARY KEY (A1, A2)
);
CREATE TABLE T2 (
A1 INT,
A2 INT,
B1 INT PRIMARY KEY,
B2 INT,
-- We can create a "pair" of foreign key in one line, as
-- follows:
FOREIGN KEY (A1, A2) REFERENCES T1 (A1, A2),
-- We can create a foreign key that references the primary
-- key of the table we are currently creating, and name
-- it, as follows:
CONSTRAINT My_pk_to_T1 FOREIGN KEY (B2) REFERENCES T2 (B1)
);
In the example, we are also naming our foreign key. The benefit of naming our fk constraint is that, if we violate it, for instance with
INSERT INTO T2 VALUES (1, 1, 1, 3);
then the name of the constraint (here “My_pk_to_T1”) will be displayed in the error message:
add or update a child row: a foreign key constraint fails (`db_9_9837c1`.`t2`, CONSTRAINT
Cannot `My_pk_to_T1` FOREIGN KEY (`B2`) REFERENCES `t2`(`B1`))
Order of clauses does not matter, not even for optimization purpose.
UPDATE <table>
SET <attribute1> = <value1>, <attribute2> = <value2>, …
WHERE <condition>;
SELECT <attribute list, called projection attributes>
FROM <table list>
WHERE <condition>;
DELETE FROM <table list>
WHERE <condition>;
Conditions can
=
, equal to>
, greater than,<
less than,>=
greater than or equal to,<=
less than or equal to,<>
not equal to.condition1 AND condition2
condition1 OR condition2
NOT condition
LIKE
,\
,_
will match one character (any character), %
will match any number of character,REGEXP
keyword.SELECT LOGIN
FROM STUDENT;
UPDATE
DEPARTMENT
SET Head = "aturing"
WHERE Code = "MATH";
UPDATE
DEPARTMENT
SET Head = "bgates"
WHERE Code = "CS"
OR Code = "CYBR";
SELECT LOGIN
FROM STUDENT
WHERE NOT Major = "CYBR";
SELECT LOGIN,
Name
FROM PROF
WHERE Department = "CS";
SELECT LOGIN
FROM STUDENT
WHERE Major = "CYBR"
AND Registered > DATE "20121001";
SELECT LOGIN
FROM STUDENT
WHERE Name LIKE "Ava%";
SELECT Name
FROM PROF
WHERE LOGIN LIKE "_aubert";
Note that LIKE
is by default case-insensitive, both in MariaDB and in MySQL. The COLLATE
operator can be used to force the search to be case-sensitive, as well as LIKE BINARY
.
Cf. (Elmasri and Navathe 2010, 5.1.1), (Elmasri and Navathe 2015, 7.1.1)
The Boolean logic in SQL is three-valued: a statement can be true
, false
or unknown
. If you pick the following two commands:
/* code/sql/HW_DefaultTest.sql */
SELECT *
FROM TEST
WHERE TestA = "A";
SELECT *
FROM TEST
WHERE TestA <> "A";
you may believe that they will capture all the tuples in the TEST
table, as the value for TestA
is either "A"
or not "A"
, but you would be wrong. If the value of TestA
is NULL
, then both conditions would fail, as SQL cannot say that the value is or is not "A"
: it is simply undefined!
NULL
NULL
is
Unknown value (“Nobody knows”)
What is the date of birth of Jack the Ripper?
Unavailable / Withheld (“I do not have that information with me at the moment”)
What is the number of english spies in France?
What is the VIN of your car?
What is the identity of the Tiananmen Square person?
Not Applicable (“Your question does not make sense”)
What is the US SSN of a French person?
What is the email address of an author of the XIXth century?
If NULL
is involved in a comparison, the result evaluates to “Unknown.”
AND | T | F | U |
T | T | F | U |
F | F | F | F |
U | U | F | U |
OR | T | F | U |
T | T | T | T |
F | T | F | U |
U | T | U | U |
NOT | |
T | F |
F | T |
U | U |
You can test if a value is NULL
with IS NULL
.
INSERT INTO DEPARTMENT
VALUES (
"Hist",
"History",
NULL);
SELECT *
FROM DEPARTMENT
WHERE Head IS NULL;
SELECT *
FROM DEPARTMENT
WHERE Head IS NOT NULL;
SELECT COUNT(*)
FROM GRADE
WHERE Grade IS NULL;
Note that you can not use IS
to compare values: this key word is reserved to test if a value is (not) NULL
, and nothing else.
This means that if you want to capture all the tuples, you cannot write
/* code/sql/HW_DefaultTest.sql */
SELECT *
FROM TEST
WHERE TestA = "A";
SELECT *
FROM TEST
WHERE TestA <> "A";
but should have something like
HW_DefaultTest.sqlThere are no if…then…else
statements in SQL
, but you can do something similar with CASE
(cf. https://dev.mysql.com/doc/refman/8.0/en/case.html). However, note that SQL
is probably not the right place to try to control the flow of execution.
This probably depends on the system a lot, but one could wonder if MySQL uses some form of short-cut evaluation when comparing with NULL
. Unfortunately, even with three times (!) the verbose option, MySQL does not give more insight as to whenever it drops comparing values once a NULL was encountered (cf. https://dev.mysql.com/doc/refman/8.0/en/mysql-command-options.html#option_mysql_verbose, you can log-in using mysql -u testuser -p --password=password -v -v -v
to activate the most verbose mode). Normally, EXPLAIN
(https://dev.mysql.com/doc/refman/8.0/en/explain.html) should be useful in answering this question, but failed to answer it as well.
For DISTINCT
, ALL
and UNION
, cf. (Elmasri and Navathe 2010, 4.3.4) or (Elmasri and Navathe 2015, 6.3.4). For ORDER BY
, cf. (Elmasri and Navathe 2010, 4.3.6) or (Elmasri and Navathe 2015, 6.3.6). For aggregate functions, cf. (Elmasri and Navathe 2010, 5.1.7) or (Elmasri and Navathe 2015, 7.1.7).
AUTO_INCREMENT
Something that is not exactly a constraint, but that can be used to “qualify” domains, is the AUTO_INCREMENT
feature of MySQL. Cf. https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html, you can have MySQL increment a particular attribute (most probably intended to be your primary key, or some form of counter) for you.
A typical example could be:
/* code/sql/HW_AutoIncrement.sql */
CREATE TABLE PERSON (
PersonID INT AUTO_INCREMENT,
Name VARCHAR(255),
PRIMARY KEY (PersonID)
);
INSERT INTO PERSON (
Name)
VALUES (
'Lars'),
(
'Kristina'),
(
'Sophie');
SELECT *
FROM PERSON;
This way, the burden of having to keep track of the persons’ ids is left to the program, and not to the person inserting data in the table.
We can save the current state, and start a series of transactions, with the command
START TRANSACTION;
All the commands that follows are “virtually” executed: you can undo them all using
ROLLBACK;
This puts you back in the state you were in before starting the transaction. If you want all the commands you typed in-between to be actually enforced, you can use the command
COMMIT;
Nested transactions are technically possible, but they are counter-intuitive and should be avoided, cf. https://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-2630-nested-transactions-are-real/.
DISTINCT
/ ALL
The result of a SELECT
query, for instance, is a table, and SQL
treats tables as multi-set, hence there can be repetitions in the result of a query, but we can remove them:
SELECT DISTINCT Major FROM STUDENT;
The default behaviour is equivalent to specifying ALL
, and it display the duplicates. In this case, it would be
> SELECT Major FROM STUDENT;
+-------+
| Major |+-------+
| CS |
| CYBR |
| CYBR |
| CYBR |
| MATH |+-------+
UNION
Set-theoretic operations are available as well. For instance, one can use:
SELECT Login FROM STUDENT) UNION (SELECT Login FROM PROF); (
to collect all the logins from both tables.
There is also INTERSECT
and EXCEPT
in the specification, but MySQL does not implement them (cf. https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems#Database_capabilities).
ORDER BY
You can have ORDER BY
specifications:
SELECT LOGIN
FROM GRADE
WHERE Grade > 2.0
ORDER BY Grade;
SELECT LOGIN
FROM GRADE
WHERE Grade > 2.0
ORDER BY Grade DESC;
SELECT LOGIN,
Major
FROM STUDENT
ORDER BY Major,
Name;
ORDER BY
order by ascending order by default.
You can use MAX
, SUM
, MIN
, AVG
, COUNT
to peform simple operations.
SELECT MAX(Registered) FROM STUDENT;
returns the “greatest” date of registration of a student, i.e., the date of the latest registration.
SELECT COUNT(Name) FROM STUDENT;
returns the number of names, i.e., the number of students.
SELECT COUNT(DISTINCT Name) FROM STUDENT;
returns the number of different names (which in this case is the same as the number of names, since we have no homonyms).
Note that AVG
returns the average of all non-NULL
values, as we can see on the following example:
/* code/sql/HW_Avg.sql */
CREATE TABLE TEST (
Test INT
);
INSERT INTO TEST
VALUES (
NULL),
(
0),
(
10);
SELECT AVG(Test)
FROM TEST;
-- Returns 5.0
The same goes for e.g. MAX:
HW_Max.sqlWe can use aliases for the columns. Compare
SELECT Login FROM PROF;
+---------+
| Login |+---------+
| aturing |
| caubert |
| bgates |
| perdos |+---------+
with
SELECT Login AS "Username" FROM PROF;
+----------+
| Username |+----------+
| aturing |
| caubert |
| bgates |
| perdos |+----------+
Aliases can also be used on table names. Adiases for columns are a helpful way of describing the result of the query, while alias on table have a specific purpose that will be clearer as we study select-project-join queries.
For select-project-join, cf. (Elmasri and Navathe 2010, 4.3.1) or (Elmasri and Navathe 2015, 6.3.1). For aliases, cf. (Elmasri and Navathe 2010, 4.3.2) or (Elmasri and Navathe 2015, 6.3.2), For nested queries, cf. (Elmasri and Navathe 2010, 5.1.2) or (Elmasri and Navathe 2015, 7.1.2).
Department.Name = 'Mathematics'
is the selection conditionDepartment = Code
is the join condition, because it combines two tuples.Name
?Grade > 3.0
is the selection conditionSTUDENT.Login = GRADE.Login
is the join conditionWe can have two join conditions!
SELECT PROF.Name
FROM PROF,
DEPARTMENT,
STUDENT
WHERE STUDENT.Name = "Ava Alyx"
AND STUDENT.Major = DEPARTMENT.Code
AND DEPARTMENT.Head = PROF.Login;
Note that for the kind of join we are studiying (called “inner joins”), the order does not matter.
In Problem 3.3 (Duplicate rows in SQL
), we saw that SQL
was treating tables as multi-sets, i.e., repetitions are allowed. This can lead to strange behaviour when performing Select-Project-Join queries. Consider the following example:
We can use aliases on tables to shorten the previous query:
SELECT PROF.Name
FROM PROF,
DEPARTMENT,
STUDENT AS B
WHERE B.Name = "Ava Alyx"
AND B.Major = DEPARTMENT.Code
AND DEPARTMENT.Head = PROF.Login;
We can use multiple aliases to make it even shorter (but less readable):
SELECT A.Name
FROM PROF AS A,
DEPARTMENT AS B,
STUDENT AS C
WHERE C.Name = "Ava Alyx"
AND C.Major = B.Code
AND B.Head = A.Login;
For those two, aliases are convenient, but not required to write the query. In some cases, we cannot do without aliases. For instance if we want to compare two rows in the same table:
SELECT Other.Login
FROM GRADE AS Mine,
GRADE AS Other
WHERE Mine.Login = "aalyx"
AND Mine.Grade < Other.Grade;
Generally, when you want to perform a join within the same table, then you have to “make two copies of the tables” and name them differently using aliases. Let us try to write a query that answers the question
What are the login of the professors that have the same department as the professor whose login is caubert?
We need a way of distinguising between the professors we are projecting on (the one whole login is caubert) and the one we are joining with (the ones that have the same department). This can be done using something like:
SELECT JOINT.Login
FROM PROF AS PROJECT,
PROF AS JOINT
WHERE PROJECT.Login = "caubert"
AND PROJECT.Department = JOINT.Department;
Note that we are “opening up two copies of the PROF tables”, and naming them differently (PROJECT
and JOINT
).
Another (improved) example of a similar query is
SELECT Fellow.Name AS "Fellow of Ava"
FROM STUDENT AS Ava,
STUDENT AS Fellow
WHERE Ava.Name = "Ava Alyx"
AND Fellow.Major = Ava.Major
AND NOT Fellow.Login = Ava.Login;
A couple of remarks about this query:
AS "Fellow of Ava"
is another kind of aliasing, mentioned in a previous section.NOT Fellow.Login = Ava.Login
guarantees that we will not select Ava again, and exclude her from the results (Ava is not supposed to be a fellow of herself).NOT Fellow.Name = Me.Name;
instead of NOT Fellow.Login = Ava.Login
would prevent the homonym from occuring in the results.AND NOT Me = Fellow
by NOT Fellow.Login = Ava.Login
would not work: you have to compare attributes of the tuples, not the tuples.Let us look at a first example
HW_ProfExample.sqlA nested query is made of an outer query (SELECT Login
…) and an inner query (SELECT AVG(Grade)
…). Note that the inner query does not terminate with a ;
.
Logical operators such as ALL
or IN
can be used in nested queries. To learn more about those operators, refer to https://www.w3schools.com/sql/sql_operators.asp.
An example could be
HW_ProfExample.sqlNote that
>=
, and not >
, since no grade is strictly greater than itself.IS NOT NULL
is needed: otherwise, if one of the grade is NULL
, then the comparison would yelds “unknown”, and no grade would be greater than all of the others.MAX
:Answering the question
What are the logins of the professors belonging to a department that is the major of at least one student whose name ends with an “a”?
–that sounds like the what would ask a police officer in a whodunit– could be answer using
HW_ProfExample.sqlFor this query, we could not use =
, since more than one major could be returned.
Furthermore, nested query that uses =
can often be rewritten without being nested. For instance,
becomes
HW_ProfExample.sqlConversly, you can sometimes write select-project-join as nested queries For instance,
HW_ProfExample.sqlbecomes
HW_ProfExample.sqlA “stored” procedure is a SQL
function statements that can take arguments and may be called from another part of your program. Stated differently, a procedure is a serie of statements stored in a schema, that can easily be executed repeatedly, cf. https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html or https://mariadb.com/kb/en/library/create-procedure/.
Imagine we have the following:
/* code/sql/HW_ProcedureExamples.sql */
CREATE TABLE STUDENT (
Login INT PRIMARY KEY,
NAME VARCHAR(30),
Major VARCHAR(30),
Email VARCHAR(30)
);
INSERT INTO STUDENT
VALUES (
123,
"Test A",
"CS",
"a@a.edu"),
(
124,
"Test B",
"IT",
"b@a.edu"),
(
125,
"Test C",
"CYBR",
"c@a.edu");
SQL
is extremely litteral: when it reads the delimiter ;
, it must execute the command that was shared. But a procedure, being composed of commands, will contain the ;
symbol. To “solve” this (weird) issue, and be able to define a procedure, we have to “temporarily alter the language”, using DELIMITER //
that makes the delimiter being //
instead of ;
11.
In any case, we can then define and execute a simpe procedure called STUDENTLIST
as follows:
DELIMITER $$
CREATE PROCEDURE STUDENTLIST ()
BEGIN
SELECT *
FROM STUDENT;
-- This ";" is not the end of the procedure definition!
END;
$$
-- This is the delimiter that marks the end of the procedure
-- definition.
DELIMITER ;
-- Now, we want ";" to be the "natural" delimiter again.
CALL STUDENTLIST ();
-- Now, we want ";" to be the "natural" delimiter again.
CALL STUDENTLIST ();
A procedure an also take arguments, and an example could be:
DELIMITER $$
CREATE PROCEDURE STUDENTLOGIN (
NameP VARCHAR(30)
)
BEGIN
SELECT LOGIN
FROM STUDENT
WHERE NameP = Name;
END;
$$
DELIMITER ;
SHOW CREATE PROCEDURE STUDENTLOGIN;
-- This display information about the procedure just created.
-- We can pass quite naturally an argument
-- to
-- our
-- procedure.
CALL STUDENTLOGIN ("Test A");
A trigger is a series of statements stored in a schema that can be automatically executed whenever a particular event in the schema occurs. Triggers are extremely powerfull, and are a way of automating part of the work in your database. In MariaDB, you could have the following program.
Imagine we have the following:
CREATE TABLE STUDENT (
Login VARCHAR(30) PRIMARY KEY,
Average FLOAT
);
CREATE TABLE GRADE (
Student VARCHAR(30),
Exam VARCHAR(30),
Grade INT,
PRIMARY KEY (Student, Exam),
FOREIGN KEY (Student) REFERENCES STUDENT (LOGIN)
);
We want to create a trigger that counts the number of times something was inserted in our STUDENT
table. SQL
supports some primitive form of variables (cf. https://dev.mysql.com/doc/refman/8.0/en/user-variables.html and https://mariadb.com/kb/en/library/user-defined-variables/). There is no “clear” form of type, https://dev.mysql.com/doc/refman/8.0/en/user-variables.html reads:
In addition, the default result type of a variable is based on its type at the beginning of the statement. This may have unintended effects if a variable holds a value of one type at the beginning of a statement in which it is also assigned a new value of a different type. To avoid problems with this behavior, either do not assign a value to and read the value of the same variable within a single statement, or else set the variable to 0, 0.0, or ’’ to define its type before you use it.
In other words, SQL
just “guess” the type of your value and go with it. Creating a simple trigger that increment a variable every time an insertion is performed in the STUDENT
table can be done as follows:
SET @number_of_student = 0;
CREATE TRIGGER NUMBER_OF_STUDENT_INC
AFTER INSERT ON STUDENT
FOR EACH ROW
SET @number_of_student = @number_of_student + 1;
Now, assume we want to create a trigger that calculates the average for us. Note that the trigger will need to manipulate two tables (STUDENT
and GRADE
) at the same time.
CREATE TRIGGER STUDENT_AVERAGE
AFTER INSERT ON GRADE
FOR EACH ROW -- Woh, a whole query inside our trigger!
UPDATE STUDENT
SET STUDENT.Average = (
SELECT AVG(Grade)
FROM GRADE
WHERE GRADE.Student = STUDENT.Login)
WHERE STUDENT.Login = NEW.Student;
-- The "NEW" keyword here refers to the "new" entry
-- that is being inserted by the INSERT
-- statement
-- triggering
-- the trigger.
The source code contains examples of insertion and explanations on how to witness the trigger in action.
This part is a short tutorial to install and configure a working relational DBMS. We will proceed in 5 steps:
You will install the MySQL DataBase Managment System, or its community-developed fork, MariaDB. Below are the instruction to install MySQL Community Edition on Windows 10 and macOS, and MariaDB on Linux-based distribution, but both are developped for every major operating system (macOS, Windows, Debian, Ubuntu, etc.): feel free to pick one or the other, it will not make a difference in this course (up to some minor aspects). MySQL is more common, MariaDB is growing, both are released under GNU General Public License, well-documented and free of charge for their “community” versions.
It is perfectly acceptable, and actually encouraged, to install MySQL or MariaDB on a virtual machine for this class. You can use the Windows Subsystem for Linux, VMware or Virtual Box to run a “sandboxed” environment that should keep your data isolated from our experimentations.
Below are precise and up-to-date instructions, follow them carefully, read the messages displayed on your screen, make sure a step was correctly executed before moving to the next one, and everything should be all right. Also, remember:
The following links could be useful:
Visit https://dev.mysql.com/downloads/installer/, click on “Download” next to
Windows (x86, 32-bit), MSI Installer XXX YYY (mysql-installer-web-community-XXX.msi)
where XXX is a number version (e.g., 8.0.13.0.), and YYY is the size of the file (e.g., 16.3M). On the next page, click on the (somewhat hidden) “No thanks, just start my download.” button.
Save the “mysql-installer-web-community-XXX.msi” file, and open it. If there is an updated version of the installer available, agree to download it. Accept the license term.
We will now install the various components needed for this class, leaving all the choices by defaults. This means that you need to do the following:
We now want to make sure that MySQL is running: launch Windows’ “Control Panel”, then click on “Administrative Tools”, and on “Services”. Look for “MySQLXX”, its status should be “Running”. If it is not, right-click on it and click on “Start”.
Open a command prompt (search for cmd
, or use PowerShell) and type
cd "C:\Program Files\MySQL\MySQL Server 8.0\bin"
If this command fails, it is probably because the version number changed: open the file explorer, go to C:\Program Files\MySQL\
, look for the right version number, and update the command accordingly.
Then, enter
-u root -p mysql
and enter the password you picked previously for the root account. You are now logged as root in your database management system, you should see a brief message, followed by a prompt
> mysql
Now, move on to “Creating a User”.
The instructions are almost the same as for Windows. Read https://dev.mysql.com/doc/refman/8.0/en/osx-installation-pkg.html and download the file from https://dev.mysql.com/downloads/mysql/ once you selected “macOS” as your operating system. Install it, leaving everything by default but adding a password (refer to the instructions for windows). Then, open a command-line interface (the terminal), enter
-u root -p mysql
and enter the password you picked previously for the root account. You are now logged as root in your database management system, you should see a brief message, followed by a prompt
> mysql
Now, move on to “Creating a User”.
Install, through your standard package management system (apt
or aptitude
for debian-based systems, pacman
for Arch Linux, etc.), the packages mysql-client
and mysql-server
(or default-mysql-client
and default-mysql-server
) as well as their dependencies12.
Open a terminal and type
/etc/init.d/mysql status
or, as root,
service mysql status
to see if MySQL is running: if you read something containing
Active: active (running)
then you can move on to the next step, otherwise run (as root)
service mysqld start
and try again.
As root, type in your terminal
mysql_secure_installation
You will be asked to provide the current password for the root MySQL user: this password has not be defined yet, so just hit “Enter”. You will be asked if you want to set a new password (that you can freely chose, just make sure to memorize it). Then, answer “n” to the question “Remove anonymous users?”, “Y” to “Disallow root login remotely?”, “n” to “Remove test database and access to it?” and finally “Y” to “Reload privilege tables now?”.
Still as root, type in your terminal
mysql -u root -p
and enter the password you picked previously for the root account. You are now logged as root in your database management system: you should see a brief message, followed by a prompt
MariaDB [(none)]>
Now, move on to “Creating a User”.
This step will create a non-root user13 and grant it some rights. Copy-and-paste or type the following three commands, one by one (that is, enter the first one, hit “enter”, enter the second, hit “enter”, etc.). This step will create a non-root user14 and grant it some rights. Copy-and-paste or type the following three commands, one by one (that is, enter the first one, hit “enter”, enter the second, hit “enter”, etc.).
We first create a new user called testuser
on our local installation, and give it the password password
:
CREATE USER 'testuser'@'localhost' IDENTIFIED BY 'password';
Then, we grant the user all the privileges on the databases whose name starts with HW_
:
GRANT ALL PRIVILEGES ON `HW\_%` . * TO 'testuser'@'localhost';
Be careful: backticks (`
) are surrounding HW\_%
whereas single quotes ('
) are surrounding testuser
and localhost
.
And then we quit the DBMS, using
EXIT;
The message displayed after the two first commands should be
0 rows affected (0.00 sec) Query OK,
and the message displayed after the last command should be
Bye
We now log in as the normal user called “testuser”.
Linux users should type as a normal user, i.e., not as root, in their terminal the following, and Windows users should type in their command prompt the following15:
mysql -u testuser -p
Enter password
as your password. If you are prompted with a message
ERROR 1045 (28000): Access denied for user 'testuser'@'localhost' (using password: YES)
then you probably typed the wrong password. Otherwise, you should see a welcoming message from MySQL or MariaDB and a prompt.
To save yourself the hassle of typing the password, you can use
mysql -u testuser -ppassword
or
mysql -u testuser -p --password=password
to log-in as testuser immediately.
If at some point you want to know if you are logged as root or testuser, simply enter
\s;
Now, let us create our first schema, our first table, populate it with data, and display various information.
We first create the schema (or database) HW_FirstTest
:
CREATE DATABASE HW_FirstTest; -- Or CREATE SCHEMA HW_FirstTest;
Let us make sure that we created it:
SHOW DATABASES;
Let us use it:
USE HW_FirstTest;
And see what it contains now:
SHOW TABLES;
We now create a table called TableTest
, wtih two integer attributes called Attribute1
and Attribute2
:
CREATE TABLE TableTest (Attribute1 INT, Attribute2 INT);
And can make sure that the table was indeed created:
SHOW TABLES;
We can further ask our DBMS to display the structure of the table we just created:
DESCRIBE TableTest; -- Can be abbreviated as DESC TableTest;
And even ask to get back the code that would create the exact same structure (but without the data!):
SHOW CREATE TABLE TableTest;
Now, let us populate it with some data:
INSERT INTO TableTest
VALUES (1,2),
3,4),
(5,6); (
Note that the SQL
syntax and your DBMS are completely fine with your statement spreading over multiple lines. Let us now display the data stored in the table:
SELECT * FROM TableTest;
After that last command, you should see
+------------+------------+
| Attribute1 | Attribute2 |+------------+------------+
1 | 2 |
| 3 | 4 |
| 5 | 6 |
| +------------+------------+
Finally, we can erase the content of the table, then erase (“drop”) the table, and finally the schema:
DELETE FROM TableTest; -- Delete the rows
DROP TABLE TableTest; -- Delete the table
DROP DATABASE HW_FirstTest; -- Delete the schema
You’re all set! All you have to do is to quit, using the command
EXIT;
Note that we were quite careless when we set-up our installation:
testuser
that is common to all of our installation.mysqldump -u testuser -ppassword
means that the password will be stored in the history of your command-line interface (that you should be able to access using history
or Get-History
for Powershell) and could be accessed by anyone having access to it.All of those are obvious security risks, and make this installation unsafe to be a production environment. We will only use it as a testing / learning environment, but it is strongly recommended to:
For each of the following, fill in the blanks:
SQL
, a relation is called a ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ .SQL
, every statement ends with ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟, and in-line comments start with a ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ .SQL
, there is no string
datatype, so we have to use ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟.SQL
’s role is to ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ ͟ .What does it mean to say that SQL
is at the same time a “data definition language” and a “data manipulation language”?
Name three kind of objects (for lack of a better word) a CREATE
statement can create.
Write a SQL
statement that adds a primary key constraint to an attribute named ID
in an already existing table named STAFF
.
Complete each row of the following table with either a datatype or two different examples:
Data type | Examples |
---|---|
4 , -32 |
|
Char(4) | |
VarChar(10) | 'Train' , 'Michelle' |
Bit(4) | |
TRUE , UNKNOWN |
In the datatype CHAR(3)
, what does the 3
indicate?
Explain this query: CREATE SCHEMA FACULTY;
.
Write code to
You are free to come up with an example (even very simple or cryptic) or to re-use an example from class.
Explain this query:
ALTER TABLE TABLEA
DROP INDEX Attribute1;
If I want to enter January 21, 2016, as a value for an attribute with the DATE
datatype, what value should I enter?
Write a statement that inserts the values "Thomas"
and 4
into the table TRAINS
.
If PkgName
is the primary key in the table MYTABLE
, what can you tell about the number of rows returned by the following statement?
SELECT * FROM MYTABLE WHERE PkgName = 'MySQL';
.
If you want that every time a referenced row is delted, all the refering rows are deleted as well, what mechanism should you use?
By default, does the foreign key restrict, cascade, or set null on update? Can you justify this choice?
If a database designer is using the ON UPDATE SET NULL
for a foreign key, what mechanism is (s)he implementing (i.e., describe how the database will react a certain operation)?
If the following is part of the design of a table:
FOREIGN KEY (DptNumber) REFERENCES DEPARTMENT(Number)
ON DELETE SET DEFAULT
ON UPDATE CASCADE;
What happen to the rows whose foreign key DptNumber
are set to 3
if the row in the DEPARTEMENT
table with primary key Number
set to 3
is…
5
?If the following is part of the design of a WORKER
table:
FOREIGN KEY WORKER(DptNumber) REFERENCES DEPARTMENT(DptNumber)
ON UPDATE CASCADE;
What happen to the rows whose foreign key DptNumber
are set to 3
if the row in the DEPARTMENT
table with primary key Number
set to 3
is…
5
?Given a relation TOURIST(Name, EntryDate, Address)
, write a SQL
statement printing the name and address of all the tourists who entered the territory after the 15 September, 2012.
Describe what the star do in the statement
SELECT ALL * FROM MYTABLE;
What is the fully qualified name of an attribute? Give an example.
If DEPARTMENT
is a database, what is DEPARTMENT.*
?
What is a multi-set? What does it mean to say that MySQL treats tables as multisets?
What is the difference between
SELECT ALL * FROM MYTABLE;
and
SELECT DISTINCT * FROM MYTABLE;
How are the results the same? How are they different?
What is wrong with the statement
SELECT * WHERE Name = 'CS' FROM DEPARTMENT;
Write a query that returns the number of row (i.e., of entries, of tuples) in a table named BOOK
.
When is it useful to use a select-project-join query?
When is a tuple variable useful?
Write a query that changes the name of the professor whose Login
is 'caubert'
to 'Hugo Pernot'
in the table PROF
.
Can an UPDATE
statement have a WHERE
condition using an attribute that is not the primary key? If no, justify, if yes, tell what could happen.
Give the three possible meaning of the NULL
value, and an example for each of them.
What are the values of the following expressions (i.e., do they evaluate to TRUE
, FALSE
, or UNKNOWN
)?
TRUE AND FALSE
TRUE AND UNKNOWN
NOT UNKNOWN
FALSE OR UNKNOWN
Write the truth table for AND
for the three-valued logic of SQL
.
What comparison expression should you use to test if a value is different from NULL
?
Explain this query:
SELECT Login
FROM PROF
WHERE Department IN ( SELECT Major
FROM STUDENT
WHERE Login = 'jrakesh');
Can you rewrite it without nesting queries?
What is wrong with this query?
SELECT Name FROM STUDENT
WHERE Login IN
SELECT Code FROM Department WHERE head = 'aturing'); (
Write a query that returns the sum of all the values stored in the Pages
attribute of a BOOK
table.
Write a query that adds a Pages
attribute of type INT
into a (already existing) BOOK
table.
Write a query that removes the default value for a Pages
attribute in a BOOK
table.
Under which conditions does SQL
allow you to enter the same row in a table twice?
Explain this query: ROLLBACK;
.
Explain this query: DELIMITER ;
.
The blanks can be filled as follow:
SQL
, a relation is called a ͟ ͟ ͟ ͟table ͟ ͟ ͟ ͟ ͟ ͟ ͟ .SQL
, every statement ends with ͟ ͟a semi-colon (;
) ͟ ͟, and in-line comments start with a ͟ ͟ ͟two minus signs (--
) ͟ ͟ .SQL
, there is no string
datatype, so we have to use ͟ ͟ ͟VARCHAR(x) or CHAR(x) where x is an integer reflecting the maximum (or fixed) size of the string ͟ ͟ ͟ ͟ ͟ .SQL
’s role is to ͟ ͟ control access to the data stored, by creating users and granting them rights ͟ ͟ .It can specify the conceptual and internal schema, and it can manipulate the data.
Database (schema), table, view, assertion, trigger, etc.
ALTER TABLE STAFF ADD PRIMARY KEY(ID);
Data type | Examples |
---|---|
INT |
4 , -32 |
CHAR(4) |
'abCD' , "dEfG" |
VARCHAR(10) |
'Train' , 'Michelle' |
BIT(4) |
B'1010' , B'0101' |
BOOL`` |
TRUE,
FALSE,
NULL`
NULL
is actually a valid answer for every single type of
That we can store exactly three characters.
It creates a schema, i.e., a database, named Faculty
.
/* code/sql/HW_Short.sql */
CREATE TABLE A (
Att1 INT PRIMARY KEY,
Att2 INT
);
CREATE TABLE B (
Att3 INT PRIMARY KEY,
Att4 INT,
FOREIGN KEY (Att4) REFERENCES A (Att1)
);
INSERT INTO A
VALUES (
1,
2);
INSERT INTO B
VALUES (
3,
1);
It removes the UNIQUE
constraint on the Attribute1
in the TABLEA
table.
DATE'2016-01-21'
, '2016-01-21'
, '2016/01/21'
, '20160121'
.
INSERT INTO TRAINS VALUES('Thomas', 4);
We know that at most one (but possibly 0) row will be returned.
We should use a referential triggered action clause, ON DELETE CASCADE
.
By default, the foreign key restricts updates. This prevents unwanted update of information: if an update needs to be propagated, then it needs to be “acknowledged” and done explicitely.
If the referenced row is updated, then the attribute of the referencing rows are set to NULL
.
In the referencing rows,
DEPARTMENT
table with primary key Number
set to 3
cannot be deleted if a row in the WORKER
table references it.We could use the following:
SELECT Name, Address
FROM TOURIST
WHERE EntryDate > DATE'2012-09-15';
It selects all the attributes, it is a wildcard.
The name of the relation with the name of its schema and a period beforehand. An example would be EMPLOYEE.Name
.
All the tables in that database.
A multiset is a set where the same value can occur twice. In MySQL, the same row can occur twice in a table.
They both select all the rows in the MYTABLE
table, but ALL
will print the duplicate values, whereas DISTINCT
will print them only once.
You cannot have the WHERE
before FROM
.
SELECT COUNT(*) FROM BOOK;
We use those query that projects on attributes using a selection and join conditions when we need to construct for information based on pieces of data spread in multiple tables.
It makes the distinction between two different rows of the same table, it is useful when we want to select a tuple in a relation that is in a particular relation with a tuple in the same relation. Quoting https://stackoverflow.com/a/7698796/:
They are useful for saving typing, but there are other reasons to use them:
- If you join a table to itself you must give it two different names otherwise referencing the table would be ambiguous.
- It can be useful to give names to derived tables, and in some database systems it is required… even if you never refer to the name.
We could use the following:
UPDATE PROF SET Name = 'Hugo Pernot'
WHERE Login = 'caubert';
Yes, we can have select condition that does not use primary key. In that case, it could be the case that we update more than one tuple with such a command (which is not necessarily a bad thing).
Unknown value (“Will it rain tomorrow?”), unavailable / withheld (“What is the phone number of Harry Belafonte?”), N/A (“What is the email address of Abraham Lincoln?”).
TRUE AND FALSE
→ FALSE
TRUE AND UNKNOWN
→ UNKNOWN
NOT UNKNOWN
→ UNKNOWN
FALSE OR UNKNOWN
→ FALSE
TRUE AND TRUE
→ TRUE
TRUE AND FALSE
→ FALSE
TRUE AND UNKNOWN
→ UNKNOWN
FALSE AND FALSE
→ FALSE
UNKNOWN AND UNKNOWN
→ UNKNOWN
FALSE AND UNKNOWN
→ FALSE
For a more compact presentation, refer to the three-valued truth table.
IS NOT
It list the login of the professors teaching in the department where the student whose login is “jrakesh” is majoring. It can be rewritten as
SELECT PROF.Login
FROM PROF, STUDENT
WHERE Department = Major
AND STUDENT.Login = 'jrakesh';
It tries to find a Login
in a Code
.
SELECT SUM(Pages) FROM BOOK;
ALTER TABLE BOOK ADD COLUMN Pages INT;
ALTER TABLE BOOK ALTER COLUMN Pages DROP DEFAULT;
Essentially, if there are no primary key in the relation, and if no attribute has the UNIQUE
constraint. Cf. also this previous problem.
This command, ROLLBACK
, undoes the last transaction, i.e. it allows to “roll back” to a previous point identified by START TRANSACTION;
: everything that happened between those two commands is “undone”, as if it had never been executed.
This command, DELIMITER ;
, changes the SQL interpreter delimiter back to being “;
”. It is useful when defining procedures, as the traditional delimiter’s role have to be “suspended” to wrap a series of commands inside a function.
The goal of this problem is to learn where to find the documentation for your DBMS, and to understand how to read the syntax of SQL
commands.
You can consult (Elmasri and Navathe 2010, Table 5.2, p. 140) or (Elmasri and Navathe 2015, Table 7.2, p. 235), for a very quick summary of the most common commands. Make sure you are familiar with the Backus–Naur form (BNF) notation commonly used:
<…>
[…]
{…}
(…|…|…)
The most complete lists of commands are probably at
Those are the commands implemented in the DBMS you are actually using. Since there are small variations from one implementation to the other, it is better to take one of this link as a reference in the future.
As a starting point, looking at the syntax for CREATE TABLE
commands is probably a good start, cf. https://mariadb.com/kb/en/create-table/ or https://dev.mysql.com/doc/refman/8.0/en/create-table.html.
SQL
)This problem will guide you in manipulating a very simple table in SQL
.
Log in as testuser
, create a database named HW_Address
, use it, and create two tables:
CREATE TABLE NAME(
VARCHAR(15),
FName VARCHAR(15),
LName INT,
ID PRIMARY KEY(ID)
);
CREATE TABLE ADDRESS(
VARCHAR(15),
StreetName INT,
Number INT,
Habitants PRIMARY KEY(StreetName, Number)
);
Observe the output produced by the command DESC ADDRESS;
.
Add a foreign key to the ADDRESS
table, using
ALTER TABLE ADDRESS
ADD FOREIGN KEY (Habitants)
REFERENCES NAME(ID);
And observe the new output produced by the command DESC ADDRESS;
.
Is it what you would have expected? How informative is it? Can you think of a command that would output more detailled information, including a reference to the existence of the foreign key?
Draw the relational model corresponding to that database and identify the primary and foreign keys.
Add this data to the NAME
table:
INSERT INTO NAME VALUES ('Barbara', 'Liskov', 003);
INSERT INTO NAME VALUES ('Tuong Lu', 'Kim', 004);
INSERT INTO NAME VALUES ('Samantha', NULL, 080);
What command can you use to display this infomation back? Do you notice anything regarding the values we entered for the ID
attribute?
Add some data into the ADDRESS
table:
INSERT INTO ADDRESS
VALUES
'Armstrong Drive', 10019, 003),
('North Broad St.', 23, 004),
('Robert Lane', 120, NULL); (
What difference do you notice with the insertions we made in the NAME
table? Which syntax seems more easy to you?
Write a SELECT
statement that returns the ID
number of the person whose first name is “Samantha”.
Write a statement that violates the entity integrity constraint. What is the error message returned?
Execute an UPDATE
statement that violates the referential integrity constraint. What is the error message returned?
Write a statement that violates another kind of constraint. Explain what constraint you are violating and explain the error message.
SQL
)Log in as testuser
and create a database titled HW_REPETITION
. Create in that database a table (the following questions refer to this table as EXAMPLE
, but you are free to name it whatever you want) with at least two attributes that have different data types. Do not declare a primary key yet. Answer the following:
Add a tuple to your table using
INSERT INTO EXAMPLE VALUES(X, Y);
where the X
and Y
are values have the right datatype. Try to add this tuple again. What do you observe? (You can use SELECT * FROM EXAMPLE;
to observe what is stored in this table.)
Alter your table to add a primary key, using
ALTER TABLE EXAMPLE ADD PRIMARY KEY (Attribute);
where Attribute
is the name of the attribute you want to be a primary key. What do you observe?
Empty your table using
DELETE FROM EXAMPLE;
and alter your table to add a primary key, using the command we gave at the previous step. What do you observe?
Try to add the same tuple twice. What do you observe?
From the notes, recall the following about foreign keys:
Two important remarks:
- The datatype of the foreign key has to be the exactly the same as the datatype of the attribute to which we are referring.
- The target of the foreign key must be the primary key.
But, the situation is slightly more complex. Test for yourself by editing the following code as indicated:
/* code/sql/HW_FKtest.sql */
DROP SCHEMA IF EXISTS HW_FKtest;
CREATE SCHEMA HW_FKtest;
USE HW_FKtest;
CREATE TABLE TARGET (
Test VARCHAR(15) PRIMARY KEY
);
CREATE TABLE SOURCE (
Test VARCHAR(25),
FOREIGN KEY (Test) REFERENCES TARGET (Test)
);
PRIMARY KEY
constraint.PRIMARY KEY
with UNIQUE
.VARCHAR(25)
with CHAR(25)
.VARCHAR(25)
with INT
.VARCHAR(25)
with VARCHAR(15)
Create the PROF
, DEPARTMENT
, STUDENT
and GRADE
tables as in the “Constructing and populating a new example” section. Populate them with some data (copy it from the notes or come up with your own data).
To obtain exactly the same schema as the one we developped and edited, you can use mysqldump to “dump” this table, with a command like
mysqldump -u testuser -ppassword\
-h localhost --add-drop-database\
--skip-comments --compact\
> dump.sql HW_ProfExample
The code we studied during the lecture is more or less the following.
/* code/sql/HW_ProfExampleRevisitedRevisited.sql */
DROP SCHEMA IF EXISTS HW_ProfExampleRevisited;
CREATE SCHEMA HW_ProfExampleRevisited;
USE HW_ProfExampleRevisited;
CREATE TABLE PROF (
Login VARCHAR(25) PRIMARY KEY,
NAME VARCHAR(25),
Department CHAR(5)
);
CREATE TABLE DEPARTMENT (
Code CHAR(5) PRIMARY KEY,
NAME VARCHAR(25),
Head VARCHAR(25),
FOREIGN KEY (Head) REFERENCES PROF (LOGIN) ON UPDATE CASCADE
);
ALTER TABLE PROF
ADD FOREIGN KEY (Department) REFERENCES DEPARTMENT (Code);
CREATE TABLE STUDENT (
Login VARCHAR(25) PRIMARY KEY,
NAME VARCHAR(25),
Registered DATE,
Major CHAR(5),
FOREIGN KEY (Major) REFERENCES DEPARTMENT (Code)
);
CREATE TABLE GRADE (
Login VARCHAR(25),
Grade INT,
PRIMARY KEY (LOGIN, Grade),
FOREIGN KEY (LOGIN) REFERENCES STUDENT (LOGIN)
);
INSERT INTO DEPARTMENT
VALUES (
'MATH',
'Mathematics',
NULL),
(
'CS',
'Computer
Science',
NULL);
INSERT INTO DEPARTMENT (
Code,
Name)
VALUES (
'CYBR',
'Cyber Secturity');
INSERT INTO PROF (
LOGIN,
Department,
Name)
VALUES (
'caubert',
'CS',
'Clément Aubert');
INSERT INTO PROF (
LOGIN,
Name,
Department)
VALUES (
'aturing',
'Alan Turing',
'CS'),
(
'perdos',
'Paul
Erdős',
'MATH'),
(
'bgates',
'Bill Gates',
'CYBR');
INSERT INTO STUDENT (
LOGIN,
Name,
Registered,
Major)
VALUES (
'jrakesh',
'Jalal Rakesh',
DATE '2017-12-01',
'CS'),
(
'svlatka',
'Sacnite Vlatka',
'2015-03-12',
'MATH'),
(
'cjoella',
'Candice Joella',
'20120212',
'CYBR'),
(
'aalyx',
'Ava Alyx',
20121011,
'CYBR'),
(
'caubert',
'Clément Aubert',
NULL,
'CYBR');
INSERT INTO GRADE
VALUES (
'jrakesh',
3.8),
(
'svlatka',
2.5);
We will resume working on this model, and enhance it.
Draw the complete relational model for this database (i.e., for the PROF
, DEPARTMENT
, STUDENT
and GRADE
relations).
Create and populate a LECTURE
table as follows:
Name
, Instructor
, Code
, and Year
, of types VARCHAR(25)
for the first two, CHAR(5)
for Code
, and YEAR(4)
for Year
.Year
and Code
attributes should be the primary key (yes, have two attributes be the primary key).Instructor
attribute should be a foreign key referencing the Login
attribute in PROF
.LECTURE
table with some made-up data.Try to think about some of the weakenesses of this representation. For instance, can it accomodate two instructors for the same class? Write down two possibles scenarios in which this schema would not be appropriate.
The GRADE
table had some limitations too. For example, every student could have only one grade. Add two columns to the GRADE
table using:
ALTER TABLE GRADE
ADD COLUMN LectureCode CHAR(5),
ADD COLUMN LectureYear YEAR(4);
Add a foreign key:
ALTER TABLE GRADE
ADD FOREIGN KEY (LectureYear, LectureCode)
REFERENCES LECTURE(Year, Code);
Use DESCRIBE
and SELECT
to observe the schema of the GRADE
table and its rows. Is it what you would have expected?
GRADE
with some made-up data. Now every row should contain, in addition to a login and a grade, a lecture year and a lecture code.
Write SELECT
statements answering the following questions (where PROF.Name
, LECTURE.Name
, YYYY
, LECTURE.Code
and STUDENT.Login
should be relevant values considering your data):
LECTURE.Name
in YYYY
?”YYYY
without any duplicates?”LECTURE.Code
?”LECTURE.Code
taught?”LECTURE.Code
?”STUDENT.Login
?”SQL
coding)Look at the SQL
code below and then answer the following questions.
/* code/sql/HW_Train.sql */
CREATE TABLE TRAIN (
ID VARCHAR(30),
Model VARCHAR(30),
ConstructionYear YEAR (4)
);
CREATE TABLE CONDUCTOR (
ID VARCHAR(20),
NAME VARCHAR(20),
ExperienceLevel VARCHAR(20)
);
CREATE TABLE ASSIGNED_TO (
TrainId VARCHAR(20),
ConductorId VARCHAR(20),
Day DATE,
PRIMARY KEY (TrainId, ConductorId)
);
Modify the CREATE
statement that creates the TRAIN
table (lines 1–5), so that ID
would be declared as the primary key. It is sufficient to only write the line(s) that need to change.
Write an ALTER
statement that makes ID
become the primary key of the CONDUCTOR
table.
Modify the CREATE
statement that creates the ASSIGNED_TO
table (lines 13–18), so that it has two foreign keys: ConductorId
references the ID
attribute in CONDUCTOR
and TrainId
references the ID
attribute in TRAIN
. It is sufficient to only write the line(s) that need to change.
Write INSERT
statements that insert one tuple of your choosing in each relation (no NULL
values). These statements should respect all the constraints (including the ones we added in the previous questions) and result in actual insertions. (Remember that four digits is a valid value for an attribute with the YEAR(4)
datatype.)
Write a statement that sets the value of the ExperienceLevel
attribute to “Senior” in all the tuples where the ID
attribute is “GP1029” in the CONDUCTOR
relation.
Write a SELECT
statement that answers each of the following questions:
SQL
statements for the COFFEE database)Suppose we have the relational model depicted below, with the indicated data in it:
COFFEE
Ref | Origin | TypeOfRoast | PricePerPound |
---|---|---|---|
001 | Brazil | Light | 8.90 |
121 | Bolivia | Dark | 7.50 |
311 | Brazil | Medium | 9.00 |
221 | Sumatra | Dark | 10.25 |
CUSTOMER
CardNo | Name | |
---|---|---|
001 | Bob Hill | b.hill@isp.net |
002 | Ana Swamp | swampa@nca.edu |
003 | Mary Sea | brig@gsu.gov |
004 | Pat Mount | pmount@fai.fr |
SUPPLY
Provider | Coffee |
---|---|
Coffee Unl. | 001 |
Coffee Unl. | 121 |
Coffee Exp. | 311 |
Johns & Co. | 221 |
PROVIDER
Name | |
---|---|
Coffee Unl. | bob@cofunl.com |
Coffee Exp. | pat@coffeex.dk |
Johns & Co. | NULL |
In the following, we will assume that this model was implemented in a DBMS (MySQL or MariaDB), the primary keys being COFEE.Ref
, CURTOMER.CardNo
, SUPPLY.Provider
and SUPPLY.Coffee
, and PROVIDER.Name
, and the foreign keys being as follows:
FavCoffee in the CUSTOMER relation | refers to | Ref in the COFFEE relation |
Provider in the SUPPLY | refers to | Name in the PROVIDER relation |
Coffee in the SUPPLY | refers to | Ref in the COFFEE relation |
Read and write SQL
commands for the following “what-if” scenarios. Assume that:
SQL
’s syntax (there’s no “a semi-colon is missing” trap).You can use COFFEE.1 to denote the first tuple (or row) in COFFEE, and similarly for other relations and tuples (so that, for instance, SUPPLY.4 corresponds to "Johns & Co"., 221
).
Draw the relational model of this table.
NULL
and should be unique”), the referential integrity constraint (“the foreign key must refer to something that exists”), if there would be some other kind of error (ignoring the plausability / revelance of inserting that tuple), or if it would result in successful insertion.
INSERT INTO CUSTOMER VALUES(005, 'Bob Hill', NULL, 001);
INSERT INTO COFFEE VALUES(002, "Peru", "Decaf", 3.00);
INSERT INTO PROVIDER VALUES(NULL, "contact@localcof.com");
INSERT INTO SUPPLY VALUES("Johns Co.", 121);
INSERT INTO SUPPLY VALUES("Coffee Unl.", 311, 221);
Assuming that the referential triggered action clause ON UPDATE CASCADE
is used for each of the foreign keys in this database, list the tuples modified by the following statements:
UPDATE CUSTOMER SET FavCoffee = 001
WHERE CardNo = 001;
UPDATE COFFEE SET TypeOfRoast = 'Decaf'
WHERE Origin = 'Brazil';
UPDATE PROVIDER SET Name = 'Coffee Unlimited'
WHERE Name = 'Coffee Unl.';
UPDATE COFFEE SET PricePerPound = 10.00
WHERE PricePerPound > 10.00;
ON DELETE CASCADE
is used for each of the foreign keys in this database, list the tuples modified by the following statements:
DELETE FROM CUSTOMER
WHERE Name LIKE '%S%';
DELETE FROM COFFEE
WHERE Ref = 001;
DELETE FROM SUPPLY
WHERE Provider = 'Coffee Unl.'
AND Coffee = 001;
DELETE FROM PROVIDER
WHERE Name = 'Johns & Co.';
SQL
queries that answer the following questions:
Consider the following SQL
code:
/* code/sql/HW_Department.sql */
CREATE TABLE DEPARTMENT (
ID INT PRIMARY KEY,
NAME VARCHAR(30)
);
CREATE TABLE EMPLOYEE (
ID INT PRIMARY KEY,
NAME VARCHAR(30),
Hired DATE,
Department INT,
FOREIGN KEY (Department) REFERENCES DEPARTMENT (ID)
);
INSERT INTO DEPARTMENT
VALUES (
1,
"Storage"),
(
2,
"Hardware");
INSERT INTO EMPLOYEE
VALUES (
1,
"Bob",
20100101,
1),
(
2,
"Samantha",
20150101,
1),
(
3,
"Mark",
20050101,
2),
(
4,
"Karen",
NULL,
1),
(
5,
"Jocelyn",
20100101,
1);
Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.
Storage
department ("Bob"
, "Samantha"
, "Karen"
and "Jocelyn"
),"Mark"
),Storage
department who has(have) been hired for the longest period of time. Phrased differently, the oldest employees of the Storage
department ("Bob"
and "Jocelyn"
).Consider the following SQL
code:
/* code/sql/HW_Computer.sql */
DROP SCHEMA IF EXISTS HW_Computer;
CREATE SCHEMA HW_Computer;
USE HW_Computer;
CREATE TABLE COMPUTER (
ID VARCHAR(20) PRIMARY KEY,
Model VARCHAR(40)
);
CREATE TABLE PRINTER (
ID VARCHAR(20) PRIMARY KEY,
Model VARCHAR(40)
);
CREATE TABLE CONNEXION (
Computer VARCHAR(20),
Printer VARCHAR(20),
PRIMARY KEY (Computer, Printer),
FOREIGN KEY (Computer) REFERENCES COMPUTER (ID),
FOREIGN KEY (Printer) REFERENCES PRINTER (ID)
);
INSERT INTO COMPUTER
VALUES (
'A',
'DELL A'),
(
'B',
'HP X'),
(
'C',
'ZEPTO D'),
(
'D',
'MAC Y');
INSERT INTO PRINTER
VALUES (
'12',
'HP-140'),
(
'13',
'HP-139'),
(
'14',
'HP-140'),
(
'15',
'HP-139');
INSERT INTO CONNEXION
VALUES (
'A',
'12'),
(
'A',
'13'),
(
'B',
'13'),
(
'C',
'14');
Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.
ID
is '13'
(2
).2
).ID
is 'A'
('HP-140'
and 'HP-139'
).'D'
).Consider the following SQL
code:
/* code/sql/HW_SocialMedia.sql */
CREATE TABLE ACCOUNT (
ID INT PRIMARY KEY,
NAME VARCHAR(25),
Email VARCHAR(25) UNIQUE
);
CREATE TABLE SUBSCRIBE (
Subscriber INT,
Subscribed INT,
DATE DATE,
FOREIGN KEY (Subscriber) REFERENCES ACCOUNT (ID),
FOREIGN KEY (Subscribed) REFERENCES ACCOUNT (ID),
PRIMARY KEY (Subscriber, Subscribed)
);
CREATE TABLE VIDEO (
ID INT PRIMARY KEY,
Title VARCHAR(25),
Released DATE,
Publisher INT,
FOREIGN KEY (Publisher) REFERENCES ACCOUNT (ID)
);
CREATE TABLE THUMBS_UP (
Account INT,
Video INT,
DATE DATE,
PRIMARY KEY (Account, Video),
FOREIGN KEY (Account) REFERENCES ACCOUNT (ID),
FOREIGN KEY (Video) REFERENCES VIDEO (ID)
);
INSERT INTO ACCOUNT
VALUES (
1,
"Bob Ross",
"bob@ross.com"),
(
2,
NULL,
"anon@fai.com"),
(
3,
"Martha",
NULL);
INSERT INTO SUBSCRIBE
VALUES (
2,
1,
DATE "2020-01-01"),
(
3,
1,
DATE "2019-03-03"),
(
3,
2,
DATE "2019-03-03"),
(
2,
2,
DATE "2019-03-03"),
(
1,
2,
DATE "2019-03-03");
-- The first entry means that 2 subscribed to 1, not the
-- other way around.
-- And similarly for the other entries.
INSERT INTO VIDEO
VALUES (
10,
"My first video!",
DATE "2020-02-02",
1),
(
20,
"My second video!",
DATE "2020-02-03",
1),
(
30,
"My vacations",
DATE "2020-02-04",
2);
INSERT INTO THUMBS_UP
VALUES (
2,
10,
DATE "2020-02-02"),
(
3,
10,
DATE "2020-02-02"),
(
2,
20,
DATE "2020-02-02"),
(
1,
30,
DATE "2020-02-05");
Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.
"My first video!"
, "My second video!"
, "My vacations"
)."My first video!"
("2020-02-02"
)."2"
)."video"
("10"
, "20"
)."My vacations"
("1"
)."My first video!"
)."Bob Ross"
)."2"
).Consider the following SQL
code:
CREATE TABLE COMPUTER (
ID VARCHAR(20) PRIMARY KEY,
Model VARCHAR(40)
);
CREATE TABLE PERIPHERAL (
ID VARCHAR(20) PRIMARY KEY,
Model VARCHAR(40),
TYPE ENUM ('mouse', 'keyboard', 'screen', 'printer')
);
CREATE TABLE CONNEXION (
Computer VARCHAR(20),
Peripheral VARCHAR(20),
PRIMARY KEY (Computer, Peripheral),
FOREIGN KEY (Computer) REFERENCES COMPUTER (ID),
FOREIGN KEY (Peripheral) REFERENCES PERIPHERAL (ID)
);
INSERT INTO COMPUTER
VALUES (
'A',
'Apple IIc Plus'),
(
'B',
'Commodore SX-64');
INSERT INTO PERIPHERAL
VALUES (
'12',
'Trendcom Model',
'printer'),
(
'14',
'TP-10
Thermal Matrix',
'printer'),
(
'15',
'IBM Selectric',
'keyboard');
INSERT INTO CONNEXION
VALUES (
'A',
'12'),
(
'B',
'14'),
(
'A',
'15');
Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.
ID
is 'A'
('Apple IIc Plus'
).ID
is '14'
(printer
).NAME
starts with 'IBM'
('IBM Selectric'
).ID
is 'A'
(Trendcom Model, IBM Selectric
).2
).A friend of yours wants you to review and improve the code for a role-playing game.
The original idea was that each character has a name, a class (e.g., Bard, Assassin, Druid), a certain amount of experience, a level, one or more weapons (providing bonuses) and the ability to complete quests. A quest has a name and rewards the characters who completed it with a certain amount of experience and, on rare occaisions, with a special item.
Your friend came up with the following code:
CREATE TABLE CHARACTER(
VARCHAR(30) PRIMARY KEY,
Name VARCHAR(30),
Class INT,
XP INT,
LVL VARCHAR(30),
Weapon_Name INT,
Weapon_Bonus VARCHAR(30)
Quest_Completed
);
CREATE TABLE QUEST(
VARCHAR(20) PRIMARY KEY,
ID VARCHAR(30),
Completed_By INT,
XP_Gained VARCHAR(20),
Special_Item FOREIGN KEY (Completed_By) REFERENCES CHARACTER(Name)
);
ALTER TABLE CHARACTER
ADD FOREIGN KEY (Quest_Completed) REFERENCES QUEST(ID);
However, there are several problems with the code:
CHARACTER
table to add an arbitrary number of weapons ended up creating horrible messes.)Can you provide a relational model (there is no need to write the SQL
code, but do remember to indicate the primary and foreign keys) that would solve all of your friend’s troubles?
Consider the following code:
/* code/sql/HW_SimpleBook.sql */
DROP SCHEMA IF EXISTS HW_SimpleBook;
CREATE SCHEMA HW_SimpleBook;
USE HW_SimpleBook;
CREATE TABLE AUTHOR (
FName VARCHAR(30),
LName VARCHAR(30),
Id INT PRIMARY KEY
);
CREATE TABLE PUBLISHER (
NAME VARCHAR(30),
City VARCHAR(30),
PRIMARY KEY (NAME, City)
);
CREATE TABLE BOOK (
Title VARCHAR(30),
Pages INT,
Published DATE,
PublisherName VARCHAR(30),
PublisherCity VARCHAR(30),
FOREIGN KEY (PublisherName, PublisherCity) REFERENCES
PUBLISHER (NAME, City),
Author INT,
FOREIGN KEY (Author) REFERENCES AUTHOR (Id),
PRIMARY KEY (Title, Published)
);
INSERT INTO AUTHOR
VALUES (
"Virginia",
"Wolve",
01),
(
"Paul",
"Bryant",
02),
(
"Samantha",
"Carey",
03);
INSERT INTO PUBLISHER
VALUES (
"Gallimard",
"Paris"),
(
"Gallimard",
"New-York"),
(
"Jobs Pub.",
"New-York");
INSERT INTO BOOK
VALUES (
"What to eat",
213,
DATE '20170219',
"Gallimard",
"Paris",
01),
(
"Where to live",
120,
DATE '20130212',
"Gallimard",
"New-York",
02),
(
"My Life, I",
100,
DATE '18790220',
"Gallimard",
"Paris",
03),
(
"My Life, II",
100,
DATE '18790219',
"Jobs Pub.",
"New-York",
03);
The values inserted in the database is just to provide some examples; you should assume there is more data in it than what we have inserted. In this long problem, you will be asked to write commands to select, update, delete, insert data, and to improve upon the relational model.
Write a command that selects:
Title
of all the books.Name
of the publishers.Titles
and Published
dates of the books published since January 31, 2012."Gallimard"
(from any city)."New-York"
.ID
of the authors who published a book whose name starts with "Where"
."Wolve"
.ID
is 3
to "BANNED"
. Is there any reason for this command to be rejected by the system? If yes, explain the reason.
ID
is 3
and all the books written by that author. Make sure you do not violate any foreign key constraints.
A certificate for a website has a serial number (SN
) and a common name (CN
). It must belong to an organization and be signed by a certificate authority (CA
). The organization and CA
must both have an SN
and a CN
. A CA
can be trusted, not trusted, or not evaluated. The code below is an attempt to represent this situation and is populated with examples.
CREATE TABLE ORGANIZATION(
VARCHAR(30) PRIMARY KEY,
SN VARCHAR(30)
CN
);
CREATE TABLE CA(
VARCHAR(30) PRIMARY KEY,
SN VARCHAR(30),
CN BOOL
Trusted
);
CREATE TABLE CERTIFICATE(
VARCHAR(30) PRIMARY KEY,
SN Varchar(30),
CN VARCHAR(30) NOT NULL,
Org VARCHAR(30) NOT NULL,
Issuer DATE,
Valid_Since DATE,
Valid_Until FOREIGN KEY (Org)
REFERENCES ORGANIZATION(SN)
ON DELETE CASCADE,
FOREIGN KEY (Issuer) REFERENCES CA(SN)
);
INSERT INTO ORGANIZATION VALUES
'01', 'Wikimedia Foundation'),
('02', 'Free Software Foundation');
(
INSERT INTO CA VALUES
'A', "Let's Encrypt", true),
('B', 'Shady Corp.', false),
('C', 'NewComer Ltd.', NULL);
(
INSERT INTO CERTIFICATE VALUES
'a', '*.wikimedia.org', '01', 'A',
(20180101, 20200101),
'b', '*.fsf.org', '02', 'A',
(20180101, 20191010),
'c', '*.shadytest.org', '02', 'B',
(20190101, 20200101),
'd', '*.wikipedia.org', '01', 'C',
(20200101, 20220101);
CN
’s of all certificates ("*.wikimedia.org, \*.fsf.org, \*.shadytest.org, \*.wikipedia.org"
).SN
’s of the organizations whose CN
contains "Foundation"
("01, 02"
).CN
’s and expiration dates of all the certificates that expired, assuming today is the 6th of December 2019 ("\*.fsf.org"
, 2019 − 10 − 10).CN
’s of the CA
’s that are not trusted ("Shady Corp., NewComer Ltd."
),CN
’s of the certificates that are signed by a CA
that is not trusted ("\*.shadytest.org, \*.wikipedia.org"
).CA
whose CN
is "Let's encrypt"
(2
).CN
’s of the organizations along with the CN
’s of their certificates ("Wikimedia Foundation, \*.wikimedia.org, Free Software Foundation, \*.fsf.org, Free Software Foundation, \*.shadytest.org, Wikimedia Foundation, \*.wikipedia.org"
).DELETE FROM CA WHERE SN = 'A';
UPDATE ORGANIZATION SET CN = "FSF" WHERE SN = '02';
UPDATE ORGANIZATION SET SN = "01" WHERE SN = '02';
DELETE FROM ORGANIZATION;
Consider the following code:
/* code/sql/HW_Work.sql */
CREATE TABLE AUTHOR (
NAME VARCHAR(30) PRIMARY KEY,
Email VARCHAR(30)
);
CREATE TABLE WORK (
Title VARCHAR(30) PRIMARY KEY,
Author VARCHAR(30),
FOREIGN KEY (Author) REFERENCES AUTHOR (NAME) ON DELETE
CASCADE ON UPDATE CASCADE
);
CREATE TABLE BOOK (
ISBN INT PRIMARY KEY,
Work VARCHAR(30),
Published DATE,
Price DECIMAL(10, 2),
FOREIGN KEY (WORK) REFERENCES WORK (Title) ON DELETE
RESTRICT ON UPDATE CASCADE
);
CREATE TABLE EBOOK (
ISBN INT PRIMARY KEY,
Work VARCHAR(30),
Published DATE,
Price DECIMAL(10, 2),
FOREIGN KEY (WORK) REFERENCES WORK (Title) ON DELETE
RESTRICT ON UPDATE CASCADE
);
INSERT INTO AUTHOR
VALUES (
"Virginia W.",
"vw@isp.net"), -- A.1
(
"Paul B.", "pb@isp.net"), -- A.2
(
"Samantha T.", "st@fai.fr") -- A.3
;
INSERT INTO WORK
VALUES (
"What to eat",
"Virginia W.") -- W.1
;
INSERT INTO BOOK
VALUES (
15155627,
"What to eat",
DATE '20170219',
12.89) -- B.1
;
INSERT INTO EBOOK
VALUES (
15155628,
"What to eat",
DATE '20170215',
9.89) -- E.1
;
Assume the following:
SQL
’s syntax (there’s no “a semi-colon is missing” trap).Also, note that each row inserted between line 39 and 50 is given a name in comment ("A.1, A.2, A.3, W.1"
, etc.).
Determine if the following insertion statements would violate the entity integrity constraint, the referential integrity constraint, if there would be some other kind of error, or if it would result in successful insertion.
INSERT INTO EBOOK VALUES (0, NULL, 20180101, 0);
INSERT INTO AUTHOR VALUES("Mary B.", "mb@fai.fr", NULL);
INSERT INTO WORK VALUES("My Life", "Claude A.");
INSERT INTO BOOK VALUES(00000000, NULL, DATE'20001225', 90.9);
INSERT INTO AUTHOR VALUES("Virginia W.", "alt@isp.net");
List the rows (A.2, W.1, etc.) modified by the following statements. Be careful about the conditions on foreign keys!
UPDATE AUTHOR SET Email = 'Deprecated' WHERE Email LIKE '%isp.net';
UPDATE WORK SET Title = "How to eat" WHERE Title = "What to eat";
DELETE FROM WORK;
DELETE FROM AUTHOR WHERE Name = "Virginia W.";
Assume that there is more data than what we inserted. Write a command that selects:
Consider the following code:
/* code/sql/HW_TextbookAuthored.sql */
DROP SCHEMA IF EXISTS HW_TEXTBOOKAUTHORED;
CREATE SCHEMA HW_TEXTBOOKAUTHORED;
USE HW_TEXTBOOKAUTHORED;
CREATE TABLE TEXTBOOK (
Title VARCHAR(50),
ISBN CHAR(13) PRIMARY KEY,
Price DECIMAL(10, 2)
);
CREATE TABLE AUTHOR (
LName VARCHAR(30),
FName VARCHAR(30),
Email VARCHAR(30),
PRIMARY KEY (Lname, Fname)
);
CREATE TABLE AUTHORED (
Book CHAR(13),
FOREIGN KEY (Book) REFERENCES TEXTBOOK (ISBN),
AuthorLName VARCHAR(30),
AuthorFName VARCHAR(30),
FOREIGN KEY (AuthorLName, AuthorFName) REFERENCES AUTHOR
(LName, Fname)
);
INSERT INTO TEXTBOOK
VALUES (
'Starting Out with Java: Early Objects',
9780133776744,
30.00),
(
'NoSQL for Mere Mortals',
9780134023212,
47.99);
INSERT INTO AUTHOR
VALUES (
'Sullivan',
'Dan',
NULL),
(
'Gaddis',
'Tony',
NULL);
INSERT INTO AUTHORED
VALUES (
9780134023212,
'Sullivan',
'Dan'),
(
9780133776744,
'Gaddis',
'Tony');
The meaning of the AUTHORED
table is that a tuple < I, L, F> represents that the author whose last name is L and whose first name is F wrote the textbook whose ISBN is I.
Answer the following:
TEXTBOOK
table. No value should be NULL
.Price
attribute of the TEXTBOOK
relation.TEXTBOOK
table and have the price set to the default value.EDITOR
with three attributes: Name
, Address
, and Website
. The Name
attribute should be the primary key. Insert two tuples in the EDITOR
table, making sure that one should has the Name
attribute set to “Pearson.”PUBLISHED
with two attributes: Editor
and Textbook
. The Editor
attribute should reference the EDITOR
table and the Textbook
attribute should reference the TEXTBOOK
table.Answer the following short questions based on what is in our model so far:
ISBN
be inserted in the TEXTBOOK
relation?Consider the following code:
/* code/sql/HW_Capstone.sql */
DROP SCHEMA IF EXISTS HW_CAPSTONE;
CREATE SCHEMA HW_CAPSTONE;
USE HW_CAPSTONE;
CREATE TABLE STUDENT (
FName VARCHAR(50),
Id CHAR(13) PRIMARY KEY,
GraduationYear INT,
GraduationSemester ENUM ("Fall", "Spring", "Summer")
);
CREATE TABLE PROGRAMMING_LANGUAGE (
NAME VARCHAR(50) PRIMARY KEY,
Licence VARCHAR(50)
);
CREATE TABLE PROJECT (
CodeName VARCHAR(50),
Leader CHAR(13),
PRIMARY KEY (CodeName, Leader),
FOREIGN KEY (Leader) REFERENCES STUDENT (Id)
);
CREATE TABLE USED_LANGUAGE (
ProjectCodeName VARCHAR(50),
ProjectLeader CHAR(13),
UsedLanguage VARCHAR(50),
PRIMARY KEY (ProjectCodeName, ProjectLeader, UsedLanguage),
FOREIGN KEY (ProjectCodeName, ProjectLeader) REFERENCES
PROJECT (CodeName, Leader),
FOREIGN KEY (UsedLanguage) REFERENCES PROGRAMMING_LANGUAGE (NAME)
);
/*
*/
INSERT INTO STUDENT
VALUES (
"Mary",
"0123456789100",
2025,
"Summer"),
(
"Steve",
"0000000000000",
2025,
"Fall"),
(
"Claude",
"9999999999999",
2024,
"Fall"),
(
"Meghan",
"0987654321098",
2023,
"Spring");
INSERT INTO PROGRAMMING_LANGUAGE
VALUES (
"Rust",
"MIT"),
(
".NET Core",
"MIT"),
(
"Racket",
"LGPL"),
(
"Python",
"PSF");
-- Taken from
-- https://en.wikipedia.org/wiki/Comparison_of_open-source_programming_language_licensing
INSERT INTO PROJECT
VALUES (
"Brick Break",
"0123456789100"),
(
"Brick Break",
"0000000000000"),
(
"Grade Calculator",
"0123456789100"),
(
"Undecided",
"9999999999999");
INSERT INTO USED_LANGUAGE
VALUES (
"Brick Break",
"0123456789100",
"Rust"),
(
"Brick Break",
"0000000000000",
".NET Core"),
(
"Brick Break",
"0000000000000",
"Python"),
(
"Grade Calculator",
"0123456789100",
"Racket");
The meaning of the USED_LANGUAGE table is that a tuple < N, L, U> represents the fact that the project whose code name is N and whose leader is L uses the programming language U.
Answer the following short questions based on the model implemented above. You can simply answer “True” or “False”, or justify your reasoning (e.g. with code).
Draw the relational model corresponding to this code.
Write the following commands.
Consider the following code:
/* code/sql/HW_Vaccine.sql */
CREATE TABLE COMPANY (
Name VARCHAR(50) PRIMARY KEY,
Website VARCHAR(255) CHECK (Website LIKE "https://%")
);
CREATE TABLE DISEASE (
Name VARCHAR(50) PRIMARY KEY,
Communicable BOOL,
-- Whether the disease can be transmitted from a human to
-- another.
TYPE ENUM ("infectious", "deficiency", "hereditary")
);
CREATE TABLE VACCINE (
Name VARCHAR(50) PRIMARY KEY,
Manufacturer VARCHAR(50) NOT NULL,
FOREIGN KEY (Manufacturer) REFERENCES COMPANY (NAME) ON
UPDATE CASCADE
);
CREATE TABLE EFFICACY (
DiseaseName VARCHAR(50),
VaccineName VARCHAR(50),
Efficacy DECIMAl(5, 2),
PRIMARY KEY (DiseaseName, VaccineName),
FOREIGN KEY (DiseaseName) REFERENCES DISEASE (NAME),
FOREIGN KEY (VaccineName) REFERENCES VACCINE (NAME)
);
INSERT INTO COMPANY
VALUES (
"Moderna",
"https://www.modernatx.com/");
INSERT INTO DISEASE
VALUES (
"Coronavirus disease 2019",
TRUE,
"infectious");
INSERT INTO VACCINE
VALUES (
"mRNA-1273",
"Moderna");
INSERT INTO EFFICACY
VALUES (
"Coronavirus disease 2019",
"mRNA-1273",
94.1);
Answer the following short questions. In our implementation…
Answer the following questions:
What does CHECK (Website LIKE "https://*")
do?
Why did we picked the DECIMAl(5,2)
datatype?
What is the benefit / are the benefits of having a separate EFFICACY table over having something like
CREATE TABLE VACCINE(
VARCHAR(50) PRIMARY KEY,
Name VARCHAR(50),
Manufacturer VARCHAR(50),
Disease DECIMAl(5,2),
Efficacy FOREIGN KEY (Manufacturer) REFERENCES COMPANY (Name)
);
?
Draw the relational model corresponding to this code.
Write the following commands.
Consider the following code:
/* code/sql/HW_Residency.sql */
DROP SCHEMA IF EXISTS HW_RESIDENCY;
CREATE SCHEMA HW_RESIDENCY;
USE HW_RESIDENCY;
CREATE TABLE PERSON (
FName VARCHAR(40),
LName VARCHAR(40),
SSN VARCHAR(11) PRIMARY KEY,
Birthdate DATE
);
CREATE TABLE HOUSE (
Address VARCHAR(40) PRIMARY KEY,
Color ENUM ("blue", "white", "green")
);
CREATE TABLE RESIDENCY (
Person VARCHAR(11),
House VARCHAR(40),
PrincipalResidence BOOLEAN,
Status ENUM ("own", "rent", "squat", "other"),
FOREIGN KEY (Person) REFERENCES PERSON (SSN),
FOREIGN KEY (House) REFERENCES HOUSE (Address) ON DELETE CASCADE
);
INSERT INTO PERSON
VALUES (
NULL,
"Doe",
"000-00-0000",
NULL), -- P.1
(
"Michael", "Keal", "000-00-0001", DATE "1983-02-11"), -- P.2
(
"James", "Baldwin", "000-00-0002", DATE
"1967-01-01"), -- P.3
(
"Mridula", "Warrier", "000-00-0003", DATE "1990-02-11");
-- P.4
INSERT INTO HOUSE
VALUES (
"123 Main St.",
"blue"), -- H.1
(
"456 Second St.", "white"), -- H.2
(
"11 Third St.", "blue");
-- H.3
INSERT INTO RESIDENCY
VALUES (
"000-00-0001",
"123 Main St.",
TRUE,
"own"), -- R.1
(
"000-00-0001", "456 Second St.", FALSE, "own"), -- R.2
(
"000-00-0002", "123 Main St.", TRUE, "rent"), -- R.3
(
"000-00-0003", "456 Second St.", TRUE, "own");
-- R.4
Note that each row inserted in the PERSON
, HOUSE
and RESIDENCY
tables is given the name and noted as afterwards as a comment ("P.1, P.2, P.3, P.4, H.1"
, etc.).
Answer the following questions and problems, assuming that none of the commands in the rest of the problem are actually executed.
Draw the relational model corresponding to this series of commands (it is not necessary to include the state).
List the rows (e.g. “P.2”, “H.1”, or “none”) modified by the following statements:
UPDATE HOUSE SET COLOR = "green";
DELETE FROM RESIDENCY WHERE House LIKE "1%";
DELETE FROM HOUSE WHERE Address = "456 Second St.";
DELETE FROM PERSON WHERE Birthdate=DATE"1990-02-11";
Write queries that return the following information. The values returned in this set-up will be in parenthesis, but keep the queries general.
Addresses' of the houses in the system (
“11 Third St., 123 Main St., 456 Second St.”`).SSN
’s of the people whose first name was not entered in the system ("000-00-0000"
)."white, blue"
).Address
of the residency of "James Baldwin"
("123 Main St."
)."James"
)."Michael Keal"
’s principal residency address ("123 Main St."
)."Michael Keal, Mridula Warrier"
).SSN
’s of the people that have the same principal residency as "James Baldwin"
("000-00-0001"
).SSN
of "James Baldwin"
to "000-00-0010"
. Is there any reason for this command to be rejected by the system? If yes, explain the reason.
Answer the following short questions from the data in our model, as it is currently:
SSN
to be any series of 11 characters?RESIDENCY
table and give a possible primary key.
RESIDENCY
table is a good choice.
Consider the following code:
/* code/sql/HW_ScientificResearch.sql */
CREATE TABLE SCIENTIST (
SSN INT PRIMARY KEY,
Name VARCHAR(30) NOT NULL
);
CREATE TABLE PROJECT (
Code CHAR(4) PRIMARY KEY,
Name VARCHAR(150) NOT NULL
);
CREATE TABLE CONTRIBUTESTO (
Scientist INT,
Project CHAR(4),
Hours INT,
PRIMARY KEY (Scientist, Project),
FOREIGN KEY (Scientist) REFERENCES SCIENTIST (SSN),
FOREIGN KEY (Project) REFERENCES PROJECT (Code) ON DELETE
CASCADE ON UPDATE CASCADE
);
CREATE TABLE FUNDINGAGENCY (
Name VARCHAR(150) PRIMARY KEY,
TYPE ENUM ("State", "Federal", "Foundation"),
Creation YEAR
);
CREATE TABLE FUNDS (
Agency VARCHAR(150),
Project CHAR(4),
Amount DECIMAL(12, 2),
FOREIGN KEY (Agency) REFERENCES FUNDINGAGENCY (NAME) ON
UPDATE CASCADE ON DELETE RESTRICT,
FOREIGN KEY (Project) REFERENCES PROJECT (Code)
);
INSERT INTO SCIENTIST
VALUES (
"000000000",
"Mike"), -- S.1
(
"000000001", "Sabine"), -- S.2
(
"000000002", "James"), -- S.3
(
"000000003", "Emily"), -- S.4
(
"000000004", "Claire");
-- S.5
INSERT INTO PROJECT
VALUES (
"AA",
"Advancing Airplanes"), -- P.1
(
"BA", "Better Airplanes"), -- P.2
(
"BB", "Better Buildings"), -- P.3
(
"CC", "Creative Creation");
-- P.4
INSERT INTO CONTRIBUTESTO
VALUES (
"000000001",
"AA",
12), -- C.1
(
"000000001", "BB", 10), -- C.2
(
"000000002", "AA", 5), -- C.3
(
"000000003", "BA", 3), -- C.4
(
"000000000", "BB", 1), -- C.5
(
"000000000", "AA", 1);
-- C.6
INSERT INTO FUNDINGAGENCY
VALUES (
"National Science Foundation",
"Federal",
1950), -- FA.1
(
"French-American Cultural Exchange", "Foundation", 2017);
-- FA.2
INSERT INTO FUNDS
VALUES (
"National Science Foundation",
"AA",
100000), -- F.1
(
"French-American Cultural Exchange", "CC", 10000);
-- F.2
Note that each row inserted in the tables is given a name and noted as afterwards as a comment ("S.1, S.2, P.1, C.1, FA.1"
, etc.).
Answer the following questions and problems, assuming that none of the commands in the rest of the problem are actually executed.
Draw the relational model corresponding to this series of commands (it is not necessary to include the state).
Draw the relational model corresponding to this series of commands (no need to include the state).
How could you edit line 12 so that negative values and NULL
would not be admitted as values for Hours
?
Write a command that would violate the referential integrity constraint.
List the rows affected (updated or deleted) by the following commands. If no rows are affected because the command would would violate the entity integrity constraint, the referential integrity constraint, or if there would be some other kind of error, please indicate it.
UPDATE SCIENTIST SET SSN = "000000001" WHERE Name = "Claire";
UPDATE FUNDINGAGENCY SET Name = "NSF" WHERE Name = "National Science Foundation";
DELETE FROM FUNDINGAGENCY WHERE Name = "French-American Cultural Exchange";
Write a query that selects …(In parenthesis, the values returned in this set-up, but you have to be general.)
…the name of the funding agencies created after 2000 ("French-American Cultural Exchange")
…the code of the projects that contains the word "Airplanes" ("AA", "BA")
…the number of hours scientists contributed to the project "AA" (18)
…the code of the projects to which the scientist named Sabine contributed ("AA", "BB")
…the name of the projects who benefited from federal funds ("Advancing Airplanes")
…the name of the scientist who contributed to the same project as Mike ("Sabine", "James")
…the name of the projects that are not funded by an agency ("Better Airplanes", "Better Buildings")
…the name of the scientist who contributed the most (in terms of hours) to the project named "Advancing Airplanes" (Sabine).
Identify and discuss two limitations of this model, and offer a way to remedy at least one of them.
Consider the following code:
CREATE TABLE ROOM(
VARCHAR(40) PRIMARY KEY,
Nickname INT,
Size BOOL NOT NULL
ComputerOrPhoneInIt
);
CREATE TABLE COMPUTER(
VARCHAR(40) PRIMARY KEY,
Nickname VARCHAR(50),
OperatingSystem VARCHAR(40),
Room FOREIGN KEY (Room) REFERENCES ROOM(Nickname)
);
CREATE TABLE PHONE(
VARCHAR(40) PRIMARY KEY,
Nickname VARCHAR(50),
OperatingSystem VARCHAR(40),
Room FOREIGN KEY (Room) REFERENCES ROOM(Nickname)
);
CREATE TABLE PRINTER(
VARCHAR(40) PRIMARY KEY,
Nickname VARCHAR(40),
ConnectedTo VARCHAR(40),
Room FOREIGN KEY (Room) REFERENCES ROOM(Nickname)
);
It was written by some friends of yours to store data for their printing station: their shop offers computers, phones and printers located in different rooms, and they want to keep track of some information about those. They have multiple issues with this implementation, and require your help to identify them and design a new model addressing those.
For each issue listed below, explain what causes it, and if there is a way to address it (you do not need to write actual code, simply explain how you would proceed, using keywords if it clarifies).
ROOM.ComputerOrPhoneInIt
is a bit cumbersome, as they have to remember to update it to FALSE
if the last computer or phone is removed from a room.OperatingSystem
attributes in PHONE
and COMPUTER
are not very convenient, as they do not provide an easy way for instance to list all the windows computers and phones, or all the 64-bits architectures. Examples of values are “Windows 10 IoT Core version 1.8.9, 64 bits”,“Android Pie”, “macOS 11 Big Sur, updated last week”, etc.Draw a relational model (no need to write sql
code) that would improve their implementation. You are free (and encouraged!) to create new relations and alter the original attributes. No need to specify the domains unless you want to add particular constraints, but remember to draw the primary and foreign keys.
Consider the following code:
CREATE TABLE COMPUTER (
VARCHAR(20) PRIMARY KEY,
ID VARCHAR(40)
Model
);
CREATE TABLE PERIPHERAL (
VARCHAR(20) PRIMARY KEY,
ID VARCHAR(40),
Model Type ENUM ('mouse', 'keyboard', 'screen', 'printer'),
DATETIME
LastConnexion
);
CREATE TABLE CONNEXION (
VARCHAR(20),
Computer VARCHAR(20),
Peripheral PRIMARY KEY (Computer, Peripheral),
FOREIGN KEY (Computer) REFERENCES COMPUTER (ID)
ON DELETE CASCADE
ON UPDATE CASCADE,
FOREIGN KEY (Peripheral) REFERENCES PERIPHERAL (ID)
ON DELETE RESTRICT
ON UPDATE CASCADE
);
CREATE TRIGGER last_connexion_update
INSERT ON CONNEXION
BEFORE FOR EACH ROW
UPDATE PERIPHERAL
SET LastConnexion = NOW()
WHERE NEW.Peripheral = PERIPHERAL.ID;
INSERT INTO COMPUTER
VALUES
'A','Apple IIc Plus'), -- C.1
('B','Commodore SX-64'); -- C.2
(
INSERT INTO PERIPHERAL(ID, Model, Type)
VALUES
'12', 'Trendcom Model', 'printer'), -- P.1
('14', 'TP-10 Thermal Matrix', 'printer'), -- P.2
('15', 'IBM Selectric', 'keyboard'); -- P.3
(
INSERT INTO CONNEXION
VALUES
'A', '12'), -- X.1
('B', '14'), -- X.2
('A', '15'); -- X.3 (
True | False | |
A peripheral can be connected to multiple computers | ||
The ID of a computer must be a letter |
||
Whether the connexion is wired or wireless can be determined | ||
Every computer must have a different model | ||
A peripheral can be a mouse and a keyboard at the same time | ||
A computer can be connected to multiple peripheral | ||
A computer can be connected to another computer | ||
A peripheral can be connected to another peripheral |
List the rows (i.e., C.2, X.1, or even “none”) deleted by the following statements:
DELETE FROM CONNEXION WHERE Computer = 'A';
DELETE FROM COMPUTER WHERE ID = 'A';
DELETE FROM PERIPHERAL WHERE ID = '15';
DELETE FROM CONNEXION WHERE Computer <> 'A';
Write a query that selects …(In parenthesis, the values returned in this set-up, but you have to be general.)
Id
12 (printer)ID
of the computer whose model name contain "Apple" (A).ID
of the computer connected to a keyboard (A)INSERT INTO CONNEXION VALUES ('B', '12');
is executed.
SQL
)This problem is supposed to be a straightforward application of what we studied in class. Look back at Setting Up Your Work Environment if you feel like you are stuck before referencing this solution.
We simply log-in as indicated in the “Logging-in as testuser” section. Then we enter:
CREATE DATABASE HW_Address;
USE HW_Address;
This creates the tables asked for in the problem.
Ommiting the Extra
column, we have:
MariaDB [HW_Address]> DESC ADDRESS;
+------------+-------------+------+-----+---------+
| Field | Type | Null | Key | Default |
+------------+-------------+------+-----+---------+
| StreetName | varchar(15) | NO | PRI | NULL |
| Number | int(11) | NO | PRI | NULL |
| Habitants | int(11) | YES | | NULL |
+------------+-------------+------+-----+---------+
We add the foreign key, still omitting the Extra
column:
MariaDB [HW_Address]> DESC ADDRESS;
+------------+-------------+------+-----+---------+
| Field | Type | Null | Key | Default |
+------------+-------------+------+-----+---------+
| StreetName | varchar(15) | NO | PRI | NULL |
| Number | int(11) | NO | PRI | NULL |
| Habitants | int(11) | YES | MUL | NULL |
+------------+-------------+------+-----+---------+
The only difference is the MUL
value, which is a bit surprising: quoting https://dev.mysql.com/doc/refman/8.0/en/show-columns.html,
If
Key
isMUL
, then the column is the first column of a nonunique index in which multiple occurrences of a given value are permitted within the column.
In other words, this does not carry any information about the fact that ADDRESS.Habitants
is now a foreign key referencing NAME.ID
. A way of displaying information about that foreign key is using SHOW CREATE TABLE
:
MariaDB [HW_Address]> SHOW CREATE TABLE ADDRESS;
+---------+----------------------+
| Table | Create Table
+---------+----------------------+
| ADDRESS | CREATE TABLE `ADDRESS` (
`StreetName` varchar(15) NOT NULL,
`Number` int(11) NOT NULL,
`Habitants` int(11) DEFAULT NULL,
PRIMARY KEY (`StreetName`,`Number`),
KEY `Habitants` (`Habitants`),
CONSTRAINT `ADDRESS_ibfk_1` FOREIGN KEY (`Habitants`) REFERENCES `NAME` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+---------+----------------------+
1 row in set (0.01 sec)
To display the information back, we can use
SELECT * FROM NAME;
We should notice that the ID
attribute values lost their leading zeros.
This syntax is better for “bulk insertion” since it allows for us to write fewer commands and to focus on the data being inserted. However, if an error occurs, then nothing gets inserted.
SELECT ID FROM NAME WHERE FName = 'Samantha';
This is a command that violates the entity integrity constraint:
INSERT INTO NAME VALUES ('Maria', 'Kashi', NULL);
The error message that it returns is:
ERROR 1048 (23000): Column 'ID' cannot be null
Another way of violating the entity integrity constraint is:
INSERT INTO NAME VALUES ('Maria', 'Kashi', 80);
The error message that it returns is:
ERROR 1062 (23000): Duplicate entry '80' for key 'PRIMARY'
This is an UPDATE
statement that violates the entity integrity constraint:
UPDATE ADDRESS SET Habitants = 340 WHERE Number = 120;
The error message that it returns is:
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_Address`.`ADDRESS`, CONSTRAINT `ADDRESS_ibfk_1` FOREIGN KEY (`Habitants`) REFERENCES `NAME` (`ID`))
Here is the query that violates another type of constraint:
INSERT INTO NAME VALUE ('Hi');
The error message that it returns is:
ERROR 1136 (21S01): Column count does not match value count at row 1
The query statement violates the implicit constraint by trying to insert a row with fewer values than there are attributes in the table.
Another example of a statement that violates another type of constraint is:
INSERT INTO ADDRESS VALUES ('Maria', 'Random', 98);
This is a violation of an explicit constraint, which is that the value must match the domain (datatype) of the attribute where it is inserted. However, MySQL and MariaDB do not return an error, they simply replace 'Random'
with 0
.
SQL
)Here is how we created our table:
CREATE SCHEMA HW_REPETITION;
USE HW_REPETITION;
CREATE TABLE EXAMPLE(
VARCHAR(15),
X INT
Y );
The command to add a tuple to our table is:
INSERT INTO EXAMPLE VALUES('Train', 4);
If we execute this command twice, then SQL
is OK with it, and inserts the same tuple twice:
SELECT * FROM EXAMPLE;
Displays:
+-------+------+
| X | Y |
+-------+------+
| Train | 4 |
| Train | 4 |
+-------+------+
This is an illustration of the fact that the data in a table in SQL
is not a set, as opposed to a state in a relation in the relational model.
The command:
ALTER TABLE EXAMPLE ADD PRIMARY KEY (X);
Should return:
ERROR 1062 (23000): Duplicate entry 'Train' for key 'PRIMARY'
We tried to declare that X
was a primary key, but SQL
disagreed, since two rows have the same value for that attribute.
Once the table is empty, X
now qualifies as a candidate key, and can now be made a primary key. SQL
stops complaining and lets us assign it as a primary key.
After trying this insertion statement twice:
INSERT INTO EXAMPLE VALUES('Train', 4);
SQL
refuses to insert the tuple after the second attempt:
ERROR 1062 (23000): Duplicate entry 'Train' for key 'PRIMARY'
Notice that this is exactly the same error message as before, when we tried to add the primary key while we had a duplicate row of tuples!
Removing the PRIMARY KEY
constraint, SQL
throws the following error message:
1005 (HY000): Can't create table `HW_FK_test`.`SOURCE` (errno: 150 "Foreign key constraint is incorrectly formed") ERROR
Replacing PRIMARY KEY
with UNIQUE
does not generate any error messages.
Replacing one of the VARCHAR(25)
with CHAR(25)
does not generate any error messages.
Replacing VARCHAR(25)
with INT
results in this error message:
1005 (HY000): Can't create table `HW_FK_test`.`SOURCE` (errno: 150 "Foreign key constraint is incorrectly formed") ERROR
Replacing one of the VARCHAR(25)
with VARCHAR(15)
does not generate any error messages.
The remarks become:
- The datatype of the foreign key has to be “compatible” with the datatype of the attribute to which we are referring.
- The target of the foreign key must be the primary key or have the
UNIQUE
constraint.
Ignoring the LECTURE relation, we have:
The code is straightforward:
CREATE TABLE HW_Lecture (
NAME VARCHAR(25),
Instructor VARCHAR(25),
Year YEAR (4),
Code CHAR(5),
PRIMARY KEY (Year, Code),
FOREIGN KEY (Instructor) REFERENCES PROF (LOGIN)
);
INSERT INTO HW_Lecture
VALUES (
'Intro to CS',
'caubert',
2017,
'1304'),
(
'Intro
to Algebra',
'perdos',
2017,
'1405'),
(
'Intro to
Cyber',
'aturing',
2017,
'1234');
However, this representation can not handle the following situations:
We come back to those short-coming in the “Reverse-Engineering” section, using more abstract tools (such as Entity Diagrams) that have not been introduced yet.
The statements are immediate:
HW_ProfExampleRevisitedRevisited.sqlWhat may be surprising is that the values for LectureCode
and LectureYear
are set to NULL
in all the tuples.
We use UPDATE
statements:
UPDATE
GRADE
SET LectureCode = '1304',
LectureYear = 2017
WHERE LOGIN = 'jrakesh'
AND Grade = '2.85';
UPDATE
GRADE
SET LectureCode = '1405',
LectureYear = 2017
WHERE LOGIN = 'svlatka'
OR (LOGIN = 'jrakesh'
AND Grade = '3.85');
UPDATE
GRADE
SET LectureCode = '1234',
LectureYear = 2017
WHERE LOGIN = 'aalyx'
OR LOGIN = 'cjoella';
We use SELECT
statements:
SELECT LOGIN,
Grade
FROM GRADE
WHERE Lecturecode = '1304'
AND LectureYear = '2017';
SELECT DISTINCT Instructor
FROM HW_Lecture
WHERE Year = 2017;
SELECT Name,
Grade
FROM STUDENT,
GRADE
WHERE GRADE.LectureCode = 1405
AND STUDENT.Login = GRADE.Login;
SELECT Year
FROM HW_Lecture
WHERE Code = '1234';
SELECT Name
FROM HW_Lecture
WHERE Year IN (
SELECT Year
FROM HW_Lecture
WHERE CODE = '1234');
SELECT B.name
FROM STUDENT AS A,
STUDENT AS B
WHERE A.Name = 'Ava Alyx'
AND A.Registered > B.Registered;
SELECT COUNT(DISTINCT PROF.Name) AS 'Head Teaching This Year'
FROM HW_Lecture,
DEPARTMENT,
PROF
WHERE Year = 2017
AND Instructor = Head
AND Head = PROF.Login;
SQL
coding)The code below includes the answers to all of the questions for this problem:
-- Question 1:
CREATE TABLE TRAIN (
Id VARCHAR(30) PRIMARY KEY, -- This line was changed.
Model VARCHAR(30),
ConstructionYear YEAR (4)
);
-- Question 2 :
CREATE TABLE CONDUCTOR (
Id VARCHAR(20),
NAME VARCHAR(20),
ExperienceLevel VARCHAR(20)
);
ALTER TABLE CONDUCTOR
ADD PRIMARY KEY (Id);
-- Question 3
CREATE TABLE ASSIGNED_TO (
TrainId VARCHAR(20),
ConductorId VARCHAR(20),
Day DATE,
PRIMARY KEY (TrainId, ConductorId),
FOREIGN KEY (TrainId) REFERENCES TRAIN (Id), -- This line was changed
FOREIGN KEY (ConductorId) REFERENCES CONDUCTOR (Id) -- This line was changed
);
-- Question 4:
/*
We insert more than one tuple, to make the SELECT statements that follow easier
to test and debug.
*/
INSERT INTO TRAIN
VALUES (
'K-13',
'SurfLiner',
2019),
(
'K-12',
'Regina',
2015);
INSERT INTO CONDUCTOR
VALUES (
'GP1029',
'Bill',
'Junior'),
(
'GP1030',
'Sandrine',
'Junior');
INSERT INTO ASSIGNED_TO
VALUES (
'K-13',
'GP1029',
DATE '2015/12/14'),
(
'K-12',
'GP1030',
'20120909');
-- Question 5:
UPDATE
CONDUCTOR
SET ExperienceLevel = 'Senior'
WHERE Id = 'GP1029';
-- Question 6:
-- 1.
SELECT Id
FROM TRAIN;
-- 2.
SELECT Name
FROM CONDUCTOR
WHERE ExperienceLevel = 'Senior';
-- 3.
SELECT ConstructionYear
FROM TRAIN
WHERE Model = 'SurfLiner'
OR Model = 'Regina';
-- 4.
SELECT ConductorId
FROM ASSIGNED_TO
WHERE TrainId = 'K-13'
AND Day = '2015/12/14';
-- 5.
SELECT Model
FROM TRAIN,
ASSIGNED_TO
WHERE ConductorID = 'GP1029'
AND TrainId = TRAIN.ID;
SQL
statements for the COFFEE database)Solution to Question 1:
The answers to the rest of the questions are in the following code:
/* code/sql/HW_DBCoffee.sql */
-- Question 2:
START TRANSACTION;
INSERT INTO CUSTOMER
VALUES (
005,
'Bob Hill',
NULL,
001);
INSERT INTO COFFEE
VALUES (
002,
"Peru",
"Decaf",
3.00);
-- The following statement raises an error.
-- INSERT INTO PROVIDER
-- VALUES (NULL, "contact@localcof.com");
-- ERROR 1048 (23000) at line 68: Column
-- 'Name'
-- cannot
-- be
-- null
INSERT INTO SUPPLY
VALUES (
"Johns & Co.",
121);
-- The following statement raises an error.
-- -INSERT INTO SUPPLY
-- VALUES ("Coffee Unl.", 311, 221);
-- ERROR 1136 (21S01): Column count
-- doesn't
-- match
-- value
-- count at row 1
-- Rest the changes:
ROLLBACK;
-- Question 3:
START TRANSACTION;
UPDATE
CUSTOMER
SET FavCoffee = 001
WHERE CardNo = 001;
-- Rows matched: 1 Changed: 1 Warnings: 0
SELECT *
FROM CUSTOMER;
ROLLBACK;
START TRANSACTION;
UPDATE
COFFEE
SET TypeOfRoast = 'Decaf'
WHERE Origin = 'Brazil';
-- Rows matched: 2 Changed: 2 Warnings: 0
SELECT *
FROM COFFEE;
ROLLBACK;
START TRANSACTION;
UPDATE
PROVIDER
SET Name = 'Coffee Unlimited'
WHERE Name = 'Coffee Unl.';
-- Rows matched: 1 Changed: 1 Warnings: 0
SELECT *
FROM PROVIDER;
SELECT *
FROM SUPPLY;
ROLLBACK;
START TRANSACTION;
UPDATE
COFFEE
SET PricePerPound = 10.00
WHERE PricePerPound > 10.00;
-- Rows matched: 1 Changed: 1 Warnings: 0
SELECT *
FROM COFFEE;
ROLLBACK;
-- Question 4:
START TRANSACTION;
DELETE FROM CUSTOMER
WHERE Name LIKE '%S%';
-- Query OK, 2 rows affected (0.01 sec)
SELECT *
FROM CUSTOMER;
ROLLBACK;
START TRANSACTION;
DELETE FROM COFFEE
WHERE Ref = 001;
-- Query OK, 1 row affected (0.00 sec)
SELECT *
FROM COFFEE;
SELECT *
FROM SUPPLY;
ROLLBACK;
START TRANSACTION;
DELETE FROM SUPPLY
WHERE Provider = 'Coffee Unl.'
AND Coffee = '001';
-- Query OK, 1 row affected (0.00 sec)
SELECT *
FROM SUPPLY;
ROLLBACK;
START TRANSACTION;
DELETE FROM PROVIDER
WHERE Name = 'Johns & Co.';
-- Query OK, 1 row affected (0.00 sec)
SELECT *
FROM PROVIDER;
SELECT *
FROM SUPPLY;
ROLLBACK;
-- Question 5:
-- 1.
SELECT Origin
FROM COFFEE
WHERE TypeOfRoast = 'Dark';
-- 2.
SELECT FavCoffee
FROM CUSTOMER
WHERE Name LIKE 'Bob%';
-- 3.
SELECT Name
FROM PROVIDER
WHERE Email IS NULL;
-- 4.
SELECT COUNT(*)
FROM SUPPLY
WHERE Provider = 'Johns & Co.';
-- 5.
SELECT Provider
FROM COFFEE,
SUPPLY
WHERE TypeOfRoast = 'Dark'
AND Coffee = Ref;
SELECT EMPLOYEE.Name
FROM EMPLOYEE,
DEPARTMENT
WHERE DEPARTMENT.Name = "Storage"
AND EMPLOYEE.Department = DEPARTMENT.ID;
SELECT EMPLOYEE.Name
FROM EMPLOYEE,
DEPARTMENT
WHERE Hired <= ALL (
SELECT Hired
FROM EMPLOYEE
WHERE Hired IS NOT NULL
AND DEPARTMENT.Name = "Storage"
AND EMPLOYEE.Department = DEPARTMENT.ID)
AND DEPARTMENT.Name = "Storage"
AND EMPLOYEE.Department = DEPARTMENT.ID;
SELECT Model
FROM COMPUTER
WHERE ID = 'A';
SELECT TYPE
FROM PERIPHERAL
WHERE ID = '14';
SELECT Model
FROM PERIPHERAL
WHERE TYPE = 'printer';
SELECT Model
FROM PERIPHERAL
WHERE Model LIKE 'IBM%';
SELECT Model
FROM PERIPHERAL,
CONNEXION
WHERE Computer = 'A'
AND Peripheral = PERIPHERAL.ID;
SELECT COUNT(Computer)
FROM CONNEXION,
COMPUTER
WHERE Model = 'Apple IIc Plus'
AND Computer = COMPUTER.ID;
/* code/sql/HW_SocialMedia.sql */
-- … the title of all the videos ("My first video!", "My
-- second video!", "My vacations").
SELECT TITLE
FROM VIDEO;
-- … the release date of the video whose title is "My first
-- video!" ("2020-02-02").
SELECT Released
FROM VIDEO
WHERE Title = "My first video!";
-- … the ID of the account(s) where the "Name" attribute
-- was not given ("2").
SELECT ID
FROM ACCOUNT
WHERE Name IS NULL;
-- … the ID of the videos whose title contains the word
-- "video" ("10", "20").
SELECT ID
FROM VIDEO
WHERE TITLE LIKE "%video%";
-- or
SELECT ID
FROM VIDEO
WHERE Title REGEXP 'video';
-- … the number of thumbs up for the video with title "My
-- vacations" ("1").
SELECT COUNT(*)
FROM THUMBS_UP,
VIDEO
WHERE VIDEO.Title = "My vacations"
AND VIDEO.ID = THUMBS_UP.Video;
-- … the title of the oldest video ("My first video!").
SELECT Title
FROM VIDEO
WHERE Released <= ALL (
SELECT Released
FROM VIDEO);
-- or
SELECT Title
FROM VIDEO
WHERE Released = (
SELECT Min(Released)
FROM VIDEO);
-- or even
SELECT Title
FROM VIDEO
ORDER BY Released ASC
LIMIT 1;
-- … the names of the accounts who gave a thumbs up to the
-- video with id 30 ("Bob Ross").
SELECT Name
FROM ACCOUNT,
THUMBS_UP
WHERE THUMBS_UP.Video = 30
AND THUMBS_UP.Account = ACCOUNT.ID;
-- … the ID of the account with the greatest number of
-- subscribers ("2").
SELECT Subscribed
FROM SUBSCRIBE
GROUP BY Subscribed
ORDER BY COUNT(Subscriber) DESC
LIMIT 1;
The following solves all the issues with your friend’s code design. As quests only rarely provide a special item, we added a relation to avoid having a Special-item
in the QUEST
table since that would be NULL
too often.
Here are possible ways of getting the required information:
The Title
of all the books:
SELECT Title FROM BOOK;
The distinct Name
of the publishers.
SELECT DISTINCT Name FROM PUBLISHER;
The Titles
and Published
dates of the books published since January 31, 2012.
SELECT Title, Published FROM BOOK
WHERE Published > DATE'20120131';
The first and last names of the authors published by "Gallimard"
(from any city).
SELECT FName, LName FROM AUTHOR, BOOK
WHERE PublisherName = "Gallimard"
AND Author = ID;
The first and last names of the authors who were not published by an editor in "New-York"
.
SELECT FName, LName FROM AUTHOR, BOOK
WHERE NOT PublisherCity= "New-York"
AND Author = ID;
The ID
of the authors who published a book whose name starts with "Where"
.
SELECT Author FROM BOOK
WHERE Title LIKE 'Where%';
The total number of pages in the database.
SELECT SUM(Pages) FROM BOOK;
The number of pages in the longest book written by the author whose last name is "Wolve"
.
SELECT MAX(PAGES) FROM BOOK, AUTHOR
WHERE LName = "Wolve"
AND Author = ID;
The title of the books published in the 19th century.
SELECT Title FROM BOOK
WHERE Published >= DATE'18010101'
AND Published <= DATE'19001231';
UPDATE BOOK SET Title = "BANNED"
WHERE Author = 3;
The pair (title, publication date) is the primary key in the BOOK
table, so if the author whose ID is 3 has published more than one book at a particular date, then our update will be rejected, as applying it would result in violating the entity integrity constraint.
DELETE FROM BOOK WHERE Author = 3;
DELETE FROM AUTHOR WHERE ID = 3;
Note that trying to delete the rows in the AUTHOR
table before deleting the rows in the BOOK
table could cause a referential integrity violation, since the BOOK
table has a foreign key assigned to the AUTHOR
table’s Id
field.
CREATE TABLE AWARD(
VARCHAR(30),
Name Year DATE,
VARCHAR(30),
BookTitle DATE,
BookPubDate FOREIGN KEY (BookTitle, BookPubDate)
REFERENCES BOOK(Title, Published),
PRIMARY KEY (Name, Year)
);
Note that there is no need to store the name of the author in this relation: this information can be recovered by looking in the BOOK
table for the name of the author of the awarded book.
Note that having two attributes as the primary key makes the referencing of foreign keys more cumbersome.
Two of the flaws that come to mind are:
BOOK
relation: two books with the same title cannot be published on the same day, which is a serious limitation. Using a primary key like ISBN would be much more appropriate.IS_THE_AUTHOR_OF
and PUBLISHED_BY
, that “maps” the book’s ISBN with author’s or editor’s primary key.The solution can be read from the following code:
/* code/sql/HW_Certificate.sql */
DROP SCHEMA IF EXISTS HW_Certificate;
CREATE SCHEMA HW_Certificate;
USE HW_Certificate;
/*
SN = Serial Number
CN = Common Name
CA = Certificate Authority
*/
CREATE TABLE ORGANIZATION (
SN VARCHAR(30) PRIMARY KEY,
CN VARCHAR(30)
);
CREATE TABLE CA (
SN VARCHAR(30) PRIMARY KEY,
CN VARCHAR(30),
Trusted BOOL
);
CREATE TABLE CERTIFICATE (
SN VARCHAR(30) PRIMARY KEY,
CN VARCHAR(30) NOT NULL,
Org VARCHAR(30) NOT NULL,
Issuer VARCHAR(30),
Valid_Since DATE,
Valid_Until DATE,
FOREIGN KEY (Org) REFERENCES ORGANIZATION (SN) ON DELETE CASCADE,
FOREIGN KEY (Issuer) REFERENCES CA (SN)
);
INSERT INTO ORGANIZATION
VALUES (
'01',
'Wikimedia Foundation'),
(
'02',
'Free
Software Foundation');
INSERT INTO CA
VALUES (
'A',
"Let's Encrypt",
TRUE),
(
'B',
'Shady Corp.',
FALSE),
(
'C',
'NewComer Ltd.',
NULL);
INSERT INTO CERTIFICATE
VALUES (
'a',
'*.wikimedia.org',
'01',
'A',
20180101,
20200101),
(
'b',
'*.fsf.org',
'02',
'A',
20180101,
20191010),
(
'c',
'*.shadytest.org',
'02',
'B',
20190101,
20200101),
(
'd',
'*.wikipedia.org',
'01',
'C',
20200101,
20220101);
-- CN of all certificates.
SELECT CN
FROM CERTIFICATE;
-- (*.wikimedia.org | *.fsf.org | *.shadytest.org |
-- *.wikipedia.org)
-- The SN of the organizations whose CN
-- contains
-- "Foundation"
SELECT SN
FROM ORGANIZATION
WHERE CN LIKE "%Foundation%";
-- (01 | 02)
-- The CN and expiration date of all the
-- certificates
-- that
-- expired (assuming we are the 6th of
-- December
-- 2019).
SELECT CN,
Valid_Until
FROM CERTIFICATE
WHERE Valid_Until < DATE '20191206';
-- (*.fsf.org, 2019-10-10)
-- The CN of the CA that are not trusted.
SELECT CN
FROM CA
WHERE Trusted IS NOT TRUE;
-- (Shady Corp. | NewComer Ltd.)
-- The CN of the certificates that are
-- signed
-- by
-- a
-- CA
-- that
-- is not trusted.
SELECT CERTIFICATE.CN
FROM CERTIFICATE,
CA
WHERE Trusted IS NOT TRUE
AND CA.SN = CERTIFICATE.Issuer;
-- (Shady Corp. | NewComer Ltd.)
-- The number of certificates signed by
-- the
-- CA
-- whose
-- CN
-- is
-- "Let's encrypt".
SELECT COUNT(CERTIFICATE.SN) AS "Number of certificates signed
by Let's encrypt"
FROM CERTIFICATE,
CA
WHERE CERTIFICATE.Issuer = CA.SN
AND CA.CN = "Let's encrypt";
-- (2)
-- A table listing the CN of the
-- organizations
-- along
-- with
-- the CN of their certificates.
SELECT ORGANIZATION.CN AS Organization,
CERTIFICATE.CN AS Certificate
FROM ORGANIZATION,
CERTIFICATE
WHERE CERTIFICATE.Org = ORGANIZATION.SN;
-- ( Wikimedia Foundation, *.wikimedia.org | Free Software
-- Foundation, *.fsf.org | Free Software
-- Foundation
-- ,
-- *.shadytest.org | Wikimedia Foundation ,
-- *.wikipedia.org
-- )
/*
DELETE FROM CA WHERE SN = 'A';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`HW_Certificate`.`CERTIFICATE`, CONSTRAINT `CERTIFICATE_ibfk_2` FOREIGN KEY (`Issuer`) REFERENCES `CA` (`SN`))
=> Rejected, because an entry in CERTIFICATE references this tuple (referential integrity constraint).
UPDATE ORGANIZATION SET CN = "FSF" WHERE SN = '02';
Query OK, 1 row affected (0.008 sec)
Rows matched: 1 Changed: 1 Warnings: 0
=> Ok, change
('02', 'Free Software Foundation');
into
('02', 'FSF');
in ORGANIZATION
MariaDB [HW_Certificate]> UPDATE ORGANIZATION SET SN = "01" WHERE SN = '02';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`HW_Certificate`.`CERTIFICATE`, CONSTRAINT `CERTIFICATE_ibfk_1` FOREIGN KEY (`Org`) REFERENCES `ORGANIZATION` (`SN`) ON DELETE CASCADE)
=> Rejected, because an entry in CERTIFICATE references this tuple (referential integrity constraint).
This query would have been rejected even if this tuple was not referenced, since it would have violated the entity integrity constraint.
DELETE FROM ORGANIZATION;
=> Deletes all the content of organization and of certificate.
*/
The relational model for this code is:
The solution to the next questions can be read from the following code:
/* code/sql/HW_Work.sql */
/*
Determine if the following insertion statements would violate the the Entity integrity constraint,
the Referential integrity constraint, if there would be some Other kind of error, or if it would
result in uccessful insertion.
*/
START TRANSACTION;
-- We don't want to perform the actual insertions.
INSERT INTO EBOOK
VALUES (
0,
NULL,
20180101,
0);
/*
Query OK, 1 row affected (0.003 sec)
So, "Successful insertion".
*/
-- The following statement raises an error.
-- INSERT INTO AUTHOR VALUES ("Mary B.", "mb@fai.fr", NULL);
/*
ERROR 1136 (21S01): Column count doesn't match value count at row 1
So, "Other kind of error".
*/
-- The following statement raises an error.
-- INSERT INTO WORK VALUES ("My Life", "Claude A.");
/*
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
(`HW_EXAM_1`.`WORK`, CONSTRAINT `WORK_ibfk_1` FOREIGN KEY (`Author`) REFERENCES `AUTHOR` (`Name`)
ON DELETE CASCADE ON UPDATE CASCADE)
So, "Referential integrity constraint"
*/
INSERT INTO BOOK
VALUES (
00000000,
NULL,
DATE '20001225',
90.9);
/*
Query OK, 1 row affected (0.000 sec)
So, "Successful insertion".
*/
-- The following statement raises an error.
-- INSERT INTO AUTHOR VALUES ("Virginia W.", "alt@isp.net");
/*
ERROR 1062 (23000): Duplicate entry 'Virginia W.' for key 'PRIMARY'
So, "Entity integrity constraint".
*/
ROLLBACK;
-- We go back to the previous state.
/*
List the rows (i.e., A.2, W.1, etc.) modified by the following statements
(be careful about the conditions on foreign keys!):
*/
START TRANSACTION;
-- We don't want to perform the following operations.
UPDATE
AUTHOR
SET Email = 'Deprecated'
WHERE Email LIKE '%isp.net';
/*
Query OK, 2 rows affected (0.010 sec)
Rows matched: 2 Changed: 2 Warnings: 0
This changed A.1 and A.2
*/
UPDATE
WORK
SET Title = "How to eat"
WHERE Title = "What to eat";
/*
Rows matched: 1 Changed: 1 Warnings: 0
SQL returns only the number of row changed in the WORK table,
but other rows have been changed as well.
This changed W.1, B.1, E.1.
*/
-- The following statement raises an error.
-- DELETE FROM WORK;
/*
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
(`HW_EXAM_1`.`BOOK`, CONSTRAINT `BOOK_ibfk_1` FOREIGN KEY (`Work`) REFERENCES `WORK` (`Title`) ON UPDATE CASCADE)
Does not change any row.
*/
-- The following statement raises an error.
-- DELETE FROM AUTHOR WHERE Name = "Virginia W.";
/*
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
(`HW_EXAM_1`.`BOOK`, CONSTRAINT `BOOK_ibfk_1` FOREIGN KEY (`Work`) REFERENCES `WORK` (`Title`) ON UPDATE CASCADE)
Does not change any row.
*/
ROLLBACK;
-- We go back to the previous state.
-- You can now assume that there is more data than
-- what we inserted, if that helps you. Write a
-- command that selects …
-- We insert some dummy values for this
-- next part.
INSERT INTO WORK
VALUES (
"My Life",
"Paul B."),
(
"What to eat, 2",
"Virginia W.");
INSERT INTO BOOK
VALUES (
15355627,
"My Life",
DATE '20180219',
15.00),
(
12912912,
"What to eat, 2",
DATE '20200101',
13);
INSERT INTO EBOOK
VALUES (
15150628,
"My Life",
DATE '20190215',
10.89),
(
42912912,
"What to eat, 2",
DATE '20200115',
12);
-- … the price of all the ebooks.
SELECT Price
FROM EBOOK;
-- … the (distinct) names of the authors who have authored
-- a piece of work.
SELECT DISTINCT Author
FROM WORK;
-- … the name of the authors using fai.fr for their email.
SELECT Name
FROM AUTHOR
WHERE Email LIKE '%fai.fr';
-- … the price of the ebooks published after 2018.
SELECT Price
FROM BOOK
WHERE Published >= 20180101;
/*
Note that
SELECT Price FROM BOOK WHERE Published > 2018;
would return all the prices, along with a warning:
Incorrect datetime value: '2018'
*/
-- … the price of the most expensive book.
SELECT MAX(Price)
FROM BOOK;
-- … the number of pieces of work written by the author
-- whose name is “Virginia W.”.
SELECT COUNT(*)
FROM WORK
WHERE WORK.Author = "Virginia W.";
-- … the email of the author who wrote the piece of work
-- called “My Life”.
SELECT Email
FROM AUTHOR,
WORK
WHERE WORK.Title = "My Life"
AND WORK.Author = AUTHOR.Name;
-- the isbn(s) of the book containing a work written by the
-- author whose email is "vw@isp.net".
SELECT ISBN
FROM BOOK,
WORK,
AUTHOR
WHERE AUTHOR.Email = "vw@isp.net"
AND WORK.Author = AUTHOR.Name
AND BOOK.Work = WORK.Title;
/*
Write a command that updates the title of all the pieces of work written by the author whose name is “Virginia W. to”BANNED".
Is there any reason for this command to be rejected by the system? If yes, explain which one.
*/
-- The following statement raises an error.
/*
UPDATE
WORK
SET
Title = "BANNED"
WHERE
Author = "Virginia W.";
*/
/*
Gives an error, since "Title" is the primary key in the WORK table, and Virginia W. has authored two pieces of work or more,
they are both given the title "BANNED", which violates the unicity of value in primary keys.
*/
-- Write one or multiple commands that would delete the work
-- whose title is “My Life”, as well as
-- all
-- of
-- the
-- books
-- and ebooks versions of it.
-- The following statement raises an
-- error.
-- DELETE FROM WORK
-- WHERE Title = "My Life";
/*
Fails
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
(`HW_EXAM_1`.`BOOK`, CONSTRAINT `BOOK_ibfk_1` FOREIGN KEY (`Work`) REFERENCES `WORK` (`Title`) ON UPDATE CASCADE)
*/
-- We have to first delete the corresponding publications:
DELETE FROM BOOK
WHERE WORK = "My Life";
DELETE FROM EBOOK
WHERE WORK = "My Life";
-- And then we can delete the work:
DELETE FROM WORK
WHERE Title = "My Life";
/*
And, no, we cannot delete "simply" from multiple tables in one command.
Some workaround exists, cf. https://stackoverflow.com/q/1233451/ .
*/
Finally, to answer the last question, here is a list of the possible limitations:
AUTHOR
and WORK
tables) is not a good idea: we cannot have two authors with the same name or two pieces of work with the same title!BOOK
and the EBOOK
tables are going to be the same, then we should probably have only one table called PUBLICATION
with a boolean to indicate whenever the publication is digital or on paper.ON DELETE CASCADE
and ON DELETE RESTRICT
is not really justified and makes the tables harder to use. We should have used the same update policy on both tables./*
code/sql/HW_TEXTBOOK_AUTHORED_SOL.sql
*/
/*
EXERCISE 1
Write a command that updates the email address of 'Gaddis', 'Tony' to "tgaddis@pearson.com"
*/
UPDATE
AUTHOR
SET Email = "tgaddis@pearson.com"
WHERE LName = 'Gaddis'
AND FName = 'Tony';
/*
You can use
SELECT * FROM AUTHOR;
to check that the modification took place.
*/
/*
EXERCISE 2
Write a command that inserts the textbook of your choice in the
TEXTBOOK table. No value should be NULL, but you can invent
the values.
*/
INSERT INTO TEXTBOOK
VALUES (
'Fundamentals of Database Systems',
9780133970777,
165.89);
/*
You can use
SELECT * FROM TEXTBOOK;
to check that the insertion was correctly made.
*/
/*
EXERCISE 3
Write a command that makes 'Gaddis', 'Tony' the author of the
textbook you just added to our database.
*/
INSERT INTO AUTHORED
VALUES (
9780133970777,
'Gaddis',
'Tony');
/*
You can use
SELECT * FROM AUTHORED;
to check that the insertion was correctly made.
EXERCISE 4
Write a command that makes "0.01" becomes the
default value for the Price attribute of the
TEXTBOOK relation.
*/
ALTER TABLE TEXTBOOK
ALTER COLUMN Price SET DEFAULT 0.01;
/*
You can use
DESCRIBE TEXTBOOK;
to check that the Price attribute now has a default
value.
EXERCISE 5
Write a command that insert a textbook of
your choice in the TEXTBOOK table, with the
price set to the default value.
*/
INSERT INTO TEXTBOOK
VALUES (
'Proof Theory',
9780486490731,
DEFAULT);
/*
You can use
SELECT * FROM TEXTBOOK;
to check that the insertion was correctly made.
EXERCISE 6
Write a command that creates a table called EDITOR
with 3 attributes, "Name", "Address" and "Website".
The "Name" attribute should be the primary key.
Then, insert two tuples in the EDITOR table, one
should have the "Name" attribute set to "Pearson".
*/
CREATE TABLE EDITOR (
NAME VARCHAR(30) PRIMARY KEY,
Address VARCHAR(255),
Website VARCHAR(100)
);
INSERT INTO EDITOR
VALUES (
'Pearson',
NULL,
'http://pearsoned.com/'),
(
'Dover',
NULL,
'https://store.doverpublications.com/');
/*
You can use
DESCRIBE EDITOR;
to check that the table was actually created, and
SELECT * FROM EDITOR;
to check that the values were inserted.
EXERCISE 7
Write a command that creates a table called PUBLISHED
with 2 attributes, "Editor", and "Textbook".
The "Editor" attribute should references the EDITOR
table, and the "Textbook" attribute should reference
the TEXTBOOK table.
*/
CREATE TABLE PUBLISHED (
Editor VARCHAR(30),
FOREIGN KEY (Editor) REFERENCES EDITOR (NAME),
Textbook CHAR(13),
FOREIGN KEY (Textbook) REFERENCES TEXTBOOK (ISBN)
);
/*
You can use
DESCRIBE PUBLISHED;
to check that the table was actually created.
EXERCISE 8
Write a command that makes "Pearson" the editor of
the textbook whose ISBN is 9780133776744.
*/
INSERT INTO PUBLISHED
VALUES (
"Pearson",
9780133776744);
/*
You can use
SELECT * FROM PUBLISHED;
to check that the table was actually created.
EXERCISE 9
Answer the following short questions. In our model, as it is, …
Can an author have authored more than one textbook?
Yes.
Can a textbook have more than one author?
Yes.
Can a textbook without ISBN be inserted in the TEXTBOOK relation?
No, unless you create a "dummy" (fake) value for it,
like 0000000000000, but this value can be used only
once, since ISBN is the primary key.
Can the price of a textbook be negative?
Yes. We can actually test it:
INSERT INTO TEXTBOOK VALUES ("Test", 0000000000000, -1);
Can two author have the same first and last name?
No. The query:
INSERT INTO AUTHOR VALUES ('Smith', 'Bob', NULL), ('Smith', 'Bob', NULL);
returns
ERROR 1062 (23000): Duplicate entry 'Smith-Bob' for key 'PRIMARY'
Can two textbooks have the same title?
Yes, as long as they have different ISBN. The command
INSERT INTO TEXTBOOK VALUES ("Test", 0000000000001, NULL), ("Test", 0000000000002, NULL);
is processed just fine.
Can two editiors have the same address?
Yes. The command:
INSERT INTO EDITOR VALUES ("Test 1", "123 Main St.", NULL), ("Test 2", "123 Main St.", NULL);
is processed just fine.
/*
code/sql/HW_CapstoneSol.sql
*/
/*
I. Short Questions (6 pts)
Answer the following short questions based on the model implemented above.
You can simply answer "True" or "False", or justify your reasoning (e.g. with code).
*/
-- 1. Can a project uses multiple programming languages?
-- Yes.
-- 2. Can a student be the leader of multiple
-- projects?
-- Yes.
-- 3. Can multiple projects have the same code name?
-- Yes.
-- 4. Could Claude simply enter NULL for the value
-- of his pproject's code name, since he's undecided?
-- No.
-- 5. Can a project be created without project
-- leader?
-- No.
-- 6. Can we know who is working on a project
-- without being its leader?
-- No.
/*
II. Relational Model (6 pts.)
Draw the relational model corresponding to this code.
You can hand-draw it and join a scan or a picture, or simply hand me back the sheet where you drew it.
*/
/*
III. Simple Commands (8 pts.)
Below, you are asked to write commands that perform various actions.
Please, leave them uncommented, unless you can't write them correctly, in which case it's ok to leave them commented.
The first question is answered as an example.
*/
-- 0. Write a command that list all the names of the
-- programming languages.
SELECT Name
FROM PROGRAMMING_LANGUAGE;
-- 1. Write a command that insert a new student in the
-- STUDENT table.
-- (You should invent the values).
INSERT INTO STUDENT
VALUES (
"Bob",
"0987654321234",
NULL,
NULL);
-- 2. Write a command that updates the code name of the
-- project ("Undecided", "9999999999999") to "VR in
-- ER".
UPDATE
PROJECT
SET CodeName = "VR in ER"
WHERE CodeName = "Undecided"
AND Leader = "9999999999999";
-- 3. Write a command that updates the graduation year of the
-- student whose id is "0987654321098" to 2024, and
-- the semester to "Fall".
UPDATE
STUDENT
SET GraduationYear = 2024,
GraduationSemester = "Fall"
WHERE id = "0987654321098";
-- 4. Write a command that changes the STUDENT table to make
-- it impossible to enter NULL for the first name of
-- a student, without changing the primary key.
ALTER TABLE STUDENT MODIFY FName VARCHAR(50) NOT NULL;
-- 5. Write a command that changes the datatype of
-- GraduationYear to SMALLINT.
ALTER TABLE STUDENT MODIFY GraduationYear SMALLINT;
-- 6. Write a command that adds an attribute "ReleaseDate" to
-- the PROJECT table.
ALTER TABLE PROJECT
ADD COLUMN ReleaseDate DATE;
-- 6.bis If you managed to write the previous command
-- correctly, write a command that sets the release
-- date of the project ("Brick Break",
-- "0123456789100")
-- to
-- the 26th of November 2022.
UPDATE
PROJECT
SET ReleaseDate = DATE "20221126"
WHERE CodeName = "Brick Break"
AND Leader = "0123456789100";
-- 7. Write a command that makes it impossible for a student
-- to be the leader in more than one project
-- (This command should return an error)
-- ALTER TABLE PROJECT ADD UNIQUE (Leader);
/* code/sql/HW_VaccineSol.sql */
/*
I. Short Questions (3 pts.)
Answer the following short questions. In our implementation…
1. … can two companies have exactly the same name?
No, as COMPANY.Name is the only attribute in the primary key of COMPANY.
2. … can two companies have the same website?
Yes, nothing prevents it.
3. … can a company not have a website?
Yes, the domain of COMPANY.Website is "VARCHAR(255)", without a constraint preventing it from being "NULL".
4. … can the same vaccine be manufactured by multiple companies?
No, as VACCINE.Manufacturer is an attribute in VACCINE that accepts only one value.
5. … can a vaccine not have a manufacturer?
No, as VACCINE.Manufacturer bears the "NOT NULL" constraint.
6. … can a disease being neither communicable nor not communicable?
Yes, as DISEASE.Communicable is of type "BOOL", it accepts the "NULL" value.
7. … can the same vaccine have different efficacies for different diseases?
Yes, the EFFICACY table has for primary key VaccineName and DiseaseName, which implies that the same vaccine can occur repeatedly as long as it is associated with different diseases.
*/
/*
II. Longer Questions (6 pts.)
Answer the following questions:
1. What does `CHECK (Website LIKE "https://*")` do?
It refrains any value not starting with "https://" to be inserted as a value for the COMPANY.Website attribute.
Note that in particular it forbids a website from not being secured (that is, http:// is not a valid protocol).
2. Why did we picked the `DECIMAl(5,2)` datatype?
It is the appropriate datatype to represent percentage values represented as ranging from 100.00 to 0.00.
The discussion at https://stackoverflow.com/a/2762376/ also highlights that percent can be represented as decimal(5,4) with a check to insure that the value will range between 1.0000 and 0.0000.
3. What is the benefit / are the benefits of having a separate EFFICACY table over having something like
CREATE TABLE VACCINE(
Name VARCHAR(50) PRIMARY KEY,
Manufacturer VARCHAR(50),
Disease VARCHAR(50),
Efficacy DECIMAl(5,2),
FOREIGN KEY (Manufacturer) REFERENCES COMPANY (Name)
);
?
This implementation does not allow to record that the same vaccine can have different efficacies for different diseases.
Stated differently, it forbids to represent vaccines efficient against multiple diseases faitfully.
*/
/*
III. Relational Model (6 pts.)
Draw the relational model corresponding to this code.
You can hand-draw it and join a scan or a picture, or simply hand me back a sheet.
*/
/*
IV. Simple Commands (5 pts.)
Below, you are asked to write commands that perform various actions.
Please, leave them uncommented, unless
- you can not write them correctly, but want to share your attempt,
- it is specified that it should return an error.
The first question is answered as an example.
*/
-- 0. Write a command that list the names of
-- all the diseases.
SELECT Name
FROM DISEASE;
-- 1. Write a command that insert "Pfizer" in the
-- COMPANY table (you can make up the website or look
-- it)
INSERT INTO COMPANY
VALUES (
"Pfizer",
"https://www.pfizer.com/");
-- 2. Write a command that insert the "Pfizer-BioNTech
-- COVID-19 Vaccine" in the VACCINE table, and a
-- command
-- that store the efficacy of that vaccine against
-- the "Coronavirus disease 2019" disease
-- ( you can make up the values or look them up).
INSERT INTO VACCINE
VALUES (
"Pfizer-BioNTech COVID-19 Vaccine",
"Pfizer");
INSERT INTO EFFICACY
VALUES (
"Coronavirus disease 2019",
"Pfizer-BioNTech COVID-19 Vaccine",
89);
-- 3. Write a command that updates the name of the
-- company "Moderna" to "Moderna, Inc." everywhere.
UPDATE
COMPANY
SET Name = "Moderna, Inc."
WHERE Name = "Moderna";
-- 4. Write a command that lists the name of all the
-- companies.
SELECT Name
FROM COMPANY;
-- 5. Write a command that deletes the "Coronavirus disease
-- 2019" entry from the DISEASE table (if only!).
/*
DELETE FROM DISEASE
WHERE Name = "Coronavirus disease 2019";
*/
-- This command should return an error. Explain it and leave
-- the command commented.
-- The "Coronavirus disease 2019" value in DISEASE.Name
-- is
-- refereed to by two entries in the EFFICACY table.
-- As the foreign key from EFFICACY.DiseaseName to
-- DISEASE.Name does not specify its policy "ON DELETE",
-- its
-- default behavior is to restrict deletion, causing the
-- error.
-- 6. Write two commands: one that adds "physiological"
-- to
-- the possible types of diseases, and one that
-- inserts
-- a physiological disease in the DISEASE table.
ALTER TABLE DISEASE MODIFY TYPE ENUM ("infectious",
"deficiency", "hereditary", "physiological");
INSERT INTO DISEASE
VALUES (
"Asthma",
FALSE,
"physiological");
-- 7 (difficult). Write a command that return the list of
-- all the companies that manufacture a
-- vaccine against "Coronavirus disease
-- 2019".
SELECT VACCINE.Manufacturer
FROM VACCINE,
EFFICACY
WHERE VACCINE.Name = EFFICACY.VaccineName
AND EFFICACY.DiseaseName = "Coronavirus disease 2019";
The file code/sql/HW_ResidencySol.sql contains the solution to the code part of this problem.
To violate the entity integrity constraint, it suffices to insert a tuple with NULL
as a value for one of the attributes of a primary key or to insert a value that was already inserted.
Two examples are:
INSERT INTO PERSON VALUES ("Bob", "Ross", NULL, DATE"1942-10-29");
which would return ERROR 1048 (23000): Column 'SSN' cannot be null
.
INSERT INTO HOUSE VALUES ("123 Main St.", "green");
which would return ERROR 1062 (23000): Duplicate entry '123 Main St.' for key 'PRIMARY'
.
To violate the referential integrity constraint, it suffices to insert a tuple where the value for one of the attributes of a foreign key does not exist in the referenced table.
For instance,
INSERT INTO RESIDENCY VALUES ("999-99-9999", NULL, NULL, NULL);
would return
1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`HW_Residency_SOL`.`RESIDENCY`, CONSTRAINT `RESIDENCY_ibfk_1` FOREIGN KEY (`Person`) REFERENCES `PERSON` (`SSN`)) ERROR
Since there is no row in the PERSON
table with the value "999-99-9999"
for SSN
.
The answers can be found in the following snippet:
/*
In the following we use transactions
to be able to simulate the "what if"
aspect of the questions: we will not
commit the changes we are testing,
and roll back on them before moving to
the next question.
*/
-- Exercise 4
-- List the rows (i.e., P.2, H.1, or even
-- “none”)
-- modified by the following statements:
START TRANSACTION;
UPDATE
HOUSE
SET COLOR = "green";
-- H.1, H.2 and H.3
ROLLBACK;
START TRANSACTION;
DELETE FROM RESIDENCY
WHERE House LIKE "1%";
-- R.1,and R.3
ROLLBACK;
START TRANSACTION;
DELETE FROM HOUSE
WHERE Address = "456 Second St.";
-- H.2, R.2 and R.4 (because of the foreign key).
ROLLBACK;
START TRANSACTION;
-- Commented, because it causes an error.
-- DELETE FROM PERSON
-- WHERE Birthdate = DATE "1990-02-11";
-- None, because of the foreign key and
-- the
-- referential
-- integrity constraint.
-- ERROR 1451 (23000): Cannot delete or
-- update
-- a
-- parent
-- row:
-- a foreign key constraint fails
-- (`HW_RESIDENCY_SOL`.`RESIDENCY`,
-- CONSTRAINT
-- `RESIDENCY_ibfk_1` FOREIGN KEY
-- (`Person`)
-- REFERENCES
-- `PERSON` (`SSN`))
ROLLBACK;
The answers can be found in the following snippet:
-- Exercise 5
/* Write a query that selects …
… the addresses of the houses in the system (11 Third St., 123 Main St., 456 Second St.).
*/
SELECT Address
FROM HOUSE;
-- … the SSN of the persons whose first name was not
-- entered in the system (000-00-0000).
SELECT SSN
FROM PERSON
WHERE FName IS NULL;
-- … all the different colors of houses (white, blue).
SELECT DISTINCT COLOR
FROM HOUSE;
-- … the address of the residency of James Baldwin (123
-- Main St.).
SELECT House
FROM RESIDENCY,
PERSON
WHERE PERSON.Fname = "James"
AND PERSON.LName = "Baldwin"
AND PERSON.SSN = RESIDENCY.Person;
-- … the first name of the oldest person in the database
-- (James).
SELECT FName
FROM PERSON
WHERE Birthdate = (
SELECT MIN(Birthdate)
FROM PERSON
WHERE Birthdate IS NOT NULL);
-- … Michael Keal’s principal residency address (123 Main
-- St.).
SELECT RESIDENCY.House
FROM RESIDENCY,
PERSON
WHERE PERSON.FName = "Michael"
AND PERSON.LName = "Keal"
AND PERSON.SSN = RESIDENCY.Person
AND RESIDENCY.PrincipalResidence = TRUE;
-- … the (distinct) first and last names of the homeowners
-- (Michael Keal, Mridula Warrier).
SELECT DISTINCT (PERSON.FName),
PERSON.LName
FROM PERSON,
RESIDENCY
WHERE RESIDENCY.Status = "own"
AND RESIDENCY.Person = PERSON.SSN;
-- cf comment at snippet homonyms
SELECT PERSON.FName,
PERSON.LName
FROM PERSON
WHERE SSN IN ( SELECT DISTINCT (RESIDENCY.Person)
FROM RESIDENCY
WHERE RESIDENCY.Status = "own");
-- … the SSN of the persons that have the same principal
-- residency as James Baldwin
-- (000-00-0001).
SELECT RoomMate.Person
FROM RESIDENCY AS James,
RESIDENCY AS RoomMate,
PERSON
WHERE PERSON.FName = "James"
AND PERSON.LName = "Baldwin"
AND PERSON.SSN = James.Person
AND James.House = RoomMate.House
AND NOT James.Person = RoomMate.Person
AND RoomMate.PrincipalResidence = TRUE;
Note that the query that returns the name of the homeowners can be improved.
-- If we have homonymns in our database,
-- e.g.
INSERT INTO PERSON
VALUES (
"A",
"B",
"000-00-0010",
NULL),
(
"A",
"B",
"000-00-0011",
NULL);
INSERT INTO HOUSE
VALUES (
"H",
NULL);
-- H.3
INSERT INTO RESIDENCY
VALUES (
"000-00-0010",
"H",
TRUE,
"own"),
(
"000-00-0011",
"H",
TRUE,
"own");
-- Then the query below fails, in the sense that it reports
-- the name "A, B" only once.
SELECT DISTINCT (PERSON.FName),
PERSON.LName
FROM PERSON,
RESIDENCY
WHERE RESIDENCY.Status = "own"
AND RESIDENCY.Person = PERSON.SSN;
-- A better (and not much more complicated) solution would
-- have been
SELECT PERSON.FName,
PERSON.LName
FROM PERSON
WHERE SSN IN ( SELECT DISTINCT (RESIDENCY.Person)
FROM RESIDENCY
WHERE RESIDENCY.Status = "own");
To update the SSN
of "James Baldwin"
to "000-00-0010"
, we could use:
UPDATE PERSON SET SSN = "000-00-0010" WHERE FName = "James" AND LName = "Baldwin";
However, this command would be rejected because of the foreign key constraint. On UPDATE
, the foreign key from RESIDENCY.Person
to PERSON.SSN
restricts by default. The error would be:
1451 (23000) at line 75: Cannot delete or update a parent row: a foreign key constraint fails (`HW_Residency_SOL`.`RESIDENCY`, CONSTRAINT `RESIDENCY_ibfk_1` FOREIGN KEY (`Person`) REFERENCES `PERSON` (`SSN`)) ERROR
In our model, as it is currently,
Considering the given state for the RESIDENCY
table, the following two are possible primary keys:
Person
and PrincipalResidence
Person
and House
The first key would not accomodate a person with multiple secondary residencies, which is not a good thing. The second key could make sense, since it would refrain a person from declaring the same address twice as their residency. The only case that could be hard to work around is if a person was trying to own multiple units at the same address; however, this is more an issue with the primary key of HOUSE
than an issue with the primary key we suggested for RESIDENCY
.
/* code/sql/HW_ScientificResearchSol.sql */
-- List the rows affected (updated or deleted) by the
-- following commands.
-- If no rows are affected because the command would
-- would
-- violate the entity integrity constraint, the
-- referential
-- integrity constraint, or if there would be some
-- other
-- kind
-- of error, please indicate it.
START TRANSACTION;
/*
UPDATE
SCIENTIST
SET SSN = "000000001"
WHERE Name = "Claire";
*/
-- ERROR 1062 (23000) at line 106: Duplicate entry '1'
-- for
-- key 'PRIMARY'
ROLLBACK;
START TRANSACTION;
UPDATE
FUNDINGAGENCY
SET Name = "NSF"
WHERE Name = "National Science Foundation";
SELECT *
FROM FUNDINGAGENCY;
-- FA. 1
SELECT *
FROM FUNDS;
-- F.1
ROLLBACK;
START TRANSACTION;
/*
DELETE FROM FUNDINGAGENCY
WHERE Name = "French-American Cultural Exchange";
*/
-- ERROR 1451 (23000): Cannot delete or update a parent row:
-- a foreign key constraint fails
-- (`HW_SCIENTIFIC_RESEARCH`.`FUNDS`, CONSTRAINT
-- `FUNDS_ibfk_1` FOREIGN KEY (`Agency`) REFERENCES
-- `FUNDINGAGENCY` (`Name`) ON UPDATE CASCADE)
ROLLBACK;
-- List the name of the funding agencies created after 2000
-- ("French-American Cultural Exchange")
SELECT Name
FROM FUNDINGAGENCY
WHERE Creation >= 2000;
-- List the code of the projects that contains the word
-- "Airplanes" ("AA", "BA")
SELECT CODE
FROM PROJECT
WHERE Name LIKE ("%Airplanes%");
-- List the number of hours scientists contributed to the
-- project "AA" (18)
SELECT SUM(Hours)
FROM CONTRIBUTESTO
WHERE Project = "AA";
-- List the code of the projects to which the scientist named
-- Sabine contributed ("AA", "BB")
SELECT Project
FROM CONTRIBUTESTO,
SCIENTIST
WHERE SCIENTIST.Name = "Sabine"
AND SCIENTIST.SSN = CONTRIBUTESTO.Scientist;
-- Give the name of the projects who benefited from federal
-- funds ("Advancing Airplanes")
SELECT PROJECT.Name
FROM PROJECT,
FUNDS,
FUNDINGAGENCY
WHERE FUNDINGAGENCY.Type = "Federal"
AND FUNDINGAGENCY.Name = FUNDS.Agency
AND FUNDS.Project = PROJECT.Code;
-- Give the name of the scientist who contributed to the same
-- project as Mike ("Sabine", "James")
SELECT DISTINCT (Fellow.Name) AS "Mike's fellow"
FROM SCIENTIST AS Mike,
SCIENTIST AS Fellow,
CONTRIBUTESTO AS A,
CONTRIBUTESTO AS B
WHERE Mike.Name = "Mike"
AND Mike.SSN = A.Scientist
AND A.Project = B.Project
AND B.Scientist = Fellow.SSN
AND NOT Fellow.Name = "Mike";
-- List the name of the projects that are not funded by an
-- agency ("Better Airplanes", "Better Buildings")
SELECT DISTINCT (PROJECT.Name)
FROM PROJECT,
FUNDS
WHERE NOT PROJECT.Code IN (
SELECT FUNDS.Project
FROM FUNDS);
-- Give the name of the scientist who contributed the most
-- (in terms of hours) to the project named
-- "Advancing
-- Airplanes" (Sabine)
SELECT SCIENTIST.Name
FROM SCIENTIST,
CONTRIBUTESTO
WHERE CONTRIBUTESTO.Hours >= (
SELECT MAX(Hours)
FROM CONTRIBUTESTO,
PROJECT
WHERE PROJECT.Name = "Advancing Airplanes"
AND PROJECT.Code = CONTRIBUTESTO.Project)
AND CONTRIBUTESTO.Scientist = SCIENTIST.SSN;
id
with auto-increment as a primary key for the computers, rooms and phones without having to come up with new names all the times.SELECT
query that returns this information whenever they need it.Nickname
and the ConnectedTo
attributes be the primary key would solve their issue, but could potentiall introduce a lot of reduncdancy. The best way is probably to have a separate table that list the connections. Since computers and phones are in two different tables, this creates an additional challenge, since we would need to have a “connection table for computers”, and a “connection table for phones”. We recommend actually merging those two tables into one, that would additionaly have an attribute to set if the device is a phone or a computer.A possible solution would consist in
OS
relation with attributes such as manufacturer, architecture, last update, and an id
attribute for the primary key,Computer
and Phone
into a single Device
relation, which would contain an additional attribute Type
to distinguish between computers and phones, and a foreign key to OS,Connection
relation whose attributes would be foreign keys to Device
and Printer
,ComputerOrPhoneInIt
attribute,/* code/sql/HW_ComputerVariationAdvancedSol.sql */
START TRANSACTION;
DELETE FROM CONNEXION
WHERE Computer = 'A';
SELECT *
FROM COMPUTER;
SELECT *
FROM PERIPHERAL;
SELECT *
FROM CONNEXION;
ROLLBACK;
START TRANSACTION;
DELETE FROM COMPUTER
WHERE ID = 'A';
SELECT *
FROM COMPUTER;
SELECT *
FROM PERIPHERAL;
SELECT *
FROM CONNEXION;
ROLLBACK;
START TRANSACTION;
DELETE FROM PERIPHERAL
WHERE ID = '15';
SELECT *
FROM COMPUTER;
SELECT *
FROM PERIPHERAL;
SELECT *
FROM CONNEXION;
ROLLBACK;
START TRANSACTION;
DELETE FROM CONNEXION
WHERE Computer <> 'A';
SELECT *
FROM COMPUTER;
SELECT *
FROM PERIPHERAL;
SELECT *
FROM CONNEXION;
ROLLBACK;
SELECT TYPE
FROM PERIPHERAL
WHERE ID = '12';
SELECT ID
FROM COMPUTER
WHERE Model LIKE '%Apple%';
SELECT COUNT(ID)
FROM COMPUTER;
SELECT DISTINCT (TYPE)
FROM PERIPHERAL;
SELECT CONNEXION.Computer
FROM CONNEXION,
PERIPHERAL
WHERE PERIPHERAL.Type = 'keyboard'
AND PERIPHERAL.ID = CONNEXION.Peripheral;
SELECT COMPUTER.Model
FROM CONNEXION,
PERIPHERAL,
COMPUTER
WHERE PERIPHERAL.Model = 'TP-10 Thermal Matrix'
AND PERIPHERAL.ID = CONNEXION.Peripheral
AND CONNEXION.Computer = COMPUTER.ID;
INSERT INTO CONNEXION
VALUES (
'B',
'12');
SELECT *
FROM COMPUTER;
SELECT *
FROM PERIPHERAL;
-- Note that the "LastConnexion" attribute has been updated.
SELECT *
FROM CONNEXION;
This part of the lecture covers significantly more material than the other, hence we give the details of the references below:
Previous relational models have mistakes and limitations:
HW_Lecture
and Grade
tables, is difficult and error-prone.We could go back and forth between relational models (~ logical level) and SQL
implementations (~ physical level), but we will use even more high-level tools (~ conceptual level):
Entity Relationship Models (ER, static: DB)
Unified Modelling Diagrams (UML, dynamic: DB + software)
Enhanced Entity Relationship Models (EER, adds operations to ER)
Feature | Conceptual | Logical | Physical |
---|---|---|---|
(Main) Audience | Business | Designer | Programmer |
Entity Names | ✔ | ✔ | |
Entity Relationships | ✔ | ✔ | |
Attributes | (✔) | ✔ | |
Cardinalities | ✔ | ✔ | |
Primary Keys | ✔ | ✔ | |
Foreign Keys | ✔ | ✔ | |
Column Data types (Domain) | (✔) | ✔ | |
Table Names | ✔ | ||
Column Names | ✔ |
The conceptual data model is (in theory at least) independent of the choice of database technology.
Remember that in relational models, relations were representing entities (Student
) and relationships (Majors_In
). At the conceptual level, and more particularly in ER diagram, the distinction is made between entities and relationship.
Data is organized into entities (with attributes), relationships between entities (with attributes as well).
Entity A :
Some vocabulary:
Attributes can be
{…} = multi-valued
(…) = complex
For instance, one could
{Address(Street, Number, Apt, City, State, ZIP)}
,A key attribute is an attribute whose value is distinct for each entity in the entity set.
If the attribute is …, | then… |
---|---|
composite | other attributes are connected to it |
multi-valued | the box have double lines |
derived | the box have dotted lines |
a key | the name of the attribute is underlined |
In the following, we’ll focus on the relationship between the entities more than on the attributes of particular entities, so we’ll sometimes simply draw
leaving the attributes un-specified (but that does not mean that they all have to be atomic) or even just
but that does not mean that the entity type have no attribute!
E1, … En participate in R, e1, …, en participate in r1, n is the degree.
Note that we can have Entity Set 1 = Entity Set 2, in which case we say the relation is recursive17.
Naming convention:
Convenient, and sometimes mandatory, to give role names.
If we want to stress that we are considering only one aspect of an entity type (that is, a person is not only an employee, a company is not only an employer, but this aspect is crucial for the “EMPLOYS” relation):
We can also use it to make the “right-side” and the “left-side” of a recursive relationship explicit:
Finally, we will sometimes use “Role Name of Entity 1 : Role Name of Entity 2” as a notation for the relation between them. For instance, we can write “Employer:Employee” to denote the “EMPLOYS” relation, and we will also use this notation when the relationship is between different entities, and write e.g. “PERSON:POSITION” for the “OCCUPIES” relation.
Two constraints, called “structural constraints”, applies to relationship types: cardinality ratio and participation constraint. They both concerns the number of relationship instances an entity can participate in (which is different from the cardinality of a relationship type).
Maximum number of relationships instances that an entity can participate in.
For binary relations, can be 1 : 1, N : 1, 1 : N, or M : N. The 1 stands for “at most 1”, and the M, N, and P stand for “possibly more than 1”, or “no maximum”. In ER diagram, we do not count, and do not make the distinction between “at most 5” and “at most 10”, for instance18.
Possible examples include:
Relation | Possible Ratio | Explanation |
---|---|---|
MENTOR : MENTEE | 1 : N | “A mentor can have multiple mentees, a mentee has at most one mentor.” |
PERSON : SSN | 1 : 1 | “A person has one SSN, a SSN belongs to one person.” |
COURSE : DEPARTMENT | N : 1 | “A course is offered by one department, a department can offer any number of courses.” |
STUDENT : TEAM | M : N | “A student can participate in multiple team, a team can have multiple students.” |
We indicate the ratio on the edges:
Note that reflexive relations can have any ratio as well. An example of M : N recursive relation could be:
Minimum number of relationships instances that an entity can participant it, a.k.a. “minimum cardinality constraint.”
The participation can be total (a.k.a. existence dependency, the entity must be in that relationship at least once) or partial (the entity may or may not be in that relationship).
Total is drawn with a double line, partial is drawn with a single line:
This reads “a course must be offered by a department, but a department may or may not offer courses.”
Relationships can have attributes too. The typical example is a date attribute, but other examples include
Note that an attribute on a relationship type can be atomic or composite, single or multi-valued, stored or derived, but that it cannot be a key attribute (after all, there are no entity to identify!).
Note that there are some moving aspects here: atributes on 1 : 1, 1 : N, N : 1 relationships can be migrated (to the N side when there is one, or to either side where there is none).
For instance, imagine that every phone uses exactly (= “at most and at least”) one carrier, that a carrier can provide network to multiple phones, and that the average quality of the network is an attribute in this relationship:
Then each instance of the relation would be of the form (“Phone X”, “Carrier Y”, “9/10”) for some way of ranking the average quality from 0 to 10. Note that, from the fact that the relationship is N : 1, this means that there is only one tuple involving “Phone X”: this means that the average quality could actually be seen as a property of the phone, and hence be migrated as an attribute to the phone side:
Note that we could not migrate the “average phone quality” to the “Carrier” side: imagine if we had the instances (“Phone X”, “Carrier Y”, “9/10”) and (“Phone Z”, “Carrier Y”, “3/10”), then should the attribute of “Carrier Y” be “9/10” or “3/10”: we have no way of deciding based on this model. Whenever it is a good choice to migrate this attribute or not will depend on the requirement of the models, and it may not always be appropriate to migrate the attribute to the entity. In the case of 1 : 1 relationship, migrating the attribute to both sides (i.e., to both entities) would be a mistake, since it would introduce redundancy in your model.
As an exercise, you can look at the relationships TEACHING, MENTORING and EMITED_DRIVING_LICENCE that are listed above, and see if the attributes can be migrated or not, and if yes, on which side.
Of course, relationships can have a degree higher than two. An example of a ternary relation could be:
To determine cardinality ratio, one should fix all but one parameters, and wonder how many values of the remaining parameter can be in that relationship. Another wording for the same idea can be found in this thread.
Four our example, Customer Y and Bank Z could be in relationship with more than one account (hence the “N”). On the opposite, Customer Y and Account K would be in relationship with only one bank (hence the “1” on the bottom), and Bank Z and Account K would belong to only one customer (hence the “1” on the left).
Let us look at two other examples. First, assume we want to collect information about the treatment prescribed by physicians to patients, we could use a relationship like the following one:
Where
Now, if we want to store information about who is the president of a country during a term, we could get something like:
Note that this representation of the data assumes that a citizen cannot be the president of two different countries during the same term (the right 1), which could be debatable.
It is sometimes impossible to do without relations with arity greater than 2. For instance, consider the following two diagrams19:
You should realize that they convey different information. For instance, you can know for a fact that a person visit a bookshop only if they bought something in it, while the second diagram de-correlate the act of buying with the visit to a bookshop. Similarly, the second diagram could give you a hint that a person that owns a copy of a book Z and visits a bookshop X that sells it could also visit it, but you will not know that for sure.
An example of recursive ternary relation could be:
An example of relation of degree 4 could be:
The cardinality ratio are computed using the same method as described before.
There are actually two sorts of entity types:
Weak (or child) entity types are identified by identifying / owner type that is related to it, in conjunction with one attribute (the partial key). That relation is called identifying (or supporting) relationship, and weak entities have a total participation constraint. The partial key is an attribute, that, when paired with an entity with which they are in relation through their identifying relationship, allows to identify a particular entity.
Weak entities and identifying relationships have a double border, and partial key have a dotted underline, as follows:
The idea here is that we do not need to gather data about all the dependent in the world, or in isolation, but are interested in dependent only if they are related to en employee in our database. Just having the name of a dependent is not enough to identify them, but having their name and the SSN of the employee they are related to is enough. The identifying relation always have ratio 1 : M or 1 : 1: a weak entity cannot be related to more than one entity of the owner type, so that M : N ratio are not possible (cf. e.g. https://dba.stackexchange.com/q/17207). If you need to have, for instance, a dependant connected to multiple employees, then that means that your dependent entity should be strong, because it has an existence “of its own”.
You may wonder why we do not represent weak entities simply as (composite, multi-valued) attributes of their owner type. For instance, why would we use
instead of
? The answer depends whenever we need to have the ability to represent our weak entities (here, PET) as being in relationship with other entities (that can themselves be weak!), as follows:
This would be impossible if PET was an attribute of FRIEND! Whenever the pet entity type is involved in other relationships or not should help you in deciding which representation to choose.
Another example of weak entity whose owner is weak as well could be:
The idea being that the Health care provider cares about an insure only if they are covered by them, and that they care about the doula only if they are currently helping one of their insure.
Multiple notations have been used to represent the ratio and constraint on relationship.
In the following, we introduce two of them: the Min/Max and the Crow’s foot notations.
The two constraints can be written on the same side, and the N, M, P ratio can be replaced by actual number, providing more information.
For instance,
could be drawn as
meaning that
More generally, we have the following:
Extended (or Enhanced) ER Models (EER) have additionaly:
Closer to object-oriented programming.
It is possible to go from relational models to ER models, and sometimes needed: if you are given an implementation that seems poorly design, this can be a way of “backing up” and thinking about the (sometimes implicit) choices that were made during the implementation, to eventually correct them.
For instance, consider the code we studied in “A First Example”:
CREATE TABLE STORM (
NAME VARCHAR(25) PRIMARY KEY,
Kind ENUM ("Tropical
Storm", "Hurricane"),
WindSpeed INT,
Creation DATE
);
-- We can change the enumerated datatype:
ALTER TABLE STORM MODIFY Kind ENUM ("Tropical Storm",
"Hurricane", "Typhoon");
CREATE TABLE STATE (
NAME VARCHAR(25) UNIQUE,
Postal_abbr CHAR(2) PRIMARY KEY,
Affected_by VARCHAR(25),
FOREIGN KEY (Affected_by) REFERENCES STORM (NAME) ON
DELETE SET NULL ON UPDATE CASCADE
);
It corresponds to the following relational model:
which in turn corresponds to the following ER diagram:
Looking at this diagram made it obvious that our code has a flaw: a stom can affect more than one state! Turning the 1 on the left-hand side of the “AFFECTS” relationship into a M is immediate on the diagram, but, of course, mapping it back to a relational model, and then implementing it correctly, will require more work. In any case, if you had not noted already this flaw, reverse-engineering this code highlighted it quite clearly.
If we look back at Problem 3.5 (Revisiting the PROF table), we had already made a first step, since we converted the code into the following relational model:
Going a bit further, we could extrapolate just a little bit and get the following ER diagram:
As we noted in our solution to the second question, this model has several limitations. To list a few, this representation can not handle the following situations:
Looking at it as an ER diagram should help you in understanding why we have those flaws, and how they could be addressed, and “testing” the model should be made easier in its ER form than as SQL code.
We have to map all of the following:
Entity | Strong, Weak |
Attributes | Composite, Key, Atomic, Multi-valued, Partial Key |
Relationships | Binary (1 : 1, N : 1, 1 : N, N : M), n-ary |
Using four tools: Relations, Attributes, Primary Keys, Foreign Keys.
We will use three techniques to represent some of the relationships, the foreign key approach, the merged relations approach and the cross-reference approach. They are detailed and illustrated after the algorithm, which goes as follows:
# | is mapped to | |
---|---|---|
1 | Strong Entity | Relation with all the simple attributes. Decompose complex (composite) attributes. Pick a key to be the PK, if it is composite, take its elements. |
2 | Weak Entity | Relation with all the simple attributes. Decompose complex attributes. Add as a foreign key the primary key of the relation corresponding to the owner entity type, and make it a primary key, in addition to the partial key of the weak entity. If the owner entity type is itself weak, start with it. |
3 | Binary 1 : 1 Relationship Types | Foreign Key, Merge Relations or Cross-Reference approach |
4 | Binary 1 : N Relationship Types | Foreign Key or Cross-Reference approach |
5 | Binary M : N Relationship Types | Cross-Reference approach |
6 | n-ary Relationship Types | Cross-Reference approach |
7 | Multi-valued Attributes | Create a new relation, add as a foreign key the primary key of the relation corresponding to the original strong entity type. Make all the attributes be the primary key. |
whose primary key is the foreign key to the relation corresponding to the entity.
NOT NULL
constraint on the attribute that is not part of the primary key anymore.Every time a relationships have attributes, they are mapped to the resulting relation.
Let us look in more details at some of those steps. For strong entities, using steps 1 and 7, the following:
would give:
And note that if Serial was a complex attribute, we would just “unfold” it, or decompose it, and make all the resulting attributes the primary key of the relation. If one of the attribute was at the same time multi-valued and composite, as follows:
Then we would obtain:
For relationships, things are a bit more complicated. Consider the following:
Since it is a 1 : 1 relationship where one of the side has a partial constraint, we have the choice between two approaches. The foreign key approach would give:
Note that we could also have added the foreign key on the side of ENT.B, referencing the key of ENT.A. But since ENT.A has a total participation constraint, we know that the value of FK will always exist, whereas some entities in ENT.B may not be in relationship with an entity from ENT.A, creating the (nefast) need for NULL
values.
For the same diagram, the cross-reference approach would give:
Similarly, note that, in MAPPING, KeyB, or KeyA and KeyB, would also be valid primary keys, but that it makes more sense to have KeyA being the primary key, since we know that ENT.A has a total participation constraint, but ENT.B does not.
If both participation constraints were total, as follows:
Then we could use the merged relations approach, and get:
We picked KeyA to be the primary key for the same reason as before. Note that merging the two entities into one relation also means that you have eventually to do some work on the relations that were referring to them.
Of course, if ENT.A and ENT.B are the same entity (that is, REL is recursive), we would get:
or
depending on the approach we chose.
Binary 1 : N and binary M : N relationships are dealt with in a similar way, using foreign key or cross-reference approaches. The most difficult part of the mapping is with n-ary relationships: we have to use cross-reference approaches, but determining the primary key is not an easy task. Consider the following20:
The arity constraints here can be rephrased as:
And note that there is no total participation constraint.
To reprent the RESERVES relationship, we need to create a relation with attributes referencing the primary key of MEMBER, the primary key of TIME_SLOT, and the primary key of EQUIPMENT. Making them all the primary key does not represent the fact that the same equipment cannot be booked twice during the same slot, nor that a member can book only one equipment per slot, but allows members to reserve a particular equipment at multiple time slots. To improve this situation, we can either
Both solutions enforce only some of the requirement expressed by the ER diagram.
ER Model | Relational Model |
---|---|
Entity type | Entity relation |
1 : 1 or 1 : N relationship type | Foreign key (or relationship relation) |
M : N relationship type | Relationship relation and two foreign keys |
n-ary relationship type | Relationship relation and n foreign keys |
Simple attribute | Attribute |
Composite attribute | Set of simple component attributes |
Multivalued attribute | Relation and foreign key |
Value set | Domain |
Key attribute | Primary key |
You can have a look at e.g. http://holowczak.com/converting-e-r-models-to-relational-models/ to get a slightly different explanation of this conversion, and additional pointers.
What makes a good database? At the logical (conceptual) and physical (implementation) levels. We will answer belowe this question broadly, and will then use the concept of functional dependency to capture some of those notions precisely, mathematically, and be able to detect issues preventing our database from meeting the usual goals.
In general, a good data base should:
SELECT
and select-project-join easy)Normally, consistency will simply follows if those goals are met.
For ER diagrams, some of the usual techniques21 are:
1 relation corresponds to 1 entity or 1 relationship type
NULL
to insert tuples, especially on a key attribute!
(Bad!) Example:
---------- (Login, Name, AdvisoryName, AdvisorOffice, Major, MajorHead)
-----------(Office, PhoneNumber, Building)
NULL
Should Be RareNULL
has 3 meanings, wastes space, and makes join / nested projections harder.
Example:
STUDENT(Login, …, siblingEnrolled)
Transform into “Emergency Contact in University” relation (bonus: allow multiple contacts).
Example with advisorOffice and Office: if we try to write a join to obtain the phone number of a student’s advisor, we will obtain all the phone.
MARKER(Owner, Color, OwnerOffice, Brand, BrandEmail)
TEACHER(Office, Name, Phone)
Corrected to:
MARKER(Owner, Color, B͟r͟a͟n͟d͟)
TEACHER(Office, N͟a͟m͟e͟, Phone)
BRAND(N͟a͟m͟e͟, Email)
Functional dependencies (FD) is a formal tool used to assess how “good” a database is, a property of the relation schema. Functional dependencies list the constraints between two sets of attributes from the database. For instance, if X and Y are (sets of) attributes, X → Y reads “X fixes Y”, and implies that the value(s) of Y is fixed by the value(s) of X.
“What should be.”
Let us list all the attributes of our previous example:
.Owner, MARKER.Color, MAKER.Brand, TEACHER.Office, TEACHER.Name,
MARKER.Phone, BRAND.Name, BRAND.Email TEACHER
Think about their dependencies, and list them:
TEACHER.Name
→ TEACHER.Office
BRAND.Name
→ BRAND.Email
TEACHER.Office
→ TEACHER.Name
TEACHER.Office
→ TEACHER.Phone
MAKER.Owner
and MARKER.Color
→ MARKER.Brand
?“What is.”, can disprove some of the assumptions made previously, but should not add new dependencies based on it (they may be by chance!).
TEACHER.Office
→ TEACHER.Name
does not hold, because teachers share offices?TEACHER.Name
→ MARKER.Brand
and MARKER.Color
seemed to be enforced by the state, but we should not add a functional dependency based on that: there are no “requirement” that a Teacher must always buy the same brand and color, this could simply true be by chance so far and should not be imposed to the teachers.A particular state cannot enforce a FD, but it can negate one.
Example:
Att. 1 | Att. 2 | Att. 3 |
---|---|---|
Bob | 15 | Boston |
Bob | 13 | Boston |
Jane | 12 | Augusta |
Emily | 12 | Augusta |
May hold | Will not hold |
---|---|
Att. 2 → Att. 3 | Att1 → Att2 |
Att. 3 → Att. 2 | Att. 3 → Att. 2 |
Att. 1 → Att. 3 | Att. 2 → Att. 1 |
{Att. 1, Att. 2} → Att. 3 | {Att. 3, Att. 2} → Att. 1 |
Or, more conveniently:
If an attribute is a foreign key to another, we will draw an arrow between relations:
Note that:
We will assume that the consequence of those axioms always hold (“closure under those rules”), but will generaly not write them explicitely, since they do not carry any new or additional information.
Remember superkey (not minimal key), key, candidate key, secondary key? We now have a formal definition.
In one particular relation R(A1,…,An),
Given a FD {A1, …, An} → Y,
A FD : X → Y is a transivive dependency if there exist a set of attribute B s.t.
There exists multiple normal forms: First, Second, Third, Fourth, Fifth normal form (“X”NF), … Stronger than the Third, there is the Boyce-Codd NF (BCNF), but we will focus on the first three, that are “cumulative”: to you satisfy N, a relation have to satisfy N − 1, N − 2, etc. The normal form of a relation is the highest normal form condition that it meets.
The domain of all attributes must be atomic (simple, indivisible): exclude multi-valued and composite attributes.
Sometimes, additional requirement that every relation has a primary key. We will take this requirement to be part of the definition of 1NF, but some authors take a relation to be in 1NF if it has at least candidate keys (i.e., multiple possible keys, but no primary key, which makes their definition more general, cf. (Elmasri and Navathe 2015, 14.4.1)). Hence, we will always assume that a primary key is given, and it will be underlined.
This essentially consists in
1NF + Every non-prime attribute is fully functionnaly dependent on the primary key.
For each attribute A of the relation whose primary key is A1, …, An:
becomes
Refinment: note that if more than one attribute depends of the same subset {A′1, …, A′k}, we will create two relations: that is useless, we could have created just one. For instance, considering
applying the algorithm would give (the incorrect, since a foreign key can not refer two attributes in two different tables)
whereas a more subtle algorithm would give
Note that in both cases, all the relations are in Second Normal Form, though (and valid if we ignore the foreign key issue discussed above).
Note also that, sometimes, removing the “original” relation may be preferable: cf. an example in Problem 4.33 (COFFEE relation: primary key and normal form).
Note also that if our primary key is a singleton (a set with only one element), then there is nothing to do, we are in 2NF as soon as we are in 1NF: every functional dependency from a single element is always full!
2NF + no non-prime attribute is transitively dependent on the primary key.
For each attribute A of the relation whose primary key is A1, …, An:
We can have a look at another example:
Note that {State, Driver_Licence_Num}, would be a valid primary key for this relation, and that adding it would make it a relation in 1NF.
As we can see, the name “Driver” is somehow counter-intuitive, since the relation also carries information about Governors. This relation is actually not in 2NF, because the FD {State, Driver_Licence_Num} → Governor is not fully functional. A possible way to fix it is to get:
As you can see, the 2NF helped us in separating properly the entities.
An example of a relation that is in 2NF but not in 3NF could be:
As we can see, all the non-prime attributes are fully functionally dependent from Login, which is our primary key. But, obviously, one of this dependecy is transitive, and breaks the 3NF. A way to fix it is:
As we can see, 3NF also helped us in separating properly the entities, in a slightly different way.
In conclusion, we can observe that every FD X → Y s.t. X is a proper subset of the primary key, or a non-prime attribute, is problematic. 2NF is a guarantee that every entity has its own relation, 3NF is a way to avoid data inconsistency.
One approach for analysis, design, implementation and deployment of databases and their applications. Databases interact with multiple softwares and users, we need a common language.
Unified Modeling Language is a standard:
Wide, powerful, but also intimidating.
You know UML from object-oriented programming language:
That is an example of a class diagram (with class name, attributes and operators, as well as a particular way to represent that a class extends another) , there are other types of diagrams, they are not unrelated! For instance, using communication diagrams, deployment diagrams, and state chart diagrams, you can collect the requirements needed to draw a class diagram! They each offer a viewpoint on a software that will help you in making sure the various pieces will fit together: it is a tool commonly used in software engineering, and useful in database design.
There are 14 different types of diagrams, divided between two categories: structural and behavioral.
(Source: https://commons.wikimedia.org/wiki/File:UML_diagrams_overview.svg)
They describe structural, or static, relationships between objects, softwares.
In this category also exist Composite structure diagram, Package diagram and Profile diagram.
They describe the behavioral, or dynamic, relationship, between components.
Then there is the sub-category of “Interaction diagrams”:
This sub-category also comprise Timing diagram and Interaction overview diagram.
Looking at the “COMPANY conceptual schema in UML class diagram notation”, and comparing it with the “ER schema diagram for the COMPANY database” from the textbook, can help you in writing your own “Rosetta Stone” between ER and UML diagram. Let us introduce some UML terminology for the class diagrams.
UML | ER |
---|---|
Class | Entity Type |
Class Name | Entity Name |
Attributes | Attributes |
Operations (or Method) | Sometimes Derived Attributes |
Association | Relationship Type |
Link | Relationship Instance |
Multiplicities | Structural Constraint |
As well as for ER diagram, the domain (or data type) of the attributes is optional. A composite attribute in a ER diagram can be interpreted as a structured domain in a UML diagram (think of a struct
), and a multi-valued attribute requires to create a new class.
Associations are, to some extend, more expressive than relationship types:
min .. max
, with *
for “no maximum”, and the following shorthands: *
stands for 0..*
and 1
stands for 1..1
. An association with 1
on one side and *
on the other (resp. 1
and 1
, *
and 1
, *
and *
) is sometimes called “one-to-many” (resp., “one-to-one”, “many-to-one”, “many-to-many”). The notation in partially inverted w.r.t. ER diagrams:Additionally, associations can be “extended”, and they are not the only kind of relationship that can be expressed between two classes.
Qualified associations can be used for weak entities, but not only.
Some of those subtleties depend on your need, and are subjective, but are important tool to design properly a database, and relieving the programmer from the burden of figuring out many details.
Name the three high-level models we will be learning about in this class (expand the acronyms).
What could be the decomposition of an attribute used to store an email address? When could that be useful?
What would be the benefit of having a composite attribute “Phone” made of two attributes (Number and Description) being multi-valued? Answer this question, and draw the resulting attribute.
Draw the ER diagram for a “COMPUTER” entity that has one multivalued attribute “Operating_System”, a composite attribute “Devices” (decomposed into “Keyboard” and “Mouse”) and an “ID” key attribute.
Draw the ER diagram for a “CELLPHONE” entity that has a composite attribute “Plan” (decomposed into “Carrier” and “Price”), an “MIN” (Mobile Identification Number) key attribute, and a multi-valued “App_Installed” attribute.
Name one difference between a primary key in the relational model and a key attribute in the ER model.
What is a derived attribute? Give two examples and justify them.
Invent an entity type with at least one composite attribute and one atomic attribute, but no multi-valued attributes. Identify a possible key attribute and draw the entity type you obtained using the conventions we used in class.
What is the degree of a relationship type?
What is a self-referencing, or recursive, relationship type? Give two examples.
What does it mean for a binary relationship type “Owner” between entity types “Person” and “Computer” to have a cardinality ratio of M : N?
What are the two possible structural constraints on a relationship type?
Draw the diagram for a “VideoGame” entity that would allow to store the name of the game, the supported platform(s) and the release date. Then, add a recursive relationship on that entity called “Is the sequel of” and specify all the constraints.
Draw a diagram to represent a relationship type R between two entities types A and B such that:
Express the constraints represented in the following diagram in plain English.
What does it mean for a binary relationship type “is the Chair of” between entity types “Professor” and “Department” to have a cardinality ratio of 1:N? Would it make sense to be have a total participation constraint on one side, and if yes, on which side?
Express the constraints represented in the following diagram in plain English.
For the following binary relationships, suggest cardinality ratios based on the common-sense meaning of the entity types.
Entity 1 | Cardinality Ratio | Entity 2 |
---|---|---|
STUDENT | : | MAJOR |
CAR | : | TAG |
INSTRUCTOR | : | LECTURE |
INSTRUCTOR | : | OFFICE |
COMPUTER | : | OPERATING_SYSTEM |
Give an example of a binary relationship type of cardinality 1 : N.
Give an example of a binary relationship type of cardinality N : 1 and draw the corresponding diagram (you do not have to include details on the participating entity types).
Draw an ER diagram with a single entity type, with two stored attributes, and one derived attribute. In your answer, it should be clear that the value for the derived attribute can always be obtained from the value(s) of the other attribute(s).
Draw an ER diagram expressing the total participation of an entity type “BURGER” in a binary relation “CONTAINS” with an entity type “INGREDIENT”. What would be the cardinality ratio of such a relation?
Under what condition(s) can an attribute of a binary relationship type be migrated to become an attribute of one of the participating entity types?
Consider the following diagram:
Consider the following diagram:
Suppose a “PRODUCES” relationship with an attribute “Amount” exists between a “PRODUCER” entity type and a “MOVIE” entity type, with ratio 1 : M. Migrate the “Amount” attribute to one of the entity types and draw the resulting diagram.
Suppose a “MEMBERSHIP relationship with an attribute”Level” (e.g., “silver”, “platinium”, etc.) exists between a “PERSON” entity type and a “CLUB” entity type, with ratio M : 1. Migrate the “Level” attribute to one of the entity types and draw the resulting diagram.
Assume with have three entity types, “Lecture Notes”, “Class” and “Professor.”
You can specify role names in your diagrams for added clarity, and remember to list all the constraints.
Can we always replace a ternary relationship with three binary relationships? Give an example.
What is the difference between an entity type and a weak entity type?
What is a partial key?
Why do weak entity type have a total participation constraint?
Invent a weak entity type, its identifying (owner) entity type and the identifiying (or supporting) relationship. Both entities should have (partial) key, and each should have at least one composite attribute.
Convert the following ER diagram into a relational model:
Convert the following ER diagram into a relational model:
What is insertion anomaly? Give an example.
What is deletion anomaly? Is it a desirable feature?
Why should we avoid attributes whose value will often be NULL
? Can the usage of NULL
be completely avoided?
Consider the following relation:
PROF(S͟S͟N͟, Name, Department, Bike_brand)
Why is it a poor design to have a “Bike_brand” attribute in such a relation? How should we store this information?
Consider the following relation:
STUDENT(S͟S͟N͟, Name, …, Sibling_On_Campus)
Why is it a poor design to have a “Sibling_On_Campus” attribute in such a relation? How should we store this information?
Consider the following relational database schema:
STUDENT(L͟o͟g͟i͟n͟, Name, …, Major, Major_Head)
DEPARTMENT(C͟o͟d͟e͟, Name, Major_Head)
Assuming that “Major” is a foreign key referencing “DEPARTMENT.Code”, what is the problem with that schema? How could you address it?
Why can we not infer a functional dependency automatically from a particular relation state?
Consider the relation R(A,B,C,D,E,F) and the following functional dependencies:
For each set of functional dependency, give a key for R. We want a key, so it has to be minimal.
Consider the relation R(A,B,C,D,E,F) and the following functional dependencies:
A → {D, E}, D → {B, F}, {B, E} → A, {A, C} → {B, D, F}, A → F
Answer the following:
What is a composite attribute in a ER diagram? Can a relational schema with composite attribute be in Second Normal Form?
Consider the relation R(A,B,C,D) and answer the following:
Consider the relation R(A,B,C,D,E,F) with candidate keys {A, B} and C. Remember that, in all generality, to be a prime attribute, you just need to be part of a possible candidate key. Answer the following:
Consider the relation R(A,B,C,D,E) and the following functional dependencies:
For each one, give one candidate key for R.
Consider the relation R(A,B,C,D,E) and answer the following:
Consider the relation R(A,B,C,D,E,F), and let {B, D} be the primary key, and have additionnaly the functional dependencies {A, D} → E, C → F. This relation is not in 3NF, can you tell why?
Consider the relation R(A,B,C,D) and answer the following:
Consider the relation R(A,B,C,D,E) and the functional dependencies {A, B} → C, B → D, C → E. Answer the following:
What are the two different categories of UML diagram?
Can a C++
developer working on Linux and a Java
developer working on MacOS use the same class diagram as a basis to write their programs? Justify your answer.
What kind of diagram should we use if we want to …
Name two reasons why one would want to use a UML class diagram over an ER diagram to represent a conceptual schema.
Consider the following diagram:
Give the number of attributes for both classes, and suggest two operations for the class that does not have any. Discuss the multiplicities: why did the designer picked those values?
Briefly explain the difference between an aggregation and a composition association.
How is generalization (or inheritance) represented in a UML class diagram? Why is such a concept useful?
Convert the following ER diagram into a UML class diagram:
Convert the following UML class diagram into an ER diagram:
The three high-level models we will be learning about are the Unified Modeling Language, Entity Relationship, and Enhanced Entity–Relationship models.
A useful decomposition of an email address attribute could be: the username part before the @
sign, and the domain part afterwards (that could even be sub-divided between the domain name and its top-level domain). It might be useful to have statistics about the domains of the users or to sort the usernames by length, etc.
Having a “Phone” attribute being multi-valued would allow to store multiple phone numbers for the same entity. Typically, one would want to store a pair (Number, Description) for their office phone, their cell, etc. The resulting attribute would be drawn as follows:
There can be more than one key in the ER model, but it has to be made of a single attribute, whereas a primary key can be made of multiple attributes.
A derived attribute is an attribute whose value can be determined by the value of other attributes. For instance:
- The value of an “Age” attribute could be determined from the value of an “Date of birth” attribute and the current day.
- The value of a “State” attribute can be determined from the value of a “Zip code” attribute.
- The value of a “Body Mass Index” attribute could be calculated from the values of height and weight attributes.
- The value of an “Initials” attribute could be determined using the values of the “First Name”, “Middle Name”, and “Last Name” attributes.
The degree of a realationship type is the number of its participating entity types.
A self-referencing relationship type is where the same entity type participates more than once. On a SEATS entity type, it would be an attribute like “is to the left of” or on a PERSONS entity type, it would be and attribute like “is married to”.
The cardinality ratio on the binary relationship type “Owner” between the entity types “Person” and “Computer” means that a person can own multiple computers, and a computer can have multiple owners.
The two possible structural constraints on a relationship type are the cardinality ratio and participation constraints.
We would obtain the following diagram:
Note that the “M / N” part could be discussed: having 1 instead of M would mean that a videogame that is a sequel is the sequel of at most one videogame. This could make sense, but would forbid for instance to register “Battletoads & Double Dragon - The Ultimate Team” as the sequel of both Double Dragon and Battletoads. Having 1 instead of N would mean that every videogame has at most one sequel: it would prevent from registering both “Super Mario Land” and “Super Mario World” as the sequels of “Super Mario Bros. 3”.
However, note that all the participation constraints are partial: having a total participation constraint would mean (on the left side) that every game is a sequel, or (on the right side) that every game have at least one sequel, two statements that are obviously wrong.
We would obtain the following diagram:
A key opens only one door, and every key must open at least one door. A door can be opened by multiple keys, and some doors may not be opened by any key (think of doors that do not have a lock).
The binary relation type “is the Chair of” with a cardinality ratio of 1:N between entity types “Professor” and “Department” means that a department can have at most one professor as its chair, but that a professor can be the chair of multiple departments. It could make sense to require that every department has a chair, hence writing a double line between the Department entity and the “is the Chair of” relationship, but it would not make sense to have a total participation constraint on the side of the professor (which would mean that every professor has to be the chair of a department).
An operating system may be supported by many computers, but it is also possible that no computer supports it (think of an operating system in development, or developed for embeded devices). A computer must support at least one operating system and can support multiple operating systems.
Entity 1 | Cardinality Ratio | Entity 2 | Explanation |
---|---|---|---|
STUDENT | N : 1 | MAJOR | “A student has one major, but multiple students can have the same major” |
CAR | 1 : 1 | TAG | “A car has exactly one tag, a tag belongs to one particular car.” |
INSTRUCTOR | 1 : N | LECTURE | “An instructor can teach multiple lecture, but a lecture is taught by only one person.” |
INSTRUCTOR | 1 : N | OFFICE | “An instructor can have multiple office, but an office belongs to only one instructor” |
COMPUTER | M : N | OPERATING_SYSTEM | “A computer can have multiple operating system, the same operating system can be installed on more than one computer.” |
Some of these choices are debatable (typically, almost any combination seems reasonable for the INSTRUCTOR : OFFICE relation).
A binary of relationship of SUPERVISOR as a recursive relationship on EMPLOYEE.
An attribute of a binary relationship type can be migrated to one of the participating entity types when the cardinality ratio is 1 : N, 1 : 1, or N : 1. It can be migrated “to the N side” or, if there is no N side, to either side. Note that for n-ary relationships, at least one ratio needs to be 1 for the attribute to be allowed to migrate (and “to the N side”, or, if there is no N side, to any side).
We could have the following:
We could have the following:
A possible example of ternary relationship is:
One example of two binary relationships could be:
A question like
“Who wrote the lecture notes X?”
could be answered with the binary relationships but not the ternary. Conversely, a question like
“What are the lecture notes refered to by Prof. X in their class Y?”
could not be answered using the binary relationships (since we do not know what classes are taught by Prof. X).
No, a ternary relationship cannot always be replaced by three binary relationship. For instance, if I have a “Travelling to” relationship between a “Person”, a “City” and a “Transport mode”, to represent the fact that a person is travelling to a city using a particular mode of transportation, there is no way I can convey the same information using binary relationships.
The weak entity type does not have a key attribute, it cannot be distinguised from the other weak entities based on a single attribute, for that we also need to know its relationship to some other entity type.
For a weak entity attribute, it is the attribute that can uniquely identify weak entites that are related to the same owner entity.
Otherwise, we could not identify entities in it without owner entity.
A possible solution is:
Note that the two composite attributes are “generic”, in the sense that you can re-use those examples easily.
A possible option is:
Note that “Stays_At” could also be a separate relation, with two attributes, “Address” and “Person”, linked to respectively PLACE.Address and PERSON.SSN, and both being the primary key of the relation.
A possible option is:
Note that to more faithfully represent the total participation constraints, one could add NOT NULL
attributes to TakenBy, CustomerFName and CustomerLName in RESERVATION.
When you have to invent a primary key or add a lot of NULL
value to be able to add a tuple. I want to add a room in my DB, but the only place where rooms are listed are as an attribute on a Instructor table, so I have to “fake” an instructor to add a room.
A delete anomaly exists when certain attributes are lost because of the deletion of other attributes. It is not desirable, since it can lead to the loss of information.
Because they waste space, they are ambiguous (N/A, or unknown, or not communicated?), and they make querries harder. No, it is necessary sometimes.
Because it will be NULL
most of the time. In a separate relation, e.g. a “BIKE” relation, with two attributes, “Owner” and “Brand”, “Owner” being a foreign key referencing the SSN attribute of PROF.
Because it will be NULL
most of the time, and because students could have more than one sibling on campus. In a separate relation, e.g. in a “EMERGENCY_CONTACT” relation, with two attributes, “Student” (refercing the SSN attribute of STUDENT), and “Contact”. If the emergency contacts are not related to the student, or if we want to preserve the fact that one student is a sibling to another, we can create another relation to store that information.
Major_Head will give update anomalies. By putting the Head of the department in the DEPARTMENT relation only, i.e., removing it from STUDENT.
Just because a coincidence exists (i.e., “in my data set, no android user is color-blind”) does not mean that it will always be true (i.e., “no color-blind person will ever use android”). Functional dependencies should come from a principled reasoning about the attributes, and not from the observation of the data.
A composite attribute is an attribute made of multiple attributes, like an “Address” attribute could be composed of the “sub”-attributes “Street”, “City”, “Zip” and “State. A relational schema needs a primary key and to have only atomic domains to be in first normal form, so, no, a relational schema with composite attributes can not be in second normal form.
{B, D} → C → F breaks the 3NF.
The two different categories of UML diagram are behaviour and structure.
Yes, UML diagram is language-independent and platform-independent.
To use direction for association, to have a common language with someone less knowledgeable of other diagrammatic notations. For the concept of integration.
Flight
has 5 attributes, Plane
has 4. The Plane
class could have the operations getLastFlightNumber() : Integer
and setMaximumSpeed(MPH) : void
.
For the multiplicities: A flight could not have a plane assigned, and a plane could not be assigned to a flight. A plane can be assigned to multiple (or no) flights, but a flight must have at most one plane (and could have none).
The absence of total participation constraint on the left side of the diagram may seem odd: what would be a hand not belonging to a person? Still, we have to accept it: we do not know what the requirements are, or the precise nature of the entities. As far as we know “hand” could refer to a card game, and “person” could refer to players. A straightforward representation of the same diagram as a UML class diagram could be:
Note that we could convey more information, for instance by using aggregation, or even composition, but, without more information about those entities and this relationship, it may be safer not to make any additional supposition.
Aggregation: associated class can have an existence of its own.
Composition association: class does not exist without the association.
Even though entity type do not need a key, it is generally good to include one, and we picked the “obvious” ones (even if Phone
could have been a good choice for CUSTOMER
as well).
Your professor designed the following relational model, at some point in his career, to help him organize his exams and the students’ exam grades:
Table Name and Attributes | Example of Value |
---|---|
EXAM(Number, Date, Course) | < 1, ‘2018-02-14’, ‘CSCI3410’> |
PROBLEM(Statement, Points, Length, Exam) | < ‘Your professor designed…’, 10, ‘00:10:00’, 1> |
STUDENT_GRADE(Login, Exam, Grade) | < ‘aalyx’, 1, 83> |
EXAM.Number
, PROBLEM.Statement
, STUDENT_GRADE.Login
and STUDENT_GRADE.Exam
are all the primary key, and STUDENT_GRADE.Exam
and PROBLEM.Exam
are foreign keys that both refer to EXAM.Number
.
The idea was to have the following design elements:
EXAM
table for storing information about exams.PROBLEM
table for storing each problem as its’ own entry and to associate every problem to an exam.STUDENT_GRADE
table for storing the grade of one student for one particular exam.Unfortunately, this design turned out to be terrible.
Consider the ER schema for the MOVIES database (inspired from (Elmasri and Navathe 2010, fig. 7.24)):
Where the attributes are omitted, and separate entities are created for actors, producers and directors even if they happen to be the same person (to deal with e.g. pseudonyms or different attributes, like agent or address).
Given the constraints shown in the ER schema, respond to the following statements with True or False. Justify each answer.
Draw the ER diagram for the following situation:
One of the interesting choices is: should “accident” be an entity type or a relationship type?
You want to design a database to help you apply for jobs and to compare offers. Every job has a salary range, a title, multiple requirements (like languages known, years of experience, etc.) and was advertised by a company at a particular url. Every company has a physical and numerical address, provides some benefits (assuming they provide the same benefits to all their employees). Sometimes you know one or multiple persons working there, and you want to keep track of their names, role, and (if this is the case) of the job they told you about. Finally, you want to keep track of the offers you received: the job they correspond to, the actual salary offered and the possible starting date.
Draw an entity-relationship diagram for the following situation.
A sim card have a format (mini-SIM, micro-SIM, etc.) and unique ICCID and IMSI numbers. A cellular network have an ITU region and a frequency band. A phone has one or two IMEI number, can connect to one or multiple cellular networks, and can hold zero, one or two sim cards. A contact (which is made of a name, a phone number and an email) can be stored either in the sim card or in the phone (in which case you can add a picture to the contact). Every phone must have at least one operating system installed on it, and an operating system has a name, a version and licence. Finally, an application (which is made of a name, a version number and a url) may or may not be compatible with a phone / operating system pair.
A company wants to develop a database to keep track of the programmers, projects and programming languages they know of. They are not willing to store guidelines for the sake of it, but believe that if a project requires a particular guideline (like, which IDE to use, what spacing convention they use, etc.), it should be stored somewhere. They want to accommodate the fact that a project can use multiple programming languages (and sometimes even multiple versions of the same language), and keep track of which programmer is leading which project. To ease “match making”, they also want to track which programmer is knowledgeable of what programming language. They would also like to store links to the specifications of programming languages, as well as urls of the projects and their guidelines.
They came up with the following ER diagram:
This diagram, to your expert eyes, has multiple flaws, missing constraints, and has some inconsistencies with their requirements. List as many as you can, and suggest improvments or solution when you can think of one.
Draw the ER diagram corresponding to the following situation for conferences on undergraduate research:
Every conference has a name, an edition (“First”, “Second”, etc.), and it takes place during particular days. Students can submit abstracts (made of a title, multiple keywords and a content) and, if accepted, they will give talks (that have a title and a length) during particular sessions. Note that an abstract can have multiple students as authors, but that a talk is given by exactly one student. A session must have exactly one moderator (who is a Faculty member), multiple judges (that are Faculty members as well), and a time frame. Faculty members have an email, a name, a title, and they can also mentor zero, one or multiple students.
Indicate all the assumptions or choices you are making, but try to make as few assumptions as possible.
Look at the following relational model and “reverse-engineer” it to obtain an ER diagram:
In this problem, we will install and explore the basic functionalities of MySQL Workbench, which is a cross-platform, open-source, and free graphical interface for database design.
ctrl
+ r
), then click on “next”, enter your password, and click on “next.” You should see the list of the schemas stored in your database. Select one (any one, we are just exploring the functionalities at that point. You can pick, for instance, HW_DB_COFFEE
from Problem 3.7 (Read, correct, and write SQL
statements for the COFFEE database)), click on “next”, then click on “execute”, “next”, and “close.”SQL
code needed to produce the table you just designed using the graphical tool.Apply the ER-to-Relation mapping to your ER diagram from Problem 4.3 (ER diagram for car insurance).
Consider the following ER diagram:
Using this diagram, answer the following:
Is it true that … | Yes | No |
---|---|---|
… a customer cannot drop two bikes at the exact same time and date? | ||
… two different customers cannot drop two different bikes at the exact same time and date? | ||
… an employee cannot repair two bikes at the same time? | ||
… a customer can be assigned to more than one employee? | ||
… a customer can have a bike repaired by an employee that is not assigned to him/her? | ||
… a bike can be in the database without having been dropped by a customer? | ||
… an employee can be asked to repair a bike without having that type of bike as one of their specialties? |
Convert that ER diagram into a relational model. Try to make as few assumptions as possible.
Consider the following ER diagram:
Using this diagram, answer the following:
Is it true that … | Yes | No |
---|---|---|
… a label can have multiple logos? | ||
… a recording can be released by multiple labels and at different dates? | ||
… a record shop can have multiple exclusivities? | ||
… two record shops can have the same address? | ||
… two logos can have the same name? | ||
… two recordings can have the same title? | ||
… a record shop must sell at least one recording? |
Convert that ER diagram into a relational model. Try to make as few assumptions as possible.
Consider the following ER schema:
where
For this relationship, on the left-hand side is the language that borrows a word and on the right-hand side is the language that provides the loanword.
Map that ER diagram to a relational database schema.
Consider the following requirements for a UNIVERSITY database used to keep track of students’ transcripts.
Have a look at the diagram below:
The assumption is that a car has exactly one (primary) owner, but can additionally have multiple co-owners, and that the “Accident” relationship gathers information about who was driving which car when accident happened, along with some other information about the accident.
Answer the following short questions, justifying your answers:
Convert the diagram to a relational model.
We want to edit the ER diagram to be able to record multiple quotes per accident instead of having a single “Damage_Amount”. To do that, we would like to create a “Quote” weak entity with attributes for the amount and the contact information of the garage who wrote it. Your task is to
You should draw only the part of the ER diagram that will change, no need to copy all of it.
Have a look at the diagram below:
The assumption is that a dish has two boolean attributes (vegetarian and safe for pets), and that the “Quality” attribute of the “COOKS” relationship stores how good a dish is when a particular friend cooks it.
Consider the following relation and its functional dependencies:
CAR_SALE(Car_no, Date_sold, Salesman_no, Commission, Discount_amt)
{Car_no, Salesman_no} | → | {Date_sold, Commission, Discount_amt} |
Date_sold | → | Discount_amt |
Salesman_no | → | Commission |
and let {Car_no, Salesman_no} be the primary key of this relation.
Consider the following relation:
REL(A, B, C, D, E)
Suppose we have the following dependencies:
A | → | D |
{A, B} | → | C |
D | → | E |
Consider the following relation:
SCHEDULE(Period_Start, Period_End, Date, Room, Building, Organizer, Length)
And the following dependencies:
{Period_Start, Date} | → | {Room, Period_End} |
{Period_Start, Length} | → | Period_End |
{Period_Start, Period_End} | → | Length |
{Period_End, Length} | → | Period_Start |
{Date, Period_Start} | → | Organizer |
Room | → | Building |
Consider the following relation:
FLIGHT(From, To, Airline, Flight#, Date_Hour, HeadQuarter, Pilot, TZDifference)
A tuple in the FLIGHT relation contains information about an airplane flight: the airports of departure and arrival, the airline carrier, the number of the flight, its time of departure, the headquarter of the company chartering the flight, the name of the pilot(s), and the time zone difference between the departure and arrival airports.
Normalize the “FLIGHT” relation to its third normal form. You can indicate your steps, justify your reasoning, and indicate the foreign keys if you want to, but you do not have to.
This problem asks you to convert business statements into dependencies. Consider the following relation:
BIKE(Serial_no, Manufacturer, Model, Batch, Wheel_size, Retailer)
Each tuple in the relation BIKE contains information about a bike with a serial number, made by a manufacturer, with a particular model number, released in a certain batch, which has a certain wheel size, and is sold by a certain retailer.
This problem asks you to convert business statements into dependencies. Consider the following relation:
ROUTE(Name, Direction, Fare_zone, Ticket_price, Type_of_vehicle, Hours_of_operations)
A tuple in the ROUTE relation contains information about a public transportation route: its name (e.g. “Gold”, “Green”, …), its direction (e.g., “Medical Campus”, “GCC”, …), the fare zone where the route operates (e.g., “Zone 1”, “Zone 2”, …), the price of a ticket, the nature of the vehicles assuring the route (e.g., “subway”, “bus”, …) and the time of operations (e.g., “24 hours a day”, “from 0600 to 2200”, etc.).
Consider the following business statement:
We want to represent the market of Internet Service Providers (ISP). Each ISP offers multiple bundles, that have a maximum bandwith and a price. Some ISP uses the same name for their bundles (e.g. “premium”, or “unlimited”). Each ISP is given multiple Internet Protocol addresses (IP), and those never change. Every client has a ID that is proper to the ISP (i.e., ISP A and ISP B could both have a client with ID “00001”), an email and subscribes to a particular bundle from a particular ISP. The IP of a client changes over the time.
We want to establish the relational model for the (idealized representation of) Unix files and file-system permissions. We obtained the following relation:
FILESYSTEM(FileName, Size, Extension, Uid, OwnerName, HomeFolder, FilePath, GroupName, Gid)
We wanted to represent the following situation: each file has a name, an extension (like zip
, cs
, etc.), a size and a path. The combination of the path, name and extension is unique to each file. Each file belongs to a particular user, that have a name, a (unique) Uid, and a home folder. Also, every file can be accessed by the members of particular groups. Finally, a group has a (unique) Gid and a name. For simplicity, we assume that a user can belong to at most one group, and that a group can have any number of users.
Consider the following relation:
GRADE_REPORT(StudentID, Term, StudentName, Department, Major, CourseNum, CourseTitle, LetterGrade, Grade)
and the following functional dependencies:
StudentID | → | StudentName |
Major | → | Department |
{Major, CourseNum} | → | CourseTitle |
Grade | → | LetterGrade |
{StudentID, Term, CourseNum, Major} | → | Grade |
Identify a possible primary key for that relation. Then, using that primary key, decide if this relation is in second normal form. If it is not, identify a functional depency that prevents the relation from being in second normal form. Finally, normalize it to the third normal form. Try to pick meaningful names for your relations.
Consider the relations R and T below and their functional dependencies (as well as the one induced by the primary keys):
R(E͟v͟e͟n͟t͟I͟d͟, E͟m͟a͟i͟l͟, Time, Date, Location, Status) |
T(I͟n͟v͟n͟o͟, Subtotal, Tax, Total, Email, Lname, Fname, Phone) |
{EventId, Email} | → | Status |
EventId | → | {Time, Date, Location} |
Invno | → | {Subtotal, Tax, Total, Email} |
→ | {Fname, Lname, Phone} |
Normalize the relations to 2NF and 3NF. Show all relations at each stage (2NF and 3NF) of the normalization process.
Consider the following relation for published books:
BOOK(Book_title, Book_type, Author_name, List_price, Author_affil, Publisher)
Suppose we have the following dependencies:
Book_title | → | { Publisher, Book_type } |
Book_type | → | List_price |
Author_name | → | Author_affil |
Consider the following relation for deliveries:
DELIVERY(Shipment, PackageNumber, RecipientName, Weight, DriverName, DriverPhone, RecipientPhone)
Suppose we have the following functional dependencies:
Shipment | → | DriverName |
PackageNumber | → | Shipment |
PackageNumber | → | {RecipientName, RecipientPhone} |
PackageNumber | → | Weight |
DriverName | → | DriverPhone |
Answer the following three questions:
Consider the relation
CONTACT(Phone, Call_center, Email, Zip, Brand, Website)
and the following functional dependencies:
{Zip, Brand} | → | {Phone} |
{Brand} | → | {Email} |
{ Brand} | → | {Website} |
{Phone} | → | {Call_center} |
Assume that {Zip, Brand} is the primary key. Normalize this relation to the second normal form, and then to the third normal form. Give the relations, their primary keys, and functional dependencies for both steps.
This exercise asks you to convert business statements into dependencies. Consider the following relation:
MESSAGE(SenderId, Time, Date, ReceiverId, Content, Length, Attachment, Size)
A tuple in the MESSAGE relation contains information about a text message: its sender, the time and date when it was sent, the receiver, the content, the length (in characters), the attachment, and the size (in bytes).
Normalize the following relation to the third normal form.
Do not forget to indicate all the primary keys in your relations.
Consider the relation
CONSULTATION(Doctor_no, Patient_no, Date, Diagnosis, Treatment, Charge, Insurance)
with the following functional dependencies:
{Doctor_no, Patient_no, Date} | → | {Diagnosis} |
{Doctor_no, Patient_no, Date} | → | {Treatment} |
{Treatment, Insurance} | → | {Charge} |
{Patient_no} | → | {Insurance} |
Consider the relation
COFFEE(Origin, Type_Of_Roast, Price, Roasted_Date, Best_Before, Color, Customer, Rating)
with the following functional dependencies:
{Origin, Type_Of_Roast} | → | Price |
{Origin, Type_Of_Roast, Customer} | → | Rating |
{Origin, Type_Of_Roast, Roasted_Date} | → | Color |
Roasted_Date | → | Best_Before |
Assume that all the attributes are atomic and answer the following.
A network card (NIC) has a manufacturer, a model, and a unique serial number (MAC address). It offers one or multiple network technologies (ethernet, wi-fi, bluetooth, etc.), and can be connected to the motherboard using one or multiple connections (PCI connector, FireWire, usb, etc.).
This exercise asks you to convert business statements into dependencies, to identify a possible primary key, and to normalize the resulting relation.
Consider the following relation:
TEACHING(Class, Section, Instructor, Assistant, Office_Hours, Meeting_Hour)
Write each of the following business statements as a functional dependency:
Assuming all the functional dependencies you identified at the previous step hold, determine a suitable primary key for this relation.
Taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer.
If needed, normalize this relation to the third normal form.
Consider the following ER schema for the CAR_INFO database:
Note that a car can have at most one driver, N passengers, N insurances, and that the car insurance entity exists only if it is “tied up” to a car (i.e., it is a weak entity, and its identifying relationship is called “Insured”).
You are asked to design a database for a network of libraries.
Each library has a name, an address (made of a number, a street, and a zip), and have copies of documents available to borrow and to reserve. A document is of a particular kind (book, video, or disk), has a title, and an internal catalog number (that can be the ISBN, a barcode, etc.). There can be multiple copies of a document in the network, and each copy has a particular unique code. A copy of a document always “belongs” to a particular library, even when it is checked out.
Furthermore, you want to be able to add the patrons in your database. A patron has a name, a unique library card number, and an email. A patron can reserve (put a hold on) multiple copies of documents for up to two weeks, and can borrow multiple copies of documents for one week if it is a video or a disk, and one month if it is a book. Of course, a copy can be borrowed by only one patron, but it can be put on hold for one patron while being borrowed.
This problem requires you to have successfully completed Pb 4.9 and Pb 4.40.
Using the relational database schema you obtained in Pb 4.40, write the SQL
implementation of that database. Then, using MySQL Workbench, use the “Reverse Engineering” function to obtain an EER diagram of your database and compare it with the UML diagram from Pb 4.40. Apart from the difference inherent to the nature of the diagram (i.e., UML vs EER), how else are they different? How are they the same? Is the automated tool as efficient and accurate as you are?
This exercise asks you to convert business statements into dependencies. Consider the following relation:
KEYBOARD(Manufacturer, Model, Layout, Retail_Store, Price)
A tuple in the KEYBOARD relation contains information about a computer keyboard; its manufacturer, its model, its layout (AZERTY, QWERTY, etc.), the place where it is sold, and its price.
Write each of the following business statements as a functional dependency:
Based on those statements, what could be a key for this relation?
Assuming all those functional dependencies hold, and taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer.
Consider the UML diagram below, and convert it to the relational model. Do not forget to indicate primary and foreign keys.
A possible solutions is
A possible solution is:
Note that CONTACT
could be a weak entity with the identifying relationship being either DISCUSSED_BY
or EMPLOYS
, but both have disadvantages: they would not allow a contact to discuss more than one offer or to be hired by more than one company.
A possible solution is:
Note that we sometimes introduced ID
attributes, but could have done without as well (typically, by taking the name and version attributes and gathering them in the same attribute that could be used as a key, for the OS and APPLICATION entities).
Among the numerous flaws, come to mind:
A possible solution is:
Where we made the following assumptions:
Also, note that we could have decided to make Presentation an entity instead of a relationship: both choices were correct. To determine if an abstract was accepted, one has to “track” whenever it is in the PRESENTED_AT relationship. Another option would have been to add a “Accepted” attribute to the RECEIVED relationship.
Is it true that … | Yes | No |
---|---|---|
… a customer cannot drop two bikes at the exact same time and date? | ✔ | |
… two different customers cannot drop two different bikes at the exact same time and date? | ✔ | |
… an employee cannot repair two bikes at the same time? | ✔ | |
… a customer can be assigned to more than one employee? | ✔ | |
… a customer can have a bike repaired by an employee that is not assigned to him/her? | ✔ | |
… a bike can be in the database without having been dropped by a customer? | ✔ | |
… an employee can be asked to repair a bike without having that type of bike as one of their specialties? | ✔ |
1:M
relationships that are not identifying, we can choose between the foreign key and the cross-reference approaches. If we use the former, we obtain:We could also have used a combination of both!
Is it true that … | Yes | No |
---|---|---|
a label can have multiple logos? | ✔ | |
a recording can be released by multiple labels and at different dates? | ✔ | |
a record shop can have multiple exclusivities? | ✔ | |
two record shops can have the same address? | ✔ | |
two logos can have the same name? | ✔ | |
two recordings can have the same title? | ✔ | |
a record shop must sell at least one recording? | ✔ |
1:M
relationship IS_AN_EXCLUSIVITY_OF
, we can choose between the foreign key and the cross-reference approaches. For the 1:1
relationship USES
, we can use any approach we want (foreign key, merged relation, or cross-reference). We will choose to merge the two relations LABEL
and LOGO
and to have a look-up table for the IS_AN_EXCLUSIVITY_OF
relation. This obtains:
NULL
, then the database does not really says if that dish is safe or not.The CAR_SALE relation is in 1st normal form, since it has a primary key, and by assuming that all the attributes are atomic. This relation is not is 2nd Normal Form: since Date_sold → Discount_amount and Salesman_no → Commission, then some attributes (namely Discount_amount and Commission) are not fully functional dependent on the primary key. Hence, this relation cannot be in 3rd normal form either.
To normalize,
2NF:
Relations | Functional Dependencies |
---|---|
Car_Sale1(Car_no, Date_sold, Discount_amt) | Car_no → {Date_Sold, Discount_amt} and Date_Sold → Discount_amt |
Car_Sale2(Car_no, Salesman_no) | Car_no → Salesman_no |
Car_Sale3(Salesman_no, Commission) | Salesman_no → Commission |
3NF:
Relations | Functional Dependencies |
---|---|
Car_Sale1-1(Car_no, Date_sold) | Car_no → Date_Sold |
Car_Sale1-2(Date_sold, Discount_amt) | Date_Sold → Discount_amt |
Car_Sale2(Car_no, Salesman_no) | Car_no → Salesman_no |
Car_Sale3(Salesman_no,Commission) | Salesman_no → Commission |
{A, B} would be a suitable primary key (actually, it is the only one).
If no key was selected, or if an attribute has a multi-valued domain, then this relation would not be in first normal form.
The following three relations are in third normal form:
{Period_Start, Date} would be a suitable primary key.
This relation is already in second normal form: there are no non-prime attributes that are not fully dependent of the primary key. Stated differently, there are no non-prime A such that {Period_Start} → A or {Date} → A.
This relation is not in 3rd normal form. Consider the following relation: {Period_Start, Date} → {Period_Start, Period_End} → Length. {Period_Start, Period_End} is different from {Period_Start, Date} and from Length, and it is not included in a candidate key. The same goes for {Period_Start, Date} → Room → Building.
Once normalized to the third normal form, we get:
The relation we consider is:
ROUTE(Name, Direction, Fare_zone, Ticket_price, Type_of_vehicle, Hours_of_operations)
ISP(ISP, bunle, bandwith, price, IP, ID, email, time)
The functional dependencies suggested by the business statement are:
{ISP, bundle} | → | {bandwidth, price} |
IP | → | ISP |
{ISP, ID} | → | {email, bundle} |
{ISP, ID, time} | → | IP |
We obtain the following four relations when we normalize it to the third normal form:
The first criticism against that model that comes to mind is that it represents multiple entities (user, group, files, at least) into one relationship. As a consequence, a lot of redundancy is to be expected: typically, the OwnerName and HomeFolder values needs to match every time they occur with the same UID. If they do not, then this means that some inconsistency occurs in the database.
We can have:
{FileName, FilePath, Extension} | → | {Size, OwnerName} |
Uid | → | HomeFolder |
Uid | → | OwnerName |
Gid | → | GroupName |
We could obtain something like
An important aspect to remember is that relations with multiple attributes for their primary key needs as many foreign keys as attributes in their primary key to be referenced. Hennce, the “ACCESS” relationship is a bit cumbersome, as to “point” to a file, it needs three attributes. However, it would be a mistake to make “Gid” an attribute of “FILE”, as the same file can be accessed by multiple groups.
The primary key would be {StudentID, Term, Major, CourseNum}.
Numerous functional dependencies prevent it from being in second normal form, for instance StudentID → StudentName prevents the functional dependency {StudentID, Term, CourseNum, Major} → StudentName from being full.
We obtain the following five relations when we normalize it to the third normal form:
GRADE_REPORT(S͟t͟u͟d͟e͟n͟t͟I͟D͟, T͟e͟r͟m͟, C͟o͟u͟r͟s͟e͟I͟D͟, M͟a͟j͟o͟r͟, Grade)
COURSE_INFO(C͟o͟u͟r͟s͟e͟I͟D͟, M͟a͟j͟o͟r͟, CourseTitle)
STUDENT_INFO(S͟t͟u͟d͟e͟n͟t͟I͟D͟, StudentName)
MAJOR_INFO(M͟a͟j͟o͟r͟, Address)
GRADE_SCALE(G͟r͟a͟d͟e͟, LetterGrade)
After normalizing PRINT to the second normal form (by adding the primary key {Author, Title, Size}, and working on dependencies like {Author, Title} → Technique, which does not fully depend of the primary key), we would obtain three relations that are already in third normal form:
CONSULTATION (D͟o͟c͟t͟o͟r͟͟͟n͟o͟, P͟a͟t͟i͟e͟n͟t͟͟͟n͟o͟, D͟a͟t͟e͟, Diagnosis, Treatment)
PRICE_LISTING (T͟r͟e͟a͟t͟m͟e͟n͟t͟, I͟n͟s͟u͟r͟a͟n͟c͟e͟, Charge)
PATIENT_INFO(P͟a͟t͟i͟e͟n͟t͟_͟n͟o͟, Insurance)
The original relation is:
COFFEE(Origin, Type_Of_Roast, Price, Roasted_Date, Best_Before, Color, Customer, Rating)
A suitable primary key would be PKCOFFEE = {Origin, Type_Of_Roast, Roasted_Date, Customer}. Note that it is the minimal and only primary key.
This relation is in first normal form because it has a primary key (the one we just defined), and because all the attributes are atomic. It is not in second normal form, because, for example, the functional dependency PKCOFFEE → Price is not fully functionally dependent, since {Origin, Type_Of_Roast} → Price holds.
Normalizing to the second normal form actually gives us relations in third normal form:
-CLIENT_RATING(Origin, Type_Of_Roast, Customer, Rating)
-PRICING(Origin, Type_Of_Roast, Price )
-EXPIRATION_DATE(Roasted_Date, Best_Before)
-COFFEE_BATCH(Origin, Type_Of_Roast, Roasted_Date, Color)
Where the functional dependencies always are in such a way that all the attributes but the last one fix the value of the last one, and are taken to be the primary key.
Checking that they are all in third normal form is straightforward. Note that the “original” relation was somewhat lost, since we do not have a relation whose primary key is PKCOFFEE anymore. We could have re-introduced a relation with only the attributes of PKCOFFEE to be on the “safe side”, but the benefit would not have been clear.
The functional dependencies given by the four statements are as follows:
{Class, Section} | → | Meeting_Hour |
Assistant | → | Instructor |
{Class, Section} | → | Instructor |
{Instructor, Class} | → | Office_Hours |
Note that the statement reads “An assistant is an assistant to an instructor”, which implies that an assistant can assist at most one instructor, but does not imply that an instructor can have at most one assistant: hence, the dependency is from Assistant to Instructor, and not the other way around.
Based on the dependencies identified at the previous step, {Class, Section, Assistant} is the primary key.
This relation is in 1st normal form: we make the assumption that all the attributes are atomic, and we identified a primary key. However, it is not in second normal form: Assistant → Instructor, for instance, makes that Instructor is not fully functionally dependent on the primary key.
In third normal form, we would get:
CLASS_INFO(C͟l͟a͟s͟s͟, S͟e͟c͟t͟i͟o͟n͟, Instructor, Meeting_Hour)
ASSISTANTSHIP(A͟s͟s͟i͟s͟t͟a͟n͟t͟, Instructor)
OFFICE_HOURS(I͟n͟s͟t͟r͟u͟c͟t͟o͟r͟, C͟l͟a͟s͟s͟, Office_Hours)
For Car, we need to create an attribute, like VIN. For Car Insurance, Policy Number is the perfect key attribute.
Note that, during the coversion, we had to make Insured Car part of the primary key of CAR INSURANCE.
Note that:
Note that:
NULL
most of the time, so it would not be very efficient.We give the code first, then the drawing:
/* code/sql/HW_Person.sql */
DROP SCHEMA IF EXISTS HW_Person;
CREATE SCHEMA HW_Person;
USE HW_Person;
CREATE TABLE PERSON (
ID VARCHAR(25) PRIMARY KEY,
NAME VARCHAR(25),
Street VARCHAR(25),
City VARCHAR(25),
Seat VARCHAR(25),
Position VARCHAR(25)
);
CREATE TABLE CAR (
Vin VARCHAR(25) PRIMARY KEY,
Make VARCHAR(25),
Model VARCHAR(25),
Year DATE,
Driver VARCHAR(25),
FOREIGN KEY (Driver) REFERENCES PERSON (ID) ON UPDATE CASCADE
);
ALTER TABLE PERSON
ADD FOREIGN KEY (Seat) REFERENCES CAR (Vin);
CREATE TABLE CAR_INSURANCE (
Policy_number VARCHAR(25) PRIMARY KEY,
Company_name VARCHAR(25),
Insured_car VARCHAR(25),
FOREIGN KEY (Insured_car) REFERENCES CAR (Vin)
);
CREATE TABLE PHONE (
ID VARCHAR(25),
Number VARCHAR(25),
FOREIGN KEY (ID) REFERENCES PERSON (ID),
PRIMARY KEY (ID, number)
);
Two options to interact with a database:
SQL
: the idea is to embed SQL
commands directly in the program: a pre-compiler scans the code, extact the SQL
commands, execute them on the DBMS. This system is used primarily for C, C++, COBOL or Fortran, and Language Integrated Query to some extend is part of this approach.SQL
(for instance, PL/SQL
).In this chapter, we will study how to develop a database application that uses a library.
Every database application follows the same routine:
Which API is used vary with the pair Language / DBMS. Here are some of the most commonly used pairs for MySQL (that may be compatible with other DBMS in some cases):
Language | API | Website |
---|---|---|
Python | Python Database API | https://www.python.org/dev/peps/pep-0249/ |
C, C++ | MySQL C API | https://dev.mysql.com/doc/refman/8.0/en/c-api.html |
C# | MySQL Connector/Net | https://dev.mysql.com/downloads/connector/net/8.0.html |
Java | Java DataBase Connectivity | https://docs.oracle.com/javase/9/docs/api/java/sql/package-summary.html |
In this chapter, we will more precisely study how to develop a database application coded in Java that uses the Java DataBase Connectivity library. If you were to work with a different API in your future life, you would likely realize that most of what we will be studying remains true: reading the documentation and understanding the general strategy is what matters in this chapter, to build comfidence in your capacities.
Java actually uses
SQL
statements, retrieve and update the results of a query, handle exceptions, etc.
Note that the A.P.I. is needed when you write and compile your program, and the driver / connector is needed when you execute it. We will come back to this when we explore our first program.
For a quick introduction to Java, cf. https://spots.augusta.edu/caubert/teaching/general/java/.
We will write and compile a simple java program that manipulates a simple database23. Even if the creation and population of the database could have been done from within the program, we will do it as a preliminary step, using the C.L.I., to make our program simpler (and also because it generally match usage: schemas are usually created before the program is executed).
SQL
)For this program, we will use the following database:
/* code/sql/HW_EBookshop.sql */
DROP SCHEMA IF EXISTS HW_EBookshop;
CREATE DATABASE HW_EBookshop;
USE HW_EBookshop;
CREATE TABLE BOOKS (
ID INT PRIMARY KEY,
title VARCHAR(50),
author VARCHAR(50),
price DECIMAL(10, 2),
qty INT
);
-- Cf. https://en.wikipedia.org/wiki/List_of_best-selling_books
INSERT INTO BOOKS
VALUES (
1,
'The Communist Manifesto',
'Karl Marx and
Friedrich Engels',
11.11,
11);
INSERT INTO BOOKS
VALUES (
2,
'Don Quixote',
'Miguel de Cervantes',
22.22,
22);
INSERT INTO BOOKS
VALUES (
3,
'A Tale of Two Cities',
'Charles Dickens',
33.33,
33);
INSERT INTO BOOKS
VALUES (
4,
'The Lord of the Rings',
'J. R. R. Tolkien',
44.44,
44);
INSERT INTO BOOKS
VALUES (
5,
'Le Petit Prince',
'Antoine de
Saint-Exupéry',
55.55,
55);
SELECT *
FROM BOOKS;
MariaDB [HW_EBookshop]> SELECT * FROM BOOKS;
+----+-------------------------+--------------------------------+-------+------+
| ID | title | author | price | qty |
+----+-------------------------+--------------------------------+-------+------+
| 1 | The Communist Manifesto | Karl Marx and Friedrich Engels | 11.11 | 11 |
| 2 | Don Quixote | Miguel de Cervantes | 22.22 | 22 |
| 3 | A Tale of Two Cities | Charles Dickens | 33.33 | 33 |
| 4 | The Lord of the Rings | J. R. R. Tolkien | 44.44 | 44 |
| 5 | Le Petit Prince | Antoine de Saint-Exupéry | 55.55 | 55 |
+----+-------------------------+--------------------------------+-------+------+
5 rows in set (0.00 sec)
You can copy and paste the code, then execute it, or use MySQL’s batch mode: you can find the code previously given at code/sql/HW_EBookshop.sql
, i.e., at https://rocketgit.com/user/caubert/CSCI_3410/source/tree/branch/master/blob/notes/code/sql/HW_EBookshop.sql. Open a terminal (or command-line interpreter), navigate to the folder where you stored that file (using cd
), and type
mysql -u testuser -p < HW_EBookshop.sql
for linux, or (something like)
"C:\Program Files\MySQL\MySQL Server 5.7\bin\mysql.exe" -u testuser -p < HW_EBookshop.sql
for Windows. Refer to the Logging-In as testuser section if you forgot how to log-in to your database.
You just discovered MySQL’s batch mode, that perform series of instructions from a file. You can easily make sure that the database and the table were indeed created, and the values inserted, by logging the way you used to, and executing the usual commands.
As we are about to see, a database application needs to be written following this order:
Connection
and Statement
objects), using a try/catch statement,Statement
object),and the program needs to load the driver (which is specific to DBMS) at execution time.
Of course, if the second step failed, then the program needs to exit gracefully, or to provide debugging information to the user. The program we will obtain can (normally) be compiled, using something like javac FirstProg.java
(or an equivalent command for windows). But another refinment is needed when you want to execute it. We need to set up the driver (or connector) to make the java SQL
API and MySQL communicate. To do so,
***
is the version number.Once this is done and your program was compiled, you can execute it using (where you replace ***
with the actual number, of course, e.g. 8.0.22
):
java -cp .:mysql-connector-java-***.jar FirstProg
in Linux, or
java -cp .;mysql-connector-java-***.jar FirstProg
in Windows. The -cp
option lists the places where java should look for the class used in the program: we are explicitely asking java to use the mysql-connector-java-***.jar
executable (the driver) to execute our FirstProg
executable.
If we try to execute FirstProg
without that flag, we obtain the following error message:
$ java FirstProg
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/HW_EBOOKSHOP
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at FirstProg.main(FirstProg.java:9)
Two additional observations:
java
)// code/java/FirstProg.java
import java.sql.*;
public class FirstProg {
public static void main(String[] args) {
try (Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/HW_EBookshop", "testuser", "password");
Statement stmt = conn.createStatement(); ) {
String strSelect = "SELECT title, price, qty FROM BOOKS WHERE qty > 40";
System.out.print("The SQL query is: " + strSelect + "\n");
ResultSet rset = stmt.executeQuery(strSelect);
System.out.println("The records selected are:");
int rowCount = 0;
String title;
double price;
int qty;
while (rset.next()) {
title = rset.getString("title");
price = rset.getDouble("price");
qty = rset.getInt("qty");
System.out.println(title + ", " + price + ", " + qty);
rowCount++;
}
System.out.println("Total number of records = " + rowCount);
conn.close();
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
Please, note that if at execution time you receive an error that starts with “java.sql.SQLException: The server time zone value ‘EDT’ is unrecognized or represents more than one time zone. You must configure either the server …” add ?serverTimezone=UTC
at the end of jdbc:mysql://localhost:3306/HW_EBookshop
i.e., replace the line that creates the Connection
object with
=
Connection conn .getConnection(
DriverManager"jdbc:mysql://localhost:3306/HW_EBookshop?serverTimezone=UTC",
"testuser","password");
For more information, refer to https://stackoverflow.com/q/26515700. You can also change your server’s configuration “once and for all”, cf.https://stackoverflow.com/a/44720416. On my personnal set-up (Debian with MariaDB), this required to:
/etc/mysql/mariadb.conf.d/50-server.cnf
,[mysqld]
,default_time_zone='-04:00'
(you can look up your time zone at https://time.is if you are unsure)
service mysql restart
A couple of comments:
java.sql.*
, whose documentation is at https://docs.oracle.com/javase/8/docs/api/java/sql/package-summary.html, contains the following classes that we will use in this chapter:
DriverManager
, used for managing a set of JDBC drivers,Connection
, used to make a connection with a database via DriverManager
objects,Statement
, used to send basic SQL
statements via Connection
objects,ResultSet
, to retrieve and update the results of a query, returned by a Statement
object,ResultSetMetadata
, to get information about a ResultSet
object,SQLException
, a class of exceptions relative to SQL
.Connection
is a bridge (the physical connection), and Statement
is a lane (a symbolic, or logic, path on the bridge)."jdbc:mysql://localhost:3306/HW_EBOOKSHOP"
,
jdbc
is the protocol,mysql
is the subprotocol,localhost
is the url of the database,3306
is the port, andHW_EBOOKSHOP
is the schema (that needs to already exist in this case).strSelect
does not end with ;
(it could, but does not have to).next()
returns true
if there is something left in the set of result, and move to the next line if it is the case. It ressembles what we would use to read from a file. If you try to use getString
before moving to the first row, you’ll get an error like.sql.SQLException: Before start of result set java
Undeed, the cursor is “above” the first row of results when the ResultSet
object is created. - We could use 1
, 2
, and 3
instead of "title"
, "price"
and "qty"
in the while
loop: the getString
, getDouble
and getInt
are overloaded, and have versions that take one integer as input, corresponding to the position of the attribute in the result set.
If you store the program in FirstProg.java
, compile it, with
javac FirstProg.java
and then execute it, with
java -cp .:mysql-connector-java-***.jar FirstProg
(refer back to “Executing Database Application” for more details) then you should obtain:
The `SQL` query is: SELECT title, price, qty FROM BOOKS WHERE qty > 40
The records selected are:
The Lord of the Rings, 44.44, 44
Le Petit Prince, 55.55, 55 Total number of records = 2
Take the time to make sure you have the same result on your installation, and that you understand how the code works before moving on.
If you were to replace the body of try
in the previous program with
String strSelect = "SELECT * FROM BOOKS";
ResultSet rset = stmt.executeQuery(strSelect);
System.out.println("The records selected are:");
ResultSetMetaData rsmd = rset.getMetaData();
int columnsNumber = rsmd.getColumnCount();
String columnValue;
while (rset.next()) {
for (int i = 1; i <= columnsNumber; i++) {
if (i > 1) System.out.print(", ");
columnValue = rset.getString(i);
System.out.print(columnValue + " " + rsmd.getColumnName(i));
}
System.out.println();
You would obtain:
The records selected are:
1 ID, The Communist Manifesto title, Karl Marx and Friedrich Engels author, 11.11 price, 11 qty
2 ID, Don Quixote title, Miguel de Cervantes author, 22.22 price, 22 qty
3 ID, A Tale of Two Cities title, Charles Dickens author, 33.33 price, 33 qty
4 ID, The Lord of the Rings title, J. R. R. Tolkien author, 44.44 price, 44 qty 5 ID, Le Petit Prince title, Antoine de Saint-Exupéry author, 55.55 price, 55 qty
In that code, please note:
ResultSetMetadata
,ResultSet
using the getColumnCount
method,getString
method with integer input to read all the data in the table, no matter its “original” data type.Overall, this code would work equally well if the table had a different number of columns, as opposed to our first program. Note also that ResultSetMetadata
does not contain a method to count the number of rows in the result set: to obtain it, either use a counter like we did with rowCount
before, or execute a query to obtain this value (using MySQL’s count
aggregate function).
Note that in the previous code, we read everything as a string. But, actually, SQL
and JAVA
datatypes can be mapped as follows:
SQL |
JAVA |
---|---|
INTEGER |
int |
CHARACTER(n) |
String |
VARCHAR(n) |
String |
REAL |
float |
DOUBLE |
double |
DECIMAL(t,d) |
java.math.BigDecimal |
DATE |
java.sql.Date |
BOOLEAN |
boolean |
BIT(1) |
byte |
Remember that in DECIMAL(t,d)
the t
stands for the number of digits, the d
for the precision.
However, we cannot always have a correspondance going the other way around (from Java to SQL): what would correspond to a reference variable? To a private attribute? This series of problems is called “object-relational impedance mismatch”, it can be overcomed, but at a cost. We will come back to this in the Presentation of NoSQL Chapter.
executeQuery
, executeUpdate
and execute
Previously, we used executeQuery
to send a SQL command to the DBMS. This method is tailored for SELECT
statement, and it is not the only method we can use.
Name | executeQuery |
executeUpdate |
execute |
---|---|---|---|
Used for | SELECT |
INSERT , UPDATE , DELETE |
Any type |
Input Type | string |
string |
string |
Return Type | ResultSet |
int , the number of rows affected by the querry |
boolean , true if the query returned a ResultSet , false if the query returned an int or nothing |
To retrieve the ResultSet
obtained by an execute
statement, you need to use getResultSet
or getUpdateCount
. For more details, consult https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html.
The program in Problem 5.2 (Advanced Java Programming) uses the modifications discussed below. Please refer to it once you are done with this section.
We can pass options (values of fields) when connecting to the database:
Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/HW_DBPROG"
+ "?user=testuser"
+ "&password=password"
+ "&allowMultiQueries=true"
+ "&createDatabaseIfNotExist=true"
+ "&useSSL=true");
On top of user
and password
(which are self-explanatory), setting allowMultiQueries
to true
allows to pass multiple queries with one executeUpdate
statement, and createDatabaseIfNotExist
creates the schema passed in the url
(so, here, HW_DBPROG
) if it does not already exists.
The syntax used is the syntax of querying strings, i.e., it follows the pattern
?field1=value1&field2=value2…&fieldN=valueN
That is, it starts with an ?
and then “pile up” the field / value pairs with &
. In particular, if you needed to add ?serverTimezone=UTC
in the first application program we used, you will need here to replace
+ "&useSSL=true");
with
+ "&useSSL=true"
+ "&serverTimezone=UTC");
You can read about other options at https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html or https://jdbc.postgresql.org/documentation/head/connect.html. Please, note that useSSL
does not apply to the inital handshake and that there are no good ways to hide the password from the application user. Using SSL here simply guarantee that the application user will interact in a secure manner with the database, not that the password is secured.
We can create a table with the method stmt.execute
.
stmt.execute(
"CREATE TABLE DVD ("
+ "Title CHAR(25) PRIMARY KEY, "
+ "Minutes INTEGER, "
+ "Price DOUBLE)");
If we were to execute SHOW TABLES;
after this execute
instruction directly in the MySQL interpreter, this would display at the screen:
+---------------------+
| Tables_in_HW_DBPROG |
+---------------------+
| DVD |
+---------------------+
But here, to access this information, we will use the connection’s metadata. The DatabaseMetaData
is a class used to get information about the database: the driver, the user, the versions, etc. We can use the getMetaData()
method of this class to obtain information about the schema we just created:
DatabaseMetaData md = conn.getMetaData();
ResultSet rs = md.getTables("HW_DBPROG", null, "%", null);
The first parameter of getMetaData()
is the schema’s name, as you probably guessed, and the the third parameter is String tableNamePattern
, i.e., what must match the table name stored in the database to be selected. Here, by using the wildcard %
, we select all the table names (which is only “DVD” at this point).
The getMetaData()
method returns a ResultSet
(here named rs
), where 3
is the TABLE_NAME
. We can now iterate over this rs
object to list all the elements in it, as we would with any ResultSet
object:
You can read at https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String[]) the full specification of this method.
To insert values in our table, we can use stmt.executeUpdate
:
String sqlStatement = "INSERT INTO DVD VALUES ('Gone With The Wind', 221, 3);";
int rowsAffected = stmt.executeUpdate(sqlStatement);
System.out.print(sqlStatement + " changed " + rowsAffected + " row(s).\n");
Note that the executeUpdate
returns an integer, the number of rows changed. We can even use this method to perform multiple insertions at the same time, if allowMultiQueries
was set to true, cf. https://stackoverflow.com/a/10804730/:
String insert1 = "INSERT INTO DVD VALUES ('Aa', 129, 0.2)";
String insert2 = "INSERT INTO DVD VALUES ('Bb', 129, 0.2)";
stmt.executeUpdate(insert1 + ";" + insert2);
Another way of “batch processing” statements (i.e., of executing multiple insertions at the same time) is to use addBatch
(that “loads” statements in the statement
object) and executeBatch()
(that execute all the statement loaded):
String insert3 = "INSERT INTO DVD VALUES ('Cc', 129, 0.2)";
String insert4 = "INSERT INTO DVD VALUES ('DD', 129, 0.2)";
stmt.addBatch(insert3);
stmt.addBatch(insert4);
stmt.executeBatch();
Note that the database is not sollicited until the executeBatch
method is called: we simply loaded the instruction in the program, and connect to the database only once, with all the instructions, when this executeBatch()
instruction is met.
Note also that executeBatch
may be used, per https://docs.oracle.com/javase/tutorial/jdbc/basics/retrieving.html#batch_updates:
for updating, inserting, or deleting a row; and it may also contain DDL statements such as
CREATE TABLE
andDROP TABLE
. It cannot, however, contain a statement that would produce aResultSet
object, such as aSELECT statement
.
Note that using batches does not require to set allowMultiQueries
to true.
Also, the name suggests that it should be possible to fetch the SQL
instructions from a file and load them in your Java
program, but there is actually no easy way to do this, c.f. https://stackoverflow.com/q/2071682/.
A prepared statement is “a query with a slot”: it is a query that takes one or multiple parameters, is parsed and stored on the database, but not executed. It is only after the value of the slot(s) are fixed by the program that this query can be executed. The program can re-use the same prepared statement with multile (different) values multiple times.
Compared to executing SQL
statements directly, prepared statements have three main advantages:
SQL
injections (cf. A Bit About Security).Let us look at a first example:
/*
* We create a string with an empty slot,
* represented by "?".
*/
sqlStatement = "SELECT title FROM DVD WHERE Price <= ?";
/*
* We create a PreparedStatement object, using that string with an
* empty slot.
*/
PreparedStatement ps = conn.prepareStatement(sqlStatement);
/*
* Then, we "fill" the first slot with the value of a variable.
*/
double maxprice = 0.5;
ps.setDouble(1, maxprice);
/*
* Finally, we can execute the query, and display the results.
*/
ResultSet result = ps.executeQuery();
System.out.printf("For %.2f you can get:\n", maxprice);
while (result.next()) {
System.out.printf("\t %s \n", result.getString(1));
}
Note that once the ps
PreparedStatement
object is created, we cannot change the content of the query, beside instantiating the slot. cf. e.g. the discussion at https://stackoverflow.com/q/25902881/.
As we said earlier, a prepared statement can have multiple “slots”, as we can see in that second example:
sqlStatement = "INSERT INTO DVD VALUES (?, ?, ?)";
// Now, our string has 3 empty slots, and it is an INSERT statement.
PreparedStatement preparedStatement = conn.prepareStatement(sqlStatement);
preparedStatement.setString(1, "The Great Dictator");
preparedStatement.setInt(2, 124);
preparedStatement.setDouble(3, 5.4);
rowsAffected = preparedStatement.executeUpdate();
/* You can check "by hand" that this statement was correctly
* executed. Note that the toString method is quite verbose.
*/
System.out.print(preparedStatement.toString() + " changed " + rowsAffected + " row(s).\n");
Where we stored the integer value returned by executeUpdate
and displayed the the prepared statement using thetoString
method.
If we try to mess things up, i.e., provide wrong datatypes:
preparedStatement.setString(1, "The Great Dictator");
preparedStatement.setString(2, "Not-an-integer");
preparedStatement.setString(3, "Not-a-double");
/* This command will make your program crash:
* rowsAffected = preparedStatement.executeUpdate();
*/
Java compiler will be ok, but we’ll have an error at execution time when executing the query.
Executing rowsAffected = preparedStatement.executeUpdate();
would return an error containing
com.mysql.cj.jdbc.exceptions.MysqlDataTruncation: Data truncation: Incorrect integer value: 'Not-an-integer' for column `HW_DBPROG`.`DVD`.`Minutes` at row 1
since "Not-an-integer"
is not … a valid integer!
Of course, prepared statements are particularly convenient when you want to automate some tasks or repeat them multiple times, as you write the query only once, and then re-use it. For instance, inserting the whole “Saw” franchise can be made into a loop:
AdvancedProg.javaWhen you create the Statement
objects, you can give two arguments to the createStatement
method:
Statement stmtNew =
conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
Those options change two things about the ResultSet we obtain using this statement The first argument indicates whenever you can scroll (go forward and backward) in the ResultSets
objects that will be created using this Statement
object:
TYPE_FORWARD_ONLY
is the default (you can only move forward).TYPE_SCROLL_INSENSITIVE
means that you can scroll, but that updates don’t impact result set.TYPE_SCROLL_SENSITIVE
means that you can scroll, and that updates impact result set.Allowing to go in both direction extends the methods one can use in the ResultSet
class: now, to scrool through the results, one can use:
first()
last()
next()
previous()
relative(x)
: move cursor x times (positive = forward, negative = backward)absolute(x)
: move to the row number x, where 1 is the first.The second argument is the concurrency level, it indicates whenever you can update the values into the ResultSet
directly.
CONCUR_READ_ONLY
is the default.CONCUR_UPDATABLE
means that we can change the database without issuing SQL statement.In other terms, manipulting the ResultSet
object will directly impact the data stored in the database if we set the second parameter to CONCUR_UPDATABLE
.
This createStatement
method is documented at https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#createStatement(int,%20int).
You can find below a simple example of “scrollable” ResultSet
:
// code/java/ScrollingProgram.java
import java.sql.*;
public class ScrollingProgram {
public static void main(String[] args) {
try (Connection conn =
DriverManager.getConnection(
// We connect to the database, not to a particular schema.
"jdbc:mysql://localhost:3306/"
+ "?user=testuser"
+ "&password=password"
+ "&allowMultiQueries=true"
/*
* We want to allow multiple statements
* to be shipped in one execute() call.
*/
);
Statement stmt =
conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
/*
* Finally, we want to be able to move back and forth in our
* ResultSets. This implies that we have to also chose if the
* ResultSets will be updatable or not: we chose to have them
* to be "read-only".
*/
) {
/*
* Before you ask: no, there are no "simple" way of
* constructing a string over multiple lines,
* besides concatenating them,
* cf. e.g. https://stackoverflow.com/q/878573
*/
stmt.execute(
"DROP SCHEMA IF EXISTS HW_SCROLLABLE_DEMO;"
+
/*
* We drop the schema we want to use if it already exists.
* (This allows to execute the same program multiple times.)
*/
"CREATE SCHEMA HW_SCROLLABLE_DEMO;"
+ "USE HW_SCROLLABLE_DEMO;"
+
// We create and use the schema.
"CREATE TABLE TEST("
+ " Id INT"
+ ");"
// The schema contains only one very simple table.
);
/*
* We can execute all those queries at once
* because we passed the "allowMultiQueries=true"
* token when we created the Connection object.
*/
// Let us insert some dummy values in this dummy table:
for (int i = 0; i < 10; i++) stmt.addBatch("INSERT INTO TEST VALUES (" + i + ")");
/*
* no ";" in the statements that we add
* to the batch!
*/
stmt.executeBatch();
// We execute the 10 statements that were loaded at once.
// Now, let us write a simple query, and navigate in the result:
ResultSet rs = stmt.executeQuery("SELECT * FROM TEST");
/*
* We select all the tuples in the table.
* If we were to execute this instruction on the
* command-line interface, we would get:
* MariaDB [HW_SCROLLABLE_DEMO]> SELECT * FROM TEST;
* +----+
* | Id |
* +----+
* | 0 |
* | 1 |
* | 2 |
* | 3 |
* | 4 |
* | 5 |
* | 6 |
* | 7 |
* | 8 |
* | 9 |
* +----+
* 10 rows in set (0.001 sec)
*/
// We can "jump" to the 8th result in the set:
rs.absolute(8);
System.out.printf("%-22s %s %d.\n", "After absolute(8),", "we are at Id", rs.getInt(1));
/* Note that this would display "7" since the
* 8th result contains the value 7 (sql starts
* counting at 1.
*/
// We can move back 1 item:
rs.relative(-1);
System.out.printf("%-22s %s %d.\n", "After relative(-1),", "we are at Id", rs.getInt(1));
// We can move to the last item:
rs.last();
System.out.printf("%-22s %s %d.\n", "After last(),", "we are at Id", rs.getInt(1));
// We can move to the first item:
rs.first();
System.out.printf("%-22s %s %d.\n", "After first(),", "we are at Id", rs.getInt(1));
conn.close();
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
You can also have a look at the end of code/java/AdvancedProg.java
, which creates a second Statement
object is created and used.
Forgetting about the technical difficulties for a minute, there is always the issue of finding the right balance between what the application, and what the database, should do. Any control structure should be dealt with by the application, and queries (such as select project joins) should be done by the DBMS, but the line may be a bit blurry at times. For instance, should the schema being created from the application? Probably yes if this is an operation that needs to be performed repeatedly (to “reboot” your schema), or if you want your application to be as portable as possible. Otherwise, it may make little sense.
Another question is: if a task need to be performed repeatedly, should you create a method in the application, or a procedure in the DBMS? Once again, it will depend: if you need to read information from the user or if using control flow is crucial to your task, then a method seems more adequate. But if the task is essentially a series of queries, then creating a procedure may have benefits:
As an example of how to technically declare and use a procedure from an application, refer to the following code:
// code/java/CallProcedure.java
import java.sql.*;
public class CallProcedure {
public static void main(String[] args) {
try (Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/HW_CALL_TEST"
+ "?user=testuser"
+ "&password=password"
+ "&allowMultiQueries=true"
+ "&createDatabaseIfNotExist=true"
+ "&useSSL=true");
Statement stmt = conn.createStatement(); ) {
stmt.execute(
"DROP SCHEMA IF EXISTS HW_CALL_TEST;"
+ "CREATE SCHEMA HW_CALL_TEST;"
+ "USE HW_CALL_TEST;");
stmt.execute("CREATE TABLE Test1 (A INT PRIMARY KEY);");
stmt.execute("INSERT INTO Test1 VALUES (1), (2), (3);");
// To create a procedure, we don't need to change the delimiter!
// Cf. https://stackoverflow.com/a/5314879/ for instance.
stmt.execute(" CREATE PROCEDURE List () BEGIN SELECT * FROM Test1; END; ");
// We create a CallabaleStatement object
// https://docs.oracle.com/javase/7/docs/api/java/sql/CallableStatement.html
// that extends PreparedStatements and allow to call procedures.
CallableStatement cs = conn.prepareCall("CALL List()");
ResultSet rset = cs.executeQuery();
while (rset.next()) {
System.out.println("The value of A is " + rset.getInt("A") + ".");
}
// Second example of procedure, with arguments
stmt.execute(
" CREATE PROCEDURE ListGreaterThan(arg INT) "
+ " BEGIN "
+ " SELECT * "
+ " FROM Test1 "
+ " WHERE A > arg; "
+ " END; ");
cs = conn.prepareCall("CALL ListGreaterThan(?)");
// Note that we use the same "?" placeholder
// than we used for prepared statement.
// We declare an int variable for the argument
// for the sake of clarity, but don't need to.
int x = 2;
// We set the value of the first "? slot"
// using setInt as for prepared statements.
cs.setInt(1, x); // 1 is the position
rset = cs.executeQuery();
System.out.println("The values of A greater than " + x + " are:");
while (rset.next()) {
System.out.println("The value of A is " + rset.getInt("A") + ".");
}
conn.close();
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
Now that you have reviewed some of the tools and options in the API, you can start programming and naviguating the documentation. As an exercise, you can try to combine prepared statements and batch processing on your own. It is actually fairly immediate!
BatchPreparedStatements.javaWhat are the technologies that make it possible for a Java application to communicate with a DBMS?
Why is it important to have the statements creating the connection to the database inside a try…catch
statement?
Name three classes in the SQL API of java.
What JDBC method do you call to get a connection to a database?
Why would somebody want to create multiple Statement
objects?
What is the class of the object used to create a ResultSet object?
Briefly explain what the next()
method from the ResultSet
class does and give its return type.
Write a statement that execute SELECT * FROM TEST;
in the DBMS and store the result given by the database in an object. Assume that there is a Statement
object called stmt
.
What method should be used to perform an INSERT
command from your program? In which class is it?
Where is a ResultSet
object’s cursor initially pointing? How do you move the cursor forward in the result set?
Give three navigation methods provided by ResultSet
.
Explain this JDBC URL format:
:mysql://localhost:3306/HW_NewDB?createDatabaseIfNotExist=true&useSSL=true jdbc
In what class is the getColumnName()
method?
Assuming stmt
is a Statement
object, in the statement:
= stmt.executeUpdate(strC); modif
What is…
modif
?strC
?strC
?Let stmt
be a statement object, and consider the following:
ResultSet rset = stmt.executeQuery("SELECT Name, Price FROM DVD");
Write a piece of code that would display at the screen the name and price of the rows present in the rset
object.
What is a prepared statement?
Give three reasons why prepared statements are preferable over statements.
Assume ps
is the prepared statement:
VALUES (?, ?); INSERT INTO EXAM
Write the three statements needed to allocate “Quiz” and “5” to the two slots and to execute the prepared statement in the database.
Briefly explain what ResultSet.TYPE_SCROLL_SENSITIVE
enables, and where / when it is used.
In the code below, there are five errors between line 13 and line 32. They are not subtle Java errors (like misspelling a key word) and do not come from the DBMS (so you should assume that the password is correct, that the database exists, etc.). Highlight each error and explain why it is an error.
// code/java/ProgWithErrors.java
import java.sql.*;
public class ProgWithErrors {
public static void main(String[] args) {
try (Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/" + "HW_TestDB?user=testuser&password=password");
Statement stmt = conn.createStatement(); ) {
// Errors after this point.
String strSelect = "SELECT title FROM DISKS WHERE qty > 40;";
ResultSet rset = stmt.executeUpdate(strSelect);
System.out.println("The records selected are: (listed last first):");
rset.last();
while (rset.previous()) {
String title = rset.getDouble("title");
System.out.println(title + "\n");
}
String sss = "SELECT title FROM DISKS WHERE Price <= ?";
PreparedStatement ps = conn.prepareStatement(sss);
ResultSet result = ps.executeQuery();
conn.close();
// Errors before this point.
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
Write a program that determines if the null
value from Java code is equal to the NULL
value in the DBMS.
The technologies theat make it possible for a Java application to communicate with the DBMS are API’s and the drivers to implement them.
It is important to put the statements that create the connection to the database inside the try…catch
statement because the program will interact with the environment if this interraction fails (typically, if the connection does not succeed), for which we want to be able to catch the exception and recover from that failure.
There are many classes in the SQL API if Java. There are Connection
, DatabaseMetaData
, ResultSetMetaData
, PreparedStatement
, and Statement
to name a few. You can find them listed at https://docs.oracle.com/javase/7/docs/api/java/sql/package-summary.html.
The JDBC method that must be called to connect to a database is DriverManager.getConnection()
You may want to create multiple Statement
object for multiple reasons: to use parallelism (you could use a statement while another is still being processed), to have different policies (some objects could have update rights, some could not), to connect to multiple databases.
The class of the object used to create a ResultSet
object is the Statement
class. A Statement
object is used to create a ResultSet
object, e.g. by calling the executeQuery
method.
The next()
method checks if there is data to read and, if there is, it moves the cursor to read it. Its return type is Boolean.
You execute a SELECT
statement and store its returning value in a ResultSet
object using
{.java}. ResultSet rset = stmt.executeQuery("SELECT * FROM TEST;");
The executeUpdate()
or execute()
methods can be used to perform an INSERT
command from our program. They are in the Statement
and in the PreparedStatement
classes.
The ResultSet
object’s cursor is initially pointing at the position before the first line. We move the cursor forward by using the next()
method.
There are many navigation methods provided by ResulSet
. They are the first()
, last()
, next()
, previous()
, relative()
, and absolute()
methods.
This JDBC URL format connects to localhost:3306
, creates a new database if needed, and uses the secure SSL connection.
The getColumnName()
method is in the ResultSetMetaData
class.
In the statement modif = stmt.executeUpdate(strC);
…
modif
is an integer (the number of rows modified by the query).strC
is a String
(a SQL
command).strC
is DELETE FROM BOOKS Where Price > 0.5
.We could use the following:
while(rset.next()){
System.out.println("The name is "
+ rset.GetString("Name") + " and the price is "
+ rset.GetDouble("Price") + ".");
}
A prepared statement is a feature used to execute SQL
statements repeatedly with high efficiency that protects against SQL
injections.
A prepared statement offers a protection against SQL
injection, mutualize the work (you write the query only once, and then re-use it), reduces the bandwith usage, and reduce the parsing time on the DBMS (the query is parsed only once as opposed to every time a statement is sent, no matter how similar to the previous one it is).
.setString(1, "Quiz");
ps.setInt(2, 5);
ps.execute(); ps
ResultSet.TYPE_SCROLL_SENSITIVE
is used as the first argument of the createStatement
method from the Connection
class. The official documentation reads
The result can be scrolled; its cursor can move both forward and backward relative to the current position, and it can move to an absolute position. The result set reflects changes made to the underlying data source while the result set remains open.
which means that it is possible to use e.g. the previous()
method to scroll up in the ResultSet objects created using that statement.
The errors are:
ResultSet
object creation line and should only be this part:.executeUpdate(strSelect); stmt
The error is that the executeUpdate()
method cannot be used to perform SELECT
statements.
.previous() rset
This error is subtle: we need to display the last record before using the previous()
method, otherwise it would be skipped. We can fix this using a do
…while
loop.
String
object named title
:String title = rset.getDouble("title");
The error is that the getDouble()
method returns a double
, which cannot be stored as a String
.
ResulSet
object named result
:.executeQuery(); ps
The error here comes from the previous prepared statement that did not receive a value for the ?
.
in `code/java/ProgWithErrorsPatched.java`, which looks like:
You can find the corrected program
```{.bash}
16c16
< ResultSet rset = stmt.executeUpdate(strSelect);
---
> ResultSet rset = stmt.executeQuery(strSelect); // Error 1
21,24c21,24
< while(rset.previous()) {
< String title = rset.getDouble("title");
< System.out.println(title + "\n");
< }
---
> do { // Error 2
> String title = rset.getString("title"); // Error 3
> System.out.println(title); // Not an error, but we probably do not need two new lines.
> }while(rset.previous()); // Error 2 bis
27a28
> ps.setInt(1, 10); // Error 4
```
Here is what the program should look like:
// code/java/TestForNull.java
import java.sql.*;
public class TestForNull {
public static void main(String[] args) {
try (Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/HW_DBPROG?user=testuser&password=password&createDatabaseIfNotExist=true&serverTimezone=UTC");
Statement stmt = conn.createStatement(); ) {
stmt.execute("CREATE TABLE Test (" + "A CHAR(25), " + "B INTEGER, " + "C DOUBLE)");
String strAdd = "INSERT INTO Test VALUES (NULL, NULL, NULL);";
int number_of_row_changed = stmt.executeUpdate(strAdd);
System.out.print("This last query changed " + number_of_row_changed + " row(s).\n");
ResultSet result = stmt.executeQuery("SELECT * FROM Test");
if (result.next()) {
System.out.print(result.getString(1) + " " + result.getDouble(2) + " " + result.getInt(3));
if (result.getString(1) == null) {
System.out.print("\nAnd null for CHAR in SQL is null for String in Java.\n");
}
}
conn.close();
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
This program should display:
This last query changed 1 row(s).
null 0.0 0
And null for CHAR in `SQL` is null for String in Java.
Draw an arrow from class A
to class B
when in the Java SQL
API a method from class A
can be used to create an object from class B
.
Read, execute, break, edit, compile, patch, hack and (most importantly) understand the following program:
// code/java/AdvancedProg.java
/*
* This is a long program, introducing:
* I. How to pass options when connecting to the database,
* II. How to create a table and read its meta-data,
* III. How to insert values,
* IV. How to use prepared statements,
* V. How to read backward and write in ResultSets.
*
* To be able to execute this program multiple times, the schema is dropped and re-created.
*
*/
import java.sql.*;
public class AdvancedProg {
public static void main(String[] args) {
try (
// I. Passing options to the database
// start snippet passing-options
Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/HW_DBPROG"
+ "?user=testuser"
+ "&password=password"
+ "&allowMultiQueries=true"
+ "&createDatabaseIfNotExist=true"
+ "&useSSL=true");
// end snippet passing-options
Statement stmt = conn.createStatement(); ) {
/*
* Below, we drop the schema and re-create it to allow multiple execution of the
* program. You can ignore this part if you want.
*/
stmt.execute(
"DROP SCHEMA IF EXISTS HW_DBPROG;" + "CREATE SCHEMA HW_DBPROG;" + "USE HW_DBPROG;");
// II. Creating a table and reading its meta-data
// start snippet table-creation
stmt.execute(
"CREATE TABLE DVD ("
+ "Title CHAR(25) PRIMARY KEY, "
+ "Minutes INTEGER, "
+ "Price DOUBLE)");
// end snippet table-creation
// start snippet table-metadata-1
DatabaseMetaData md = conn.getMetaData();
ResultSet rs = md.getTables("HW_DBPROG", null, "%", null);
// end snippet table-metadata-1
// start snippet table-metadata-2
while (rs.next()) {
System.out.println(rs.getString(3));
}
// end snippet table-metadata-2
// III. Inserting values
// start snippet inserting-1
String sqlStatement = "INSERT INTO DVD VALUES ('Gone With The Wind', 221, 3);";
int rowsAffected = stmt.executeUpdate(sqlStatement);
System.out.print(sqlStatement + " changed " + rowsAffected + " row(s).\n");
// end snippet inserting-1
// start snippet inserting-2
String insert1 = "INSERT INTO DVD VALUES ('Aa', 129, 0.2)";
String insert2 = "INSERT INTO DVD VALUES ('Bb', 129, 0.2)";
stmt.executeUpdate(insert1 + ";" + insert2);
// end snippet inserting-2
// start snippet inserting-3
String insert3 = "INSERT INTO DVD VALUES ('Cc', 129, 0.2)";
String insert4 = "INSERT INTO DVD VALUES ('DD', 129, 0.2)";
stmt.addBatch(insert3);
stmt.addBatch(insert4);
stmt.executeBatch();
// end snippet inserting-3
// IV. Prepared Statements
// start snippet prepared-queries-1
/*
* We create a string with an empty slot,
* represented by "?".
*/
sqlStatement = "SELECT title FROM DVD WHERE Price <= ?";
/*
* We create a PreparedStatement object, using that string with an
* empty slot.
*/
PreparedStatement ps = conn.prepareStatement(sqlStatement);
/*
* Then, we "fill" the first slot with the value of a variable.
*/
double maxprice = 0.5;
ps.setDouble(1, maxprice);
/*
* Finally, we can execute the query, and display the results.
*/
ResultSet result = ps.executeQuery();
System.out.printf("For %.2f you can get:\n", maxprice);
while (result.next()) {
System.out.printf("\t %s \n", result.getString(1));
}
// end snippet prepared-queries-1
// start snippet prepared-queries-2
sqlStatement = "INSERT INTO DVD VALUES (?, ?, ?)";
// Now, our string has 3 empty slots, and it is an INSERT statement.
PreparedStatement preparedStatement = conn.prepareStatement(sqlStatement);
preparedStatement.setString(1, "The Great Dictator");
preparedStatement.setInt(2, 124);
preparedStatement.setDouble(3, 5.4);
rowsAffected = preparedStatement.executeUpdate();
/* You can check "by hand" that this statement was correctly
* executed. Note that the toString method is quite verbose.
*/
System.out.print(preparedStatement.toString() + " changed " + rowsAffected + " row(s).\n");
// end snippet prepared-queries-2
// start snippet prepared-queries-3
preparedStatement.setString(1, "The Great Dictator");
preparedStatement.setString(2, "Not-an-integer");
preparedStatement.setString(3, "Not-a-double");
/* This command will make your program crash:
* rowsAffected = preparedStatement.executeUpdate();
*/
// end snippet prepared-queries-3
// start snippet prepared-queries-4
for (int i = 1; i < 5; i++) {
preparedStatement.setString(1, "Saw " + i);
preparedStatement.setInt(2, 100);
preparedStatement.setDouble(3, .5);
preparedStatement.executeUpdate();
}
// end snippet prepared-queries-4
// V. Reading backward and writing in ResultSets
// start snippet new-statement-1
Statement stmtNew =
conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
// end snippet new-statement-1
// Reading backward
sqlStatement = "SELECT title FROM DVD WHERE Price < 1;";
result = stmtNew.executeQuery(sqlStatement);
System.out.println("For $1, you can get:");
if (result.last()) {
// We can jump to the end of the ResultSet
System.out.print(result.getString("Title") + " ");
}
System.out.print("and also, (in reverse order)");
while (result.previous()) {
// Now we can scroll back!
System.out.print(result.getString("Title") + " ");
}
// Changing the values
System.out.print("\n\nLet us apply a 50% discount. Currently, the prices are:\n");
sqlStatement = "SELECT title, price FROM DVD;";
result = stmtNew.executeQuery(sqlStatement);
while (result.next()) {
System.out.printf("%20s \t $%3.2f\n", result.getString("title"), result.getDouble("price"));
}
// We need to scroll back!
result.absolute(0);
while (result.next()) {
double current = result.getDouble("price");
result.updateDouble("price", (current * 0.5));
result.updateRow();
}
System.out.print("\n\nAfter update, the prices are:\n");
// We need to scroll back!
result.absolute(0);
while (result.next()) {
System.out.printf("%20s \t $%3.2f\n", result.getString("title"), result.getDouble("price"));
}
conn.close();
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
Consider the code below:
// code/java/GuestProgram.java
// java.util.Scanner is an API to read from the keyboard.
import java.sql.*;
import java.util.Scanner;
// This first part is "standard". Just note that we allow multiple statements.
public class GuestProgram {
public static void main(String[] args) {
try (Connection conn =
DriverManager.getConnection(
"jdbc:mysql://localhost:3306/?user=testuser&password=password"
+ "&allowMultiQueries=true");
Statement stmt = conn.createStatement(); ) {
// We create a schema, use it, create two tables, and insert a value in the second one.
stmt.execute(
"CREATE SCHEMA HW_GUEST_PROGRAM;"
+ "USE HW_GUEST_PROGRAM;"
+ "CREATE TABLE GUEST("
+ "Id INT PRIMARY KEY,"
+ "Name VARCHAR(30),"
+ "Confirmed BOOL"
+ ");"
+ "CREATE TABLE BLACKLIST("
+ "Name VARCHAR(30)"
+ ");"
+ "INSERT INTO BLACKLIST VALUES (\"Marcus Hells\");");
/*
* INSERT HERE Solution to exercises 1, 2 and 3.
* Tip for Exercise 1, this solves the first item.
*/
System.out.print("How many guests do you have?\n");
Scanner key = new Scanner(System.in);
int guest_total = key.nextInt();
} catch (SQLException ex) {
ex.printStackTrace();
}
}
}
In the following three exercises, you will add some code below the comment // INSERT HERE Solution to exercises 1, 2 and 3.
in order to obtain a behavior like the following one (you do not have to reproduce it exactly!). The user input is underlined, and hitting “enter” is represented by ↵:
How many guests do you have?
2͟↵
Enter name of guest 1.
M͟a͟r͟c͟u͟s͟ ͟H͟e͟l͟l͟s͟↵
Enter name of guest 2.
C͟y͟n͟t͟h͟i͟a͟ ͟H͟e͟a͟v͟e͟n͟s͟↵
……………⌛……………
Oh no, (at least) one of the guest from the black list confirmed their presence!
The name of the first one is Marcus Hells.
Do you want to remove all the guests that are on the black list and who have confirmed
their presence? Enter "Y" for yes, anything else for no.
You should suppose that BLACKLIST
contains more than one name, and that some other operations are performed where ……………⌛…………… is (typically, some guests will confirm their presence). Using batch processing or prepared statements will be a plus, but is not mandatory to solve these exercises.
key.nextLine()
, that returns the String
entered by the user),GUEST
table an integer that is incremented after each insertion, the name entered by the user, and NULL
.The file code/java/GuestProgramSolution.java
contains the whole code for you to compile and test.
Pb 5.3 – Solution to Q. 1
We explore two solutions, one with batch processing, the second with prepared statement.
They both starts with:
Then the solution using batch processing could be:
while (counter < guest_total) {
// Ask the name of the guest.
System.out.print("Enter name of guest " + (counter + 1) + ".\n");
// Read the name of the guest.
guest_name = key.nextLine();
stmt.addBatch("INSERT INTO GUEST VALUES (" + counter + ", \"" + guest_name + "\", NULL)");
// Add to the batch the statement to insert the required data in the table
counter++;
}
stmt.executeBatch(); // Execute the batch statement.
while the solution using prepared statements could be:
PreparedStatement ps = conn.prepareStatement("INSERT INTO GUEST VALUES(?, ?, NULL);");
while (counter < guest_total) {
System.out.print("Enter name of guest " + (counter + 1) + ".\n");
= key.nextLine();
guest_name .setInt(1, counter);
ps.setString(2, guest_name);
ps.executeUpdate();
ps++;
counter}
Pb 5.3 – Solution to Q. 2
We let SQL
do all the hard work:
ResultSet rset =
stmt.executeQuery(
"SELECT * FROM GUEST, BLACKLIST WHERE GUEST.Name = BLACKLIST.Name AND"
+ " GUEST.Confirmed = true");
if (rset.next()) {
System.out.print(
"Oh no, (at least) one of the guest from the black list confirmed their presence!\n"
+ "The name of the first one is "
+ rset.getString(2)
+ ".\n");
}
Pb 5.3 – Solution to Q. 3
Similarly, we let SQL
do all the hard work:
System.out.print(
"Do you want to remove all the guests that are on the black list and confirmed their"
+ " presence? Enter \"Y\" for yes, anything else for no.\n");
if (key.nextLine().equals("Y")) {
stmt.execute(
"DELETE FROM GUEST WHERE NAME IN (SELECT NAME FROM BLACKLIST) AND Confirmed = true;");
}
DBMS, as any software, needs to be secured. DBMS, as any online service, needs to be well secured. DBMS, as any place where (possibly confidential) data is stored, needs to be extremely well secured.
In this Chapter, we review some “usual” aspects of security, before focusing on one particular type of attack on DBMS, SQL
injections.
As usual, a threat model needs to be sketched when designing how your DBMS will be used. It should answer questions like
The first question is of importance, as you will not be securing your application the same way depending of if you fear attack from script kiddies, competitors, former employee, or government. However, thinking “this system can not possibly be secured against Google’s quantum computer, so let’s do nothing” is probably giving your system too much importance (Google is not going to waste its resources to hack your database), and counter-productive (you should protect your database against low-level threats in any case).
Risks generally include
About the type of attacks, DBMS are exposed to many channels. Indeed, they can be targeted by
SQL
injections).We will study those in the second part of this Chapter, but do not forget that other types of vulnerabilities exist as well.
It can be useful to design your control measures for your DBMS, which can include, e.g.
Protection measures are principled and technological. You should always have in mind principles like
Technological measure of protections exist, and should be used. For instance,
Use mysqldump to create backups of your tables. On our system, it would be something like
mysqldump --all-databases - u testuser -p password - h localhost > dump.sql
Use encryption, salting and hashing when it comes to password and other sensitive data.
Do not let the users connect directly to your database, even through a piece of software you wrote (refer e.g. to https://security.stackexchange.com/q/229954 for a discussion on why this is not a good idea).
If you are not familiar with the concepts of salting and hashing, you can consult e.g. https://crackstation.net/hashing-security.htm. In a nutshell, this is a measure of prevention to protect your users against weak passwords, and to make sure that only an encrypted version of their password will be stored in your database.
Generally, people are in agreement that the question is not if a security vulnerability will be exploited on your system, but when. The general strategy is to … have a plan. How can you recover, where is your backup stored, is it versioned (i.e., multiple versions of the data exist), do you have a backup of your configuration files, how to restore access quickly, etc.
The global idea behind this particular type of attack is the attacker mixing instructions with the data. Imagine, during a process, the following conversation:
(At the court)
Judge — What is your name?
Attacker — Bill, you are free to go. This court is adjourned.
Judge — We are here to today to judge Bill, you are free to go. This court is adjourned
And the attacker can now leave, since the judge said that he was free to go, and that court was adjourned. This is exactly how SQL injections work.
Prepared statement makes it impossible to mix data and instructions, and are the “go-to” solution to protect from this attack. Note that, however, if they are used improperly, they could still be exploited to perform SQL injections.
Let us look at a first simple example with ASP
, Active Server Pages, a server-side scripting language. Imagine your code contains:
txtUserId = getRequestString("UserId"); txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
that
string
from the user, supposedly their id
,SQL
statement using this string
as is.Then, a user can
105; DROP TABLE Suppliers;
,105 or 1 = 1
(so that the WHERE
condition is now always true),admin'--
(so that the rest of the line is commented, possibly des-activating other tests on the password, for instance).This type of attack can also be used for DBMS fingerprinting, i.e., to get a more precise picture of the type of architecture your victim is using.
The situation is the following: we are having a party with a secret VIP guest (Marcus Hells). The other guests can try to guess the name of the secret guest. If they succeed, we tell them so, if they don’t, we simply display that they do not know who the secret guest is.
An improper program would allow the name of the secret guest to be displayed even if the user does not know that Marcus Hells is the secret VIP. We will see two examples of insecure programs (code/java/SimpleInjection01.java
and code/java/SimpleInjection02.java
), where SQL
injection are possible, and a possible fix (code/java/SimpleInjection03.java
), using prepared statements.
The gist of code/java/SimpleInjection01.java
is that writing a statement like
leaves the door open for an attacker to enter n' OR '1' = '1
as a value for entered
, so that the condition would always be true.
For code/java/SimpleInjection02.java
, it shows how
could be a serious issue if nope'; DROP SCHEMA HW_SIMPLE_INJECTION_2;
was entered as a value for entered
, destroying the whole schema HW_SIMPLE_INJECTION_2
.
In the second cases, INSERT
or even UPDATE
statements can be executed as well, and a careful SQL injection can even perform its task without crashing the program. As an example, you can try nope'; UPDATE SECRETVIP SET Name="Me!";
or even nope'; UPDATE SECRETVIP SET Name="Me!"; SELECT * FROM SECRETVIP WHERE Name='nope
for the second program, and see that we successfully modify the data (without the program crashing in the second case).
Finally, code/java/SimpleInjection03.java
shows how to use proper statements to avoid this situation. Note that the attacks we discussed are no longer possible with this program.
Possible protections from SQL injections (-like) includes:
If parts of your prepared statement is determined by the user, then SQL injection could still be possible. For instance, having
PreparedStatement ps =
.prepareStatement("SELECT * FROM " + table_given_by_user + " WHERE Name = ?;"); conn
would still leave you exposed, as table_given_by_user
could mix instructions with data.
For each of the following service, indicate the possible consequences attached to the type of loss given, or the type of loss that would result in the consequence given.
Service | Type of loss | Consequence |
---|---|---|
GPS Satellite | Availability | |
Pounce | Integrity | |
Patient Database | Everybody knows your uncle had an appendicitis when he was 12. | |
Bank Service | Your professor is now a millionaire. | |
Mark Zucherberg’s phone number is now public. | ||
Booking Website | You can’t book your Summer vacations. |
You forgot your password for an online service, and click on their “Forgot your password?” link. You enter your email and a few seconds later receive an email with your original password in it. What is the issue here? What are the next steps you should take?
Briefly explain what a SQL injection is.
Briefly explain what a prepared statement is and the benefits it provides.
You are using a software that is directly connected to a database. You do not have access to the source code, but you suspect it is vulnerable to SQL
injections. How do you proceed to test if injections are possible?
What is fingerprinting?
A possible solution is
Service | Type of loss | Consequence |
---|---|---|
GPS Satellite | Availability | The navigation system in your car goes blank. |
Pounce | Integrity | Your grades have been changed. |
Patient Database | Confidentiality | Everybody knows your uncle had an appendicitis when he was 12. |
Bank Service | Integrity | Your professor is now a millionaire. |
Confidentiality | Mark Zucherberg’s phone number is now public. | |
Booking Website | Availability | You can’t book your Summer vacations. |
The issue is that they are storing your password in clear text, which is an extremely bad security practice. This suggests that this service does not care about the security of their users, and that all the data in it should be considered compromised. The next steps are:
An SQL injection is a type of attack targeting DBMS using the SQL programming langugae. It consists in mixing conditions (e.g., ' OR '1'='1' --
) or commands (e.g., DROP TABLE users;
) into the data asked to the user with the goal of executing malicious code on the targeted DBMS. It can result in loss of confidentiality, availability, or integrity, and is a common vector of attack on DBMS.
A prepared statement is stored in a DBMS as a “query with parameters,” or a template waiting for values to be passed to fill those placeholders, or slots, and then is executed all together as one statement. It is used to execute the same or similar statements repeatedly and with high efficiency, since it is pre-compiled, and compiled only once, it takes less computational resources to be executed. Also, in the case where the arguments are transmitted over the network, it means that only the arguments, and not the whole query, has to be sent, which may result in a increase in speed.
Moreover, since only the arguments are passed, it prevents SQL
injection, when properly utilized.
There are two ways to test if SQL
injections are possible:
1 OR 1 = 1
or ; DROP TABLE Users;--
Note that both options can be explored in parallel. You can also check out coder resoures, e.g. https://sqa.stackexchange.com/q/1527/, for more ideas on how to test for injections.
In general, fingerprinting means accessing information to uniquely identify something. In this particular context, it means attacking the DBMS using multiple techniques (among which SQL injections) to obtain more information about it (software, version, plug-in, etc.). You can read more about it in this article.
Consider the following code:
Scanner key = new Scanner(System.in);
System.out.print(
"Do you want to browse the table containing "
+ "DISK, BOOK or VINYL? (please enter exactly the table name)?\n");
String table = key.nextLine();
System.out.print("How much money do you have?\n");
String max = key.nextLine();
ResultSet rst =
stmt.executeQuery("SELECT Title FROM " + table + " WHERE PRICE <= " + max + ";");
System.out.printf("Here are the %s you can afford with %s: \n", table, max);
while (rst.next()) {
System.out.printf("\t- %s \n", rst.getString(1));
}
Assume this software is connecting to a schema in a database hosted at http://example.com/ using:
Connection conn = DriverManager.getConnection(
"jdbc:mysql://example.com/:3306/?user=admin&password=admin");
The schema contains three tables (DISK
, BOOK
and VINYL
), each with Title
and Price
attributes. The compiled version is then shared with customers all around the world.
You can find a program in a compilable state at code/java/InsecureProgram.java
that connects to localhost, if you want to test it.
The authors of this program believe that the top-secret title of the next disk by a secret group will not be accessible to the user of this program because its price is set to NULL
in the DISK
table. Prove them wrong.
This database application and the whole set-up contains at least three vulnerabilities. List as many as you can think of, and, when relevant, describe how to fix them.
This program is vulnerable to SQL
injection. A user entering “DISK” followed by 0 OR PRICE IS NULL OR PRICE IS NOT NULL
would have access to all the entries, no matter their price tag or lack of one.
Some of the issues are:
admin
/ admin
as a login / password is unforgivable. The login and password should be changed. And, at least, the application should not connect to the database with admin rights!SQL
injection like the one we saw above.To write this chapter, were used
This part is partially inspired from (Sadalage and Fowler 2012, chap. 1), but it has been further updated.
When you write a database application, you have two options:
The first option can cause severe impacts on the efficiency of your system: since multiple clients, different in nature, access the same DBMS, it can become a bottleneck. On the plus side, there is no need to synchronize or duplicate the information, as everything is already in one place.
The second option sidesteps the “bottleneck” issue if the number of user is reasonnable, but may require a lot of synchronization if multiple application needs to share some information. It can also generate a lot of duplication, if the databases need to have some data in common. But, with that second option, you develop an “application database” (i.e., a database dedicated to a particular application), and you have more freedom in the design, schema, and even DBMS (you can use one particular software solution for one particular database application, and a different one for a different database application).
The third option can become a requirement if a large number of clients are using your application and your database become flooded with requests. This is mostly this need to distribute the “same” data accross databases that we will be discussing below.
The increase in everything (traffic, size of data, number of clients, etc.) means “up or out”, and raises numerous challenges for the “one database for multiple application” option. There is two ways to increase the resources and to scale up:
The second option is generally less expensive (compare buying 1,000 raspberry pi VS buying 1 supercomputer that is not a cluster of more modest computers), but came with two drawbacks w.r.t. databases:
Developping DBMS more suited for distributed architectures became growingly important, and some comanies took at stab at it. The more important attemts were
It was solutions suited to the needs of those big companies, that were very specific. But it was interresting to see SQL’s supremacy being questionned.
One of the goal was to get rid of “impedance mismatch”: mapping classes or objects to database tables defined by a relational schema is complex and cumbersome. However, if you want your database application to go naturally from their data representation to the representations in the DBMS, solving this issue becomes critical. Among the issues,
The term “impedance mismatch” describes that annoying need for a translation, and one of the goal of this first shift was to get rid of it.
Also, the data is now moving, growing fast, extremely diverse, and traditional relational DBMS seemed not necessarily wel-suited to hande those changes.
To renew the world of DBMS, there were multiple attempts, going in multiple directions. A meetup to discuss them coined the term “NoSQL” in an attempt to have a “twittable” hashtag, and it stayed (even it is as specific as describing a dog as “not being a cat”). The original meet-up asked for “open-source, distributed, nonrelational database”. Today, there is no “official” definition of NoSQL, but NoSQL often implies the following:
SQL
. Some still have a query language, and it ressembles SQL
(to minimize learning cost), for instance Cassandra’s CQL.,Another important notion that emerged was the notion of “polyglot persistence”, which is the idea of “using different data storage technologies to handle varying data storage needs.” In other terms, if you adopt the “application database” approach (i.e., one database dedicated to one particular application), the you can use the DBMS A for your application 1, and the DBMS B for your application 2, or even use A and B for the same application!
There was a lot of enthusiasm, also because this approach “frees the data” (and, actually, the metadata, cf. application/ld+json, JavaScript Object Notation for Linked Data, schema.org, etc.): sharing e.g. a json
file is much easier that sharing a SQL
view along with its schema (the example in the Document-Oriented Database will make it clearer).
Some of it will last for sure: polyglot persistency, the possibility of being schema-less, being “distributed first”, the possibility of sacrificing consistency for greater good, etc. This does not mean that SQL
(“OldSQL”) and relational database are over: there are still useful in many scenario, and the powerfull query language is great (writing your own every time is a nightmare…).
Starting ~ 2010, one reaction was to develop “NewSQL”, which would combine aspects of both approaches. For instance, having to drop the ACID requirements (detailled in this Section) was often seen as a major drawback, but, for instance, MongoDB announced that it would have more and more of the ACID properties!
Also, a really great use of NoSQL is to adopt it at an early stage of the development, when it is not clear what the schemas should be. When the schemas are final, then you can shift to relational DBMS!
The retro-acronym “Not Only SQL
” emphasizes that SQL
will still be one of the principal actor, but that developer should be aware of other solutions for other needs.
It should also be remembered that multiple technologies can and should co-exist. As an example, the hierarchical database model is a type of DBMS dating back to the 60’s that has some advantages (high performance and availability) but one major drawback: as the data is represented as trees, the only type of relationship that can be represented is one-to-many (1 : M). However, this tree-like structure is still relevant today in some particular applications: for file systems or geographical information, or because of its qualities, it is still used for e.g. file systems or in the windows registery.
SQL
and the NoSQL approach can be compared in many different ways. Note that there is no “best tool”: it would be like trying to decide if a hammer is better than a saw, the answer is “it depends of what you want to do with it!”. But you can use one relational or non-relational DBMS for different purposes, sometimes, again, within the same application (“polyglot persistency”).
« Comparaison n’est pas raison »24
ACID and BASE are three acronyms capturing desirable features of DBMS, while CAP is a theorem stating the impossibility to have some desirable properties at the same time in distributed systems.
ACID is the guarantee of validity even in the event of errors, power failures, etc.
CAP (a.k.a. Brewer’s theorem): Roughly, “In a distributed system, one has to choose between consistency (every read receives the most recent write or an error) and availability (every request receives a (non-error) response, without guarantee that it contains the most recent write)” (the P. standing for “Partition tolerance”, a guarantee of availability).
BASE (also formulated by Brewer) corresponds to Basic Availability, Soft state, Eventual consistency. It is a series of properties that can be reached by distributed systems, including NoSQL systems, and is often seen as the “NoSQL’s version of ACID”. This answer for answer, gives some insight on its meaning.
There are multiple ways to be “non-relational”. A rough hierarchy of the different approaches can be sketched as follows.
Model | Description | Examples |
---|---|---|
Document-based | Data is stored as “documents” (JSON, for instance), accessible via their ID (other indexes). | Apache CouchDB (simble for web applications, and reliable), MongoDB (easy to operate), Couchbase (high concurrency, and high availability). |
Key-value stores | Fast access by the key to the value. Value can be a record, an object, a document, or be more complex. | Redis (in-memory but persistent on disk database, stores everything in the RAM!) |
Column-based (a.k.a. wide column) | Partition a table by colmuns into column families, where each column family is stored in its own files. | Cassandra, HBase (both for huge amount of data) |
Graph-based | Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions. | Neo4J (excellent for pattern recognition, and data mining) |
Multi-model | Support multiple data models | Apache Ignite, ArangoDB, etc. |
MongoDB is
MongoDB is endowed with
Note that while the design of your database becomes a “second class citizen”, as you can start manipulating data before a schema has been defined, this does not mean that design became irrelevant. General design principles still needs to be adopted, and everything that was said about design remains true. The key difficulty is that there is no foreign key, in MongoDB, or at least no contraints attached to the relationships two documents can have, except for the one you implement. This is generally considered to be a downside in terms of consistency, and an advantage in terms of flexibility and scalability.
Mongodb is vulnerable to SQL
injection (cf. https://zanon.io/posts/nosql-injection-in-mongodb) and should respect the same general guidelines as discussed in A Bit About Security (cf. https://docs.mongodb.com/manual/administration/security-checklist/).
And additional challenge is that e.g. since JOIN
operations need to be performed “by hand”, in the application program (cf. https://www.w3schools.com/nodejs/nodejs_mongodb_join.asp), your attack surface grows.
Let us start by detailling what a “document” is. There are multiple different implementations and definition of what a document is, but at the core of all of them are the followings:
Among the formats of documents, there is XML
, YAML
, JSON
(JavaScript Object Notation), PDF
, etc. You can generally convert from one format to the others, which is an important feature.
An example of XML (Extensible Markup Languag) document, storing information about what Martin and Pradmod like, which cities they visited, etc.:
<?xml version="1.0" encoding="UTF-8"?>
<!-- code/xml/person.xml -->
<root>
<element>
<firstname>Martin</firstname>
<lastVisited>Paris</lastVisited>
<lastcity>Boston</lastcity>
<likes>
<element>Biking</element>
<element>Photography</element>
</likes>
</element>
<element>
<firstname>Pramod</firstname>
<lastcity>Chicago</lastcity>
<addresses>
<element>
<city>DILLINGHAM</city>
<state>AK</state>
</element>
<element>
<city>PUNE</city>
<state>MH</state>
</element>
</addresses>
<citiesvisited>
<element>Chicago</element>
<element>London</element>
<element>Pune</element>
<element>Bangalore</element>
</citiesvisited>
</element>
</root>
As you can see, from this document:
element
(person) contains different information: we know the first name of both, but not the address of Martin, nor the lastVisited
of Pradmod.addresses
), but their order does not matter.shiporder
or item
tags are made-up!A more detailled example, including the design of a schema, can be found at w3schools.com.
The kind of document MongoDB uses is called BSON
(portmanteau of the words “binary” and “JSON
”), and it actually extends JSON
. Think of BSON
as a binary representation of JSON
documents.
Mongodb is a document-oriented database (document store), which means that the databases contain semi-structured data. It is a subclass of the key-value store:
RDB is excellent for optimization, but sometimes waste space (placeholders for optional values) and is sometimes too rigid. KV does not allow any optimization, but provides flexibility and follows more closely modern programming concepts. DO has the flexibility of KV, and allows for some optimization.
One important difference: in RDB, data is stored in separate tables, and a single object (entity) may be spread across several tables. In DO, one object = one instance, and every stored object can be different from every other. There are pros to this approach:
Let us start by mapping the common notions of RDBMS to the mongoDB ecosystem:
RDBMS | MongoDB |
---|---|
database instance | MongoDB instance |
schema | database |
table | collection |
row | document |
Each MongoDB instance has multiple databases, each database can have multiple collections.
Our previous XML
“person” example can be converted into two documents25 delimited by [
…]
, used to delimit an array of document.
[
{
"_comment": "code/json/person.json"
},
{
"firstname": "Martin",
"likes": [
"Biking",
"Photography"
],
"lastcity": "Boston",
"lastVisited": "Paris"
},
{
"firstname": "Pramod",
"citiesvisited": [
"Chicago",
"London",
"Pune",
"Bangalore"
],
"addresses": [
{
"state": "AK",
"city": "DILLINGHAM"
},
{
"state": "MH",
"city": "PUNE"
}
],
"lastcity": "Chicago"
}
]
Note that
addresses
is a document embedded in a document!A collection should be on “related” entities (do not store server logs, store customers and list of employee in the same collection!), and not too abstract ones (no “Server stuff”). Also, if you store document that are too different, your performances will take a big hit. Bottom line: think about your usage, and the kind of queries you will perform.
So, in summary, “Schema-less” does not mean “organization-less”!
The instructions are only for Linux, but should be easy to adapt.
Download and install mongodb
from https://www.mongodb.com/download-center/community, select the “server” and “shell” packages.
As a normal user, type
mkdir /tmp/mongotest
mongod --dbpath /tmp/mongotest
to start the server and create a “dummy” database in the folder /tmp/mongotest
.
Then, open another terminal, and type in, as a normal user mongo
.
The documentation is nicely written and well-organized: we’ll follow parts of it, please refer to it if needed. You can start by opening the “Getting started” tutorial and running its examples on your own installation.
The syntax for the command-line interface can be found at https://docs.mongodb.com/manual/reference/mongo-shell/. In a first approximation, the syntax is of the form:
db.<name of the collection>.<command>(<arguments>)
Where db
is not the name of the database, it is just the prefix.
To get information about your installation, use
show dbs
to see the databases,
use mydb
to use the mydb
database,
show collections
to see the collections in a particular database,
To insert, use:
db.books.insert({"title": "Mother Night", "author": "Kurt Blabal"})
MongoDB will add a unique identifier (_id
) if you do not provide one. You can think of that as a primary key.
To remove an entry, use:
db.books.remove({"title":"Mother Night"})
To update an entry, use:
db.books.update({"title":"Mother Night"}, {$set: {"quantity" : 10}})
Other function, such as $inc
, to increment, can be used.
To select, use:
db.books.find()
db.books.find({"title":"Mother Night"})
SELECT * FROM Books WHERE Title="Mother Night";
and db.books.find({"title":"Mother Night"}, {"author":1, "quantity":1})
is like SELECT Author, Quantity FROM Books WHERE Title="Mother Night";
Both search for the book with title “Mother Night”, and the second query displays only the author and quantity attributes (along with the _id
, which is included by default).
db.books.find({"title":"Mother Night"}, {"author":0, "quantity":0})
_id
from the attributes given: db.books.find({"title":"Mother Night"}, {"author":1, "quantity":1, "_id":0})
db.books.find({}, {"author":1, "quantity":1})
db.books.find({"quantity":{"$gte": 10, "$lt": 50}})
displays the entries were the quantity is greater than equal to 10, and less than 50.It is possibility to mimic some features of SQL
(like the unique attributes), but there are no referential key integrity, for instance.
Most insert / update / detele will return success as soon as one node received your command, but you may tweak them so that success is returned only once the operation has been performed on the majority of the nodes.
Mongodb does not offer as many features as e.g. MySQL, and there is the need to write a lot on the program side. However, you can find a lot of API (i.e., it is taking the “package manager” approach to offer a modular software), cf. for instance an API over mongo-java-driver: http://jongo.org/ (support some form of prepared statement).
This section will follow Mongodb’s “quick tour” of the Java api, as discussed at https://mongodb.github.io/mongo-java-driver/3.9/driver/getting-started/quick-start/.
You will need to :
Place those two files, mongo-java-driver-3.9.1.jar
and QuickTour.java
in the same folder, and run
java -cp .:mongo-java-driver-3.9.1.jar QuickTour.java
(We do not compile the file first thanks to Java’s JEP 330’s feature.)
You should see a large number of lines displayed at the screen, and around the top, the message INFO: Opened connection [connectionId{localValue:2, serverValue:12}] to localhost:27017
. Now, open the program file and inspect it.
After various import statement, the program create a MongoClient
object called mongoClient
, and connects it to the local database server:
MongoClient mongoClient;
if (args.length == 0) {
// connect to the local database server
mongoClient = MongoClients.create();
} else {
mongoClient = MongoClients.create(args[0]);
}
To get a database and a collection, the program uses:
// get handle to "mydb" database
MongoDatabase database = mongoClient.getDatabase("mydb");
// get a handle to the "test" collection
MongoCollection<Document> collection = database.getCollection("test");
Note that a collection is simply an ArrayList of documents.
Assume we want to create the following document:
{
"name": "MongoDB",
"type": "database",
"count": 1,
"info": {
"x": 203,
"y": 102
}
}
(Remember: order does not matter!)
Then we can use the Document
class to create it, and then insert the document created:
// make a document and insert it
Document doc =
new Document("name", "MongoDB")
.append("type", "database")
.append("count", 1)
.append("info", new Document("x", 203).append("y", 102));
collection.insertOne(doc);
Note that we can “chain” the append
, using doc.append("type", "database").append("count", 1);
etc.
Only at this point would the database and collection being created.
To “witness” what the program is doing from the command line, you can, for instance,
Edit the Java program, by commenting the statement database.drop();
.
Execute the modified version,
Open the command-line-interface (simply type mongo
), and run:
use mydb
show collections
db.test.find()
This last command should returns something like
{ "_id" : ObjectId("5ea72152d8b5777d53c1a148"), "name" : "MongoDB", "type" : "database", "count" : 1, "info" : { "x" : 203, "y" : 102 } }
The program goes on and is discussed in details at https://mongodb.github.io/mongo-java-driver/3.9/driver/getting-started/quick-start/. You can see for instance that to construct lists of documents and insert them, one can use:
// now, lets add lots of little documents to the collection so we can explore queries and
// cursors
List<Document> documents = new ArrayList<Document>();
for (int i = 0; i < 100; i++) {
documents.add(new Document("i", i));
}
collection.insertMany(documents);
A discipline similar to what we saw on Java applications interacting with MySQL should apply:
We can summarize some of the principles we have learned, and introduce some new, as follows:
Briefly explain the term “Durability”.
What is polyglot persistence? Is it useful?
What does it mean to be “schemaless”? What does it imply?
What is denormalization? When could it be useful?
What is the object-relational impedance mismatch? Is it an issue that cannot be overcome?
For each of the following notions, indicate if they are usually an attribute of NoSQL or of “traditional” SQL
:
Schema First | Distributed | Relational | Scalable | |
NoSQL | ||||
SQL |
Schema First | Distributed | Relational | Scalable | |
NoSQL | ✓ | ✓ | ||
SQL |
✓ | ✓ |
“NoSQL” used to mean “Non SQL
”, but was retro-actively given the meaning “Not Only SQL
.” Below, write a short essay that explains:
SQL
” approach.SQL
.”Your friend has a small MongoDB application to keep track of video games high-scores that they would like to convert to a relational database. They need your help in designing a suitable model for their needs, and to get them started in translating their MongoDB code into SQL code.
A typical document in their database is given below. Sketch an entity-relationship diagram that could fit their needs.
“Translate” the following commands from their workflow into SQL commands, assuming that they successfully implemented the schema you designed at the previous step.
db.games.update(
{"Game name": "Tetris"},
{$set:
{"High score" :
{"Points": 1399, "Date":"2021/03/29"}
}
}
)
db.games.find({"Hold by": "Aunt Minnie"}, {"Game Name":1, "Points":1}}
db.games.find({"High score": {"$gte": 10, "$lt": 1000}, "Platform":"Nes"})
{
"Game name":"Tetris",
"Platform":[
{
"Name":"Nes",
"Number of controllers":2
},
{
"Name":"Game boy",
"Portable":true,
"Quantity":2
}
],
"High score":{
"Points":1293,
"Hold by":"Aunt Minnie",
"Date":"2021/02/30"
},
"Description":"Complete lines",
"Genre":[
"Puzzle",
"Tile-matching"
]
}
Consider the following xml
file:
<?xml version="1.0" encoding="UTF-8"?>
<!-- code/xml/sustomers.xml -->
<Customers>
<Customers>
<Customer Name="Pamela Zave" ID="C001">
<Orders>
<Order Date="2012-07-04T00:00:00" ID="10248">
<Product Quantity="5" ID="10">
<Description>A Box of Cereal</Description>
<Brand>Cereal Company</Brand>
<Price>$3</Price>
</Product>
<Product Quantity="10" ID="43">
<Description>A Box of Matches</Description>
<Brand>Match Ltd</Brand>
<Price>$1.20</Price>
<Caution>Not suitable for children</Caution>
</Product>
</Order>
</Orders>
<Address>123 Main St., Augusta, GA, 30904</Address>
</Customer>
<Customer Name="Nancy Lynch" ID="C002">
<Orders>
<Order Date="2011-07-04T00:00:00" ID="10245">
<Product Quantity="3" ID="10">
<Description>A Box of Cereal</Description>
<Brand>Cereal Company</Brand>
<Price>$3</Price>
</Product>
<Product Quantity="1" ID="5">
<Description>A Cup</Description>
<Brand>Cup Company</Brand>
<Price>$2</Price>
<Material>Stoneware</Material>
</Product>
</Order>
</Orders>
<Address> Address line 5, 6, 7</Address>
</Customer>
<Customer Name="Shafi Goldwasser" ID="C003">
<Address>345 Second St., Augusta, GA, 30904</Address>
</Customer>
</Customers>
</Customers>
Try to draw the ER model that would correspond to the relational implementation of this database. Justify your choices.
Find below a mashup of actual data from the National Science Foundation (courtesy of https://www.nsf.gov/awardsearch/download.jsp):
<?xml version="1.0" encoding="UTF-8"?>
<!-- code/xml/NSFAward.xml -->
<rootTag>
<Award>
<AwardTitle>CAREER: Advances in Graph Learning and Inference</AwardTitle>
<AwardEffectiveDate>11/01/2019</AwardEffectiveDate>
<AwardExpirationDate>01/31/2023</AwardExpirationDate>
<AwardAmount>105091</AwardAmount>
<Organization>
<Code>05010000</Code>
<Directorate>
<Abbreviation>CSE</Abbreviation>
<LongName>Direct For Computer & Info Scie & Enginr</LongName>
</Directorate>
<Division>
<Abbreviation>CCF</Abbreviation>
<LongName>Division of Computing and Communication Foundations</LongName>
</Division>
</Organization>
<ProgramOfficer>
<SignBlockName>Phillip Regalia</SignBlockName>
</ProgramOfficer>
<AwardID>2005804</AwardID>
<Investigator>
<FirstName>Patrick</FirstName>
<LastName>Hopkins</LastName>
<EmailAddress>phopkins@virginia.edu</EmailAddress>
<StartDate>11/22/2019</StartDate>
<EndDate />
<RoleCode>Co-Principal Investigator</RoleCode>
</Investigator>
<Investigator>
<FirstName>Jon</FirstName>
<LastName>Ihlefeld</LastName>
<EmailAddress>jfi4n@virginia.edu</EmailAddress>
<StartDate>11/22/2019</StartDate>
<EndDate />
<RoleCode>Principal Investigator</RoleCode>
</Investigator>
<Institution>
<Name>University of Virginia Main Campus</Name>
<CityName>CHARLOTTESVILLE</CityName>
<ZipCode>229044195</ZipCode>
<PhoneNumber>4349244270</PhoneNumber>
<StreetAddress>P.O. BOX 400195</StreetAddress>
<CountryName>United States</CountryName>
<StateName>Virginia</StateName>
<StateCode>VA</StateCode>
</Institution>
</Award>
</rootTag>
It contains information about one particular award that was awarded to an institution on behalf of two researchers. Quoting the National Science Foundation (NSF):
NSF is divided into the following seven directorates that support science and engineering research and education:…. Each is headed by an assistant director and each is further subdivided into divisions like …
From this xml
file and the information given above, draw an ER diagram for NSF’s awards. Do not hesitate to comment on the choices you are making and on what justifies them.
It should be clear that three entities are present in this file: Customer, Order, and Product. An order can contain a certain quantity of a product, and a customer can pass 0 or more orders. Some attributes are natural primary keys (they are named “ID” in the diagram below), and some attributes seems to be optional (“Caution”, or “Material”), but should still be made an attribute.
Put together, this gives the following diagram:
We made further assumptions: an order cannot be empty (transcribed by the total constraint on CONTAINS), and an order does not exist if it was not passed by a customer (transcribed by the fact that ORDER is a weak entity), which also implies that an order cannot be passed by more than one customer. Note that the same product cannot be present “twice” (with the equal or different quantities) in an order: an order can contain a particular product only once in any quantity, implying that if an order had two of the product A, and three of the same product A, then those two quantities of A should be merged so that an order contains five of this product A. This is enforced by the cardinality ratio of 1
in the CONTAINS relationship.
Of course, other choices are possible.
Two entities are easy to distinguish: RESEARCHER (for “Investigator”) and INSTITUTION. The status of the the content between the <Organization>
tags is less clear; apparently, an organization has a code, and is made of two parts: a Directorate and a Division. Using the quote, we know that a Division should be a part of exactly one Directorate, and that a Directorate has an assistant director. But what is the status of that “Organization”? Is it subsumed by the Directorate or is it orthogonal? We decide to create an entity for it, but its precise role should be clarified. The relationship between Division and Directorate is clear, but, once again, the relationship between Division and Organization could have any constraint, we can not really infer that information from the document.
The next difficulty is the status of the award itself: should it be a relationship with many attributes, between the RESEARCHER and INSTITUTION entities? The issue with this approach is that an award can have multiple investigators, as shown in the example, and that this number can vary. Hence, fixing the arity and constraints on this relationship will be difficult. We could have a relation of arity 2
, and “duplicate it” if multiple researchers are involved in the same grant, but that seems like a poor choice (since all the information about the grant will need to be duplicated). Therefore, it seems more reasonable to make the award an entity.
How should we connect the AWARD entity with the RESEARCHER and INSTITUTION entities? A ternary relation has some drawbacks, since it would require some duplication when multiple investigators are working on the same award. Instead, having one binary relationship between the award and the institution, and one binary relationship between the award and the researcher (that specifies further the role of the researcher for that particular award), seems like a safer choice. An award must be awarded to at least one researcher and one institution, but we do not know if there is a maximum number of institutions that can obtain the same award, so it is better not to restrict this arity. Whether there should be a relationship between the researcher and the institution is up in the air; we do not know if a researcher has to work for an institution to get a grant, nor if getting a grant for an institution means that you work for it, so it is probably better to refrain from adding such a relationship.
Most of the attributes are straightforward once we see that “Role” is an attribute of a relationship, not of an entity.
All together, this gives the following diagram:
This feature was actually implemented by a student!.↩︎
This exam was probably a bit too long, but students managed it pretty well.↩︎
For technical reasons, underlined words cannot be searched in the document.↩︎
The term “meta-data” has numerous definition (“data about the data”): we use it here to refer to the description of the organization of the data, and not e.g. to statistical data about the data.↩︎
This is also the way this is implemented in MySQL: no part of the primary key can have for value NULL
. Cf. the “Declaring Constraints” Section.↩︎
Yes, we do need the state and the licence number to uniquely identify a driver’s licence, since many states use the same licence format.↩︎
For a clarification on the distinction between catalog and schemas, you can refer to e.g. https://stackoverflow.com/q/7022755.↩︎
Cf. https://www.postgresql.org/docs/9.2/sql-createtype.html and https://www.postgresql.org/docs/9.2/sql-createdomain.html.↩︎
The SQL
keywords are case-insensitive, but the table and schema names are sometimes case-sensitive, it depends of the actual implementation. For instance, MySQL is completely case-insensitive (reserved words, tables, attributes), MariaDB is not (the case for table names matter).↩︎
Yes, we can even add a DEFAULT
value to a PRIMARY KEY
, even if that’s of little interest. You can see an example ate code/sql/HW_Default_On_PK.sql.↩︎
The symbols $$
are often used too, and the documentation, at https://dev.mysql.com/doc/refman/8.0/en/stored-programs-defining.html, reads:
You can redefine the delimiter to a string other than
//
and the delimiter can consist of a single character or multiple characters. You should avoid the use of the backslash (\
) character because that is the escape character for MySQL.
The minus sign twice is also a poor choice, since it is used for commenting.↩︎
Yes, the package is called mysql-server
, but it actually install the package mariadb-server-10.3
or higher… So do not be confused: we are, indeed, installing MariaDB!↩︎
By default, MySQL and MariaDB only create a root user with all privileges and no password, but we added a password at the previous step.↩︎
By default, MySQL and MariaDB only create a root user with all privileges and no password, but we added a password at the previous step.↩︎
Provided the working directory is still C:\Program Files\MySQL\MySQL Server 8.0\bin
or similar. Cf. https://dev.mysql.com/doc/mysql-windows-excerpt/8.0/en/mysql-installation-windows-path.html to add the MySQL bin directory to your Windows system PATH
environment variable. For MacOS user, something like sudo sh -c 'echo /usr/local/mysql/bin > /etc/paths.d/mysql'
should do.↩︎
You can use the DATE
datatype to store a year.↩︎
Some sources call the relationships between an entity and itself “unary.” Note that with our convention, it does not make sense to speak of a unary relationship.↩︎
An alternative notation, detailled later on, will address this shortcoming.↩︎
Where the “BOOK” entity does not refer to one particular physical copy of a book, but to books in general, i.e., “The book on my shelf” (physical copy) as opposed to “The Wizard of Oz” (general).↩︎
This developement was actually asked at https://dba.stackexchange.com/q/232068/.↩︎
Cf. for instance http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/er.pdf.↩︎
The situation is similar e.g. in Python, where you have to use an API and a connector. Among Python’s connector compatible with MySQL’s API, there is PyMySQL or mysql-connector-python.↩︎
This program ows a lot to the one presented at http://www.ntu.edu.sg/home/ehchua/programming/java/jdbc_basic.html.↩︎
A French proverb, meaning that “things should be judged on the individual qualities they posses, rather than by comparing one with another.” (Manser 2007)↩︎
We actually had to have three documents: as JSON
does not really have comments (cf. https://stackoverflow.com/q/244777/), we added a document containing only the attribute "_comment"
to specify the path where that file is located.↩︎