Lecture Notes (Draft)
February 15, 2021 (05:39:57 PM)
These lecture notes are written in an elusive style: they are a support for the explanations that will be made at the board. Reading them before coming to the lecture will help you get a sense of the next topic we will be discussing, but you may sometimes have trouble deciphering their … unique style.
On top of the notes, you will find in this document:
Any feedback is greatly appreciated. Please refer to https://spots.augusta.edu/caubert/db/ln/README.html#contributing for how to contribute to those notes. The syllabus is at https://spots.augusta.edu/caubert/db/, and the webpage for those notes is at https://spots.augusta.edu/caubert/db/ln/.
Please, refer to those notes using this entry (Aubert 2019):
@report{AubertCSCI3410-DatabaseSystems,
author={Aubert, Clément},
title={CSCI 3410 - Database Systems},
url={https://spots.augusta.edu/caubert/db/ln/},
urldate={2019-11-03},
year={2019},
institution={{School of Computer and Cyber Sciences, Augusta University}},
location={Augusta, Georgia, USA},
langid={en},
type={Lecture notes}
}
There are four ways to access the code shared in those lecture notes:
For this latter aspect, note that some portions of code start with a path in a comment, and are followed by a link, like so:
HW_HelloWorld.sql
This means that this code can be found at
and that you can click the link below the code directly to access it.
The SQL code frequently starts with
DROP SCHEMA IF EXISTS HW_NAME_OF_SCHEMA;
CREATE SCHEMA HW_NAME_OF_SCHEMA;
USE HW_NAME_OF_SCHEMA;
This part starts by deleting the schema HW_NAME_OF_SCHEMA if it exists, then creates and uses it: this allows the code to run independently of your installation. It needs to be used with care, though, since it deletes everything you have in the HW_NAME_OF_SCHEMA schema before re-creating it, empty.
Finally, the comments
-- start snippet something
and
-- end snippet something
can be ignored, as they are an artifact of pandoc-include-code, used to select which portion of the code to display in those notes.
A typical (meeting twice a week, ±17 weeks, ±30 classes) semester is divided as follows:
For information purposes, an indication like this:
marks the (usual) separation between two lectures.
To give you a sense of what you will be asked during the exams, quizzes and projects, or simply to practise, please find below the exams given previous semesters, in reverse chronological order. The quizzes are not indicated, but were generally a mix of up to five exercises and one problem from the relevant chapter(s).
Due to the Covid-19 pandemic, only one exam took place, and the final exam was taken remotely on D2L. A second project, more ambitious, was also asked from the students, and accounted for a large portion of their grade.
The source code for those notes is hosted at rocketgit, typeset in markdown, and then compiled using pandoc and multiple filters (pandoc-numbering, the citeproc library, pandoc-include-code). The drawings use various LaTeX packages, including PGF, TikZ, tikz-er2, pgf-umlcd and tikz-dependency. The help from the TeX - LaTeX Stack Exchange community greatly improved this document. The u͟n͟d͟e͟r͟l͟i͟n͟e͟ text is obtained using YayText, and the unicode symbols are looked up in the “Unicode characters and corresponding LaTeX math mode commands”. Finally, the pdf version of the document uses Linux Libertine fonts, and the html version uses Futura.
Those lecture notes were created under an Affordable Learning Georgia Mini-Grant for Ancillary Materials Creation and Revision (Proposal M71).
Those lecture notes have greatly benefited from the contributions of many students, including but not limited to Crystal Anderson, Bobby Mcmanus, Minh Nguyen and Poonam Veeral. Additionally, Daniel Gozan, Mark Holcomb, Assya Sellak, Sydney Strong and Patrick Woolard helped smash some bugs in the tools used to produce this document.
Please refer to https://spots.augusta.edu/caubert/db/ln/README.html#authors-and-contributors for details of the contributions.
You can find at the end of this document the list of references, and some particular resources listed at the beginning of each chapter. Let me introduce some of them:
Those resources are listed as complements, but it is not required to read them to understand the content of those notes. (Watt and Eng 2014), being available free of charge, is more descriptive than the current notes, and as such can constitute a great complement. Unfortunately, it lacks some technical aspects, and the database program aspect is not discussed in detail.
This work is under Creative Commons Attribution 4.0 International License or later.
Some figures and resources are borrowed from other sources, in which case it is indicated clearly.
There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.
In theory, pretty much any programming language can be used to
But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.
In those lecture notes, we will introduce one such tool, the SQL programming language, and the theory underneath it, the relational model. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL offered by the NoSQL approach, as well as the limitations and highlights of both models.
A database (DB) is a collection of related data.
It has two components, the data (= information, can be anything, really) and the management (= logical organization) of the data, generally through a Database Management System.
A database
A DBMS has multiple components, as follows:
Note that
A DBMS contains a general purpose software that is used to
You can think of a tool to
Exactly like a program can have
a DBMS offers multiple (sub)tasks and can be interacted with by different persons with different roles.
Role | Task |
---|---|
Client | Specify the business statement, the specifications |
DB Administrator | Install, configure, secure and keep the DBMS up to date |
Designer | Lay out the global organization of the data |
Programmer | Implement the database, work on the programs that will interface with it |
User | Provide, search, and edit the data (usually) |
In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.
From the business statement to the usage, a project generally follows one of these paths:
Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.
Let us consider the following:
STUDENT
Name | Student_number | Class | Major |
---|---|---|---|
Morgan | 18 | 2 | IT |
Bob | 17 | 1 | CS |
COURSE
Course_name | Course_number | Credit_hours | Department |
---|---|---|---|
Intro. to CS | 1301 | 4 | CS |
DB Systems | 3401 | 3 | CS |
Principles of Scripting and Automation | 2120 | 3 | AIST |
SECTION
Section_identifier | Course_num | Semester | Year | Instructor |
---|---|---|---|---|
2910 | 1301 | Fall | 2019 | Kate |
9230 | 2103 | Spring | 2020 | Todd |
GRADE_REPORT
Student_number | Section_identifier | Grade |
---|---|---|
17 | 2910 | A |
18 | 2910 | B |
PREREQUISITE
Course_number | Prerequisite_number |
---|---|
2120 | 1301 |
1302 | 1301 |
You can describe the structure as a collection of relations, and a collection of columns:
RELATIONS
Relation Name | Number of Columns |
---|---|
STUDENT | 4 |
COURSE | 4 |
SECTION | 5 |
GRADE_REPORT | 3 |
PREREQUISITE | 2 |
COLUMNS
Column Name | Datatype | Belongs to relation |
---|---|---|
Name | String | STUDENT |
Student_number | Integer | STUDENT |
Class | String | STUDENT |
Major | String | STUDENT |
Course_name | String | COURSE |
Course_number | Integer | COURSE |
Credit_hours | Integer | COURSE |
Department | String | COURSE |
… | … | … |
Prerequisite_number | Integer | PREREQUISITE |
This organization will allow some interactions. For instance, we can obtain the answer to questions like
“What is the name of the course whose number is 1301?”
“What courses is Kate teaching this semester?”
“Does Bob meet the prerequisite for 2910?”
Note that this last query is a bit different, as it forces us to look up information in multiple relations.
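To make this more concrete, here is a minimal sketch of how the first two questions might be phrased in SQL. It is only an illustration under several assumptions: the schema name HW_UnivExample is hypothetical, the column types are guessed from the catalog above, and “this semester” is assumed to mean Fall 2019.

```sql
-- Hypothetical schema name, used only for this illustration.
DROP SCHEMA IF EXISTS HW_UnivExample;
CREATE SCHEMA HW_UnivExample;
USE HW_UnivExample;

-- Column types are assumptions based on the COLUMNS catalog.
CREATE TABLE COURSE (
    Course_name VARCHAR(50),
    Course_number INT PRIMARY KEY,
    Credit_hours INT,
    Department VARCHAR(10)
);

CREATE TABLE SECTION (
    Section_identifier INT PRIMARY KEY,
    Course_num INT,
    Semester VARCHAR(10),
    Year INT,
    Instructor VARCHAR(30)
);

INSERT INTO COURSE VALUES
    ('Intro. to CS', 1301, 4, 'CS'),
    ('DB Systems', 3401, 3, 'CS'),
    ('Principles of Scripting and Automation', 2120, 3, 'AIST');

INSERT INTO SECTION VALUES
    (2910, 1301, 'Fall', 2019, 'Kate'),
    (9230, 2103, 'Spring', 2020, 'Todd');

-- "What is the name of the course whose number is 1301?"
-- Only the COURSE relation is needed.
SELECT Course_name
FROM COURSE
WHERE Course_number = 1301;

-- "What courses is Kate teaching this semester?"
-- SECTION tells us what Kate teaches; COURSE is joined in only
-- to display the name of the course instead of its number.
SELECT COURSE.Course_name
FROM SECTION
    JOIN COURSE ON SECTION.Course_num = COURSE.Course_number
WHERE SECTION.Instructor = 'Kate'
    AND SECTION.Semester = 'Fall'
    AND SECTION.Year = 2019;
```

With the sample data above, both queries should return the single course name 'Intro. to CS'.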
We should also be able to perform updates, removal, addition of records in an efficient way (using auxiliary files (indexes), optimization).
Finally, selection (for any operation) requires care: do we want all the records, some of them, exactly one?
Why are the files separated like that? Why do we not store the section with the course with the students? For multiple reasons:
In separating the data, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.
There is a gradation, from a really abstract specification that is easy to modify, to a more solidified description of what needs to be coded. When we discuss high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.
What is the difference between a database and the meta-data of the database?
Is a pile of trash a database? Why, or why not?
Define the word “miniworld.”
Expand the acronym “DBMS.”
Name two DBMS.
Name the four different kinds of action that can be performed on data.
Assign each of the following tasks to one of the “characters” (administrator, client, etc.) we introduced:
Task | Assigned to |
---|---|
Install a DBMS on a server. | |
Sketch the schema so that the data will not be redundant. | |
Write client-side application that uses the DBMS API. | |
Establish the purpose of the database. | |
List some of the tasks assigned to the Database Administrator.
Why do DBMS include concurrency control?
Do I have to change my DBMS if I want to change the structure of my data?
What is independence between program and data? Why does it matter?
Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) be stored in the same file? Why, or why not?
Which one comes first, the physical design, the conceptual design, or the logical design?
What is virtual data? How can I access it?
The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.
No, because it lacks a logical structure.
The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.
Database Management System
Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. “SQL,” “NoSQL,” “Relational Model,” and the like are not valid answers: we are asking for the names of actual software!
The four actions are:
We can have something like:
Task | Assigned to |
---|---|
Install a DBMS on a server. | Administrator, IT service |
Sketch the schema so that the data will not be redundant. | Designer |
Write client-side application that uses the DBMS API. | Programmer, Developer |
Establish the purpose of the database. | Client, business owner |
The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.
DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. This prevents inconsistencies from appearing in the data.
Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model,” e.g., one of the NoSQL models, can require changing the DBMS.
The application should not be sensitive to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the organization of the data will not require changing the programs.
If we were to store all the information about the classes in the student records, then we would have to store it as many times as there are students in the class! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.
The conceptual design.
It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not information stored in the database.
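As a rough illustration of this last answer, the sketch below derives an age at query time instead of storing it. The schema HW_VirtualData, the PERSON table and its content are all hypothetical, chosen only for this example; TIMESTAMPDIFF and CURDATE are MySQL functions.

```sql
-- Hypothetical schema and table, used only for this illustration.
DROP SCHEMA IF EXISTS HW_VirtualData;
CREATE SCHEMA HW_VirtualData;
USE HW_VirtualData;

CREATE TABLE PERSON (
    Name VARCHAR(30) PRIMARY KEY,
    Birth_date DATE
);

-- Made-up data: only the date of birth is stored.
INSERT INTO PERSON VALUES ('Bob', '1999-04-12');

-- The age is computed when the query runs; it is never stored,
-- so it cannot become outdated.
SELECT Name, TIMESTAMPDIFF(YEAR, Birth_date, CURDATE()) AS Age
FROM PERSON;
```

The design choice is that Age is recomputed by each query, which avoids having to update a stored value every birthday.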
Define a CAMPUS database organized into three files as follows:
BUILDING file storing the name and GPS coordinates of each building.
ROOM file storing the building, number and floor of each room.
PROF file storing the name, phone number, email and room number where the office is located for each professor.
A database catalog is made of two parts: a table containing the relations’ names and their number of columns, and a table containing the columns’ names, their data types, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, Figure 1.3) or (Elmasri and Navathe 2015, Figure 1.3). Write the database catalog corresponding to the CAMPUS database.
Invent data for such a database, with two buildings, three rooms and two professors.
Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS database, which is an up-to-date and accurate representation of its miniworld:
The database catalog should be similar to the following:
RELATIONS
Relation name | Number of columns |
---|---|
BUILDING | 3 |
ROOM | 3 |
PROF | 4 |
COLUMNS
Column name | Datatype | Belongs to relation |
---|---|---|
Building_Name | Character(30) | BUILDING |
GPSLat | Decimal(9,6) | BUILDING |
GPSLon | Decimal(9,6) | BUILDING |
Building_Name | Character(30) | ROOM |
Room_Number | Integer(1) | ROOM |
Floor | Integer(1) | ROOM |
Prof_Name | Character(30) | PROF |
Phone | Integer(10) | PROF |
Email | Character(30) | PROF |
Room_Number | Integer(1) | PROF |
For the data, you could have:
For the BUILDING file, we could have:
(Allgood Hall, 33.47520, -82.02503)
(Institut Galilé, 48.959001, 2.339999)
For the ROOM file, we could have:
(Allgood Hall, 128, 1)
(Institut Galilé, 205, 3)
(Allgood Hall, 228, 2)
For the PROF file, we could have:
(Aubert, 839401, dae@ipn.net, 128)
(Mazza, 938130, Dm@fai.net, 205)
If everything we knew about the campus came from that database, then
The relational data model (or relational database schema) is:
… List_of_major as an enumerated data type, instead of just String), enforce some constraints (e.g., UNIQUE, to force all the values to be different), or even have a default value.
… NULL value.
NULL is N/A, unknown, unavailable (or withheld).
We now study constraints on the tuples. There are constraints on the schema, for instance, “a relation cannot have two attributes with the same name,” but we studied those already. The goal of those constraints is to maintain the validity of the relations, and to enforce particular connections between relations.
Those are part of the definition of the relational model and are independent of the particular relation we are looking at.
Those constraints are parts of the schema.
… NOT NULL, UNIQUE).
… NULL.
Those last two constraints will be studied in the next section.
Constraints that cannot be expressed in the schema, and hence must be enforced by
Examples: “the age of an employee must be greater than 16,” “this year’s salary increase must be more than last year’s.”
Since we cannot have two identical tuples in the same relation, there must be a subset of values that distinguishes them. We study the corresponding subset of attributes.
Let us consider the following example:
A | B | C | D |
---|---|---|---|
Yellow | Square | 10 | (5, 3) |
Blue | Rectangle | 10 | (3, 9) |
Blue | Circle | 9 | (4, 6) |
and the following sets of attributes:
 | {A, B, C, D} | {A} | {B, C} | {D} |
---|---|---|---|---|
Superkey ? | ✔ | ✘ | ✔ | ✔ |
Key ? | ✘ | ✘ | ✘ | ✔ |
Note that here we “retro-fit” those definitions: in database design, they come first (i.e., you define which attributes should always distinguish between tuples before populating your database). We are making the assumption that the data pre-exists the specification to make the concept clearer.
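The “retro-fitted” check above can be sketched in SQL: a set of attributes is a superkey for the current state exactly when no combination of their values is repeated. The schema HW_KeyExample, the table name SHAPE and the column types below are assumptions made for this illustration, and the queries say nothing about minimality (that is, about being a key).

```sql
-- Hypothetical schema and table names, used only for this illustration.
DROP SCHEMA IF EXISTS HW_KeyExample;
CREATE SCHEMA HW_KeyExample;
USE HW_KeyExample;

-- No key is declared on purpose: we want to inspect the data itself.
CREATE TABLE SHAPE (
    A VARCHAR(10),
    B VARCHAR(10),
    C INT,
    D VARCHAR(10)
);

INSERT INTO SHAPE VALUES
    ('Yellow', 'Square', 10, '(5, 3)'),
    ('Blue', 'Rectangle', 10, '(3, 9)'),
    ('Blue', 'Circle', 9, '(4, 6)');

-- {A} is not a superkey: the value 'Blue' is shared by two tuples,
-- so this query returns one row.
SELECT A, COUNT(*) AS Occurrences
FROM SHAPE
GROUP BY A
HAVING COUNT(*) > 1;

-- {B, C} is a superkey for this state: no combination is repeated,
-- so this query returns no row.
SELECT B, C, COUNT(*) AS Occurrences
FROM SHAPE
GROUP BY B, C
HAVING COUNT(*) > 1;
```

Running the second query with only B in the GROUP BY clause also returns no row, which is why {B, C} is a superkey but not a key.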
A foreign key (FK) is a set of attributes whose values must match the value in a tuple in another, pre-defined relation. Formally, the set of attributes FK in the relation schema R1 is a foreign key of R1 (“referencing relation”) that references R2 (“referenced relation”) if
NULL
If there is a foreign key from R1 to R2, then we say that there is a referential integrity constraint from R1 to R2. We draw it with an arrow from the FK to the PK. Note that it is possible that R1 = R2.
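As a sketch only, here is how referential integrity constraints might be declared in MySQL for the car, driver and insurance relations used in the example discussed below. The schema name HW_InsuranceExample, the relation name INSURANCE, the Policy_Number attribute and all the column types are assumptions; hyphens in attribute names are replaced by underscores to obtain valid identifiers.

```sql
-- Hypothetical schema name and column types, for illustration only.
DROP SCHEMA IF EXISTS HW_InsuranceExample;
CREATE SCHEMA HW_InsuranceExample;
USE HW_InsuranceExample;

CREATE TABLE CAR (
    VIN VARCHAR(17) PRIMARY KEY
);

CREATE TABLE DRIVER (
    State CHAR(2),
    Licence_Num VARCHAR(10),
    PRIMARY KEY (State, Licence_Num)
);

-- INSURANCE is the referencing relation (R1); CAR and DRIVER are the
-- referenced relations (R2). Policy_Number is a made-up primary key.
CREATE TABLE INSURANCE (
    Policy_Number INT PRIMARY KEY,
    Insured_Car VARCHAR(17),
    Insured_Driver_State CHAR(2),
    Insured_Driver_Licence_Num VARCHAR(10),
    FOREIGN KEY (Insured_Car) REFERENCES CAR (VIN),
    FOREIGN KEY (Insured_Driver_State, Insured_Driver_Licence_Num)
        REFERENCES DRIVER (State, Licence_Num)
);

INSERT INTO CAR VALUES ('109920');
INSERT INTO DRIVER VALUES ('GA', '1234');

-- Accepted: both foreign keys match existing tuples.
INSERT INTO INSURANCE VALUES (1, '109920', 'GA', '1234');

-- Accepted: the foreign keys are all NULL.
INSERT INTO INSURANCE VALUES (2, NULL, NULL, NULL);

-- Rejected if uncommented: no tuple in CAR has this VIN.
-- INSERT INTO INSURANCE VALUES (3, '000000', 'GA', '1234');
```

Note that the foreign key to DRIVER is made of two attributes, because the primary key it references is itself made of two attributes.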
… NULL. Note also that all the values must be different, as the same value cannot occur twice as the primary key of tuples: we don’t want to enter the same VIN twice, that would mean we are registering a car that was already registered in our database!
… NULL. Furthermore, their pair must be different from all the other values. Stated differently, you can have <GA, 1234>, <GA, 0000> and <NC, 1234> as values for the <State, Licence-num> pair, even if they have one element in common: what is forbidden is to have both elements in common (i.e., you cannot have <GA, 1234> twice). If both elements were common, that would mean that we are registering a driver that was already in the database.
… NULL (which it could be), then it has to be a value that occurs as the VIN value of some tuple in the CAR relation. For the Insured-Driver-State and Insured-Driver-Licence-Num, the situation is similar: they must either both be NULL, or be values that occur paired together as the values for State and Licence-Num in a tuple in the DRIVER relation. If, e.g., Insured-Car contained the VIN of a car not in the CAR relation, that would mean we are trying to insure a car that is “not known” from the database’s perspective, something we certainly want to avoid.
The operations you can perform on your data are of two kinds: retrievals and updates.
There are two constraints for updates:
A transaction is a series of retrievals and updates performed by an application program, that leaves the database in a consistent state.
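A minimal sketch of what such a transaction could look like in MySQL follows. The schema HW_TransactionExample, the column types and the initial name 'George' are assumptions made for the illustration; START TRANSACTION and COMMIT are standard MySQL statements.

```sql
-- Hypothetical schema name and column types, for illustration only.
DROP SCHEMA IF EXISTS HW_TransactionExample;
CREATE SCHEMA HW_TransactionExample;
USE HW_TransactionExample;

CREATE TABLE DRIVER (
    State CHAR(2),
    Licence_number INT,
    Name VARCHAR(30),
    PRIMARY KEY (State, Licence_number)
);

-- Everything between START TRANSACTION and COMMIT is treated as one unit.
START TRANSACTION;

-- An update (insertion)...
INSERT INTO DRIVER VALUES ('GA', 123, 'George');

-- ...another update (modification)...
UPDATE DRIVER
SET Name = 'Georges'
WHERE State = 'GA' AND Licence_number = 123;

-- ...and a retrieval.
SELECT Name
FROM DRIVER
WHERE State = 'GA' AND Licence_number = 123;

-- Make the changes permanent.
COMMIT;
```

Replacing COMMIT with ROLLBACK would instead cancel the insertion and the modification, leaving the DRIVER table as it was before the transaction started.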
In the following, we give examples of insertion, deletion and update that could be performed, as well as how they could lead a database to become inconsistent. The annotations (1.), (2.) and (3.) refer to the “remedies,” discussed afterward.
Insert <109920, Honda, Accord, 2012> into CAR
How things can go wrong:
… NULL for any value of the attributes of the primary key (1.)
Delete the DRIVER tuple with State = GA and Licence_number = 123
How things can go wrong:
Update Name of tuple in DRIVER where State = GA and Licence_number = 123 to Georges
How things can go wrong:
… NULL for any value of the attributes of the primary key (1.)
When the operation leads the database to become inconsistent, you can either:
… NULL, the corresponding value(s).
What are the meta-data and the data called in the relational model?
Connect the dots:
Row • | • Attribute | |
Column header • | • Tuple | |
Table • | • Relation |
What do we call the number of attributes in a relation?
At the logical level, does the order of the tuples in a relation matter?
What is the difference between a database schema and a database state?
What should we put as a value in an attribute if its value is unknown?
What, if any, is the difference between a superkey, a key, and a primary key?
Name the two kinds of integrity that must be respected by the tuples in a relation.
What is entity integrity? Why is it useful?
Are we violating an integrity constraint if we try to set the value of an attribute that is part of a primary key to NULL? If yes, which one?
If in a relation R1, an attribute A1 is a foreign key referencing an attribute A2 in a relation R2, what does this imply about A2?
Give three examples of operations.
What is the difference between an operation and a transaction?
Consider the following two relations:
COMPUTER(Owner, RAM, Year, Brand)
OS(Name, Version, Architecture)
For each, give
Give three different ways to deal with operations whose execution in isolation would result in the violation of one of the constraints.
Define what is the domain constraint.
Circle the correct statements:
Consider the following three relations:
For each relation, answer the following:
Consider the following three relations
What are the foreign keys in the ASSIGNED-TO relation? What do they refer to?
In the ASSIGNED-TO relation, explain why the Date attribute is part of the primary key. What would happen if it was not?
Assuming the database is empty, are the following instructions valid? If not, what integrity constraint are they violating?
Insert <'AM-356', 'Surfliner', 2012> into TRAIN
Insert <NULL, 'Graham Palmer', 'Senior'> into CONDUCTOR
Insert <'XB-124', 'GPalmer', '02/04/2018'> into ASSIGNED-TO
Insert <'BTed', 'Bobby Ted', 'Senior'> and <'BTed', 'Bobby Ted Jr.', 'Junior'> into CONDUCTOR
Consider the following relation schema and state:
A | B | C | D |
---|---|---|---|
2 | Blue | Austin | true |
1 | Yellow | Paris | true |
1 | Purple | Pisa | false |
2 | Yellow | Augusta | true |
Assuming that this is all the data we will ever have, discuss whether {A, B, C, D}, {A, B} and {B} are superkeys and/or keys.
Consider the following relation and possible state. Assuming that this is all the data we will ever have, give two superkeys, and one key, for this relation.
A | B | C | D |
---|---|---|---|
1 | Austin | true | Shelly |
1 | Paris | true | Cheryl |
3 | Pisa | false | Sheila |
1 | Augusta | true | Ash |
1 | Pisa | true | Linda |
Consider the following relation and possible state. Assuming that this is all the data we will ever have, give three superkeys for this relation, and, for each of them, indicate if they are a key as well.
A | B | C | D |
---|---|---|---|
1 | A | Austin | true |
2 | B | Paris | true |
1 | C | Pisa | false |
2 | C | Augusta | true |
1 | B | Augusta | true |
Consider the following two relations:
… INSERT and one UPDATE instruction. Both should violate the integrity of your database.
Consider the following two relations:
The meta-data is called the schema, and the data is called the relation state. You can refer to the diagram we studied at the beginning of the Chapter for a reminder.
Row is Tuple, Column header is Attribute, Table is Relation.
The degree, or arity, of the relation.
No, it is a set.
The schema is the organization of the database (the meta-data), while the state is the content of the database (the data).
NULL
A superkey is a subset of attributes such that no two tuples have the same combination of values for all those attributes. A key is a minimal superkey, i.e., a superkey from which we cannot remove any attribute without losing the uniqueness constraint. The primary key is one of the candidate keys, i.e., the key that was chosen.
Referential integrity and entity integrity.
Entity integrity ensures that each row of a table has a unique and non-null primary key value. It makes sure that every tuple is different from the others, and helps to “pick” elements in the database.
Yes, the entity integrity constraint.
Then we know that A2 is the primary key of R2, and that A1 and A2 have the same domain.
Reading from the database, performing UPDATE or DELETE operations.
An operation is an “atomic action” that can be performed on the database (adding an element, updating a value, removing an element, etc.). A transaction is a series of such operations, and the assumption is that, even if it can be made of operations that, taken individually, could violate a constraint, the overall transaction will leave the database in a consistent state.
An operation whose execution in isolation would result in the violation of a constraint can either a) be “restricted” (i.e., not executed), b) result in a propagation (i.e., the tuples that would violate a constraint are updated or deleted accordingly), or c) result in some values in tuples that would violate a constraint being set to a default value, or to the NULL value (this last option works only if the constraint violated is the referential integrity constraint).
The requirement that each tuple must have for an attribute A an atomic value from the domain dom(A), or NULL.
“Every key is a superkey.” “Every primary key is a key.” and “Every superkey with one element is a key.” are correct statements.
To answer 1 and 2, the diagram would become:
For the last question, the answer is yes: based on the ISSN of the book, we can retrieve the author of the book. Hence, knowing which book was awarded which year, by looking in the GAINED-AWARD table, gives us the answer to that question.
… NULL can be given as a value to an attribute that is part of the PK.
… 'XB-124' and 'GPalmer' are not values in TRAIN.Ref and CONDUCTOR.CompanyID.
INSERT <"A.H.", NULL> would violate the requirement not to have two tuples with the same value for the attributes that constitute the primary key in the BUILDING relation. UPDATE ROOM with CODE = 12 to Building = "G.C.C." would create an entry referencing a name in the BUILDING relation that does not exist.
Consider the relation:
CLASS(Course_Number, Univ_Section_Number, Instructor_Name, Semester, Building_Code, Room_Number, Time, Weekdays, Credit_Hours)
Here are some examples of values for the attributes:
Attribute | Possible Value |
---|---|
Course_Number | CSCI3410, CSCI1302 |
Building_Code | AH, UH, ECC |
Univ_Section_Number | 1, 2, 3 |
Room_Number | E127, N118 |
Instructor_Name | John Smith, Sophie Adams |
Time | 1400, 1230, 0900 |
Semester | Spring 2015, Fall 2010, Summer 2012 |
Weekdays | M, MW, MWF, T, TH |
Credit_Hours | 1, 2, 3, 4 |
A cinema company wants you to design a relational model for the following set-up:
Propose a relational model for the following situation:
Propose a relational model for the following situation:
We want to design a relational model for an auction website. Members (that can be buyers, sellers, both or neither) can participate in the sale of items.
When creating your schema, do not add any new information, and try as much as possible to avoid relations that will create redundant data and NULL entries. Note that we should be able to uniquely determine the member account linked to the seller account, and similarly for buyer accounts. Furthermore, members can have at most one buyer and one seller account.
We want to design a relational model for an animal shelter, with three goals in mind: to keep track of the pets currently sheltered, of the veterinarian for each type of pet, and of each pet’s favorite toy (needed during a visit to the veterinarian!).
Follow the specification below:
When creating your schema (that you can draw at the back of previous page), do not add any new information (except possibly “id” attributes), and try as much as possible to avoid relations that will create redundant data and NULL entries. Identify the primary key for each relation that you create. When you are done, answer the true / false question below.
With your model … | Yes | No |
---|---|---|
…it is possible to determine which pets don’t have a favorite toy. | | |
…it is possible to determine what is the average stay in the shelter. | | |
…it is possible to determine if a pet’s favorite toy is best suited for their type. | | |
…it is possible for multiple types of animal to have the same veterinarian. | | |
…it is possible for multiple veterinarians to be attributed to the same type. | | |
A possible solution is:
Be careful: saying that a bill has a unique sponsor does not imply that the sponsor is a good primary key for the bills: a house member could very well be the sponsor of multiple bills! It just implies that a single attribute is enough to hold the name of the sponsor.
For simplicity, we added an ID to our MEMBER and BILL relations. Note that having a “role” in the MEMBER relation to store the information about speaker, etc., would be extremely inefficient, since we would add an attribute to the ~435 members that would be NULL in ~430 of them.
A possible solution follows. The part that is the hardest to accommodate is the fact that a course can have multiple codes. We are reading here “cross-listed” as “a course that is offered under more than one departmental heading and can receive different codes (e.g., CSCI XXXX and AIST YYYY).”
A possible solution follows.
In this model,
…it is possible to determine which pets don’t have a favorite toy.
…it is not possible to determine what is the average stay in the shelter, because their exit date is not stored.
…it is possible to determine if a pet’s favorite toy is best suited for their type.
…it is possible for multiple types of animal to have the same veterinarian, as the same value for “Veterinarian” could occur in multiple tuples in the TYPE relation. If both “Veterinarian” and “Name” were parts of the primary key, then that would not be the case.
…it is not possible for multiple veterinarians to be attributed to the same type, as the name of the type is the primary key in the TYPE relation.
… SQL, but none of its implementation.
This chapter will be “code-driven”: the code will illustrate and help you understand some concepts. You may want to have a look at the “Setting Up Your Work Environment” Section as early as possible in this lecture. On top of being a step-by-step guide to install and configure a relational database management system, it contains a list of useful links.
“Common” / Relational | SQL |
---|---|
“Set of databases” | Catalog (named collection of schemas) |
“Database” | Schema |
Relation | Table |
Tuple | Row |
Attribute | Column, or Field |
A schema is made of
Type and domains are two different things in some implementations, cf. for instance PostgreSQL, where a domain is defined to be essentially a datatype with constraints.
SQL is a programming language: it has a strict syntax, sometimes cryptic error messages, it evolves, etc. Some of its salient aspects are:
SQL is “kind of” case-insensitive, and does not care about spaces and new lines.
Single-line comments use --, multi-line comments use /* … */.
Statements end with ;.
The following is an adaptation of w3resource.com, the canonical source being MySQL’s documentation:
INTEGER
(or its short-hand notation INT
) or SMALLINT
.FLOAT
and DOUBLE
(or its synonym, REAL
). MySQL also allows the syntax FLOAT(M,D)
or REAL(M,D)
, where the values can be stored up to M
digits in total where D
represents the decimal point.DECIMAL(10, 2)
(or its synonym in MySQL NUMERIC
).CHAR
and VARCHAR
: the length (resp. maximal length) of the CHAR
(resp. VARCHAR
) has to be declared, and CHAR
are right-padded with spaces to the specified length. Historically, 255 was the size used, because it is the largest number of characters that can be counted with an 8-bit number, but, whenever possible, the “right size” should be used.BIT(1)
, and a boolean using BOOLEAN
(or BOOL
, both actually being aliases for TINYINT(1)
).DATE
, TIME
, DATETIME
and TIMESTAMP
(which convert the current day / time to from the current time zone to UTC).There are many other datatypes, but they really depends on the particular implementation, so we will not consider them too much.
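As a minimal sketch of the difference between approximate and exact numeric datatypes, and between fixed-length and variable-length strings, one could run something like the following on a MySQL server. The schema name HW_DatatypeSketch, the table DATATYPE_DEMO and its attribute names are hypothetical: they were made up for this illustration and are not part of the code shared with these notes.

```sql
-- Hypothetical schema, table and attribute names, for illustration only.
DROP SCHEMA IF EXISTS HW_DatatypeSketch;
CREATE SCHEMA HW_DatatypeSketch;
USE HW_DatatypeSketch;

CREATE TABLE DATATYPE_DEMO (
    ApproxVal FLOAT,        -- approximate value
    ExactVal DECIMAL(10, 2),-- exact value: 10 digits in total, 2 after the decimal point
    PaddedStr CHAR(5),      -- always stored using 5 characters
    VarStr VARCHAR(5)       -- stored using at most 5 characters
);

INSERT INTO DATATYPE_DEMO
VALUES (0.1, 0.1, 'ab', 'ab');

-- Compare the stored values with the literal 0.1:
-- the approximate value is not guaranteed to compare as equal,
-- while the exact one is.
SELECT ApproxVal = 0.1 AS ApproxIsEqual,
    ExactVal = 0.1 AS ExactIsEqual
FROM DATATYPE_DEMO;
```

Comparisons like this one are a common reason to prefer DECIMAL over FLOAT for values such as prices, where exactness matters more than range.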
/* code/sql/HW_Faculty.sql */
-- We first drop the schema if it already exists:
DROP SCHEMA IF EXISTS HW_Faculty;
-- Then we create the schema:
CREATE SCHEMA HW_Faculty;
/*
Or we could have used the syntax:
CREATE DATABASE HW_Faculty;
*/
-- Now, let us create a table in it:
CREATE TABLE HW_Faculty.PROF (
Fname VARCHAR(15),
/*
No String!
The value "15" was picked randomly, any value below 255 would
more or less do the same. Note that declaring extremely large
values without using them can impact the performance of
your database, cf. for instance https://dba.stackexchange.com/a/162117/
*/
Room INT,
/*
INT is a shorthand for INTEGER. Are also available: SMALLINT, FLOAT, REAL, DEC
The "REAL" datatype is like the "DOUBLE" datatype of C# (they are actually synonyms in SQL):
more precise than the "FLOAT" datatype, but not as exact as the "NUMERIC" datatype.
cf. https://dev.mysql.com/doc/refman/8.0/en/numeric-types.html
*/
Title CHAR(3),
-- fixed-length string, padded with blanks if needed
Tenured BIT(1),
Nice BOOLEAN,
-- True / False (= 0) / Unknown
Hiring DATE,
/*
The DATE is always supposed to be entered in a YEAR/MONTH/DAY variation.
To tune the way it will be displayed, you can use the "DATE_FORMAT" function
(cf. https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format),
but you can enter those values only using the "standard" literals
(cf. https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html )
*/
Last_seen TIME,
FavoriteFruit ENUM ('apple', 'orange', 'pear'),
PRIMARY KEY (Fname, Hiring)
);
/*
Or, instead of using the fully qualified name HW_Faculty.PROF,
we could have done:
USE HW_Faculty;
CREATE TABLE PROF(…)
*/
-- Let us use this schema, from now on.
USE HW_Faculty;
-- Let us insert some "Dummy" value in our table:
INSERT INTO PROF
VALUES (
"Clément", -- Or 'Clément'.
290,
'PhD',
0,
NULL,
'19940101', -- Or '940101', '1994-01-01', '94/01/01'
'090500', -- Or '09:05:00', '9:05:0', '9:5:0'
-- Note also the existence of DATETIME, with 'YYYY-MM-DD
-- HH:MM:SS'
'Apple' -- This is not case-sensitive, oddly enough.
);
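To check what was actually stored, and to see the DATE_FORMAT function mentioned in the comments above at work, one could run a query like the following sketch. It assumes that the code above was executed without error; the format string and the alias HiringDisplay are only one possible choice, picked for this illustration.

```sql
USE HW_Faculty;

-- In the format string, %M stands for the month name,
-- %e for the day of the month, and %Y for the year on four digits.
SELECT Fname,
    Hiring, -- displayed using the default YYYY-MM-DD format
    DATE_FORMAT(Hiring, '%M %e, %Y') AS HiringDisplay, -- hypothetical alias
    Last_seen
FROM PROF;
```

Note that DATE_FORMAT only changes how the value is displayed: the value stored in the Hiring attribute is not modified.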
The following commands are particularly useful. They allow you to get a sense of the current state of your databases.
In the following, <SchemaName>
should be substituted with an actual schema name.
SHOW SCHEMAS; -- List the schemas.
SHOW TABLES; -- List the tables in a schema.
DROP SCHEMA <SchemaName>; -- "Drop" (erase) SchemaName.
You can also use the variation
DROP SCHEMA IF EXISTS <SchemaName>;
that will not issue an error if <SchemaName>
does not exist.
In the following, <TableName>
should be substituted with an actual table name.
SHOW CREATE TABLE <TableName>; -- Gives the command to "re-construct" TableName.
DESCRIBE <TableName>; -- Show the structure of TableName.
DROP TABLE <TableName>; -- "Drop" (erase) TableName.
You can also use the variation
DROP TABLE IF EXISTS <TableName>;
that will not issue an error if <TableName>
does not exist.
SELECT * FROM <TableName>; -- List all the rows in TableName.
SHOW WARNINGS; -- Show the content of the latest warning issued.
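As an illustration, here is how those commands could be used on the HW_Faculty schema and its PROF table, assuming they were created using the code given earlier. The table name NO_SUCH_TABLE is hypothetical: it stands for any table that does not exist in the schema.

```sql
SHOW SCHEMAS; -- HW_Faculty should be listed.
USE HW_Faculty;
SHOW TABLES; -- PROF should be listed.
DESCRIBE PROF; -- One row per attribute of PROF.
SHOW CREATE TABLE PROF; -- The statement to re-create PROF.
SELECT * FROM PROF; -- The rows inserted so far.

-- Dropping a table that does not exist, using IF EXISTS,
-- does not issue an error, but a note, that we can then display:
DROP TABLE IF EXISTS NO_SUCH_TABLE;
SHOW WARNINGS;
```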
There are six different kinds of constraints that one can add to an attribute:
Primary key
Foreign key
NOT NULL
UNIQUE
DEFAULT
CHECK
We already know the first two from the relational model. The other four are new, and could not be described in this model.
We will review them below, and show how they can be specified at the time the table is declared, or added and removed later. For more in-depth examples, you can refer to https://www.w3resource.com/mysql/creating-table-advance/constraint.php.
We will now see how to declare those constraints when we create the table (except for the foreign key, which we save for later).
/* code/sql/HW_ConstraintsPart1.sql */
DROP SCHEMA IF EXISTS HW_ConstraintsPart1;
CREATE SCHEMA HW_ConstraintsPart1;
USE HW_ConstraintsPart1;
CREATE TABLE HURRICANE (
Name VARCHAR(25) PRIMARY KEY,
WindSpeed INT DEFAULT 76 CHECK (WindSpeed > 74 AND
WindSpeed < 500),
-- 75mph is the minimum to be considered as a hurricane
-- cf. https://www.hwn.org/resources/bws.html
Above VARCHAR(25)
);
CREATE TABLE STATE (
Name VARCHAR(25) UNIQUE,
Postal_abbr CHAR(2) NOT NULL
);
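To get a sense of what those constraints do, one could try a few insertions like the ones sketched below, assuming the code above was executed. The values are made up for this illustration. The statements marked as rejected issue an error, so it is probably easier to run them one at a time. Note also that MySQL enforces CHECK constraints only from version 8.0.16 on: earlier versions parse them, but ignore them.

```sql
USE HW_ConstraintsPart1;

-- No WindSpeed given: the DEFAULT value (76) is used.
INSERT INTO HURRICANE (Name) VALUES ('Katrina');
SELECT * FROM HURRICANE;

-- Rejected (from MySQL 8.0.16 on): violates the CHECK constraint.
INSERT INTO HURRICANE (Name, WindSpeed) VALUES ('Breeze', 10);

-- Rejected: violates the primary key constraint (duplicate Name).
INSERT INTO HURRICANE (Name) VALUES ('Katrina');

-- Accepted.
INSERT INTO STATE VALUES ('Georgia', 'GA');

-- Rejected: violates the UNIQUE constraint on Name.
INSERT INTO STATE VALUES ('Georgia', 'GE');

-- Rejected: violates the NOT NULL constraint on Postal_abbr.
INSERT INTO STATE VALUES ('Alabama', NULL);

-- Accepted: unlike a primary key, a UNIQUE attribute can be NULL.
INSERT INTO STATE VALUES (NULL, 'LA');
```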
If we wanted to combine multiple co