Overview of SQL

Click here for the previous lecture (overview of Relational databases).

SQL datatypes

The most common SQL datatypes are as follows:

Type Example Used for
INTEGER 7502 Integer numbers
SMALLINT 7502 Also integer numbers, but takes up less memory on some implementations
NUMERIC (p, s) 7502.098 numbers with a decimal point. p is the precision (total number of digits), s is the scale (number of digits after the decimal point)
DECIMAL (p,s) 7502.098 same as NUMERIC, but does not reject numbers with larger precision if permitted by the implementation
REAL or DOUBLE PRECISION 3.33E+6 (=3,330,000) numbers stored in scientific notation
FLOAT(p) 3.33E+6 (=3,330,000) also numbers in scientific notation, but the user gets to specify precision. Better to use than REAL or DOUBLE PRECISION, since it makes the database more portable.
CHAR (x) or CHARACTER(x) 'John Smith' used for strings. x is the length of the string. Shorter strings are padded with blank spaces.
VARCHAR (x) 'John Smith' used for strings. x is the length of the string. Shorter strings are not padded.
DATE '2002-02-16' dates
TIME (x) '14:34:57.75' = 57.75 seconds after 2:34pm time. x specified the number of fractional digits. The example is in format TIME(2) The highest value for seconds is 60.
TIMESTAMP(x) '2002-02-16 14:34:57' date and time. x specifies the number of fractional digits in time. The example is in format TIMESTAMP(0)

Components of SQL

SQL consists of the following:
  1. Data Definition Language (DDL) allows to create, change, and delete tables, views, schemas, constraints, and so on. The main commands are CREATE, ALTER, and DROP.
  2. Data Manipulation Language (DML) allows to insert, retrieve, change, and delete data in the database tables. The main commands are INSERT, SELECT, UPDATE, and DELETE.
  3. Data Control Language (DCL) controls access to the database and provides means to maintain database integrity. The commands COMMIT and ROLLBACK are used to control transaction, GRANT and REVOKE control user's privileges.

Creating, altering, and deleting tables

Consider a table Students with the following columns:

ID First_Name Last_Name Year GPA
This table can be create by the following SQL CREATE command:

CREATE TABLE Students (
ID INTEGER PRIMARY KEY,
First_Name CHAR(20),
Last_Name CHAR(20) NOT NULL,
Year CHAR(2)
GPA NUMERIC(5,3)
);

The column ID is identified as the primary key, which means that this column cannot be NULL and all entries in the column must be distinct. A weaker constraint NOT NULL for Last_Name only requires that the entry is present in each row, but the entries may have repetitions. The database automatically checks that both requirements are satisfied for all data that we enter. Note that we have chosen Year field to be a string rather than an integer, since we would like to store Year (of graduation) in the form '03', instead of an integer 3.

If we want to add a column Date_of_birth, we can do it using the following command:

ALTER TABLE Students ADD COLUMN Date_of_birth DATE;

Similarly we can remove a column:

ALTER TABLE Students DROP COLUMN Date_of_birth;

Finally, if we don't need the entire table, we can get rid of it by typing

DROP TABLE Students;

Assume that we have the table Student as above and a table Courses given below. The primary key in this table is ID.

ID Course_name Semester Instructor Location

Given these tables, we define the table Enrollment which contains information about students taking courses and the student's grades. The constraints defined in the table guarantee that all students and all courses are present in their respective tables. We also define domain for possible values of grades.

CREATE DOMAIN Grade_dom CHAR(1) CHECK (Grade IN ('A', 'B', 'C', 'D', 'F'));

CREATE TABLE Enrollment (
Record_ID INTEGER PRIMARY KEY,
Student_ID INTEGER NOT NULL
Course_ID INTEGER NOT NULL
Grade CHAR(1)
CONSTRAINT Student_cons FOREIGN KEY (Student_ID)
REFERENCES Students(ID) ON DELETE CASCADE,

CONSTRAINT Course_cons FOREIGN KEY (Course_ID)
REFERENCES Courses(ID) ON DELETE CASCADE

);

The clause "ON DELETE CASCADE" guarantees that when the record for the student or the course is deleted from the database, all corresponding entries in enrollment will be deleted as well. Alternatively, you can specify "ON DELETE SET NULL" (if the column is not restricted to non-null data). If you don't specify anything, then the entry will remain even if the student or the course record has been deleted.

Inserting data into tables

INSERT statement creates a new row in a table. The following statement inserts information about a new student into the Students table:

INSERT INTO Students
VALUES (333, 'Ann', 'Smith', '02', 3.5, '1982-10-09');

We may also insert values for only some of the columns:

INSERT INTO Students (ID, First_name, Last_name, Date_of_birth)
VALUES (444, 'Mary', 'Scott', '1984-07-08');

Of course, we must provide values for the PRIMARY KEY and for all columns declared NOT NULL.

Updating and deleting data

UPDATE statement changes information in an already existing row. For instance:

UPDATE Students
SET Year = '04', GPA = '3.8'
WHERE First_Name = 'Mary' AND Last_Name = 'Scott';

As the result, the columns Year and GPA of Mary Scott would be set to the new values. If there were previous values, they will be overwritten.

One has to be careful when specifying the WHERE clause of an UPDATE statement. For instance, if there are several students whose name is Mary Scott, then all such records will be updated! It's safer to rewrite the statement as

UPDATE Students
SET Year = '04', GPA = '3.8'
WHERE ID = 444;

Since IDs are guaranteed to be unique, only one record will be updated. Of course, you have to know the ID to be able to write your query this way!

DELETE statement deletes a row (or rows) in a table. It also has a WHERE clause and affects all rows that satisfy the clause. For instance, the following statement deletes records of all student who have graduated in 1940:

DELETE FROM Students
WHERE Year = '40';

This example suggests that our choice of meaning and representation of Year probably was not very wise. You can think of better options.

Retrieving data

The main purpose of storing data in a database is to be able to retrieve the information when it is needed. This is accomplished by SELECT statement. SELECT statement returns data formatted as a table or a single value, such as an integer or a string.

For these example let's assume that the tables in the database have the following information:

Students

ID First_Name Last_Name Year GPA Date_of_birth
333 Ann Smith 02 3.5 1982-10-09
444 Mary Scott 04 3.8 1984-07-08
555 Alexandra Brown 04 3.7 1984-10-15

Courses

ID Course_name Semester Instructor Location
1115 CS111 F01 781 E101
1122 CS256 F01 899  
1117 CS111 S02 878 E101

Enrollment

Record_ID Student_ID Course_ID Grade
112233 555 1117  
112234 555 1122 B
112244 444 1115 A
112250 444 1122 A
113355 333 1122 B

The simplest SELECT statement selects all information from a table:

SELECT * FROM Students;

will return the entire table Student. If we are only interested in a list of students' first and last names (say, sorted alphabetically by last name), we get the result by entering the following query:

SELECT First_Name, Last_Name FROM Sudents ORDERED BY Last_Name;

The result will be a table

Alexandra Brown
Mary Scott
Ann Smith

We can also return only entries that satisfy some condition. For instance,

SELECT * FROM Students WHERE Year = '04';

will return the table

444 Mary Scott 04 3.8 1984-07-08
555 Alexandra Brown 04 3.7 1984-10-15

We can also specify more complex conditions. For instance, the following query will find all students whose first name starts with an A:

SELECT * FROM Students WHERE First_Name LIKE 'A%';

% is a wildcard that matches any string, including empty string. Underscore _ matches exactly one character. For instance, '%a%r_' matches 'Mary' and 'Alexandra'. Please keep in mind that in strings upper case characters are different from lower case ones.

You can write various conditional expressions for numbers. For instance,

SELECT * FROM Students WHERE GPA > 3.6 AND GPA <= 3.8; will select whose GPA is greater than 3.6 but not less than 3.8. You can form any combinations of conditions by using OR and NOT. You can also rewrite this condition as "WHERE GPA BETWEEN 3.6 AND 3.8". BETWEEN is also applicable to strings, the comparison is alphabetical. You can also specify a list of values, using IN:

SELECT * FROM Students WHERE Year IN ('02', '03');

DATE, TIME, and TIMESTAMP datatypes can be broken down into components (year, month, etc.) using EXTRACT:

SELECT * FROM Students WHERE (EXTRACT (MONTH FROM Date_of_birth)) = 10;

selects all students born in October.

You may also use functions to compute maximum, average, and so on. For instance,

SELECT AVG(GPA) FROM Students;

will compute the average GPA of all students.

SELECT COUNT(*) FROM Courses WHERE Semester LIKE '%01';

counts the number of courses offered in 2001. There are also functions MAX(), MIN(), and SUM() that you might find helpful. Another useful thing to know is that you can check if the value of a column is NULL. For instance, you can choose rows based on the criteria "WHERE Grade IS NULL" or "WHERE grade IS NOT NULL".

Tons of other information about SQL and a lot of examples, is available here.

Selecting from multiple tables. Now we will finally see why this kind of databases is called "relational": we will be selecting data from different tables which is related in a certain why. For instance, suppose we want to find all the students who took CS111 in Fall 2001. Here is how:

SELECT Students.Last_Name, Students.First_Name FROM Students, Enrollment
WHERE Enrollment.Course_ID = 1115 AND Students.ID = Enrollment.Student_ID;

What happens when this query gets processed? A large table gets created which includes all columns of the tables Students and Enrollment. It is filled with all possible combinations of rows of the first and the second table. We say that the two tables are joined. For instance, one row of this large table will contain all the information about CS111 of the spring 2002 semester and all information about Ann Smith. After the large table is constructed, the condition in the WHERE clause is checked for each row. As the result, the following table is returned:

Mary Scott

OK, but what if we don't know that the ID for the course is 1115? Then we need to join three tables:

SELECT Students.Last_Name, Students.First_Name FROM Students, Enrollment, Courses
WHERE Courses.Course_Name = 'CS111' AND Courses.Semester = 'F01'
      AND Courses.ID = Enrollment.Course_ID AND Students.ID = Enrollment.Student_ID;

Whenever unambiguous, you can omit the table name and specify just the column name. In our case, we can simplify the query as

SELECT Last_Name, First_Name FROM Students, Enrollment, Courses
WHERE Course_Name = 'CS111' AND Semester = 'F01'
      AND Courses.ID = Course_ID AND Students.ID = Student_ID;

However, you have to be careful when simplifying queries, because adding another table into the query may create an ambiguity. For instance, what if we want to join these tables with a table Instructors which also has columns Last_Name and First_Name?

You can specify DISTINCT keyword to eliminate duplicate rows in the resulting table. For instance,

SELECT DISTINCT Course_name FROM Courses, Enrollment
WHERE Enrollment.Course_ID = Courses.ID;

will print out the list of names of all courses for which students have ever been registered. Note that it's different from just selecting distinct course names from Courses, because there may be courses that are listed in the table of courses, but have never been offered.

Avoiding table anomalities

Several things may go wrong if the database design does not follow ceratin rules. We have already formulated some rules, s.a. columns in a table have unique names, all rows are different, and so on (see the lecture on Relational Model). A table that satisfies these rules is said to be in the first normal form (1NF). However, these rules are not sufficient to prevent problems with updates and deletes. For instance, suppose a store wants to keep track of customer's purchases and product prices. The prices on products are fixed, i.e. they don't depend on a customer. They create a table to record each purchase:

Purchase

Customer_ID Product Price
101 Bread $1.35
111 Milk $1.65
101 Ice cream $3.99
111 Ice cream $3.99

It's clear that the primary key for this table must be a combination of the customer ID and the product (none of these columns by itself may be used to identify rows). But the price of the product depends only on the product, not on the customer! This causes problems: firstly, we cannot record the price of a product until someone actually buys it, and secondly, we record the price of the product with every purchase, which is just a lot of repetitive information!

The reason for the problem is that the table is not in second normal form (2NF): a table is in 2NF if each attribute depends on all components of the primary key.

To remedy the problem, we need to break up this table into two: one for products and prices, the other for purchases.

Suppose we change the table in the example to allow only one entry for each customer. A resulting table may look like this:

Purchase

Customer_ID Product Price
101 Bread $1.35
111 Milk $1.65

While Customer_ID can be used as the primary key now, and the table is in 2NF, there is still a problem: suppose we want to delete the first row because the customer wants to return the bread back to the store. But if this row is deleted, the price of the bread is deleted as well! The table is not in the third normal form (3NF): A table is said to have transitive dependencies if one attribute depends on another attribute, which in turn depends on a third attribute. A table is in 3NF if it is in 2NF and doesn't have transitive dependencies. In our example the price does not depend on the primary key (Customer_ID) directly, but only through the product.

The way to fix the problem is, again, to break the table into two: one for products, and one for purchases.

There are other, more complex rules that prevent less trivial anomalities. However, the rule of thumb in designing a database is to have one table for each kind of entities that you want to describe. In our case, products and purchases are different things, and information about them should not be combined in the same table.

Privileges and security (a brief overview)

Every database has a database administrator. The database administrator is the person who can do anything in the database: create and remove tables, insert, update, and delete data from any table, retrieve data from any table, and so on. Everyone else who uses the database by default is the owner of any object (s.a. a table) they have created and all data in these objects, but cannot change, or even view, data in other tables. However, owners of objects can give privileges to these objects to other users. In addition, the database administrator can give privileges to other users. In addition, the person who gave privileges to other users may revoke these privileges. Privileges are granted for various kinds of activities. For instance, privileges for a table can be given for any combination of the following: SELECT, DELETE, INSERT, UPDATE. The latter two also allow you to specify particular columns. For instance, the statement

GRANT INSERT (ID, First_Name, Last_Name, Year, Date_of_birth)
ON Students TO Registrar;

allows the registrar to enter information about a new student into the Student table, but does not give privilagies to set the student's GPA.

You can achieve the same effect by granting Registrar INSERT privileges to the entire table and then revoking the right to insert data into GPA:

GRANT INSERT ON Students TO Registrar;
REVOKE INSERT (GPA) ON Students FROM Registrar;

Another important category of privilegies on a table is the ability of a user to reference the table, i.e. use its primary key as a foreign key in his/her own table. Consider the following situation: a company has created a table ToLayOff where it keeps dataabout employees who are going to be laid off. User Cathy does not have SELECT privilegies for the table, so she cannot check who is on the list. However, there is a round-about way to find out! Cathy finds out that the primary key in the table ToLayOff is Emp_ID and creates her own table which references ToLayOff:

CREATE TABLE Snoop (Emp_ID INTEGER REFERENCES ToLayOff);

Now she can try to insert every possible employee ID into her table. Those which she would be able to insert are on the layoff list!

To control user's ability to reference a table (i.e. to create tables which reference it), one may explicitly grant or revoke REFERENCES privileges. By default, no one can reference other owner's table. If you want to let other users reference your table, you must issue statements like this:

GRANT REFERENCES ON ToLayOff TO Personnel_manager;

If you want an object to be accessible to everyone, you may grant privileges on this object to PUBLIC. If you want to grant all of the above privileges, use GRANT ALL. You may redelegate rights to give privileges to other users:

GRANT SELECT ON Students TO Registrar
WITH GRANT OPTION;

You may revoke privileges or just the grant options from other users. For instance, the following statement revokes only the grant priveleges, but not the ability of the user himself to use the table:

REVOKE GRANT OPTION FOR INSERT, DELETE, UPDATE
ON Students FROM Assistant_registrar CASCADE;

The optional CASCADE keyword causes all INSERT, DELETE, UPDATE privileges granted by Assistant_registrar to other users to be revoked. If you omit the CASCADE keyword, then users who got their privileges on Students from Assistant_registrar retain their privileges.

Transactions

Transactions allow control over database data during concurrent access. For instance, suppose a company is filling a customer's order for several products. The order can be shipped only if all products are available in necessary quantities.

A reasonable way of handling this is to go through the list of products, checking if the quantities are available, and if yes, decrement the product counter. When the first unavailable product is encountered, the transaction is rolled back, and all the quantities are restored to their initial values. All this works fine if there is only one person performing these actions at a time.

However, many things can go wrong if two or more users are performing these steps simultaneously. For instance, the first user may decrement the quantity of a product, and later roll back the transaction. However, the second user has checked the value after it has been decremented, which has not been suffiecient to fill the second order. Then the second user is forced to roll back the transaction, too, even though it could've gone through. This situation is called dirty read: the value that the second user has seen was not a real value, because the transaction that changed it has never been committed.

This situation can be avoided if the database system guarantees that all transactions are performed as if each of them was the only one running at a time, i.e. as if they have been ordered in some way. Such transactions are called serializable. Serializable transactions provide the highest level of data protection. However, they may significantly reduce efficiency. Therefore in some cases it makes sense to relax the requirement for transaction isolation. You may do so by declaring a transaction to be one of the following:

You may also give the DBMS helpful hints about behaviour of your transaction by declaring it to be READ ONLY. The default mode of a transaction is READ WRITE.

Any SQL statements which are not preceded by a SET TRANSACTION statement are considered a part of SERIALIZABLE READ WRITE transaction. The end of a transaction (which is automatically the beginning of a new one) is marked by a COMMIT or ROLLBACK statement. The following is an example of defining a transaction with a non-default behaviour:

SET TRANSACTION READ ONLY READ COMMITTED ;
... some SQL statements ...
COMMIT or ROLLBACK

Additional information

Tons of other information about SQL and a lot of examples are available here.
Some material on this page has been adopted from a subset of online sources listed here
This page has been created and is maintained by Elena Machkasova
Comments and suggestions are welcome at emachkas@wellesley.edu

Spring Semester 2002