Database Queries

We'll start our work with databases learning about queries: getting information out of a database. But first, some overview to help you get oriented.

Database Concepts

Lots of things store data. For example, a Microsoft Word file can have data in it. That's not what we mean by a database.

A database stores data efficiently:

  • fast to look up data, particularly using certain keys
  • fast to update/delete data
  • frugal with space

As disks get larger, being space-efficient has become less important, but space-efficiency can result in greater speed if less data needs to be moved from disk into memory or sent across networks.

A Database Management System (DBMS) allows you to create and manage a set of databases. For example, a database of employee records, a database of student registratrations, and a database of helpdesk reports, might all co-exist in a single DBMS.

MySQL is a DBMS and there is one installed on Tempest. You will get your own database within it. If your username is sk13, your database will be named sk13_db. There are also databases that we will share for the purpose of this course: cs304_db and wmdb among others.

Relational Databases

MySQL is a relational DBMS. That means

  • data is stored as rows in a table
  • a table is also called a relation
  • each row comprises one or more columns with different kinds of info
  • every row has the same columns
Name BID House class year
Hermione B123123123 Gryffindor 1998
Ginny B234234234 Gryffindor 1999

Aside on Representation

Under the hood, databases have an Engine that are a data structure that allows fast access where most of the data (rows, AKA records) is on disk, rather than in memory. Examples:

Both of these data structures are really cool. You probably already know hash tables. If we have time, we'll talk about B-trees.

Relational Database properties

  • columns are rarely added/removed. rows routinely are.
  • each column cell is a single piece of information, not a list.
  • So, there's no "list of courses" column for Hermione saying that she's taking ["arithmancy", "potions", "transfiguration", ...]
  • There can be multiple tables. We'll talk about Hermione's list of courses a bit later in the course.
Course Prof location
Potions Snape Dungeon
Divination Trelawney Astronomy Tower

Other Databases

  • MySQL is not the only DBMS.
  • Microsoft Office has MS Access, which is a desktop RDBMS suitable for a single user.
  • There are also non-relational DBMS systems (so-called NoSQL systems) such as MongoDB.
  • MySQL is designed to handle multiple concurrent users, unlike MS Access

Client-Server Architecture

MySQL is designed with a client-server architecture.

  • There is a single daemon process (called mysqld) that is the only process that has direct control of the disks and the data on the disks.
  • Different clients connect to the server to operate on their data.

mysql client-server architecture

Tempest: the CS server

  • running MySQL 5.5.60
  • (Actually, MariaDB, but virtually indistinguishable)
  • There are several databases for this course:
    • cs304_db: a bunch of example tables
    • wmdb: a fleshed-out example that we'll use all semester
  • You will have your own database: youracct_db
  • We'll use the command line a lot
  • For us, both the MySQL server and MySQL clients are on Tempest

The Wellesley Movie Database

  • The database we will be playing with is based on my favorite database, namely the Internet Movie Database (IMDB)
  • There are pages for actors, pages for movies, and links between them.
  • The WMDB has been built by student contributions over many offerings of CS304. The data is not guaranteed.

Please read more about the WMDB

SQL versus a GUI

There is a graphical user interface (GUI), actually many of them, for working with databases, but we won't use one. We'll use the industry-standard Structured Query Language, called SQL

If you want a GUI for your queries, MS Access has a pretty good one, called Query By Example (QBE).

There are many online tutorials for SQL, if you want additional information. You might start with W3Schools SQL tutorial

An excellent reference is https://dev.mysql.com/doc/refman/5.5/en/tutorial.html

Why not a GUI?

I like GUIs as much as most people. GUIs have many advantages:

  • menus remind you of your options: no memorization
  • typos are impossible
  • syntax is guided; much harder to make mistakes

So, why not a GUI?

  • Can't be written down and automated
  • Can't be programmed (well, difficult...)

Since we will be programming with SQL, we should practice whenever possible.

The SELECT Statement

A query gets information out of a table (or multiple tables — we'll learn that later).

(A SQL statement that inserts, updates or deletes information in a table is sometimes called a query, but that's just informal usage.)

Here's the basic outline of a SQL query:

SELECT col1,col1,... or * 
FROM table 
WHERE boolean expression; 

The first line says which columns from the table you want returned. The asterisk is a wildcard meaing "all columns". I suggest avoiding the asterisk in code, because code should be clear about what columns are wanted and in what order, but I use * all the time when using MySQL interactively.

The FROM clause comes second. Clauses in a SQL statement always go in a particular order. The FROM clause says what table you are searching through.

The WHERE clause comes third. It gives a boolean expression to filter the rows of the table. The query returns all rows for which the boolean expression is true. That might be no rows at all, all of them, or a subset.

The boolean expression can use any column from the table, constants, and any of a large number of MySQL functions and operators.

There are many additional clauses that can be added to the SELECT statement; we'll learn more of these later. One that we will use immediately is the LIMIT clause, which goes last, and says "no matter how many rows this query returns, only give me the first N."

SELECT col1,col1,... or * 
FROM table 
WHERE boolean expression
LIMIT 10;

Examples of Queries

Here are some examples of queries. We'll play with a bunch of examples in class.

  • return all rows and columns from the Movie table:
SELECT * FROM movie;
  • return just the titles of just the first 10 rows of the Movie table:
SELECT title
FROM movie
LIMIT 10;
  • return names and birthdates of all people in the Person table:
SELECT name,birthdate FROM person;
  • return name of each person in the Person table born on Sept 2, 1967:
SELECT name
FROM person
WHERE birthdate='1967-09-02';

Syntax Notes

SQL keywords are not case sensitive. I only capitalized the keywords to let them stand out.

SQL doesn't care about line breaks, but it's common to put keywords that begin clauses at the beginning of the line to help the reader.

If you want all the details about the SELECT statement, you can read the reference page on SELECT on the MySQL site. However, I suggest you put it off for now; that reference page is very complex.

Our course resources page has lots of links to reference pages.

Batch Files

A batch file is just a plain text file (like a Python or Java file) that contains SQL code that we can run, again like a Python file.

They are called batch files because we can put any number of SQL statements in a file and run them in one batch. Don't let the terminology confuse you. They are just plain files, and you can create them with any editor, such as Visual Studio Code, Atom, Sublime, Emacs or whatever.

We will play with quite a few example batch files in class.

Your solutions to assignments during the "MySQL" part of the course will be a collection of batch files.

Comments

One nice thing about a batch file is that you can comment your code. MySQL supports three kinds of comment syntax:

  • # comment to end of line, as in Python
  • /* */ multi-line comment as in C and Java
  • -- comment to end of line

I have typically used the three character sequence --: two hyphens and a space. The space is required in MySQL (though not in standard SQL). You may use any of these.

Example Batch File

Here is an example of a batch file. You can download it at this link: example batch file

use wmdb;

/* An example batch file. This SQL code returns (prints) the 
id, name and birthdate of the first 10 people in the WMDB.person
table. */ 

select nm,name,birthdate 
from person
limit 10;

The MySQL Interpreter

One of the things I like about Python is that you can play with little bits of code in an interactive way. If I want to remind myself whether integer division truncates or gives me a floating point value, I can check:

$ python3
>>> 1/2
0.5

(The answer would be zero in Python 2, but the example above used Python 3.)

We can do the same thing with MySQL! If we run the command mysql, an interactive program, called the MySQL interpreter or shell starts up, connects to the MySQL server, and enters a loop where it reads a command or SQL statement, executes it, and prompts you for another input (just like Python).

Note that the MySQL interpreter is a client, using the client-server terminology we learned earlier.

In the example below, we type the use wmdb, show tables and quit commands:

$ mysql
MariaDB [cs304_db]> use wmdb;

Database changed
MariaDB [wmdb]> show tables;
+----------------+
| Tables_in_wmdb |
+----------------+
| credit         |
| movie          |
| person         |
| staff          |
+----------------+
4 rows in set (0.00 sec)

MariaDB [wmdb]> quit
$

Note that the prompt starts with MariaDB which is the formal name for the version of MySQL that we are using.

Practical Use of the MySQL client

  • commands often end in a semi-colon
  • commands that don't finish can be continued on the next line
  • if you messed up a previous line, just end the command by entering a semi-colon, ignore the error, and go on.
  • you can re-use a prior line by hitting up-arrow or control-p. You can do this in the shell, too.
  • you can edit the prior command before submitting it (by pressing enter)
  • in a pinch, you can do control-c, which will kill the client (not the server) and you can start over
  • you can quit the client with control-d

Try these!

The USE statement

We said much earlier that a MySQL DBMS can manage many databases, and a database is just a collection of tables. You can think of this as files and folders, except that it's not recursive: there's just one flat collection of databases, each of which contains tables.

In the MySQL on Tempest, you will have your own personal database (youracct_db), and you'll have access to several databases that we will share in the course, including:

  • cs304_db
  • wmdb

The way to switch from one database to another is done by the USE statement. So, when I did use wmdb; in the example above, MySQL switched me from cs304_db to the wmdb.

SHOW DATABASES

To find out what databases you have permission to access, you can use the show databases command:

MariaDB [cs304guest_db]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cs304_db           |
| cs304guest_db      |
| movie_lens_db      |
| mysql              |
| studreg            |
| test               |
| webdb              |
| wmdb               |
| wmdb2              |
+--------------------+
10 rows in set (0.00 sec)

MariaDB [cs304guest_db]> 

SHOW TABLES

For whatever database you are in, you can find out the tables in that database with the show tables command. We saw that above. Here it is again:

MariaDB [cs304guest_db]> use wmdb;

Database changed
MariaDB [wmdb]> show tables;
+----------------+
| Tables_in_wmdb |
+----------------+
| credit         |
| movie          |
| person         |
| staff          |
+----------------+
4 rows in set (0.00 sec)

MariaDB [wmdb]> 

Running a Batch File

Running a batch file is a little like running a Python file, but with some small syntactic differences.

We run a Python file like this:

python foo.py

That causes python to start up, read the code from foo.py and execute it.

In MySQL, there are two ways to read and execute code from a batch file. We'll look at each in the following two sections. In both cases, we will use a batch file called wmdb1-all.sql, the contents of which looks like this:

use wmdb;

select nm,name,birthdate 
from person
limit 10;

Notice the use statement on the first line. That switches to the WMDB, so the person table referred to in the select statement is the one in the WMDB.

the SOURCE command

The first way to "run" this file is to use the source command inside the MySQL interpreter:

$ mysql
MariaDB [cs304guest_db]> source wmdb1-all.sql

Database changed
+----+-----------------+------------+
| nm | name            | birthdate  |
+----+-----------------+------------+
|  0 | Alan Smithee    | NULL       |
|  1 | Fred Astaire    | 1899-05-10 |
|  2 | Lauren Bacall   | 1924-09-16 |
|  3 | Brigitte Bardot | 1934-09-28 |
|  4 | John Belushi    | 1949-03-05 |
+----+-----------------+------------+
5 rows in set (0.00 sec)

MariaDB [wmdb]> quit
$

Notice the output "database changed"; that got printed after the use statement was executed.

I/O Redirection

The second way to run the file is more similar to how we would run a Python program, except that we type a left angle bracket:

python foo.py
mysql < foo.sql

(If you've taken CS 240, this is input redirection from the command shell. If you haven't, you can ignore this.)

Here's how it works for our example batch file:

[cs304guest@tempest mysql1-queries]$ mysql < wmdb1-all.sql 
nm  name    birthdate
0   Alan Smithee    NULL
1   Fred Astaire    1899-05-10
2   Lauren Bacall   1924-09-16
3   Brigitte Bardot 1934-09-28
4   John Belushi    1949-03-05
[cs304guest@tempest part1]$ 

Notice two differences:

  • We are back at the command prompt (the Unix shell). In other words, MySQL starts up, runs our code, and exits. This is similar to Python. With the source command, we are still in the MySQL interpreter.
  • The output is formatted differently (no vertical bars, for example). We'll note some other formatting issues later in the course.
  • It doesn't say that the database has changed. That's because things are printed differently.

Which Method to Use?

Which method of running a batch file should you use? Either is fine. I almost always use the second way, because I can use filename-completion (the TAB key) in the Unix command shell to help me type filenames correctly and efficiently.

WHERE Clauses

A query relies on the WHERE clause to choose which rows to report/return.

The WHERE clause should be a boolean expression: true/false.

Boolean expressions can be combined using AND, OR and NOT, as you would expect. Here are some examples (I've elided the rest of the query statement with ellipses, so you can focus on the boolean expression):

... WHERE B1 and B2 ...
... WHERE B1 or B2 ...
... WHERE not B1 ...

Fairly intuitive, but can easily become complex.

Equal and Not Equal

SQL is an old language and so it uses old-fashioned comparison operators.

Test for equality using a single equals sign, not two. The following clause would find people who are 17:

WHERE age = 17

Test for not-equal using <>. The following clause would find people who are not 17:

WHERE age <> 17

It turns out that MySQL also supports != to test for inequality, so you can use either one.

NULL values

MySQL has a special value called NULL. It is used to represent an unknown value. (Since every row in a table has the same number of values, we need something to put in place of an unknown or unspecified value.) NULL is a little like the None value in Python.

In MySQL, NULL is special:

  • it is not a string, so not 'NULL' but just NULL
  • it has special comparison operators:
SELECT * FROM person WHERE birthdate IS NULL;
SELECT * FROM person WHERE birthdate IS NOT NULL;

Date Functions

The MySQL language has a ton of functions to extract parts of dates and times and to compare them in complex ways. For example:

  • ways to subtract two dates to get a time interval,
  • to add a time interval to a date to get a different date
  • format a date in a variety of ways

Check the online documentation about date and time functions

We will see examples of these in class.

Sorting

Sorting is pretty easy. You can request sorting using the ORDER BY clause, which comes at the very end, just before LIMIT.

You can sort by multiple columns, each it its own direction. If we want to sort by last name and then by first name (pretty common), we can do this:

SELECT first,last
FROM employee
ORDER BY last, first;

If, for some reason, you want reverse (descending) alphabetical order by last name, but regular ascending order by first name, you can do this:

SELECT first,last
FROM employee
ORDER BY last DESC, first ASC;

Ascending is the default, so you can omit it if you prefer.

Summary

In class, you'll get to play with all these things. Hopefully, this reading has helped you get ready. Here are some things to remember:

  • mysql command to start the MySQL interpreter from the Unix shell
  • help command to get help about the MySQL interpreter's commands
  • quit command to exit the MySQL interpreter
  • show databases to see your databases
  • use DB to switch to database DB
  • show tables to see the tables in the current database
  • source FILE to read and execute statements in the given FILE from the MySQL interpreter
  • mysql < FILE to read and execute statements in the given FILE from the Unix shell