Database Queries¶
We'll start our work with databases learning about queries: getting information out of a database. But first, some overview to help you get oriented.
- Database Concepts
- Relational Databases
- Aside on Representation
- Relational Database properties
- Other Databases
- Client-Server Architecture
- Tempest: the CS server
- The Wellesley Movie Database
- SQL versus a GUI
- Why not a GUI?
- The SELECT Statement
- Examples of Queries
- Syntax Notes
- Batch Files
- Comments
- Example Batch File
- The MySQL Interpreter
- Practical Use of the MySQL client
- The USE statement
- SHOW DATABASES
- SHOW TABLES
- Running a Batch File
- the SOURCE command
- I/O Redirection
- Which Method to Use?
- WHERE Clauses
- Equal and Not Equal
- NULL values
- Date Functions
- Sorting
- Summary
Database Concepts¶
Lots of things store data. For example, a Microsoft Word file can have data in it. That's not what we mean by a database.
A database stores data efficiently:
- fast to look up data, particularly using certain keys
- fast to update/delete data
- frugal with space
As disks get larger, being space-efficient has become less important, but space-efficiency can result in greater speed if less data needs to be moved from disk into memory or sent across networks.
A Database Management System (DBMS) allows you to create and manage a set of databases. For example, a database of employee records, a database of student registratrations, and a database of helpdesk reports, might all co-exist in a single DBMS.
MySQL is a DBMS and there is one installed on Tempest. You will get
your own database within it. If your username is sk13
, your database
will be named sk13_db
. There are also databases that we will share
for the purpose of this course: cs304_db
and wmdb
among others.
Relational Databases¶
MySQL is a relational DBMS. That means
- data is stored as rows in a table
- a table is also called a relation
- each row comprises one or more columns with different kinds of info
- every row has the same columns
Name | BID | House | class year |
---|---|---|---|
Hermione | B123123123 | Gryffindor | 1998 |
Ginny | B234234234 | Gryffindor | 1999 |
Aside on Representation¶
Under the hood, databases have an Engine that are a data structure that allows fast access where most of the data (rows, AKA records) is on disk, rather than in memory. Examples:
Both of these data structures are really cool. You probably already know hash tables. If we have time, we'll talk about B-trees.
Relational Database properties¶
- columns are rarely added/removed. rows routinely are.
- each column cell is a single piece of information, not a list.
- So, there's no "list of courses" column for Hermione saying that she's taking ["arithmancy", "potions", "transfiguration", ...]
- There can be multiple tables. We'll talk about Hermione's list of courses a bit later in the course.
Course | Prof | location |
---|---|---|
Potions | Snape | Dungeon |
Divination | Trelawney | Astronomy Tower |
Other Databases¶
- MySQL is not the only DBMS.
- Microsoft Office has MS Access, which is a desktop RDBMS suitable for a single user.
- There are also non-relational DBMS systems (so-called NoSQL systems) such as MongoDB.
- MySQL is designed to handle multiple concurrent users, unlike MS Access
Client-Server Architecture¶
MySQL is designed with a client-server architecture.
- There is a single daemon process (called mysqld) that is the only process that has direct control of the disks and the data on the disks.
- Different clients connect to the server to operate on their data.
Tempest: the CS server¶
- running MySQL 5.5.60
- (Actually, MariaDB, but virtually indistinguishable)
- There are several databases for this course:
- cs304_db: a bunch of example tables
- wmdb: a fleshed-out example that we'll use all semester
- You will have your own database: youracct_db
- We'll use the command line a lot
- For us, both the MySQL server and MySQL clients are on Tempest
The Wellesley Movie Database¶
- The database we will be playing with is based on my favorite database, namely the Internet Movie Database (IMDB)
- There are pages for actors, pages for movies, and links between them.
- The WMDB has been built by student contributions over many offerings of CS304. The data is not guaranteed.
Please read more about the WMDB
SQL versus a GUI¶
There is a graphical user interface (GUI), actually many of them, for working with databases, but we won't use one. We'll use the industry-standard Structured Query Language, called SQL
If you want a GUI for your queries, MS Access has a pretty good one, called Query By Example (QBE).
There are many online tutorials for SQL, if you want additional information. You might start with W3Schools SQL tutorial
An excellent reference is https://dev.mysql.com/doc/refman/5.5/en/tutorial.html
Why not a GUI?¶
I like GUIs as much as most people. GUIs have many advantages:
- menus remind you of your options: no memorization
- typos are impossible
- syntax is guided; much harder to make mistakes
So, why not a GUI?
- Can't be written down and automated
- Can't be programmed (well, difficult...)
Since we will be programming with SQL, we should practice whenever possible.
The SELECT Statement¶
A query gets information out of a table (or multiple tables — we'll learn that later).
(A SQL statement that inserts, updates or deletes information in a table is sometimes called a query, but that's just informal usage.)
Here's the basic outline of a SQL query:
SELECT col1,col1,... or *
FROM table
WHERE boolean expression;
The first line says which columns from the table you want returned. The asterisk is a wildcard meaing "all columns". I suggest avoiding the asterisk in code, because code should be clear about what columns are wanted and in what order, but I use * all the time when using MySQL interactively.
The FROM clause comes second. Clauses in a SQL statement always go in a particular order. The FROM clause says what table you are searching through.
The WHERE clause comes third. It gives a boolean expression to filter the rows of the table. The query returns all rows for which the boolean expression is true. That might be no rows at all, all of them, or a subset.
The boolean expression can use any column from the table, constants, and any of a large number of MySQL functions and operators.
There are many additional clauses that can be added to the SELECT statement; we'll learn more of these later. One that we will use immediately is the LIMIT clause, which goes last, and says "no matter how many rows this query returns, only give me the first N."
SELECT col1,col1,... or *
FROM table
WHERE boolean expression
LIMIT 10;
Examples of Queries¶
Here are some examples of queries. We'll play with a bunch of examples in class.
- return all rows and columns from the Movie table:
SELECT * FROM movie;
- return just the titles of just the first 10 rows of the Movie table:
SELECT title
FROM movie
LIMIT 10;
- return names and birthdates of all people in the Person table:
SELECT name,birthdate FROM person;
- return name of each person in the Person table born on Sept 2, 1967:
SELECT name
FROM person
WHERE birthdate='1967-09-02';
Syntax Notes¶
SQL keywords are not case sensitive. I only capitalized the keywords to let them stand out.
SQL doesn't care about line breaks, but it's common to put keywords that begin clauses at the beginning of the line to help the reader.
If you want all the details about the SELECT statement, you can read the reference page on SELECT on the MySQL site. However, I suggest you put it off for now; that reference page is very complex.
Our course resources page has lots of links to reference pages.
Batch Files¶
A batch file is just a plain text file (like a Python or Java file) that contains SQL code that we can run, again like a Python file.
They are called batch files because we can put any number of SQL statements in a file and run them in one batch. Don't let the terminology confuse you. They are just plain files, and you can create them with any editor, such as Visual Studio Code, Atom, Sublime, Emacs or whatever.
We will play with quite a few example batch files in class.
Your solutions to assignments during the "MySQL" part of the course will be a collection of batch files.
Comments¶
One nice thing about a batch file is that you can comment your code. MySQL supports three kinds of comment syntax:
#
comment to end of line, as in Python/* */
multi-line comment as in C and Java--
comment to end of line
I have typically used the three character sequence --
: two hyphens
and a space. The space is required in MySQL (though not in standard
SQL). You may use any of these.
Example Batch File¶
Here is an example of a batch file. You can download it at this link: example batch file
use wmdb;
/* An example batch file. This SQL code returns (prints) the
id, name and birthdate of the first 10 people in the WMDB.person
table. */
select nm,name,birthdate
from person
limit 10;
The MySQL Interpreter¶
One of the things I like about Python is that you can play with little bits of code in an interactive way. If I want to remind myself whether integer division truncates or gives me a floating point value, I can check:
$ python3
>>> 1/2
0.5
(The answer would be zero in Python 2, but the example above used Python 3.)
We can do the same thing with MySQL! If we run the command mysql
, an
interactive program, called the MySQL interpreter or shell starts up,
connects to the MySQL server, and enters a loop where it reads a
command or SQL statement, executes it, and prompts you for another
input (just like Python).
Note that the MySQL interpreter is a client, using the client-server terminology we learned earlier.
In the example below, we type the use wmdb
, show tables
and
quit
commands:
$ mysql
MariaDB [cs304_db]> use wmdb;
Database changed
MariaDB [wmdb]> show tables;
+----------------+
| Tables_in_wmdb |
+----------------+
| credit |
| movie |
| person |
| staff |
+----------------+
4 rows in set (0.00 sec)
MariaDB [wmdb]> quit
$
Note that the prompt starts with MariaDB
which is the formal name
for the version of MySQL that we are using.
Practical Use of the MySQL client¶
- commands often end in a semi-colon
- commands that don't finish can be continued on the next line
- if you messed up a previous line, just end the command by entering a semi-colon, ignore the error, and go on.
- you can re-use a prior line by hitting up-arrow or control-p. You can do this in the shell, too.
- you can edit the prior command before submitting it (by pressing
enter
) - in a pinch, you can do control-c, which will kill the client (not the server) and you can start over
- you can quit the client with control-d
Try these!
The USE statement¶
We said much earlier that a MySQL DBMS can manage many databases, and a database is just a collection of tables. You can think of this as files and folders, except that it's not recursive: there's just one flat collection of databases, each of which contains tables.
In the MySQL on Tempest, you will have your own personal database
(youracct_db
), and you'll have access to several databases that we
will share in the course, including:
cs304_db
wmdb
The way to switch from one database to another is done by the USE
statement. So, when I did use wmdb;
in the example above, MySQL
switched me from cs304_db
to the wmdb
.
SHOW DATABASES¶
To find out what databases you have permission to access, you can use
the show databases
command:
MariaDB [cs304guest_db]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| cs304_db |
| cs304guest_db |
| movie_lens_db |
| mysql |
| studreg |
| test |
| webdb |
| wmdb |
| wmdb2 |
+--------------------+
10 rows in set (0.00 sec)
MariaDB [cs304guest_db]>
SHOW TABLES¶
For whatever database you are in, you can find out the tables in that
database with the show tables
command. We saw that above. Here it is
again:
MariaDB [cs304guest_db]> use wmdb;
Database changed
MariaDB [wmdb]> show tables;
+----------------+
| Tables_in_wmdb |
+----------------+
| credit |
| movie |
| person |
| staff |
+----------------+
4 rows in set (0.00 sec)
MariaDB [wmdb]>
Running a Batch File¶
Running a batch file is a little like running a Python file, but with some small syntactic differences.
We run a Python file like this:
python foo.py
That causes python to start up, read the code from foo.py
and
execute it.
In MySQL, there are two ways to read and execute code from a batch
file. We'll look at each in the following two sections. In both cases,
we will use a batch file called wmdb1-all.sql
, the contents of which
looks like this:
use wmdb;
select nm,name,birthdate
from person
limit 10;
Notice the use
statement on the first line. That switches to the
WMDB, so the person
table referred to in the select statement is the
one in the WMDB.
the SOURCE command¶
The first way to "run" this file is to use the source
command inside
the MySQL interpreter:
$ mysql
MariaDB [cs304guest_db]> source wmdb1-all.sql
Database changed
+----+-----------------+------------+
| nm | name | birthdate |
+----+-----------------+------------+
| 0 | Alan Smithee | NULL |
| 1 | Fred Astaire | 1899-05-10 |
| 2 | Lauren Bacall | 1924-09-16 |
| 3 | Brigitte Bardot | 1934-09-28 |
| 4 | John Belushi | 1949-03-05 |
+----+-----------------+------------+
5 rows in set (0.00 sec)
MariaDB [wmdb]> quit
$
Notice the output "database changed"; that got printed after the
use
statement was executed.
I/O Redirection¶
The second way to run the file is more similar to how we would run a Python program, except that we type a left angle bracket:
python foo.py
mysql < foo.sql
(If you've taken CS 240, this is input redirection from the command shell. If you haven't, you can ignore this.)
Here's how it works for our example batch file:
[cs304guest@tempest mysql1-queries]$ mysql < wmdb1-all.sql
nm name birthdate
0 Alan Smithee NULL
1 Fred Astaire 1899-05-10
2 Lauren Bacall 1924-09-16
3 Brigitte Bardot 1934-09-28
4 John Belushi 1949-03-05
[cs304guest@tempest part1]$
Notice two differences:
- We are back at the command prompt (the Unix shell). In other words,
MySQL starts up, runs our code, and exits. This is similar to
Python. With the
source
command, we are still in the MySQL interpreter. - The output is formatted differently (no vertical bars, for example). We'll note some other formatting issues later in the course.
- It doesn't say that the database has changed. That's because things are printed differently.
Which Method to Use?¶
Which method of running a batch file should you use? Either is fine. I almost always use the second way, because I can use filename-completion (the TAB key) in the Unix command shell to help me type filenames correctly and efficiently.
WHERE Clauses¶
A query relies on the WHERE clause to choose which rows to report/return.
The WHERE clause should be a boolean expression: true/false.
Boolean expressions can be combined using AND, OR and NOT, as you would expect. Here are some examples (I've elided the rest of the query statement with ellipses, so you can focus on the boolean expression):
... WHERE B1 and B2 ...
... WHERE B1 or B2 ...
... WHERE not B1 ...
Fairly intuitive, but can easily become complex.
Equal and Not Equal¶
SQL is an old language and so it uses old-fashioned comparison operators.
Test for equality using a single equals sign, not two. The following clause would find people who are 17:
WHERE age = 17
Test for not-equal using <>
. The following clause would find people
who are not 17:
WHERE age <> 17
It turns out that MySQL also supports !=
to test for inequality, so
you can use either one.
NULL values¶
MySQL has a special value called NULL. It is used to represent an unknown value. (Since every row in a table has the same number of values, we need something to put in place of an unknown or unspecified value.) NULL is a little like the None value in Python.
In MySQL, NULL is special:
- it is not a string, so not 'NULL' but just NULL
- it has special comparison operators:
SELECT * FROM person WHERE birthdate IS NULL;
SELECT * FROM person WHERE birthdate IS NOT NULL;
Date Functions¶
The MySQL language has a ton of functions to extract parts of dates and times and to compare them in complex ways. For example:
- ways to subtract two dates to get a time interval,
- to add a time interval to a date to get a different date
- format a date in a variety of ways
Check the online documentation about date and time functions
We will see examples of these in class.
Sorting¶
Sorting is pretty easy. You can request sorting using the ORDER BY
clause, which comes at the very end, just before LIMIT
.
You can sort by multiple columns, each it its own direction. If we want to sort by last name and then by first name (pretty common), we can do this:
SELECT first,last
FROM employee
ORDER BY last, first;
If, for some reason, you want reverse (descending) alphabetical order by last name, but regular ascending order by first name, you can do this:
SELECT first,last
FROM employee
ORDER BY last DESC, first ASC;
Ascending is the default, so you can omit it if you prefer.
Summary¶
In class, you'll get to play with all these things. Hopefully, this reading has helped you get ready. Here are some things to remember:
mysql
command to start the MySQL interpreter from the Unix shellhelp
command to get help about the MySQL interpreter's commandsquit
command to exit the MySQL interpretershow databases
to see your databasesuse DB
to switch to database DBshow tables
to see the tables in the current databasesource FILE
to read and execute statements in the given FILE from the MySQL interpretermysql < FILE
to read and execute statements in the given FILE from the Unix shell