MySQL Summary

This is a brief summary of some of the concepts and techniques that we've learned with respect to Tempest and MySQL in the first few class meetings. It's arranged in order by when the material is presented. During those first course meetings, it's a good idea to look over what we've done and what we will do the next day.

Part 1: Getting Started with Simple Queries

  • databases comprise tables of rows of columns of different kinds of info
  • under the hood, databases have a data structure that allows rows to be easily searched, inserted, and updated. Examples: b-trees and hash tables.
  • rows often inserted/removed. Columns rarely are.
  • client-server architecture
  • Tempest is the CS server, hosting a MySQLdb server

Unix Commands

  • ls lists the contents of a directory; default the current one
  • ls -l lists more info
  • rm deletes a file
  • up-arrow or ^-p to go to previous command
  • cd changes the current directory
  • more file types out the given file
  • curl -O url downloads the given URL
  • tar xf file unpacks the given tarfile
  • mysql starts the MySQL client (the command line interface)
  • mysql < file runs mysql with input from given file

SQL Commands

  • select user() shows who you are logged in as
  • quit to exit
  • help
  • show databases shows the databases you have
  • source file reads commands from the given file
  • use database makes that database the default one
  • show tables shows tables in the default database
  • SELECT query:
     
    SELECT col1, col2, ... or * 
    FROM table 
    WHERE boolean expression
    LIMIT number of row
    
    the last two clauses are option.

Part 2: Creating Tables, Inserting, Updating and Deleting Data

  • tables need a name, named columns, with datatypes
  • Usually should have a primary key
  • primary key is guaranteed unique and guaranteed to be the fastest lookup
  • datatypes determine semantics of operations and storage requirements
  • special value NULL for missing data.
  • NULL is not equal to any value. Special predicates to search for it.
  • values for primary key must be unique
  • primary key values are often just arbitrary counters, sometimes generated by auto_increment.
  • There are many additional datatypes (unsigned integers, integers of different sizes, enums (one of a set of options), sets (a subset of a set of options)
  • SQL statements to insert, update, and delete data.
  • the mysqldump command gives a snapshot of the state of the database at the moment the dump is taken, in a text file that can be used to re-create the state of the database at that moment. The dump is not the database.

SQL Code

  • CREATE table statement:
     
    CREATE TABLE tablename  ( 
        colname  datatype, 
        colname  datatype, 
        colname  datatype, 
        PRIMARY KEY (colname) 
    );
    
  • describe table describes the given table
  • datatypes:
    • int for arithmetic
    • float for arithmetic
    • char(width) for a fixed number of characters
    • varchar(width) for a variable number of characters up to some fixed limit
    • datetime for storing both date and time
    <li><code>INSERT</code> data:
     
    insert into tablename(col1,col2...coln) values (val1,val2...,valn); 
    
  • is null or is not null to find or skip null values
  • auto_increment is an option for values to be generated by a counter in the database, say for ID values.
  • UPDATE statement:
     
    update tablename set col1 = val1, col2=val2;
    where conditional
    

    the optional WHERE clause often specifies the primary key of the row to update.

  • DELETE statement:
     
    delete from tablename
    where conditional
    

    the optional WHERE clause often specifies the primary key of the row to delete

Part 3: Joins

  • A join combines rows from two or more tables.
  • Almost always, the join combines rows that are semantically related, such as a pet and its owner, or a student and their courses.
  • Joins are how relational databases do 1-many relationships (lists) and many-to-many relationships.
  • Examples to keep in mind: movies and directors (many to one) and movies and actors (many to many)
  • Connections between tables are represented by foreign keys: a foreign key is a key in some other table.
  • Lists are backwards: each of the many points to the one, as opposed to the one having a list of the many.
  • Conceptually, a join can also be thought of as a subset of a cross-product. A cross-product combines all pairs of rows.
  • Many-to-many relationships require an intermediate table, listing pairs of primary keys. For example, the credit table lists pairs of an actor ID (NM) and a movie id (TT) where the pair (x,y) means that actor with NM=x acted in move with TT=y
  • special syntax for joins

SQL

  • columns can be (re)named in a query:
     
    select a.name as 'scoop1', b.name as 'scoop2' 
    from flavor a, flavor b; 
    
  • Join on a condition:
     
    select * from A inner join B on (A.x = B.y) WHERE ...     
    select * from movie inner join person on (movie.director = person.nm) WHERE ...     
    
  • Join on equality of a condition:
     
    select * from A inner join B using (x) WHERE ...     
    select * from movie inner join credit using (tt) WHERE ...     
    select * from person inner join credit using (nm) WHERE ...     
    select * from person inner join credit using (nm) inner join movie using (tt) WHERE ...  

Part 4: Complex Queries

Grouping

  • You can group rows, reporting one result per group.
  • For example, grouping current Wellesley students by graduation year would yield just a few rows (one for each class year, plus Davis Scholars and a few others).
  • Can also group by booleans: one group for true and one for false
  • Sometimes you can report values that vary over the group, but you'll only get one arbitrary value, which is not what you want. You should only report aggregate values, like counts, sums, averages and such.
  • Can filter groups using a having clause.

Subqueries

  • You can use a subquery where the outer query is looking for a (1) value, (2) list, or (3) table.
  • Example of (1) looking for a value:
     
    select * from movie 
    where director = (select nm from person where name = 'Peter Jackson'); 
    
  • Example of (2) looking for a list:
     
    select name from person inner join credit using (nm) 
    where tt in  
        -- returns a list 
        select tt from movie where director = 1392; 
    
  • With a sub-query that returns a list, we can use predicates like IN and NOT IN to see if a value occurs in the list. You can see that in the example above.
  • Example of (3) looking for a table. Here the subquery is aliased to t1 so it's like a temporary table named t1. Here, the subquery is a table of titles and director ids of movies added by Scott (UID=1). The outer query joins with the Person table to find the name of the director.
     
    select name, title 
    from person, (select title, director from movie where addedby = 1) as t1 
    where person.nm = t1.director; 
    
  • With a sub-query that returns a table, we can use predicates like exists and not exists to see if the subquery returns any results. For example: People who don't have acting credits:
     
    select name from person 
    where not exists (select * from credit 
                      where credit.nm = person.nm) 
    

Sorting

  • Databases are good at sorting
  • the ORDER BY clause does it.
  • the ORDER BY clause is almost last, after everything except LIMIT.
  • can sort by more than one column, each ascending (ASC) or descending (DESC)

SQL

  • Counting rows: select count(*) from table
  • Eliminate duplicates by using the distinct keyword:
     
    select `release` from movie;             # many duplicates 
    select distinct `release` from movie;    # no duplicates 
    
  • Grouping reports one row/result per value of the grouping criterion. The following example gives the release years and the number of movies in that year.
     
    select `release`,count(*) 
    from movie 
    group by `release` 
    
  • Common aggregate functions: count(), sum(), max(), min(), avg()
  • Having clauses allow you to choose which groups to report. The following only reports years where we have more than 10 movies.
     
    select `release`,count(*) 
    from movie 
    group by `release` 
    having count(*) > 10; 
    
  • Sorting:
     
    select name, birthdate 
    from person 
    order by name ASC, birthdate DESC; 
    
  • Wildcards. This finds all names starting with "George"
     
    select name,birthdate 
    from person 
    where name like 'George%'; 
    

Part 5: Referential Integrity

  • Use InnoDB for both tables
  • construct an index for the foreign key
  • specify the foreign key constraint with policies for update and delete
use webdb;

-- drop in reverse order
drop table if exists track;
drop table if exists album;

create table album ( 
    aid int auto_increment not null primary key, 
    title varchar(50),
    artist varchar(50)
    ) 
ENGINE = InnoDB;

create table track ( 
    aid int not null, 
    tid tinyint not null comment 'track id', 
    title varchar(50), 
    primary key (aid,tid),
    -- necessary for ref integ
    INDEX (aid),
    foreign key (aid) references album(aid) 
        on update restrict 
        on delete restrict
    ) 
ENGINE = InnoDB;