Welcome!
Everything is fine.

Groups and Subqueries

Today, we'll learn about Grouping and Subqueries.

Plan

  1. (15) Review reading
  2. (10) quiz questions
  3. (20) Group work
  4. (10) Discussion
  5. (20) Continue group work

Announcements

  • Unix is graded and copied to Sakai. If you haven't done it yet, that's fine, but please talk to me about your plan.
  • Most of you have done the self-introduction. If you haven't, please do!
  • Wednesday meetings moved to L240
  • Anonymous Feedback form
    • I'll review often (at least weekly)
    • I'll respond in class as appropriate

Recap

Last time, we learned a few things about:

  • joining tables
  • keys and foreign keys

Questions on any of that?

New Partners for Today

  • 2 Makana+Malinda
  • 4 Maya+Vaishu
  • 6 Ariel+Nicole
  • 8 Linh+Sophia
  • 10 Fangzhong+Mukhlisa
  • 12 Ali+Flora
  • 14 Claire+Maddie
  • 16 Indira+Sana
  • 18 Anjali+Jasmine
  • 20 Bella+Diya
  • 22 Chloe+Sophie
  • 24 Ashley You+Kathy
  • 26 Hannah+Yuneydy
  • 28 Ashley Cong+Winnie
  • 30 solo DaisyTBD

Groups and Subqueries Review

About a third of you are "undecided" about this material, so I'll review as needed.

Let's focus on concepts first.

  • Consider a table of Wellesley students, with column for year and column for major
  • we could group by year and count the number of students in each year
  • we could group by major and count the number of students in each year
  • Or compute other aggregate values like average GPA to answer questions like:
    • do some departments have higher (lower) GPA than others?
    • does GPA go up over time? Or down?

Now, let's expand to think about subqueries. Imagine a subquery that uses grouping to compute the average GPA per major:

major gpa
arch 3.1
art 3.2

We can now have queries like:

  • find all students who have a GPA higher than average for their major
  • find all students whose have a GPA higher than that of every major
  • find the major with the highest GPA

groups-and-subqueries

Quiz Questions on Groups and Subqueries

More of you felt good about this than not, but most were in the middle.

I'll address your quiz questions

Code for Today

You can get started by copying the files for today like this:

cd ~/cs304
cp -r ~cs304/pub/downloads/mysql3-groups mysql3-groups
cd mysql3-groups

Breakout 1

Chat among yourselves.

There are many examples in this folder. Explore and experiment:

  • group1.sql counts movies since 2010, grouped by year. Edit this to remove the WHERE clause to see all years.
  • group2.sql shows you can also group by a boolean, yielding three groups: true, false, null
  • group3.sql is a nonsense query because it doesn't have uniformity across the groups

Exercise

In the cs304_db database, there is a table called cgm. It has diabetes data. The important data is the level of blood glucose, measured in milligrams (mg) per deciliter (dl), stored in the column mgdl. Other columns should be self-explanatory.

Write queries to determine:

  • How many users are in the database?
  • How many different years are in the database?
  • How many data values in each year?

More exercises using the CGM table:

  • Do we have any duplicate timestamps, maybe due to errors from sensors or upload?
  • Do we have any readings that are in disagreement: same timestamp but different data?
  • Do we have any readings that are literally redundant? same timestamp and same data?

Exercise 2

In the wmdb, find movies that have at least 8 actors in them.

Examine/Discuss

In your breakout group, you should look at these examples, try them, and discuss:

  • having{1,2,3}.sql looks at the CGM table looking for duplicates
  • having4.sql finds movies with >= 8 actors

Exercise 3

In the cs304_db, find the lowest value of mgdl.

Does this make sense?

Discussion (Together)

The task of finding the lowest value of mgdl was extra challenging because of the datatypes. We'll talk more about this when we talk about DDL, next time.

Finding when the low happened is tough, too.

The following finds the low:

use cs304_db;
select min(mgdl) from cgm2;

The following does not find when the low happened:

use cs304_db;
select date_time,min(mgdl) from cgm2;

See:

use cs304_db;
select * from cgm2 where date_time = '2014-05-27 22:48:00';

(See select3.sql)

Searching for the Low

The following works, but it assumes that we know what the low is:

use cs304_db;
select * from cgm2
where mgdl = 29;

(See select4.sql)

Subqueries to the Rescue!

This works great:

use cs304_db;
select * from cgm2
where mgdl = (select min(mgdl) from cgm2);

See select5.sql

Breakout Rooms

On your own, explore the other batch files:

  • peter-jackson-movies.sql
  • movies-with-actors.sql
  • movies-with-n-actors.sql
  • movies-without-actors.sql
  • scott-movies.sql
  • count-dups.sql

Activities

You can spend the rest of class continuing with this work, above, chatting among yourselves, or working on the Queries homework.