Can you group by things that are not columns?

Yes. You can compute a value from the row and then group by the values of that. We'll see an example today, where we group by the values of a boolean expression.

How do you add new columns to a table after creating it and using it for a while?

You can do that with ALTER TABLE. We'll do that later in the course.

If there are no NULL values, will COUNT(*) and COUNT(col_name) return the same values?

Excellent point! Let's try it in the WMDB:

 use wmdb;
 select count(*) from person;
 select count(*) from person where birthdate is null;
 select count(birthdate) from person;
 select count(birthdate) from person where birthdate is not null;

Can you give another example of group by?

We could group students in this class by year, by dorm, by age, by country, by ...

Now, imagine computing a property of the group, say average GPA or average height.

Do seniors have better grades than juniors? Are people from the US taller than people from China?

Could you explain the uniformity across a group concept? Specifically, this sentence: "When grouping, the columns should values that are uniform (the same) for every member of the group." Does this mean that columns need to have the same amount of values for each member in the group OR have the same values?

Here's a common mistake:

select name from students group by year

Let's think about what that means

SELECT region, avg(sales)
FROM sales_data
GROUP BY region;

For this example, how would the dates be arranged/aggregated?"

Does it matter? Why would it matter? The dates are "squeezed out" when we group by region, because we end up with one aggregate value per region, averaging over all dates.

I read over the page on aggregate functions but it doesn't look like there's one for subtraction even though there's an aggregate function for sum. Is there any reason why there isn't a function for subtraction?

Interesting question. If I have a group of just three numbers: 1,2,3 I can define what I mean by the sum but how would you define the subtraction?

Instead of using subqueries that return a table inside a query, isn't it more readable to put them outside of the query and set it to a variable (where tableName as () ) to refer to in the query?

Readability is sometimes a matter of taste. Many students find subqueries more readable, but not all. Decide what works for you.

Can you explain the following reading example. I'm having trouble understanding why it's possible to list credit, movie, and person as p2.

select distinct p1.name
from person as p1,credit,movie,person as p2
where p1.nm=credit.nm
    and credit.tt=movie.tt
    and movie.director=p2.nm
    and p2.name = 'Clint Eastwood';"

This is an important example, because it really takes some getting used to.

We need to use the Person table twice because one of them will be Clint and the other will be an actor in a Clint Eastwood movie.

A generalization of this might be pairs of actors (co-stars) or pairs of pets (our breeding pairs example).

In the SUBQUERY vs. JOIN implementation of finding actors who have been directed by Clint Eastwood,

-- actors who have been directed by Clint Eastwood,
-- implemented as a join
use wmdb;
select distinct p1.name
from person as p1,credit,movie,person as p2
where p1.nm=credit.nm
    and credit.tt=movie.tt
    and movie.director=p2.nm
    and p2.name = 'Clint Eastwood';

Why do to check the person column twice in the from clause? When would we put the same column twice in the from clause? Why do we even need to check that movie.director = p2.nm when each movie has a unique director?"

We need to use the PERSON table twice because we have two people in the result. When matching items, we need to enforce the join conditions.

Can we go over the Clint Eastwood example that uses join?

In the "Subquery versus Join" section of the reading, could you go over the first solution (the one using JOIN)? I am having trouble figuring where/what is being joined.

I don't think I understand the ""-- actors who have been directed by Clint Eastwood, -- implemented as a join"" example, can we please go over this in class? Thanks.

I'll go over this as many times as people need.

Could you explain the code for Subquery vs. Join more? I'm kind of confused by what is distinct and what isn't and how that impacts the code.

Very often, the way we structure a query yields duplicates. For example, if we list actors in Peter Jackson movies, we'll get Orlando Bloom like six times. But maybe we only want to list him once.

In the query that looks for actors who have been directed by Clint Eastwood (implemented as a join), why do we need two aliases for the person table (person as p1, person as p2)?

I'm still a bit confused about exists and not exists, can we go through more examples in class?

Sure, we'll see some.

What are some good ways to debug the code, especially when inner queries are used?

Great question! In general, databases are too large to test a query "by hand" so you have to have confidence in your logic. That's part of our goal in this course.

Nevertheless, testing your assumptions with various queries, particularly simpler ones, is helpful.