One row per group. If we group the students in this class by class year, we get 2 groups: juniors and seniors.
Each row might have, say, average GPA or average height.
Correct
We can also compute an aggregate function the base case group of everyone, so we could compute the average GPA or height of people in this class by just not grouping.
Great question, because it's something people are often confused by. Suppose we group the students in this class by class year, so two groups: juniors and seniors. What information could we report? We could report the size of the group (the count), the average/min/max of various quantities.
How about things like name and birthdate? What is the birthdate of the juniors? What would that mean?
It's not meaningful because "birthdate" is not uniform over the group.
We can ask questions like class color, because that is uniform over the group.
In MySQL, sometimes that gives an error and sometimes that gives the birthdate of an arbitrary member of the group. Our version of MySQL does the latter, which is not ideal.
Yes. So if we group by class year and then by birth month, we could get 2*12 or 24 groups.
Good question. Let's look back at the section on having
WHERE clauses look at a single row at a time. HAVING clauses look at a single GROUP at a time.
If there were a column for "COUNT" in the table, then, sure, we can use it in the WHERE clause. Maybe the SOUTH region had 17 sales that totaled $11,000 on 2019-09-09. We can use that.
But otherwise, there isn't a COUNT property of a single row of the sales table. (Or it's vacuously equal to 1.)
But a GROUP BY clause can group all the sales, say by region, and then report the COUNT or number of sales in each region.
If we are only interested in regions that have more than 10 sales, we can use a HAVING clause to do that.
The HAVING clause is mostly used with Aggregate functions, since those are properties of the GROUP. But let's pick a property like HEIGHT. Can a person have an average height? Sorta, but not really.
But groups can have an average height. And we could use the HAVING clause to choose groups whose average height has some property.
select name from person where not exists (select * from credit where credit.nm = person.nm);For the query above, why don't we have to give the subquery a temporary name or alias with AS, since it returns more than one column? Is this only required with FROM?
Right. In this case, we don't need to give it a name, because it's used only by the NOT EXISTS.
Sure. Here's the subquery with table example.
Here, we need an alias (T1) because we're going to refer to it in the WHERE clause.
Let's think about T1: those are titles and director ID of movies that Scott added. It's a subset of the whole Movie table.
The outer query then joins that with the Person table to provide the name of the director.
You could also do that without a subquery:
Note (and this is important), that we can't just say addedby=1
in the WHERE clause, because there are two addedby
columns: one in person
and one in movie
, and the query is about movies added by Scott, whether or not the director was added by Scott.
We could, however, have abbreviated
the person.nm=movie.director
to nm=director
, since both of those columns are
unique in the join.
Use them when that's how you think about the query. The importance is about clarity, not efficiency.
SQL is a fourth generation programming language, where you say what you want rather than how to compute it. (Though definitions vary.)
Usually, this arises where the subquery finds a set of entities fitting some condition and the outer query uses (as part of its criteria) whether something is IN or NOT IN that set.
Let's use the queries above, modifying them a little. So, now the query is "names of people who directed movies added by Scott." Using subqueries:
Yes!
Sometimes.
We are going to trust the query optimizer.