Can you please explain further what having equality on a field means?

It's the same equality that you're familiar with. If a row of the Person table has an addedby field that has the value of 17 and a row of the Staff table has a uid field that has the value 17, we can match those two rows by specifying that addedby=uid. It's just equality of the two integers.

That equality condition is used to infer that staff member 17 added that person to the Person table.

Does a table have to have a primary key?

No. It usually does, but not always.

How do we know when using JOIN is necessary?

When the information to answer a query is in multiple tables. For example, "who added James Earl Jones" to the WMDB? requires both the staff and person tables.

Could you explain how a primary key that requires multiple column values works?

Sure. We happen to have two people named Ashley in this class, so "first name" is not unique. It's also not uncommon to have duplicate last names. But for many classes, (first_name, last_name) would be a unique identifier, and hence a key.

More commonly, it happens with intermediate tables that are used to represent many:many mappings; stay tuned.

Is using "join" the same as using "inner join"?

For now, yes. Later, we'll learn about outer joins.

In the example with a pet table (primary key ""pid"") and an owner table (primary key ""oid""), I understand that the ""oid"" in the pet table is a foreign key that links each pet to its owner in the owner table, but I'm confused what happens to foreign keys when we merge these two tables?

When we join two tables, we get all the columns. If we join table A with 5 columns and table B with 3 columns, the join has 8 columns. So the foreign keys are still there. No data is lost.

The join just matches up rows from A with rows from B, with replacement. So if A has 20 rows and B has 10 rows, the join might have as many as 200 rows.

The join condition might discard some of those 200 rows, so it might be fewer. Maybe even only one row, such as the example in the first question.

I understand that the join operation is virtual and joining happens during a query and does not affect the representation of either table. However, does that mean that join does not create a separate table with the two tables merged? Or that it does, but after the query is executed, it does not save it anywhere?

Right; it does not save it anywhere. It "creates" the table for the purpose of the query and then discards it when it is done.

Could you explain again the difference between JOIN and JOIN ON?

the ON keyword is just a way of stating the JOIN condition. Going back to the first example, the following are equivalent:

    
SELECT person.name,staff.name
FROM person,staff 
WHERE person.addedby=staff.uid and nm=469;

SELECT person.name,staff.name
FROM person INNER JOIN staff ON person.addedby=staff.uid
WHERE nm=469;

Could we also review one:many vs many:many mapping? / I'm still confused by the different mappings and how to implement each of them.

The mapping from owners to pets is one:many, and the mapping from pets to owners is many:one. So, we can implemented that by a foreign key indicating the (unique) OID (owner id) for a particular pet.

The mapping from an actor to their movies is one:many. For example, James Earl Jones was in Star Wars, The Lion King, Field of Dreams, and The Hunt for Red October (and many more).

The cast of a movie is similarly one:many.

The mapping from the set of actors to the set of movies is many:many, which we implement with an intermediate table storing the pairs (NM, TT)

I'm confused about the representation of Many Many Relationships and when (in the given example) there are arrows coming from the movies to the 2 actors.

Let's look at it together: many:many with intermediate table

I'm confused about the concept of many-to-many mapping. Using the actors-to-movies example, isn't each actor a unique individual? If that's the case, how can there be a many-to-many relationship between an actor and the movies they've appeared in, since there's only one actor for each movie they participate in?

Great question! Yes, James Earl Jones is a unique actor. He can be in lots of movies. That's one:many.

The set of actors and the set of movies have a many:many relationship between them.

Could intermediate tables be explained a bit more?

Sure. They represent pairs of entities, such as maybe a pair of people. That's how we could represent that two people are friends, or married, or siblings or ...

Can you give and explain an example of cross-product?

A cross-product is the set of all possible pairs of elements drawn from two sets.

If I'm creating partners for class today, I'm drawing from the set of all possible pairs, so I'm drawing from the cross-product.

If we have a graph, the adjacency matrix is the cross-product of the vertices

In class, could you please go over natural joins again with more examples? / In what cases would it be helpful to use a natural join?

I can't think of any good reason to use a natural join. It's at best a shorthand, and at worst a trap.

I suggest avoiding them.

But you should know that they exist.

How does aliasing work with creating intermediate tables and what are its benefits?

We use aliasing when we need to refer to another occurrence of the same table in a single query.

Consider the following Python code:


def all_pairs(list1, list2):
    return [ (x,y) 
             for x in list1
             for y in list2 ]
all_pairs([1,2,3], [10,20,30])
mylist = [1,2,3]
all_pairs(mylist, mylist)

How would you join more than two tables together?

Just keep going:


SELECT person.name,staff.name,movie.title
FROM person INNER JOIN staff ON person.addedby=staff.uid
            INNER JOIN movie ON person.nm=movie.director
WHERE nm=1392;

I have a question on the "Join USING a shared field". You mention that we can join on fields that have the same name in two tables which makes sense. But in the future, say we have multiple tables that each may have many columns - is there an efficient way to keep track of the field names in each table? Or do we just have to call each table and see where the field names match up?

It's a great question. For another project, I have a big Google doc describing all the tables and all the field names so that when I forget, I can look them up to determine their meaning, usage and other documentation.

Documentation is crucial.

I think I am just a little confused overall but I think putting it into practice in class will help

I think you will.