
Welcome!
Everything is fine.
Relational Algebra, Normalization and Views
Plan¶
- Announcements
- Discuss
- Recap Topics
- Quiz Questions
- Discuss Relational Algebra and do some exercises
- Breakout session activities
Announcements¶
- Grading status
- Schedule:
- presenations on Tuesday
- Senior Poster Fair next Friday, 3-4 in H302
- Project:
- Alpha should be a fairly complete app.
Views, Relational Algebra and Normalization¶
Today, we'll discuss three somewhat related topics:
- views
- relational algebra
- normalization
Views¶
A view is like a virtual table:
use cs304_db;
drop view if exists filmography;
create view filmography(nm,name,title) as
select nm,name,title
from person
inner join credit using (nm)
inner join movie using (tt);
Discussion of Views¶
What are the pros and cons of a view versus creating a real filmography
table?
Classic time/space tradeoff
- real table saves time by not re-computing the query
- view saves space by not storing data. Disk space is increasingly cheap, though.
- real table has to keep the data updated, though. If the data changes slowly (actor filmographies) and is searched often, table can make sense.
- maybe you update it as needed or less often than that, so real table might be out of date
Relational Algebra¶
What's the big deal?
a(b+c) vs ab+ac
- algebraic rules let us transform one expression into another that means the same thing
- relational algebra is the theoretical underpinning of relational databases
- Thus, the DBMS can turn one query into another
- The DBMS can estimate the run time of each such query, and
- choose the fastest. This is called query optimization.
Tables are sets of tuples. Every expression denotes a set of tuples.
Notation¶
Naturally, relational algebra has its own notation:
- select (sigma): σcriteria(R)
- project (pi): πattributes(R)
- union: R ∪ S
- set difference: R - S
- cross product: R × S
- intersection: R ∩ S
- join (bowtie): R ⋈condition S
Join can be defined by cross-product and selection but is typically defined as a primitive.
Union Compatible¶
Same number of attributes and corresponding ones are the same type.
Doesn't have to be sensible. Example:

Quiz Questions¶
I'll address your quiz questions
Exercise on Relational Algebra¶
These all use WMDB relations.
Note that it's okay to name parts of the expression using something like variables.
- Names of people born on September 2, 1967.
πname( σbirthdate='1967-09-02'(Person))
- titles and years of all movies directed by Peter Jackson and
released in the 1990s
M90 = πtt( σrelease='199%'(Movie))
MPJ = πtt( σname='Peter Jackson'(Movie ⋈ nm=directorPerson))
πtitle,year (Movie ⋈ (M90 ∩ MPJ ) ) - Titles of movies directed by someone under 50 and released
after 2010.
Mrecent = πtt( σrelease>'2010'(Movie))
MYoung = πtt( σbirthdate>1966(Movie ⋈ nm=directorPerson))
πtitle (Movie ⋈ (Mrecent ∩ MYoung ) ) - Titles of movies not directed by someone under 50 and
released after 2010.
M2010 = πtt( σrelease>'2010'(Movie))
MYoung = πtt( σbirthdate>1966(Movie ⋈ nm=directorPerson))
πtitle (Movie - Movie ⋈ (M2010 ∩ MYoung ) )
Query Optimization¶
What's the time complexity of each of the following? Discuss with elbow partner.
- Looking up a person by
NM
- Looking up a person by
NAME
- Looking up a person by
NAME
if there's an index? - Should we have an index on
name
in theWMDB.person
table? - Finding the intersection of two sets of tuples?
- O(log N), and probably close to O(1)
- O(N)
- O(log N), and probably close to O(1)
- We often look people up by name, so, yeah.
- O(N log N), using the mergesort idea
Normalization
What's the point?
- Avoiding redundancy lets us be frugal on space (less of an issue nowadays) but less space can also mean less time (less data to read from disk, transmit across networks, store in memory and so forth).
- Avoiding redundancy prevents anomalies
- Normalizing makes the database more flexible, because unanticipated queries are more possible.
Denormalization¶
Sometimes, it's better not to normalize, even to combine tables that were already separate. Criteria to consider:
- normalization (decomposition) can makes complex queries slower because we have to do a join. A denormalized DB is pre-joined.
- normalization can make simple queries faster because there's less data to move around, and more records can fit into a single disk block. Imagine trying to find a movie with a particular pattern of title (so you have to read every movie title), but the movie records also have big text blocks in them (a synopsis?).
- normalization can make simple updates faster for the same reason.
- normalization reduces storage, because it reduces duplication. For example, there's no reason to repeat all the information about a movie in each actor who appeared in the movie.
- normalization can sometimes increase storage if the duplicate data is small (e.g. the name of a hobby) and the shared key is long (maybe an ID or email address).
The MongoDB version of WMDB is de-normalized
Ethics Discussion¶
Should we do it today or next Wednesday?
Today:
- pro: some topics already suggested
- pro: we are here today; next Wednesday is uncertain
- con: less time pressure next Wednesday (maybe?)
- con: maybe more topics next Wednesday
- con: nicer classroom next Wednesday
Thoughts? Votes?
If today, here are the topics so far
Summary¶
- Relational algebra is an important theoretical underpinning with immense practical value as well.
- Normalization is a good idea most of the time, though not always.
- Views are really useful.