Great question. The main thing is that numbers are just different. First, numbers have some weird problems char don't. For example, there's a maximum integer, namely 2,147,483,647. max int. So, suppose we try to store a SSN of 321-45-8911 into an int. We get 2147483647 for the SSN. Hunh??? That's weird, unexpected and very unpleasant.
There is a tiny amount of time spent converting from characters (for I/O) and numbers (internal representation) but that's not important.
True. There is a tiny amount of space overhead for varchar, so varchar(2) is probably less efficient than char(2). And fixed-length fields are faster to access than variable-length fields. So if something is essentially fixed-length (like SSN or unit number) I'd prefer char. But, yeah, for any string of non-trivial length where it's probably variable, I'd use varchar().
Another great question. You're right, if we do varchar(1000) and, in practice, nothing is ever longer than 200, then we never use any extra space. The limit is just there for cutting things off instead of letting them get out of hand.
Varchar doesn't care about the average length, just the worst case. So, allow for that, within reason.
Primary keys are never allowed to be NULL, so that's a standard
declaration. For example, in a student database, B-number
cannot be NULL.
We might also specify that name
can't be
null, but address
might be allowed to be NULL.
I like the "bakery" metaphor: the table has an internal counter,
and increments it whenever an INSERT
happens, and uses
the counter value for the ID. Guarantees uniqueness as long as you
don't try to override the counter.
Yes, it's autogenerated. Has to be a number.
Understandable. We'll revisit this later in the course, when we talk about concurrency. But for now, imagine this scenario:
If we use the MAX technique, they both get the same answer, namely George's ID. If we use the LAST_INSERT_ID technique, they both get the correct answer.
AUTO_INCREMENT doesn't fill in "holes". It just increments a counter.
We can't. Gotta commit to a datatype. You could use the more-general one and coerce things later.
You're responding to a wedding invitation and the RSVP card says chicken, fish or vegetarian? Is that SET or ENUM?
You're asked for your pronouns: SET or ENUM?
You're asked for your ethnicity: SET or ENUM?
You're asked your favorite color: SET or ENUM?
Sure. If we have ENUM('fish','chicken','vegetarian'), then fish is represented internally as a 1, chicken as 2 and vegetarian as a 3. So if we sort (ORDER BY), all the fish entries come first, then the chicken, and lastly the vegetarian.
Maybe you don't care, but if you do, use ENUM('chicken','fish','vegetarian')
No. There's no coordination among tables, even if they have the same name. (You can have a ID and NAME fields in Person and also in MOVIE w/o them being the same ideas or values.)
Well, with an exception we'll talk about next time. There are also fancy techniques to set up some automatic updating, but we won't cover them.
No again. Each datum is independent. That makes updating easier/faster/more concurrent. But at a price.