Quiz

  1. Why does a RDBMS require a structured datatype for each column? I'm wondering what the advantages are of requiring this as opposed to letting the user input what they want. Does it help with the amount of space used by the database or prevent errors?

    Great question! Well, as you recall, we have to define a datatype for all our columns: int, datetime, char(9), varchar(30), etc.

    Why is that? Here are some reasons:

    • correctness: if you know something is an integer or a datetime, the database can reject any input that isn't correct.
    • speed: if you know something is an integer, you don't have to put in run-time code to check its type and possibly convert it to the correct type.
    • compactness: if you know the datatype of something, you don't have to store any type information, either in the database or in memory once the data is read from disk.

    All of the reasons above are the same as in programming languages. Consider Java (strictly typed) versus Python (dynamic typing).

  2. How do you handle relationships between collections in MongoDB if you can't use joins?

    Great question! There are several ways:

    • denormalize: store something in multiple places. For example, the name of an actor in a movie could be stored in the movie's cast list, avoiding having to look it up.
    • Explicit lookup: have the middleware (Python/JS code) look up the movie, then do look up each of the actors in it. This requires multiple database interactions, but hopefully it's rare.
    • Modern MongoDB has a $join operator, so you can do a join in the database.
  3. Could you explain what does it mean that denormalized MongoDB documents can lead to inconsistency across shards?

    Sure. That's another great question.

    Suppose that someone whose name is "Ellen Page" has been in multiple movies and those movies (and their cast lists) are in different shards. (Denormalized the actor name information.)

    Then Ellen changes his name to "Elliot Page" and we have to update the database.

    There might be a time, hopefully brief, when the name has been changed in some shards, but not in others. So the database as a whole is inconsistent.

    This can't happen in a normalized database, because the name is only stored in one place, so the update is atomic. But with denormalized data, we have to chase down all the copies.

    Brief inconsistency is maybe not a problem in the WMDB, but we can try to come up with examples where it is: maybe a banking app where there needs to be global consistency. Or an e-commerce site where you can't sell somethign more than once.