Quiz

  1. Would it be possible to go over the different sharding methods mentioned in the reading? I'm not completely sure I understand them...

    Sure. Let's look at how sharding works.

  2. Does sharding affect the way that we design our web apps with MongoDB? Or is this just good background information to know?

    It's just good to know. It won't affect your apps. Though, you might try to design your databases so that an endpoint can be served with a single query, rather than having to gather information from multiple collections. E.g. in the WMDB, all the info for /nm/:personid is in a single person document.

  3. It would be helpful to see implementations of the different types of shardings in class.

    We could imagine dividing up the WMDB collections by TT or by NM, if those collections got large enough to span several servers.
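
    To make that concrete, here is a minimal sketch of range-based sharding on a numeric ID such as TT. The split points and shard names are hypothetical, chosen only for illustration; they are not taken from the WMDB or from any real cluster.

```python
from bisect import bisect_right

# Hypothetical split points on the tt value (assumed to be a numeric ID).
# Three shards: tt < 100000, 100000 <= tt < 200000, tt >= 200000.
SPLIT_POINTS = [100000, 200000]
SHARDS = ["shard-a", "shard-b", "shard-c"]  # made-up shard names

def shard_for_tt(tt):
    """Return the shard whose key range contains this tt."""
    return SHARDS[bisect_right(SPLIT_POINTS, tt)]

def shards_for_range(lo, hi):
    """Return the shards that must be consulted for lo <= tt <= hi."""
    first = bisect_right(SPLIT_POINTS, lo)
    last = bisect_right(SPLIT_POINTS, hi)
    return SHARDS[first:last + 1]
```

    A lookup by a single TT touches one shard, while a range query touches only the shards whose ranges overlap it. That is the main appeal of range sharding; the risk is that popular ranges can overload one shard.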

  4. What are the specific scenarios or conditions under which the advantages of database sharding (like increased throughput and storage capacity) clearly outweigh its disadvantages for a business or application?

    It depends, of course. That's why the description was phrased that way. In general, we are probably willing to pay a small price in latency to get vast expansion in capacity and throughput. In the era of "big data", lots of companies are deciding to go this way.

    But all these criteria matter. If the data is mostly being read, that's an advantage. (Caching can help there.) If there are complex and coordinated updates, you may prefer not to shard, or to use a relational database.

  5. How does sharding work / actual code for sharding / different types of sharding

    It doesn't affect our code (directly). It affects the database administration.

  6. For query result compilation, which server does it use?

    I think it uses the dispatching server: the one that sends queries to shards also compiles the results.
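
    As a toy illustration of that scatter-and-gather pattern (not MongoDB's actual implementation), the sketch below uses two hypothetical in-memory shards. The documents, field names, and function names are all made up for the example.

```python
import heapq

# Hypothetical in-memory "shards", each holding part of a people collection.
SHARDS = [
    [{"nm": 1, "name": "Ann"}, {"nm": 4, "name": "Dev"}],
    [{"nm": 2, "name": "Bo"}, {"nm": 3, "name": "Cy"}],
]

def query_shard(shard, predicate):
    """Each shard filters its own documents and returns them sorted by nm."""
    return sorted((d for d in shard if predicate(d)), key=lambda d: d["nm"])

def dispatch(predicate):
    """The dispatcher sends the query to every shard, then merges the results."""
    partials = [query_shard(shard, predicate) for shard in SHARDS]
    # Each partial result is already sorted, so a k-way merge keeps the order.
    return list(heapq.merge(*partials, key=lambda d: d["nm"]))
```

    For example, `dispatch(lambda d: d["nm"] >= 2)` collects matches from both shards and returns them in `nm` order. The client sees a single result list and never talks to the shards directly.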

  7. I understand the classic example of latency, but I'm still confused when we translate that into computing. Could we go over that again?

    Sure. If you convert a one-stage process of querying a server into a three-stage process of querying a dispatch server, which queries shards (possibly in parallel) and then gathers the results, you'll introduce additional delay.
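
    A back-of-the-envelope model of those three stages is sketched below. All the timings are invented numbers in milliseconds, and the model assumes the shards are queried fully in parallel, so the slowest shard determines the middle stage.

```python
def sharded_latency(dispatch_ms, shard_query_ms, gather_ms):
    """Total time for a query that goes through a dispatch server.

    dispatch_ms:    time for the dispatcher to receive and forward the query
    shard_query_ms: list of per-shard query times (run in parallel)
    gather_ms:      time for the dispatcher to combine the partial results
    """
    return dispatch_ms + max(shard_query_ms) + gather_ms

# Hypothetical numbers: one unsharded server answers in 10 ms.
unsharded = 10
# With three shards, each shard may be about as fast, but there are two extra stages.
sharded = sharded_latency(dispatch_ms=2, shard_query_ms=[10, 14, 9], gather_ms=3)
```

    With these made-up numbers, a single query gets slower (19 ms versus 10 ms), even though the cluster as a whole can store more data and serve more queries at once. That is the trade of latency for capacity and throughput.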

  8. Do the databases that we have been using use sharding? Also, algorithmic sharding still seems a bit abstract to me and it would be great if we could talk about it more!

    No, our databases are not sharded, but MongoDB Atlas does offer that.

    Algorithmic sharding is just sharding by using a hash, instead of the attribute's direct value. So, instead of sharding by, say, the actor's name, we could shard by hash(actorName).

    Interestingly, I'm talking about hashtables in CS 230 today; same idea: spread the data around and try to avoid clusters.
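
    The hash(actorName) idea can be sketched in a few lines. This is only an illustration under stated assumptions: the shard count of 4 and the choice of MD5 are arbitrary, and this is not necessarily the hash function any particular database uses. A stable hash from `hashlib` is used because Python's built-in `hash()` on strings varies from one process to the next.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of shards

def shard_for(actor_name, num_shards=NUM_SHARDS):
    """Map an actor's name to a shard number using a stable hash."""
    digest = hashlib.md5(actor_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical names that sort next to each other alphabetically.
names = ["Adams, Amy", "Adams, Brooke", "Adams, Joey Lauren"]
placement = {name: shard_for(name) for name in names}
```

    Under range sharding by name, those three actors would almost certainly land on the same shard. Under hashing, a name's position in the alphabet has no bearing on where it lands, so the data tends to spread evenly. The cost is that a range query (say, all names starting with "Ad") has to ask every shard.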

  9. I feel like I mostly understood things, but a brush-through in class would be great.

    I've done some review, but I'm happy to answer any followup questions!