Sure. Let's look how does sharding work
It's just good to know. It won't affect your apps. Though, you might try to design your
databases so that an endpoint can be served with a single query,
rather than having to gather information from multiple
collections. E.g. in the WMDB, all the info
for /nm/:personid
is in a single person document.
We could imagine dividing up the WMDB collections by TT or by NM, if those collections got large enough to span several servers.
It depends, of course. That's why the description was phrased that way. In general, we are probably willing to pay a small price in latency to get vast expansion in capacity and throughput. In the era of "big data", lots of companies are deciding to go this way.
But all these criteria matter. If the data is mostly being read, that's an advantage. (Caching can help there.) If there are complex and coordinated updates, you may prefer not to shard, or to use a relational database.
It doesn't affect our code (directly). It affects the database administration.
I think it uses the dispatching server: the one that sends queries to shards also compiles the results.
Sure. If you convert a one-stage process of querying a server into a three-stage process of querying a dispatch server, which queries shards (possibly in parallel) and then gathers the results, you'll introduce additional delay.
No, our databases are not sharded, but Mongo Atlas does offer that.
Algorithmic sharding is just sharding by using a hash, instead of the attribute's direct value. So, instead of sharding by, say, the actor's name, we could shard by hash(actorName).
Interestingly, I'm talking about hashtables in CS 230 today; same idea: spread the data around and try to avoid clusters.
I've done some review, but I'm happy to answer any followup questions!