Ask HN: What is your database sharding strategy?

Need some feedback from experienced DBA on how you would go about sharding a mega huge database. Assume you would want a app-agnostic strategy that you can deploy with any application.

Range/Modulo/Hash-based partitioning?

Table-level sharding or DB-level sharding?

What sort of architecture? Do you have a centralized lookup directory at db-level/app-level? I don't like this idea as I don't want to have a single point of failure.

What about a DHT architecture for the DB servers?

How do you deal with the loss of joins?

  • We're just discussing over in this thread: http://news.ycombinator.com/item?id=296656

    I have had really good experience with a very simple centralized lookup facility. It's very easy to prevent that from becoming a CPOF, either using replication or, if you insist, some form of hash-based partitioning of the directory itself.

    I'm a fan of doing the partitioning at the app-level, and doing it horizontally rather than vertically.

    The loss of joins is an issue, but in some cases you can vertically partition the parts of the app that need those joins. For the others, I have had good luck using a transactional message-passing system, where one shard can reliably send a message to another shard.

  • There's some good answers and examples to the questions about database sharding here:

    http://www.codefutures.com/database-sharding/