An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems

  • "Almost all (97%) of crash recovery bugs involve no more than four nodes. This finding indicates that we can detect crash recovery bugs in a small set of nodes, rather than thousands."

    This is similar to a claim in the NIST slides on combinatorial test generation: most bugs were caught by 3-way testing, and virtually none made it past 6-way. They claimed this held across a diverse set of case studies. I'd love to see more replications to corroborate or refute that.

    http://csrc.nist.gov/groups/SNS/acts/index.html

    Note: The site is currently down due to the government shutdown. Wait or try archive.org.
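
To make the "t-way" idea concrete, here is a minimal sketch of how one might measure the t-way coverage of a test suite. It is not the NIST tooling. The function names, the four boolean flags, and the hand-picked suite are all hypothetical, chosen only to illustrate the idea that a small suite can cover every pairwise interaction without enumerating every configuration.

```python
from itertools import combinations, product


def t_way_combinations(domains, t):
    """Return every (parameter indexes, values) combination of strength t.

    `domains` is a list of value tuples, one per parameter (hypothetical input).
    """
    required = set()
    for idxs in combinations(range(len(domains)), t):
        for values in product(*(domains[i] for i in idxs)):
            required.add((idxs, values))
    return required


def t_way_coverage(tests, domains, t):
    """Fraction of all t-way combinations exercised by at least one test."""
    required = t_way_combinations(domains, t)
    covered = set()
    for test in tests:
        for idxs in combinations(range(len(domains)), t):
            covered.add((idxs, tuple(test[i] for i in idxs)))
    return len(covered & required) / len(required)


# Hypothetical system under test: four boolean configuration flags.
domains = [(0, 1)] * 4

# Exhaustive testing needs 2**4 = 16 configurations.
exhaustive = list(product(*domains))

# A hand-picked suite of 5 tests (an assumption for illustration).
suite = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]

pairwise = t_way_coverage(suite, domains, 2)   # 1.0: every pair is covered
three_way = t_way_coverage(suite, domains, 3)  # 0.625: 20 of 32 triples
```

With these inputs, 5 tests cover all 24 pairwise combinations, versus 16 tests for exhaustive coverage. The gap grows quickly as parameters are added, which is why the claim that most bugs need only low-strength interactions matters in practice.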