Who Will Test the Tests Themselves? (2017)

  • Mutation Testing [1] is a generalized form of fuzzing. It is also analogous to Sensitivity Analysis [2]. As part of closing the feedback loop between the code and the tests, if one had a repeatable way to break the code and measure how selectively the tests respond, one could ensure the tests keep testing the same things as the code evolves (a toy sketch of that loop follows the references below).

    Automatic program repair [3] tries to find patches for broken code; perhaps goal-directed program breaking (possibly using DNNs) could be used to infer properties of code so that better invariants could be discovered.

    [1] https://en.wikipedia.org/wiki/Mutation_testing

    [2] https://en.wikipedia.org/wiki/Sensitivity_analysis

    [3] https://arxiv.org/abs/1807.00515
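
    Purely to illustrate that "repeatably break the code and see whether the tests notice" loop, here is a toy, hand-rolled sketch in Python. Real mutation-testing tools are far more thorough; clamp and its tiny test suite are invented for the example.

      import ast, inspect

      def clamp(x, lo, hi):
          """Code under test."""
          if x < lo:
              return lo
          if x > hi:
              return hi
          return x

      def run_tests(fn):
          """Tiny test suite whose selectivity we want to measure."""
          try:
              assert fn(5, 0, 10) == 5
              assert fn(-1, 0, 10) == 0
              assert fn(11, 0, 10) == 10
              return True            # suite passed
          except AssertionError:
              return False           # suite failed -> mutant was "killed"

      class FlipComparison(ast.NodeTransformer):
          """Build one mutant by reversing a single comparison (< -> >, > -> <)."""
          def __init__(self, target):
              self.target, self.seen = target, 0
          def visit_Compare(self, node):
              if self.seen == self.target:
                  node.ops = [ast.Gt() if isinstance(op, ast.Lt)
                              else ast.Lt() if isinstance(op, ast.Gt)
                              else op for op in node.ops]
              self.seen += 1
              return node

      assert run_tests(clamp), "the suite must pass on the unmutated code"
      source = inspect.getsource(clamp)
      n_mutants, killed = 2, 0       # clamp has two comparisons to mutate
      for i in range(n_mutants):
          tree = FlipComparison(i).visit(ast.parse(source))
          ns = {}
          exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), ns)
          if not run_tests(ns["clamp"]):
              killed += 1            # a test noticed the broken code: good
      print(f"mutation score: {killed}/{n_mutants}")

    A surviving mutant either exposes a gap in the suite or is an equivalent mutant, which is exactly the kind of selectivity signal the parent comment is after.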

  • Derailing the conversation a bit, what other strategies beyond mutation testing do you use for validating your tests? I've caught test bugs with a few techniques, but none of them are comprehensive, and I'd love to hear more thoughts. Here are a few examples (a rough sketch of all three follows the list):

    (1) Validate assumptions in your tests -- if you think you've initialized a non-empty collection and the test being correct depends on it being non-empty, then add another assert to check that property (within reason; nobody needs to check that a new int[5] has length 5).

    (2) Write tests in pairs to test the testing logic -- if your test works by verifying that the results for some optimized code match those from a simple oracle, verify that they don't match those from a broken oracle.

    (3) If you're testing that some property holds, find multiple semantically different ways to test it. If the tests in a given group don't agree, then at least one of them is broken.
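
    To make (1)-(3) concrete, here is a small self-contained sketch; fast_sort, simple_oracle, and broken_oracle are names invented for the example, not anyone's real code.

      from collections import Counter

      def fast_sort(xs):                 # the "optimized" code under test
          return sorted(xs)

      def simple_oracle(xs):             # slow but obviously correct reference
          out = list(xs)
          for i in range(len(out)):
              for j in range(i + 1, len(out)):
                  if out[j] < out[i]:
                      out[i], out[j] = out[j], out[i]
          return out

      def broken_oracle(xs):             # deliberately wrong reference
          return list(xs)                # "sorts" by doing nothing

      def test_matches_oracle():
          data = [3, 1, 2, 1]
          # (1) validate the test's own assumption: the input must be unsorted,
          # otherwise the broken-oracle check below proves nothing.
          assert data != sorted(data)
          # (2) paired tests: agree with the good oracle, disagree with the broken one.
          assert fast_sort(data) == simple_oracle(data)
          assert fast_sort(data) != broken_oracle(data)

      def test_property_cross_check():
          data = [3, 1, 2, 1]
          out = fast_sort(data)
          # (3) semantically different ways to check "out is data, sorted":
          assert all(a <= b for a, b in zip(out, out[1:]))   # pairwise ordering
          assert Counter(out) == Counter(data)               # same multiset of elements
          assert out == simple_oracle(data)                  # agreement with the oracle

      test_matches_oracle()
      test_property_cross_check()
      print("all checks passed")

    If the three checks in test_property_cross_check ever disagree, at least one of them (or the code under test) is wrong, which is the point of (3).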