Confirmationist and falsificationist paradigms of science (2014)

  • Article summary: Many scientists (particularly social scientists) think they’re doing Popperian falsification of their hypotheses by using null hypothesis testing. That is, they posit a hypothesis that, e.g., IQ is correlated with happiness, set up a null hypothesis that says IQ has zero correlation with happiness, and reject the null if their data show a correlation significantly different from zero. Gelman argues this is confirmation-seeking, not falsificationist, since the scientist never posited a precise model of how IQ correlates with happiness that could itself be falsified. They only reject a straw-man null hypothesis and then claim that this supports their hypothesis. He also notes that this has nothing to do with frequentist vs. Bayesian statistics, since null testing can be done in either framework.

    My own added view: Gelman never elaborates on what “making a hypothesis precise” means in this article, but I think this is the key. A lot of social science is really about estimating model parameters, not positing new causal explanations (theories) of phenomena. If you hypothesize that IQ correlates with happiness, that is not a theory; one could say it’s not even a hypothesis. It’s really asking “how much does IQ correlate with happiness?” Anything can correlate with anything else in principle, so this “hypothesis” is not a new causal model of reality, just a parameter to estimate. It doesn’t make sense to use falsificationist reasoning here, since there’s no theory to falsify, only a parameter to estimate. This is why null hypothesis significance testing (NHST) is so wrongheaded: 1) most social science is not about new causal models but about parameter estimation, and 2) when you do posit a new causal model, you should try to falsify the predictions of your model, not some straw-man null hypothesis.
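
    A minimal sketch of the contrast (Python with simulated data; the sample size, effect size, and variable names are made up, so nothing about real IQ or happiness follows from it):

    ```python
    # Toy contrast between NHST ("is the correlation zero?") and
    # parameter estimation ("how big is the correlation, and how sure are we?").
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 200
    iq = rng.normal(100, 15, n)
    # Simulated weak relationship plus noise (assumed effect, purely illustrative).
    happiness = 0.02 * iq + rng.normal(0, 1, n)

    r, p = stats.pearsonr(iq, happiness)

    # NHST framing: reject the straw-man null "correlation == 0" if p < 0.05.
    print(f"NHST: r = {r:.3f}, p = {p:.4f} -> "
          f"{'reject' if p < 0.05 else 'fail to reject'} the zero-correlation null")

    # Estimation framing: report the correlation with a 95% interval
    # (Fisher z-transform), which is what the question actually asks.
    z = np.arctanh(r)
    se = 1 / np.sqrt(n - 3)
    lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
    print(f"Estimate: r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
    ```

    The second framing answers the question the study is actually asking; the first only rejects a null that nobody believed in the first place.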

  • There's a good case for stating the hypothesis, and summarizing the evidence that could falsify the hypothesis, right in the abstract of a paper. Or at least a separate section on falsification. Whether the authors' views on the falsification of their prediction are correct or not, they say a lot about their standard of evidence and therefore about their level of rigor.

    If they state that falsification isn't relevant to this particular paper, or give an impossible threshold, that's a way to sort the paper into a category other than science.

  • Science is a deductive method, which means that it can never "confirm" anything unless it can disprove the complement. Strictly speaking that is impossible in real-world cases. For example, quantum electrodynamics, one of the most well-tested theories ever, holds up in the lab to some absurd number of decimal places (I recall the figure 9, but I read that years ago and it's got to be more now), but we can never be quite sure that the theory won't fall down with the next improvement in measurement. None of this detracts from QED being one of the great human intellectual achievements.

    From a programming perspective, there is an analogy to software testing. Testing can only show the presence of bugs; it can't prove their absence, except by exhaustion. Since exhaustive testing is impossible on (most?) modern systems, we're left with an imperfect solution that still works quite well when applied properly.
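
    A toy sketch of that point (Python; the function and its single bad input are invented for illustration): sampling can only reveal a bug if it happens to hit it, while exhaustion, when the domain is small enough to be feasible, is the one case where testing alone proves absence.

    ```python
    import random

    def buggy_clamp(x: int) -> int:
        """Clamp a 16-bit value to [0, 1000] -- with one deliberate bug."""
        if x == 54321:          # the single bad input out of 65,536
            return -1
        return min(x, 1000)

    DOMAIN = range(2**16)

    # Sampled testing: may or may not hit the bad input.
    samples = random.sample(DOMAIN, 1000)
    print("random sample found the bug:", any(buggy_clamp(x) < 0 for x in samples))

    # Exhaustive testing: feasible only because the domain is tiny, and the
    # only way testing by itself can prove this bug is absent.
    print("exhaustive check found the bug:", any(buggy_clamp(x) < 0 for x in DOMAIN))
    ```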

  • I recommend https://aeon.co/essays/a-fetish-for-falsification-and-observ... for further reading on how falsification has interacted with science historically and philosophically. In principle, it's absolutely crucial for good science. In practice, hypotheses that are not directly falsifiable (because testing them requires technology that doesn't exist yet) have contributed greatly to science in the long run. This is interesting because it was simultaneously reasonable and, in a sense, correct for contemporaries to label such hypotheses as non-scientific: at the time there was little to no evidence supporting or refuting them and no way to test them. In practice, the distinction between "falsifiable" and "unfalsifiable" is a messy spectrum.

  • > So I think it’s worth emphasizing that, when a researcher is testing a null hypothesis that he or she does not believe, in order to supply evidence in favor of a preferred hypothesis, that this is confirmationist reasoning. It may well be good science (depending on the context) but it’s not falsificationist.

    A very good point.

  • We need to bring back abductive/retroductive inference (inference to the best explanation) as its own mode of reasoning to complement deduction and induction. This is how scientific discoveries are often made, IMHO, though it usually gets chalked up to some kind of intuition. There is actually a logical structure there, even though it is hard to formalize and prove (a toy sketch is at the end of this comment).

    https://en.wikipedia.org/wiki/Abductive_reasoning

    I am working on dialog management stuff as an engineer who sits between data scientists, NLP, and product people. It is understandable that trained stats people see everything as a stats problem (induction), but we forgot about GOFAI & symbolic AI (deductive) systems along the way.

    Philosophy has actually been ahead of the curve on this one for over 140 years. Maybe the "3rd wave" of AI will be a synthesis of the two.

    https://www.darpa.mil/attachments/AIFull.pdf

    I personally haven't had a chance to read Norvig or dive into LISP or Prolog much, but I'm using the CLIPS expert system in a project and toyed with core.logic in Clojure back in the day. There's a lot of opportunity to resurrect some of these older methods, which may be superior to imperative/bespoke code given modern hardware and infrastructure.
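
    To make "there is a logical structure to abduction" a bit more concrete, here is a hand-rolled toy sketch in Python (the hypotheses, observations, and scoring rule are all invented; this is not how CLIPS or core.logic work, just the bare shape of inference to the best explanation):

    ```python
    # Each candidate hypothesis lists the observations it would explain.
    HYPOTHESES = {
        "dead_battery":  {"car_wont_start", "no_dashboard_lights"},
        "empty_tank":    {"car_wont_start"},
        "starter_fault": {"car_wont_start", "clicking_noise"},
    }

    def best_explanation(observations: set[str]) -> str:
        """Pick the hypothesis that covers the most observations, breaking ties
        in favor of the one with the fewest unobserved consequences -- a crude
        Occam's razor."""
        def score(name: str) -> tuple[int, int]:
            explained = HYPOTHESES[name] & observations
            unobserved = HYPOTHESES[name] - observations
            return (len(explained), -len(unobserved))
        return max(HYPOTHESES, key=score)

    print(best_explanation({"car_wont_start", "no_dashboard_lights"}))
    # -> "dead_battery": the conclusion isn't deductively forced, it's just the
    #    best available explanation given what was observed.
    ```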