Thursday, May 10, 2018

Prediction, accommodation and pre-registration

Some scientists think that confirming a prediction made before data collection is more convincing than explaining the data after they arrive. I think that this belief is one source (among others) of support for the practice of pre-registration, in which authors publish a plan for an experiment they intend to do, explaining the hypothesis, methods, and data analysis in advance.

Paul Horwich ("Probability and evidence", 1982, pp. 108-117) 
has a complex discussion of this issue, under the heading "Prediction vs. accommodation", but I want to try to provide a more intuitive account of why I do not think that it matters.

Let me take an example based very loosely on my own research. Suppose I want to study evaluative judgment of acts vs. omissions. I got interested in this because of vaccine resistance in the real world. It seemed likely that this resistance resulted from a bias against potentially harmful acts (vaccination, with side effects) compared to potentially more harmful omissions (non-vaccination, with exposure to a natural disease). I thought that this bias was a sort of heuristic, in which people tend to think of action as inherently risky, regardless of what they are told about its risks.

I designed an experiment to examine judgments of acts and omissions, using a variety of scenarios. Some of the scenarios involved money rather than disease risk, so it was possible to include items involving monetary gains as well as losses. I expected that subjects would just be biased against acts and in favor of omissions, period.

When I got the data, I noticed something unexpected. Yes, I found the expected bias for vaccination scenarios and for money scenarios that involved losses. In the money scenarios, subjects evaluated action leading to a loss as somewhat worse than omission (doing nothing) leading to the same loss with a higher probability. But when I looked at gains, the results were reversed. Subjects were biased toward action, not omission. They evaluated action leading to a chance of winning some money more highly than omission leading to a somewhat larger chance of winning the same amount.

It was easy to explain my results in hindsight. Action, as opposed to omission, simply amplified the effect of whatever outcome it produced on choice. Subjects were not generally biased against action: it depended on whether the outcomes of action were good or bad. The association with action served to focus attention on the effect of the action, so the action looked better if its outcome was good, and worse if its outcome was bad. (In real life, the amplification effect exists, along with the omission heuristic, but, unlike in this hypothetical example, it was not unexpected, as I knew that Janet Landman had already reported it with a different dependent measure. And this "one experiment" is actually a conflation of several done in collaboration with Ilana Ritov.)

Suppose I had stopped there and reported what I expected and what I found. Compare this to a case in which I predicted (expected) the amplification effect. Should your belief in the existence of the amplification effect depend on whether I predicted it or not (all else being equal)?

Note that these two cases can be compared with a third case, in which I report the result for gains but falsely claim that I predicted it, when, actually, I expected a different result. This is the sort of thing that pre-registration of research hypotheses prevents. But, if the two cases just mentioned do not differ in the credibility of amplification as an explanation, such "cheating" would simply be a type of puffery, cheap talk designed to influence the reader without conveying any truly relevant information, like many other things that authors do to make their conclusions sound good, such as baldly asserting that the result is very important. The simple way for editors to avoid such puffery is to change all statements about prediction to statements of questions that the research might answer. Thus "I predicted a general bias toward omission" would be edited to say "I asked whether the bias toward omission was general, for gains as well as losses."

Let me now return to the original question of whether prior prediction matters. In both cases, we have two plausible explanations: a bias against omission, and an amplification effect. What determines the credibility of the amplification hypothesis? Of course, it fits the data, but that helps equally regardless of the prediction. Some doubt may remain, whether I predicted the result or not.

The other determinant of credibility of an explanation is its plausibility, compared to the plausibility of alternatives. In a well-written paper, I would give my reasons for thinking that various explanations are plausible. Readers would also have their own reasons. Above I suggested the reasons for each: a heuristic against action, and an attention effect that amplifies whatever outcome an action produces. In a paper, I would try to spell these out "with four-part harmony" (to steal a line from Arlo Guthrie), with citations of related work (such as the "feature-positive effect"), and so on.

Should it then add anything that I predicted the result that I got? The answer to this depends on whether my prediction provides additional evidence for this explanation, that is, additional reason to believe it, aside from everything you now know about its plausibility. But, if I have done my job as a writer, I have told you everything I know that is relevant to making my final explanation (amplification) plausible. The fact that I predicted it provides additional relevant information only if I am withholding something relevant that I know. I have no incentive to do that, unless it is somewhat embarrassing. Authors might be embarrassed to say that they made the prediction because God came to them in a dream and told them what was true. Or, more realistically, "I predicted this result intuitively, for reasons I cannot explain, but you should take this seriously because my intuition is very good." As a reader, you are not going to buy that.

In some research, the favored explanation seems pretty implausible, even though it is consistent with the data presented, despite the efforts of the author to convince us that it is plausible.  These cases include some of the "gee whiz" studies in social psychology that raise questions about replicability, but also some of the research in which a precise mathematical model fits the data surprisingly well but otherwise seems to come out of the blue sky.  These cases of low plausibility are the ones where claims that the results were predicted (e.g., in a pre-registration) are thought to be most relevant.

For example, suppose I found no significant "omission bias" overall but did find it if I restricted the sample to those who identified themselves as Protestant Christian.  I supported this restriction with (highly selected) quotations from Protestant texts, thus explaining the result in terms of religious doctrine. You would rightly be suspicious. You would (rightly) suspect that you could find just as many quotations as I found to support the conclusion that Protestant doctrine emphasized sins of omission as well as sins of commission, and that other religions were no different. Would it help convince you of the reality of my effect if you knew that I predicted it but didn't tell you any more about why? You might just think that I was a little nuts, and, well, lucky.

Pre-registration thus does not solve the problem posed by implausible explanations. Of course such explanations might be true, despite being implausible, but that must be established later.  What matters in making an explanation legitimately credible is, first, its fit with the data (compared to alternative explanations) and, second, its fit with other things that we know (again, compared to alternatives). The order in which a researcher thought of things, by itself, provides no additional relevant information.
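The same point can be put in roughly Bayesian terms (my gloss, intended only as a sketch, not Horwich's own formulation). The credibility of a hypothesis H in light of data D is

    P(H | D) = P(D | H) * P(H) / P(D)

where P(D | H) reflects how well the explanation fits the data and P(H) reflects its prior plausibility, given everything else we know. Nothing on the right-hand side refers to when, or whether, H was written down before the data arrived. A claim of prior prediction can change the posterior only by changing one of these terms, that is, only by conveying some relevant information that the paper has not already stated.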

Going beyond my restatement of Horwich's arguments, analogous reasoning applies to data analysis. One of the nasty things that researchers do is fiddle with their data until they get the result they want. For example, I might fail to find a significant difference in the mean ratings of acts and omissions, but I might find a difference using the maximum rating given by each subject to omissions and to actions, across several scenarios. Pre-registration avoids this fiddling, if researchers follow their pre-registered plan. Following the plan strictly, however, discourages the researcher from making reasonable accommodation to the data as they are, such as eliminating unanticipated but nonsensical responses, or transforming data that turn out to be highly skewed.
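To make the fiddling concrete, here is a small simulation sketch (in Python, using numpy and scipy; the numbers and the two per-subject summaries are hypothetical illustrations, not my actual data or analyses). It generates ratings with no true act/omission difference, runs a paired test on per-subject means and another on per-subject maxima, and counts how often at least one of the two comes out "significant." Reporting whichever analysis happens to work pushes the false-positive rate above the nominal 5%.

    # Sketch: trying more than one per-subject summary (mean vs. max) and
    # reporting whichever "works" inflates the false-positive rate, even
    # when acts and omissions are rated identically in the population.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(0)
    n_subjects, n_scenarios, n_sims, alpha = 40, 6, 2000, 0.05
    hits_mean = hits_max = hits_either = 0

    for _ in range(n_sims):
        # Ratings on an arbitrary scale; no true act/omission difference.
        acts = rng.normal(0.0, 1.0, size=(n_subjects, n_scenarios))
        omits = rng.normal(0.0, 1.0, size=(n_subjects, n_scenarios))

        # Analysis 1: each subject's mean rating of acts vs. omissions.
        p_mean = ttest_rel(acts.mean(axis=1), omits.mean(axis=1)).pvalue
        # Analysis 2: each subject's maximum rating of acts vs. omissions.
        p_max = ttest_rel(acts.max(axis=1), omits.max(axis=1)).pvalue

        hits_mean += p_mean < alpha
        hits_max += p_max < alpha
        hits_either += (p_mean < alpha) or (p_max < alpha)

    print("false-positive rate, mean-based test: %.3f" % (hits_mean / n_sims))
    print("false-positive rate, max-based test:  %.3f" % (hits_max / n_sims))
    print("report whichever analysis 'worked':   %.3f" % (hits_either / n_sims))

Each individual test keeps (approximately) its 5% error rate; only the freedom to choose between them after seeing the results does the damage, and that freedom is exactly what a pre-registered analysis plan is meant to remove.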

But note that many of the statistical options that are used for such p-hacking are ones that do not naturally fit the data very well. Again, it is possible to make up a story about why they do fit the data, but such stories tend to be unconvincing, just like the example of Protestantism described above. Thus, data analysis, like explanations, must be "plausible" in order to be convincing.