Monday, December 25, 2017

Replication reservations

Replication of previously reported studies is sometimes useful or even necessary. Drug companies often try to replicate published research before investing a great deal of money in drug development based on that research. Ordinary academic researchers often want to examine more closely some published result, so they often include a replication of that result in a larger design, or just try to see if they can get the effect before they proceed to make modifications. Failures to replicate are often publishable (e.g., Gong, M., & Baron, J. The generality of the emotion effect on magnitude sensitivity. Journal of Economic Psychology, 32, 17–24, 2011), especially when several failures are included in a meta-analysis (e.g., http://journal.sjdm.org/14/14321/jdm14321.html). Finally, people may try to replicate a study when they disagree with its conclusions, possibly because of other theoretical or empirical work they have done.

Researchers are now spending time trying to replicate research studies in the absence of such purposes. In one project, some students are in the process of trying to replicate most of the papers published in Judgment and Decision Making, the journal I have edited since 2006 (https://osf.io/d7za8/). Let me explain why this bothers me.

First, these projects take time and money that could be spent elsewhere. The alternatives might be more worthwhile, but of course this depends on what they are.

Second, if you want to question a study's conclusions, it is often easier to find a problem with the data analysis or method of the original study. A large proportion of papers published in psychology (varying from field to field) have flaws that can be discovered this way. Many of these flaws are listed in http://journal.sjdm.org/stat.htm. It is possible to publish papers that do nothing but "take down" another published paper, especially if a correct re-analysis of the data yields a conclusion contradicting the original one.

Third, complete replication of a flawed study often succeeds quite well, because it replicates the flaws. A recent paper in the Journal of Personality and Social Psychology (Gawronski et al., 2017. Consequences, norms, and generalized inaction in moral dilemmas: The CNI model of moral decision-making, 113: 343–376) replicated every study in the paper itself. The replications involved new subjects but not new stimuli, and the data analysis ignored variations among the stimuli in the size and direction of the effects of interest (among other methodological problems).

Fourth, what do we conclude when a study does not replicate? Fraud? Dishonesty in reporting? Selective reporting? Luck? Sometimes these explanations can be detected by looking at the data (e.g. http://retractionwatch.com/2013/09/10/real-problems-with-retracted-shame-and-money-paper-revealed/#more-15597). And none of them can be inferred from a failure to replicate. So what is the point? Is it to scare journal editors into accepting papers only when they have very clear results that do not challenge existing theories or claims?

Blanket replication of every study is a costly way to provide incentives for editors. Perhaps these "replication factors" for journals are an antidote to the poison of "impact factors". Impact factors encourage publication of surprising results that will get news coverage, and will need to be cited, just because they are surprising. But the very fact that they are surprising increases the probability that something is wrong with them. A "replication index" will discourage publication of such papers. But it will also encourage publication of papers that go to excess to replicate studies within the paper, use large samples of subjects, and, in general, cost a lot of money. This will thus tend to drive out of the field those who are not on the big-grant gravy train (or who are not in schools that provide them with generous research funding). It is better for editors to ignore both concerns.

Fifth, I think that some good studies are unlikely to replicate. I try to publish them anyway. One general category consists of studies that pit two effects against each other, only one of which is interesting. An example is the "polarization effect" of Lord, Ross and Lepper (1979): subjects who opposed or favored capital punishment were presented with two studies, one showing that it deterred serious crimes and the other showing that it did not deter; both groups became more convinced of their original position, because they found ways to dismiss the study that disagreed with it. This result has in fact been replicated, but other attempts to find polarization have failed. The opposite effect is that presenting people with conflicting evidence moves them toward a more moderate position. In order for the polarization effect to "win", it must be strong enough to overcome this rational tendency toward moderation. The conditions for this to happen are surely idiosyncratic. The interesting thing is that it happens at all. If the original study is honestly reported and shows a clear effect, then it does happen.



Another example is a study recently published in Judgment and Decision Making (Bruni and Tufano. The value of vulnerability: The transformative capacity of risky trust, 12, 408-414, 2017). The finding of interest was that people who made themselves "vulnerable", by showing that they had trusted someone who had previously been untrustworthy, evoked more trustworthy behavior in trustees who knew of their vulnerability. Again, this result must be strong enough to counter an opposite effect: these vulnerable people could also be seen as suckers, ripe for exploitation. I suspect that this result will not replicate, but I also think it is real. (I examined the data quite carefully.) It may well depend on details of the sample of subjects, the language, and so on. This is not going to help the "replicability index" of the journal (or the impact factor, for that matter, as it is quite a complex study), but I don't care, and I shouldn't care.

Of course, other important studies simply cannot be replicated, because they involve samples of attitudes in a given time and place, e.g., studies of the determinants of political attitudes, the spread of an epidemic, or the structure of an earthquake. What often can be done instead is to look at the data.

In my view, the problem is not so much "replicability" but rather "credibility". Replications will be done when they are worth doing for other reasons. But for general credibility checking, it is probably more efficient to look at the data and the methods. To smooth the path for both replication and examination of data, journals should welcome replications (with either result when the original result is in doubt) and they should require publication of data whenever possible.

Tuesday, February 28, 2017

Explanations of deontological responses to moral dilemmas

Hundreds of experiments have now shown, in various ways, that responses to moral dilemmas often follow deontological rules rather than utilitarian theory. Deontological rules are rules that indicate whether some category of actions is required, permissible, or forbidden. Utilitarianism says that the best choice among those under consideration is the one that does the most expected good for all those affected. For example, utilitarianism implies that it is better to kill one person to save five others than not to kill (other things being equal), while some deontological rule may say that active killing is forbidden, whatever the consequences.

In many of these experiments, deontological responses (DRs) seem to be equivalent to responses that demonstrate cognitive biases in non-moral situations. For example, the omission bias favors harms of omission over less harmful harms caused by acts, in both moral and non-moral situations (Ritov & Baron, 1990). This similarity suggests that the DRs arise from some sort of error, or poor thinking. Much evidence indicates that the cognitive processes supporting moral and non-moral judgments are largely the same (e.g., Greene, 2007). If this is true, the question arises of what sort of thinking is involved, and when it occurs. Several (mutually consistent) possibilities have been suggested:

1. Dual-system theory in its simplest form ("default interventionist" or "sequential") says that DRs arise largely as an immediate intuitive response to a dilemma presented in an experiment, once the dilemma is understood. Then, sometimes, the subject may question the initial intuition and wind up giving the utilitarian response as a result of a second step of reflective thought. The same two-step sequence has been argued to account for many other errors in reasoning, including errors in arithmetic, problem solving, and logic. By this view, the cognitive problem that produces DRs is a failure to check, a failure to get to the second step before responding. This dual-system view has been popularized by Daniel Kahneman in his book "Thinking, Fast and Slow". I have provided evidence that it is largely incorrect (Baron & Gürçay, 2016).

2. Very similar to this sequential dual-system theory, but different, is the theory of actively open-minded thinking (AOT; Baron, 1995). AOT begins from a view of thinking as search and inference. We search for possible answers to the question at hand, arguments or evidence for or against one possible answer or another, and criteria or values to apply when we evaluate the relative strengths of the answers in view of the arguments at hand. AOT avoids errors in thinking by searching for alternative possibilities, and for arguments and goals that might lead to a higher evaluation of possible answers other than those that are already strong. By this view, the main source of errors is that thinking is insufficiently self-critical; the thinker looks for support for possibilities that are already strong and fails to look for support for alternatives. In the case of moral dilemmas, the DRs would be those that are already strong at the outset of thinking and would not be subject to sufficient questioning, even though additional thinking may proceed to bolster these responses. The main difference between this view and the sequential dual-system view is that AOT is concerned with the direction of thinking, not the extent of it, although of course there must be some minimal extent if self-criticism is to occur. AOT also defines direction as a continuous quantity, so it does not assume all-or-none "reflection or no reflection". By this account, utilitarian and deontological responses need not differ in the amount of time or effort required for them. Bolstering and questioning need not differ, in either direction, in their processing demands.

3. A developmental view extends the AOT view to what happens outside of the experiment (Baron, 2011). Moral principles develop over many years, and they may change as a result of questioning and external challenges. DRs may arise early in development, but that may also depend on the child's environment, how morality is taught. Reflection may lead to increasingly utilitarian views as people question the justification of DRs, especially in cases where following these DRs leads to obviously harmful outcomes. When subjects are faced with moral dilemmas in experiments, they largely apply the principles that they have previously developed, which may be utilitarian, deontological or (most often) both.

4. We can replace "development of the individual" with "social evolution of culture" (Baron, in press). Historically, morality may not have been distinguished from formal law until relatively recently. Law takes the form of DRs. Cultural views persist, historically, even when some people have replaced them with other ways of thinking. Kohlberg has suggested that this sequence happens in development, where the distinction between morality and law is made fairly late. Thus, the course of individual development may to some extent recapitulate the history of cultures.

These alternatives have somewhat different implications for the question of how to make people more utilitarian, if that is what we want to do. (I do.) But the implications are not that different. A view that is consistent with all of them is to emphasize reflective moral education, presenting arguments for and against utilitarian solutions, and encouraging students to think of such arguments themselves (Baron, 1990).

Recently I and others have written several articles criticizing the sequential dual-system view of moral judgment and other tasks, such as problem solving in logic and mathematics (e.g., Baron & Gürçay, 2016; Pennycook et al., 2014). I think it is apparent that, at least in the moral domain, the role of different mechanisms is not a big deal. All these views are consistent with the more general claim that DRs can be understood as errors, and that they need not be seen as "hard wired", but, rather, malleable.

References

Baron, J. (1990). Thinking about consequences. Journal of Moral Education, 19, 77–87.

Baron, J. (1995). Myside bias in thinking about abortion. Thinking and Reasoning, 1, 221–235.

Baron, J. (2011). Where do non-utilitarian moral rules come from? In J. I. Krueger and E. T. Higgins (Eds.) Social judgment and decision making, pp. 261–278. New York: Psychology Press.

Baron, J. (in press). Utilitarian vs. deontological reasoning: method, results, and theory. In J.-F. Bonnefon & B. Trémolière (forthcoming). Moral inferences. Hove, UK: Psychology Press.

Baron, J. & Gürçay, B. (2016). A meta-analysis of response-time tests of the sequential two-systems model of moral judgment. Memory and Cognition. doi:10.3758/s13421-016-0686-8

Greene, J. D. (2007). The secret joke of Kant’s soul. In W. Sinnott-Armstrong (Ed.), Moral psychology, Vol. 3: The neuroscience of morality: Emotion, disease, and development, pp. 36–79. Cambridge, MA: MIT Press.

Pennycook, G., Trippas, D., Handley, S. J., & Thompson, V. A. (2014). Base-rates: Both neglected and intuitive. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 544–554.

Ritov, I., & Baron, J. (1990). Reluctance to vaccinate: omission bias and ambiguity. Journal of Behavioral Decision Making, 3, 263–277.

Sunday, February 26, 2017

Two posts on climate and one on health insurance

The editors of RegBlog have accepted three of my recent posts. Rather than duplicate them here (which I am now allowed to do), I am instead making links to them:

How geographic boundaries determine the social cost of carbon;

The discount rate for the social cost of carbon;

Justifying health insurance.

All of these are philosophical comments about regulatory issues that are likely to be addressed by the Trump administration, the U.S. Congress, and possibly the courts. But the issues will persist.