Tuesday, August 8, 2023

First impressions of ChatGPT

Here are two examples of my first interactions with ChatGPT or some similar system. In the first, I asked it to construct a new moral dilemma for an experiment. I had in mind the bicycle dilemma discussed by R. M. Hare in "Moral thinking" (1981, section 6.2), concerning whether the actor should move another person's bicycle (a minor inconvenience for the bicycle owner) in order to park a car (to avoid the greater inconvenience of finding another parking space). I may now use both. The new dilemma has to be edited a bit, but it does the job, and I had never seen anything like it before. I was impressed.

=======================================================

User

I would like a moral dilemma in which the right answer is for me to be selfish because the benefit to me is greater than the harm to someone else.

ChatGPT

Alright, here's a hypothetical scenario that might suit your request:

The Dilemma of the Scarce Scholarship

You've been working exceptionally hard throughout high school, and it's finally paid off. You have the grades, the extracurricular activities, and the teacher recommendations to qualify for the most prestigious scholarship at your dream university. This scholarship covers full tuition, accommodation, and also provides a substantial monthly stipend. It's a once-in-a-lifetime opportunity that could completely change your future.

However, you're not the only one who qualifies for this scholarship. Your best friend, Alex, also qualifies. Alex comes from a more privileged background than you. Their family could afford to send Alex to the university, albeit with some financial strain. Without the scholarship, you would have to take out substantial student loans to attend, and you might even have to give up on your dream university for a less expensive option.

One day, you stumble upon an opportunity. You find out about an obscure academic competition that the scholarship board holds in high regard. Participation in this competition could significantly boost your chances of getting the scholarship, but very few people know about it.

The moral dilemma is this: Do you tell Alex about the competition or keep it to yourself?

If you tell Alex, you keep your integrity intact, but you significantly increase the competition for the scholarship. Alex, with their greater resources, might even have a better shot at winning the competition.

If you don't tell Alex, you maximize your chances of securing the scholarship, but at the cost of acting dishonestly toward your friend.

The self-interested option (not telling Alex about the competition) might be justifiable in this case. The potential benefits for you are life-changing. Alex, although potentially disadvantaged by not knowing about the competition, would still have the opportunity to attend university without the burden of crippling debt.

================================================================

The second set resulted from a letter I got from Maximus, a company I had never heard of, which apparently contracts with Medicare (the U.S. health insurance system for old people). The letter said that Maximus had been the victim of a data breach, and my records were included. They included enough information for someone to steal my identity. So I set off to find out whether the records were accurate (hoping that they were not). Simply changing my Medicare number would itself involve a fair bit of work, since many people and organizations have it on file, so I spent some time on this. The Medicare web site would not let me see the information it had about me. I called a phone number at Medicare and got to a human, who informed me that the letter was sent to me by mistake and my records were not part of the breach. I was so happy that I thanked him and said goodbye. Later I got more upset at Maximus, so I called the number they said would reach a professional familiar with the breach. The "professional" turned out to be an automated chat, and it was no help.

Next, I used the "chat" function of Medicare itself, which kept a record of our conversation; the transcript is below. I wanted to say bad things about Maximus. It now appears that corporations (Maximus) and major government agencies (Medicare) are using ChatGPT-type systems to replace humans on phone and help lines. I'm sure these things will get better, but this is not ready for prime time.

------------------- Medicare.gov Live Chat Transcript ---------------------

Live chat Date:8/5/2023

Start Time:11:14:13 am

End Time:11:28:06 am

-----------------------------------------------------------------------------

[11:14:13 am]: Thank you for contacting Medicare.gov Live Chat. 

[11:14:16 am]: Jonathan I want to report what I think is a seriously misleading and deceptive letter that I received from Maximus about my information being accessed. I would prefer to write an email, but I can't find anywhere to send it. I think that whoever wrote and approved this letter should be disciplined in some way. Should I proceed? Or can you give me an email address? 

[11:14:16 am]:  Please wait while you are connected to an agent. 

[11:14:19 am]: You are now connected with Medicare.gov Live Chat.

Thank you for contacting Medicare.gov Live Chat. My name is Julie. Can you please verify your first and last name? 

[11:14:42 am]: Jonathan Jonathan Baron 

[11:14:59 am]: Julie Hello, I will be happy to assist you with this today.  

[11:15:30 am]: Julie Who send you the letter? 

[11:16:32 am]: Jonathan At the top it said Maximus, and (on the right) CMS. The return address was in Claysburg, PA. 

[11:16:58 am]: Julie 

Thank you.

Let me look into this for you. 

[11:17:11 am]: Jonathan 'https://www.cms.gov/newsroom/press-releases/cms-responding-data-breach-contractor' is the letter I received. 

[11:19:05 am]: Julie I appreciate your patience.

You should have received a letter notifying you of a data breach. We have determined that your Medicare number was not compromised, and you don't need to request a new one. The letter provides more information about the incident and how it is being addressed. Please note that this does not have any impact on your current Medicare benefits or coverage.

[11:19:12 am]: Julie Maximus, a contractor for the Centers for Medicare & Medicaid Services, recently learned that their file transfer software, called MOVEit, was compromised. They are offering 24 months of credit monitoring and other services from Experian at no cost. You can call Experian to learn more; their phone number is included in your letter. This is not available through email.  

[11:23:50 am]: Jonathan This is just what the letter said. Yes, indeed, it was incorrect, and caused me to waste a lot of time. But there were other problems. Do you want to hear about those? One problem is that the further questions suggested calling a number 'staffed with professionals familiar with this incident'. In fact, the 'professional' turned out to be (pretty clearly) an AI program like ChatGPT. I treated it that way. (ChatGPT has a thick skin. It doesn't complain if you criticize it harshly.) It could not answer my questions. I hope you are not the same robot! 

[11:25:59 am]: Julie Do you have additional questions regarding the letter you received? 

[11:27:25 am]: Jonathan No. So clearly I am wasting my time once again. You have missed the point, but I'm not going to try to get you back on track. I wanted an email address. You are clearly not going to give me one. 

[11:27:54 am]: Julie Have a nice day. 

[11:28:06 am]: The chat session has ended. Thank you for contacting Medicare.gov Live Chat. Have a nice day. 


Friday, July 14, 2023

Diversity, noise, and merit in college admission

I have not studied the recent US Supreme Court decision that ended affirmative action in college admissions, but I have followed the issue in news reports and have other relevant experiences.

For a few years including 1990-92 (details lost) I was head of the committee that supervised undergraduate admissions to the School of Arts and Sciences at Penn. My main goal was to study the "predictive index", a formula used by the admissions office to predict academic achievement after admission. The index was an equally-weighted sum of the SAT aptitude test, the mean of three achievement tests (typically including math and English), and high-school class rank. Working with Frank Norman, a colleague, I discovered that the aptitude test was essentially useless once we had the other two predictors, and we tried to get the admissions office to drop it, as described in https://www.sas.upenn.edu/~baron/sat.htm/.
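For concreteness, here is a minimal sketch of such an index and of the kind of test we ran. Since the details of the original formula are lost, I assume each component is standardized before the equal-weighted sum; the function names and the use of class-rank percentile are mine:

```python
# Hypothetical sketch of a "predictive index" like the one described above.
# Assumption: each component is z-scored before the equal-weighted sum.
import numpy as np

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def predictive_index(sat, achievement_mean, class_rank_pct):
    """Equal-weighted sum of standardized components.
    class_rank_pct: percentile rank in high-school class (higher = better)."""
    return zscore(sat) + zscore(achievement_mean) + zscore(class_rank_pct)

# The test of whether the SAT adds anything: compare the fit of a model
# with all three predictors to one with only the other two.
def incremental_r2(college_gpa, sat, achievement_mean, class_rank_pct):
    full = np.column_stack([zscore(sat), zscore(achievement_mean),
                            zscore(class_rank_pct), np.ones(len(sat))])
    reduced = full[:, 1:]  # drop the SAT column
    def r2(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()
    y = np.asarray(college_gpa, dtype=float)
    return r2(full, y) - r2(reduced, y)  # near zero = SAT essentially useless
```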

Around the same time, I attended a meeting of Ivy League admissions people at Harvard. This was after the "Ivy overlap" meeting was abolished by overzealous anti-trust action, as described in https://news.mit.edu/1992/history-0903. The overlap meeting was a discussion of applicants for financial aid at more than one college in the elite group. It was designed to ensure that colleges were not competing for applicants on the basis of financial aid, hence ensuring that the "colluding" colleges would "admit students solely on the basis of merit and distribute their scholarship money solely on the basis of need." When this policy was in effect, Penn had a hard time funding need-blind admissions, so it limited this policy to Americans, although Harvard did not. The meeting was to discuss the situation.

There I had occasion to discuss our SAT report with the Harvard dean of admissions, William Fitzsimmons. In passing, he pointed out something that helps to explain some of the apparent irrationality of admission systems in general. He said, roughly, "We could fill the entire freshman class with students who got straight 800s [perfect scores] on all the tests." (He might have added "from India".) But, he might have gone on to say, we don't want a class that is full of academic achievers. We want variety. We want a lot of students who are satisfied with passing grades. These students may not become scholars, but they may benefit from an education that will help them and others in many different ways. And they will dilute the competitive atmosphere that would result from a focus on achievement alone. (Is that what "merit" means when people argue that merit should be the sole basis for admission?) So we must use other criteria.

The usual way of getting the desired variety/diversity is for admissions staff to read applications and make judgments about "character" or something like that. The psychological literature on selection is fairly clear that such judgments are very poor at predicting anything, in contrast to measures like grades and test scores (like our predictive index, suitably revised), which are pretty good at predicting other grades and test scores. The main result of these "personal" judgments is to add noise, random error, to the process (unfortunately at considerable cost, since the admissions staff are well paid, and this is a substantial part of their jobs).

In sum, current admissions policy at Harvard, Penn, and similar colleges, is not based on "merit" alone, if that is taken to mean prediction of academic achievement. Instead, "pretty good" applicants who rank high (but not at the very top) on merit are rejected so that applicants lower in merit can be admitted for the sake of diversity. These decisions about acceptance and rejection are made in ways that are noisy, similar to the results of a lottery.

The same sort of noise was introduced by the 1978 Bakke decision of the Supreme Court, which prohibited colleges from doing affirmative action by exactly the method that would have been best according to the psychology literature: rank all applicants by objective criteria and then, if you want more Blacks, use the same ranking but a lower cut-off. This method would optimize the academic performance of both groups. And, in particular, it would minimize the number of affirmative-action admits who were not ready for academic work of the sort expected at the elite colleges. These students exist, and many of them suffer from failure that would not have occurred had they gone to a less demanding college. But the court demanded that students be evaluated "holistically", which served to increase noise and not do much else.
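To make the prohibited method concrete, here is a minimal sketch; the groups, scores, and cut-offs are invented for illustration:

```python
# Sketch of the cut-off method described above: one objective ranking for
# everyone, with a lower cut-off for the group you want more of. Within each
# group, only the highest-ranked applicants are admitted; no noise is added.
def admit_by_cutoff(applicants, cutoffs, default_cutoff):
    """applicants: list of (name, group, index) tuples;
    cutoffs: dict mapping group -> minimum index for admission."""
    return [name for name, group, index in
            sorted(applicants, key=lambda a: -a[2])
            if index >= cutoffs.get(group, default_cutoff)]

# Hypothetical pool:
pool = [("A", "majority", 3.1), ("B", "minority", 2.6),
        ("C", "majority", 2.7), ("D", "minority", 2.2)]
# Same ranking, lower cut-off for the under-represented group:
print(admit_by_cutoff(pool, {"minority": 2.5}, default_cutoff=3.0))
# -> ['A', 'B']
```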

Now it seems that many colleges intend to do more holistic admissions in hopes of finding minority students who deserve admission. Note that this policy is fully consistent with Fitzsimmons' desire for diversity. A big problem is that, by definition, elite colleges are those that teach courses at a fairly high level, for example, those that use my textbook "Thinking and Deciding" in undergraduate courses. Students with weak numeracy skills will have trouble. Thus, it is still important to select students who can do the work. Failures in courses and in graduation itself are bad outcomes and should not be seen as a "cost we must pay" for diversity.

Thus the big trouble with holistic judgment as a way of promoting diversity is that it will lead to admission of too many students who are predictably not ready for academic work at the level required. These students are "cannon fodder" for the supposedly enlightened policy that admits them.

Ideally, colleges that use any sort of holistic criteria should also use the best statistical predictors to eliminate applicants who are high risk for failure, thus setting a lower bound to be applied to the students already selected by holistic criteria. This won't be perfect, but it seems worth a try. The Supreme Court, loose cannon that it is, might find that it is unconstitutional, just as it prohibited the optimal use of predictive indices for affirmative action.
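A bare-bones sketch of that two-stage idea, with the predictor, the floor, and the holistic selector all left as parameters, since none of them is specified here:

```python
# Sketch of the two-stage procedure suggested above: first apply a statistical
# floor that screens out applicants at high risk of predictable failure, then
# let the college's own holistic process (however noisy) choose among the rest.
# 'index' and 'floor' stand in for whatever validated predictor a college uses.
def floor_then_holistic(applicants, index, floor, holistic_pick):
    viable = [a for a in applicants if index(a) >= floor]
    return holistic_pick(viable)
```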

When I was a student at Harvard 1962-66, it was just beginning a sort of affirmative action. My first-year roommate was Black. Charlie was a friend from Andover, an elite prep school that we both attended. We both would have been admitted if admission had been based fully on merit, but Harvard often rejected pretty-good students like Charlie, or me, just to make room for those less qualified. But he was Black and clearly capable of doing the work. So they took him. (And so did Harvard Law School, 4 years later.)

And I was a legacy. My father, class of '37, was admitted at a time when Harvard was trying not to admit too many Jews (a situation that seems analogous to its attitude toward Asians today). My son also went there, and was thus 3rd generation and from a minority that was once discriminated against. Another roommate of mine after the first year was 7th generation, and at least one of his daughters also went to Harvard. Sure, this is a way of preserving privilege, and it has the effect of admitting not-so-great students, thus increasing diversity of achievement, but probably without as much risk of failure as other ways of doing this, such as sports admissions, and with less than average need for financial assistance. Family traditions are also important to some families. Legacy preference for students who need it would increase diversity simply because some of these students will not be competitive high achievers. Many other legacy students would probably be admitted anyway.

My father's situation seems similar to the situation of many "Asian" students today. Harvard in particular seems not to want "too many Asians," the outcome that would result from selection by merit alone. Yet, since merit is still part of the story, Asians who are admitted have higher test scores and high-school grades, which means that they are predicted to get better grades in college. That happens. We do not need some psychological explanation of why Asian students are relatively high achievers in college. The same used to be true of Jews, but now I think that colleges are no longer biased against them. (A small literature on "calibration" and "predictive bias" looks at the possibility of determining systemic or intentional bias on the basis of performance after selection. I think that it would show bias against Asians in some colleges today.)

In 1962, women were admitted to Radcliffe College, which was much smaller than Harvard College. As a result, the cut-off for admission of women was higher, and "Cliffies" got better grades and often dominated class discussions. Women are not generally that much smarter than men. We had a biased sample.

On the other side, affirmative action for under-represented minorities means that they are admitted with lower test scores and high-school grades. This fact can (fully or partially) explain why they get lower grades once admitted.

In sum, merit still matters, and it still predicts achievement. When students are more selected, they will achieve more on the average. Students selected with less attention to scores and grades will have lower achievement in college, although some ways of doing this seem better than others.

Some affirmative action for under-represented minorities seems reasonable (to me). Cultural diversity, which in the U.S. is correlated with "racial" diversity, is important for the education of all students. The increase in diversity is targeted and predictable, not simply random. But this is now off the table.

One response to the recent court ruling is to aim at diversity in ability to pay, by applying affirmative action to poor students (coupled with attempts to recruit them), those students who would ordinarily require maximum financial aid. Such a policy would deplete funds for financial aid, possibly increasing the difficulty of maintaining need-blind admissions for other students who required moderate amounts of aid. However, a policy favoring legacies would reduce the need for financial aid. Keeping legacy admission might help pay for more poor students.

Affirmative action for under-represented minorities, including the poor, must be done in combination with a strong effort to avoid predictable failure after admission. There is some optimum amount of affirmative action, and some colleges may have gone beyond it for minorities, although not for the poor.

More generally, admission to most selective colleges has never been done on the basis of merit alone. If other criteria are used, it is not unreasonable to want to know what they are, rather than relying on noise alone.



Wednesday, July 12, 2023

Cluster munitions for Ukraine

The New York Times editorial of July 10, "The flawed moral logic of sending cluster munitions to Ukraine," opposed the U.S. decision to do just that. It tried to rebut some of the arguments made in favor of the plan, but it missed at least one: most of the area involved would already be littered with mines and unexploded cluster munitions used (extensively) by Russia, so the additional care needed to avoid them later would be required in any case.

The editorial, the statements of governments opposed to the plan, and some of the published letters to the Times, seemed to follow the principle that these munitions are morally wrong, whatever the consequences. Such absolute principles are, in the sense I have used (e.g., Baron and Spranca, 1997), protected values. Ideological adherence to such values surely has considerable political influence. These commitments may be held unreflectively. When people are forced to confront specific situations where the principle conflicts with some other principle, such as avoiding terrible consequences, they often admit that their principle is not absolute after all (Baron and Leshner, 2000).

In "Rules of War and Moral Reasoning" (1972, http://www.jstor.org/stable/2264969), R. M. Hare criticizes the "absolutist" deontological views of Thomas Nagel, who advocated strict adherence to accepted rules, such as those prohibiting the use of poison gas, or attacks on the Red Cross.

"The defect in most deontologica theories ... is that they have no coherent rational account to give to any level of moral thought above that of the man who knows some good moral principles and sticks to them. He is a very admirable person, and to question his principles ... is indeed to 'show a corrupt mind'." However, to achieve such an account, "we have to adopt a 'two-level' approach, ... to recognize that the simple principles of the deontologist, important as they are, have their place at the level of character formation." Although we should be careful about violating principles that have been drilled into us (including those inculcated by military training), we need to be willing to override them on the basis of a higher level of analysis, that is, to make exceptions when they clearly lead to worse consequences than some alternative, even if our attachment to the broken rules leads us to feel guilty (just as the failure to prevent terrible consequences can also lead to guilt feelings).

Absolute rules represent a hardening of moral intuitions that are usually sufficient but should sometimes be overridden by more reflective reasoning, as suggested by Greene ("Moral tribes") and others. The opponents of cluster munitions seem to illustrate these hardened intuitions, which are protected values. Once having decided that cluster munitions are morally wrong, whatever the consequences, some opponents then engage in belief overkill, finding ways to ignore relevant facts on the other side, or to exaggerate the probability of harmful consequences resulting from action.



Wednesday, May 3, 2023

An example of actively open-minded thinking

 Tucker Carlson January 7, 2021 — 04:18:04 PM UTC

"A couple of weeks ago, I was watching video of people fighting on the street in Washington. A group of Trump guys surrounded an Antifa kid and started pounding the living shit out of him. It was three against one, at least. Jumping a guy like that is dishonorable obviously. It’s not how white men fight. Yet suddenly I found myself rooting for the mob against the man, hoping they’d hit him harder, kill him. I really wanted them to hurt the kid. I could taste it. Then somewhere deep in my brain, an alarm went off: this isn’t good for me. I’m becoming something I don’t want to be. The Antifa creep is a human being. Much as I despise what he says and does, much as I’m sure I’d hate him personally if I knew him, I shouldn’t gloat over his suffering. I should be bothered by it. I should remember that somewhere somebody probably loves this kid, and would be crushed if he was killed. If I don’t care about those things, if I reduce people to their politics, how am I better than he is?"

Quoted in New York Times, May 2, 2023 (and elsewhere)

https://www.nytimes.com/2023/05/02/business/media/tucker-carlson-text-message-white-men.html


Saturday, April 8, 2023

Consensus clouds

 I keep wanting to talk about a concept that doesn't seem to have a name. I propose "consensus cloud", but maybe someone can tell me that a name already exists.

The idea is that a group of people have an apparent consensus about some belief or set of beliefs. These belief sets often have their own names, such as "woke" or "MAGA". They include various forms of nationalism, such as the belief in the existence of a "Russian people". They include religious ideologies, such as the ultra-orthodox Jews in Israel, pro-Hindu politics in India, Islamist politics, and the "religious right" in the U.S.

These belief sets differ in that some of them are limited to the groups that hold the beliefs. The ultra-orthodox do not seem to care whether the rest of the world adopts their religion; they just want state support for their own communities. Other belief sets apply to everyone, such as woke ideology or, by definition, various forms of evangelical Christianity.

All these belief sets are maintained as within-group social norms. Group members want others to agree and are willing to take action to promote agreement.

All these cases also benefit from "pluralistic ignorance". Believers think that the number of co-believers is higher than it really is, in part because doubters do not make themselves known, thus avoiding the censure that would result from enforcement of a social norm.

Pluralistic ignorance is itself abetted by control of an "information space", such as a government that systematically discourages or punishes dissent, as is happening now in Russia and China. Thus a kind of stability is achieved. Many Russians seem to believe that Ukraine is part of some sort of Russian essence and thus should be a part of Russia.

Belief sets often include beliefs about why outsiders do not agree. For example, some Russians believe that resistance to their claim to Ukraine is the result of western nations' efforts to isolate Russia and hamper its development, along with the associated propaganda. Control of the information space is not necessary if the group ideology can define outsiders as part of a conspiracy against it (as Russia does). MAGA believers see public health mandates and climate protection as the result of infiltration of academia and government (the "deep state") by leftist ideologues who produce pseudo-science to support their politics. The same sort of rationale often allows people with delusions to dismiss counter-arguments as coming from part of the conspiracy against them.

The beliefs of interest seem to descend on people like a cloud that becomes a fog. People cannot see outside of it.

Of course, some people do consider alternatives and question the relevant beliefs. We might expect individual differences in adherence to any sort of consensus cloud. Those who are prone toward myside bias as a trait are more likely to join a consensus, and, therefore, those who endorse the standards of actively open-minded thinking are more resistant. Similarly, those who are prone to accept conspiracy theories are more accepting, especially when opponents are seen as part of a conspiracy.

Possibly the trait of "intellectual humility" can work both ways, as it could make people less accepting of their own conclusions and more willing to listen to others. Its effect thus depends on which conclusions and which other people are affected. Intellectual humility is not the same as actively open-minded thinking, which implies that humility is needed only when a conclusion is the result of little thinking by anyone, or the result of poor thinking.


Sunday, September 5, 2021

Vaccines for children

The U.S. is looking for trouble. Schools are opening and no children below 12 are vaccinated (except perhaps the few in clinical trials, which I hope are happening). These kids could spread COVID-19 fairly quickly. So here is an idea.

Approve emergency use for 11-year-olds. Meanwhile, monitor ages 11 and 12 for medical conditions that could be side effects. Continue to do this. Even an increase in some condition at age 11 after approval might indicate a special problem.

Then move to age 10 in a couple of weeks. And so on. This is something like a regression discontinuity design, but some of it could be based solely on comparisons after approval.
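To make the after-approval comparison concrete, here is a sketch of one way the monitoring could work: compare the change in a condition's rate at the newly approved age with the change at the next age up, which was approved earlier and serves as a control for background trends. This is a difference-in-differences, my illustration rather than part of the original idea, and all the inputs are hypothetical:

```python
# Sketch of the monitoring comparison: change in a condition's rate at the
# newly approved age, minus the change at an already-approved control age.
# A large positive value would flag a possible side effect. Hypothetical data.
def rate(cases, person_years):
    return cases / person_years

def excess_change(new_age, control_age):
    """Each argument: dict with cases and person-years before and after
    approval, e.g. {"cases_before": 12, "py_before": 1e5, ...}."""
    change_new = (rate(new_age["cases_after"], new_age["py_after"])
                  - rate(new_age["cases_before"], new_age["py_before"]))
    change_ctrl = (rate(control_age["cases_after"], control_age["py_after"])
                   - rate(control_age["cases_before"], control_age["py_before"]))
    return change_new - change_ctrl
```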

Apparently the FDA sees its duty as only toward the individual, not toward others whom the individual might infect. But data from a procedure like this might give it enough information to act, in successive steps. Something is better than nothing.

Added 2/28/2022: Vaccines for children are still a problem in the U.S. Apparently the doses were too low, and the effects were either very small (ages 3 and 4, not approved yet) or wearing off quickly (ages 5-11, approved). So the same idea could be applied with increased dosage. Now it seems we will have to wait more months for more trials.

At this point, it seems that most people want to get back to "normal", despite the existence of continued spread that could be significantly slowed by greater and more effective vaccination. A return to normal would be more reasonable if we didn't have to worry about long covid, which is apparently a debilitating chronic condition, something like that caused by Lyme disease. As others remove their masks and continue to resist vaccination, some of us are just going to get better masks and continue to avoid indoor restaurants and any kind of public gathering, even though we, too, were hoping for some sort of return to normal.


Thursday, May 10, 2018

Prediction, accommodation and pre-registration

Some scientists think that confirming a prediction made before data collection is more convincing than explaining the data after they arrive. I think that this belief is one source (among others) of support for the practice of pre-registration, in which authors publish a plan for an experiment they intend to do, explaining the hypothesis, methods, and data analysis in advance.

Paul Horwich ("Probability and evidence", 1982, pp. 108-117) 
has a complex discussion of this issue, under the heading "Prediction vs. accommodation", but I want to try to provide a more intuitive account of why I do not think that it matters.

Let me take an example based very loosely on my own research. Suppose I want to study evaluative judgment of acts vs. omissions. I got interested in this because of vaccine resistance in the real world. It seemed likely that this resistance resulted from a bias against potentially harmful acts (vaccination, with side effects) compared to potentially more harmful omissions (non-vaccination, with exposure to a natural disease). I thought that this bias was a sort of heuristic, in which people tend to think of action as inherently risky, regardless of what they are told about its risks.

I designed an experiment to examine judgments of acts and omissions, using a variety of scenarios. Some of the scenarios involved money rather than disease risk, so it was possible to include items involving monetary gains as well as losses. I expected that subjects would just be biased against omissions, period.

When I got the data, I noticed something unexpected. Yes, I found the expected bias for vaccination scenarios and for money scenarios that involved losses. In the money scenarios, subjects evaluated action leading to a loss as somewhat worse than omission (doing nothing) leading to the same loss with a higher probability. But when I looked at gains, the results were reversed. Subjects were biased toward action, not omission. They evaluated action leading to a chance of winning some money more highly than omission leading to a somewhat larger chance of winning the same amount.

It was easy to explain my results in hindsight. Action, as opposed to omission, simply amplified the effect on choice of whatever outcome it produced. Subjects were not generally biased against action: it depended on whether the outcomes of action were good or bad. The association with action served to focus attention on the effect of the action, so the action looked better if its outcome was good, and worse if its outcome was bad. (In real life, the amplification effect exists, along with the omission heuristic, but it was not unexpected, as I knew that Janet Landman had already reported it with a different dependent measure. And this "one experiment" is actually a conflation of several done in collaboration with Ilana Ritov.)

Suppose I had stopped there and reported what I expected and what I found. Compare this to a case in which I predicted (expected) the amplification effect. Should your belief in the existence of the amplification effect depend on whether I predicted it or not (all else being equal)?

Note that these two cases can be compared with a third case, in which I report the result for gains but falsely claim that I predicted it, when, actually, I expected a different result. This is the sort of thing that pre-registration of research hypotheses prevents. But, if the two cases just mentioned do not differ in the credibility of amplification as an explanation, such "cheating" would simply be a type of puffery, cheap talk designed to influence the reader without conveying any truly relevant information, like many other things that authors do to make their conclusions sound good, such as baldly asserting that the result is very important. The simple way for editors to avoid such puffery is to change all statements about prediction to statements of questions that the research might answer. Thus "I predicted a general bias toward omission" would be edited to say "I asked whether the bias toward omission was general, for gains as well as losses."

Let me now return to the original question of whether prior prediction matters. In both cases, we have two plausible explanations: a bias against omission, and an amplification effect. What determines the credibility of the amplification hypothesis? Of course, it fits the data, but that helps equally regardless of the prediction. Some doubt may remain, whether I predicted the result or not.

The other determinant of credibility of an explanation is its plausibility, compared to the plausibility of alternatives. In a well written paper, I would give my reasons for thinking that various explanations are plausible. Readers would also have their own reasons. Above I suggested the reasons for each: a heuristic against action; and an attention effect favoring actions. In a paper, I would try to spell these out "with four-part harmony" (to steal a line from Arlo Guthrie), with citations of related work (such as the "feature-positive effect"), and so on.

Should it then add anything that I predicted the result that I got? The answer to this depends on whether my prediction provides additional evidence for this explanation, that is, additional reason to believe it, aside from everything you now know about its plausibility. But, if I have done my job as a writer, I have told you everything I know that is relevant to making my final explanation (amplification) plausible. The fact that I predicted it provides additional relevant information only if I am withholding something relevant that I know. I have no incentive to do that, unless it is somewhat embarrassing. Authors might be embarrassed to say that they made the prediction because God came to them in a dream and told them what was true. Or, more realistically, "I predicted this result intuitively, for reasons I cannot explain, but you should take this seriously because my intuition is very good." As a reader, you are not going to buy that.

In some research, the favored explanation seems pretty implausible, even though it is consistent with the data presented, despite the efforts of the author to convince us that it is plausible.  These cases include some of the "gee whiz" studies in social psychology that raise questions about replicability, but also some of the research in which a precise mathematical model fits the data surprisingly well but otherwise seems to come out of the blue sky.  These cases of low plausibility are the ones where claims that the results were predicted (e.g., in a pre-registration) are thought to be most relevant.

For example, suppose I found no significant "omission bias" overall but did find it when I restricted the sample to those who identify themselves as Protestant Christians. I supported this restriction with (highly selected) quotations from Protestant texts, thus explaining the result in terms of religious doctrine. You would rightly be suspicious. You would suspect that you could find just as many quotations as I found to support the conclusion that Protestant doctrine emphasizes sins of omission as well as sins of commission, and that other religions are no different. Would it help convince you of the reality of my effect if you knew that I predicted it but didn't tell you any more about why? You might just think that I was a little nuts, and, well, lucky.

Pre-registration thus does not solve the problem posed by implausible explanations. Of course they might be true, despite being implausible, but that must be established later.  What matters in making an explanation legitimately credible are, first, its fit with the data (compared to alternative explanations) and, second, its fit with other things that we know (again, compared to alternatives). The order in which a researcher thought of things, by itself, provides no additional relevant information.

Going beyond my restatement of Horwich's arguments, analogous reasoning applies to data analysis. One of the nasty things that researchers do is fiddle with their data until they get the result they want. For example, I might fail to find a significant difference in the mean ratings of acts and omissions, but I might find a difference using the maximum rating given by each subject to omissions and to actions, across several scenarios. Pre-registration avoids this fiddling, if researchers follow their pre-registered plan. Doing this, however, discourages the researcher from making reasonable accommodations to the data as they are, such as eliminating unanticipated but nonsensical responses, or transforming data that turn out to be highly skewed.
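To make the fiddling concrete, here is a toy illustration, with simulated ratings of my own invention, of how the choice of summary statistic changes the test. Running both and reporting whichever is significant is exactly what pre-registration is meant to prevent:

```python
# The same data yield different tests depending on how each subject's ratings
# are summarized (mean vs. maximum per subject). All ratings are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical ratings: 40 subjects x 6 scenarios, for acts and for omissions.
acts = rng.normal(0.0, 1.0, size=(40, 6))
omissions = rng.normal(0.1, 1.0, size=(40, 6))

for summary in (np.mean, np.max):
    t, p = stats.ttest_rel(summary(acts, axis=1), summary(omissions, axis=1))
    print(summary.__name__, "p =", round(p, 3))
```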

But note that many of the statistical options that are used for such p-hacking do not naturally fit the data very well. Again, it is possible to make up a story about why they do fit the data, but such stories tend to be unconvincing, just like the example of Protestantism described above. Thus, data analysis, like explanations, must be "plausible" in order to be convincing.