Judgment misguided

Tuesday, June 18, 2024

"Children of a modest star"

I do not have time to read many books these days. But I managed to finish "Children of a modest star" (Jonathan S. Blake and Nils Gilman), which was highly recommended by a review in Science: https://www.science.org/doi/10.1126/science.ado2345. (The title is from a poem by W.H. Auden.)

I would say that it is about political theory, or political philosophy, but the authors draw on their extensive knowledge of intellectual history. They argue that treating the nation state as the main container of sovereignty is a fairly recent idea, and one that is already under attack. This argument appealed to me because I have written a fair bit about the vices of nationalism and the virtues of cosmopolitanism. But the book goes beyond this polarity by arguing for a principle of "subsidiarity", which holds that problems should be handled by the lowest political subdivision capable of handling them, which is often sub-national. It still argues for a "planetary" level of government, with enforcement power, as necessary for such issues as climate change, pandemic preparedness, and biodiversity. It is not, and does not claim to be, a fully worked-out plan for how things would work in the future it proposes. It presents a rough vision of where we should be headed, and even of how we might get from here to there.

Within the community of card-carrying utilitarians, I have been suspicious of "longtermism" as recommended by William MacAskill and others. It is too easy to come up with some fantasy about the long-term future, like that of one person I knew who argued that practically all of our extra resources should be spent trying to find ways to stop a large asteroid from hitting the earth, since, if we don't solve this problem, it is inevitable that this will happen eventually. (I think that we humans now have the capacity to deal with this problem, although I don't know how large an asteroid we could deflect.) Thus, I have thought that the most sensible utilitarian approach to government is to look for incremental improvements in the situation, without worrying too much about their long-term effects, which are difficult to predict. It makes sense to reduce CO2 emissions even if it turns out that, in a few decades, we will use fusion power from a single site to pull CO2 right out of the atmosphere.

Still, I found in the book an answer to the question: If you want to consider the long-term future, what aspect of it is most relevant? The answer is to look at effects on governance.

The prose is incredibly good. Almost every page has something you could put on a T-shirt. Although you can read the 215 pages of text as if this were simply a political manifesto without much anchoring in prior literature, there are 69 pages of notes at the end. All the ideas are credited to writings that inspired or preceded them.

Tuesday, April 16, 2024

Existential risks from AI?

A recent Policy Forum article in Science argues for banning certain uses of artificial intelligence (AI) (Michael K. Cohen et al., Regulating advanced artificial agents. Science 384, 36-38 (2024). DOI:10.1126/science.adl0625). The authors particularly worry about agents that use reinforcement learning (RL).

RL agents "receive perceptual inputs and take actions, and certain inputs are typically designated as 'rewards.' An RL agent then aims to select actions that it expects will lead to higher rewards. For example, by designating money as a reward, one could train an RL agent to maximize profit on an online retail platform." The authors worry that "a sufficiently capable RL agent could take control of its rewards, which would give it the incentive to secure maximal reward single-mindedly" by manipulating its environment. For example, "One path to maximizing long-term reward involves an RL agent acquiring extensive resources and taking control over all human infrastructure, which would allow it to manipulate its own reward free from human interference."

I may be missing something here, but it seems to me that the authors mis-characterize RL. In psychology, reinforcement learning does not require that the organism (or machine) place any value on reinforcement. The process would work just as well if a reinforcement ("reward") were simply an increase in the probability of the response that led to it, and a "punishment" were simply a decrease. The organism does not "try" to seek rewards or avoid punishments in general. It just responds to stimuli (situations) from a menu of possible responses, each with some response strength. The strength of a response, relative to alternative responses, determines its probability of being emitted. "Reward" and "punishment" are terms that result from excessive anthropomorphization.
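The alternative view can be sketched in a few lines. This is a minimal sketch, assuming a simple additive rule for changing response strengths (my choice for illustration; the response names, step size, and number of trials are hypothetical): each response's probability is its share of the total strength, and reinforcement only raises the strength of the response just emitted. Nothing in the learner represents or values "reward" as such.

```python
import random

def choose(strengths, rng):
    """Emit one response with probability proportional to its strength."""
    total = sum(strengths.values())
    r = rng.random() * total
    for response, s in strengths.items():
        r -= s
        if r < 0:
            return response
    return response  # guard against floating-point round-off

def reinforce(strengths, response, delta):
    """Reinforcement (delta > 0) or punishment (delta < 0) only changes the
    strength of the response just emitted; a small floor keeps it positive."""
    strengths[response] = max(0.01, strengths[response] + delta)

def probability(strengths, response):
    return strengths[response] / sum(strengths.values())

# Hypothetical situation with two responses of equal initial strength.
strengths = {"press_lever": 1.0, "groom": 1.0}
rng = random.Random(0)
for _ in range(200):
    emitted = choose(strengths, rng)
    if emitted == "press_lever":  # the environment follows lever presses with food
        reinforce(strengths, emitted, 0.5)

print(round(probability(strengths, "press_lever"), 2))
```

In this sketch, behavior shifts strongly toward the reinforced response, yet no part of the program seeks reinforcement in general, so it has no goal of taking control of its own rewards.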

It would of course be possible to build an AI system with a sense of self-interest, in which positive reinforcements were valued and purposefully sought, independently of their role in shaping behavior. But this system would not do any better at the task it is given. It might do worse, because it could be distracted by searches for other sources of "reward", as Cohen et al. suggest.

If, for some reason, AI engineers thought that a sense of self-interest would be useful, they could design a system with such a sense. It would need a feature of each possible outcome indicating its overall consistency with long-term goals (including the goal of having good experiences). And it would have to represent those goals, and processes for changing these goals and their relative strengths.

Engineers could also build in a sense of morality, so that a decision-making AI system would, like most real people, consider effects on others as well as on the self. In general, options would be favored more when they had better (or less bad) outcomes for others, and when they had better (or less bad) outcomes for the self. Effects on others would be estimated in the same way as effects on the self, in terms of the consistency of outcomes with long-term goals. Such a sense of morality could even work more reliably than it does in humans. The functional form of the self/others trade-off could be set in advance, so that psychopathy, which gives too little relative weight to effects on others, would be avoided.
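One way to make a trade-off "set in advance" concrete is a weighted sum with a fixed minimum weight on others. This is a hypothetical sketch: the weighting rule, the floor value, and the option values are my assumptions, not a proposal from Cohen et al.

```python
def option_value(self_outcome, others_outcome, w_others):
    """Weighted sum of goal consistency for self and for others, assumed to be
    on a common scale. With 0 < w_others < 1, the value rises with either term."""
    return (1 - w_others) * self_outcome + w_others * others_outcome

MIN_W_OTHERS = 0.3  # hypothetical floor fixed at design time

def choose(options, w_others):
    """options: (name, outcome_for_self, outcome_for_others) triples."""
    w = max(w_others, MIN_W_OTHERS)  # settings below the floor are overridden
    return max(options, key=lambda o: option_value(o[1], o[2], w))[0]

options = [
    ("grab_resources", 10.0, -50.0),  # good for self, very bad for others
    ("cooperate", 6.0, 5.0),
]
print(choose(options, w_others=0.0))  # cooperate
```

Without the floor, a weight of zero on others would make "grab_resources" the best option; the floor rules out that psychopathic setting.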

If self-interest is to be included, then morality should be included too. It is difficult to see why an engineer would intentionally build a system with self-interest unchecked by morality. That seems to be the sort of system that Cohen et al. imagine.

Algorithm aversion and AI

Recently many people have expressed concerns, some to the point of near panic, about recent advances in artificial intelligence (AI). They think AI can now do great harm, even to the point of ending civilization as we know it. Some of these harms are obvious and also difficult to prevent. Autocrats and other bad actors - such as people who now create phishing sites or ransomware - will use AI software to do their jobs better, just as governments, scientists, law enforcers, and businesses of all sorts will do the same for their respective jobs. Identification of individuals, for purposes of harassing them, will become easier, just as the Internet itself made this, and much else, good and bad, easier. Other technologies, such as the telephone, postal system, and telegraph, have also been used for nefarious purposes (as in "wire fraud" and "mail fraud"). The white hats will continue to fight the black hats, often with the same weapons.

Of special concern is the use of AI to make decisions about people, such as whether to give them loans, hire them for jobs, admit them to educational institutions, incarcerate them, treat them for illness, or cover the cost of such treatment. The concerns seem to involve two separate problems: one is that AI systems make errors; the other is that they could be biased against groups that already suffer from the effects of other biases, such as Blacks in the U.S.

The problem of errors in AI is part of another problem that has a large literature in psychology, beginning with Paul Meehl's "Clinical and statistical prediction" (1954) and then followed up by Robyn Dawes, Hal Arkes, Ken Hammond, Jason Dana and many others. A general conclusion from that literature is that simple statistical models, such as multiple linear regression, are often more accurate at various classifications, such as diagnosing psychological disorders, than humans who are trained to make just such classifications and who make them repeatedly. This can be true even when the human has more information, such as a personal interview of a candidate for admission.

A second conclusion from the literature is that most people, including the judges and those who are affected, seem to prefer human judgments to statistical models. Students applying to selective colleges or graduate programs, for example, want someone to consider them as a whole person, without relying on statistical predictors. The same attitudes come up in medical diagnosis and treatment, although the antipathy to statistical models seems weaker in that area. Note that most of these statistical models are so simple that they could be applied with a pencil and paper by someone who remembers how to do arithmetic that way. Recent improvements in AI have resulted from the enhanced capacities of modern computers, which allow them to learn from huge numbers of examples how to make classifications correctly with much more complex formulas, so complex that the designers of the programs do not know what the formulas are. These models are better than those that can be applied on a small piece of paper, but the issues are much the same. If anything, the issues are more acute exactly because the models are better. If the older, simpler models were better than humans, then these new ones are better still.

Note that, although some studies fail to find a preference for humans over computers on the average, such results do not arise because all the subjects are indifferent between humans and computers. Rather, they reflect differences among the subjects. The average result can favor computers over humans even if 40% of the subjects are opposed to computers. The existence of large minorities who oppose the use of AI can make adoption of AI models nearly as difficult as it would be if a majority were opposed, especially when the minority is vocal and organized.
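The arithmetic is easy to check with made-up numbers (not data from any study): if 60% of subjects favor the computer strongly and 40% favor the human moderately, the mean still favors the computer.

```python
from statistics import mean

# Hypothetical ratings on a -3..+3 scale: positive favors the computer,
# negative favors the human judge.
ratings = [3] * 60 + [-2] * 40

average = mean(ratings)
share_opposed = sum(r < 0 for r in ratings) / len(ratings)
print(average, share_opposed)
```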

AI models make errors. Before we reject or delay their use, we need to ask the fundamental question of all decision making: compared to what? We often need to "accept error to make less error" (as Hillel Einhorn put it).

The same question is relevant for the bias problem. I put aside questions about how bias should be measured, and whether some apparent biases could result, fully or partially, from real differences in the most relevant populations. When AI tools seem to be biased, would the same be true when AI is not used? The bias might be larger still when decisions are made by individual human judges, or by some simpler formula.

Sunday, December 3, 2023

More on why I am not a fan of pre-registration

This is a draft follow-up to my earlier post on prediction, accommodation, and pre-registration: https://judgmentmisguided.blogspot.com/2018/05/prediction-accommodation-and-pre.html

I argued there that some of the appeal of pre-registration is the result of a philosophical mistake, the idea that prediction of a result is better than post-hoc accommodation of the result once it is found, holding constant the fit of the result to its explanation.

Here I comment on pre-registration from the perspective of editor of Judgment and Decision Making, a journal concerned largely with applied cognitive psychology. I try to answer some common points made by the defenders of pre-registration.

1. As editor, I find myself arguing with authors who pre-registered their data analysis, when I think that their pre-registration is just wrong. Typically, the pre-reg (pre-registration document) ignores our statistics guidelines at https://jbaron.org/journal/stat.htm. For example, it proposes some sort of statistical control, or test for removable interactions. Although it is true that authors do not need to do just what they say in the pre-reg, they must still explain why they changed, and some authors still want to fully report both their pre-registered analysis and what I think is the correct one.

I don't see why pre-registration matters here. For example, one common issue is what to exclude from the main analysis. Often the pre-reg specifies what will be excluded, such as the longest 10% of response times, but I often judge this idea to be seriously inferior to something else, such as using a log transform. (The longest times may even reflect the most serious responding, and their outsized influence on statistics can usually be largely eliminated by transformation.) The author might argue that both should be reported because the 10% idea was thought of beforehand. But does it matter when you think of it? If it is such an obvious alternative to using the log, then you could think of it after collecting the data. (This is related to my blog post mentioned earlier.) If the main analysis will now be based on logs, it doesn't even matter if the decision to use 10% was thought of after finding that it yielded clearer results (p-hacking).
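A quick illustration with made-up response times for one subject (in seconds, with one very slow response) compares the raw mean, the mean after dropping the slowest 10%, and the geometric mean, which is the mean of the log times converted back to seconds. The numbers are hypothetical.

```python
import math

def trimmed_mean_drop_slowest(times, fraction=0.10):
    """Drop the slowest `fraction` of responses, then average the rest."""
    kept = sorted(times)[: len(times) - int(len(times) * fraction)]
    return sum(kept) / len(kept)

def geometric_mean(times):
    """Mean of log times, transformed back to seconds. Every response is
    kept, but long times have much less leverage than on the raw scale."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

times = [2.1, 2.4, 2.6, 3.0, 3.2, 3.5, 3.9, 4.4, 5.0, 40.0]
raw_mean = sum(times) / len(times)        # pulled up by the 40 s response
trimmed = trimmed_mean_drop_slowest(times)  # discards that response entirely
geo = geometric_mean(times)                 # keeps it, with modest influence
print(round(raw_mean, 2), round(trimmed, 2), round(geo, 2))
```

Trimming throws the slow response away; the log transform keeps it while reducing its leverage, which matters if the slowest responses are the most careful ones.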

2. It may be argued that pre-registration encourages researchers to think ahead. It might do that, but it would be a subtle effect, as it may lead to thinking about issues that would be considered anyway.

The most common failure to think ahead is to neglect alternative explanations of an expected result. You can find that in pre-registrations as well as submitted papers. Maybe pre-registration helps a little, like a nudge. But the most common alternative explanations I see seem to be things like reversed causality (mediator vs. DV), or third-variable causality, in mediation analysis. Pre-regs sometimes propose mediation analyses without thinking of these potential problems. Another common alternative explanation is that interactions are due to scaling effects (hence "removable"). I have never seen anyone think of this in advance. Most people haven't heard of this problem (despite Loftus's 1978 paper in Memory and Cognition). Nor have they heard of the problem with statistical control (again pointed out decades ago, by Kahneman among many others), which they also put in pre-regs.
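A minimal numerical sketch of a removable interaction, with hypothetical cell means from a 2 x 2 design in which the two factors combine multiplicatively. On the raw scale the effect of B is larger at the second level of A, which looks like an interaction; after a log transform, which preserves the order of the cell means, the interaction contrast is zero.

```python
import math

# Hypothetical cell means, factor A (rows) by factor B (columns).
# Each factor doubles the response: mean = 1 * 2**a * 2**b.
means = [[1.0, 2.0],
         [2.0, 4.0]]

def interaction_contrast(m):
    """(effect of B at A2) - (effect of B at A1); zero means no interaction."""
    return (m[1][1] - m[1][0]) - (m[0][1] - m[0][0])

raw = interaction_contrast(means)
logged = interaction_contrast([[math.log(x) for x in row] for row in means])
print(raw)     # 1.0: an apparent interaction on the raw scale
print(logged)  # essentially zero after the log transform
```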

3. Does pre-registration protect against p-hacking anyway? Psychology papers are usually multi-study. You can pre-register one study at a time, and that is what I usually (always?) see. So you don't have to report the related studies you did that didn't work, even if you pre-registered each one, although honest reporting would do that anyway. This is a consequence of the more general problem that pre-registration does not require making the results public whether the study works or not. Unlike some clinical trials, you can pre-register a study, do it, find that the result fails to support the hypothesis tested, and put the study in the file drawer. In principle, you can even pre-register two ways of doing the same study or analysis and then refer to the pre-registration that fits better when you write the paper. (I suspect that this has never happened. Unlike failing to report those studies that failed, this would probably be considered unethical. But, if a journal starts to REQUIRE pre-registration, the temptation might be much greater.)
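How much "citing the better pre-registration" could matter can be sketched with a simulation in which the null hypothesis is always true. The assumptions are mine: each pre-registered version is an independent study of 30 observations, tested with a two-sided z-test at .05. Two analyses of the same data would be correlated, so the inflation there would be smaller than shown.

```python
import random
from statistics import NormalDist

def p_value_null_study(rng, n=30):
    """Two-sided z-test of mean = 0 on n draws from N(0, 1),
    so the null hypothesis is true by construction."""
    m = sum(rng.gauss(0, 1) for _ in range(n)) / n
    z = m * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(k_preregs, sims=10_000, alpha=0.05, seed=1):
    """Share of simulated projects reporting p < alpha when the author runs
    k independent pre-registered versions and cites the best one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        best = min(p_value_null_study(rng) for _ in range(k_preregs))
        hits += best < alpha
    return hits / sims

print(false_positive_rate(1))  # near the nominal .05
print(false_positive_rate(2))  # near 1 - .95**2, about .0975
```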

4. What do you do to detect p-hacking, without a pre-reg? I ask whether the analysis done is reasonable or whether some alternative approach would be much better. If a reasonable analysis yields p=.035 for the main hypothesis test, this is a weak result anyway, and it doesn't matter whether it was chosen because some other reasonable analysis yielded p=.051. Weak results are often so strongly consistent with what we already know that they are still very likely to be real. If they are surprising, it is time to ask for another study. Rarely, I find that it is helpful to look at the data; this sometimes happens when the result is reasonable but the analysis looks contrived, so I wonder what is going on.

Pre-registration inhibits the very helpful process of looking at the data before deciding how to proceed with the analysis. This exploration is so much part of my own approach to research that I could not possibly pre-register anything about data analysis. For example, in determining exclusions I often look at something like the distribution of (mean log) response times for the responses to individual items. I often find a cluster of very fast responders, separate from the rest. Sometimes the subjects in these clusters give the same response to every question, or their responses are insensitive to compelling variations that ought to affect everyone. I do this before looking at the effects of removing these subjects on the final results.
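A sketch of that kind of screen, with hypothetical data for six subjects and a rule of my own standing in for visual inspection of the distribution: compute each subject's mean log response time, sort the subjects, and flag those below the widest gap in the faster half, if that gap exceeds a threshold. A second check asks whether the flagged subjects gave the same answer to every item. The gap rule and the threshold are assumptions, not a recommended procedure.

```python
import math

def mean_log_rt(times):
    return sum(math.log(t) for t in times) / len(times)

def fast_cluster(rt_by_subject, min_gap=0.5):
    """Flag subjects below the widest gap (in log units) among the faster half
    of sorted subject means. min_gap is a hypothetical threshold."""
    means = sorted((mean_log_rt(t), s) for s, t in rt_by_subject.items())
    gaps = [(means[i + 1][0] - means[i][0], i) for i in range(len(means) // 2)]
    if not gaps:
        return set()
    gap, i = max(gaps)
    return {s for _, s in means[: i + 1]} if gap >= min_gap else set()

# Hypothetical per-item response times (seconds) and answers for six subjects.
rts = {
    "s1": [0.6, 0.7, 0.5], "s2": [0.8, 0.6, 0.7], "s3": [4.0, 5.5, 6.1],
    "s4": [5.0, 3.9, 7.2], "s5": [6.3, 4.8, 5.1], "s6": [3.5, 6.0, 4.4],
}
answers = {
    "s1": [4, 4, 4], "s2": [4, 4, 4], "s3": [2, 5, 3],
    "s4": [1, 4, 5], "s5": [3, 3, 6], "s6": [2, 6, 4],
}

flagged = fast_cluster(rts)
# Second check: do the fast responders give the same answer to every item?
same_answer = {s for s in flagged if len(set(answers[s])) == 1}
print(sorted(flagged), sorted(same_answer))
```

Both checks use only the response times and answers, so they can be run before looking at how exclusions change the final results.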

5. It seems arrogant to put your own judgment ahead of the authors'.

When it comes to judging other people's papers as editor, I think that the relationship between author and editor is not one of equality. I do not need to give equal weight to the author's judgment as reflected in the pre-reg, just as I do not need to give equal weight to the reviewers' opinions and my own.

When I handle a paper, I am the editor. It is my job to enforce my own standards, not to enforce some consensus in which everyone is equal. (I see no point in that when the apparent consensus produces so much junk. "Peer reviewers" are in such demand that any warm body will do. The situation is worst in grant-review panels, which usually don't have an "editor" in charge.) Some editors are better than others. There is nothing I can do about that. (Two of the best were Ray Nickerson and Frank Yates, both of whom died recently.) Journal editors are like judges in the legal system, CEOs of corporations, or deans of universities. They are given special status. We hope they live up to their status.

Tuesday, August 8, 2023

First impressions of ChatGPT

Here are two examples of my first interactions with ChatGPT or some similar system. In the first, I asked it to construct a new moral dilemma for an experiment. I had in mind the bicycle dilemma discussed by R. M. Hare in "Moral thinking" (1981, section 6.2), which concerns whether the actor should move another person's bicycle (a minor inconvenience for the bicycle owner) in order to park a car (avoiding the greater inconvenience of finding another parking space). I may use both dilemmas now. This one has to be edited a bit, but it does the job, and I had never seen anything like it before. I was impressed.

=======================================================

User

I would like a moral dilemma in which the right answer is for me to be selfish because the benefit to me is greater than the harm to someone else.

ChatGPT

Alright, here's a hypothetical scenario that might suit your request:

The Dilemma of the Scarce Scholarship

You've been working exceptionally hard throughout high school, and it's finally paid off. You have the grades, the extracurricular activities, and the teacher recommendations to qualify for the most prestigious scholarship at your dream university. This scholarship covers full tuition, accommodation, and also provides a substantial monthly stipend. It's a once-in-a-lifetime opportunity that could completely change your future.

However, you're not the only one who qualifies for this scholarship. Your best friend, Alex, also qualifies. Alex comes from a more privileged background than you. Their family could afford to send Alex to the university, albeit with some financial strain. Without the scholarship, you would have to take out substantial student loans to attend, and you might even have to give up on your dream university for a less expensive option.

One day, you stumble upon an opportunity. You find out about an obscure academic competition that the scholarship board holds in high regard. Participation in this competition could significantly boost your chances of getting the scholarship, but very few people know about it.

The moral dilemma is this: Do you tell Alex about the competition or keep it to yourself?

If you tell Alex, you keep your integrity intact, but you significantly increase the competition for the scholarship. Alex, with their greater resources, might even have a better shot at winning the competition.

If you don't tell Alex, you maximize your chances of securing the scholarship, but at the cost of acting dishonestly toward your friend.

The self-interested option (not telling Alex about the competition) might be justifiable in this case. The potential benefits for you are life-changing. Alex, although potentially disadvantaged by not knowing about the competition, would still have the opportunity to attend university without the burden of crippling debt.

================================================================

The second set resulted from a letter I got from Maximus, a company I never heard of, which apparently contracts with Medicare (the U.S. health insurance system for old people). The letter said that Maximus had been the victim of a data breach, and my records were included. They included enough information for someone to steal my identity. So I set off to find out whether the records were accurate (hoping that they were not). Simply changing my Medicare number would itself involve a fair bit of work, since many people have it, so I spent some time on it. The Medicare web site would not let me see the information it had about me. I called a phone number at Medicare and got to a human, who informed me that the letter was sent to me by mistake and my records were not part of the breach. I was so happy that I thanked him and said good bye. Later I got more upset at Maximus, so I called the number they said would reach a professional familiar with the breach. The "professional" turned out to be an automated chat, and it was no help.

Next, I used the "chat" function of Medicare itself, which kept a record of our conversation. I wanted to say bad things about Maximus. The transcript is below. It now appears that corporations (Maximus) and major government agencies (Medicare) are using ChatGPT-type systems to replace humans on phone and help lines. I'm sure these things will get better, but this is not ready for prime time.

------------------- Medicare.gov Live Chat Transcript ---------------------

Live chat Date:8/5/2023

Start Time:11:14:13 am

End Time:11:28:06 am

-----------------------------------------------------------------------------

[11:14:13 am]: Thank you for contacting Medicare.gov Live Chat.

[11:14:16 am]: Jonathan I want to report what I think is a seriously misleading and deceptive letter that I received from Maximus about my information being accessed. I would prefer to write an email, but I can't find anywhere to send it. I think that whoever wrote and approved this letter should be disciplined in some way. Should I proceed? Or can you give me an email address?

[11:14:16 am]: Please wait while you are connected to an agent.

[11:14:19 am]: You are now connected with Medicare.gov Live Chat.

Thank you for contacting Medicare.gov Live Chat. My name is ulie. Can you please verify your first and last name?

[11:14:42 am]: Jonathan Jonathan Baron

[11:14:59 am]: Julie Hello, I will be happy to assist you with this today.

[11:15:30 am]: Julie Who send you the letter?

[11:16:32 am]: Jonathan At the top it said Maximus, and (on the right) CMS. The return address was in Claysburg, PA.

[11:16:58 am]: Julie

Thank you.

Let me look into this for you.

[11:17:11 am]: Jonathan 'https://www.cms.gov/newsroom/press-releases/cms-responding-data-breach-contractor' is the letter I received.

[11:19:05 am]: Julie I appreciate your patience.

You should have received a letter notifying you of a data breach. We have determined that your Medicare number was not compromised, and you don't need to request a new one. The letter provides more information about the incident and how it is being addressed. Please note that this does not have any impact on your current Medicare benefits or coverage.

[11:19:12 am]: Julie Maximus, a contractor for the Centers for Medicare & Medicaid Services, recently learned that their file transfer software, called MOVEit, was compromised. They are offering 24 months of credit monitoring and other services from Experian at no cost. You can call Experian to learn more; their phone number is included in your letter. This is not available through email.

[11:23:50 am]: Jonathan This is just what the letter said. Yes, indeed, it was incorrect, and caused me to waste a lot of time. But there were other problems. Do you want to hear about those? One problem is that the further questions suggested calling a number 'staffed with professionals familiar with this incident'. In fact, the 'professional' turned out to be (pretty cleary) an AI program like ChatGPT. I treated it that way. (ChatGPT has a thick skin. It doesn't complain if you criticize it harshly.) It could not answer my questions. I hope you are not the same robot!

[11:25:59 am]: Julie Do you have additional questions regarding the letter you received?

[11:27:25 am]: Jonathan No. So clearly I am wasting my time once again. You have missed the point, but I'm not going to try to get you back on track. I wanted an email address. You are clearly not going to give me one.

[11:27:54 am]: Julie Have a nice day.

[11:28:06 am]: The chat session has ended. Thank you for contacting Medicare.gov Live Chat. Have a nice day.

Friday, July 14, 2023

Diversity, noise, and merit in college admission

I have not studied the recent US Supreme Court decision that ended affirmative action in college admissions, but I have followed the issue in news reports and have other relevant experiences.

For a few years including 1990-92 (details lost) I was head of the committee that supervised undergraduate admissions to the School of Arts and Sciences at Penn. My main goal was to study the "predictive index", a formula used by the admissions office to predict academic achievement after admission. The index was an equally-weighted sum of the SAT aptitude test, the mean of three achievement tests (typically including math and English), and high-school class rank. Working with Frank Norman, a colleague, I discovered that the aptitude test was essentially useless once we had the other two predictors, and we tried to get the admissions department to drop it, as described in https://www.sas.upenn.edu/~baron/sat.htm/.
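An equally weighted index of this kind is easy to sketch. The description above does not say how the three predictors were put on a common scale, so standardizing each one across applicants (z-scores) is my assumption, and the applicant numbers are made up. Dropping the aptitude test amounts to deleting one argument.

```python
from statistics import mean, pstdev

def zscores(xs):
    """Standardize one predictor across the applicant pool."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def predictive_index(*predictors):
    """Equal-weight sum of standardized predictors; each argument is a list
    with one value per applicant, in the same applicant order."""
    standardized = [zscores(p) for p in predictors]
    return [sum(vals) for vals in zip(*standardized)]

# Hypothetical applicants (four of them).
sat = [1400, 1300, 1250, 1450]      # aptitude test
achievement = [700, 650, 690, 610]  # mean of three achievement tests
class_rank = [95, 80, 99, 70]       # percentile in high-school class

with_sat = predictive_index(sat, achievement, class_rank)
without_sat = predictive_index(achievement, class_rank)  # aptitude test dropped
print([round(x, 2) for x in with_sat])
print([round(x, 2) for x in without_sat])
```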

Around the same time, I attended a meeting of Ivy League admissions people at Harvard. This was after the "Ivy overlap" meeting was abolished by overzealous anti-trust action, as described in https://news.mit.edu/1992/history-0903. The overlap meeting was a discussion of applicants for financial aid at more than one college in the elite group. It was designed to ensure that colleges were not competing for applicants on the basis of financial aid, hence ensuring that the "colluding" colleges would "admit students solely on the basis of merit and distribute their scholarship money solely on the basis of need." When this policy was in effect, Penn had a hard time funding need-blind admissions, so it limited this policy to Americans, although Harvard did not. The meeting was to discuss the situation.

There I had occasion to discuss our SAT report with the Harvard dean of admissions, William Fitzsimmons. In passing, he pointed out something that helps to explain some of the apparent irrationality of admission systems in general. He said, roughly, "We could fill the entire freshman class with students who got straight 800s [perfect scores] on all the tests." (He might have added "from India".) But, he might have gone on to say, we don't want a class that is full of academic achievers. We want variety. We want a lot of students who are satisfied with passing grades. These students may not become scholars, but they may benefit from an education that will help them and others in many different ways. And they will dilute the competitive atmosphere that would result from a focus on achievement alone. (Is that what "merit" means when people argue that merit should be the sole basis for admission?) So we must use other criteria.

The usual way of getting the desired variety/diversity is for admissions staff to read applications and make judgments about "character" or something like that. The psychological literature on selection is fairly clear that such judgments are very poor at predicting anything, in contrast to measures like grades and test scores (like our predictive index, suitably revised), which are pretty good at predicting other grades and test scores. The main result of these "personal" judgments is to add noise, random error, to the process (unfortunately at considerable cost, since the admissions staff are well paid, and this is a substantial part of their jobs).

In sum, current admissions policy at Harvard, Penn, and similar colleges, is not based on "merit" alone, if that is taken to mean prediction of academic achievement. Instead, "pretty good" applicants who rank high (but not at the very top) on merit are rejected so that applicants lower in merit can be admitted for the sake of diversity. These decisions about acceptance and rejection are made in ways that are noisy, similar to the results of a lottery.

The same sort of noise was introduced by the 1978 Bakke decision of the Supreme Court, which prohibited colleges from doing affirmative action by exactly the method that would have been best according to the psychology literature, which is to rank all applicants by objective criteria and then, if you want more Blacks, use the same ranking but a lower cut-off. This method would optimize the academic performance of both groups. And, in particular, it would minimize the number of affirmative-action admits who were not ready for academic work of the sort expected at the elite colleges. These students exist, and many of them suffer from failure that would not have occurred had they gone to a less demanding college. But the court demanded that students be evaluated "holistically", which served to increase noise and not do much else.
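A minimal sketch of that method, with hypothetical index scores, group labels, and cut-offs: all applicants are ranked on the same index, and the groups differ only in the cut-off.

```python
def admit(applicants, cutoffs):
    """applicants: (name, group, index_score) triples.
    cutoffs: minimum index score per group (hypothetical values).
    Returns admitted names, highest predicted performance first."""
    ranked = sorted(applicants, key=lambda a: a[2], reverse=True)
    return [name for name, group, score in ranked if score >= cutoffs[group]]

applicants = [
    ("A", "majority", 1.9), ("B", "majority", 1.2), ("C", "majority", 0.8),
    ("D", "minority", 1.5), ("E", "minority", 0.7), ("F", "minority", 0.1),
]
print(admit(applicants, {"majority": 1.0, "minority": 0.5}))
# ['A', 'D', 'B', 'E']
```

Within each group, everyone admitted has a higher predicted score than everyone rejected, which is the sense in which this rule makes the best use of the predictor for both groups.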

Now it seems that many colleges intend to do more holistic admissions in hopes that they could find minority students who deserved admission. Note that this policy is fully consistent with Fitzsimmons' desire for diversity. A big problem is that, by definition, elite colleges are those that teach courses at a fairly high level, for example, those that use my textbook "Thinking and Deciding" in undergraduate courses. Students with weak numeracy skills will have trouble. Thus, it is still important to select students who can do the work. Failures in courses and in graduation itself are bad outcomes and should not be seen as a "cost we must pay" for diversity.

Thus the big trouble with holistic judgment as a way of promoting diversity is that it will lead to admission of too many students who are predictably not ready for academic work at the level required. These students are "cannon fodder" for the supposedly enlightened policy that admits them.

Ideally, colleges that use any sort of holistic criteria should also use the best statistical predictors to eliminate applicants who are high risk for failure, thus setting a lower bound to be applied to the students already selected by holistic criteria. This won't be perfect, but it seems worth a try. The Supreme Court, loose cannon that it is, might find that it is unconstitutional, just as it prohibited the optimal use of predictive indices for affirmative action.

When I was a student at Harvard 1962-66, it was just beginning a sort of affirmative action. My first-year roommate was Black. Charlie was a friend from Andover, an elite prep school that we both attended. We both would have been admitted if admission were based fully on merit, but Harvard often rejected pretty-good students like Charlie, or me, just to make room for those less qualified. But he was Black and clearly capable of doing the work. So they took him. (And so did Harvard Law School, 4 years later.)

And I was a legacy. My father, class of '37, was admitted at a time when Harvard was trying not to admit too many Jews (a situation that seems analogous to its attitude toward Asians today). My son also went there, and was thus third generation and from a minority that was once discriminated against. Another of my roommates after the first year was seventh generation, and at least one of his daughters also went to Harvard. Sure, legacy admission is a way of preserving privilege, and it admits some not-so-great students, thus increasing diversity of achievement, but probably with less risk of failure than other ways of doing this, such as sports admissions, and with less than average need for financial assistance. Family traditions also matter to some families. Legacy preference for the students who need it would increase diversity simply because some of these students will not be competitive high achievers. Many other legacy students would probably be admitted anyway.

My father's situation seems similar to that of many "Asian" students today. Harvard in particular seems not to want "too many Asians," an outcome that would result from selection by merit alone. Yet, since merit is still part of the story, Asians who are admitted have higher test scores and high-school grades, which means that they are predicted to get better grades in college. That happens. We do not need some psychological explanation of why Asian students are relatively high achievers in college. The same used to be true of Jews, but now I think that colleges are no longer biased against them. (A small literature on "calibration" and "predictive bias" looks at the possibility of detecting systemic or intentional bias on the basis of performance after selection. I think that it would show bias against Asians in some colleges today.)

In 1962, women were admitted to Radcliffe College, which was much smaller than Harvard College. As a result, the cut-off for admission of women was higher, and "Cliffies" got better grades and often dominated class discussions. Women are not generally that much smarter than men. We had a biased sample.

On the other side, affirmative action for under-represented minorities means that they are admitted with lower test scores and high-school grades. This fact can (fully or partially) explain why they get lower grades once admitted.
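The same kind of sketch shows how cut-offs alone can produce grade differences like those described above, for the Radcliffe women and for affirmative-action admits. It assumes two groups with identical aptitude distributions, scored on the same predictor, with one group admitted above a higher cut-off than the other. The function name `cutoff_gap`, the cut-off values, and the noise levels are all hypothetical. Because the model is known, predicted performance for any score is `score / (1 + score_noise**2)` (the regression slope under these assumptions), so the code can also check calibration: the mean of actual minus predicted performance within each admitted group.

```python
import random
import statistics

def cutoff_gap(n_per_group=10_000, high_cut=1.0, low_cut=0.0,
               score_noise=0.7, perf_noise=0.7, seed=2):
    """Two groups with identical aptitude distributions, admitted with
    different cut-offs on the same score. Illustrative assumptions only."""
    rng = random.Random(seed)
    # With aptitude ~ N(0, 1) and independent normal noise,
    # E[perf | score] = score / (1 + score_noise**2).
    slope = 1 / (1 + score_noise ** 2)

    def admitted(cut):
        rows = []
        for _ in range(n_per_group):
            aptitude = rng.gauss(0, 1)
            score = aptitude + rng.gauss(0, score_noise)
            perf = aptitude + rng.gauss(0, perf_noise)
            if score >= cut:
                rows.append((score, perf))
        return rows

    result = {}
    for name, cut in (("higher cut-off", high_cut), ("lower cut-off", low_cut)):
        rows = admitted(cut)
        perfs = [p for _, p in rows]
        residuals = [p - slope * s for s, p in rows]  # actual minus predicted
        result[name] = {
            "n_admitted": len(rows),
            "mean_perf": statistics.mean(perfs),
            "mean_residual": statistics.mean(residuals),
        }
    return result

for group, summary in cutoff_gap().items():
    print(group, summary)
```

Under these assumptions the higher-cut-off group shows the higher mean performance even though neither group is abler, while the mean residual in each group stays near zero: the same prediction equation fits both, so the lower cut-off fully explains the lower grades. Admits who consistently outperformed the common prediction might be the kind of signal the calibration literature looks for.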

In sum, merit still matters, and it still predicts achievement. When students are more highly selected, they will achieve more on average. Students selected with less attention to scores and grades will have lower achievement in college, although some ways of doing this seem better than others.

Some affirmative action for under-represented minorities seems reasonable (to me). Cultural diversity, which in the U.S. is correlated with "racial" diversity, is important for the education of all students. The increase in diversity is targeted and predictable, not simply random. But this is now off the table.

One response to the recent court ruling is to aim at diversity in ability to pay, by applying affirmative action to poor students (coupled with attempts to recruit them), those students who would ordinarily require maximum financial aid. Such a policy would deplete funds for financial aid, possibly increasing the difficulty of maintaining need-blind admissions for other students who required moderate amounts of aid. However, a policy favoring legacies would reduce the need for financial aid. Keeping legacy admission might help pay for more poor students.

Affirmative action for under-represented minorities, including the poor, must be combined with a strong effort to avoid predictable failure after admission. There is some optimum amount of affirmative action, and some colleges may have gone beyond it for minorities, although not for the poor.

More generally, admission to most selective colleges has never been based on merit alone. If other criteria are used, it is not unreasonable to make them explicit, rather than relying on noise alone.

Wednesday, July 12, 2023

Cluster munitions for Ukraine

The New York Times editorial of July 10, "The flawed moral logic of sending cluster munitions to Ukraine," opposed the U.S. decision to do just that. It tried to rebut some of the arguments made in favor of the plan, but it missed at least one: most of the area involved would already be littered with mines and unexploded cluster munitions used (extensively) by Russia, so the care needed to avoid them later would be required anyway.

The editorial, the statements of governments opposed to the plan, and some of the published letters to the Times seemed to follow the principle that these munitions are morally wrong, whatever the consequences. Such absolute principles are protected values, in the sense in which I have used the term (e.g., Baron and Spranca, 1997). Ideological adherence to such values surely has considerable political influence. These commitments may be held unreflectively. When people are forced to confront specific situations where the principle conflicts with some other principle, such as avoiding terrible consequences, they often admit that their principle is not absolute after all (Baron and Leshner, 2000).

In "Rules of War and Moral Reasoning" (1972, http://www.jstor.org/stable/2264969), R. M. Hare criticizes the "absolutist" deontological views of Thomas Nagel, who advocated strict adherence to accepted rules, such as those prohibiting the use of poison gas, or attacks on the Red Cross.

"The defect in most deontological theories ... is that they have no coherent rational account to give to any level of moral thought above that of the man who knows some good moral principles and sticks to them. He is a very admirable person, and to question his principles ... is indeed to 'show a corrupt mind'." However, to achieve such an account, "we have to adopt a 'two-level' approach, ... to recognize that the simple principles of the deontologist, important as they are, have their place at the level of character formation." Although we should be careful about violating principles that have been drilled into us (including those inculcated by military training), we need to be willing to override them on the basis of a higher level of analysis, that is, to make exceptions when they clearly lead to worse consequences than some alternative, even if our attachment to the broken rules leads us to feel guilty (just as the failure to prevent terrible consequences can also lead to guilt feelings).

Absolute rules represent a hardening of moral intuitions that are usually sufficient but should sometimes be overridden by more reflective reasoning, as suggested by Greene ("Moral Tribes") and others. The opponents of cluster munitions seem to illustrate these hardened intuitions, which are protected values. Having decided that cluster munitions are morally wrong, whatever the consequences, some opponents then engage in belief overkill, finding ways to ignore relevant facts on the other side, or to exaggerate the probability of harmful consequences resulting from action.