Tuesday, April 16, 2024

Existential risks from AI?

A recent Policy Forum article in Science argues for banning certain uses of artificial intelligence (AI) (Michael K. Cohen et al., Regulating advanced artificial agents. Science 384, 36-38 (2024). DOI: 10.1126/science.adl0625). The authors particularly worry about agents that use reinforcement learning (RL).

RL agents "receive perceptual inputs and take actions, and certain inputs are typically designated as 'rewards.' An RL agent then aims to select actions that it expects will lead to higher rewards. For example, by designating money as a reward, one could train an RL agent to maximize profit on an online retail platform." The authors worry that "a sufficiently capable RL agent could take control of its rewards, which would give it the incentive to secure maximal reward single-mindedly" by manipulating its environment. For example, "One path to maximizing long-term reward involves an RL agent acquiring extensive resources and taking control over all human infrastructure, which would allow it to manipulate its own reward free from human interference."
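To make the quoted description concrete, here is a minimal sketch (in Python; the setting, names, and numbers are mine, not the authors') of such a reward-maximizing agent. It keeps a running estimate of the expected reward of each action and, apart from occasional exploration, chooses whichever action it expects to pay off most.

    import random

    class RewardMaximizingAgent:
        """Toy agent of the kind the quotation describes: it tracks the estimated
        reward of each action and (mostly) picks whichever it expects to pay off most."""

        def __init__(self, actions, epsilon=0.1):
            self.estimates = {a: 0.0 for a in actions}  # estimated expected reward per action
            self.counts = {a: 0 for a in actions}
            self.epsilon = epsilon                      # small chance of exploring

        def choose(self):
            if random.random() < self.epsilon:          # explore occasionally
                return random.choice(list(self.estimates))
            return max(self.estimates, key=self.estimates.get)  # exploit: maximize expected reward

        def observe(self, action, reward):
            # Update the running average of the rewards that followed this action.
            self.counts[action] += 1
            self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]

    agent = RewardMaximizingAgent(["raise price", "lower price", "advertise"])
    action = agent.choose()
    agent.observe(action, reward=1.0)   # e.g., profit designated as the reward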

I may be missing something here, but it seems to me that the authors mischaracterize RL. In psychology, reinforcement learning does not require that the organism (or machine) place any value on reinforcement. The process would work just as well if a reinforcement ("reward") simply increased the probability of the response that led to it, and a "punishment" simply decreased it. The organism does not "try" to seek rewards or avoid punishments in general. It just responds to stimuli (situations) from a menu of possible responses, each with some response strength. The strength of a response, relative to alternative responses, determines its probability of being emitted. "Reward" and "punishment" are terms that result from excessive anthropomorphization.
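By contrast, a sketch of the psychological picture just described might look like this (again, the names and numbers are invented). Reinforcement does nothing but raise or lower response strengths, and the probability of a response is simply its strength relative to the alternatives; nothing in the learner represents "reward" as something to be sought.

    import random

    class LawOfEffectLearner:
        """The psychological picture: reinforcement only changes response strengths.
        The learner never represents 'reward' as a quantity to be maximized.
        (A fuller version would keep a separate set of strengths for each situation.)"""

        def __init__(self, responses):
            self.strength = {r: 1.0 for r in responses}   # one strength per possible response

        def emit(self):
            # A response's probability is its strength relative to the alternatives.
            responses = list(self.strength)
            return random.choices(responses, weights=[self.strength[r] for r in responses])[0]

        def reinforce(self, response, amount=0.5):
            self.strength[response] += amount             # "reward": strength goes up

        def punish(self, response, amount=0.5):
            self.strength[response] = max(0.01, self.strength[response] - amount)  # strength goes down

    learner = LawOfEffectLearner(["press lever", "groom", "explore"])
    r = learner.emit()
    learner.reinforce(r)   # the consequence shapes future behavior; nothing is "sought"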

It would of course be possible to build an AI system with a sense of self-interest, in which positive reinforcements were valued and purposefully sought, independently of their role in shaping behavior. But this system would not do any better at the task it is given. It might do worse, because it could be distracted by searches for other sources of "reward", as Cohen et al. suggest.

If, for some reason, AI engineers thought that a sense of self-interest would be useful, they could design a system with such a sense. It would need an evaluation of each possible outcome indicating that outcome's overall consistency with the system's long-term goals (including the goal of having good experiences). And it would have to represent those goals explicitly, along with processes for changing them and their relative strengths.
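A rough sketch of what such a design might involve (the goals, weights, and function names are hypothetical) could score each possible outcome by its consistency with an explicit, revisable set of weighted goals:

    # Hypothetical sketch: explicit, revisable goals and a self-interested evaluation of outcomes.
    goals = {"long-term profit": 0.6, "good experiences": 0.4}    # goals with relative strengths

    def value_to_self(outcome):
        """Overall consistency of an outcome with the system's long-term goals
        (each outcome is scored per goal on a -1 to 1 scale)."""
        return sum(weight * outcome.get(goal, 0.0) for goal, weight in goals.items())

    def revise_goal(goal, new_weight):
        # Because the goals are represented explicitly, a separate process can change them.
        goals[goal] = new_weight

    print(value_to_self({"long-term profit": 0.9, "good experiences": -0.2}))   # 0.46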

Engineers could also build in a sense of morality, so that a decision-making AI system would, like most real people, consider effects on others as well as on the self. In general, options would be favored more when they had better (or less bad) outcomes for others, and when they had better (or less bad) outcomes for the self.  Effects on others would be estimated in the same way as effects on the self, in terms of the consistency of outcomes with long-term goals.  Such a sense of morality could even work more reliably than it does in humans. The functional form of the self/others trade-off could be set in advance, so that psychopathy, which gives too little relative weight to effects on others, would be avoided.
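Continuing the sketch, with invented numbers, the self/others trade-off could be a single weight fixed in advance by the designer:

    SELF_WEIGHT = 0.5   # the self/others trade-off, set in advance by the designer

    def overall_value(value_to_self, value_to_others):
        """Options score higher when their outcomes are better (or less bad) for others
        as well as for the self; the weight on others can never be driven to zero."""
        return SELF_WEIGHT * value_to_self + (1 - SELF_WEIGHT) * value_to_others

    # An option slightly worse for the self but much better for others is preferred.
    print(overall_value(0.4, 0.9) > overall_value(0.5, -0.3))   # True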

If self-interest is to be included, then morality should be included too. It is difficult to see why an engineer would intentionally build a system with self-interest unchecked by morality. Yet that seems to be the sort of system that Cohen et al. imagine.


Algorithm aversion and AI

Many people have recently expressed concerns, some to the point of near panic, about advances in artificial intelligence (AI). They think AI can now do great harm, even to the point of ending civilization as we know it. Some of these harms are obvious and also difficult to prevent. Autocrats and other bad actors - such as people who now create phishing sites or ransomware - will use AI software to do their jobs better, just as governments, scientists, law enforcers, and businesses of all sorts will do the same for their respective jobs. Identification of individuals, for purposes of harassing them, will become easier, just as the Internet itself made this, and much else both good and bad, easier. Other technologies, such as the telephone, postal system, and telegraph, have also been used for nefarious purposes (as in "wire fraud" and "mail fraud"). The white hats will continue to fight the black hats, often with the same weapons.

Of special concern is the use of AI to make decisions about people, such as whether to give them loans, hire them for jobs, admit them to educational institutions, incarcerate them, treat them for illness, or cover the cost of such treatment. The concerns seem to involve two separate problems: one is that AI systems make errors; the other is that they could be biased against groups that already suffer from the effects of other biases, such as Blacks in the U.S.

The problem of errors in AI is part of a broader problem that has a large literature in psychology, beginning with Paul Meehl's "Clinical versus statistical prediction" (1954) and then followed up by Robyn Dawes, Hal Arkes, Ken Hammond, Jason Dana, and many others. A general conclusion from that literature is that simple statistical models, such as multiple linear regression, are often more accurate at various classifications, such as diagnosing psychological disorders, than humans who are trained to make just such classifications and who make them repeatedly. This can be true even when the human has more information, such as a personal interview of a candidate for admission.

A second conclusion from the literature is that most people, including the judges and those who are affected, seem to prefer human judgments to statistical models. Students applying to selective colleges or graduate programs, for example, want someone to consider them as a whole person, without relying on statistical predictors. The same attitudes come up in medical diagnosis and treatment, although the antipathy to statistical models seems weaker in that area. Note that most of these statistical models are so simple that they could be applied with a pencil and paper by someone who remembers how to do arithmetic that way. Recent improvements in AI have resulted from the enhanced capacities of modern computers, which allow them to learn from huge numbers of examples how to make classifications correctly with much more complex formulas, so complex that the designers of the programs do not know what the formulas are. These models are better than those that can be applied on a small piece of paper, but the issues are much the same. If anything, the issues are more acute exactly because the models are better. If the older, simpler models were better than humans, then these new ones are better still.
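For concreteness, here is the kind of pencil-and-paper model that the older literature examined, sketched with invented predictors, weights, and cutoff: a weighted sum of a few standardized scores compared with a threshold.

    # Illustrative only: a pencil-and-paper linear model of the kind Meehl and Dawes studied.
    # The predictors, weights, and cutoff are invented.
    weights = {"gpa": 0.6, "test_score": 0.3, "letters": 0.1}

    def linear_score(applicant):
        """Weighted sum of standardized predictors; simple enough to compute by hand."""
        return sum(weights[k] * applicant[k] for k in weights)

    applicant = {"gpa": 1.2, "test_score": 0.4, "letters": -0.5}   # standardized (z) scores
    print(round(linear_score(applicant), 2), linear_score(applicant) > 0.5)   # 0.79 True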

Note that, although some studies fail to find a preference for humans over computers on the average, such results do not arise because all the subjects are indifferent between humans and computers. Rather, they reflect differences among the subjects. The average result can favor computers over humans even when, say, 40% of the subjects are opposed to computers, so long as the remaining 60% favor them. The existence of large minorities who oppose the use of AI can make adoption of AI models nearly as difficult as it would be if a majority were opposed, especially when the minority is vocal and organized.
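A small worked example, with invented numbers, illustrates the point:

    # Invented ratings: positive values mean a subject favors computers, negative means humans.
    ratings = [1.0] * 60 + [-1.0] * 40     # 60% favor computers, 40% oppose them
    print(sum(ratings) / len(ratings))     # 0.2: the average favors computers, yet 40% are opposed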

AI models make errors. Before we reject or delay their use, we need to ask the fundamental question of all decision making: compared to what?  We often need to "accept error to make less error" (as Hillel Einhorn put it).

The same question is relevant for the bias problem. I put aside questions about how bias should be measured, and whether some apparent biases could result, fully or partially, from real differences in the most relevant populations. When AI tools seem to be biased, would the same be true when AI is not used? The bias might be larger still when decisions are made by individual human judges, or by some simpler formula.