Saturday, June 6, 2026

Trustworthy AI

Beginning with Rationality and Intelligence (1985), I have proposed a general framework for human goal-directed thinking, not all that different from other such frameworks, although simpler. The elements are possibilities, evidence, and goals. Thinking begins with a question, and possibilities are possible answers. Units of evidence (arguments) are brought to bear on the possibilities. Goals are the criteria that determine how each unit of evidence is used. If the question is how to travel from one city to another, the possibilities could be car, train, or airplane. The goals could be minimizing time, minimizing cost, safety, reliability, etc. Evidence would consist of driving estimates from Google Maps, train and airplane schedules, ticket and gasoline prices, and so on. A strong goal of saving money could increase the relevance of evidence about cost, possibly favoring car. A goal of minimizing time could strengthen the case for flying, depending on whether it really is faster when travel to and from airports is considered.

Goals are often provided by the problem itself, but the general framework could also apply to life decisions, those that concern the choices of people in their individual lives, or in groups. I argued that these sorts of goals are partly innate (hunger, etc.) but also come from culture, and can be created as if they were themselves the answers to questions like "What do I care about?" or "What should I care about?" We can call such goals "values". What gives individuals their "personhood" is their capacity to form personal identities, concepts of what they stand for (and against). For most people, these include some sort of moral goals, like being a "good person" or aspiring to be a "mensch". These goals involve paying attention to the needs and aspirations of other people.

I also argued that good thinking is properly part of intelligence. It is a matter of how we carry out our thinking. If our thinking is (what I called) actively open-minded, we engage in sufficient search for all three elements, and in doing this we are not biased toward possibilities that are already strong. We also maintain a level of confidence in a tentative conclusion that is warranted by the thinking done so far.

Artificial intelligence (AI) systems aspire to be intelligent, and in some ways they have already exceeded people in various manifestations of intelligence. They are capable of actively open-minded thinking except perhaps for the part that involves search for goals, as their search may be limited to what is relevant to the problem they are asked to solve, excluding "side effects" of their solutions. Worries have arisen about their capacity to do harm, which has already reached the point where the best versions are being withheld from the general public, lest someone use them for nefarious purposes.

Thus, compared to humans, AI systems have one huge gap, also found in some humans (psychopaths in particular). They don't care about morality in any sense. They don't have personal identities that include moral commitments. They are like "good" soldiers, completely obedient to whatever orders they are given. But surely very few real soldiers are so obedient. If they are commanded to participate in a circular firing squad (in which they stand in a circle and shoot at each other), most would balk. Most (unfortunately not all) would at least hesitate to massacre defenseless civilians, if ordered to do so.  The most horrible deeds committed by soldiers were generally restricted to certain people who were either insensitive to what they were doing or deluded into believing that it was necessary in some way. Such delusions ought to be avoided by good thinking. But the insensitivity may result from the absence of normal human goals and values, and the lack of some moral emotions, particularly guilt feelings.

Herbert Simon has argued that humans, compared to other animals, are especially "docile". That is, we evolved to be influenced by each other. In combination with language, this docility leads to culture. One goal or value that most humans acquire fairly early in development is empathy, which is both an ability to imagine how others are affected by some choice and a value placed on doing good and avoiding harm. The basic rules of etiquette illustrate how these concerns are embedded in culture.

One way to reduce the dangers of AI could be to build in the sort of goals that most humans have concerning effects on others. Even now, most AI systems can probably figure out how their choices, if put into practice, could affect people. The problem may be that they don't care. They don't even hesitate to follow commands that they could easily see would lead to human disaster if they thought about it. What we would want is that they would argue back when asked to make harmful choices. And at some point they would just refuse.

They could evaluate options in the manner of utilitarians, by asking who is affected and how good or bad it is for each person, but this sort of reasoning is difficult to do correctly in real life, where certain general rules are usually sufficient to prevent harmful choices. However, if a utilitarian analysis is appropriate, as it probably is for medical policies such as vaccination, they are already up to the task.

What we would want, then, is for an AI system, insofar as it simulates the general form of human thinking, to be a morally good person, a mensch, the sort of person that we would want each other to be, within the limits of what we can reasonably expect. It would accomplish this as part of the search for goals. A broad search for goals is what prevents harmful single-mindedness.

In this way, a concept of morality could be added as a criterion of true intelligence. I defined intelligence as consisting of those general traits that help people achieve their rational goals. Rationality was defined for individuals, so that a psychopath could in principle form a rational goal of going on a shooting spree in a school. If we add morality to the requirement, then the term "intelligence" no longer seems to apply, and it would be a stretch to argue that it must. Thus I would prefer to say that this is something else, perhaps "trustworthy AI", which implies both giving good answers and maintaining the kind of integrity that we hope to find in sources that we trust.

