Recently popular political movements have been anti-immigrant, anti-free-trade, and, more generally, anti-globalization. What these positions share is a lack of concern for outsiders. For example, U.S. discussions of the Trans-Pacific Partnership (which has many advantages and disadvantages for everyone) tend to ignore completely its apparently large benefits for Vietnam. The technical term for this lack of concern is parochialism. In part, parochialism is built into our political language. The use of "we" in political discourse refers to fellow citizens, sometimes even excluding members of recently arrived ethnic groups. But some people, in their thinking if not in their speech, consider effects on outsiders, or even think of themselves as members of larger groups such as Europeans or citizens of the world. Once this kind of cosmopolitan thinking was even fashionable, as expressed, for example, in John Lennon's (1971) song "Imagine", and it seems to be coming back into fashion among some young people in Europe.
The simple argument against parochialism is that it is morally arbitrary, hence unjustified. The question of who should count in our moral judgments is a very basic one. The answer cannot be derived from competing philosophical approaches such as utilitarianism or deontology in general. So the usual attack on parochialism of any sort is to ask why a distinction should matter. This was the logical move made against slavery, racial discrimination, and discrimination against women. Of course, the defenders of these institutions sometimes tried to answer this attack by pointing to supposed empirical facts about, for example, how women's emotionality made them unsuitable as voters or office holders. But these arguments were ultimately recognized as post-hoc justifications, with little empirical basis. So the basic argument was, "If you care about what happens to X, why shouldn't you care equally about Y, even though Y is a different race, sex, or nationality?" This kind of logical argument is powerful, yet it is rarely made in public debates.
One counter-argument comes from a different analogy, loyalty to close kin. Equal treatment of everyone would imply that you should care about a stranger's child, spouse, or parent as much as you care about your own. If it is morally acceptable to give preference to loved ones, why not to co-nationals too? This objection has several possible answers. One I like is that morality should concern itself with choices among options that are on the table, and the option of sacrificing one's own child for a greater good is not something that most of us would consider. We simply could not bring ourselves to do it. (More precisely, our willingness to sacrifice our own concerns and desires is limited, so we should make our decisions so as to do the most good overall within this limit.)
Assuming that this argument works for loved ones -- and I think it does -- then could it also work for co-nationals? Yes, it could, if we feel such strong loyalty to our co-nationals. But we can take a step back and ask where our loyalty comes from. In the case of children, it is biologically determined. However, in the case of co-nationals, it is the result of an acquired abstract category. Even if humans evolved to be loyal to those in their immediate group of non-kin, the extension of group membership to total strangers requires a learned categorization of certain strangers as members of this group. Such categorization cannot plausibly be the result of natural selection, as it is, once again, arbitrary. If we can define "our group" as "German citizens", we could just as easily define it as "European citizens". People who reflect on this arbitrariness may come to change their loyalties.
In sum, it may be too late for those who feel very strongly about their co-nationals. From their perspective, parochialism can be justified, assuming that they cannot modify their feelings by reflection. Yet we can still object to the cultural forces that lead people to think this way, including the assumptions of political discourse itself.
A second line of argument for parochialism concerns the definition of responsibility that comes from specific social roles. Social organization gives people decision-making authority in limited domains. When people violate these limits, they risk losing their authority, and they set a precedent for subverting a useful system. Police officers are not supposed to make decisions about punishment; that role is left for courts and judges, which are limited in yet other ways.
This is also a good argument, but is the role of a citizen just to advance what is best for their co-nationals? Many citizens do not limit their role in this way, and they are not considered bad citizens as a result. Recent immigrants often think about others from their country of origin who might also want to immigrate. Some people take into account the effects of policy on other countries to which they have a secondary loyalty. And still others think about issues that affect the whole world, such as climate change. We have no written rule against such a view of citizenship, nor any obvious social norm. The narrow definition of the citizen's role as serving only the national interest is one that some people arrive at by themselves. It is not part of the social structure of roles, unlike the roles of police officers and judges.
Citizens do have a special responsibility toward their own nation, if only because they are in the best position to know what is good for it. They cannot rely on foreigners to decide issues that have mostly local effects. But the exercise of this responsibility does not imply that outsiders should simply be neglected. It is a responsibility that applies much more to some issues than to others. As citizens, we have a special responsibility to inform ourselves about national and local issues that do not have much effect on outsiders, and there are many of these. But just as our concern with city and state issues does not justify neglect of national issues, so our concern with national issues does not justify neglect of the world outside.
In sum, the justification for parochialism of the sort we see in current politics seems weak. Would it be possible to confront people with arguments against this view in general? We don't know unless we try.
Saturday, June 25, 2016
Saturday, June 18, 2016
Learning social rules
I just read a forthcoming paper (in Mind and Language) by Shaun Nichols and several others, which argues that it is rational to develop moral rules that distinguish (for example) acts and omissions. The relevant idea of "rational" is from rational concept formation.
When you learn a new concept, it is best not to generalize it too much. In some experiments, subjects were given examples of rule violations, for learning, and tested with other examples. When the learning examples were of the form "X did an action A that caused outcome C to happen", subjects generalized this to similar test examples with other examples of A and C. But they did not consider examples of the form "X failed to do B, which would have prevented C from happening" to be violations of the rule. In order to teach subjects that the rule applied to omissions as well as acts, the training had to include omission cases as examples of rule violations.
This behavior of subjects makes perfect sense in the case of arbitrary rules, and even legal rules. But I was bothered because I don't think moral rules should be arbitrary in this way.
One possible explanation of the difference is that sophisticated moral rules arise from reflection on the social rules that we have learned. Specifically, we reflect by asking questions about purposes (which I call "search for goals" in some places). When we see an example of a rule and ask about its purpose, we might discover what general purpose it serves. We can then think about how to generalize it so that it serves that purpose. If it does not serve the purpose in some cases, or if it could serve the same purpose better by a modification, then we can think about improving it.
The same process is part of what it means to understand a design such as a mathematical formula, according to my interpretation of David Perkins' book "Knowledge as Design". For example, we understand the formula for the area of a parallelogram (and its associated arguments) by finding that the argument for this rule serves the purpose of converting the parallelogram to a rectangle, and we already know how to find the area of a rectangle. Once we discover this connection, we can apply the same principle elsewhere, as Max Wertheimer shows in the first chapter of "Productive thinking". We can transfer the principle to cases where it applies while avoiding transfer to other cases.
Similarly, a law with a "loophole" is an example of a rule that is crafted in a way that fails to serve its purpose. We can fix laws by removing loopholes.
A law that gives rights, such as the right to vote, drive, or own property, to men but not to women does not seem to serve reasonable accounts of the purposes of such rights-granting laws. We have trouble coming up with a purpose that applies to men but not women. Any such purpose seems arbitrary; it could just as well distinguish people with odd and even birthdays. Such a search for purposes is, I think, the sort of reflection that Peter Singer discussed in "The expanding circle".
Thus it is one thing to learn a rule, but it is another to understand the rule in a way that allows us to ask whether it serves its purpose as well as it could, and, if not, what could replace it. It may be rational from the perspective of learning to learn whatever we are taught about what to do and not do, but, if this is all we did, we would cut off the possibility of improving these rules.
Could most deontological rules survive this kind of questioning?
Wednesday, June 1, 2016
Alternatives to mediation (in data analysis)
The following is not vetted. It is some thoughts inspired by several papers I have dealt with recently. It is also about statistics, a new topic for this blog, but one I will probably write more about.
In some studies we measure several variables, and we are primarily interested in the correlation between two of them, e.g., cognitive style and political ideology. When this correlation is found, we are also interested in what the other variables can tell us about why it happens. For example, we might look at religiosity or education. Cognitive style might affect religiosity, and religiosity, in turn, could affect social conservatism, which is (in some countries) related to religious teaching. Let's call the target variable (political ideology) Y, the main predictor X, and the other variable M (for "mediator" or "middle"). Assume that X and Y are correlated, and this correlation is what we are trying to explain.
The logic of classical mediation is best understood in terms of simple and partial correlations. For this purpose, partial correlations are equivalent to regressions, since the significance test is the same. If the dependent variable is Y, the predictor is X, and the mediator is M, we need to show that r(X,M) and r(Y,M|X) are both significant. The second is a partial correlation (or regression coefficient, or semi-partial). The first is consistent with the claim that X affects M. The second is consistent with the claim that M affects Y and that this effect is not the result of the correlation of M with X.
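As a concrete sketch of these two quantities (my own illustration with simulated data, not code from any mediation package; the variable names and the `partial_corr` helper are mine), the simple and partial correlations can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated causal chain X -> M -> Y, for illustration only.
X = rng.normal(size=n)
M = 0.6 * X + rng.normal(size=n)
Y = 0.5 * M + rng.normal(size=n)

def partial_corr(a, b, control):
    """Correlation of a and b after regressing out the control variable."""
    resid_a = a - np.polyval(np.polyfit(control, a, 1), control)
    resid_b = b - np.polyval(np.polyfit(control, b, 1), control)
    return np.corrcoef(resid_a, resid_b)[0, 1]

r_xm = np.corrcoef(X, M)[0, 1]        # simple correlation r(X,M)
r_ym_given_x = partial_corr(Y, M, X)  # partial correlation r(Y,M|X)
print(r_xm, r_ym_given_x)
```

For this simulated chain both quantities come out clearly positive (around .5 and .45 in expectation), which is the pattern a classical mediation test looks for.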
Mediation tests are most useful when X is an experimental manipulation. Even then, we worry about r(Y,M|X) being an artifact. It could be that the causality is an effect of Y on M rather than an effect of M on Y. Or Y and M could both be affected by some unmeasured fourth variable. We could avoid these problems by experimentally manipulating both X and M. Even then, one might argue that the experimental manipulation of M affects something different from the M that varies spontaneously in the population, or from the M that is affected by Y.
More generally, in many tests of mediation, almost anything could cause anything else. Moreover, if X, M, and Y are all influenced by roughly the same causal factors, then M will "mediate" the "effect" of X on Y, or the "effect" of Y on X, if M is just the variable that is most highly correlated with these underlying causes. I have never seen a mediation analysis that attempts to correct for the extent to which the different variables correlate with the causal factors that each is supposed to be sensitive to. This sort of validity coefficient surely affects what counts as a significant mediator and what does not.
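To illustrate this worry (a simulation of my own, purely for illustration), here X, M, and Y are driven only by a shared factor Z, with no causal paths among them, and M is simply the least noisy indicator of Z; yet both classical mediation criteria come out satisfied:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# A single common cause Z; no causal links among X, M, and Y themselves.
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = Z + rng.normal(size=n)
M = Z + 0.5 * rng.normal(size=n)  # M has the least noise, so it tracks Z best

def partial_corr(a, b, control):
    """Correlation of a and b after regressing out the control variable."""
    resid_a = a - np.polyval(np.polyfit(control, a, 1), control)
    resid_b = b - np.polyval(np.polyfit(control, b, 1), control)
    return np.corrcoef(resid_a, resid_b)[0, 1]

r_xm = np.corrcoef(X, M)[0, 1]        # the "X affects M" criterion
r_ym_given_x = partial_corr(Y, M, X)  # the "M affects Y beyond X" criterion
print(r_xm, r_ym_given_x)             # both clearly positive, with no mediation
```

In expectation these come out near .63 and .47, so M would "mediate" the X-Y relation even though nothing here causes anything else.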
Note also that any test of mediation is about variation. It is possible that M does affect Y but that the variation in M is mostly error by the time you remove the common variance between M and X (by partialing).
So what should we do instead? One thing is to look at the simple correlations between X and M and between M and Y. If both are large enough (with "significant" being one criterion of that, but significance depends on sample size, which is irrelevant here), then we would conclude that variation in M is a possible explanation of the correlation between X and Y. It correlates with both of them. If it does not correlate with one of them, and if we have no reason to expect any additional variables that affect M and X or Y in opposite directions (thus obscuring a real correlation), then M could not explain the X Y correlation. (For example, X correlates with sex, but sex does not correlate with some other measure Y. Hence sex is not a possible explanation.)
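A minimal version of this screening rule might look like the following (my own sketch; the function name and the .1 threshold are arbitrary choices for illustration, not a recommendation):

```python
import numpy as np

def could_explain(X, M, Y, threshold=0.1):
    """Screen M as a possible explanation of the X-Y correlation:
    M qualifies only if it correlates with both X and Y."""
    r_xm = np.corrcoef(X, M)[0, 1]
    r_my = np.corrcoef(M, Y)[0, 1]
    return abs(r_xm) >= threshold and abs(r_my) >= threshold

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=n)                  # shared source of variation
X = Z + rng.normal(size=n)
Y = Z + rng.normal(size=n)
M = Z + rng.normal(size=n)              # correlates with both X and Y
sex = rng.integers(0, 2, size=n).astype(float)
X_s = X + 0.5 * sex                     # sex correlates with X_s but not with Y

print(could_explain(X, M, Y))      # M passes the screen
print(could_explain(X_s, sex, Y))  # sex fails: it does not correlate with Y
```

This mirrors the example in the text: a variable like sex that correlates with X but not with Y is screened out as a possible explanation.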
Here "explaining the correlation" simply means that some source of variation exists that affects X, Y, and M. The fact that M is part of this list tells us something about what that source of variation might be.
Can we say more than this? Consider a stricter criterion. Suppose we regress M on X and Y, and we require that both regression coefficients are substantial (high enough by some standard). Such a result would seem to rule out the possibility that r(X,M) or r(Y,M) is high simply because X and Y are correlated. Suppose, for example, that r(X,M|Y), the regression coefficient or partial correlation, is zero even though r(X,M) is positive. This would suggest that X does not really have any source of variation in common with M beyond what it shares with Y.
This does not quite follow. For example, it could be that X and M are affected equally by some set of variables Z, but X is affected by some additional variables that also affect Y. Thus, X and M are redundant measures of some of the factors that affect X, M, and Y. Similarly, it could happen that X and M are affected by exactly the same set of variables, but X is a more reliable measure than M. This could reduce the role of M to zero in a regression model.
However, if we regress M on X and Y and find that both coefficients are high enough, then it is more plausible that M is indeed capturing some of the common variance affecting X and Y (compared to what we learn from the simple correlations of M with X and M with Y).
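Continuing the same kind of sketch (simulated common-cause data of my own, with an arbitrary cutoff), the stricter criterion amounts to a joint ordinary-least-squares regression of M on X and Y:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# X, Y, and M all reflect a shared factor Z.
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = Z + rng.normal(size=n)
M = Z + rng.normal(size=n)

# Regress M on X and Y jointly (least squares with an intercept column).
design = np.column_stack([np.ones(n), X, Y])
coef, *_ = np.linalg.lstsq(design, M, rcond=None)
b_x, b_y = coef[1], coef[2]
print(b_x, b_y)  # both coefficients clearly positive here
```

With a pure common factor, both coefficients come out near 1/3 in expectation, the pattern that, on the argument above, makes it more plausible that M shares variance with each of X and Y separately.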
In general, I do not think we can learn much from anything other than the simple correlations r(X,M) and r(M,Y). If both of these are positive, then whatever M "measures" is a possible source of variation that accounts for the correlation between X and Y. But regression of M on X and Y could also be useful, if both coefficients are positive.
Mediation tests do have some uses. They can serve as a manipulation check, a way of testing whether an experimental variable did what it was supposed to do. And, if its effect varies across subjects, we can ask whether that variation helps to explain the variation in outcomes. For example, cognitive therapy for depression (manipulated) changes how people think about the causes of bad events (measured by the Attributional Style Questionnaire), which, in turn, affects their depressive symptoms. The therapy is focused on the thinking, not the symptoms, so this is a manipulation check.