
Ecological fallacy

From Wikipedia, the free encyclopedia
Formal fallacy in statistical interpretation

An ecological fallacy (also ecological inference fallacy[1] or population fallacy) is a formal fallacy in the interpretation of statistical data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong. "Ecological fallacy" is a term that is sometimes used to describe the fallacy of division, which is not a statistical fallacy. The four common statistical ecological fallacies are: confusion between ecological correlations and individual correlations, confusion between group average and total average, Simpson's paradox, and confusion between higher average and higher likelihood. From a statistical point of view, these ideas can be unified by specifying proper statistical models to make formal inferences, using aggregate data to draw conclusions about unobserved relationships in individual-level data.[2]

Examples


Mean and median


An example of ecological fallacy is the assumption that a population mean has a simple interpretation when considering likelihoods for an individual.

For instance, if the mean score of a group is larger than zero, this does not imply that a random individual of that group is more likely to have a positive score than a negative one (as long as there are more negative scores than positive scores, an individual is more likely to have a negative score). Similarly, if a particular group of people is measured to have a lower mean IQ than the general population, it is an error to conclude that a randomly selected member of the group is more likely than not to have a lower IQ than the mean IQ of the general population; it is also not necessarily the case that a randomly selected member of the group is more likely than not to have a lower IQ than a randomly selected member of the general population. Mathematically, this comes from the fact that a distribution can have a positive mean but a negative median. This property is linked to the skewness of the distribution.
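A tiny right-skewed sample makes the mean/median point concrete. The scores below are hypothetical, chosen only so that one large positive value pulls the mean above zero while most individuals score below zero.

```python
from statistics import mean, median

# Hypothetical scores: four small negatives and one large positive (right skew)
scores = [-1, -1, -1, -1, 12]

print(mean(scores))    # 1.6  (the group average is positive)
print(median(scores))  # -1   (the typical member is negative)

# Probability that a randomly chosen member has a positive score
share_positive = sum(s > 0 for s in scores) / len(scores)
print(share_positive)  # 0.2
```

A positive mean is therefore compatible with only a 20% chance that a randomly drawn member is positive.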

Consider the following numerical example:

  • Group A: 80% of people got 40 points and 20% of them got 95 points. The mean score is 51 points.
  • Group B: 50% of people got 45 points and 50% got 55 points. The mean score is 50 points.
  • If we pick one person at random from each of A and B, there are 4 possible outcomes:
    • A – 40, B – 45 (B wins, 40% probability – 0.8 × 0.5)
    • A – 40, B – 55 (B wins, 40% probability – 0.8 × 0.5)
    • A – 95, B – 45 (A wins, 10% probability – 0.2 × 0.5)
    • A – 95, B – 55 (A wins, 10% probability – 0.2 × 0.5)
  • Although Group A has a higher mean score, 80% of the time a random individual of A will score lower than a random individual of B.
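The arithmetic in this example can be checked by enumerating every pair of draws. This is a minimal sketch using exact fractions; the dictionaries simply encode the two score distributions listed above, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Score distributions from the example, as {score: probability}
group_a = {40: Fraction(8, 10), 95: Fraction(2, 10)}
group_b = {45: Fraction(1, 2), 55: Fraction(1, 2)}

def mean_score(dist):
    """Expected score of a random member of the group."""
    return sum(score * p for score, p in dist.items())

def prob_first_wins(d1, d2):
    """P(a random member of d1 outscores an independent random member of d2)."""
    return sum(
        p1 * p2
        for (s1, p1), (s2, p2) in product(d1.items(), d2.items())
        if s1 > s2
    )

print(mean_score(group_a), mean_score(group_b))  # 51 50
print(prob_first_wins(group_a, group_b))         # 1/5
print(prob_first_wins(group_b, group_a))         # 4/5
```

Using `Fraction` keeps the probabilities exact, so the 80%/20% split is reproduced without floating-point rounding.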

Individual and aggregate correlations


Research dating back to Émile Durkheim suggests that predominantly Protestant localities have higher suicide rates than predominantly Catholic localities.[3] According to Freedman,[4] the idea that Durkheim's findings link, at an individual level, a person's religion to their suicide risk is an example of the ecological fallacy. A group-level relationship does not automatically characterize the relationship at the level of the individual.

Similarly, although wealth is positively correlated with the tendency to vote Republican at the individual level in the United States, wealthier states tend to vote Democratic. For example, in the 2004 United States presidential election, the Republican candidate, George W. Bush, won the fifteen poorest states, and the Democratic candidate, John Kerry, won 9 of the 11 wealthiest states in the Electoral College. Yet 62% of voters with annual incomes over $200,000 voted for Bush, while only 36% of voters with annual incomes of $15,000 or less voted for Bush.[5] Aggregate-level correlation will differ from individual-level correlation if voting preferences are affected by the total wealth of the state even after controlling for individual wealth. The true driving factor in voting preference could be self-perceived relative wealth; perhaps those who see themselves as better off than their neighbors are more likely to vote Republican. In this case, an individual would be more likely to vote Republican if they became wealthier, but more likely to vote Democratic if their neighbors' wealth increased (resulting in a wealthier state).
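The relative-wealth mechanism can be sketched with a hypothetical linear probability model in which a person's own wealth raises the probability of voting Republican while the state's mean wealth (standing in for neighbors' wealth) lowers it. The three states, their wealth levels and the coefficients 0.05 and 0.08 are all invented for illustration; they are not estimates from the 2004 data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical wealth levels of the residents of three states
states = {"poor": [0, 2, 4], "middle": [2, 4, 6], "rich": [4, 6, 8]}

def p_republican(own, state_mean):
    # Hypothetical model: own wealth raises the probability,
    # state mean wealth (a proxy for neighbors' wealth) lowers it
    return 0.5 + 0.05 * own - 0.08 * state_mean

wealth, prob = [], []                    # one point per individual
state_mean_wealth, state_share = [], []  # one point per state
for residents in states.values():
    m = sum(residents) / len(residents)
    ps = [p_republican(w, m) for w in residents]
    wealth += residents
    prob += ps
    state_mean_wealth.append(m)
    state_share.append(sum(ps) / len(ps))

individual_r = pearson(wealth, prob)                   # about +0.24
aggregate_r = pearson(state_mean_wealth, state_share)  # about -1.0
print(round(individual_r, 2), round(aggregate_r, 2))
```

Pooled over individuals, richer people lean Republican; across states, richer states lean Democratic, even though every number comes from the same model.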

However, the observed difference in voting habits based on state- and individual-level wealth could also be explained by the common confusion between higher averages and higher likelihoods as discussed above. States may not be wealthier because they contain more wealthy people (i.e., more people with annual incomes over $200,000), but rather because they contain a small number of super-rich individuals; the ecological fallacy then results from incorrectly assuming that individuals in wealthier states are more likely to be wealthy.
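The alternative explanation can be illustrated with two hypothetical states of 100 residents each: one is "wealthier" only because of a few super-rich residents, while the other has a much larger share of people above the $200,000 threshold. All incomes are invented for this sketch.

```python
# Hypothetical annual incomes (in dollars) for two states of 100 residents each
state_x = [40_000] * 97 + [10_000_000] * 3  # a few super-rich residents
state_y = [60_000] * 80 + [250_000] * 20    # many moderately high earners

def mean_income(incomes):
    return sum(incomes) / len(incomes)

def share_above(incomes, threshold=200_000):
    """Fraction of residents whose income exceeds the threshold."""
    return sum(i > threshold for i in incomes) / len(incomes)

print(mean_income(state_x), share_above(state_x))  # 338800.0 0.03
print(mean_income(state_y), share_above(state_y))  # 98000.0 0.2
```

State X has the higher average income, yet a randomly chosen resident of State Y is far more likely to earn over $200,000.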

Many examples of ecological fallacies can be found in studies of social networks, which often combine analysis and implications from different levels. This has been illustrated in an academic paper on networks of farmers in Sumatra.[6]

Robinson's paradox


A 1950 paper by William S. Robinson computed the illiteracy rate and the proportion of the population born outside the US for each state and for the District of Columbia, as of the 1930 census.[7] He showed that these two figures were associated with a negative correlation of −0.53; in other words, the greater the proportion of immigrants in a state, the lower its average illiteracy (or, equivalently, the higher its average literacy). However, when individuals are considered, the correlation between illiteracy and nativity was +0.12 (immigrants were on average more illiterate than native citizens). Robinson showed that the negative correlation at the level of state populations arose because immigrants tended to settle in states where the native population was more literate. He cautioned against deducing conclusions about individuals on the basis of population-level, or "ecological", data. In 2011, it was found that Robinson's calculations of the ecological correlations were based on the wrong state-level data: the correlation of −0.53 mentioned above is in fact −0.46.[8] Robinson's paper was seminal, but the term 'ecological fallacy' was not coined until 1958, by Selvin.[9]
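Robinson's mechanism can be reproduced with made-up numbers. In the sketch below, three hypothetical states each have 1,000 residents; within every state immigrants are more often illiterate than natives, but immigrants are concentrated in the state whose natives are most literate. These counts are illustrative only and are not Robinson's 1930 census data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical counts per state:
# (foreign-born, illiterate foreign-born, native-born, illiterate native-born)
states = {
    "A": (400, 40, 600, 12),
    "B": (100, 20, 900, 90),
    "C": (250, 30, 750, 45),
}

foreign_share, illiteracy_rate = [], []  # ecological: one point per state
is_foreign, is_illiterate = [], []       # individual: one 0/1 point per person
for fb, fb_ill, nat, nat_ill in states.values():
    total = fb + nat
    foreign_share.append(fb / total)
    illiteracy_rate.append((fb_ill + nat_ill) / total)
    is_foreign += [1] * fb + [0] * nat
    is_illiterate += ([1] * fb_ill + [0] * (fb - fb_ill)
                      + [1] * nat_ill + [0] * (nat - nat_ill))

ecological_r = pearson(foreign_share, illiteracy_rate)  # about -0.99
individual_r = pearson(is_foreign, is_illiterate)       # about +0.09
print(round(ecological_r, 2), round(individual_r, 2))
```

The state-level correlation is strongly negative while the person-level correlation is positive, matching the sign pattern Robinson reported.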

Formal problem


The correlation of aggregate quantities (or ecological correlation) is not, in general, equal to the correlation of individual quantities. Denote by $X_i$ and $Y_i$ two quantities at the individual level. The formula for the covariance of the aggregate quantities in groups of size $N$ is

$$\operatorname{cov}\left(\sum_{i=1}^{N} Y_i, \sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \operatorname{cov}(Y_i, X_i) + \sum_{i=1}^{N} \sum_{l \neq i} \operatorname{cov}(Y_l, X_i)$$

The covariance of two aggregated variables depends not only on the covariance of the two variables within the same individual but also on the covariances of the variables between different individuals. In other words, the correlation of aggregate variables takes into account cross-sectional effects that are not relevant at the individual level.
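The decomposition is the bilinearity of covariance, so it also holds exactly for sample covariances. The NumPy sketch below checks it on simulated data; the number of groups, the group size and the shared group-level shock that links different individuals are all hypothetical choices made so that the cross-individual terms are visibly non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 1_000, 4  # hypothetical: M groups, each with N individuals

# A shared group-level shock links X and Y across *different* individuals
shock = rng.normal(size=(M, 1))
X = rng.normal(size=(M, N)) + shock
Y = 0.5 * X + rng.normal(size=(M, N)) + 2.0 * shock

# Left-hand side: sample covariance of the group totals
lhs = np.cov(Y.sum(axis=1), X.sum(axis=1))[0, 1]

# Right-hand side: all pairwise sample covariances cov(Y_l, X_i)
C = np.cov(Y.T, X.T)[:N, N:]                   # C[l, i] = cov(Y_l, X_i)
same_individual = np.trace(C)                  # sum over i of cov(Y_i, X_i)
cross_individual = C.sum() - same_individual   # sum over l != i

print(lhs, same_individual + cross_individual)  # equal up to rounding
print(same_individual, cross_individual)        # cross terms dominate here
```

With this setup the cross-individual terms contribute more to the aggregate covariance than the within-individual terms, which is exactly the information an individual-level analysis never sees.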

The problem for correlations naturally entails a problem for regressions on aggregate variables: the ecological fallacy is therefore an important issue for a researcher who wants to measure causal impacts. Start with a regression model where the outcome $Y_i$ is impacted by $X_i$:

$$Y_i = \alpha + \beta X_i + u_i,$$
$$\operatorname{cov}[u_i, X_i] = 0.$$

The regression model at the aggregate level is obtained by summing the individual equations:

$$\sum_{i=1}^{N} Y_i = \alpha \cdot N + \beta \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} u_i,$$
$$\operatorname{cov}\left[\sum_{i=1}^{N} u_i, \sum_{i=1}^{N} X_i\right] \neq 0.$$

Nothing prevents the regressors and the errors from being correlated at the aggregate level. Therefore, in general, running a regression on aggregate data does not estimate the same model as running a regression on individual data.

The aggregate model is correct if

$$\operatorname{cov}\left[u_i, \sum_{k=1}^{N} X_k\right] = 0 \quad \text{for all } i.$$

This means that, controlling for $X_i$, the group total $\sum_{k=1}^{N} X_k$ does not determine $Y_i$.
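A short simulation shows what goes wrong when this condition fails. In this sketch (all parameters hypothetical, and the $X$'s drawn independently), each error is built from the other group members' regressors, $u_i = \gamma \sum_{k \neq i} X_k + \varepsilon_i$, so $\operatorname{cov}[u_i, X_i] = 0$ still holds. Summing over the group gives $\sum_i u_i = \gamma (N-1) \sum_i X_i + \sum_i \varepsilon_i$, so the aggregate slope should be close to $\beta + \gamma (N-1)$ rather than $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
G, N = 2_000, 5                      # hypothetical: G groups of N individuals
alpha, beta, gamma = 0.0, 1.0, -0.5  # gamma: assumed spillover from others' X

X = rng.normal(size=(G, N))
others = X.sum(axis=1, keepdims=True) - X  # sum of X_k over k != i
# u_i is uncorrelated with X_i but correlated with the other members' X_k
u = gamma * others + rng.normal(size=(G, N))
Y = alpha + beta * X + u

# Individual regression of Y_i on X_i: slope close to beta
b_individual = np.polyfit(X.ravel(), Y.ravel(), 1)[0]
# Aggregate regression of group totals: slope close to beta + gamma * (N - 1)
b_aggregate = np.polyfit(X.sum(axis=1), Y.sum(axis=1), 1)[0]

print(round(b_individual, 2), round(b_aggregate, 2))  # near 1.0 and -1.0
```

With these values the aggregate regression even reverses the sign of the individual effect.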

Choosing between aggregate and individual inference


Running regressions on aggregate data can yield valid (unbiased, under the usual identification assumptions) estimates when the target of inference is the aggregate model. For example, for state-level policy analysis, estimating the relationship between police force and crime using state-level data is appropriate for assessing the effect of a statewide change in police staffing. By contrast, inferring city-level impacts from state-level associations constitutes an ecological fallacy, since parameters estimated at the aggregate level need not identify effects at finer geographic units.

Choosing to run aggregate or individual regressions to understand the aggregate impacts of some policy depends on the following trade-off: aggregate regressions lose individual-level data, but individual regressions add strong modeling assumptions. Some researchers suggest that the ecological correlation gives a better picture of the outcome of public policy actions, and they therefore recommend the ecological correlation over the individual-level correlation for this purpose.[10] Other researchers disagree, especially when the relationships among the levels are not clearly modeled. To avoid the ecological fallacy, researchers with no individual data can first model what is occurring at the individual level, then model how the individual and group levels are related, and finally examine whether anything occurring at the group level adds to the understanding of the relationship. For instance, in evaluating the impact of state policies, it is helpful to know that policy impacts vary less among the states than do the policies themselves, suggesting that the policy differences are not well translated into results, despite high ecological correlations.[11]

Group and total averages


Ecological fallacy can also refer to the following fallacy: approximating the average for a group by the average in the total population divided by the group's share of the population. Suppose one knows the proportion of Protestants and the suicide rate in the US, but one does not have data linking religion and suicide at the individual level. If one is interested in the suicide rate of Protestants, it is a mistake to estimate it by the total suicide rate divided by the proportion of Protestants. Formally, denoting the mean of the group by $P[\text{Suicide} \mid \text{Protestant}]$, we generally have:

$$P[\text{Suicide} \mid \text{Protestant}] \neq \frac{P[\text{Suicide}]}{P(\text{Protestant})}$$

However, the law of total probability gives

$$P[\text{Suicide}] = P[\text{Suicide} \mid \text{Protestant}]\, P(\text{Protestant}) + P[\text{Suicide} \mid \text{not Protestant}]\, \bigl(1 - P(\text{Protestant})\bigr)$$

As we know that $P[\text{Suicide} \mid \text{not Protestant}]$ is between 0 and 1, this equation gives a bound for $P[\text{Suicide} \mid \text{Protestant}]$.
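Solving the total-probability equation for the group rate and letting the out-of-group rate range over $[0, 1]$ gives the interval $\max\{0, (P[\text{Suicide}] - (1 - P(\text{Protestant}))) / P(\text{Protestant})\} \le P[\text{Suicide} \mid \text{Protestant}] \le \min\{1, P[\text{Suicide}] / P(\text{Protestant})\}$. The naive ratio is therefore only the upper bound, reached when the outcome never occurs outside the group. The sketch below computes these bounds; the population shares and rates are hypothetical, and the function name is illustrative.

```python
def conditional_rate_bounds(overall_rate, group_share):
    """Bounds on P[outcome | group] implied by the law of total probability,
    given only P[outcome] and P(group), using 0 <= P[outcome | not group] <= 1."""
    lower = max(0.0, (overall_rate - (1 - group_share)) / group_share)
    upper = min(1.0, overall_rate / group_share)
    return lower, upper

# Hypothetical population: 60% belong to the group; the outcome rate is 3%
# inside the group and 1% outside it (unknown to an analyst with aggregates only)
share, rate_in, rate_out = 0.6, 0.03, 0.01
overall = rate_in * share + rate_out * (1 - share)  # about 0.022
naive = overall / share                              # about 0.0367, not 0.03
lower, upper = conditional_rate_bounds(overall, share)
print(naive, (lower, upper))
```

Here the true group rate (0.03) lies inside the bounds, but the naive estimate overstates it because it implicitly assumes a zero rate outside the group.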

Simpson's paradox


Simpson's paradox is an ecological fallacy:[12] when comparing two populations divided into groups, the average of some variable in the first population can be higher in every group and yet lower in the total population. Formally, when each value of $Z$ refers to a different group and $X$ refers to some treatment, it can happen that

$$E[Y \mid Z=z, X=1] > E[Y \mid Z=z, X=0] \ \text{for all } z, \text{ while } E[Y \mid X=1] < E[Y \mid X=0]$$
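A minimal numerical instance of these inequalities, using hypothetical success counts for two groups. The treated units are concentrated in the "large" group, where success is rarer, which is what reverses the pooled comparison.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for each group Z and treatment X
counts = {
    ("small", 1): (90, 100),
    ("small", 0): (240, 300),
    ("large", 1): (150, 300),
    ("large", 0): (40, 100),
}

def group_rate(z, x):
    """Success rate within group z for treatment x."""
    successes, trials = counts[(z, x)]
    return Fraction(successes, trials)

def pooled_rate(x):
    """Success rate for treatment x after merging all groups."""
    successes = sum(s for (z, t), (s, n) in counts.items() if t == x)
    trials = sum(n for (z, t), (s, n) in counts.items() if t == x)
    return Fraction(successes, trials)

for z in ("small", "large"):
    print(z, float(group_rate(z, 1)), float(group_rate(z, 0)))
# small 0.9 0.8
# large 0.5 0.4
print("pooled", float(pooled_rate(1)), float(pooled_rate(0)))
# pooled 0.6 0.7
```

The treatment wins by 10 points in each group yet loses by 10 points overall.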

When $E[Y \mid Z=z, X=1] - E[Y \mid Z=z, X=0]$ does not depend on $Z$, Simpson's paradox is exactly the omitted-variable bias for the regression of $Y$ on $X$, where the regressor $X$ is a dummy variable and the omitted variable $Z$ is a categorical variable defining a group for each value it takes. The bias can be large enough for the conditional and unconditional effects to have opposite signs.
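The omitted-variable reading can be checked with least squares on a hypothetical, noise-free population. Here the treatment effect is a constant 1 in both groups, but treated units are concentrated in the group with the lower baseline. Regressing $Y$ on $X$ alone gives $1 + (-10)(0.9 - 0.1) = -7$, where $-10$ is the group coefficient and $0.9 - 0.1$ is the difference in group-1 membership between treated and untreated units, the usual omitted-variable-bias identity; adding the group dummy recovers the effect.

```python
import numpy as np

# Hypothetical population of 200 units in two groups (z = 0 or 1)
# Group 0: baseline outcome 10, with 10 treated and 90 untreated units
# Group 1: baseline outcome 0, with 90 treated and 10 untreated units
z = np.repeat([0, 0, 1, 1], [10, 90, 90, 10])
x = np.repeat([1, 0, 1, 0], [10, 90, 90, 10])
delta = 1.0                     # constant within-group treatment effect
y = 10.0 * (1 - z) + delta * x  # no noise, to keep the arithmetic exact

ones = np.ones(len(x))
# Short regression: y on a constant and x (group omitted)
short_fit = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)[0]
# Long regression: y on a constant, x and the group dummy z
long_fit = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)[0]

print(round(short_fit[1], 6))  # -7.0: conditional and unconditional signs differ
print(round(long_fit[1], 6))   # 1.0: controlling for the group recovers delta
```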

Legal applications


The ecological fallacy was discussed in a court challenge to the 2004 Washington gubernatorial election, in which a number of illegal voters were identified after the election; their votes were unknown because the vote was by secret ballot. The challengers argued that illegal votes cast in the election would have followed the voting patterns of the precincts in which they had been cast, and thus adjustments should be made accordingly.[13] An expert witness said this approach was like trying to figure out Ichiro Suzuki's batting average by looking at the batting average of the entire Seattle Mariners team, since the illegal votes were cast by an unrepresentative sample of each precinct's voters and might be as different from the average voter in the precinct as Ichiro was from the rest of his team.[14] The judge determined that the challengers' argument was an ecological fallacy and rejected it.[15]


References

  1. Charles Ess; Fay Sudweeks (2001). Culture, technology, communication: towards an intercultural global village. SUNY Press. p. 90. ISBN 978-0-7914-5015-4. "The problem lies with the 'ecological fallacy' (or fallacy of division)—the impulse to apply group or societal level characteristics to individuals within that group."
  2. King, Gary (1997). A Solution to the Ecological Inference Problem. Princeton University Press. ISBN 978-0-691-01240-7.
  3. Durkheim, Émile (1951/1897). Suicide: A study in sociology. Translated by John A. Spaulding and George Simpson. New York: The Free Press. ISBN 0-684-83632-7.
  4. Freedman, D. A. (1999). "Ecological Inference and the Ecological Fallacy". International Encyclopedia of the Social & Behavioral Sciences, Technical Report No. 549. https://web.stanford.edu/class/ed260/freedman549.pdf
  5. Gelman, Andrew; Park, David; Shor, Boris; Bafumi, Joseph; Cortina, Jeronimo (2008). Red State, Blue State, Rich State, Poor State. Princeton University Press. ISBN 978-0-691-13927-2.
  6. Matous, Petr (2015). "Social networks and environmental management at multiple levels: soil conservation in Sumatra". Ecology and Society. 20 (3): 37. doi:10.5751/ES-07816-200337. hdl:10535/9990.
  7. Robinson, W. S. (1950). "Ecological Correlations and the Behavior of Individuals". American Sociological Review. 15 (3): 351–357. doi:10.2307/2087176. JSTOR 2087176.
  8. The research note on this curious data glitch is published in Te Grotenhuis, Manfred; Eisinga, Rob; Subramanian, S. V. (2011). "Robinson's Ecological Correlations and the Behavior of Individuals: methodological corrections". Int J Epidemiol. 40 (4): 1123–1125. doi:10.1093/ije/dyr081. hdl:2066/99678. PMID 21596762. The data Robinson used and the corrections are available at [1].
  9. Selvin, Hanan C. (1958). "Durkheim's Suicide and Problems of Empirical Research". American Journal of Sociology. 63 (6): 607–619. doi:10.1086/222356. S2CID 143488519.
  10. Lubinski, David; Humphreys, Lloyd G. (1996). "Seeing the forest from the trees: When predicting the behavior or status of groups, correlate means". Psychology, Public Policy, and Law. 2 (2): 363–376. doi:10.1037/1076-8971.2.2.363.
  11. Rose, D. D. (1973). "National and local forces in state politics: The implications of multi-level policy analysis". American Political Science Review. 67 (4): 1162–1173. doi:10.2307/1956538.
  12. Simpson, Edward H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Series B. 13 (2): 238–241. doi:10.1111/j.2517-6161.1951.tb00088.x.
  13. George Howland Jr. (May 18, 2005). "The Monkey Wrench Trial: Dino Rossi's challenge of the 2004 election is on shaky legal ground. But if he prevails, watch litigation become an option in close races everywhere". Seattle Weekly. Archived from the original on December 1, 2008. Retrieved December 17, 2008.
  14. Christopher Adolph (May 12, 2005). "Report on the 2004 Washington Gubernatorial Election". Expert witness report to the Chelan County Superior Court in Borders et al. v. King County et al.
  15. Borders et al. v. King County et al. Archived 2008-10-18 at the Wayback Machine. Transcript of the decision by Chelan County Superior Court Judge John Bridges, June 6, 2005; published June 8, 2005.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Ecological_fallacy&oldid=1315606247"