The term algorithmic fairness is used to assess whether machine learning algorithms operate fairly. To get a sense of when algorithmic fairness is at issue, imagine a data scientist is provided with data about past instances of some phenomenon: successful employees, inmates who when released from prison go on to reoffend, loan recipients who repay their loans, people who click on an advertisement, etc., and is tasked with developing an algorithm that will predict other instances of these phenomena. While an algorithm can be successful or unsuccessful at its task to varying degrees, it is unclear what makes such an algorithm fair or unfair.
To some, algorithmic fairness is a comparative concept. To see if an algorithm is fair, one looks at how one person or group is assessed by the algorithm as compared to how another person or group is assessed. While such comparative accounts are all committed to the principle that like cases should be treated alike, they differ with regard to what dimensions of likeness they assert matter morally, and why. To others, algorithmic fairness does not have this comparative dimension. Instead, an algorithm is unfair if it is inaccurate or if its processes are obscure, for example. Additional fairness issues focus specifically on the data from which an algorithm is developed. One might wonder whether the only obligations regarding the use of data are epistemic in nature, requiring accurate and representative data, or if, instead, data scientists should be concerned about the ways in which reliance on accurate data perpetuates unfairness.
In what follows, each of these possible fairness issues, and others, are addressed. Section 1 contains an introduction to the topic, and to an important real-world controversy that has been a focus of much scholarship in this area. Section 2 discusses comparative accounts of algorithmic fairness. Section 3 then turns to non-comparative accounts of algorithmic fairness. Section 4 focuses on moral and epistemic issues relating to data collection and use. Finally, Section 5 turns to a specific conceptual problem. Because machine learning algorithms will identify traits that are correlated with legally protected traits like race and sex, scholars wonder when and why these correlated traits should be considered as “proxies” for the protected traits.
The first question we might ask about algorithmic fairness is whether the term identifies a distinct moral notion that differs in important ways from the more generic (albeit contested) notion of fairness. In other words, is the concept of algorithmic fairness simply describing what constitutes fairness when machine learning algorithms are used in various contexts? The answer appears to be yes. “Algorithmic fairness” does not have a special meaning that is normatively distinctive. Rather the term refers to debates in the literature regarding when and whether the use of machine learning algorithms in different ways and in different contexts is fair, simpliciter. That said, because fairness itself is a contested notion, it is unsurprising that it is used in the algorithmic fairness literature to pick out different moral notions. For example, in the philosophical literature, “fairness” can refer to questions of distributive justice and also to questions about how one person ought to relate to another. In addition, people disagree about what fairness requires in each of these contexts.
Some complaints of unfairness in the contexts of algorithms focus on how one person or group is treated as compared to how another person or group is treated. Other complaints of unfairness have a different form: a person asserts that she is treated less well in some way than she ought to be treated. Feinberg divided conceptions of justice into two types in a similar way (Feinberg 1974: 298). As he explained
justice consists in giving a person his due, but in some cases one’s due is determined independently of that of other people, while in other cases, a person’s due is determinable only by reference to his relations to other persons. (Feinberg 1974: 298)
By contrast, some people may reserve “fairness” for the comparative concept and “justice” for the non-comparative concept. In U.S. constitutional law, questions about whether a person is treated as well as others are largely addressed via the “equal protection clause” and questions regarding whether a person is treated as she should be (understood in the non-comparative sense) are addressed via the “due process clause”, which might be thought to capture notions of fairness and justice respectively. Some scholars might thus contest whether the non-comparative complaints are best thought of as raising issues of “fairness” at all because these scholars understand “fairness” as an inherently comparative notion, in which what one is due can only be ascertained by reference to how others are treated. As this entry aims to canvass the debates that make up the algorithmic fairness literature, it includes a discussion of the non-comparative complaints as well, whether those are properly understood as issues of fairness or not. The important point to stress is not the terminological dispute about whether particular issues are matters of “fairness” or instead of “justice” but rather that this literature includes discussion of both comparative and non-comparative complaints.
Machine learning algorithms, their use, and the data on which they rely have spawned a significant literature. In addition, this field is new and dynamic. The algorithmic fairness literature is particularly difficult to summarize because the field is interdisciplinary and draws on work in philosophy, computer science, law, information science, and other fields. As a result, the discussion below provides an overview of some of the central and emerging debates but is bound to be incomplete.
That said, one should begin by noting a few important distinctions. First, some algorithms are used to allocate or assist in allocating benefits and burdens. For example, a bank might use an algorithmic tool to determine who should be granted loans. Or a state might use an algorithmic tool to assist in determining which inmates should remain in jail and which should be released on parole. Other algorithms are used to provide information, like translation services or search engines, or to accomplish tasks, like driving or facial recognition. While each of these functions raises fairness issues, because the allocative function has spawned the most literature, this entry will devote the most attention to it.
Second, some algorithms are used to predict future events like who will repay a loan or who will commit future crime. Others are used to “predict” or report some presently occurring fact, like whether an individual currently has a disease or whether the person holding the phone is the owner of the phone. Third, the score presented by the algorithm can be binary (yes or no) or continuous.
For example, suppose a state uses an algorithmic tool to evaluate how likely a person it arrests is to commit another crime and uses this score to assist in setting bail. This algorithm predicts a future event and is used to allocate benefits and burdens (high bail, low bail, no bail). If the score it produces is binary, it will provide one of two scores (high risk or low risk, for example). If it is continuous, it will provide a number (between 0 and 1, for example) which indicates the likelihood that the scored individual will commit a future crime.
Alternatively, consider the context of a facial recognition tool used to unlock a phone. An algorithm used for this purpose is used to determine what is presently occurring: whether the person in front of the phone is the person who owns the phone. The algorithmic tool is not used to allocate benefits or burdens; rather it is used for a task, to ensure the security of the phone. The score it provides is binary (recognition, non-recognition), which in turn determines whether the phone is unlocked or not.
Interestingly, when an algorithm is used to determine a presently occurring fact, there is a truth about the matter at issue (the person holding the phone is the owner or is not). Where there is a truth to be ascertained, one can say that the algorithm scored the individual correctly or incorrectly. By contrast, when an algorithm is used to predict a future event, ascertaining its accuracy is more complex. Suppose the algorithm determines that the scored individual is likely to reoffend (meaning, say, that 0.7 of the people so scored will reoffend). In that case, the algorithm may be accurate if 7 out of 10 of the people scored as likely to reoffend do reoffend even if this particular individual does not.
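This distinction can be made concrete with a short sketch (the outcomes below are invented for illustration and drawn from no real data set):

```python
# Ten hypothetical people, all given the score "likely to reoffend",
# where that score is understood to claim that 0.7 of people so
# scored will reoffend. 1 = did reoffend, 0 = did not.
outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# The observed rate matches the score's claim, so the score is
# accurate at the group level even though it was "wrong" about the
# three individuals who did not reoffend.
observed_rate = sum(outcomes) / len(outcomes)
print(observed_rate)  # 0.7
```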
As a descriptive matter, the algorithmic fairness literature was jumpstarted by a particular real-world controversy about the use of an algorithmic tool (COMPAS). Because the COMPAS controversy played such a pivotal role in framing how later debates evolved, it makes sense to begin by describing COMPAS and the debate surrounding it.
The Correctional Offender Management Profile for Alternative Sanctions (COMPAS) algorithm predicts recidivism risk and is used by many U.S. states and localities to aid officials in making decisions about bail, criminal sentencing, and parole. COMPAS bases its predictions on the answers to a series of questions. Of note, COMPAS does not base its predictions on the race of the people it scores. Nonetheless, in 2016, the website ProPublica published an article alleging that COMPAS treated Black people differently than it treated white people in a way that was unfair to Black people. In particular, the Black people scored by the algorithm were far more likely than were white people also scored by the algorithm to be erroneously classified as risky. ProPublica claimed:
In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways. The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants. (Angwin et al. 2016)
This passage from the ProPublica article illustrates the controversy as a disagreement about how to assess whether an algorithm treats those it scores fairly. On the one hand, a Black person and a white person who were each given the same score were equally likely to recidivate. Thus, according to one measure, COMPAS treated the two groups equally. However, a Black person and a white person who did not go on to recidivate were not equally likely to be scored as low risk—the Black person was significantly more likely to have been scored as high risk than the white person. These different ways of assessing whether COMPAS is fair pick out two different measures, each of which was offered as a measure of fairness.
Following the publication of the ProPublica article, computer scientists demonstrated that except in highly unusual circumstances, it is impossible to treat two groups of people equally according to both those measures simultaneously (Kleinberg et al. 2016: 1–3; Chouldechova 2016: 5). This conclusion is often referred to as the “impossibility result” or the “Kleinberg-Chouldechova impossibility theorem”. After the publication of these papers, other scholars proposed other potential statistical measures of fairness. Yet the impossibility of satisfying all of these measures at once, especially those that bear a family resemblance to the first two flagged in the ProPublica controversy, has endured.
The fact that this controversy has played such a pivotal role in the development of the field of algorithmic fairness has influenced the literature in multiple ways. Perhaps most importantly, it has led many to assume that mathematical formulas of some kind (even though there is disagreement about which ones) are important to assessing the fairness of algorithms. This is a controversial claim, however, with which not all scholars agree. For example, Green and Hu challenge the significance of the impossibility result for discussions of algorithmic fairness, arguing that “[l]abeling a particular incompatibility of statistics as an impossibility of fairness generally is mistaking the map for the territory” (Green & Hu 2018: 3). That said, the fact that proposed statistical measures of algorithmic fairness conflict has led to a rich debate about which ones, if any, relate to fairness, understood as a moral notion, and why.
The debate between different statistical measures, as well as the debate about whether any statistical measures are relevant, highlights the fact that the algorithmic fairness literature has a hole at its very center. It is because scholars in this field have not clarified what they understand fairness to require that they often talk past one another. This entry will attempt to clarify the different conceptions of fairness at issue and how they track different views of algorithmic fairness that are extant in the literature.
The measure COMPAS’s developer used to assess the fairness of its product is often called “predictive parity”, and what it requires is that those given the same score by the algorithm must have an equal likelihood of recidivating. For example, the score of high risk must be equally predictive of recidivism for Black people as for white people to satisfy predictive parity.
An alternative way to assess whether the algorithm is fair, and the one on which ProPublica relied, looks at whether the false positive rate, false negative rate, or both, are the same for each of the relevant groups. The false positive rate is the number of people erroneously predicted to have the trait in question out of the total number of people who in fact lack the trait. To have equal false positive rates, that rate must be the same for each of the groups we are comparing. The term “equalized odds” refers to the situation where both the false positive rate and the false negative rate are the same for each of the groups being compared.
The reason that COMPAS cannot achieve both predictive parity and equalized odds is that the rate at which Black people and white people recidivate differs (the “base rate”). An algorithmic tool cannot satisfy both predictive parity and equalized odds when base rates are different unless the tool is perfectly accurate. As most predictive algorithms are imperfect and base rates of properties of interest like recidivism, loan repayment, and success on the job often differ between groups, it will be impossible to satisfy both types of measures in most contexts (Kleinberg et al. 2016: 3; Chouldechova 2016: 5).
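The conflict can be illustrated with invented confusion-matrix counts (the numbers are chosen purely for exposition and correspond to no real system):

```python
# Compute the three rates at issue from a confusion matrix.
def rates(tp, fp, fn, tn):
    ppv = tp / (tp + fp)   # predictive parity compares this value
    fpr = fp / (fp + tn)   # share of actual negatives wrongly flagged
    fnr = fn / (fn + tp)   # share of actual positives wrongly cleared
    return ppv, fpr, fnr

# Group A: base rate 0.6 (60 of 100 actually positive).
a = rates(tp=48, fp=12, fn=12, tn=28)
# Group B: base rate 0.3 (30 of 100 actually positive).
b = rates(tp=24, fp=6, fn=6, tn=64)

# PPV is 0.8 for both groups (predictive parity holds), yet the
# false positive rates differ (0.3 versus roughly 0.086), so
# equalized odds fails.
print(a)
print(b)
```

Only a perfectly accurate classifier could close this gap while the base rates remain unequal.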
Noting group-based differences in base rates gives rise to several important questions, including the following. Is the base rate data accurate? If it is not, how does this inaccuracy affect algorithmic fairness? If the data is accurate, one might wonder what caused the observed differences between socially salient groups like racial groups. Does the causal etiology affect how we should think about algorithmic fairness? Questions about how the accuracy of data relates to fairness will be discussed in Section 4.
Predictive parity and equalized odds are the most prominent of the statistical measures of algorithmic fairness proposed in the literature, but more recently other measures have been suggested. For example, Hedden canvasses eleven different statistical measures that have been proposed—though several share a family resemblance (Hedden 2021: 214–218). In particular, predictive parity has a similar motivation as “Equal Positive Predictive Value”, which requires that the “(expected) percentage of individuals predicted to be positive who are actually positive is the same for each relevant group” (Hedden 2021: 215).
There are also several measures that focus in different ways on error rates, including ones that require false positive rates, false negative rates, or both to be equal, as well as those that look to the ratio of false positive to false negative rates. In addition to these two families of metrics, we also might measure the fairness of an algorithm by reference to whether the percentage of a socially salient group that is predicted to be positive (or negative) is the same for each group. This criterion resembles the U.S. legal concept of disparate impact. Some scholars have also proposed composite measures that attempt to maximize the satisfaction of measures from each family of approaches.
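The last-mentioned disparate-impact-like criterion can be sketched simply (hypothetical predictions; 1 = predicted positive):

```python
# Share of a group predicted positive; the disparate-impact-like
# measure compares this share across groups.
def positive_share(predictions):
    return sum(predictions) / len(predictions)

group_a = [1, 1, 0, 0]  # half predicted positive
group_b = [1, 0, 0, 0]  # a quarter predicted positive

print(positive_share(group_a))  # 0.5
print(positive_share(group_b))  # 0.25
```

Note that this measure looks only at predictions, not at outcomes, which is what distinguishes it from the error-rate and predictive-value families.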
The controversy about which statistical measure to prioritize can be usefully understood as grounded, albeit implicitly, in assumptions about what the moral notion of fairness requires. What that notion is, however, can be difficult to unearth, as scholars writing in this field often implicitly assume that something is manifestly fair or unfair without being explicit about what they understand fairness to be.
On one view, what fairness requires is treating like cases alike. Views that fall within this framework see fairness as a comparative notion. This conception of fairness focuses on how one person or group is treated as compared to how another person or group is treated. Conceptions of algorithmic fairness that are not comparative in orientation are discussed in Section 3.
The view that fairness requires that like cases should be treated alike does not help to settle the COMPAS dispute, however, as this maxim can support both types of so-called fairness measures offered by the parties to that controversy. The treat likes alike principle (or TLA for short) could require that people who are equally likely to recidivate are treated the same, meaning that they get the same risk score (as equalized odds would require). Or alternatively, the TLA principle could require that two people with the same risk score are equally likely to recidivate (as predictive parity requires).
The TLA principle, while important, thus provides only the first organizing division among competing conceptions of fairness that are extant in the literature. While it does not provide a way to choose between the two measures proposed in the COMPAS controversy, it allows us to see what the supporters and critics of COMPAS agreed about regarding the nature of algorithmic fairness (and how they differ from the non-comparative conceptions of fairness discussed in §3).
For one family of views (and the scholars who defend them), the likeness that is relevant for fairness has an epistemic cast. To be fair, a risk score derived from an algorithmic tool must “mean the same thing” for members of group X and group Y. Hedden, for example, argues that “Calibration Within Groups” is the only statistical measure that is necessary for fairness in precisely these terms. Calibration Within Groups, according to Hedden, requires that
[f]or each possible risk score, the (expected) percentage of individuals assigned that risk score who are actually positive is the same for each relevant group and is equal to that risk score. (Hedden 2021: 214)
The reason Calibration Within Groups is necessary for fairness, in his view, is that when it is violated “that would mean that the same risk score would have different evidential import for the two groups”, which amounts, he argues, to “treating individuals differently in virtue of their differing group membership” (Hedden 2021: 225–26). In this passage, we can see that Hedden assumes that fairness requires that scores mean the same thing and that treating people differently in virtue of their group membership just is giving them risk scores with different evidential support.
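Hedden's criterion can be sketched as a simple check (a minimal sketch with hypothetical records; the function name and data are illustrative, not Hedden's formalism):

```python
from collections import defaultdict

def calibrated_within_groups(records, tol=1e-9):
    """records: iterable of (group, score, actually_positive).
    For each (group, score) bucket, the observed fraction of
    positives must equal the score itself."""
    buckets = defaultdict(list)
    for group, score, positive in records:
        buckets[(group, score)].append(positive)
    return all(
        abs(sum(v) / len(v) - score) <= tol
        for (_, score), v in buckets.items()
    )

# A score of 0.5 given to two people in each group, exactly half of
# whom are positive, satisfies the criterion.
data = [("X", 0.5, True), ("X", 0.5, False),
        ("Y", 0.5, True), ("Y", 0.5, False)]
print(calibrated_within_groups(data))  # True
```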
Some scholars challenge whether Calibration Within Groups does in fact assure that a score will mean the same thing for each of the groups affected. Hu argues that while an algorithm may achieve calibration within groups when we focus on a particular pair of groups (Black people and white people, men and women), it will not achieve calibration across all possible groups to which the actual people belong unless it is perfectly accurate. For that reason, Hu concludes that it is false to claim that the score means the same thing, no matter the group to which a person belongs (Hu 2025).
Eva, like Hedden, also understands fairness in epistemic terms, though he speaks about accuracy, rather than meaning. Eva proposes a statistical criterion of fairness he calls “base rate tracking”, which requires that the average risk scores for each group mirror the base rates of the underlying phenomenon in each group (Eva 2022: 258). Fairness is, on this view, a matter of the score accurately reflecting the world for each of the groups at issue. Interestingly, Eva’s view not only requires that differences in the scores reflect differences in underlying base rates. In his view, differences in base rates must also be reflected in differences in scores. For Eva, this requirement rests on what he takes to be a “natural intuition: that it would be unfair to treat two groups as equally risky if one was in fact more risky than another” (Eva 2022: 262). However, this intuition may not be as universal as Eva supposes, as sometimes what fairness requires is ignoring differences between people and treating people who are different the same (Schauer 2003: chapter 7). For example, airport screening requires that everyone put their bags through a screener despite the fact that people are not equally likely to be carrying impermissible items onto the plane. In this case, treating people who pose different degrees of risk the same is seen as the embodiment of fairness, and departures from this process (increased screening for high-risk individuals, for example) are seen as potentially unfair.
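Eva's base rate tracking criterion can be sketched as follows (made-up scores and outcomes; the helper function is illustrative, not Eva's own formalism):

```python
# Base rate tracking: the average risk score assigned to a group
# should mirror that group's base rate.
def tracks_base_rate(scores, outcomes, tol=1e-9):
    avg_score = sum(scores) / len(scores)
    base_rate = sum(outcomes) / len(outcomes)
    return abs(avg_score - base_rate) <= tol

# A group whose base rate is 0.5 and whose average assigned score
# is also 0.5 satisfies the criterion.
scores = [0.3, 0.7, 0.4, 0.6]
outcomes = [0, 1, 0, 1]
print(tracks_base_rate(scores, outcomes))  # True
```

On this sketch the check runs in both directions at once: a group cannot be scored as riskier, or as safer, than its base rate warrants.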
Scholars who focus on the false positive rate, false negative rate, or both (i.e., equalized odds) see the fairness of an algorithmic score in terms other than evidential support and accuracy. Instead, they speak in terms of “care”, about whether and how the algorithm’s predictions affect existing social structures, and tend to emphasize the connection between the algorithm itself and the decisions it aids and supports.
Babic and Johnson King, for example, argue that fairness requires that an actor care equally about each of the groups scored by the algorithm (Babic & Johnson King 2025: 116). They operationalize the notion of caring, in the algorithmic context, by reference to the ratio of false positives to false negatives and treat differences in this ratio as prima facie evidence of unfairness.
Lazar and Stone argue that a model is unfair if it is less accurate for members of disadvantaged groups than for advantaged groups and could be improved without unreasonably compromising predictions for advantaged groups (Lazar & Stone 2023: 112–15). They emphasize that when people endorse a model that works less well for the disadvantaged group, and especially if this fact results from prior injustice toward members of these disadvantaged groups, the user of the algorithm (unfairly) reinforces structures of injustice in the world.
Some scholars shift the emphasis from the scores themselves to how they are used. These scholars resist a sharp delineation between assessing the fairness of the algorithm itself and assessing its fairness in the context of its likely use. To see the intuition animating this view, imagine that COMPAS is not being used to aid decisionmakers in making decisions about who will remain in prison but instead to determine who will be provided with beneficial services. The fact that the algorithm generates more false positives for Black inmates than for white inmates seems unfair to the Black inmates if the result will be incarceration but not if it will be beneficial services (Mayson 2019: 2293; Grant 2023: 101).
Because the consequences of each type of error (false positives and false negatives) differ depending on the action to be taken as a result (incarceration and high bail versus beneficial services), the significance of each type of error differs depending on what will happen in response to the score. In addition, the consequences of the same type of error can be different for different groups. For example, the cost of setting bail higher than warranted may be more burdensome for an already disadvantaged social group than for one that is more privileged. Long, for example, argues that treating people fairly requires valuing the risk and cost of error equally for the relevant groups being compared. Where the costs of error for two groups differ, one should take this into account by setting the threshold for the relevant action differently for each group (Long 2022: 54–55).
Scholars of algorithmic fairness also differ in their methodological approach. Some approach the problem of assessing whether an algorithm treats like cases alike by constructing hypothetical cases in which the question is asked about how Group A is treated as compared to Group B, where both groups are artificial groups constructed for the hypothetical. Others focus on real social groups like racial groups. These methodological differences likely embed assumptions about the nature of social groups. For example, Hedden builds his argument in favor of “Calibration Within Groups” using an example in which the groups at issue are members of room A or room B rather than using socially salient groups. In so doing, he assumes that differences between such hypothetical groups and real social groups do not matter to ascertaining what fairness requires, a position that others contest. In other words, fairness, in the view of Hedden and others, can be understood as a purely abstract notion, untethered from the real-world social situations in which algorithms and the systems that deploy them are used. To others, described below, fairness always requires assessing how an algorithmic tool interacts with a history and culture that may be marked by injustice.
Eva emphasizes that an algorithm is unlikely to be calibrated with regard to all possible groups and recognizes that we are likely to be interested only in whether a statistical measure of fairness is satisfied with regard to significant social groups like those defined by race and sex (Eva 2022). However, he also thinks that statistical measures of fairness speak to whether an algorithm is “intrinsically” fair or unfair, in a manner that is meaningful for both socially salient groups and artificial groups.
For Hedden and Eva, the fairness of algorithms (or at least their so-called “inherent fairness”) can be assessed by ignoring real-world facts about actual social groups. For others, the social situation of disadvantaged groups is relevant to the inherent fairness of algorithms. For example, Grant emphasizes the fact that people who are disadvantaged often have fewer of the features that predict desirable outcomes. Living in a poor neighborhood is predictive of recidivism and failure to repay loans. As a result, individual poor people who are in fact unlikely to recidivate and likely to repay their loans will have difficulty identifying themselves as such, as they will have difficulty showing that they differ from the socially salient groups to which they belong. Grant calls this problem “evidentiary unfairness” (Grant 2023: 101).
Lazar and Stone make the case for what they term “situated predictive justice”. They argue that if a model performs less well for a disadvantaged group due to systematic background injustice,
we can acquire reasons to care about differential model performance which are dependent on those situated, contextual facts, rather than being applicable in every logically possible world. (Lazar & Stone 2023: 21)
Zimmerman and Lee-Stronach make similar arguments (2022: 6–7).
The relevance of the real world to assessments of unfairness also arises when assessing the fairness of the manner in which search engines deliver content. Delivery can be skewed due to the actions taken by others, a phenomenon termed “market effects”. For example, Lambrecht and Tucker describe how an advertisement for careers in STEM (Science, Technology, Engineering and Math) fields was shown to men more than to women, even though it was designed to be shown on a gender-neutral basis. They hypothesized that this skew resulted from the fact that the STEM ad was outbid by other advertisers who wanted to get the attention of women, who are more desirable consumers of advertising (Lambrecht & Tucker 2019: 2966–97). This result raises a question about whether skewed delivery of information, especially information about valuable opportunities, is unfair when caused by the interaction of various forces as opposed to resulting from the algorithm itself. In this case, the fact that women engage in more commercial activity than men (if that was the cause) had the effect of making women less likely to receive information about STEM careers. Kim finds this unfair and argues that the law should address disparities in access to valuable opportunities (Kim 2020: 934–35). In making this argument, Kim appears to understand fairness in terms of equality of opportunity. See the entry on equality of opportunity.
Barocas, Hardt, and Narayanan describe several examples that have a similar structure and appear to consider each to raise problems of fairness (Barocas, Hardt, & Narayanan 2023: 199). For example, they report that staples.com offered lower prices to people in certain ZIP codes which were, on average, wealthier. The reason it did so was that the algorithm provided lower prices in locations in which a competitor store was located. From a moral perspective, one might wonder whether differential treatment is unfair when the reason for it is market-based but its effect tracks protected attributes. Finding this differential pricing unfair draws on the same moral intuitions as do claims of disparate impact (or indirect) discrimination.
Scholars also differ with regard to whether they think their preferred measure of fairness is constitutive of fairness or unfairness or merely suggestive of fairness or unfairness. Hedden, for example, treats “calibration within groups” as necessary for fairness (Hedden 2021: 222). By contrast, Hellman argues that when the ratio of false positives to false negatives is different for different groups (a lack of what she terms “error ratio parity”), this is suggestive of unfairness only (Hellman 2020: 835–36). Setting the balance between false positives and false negatives instantiates an important normative judgment about the costs of each type of error. Is it more important to avoid incarcerating an innocent person (false positive) or to avoid allowing a dangerous criminal to go free (false negative), and by how much? Fairness requires weighing the costs of these two types of errors in the same way for Black criminal defendants as for white criminal defendants. One might be tempted to think that error ratio parity does just that, and so differences in this ratio for different groups are inherently unfair. It does not. Weighing the costs of each type of error equally for each group does not translate directly into error ratio parity, because the error ratio also depends on whether the incidence of the feature being tested for (the base rate) is equally present in each group. As a result, a lack of error ratio parity is suggestive but not constitutive of unfairness, according to Hellman.
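The arithmetic behind Hellman's point can be verified with a short sketch (illustrative numbers): hold the false positive and false negative rates fixed, so each type of error is weighed identically for both groups, and the ratio of false positives to false negatives still diverges once base rates differ.

```python
# Ratio of false positive to false negative counts for a group of
# size n, given its base rate and the (shared) error rates.
def fp_fn_ratio(n, base_rate, fpr, fnr):
    positives = n * base_rate
    negatives = n - positives
    fp = negatives * fpr  # actual negatives wrongly flagged
    fn = positives * fnr  # actual positives wrongly cleared
    return fp / fn

# Same FPR (0.2) and FNR (0.2) for both groups, different base rates:
print(fp_fn_ratio(100, base_rate=0.5, fpr=0.2, fnr=0.2))  # 1.0
print(fp_fn_ratio(100, base_rate=0.2, fpr=0.2, fnr=0.2))  # 4.0
```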
If fairness requires treating like cases alike, one must articulate in what ways two people are like (and unalike) one another. For example, if we want to assess whether Black loan applicants are treated fairly as compared to white loan applicants, we need to assess the features that make two applicants like one another in all respects except race. Suppose the Black applicant is denied a loan while the white applicant is granted a loan on the grounds that the income of the Black loan applicant is lower than the income of the white loan applicant. Should we conclude that these applicants are unlike one another, because their income is different, and thus the different treatment does not demonstrate unfairness? Or should we conclude that income is, in part, constitutive of race and thus these applicants may be like one another after all?
When scholars discuss algorithmic fairness as applied to real social groups, some ignore this debate about how the group itself should be defined while others emphasize its importance.
Beigang proposes that we focus on matched pairs from the relevant groups and then assess whether predictive parity and equalized odds can be satisfied for groups that are similar in all respects other than the trait with reference to which we are assessing fairness between groups (Beigang 2023: 179, 183). Beigang’s assumption that we can identify matched pairs of people who differ only with respect to race presupposes that race can be defined in a manner that allows it to be clearly isolated from correlated traits.
Others disagree. They emphasize that race- and sex-based groups are importantly different from hypothetical, abstract groups. While membership in group A can be given a stipulated definition such that a scholar can isolate a particular issue under discussion, social categories like race and sex can never be so neatly hived off from questions about what such an approach is aiming to elucidate.
For example, Hu and Kohler-Hausmann argue that algorithms are making decisions on the basis of race and sex, even when these labels are absent from the data. To see the idea, suppose an algorithm used to predict who is likely to be a successful computer programmer is unaware of the sex of the individuals in the training data set. The algorithm learns that being a math major in college is a good predictor of job success. Should we conclude that sex did not affect the algorithmic recommendation? Or should we recognize that math is a “male-y thing”, as men are steered, directly or implicitly, into math and women away from it? If male sex is constructed, at least in part, as being a person reasonably likely to major in math, then the algorithm is not unaware of sex after all (Hu & Kohler-Hausmann 2020: 7–8).
This view of protected attributes like race and sex is influenced by an understanding of race and sex as “socially constructed” (Haslanger 2012: chapter 7). For example, one could understand black race as constituted by social subordination, a likelihood of being subject to substantial police supervision, etc. If so, then removing racial labels does not make the algorithm unaware of race (Hu 2024: 13–14). Weinberger challenges this view, arguing that by focusing on the signals of race, sex, and other protected attributes, the causal inference model of discrimination can be maintained (Weinberger 2022: 1265).
The upshot of the social constructivist view of race and sex and its implications for formal measures of algorithmic fairness can be put in a stronger or weaker form. In the strong form, any so-called “fairness measure” which aims to identify whether people are treated fairly regardless of their group membership, but which defines such membership in terms of a label in the data, misunderstands the nature of race and sex. As a result, any conclusions that follow from a finding of parity in a statistical measure of fairness are not meaningful for genuine fairness between the real social groups scored by the algorithm.
In its more modest form, the critique holds that such “fairness measures” rest on a contested ontological view about the nature of race and sex. The upshot of this view is that scholars and policy makers must acknowledge and defend the particular understanding of race or sex on which their conclusions of algorithmic fairness depend. In either form, this critique suggests that statistical fairness measures may say less about whether an algorithm treats people fairly irrespective of race or sex than their proponents imagine.
In this section, I examine several different ways in which complaints of algorithmic unfairness occur where the unfairness is not a matter of examining whether a person or group is treated fairly given how another person or group is treated. Rather, the complaint is that a person or group is treated less well (in some manner) than they ought to be treated. The sections that follow describe several such concerns, though this is not an exhaustive list. First, the complaint might point to the fact that the algorithm is inaccurate, or less accurate than it could or should be (§3.1). Second, the complaint might object that a judgment about an individual is impermissibly based on features of the group to which the individual belongs (§3.2). Third, perhaps the use of algorithms is unfair because machines rather than humans are making consequential decisions that affect people (§3.3). Lastly, the incomprehensibility of complex algorithmic systems may be unfair to those affected by them, leading to demands for explainable AI (§3.4).
Algorithms may be unfair to individuals because they are inaccurate. This charge of inaccuracy can be understood in two different ways. The algorithm as a whole may not do a good (or good enough) job of producing the outcomes it is designed to achieve. In such a case, the people affected by this inaccurate algorithm may claim that being judged by an inaccurate algorithm is unfair. Alternatively, the algorithm, though accurate in general, may mischaracterize a particular individual. This occurs because even a very accurate algorithm is not perfect and so will mischaracterize or predict inaccurately for some subset of individuals. When a generally accurate algorithm makes a mistake about a particular person, that person inaccurately assessed by the algorithm may assert that she has been treated unfairly.
Inaccuracy in algorithmic predictions or assessment can be an “artifact” of the data on which the algorithm is trained. For example, suppose a bank uses an algorithmic tool to determine to whom loans should be awarded. Further suppose the data on which the algorithm is trained just happens to contain several examples in which all borrowers named “Jamila” defaulted on their loans. In such a case, the system might learn that being named “Jamila” predicts failure to repay even though there is no reason to think that having the name “Jamila” has anything to do with repaying loans. If a new Jamila scored by the algorithm is actually a good risk, one might think that the use of the algorithm trained on this data is unfair because it is inaccurate (or because it is inaccurate and there is no plausible causal story about why having this name relates to loan repayment).
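The mechanics of this kind of data artifact can be sketched in a few lines (the tiny training set and names below are invented for illustration): a naive learner that tabulates outcomes by name will treat a sampling coincidence as a predictive pattern.

```python
# Invented training set: by coincidence, every borrower named "Jamila"
# in this small sample defaulted, though the name is causally irrelevant.
training = [
    ("Jamila", "defaulted"), ("Jamila", "defaulted"), ("Jamila", "defaulted"),
    ("Ann", "repaid"), ("Ann", "defaulted"),
    ("Bo", "repaid"), ("Bo", "repaid"),
]

def learned_default_rate(name):
    """Default rate among training borrowers with this name —
    the 'pattern' a naive learner would extract from the sample."""
    outcomes = [o for n, o in training if n == name]
    return sum(1 for o in outcomes if o == "defaulted") / len(outcomes)

print(learned_default_rate("Jamila"))  # 1.0 — an artifact of the sample
print(learned_default_rate("Bo"))      # 0.0
```

A new, creditworthy Jamila scored by a system trained on such data inherits the artifact, which is the source of the unfairness complaint the paragraph describes.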
Of course, there may well be an explanation for why having the name Jamila is predictive of loan repayment. The name may be correlated with race, and race may be correlated with loan repayment. If so, then this example raises the issue of when and why non-sensitive attributes should be treated as proxies for race or other protected attributes (see §5).
Assuming the correlation between the name and failure to repay does not track sensitive attributes and thus is truly arbitrary, Creel and Hellman argue that this arbitrariness itself is not of moral concern (Creel & Hellman 2022: 2). However, they argue, arbitrariness can become of moral concern when it leads to widespread exclusion from important opportunities. For example, if the algorithmic tool used by Jamila’s bank is also used by all other lenders, then Jamila will be denied not only this loan but all opportunities to borrow money. This systemic exclusion may be of moral concern, and that moral concern could be characterized as a form of unfairness. Systemic exclusion can occur even if different lenders use different algorithmic products to predict loan repayment if they were all trained on the same data set with the same data anomaly. Note, however, that on this account the unfairness resides not in the inaccuracy but in the systemic exclusion from an important opportunity.
Another complaint about algorithms is that they work by making assessments about individuals on the basis of group-based statistical generalizations. This complaint ties the unfairness of algorithms to familiar debates about when and whether such group-to-individual inferences are morally permissible. Schauer, Eidelson, and others discuss these issues outside of the algorithmic context (Schauer 2003: chapters 2–3, 7; Eidelson 2020: 1635–36). Debates about the moral permissibility of racial profiling, the wrongs of stereotyping, the permissibility of reaching legal judgments on the basis of statistical evidence, and legal requirements of treating people “as individuals” all raise similar moral issues. When algorithms are used to predict crime or recidivism in particular, concerns about whether statistical information, on which such tools rely, is the right sort of information on which to base this decision may raise special moral issues (Grant, Behrends, & Basl 2025: 63–64).
It is unclear whether fairness requires that judgments about individuals, or actions that affect individuals, prescind from reliance on statistical generalizations about the groups to which these individuals belong. Some argue that fairness requires that one refrain from making judgments about an individual on the basis of group membership, or membership in certain groups (as avoiding statistical reasoning altogether may not be possible). Others argue that decisionmakers should supplement coarse-grained statistical information with more fine-grained statistical information, thus narrowing the groups on which inferences about individuals are based. Still others contend that fairness permits the use of statistical generalizations about membership in socially salient groups (like race and sex) so long as one also pays attention to how the exercise of individual autonomy distinguishes a particular individual from the group (Eidelson 2020: 1635–39). In contrast, others argue that sometimes the value of equality requires ignoring individual differences and treating everyone the same (Schauer 2003: 258–59; Lippert-Rasmussen 2011: 57–58).
Issues about reliance on group-based generalizations can arise when an algorithm is used to aid in the allocation of burdens or benefits. They also arise in debates about search engines, which provide information to users. There are several related and important fairness concerns in the information-provision setting. First, the use of machine-learning algorithms raises an issue that is the inverse of the generalization problem just discussed: personalization. When algorithms personalize, targeting information to ever smaller, more “intersectional” groups, this raises worries about the segmentation of information in our increasingly polarized environment and gives rise to privacy concerns because the algorithm seems to know what each person likes and dislikes. In addition, when the results of searches conform to negative stereotypes about a group, the harm they cause also may be a form of unfairness. If so, one could go on to ask whether this unfairness is attributable to the harm such results cause, reinforcing negative stereotypes. Alternatively, perhaps these results are unfair because they are themselves a form of stereotyping that is morally objectionable regardless of the harm that results.
For example, Noble finds that search results for terms like “Black girls” generate results that consist largely of pornography, which seems unfair as it may harm Black women and girls. In addition, the result may be a kind of comparative unfairness as well if the same result does not occur for white women and girls (Noble 2018: chapter 2) or for men of any race. Sweeney finds that searches for Black-sounding first names were 25% more likely to return a result related to an arrest record than were searches of white-sounding first names (Sweeney 2013: 51). Selbst and Barocas describe this problem as “representational harm[]” (2023: 1036). These scholars see these results as unfair even if a higher percentage of Black women are involved in pornography compared to non-Black women (Noble 2018: chapter 2).
Much like algorithms, human beings frequently rely on group-based generalizations to make judgments about individuals and do so both accurately and inaccurately. Indeed, some scholars argue that algorithmic decisions promote fairness because they reduce bias (Lobel 2022: introduction; Sunstein 2022: 1203–05). Albright has found that when judges have discretion about whether or not to follow bail recommendations based on risk scores, they are more likely to deviate from these recommendations for Black defendants than for white defendants by imposing cash bail when it was not recommended (Albright 2019: 4).
If basing decisions on the results of a machine learning algorithm is nonetheless problematic, perhaps this is because fairness requires a human decisionmaker, either generally or in certain contexts or with regard to certain kinds of decisions.
The challenge is to articulate why that is so. For one, it might matter because machines differ from human beings in several ways, including that they are not constrained by moral duties (Eggert 2025: 4–5). In addition, machines lack what Eggert terms “moral receptiveness”, that is, the ability to appreciate the moral dimension of their actions.
Second, automation by algorithm may be problematic in contexts in which a certain type of process is normatively required, including the opportunity to challenge the results and be heard (Citron 2008: 1283). In addition, when algorithms replace experts, the justification for allocating the decision to the expert in the first instance may be undermined. Calo and Citron argue, for example, that use of algorithms by administrative agencies, without specific safeguards, may undermine the legal justifications for allocating decisions to agency judgment in the first instance (Calo & Citron 2021: 844–45). However, if the algorithmic decision is more accurate, perhaps it provides the requisite expertise.
Algorithmic fairness may require an explanation of how the algorithm reached the result that it did. Where the processes are complex, additional questions arise about what such an explanation should consist of. It could require information about the data on which the algorithm was trained or audits that demonstrate the accuracy of the system’s results. Alternatively, the demand for explanation could require a detailed account of the factors that were relevant or dominant in reaching the particular result that the algorithm reached. If fairness requires the latter, additional questions arise about how to structure such an explanation. In particular, does fairness require that the individual affected be provided with information about the factors that weighed the most heavily in the algorithmic determination about that individual’s case? Or does it require that the individual be told what she might change about herself or her circumstances to get a different result? Or both?
One might wonder what the relationship is between the demand for explanation and concerns of fairness. To some, the explanation requirement derives from legal norms of “due process”, which could be glossed morally as fair process (Citron 2008: 1276–77). On this view, in order for an individual who is scored by an algorithm to be treated fairly, the individual must have access to some information about how that result is reached.
One can understand the explanation requirement instrumentally or non-instrumentally. If fairness requires that an individual be able to understand the factors that affected the score the algorithm reached so that the individual can identify errors that might have been made and correct them, then the explanation requirement is instrumental. On this account, explanation helps an individual to detect and correct error. To the extent that explanation is required for fairness, then, this is because fairness relates to the accuracy of the score. Alternatively, explanation may be required for non-instrumental reasons. Perhaps treating others with appropriate respect or consideration requires that the scored individual have access to the factors that affected the algorithmic score (Grant, Behrends, & Basl 2025: 56–58). This gloss on the explanation requirement seems especially plausible in instances where the algorithm is used to make, or assist others in making, consequential decisions.
Some scholars see both instrumental and noninstrumental reasons that explanations must be provided. For example, Vredenburgh grounds the right to an explanation on what she terms “informed self-advocacy”, which requires that a person have the ability to determine how to act given how decisions will be made, and that she knows how decisions were reached so that she can correct errors (Vredenburgh 2022: 212–13). For Jorgensen, individuals are entitled to know the factors that may affect how an algorithm will score them so that they can make informed and autonomous choices about how to act in the world. In addition, Jorgensen argues that explanation is especially important in the context of criminal law because legitimate law must be genuinely public (Jorgensen 2022: 64).
Each of these four versions of the non-comparative complaint of unfairness asserts that the use of algorithms in the particular context treats a person less well than the person ought to be treated. This formulation raises a question about how to think about flawed processes that are nonetheless an improvement on human judgment and decision-making. After all, human judgment is often inaccurate and relies on group-based generalizations. Johnson, for example, argues that because both humans and machine learning algorithms rely on induction, both are unavoidably subject to bias (Johnson 2021: 9951) and stresses the similarities between human and machine bias (Johnson 2024: 9–10). In addition, the workings of the human mind can be the ultimate “black box” (Selmi 2021: 632–33). Whether we should demand better explanations from AI systems than from human beings is another subject of dispute (compare Zerilli et al. 2019 and Günther & Kasirzadeh 2022).
Claims of algorithmic fairness or unfairness sometimes focus particularly on the data on which algorithms are trained. We can divide data-related issues into two broad categories: those where inaccurate or nonrepresentative data lead to unfairness (§4.1, §4.2) and those where the data, while accurate, nonetheless produce or reveal unfairness (Hellman 2024a: 80–81) (§4.3).
A common data collection problem is measurement error, which has several forms. First, measurement error occurs when the phenomenon one really cares about is difficult to measure. As a result, the researcher may collect data on something else that is easier to measure and thought to be closely correlated with the desired attribute. For example, a law school might want to admit students with an aptitude to learn complex and subtle legal material (a facility we might call “legal ability”). The problem, however, is that it is difficult to know who has legal ability until after the student arrives at law school. Thus, a law school admissions process might rely instead, in part, on the Law School Admissions Test (LSAT), which purports to measure legal ability. There is clearly a gap between the actual trait the law school seeks—legal ability—and LSAT scores; the traits are not the same, even if they are correlated. This gap is termed “measurement error”.
Second, measurement error can occur when the data are collected from a nonrepresentative subset of the population they purport to represent. For example, if a facial recognition tool is created using only images of light-skinned faces, it may fail to work for dark-skinned faces. Relatedly, perhaps the data are representative of the population but still do not include sufficient minority group members to be reliable for that subgroup. Fairness issues related to representation will be addressed in Section 4.2 below.
Measurement error is ubiquitous and often unavoidable. Of special concern morally, however, are skewed measurement errors. Consider, for example, an employer who desires to hire reliable employees. Reliability is difficult to measure directly, so the employer bases her judgment about whether job applicants will be reliable on the recommendations the applicants receive from their prior employers. Suppose the prior employers are biased, consciously or unconsciously, such that they are more likely to see their female employees as unreliable when they miss work occasionally than they are to reach that judgment about male employees who miss work equally often. If so, hiring based on reliability will import that bias into an algorithm through reliance on the recommendations of prior employers. When the measurement error is larger for one group than for another, the algorithm developed from this data will incorporate that bias. In a twist on the empiricist’s maxim, “Garbage in, garbage out”, one might describe this problem as “Bias in, bias out” (Mayson 2019: 2224).
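A minimal simulation of such skewed measurement error (the cutoffs, group labels, and distribution below are all invented for illustration): two groups with identical actual absenteeism receive systematically different “reliability” labels, and any model trained on those labels inherits the skew.

```python
import random

random.seed(0)

def recommendation(missed_days, group):
    """Hypothetical biased labeler: prior employers judge the same
    absenteeism more harshly for group 'F' than for group 'M'."""
    threshold = 3 if group == "F" else 5   # stricter cutoff for 'F'
    return "unreliable" if missed_days > threshold else "reliable"

# Two groups with an identical distribution of missed work days:
days = [random.randint(0, 8) for _ in range(10_000)]
labels_f = [recommendation(d, "F") for d in days]
labels_m = [recommendation(d, "M") for d in days]

rate_f = labels_f.count("unreliable") / len(labels_f)
rate_m = labels_m.count("unreliable") / len(labels_m)
print(rate_f, rate_m)  # 'F' labeled unreliable far more often: bias in, bias out
```

The underlying behavior (missed days) is identical by construction; only the measurement differs, which is what makes the resulting training data “garbage” of a morally loaded kind.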
While it is likely impossible to completely avoid biased data, its role can be minimized by choosing features that are less subject to bias. For this reason, Mayson and others object to the use of arrest records in recidivism risk assessment tools (Mayson 2017: 556). Prior criminal activity is predictive of reoffending. However, prior criminal activity itself is difficult to measure because some criminal activity is not detected. So, researchers often collect data on arrests instead. Arrests, however, are a product of both criminal activity and policing practices. If some groups are policed more heavily than others, the gap between arrests (which are measured) and criminal activity (which is not) is likely to be biased against these groups. For that reason, one might rely on arrests for violent crime (rather than arrests for any crime), because data about arrests for violent crime are thought to be less subject to bias.
Data that are not representative of the relevant population can raise distinct fairness issues. For example, an algorithmic tool that is developed based on nonrepresentative data may work less well for some groups than for others. Some contend that this is a form of unfairness, as a product or service that is developed from this data and paid for by consumers works better for members of some groups than for others (Selbst & Barocas 2023: 1024). In addition, when consequential decisions are based on unreliable tools whose unreliability results from data inadequacies of this sort, serious harm may result. For example, Gebru and Buolamwini demonstrated that facial recognition works less well in identifying women than men and in identifying darker-skinned faces than lighter-skinned faces. In addition, it works especially poorly for dark-skinned women (Buolamwini & Gebru 2018: 10). Gebru and Buolamwini trace this disparity in reliability to the fact that the data on which the dominant three facial recognition tools were trained contained too few women and too few darker-skinned people. While this particular disparity has decreased over time, the same phenomenon occurs in other contexts.
The concept of fair representation can be understood in two distinct ways. First, fairness may be understood to require that the data be representative of the relevant population. Second, fairness may require that the data contain enough examples of all groups to be equally reliable for each group. These two demands are not equivalent. If a socially salient group is a minority group, a training data set could contain a representative sample but nonetheless contain too few examples of the minority group to produce reliable (or equally reliable) results. This is especially likely to occur in the context of intersectional classes like dark-skinned women. If algorithmic fairness is understood to require equal accuracy of the algorithmic tool, then having data that match the population will be insufficient. The data must also include sufficient minority group members to produce equally accurate results for these groups.
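The gap between the two demands can be made concrete with a back-of-the-envelope calculation (the group sizes and the 10% minority share are invented for illustration): even a perfectly representative sample leaves the minority group with far fewer examples, and hence noisier estimates.

```python
import math

def standard_error(p, n):
    """Approximate standard error of an estimated proportion p from n examples."""
    return math.sqrt(p * (1 - p) / n)

total = 1000          # size of a perfectly representative training set
minority_share = 0.1  # the minority group is 10% of the population

n_minority = int(total * minority_share)   # 100 examples
n_majority = total - n_minority            # 900 examples

# Suppose the quantity being estimated is 0.5 in both groups; the
# estimate is still noisier for the group with fewer examples.
print(standard_error(0.5, n_majority))  # ~0.017
print(standard_error(0.5, n_minority))  # 0.05 — three times the uncertainty
```

Representativeness (matching the population) and equal reliability (enough examples per group) therefore come apart whenever group sizes differ, and they come apart most sharply for small intersectional classes.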
In addition, one might wonder if it matters why the data are less reliable for one group than another. For example, suppose the training data for the facial recognition tool did contain equal numbers of women and men yet was less accurate in predicting female sex nonetheless. In the actual case dealing with facial recognition, some researchers have traced the disparity in the effectiveness of facial recognition tools to sources other than a lack of representativeness. In particular, researchers found that it was attributable to the use of celebrities in the training data. Because female celebrities tend to wear more makeup than female non-celebrities, the algorithm learned that makeup use was predictive of female sex, which was less accurate in the context of non-celebrities than celebrities. Because the disparity in makeup use for men between celebrities and non-celebrities was less pronounced, the tool worked better for men. One might wonder whether this explanation makes the disparity in how well the tool worked less unfair than if it were traced to disparities in representation.
When data are flawed for any of these reasons, one can ask the further question whether this departure from good data practices should be seen simply as an error or instead as a form of unfairness. Perhaps the answer to that question depends on why it occurred. For example, should we see the error in the facial recognition tool as a simple oversight, or instead see it as tied in important ways to differing beauty standards for women versus men that result from injustice? Or perhaps the answer depends on whether the consequences for those affected are serious or trivial. In other words, inaccuracies are surely problematic from an epistemic perspective. But whether they are also a problem of unfairness is an unsettled question.
Even if the data on which an algorithm is trained do not suffer from skewed measurement error, one still may worry about the fairness of an algorithm that relies on these data if injustice led to differences in the traits that the algorithm uses to make predictions. For example, the average Black person has less wealth and a lower income than the average white person (see Irving 2023 in Other Internet Resources). As a result, an algorithm used by a bank to assist decision making about who should be offered loans may score the average Black person less well than the average white person, even if it has no access to the race of the prospective borrowers. If the fact that the average Black person has less wealth and income than the average white person is itself the result of prior discrimination and injustice, then when the bank acts on the basis of these data, it may wrong individuals by compounding that prior injustice (Wachter et al. 2021: 759; Hellman 2024a: 63–64). Friedman and Nissenbaum identify three types of bias in computational systems: pre-existing bias, which “has its roots in social institutions, practices and attitudes”; technical bias, which arises from technical constraints; and “emergent bias”, which arises from the use of the system in the real world (Friedman & Nissenbaum 1996: 332). Friedman and Nissenbaum’s conception of pre-existing bias has some similarities to the concepts of structural injustice (Lin & Chen 2022: 2–4) and “compounding injustice” (Hellman 2024a: 67–68).
The fact that an action will compound prior injustice provides a moral reason not to do that action. For example, Hellman argues that an actor has a moral reason, when interacting with victims of injustice, not to carry that injustice forward or into other domains. While this reason can be outweighed by other reasons, it counts in the balance of reasons that determine how one should act. Others think that we have reasons to care about the fact that social and economic inequality tracks socially salient traits like race because this “patterned inequality” causes harm. For example, Eidelson argues that banks and other actors should care about whether the use of algorithmic tools perpetuates this social stratification for consequentialist reasons (Eidelson 2021: 253–56).
The concern that algorithms based on data about the past may compound injustice or perpetuate patterned inequality is not novel or unique to the algorithmic context. Any decision based on data about the past risks compounding injustice or causing the harm of patterned inequality because injustice has affected people in meaningful ways. That said, because machine learning algorithms are able to make use of so much data (so-called “Big Data”), the moral problems of compounding injustice and patterned inequality may occur at a larger and more comprehensive scale. In addition, the computational power now available through machine learning algorithms allows these tools to detect patterns that prior methods would have missed. If scale matters to fairness, then machine learning algorithms may present more acute moral problems than did older methods of predicting the future based on the past.
Even if these moral concerns are not new, the use of algorithmic tools illustrates the consequences of prior injustice in a dramatic fashion and so might shine a light on a moral concern that was previously less salient. When an algorithm used to assess who will repay a loan does not have access to data regarding the race of prospective borrowers but nonetheless recommends lending to members of racial minority groups at a significantly lower rate than to members of majority groups, the effect of prior injustice is made especially clear. In this sense, algorithmic tools may act as a mirror (Vallor 2024), allowing society to see itself more clearly, and also as a catalyst to reexamine existing legal doctrine in reaction to the fact that current law tolerates significant disparate negative impact on disadvantaged groups.
Algorithms based on accurate data can thus present problems of algorithmic fairness to the extent that they compound prior injustice or reinforce patterned inequality. At the same time, they may help to ameliorate unfairness by making ever-present unfairness salient.
The use of some attributes—race and sex paradigmatically—as proxies for legitimate target variables like loan repayment or recidivism is both legally prohibited in most contexts and generally viewed as morally problematic. However, because race, sex, and other protected attributes often are predictive of outcomes of interest (loan repayment, recidivism, health, etc.), policies that prohibit the use of a defined list of protected attributes may have limited value in the context of machine learning. The algorithm will unearth other traits that it can use instead to arrive at much the same results as it would have had it been able to use these prohibited attributes directly (Dwork et al. 2012: 215; Johnson 2021: 9942; Prince & Schwarcz 2020: 1283; Sapiezynski et al. 2022: 5–7). This issue is discussed in the literature as the proxy problem (Johnson 2021: 9942).
Consider the following example. An algorithm used by a lender is unaware of the race of the borrowers on whose data it was trained. In other words, the training data contain no racial labels, and often the names of the individuals are removed as well. Yet the algorithm learns that the ZIP code of the borrower’s residence is a strong predictor of loan repayment. Due to housing segregation in society, a borrower’s ZIP code is also strongly correlated with race. If the algorithm is permitted to use ZIP code as a factor in predicting loan repayment, the algorithm will have the effect of excluding a disproportionate number of Black borrowers, perhaps closely resembling what would occur had the algorithm used race to predict loan repayment in the first instance.
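The mechanics of the ZIP-code example can be sketched in a few lines (the segregation figures and the denial rule are invented for illustration): a decision rule with no access to race nonetheless produces sharply different denial rates by race, because ZIP code carries the racial information.

```python
import random

random.seed(1)

# Hypothetical segregated city: ZIP "A" is 90% Black, ZIP "B" is 10% Black.
population = []
for _ in range(10_000):
    zip_code = random.choice(["A", "B"])
    black = random.random() < (0.9 if zip_code == "A" else 0.1)
    population.append((zip_code, black))

def denied(zip_code):
    """A race-blind rule that denies loans to applicants from ZIP 'A'."""
    return zip_code == "A"

black_denials = [denied(z) for z, b in population if b]
white_denials = [denied(z) for z, b in population if not b]

print(sum(black_denials) / len(black_denials))  # roughly 0.9 of Black applicants denied
print(sum(white_denials) / len(white_denials))  # roughly 0.1 of white applicants denied
```

The rule never consults race, yet its effect closely resembles a rule that did, which is what makes ZIP code a candidate “proxy” in the sense the literature discusses.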
This fact may undermine the legal and moral prohibitions on the use of race, sex, and other protected attributes themselves when such traits are correlated with legitimate outcomes of interest. If so, should such proxies also be prohibited? And if so, how should such “proxies” be defined?
The proxy problem arises because the law forbids (or society or morality frowns on) the use of some traits in making decisions about individuals in certain contexts. This prohibition gives rise to the question whether other traits should also be forbidden because they are “proxies” for these protected attributes. For ease of exposition, call the protected trait T (for target) and the trait which may be a proxy for the protected attribute P (for proxy). The relevant moral or philosophical question, then, is: when is P a proxy for T?
This section describes four different accounts of when and why a neutral trait P may be a proxy for a protected trait T. This debate is not limited to the algorithmic context. Indeed, some of the examples predate the use of machine learning and Big Data. Nonetheless, it is widely perceived to be more pressing to answer the question of what makes a neutral trait a proxy for a protected attribute in the algorithmic context because machine learning algorithms will easily identify other features or traits that correlate with protected attributes and that contain the information these protected attributes provide. As a result, merely prohibiting the use of protected attributes is likely to be ineffectual without also prohibiting “proxies” for such traits. Yet what makes a trait a proxy for a protected attribute is contested.
This proxy problem is a problem for algorithmic fairness because, if algorithmic fairness requires that algorithms avoid use of protected attributes like race and sex, that prohibition may be eviscerated by the ease with which machine learning algorithms can find substitutes. Does fairness require prohibiting the use of some of these substitutes? If so, which ones and why?
On one view, P is a proxy for T when P is strongly correlated with T. Consider the following example, which is loosely based on a real-world case. Suppose an algorithm is used to identify which job applicants will be successful software engineers. The algorithm, having learned that graduates of Wellesley, Smith, and other prominent women’s colleges were not successful in this job in the past, downgrades the applications from applicants from five specific women’s colleges. In such a case, is being a graduate of one of these colleges a proxy for sex, such that we should view the use of this trait to downgrade applicants in the same manner as if the algorithm downgraded women applicants? While graduates of one of these colleges are likely all or predominantly women, the group of people who are not graduates of these colleges includes both men and women. If a strong correlation between P and T makes P a proxy for T, one must determine how strong the correlation must be, where strength is assessed in terms of the degree of overlap between the set of people with P and the set of people with T. In addition, one must determine whether what matters is only whether all or most people with P have T, or whether it also matters whether all or most people with T have P (which is not the case in the example just described). In a similar circumstance (albeit before the use of machine learning algorithms), the U.S. Supreme Court declined to find that veteran status was a proxy for sex, even though at the time the case was decided almost all veterans were men, in part because the group of non-veterans contained both women and men (Pers. Adm’r of Massachusetts v. Feeney, 442 U.S. 256, 275 (1979)).
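The two directions of overlap at issue here can be made concrete. The following sketch uses invented toy numbers, not drawn from any real case, to show how the share of people with P who have T can be very high while the share of people with T who have P remains low, mirroring the structure of the Feeney example:

```python
from fractions import Fraction

# Toy population: each person is (has_P, has_T), e.g.
# (graduate of one of the named colleges, woman).
# All counts are invented for illustration only.
population = (
    [(True, True)] * 95 +    # people with P who also have T
    [(True, False)] * 5 +    # people with P who lack T
    [(False, True)] * 450 +  # people with T who lack P
    [(False, False)] * 450   # people with neither
)

def cond_prob(data, given, event):
    """P(event | given) as an exact fraction."""
    matching = [x for x in data if given(x)]
    return Fraction(sum(1 for x in matching if event(x)), len(matching))

# Nearly everyone with P has T: P(T | P) is high...
p_t_given_p = cond_prob(population, lambda x: x[0], lambda x: x[1])

# ...yet most people with T lack P: P(P | T) is low.
p_p_given_t = cond_prob(population, lambda x: x[1], lambda x: x[0])

print(p_t_given_p)  # 19/20
print(p_p_given_t)  # 19/109
```

A purely correlational account must say which of these two quantities (or which combination of them) has to be high before P counts as a proxy for T.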
Another related way to define what makes P a proxy for T is to look at whether P is predictive of the outcome of interest only because P correlates with T (Prince & Schwarcz 2020: 1282). Consider again the algorithm used to predict who will be a successful software engineer. In that case, perhaps being a graduate of one of the specified women’s colleges predicts who will be unsuccessful as a software engineer, to the extent that it does, only because attendance at one of these colleges correlates with sex. In other words, the correlation between being a graduate of one of these colleges and being successful as a software engineer is neither an anomaly in the training data, nor causally connected to the computer science training offered at these colleges. Rather, it is due to the fact that sex is predictive of whether a person will be successful as a software engineer. If that is correct, perhaps graduating from one of the named colleges should be seen as a “proxy” for sex. The upshot of this judgment would be that the algorithm should not use graduation from these specified colleges to predict who will be a poor employee.
Part of the appeal of this account comes from the following thought. If a society forbids the use of sex itself, despite the fact that sex is often predictive of important outcomes, surely it would also want to forbid the use of traits that get their predictive power only in virtue of the fact that these neutral traits correlate with sex. If society did not also prohibit such traits, the reason to forbid the protected trait would be undermined.
There are potential problems with this account, however. In particular, on this view, ZIP code is not a proxy for race in the lending context, because ZIP code is likely to be predictive of loan repayment for reasons other than the fact that it correlates with race (though it may be predictive in part for that reason). Because ZIP code also correlates with wealth and income, one cannot say that the only reason that ZIP code predicts loan repayment is that it correlates with race, and thus it would not count as a proxy.
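This point can be illustrated with a small simulation. The sketch below uses entirely invented numbers: repayment is driven by income alone, while segregation makes ZIP correlate with both race and income. Because ZIP still predicts repayment within each racial group, the "predictive only because it correlates with T" account would not count ZIP as a proxy for race here:

```python
import random
import statistics

random.seed(0)

# Synthetic toy data (all parameters invented for illustration).
def make_person():
    high_income_zip = random.random() < 0.5
    # Segregation: racial composition differs by ZIP.
    black = random.random() < (0.2 if high_income_zip else 0.6)
    income = random.gauss(80 if high_income_zip else 40, 10)
    repaid = income > 55  # repayment driven by income, not race
    return high_income_zip, black, repaid

people = [make_person() for _ in range(10_000)]

def repay_rate(rows):
    return statistics.mean(1 if repaid else 0 for _, _, repaid in rows)

# ZIP predicts repayment even holding race fixed: within each racial
# group, the high-income-ZIP repayment rate far exceeds the low-ZIP rate.
for black in (False, True):
    same_race = [p for p in people if p[1] == black]
    hi = repay_rate([p for p in same_race if p[0]])
    lo = repay_rate([p for p in same_race if not p[0]])
    print("black" if black else "non-black", round(hi, 2), round(lo, 2))
```

Since ZIP retains predictive power after conditioning on race, its correlation with race is not the only source of its predictiveness, which is exactly why this account delivers the (arguably counterintuitive) verdict that ZIP code is not a proxy.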
If ZIP code feels like a proxy, perhaps an account based in the statistical relationships between various variables alone is missing something. Other accounts look not only at the relationships between variables but also at why those relationships exist.
One such account would find that P is a proxy for T if P picks out the people it does because of prior discrimination or injustice against people with T. On this account, ZIP code is a proxy for race because the correlation between ZIP code and race is likely due, at least in part, to prior housing discrimination and segregation (Hellman 2023: 241–42). Such an explanatory account of the concept of a proxy may or may not require a correlation between P and T. Johnson, for example, argues that a correlation between P and T is not a necessary component of a proxy (Johnson 2025).
Alternatively, perhaps the intention of an actor determines whether P is a proxy for T. On this view, P is a proxy for T when an actor deliberately uses P to select for or against people with T. U.S. legal doctrine at times relies on a view of this kind. Recall the case mentioned earlier in which the Supreme Court considered whether an employment preference for veterans constituted discrimination on the basis of sex at a time in which almost all veterans were men (Pers. Adm’r of Massachusetts v. Feeney, 442 U.S. 256, 257 (1979)). In Feeney, the U.S. Supreme Court declined to hold that veteran status was a proxy for sex in this context because there was no evidence that the Massachusetts legislature had adopted the preference in order to exclude women from civil service jobs.
The view that intentions create proxies gives rise to a further question. What intentions are relevant in this context? Recall, the normative significance of asking what makes P a proxy for T is that if P is a proxy for T, then P will be treated as if it is T legally (and perhaps morally). If an actor deliberately uses P to select for people with T, is P always a proxy for T? Or is P only a proxy for T when the actor deliberately uses P to select people on the basis of T in order to harm people with T? This question is likely to become especially important in the current legal climate as universities, banks, employers, and others consider whether they are permitted to use non-race traits to increase racial diversity or reduce racial disparities. Several legal scholars have recently weighed in on this question, including Starr (2024: 169, 172–73) and Hellman (2024b: 1953), arguing that it is permissible to use race-neutral means to promote diversity or reduce racial disparities. Others adopt a contrary view (Fitzpatrick 2016: 172).
The Stanford Encyclopedia of Philosophy is copyright © 2025 by The Metaphysics Research Lab, Department of Philosophy, Stanford University