A fundamental problem faced by any group of people is how to arrive at a good group decision when there is disagreement among its members. The difficulties are most evident when there is a large number of people with diverse opinions, such as when electing leaders in a national election. But it is often not any easier with smaller groups, such as when a committee must select a candidate to hire, or when a group of friends must decide where to go for dinner. Mathematicians, philosophers, political scientists and economists have devised various voting methods that select a winner (or winners) from a set of alternatives taking into account everyone’s opinion. It is not hard to find examples in which different voting methods select different winners given the same inputs from the members of the group. What criteria should be used to compare and contrast different voting methods? Not only is this an interesting and difficult theoretical question, but it also has important practical ramifications. Given the tumultuous 2016 election cycle, many people (both researchers and politicians) have suggested that the US should use a different voting method. However, there is little agreement about which voting method should be used.
This article introduces and critically examines a number of different voting methods. Deep and important results in the theory of social choice suggest that there is no single voting method that is best in all situations (see List 2013 for an overview). My objective in this article is to highlight and discuss the key results and issues that facilitate comparisons between voting methods.
Suppose that there is a group of 21 voters who need to make a decision about which of four candidates should be elected. Let the names of the candidates be \(A\), \(B\), \(C\) and \(D\). Your job, as a social planner, is to determine which of these 4 candidates should win the election given the opinions of all the voters. The first step is to elicit the voters’ opinions about the candidates. Suppose that you ask each voter to rank the 4 candidates from best to worst (not allowing ties). The following table summarizes the voters’ rankings of the candidates in this hypothetical election scenario.
| # Voters | Ranking |
| 3 | \(A\s B\s C\s D\) |
| 5 | \(A\s C\s B\s D\) |
| 7 | \(B\s D\s C\s A\) |
| 6 | \(C\s B\s D\s A\) |
Read the table as follows: Each row represents a ranking for a group of voters in which candidates to the left are ranked higher. The numbers in the first column indicate the number of voters with that particular ranking. So, for example, the third row in the table indicates that 7 voters have the ranking \(B\s D\s C\s A\), which means that each of the 7 voters ranks \(B\) first, \(D\) second, \(C\) third and \(A\) last.
Suppose that, as the social planner, you do not have any personal interest in the outcome of this election. Given the voters’ expressed opinions, which candidate should win the election? Since the voters disagree about the ranking of the candidates, there is no obvious candidate that best represents the group’s opinion. If there were only two candidates to choose from, there would be a very straightforward answer: The winner should be the candidate or alternative that is supported by more than 50 percent of the voters (cf. the discussion below about May’s Theorem in Section 4.2). However, if there are more than two candidates, as in the above example, the statement “the candidate that is supported by more than 50 percent of the voters” can be interpreted in different ways, leading to different ideas about who should win the election.
One candidate who, at first sight, seems to be a good choice to win the election is \(A\). Candidate \(A\) is ranked first by more voters than any other candidate. (\(A\) is ranked first by 8 voters; \(B\) is ranked first by 7; \(C\) is ranked first by 6; and \(D\) is not ranked first by any of the voters.) Of course, 13 people rank \(A\) last. So, while more voters rank \(A\) first than any other candidate, more than half of the voters rank \(A\) last. This suggests that \(A\) should not be elected.
None of the voters rank \(D\) first. This fact alone does not rule out \(D\) as a possible winner of the election. However, note that every voter ranks candidate \(B\) above candidate \(D\). While this does not mean that \(B\) should necessarily win the election, it does suggest that \(D\) should not win the election.
The choice, then, boils down to \(B\) and \(C\). It turns out that there are good arguments for each of \(B\) and \(C\) to be elected. The debate about which of \(B\) or \(C\) should be elected started in the 18th century as an argument between the two founding fathers of voting theory, Jean-Charles de Borda (1733–1799) and M.J.A.N. de Caritat, Marquis de Condorcet (1743–1794). For a history of voting theory as an academic discipline, including Condorcet’s and Borda’s writings, see McLean and Urken (1995). I sketch the intuitive arguments for the election of \(B\) and \(C\) below.
Candidate \(C\) should win. Initially, this might seem like an odd choice since both \(A\) and \(B\) receive more first-place votes than \(C\) (only 6 voters rank \(C\) first while 8 voters rank \(A\) first and 7 voters rank \(B\) first). However, note how the population would vote in the various two-way elections comparing \(C\) with each of the other candidates:
| # Voters | \(C\) versus \(A\) | \(C\) versus \(B\) | \(C\) versus \(D\) |
| 3 | \(\bA\s \gB\s \bC\s \gD\) | \(\gA\s \bB\s \bC\s \gD\) | \(\gA\s \gB\s \bC\s \bD\) |
| 5 | \(\bA\s \bC\s \gB\s \gD\) | \(\gA\s \bC\s \bB\s \gD\) | \(\gA\s \bC\s \gB\s \bD\) |
| 7 | \(\gB\s \gD\s \bC\s \bA\) | \(\bB\s \gD\s \bC\s \gA\) | \(\gB\s \bD\s \bC\s \gA\) |
| 6 | \( \bC\s \gB\s \gD\s \bA\) | \( \bC\s \bB\s \gD\s \gA\) | \( \bC\s \gB\s \bD\s \gA\) |
| Totals: | 13 rank \(C\) above \(A\); 8 rank \(A\) above \(C\) | 11 rank \(C\) above \(B\); 10 rank \(B\) above \(C\) | 14 rank \(C\) above \(D\); 7 rank \(D\) above \(C\) |
Condorcet’s idea is that \(C\) should be declared the winner since she beats every other candidate in a one-on-one election. A candidate with this property is called a Condorcet winner. We can similarly define a Condorcet loser. In fact, in the above example, candidate \(A\) is the Condorcet loser since she loses to every other candidate in a one-on-one election.
Candidate \(B\) should win. Consider \(B\)’s performance in the one-on-one elections.
| # Voters | \(B\) versus \(A\) | \(B\) versus \(C\) | \(B\) versus \(D\) |
| 3 | \(\bA\s \bB\s \gC\s \gD\) | \(\gA\s \bB\s \bC\s \gD\) | \(\gA\s \bB\s \gC\s \bD\) |
| 5 | \(\bA\s \gC\s \bB\s \gD\) | \(\gA\s \bC\s \bB\s \gD\) | \(\gA\s \gC\s \bB\s \bD\) |
| 7 | \(\bB\s \gD\s \gC\s \bA\) | \(\bB\s \gD\s \bC\s \gA\) | \(\bB\s \bD\s \gC\s \gA\) |
| 6 | \( \gC\s \bB\s \gD\s \bA\) | \( \bC\s \bB\s \gD\s \gA\) | \( \gC\s \bB\s \bD\s \gA\) |
| Totals: | 13 rank \(B\) above \(A\); 8 rank \(A\) above \(B\) | 10 rank \(B\) above \(C\); 11 rank \(C\) above \(B\) | 21 rank \(B\) above \(D\); 0 rank \(D\) above \(B\) |
Candidate \(B\) performs the same as \(C\) in a head-to-head election with \(A\), loses to \(C\) by only one vote and beats \(D\) in a landslide (everyone prefers \(B\) over \(D\)). Borda suggests that we should take into account all of these facts when determining which candidate best represents the overall group opinion. To do this, Borda assigns a score to each candidate that reflects how much support he or she has among the electorate. Then, the candidate with the largest score is declared the winner. One way to calculate the score for each candidate is as follows (I will give an alternative method, which is easier to use, in the next section):
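The scores themselves are not reproduced here. One reading consistent with the head-to-head tables, offered as a sketch rather than as Borda’s own formulation, is to add up the votes a candidate receives across all of her one-on-one contests. (The \(A\) versus \(D\) contest, which no table above reports, is computed from the rankings: 8 voters rank \(A\) above \(D\) and 13 rank \(D\) above \(A\).)
\[\begin{align}
\textit{Score}(A) &= 8 + 8 + 8 = 24 \\
\textit{Score}(B) &= 13 + 10 + 21 = 44 \\
\textit{Score}(C) &= 13 + 11 + 14 = 38 \\
\textit{Score}(D) &= 13 + 0 + 7 = 20
\end{align}\]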
The candidate with the highest score (in this case, \(B\)) is the one who should be elected.
Both Condorcet and Borda suggest comparing candidates in one-on-one elections in order to determine the winner. While Condorcet tallies how many of the head-to-head races each candidate wins, Borda suggests that one should look at the margin of victory or loss. The debate about whether to elect the Condorcet winner or the Borda winner is not settled. Proponents of electing the Condorcet winner include Mathias Risse (2001, 2004, 2005) and Steven Brams (2008); proponents of electing the Borda winner include Donald Saari (2003, 2006) and Michael Dummett (1984). See Section 3.1.1 for further issues comparing the Condorcet and Borda winners.
The take-away message from this discussion is that in many election scenarios with more than two candidates, there may not be one obvious candidate that best reflects the overall group opinion. The remainder of this entry will discuss different methods, or procedures, that can be used to determine the winner(s) given a group of voters’ opinions. Each of these methods is intended to be an answer to the following question:
Given a group of people faced with some decision, how should a central authority combine the individual opinions so as to best reflect the “overall group opinion”?
A complete analysis of this question would incorporate a number of different issues ranging from central topics in political philosophy about the nature of democracy and the “will of the people” to the psychology of decision making. In this article, I focus on one aspect of this question: the formal analysis of algorithms that aggregate the opinions of a group of voters (i.e., voting methods). Consult, for example, Riker 1982, Mackie 2003, and Christiano 2008 for a more comprehensive analysis of the above question, incorporating many of the issues raised in this article.
In this article, I will keep the formal details to a minimum; however, it is useful at this point to settle on some terminology. Let \(V\) and \(X\) be finite sets. The elements of \(V\) are called voters and I will use lowercase letters \(i, j, k, \ldots\) or integers \(1, 2, 3, \ldots\) to denote them. The elements of \(X\) are called candidates, or alternatives, and I will use uppercase letters \(A, B, C, \ldots \) to denote them.
Different voting methods require different types of information from the voters as input. The inputs requested from the voters are called ballots. One standard example of a ballot is a ranking of the set of candidates. Formally, a ranking of \(X\) is a relation \(P\) on \(X\), where \(Y\mathrel{P} Z\) means that “\(Y\) is ranked above \(Z\),” satisfying three constraints: (1) \(P\) is complete: any two distinct candidates are ranked (for all candidates \(Y\) and \(Z\), if \(Y\ne Z\), then either \(Y\mathrel{P} Z\) or \(Z\mathrel{P} Y\)); (2) \(P\) is transitive: if a candidate \(Y\) is ranked above a candidate \(W\) and \(W\) is ranked above a candidate \(Z\), then \(Y\) is ranked above \(Z\) (for all candidates \(Y, Z\), and \(W\), if \(Y\mathrel{P} W\) and \(W\mathrel{P} Z\), then \(Y\mathrel{P} Z\)); and (3) \(P\) is irreflexive: no candidate is ranked above itself (there is no candidate \(Y\) such that \(Y\mathrel{P} Y\)). For example, suppose that there are three candidates \(X =\{A, B, C\}\). Then, the six possible rankings of \(X\) are listed in the following table:
| # Voters | Ranking |
| \(n_1\) | \(A\s B\s C\) |
| \(n_2\) | \(A\s C\s B\) |
| \(n_3\) | \(B\s A\s C\) |
| \(n_4\) | \(B\s C\s A\) |
| \(n_5\) | \(C\s A\s B\) |
| \(n_6\) | \(C\s B\s A\) |
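The three constraints on a ranking can be checked mechanically. The following is a minimal sketch, assuming a ranking is written as a string from best to worst and a relation is a set of ordered pairs; the helper names `relation_of` and `is_ranking` are illustrative and not part of the article.

```python
from itertools import permutations

def relation_of(order):
    """Turn a best-to-worst string such as "ABC" into the set of pairs (Y, Z) with Y ranked above Z."""
    return {(order[i], order[j])
            for i in range(len(order))
            for j in range(i + 1, len(order))}

def is_ranking(pairs, candidates):
    """Check that the relation is complete, transitive and irreflexive on the candidates."""
    complete = all((y, z) in pairs or (z, y) in pairs
                   for y in candidates for z in candidates if y != z)
    transitive = all((y, z) in pairs
                     for (y, w) in pairs for (w2, z) in pairs if w == w2)
    irreflexive = all((y, y) not in pairs for y in candidates)
    return complete and transitive and irreflexive

# The six rankings of three candidates, in the same order as the table above.
RANKINGS = ["".join(p) for p in permutations("ABC")]

# A majority-style cycle A > B > C > A is complete and irreflexive but not transitive.
CYCLE = {("A", "B"), ("B", "C"), ("C", "A")}
```

Here every element of `RANKINGS` passes `is_ranking`, while `CYCLE` fails the transitivity check.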
I can now be more precise about the definition of a Condorcet winner (loser). Given a ranking from each voter, the majority relation orders the candidates in terms of how they perform in one-on-one elections. More precisely, for candidates \(Y\) and \(Z\), write \(Y \mathrel{>_M} Z\), provided that more voters rank candidate \(Y\) above candidate \(Z\) than the other way around. So, if the distribution of rankings is given in the above table, we have:
\[\begin{align}
A\mathrel{>_M} B\ &\text{ just in case } n_1 + n_2 + n_5 > n_3 + n_4 + n_6 \\
A\mathrel{>_M} C\ &\text{ just in case } n_1 + n_2 + n_3 > n_4 + n_5 + n_6 \\
B \mathrel{>_M} C\ &\text{ just in case } n_1 + n_3 + n_4 > n_2 + n_5 + n_6
\end{align}\]
A candidate \(Y\) is called the Condorcet winner in an election scenario if \(Y\) is the maximum of the majority ordering \(>_M\) for that election scenario (that is, \(Y\) is the Condorcet winner if \(Y\mathrel{>_M} Z\) for all other candidates \(Z\)). The Condorcet loser is the candidate that is the minimum of the majority ordering.
Rankings are one type of ballot. In this article, we will see examples of other types of ballots, such as selecting a single candidate, selecting a subset of candidates or assigning grades to candidates. Given a set of ballots \(\mathcal{B}\), a profile for a set of voters specifies the ballot selected by each voter. Formally, a profile for a set of voters \(V=\{1,\ldots, n\}\) and a set of ballots \(\mathcal{B}\) is a sequence \(\bb=(b_1,\ldots, b_n)\), where for each voter \(i\), \(b_i\) is the ballot from \(\mathcal{B}\) submitted by voter \(i\).
A voting method is a function that assigns to each possible profile a group decision. The group decision may be a single candidate (the winning candidate), a set of candidates (when ties are allowed), or an ordering of the candidates (possibly allowing ties). Note that since a profile identifies the voter associated with each ballot, a voting method may take this information into account. This means that voting methods can be designed that select a winner (or winners) based only on the ballots of some subset of voters while ignoring all the other voters’ ballots. An extreme example of this is the so-called Arrovian dictatorship for voter \(d\) that assigns to each profile the candidate ranked first by \(d\). A natural way to rule out these types of voting methods is to require that a voting method is anonymous: the group decision should depend only on the number of voters that chose each ballot. This means that if two profiles are permutations of each other, then a voting method that is anonymous must assign the same group decision to both profiles. When studying voting methods that are anonymous, it is convenient to assume the inputs are anonymized profiles. An anonymized profile for a set of ballots \(\mathcal{B}\) is a function from \(\mathcal{B}\) to the set of non-negative integers \(\mathbb{N}\). The election scenario discussed in the previous section is an example of an anonymized profile (assuming that each ranking not displayed in the table is assigned the number 0). In the remainder of this article (unless otherwise specified), I will restrict attention to anonymized profiles.
I conclude this section with a few comments on the relationship between the ballots in a profile and the voters’ opinions about the candidates. Two issues are important to keep in mind. First, the ballots used by a voting method are intended to reflect some aspect of the voters’ opinions about the candidates. Voters may choose a ballot that best expresses their personal preference about the set of candidates or their judgements about the relative strengths of the candidates. A common assumption in the voting theory literature is that a ranking of the set of candidates expresses a voter’s ordinal preference ordering over the set of candidates (see the entry on preferences, Hansson and Grüne-Yanoff 2009, for an extended discussion of issues surrounding the formal modeling of preferences). Other types of ballots represent information that cannot be inferred directly from a voter’s ordinal preference ordering, for example, by describing the intensity of a preference for a particular candidate (see Section 2.3). Second, it is important to be precise about the type of considerations voters take into account when selecting a ballot. One approach is to assume that voters choose sincerely by selecting the ballot that best reflects their opinion about the different candidates. A second approach assumes that the voters choose strategically. In this case, a voter selects a ballot that she expects to lead to her most desired outcome given the information she has about how the other members of the group will vote. Strategic voting is an important topic in voting theory and social choice theory (see Taylor 2005 and Section 3.3 of List 2013 for a discussion and pointers to the literature), but in this article, unless otherwise stated, I assume that voters choose sincerely (cf. Section 4.1).
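The majority relation and the Condorcet winner and loser can be computed directly from an anonymized profile. The sketch below assumes a profile is a list of (number of voters, ranking string) pairs and uses the 21-voter example from Section 1; the function names are illustrative.

```python
# Anonymized profile from Section 1: (number of voters, ranking from best to worst).
PROFILE = [(3, "ABCD"), (5, "ACBD"), (7, "BDCA"), (6, "CBDA")]

def support(profile, y, z):
    """Number of voters who rank candidate y above candidate z."""
    return sum(n for n, ranking in profile if ranking.index(y) < ranking.index(z))

def majority_prefers(profile, y, z):
    """True when y >_M z: more voters rank y above z than rank z above y."""
    return support(profile, y, z) > support(profile, z, y)

def condorcet_winner(profile):
    """Return the candidate that beats every other candidate one-on-one, or None."""
    candidates = sorted(profile[0][1])
    for y in candidates:
        if all(majority_prefers(profile, y, z) for z in candidates if z != y):
            return y
    return None

def condorcet_loser(profile):
    """Return the candidate that loses to every other candidate one-on-one, or None."""
    candidates = sorted(profile[0][1])
    for y in candidates:
        if all(majority_prefers(profile, z, y) for z in candidates if z != y):
            return y
    return None
```

On this profile, `condorcet_winner` returns `C` and `condorcet_loser` returns `A`, matching the head-to-head tables. Both functions return `None` when the majority relation has no maximum or minimum.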
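The difference between a profile and its anonymized form can be illustrated with a small, hypothetical three-voter example. The sketch assumes a profile is a tuple of ranking strings indexed by voter; `anonymize`, `dictatorship` and `plurality` are illustrative names.

```python
from collections import Counter

def anonymize(profile):
    """Map each ballot to the number of voters who submitted it."""
    return Counter(profile)

def dictatorship(profile, d):
    """Arrovian dictatorship: the candidate ranked first by voter d (0-based index)."""
    return profile[d][0]

def plurality(profile):
    """Candidates ranked first by the most voters; depends only on ballot counts."""
    counts = Counter(ballot[0] for ballot in profile)
    best = max(counts.values())
    return sorted(c for c, n in counts.items() if n == best)

# Hypothetical profile and a permutation of it (voters 0 and 1 swap ballots).
profile = ("ABC", "BCA", "ABC")
permuted = ("BCA", "ABC", "ABC")
```

The two profiles have the same anonymized form, so an anonymous method such as `plurality` returns the same result on both, whereas `dictatorship(profile, 0)` and `dictatorship(permuted, 0)` differ.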
A quick survey of elections held in different democratic societies throughout the world reveals a wide variety of voting methods. In this section, I discuss some of the key methods that have been analyzed in the voting theory literature. These methods may be of interest because they are widely used (e.g., Plurality Rule or Plurality Rule with Runoff) or because they are of theoretical interest (e.g., Dodgson’s method).
I start with the most widely used method:
Plurality Rule:
Each voter selects one candidate (or none if voters can abstain), and the candidate(s) with the most votes win.
Plurality rule (also called First Past the Post) is a very simple method that is widely used despite its many problems. The most pervasive problem is the fact that plurality rule can elect a Condorcet loser. Borda (1784) observed this phenomenon in the 18th century (see also the example from Section 1).
| # Voters | Ranking |
| 1 | \(A\s B\s C\) |
| 7 | \(A\s C\s B\) |
| 7 | \(B\s C\s A\) |
| 6 | \(C\s B\s A\) |
Candidate \(A\) is the Condorcet loser (both \(B\) and \(C\) beat candidate \(A\), 13 – 8); however, \(A\) is the plurality rule winner (assuming the voters vote for the candidate that they rank first). In fact, the plurality ranking (\(A\) is first with 8 votes, \(B\) is second with 7 votes and \(C\) is third with 6 votes) reverses the majority ordering \(C\mathrel{>_M} B\mathrel{>_M} A\). See Laslier 2012 for further criticisms of Plurality Rule and comparisons with other voting methods discussed in this article. One response to the above phenomenon is to require that candidates pass a certain threshold to be declared the winner.
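The plurality tally for this example can be reproduced in a few lines. This is a minimal sketch, assuming each voter votes for the candidate she ranks first and that the profile is a list of (number of voters, ranking string) pairs; the function names are illustrative.

```python
# The 21-voter profile from the table above.
PROFILE = [(1, "ABC"), (7, "ACB"), (7, "BCA"), (6, "CBA")]

def plurality_scores(profile):
    """Number of voters ranking each candidate first."""
    scores = {c: 0 for c in profile[0][1]}
    for n, ranking in profile:
        scores[ranking[0]] += n
    return scores

def plurality_winners(profile):
    """All candidates with the highest plurality score (ties are kept)."""
    scores = plurality_scores(profile)
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)

def support(profile, y, z):
    """Number of voters who rank y above z, for the head-to-head comparison."""
    return sum(n for n, ranking in profile if ranking.index(y) < ranking.index(z))
```

The plurality scores are 8, 7 and 6 for \(A\), \(B\) and \(C\), so \(A\) wins, even though `support` shows \(A\) losing 8 to 13 against each of the other two candidates.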
Quota Rule:
Suppose that \(q\), called the quota, is any number between 0 and 1. Each voter selects one candidate (or none if voters can abstain), and the winners are the candidates that receive at least \(q \times \# V\) votes, where \(\# V\) is the number of voters. Majority Rule is a quota rule with \(q=0.5\) (a candidate is the strict or absolute majority winner if that candidate receives strictly more than \(0.5 \times \# V\) votes). Unanimity Rule is a quota rule with \(q=1\).
An important problem with quota rules is that they do not identify a winner in every election scenario. For instance, in the above election scenario, there are no majority winners since none of the candidates are ranked first by more than 50% of the voters.
A criticism of both plurality and quota rules is that they severely limit what voters can express about their opinions of the candidates. In the remainder of this section, I discuss voting methods that use ballots that are more expressive than simply selecting a single candidate. Section 2.1 discusses voting methods that require voters to rank the alternatives. Section 2.2 discusses voting methods that require voters to assign grades to the alternatives (from some fixed set of grades). Finally, Section 2.3 discusses two voting methods in which the voters may have different levels of influence on the group decision. In this article, I focus on voting methods that either are familiar or help illustrate important ideas. Consult Brams and Fishburn 2002, Felsenthal 2012, and Nurmi 1987 for discussions of voting methods not covered in this article.
The voting methods discussed in this section require the voters to rank the candidates (see Section 1.1 for the definition of a ranking). Providing a ranking of the candidates is much more expressive than simply selecting a single candidate. However, ranking all of the candidates can be very demanding, especially when there is a large number of them, since it can be difficult for voters to make distinctions between all the candidates. The most well-known example of a voting method that uses the voters’ rankings is Borda Count:
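A quota rule can be sketched as a threshold test on the plurality tally. The `strict` flag below is an assumption added for illustration: it switches between the "at least \(q \times \# V\)" reading of the definition and the "strictly more than" reading used for the strict majority winner.

```python
# The 21-voter profile used to illustrate Plurality Rule.
PROFILE = [(1, "ABC"), (7, "ACB"), (7, "BCA"), (6, "CBA")]

def quota_winners(profile, q, strict=False):
    """Candidates whose first-place votes reach the quota q (a number between 0 and 1)."""
    total = sum(n for n, _ in profile)
    votes = {c: 0 for c in profile[0][1]}
    for n, ranking in profile:
        votes[ranking[0]] += n
    threshold = q * total
    if strict:
        return sorted(c for c, v in votes.items() if v > threshold)
    return sorted(c for c, v in votes.items() if v >= threshold)
```

With \(q = 0.5\) the threshold is 10.5 votes and the result is empty under either reading, which is the failure to identify a winner described above. A lower quota such as \(q = 0.3\) admits both \(A\) and \(B\), so a quota rule may also return several winners.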
Borda Count:
Each voter provides a ranking of the candidates. Then, a score (the Borda score) is assigned to each candidate by a voter as follows: If there are \(n\) candidates, give \(n-1\) points to the candidate ranked first, \(n-2\) points to the candidate ranked second, …, 1 point to the candidate ranked second to last and 0 points to the candidate ranked last. So, the Borda score of candidate \(A\), denoted \(\BS(A)\), is calculated as follows (where \(\#U\) denotes the number of elements in the set \(U\)):
\[\begin{align}
\BS(A) =\ &(n-1)\times \# \{i\ |\ i \text{ ranks \(A\) first}\}\\
 &+ (n-2)\times \# \{i\ |\ i \text{ ranks \(A\) second}\} \\
 &+ \cdots \\
 &+ 1\times \# \{i\ |\ i \text{ ranks \(A\) second to last}\}\\
 &+ 0\times \# \{i\ |\ i \text{ ranks \(A\) last}\}
\end{align}\]
The candidate with the highest Borda score wins.
Recall the example discussed in the introduction to Section 1. For each alternative, the Borda scores can be calculated using the above method:
\[\begin{align}
\BS(A) &= 3 \times 8 + 2 \times 0 + 1 \times 0 + 0 \times 13 = 24 \\
\BS(B) &= 3 \times 7 + 2 \times 9 + 1 \times 5 + 0 \times 0 = 44 \\
\BS(C) &= 3 \times 6 + 2 \times 5 + 1 \times 10 + 0 \times 0 = 38 \\
\BS(D) &= 3 \times 0 + 2 \times 7 + 1 \times 6 + 0 \times 8 = 20
\end{align}\]
Borda Count is an example of a scoring rule. A scoring rule is any method that calculates a score based on weights assigned to candidates according to where they fall in the voters’ rankings. That is, a scoring rule for \(n\) candidates is defined as follows: Fix a sequence of numbers \((s_1, s_2, \ldots, s_n)\) where \(s_k\ge s_{k+1}\) for all \(k=1,\ldots, n-1\). For each \(k\), \(s_k\) is the score assigned to an alternative ranked in position \(k\). Then, the score for alternative \(A\), denoted \(\textit{Score}(A)\), is calculated as follows:
\[\begin{align}
\textit{Score}(A)=\ &s_1\times \# \{i\ |\ i \text{ ranks \(A\) first}\}\\
 &+ s_2\times \# \{i\ |\ i \text{ ranks \(A\) second}\}\\
 &+ \cdots \\
 &+ s_n\times \# \{i\ |\ i \text{ ranks \(A\) last}\}.
\end{align}\]
Borda Count for \(n\) alternatives uses scores \((n-1, n-2, \ldots, 0)\) (call \(\BS(X)\) the Borda score for candidate \(X\)). Note that Plurality Rule can be viewed as a scoring rule that assigns 1 point to the first-ranked candidate and 0 points to the other candidates. So, the plurality score of a candidate \(X\) is the number of voters that rank \(X\) first. Building on this idea, \(k\)-Approval Voting is a scoring method that gives 1 point to each candidate that is ranked in position \(k\) or higher, and 0 points to all other candidates. To illustrate \(k\)-Approval Voting, consider the following election scenario:
| # Voters | Ranking |
| 2 | \(A\s D\s B\s C\) |
| 2 | \(B\s D\s A\s C\) |
| 1 | \(C\s A\s B\s D\) |
Under 1-Approval Voting (which is the same as Plurality Rule) and 3-Approval Voting, \(A\) and \(B\) tie for the highest score; under 2-Approval Voting, \(D\) wins. Note that the Condorcet winner is \(A\), so none of these \(k\)-Approval methods guarantees that the Condorcet winner is elected (whether \(A\) is elected using 1-Approval or 3-Approval depends on the tie-breaking mechanism that is used).
A second way to make a voting method sensitive to more than the voters’ top choice is to hold “multi-stage” elections. The idea is to successively remove candidates that perform poorly in the election until there is one candidate that is ranked first by more than 50% of the voters (i.e., there is a strict majority winner). The different stages can be actual “runoff” elections in which voters are asked to evaluate a reduced set of candidates; or they can be built into the way the winner is calculated by asking voters to submit rankings over the set of all candidates. The first example of a multi-stage method is used to elect the French president.
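Because Borda Count, Plurality Rule and \(k\)-Approval Voting differ only in their weight sequences, a single function covers all of them. The sketch below assumes a profile is a list of (number of voters, ranking string) pairs; the helper names are illustrative.

```python
# The five-voter profile used to illustrate k-Approval Voting.
K_APPROVAL_PROFILE = [(2, "ADBC"), (2, "BDAC"), (1, "CABD")]
# The 21-voter profile from Section 1, used to check the Borda scores.
SECTION_1_PROFILE = [(3, "ABCD"), (5, "ACBD"), (7, "BDCA"), (6, "CBDA")]

def scores(profile, weights):
    """Scoring rule: weights[k] is the score for the candidate in position k (0-based)."""
    totals = {c: 0 for c in profile[0][1]}
    for n, ranking in profile:
        for position, candidate in enumerate(ranking):
            totals[candidate] += n * weights[position]
    return totals

def winners(profile, weights):
    """All candidates with the highest total score (ties are kept)."""
    totals = scores(profile, weights)
    best = max(totals.values())
    return sorted(c for c, s in totals.items() if s == best)

def borda_weights(m):
    """Weights (m-1, m-2, ..., 0) for m candidates."""
    return list(range(m - 1, -1, -1))

def k_approval_weights(m, k):
    """One point for each of the top k positions, zero for the rest."""
    return [1] * k + [0] * (m - k)
```

Applied to the five-voter profile, the weights for \(k = 1, 2, 3\) give the winners \(\{A, B\}\), \(\{D\}\) and \(\{A, B\}\) respectively, and `borda_weights(4)` applied to the Section 1 profile gives the scores 24, 44, 38 and 20.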
Plurality with Runoff:
Start with a plurality vote to determine the top two candidates (the candidates ranked first and second according to their plurality scores). If a candidate is ranked first by more than 50% of the voters, then that candidate is declared the winner. If there is no candidate with a strict majority of first-place votes, then there is a runoff between the top two candidates (or more if there are ties). The candidate(s) with the most votes in the runoff election is (are) declared the winner(s).
Rather than focusing on the top two candidates, one can also iteratively remove the candidate(s) with the fewest first-place votes:
The Hare Rule:
The ballots are rankings of the candidates. If a candidate is ranked first by more than 50% of the voters, then that candidate is declared the winner. If there is no candidate with a strict majority of first-place votes, repeatedly delete the candidate or candidates that receive the fewest first-place votes (i.e., the candidate(s) with the lowest plurality score(s)). The first candidate to be ranked first by a strict majority of voters is declared the winner (if there is no such candidate, then the remaining candidate(s) are declared the winners).
The Hare Rule is also called Ranked-Choice Voting, Alternative Vote, and Instant Runoff. If there are only three candidates, then the above two voting methods are the same (removing the candidate with the lowest plurality score is the same as keeping the two candidates with the highest and second-highest plurality scores). The following example shows that they can select different winners when there are more than three candidates:
| # Voters | Ranking |
| 7 | \(A\s B\s C\s D\) |
| 5 | \(B\s C\s D\s A\) |
| 4 | \(D\s B\s C\s A\) |
| 3 | \(C\s D\s A\s B\) |
Candidate \(A\) is the Plurality with Runoff winner: Candidates \(A\) and \(B\) are the top two candidates, being ranked first by 7 and 5 voters, respectively. In the runoff election (using the rankings from the above table), the groups voting for candidates \(C\) and \(D\) transfer their support to candidates \(A\) and \(B\), respectively, with \(A\) winning 10 – 9.
Candidate \(D\) is the Hare Rule winner: In the first round, candidate \(C\) is eliminated since she is only ranked first by 3 voters. This group’s votes are transferred to \(D\), giving him 7 votes. This means that in the second round, candidate \(B\) is ranked first by the fewest voters (5 voters rank \(B\) first in the profile with candidate \(C\) removed), and so is eliminated. After the elimination of candidate \(B\), candidate \(D\) has a strict majority of the first-place votes: 12 voters ranking him first (note that in this round the group in the second row of the table transfers all their votes to \(D\) since \(C\) was eliminated in an earlier round).
The core idea of multi-stage methods is to successively remove candidates that perform "poorly" in an election. For the Hare Rule, performing poorly is interpreted as receiving the fewest first-place votes. There are other ways to identify "poorly performing" candidates in an election scenario. For instance, the Coombs Rule successively removes candidates that are ranked last by the most voters (see Grofman and Feld 2004 for an overview of Coombs Rule).
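Both procedures can be traced on the 19-voter example with a short sketch. It assumes that votes transfer to each voter's highest-ranked remaining candidate and that all candidates tied for the fewest first-place votes are removed together; the function names are illustrative.

```python
# The 19-voter profile from the table above.
PROFILE = [(7, "ABCD"), (5, "BCDA"), (4, "DBCA"), (3, "CDAB")]

def first_place_counts(profile, remaining):
    """First-place votes when only the remaining candidates are considered."""
    counts = {c: 0 for c in remaining}
    for n, ranking in profile:
        top = next(c for c in ranking if c in remaining)
        counts[top] += n
    return counts

def plurality_with_runoff(profile):
    total = sum(n for n, _ in profile)
    counts = first_place_counts(profile, set(profile[0][1]))
    if max(counts.values()) > total / 2:
        return sorted(c for c, s in counts.items() if s == max(counts.values()))
    # Keep every candidate whose score is at least the second-highest score.
    cutoff = sorted(counts.values(), reverse=True)[1]
    finalists = {c for c, s in counts.items() if s >= cutoff}
    runoff = first_place_counts(profile, finalists)
    best = max(runoff.values())
    return sorted(c for c, s in runoff.items() if s == best)

def hare(profile):
    total = sum(n for n, _ in profile)
    remaining = set(profile[0][1])
    while True:
        counts = first_place_counts(profile, remaining)
        best, worst = max(counts.values()), min(counts.values())
        # Stop on a strict majority, or when all remaining candidates are tied.
        if best > total / 2 or best == worst:
            return sorted(c for c, s in counts.items() if s == best)
        remaining -= {c for c, s in counts.items() if s == worst}
```

On this profile `plurality_with_runoff` returns \(A\) (10 votes to 9 in the runoff) while `hare` returns \(D\) (12 of 19 first-place votes in the final round).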
Coombs Rule:
The ballots are rankings of the candidates. If a candidate is ranked first by more than 50% of the voters, then that candidate is declared the winner. If there is no candidate with a strict majority of first-place votes, repeatedly delete the candidate or candidates that receive the most last-place votes. The first candidate to be ranked first by a strict majority of voters is declared the winner (if there is no such candidate, then the remaining candidate(s) are declared the winners).
In the above example, candidate \(B\) wins the election using Coombs Rule. In the first round, \(A\), with 9 last-place votes, is eliminated. Then, candidate \(B\) receives 12 first-place votes, which is a strict majority, and so is declared the winner.
There is a technical issue that is important to keep in mind regarding the above definitions of the multi-stage voting methods. When identifying the poorly performing candidates in each round, there may be ties (i.e., there may be more than one candidate with the lowest plurality score or more than one candidate ranked last by the most voters). In the above definitions, I assume that all of the poorly performing candidates will be removed in each round. An alternative approach would use a tie-breaking rule to select one of the poorly performing candidates to be removed at each round.
The voting methods discussed in this section can be viewed as generalizations of scoring methods, such as Borda Count. In a scoring method, a voter’s ranking is an assignment of grades (e.g., "1st place", "2nd place", "3rd place", ... , "last place") to the candidates. Requiring voters to rank all the candidates means that (1) every candidate is assigned a grade, (2) there are the same number of possible grades as the number of candidates, and (3) different candidates must be assigned different grades. In this section, we drop assumptions (2) and (3), assuming a fixed number of grades for every set of candidates and allowing different candidates to be assigned the same grade.
The first example gives voters the option to either select a candidate that they want to vote for (as in plurality rule) or to select a candidate that they want to vote against.
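Coombs Rule differs from the Hare Rule only in which candidates are removed each round. The following sketch assumes that all candidates tied for the most last-place votes are removed together, and that the remaining candidates are all declared winners if removing them would leave no one; the function name is illustrative.

```python
# The 19-voter profile used to compare Plurality with Runoff and the Hare Rule.
PROFILE = [(7, "ABCD"), (5, "BCDA"), (4, "DBCA"), (3, "CDAB")]

def coombs(profile):
    total = sum(n for n, _ in profile)
    remaining = set(profile[0][1])
    while True:
        firsts = {c: 0 for c in remaining}
        lasts = {c: 0 for c in remaining}
        for n, ranking in profile:
            kept = [c for c in ranking if c in remaining]
            firsts[kept[0]] += n   # highest-ranked remaining candidate
            lasts[kept[-1]] += n   # lowest-ranked remaining candidate
        best = max(firsts.values())
        if best > total / 2:
            return sorted(c for c, s in firsts.items() if s == best)
        most_last = max(lasts.values())
        losers = {c for c, s in lasts.items() if s == most_last}
        if losers == remaining:
            return sorted(remaining)
        remaining -= losers
```

On this profile the first round removes \(A\) (9 last-place votes), after which \(B\) holds 12 of the 19 first-place votes, so `coombs` returns \(B\).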
Negative Voting:
Each voter is allowed to choose one candidate to either vote for (giving the candidate one point) or to vote against (giving the candidate –1 points). The winner(s) is (are) the candidate(s) with the highest total number of points (i.e., the candidate with the greatest score, where the score is the total number of positive votes minus the total number of negative votes).
Negative voting is tantamount to allowing the voters to support either a single candidate or all but one candidate (taking a point away from a candidate \(C\) is equivalent to giving one point to all candidates except \(C\)). That is, the voters are asked to choose a set of candidates that they support, where the choice is between sets consisting of a single candidate or sets consisting of all except one candidate. The next voting method generalizes this idea by allowing voters to choose any subset of candidates:
Approval Voting:
Each voter selects a subset of the candidates (where the empty set means the voter abstains), and the candidate(s) selected by the most voters win(s).
If a candidate \(X\) is in the set of candidates selected by a voter, we say that the voter approves of candidate \(X\). Then, the approval winner is the candidate with the most approvals. Approval voting has been extensively discussed by Steven Brams and Peter Fishburn (Brams and Fishburn 2007; Brams 2008). See also the recent collection of articles devoted to approval voting (Laslier and Sanver 2010).
Approval voting forces voters to think about the decision problem differently: They are asked to determine which candidates they approve of rather than selecting a single candidate to vote for or determining the relative ranking of the candidates. That is, the voters are asked which candidates are above a certain “threshold of acceptance”. Ranking a set of candidates and selecting the candidates that are approved are two different aspects of a voter’s overall opinion about the candidates. They are related but cannot be derived from each other. See Brams and Sanver 2009 for examples of voting methods that ask voters to both select a set of candidates that they approve and to (linearly) rank the candidates.
Approval voting is a very flexible method. Recall the election scenario illustrating the \(k\)-Approval Voting methods:
| # Voters | Ranking |
| 2 | \(\underline{A}\s D\s B\s C\) |
| 2 | \(\underline{B}\s D\s A\s C\) |
| 1 | \(\underline{C}\s \underline{A}\s B\s D\) |
In this election scenario, \(k\)-Approval for \(k=1,2,3\) cannotguarantee that the Condorcet winner \(A\) is elected. The Approvalballot \((\{A\},\{B\}, \{A, C\})\) does elect the Condorcet winner. Infact, Brams (2008, Chapter 2) proves that if there is a uniqueCondorcet winner, then that candidate may be elected under approvalvoting (assuming that all voters votesincerely: see Brams2008, Chapter 2, for a discussion). Note that approval voting may alsoelect other candidates (perhaps even the Condorcet loser). Whetherthis flexibility of Approval Voting should be seen as a virtue or avice is debated in Brams, Fishburn and Merrill 1988a, 1988b and Saariand van Newenhizen 1988a, 1988b.
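The approval tally for the ballot \((\{A\},\{B\}, \{A, C\})\) can be checked with a minimal sketch. It assumes the three ballots are submitted by the groups of 2, 2 and 1 voters from the table above, in that order; the function names are illustrative.

```python
# Approval ballots: (number of voters, set of approved candidates).
BALLOTS = [(2, {"A"}), (2, {"B"}), (1, {"A", "C"})]
CANDIDATES = "ABCD"

def approval_scores(ballots, candidates):
    """Number of voters approving each candidate; an empty set counts as abstaining."""
    scores = {c: 0 for c in candidates}
    for n, approved in ballots:
        for c in approved:
            scores[c] += n
    return scores

def approval_winners(ballots, candidates):
    """All candidates with the most approvals (ties are kept)."""
    scores = approval_scores(ballots, candidates)
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)
```

Here \(A\) collects 3 approvals against 2 for \(B\), 1 for \(C\) and 0 for \(D\), so the Condorcet winner is elected. Other sincere approval sets for the same rankings can produce a different winner, which is the flexibility discussed above.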
Approval Voting asks voters to express something about their intensity of preference for the candidates by assigning one of two grades: "Approve" or "Don’t Approve". Expanding on this idea, some voting methods assume that there is a fixed set of grades, or a grading language, that voters can assign to each candidate. See Chapters 7 and 8 of Balinski and Laraki 2010 for examples and a discussion of grading languages (cf. Morreau 2016).
There are different ways to determine the winner(s) given a profile of ballots that assign grades to each candidate. The main approach is to calculate a "group" grade for each candidate, then select the candidate with the best overall group grade. In order to calculate a group grade for each candidate, it is convenient to use numbers for the grading language. Then, there are two natural ways to determine the group grade for a candidate: calculating the mean, or average, of the grades or calculating the median of the grades.
Cumulative Voting:
Each voter is asked to distribute a fixed number of points, say ten, among the candidates in any way they please. The candidate(s) with the most total points wins the election.
Score Voting (also called Range Voting):
The grades are a finite set of numbers. The ballots are an assignment of grades to the candidates. The candidate(s) with the largest average grade is declared the winner(s).
Cumulative Voting and Score Voting are similar. The important difference is that Cumulative Voting requires that the sum of the grades assigned to the candidates by each voter is the same. The next procedure, proposed by Balinski and Laraki 2010 (cf. Bassett and Persky 1999 and the discussion of this method at rangevoting.org), selects the candidate(s) with the largest median grade rather than the largest mean grade.
Majority Judgement:
The grades are a finite set of numbers (cf. discussion of common grading languages). The ballots are an assignment of grades to the candidates. The candidate(s) with the largest median grade is(are) declared the winner(s). See Balinski and Laraki 2007 and 2010 for further refinements of this voting method that use different methods for breaking ties when there are multiple candidates with the largest median grade.
I conclude this section with an example that illustrates Score Voting and Majority Judgement. Suppose that there are 3 candidates \(\{A, B, C\}\), 5 grades \(\{0,1,2,3,4\}\) (with the assumption that the larger the number, the higher the grade), and 5 voters. The table below describes an election scenario. The candidates are listed in the first row. Each subsequent row gives the grades that a set of voters assigns to the candidates.
| Grade (0–4) for: | |||
| # Voters | \(A\) | \(B\) | \(C\) |
| 1 | 4 | 3 | 1 |
| 1 | 4 | 3 | 2 |
| 1 | 2 | 0 | 3 |
| 1 | 2 | 3 | 4 |
| 1 | 1 | 0 | 2 |
| Mean: | 2.6 | 1.8 | 2.4 |
| Median: | 2 | 3 | 2 |
The bottom two rows give the mean and median grade for each candidate. Candidate \(A\) is the score voting winner with the greatest mean grade, and candidate \(B\) is the majority judgement winner with the greatest median grade.
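The two group grades in the table can be recomputed with Python's standard `statistics` module. The sketch below is illustrative only: the helper `winners_by` is a name introduced here, and it ignores the tie-breaking refinements of Majority Judgement mentioned above.

```python
from statistics import mean, median

# Grades from the table above: one list of the five voters' grades per candidate.
grades = {
    "A": [4, 4, 2, 2, 1],
    "B": [3, 3, 0, 3, 0],
    "C": [1, 2, 3, 4, 2],
}

def winners_by(aggregate, grades):
    """Candidates whose aggregated group grade is largest."""
    group = {c: aggregate(g) for c, g in grades.items()}
    best = max(group.values())
    return sorted(c for c, value in group.items() if value == best)

score_winners = winners_by(mean, grades)      # Score Voting uses the mean
median_winners = winners_by(median, grades)   # Majority Judgement uses the median
print(score_winners, median_winners)
```

The only difference between the two calls is the aggregation function, which is exactly the point of contention between the two methods.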
There are two types of debates about the voting methods introduced in this section. The first concerns the choice of the grading language that voters use to evaluate the candidates. Consult Balinski and Laraki 2010 and Morreau 2016 for an extensive discussion of the types of considerations that influence the choice of a grading language. Brams and Potthoff 2015 argue that two grades, as in Approval Voting, is best to avoid certain paradoxical outcomes. To illustrate, note that, in the above example, if the candidates are ranked by the voters according to the grades that are assigned, then candidate \(C\) is the Condorcet winner (since 3 voters assign higher grades to \(C\) than to \(A\) or \(B\)). However, neither Score Voting nor Majority Judgement selects candidate \(C\).
The second type of debate concerns the method used to calculate the group grade for each candidate (i.e., whether to use the mean as in Score Voting or the median as in Majority Judgement). One important issue is whether voters have an incentive to misrepresent their evaluations of the candidates. Consider the voter in the middle row that assigns the grade of 2 to \(A\), 0 to \(B\), and 3 to \(C\). Suppose that these grades represent the voter’s true evaluations of the candidates. If this voter increases the grade for \(C\) to 4 and decreases the grade for \(A\) to 1 (and the other voters do not change their grades), then the average grade for \(A\) becomes 2.4 and the average grade for \(C\) becomes 2.6, which better reflects the voter’s true evaluations of the candidates (and results in \(C\) being elected according to Score Voting). Thus, this voter has an incentive to misrepresent her grades. Note that the median grades for the candidates do not change after this voter changes her grades. Indeed, Balinski and Laraki 2010, chapter 10, argue that using the median to assign group grades to candidates encourages voters to submit grades that reflect their true evaluations of the candidates. The key idea of their argument is as follows: If a voter’s true grade matches the median grade for a candidate, then the voter does not have an incentive to assign a different grade. If a voter’s true grade is greater than the median grade for a candidate, then raising the grade will not change the candidate’s grade and lowering the voter’s grade may result in the candidate receiving a grade that is lower than the voter’s true evaluation. Similarly, if a voter’s true grade is lower than the median grade for a candidate, then lowering the grade will not change the candidate’s grade and raising the voter’s grade may result in the candidate receiving a grade that is higher than the voter’s true evaluation.
Thus, if voters are focused on ensuring that the group grades for the candidates best reflect their true evaluations of the candidates, then voters do not have an incentive to misrepresent their grades. However, as pointed out in Felsenthal and Machover 2008 (Example 3.3), voters can manipulate the outcome of an election using Majority Judgement to ensure a preferred candidate is elected (cf. the discussion of strategic voting in Section 4.1 and Section 3.3 of List 2013). Suppose that the voter in the middle row assigns the grade of 4 to candidate \(A\), 0 to candidate \(B\) and 3 to candidate \(C\). Assuming the other voters do not change their grades, the majority judgement winner is now \(A\), which the voter ranks higher than the original majority judgement winner \(B\). Consult Balinski and Laraki 2010, 2014 and Edelman 2012b for arguments in favor of electing candidates with the greatest median grade; and Felsenthal and Machover 2008, Gehrlein and Lepelley 2003, and Laslier 2011 for arguments against electing candidates with the greatest median grade.
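Both manipulations described in the last two paragraphs can be replayed in code. The following is a minimal sketch, assuming the voter who sincerely grades \(A\) with 2, \(B\) with 0 and \(C\) with 3 sits at index 2 of each grade list; the helper names `with_ballot` and `winner` are introduced only for this illustration.

```python
from statistics import mean, median

# Sincere grades from the example; index 2 is the voter grading A=2, B=0, C=3.
sincere = {"A": [4, 4, 2, 2, 1], "B": [3, 3, 0, 3, 0], "C": [1, 2, 3, 4, 2]}

def with_ballot(grades, voter, ballot):
    """Copy of the profile in which one voter's grades are replaced."""
    changed = {c: list(g) for c, g in grades.items()}
    for candidate, grade in ballot.items():
        changed[candidate][voter] = grade
    return changed

def winner(aggregate, grades):
    """Candidate with the largest group grade (these profiles have no ties)."""
    return max(grades, key=lambda c: aggregate(grades[c]))

# Manipulating the mean: raise C to 4 and lower A to 1.
score_profile = with_ballot(sincere, 2, {"A": 1, "C": 4})
# Manipulating the median: raise A to 4.
median_profile = with_ballot(sincere, 2, {"A": 4})

print(winner(mean, sincere), winner(mean, score_profile))
print(winner(median, sincere), winner(median, median_profile))
```

Under the mean the winner moves from \(A\) to \(C\), and under the median it moves from \(B\) to \(A\); in both cases the voter obtains a winner that she grades higher than the original one.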
In this section, I briefly discuss two new approaches to voting that do not fit nicely into the categories of voting methods introduced in the previous sections. While both of these methods can be used to select representatives, such as a president, the primary application is a group of people voting directly on propositions, or referendums.
Quadratic Voting: When more than 50% of the voters support an alternative, most voting methods will select that alternative. Indeed, when there are only two alternatives, such as when voting for or against a proposition, there are many arguments that identify majority rule as the best and most stable group decision method (May 1952; Maskin 1995). One well-known problem with always selecting the majority winner is the so-called tyranny of the majority. A complete discussion of this issue is beyond the scope of this article. The main problem from the point of view of the analysis of voting methods is that there may be situations in which a majority of the voters weakly support a proposition while there is a sizable minority of voters that have a strong preference against the proposition.
One way of dealing with this problem is to increase the quota required to accept a proposition. However, this gives too much power to a small group of voters. For instance, with Unanimity Rule a single voter can block a proposal from being accepted. Arguably, a better solution is to use ballots that allow voters to express something about their intensity of preference for the alternatives. Setting aside issues about interpersonal comparisons of utility (see, for instance, Hausman 1995), this is the benefit of using the voting methods discussed in Section 2.2, such as Score Voting or Majority Judgement. These voting methods assume that there is a fixed set of grades that the voters use to express their intensity of preference. One challenge is finding an appropriate set of grades for a population of voters. Too few grades make it harder for a sizable minority with strong preferences to override the majority opinion, but too many grades make it easy for a vocal minority to overrule the majority opinion.
Using ideas from mechanism design (Groves and Ledyard 1977; Hylland and Zeckhauser 1980), the economist E. Glen Weyl developed a voting method called Quadratic Voting that mitigates some of the above issues (Lalley and Weyl 2018a). The idea is to think of an election as a market (Posner and Weyl, 2018, Chapter 2). Each voter can purchase votes at a cost that is quadratic in the number of votes. For instance, a voter must pay $25 for 5 votes (either in favor or against a proposition). After the election, the money collected is distributed on a pro rata basis to the voters. There are a variety of economic arguments that justify why voters should pay \(v^2\) to purchase \(v\) votes (Lalley and Weyl 2018b; Goeree and Zhang 2017). See Posner and Weyl 2015 and 2017 for further discussion and a vigorous defense of the use of Quadratic Voting in national elections. Consult Laurence and Sher 2017 for two arguments against the use of Quadratic Voting. Both arguments are derived from the presence of wealth inequality. The first argument is that it is ambiguous whether the Quadratic Voting decision really outperforms a decision using majority rule from the perspective of utilitarianism (see Driver 2014 and Sinnott-Armstrong 2019 for overviews of utilitarianism). The second argument is that any vote-buying mechanism will have a hard time meeting a legitimacy requirement, familiar from the theory of democratic institutions (cf. Fabienne 2017).
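The pricing rule can be made concrete with a small sketch. Everything beyond the \(v^2\) cost is an assumption made for illustration: the voter labels are hypothetical, votes are encoded as signed integers (positive in favor, negative against), and the collected money is refunded in equal shares, which is only one simple reading of "pro rata".

```python
def quadratic_vote(purchases):
    """purchases maps voter -> signed number of votes (+ for, - against).

    Returns (outcome, costs, refund_per_voter). The equal refund is an
    assumption of this sketch, not a detail taken from the literature.
    """
    costs = {voter: votes ** 2 for voter, votes in purchases.items()}
    net = sum(purchases.values())
    outcome = "accept" if net > 0 else "reject" if net < 0 else "tie"
    refund = sum(costs.values()) / len(purchases)
    return outcome, costs, refund

# Hypothetical voters: a weak majority in favor and one strong opponent.
purchases = {"v1": 1, "v2": 1, "v3": 1, "v4": -5}
outcome, costs, refund = quadratic_vote(purchases)
print(outcome, costs["v4"], refund)
```

In this toy profile three voters each buy one vote in favor for $1, while a single opponent pays $25 for five votes against, so the proposition is rejected even though a majority weakly supports it.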
Liquid Democracy: Using Quadratic Voting, the voters’ opinions may end up being weighted differently: Voters that purchase more of a voice have more influence over the election. There are other reasons why some voters’ opinions may have more weight than others when making a decision about some issue. For instance, a voter may have been elected to represent a constituency, or a voter may be recognized as an expert on the issue under consideration. An alternative approach to group decision making is direct democracy in which every citizen is asked to vote on every political issue. Asking the citizens to vote on every issue faces a number of challenges, nicely explained by Green-Armytage (2015, pg. 191):
Direct democracy without any option for representation is problematic. Even if it were possible for every citizen to learn everything they could possibly know about every political issue, people who did this would be able to do little else, and massive amounts of time would be wasted in duplicated effort. Or, if every citizen voted but most people did not take the time to learn about the issues, the results would be highly random and/or highly sensitive to overly simplistic public relations campaigns. Or, if only a few citizens voted, particular demographic and ideological groups would likely be under-represented.
One way to deal with some of the problems raised in the above quote is to use proxy voting, in which voters can delegate their vote on some issues (Miller 1969). Liquid Democracy is a form of proxy voting in which voters can delegate their votes to other voters (ideally, to voters that are well-informed about the issue under consideration). What distinguishes Liquid Democracy from proxy voting is that proxies may further delegate the votes entrusted to them. For example, suppose that there is a vote to accept or reject a proposition. Each voter is given the option to delegate their vote to another voter, called a proxy. The proxies, in turn, are given the option to delegate their votes to yet another voter. The voters that decide to not transfer their votes cast a vote weighted by the number of voters who entrusted them as a proxy, either directly or indirectly.
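The weighting by direct and indirect delegations can be sketched as follows. The function `resolve_delegations`, the voter names, and the decision to discard votes caught in a delegation cycle are all assumptions of this illustration; the text above does not say how cycles should be handled.

```python
def resolve_delegations(delegate_to, votes):
    """Follow delegation chains and return the weighted tally.

    delegate_to maps a voter to the proxy they chose; votes maps each voter
    who votes directly to "Y" or "N". A voter is assumed to appear in at most
    one of the two mappings. Votes that end in a delegation cycle, or at a
    voter who neither votes nor delegates, are discarded.
    """
    tally = {"Y": 0, "N": 0}
    for voter in set(delegate_to) | set(votes):
        seen = set()
        current = voter
        while current in delegate_to and current not in seen:
            seen.add(current)
            current = delegate_to[current]
        if current in votes:
            tally[votes[current]] += 1
    return tally

# Hypothetical example: ann delegates to bob, who delegates to cat.
delegate_to = {"ann": "bob", "bob": "cat", "eve": "dan"}
votes = {"cat": "Y", "dan": "N"}
tally = resolve_delegations(delegate_to, votes)
print(tally)
```

Here cat casts a vote of weight 3 (her own vote plus those of ann and bob), and dan casts a vote of weight 2.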
While there has been some discussion of proxy voting in the political science literature (Miller 1969; Alger 2006; Green-Armytage 2015), most studies of Liquid Democracy can be found in the computer science literature. A notable exception is Blum and Zuber 2016, which justifies Liquid Democracy, understood as a procedure for democratic decision-making, within normative democratic theory. An overview of the origins of Liquid Democracy and pointers to other online discussions can be found in Behrens 2017. Formal studies of Liquid Democracy have focused on: the possibility of delegation cycles and the relationship with the theory of judgement aggregation (Christoff and Grossi 2017); the rationality of delegating votes (Bloembergen, Grossi and Lackner 2018); the potential problems that arise when many voters delegate votes to only a few voters (Kang et al. 2018; Golz et al. 2018); and generalizations of Liquid Democracy beyond binary choices (Brill and Talmon 2018; Zhang and Zhou 2017).
This section introduced different methods for making a group decision. One striking fact about the voting methods discussed in this section is that they can identify different winners given the same collection of ballots. This raises an important question: How should we compare the different voting methods? Can we argue that some voting methods are better than others? There are a number of different criteria that can be used to compare and contrast different voting methods:
In this section, I introduce and discuss a number of voting paradoxes, i.e., anomalies that highlight problems with different voting methods. Consult Saari 1995 and Nurmi 1999 for penetrating analyses that explain the underlying mathematics behind the different voting paradoxes.
A very common assumption is that a rational preference ordering must be transitive (i.e., if \(A\) is preferred to \(B\), and \(B\) is preferred to \(C\), then \(A\) must be preferred to \(C\)). See the entry on preferences (Hansson and Grüne-Yanoff 2009) for an extended discussion of the rationale behind this assumption. Indeed, if a voter’s preference ordering is not transitive, for instance, allowing for cycles (e.g., an ordering of \(A, B, C\) with \(A \succ B \succ C \succ A\), where \(X\succ Y\) means \(X\) is strictly preferred to \(Y\)), then there is no alternative that the voter can be said to actually support (for each alternative, there is another alternative that the voter strictly prefers). Many authors argue that voters with cyclic preference orderings have inconsistent opinions about the candidates and should be ignored by a voting method (in particular, Condorcet forcefully argued this point). A key observation of Condorcet (which has become known as the Condorcet Paradox) is that the majority ordering may have cycles (even when all the voters submit rankings of the alternatives).
Condorcet’s original example was more complicated, but the following situation with three voters and three candidates illustrates the phenomenon:
| # Voters | Ranking |
| 1 | \(A\s B\s C\) |
| 1 | \(B\s C\s A\) |
| 1 | \(C\s A\s B\) |
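Note that we have: 2 of the 3 voters rank \(A\) above \(B\); 2 of the 3 voters rank \(B\) above \(C\); and 2 of the 3 voters rank \(C\) above \(A\).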
That is, there is a majority cycle \(A>_M B >_M C >_M A\). This means that there is no Condorcet winner. This simple, but fundamental observation has been extensively studied (Gehrlein 2006; Schwartz 2018).
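The majority cycle can be verified mechanically. The sketch below assumes that a profile is encoded as a list of (number of voters, ranking) pairs, with each ranking written as a string from best to worst; the function names are introduced here for illustration.

```python
from itertools import combinations

def majority_margins(profile):
    """Return (candidates, margin), where margin[(x, y)] is the number of
    voters ranking x above y minus the number ranking y above x."""
    candidates = sorted(profile[0][1])
    margin = {}
    for x, y in combinations(candidates, 2):
        net = sum(n if r.index(x) < r.index(y) else -n for n, r in profile)
        margin[(x, y)], margin[(y, x)] = net, -net
    return candidates, margin

def condorcet_winner(profile):
    """The candidate beating every other candidate head-to-head, or None."""
    candidates, margin = majority_margins(profile)
    for x in candidates:
        if all(margin[(x, y)] > 0 for y in candidates if y != x):
            return x
    return None

paradox = [(1, "ABC"), (1, "BCA"), (1, "CAB")]
print(condorcet_winner(paradox))   # None: the majority ordering cycles
```

Every candidate loses at least one head-to-head contest in this profile, so the search returns `None`.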
The Condorcet Paradox shows that there may not always be a Condorcet winner in an election. However, one natural requirement for a voting method is that if there is a Condorcet winner, then that candidate should be elected. Voting methods that satisfy this property are called Condorcet consistent. Many of the methods introduced above are not Condorcet consistent. I already presented an example showing that plurality rule is not Condorcet consistent (in fact, plurality rule may even elect the Condorcet loser).
The example from Section 1 shows that Borda Count is not Condorcet consistent. In fact, this is an instance of a general phenomenon that Fishburn (1974) called Condorcet’s other paradox. Consider the following voting situation with 81 voters and three candidates from Condorcet 1785.
| # Voters | Ranking |
| 30 | \(A\s B\s C\) |
| 1 | \(A\s C\s B\) |
| 29 | \(B\s A\s C\) |
| 10 | \(B\s C\s A\) |
| 10 | \(C\s A\s B\) |
| 1 | \(C\s B\s A\) |
The majority ordering is \(A >_M B >_M C\), so \(A\) is the Condorcet winner. Using the Borda rule, we have:
\[\begin{align}
\BS(A) &= 2 \times 31 + 1 \times 39 + 0 \times 11 = 101 \\
\BS(B) &= 2 \times 39 + 1 \times 31 + 0 \times 11 = 109 \\
\BS(C) &= 2 \times 11 + 1 \times 11 + 0 \times 59 = 33
\end{align}\]
So, candidate \(B\) is the Borda winner. Condorcet pointed out something more: The only way to elect candidate \(A\) using any scoring method is to assign more points to candidates ranked second than to candidates ranked first. Recall that a scoring method for 3 candidates fixes weights \(s_1\ge s_2\ge s_3\), where \(s_1\) points are assigned to candidates ranked 1st, \(s_2\) points are assigned to candidates ranked 2nd, and \(s_3\) points are assigned to candidates ranked last. To simplify the calculation, assume that candidates ranked last receive 0 points (i.e., \(s_3=0\)). Then, the scores assigned to candidates \(A\) and \(B\) are:
\[\begin{align}
Score(A) &= s_1 \times 31 + s_2 \times 39 + 0 \times 11 \\
Score(B) &= s_1 \times 39 + s_2 \times 31 + 0 \times 11
\end{align}\]
So, in order for \(Score(A) > Score(B)\), we must have \((s_1 \times 31 + s_2 \times 39) > (s_1 \times 39 + s_2 \times 31)\), which implies that \(s_2 > s_1\). But, of course, it is counterintuitive to give more points for being ranked second than for being ranked first. Peter Fishburn generalized this example as follows:
Theorem (Fishburn 1974).
For all \(m\ge 3\), there is some voting situation with a Condorcet winner such that every scoring rule will have at least \(m-2\) candidates with a greater score than the Condorcet winner.
So, no scoring rule is Condorcet consistent, but what about other methods? A number of voting methods were devised specifically to guarantee that a Condorcet winner will be elected, if one exists. The examples below give a flavor of different types of Condorcet consistent methods. (See Brams and Fishburn, 2002, and Fishburn, 1977, for more examples and a discussion of Condorcet consistent methods.)
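For Condorcet's 81-voter example, the Borda scores and the claim about scoring weights can be checked numerically. This is a sketch under stated assumptions: rankings are encoded as strings from best to worst, `scoring_totals` is a name introduced here, and the search over weights is limited to integers between 0 and 10 with \(s_3 = 0\), so it illustrates the algebraic argument rather than replacing it.

```python
def scoring_totals(profile, weights):
    """Total score per candidate; weights[i] is awarded for rank position i."""
    totals = {}
    for count, ranking in profile:
        for position, candidate in enumerate(ranking):
            totals[candidate] = totals.get(candidate, 0) + count * weights[position]
    return totals

condorcet_1785 = [(30, "ABC"), (1, "ACB"), (29, "BAC"),
                  (10, "BCA"), (10, "CAB"), (1, "CBA")]

borda = scoring_totals(condorcet_1785, (2, 1, 0))
print(borda)

# Try every integer weight vector with 10 >= s1 >= s2 >= 0 and s3 = 0.
a_ever_ahead = any(
    scoring_totals(condorcet_1785, (s1, s2, 0))["A"]
    > scoring_totals(condorcet_1785, (s1, s2, 0))["B"]
    for s1 in range(11) for s2 in range(s1 + 1)
)
print(a_ever_ahead)
```

No admissible weight vector in the searched range puts \(A\) ahead of \(B\), in line with the observation that \(Score(B) - Score(A) = 8(s_1 - s_2)\) for this profile.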
Condorcet’s Rule:
Each voter submits a ranking of the candidates. If there is a Condorcet winner, then that candidate wins the election. Otherwise, all candidates tie for the win.
Black’s Procedure:
Each voter submits a ranking of the candidates. If there is a Condorcet winner, then that candidate is the winner. Otherwise, use Borda Count to determine the winners.
Nanson’s Method:
Each voter submits a ranking of the candidates. Calculate the Borda score for each candidate. The candidates with a Borda score below the average of the Borda scores are eliminated. The Borda scores of the candidates are re-calculated and the process continues until there is only one candidate remaining. (See Niou, 1987, for a discussion of this voting method.)
Copeland’s Rule:
Each voter submits a ranking of the candidates. A win-loss record for candidate \(B\) is calculated as follows:
The Copeland winner is the candidate that maximizes the win-loss record.
Schwartz’s Set Method:
Each voter submits a ranking of the candidates. The winners are the smallest set of candidates that are not beaten in a one-on-one election by any candidate outside the set (Schwartz 1986).
Dodgson’s Method:
Each voter submits a ranking of the candidates. For each candidate, determine the fewest number of pairwise swaps in the voters’ rankings needed to make that candidate the Condorcet winner. The candidate(s) with the fewest swaps is(are) declared the winner(s).
The last method was proposed by Charles Dodgson (better known by the pseudonym Lewis Carroll). Interestingly, this is an example of a procedure in which it is computationally difficult to compute the winner (that is, the problem of calculating the winner is NP-complete). See Bartholdi et al. 1989 for a discussion.
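As an illustration of how one of these Condorcet consistent methods can be computed, here is a sketch of Copeland's Rule. It assumes that the win-loss record of a candidate is the number of one-on-one contests the candidate wins minus the number the candidate loses, and that rankings are encoded as strings from best to worst; the function names are introduced only for this example.

```python
def beats(profile, x, y):
    """True if a strict majority of voters rank x above y."""
    above = sum(n for n, r in profile if r.index(x) < r.index(y))
    total = sum(n for n, _ in profile)
    return 2 * above > total

def copeland_winners(profile):
    """Candidates maximizing wins minus losses in one-on-one contests."""
    candidates = sorted(profile[0][1])
    record = {
        x: sum(beats(profile, x, y) - beats(profile, y, x)
               for y in candidates if y != x)
        for x in candidates
    }
    best = max(record.values())
    return [x for x in candidates if record[x] == best], record

condorcet_1785 = [(30, "ABC"), (1, "ACB"), (29, "BAC"),
                  (10, "BCA"), (10, "CAB"), (1, "CBA")]
winners, record = copeland_winners(condorcet_1785)
print(winners, record)
```

A Condorcet winner wins every contest and so always has the maximal record, which is why the rule is Condorcet consistent. When the majority ordering is a cycle, as in the three-voter example above, all candidates end up with the same record.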
These voting methods (and the other Condorcet consistent methods) guarantee that a Condorcet winner, if one exists, will be elected. But, should a Condorcet winner be elected? Many people argue that there is something amiss with a voting method that does not always elect a Condorcet winner (if one exists). The idea is that a Condorcet winner best reflects the overall group opinion and is stable in the sense that it will defeat any challenger in a one-on-one contest using Majority Rule. The most persuasive argument that the Condorcet winner should not always be elected comes from the work of Donald Saari (1995, 2001). Consider again Condorcet’s example of 81 voters.
| # Voters | Ranking |
| 30 | \(A\s B\s C\) |
| 1 | \(A\s C\s B\) |
| 29 | \(B\s A\s C\) |
| 10 | \(B\s C\s A\) |
| 10 | \(C\s A\s B\) |
| 1 | \(C\s B\s A\) |
This is another example that shows that Borda’s method need not elect the Condorcet winner. The majority ordering is
\[A >_M B >_M C,\]
while the ranking given by the Borda score is
\[B >_{\Borda} A >_{\Borda} C.\]
However, there is an argument that candidate \(B\) is the best choice for this electorate. Saari’s central observation is to note that the 81 voters can be divided into three groups:
| | # Voters | Ranking |
| Group 1 | 10 | \(A\s B\s C\) |
| | 10 | \(B\s C\s A\) |
| | 10 | \(C\s A\s B\) |
| Group 2 | 1 | \(A\s C\s B\) |
| | 1 | \(C\s B\s A\) |
| | 1 | \(B\s A\s C\) |
| Group 3 | 20 | \(A\s B\s C\) |
| | 28 | \(B\s A\s C\) |
Groups 1 and 2 constitute majority cycles with the voters evenly distributed among the three possible rankings. Such profiles are called Condorcet components. These profiles form a perfect symmetry among the rankings. So, within each of these groups, it is natural to assume that the voters’ opinions cancel each other out; therefore, the decision should depend only on the voters in group 3. In group 3, candidate \(B\) is the clear winner (28 of the 48 voters rank \(B\) first).
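Saari's decomposition can be reproduced by stripping the two kinds of Condorcet components from the 81-voter profile. The sketch below is only an illustration for three candidates: the function name `remove_condorcet_components` and the encoding of the profile as a `Counter` from ranking strings to voter counts are assumptions made here.

```python
from collections import Counter

profile = Counter({"ABC": 30, "ACB": 1, "BAC": 29,
                   "BCA": 10, "CAB": 10, "CBA": 1})

def remove_condorcet_components(profile):
    """Strip as many copies as possible of each three-ranking cycle."""
    rest = Counter(profile)
    for cycle in (("ABC", "BCA", "CAB"), ("ACB", "CBA", "BAC")):
        copies = min(rest[r] for r in cycle)
        for r in cycle:
            rest[r] -= copies
    return +rest  # unary plus drops rankings whose count fell to zero

group3 = remove_condorcet_components(profile)
print(dict(group3))
```

What remains is exactly group 3: 20 voters with the ranking \(A\s B\s C\) and 28 voters with the ranking \(B\s A\s C\).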
Balinski and Laraki (2010, pgs. 74–83) have an interesting spin on Saari’s argument. Let \(V\) be a ranking voting method (i.e., a voting method that requires voters to rank the alternatives). Say that \(V\) cancels properly if for all profiles \(\bP\), if \(V\) selects \(A\) as a winner in \(\bP\), then \(V\) selects \(A\) as a winner in any profile \(\bP+\bC\), where \(\bC\) is a Condorcet component and \(\bP+\bC\) is the profile that contains all the rankings from \(\bP\) and \(\bC\). Balinski and Laraki (2010, pg. 77) prove that there is no Condorcet consistent voting method that cancels properly. (See the discussion of the multiple districts paradox in Section 3.3 for a proof of a closely related result.)
A voting method is monotonic provided that receiving more support from the voters is always better for a candidate. There are different ways to make this idea precise (see Fishburn, 1982, Sanver and Zwicker, 2012, and Felsenthal and Tideman, 2013). For instance, moving up in the rankings should not adversely affect a candidate’s chances to win an election. It is easy to see that Plurality Rule is monotonic in this sense: The more voters that rank a candidate first, the better chance the candidate has to win. Surprisingly, there are voting methods that do not satisfy this natural property. The most well-known example is Plurality with Runoff. Consider the two scenarios below. Note that the only difference between them is the ranking of the fourth group of voters. This group of two voters ranks \(B\) above \(A\) above \(C\) in scenario 1 and swaps \(B\) and \(A\) in scenario 2 (so, \(A\) is now their top-ranked candidate; \(B\) is ranked second; and \(C\) is still ranked third).
| # Voters | Scenario 1 Ranking | Scenario 2 Ranking |
| 6 | \(A\s B\s C\) | \(A\s B\s C\) |
| 5 | \(C\s A\s B\) | \(C\s A\s B\) |
| 4 | \(B\s C\s A\) | \(B\s C\s A\) |
| 2 | \(\bB\s \bA\s C\) | \(\bA\s \bB\s C\) |
| Scenario 1: Candidate \(A\) is the Plurality with Runoff winner | ||
| Scenario 2: Candidate \(C\) is the Plurality with Runoff winner | ||
In scenario 1, candidates \(A\) and \(B\) both have a plurality score of 6 while candidate \(C\) has a plurality score of 5. So, \(A\) and \(B\) move on to the runoff election. Assuming the voters do not change their rankings, the 5 voters that rank \(C\) first transfer their support to candidate \(A\), giving her a total of 11 to win the runoff election. However, in scenario 2, even after moving up in the rankings of the fourth group (\(A\) is now ranked first by this group), candidate \(A\) does not win this election. In fact, by trying to give more support to the winner of the election in scenario 1, rather than solidifying \(A\)’s win, the last group’s least-preferred candidate ended up winning the election! The problem arises because in scenario 2, candidates \(A\) and \(B\) are swapped in the last group’s ranking. This means that \(A\)’s plurality score increases by 2 and \(B\)’s plurality score decreases by 2. As a consequence, \(A\) and \(C\) move on to the runoff election rather than \(A\) and \(B\). Candidate \(C\) wins the runoff election with 9 voters that rank \(C\) above \(A\) compared to 8 voters that rank \(A\) above \(C\).
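The two scenarios can be replayed with a short implementation of Plurality with Runoff. This is a sketch under assumptions: rankings are strings from best to worst, the function name is introduced here, and ties at the cutoff for the runoff or in the runoff itself are not handled because they do not occur in these scenarios.

```python
def plurality_with_runoff(profile):
    """profile: list of (count, ranking string). Returns the winner.

    A candidate ranked first by a strict majority also wins the runoff,
    so no separate first-round shortcut is coded.
    """
    first = {}
    for n, r in profile:
        first[r[0]] = first.get(r[0], 0) + n
    # The two candidates with the highest plurality scores go to the runoff.
    x, y = sorted(first, key=first.get, reverse=True)[:2]
    x_votes = sum(n for n, r in profile if r.index(x) < r.index(y))
    y_votes = sum(n for n, _ in profile) - x_votes
    return x if x_votes > y_votes else y

scenario1 = [(6, "ABC"), (5, "CAB"), (4, "BCA"), (2, "BAC")]
scenario2 = [(6, "ABC"), (5, "CAB"), (4, "BCA"), (2, "ABC")]
print(plurality_with_runoff(scenario1), plurality_with_runoff(scenario2))
```

The only change between the two inputs is that the last group moves \(A\) from second to first place, yet the winner changes from \(A\) to \(C\).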
The above example is surprising since it shows that, when using Plurality with Runoff, it may not always be beneficial for a candidate to move up in some of the voters’ rankings. The other voting methods that violate monotonicity include Coombs Rule, Hare Rule, Dodgson’s Method and Nanson’s Method. See Felsenthal and Nurmi 2017 for further discussion of voting methods that are not monotonic.
In this section, I discuss two related paradoxes that involve changes to the population of voters.
No-Show Paradox: One way that a candidate may receive “more support” is to have more voters who support them show up to an election. Voting methods that do not satisfy this version of monotonicity are said to be susceptible to the no-show paradox (Fishburn and Brams 1983). Suppose that there are 3 candidates and 11 voters with the following rankings:
| # Voters | Ranking |
| 4 | \(A\s B\s C\) |
| 3 | \(B\s C\s A\) |
| 1 | \(C\s A\s B\) |
| 3 | \(C\s B\s A\) |
| Candidate \(C\) is the Plurality with Runoff winner | |
In the first round, candidates \(A\) and \(C\) are both ranked first by 4 voters while \(B\) is ranked first by only 3 voters. So, \(A\) and \(C\) move to the runoff round. In this round, the voters in the second group transfer their votes to candidate \(C\), so candidate \(C\) is the winner beating \(A\) 7-4. Suppose that 2 voters in the first group do not show up to the election:
| # Voters | Ranking |
| \(\mathbf{2}\) | \(A\s B\s C\) |
| 3 | \(B\s C\s A\) |
| 1 | \(C\s A\s B\) |
| 3 | \(C\s B\s A\) |
| Candidate \(B\) is the Plurality with Runoff winner | |
In this election, candidate \(A\) has the lowest plurality score in the first round, so candidates \(B\) and \(C\) move to the runoff round. The first group’s votes are transferred to \(B\), so \(B\) is the winner beating \(C\) 5-4. Since the 2 voters that did not show up to this election rank \(B\) above \(C\), they prefer the outcome of the second election in which they did not participate!
Plurality with Runoff is not the only voting method that is susceptible to the no-show paradox. The Coombs Rule, Hare Rule and Majority Judgement (using the tie-breaking mechanism from Balinski and Laraki 2010) are all susceptible to the no-show paradox. It turns out that always electing a Condorcet winner, if one exists, makes a voting method susceptible to the above failure of monotonicity.
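The runoff tallies in the two elections can be recomputed directly. The sketch below assumes rankings are strings from best to worst and that no ties arise at the runoff cutoff; the function `runoff` is a name introduced for this example and returns the tallies as well as the winner.

```python
def runoff(profile):
    """Plurality with Runoff on (count, ranking) pairs.

    Returns (winner, winner's runoff votes, loser's runoff votes).
    """
    first = {}
    for n, r in profile:
        first[r[0]] = first.get(r[0], 0) + n
    x, y = sorted(first, key=first.get, reverse=True)[:2]
    x_votes = sum(n for n, r in profile if r.index(x) < r.index(y))
    y_votes = sum(n for n, _ in profile) - x_votes
    return (x, x_votes, y_votes) if x_votes > y_votes else (y, y_votes, x_votes)

everyone = [(4, "ABC"), (3, "BCA"), (1, "CAB"), (3, "CBA")]
# Two of the four voters with the ranking A B C stay home.
with_no_shows = [(2, "ABC")] + everyone[1:]
print(runoff(everyone), runoff(with_no_shows))
```

With all 11 voters, \(C\) wins the runoff 7 to 4; with the two abstentions, \(B\) wins 5 to 4, which the abstaining voters prefer.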
Theorem (Moulin 1988).
If there are four or more candidates, then every Condorcet consistent voting method is susceptible to the no-show paradox.
See Perez 2001, Campbell and Kelly 2002, Jimeno et al. 2009, Duddy 2014, Brandt et al. 2017, 2019, and Nunez and Sanver 2017 for further discussions and generalizations of this result.
Multiple Districts Paradox: Suppose that a population is divided into districts. If a candidate wins each of the districts, one would expect that candidate to win the election over the entire population of voters (assuming that the two districts divide the set of voters into disjoint sets). This is certainly true for Plurality Rule: If a candidate is ranked first by the most voters in each of the districts, then that candidate will also be ranked first by the most voters over the entire population. Interestingly, this is not true for all voting methods (Fishburn and Brams 1983). The example below illustrates the paradox for Coombs Rule.
| | # Voters | Ranking |
| District 1 | 3 | \(A\s B\s C\) |
| | 3 | \(B\s C\s A\) |
| | 3 | \(C\s A\s B\) |
| | 1 | \(C\s B\s A\) |
| District 2 | 2 | \(A\s B\s C\) |
| | 3 | \(B\s A\s C\) |
| District 1: Candidate \(B\) is the Coombs winner | ||
| District 2: Candidate \(B\) is the Coombs winner | ||
Candidate \(B\) wins both districts:
District 1: There are a total of 10 voters in this district. None of the candidates are ranked first by 6 or more voters, so candidate \(A\), who is ranked last by 4 voters (compared to 3 voters ranking each of \(C\) and \(B\) last), is eliminated. In the second round, candidate \(B\) wins the election since 6 voters rank \(B\) above \(C\) and 4 voters rank \(C\) above \(B\).
District 2: There are a total of 5 voters in this district. Candidate \(B\) is ranked first by a strict majority of voters, so \(B\) wins the election.
Combining the two districts gives the following table:
| | # Voters | Ranking |
| Districts 1 + 2 | 5 | \(A\s B\s C\) |
| | 3 | \(B\s C\s A\) |
| | 3 | \(C\s A\s B\) |
| | 1 | \(C\s B\s A\) |
| | 3 | \(B\s A\s C\) |
| Candidate \(A\) is the Coombs winner | ||
There are 15 total voters in the combined districts. None of the candidates are ranked first by 8 or more of the voters. Candidate \(C\) receives the most last-place votes, so is eliminated in the first round. In the second round, candidate \(A\) beats candidate \(B\) by 1 vote (8 voters rank \(A\) above \(B\) and 7 voters rank \(B\) above \(A\)), and so is declared the winner. Thus, even though \(B\) wins both districts, candidate \(A\) wins the election when the districts are combined.
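The three Coombs computations above can be checked with the following sketch. It assumes the version of Coombs Rule used in the example (stop as soon as some candidate is ranked first by a strict majority, otherwise eliminate the candidate ranked last by the most voters); breaking elimination ties arbitrarily is an assumption that the example never triggers, and the function name is introduced here.

```python
def coombs(profile):
    """Coombs Rule on a list of (count, ranking string) pairs."""
    total = sum(n for n, _ in profile)
    remaining = set(profile[0][1])
    while True:
        first, last = {}, {}
        for n, r in profile:
            kept = [c for c in r if c in remaining]
            first[kept[0]] = first.get(kept[0], 0) + n
            last[kept[-1]] = last.get(kept[-1], 0) + n
        leader = max(first, key=first.get)
        if 2 * first[leader] > total:
            return leader
        # Eliminate the candidate with the most last-place rankings.
        remaining.discard(max(last, key=last.get))

district1 = [(3, "ABC"), (3, "BCA"), (3, "CAB"), (1, "CBA")]
district2 = [(2, "ABC"), (3, "BAC")]
print(coombs(district1), coombs(district2), coombs(district1 + district2))
```

Candidate \(B\) wins each district separately, while \(A\) wins once the two lists of ballots are concatenated.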
The other voting methods that are susceptible to the multiple-districts paradox include Plurality with Runoff, the Hare Rule, and Majority Judgement. Note that these methods are also susceptible to the no-show paradox. As is the case with the no-show paradox, every Condorcet consistent voting method is susceptible to the multiple districts paradox (see Zwicker, 2016, Proposition 2.5). I sketch the proof of this from Zwicker 2016 (pg. 40) since it adds to the discussion at the end of Section 3.1 about whether the Condorcet winner should be elected.
Suppose that \(V\) is a voting method that always selects the Condorcet winner (if one exists) and that \(V\) is not susceptible to the multiple-districts paradox. This means that if a candidate \(X\) is among the winners according to \(V\) in each of two districts, then \(X\) must be among the winners according to \(V\) in the combined districts. Consider the following two districts.
| | # Voters | Ranking |
| District 1 | 2 | \(A\s B\s C\) |
| | 2 | \(B\s C\s A\) |
| | 2 | \(C\s A\s B\) |
| District 2 | 1 | \(A\s B\s C\) |
| | 2 | \(B\s A\s C\) |
Note that in district 2 candidate \(B\) is the Condorcet winner, somust be the only winner according to \(V\). In district 1, there areno Condorcet winners. If candidate \(B\) is among the winnersaccording to \(V\), then, in order to not be susceptible to themultiple districts paradox, \(B\) must be among the winners in thecombined districts. In fact, since \(B\) is the only winner indistrict 2, \(B\) must be the only winner in the combined districts.However, in the combined districts, candidate \(A\) is the Condorcetwinner, so must be the (unique) winner according to \(V\). This is acontradiction, so \(B\) cannot be among the winners according to \(V\)in district 1. A similar argument shows that neither \(A\) nor \(C\)can be among the winners according to \(V\) in district 1 by swapping\(A\) and \(B\) in the first case and \(B\) with \(C\) in the secondcase in the rankings of the voters in district 2. Since \(V\) mustassign at least one winner to every profile, this is a contradiction;and so, \(V\) is susceptible to the multiple districts paradox.
One last comment about this paradox: It is an example of a more general phenomenon known as Simpson’s Paradox (Malinas and Bigelow 2009). See Saari (2001, Section 4.2) for a discussion of Simpson’s Paradox in the context of voting theory.
The paradox discussed in this section, first introduced by Brams, Kilgour and Zwicker (1998), has a somewhat different structure from the paradoxes discussed above. Voters are taking part in a referendum, where they are asked their opinion directly about various propositions (cf. the discussion of Quadratic Voting and Liquid Democracy in Section 2.3). So, voters must select either “yes” (Y) or “no” (N) for each proposition. Suppose that there are 13 voters who cast the following votes for three propositions (so voters can cast one of eight possible votes):
| # Voters | Propositions |
| 1 | YYY |
| 1 | YYN |
| 1 | YNY |
| 3 | YNN |
| 1 | NYY |
| 3 | NYN |
| 3 | NNY |
| 0 | NNN |
When the votes are tallied for each proposition separately, the outcome is N for each proposition (N wins 7–6 for all three propositions). Putting this information together, this means that NNN is the outcome of this election. However, there is no support for this outcome in this population of voters: not a single voter cast the ballot NNN. This raises an important question about what outcome reflects the group opinion: Viewing each proposition separately, there is clear support for N on each proposition; however, there is no support for the entire package of N for all propositions. Brams et al. (1998, pg. 234) nicely summarise the issue as follows:
The paradox does not just highlight problems of aggregation and packaging, however, but strikes at the core of social choice – both what it means and how to uncover it. In our view, the paradox shows there may be a clash between two different meanings of social choice, leaving unsettled the best way to uncover what this elusive quantity is.
See Scarsini 1998, Lacy and Niou 2000, Xia et al. 2007, and Lang and Xia 2009 for further discussion of this paradox.
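The proposition-by-proposition tally in this example is easy to reproduce. The Python sketch below is illustrative only; the name `propositionwise_outcome` and the dictionary encoding of the ballots are assumptions introduced here.

```python
from collections import Counter

# Number of voters casting each possible ballot on the three propositions.
ballots = Counter({"YYY": 1, "YYN": 1, "YNY": 1, "YNN": 3,
                   "NYY": 1, "NYN": 3, "NNY": 3, "NNN": 0})


def propositionwise_outcome(ballots):
    """Decide each proposition separately by simple majority."""
    outcome = ""
    for i in range(3):
        yes = sum(n for b, n in ballots.items() if b[i] == "Y")
        no = sum(n for b, n in ballots.items() if b[i] == "N")
        # With 13 voters there are no ties; a tie would default to "N" here.
        outcome += "Y" if yes > no else "N"
    return outcome


outcome = propositionwise_outcome(ballots)
print(outcome)           # NNN
print(ballots[outcome])  # 0 voters cast this combination
```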
A similar issue is raised by Anscombe’s paradox (Anscombe 1976), in which:
It is possible for a majority of voters to be on the losing side of a majority of issues.
This phenomenon is illustrated by the following example with five voters voting on three different issues (the voters either vote ‘yes’ or ‘no’ on the different issues).
| | Issue 1 | Issue 2 | Issue 3 |
| Voter 1 | yes | yes | no |
| Voter 2 | no | no | no |
| Voter 3 | no | yes | yes |
| Voter 4 | yes | no | yes |
| Voter 5 | yes | no | yes |
| Majority: | yes | no | yes |
However, a majority of the voters (voters 1, 2 and 3) do not support the majority outcome on a majority of the issues (note that voter 1 does not support the majority outcome on issues 2 and 3; voter 2 does not support the majority outcome on issues 1 and 3; and voter 3 does not support the majority outcome on issues 1 and 2)!
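As a quick check of this example, the following Python sketch computes the issue-wise majority outcome and counts, for each voter, the issues on which that voter is on the losing side. The variable and function names are assumptions made for this illustration.

```python
votes = {
    "Voter 1": ("yes", "yes", "no"),
    "Voter 2": ("no", "no", "no"),
    "Voter 3": ("no", "yes", "yes"),
    "Voter 4": ("yes", "no", "yes"),
    "Voter 5": ("yes", "no", "yes"),
}


def majority_outcome(votes):
    """Majority decision on each issue (five voters, so no ties arise)."""
    return tuple(
        "yes" if column.count("yes") > len(column) / 2 else "no"
        for column in zip(*votes.values())
    )


majority = majority_outcome(votes)
# Number of issues on which each voter disagrees with the majority outcome.
disagreements = {
    voter: sum(own != maj for own, maj in zip(ballot, majority))
    for voter, ballot in votes.items()
}
losers = [v for v, d in disagreements.items() if d > len(majority) / 2]

print(majority)  # ('yes', 'no', 'yes')
print(losers)    # ['Voter 1', 'Voter 2', 'Voter 3']
```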
The issue is more interesting when the voters do not vote directly on the issues, but on candidates that take positions on the different issues. Suppose there are two candidates \(A\) and \(B\) who take the following positions on the three issues:
| | Issue 1 | Issue 2 | Issue 3 |
| Candidate \(A\) | yes | no | yes |
| Candidate \(B\) | no | yes | no |
Candidate \(A\) takes the majority position, agreeing with a majority of the voters on each issue, and candidate \(B\) takes the opposite, minority position. Under the natural assumption that voters will vote for the candidate who agrees with their position on a majority of the issues, candidate \(B\) will win the election (each of the voters 1, 2 and 3 agrees with \(B\) on two of the three issues, so \(B\) wins the election 3–2)! This version of the paradox is known as Ostrogorski’s Paradox (Ostrogorski 1902). See Kelly 1989; Rae and Daudt 1976; Wagner 1983, 1984; and Saari 2001, Section 4.6, for analyses of this paradox, and Pigozzi 2005 for the relationship with the judgement aggregation literature (List 2013, Section 5).
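The candidate election in this example can be simulated directly. The sketch below is a hedged illustration that hard-codes the assumption stated above, namely that each voter supports the candidate who agrees with her on a majority of the issues; the names `voters`, `platforms` and `preferred_candidate` are introduced here for illustration.

```python
from collections import Counter

# Positions of the five voters on issues 1, 2 and 3.
voters = {
    1: ("yes", "yes", "no"),
    2: ("no", "no", "no"),
    3: ("no", "yes", "yes"),
    4: ("yes", "no", "yes"),
    5: ("yes", "no", "yes"),
}
platforms = {"A": ("yes", "no", "yes"), "B": ("no", "yes", "no")}


def preferred_candidate(ballot):
    """Candidate agreeing with the voter on the most issues.

    With three issues and two opposite platforms, agreement counts can
    never be equal, so the maximum is unique.
    """
    return max(
        platforms,
        key=lambda c: sum(own == pos for own, pos in zip(ballot, platforms[c])),
    )


tally = Counter(preferred_candidate(ballot) for ballot in voters.values())
print(tally["A"], tally["B"])  # 2 3
```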
In the discussion above, I have assumed that voters select ballots sincerely. That is, the voters are simply trying to communicate their opinions about the candidates under the constraints of the chosen voting method. However, in many contexts, it makes sense to assume that voters choose strategically. One need only look to recent U.S. elections to see concrete examples of strategic voting. The most often cited example is the 2000 U.S. election: Many voters who ranked third-party candidate Ralph Nader first voted for their second choice (typically Al Gore). A detailed overview of the literature on strategic voting is beyond the scope of this article (see Taylor 2005 and Section 3.3 of List 2013 for discussions and pointers to the relevant literature; also see Poundstone 2008 for an entertaining and informative discussion of the occurrence of this phenomenon in many actual elections). I will explain the main issues, focusing on specific voting rules.
There are two general types of manipulation that can be studied in the context of voting. The first is manipulation by a moderator or outside party that has the authority to set the agenda or select the voting method that will be used. So, the outcome of an election is not manipulated from within by unhappy voters, but, rather, it is controlled by an outside authority figure. To illustrate this type of control, consider a population with three voters whose rankings of four candidates are given in the table below:
| # Voters | Ranking |
| 1 | \(B\s D\s C\s A\) |
| 1 | \(A\s B\s D\s C\) |
| 1 | \(C\s A\s B\s D\) |
Note that everyone prefers candidate \(B\) over candidate \(D\). Nonetheless, a moderator can ask the right questions so that candidate \(D\) ends up being elected. The moderator proceeds as follows: First, ask the voters if they prefer candidate \(A\) or candidate \(B\). Since the voters prefer \(A\) to \(B\) by a margin of 2 to 1, the moderator declares that candidate \(B\) is no longer in the running. The moderator then asks voters to choose between candidate \(A\) and candidate \(C\). Candidate \(C\) wins this election 2–1, so candidate \(A\) is removed. Finally, in the last round the moderator asks voters to choose between candidates \(C\) and \(D\). Candidate \(D\) wins this election 2–1 and is declared the winner.
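The moderator's procedure is a sequence of pairwise majority votes in which the loser of each round is eliminated. A minimal Python sketch of this agenda, with function names chosen here purely for illustration, is:

```python
rankings = ["BDCA", "ABDC", "CABD"]  # one ranking per voter, best to worst


def pairwise_winner(x, y, rankings):
    """Majority winner between x and y (three voters, so no ties arise)."""
    x_votes = sum(r.index(x) < r.index(y) for r in rankings)
    return x if x_votes > len(rankings) - x_votes else y


def run_agenda(agenda, rankings):
    """Pit the surviving candidate against the next one on the agenda."""
    survivor = agenda[0]
    for challenger in agenda[1:]:
        survivor = pairwise_winner(survivor, challenger, rankings)
    return survivor


# Agenda from the text: A vs B, then the winner vs C, then the winner vs D.
print(run_agenda("ABCD", rankings))  # D

# Yet every voter ranks B above D.
print(all(r.index("B") < r.index("D") for r in rankings))  # True
```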
A second type of manipulation focuses on how the voters themselves can manipulate the outcome of an election by misrepresenting their preferences. Consider the following two election scenarios with 5 voters and 4 candidates:
| # Voters | Scenario 1 Ranking | Scenario 2 Ranking |
| 1 | \(C\s D\s B\s A\) | \(C\s D\s B\s A\) |
| 1 | \(B\s A\s C\s D\) | \(B\s A\s C\s D\) |
| 1 | \(A\s \bC\s \bB\s \bD\) | \(A\s \bB\s \bD\s \bC\) |
| 1 | \(A\s C\s D\s B\) | \(A\s C\s D\s B\) |
| 1 | \(D\s C\s A\s B\) | \(D\s C\s A\s B\) |
| Scenario 1: Candidate \(C\) is the Borda winner (\(\BS(A)=9, \BS(B)=5, \BS(C)=10\), and \(\BS(D)=6\)) | ||
| Scenario 2: Candidate \(A\) is the Borda winner (\(\BS(A)=9, \BS(B)=6, \BS(C)=8\), and \(\BS(D)=7\)) | ||
The only difference between the two election scenarios is that the third voter changed the ranking of the bottom three candidates. In election scenario 1, the third voter has candidate \(A\) ranked first, then \(C\) ranked second, \(B\) ranked third and \(D\) ranked last. In election scenario 2, this voter still has \(A\) ranked first, but ranks \(B\) second, \(D\) third and \(C\) last. In election scenario 1, candidate \(C\) is the Borda Count winner (the Borda scores are \(\BS(A)=9, \BS(B)=5, \BS(C)=10\), and \(\BS(D)=6\)). In election scenario 2, candidate \(A\) is the Borda Count winner (the Borda scores are \(\BS(A)=9, \BS(B)=6, \BS(C)=8\), and \(\BS(D)=7\)). According to her ranking in election scenario 1, this voter prefers the outcome in election scenario 2 (candidate \(A\), the Borda winner in election scenario 2, is ranked above candidate \(C\), the Borda winner in election scenario 1). So, if we assume that election scenario 1 represents the “true” preferences of the electorate, it is in the interest of the third voter to misrepresent her preferences as in election scenario 2. This is an instance of a general result known as the Gibbard-Satterthwaite Theorem (Gibbard 1973; Satterthwaite 1975): Under natural assumptions, there is no voting method that guarantees that voters will choose their ballots sincerely (for a precise statement of this theorem see Theorem 3.1.2 from Taylor 2005 or Section 3.3 of List 2013).
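The Borda scores in the two scenarios can be recomputed with a few lines of Python. This is a sketch under the usual convention that, with four candidates, a ranking awards 3, 2, 1 and 0 points from top to bottom; the function names are assumptions made here.

```python
def borda_scores(rankings):
    """Borda scores: a candidate in position i of m gets m - 1 - i points."""
    scores = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] += len(ranking) - 1 - position
    return scores


def borda_winners(rankings):
    scores = borda_scores(rankings)
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


sincere = ["CDBA", "BACD", "ACBD", "ACDB", "DCAB"]  # scenario 1
insincere = list(sincere)
insincere[2] = "ABDC"  # the third voter buries C (scenario 2)

print(borda_scores(sincere), borda_winners(sincere))
print(borda_scores(insincere), borda_winners(insincere))

# The third voter's sincere ranking places the new winner above the old one.
true_ranking = sincere[2]
print(true_ranking.index("A") < true_ranking.index("C"))  # True
```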
Much of the literature on voting theory (and, more generally, social choice theory) is focused on so-called axiomatic characterization results. The main goal is to characterize different voting methods in terms of abstract principles of collective decision making. See Pauly 2008 and Endriss 2011 for interesting discussions of axiomatic characterization results from a logician’s point-of-view.
Consult List 2013 and Gaertner 2006 for introductions to the vast literature on axiomatic characterizations in social choice theory. In this article, I focus on a few key axioms and results and how they relate to the voting methods and paradoxes discussed above. I start with three core principles.
Anonymity:
The names of the voters do not matter: If two voters swap their ballots, then the outcome of the election is unaffected.
Neutrality:
The names of the candidates, or alternatives, do not matter: If two candidates are exchanged in every ballot, then the outcome of the election changes accordingly.
Universal Domain:
There are no restrictions on the voter’s choice of ballots. In other words, no profile of ballots can be ignored by a voting method. One way to make this precise is to require that voting methods are total functions on the set of all profiles (recall that a profile is a sequence of ballots, one from each voter).
These properties ensure that the outcome of an election depends only on the voters’ ballots, with all the voters and candidates being treated equally. Other properties are intended to rule out some of the paradoxes and anomalies discussed above. In Section 4.1, there is an example of a situation in which a candidate is elected, even though all the voters prefer a different candidate. The next principle rules out such situations:
Unanimity (also called the Pareto Principle):
If candidate \(A\) is ranked above candidate \(B\) by all voters, then candidate \(B\) should not win the election.
These are natural properties to impose on any voting method. A surprising consequence of these properties is that they rule out another natural property that one may want to impose: Say that a voting method is resolute if the method always selects one winner (i.e., there are no ties). Suppose that \(V\) is a voting method that requires voters to rank the candidates and that there are at least 3 candidates and enough voters to form a Condorcet component (a profile generating a majority cycle with voters evenly distributed among the different rankings). First, consider the situation when there are exactly 3 candidates (in this case, we do not need to assume Unanimity). Divide the set of voters into three groups of size \(n\) and consider the Condorcet component:
| # Voters | Ranking |
| \(n\) | \(A\s B\s C\) |
| \(n\) | \(B\s C\s A\) |
| \(n\) | \(C\s A\s B\) |
By Universal Domain and resoluteness, \(V\) must select exactly one of \(A\), \(B\), or \(C\) as the winner. Assume that \(V\) selects \(A\) as the winner (the argument when \(V\) selects one of the other candidates is similar). Now, consider the profile in which every voter swaps candidates \(A\) and \(B\) in their rankings:
| # Voters | Ranking |
| \(n\) | \(B\s A\s C\) |
| \(n\) | \(A\s C\s B\) |
| \(n\) | \(C\s B\s A\) |
By Neutrality and Universal Domain, \(V\) must elect candidate \(B\) in this election scenario. Now, consider the profile in which every voter in the above election scenario swaps candidates \(B\) and \(C\):
| # Voters | Ranking |
| \(n\) | \(C\s A\s B\) |
| \(n\) | \(A\s B\s C\) |
| \(n\) | \(B\s C\s A\) |
By Neutrality and Universal Domain, \(V\) must elect candidate \(C\) in this election scenario. Notice that this last election scenario can be generated by permuting the voters in the first election scenario (to generate the last election scenario from the first election scenario, move the first group of voters to the 2nd position, the 2nd group of voters to the 3rd position and the 3rd group of voters to the first position). But this contradicts Anonymity, since Anonymity requires \(V\) to elect the same candidate in the first and third election scenarios. To extend this result to more than 3 candidates, consider a profile in which candidates \(A\), \(B\), and \(C\) are all ranked above any other candidate and the restriction to these three candidates forms a Condorcet component. If \(V\) satisfies Unanimity, then no candidate except \(A\), \(B\) or \(C\) can be elected. Then, the above argument shows that \(V\) cannot satisfy Resoluteness, Universal Domain, Neutrality, and Anonymity. That is, there are no Resolute voting methods that satisfy Universal Domain, Anonymity, Neutrality, and Unanimity for 3 or more candidates (note that I have assumed that the number of voters is a multiple of 3; see Moulin 1983 for the full proof).
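The two relabelling steps in this argument can be traced in code. The following Python sketch is an illustration, with the helper `swap` and the choice \(n = 1\) being assumptions made here: it applies the two swaps to the Condorcet component, tracks what Neutrality forces the winner to be, and confirms that the final profile contains exactly the same ballots as the first.

```python
from collections import Counter


def swap(text, x, y):
    """Exchange the candidates x and y wherever they occur in `text`."""
    return text.translate(str.maketrans(x + y, y + x))


n = 1  # voters per group; any n >= 1 gives the same picture
profile1 = ["ABC"] * n + ["BCA"] * n + ["CAB"] * n
profile2 = [swap(r, "A", "B") for r in profile1]
profile3 = [swap(r, "B", "C") for r in profile2]

# Suppose V elects A in profile1. Neutrality relabels the winner as well.
winner1 = "A"
winner2 = swap(winner1, "A", "B")
winner3 = swap(winner2, "B", "C")

print(profile2, winner2)  # ['BAC', 'ACB', 'CBA'] B
print(profile3, winner3)  # ['CAB', 'ABC', 'BCA'] C

# Same ballots as profile1, merely held by different voters.
print(Counter(profile1) == Counter(profile3))  # True
```

Anonymity demands the same winner in `profile1` and `profile3`, while Neutrality has forced the winner to move from \(A\) to \(C\); this is the contradiction described above.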
Section 3.2 discussed examples in which candidates end up losing an election as a result of more support from some of the voters. There are many ways to state properties that require a voting method to be monotonic. The following strong version (called Positive Responsiveness in the literature) is used to characterize majority rule when there are only two candidates:
Positive Responsiveness:
If candidate \(A\) is a winner or tied for the win and moves up in some of the voters’ rankings, then candidate \(A\) is the unique winner.
I can now state the first characterization result. Note that in all of the examples discussed above, it is crucial that there are three or more candidates (for example, stating Condorcet’s paradox requires there to be three or more candidates). When there are only two candidates, or alternatives, Majority Rule (choose the alternative ranked first by more than 50% of the voters) can be singled out as “best”:
Theorem (May 1952).
A voting method for choosing between two candidates satisfies Neutrality, Anonymity, Unanimity and Positive Responsiveness if and only if the method is majority rule.
See May 1952 for a precise statement of this theorem and Asan and Sanver 2002, Maskin 1995, and Woeginger 2003 for alternative characterizations of majority rule.
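One direction of this theorem can be checked by brute force on a small electorate. The Python sketch below is an illustration, not May's proof: it assumes three voters, allows each voter to vote for \(A\), vote for \(B\), or abstain, declares both candidates winners in case of a tie, and tests Anonymity, Neutrality and Positive Responsiveness on all 27 profiles. All names in the code are assumptions introduced here.

```python
from itertools import permutations, product

BALLOTS = ("A", "B", "abstain")
SWAP = {"A": "B", "B": "A", "abstain": "abstain"}
SUPPORT_FOR_A = {"B": 0, "abstain": 1, "A": 2}  # how favourable a ballot is to A


def majority_rule(profile):
    a, b = profile.count("A"), profile.count("B")
    if a > b:
        return {"A"}
    if b > a:
        return {"B"}
    return {"A", "B"}  # tie


def satisfies_axioms(rule, n_voters=3):
    for profile in product(BALLOTS, repeat=n_voters):
        winners = rule(profile)
        # Anonymity: reordering the voters never changes the outcome.
        if any(rule(p) != winners for p in permutations(profile)):
            return False
        # Neutrality: exchanging A and B on every ballot exchanges the outcome.
        swapped = tuple(SWAP[b] for b in profile)
        if rule(swapped) != {SWAP[w] for w in winners}:
            return False
        # Positive Responsiveness (stated for A; B follows by Neutrality).
        if "A" in winners:
            for i, better in product(range(n_voters), BALLOTS):
                if SUPPORT_FOR_A[better] > SUPPORT_FOR_A[profile[i]]:
                    improved = profile[:i] + (better,) + profile[i + 1:]
                    if rule(improved) != {"A"}:
                        return False
    return True


print(satisfies_axioms(majority_rule))     # True
print(satisfies_axioms(lambda p: {"A"}))   # False: a constant rule is not neutral
```

The check covers only the "if" direction on one small electorate; the uniqueness claim in the theorem requires the proof.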
A key assumption in the proof of May’s theorem and subsequent results is the restriction to voting on two alternatives. When there are only two alternatives, the definition of a ballot can be simplified since a ranking of two alternatives boils down to selecting the alternative that is ranked first. The above characterizations of Majority Rule work in a more general setting since they also allow voters to abstain (which is ambiguous between not voting and being indifferent between the alternatives). So, if the alternatives are \(\{A,B\}\), then there are three possible ballots: selecting \(A\), selecting \(B\), or abstaining (which is treated as selecting both \(A\) and \(B\)). A natural question is whether there are May-style characterization theorems for more than two alternatives. A crucial issue is that rankings of more than two alternatives are much more informative than selecting an alternative or abstaining. By restricting the information required from a voter to selecting one of the alternatives or abstaining, Goodin and List 2006 prove that the axioms used in May’s Theorem characterize Plurality Rule when there are more than two alternatives. They also show that a minor modification of the axioms characterizes Approval Voting when voters are allowed to select more than one alternative.
Note that focusing on voting methods that limit the information required from the voters to selecting one or more of the alternatives hides all the interesting phenomena discussed in the previous sections, such as the existence of a Condorcet paradox. Returning to the study of voting methods that require voters to rank the alternatives, the most important characterization result is Kenneth Arrow’s celebrated impossibility theorem (1963). Arrow showed that, when there are at least three candidates, there is no social welfare function (a social welfare function maps the voters’ rankings (possibly allowing ties) to a single social ranking) satisfying universal domain, unanimity, non-dictatorship (there is no voter \(d\) such that for all profiles, if \(d\) ranks \(A\) above \(B\) in the profile, then the social ordering ranks \(A\) above \(B\)) and the following key property:
Independence of Irrelevant Alternatives:
The social ranking (higher, lower, or indifferent) of two candidates \(A\) and \(B\) depends only on the relative rankings of \(A\) and \(B\) for each voter.
This means that if the voters’ rankings of two candidates \(A\) and \(B\) are the same in two different election scenarios, then the social rankings of \(A\) and \(B\) must be the same. This is a very strong property that has been extensively criticized (see Gaertner 2006 for pointers to the relevant literature, and Cato 2014 for a discussion of generalizations of this property). It is beyond the scope of this article to go into detail about the proof and the ramifications of Arrow’s theorem (see Morreau 2014 for this discussion), but I note that many of the voting methods we have discussed do not satisfy the above property. A striking example of a voting method that does not satisfy Independence of Irrelevant Alternatives is Borda Count. Consider the following two election scenarios:
| # Voters | Scenario 1 Ranking | Scenario 2 Ranking |
| 3 | \(A\s B\s C\s \bX\) | \(A\s B\s C\s \bX\) |
| 2 | \(B\s C\s A\s \bX\) | \(B\s C\s \bX\s A\) |
| 2 | \(C\s A\s B\s \bX\) | \(C\s \bX\s A\s B\) |
| Scenario 1: The Borda ranking is \(A >_{\Borda} B >_{\Borda} C >_{\Borda} X\) (\(\BS(A)=15\), \(\BS(B)=14\), \(\BS(C)=13\), and \(\BS(X)=0\)) | ||
| Scenario 2: The Borda ranking is \(C >_{\Borda} B >_{\Borda} A >_{\Borda} X\) (\(\BS(A)=11\), \(\BS(B)=12\), \(\BS(C)=13\), and \(\BS(X)=6\)) | ||
Notice that the relative rankings of candidates \(A\), \(B\) and \(C\) are the same in both election scenarios. In election scenario 2, only the position of candidate \(X\), who is uniformly ranked in last place in election scenario 1, is changed. The ranking according to the Borda scores of the candidates in election scenario 1 puts \(A\) first with 15 points, \(B\) second with 14 points, \(C\) third with 13 points, and \(X\) last with 0 points. In election scenario 2, the ranking of \(A\), \(B\) and \(C\) is reversed: Candidate \(C\) is first with 13 points; candidate \(B\) is second with 12 points; candidate \(A\) is third with 11 points; and candidate \(X\) is last with 6 points. So, even though the relative rankings of candidates \(A\), \(B\) and \(C\) do not differ in the two election scenarios, the position of candidate \(X\) in the voters’ rankings reverses the Borda rankings of these candidates.
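The reversal can be verified numerically. The Python sketch below recomputes the Borda scores for both scenarios and confirms that, once \(X\) is deleted from every ballot, the two profiles coincide. The function name and the profile encoding are assumptions made for this illustration.

```python
def borda(profile):
    """Return (social ranking, scores) under Borda Count.

    `profile` is a list of (number_of_voters, ranking) pairs.
    """
    scores = {}
    for count, ranking in profile:
        for position, c in enumerate(ranking):
            scores[c] = scores.get(c, 0) + count * (len(ranking) - 1 - position)
    # No two candidates are tied in these examples, so the order is unambiguous.
    return sorted(scores, key=scores.get, reverse=True), scores


scenario1 = [(3, "ABCX"), (2, "BCAX"), (2, "CABX")]
scenario2 = [(3, "ABCX"), (2, "BCXA"), (2, "CXAB")]

ranking1, scores1 = borda(scenario1)
ranking2, scores2 = borda(scenario2)
print(ranking1, scores1)
print(ranking2, scores2)

# Deleting X leaves identical rankings of A, B and C in the two scenarios.
without_x = lambda profile: [(n, r.replace("X", "")) for n, r in profile]
print(without_x(scenario1) == without_x(scenario2))  # True
```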
In Section 3.3, it was noted that a number of methods (including all Condorcet consistent methods) are susceptible to the multiple-districts paradox. An example of a method that is not susceptible to the multiple-districts paradox is Plurality Rule: If a candidate receives the most first place votes in two different districts, then that candidate must receive the most first place votes in the combined districts. More generally, no scoring rule is susceptible to the multiple-districts paradox. This property is called reinforcement:
Reinforcement:
Suppose that \(N_1\) and \(N_2\) are disjoint sets of voters facing the same set of candidates. Further, suppose that \(W_1\) is the set of winners for the population \(N_1\), and \(W_2\) is the set of winners for the population \(N_2\). If there is at least one candidate that wins both elections, then the set of winners for the entire population (including voters from both \(N_1\) and \(N_2\)) is the set of candidates that are in both \(W_1\) and \(W_2\) (i.e., the set of winners for the entire population is \(W_1\cap W_2\)).
The reinforcement property explicitly rules out the multiple-districts paradox (so, candidates that win all sub-elections are guaranteed to win the full election). In order to characterize all scoring rules, one additional technical property is needed:
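Reinforcement can be tested exhaustively on small electorates. The following Python sketch is an illustration under assumptions chosen here: three candidates, districts of one or two voters, and a scoring rule given by a weight vector such as \((2,1,0)\) for Borda Count or \((1,0,0)\) for Plurality Rule. It is a finite check, not a proof.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations

RANKINGS = ["".join(p) for p in permutations("ABC")]


def scoring_winners(profile, weights):
    """Winners under the scoring rule with the given position weights.

    `profile` is a Counter mapping rankings to numbers of voters.
    """
    scores = Counter()
    for ranking, count in profile.items():
        for position, c in enumerate(ranking):
            scores[c] += count * weights[position]
    best = max(scores.values())
    return {c for c, s in scores.items() if s == best}


def small_profiles(max_voters):
    """All non-empty anonymous profiles with at most `max_voters` voters."""
    for size in range(1, max_voters + 1):
        for ballots in combinations_with_replacement(RANKINGS, size):
            yield Counter(ballots)


def reinforcement_holds(weights, max_voters=2):
    profiles = list(small_profiles(max_voters))
    for p1 in profiles:
        for p2 in profiles:
            common = scoring_winners(p1, weights) & scoring_winners(p2, weights)
            # Adding two Counters merges the two districts.
            if common and scoring_winners(p1 + p2, weights) != common:
                return False
    return True


print(reinforcement_holds((2, 1, 0)))  # Borda Count: True
print(reinforcement_holds((1, 0, 0)))  # Plurality Rule: True
```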
Continuity:
Suppose that a group of voters \(N_1\) elects a candidate \(A\) and a disjoint group of voters \(N_2\) elects a different candidate \(B\). Then there must be some number \(m\) such that the population consisting of the subgroup \(N_2\) together with \(m\) copies of \(N_1\) will elect \(A\).
We then have:
Theorem (Young 1975).
Suppose that \(V\) is a voting method that requires voters to rank the candidates. Then, \(V\) satisfies Anonymity, Neutrality, Reinforcement and Continuity if and only if the method is a scoring rule.
See Merlin 2003 and Chebotarev and Shamis 1998 for surveys of other characterizations of scoring rules. Additional axioms single out Borda Count among all scoring methods (Young 1974; Gardenfors 1973; Nitzan and Rubinstein 1981). In fact, Saari has argued that “any fault or paradox admitted by Borda’s method also must be admitted by all other positional voting methods” (Saari 1989, pg. 454). For example, it is often remarked that Borda Count (and all scoring rules) can be easily manipulated by the voters. Saari (1995, Section 5.3.1) shows that among all scoring rules Borda Count is the least susceptible to manipulation (in the sense that it has the fewest profiles where a small percentage of voters can manipulate the outcome).
I have glossed over an important detail of Young’s characterization of scoring rules. Note that the reinforcement property refers to the behavior of a voting method on different populations of voters. To make this precise, the formal definition of a voting method must allow for domains that include profiles (i.e., sequences of ballots) of different lengths. To do this, it is convenient to assume that the domain of a voting method consists of anonymous profiles: Given a set of ballots \(\mathcal{B}\), an anonymous profile is a function \(\pi:\mathcal{B}\rightarrow\mathbb{N}\). Let \(\Pi\) be the set of all anonymous profiles. A variable domain voting method assigns a non-empty set of candidates to each anonymous profile; i.e., it is a function \(V:\Pi\rightarrow \wp(X)-\emptyset\). Of course, this builds the property of Anonymity into the definition of a voting method. For this reason, Young (1975) does not need to state Anonymity as a characterizing property of scoring rules.
Young’s axioms identify scoring rules out of the set of all functions defined from ballots that are rankings of candidates. In order to characterize the voting methods from Section 2.2, we need to change the set of ballots. For example, in order to characterize Approval Voting, the set of ballots \(\mathcal{B}\) is the set of non-empty subsets of the set of candidates, i.e., \(\mathcal{B}=\wp(X)-\emptyset\) (selecting the ballot \(X\) consisting of all candidates means that the voter abstains). Two additional axioms are needed to characterize Approval Voting:
Faithfulness:
If there is exactly one voter in the population, then the winners are the set of candidates chosen by that voter.
Cancellation:
If all candidates receive the same number of votes (i.e., they are elements of the same number of ballots) from the participating voters, then all candidates are winning.
We then have:
Theorem (Fishburn 1978b; Alos-Ferrer 2006).
A variable domain voting method where the ballots are non-empty sets of candidates is Approval Voting if and only if it satisfies Faithfulness, Cancellation, and Reinforcement.
Note that Approval Voting satisfies Neutrality even though it is not listed as one of the characterizing properties in the above theorem. This is because Alos-Ferrer (2006) showed that Neutrality is a consequence of Faithfulness, Cancellation and Reinforcement. See Fishburn 1978a and Baigent and Xu 1991 for alternative characterizations of Approval Voting, and Xu 2010 for a survey of the characterizations of Approval Voting (cf. the characterization of Approval Voting from Goodin and List 2006).
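To make the three characterizing properties concrete, the following Python sketch implements Approval Voting on ballots that are non-empty sets of candidates and illustrates each property on one small example. The function name and the example ballots are assumptions chosen here; single examples illustrate the axioms but do not establish them in general.

```python
from collections import Counter


def approval_winners(ballots, candidates):
    """Candidates approved on the largest number of ballots."""
    counts = Counter({c: 0 for c in candidates})
    for ballot in ballots:
        counts.update(ballot)  # one approval for each candidate on the ballot
    best = max(counts.values())
    return {c for c in candidates if counts[c] == best}


candidates = {"A", "B", "C"}

# Faithfulness: a single voter's ballot is the set of winners.
print(approval_winners([{"A", "C"}], candidates) == {"A", "C"})  # True

# Cancellation: every candidate is approved exactly once, so all win.
print(approval_winners([{"A", "B"}, {"C"}], candidates) == candidates)  # True

# Reinforcement: the combined winners are the common winners of the districts.
district1 = [{"A"}, {"A", "B"}]
district2 = [{"A", "C"}, {"C"}, {"A"}]
w1 = approval_winners(district1, candidates)
w2 = approval_winners(district2, candidates)
print(approval_winners(district1 + district2, candidates) == w1 & w2)  # True
```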
Myerson (1995) introduced a general framework for characterizing abstract scoring rules that include Borda Count and Approval Voting as examples. The key idea is to think of a ballot, called a signal or a vote, as a function from candidates to a set \(\mathcal{V}\), where \(\mathcal{V}\) is a set of numbers. That is, the set of ballots is a subset of \(\mathcal{V}^X\) (the set of functions from \(X\) to \(\mathcal{V}\)). Then, an anonymous profile of signals assigns a score to each candidate \(x\) in \(X\) by summing the numbers assigned to \(x\) by each voter. This allows us to define voting methods by specifying the set of ballots.
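One way to picture this framework is the following Python sketch, in which a signal is a dictionary from candidates to numbers and a single summation routine serves every method. The three encodings of ballots (plurality, Borda and approval signals) and all function names are illustrative assumptions made here, not Myerson's notation.

```python
def abstract_scoring_winners(signals, candidates):
    """Sum the numbers each signal assigns to a candidate; top scores win."""
    totals = {c: sum(signal[c] for signal in signals) for c in candidates}
    best = max(totals.values())
    return {c for c in candidates if totals[c] == best}


def plurality_signal(top, candidates):
    """1 for the voter's top candidate, 0 for everyone else."""
    return {c: int(c == top) for c in candidates}


def borda_signal(ranking):
    """m - 1 points for first place down to 0 for last place."""
    return {c: len(ranking) - 1 - i for i, c in enumerate(ranking)}


def approval_signal(approved, candidates):
    """1 for each approved candidate, 0 otherwise."""
    return {c: int(c in approved) for c in candidates}


candidates = "ABC"
rankings = ["ABC"] * 3 + ["BCA"] * 2 + ["CBA"] * 2

plurality = abstract_scoring_winners(
    [plurality_signal(r[0], candidates) for r in rankings], candidates)
borda = abstract_scoring_winners(
    [borda_signal(r) for r in rankings], candidates)
# Hypothetical approval behaviour: each voter approves her top two candidates.
approval = abstract_scoring_winners(
    [approval_signal(r[:2], candidates) for r in rankings], candidates)

print(plurality, borda, approval)  # {'A'} {'B'} {'B'}
```

Only the set of admissible signals changes between the three methods; the aggregation step is identical.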
Myerson (1995) showed that an abstract voting rule is an abstract scoring rule if and only if it satisfies Reinforcement, Universal Domain (i.e., it is defined for all anonymous profiles), a version of the Neutrality property (adapted to the more abstract setting), and the Continuity property, which is called Overwhelming Majority. Pivato (2013) generalizes this result, and Gaertner and Xu (2012) provide a related characterization result (using different properties). Pivato (2014) characterizes Formal Utilitarian and Range Voting within the class of abstract scoring rules, and Mace (2018) extends this approach to cover a wider class of grading voting methods (including Majority Judgement).
The voting methods discussed above have been judged on procedural grounds. This “proceduralist approach to collective decision making” is defined by Coleman and Ferejohn (1986, p. 7) as one that “identifies a set of ideals with which any collective decision-making procedure ought to comply. … [A] process of collective decision making would be more or less justifiable depending on the extent to which it satisfies them.” The authors add that a distinguishing feature of proceduralism is that “what justifies a [collective] decision-making procedure is strictly a necessary property of the procedure – one entailed by the definition of the procedure alone.” Indeed, the characterization theorems discussed in the previous section can be viewed as an implementation of this idea (cf. Riker 1982). The general view is to analyze voting methods in terms of “fairness criteria” that ensure that a given method is sensitive to all of the voters’ opinions in the right way.
However, one may not be interested only in whether a collective decision was arrived at “in the right way,” but in whether or not the collective decision is correct. This epistemic approach to voting is nicely explained by Joshua Cohen (1986, p. 34):
An epistemic interpretation of voting has three main elements: (1) an independent standard of correct decisions – that is, an account of justice or of the common good that is independent of current consensus and the outcome of votes; (2) a cognitive account of voting – that is, the view that voting expresses beliefs about what the correct policies are according to the independent standard, not personal preferences for policies; and (3) an account of decision making as a process of the adjustment of beliefs, adjustments that are undertaken in part in light of the evidence about the correct answer that is provided by the beliefs of others.
Under this interpretation of voting, a given method is judged on how well it “tracks the truth” of some objective fact (the truth of which is independent of the method being used). A comprehensive comparison of these two approaches to voting touches on a number of issues surrounding the justification of democracy (cf. Christiano 2008); however, I will not focus on these broader issues here. Instead, I briefly discuss an analysis of Majority Rule that takes this epistemic approach.
The most well-known analysis comes from the writings of Condorcet (1785). The following theorem, which is attributed to Condorcet and was first proved formally by Laplace, shows that if there are only two options, then majority rule is, in fact, the best procedure from an epistemic point of view. This is interesting because it also shows that a proceduralist analysis and an epistemic analysis both single out Majority Rule as the “best” voting method when there are only two candidates.
Assume that there are \(n\) voters that have to decide between two alternatives. Exactly one of these alternatives is (objectively) “correct” or “better.” The typical example here is a jury deciding whether or not a defendant is guilty. The two assumptions of the Condorcet jury theorem are:
Independence:
The voters’ opinions are probabilistically independent (so, the probability that two or more voters are correct is the product of the probabilities that each individual voter is correct).
Voter Competence:
The probability that a voter makes the correct decision is greater than 1/2 (and this probability is the same for all voters, though this is not crucial).
See Dietrich 2008 for a critical discussion of these two assumptions.The classic theorem is:
Condorcet Jury Theorem.
Suppose that Independence and Voter Competence are both satisfied. Then, as the group size increases, the probability that the majority chooses the correct option increases and converges to certainty.
See Nitzan 2010 (part III) and Dietrich and Spiekermann 2013 for modern expositions of this theorem, and Goodin and Spiekermann 2018 for implications for the theory of democracy.
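The content of the theorem can be illustrated numerically. Under Independence and a common competence \(p\), the number of correct voters is binomially distributed, so for odd \(n\) the probability that the majority is correct is \(\sum_{k > n/2} \binom{n}{k} p^k (1-p)^{n-k}\). The Python sketch below evaluates this sum; the function name and the choice \(p = 0.6\) are assumptions made for this example.

```python
from math import comb


def majority_correct_probability(n, p):
    """Probability that more than half of n independent voters are correct,
    when each voter is correct with probability p (n is assumed odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# With p = 0.6 a single voter is right with probability 0.6 and a
# three-member majority with probability 0.648; larger groups do better still.
for n in (1, 3, 11, 101, 501):
    print(n, majority_correct_probability(n, 0.6))
```

Running the loop shows the probability rising with the group size and approaching 1, as the theorem states. Replacing 0.6 by a value below 1/2 reverses the trend, which is why Voter Competence is essential.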
Condorcet envisioned that the above argument could be adapted tovoting situations with more than two alternatives. Young (1975, 1988, 1995) was the first to fully work out thisidea (cf. List and Goodin 2001 who generalize the Condorcet Jury Theorem to more than two alternatives in a different framework). He showed (among other things) that the Borda Count can beviewed as themaximum likelihood estimator for identifyingthebest candidate. Conitzer and Sandholm (2005), Conitzer etal. (2009), Xia et al. (2010), and Xia (2016) take these ideas furtherby classifying different voting methods according to whether or notthe methods can be viewed as amaximum likelihood estimator(for a noise model). The most general results along these lines can befound in Pivato 2013 which contains a series of results showing whenvoting methods can be interpreted as different kinds of statistical‘estimators’.
One of the most active and exciting areas of research that is focused,in part, on the study of voting methods and voting paradoxes iscomputational social choice. This is an interdisciplinaryresearch area that uses ideas and techniques from theoretical computerscience and artificial intelligence to provide new perspectives and toask new questions about methods for making group decisions; and to usevoting methods in computational domains, such as recommendationsystems, information retrieval, and crowdsourcing. It is beyond thescope of this article to survey this entire research area. Readers areencouraged to consult theHandbook of Computational SocialChoice (Brandt et al. 2016) for an overview of this field (cf.also Endriss 2017). In the remainder of this section, I brieflyhighlight some work from this research area related to issuesdiscussed in this article.
Section 4.1 discussed election scenarios in which voters choose theirballots strategically and briefly introduced the Gibbard-SatterthwaiteTheorem. This theorem shows that every voting method satisfyingnatural properties has profiles in which there is some voter, called amanipulator, that can achieve a better outcome by selecting aballot that misrepresents her preferences. Importantly, in order tosuccessfully manipulate an election, the manipulator must not onlyknow which voting method is being used but also how the other membersof society are voting. Although there is some debate about whethermanipulation in this sense is in fact a problem (Dowding and van Hees2008; Conitzer and Walsh, 2016, Section 6.2), there is interest inmechanisms that incentivize voters to report their“truthful” preferences. In a seminal paper, Bartholdi etal. (1989) argue that the complexity of computing which ballot willlead to a preferred outcome for the manipulator may provide a barrierto voting insincerely. See Faliszewski and Procaccia 2010, Faliszewskiet al. 2010, Walsh 2011, Brandt et al. 2013, and Conitzer and Walsh2016 for surveys of the literature on this and related questions, suchas the complexity of determining the winner given a voting methodand the complexity of determining which voter or voters should bebribed to change their vote to achieve a given outcome.
One of the most interesting lines of research in computational social choice is to use techniques and ideas from AI and theoretical computer science to design new voting methods. The main idea is to think of voting methods as solutions to an optimization problem. Consider the space of all rankings of the alternatives \(X\). Given a profile of rankings, the voting problem is to find an “optimal” group ranking (cf. the discussion of distance-based rationalizations of voting methods from Elkind et al. 2015). What counts as an “optimal” group ranking depends on assumptions about the type of the decision that the group is making. One assumption is that the voters have real-valued utilities for each candidate, but are only able to report rankings of the alternatives (it is assumed that the rankings represent the utility functions). The voting problem is to identify the candidates that maximize the (expected) social welfare (the average of the voters’ utilities), given the partial information about the voters’ utilities, i.e., the profile of rankings of the candidates. See Pivato 2015 for a discussion of this approach to voting and Boutilier et al. 2015 for algorithms that solve different versions of this problem. A second assumption is that there is an objectively correct ranking of the alternatives and the voters’ rankings are noisy estimates of this ground truth. This way of thinking about the voting problem was introduced by Condorcet and discussed in Section 4.3. Procaccia et al. (2016) import ideas from the theory of error-correcting codes to develop an interesting new approach to aggregate rankings viewed as noisy estimates of some ground truth.
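As one illustration of treating voting as an optimization problem, the sketch below searches the space of all rankings for the one that minimizes the total number of pairwise disagreements with the voters' ballots. Measuring "optimality" by this pairwise-disagreement (Kendall tau) distance is only one possible choice and is an assumption of the sketch; the function names and the example profile are hypothetical.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of candidate pairs that the two rankings order differently."""
    return sum(
        (r1.index(a) < r1.index(b)) != (r2.index(a) < r2.index(b))
        for a, b in combinations(r1, 2)
    )

def closest_ranking(profile):
    """Brute-force search for a ranking minimizing the summed distance to all ballots.
    If several rankings are equally close, the lexicographically first is returned."""
    candidates = sorted(profile[0])
    best = min(
        permutations(candidates),
        key=lambda r: sum(kendall_tau(r, ballot) for ballot in profile),
    )
    cost = sum(kendall_tau(best, ballot) for ballot in profile)
    return best, cost

# Hypothetical profile of three voters over the alternatives A, B and C.
profile = [("A", "B", "C"), ("A", "B", "C"), ("B", "C", "A")]
print(closest_ranking(profile))  # (('A', 'B', 'C'), 2)
```

Because the search enumerates every ranking of the alternatives, it is practical only for a handful of candidates; different distance measures or different assumptions about what the ballots represent would lead to different optimization problems.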
As with any mathematical analysis of social phenomena, questions abound about the “real-life” implications of the theoretical analysis of the voting methods given above. The main question is whether the voting paradoxes are simply features of the formal framework used to represent an election scenario or formalizations of real-life phenomena. This raises a number of subtle issues about the scope of mathematical modeling in the social sciences, many of which fall outside the scope of this article. I conclude with a brief discussion of two questions that shed some light on how one should interpret the above analysis.
How likely is a Condorcet Paradox or any of the other voting paradoxes? There are two ways to approach this question. The first is to calculate the probability that a majority cycle will occur in an election scenario. There is a sizable literature devoted to analytically deriving the probability of a majority cycle occurring in election scenarios of varying sizes (see Gehrlein 2006 and Regenwetter et al. 2006 for overviews of this literature). The calculations depend on assumptions about the distribution of rankings among the voters. One distribution that is typically used is the so-called impartial culture, where each ranking is possible and occurs with equal probability. For example, if there are three candidates, and it is assumed that the voters’ ballots are rankings of the candidates, then each possible ranking can occur with probability 1/6. Under this assumption, the probability of a majority cycle occurring has been calculated (see Gehrlein 2006 for details). Riker (1982, p. 122) has a table of the relevant calculations. Two observations about this data: First, as the number of candidates and voters increases, the probability of a majority cycle increases to certainty. Second, for a fixed number of candidates, the probability of a majority cycle still increases, though not necessarily to certainty (the number of voters is the independent variable here). For example, if there are five candidates and seven voters, then the probability of a majority cycle is 21.5 percent. This probability increases to 25.1 percent as the number of voters increases to infinity (keeping the number of candidates fixed) and to 100 percent as the number of candidates increases to infinity (keeping the number of voters fixed). Prima facie, this result suggests that we should expect to see instances of the Condorcet and related paradoxes in large elections. Of course, this interpretation takes it for granted that the impartial culture is a realistic assumption.
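For very small elections, the impartial culture calculation can be carried out by direct enumeration. The following is a minimal sketch under that assumption: every profile of strict rankings is treated as equally likely, and the function counts the profiles in which no candidate beats every other candidate in a strict majority of ballots. With three candidates and an odd number of voters, the absence of such a Condorcet winner coincides with a majority cycle. The function names are hypothetical, and the sketch is not the analytical method used in the literature cited above.

```python
from fractions import Fraction
from itertools import permutations, product

def has_condorcet_winner(profile, candidates):
    """True if some candidate is ranked above each rival on a strict majority of ballots."""
    n = len(profile)
    for c in candidates:
        if all(
            sum(r.index(c) < r.index(d) for r in profile) * 2 > n
            for d in candidates
            if d != c
        ):
            return True
    return False

def prob_no_condorcet_winner(num_candidates, num_voters):
    """Exact probability, under the impartial culture, that a profile has no
    Condorcet winner. Enumerates all (m!)^n profiles, so only tiny cases are feasible."""
    candidates = tuple(range(num_candidates))
    rankings = list(permutations(candidates))
    total = 0
    without = 0
    for profile in product(rankings, repeat=num_voters):
        total += 1
        if not has_condorcet_winner(profile, candidates):
            without += 1
    return Fraction(without, total)

print(prob_no_condorcet_winner(3, 3))  # 1/18, i.e., 12 of the 216 profiles
```

Enumeration of this kind does not scale: with five candidates and seven voters there are \(120^7\) (roughly \(3.6 \times 10^{14}\)) profiles, so figures for larger elections have to come from analytical derivations or from random sampling of profiles.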
Many authors have noted that the impartial culture is a significant idealization that almost certainly does not occur in real-life elections. Tsetlin et al. (2003) go even further, arguing that the impartial culture is a worst-case scenario in the sense that any deviation results in lower probabilities of a majority cycle (see Regenwetter et al. 2006 for a complete discussion of this issue, and List and Goodin 2001, Appendix 3, for a related result).
A second way to argue that the above theoretical observations are robust is to find supporting empirical evidence. For instance, is there evidence that majority cycles have occurred in actual elections? While Riker (1982) offers a number of intriguing examples, the most comprehensive analysis of the empirical evidence for majority cycles is provided by Mackie (2003, especially Chapters 14 and 15). The conclusion is that, in striking contrast to the probabilistic analysis referenced above, majority cycles typically have not occurred in actual elections. However, this literature has not reached a consensus about this issue (cf. Riker 1982): The problem is that the available data typically does not include voters’ opinions about all pairwise comparisons of candidates, which is needed to determine if there is a majority cycle. So, this information must be inferred (for example, by using statistical methods) from the given data.
A related line of research focuses on the influence of factors such as polls (Reijngoud and Endriss 2012), social networks (Santoro and Beck 2017, Stirling 2016) and deliberation among the voters (List 2018) on the profiles of ballots that are actually realized in an election. For instance, List et al. (2013) present evidence suggesting that deliberation reduces the probability of a Condorcet cycle occurring.
How do the different voting methods compare in actual elections? In this article, I have analyzed voting methods under highly idealized assumptions. But, in the end, we are interested in a very practical question: Which method should a group adopt? Of course, any answer to this question will depend on many factors that go beyond the abstract analysis given above (cf. Edelman 2012a). An interesting line of research focuses on incorporating empirical evidence into the general theory of voting. Evidence can come in the form of a computer simulation, a detailed analysis of a particular voting method in real-life elections (for example, see Brams 2008, Chapter 1, which analyzes Approval voting in practice), or as in situ experiments in which voters are asked to fill in additional ballots during an actual election (Laslier 2010, 2011).
The most striking results can be found in the work of Michael Regenwetter and his colleagues. They have analyzed datasets from a variety of elections, showing that many of the usual voting methods that are considered irreconcilable (e.g., Plurality Rule, Borda Count and the Condorcet consistent methods from Section 3.1.1) are, in fact, in perfect agreement. This suggests that the “theoretical literature may promote overly pessimistic views about the likelihood of consensus among consensus methods” (Regenwetter et al. 2009, p. 840). See Regenwetter et al. 2006 for an introduction to the methods used in these analyses and Regenwetter et al. 2009 for the current state-of-the-art.
My objective in this article has been to introduce different voting methods and to highlight key results and issues that facilitate comparisons between the voting methods. To dive more into the details of the topics introduced in this article, see Saari 2001, 2008, Nurmi 1998, Brams and Fishburn 2002, Zwicker 2012, and the collection of articles in Felsenthal and Machover 2012. Some important topics related to the study of voting methods not discussed in this article include:
Finally, consult List 2013 and Morreau 2014 for a discussion of broader issues in the theory of social choice.
Arrow’s theorem | democracy | preferences | social choice theory | voting
I would like to thank Ulle Endriss, Wes Holliday, Christian List, Uri Nodelman, Rohit Parikh, Edward Zalta and two anonymous referees for many valuable comments that greatly improved the readability and content of this article. This first version of the article was written while the author was generously supported by an NWO Vidi grant 016.094.345.
The Stanford Encyclopedia of Philosophy is copyright © 2025 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054