On the Statistical Significance of the 2000 Presidential Election

Bill Majoros
November 8, 2000


Abstract

We briefly consider the statistical implications of both the popular vote and the widely publicized recount in Florida in the presidential election of 2000.  By comparing both results to a null model in which votes are cast uniformly at random, it is suggested that the popular vote exhibits a clear bias, apparently reflecting the overall preference of the American voters, whereas the Florida vote, which will now decide the election, is not significantly different from the predictions of a random process.  Modifications to the current electoral system are suggested which take these considerations into account.
 
 

Introduction

It has been suggested by some that the issues at stake in the 2000 presidential election were simply not of enough importance to enough people to make for an overwhelmingly decisive victory for either of the two dominant parties.  Certainly, many individual voters held strong convictions regarding the superiority of one candidate over the other, but if we view the voting population as a single complex system, albeit one composed of millions of individual parts, it is not unreasonable to ask whether that system as a whole has a preference for either candidate.  That is, is the system biased in its composition of voters, or can it be modeled as a system which simply generates a random stream of Republican and Democratic voters (and thus votes), with equal probability?

In order to assess this question, we can consider what kinds of voting statistics would be generated by such a random model, and then ascertain whether the observed patterns in the 2000 presidential election differ significantly from the behavior of the random model.
 

Modeling Randomly Cast Votes

Consider a simple system of autonomous agents, each of which casts a vote for one of two candidates by selecting one or the other at random with fixed probability.  If we arbitrarily designate one of the candidates as A and the other as B, and we assume the (fixed) probabilities of a vote being cast for these candidates are PA and PB, respectively (with PA + PB = 1), then the number of votes received by candidate A in such an election follows a binomial distribution.  In the binomial distribution, the probability function f(X), which denotes the probability of candidate A obtaining exactly X votes, is defined by

f(X) = binomial(N,X) * PA^X * PB^(N-X)

where binomial(N,X) is the binomial coefficient defined by N!/(X! * (N-X)!) and N is the number of voters.
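
As a small illustration, the probability function can be evaluated directly for a toy electorate.  The sketch below is in Python; the function name binomial_pmf and the ten-voter example are assumptions made here for illustration and are not part of the original analysis.  Direct evaluation of this kind is only practical for small N; for election-sized N the powers underflow, which is one reason the later sections work with the mean and standard deviation instead.

```python
import math

def binomial_pmf(n: int, x: int, p_a: float) -> float:
    """Probability that candidate A receives exactly x of n votes,
    when each vote goes to A with probability p_a and to B otherwise."""
    p_b = 1.0 - p_a
    # binomial(N,X) * PA^X * PB^(N-X)
    return math.comb(n, x) * p_a**x * p_b**(n - x)

# Toy example (hypothetical): ten voters, PA = PB = 0.5.
probs = [binomial_pmf(10, x, 0.5) for x in range(11)]
print(probs[5])    # probability of an exact 5-5 split
print(sum(probs))  # probabilities over all possible outcomes
```

The probabilities over X = 0..N should sum to 1, which gives a quick sanity check on the formula.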

Note that this formulation is statistically equivalent to one in which the system consists of a randomly-generated population of deterministic voters, each of which reliably votes for one party or the other with 100% probability.

As is well known, the mean of a binomial distribution is M=N*PA and the variance is V=N*PA*PB.  The standard deviation, which is defined as the positive square root of the variance, is S=sqrt(N*PA*PB), which reduces to sqrt(N)*P if PA=PB=P.  Thus, if we were to hold many such elections (holding the parameters fixed), we would expect the number of votes for candidate A to cluster around M, with deviations from M generally not exceeding 2*S in absolute magnitude.

In fact, the empirical rule tells us that for any mound-shaped distribution (such as the binomial distribution), we can expect roughly 95% of all observations to fall within the interval [M-2*S,M+2*S].  Thus, any observation outside this interval suggests that the system under study does not follow a binomial distribution with the given parameters, so a sufficiently large deviation from the mean would indicate that the real system was not simply casting votes uniformly at random.
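
The interval [M-2*S,M+2*S] and its coverage can be checked numerically for a small electorate.  The following Python sketch is an illustration only: the function names and the choice of N=100 are assumptions introduced here, and the coverage is computed by summing the exact binomial probabilities rather than by invoking the empirical rule.

```python
import math

def two_sigma_interval(n: int, p_a: float = 0.5):
    """Return (M - 2*S, M + 2*S) for a binomial model with n voters."""
    mean = n * p_a                           # M = N*PA
    sd = math.sqrt(n * p_a * (1.0 - p_a))    # S = sqrt(N*PA*PB)
    return mean - 2 * sd, mean + 2 * sd

def coverage(n: int, p_a: float = 0.5) -> float:
    """Exact probability that A's vote count falls inside the interval."""
    lo, hi = two_sigma_interval(n, p_a)
    p_b = 1.0 - p_a
    return sum(
        math.comb(n, x) * p_a**x * p_b**(n - x)
        for x in range(n + 1)
        if lo <= x <= hi
    )

# Hypothetical electorate of 100 voters with PA = PB = 0.5.
print(two_sigma_interval(100))
print(coverage(100))
```

For this toy case the exact coverage comes out close to the "roughly 95%" quoted by the empirical rule, though not identical to it, since the binomial distribution is discrete.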
 

Testing for Randomness in the Actual Results

The initial counts from the 2000 presidential election indicated that the Democratic candidate had won the national vote by approximately 200,000 votes, whereas the Republican candidate had won the Florida vote by approximately 1,700 votes.  Because control of Florida in the electoral college has turned out to be the deciding factor in the election, the numerical results of the Florida count have entirely eclipsed those of the national count.  It is therefore of great interest to see whether either of these counts achieves statistical significance.

For the national vote, with approximately 97,759,658 ballots divided between the two dominant parties, the appropriate binomial distribution would have a mean of N*P = 97,759,658 * 0.5 = 48,879,829 (assuming PA=PB=P) and a standard deviation of sqrt(N)*P = 9,887 * 0.5 = 4,943.  The actual results, 48,783,510 Republican and 48,976,148 Democratic, deviate from the mean by 96,319.  This deviation is more than 19 times as large as the standard deviation, far outside the interval [M-2*S,M+2*S], which makes this a highly statistically significant result and leads us to reject the null (random) model.

For the Florida vote, with 5,816,744 ballots divided between the two dominant parties, the appropriate binomial distribution would have a mean of N*P = 5,816,744 * 0.5 = 2,908,372 and a standard deviation of sqrt(N)*P = 2,412 * 0.5 = 1,206.  The actual results, 2,909,260 Republican and 2,907,484 Democratic, deviate from the mean by 888 (producing a margin of 1,776), thereby falling well within one standard deviation of the mean and offering no objective evidence for rejecting the null model.
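
The arithmetic for both counts can be reproduced with a few lines of Python.  This is a sketch under the same assumption used above (PA=PB=0.5); the function names are introduced here for illustration.  The two-sided p-value is an addition not found in the original analysis: it uses the normal approximation to the binomial, which is reasonable at these sample sizes, and is offered only as a complementary way of reading the same deviations.

```python
import math

def z_score(votes_a: int, votes_b: int) -> float:
    """Deviation of A's count from the mean, in standard deviations,
    under the null model PA = PB = 0.5."""
    n = votes_a + votes_b
    mean = n * 0.5              # M = N*P
    sd = math.sqrt(n) * 0.5     # S = sqrt(N)*P
    return (votes_a - mean) / sd

def two_sided_p(z: float) -> float:
    """Two-sided p-value from the normal approximation (an added assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Counts as quoted in the text, leading candidate listed first.
national = z_score(48_976_148, 48_783_510)   # Democratic, Republican
florida = z_score(2_909_260, 2_907_484)      # Republican, Democratic

print("national:", national, two_sided_p(national))
print("florida: ", florida, two_sided_p(florida))
```

The national deviation lands above 19 standard deviations and the Florida deviation below one, in agreement with the figures computed by hand.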

Interpreting the Results

Because the national count clearly differed from the predictions of a random model and the Florida count did not, a possible interpretation is that the results of the national vote reflect an actual bias in the population, whereas the Florida results were merely produced by a random statistical fluctuation.  That is, the national count may be taken to be a clear indication of the preference of the system for one candidate over the other, whereas the Florida vote is merely a spurious result produced by a component of that system which is entirely indifferent to the outcome of the election (despite possibly strong convictions of individual voters).

Given this interpretation, it would be rather alarming if the candidate favored by the national vote were defeated on the basis of the Florida count.  It appears that this is precisely what will occur.  The small numerical advantage of the one candidate (a deviation of 888 votes from the mean, or a margin of 1,776) could easily have been reversed through the many random contingencies involved in deciding not only how votes are cast but also how many and which people even cast their votes (e.g., the existence of traffic congestion, an overabundance of voters shortly before a polling station closes, etc.).  It would therefore seem that selecting a candidate based on such a close count, given the clear preference of the overall population for the competing candidate, would be highly undesirable behavior for any electoral system.

An alternate electoral system might account for these considerations by rendering inadmissible counts from any state which did not differ significantly from random, or more simply by selecting candidates directly through the popular vote.  In the case of a close count in the popular vote, there does not seem to be any principled alternative to random selection, so there would presumably be no harm in leaving the matter to the whim of statistical fluctuations.
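
The first proposal, rendering inadmissible any count that does not differ significantly from random, can be sketched as a simple filter.  The Python below is one possible reading only: the function name is_admissible and the threshold of two standard deviations (borrowed from the interval [M-2*S,M+2*S] used earlier) are assumptions, and the second example is a hypothetical state invented purely for illustration.

```python
import math

def is_admissible(votes_a: int, votes_b: int, k: float = 2.0) -> bool:
    """True if the count deviates from an even split by more than
    k standard deviations under the null model PA = PB = 0.5."""
    n = votes_a + votes_b
    sd = math.sqrt(n) * 0.5              # S = sqrt(N)*P
    deviation = abs(votes_a - n * 0.5)   # |X - M|
    return deviation > k * sd

# Florida counts as quoted in the text (Republican, Democratic).
print(is_admissible(2_909_260, 2_907_484))

# Hypothetical lopsided state, invented for illustration only.
print(is_admissible(600_000, 400_000))
```

Under this reading the Florida count would be set aside, while any clearly lopsided count would be retained.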
 
 
 