
Diagram of a binary classifier separating a set of samples into positive and negative values. The elements in the green area on the right are those classified as positive matches for the tested condition, while those in the pink area on the left were classified as negative matches. The red crosses within the green area represent false positives (negative samples that were classified as positive). Conversely, the green circles within the pink area represent false negatives (positive samples that were classified as negative).
A false positive is an error in binary classification in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present. These are the two kinds of errors in a binary test, in contrast to the two kinds of correct result (a true positive and a true negative). They are also known in medicine as a false positive (or false negative) diagnosis, and in statistical classification as a false positive (or false negative) error.[1]
In statistical hypothesis testing, the analogous concepts are known as type I and type II errors, where a positive result corresponds to rejecting the null hypothesis, and a negative result corresponds to not rejecting the null hypothesis. The terms are often used interchangeably, but there are differences in detail and interpretation due to the differences between medical testing and statistical hypothesis testing.
A false positive error, or false positive, is a result that indicates a given condition exists when it objectively does not. Examples include a pregnancy test that indicates a woman is pregnant when she is not, or the conviction of an innocent person.
A false positive error is a type I error where the test is checking a single condition, and wrongly gives an affirmative (positive) decision. However, it is important to distinguish between the type I error rate and the probability of a positive result being false. The latter is known as the false positive risk (see Ambiguity in the definition of false positive rate, below).[2]
A false negative error, or false negative, is a test result which wrongly indicates that a condition does not hold. For example, when a pregnancy test indicates a woman is not pregnant, but she is, or when a person guilty of a crime is acquitted, these are false negatives. The condition "the woman is pregnant" or "the person is guilty" holds, but the test (the pregnancy test or the trial in a court of law) fails to detect this condition and wrongly decides that the woman is not pregnant or the person is not guilty.
A false negative error is a type II error occurring in a test where a single condition is checked for, and the test erroneously indicates that the condition is absent.[3]
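To make the four outcomes concrete, here is a minimal sketch (with hypothetical data) that tallies true positives, true negatives, false positives, and false negatives from a list of actual conditions and the corresponding test decisions:

```python
# A minimal sketch (hypothetical data): tallying the four possible outcomes
# of a binary test against the actual condition of each sample.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = condition present, 0 = condition absent
predicted = [1, 1, 1, 0, 0, 0, 0, 1]  # the test's decisions

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives (type I)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives (type II)

print(tp, tn, fp, fn)  # 3 3 1 1
```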
The false positive rate (FPR) is the proportion of all negatives that still yield positive test outcomes, i.e., the conditional probability of a positive test result given that the condition being tested for is not present.
The false positive rate is equal to the significance level. The specificity of the test is equal to 1 minus the false positive rate.
In statistical hypothesis testing, this fraction is given the Greek letter α, and 1 − α is defined as the specificity of the test. Increasing the specificity of the test lowers the probability of type I errors, but may raise the probability of type II errors (false negatives that reject the alternative hypothesis when it is true).[a]
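As a small illustration of these relationships, the following sketch (with assumed counts for 100 condition-negative samples) computes the false positive rate and the specificity:

```python
# A minimal sketch with assumed counts: the false positive rate (alpha) and
# specificity for 100 samples in which the condition is actually absent.
tn, fp = 95, 5  # true negatives and false positives (hypothetical counts)

false_positive_rate = fp / (fp + tn)   # alpha = 5 / 100 = 0.05
specificity = 1 - false_positive_rate  # equivalently tn / (tn + fp) = 0.95
```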
Complementarily, the false negative rate (FNR) is the proportion of positives which yield negative test outcomes with the test, i.e., the conditional probability of a negative test result given that the condition being looked for is present.
In statistical hypothesis testing, this fraction is given the letter β. The "power" (or the "sensitivity") of the test is equal to 1 − β.
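The complementary quantities can be sketched the same way, here with assumed counts for 100 condition-positive samples:

```python
# A minimal sketch with assumed counts: the false negative rate (beta) and
# power for 100 samples in which the condition is actually present.
tp, fn = 90, 10  # true positives and false negatives (hypothetical counts)

false_negative_rate = fn / (fn + tp)  # beta = 10 / 100 = 0.10
power = 1 - false_negative_rate       # sensitivity = tp / (tp + fn) = 0.90
```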
The term false discovery rate (FDR) was used by Colquhoun (2014)[4] to mean the probability that a "significant" result was a false positive. Later, Colquhoun (2017)[2] used the term false positive risk (FPR) for the same quantity, to avoid confusion with the term FDR as used by people who work on multiple comparisons. Corrections for multiple comparisons aim only to correct the type I error rate, so the result is a (corrected) p-value. Thus they are susceptible to the same misinterpretation as any other p-value. The false positive risk is always higher, often much higher, than the p-value.[4][2]
Confusion of these two ideas, the error of the transposed conditional, has caused much mischief.[5] Because of the ambiguity of notation in this field, it is essential to look at the definition in every paper. The hazards of reliance on p-values were emphasized in Colquhoun (2017)[2] by pointing out that even an observation of p = 0.001 is not necessarily strong evidence against the null hypothesis. Although the likelihood ratio in favor of the alternative hypothesis over the null is close to 100, if the hypothesis was implausible, with a prior probability of a real effect being 0.1, even the observation of p = 0.001 would give a false positive risk of 8 percent. It would not even reach the 5 percent level. As a consequence, it has been recommended[2][6] that every p-value should be accompanied by the prior probability of there being a real effect that it would be necessary to assume in order to achieve a false positive risk of 5%. For example, if we observe p = 0.05 in a single experiment, we would have to be 87% certain that there was a real effect before the experiment was done to achieve a false positive risk of 5%.
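The 8 percent figure above can be reconstructed with simple Bayesian arithmetic. This is a hedged sketch, assuming (as the text states) a prior probability of a real effect of 0.1 and a likelihood ratio of about 100 in favor of the alternative at p = 0.001:

```python
# Reconstruction of the false positive risk calculation; the prior and the
# likelihood ratio are taken from the worked example in the text above.
prior = 0.1              # assumed prior probability of a real effect
likelihood_ratio = 100   # approximate LR for the alternative at p = 0.001

prior_odds = prior / (1 - prior)                # 1 : 9 odds of a real effect
posterior_odds = prior_odds * likelihood_ratio  # about 11.1 : 1 after the data
false_positive_risk = 1 / (1 + posterior_odds)

print(f"{false_positive_risk:.3f}")  # 0.083, i.e. roughly 8 percent
```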
The article "Receiver operating characteristic" discusses parameters in statistical signal processing based on ratios of errors of various types.