An F-test is a statistical test that compares variances. It is used to determine whether the variances of two samples, or the ratios of variances among multiple samples, are significantly different. The test calculates a statistic, represented by the random variable F, and checks whether it follows an F-distribution. This check is valid if the null hypothesis is true and the standard assumptions about the errors (ε) in the data hold.[1]
F-tests are frequently used to compare different statistical models and find the one that best describes the population the data came from. When models are created using the least squares method, the resulting F-tests are often called "exact" F-tests. The F-statistic was developed by Ronald Fisher in the 1920s as the variance ratio and was later named in his honor by George W. Snedecor.[2]
Common examples of the use of F-tests include the study of the following cases:
The F-test is sensitive to non-normality.[3][4] In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests is conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance) as a preliminary step to testing for mean effects, the experiment-wise Type I error rate increases.[5]
Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled χ²-distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
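For the one-way layout, the decomposition described above can be written out explicitly. The notation here is an assumption chosen for illustration: $Y_{ij}$ is the $j$-th observation in group $i$, $n_i$ is the size of group $i$, $K$ is the number of groups, $\bar{Y}_{i\cdot}$ is the mean of group $i$, and $\bar{Y}$ is the overall mean. The total sum of squares then splits into a between-group part and a within-group part:

$$\sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}\right)^2 = \sum_{i=1}^{K} n_i\left(\bar{Y}_{i\cdot}-\bar{Y}\right)^2 + \sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_{i\cdot}\right)^2$$

The cross term vanishes because the deviations $Y_{ij}-\bar{Y}_{i\cdot}$ sum to zero within each group.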
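The formula for the one-way ANOVA F-test statistic is
$$F = \frac{\text{explained variance}}{\text{unexplained variance}},$$
or
$$F = \frac{\text{between-group variability}}{\text{within-group variability}}.$$
The "explained variance", or "between-group variability" is
$$\sum_{i=1}^{K} n_i\left(\bar{Y}_{i\cdot}-\bar{Y}\right)^2 / (K-1),$$
where $\bar{Y}_{i\cdot}$ denotes the sample mean in the $i$-th group, $n_i$ is the number of observations in the $i$-th group, $\bar{Y}$ denotes the overall mean of the data, and $K$ denotes the number of groups.
The "unexplained variance", or "within-group variability" is
$$\sum_{i=1}^{K}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_{i\cdot}\right)^2 / (N-K),$$
where $Y_{ij}$ is the $j$-th observation in the $i$-th out of $K$ groups and $N$ is the overall sample size. This F-statistic follows the F-distribution with degrees of freedom $d_1 = K-1$ and $d_2 = N-K$ under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.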
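As a minimal sketch of the computation just described, the following Python function evaluates the between-group and within-group mean squares and their ratio. The function name `one_way_anova_f` and the three small groups are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def one_way_anova_f(groups):
    """Return (F, d1, d2) for a list of groups of numeric observations."""
    k = len(groups)                                   # number of groups, K
    n_total = sum(len(g) for g in groups)             # overall sample size, N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    d1, d2 = k - 1, n_total - k                       # degrees of freedom
    f_stat = (ss_between / d1) / (ss_within / d2)
    return f_stat, d1, d2


# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, d1, d2 = one_way_anova_f(groups)
print(f_stat, d1, d2)
```

For this data the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so the statistic is about 13.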
The result of the F-test is determined by comparing the calculated F value with the critical F value at a chosen significance level (e.g. 5%). The F table is a reference containing critical values of the distribution of the F-statistic under the assumption that the null hypothesis is true. Each critical value is the threshold that the F statistic is expected to exceed only a controlled percentage of the time (e.g. 5%) when the null hypothesis is true. To locate the critical F value, one uses the table for the significance level being tested (e.g. 5%) and identifies the row and column that correspond to the two degrees of freedom.[6]
How to use critical F values:
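If the F statistic < the critical F value: fail to reject the null hypothesis. The observed differences among the sample means could reasonably be caused by random chance, and the result is not statistically significant.
If the F statistic > the critical F value: reject the null hypothesis. The differences among the sample means are larger than random chance alone would reasonably produce, and the result is statistically significant.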
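The decision rule above can be sketched in Python. Instead of looking the critical value up in a table, this illustration estimates it by simulating the F statistic under the null hypothesis as a ratio of two scaled sums of squared standard normal draws. The function names, the number of simulations, the seed, and the observed statistic of 13.0 are all assumptions made for the example. For 2 and 6 degrees of freedom at the 5% level, the tabulated critical value is about 5.14, and the estimate should land close to it.

```python
import random


def simulate_critical_f(d1, d2, alpha=0.05, n_sim=40000, seed=0):
    """Estimate the upper-alpha critical value of F(d1, d2) by simulation."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        # A chi-squared variable with d degrees of freedom is a sum of
        # d squared standard normals; divide by d to scale it.
        num = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d1)) / d1
        den = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d2)) / d2
        draws.append(num / den)
    draws.sort()
    # Empirical (1 - alpha) quantile of the simulated null distribution.
    index = min(n_sim - 1, int((1 - alpha) * n_sim))
    return draws[index]


def decide(f_stat, f_crit):
    """Apply the critical-value rule to an observed F statistic."""
    return "reject H0" if f_stat > f_crit else "fail to reject H0"


f_crit = simulate_critical_f(2, 6)
decision = decide(13.0, f_crit)   # 13.0 is a hypothetical observed statistic
print(round(f_crit, 2), decision)
```

A simulated critical value carries Monte Carlo error, so a table or a statistical library remains the better source when an exact threshold is needed.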
Note that when there are only two groups for the one-way ANOVA F-test, $F = t^2$, where $t$ is the Student's t statistic.
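The two-group identity can be checked numerically. The sketch below assumes the equal-variance (pooled) form of Student's t statistic and uses two small hypothetical samples. It computes the one-way ANOVA F statistic and the t statistic separately and compares F with the square of t.

```python
import math

# Hypothetical samples for the two groups.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n_a, n_b = len(a), len(b)

# Pooled-variance Student's t statistic.
pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / (n_a + n_b - 2)
t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F statistic with K = 2 groups.
grand_mean = mean(a + b)
ss_between = n_a * (mean(a) - grand_mean) ** 2 + n_b * (mean(b) - grand_mean) ** 2
ss_within = sum_sq_dev(a) + sum_sq_dev(b)
f_stat = (ss_between / (2 - 1)) / (ss_within / (n_a + n_b - 2))

print(f_stat, t_stat ** 2)
```

Both quantities come out to 1.5 for this data, up to floating-point rounding.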
The F-test in one-way analysis of variance (ANOVA) is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments are on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others, nor, if the F-test is performed at level α, can we state that the treatment pair with the greatest mean difference is significantly different at level α.
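The count of six pairwise tests, and the reason unadjusted pairwise testing calls for a multiple-comparisons correction, can be illustrated with a short calculation. The error-rate figure below assumes the six tests are independent, which is a simplification: pairwise tests that share treatment groups are correlated, so the number is indicative only.

```python
import math

n_treatments = 4
alpha = 0.05

# Number of distinct treatment pairs: "4 choose 2".
n_pairs = math.comb(n_treatments, 2)

# Chance of at least one false rejection across all pairs if every test
# were independent and run at level alpha (a simplifying assumption).
fwer_if_independent = 1 - (1 - alpha) ** n_pairs

print(n_pairs, round(fwer_if_independent, 3))
```

Under that assumption the chance of at least one false rejection is roughly 0.26, well above the nominal 0.05 of a single omnibus test.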
Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has $p_1$ parameters, and model 2 has $p_2$ parameters, where $p_1 < p_2$, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2.
One common context in this regard is that of deciding whether a model fits the data significantly better than does a naive model, in which the only explanatory term is the intercept term, so that all predicted values for the dependent variable are set equal to that variable's sample mean. The naive model is the restricted model, since the coefficients of all potential explanatory variables are restricted to equal zero.
Another common context is deciding whether there is a structural break in the data: here the restricted model uses all data in one regression, while the unrestricted model uses separate regressions for two different subsets of the data. This use of the F-test is known as the Chow test.
The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F-test.
If there are $n$ data points from which to estimate the parameters of both models, then one can calculate the F statistic, given by
$$F = \frac{\left(\text{RSS}_1 - \text{RSS}_2\right)/\left(p_2 - p_1\right)}{\text{RSS}_2/\left(n - p_2\right)},$$
where $\text{RSS}_i$ is the residual sum of squares of model $i$. If the regression model has been calculated with weights, then replace $\text{RSS}_i$ with $\chi^2$, the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F-distribution with $(p_2 - p_1, n - p_2)$ degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). Since F is a monotone function of the likelihood ratio statistic, the F-test is a likelihood ratio test.
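A minimal sketch of the nested-model comparison, taking the intercept-only ("naive") model as model 1 and a straight-line fit as model 2. The data, the function name `nested_f`, and the variable names are hypothetical. The line is fitted with the closed-form ordinary least squares formulas for a single predictor.

```python
def nested_f(rss1, rss2, p1, p2, n):
    """F statistic for comparing a restricted model (1) with a nested
    unrestricted model (2), from their residual sums of squares."""
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


# Hypothetical data.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Model 1 (restricted): intercept only, so every prediction is the mean of y.
rss1 = sum((yi - y_bar) ** 2 for yi in y)

# Model 2 (unrestricted): intercept and slope from ordinary least squares.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
rss2 = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

f_stat = nested_f(rss1, rss2, p1=1, p2=2, n=n)
print(rss1, rss2, f_stat)
```

Here the residual sums of squares are 5 and 1.8, giving a statistic of about 3.56 on (1, 2) degrees of freedom. The statistic would then be compared with the critical value of the F-distribution for those degrees of freedom.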