F-test

From Wikipedia, the free encyclopedia

Statistical hypothesis test, mostly using multiple restrictions

Figure: probability density function of an F-distribution with d1 = d2 = 10. The red shaded region indicates the critical region at a significance level of 0.05.

An F-test is a statistical test that compares variances. It is used to determine whether the variances of two samples, or the ratios of variances among multiple samples, are significantly different. The test calculates a statistic, represented by the random variable F, and checks whether it follows an F-distribution. This check is valid if the null hypothesis is true and standard assumptions about the errors (ε) in the data hold.^[1]

F-tests are frequently used to compare different statistical models and find the one that best describes the population the data came from. When models are created using the least squares method, the resulting F-tests are often called "exact" F-tests. The F-statistic was developed by Ronald Fisher in the 1920s as the variance ratio and was later named in his honor by George W. Snedecor.^[2]

Common examples

Common examples of the use of F-tests include the study of the following cases:

Figure: one-way ANOVA table with 3 random groups of 30 observations each. The F value is calculated in the second-to-last column.

- The hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).
  - The F-test of analysis of variance (ANOVA) follows three assumptions
- The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.
- The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.
- Multiple-comparison testing is conducted using data from an already completed F-test, if that F-test leads to rejection of the null hypothesis and the factor under study has an impact on the dependent variable.^[1]
  - "a priori comparisons" / "planned comparisons": a particular set of comparisons
  - "pairwise comparisons": all possible comparisons
    - e.g. Fisher's least significant difference (LSD) test, Tukey's honestly significant difference (HSD) test, Newman–Keuls test, Duncan's test
  - "a posteriori comparisons" / "post hoc comparisons" / "exploratory comparisons": comparisons chosen after examining the data
    - e.g. Scheffé's method

F-test of the equality of two variances

Main article: F-test of equality of variances

The F-test is sensitive to non-normality.^[3]^[4] In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.^[5]
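As a minimal sketch of the two-sample case, the statistic is the ratio of the two unbiased sample variances, with degrees of freedom one less than each sample size. The function name `variance_ratio_f` and the sample values are hypothetical, chosen only for illustration; the sketch computes the statistic and does not check the normality assumption on which the test depends.

```python
from statistics import variance


def variance_ratio_f(sample_a, sample_b):
    """Return (F, d1, d2) for the ratio of two sample variances."""
    var_a = variance(sample_a)  # unbiased estimate, divides by n - 1
    var_b = variance(sample_b)
    f_stat = var_a / var_b
    # Numerator and denominator degrees of freedom.
    return f_stat, len(sample_a) - 1, len(sample_b) - 1


# Hypothetical data: sample a is more spread out than sample b.
a = [2.0, 4.0, 6.0, 8.0, 10.0]
b = [4.0, 5.0, 6.0, 7.0, 8.0]
f_stat, d1, d2 = variance_ratio_f(a, b)
print(f_stat, d1, d2)
```

The resulting value would then be compared with the F-distribution with (d1, d2) degrees of freedom.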

Formula and calculation

Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled χ²-distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
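The construction just described can be written compactly. The symbols $S_{1}$, $S_{2}$ (the two sums of squares), $d_{1}$, $d_{2}$ (their degrees of freedom) and $\sigma ^{2}$ (the common variance) are notation introduced here for illustration:

$$F={\frac {S_{1}/d_{1}}{S_{2}/d_{2}}},\qquad {\frac {S_{1}}{\sigma ^{2}}}\sim \chi _{d_{1}}^{2},\quad {\frac {S_{2}}{\sigma ^{2}}}\sim \chi _{d_{2}}^{2},\quad S_{1}{\text{ and }}S_{2}{\text{ independent}}\;\Longrightarrow \;F\sim F(d_{1},d_{2}).$$

The unknown scale $\sigma ^{2}$ cancels in the ratio, which is why the distribution of the statistic under the null hypothesis depends only on the two degrees of freedom.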

One-way analysis of variance

The formula for the one-way ANOVA F-test statistic is

$$F={\frac {\text{explained variance}}{\text{unexplained variance}}},$$

or

$$F={\frac {\text{between-group variability}}{\text{within-group variability}}}.$$

The "explained variance", or "between-group variability", is

$$\sum _{i=1}^{K}n_{i}({\bar {Y}}_{i\cdot }-{\bar {Y}})^{2}/(K-1)$$

where ${\bar {Y}}_{i\cdot }$ denotes the sample mean in the $i$-th group, $n_{i}$ is the number of observations in the $i$-th group, ${\bar {Y}}$ denotes the overall mean of the data, and $K$ denotes the number of groups.

The "unexplained variance", or "within-group variability", is

$$\sum _{i=1}^{K}\sum _{j=1}^{n_{i}}\left(Y_{ij}-{\bar {Y}}_{i\cdot }\right)^{2}/(N-K),$$

where $Y_{ij}$ is the $j$-th observation in the $i$-th out of $K$ groups and $N$ is the overall sample size. This F-statistic follows the F-distribution with degrees of freedom $d_{1}=K-1$ and $d_{2}=N-K$ under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.
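The two formulas above translate directly into code. The following is a minimal sketch in plain Python; the function name `one_way_anova_f` and the three small groups are hypothetical and serve only to make the arithmetic easy to follow by hand.

```python
def one_way_anova_f(groups):
    """Return (F, d1, d2) for a one-way ANOVA on a list of groups."""
    k = len(groups)                              # K, number of groups
    n_total = sum(len(g) for g in groups)        # N, overall sample size
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: n_i * (group mean - overall mean)^2
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: (Y_ij - group mean)^2
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    d1 = k - 1
    d2 = n_total - k
    f_stat = (ss_between / d1) / (ss_within / d2)
    return f_stat, d1, d2


# Hypothetical data: group means are 2, 3 and 6.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, d1, d2 = one_way_anova_f(groups)
print(f_stat, d1, d2)
```

For these data the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26/2)/(6/6) = 13 with (2, 6) degrees of freedom.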

Figure: F table of critical values at the 5% level, with numerator and denominator degrees of freedom each ranging from 1 to 20.

The result of the F-test can be determined by comparing the calculated F value with the critical F value at a specific significance level (e.g. 5%). The F table serves as a reference guide containing critical F values for the distribution of the F-statistic under the assumption of a true null hypothesis. It gives the threshold that the F statistic is expected to exceed only a controlled percentage of the time (e.g., 5%) when the null hypothesis is true. To locate the critical F value, one uses the numerator and denominator degrees of freedom to identify the appropriate row and column in the F table for the significance level being tested (e.g., 5%).^[6]

How to use critical F values:

If the F statistic < the critical F value:

- Fail to reject the null hypothesis
- The data do not provide sufficient evidence for the alternative hypothesis
- There are no significant differences among the sample averages
- The observed differences among the sample averages could reasonably be caused by random chance alone
- The result is not statistically significant

If the F statistic > the critical F value:

- Reject the null hypothesis
- Accept the alternative hypothesis
- There are significant differences among the sample averages
- The observed differences among the sample averages could not reasonably be caused by random chance alone
- The result is statistically significant
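Instead of looking up a critical value in a table, the same decision can be made from the upper-tail probability of the F-distribution: the statistic exceeds the critical value at level α exactly when that probability is below α. The sketch below uses only the Python standard library and evaluates the tail probability through the regularized incomplete beta function with a standard continued-fraction expansion. The helper names (`_beta_cf`, `reg_inc_beta`, `f_sf`, `f_test_decision`) are hypothetical, and a statistics library would normally be used for this in practice.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_stat, d1, d2):
    """Upper-tail probability P(F > f_stat) for an F(d1, d2) distribution."""
    x = d2 / (d2 + d1 * f_stat)
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, x)


def f_test_decision(f_stat, d1, d2, alpha=0.05):
    """Return (p_value, reject_null) at significance level alpha."""
    p_value = f_sf(f_stat, d1, d2)
    return p_value, p_value < alpha


# Hypothetical example: F = 13 with (2, 6) degrees of freedom.
p_value, reject = f_test_decision(13.0, 2, 6)
print(p_value, reject)
```

For a numerator with 2 degrees of freedom the tail probability has the closed form (1 + 2F/d2)^(−d2/2), which gives roughly 0.0066 here, so the null hypothesis would be rejected at the 5% level.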

Note that when there are only two groups for the one-way ANOVA F-test, $F=t^{2}$, where $t$ is the Student's t statistic.
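The identity can be checked numerically. The sketch below assumes the equal-variance (pooled) form of the two-sample t statistic, which is the form for which the identity holds; the function names and the two small samples are hypothetical.

```python
import math


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((y - mean_a) ** 2 for y in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))


def two_group_anova_f(a, b):
    """One-way ANOVA F statistic for exactly two groups (K = 2)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    grand_mean = (sum(a) + sum(b)) / (na + nb)
    # Between-group mean square has K - 1 = 1 degree of freedom.
    ms_between = na * (mean_a - grand_mean) ** 2 + nb * (mean_b - grand_mean) ** 2
    ss_within = sum((y - mean_a) ** 2 for y in a) + sum((y - mean_b) ** 2 for y in b)
    ms_within = ss_within / (na + nb - 2)
    return ms_between / ms_within


# Hypothetical samples of unequal size.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0, 5.0]
t_stat = pooled_t_statistic(a, b)
f_stat = two_group_anova_f(a, b)
print(t_stat ** 2, f_stat)
```

Both printed values agree up to floating-point rounding.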

Advantages

Multi-group Comparison Efficiency: Facilitating simultaneous comparison of multiple groups, enhancing efficiency particularly in situations involving more than two groups.
Clarity in Variance Comparison: Offering a straightforward interpretation of variance differences among groups, contributing to a clear understanding of the observed data patterns.
Versatility Across Disciplines: Demonstrating broad applicability across diverse fields, including social sciences, natural sciences, and engineering.

Disadvantages

Sensitivity to Assumptions: The F-test is highly sensitive to violations of its assumptions, such as homogeneity of variance and normality, which can affect the accuracy of test results.
Limited Scope to Group Comparisons: The F-test is tailored for comparing variances between groups, making it less suitable for analyses beyond this specific scope.
Interpretation Challenges: The F-test does not pinpoint specific group pairs with distinct variances. Careful interpretation is necessary, and additional post hoc tests are often essential for a more detailed understanding of group-wise differences.

Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance (ANOVA) is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments are on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others, nor, if the F-test is performed at level α, can we state that the treatment pair with the greatest mean difference is significantly different at level α.
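The count of six pairwise tests, and the reason unadjusted pairwise testing calls for a multiple-comparison adjustment, can be illustrated with a few lines of Python. The treatment labels are hypothetical. The error-rate figure assumes, purely for illustration, that the six tests are independent; pairwise tests on shared groups are not, so the number is a rough indication rather than an exact rate.

```python
from itertools import combinations

treatments = ["T1", "T2", "T3", "T4"]      # hypothetical treatment labels
pairs = list(combinations(treatments, 2))  # every unordered pair of treatments
m = len(pairs)

alpha = 0.05
# Chance of at least one false rejection across m tests at level alpha,
# under the simplifying (and here unrealistic) assumption of independence.
fwer_if_independent = 1 - (1 - alpha) ** m

print(m)
print(round(fwer_if_independent, 4))
```

With four treatments this gives 6 pairs and an illustrative family-wise error rate of about 0.26, compared with 0.05 for the single omnibus F-test.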

Regression problems

Further information: Stepwise regression

Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has p₁ parameters, and model 2 has p₂ parameters, where p₁ < p₂, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2.

One common context in this regard is that of deciding whether a model fits the data significantly better than does a naive model, in which the only explanatory term is the intercept term, so that all predicted values for the dependent variable are set equal to that variable's sample mean. The naive model is the restricted model, since the coefficients of all potential explanatory variables are restricted to equal zero.

Another common context is deciding whether there is a structural break in the data: here the restricted model uses all data in one regression, while the unrestricted model uses separate regressions for two different subsets of the data. This use of the F-test is known as the Chow test.
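A minimal sketch of this structural-break comparison follows, assuming a simple linear regression (intercept and slope, so k = 2 parameters per regression) and a known break position. The function names, the `split` index and the data are hypothetical. The statistic compares the pooled residual sum of squares with the sum over the two separate fits, using k and n − 2k degrees of freedom.

```python
def linear_rss(x, y):
    """Residual sum of squares of a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((v - (intercept + slope * u)) ** 2 for u, v in zip(x, y))


def chow_f(x, y, split):
    """F statistic for a break at index `split` (hypothetical helper)."""
    k = 2                                   # intercept and slope
    n = len(x)
    rss_pooled = linear_rss(x, y)           # restricted: one regression
    rss_split = (linear_rss(x[:split], y[:split])
                 + linear_rss(x[split:], y[split:]))  # unrestricted: two
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


# Hypothetical data: the trend rises in the first half and falls in the second.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 2.0, 2.0, 3.0, 8.0, 7.0, 7.0, 6.0]
f_stat = chow_f(x, y, split=4)
print(f_stat)
```

A large value relative to the F-distribution with (k, n − 2k) degrees of freedom suggests that separate regressions fit significantly better than a single one.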

The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F-test.

If there are n data points to estimate parameters of both models from, then one can calculate the F statistic, given by

$$F={\frac {\left({\frac {{\text{RSS}}_{1}-{\text{RSS}}_{2}}{p_{2}-p_{1}}}\right)}{\left({\frac {{\text{RSS}}_{2}}{n-p_{2}}}\right)}}={\frac {{\text{RSS}}_{1}-{\text{RSS}}_{2}}{{\text{RSS}}_{2}}}\cdot {\frac {n-p_{2}}{p_{2}-p_{1}}},$$

where ${\text{RSS}}_{i}$ is the residual sum of squares of model i. If the regression model has been calculated with weights, then replace ${\text{RSS}}_{i}$ with χ², the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F-distribution, with (p₂−p₁, n−p₂) degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). Since F is a monotone function of the likelihood ratio statistic, the F-test is a likelihood ratio test.
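As a worked sketch of the formula, the example below compares the naive intercept-only model (p₁ = 1) with a simple linear regression (p₂ = 2) fitted by ordinary least squares. The function names and the five data points are hypothetical, chosen so the arithmetic can be checked by hand.

```python
def rss_intercept_only(y):
    """RSS of model 1: every prediction equals the sample mean of y."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_linear(x, y):
    """RSS of model 2: least-squares line with intercept and slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((v - (intercept + slope * u)) ** 2 for u, v in zip(x, y))


def nested_f(rss1, rss2, p1, p2, n):
    """F statistic for nested models with p1 < p2 parameters and n points."""
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


# Hypothetical data lying close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

rss1 = rss_intercept_only(y)    # restricted model
rss2 = rss_simple_linear(x, y)  # unrestricted model
f_stat = nested_f(rss1, rss2, p1=1, p2=2, n=len(x))
print(rss1, rss2, f_stat)
```

Here RSS₁ ≈ 39.708 and RSS₂ ≈ 0.107, giving F ≈ 1110 on (1, 3) degrees of freedom, far above the usual 5% critical value, so the slope term improves the fit significantly.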

References

[1] Berger, Paul D.; Maurer, Robert E.; Celli, Giovana B. (2018). Experimental Design. Cham: Springer International Publishing. p. 108. doi:10.1007/978-3-319-64583-4. ISBN 978-3-319-64582-7.
[2] Lomax, Richard G. (2007). Statistical Concepts: A Second Course. Lawrence Erlbaum Associates. p. 10. ISBN 978-0-8058-5850-1.
[3] Box, G. E. P. (1953). "Non-Normality and Tests on Variances". Biometrika. 40 (3/4): 318–335. doi:10.1093/biomet/40.3-4.318. JSTOR 2333350.
[4] Markowski, Carol A.; Markowski, Edward P. (1990). "Conditions for the Effectiveness of a Preliminary Test of Variance". The American Statistician. 44 (4): 322–326. doi:10.2307/2684360. JSTOR 2684360.
[5] Sawilowsky, S. (2002). "Fermat, Schubert, Einstein, and Behrens–Fisher: The Probable Difference Between Two Means When σ₁² ≠ σ₂²". Journal of Modern Applied Statistical Methods. 1 (2): 461–472. doi:10.22237/jmasm/1036109940. Archived from the original on 2015-04-03. Retrieved 2015-03-30.
[6] Siegel, Andrew F. (2016-01-01), Siegel, Andrew F. (ed.), "Chapter 15 - ANOVA: Testing for Differences Among Many Samples and Much More", Practical Business Statistics (Seventh Edition), Academic Press, pp. 469–492, doi:10.1016/b978-0-12-804250-2.00015-8, ISBN 978-0-12-804250-2, retrieved 2023-12-10
